friendliai/friendli-client
Description: [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
Language: Python
License: Apache-2.0
Stars: 50
Forks: 7
Open issues: 3
Created: 2023-07-20T12:57:24Z
Pushed: 2025-06-25T05:46:33Z
Default branch: main
Fork: no
Archived: yes
README:
DEPRECATED 
This package is no longer supported. Please consider using the Friendli Python SDK instead.
---
Supercharge Generative AI Serving with Friendli 🚀
The Friendli Client offers a convenient interface for interacting with the endpoint services provided by Friendli Suite, a solution for serving generative AI models. Designed for flexibility and performance, it supports both synchronous and asynchronous operations, making it easy to integrate AI capabilities into your applications.
Installation
To get started with Friendli, install the client package using pip:
pip install friendli-client
> [!IMPORTANT]
> You must set the `FRIENDLI_TOKEN` environment variable before initializing the client instance with `client = Friendli()`.
> Alternatively, you can provide the value of your personal access token as the `token` argument when creating the client, like so:
>
> ```python
> from friendli import Friendli
>
> client = Friendli(token="YOUR PERSONAL ACCESS TOKEN")
> ```
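The two options above amount to a lookup: an explicit token argument or the FRIENDLI_TOKEN environment variable. The following is a minimal sketch of such a lookup, useful for failing early with a clear message before the client is constructed. The helper name `resolve_friendli_token` is hypothetical and not part of the friendli package, and the precedence of the explicit argument over the environment variable is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_friendli_token(
    token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the token to hand to the client, or raise if none is available.

    Hypothetical helper, not part of the friendli package. It assumes an
    explicit argument should win over the FRIENDLI_TOKEN environment variable.
    """
    env = os.environ if env is None else env
    if token:
        # An explicitly supplied token is used as-is.
        return token
    from_env = env.get("FRIENDLI_TOKEN", "")
    if from_env:
        return from_env
    raise RuntimeError(
        "No token found: set FRIENDLI_TOKEN or pass a token explicitly."
    )
```

The returned value could then be passed along as `Friendli(token=resolve_friendli_token())`.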
Friendli Serverless Endpoints
Friendli Serverless Endpoints offer a simple, click-and-play interface for accessing popular open-source models like Llama 3.1. With pay-per-token billing, they are ideal for exploration and experimentation.
To interact with models hosted by serverless endpoints, provide the model code you want to use in the model argument. Refer to the pricing table for a list of available model codes and their pricing.
from friendli import Friendli

client = Friendli()

chat_completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)

Friendli Dedicated Endpoints
Friendli Dedicated Endpoints enable you to run your custom generative AI models on dedicated GPU resources.
To interact with dedicated endpoints, provide the endpoint ID in the model argument.
import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.chat.completions.create(
    model=os.environ["ENDPOINT_ID"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)

Friendli Container
Friendli Container is perfect for users who prefer to serve LLMs within their own infrastructure. By deploying the Friendli Engine in containers on your on-premise or cloud GPUs, you can maintain complete control over your data and operations, ensuring security and compliance with internal policies.
from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)

Async Usage
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
    )
    print(chat_completion.choices[0].message.content)


asyncio.run(main())

Streaming Usage
from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

The async client (AsyncFriendli) uses the same interface to stream the response.
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()


async def main() -> None:
    stream = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)


asyncio.run(main())

Advanced Usage
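The streaming loops in the previous section print each delta as it arrives. When the complete reply is also needed afterwards, the deltas can be accumulated instead. The following is a minimal sketch of that idea using the same `chunk.choices[0].delta.content` access pattern. The names `collect_stream` and `fake_chunk` are hypothetical, and the stand-in chunk objects only imitate the attributes used here so that the example runs without a live endpoint.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        # delta.content may be None, so fall back to an empty string,
        # exactly as the printing loops do.
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk (hypothetical, for illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Mix "), fake_chunk(None), fake_chunk("and fry.")]
print(collect_stream(demo))  # prints "Mix and fry."
```

With a real client, the `stream` object returned by a `stream=True` request would presumably be passed in place of `demo`.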
Sending Requests to LoRA Adapters
If your endpoint is serving a Multi-LoRA model, you can send requests to one of the adapters by providing the adapter route in the model argument.
For Friendli Dedicated Endpoints, provide the endpoint ID and the adapter route separated by a colon (:).
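The colon-separated form can be sketched as a small helper that builds the model string and rejects empty parts before a request is sent. The name `lora_model_id` is hypothetical and not part of the friendli package, and the non-empty check is an assumption about what a sensible input looks like.

```python
def lora_model_id(endpoint_id: str, adapter_route: str) -> str:
    """Join an endpoint ID and an adapter route as "<endpoint_id>:<adapter_route>".

    Hypothetical helper, not part of the friendli package.
    """
    if not endpoint_id or not adapter_route:
        # Assumed sanity check: neither part may be empty.
        raise ValueError("endpoint_id and adapter_route must both be non-empty")
    return f"{endpoint_id}:{adapter_route}"


print(lora_model_id("my-endpoint", "my-adapter"))  # prints "my-endpoint:my-adapter"
```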
import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.lora.completions.create(
    model=f"{os.environ['ENDPOINT_ID']}:{os.environ['ADAPTER_ROUTE']}",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)

For Friendli Container, just provide the adapter name.
import os
from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.lora.completions.create(
    model=os.environ["ADAPTER_NAME"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)

Using the gRPC Interface
> [!IMPORTANT]
> gRPC is only supported by Friendli Container, and only the streaming API of v1/completions is available.
When Friendli Container is running in gRPC mode, the client can interact with the gRPC server if it is initialized with the use_grpc=True argument.
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,  # Only streaming…