Repo: FriendliAI, published Jul 20, 2023

friendliai/friendli-client

Python


Description: [⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI

Language: Python

License: Apache-2.0

Stars: 50

Forks: 7

Open issues: 3

Created: 2023-07-20T12:57:24Z

Pushed: 2025-06-25T05:46:33Z

Default branch: main

Fork: no

Archived: yes

README:

DEPRECATED ![No Maintenance Intended](http://unmaintained.tech/)

This package is no longer supported. Please consider using the Friendli Python SDK instead.

---

Supercharge Generative AI Serving with Friendli 🚀

The Friendli Client offers a convenient interface for interacting with the endpoint services provided by Friendli Suite, the ultimate solution for serving generative AI models. Designed for flexibility and performance, it supports both synchronous and asynchronous operations, making it easy to integrate powerful AI capabilities into your applications.

Installation

To get started with Friendli, install the client package using pip:

```sh
pip install friendli-client
```

> [!IMPORTANT]
> You must set the `FRIENDLI_TOKEN` environment variable before initializing the client instance with `client = Friendli()`.
> Alternatively, you can provide your personal access token as the `token` argument when creating the client, like so:
>
> ```python
> from friendli import Friendli
>
> client = Friendli(token="YOUR PERSONAL ACCESS TOKEN")
> ```
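
The token lookup described above can be made explicit before the client is constructed, so that a missing token fails with a clear message. The following is a minimal sketch, not part of the library: `resolve_token` is a hypothetical helper, and only the `FRIENDLI_TOKEN` variable name and the `token` argument come from this README.

```python
import os
from typing import Mapping, Optional


def resolve_token(explicit: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> str:
    """Pick the token to use: an explicit value first, else FRIENDLI_TOKEN.

    Hypothetical helper for illustration; it is not part of friendli-client.
    """
    token = explicit or env.get("FRIENDLI_TOKEN")
    if not token:
        raise RuntimeError(
            "Set FRIENDLI_TOKEN or pass token=... when creating the client."
        )
    return token


# Usage (requires friendli-client to be installed):
#   from friendli import Friendli
#   client = Friendli(token=resolve_token())
```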

Friendli Serverless Endpoints

Friendli Serverless Endpoints offer a simple, click-and-play interface for accessing popular open-source models like Llama 3.1. With pay-per-token billing, they are ideal for exploration and experimentation.

To interact with models hosted by serverless endpoints, provide the model code you want to use in the model argument. Refer to the pricing table for a list of available model codes and their pricing.

```python
from friendli import Friendli

client = Friendli()

chat_completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)
```

Friendli Dedicated Endpoints

Friendli Dedicated Endpoints enable you to run your custom generative AI models on dedicated GPU resources.

To interact with dedicated endpoints, provide the endpoint ID in the model argument.

```python
import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.chat.completions.create(
    model=os.environ["ENDPOINT_ID"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)
```
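
The comment in the example above notes that the default team is used when `team_id` is not provided. A small sketch of building the constructor arguments so that `team_id` is passed only when `TEAM_ID` is set: `dedicated_client_kwargs` is a hypothetical helper, and `TEAM_ID` is simply the variable name used in the example.

```python
import os
from typing import Any, Dict, Mapping


def dedicated_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for a dedicated-endpoint client.

    Hypothetical helper: team_id is included only when TEAM_ID is set,
    so the client can fall back to the default team otherwise.
    """
    kwargs: Dict[str, Any] = {"use_dedicated_endpoint": True}
    team_id = env.get("TEAM_ID")
    if team_id:
        kwargs["team_id"] = team_id
    return kwargs


# Usage (requires friendli-client to be installed):
#   from friendli import Friendli
#   client = Friendli(**dedicated_client_kwargs())
```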

Friendli Container

Friendli Container is perfect for users who prefer to serve LLMs within their own infrastructure. By deploying the Friendli Engine in containers on your on-premise or cloud GPUs, you can maintain complete control over your data and operations, ensuring security and compliance with internal policies.

```python
from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)
```

Async Usage

```python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
    )
    print(chat_completion.choices[0].message.content)


asyncio.run(main())
```
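
One reason to reach for the async client is to issue several requests concurrently. The sketch below shows the general pattern with `asyncio.gather`. The names `ask_all`, `ask`, and `fake_ask` are hypothetical, and the stand-in coroutine takes the place of a real request so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List


async def ask_all(ask: Callable[[str], Awaitable[str]],
                  prompts: List[str]) -> List[str]:
    """Run one request per prompt concurrently; results keep the input order."""
    return list(await asyncio.gather(*(ask(p) for p in prompts)))


async def fake_ask(prompt: str) -> str:
    # Stand-in for a real request, for example a coroutine that awaits
    # client.chat.completions.create(...) and returns the message content.
    await asyncio.sleep(0)
    return prompt.upper()


answers = asyncio.run(ask_all(fake_ask, ["pancake", "waffle"]))
print(answers)  # ['PANCAKE', 'WAFFLE']
```

With a real client, `ask` would wrap the `await client.chat.completions.create(...)` call from the example above and return `choices[0].message.content`.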

Streaming Usage

```python
from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

The async client (AsyncFriendli) uses the same interface to stream the response.

```python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()


async def main() -> None:
    stream = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)


asyncio.run(main())
```
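
When the full text is needed in addition to the incremental output, the chunks can be accumulated while iterating. This is a minimal sketch, assuming each chunk exposes `choices[0].delta.content` as in the examples above. `collect_stream` and `fake_chunk` are hypothetical helpers, and the `SimpleNamespace` objects are stand-ins for real chunks.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_stream(stream: Iterable[Any]) -> str:
    """Join the text of every chunk, treating a missing delta as empty."""
    parts = []
    for chunk in stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    # Stand-in with the same attribute path as a streamed chunk.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


full_text = collect_stream(
    [fake_chunk("Mix "), fake_chunk(None), fake_chunk("and fry.")]
)
print(full_text)  # Mix and fry.
```

An async variant would follow the same shape with `async for` in place of `for`.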

Advanced Usage

Sending Requests to LoRA Adapters

If your endpoint is serving a Multi-LoRA model, you can send a request to one of the adapters by providing the adapter route in the `model` argument.

For Friendli Dedicated Endpoints, provide the endpoint ID and the adapter route separated by a colon (:).

```python
import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.lora.completions.create(
    model=f"{os.environ['ENDPOINT_ID']}:{os.environ['ADAPTER_ROUTE']}",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
```
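
The colon-separated `model` value can be built in one place so that an empty endpoint ID or adapter route is caught before a request is sent. This is a sketch only: `adapter_model` is a hypothetical helper, and the sample IDs are made up.

```python
def adapter_model(endpoint_id: str, adapter_route: str) -> str:
    """Return '<endpoint_id>:<adapter_route>' for the model argument.

    Hypothetical helper for illustration; it only checks that both parts
    are non-empty and does not validate them against any service.
    """
    if not endpoint_id or not adapter_route:
        raise ValueError("endpoint_id and adapter_route must both be non-empty")
    return f"{endpoint_id}:{adapter_route}"


print(adapter_model("my-endpoint-id", "my-adapter-route"))
# my-endpoint-id:my-adapter-route
```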

For Friendli Container, just provide the adapter name.

```python
import os
from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.lora.completions.create(
    model=os.environ["ADAPTER_NAME"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
```

Using the gRPC Interface

> [!IMPORTANT]
> gRPC is only supported by Friendli Container, and only the streaming API of `v1/completions` is available.

When Friendli Container is running in gRPC mode, the client can interact with the gRPC server by initializing it with the `use_grpc=True` argument.

```python
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,  # Only streaming…
```
