ForkNous ResearchNous Researchpublished May 19, 2025seen 5d

NousResearch/litellm

forked from BerriAI/litellm

Open original ↗

Captured source

source ↗
published May 19, 2025seen 5dcaptured 14hhttp 200method plain

NousResearch/litellm

Description: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Language: Python

License: NOASSERTION

Stars: 9

Forks: 1

Open issues: 1

Created: 2025-05-19T00:38:12Z

Pushed: 2025-05-23T07:30:03Z

Default branch: main

Fork: yes

Parent repository: BerriAI/litellm

Archived: no

README:

🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.]

LiteLLM Proxy Server (LLM Gateway) | Hosted Proxy (Preview) | Enterprise Tier

LiteLLM manages:

  • Translate inputs to provider's completion, embedding, and image_generation endpoints
  • Consistent output, text responses will always be available at ['choices'][0]['message']['content']
  • Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
  • Set Budgets & Rate limits per project, api key, model LiteLLM Proxy Server (LLM Gateway)

**Jump to LiteLLM Proxy (LLM Gateway) Docs**

**Jump to Supported LLM Providers**

🚨 Stable Release: Use docker images with the -stable tag. These have undergone 12 hour load tests, before being published. More information about the release cycle here

Support for more providers. Missing a provider or LLM Platform, raise a feature request.

Usage (**Docs**)

> [!IMPORTANT] > LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here > LiteLLM v1.40.14+ now requires pydantic>=2.0.0. No changes required.

pip install litellm
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="openai/gpt-4o", messages=messages)

# anthropic call
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print(response)

Response (OpenAI Format)

{
"id": "chatcmpl-565d891b-a42e-4c39-8d14-82a1f5208885",
"created": 1734366691,
"model": "claude-3-sonnet-20240229",
"object": "chat.completion",
"system_fingerprint": null,
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?",
"role": "assistant",
"tool_calls": null,
"function_call": null
}
}
],
"usage": {
"completion_tokens": 43,
"prompt_tokens": 13,
"total_tokens": 56,
"completion_tokens_details": null,
"prompt_tokens_details": {
"audio_tokens": null,
"cached_tokens": 0
},
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0
}
}

Call any model supported by a provider, with model=/. There might be provider-specific details here, so refer to provider docs for more information

Async (Docs)

from litellm import acompletion
import asyncio

async def test_get_response():
user_message = "Hello, how are you?"
messages = [{"content": user_message, "role": "user"}]
response = await acompletion(model="openai/gpt-4o", messages=messages)
return response

response = asyncio.run(test_get_response())
print(response)

Streaming (Docs)

liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response. Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)

from litellm import completion
response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
print(part.choices[0].delta.content or "")

# claude 2
response = completion('anthropic/claude-3-sonnet-20240229', messages, stream=True)
for part in response:
print(part)

Response chunk (OpenAI Format)

{
"id": "chatcmpl-2be06597-eb60-4c70-9ec5-8cd2ab1b4697",
"created": 1734366925,
"model": "claude-3-sonnet-20240229",
"object": "chat.completion.chunk",
"system_fingerprint": null,
"choices": [
{
"finish_reason": null,
"index": 0,
"delta": {
"content": "Hello",
"role": "assistant",
"function_call": null,
"tool_calls": null,
"audio": null
},
"logprobs": null
}
]
}

Logging Observability (Docs)

LiteLLM exposes pre defined callbacks to send data to Lunary, MLflow, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack

from litellm import completion

## set env variables for logging tools (when using MLflow, no API key set up is required)
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"] # log input/output to lunary, langfuse, supabase, athina, helicone etc

#openai call
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])

LiteLLM Proxy Server (LLM Gateway) - (Docs)

Track spend + Load Balance across multiple projects

Hosted Proxy (Preview)

The proxy provides:

1. Hooks for auth 2. Hooks for logging 3. Cost tracking 4.…

Excerpt shown — open the source for the full document.

Notability

notability 2.0/10

Routine fork with low stars