Repo: Groq, published Feb 14, 2024

groq/groq-python

Description: The official Python Library for the Groq API

Language: Python

License: Apache-2.0

Stars: 606

Forks: 59

Open issues: 5

Created: 2024-02-14T22:48:15Z

Pushed: 2026-06-03T14:31:01Z

Default branch: main

Fork: no

Archived: no

README:

Groq Python API library

The Groq Python library provides convenient access to the Groq REST API from any Python 3.10+ application. The library includes type definitions for all request params and response fields, and offers both synchronous and asynchronous clients powered by httpx.

It is generated with Stainless.

Documentation

The REST API documentation can be found on console.groq.com. The full API of this library can be found in [api.md](api.md).

Installation

# install from PyPI
pip install groq

Usage

import os
from groq import Groq

client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),  # This is the default and can be omitted
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of low latency LLMs",
        }
    ],
    model="openai/gpt-oss-20b",
)
print(chat_completion.choices[0].message.content)

While you can provide an api_key keyword argument, we recommend using python-dotenv to add GROQ_API_KEY="My API Key" to your .env file so that your API Key is not stored in source control.
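A minimal sketch of reading the key from the environment and failing early with a clear message when it is missing. The helper name `require_api_key` is hypothetical and not part of the groq library; only the `GROQ_API_KEY` variable name comes from the text above.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None, name: str = "GROQ_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the groq library.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()  # treat whitespace-only values as missing
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return key
```

If python-dotenv is installed, calling its `load_dotenv()` first loads a `.env` file into `os.environ`, after which `Groq(api_key=require_api_key())` should pick the key up.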

Async usage

Simply import AsyncGroq instead of Groq and use await with each API call:

import os
import asyncio
from groq import AsyncGroq

client = AsyncGroq(
    api_key=os.environ.get("GROQ_API_KEY"),  # This is the default and can be omitted
)


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Explain the importance of low latency LLMs",
            }
        ],
        model="openai/gpt-oss-20b",
    )
    print(chat_completion.choices[0].message.content)


asyncio.run(main())

Functionality between the synchronous and asynchronous clients is otherwise identical.
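One reason to reach for the async client is to issue several requests concurrently. The sketch below assumes `client` is an `AsyncGroq` instance (or anything exposing the same `chat.completions.create` coroutine); `ask_all` is a hypothetical helper name, not a library function.

```python
import asyncio
from typing import Any, List


async def ask_all(client: Any, prompts: List[str], model: str) -> List[str]:
    """Send one chat completion per prompt concurrently; return replies in prompt order."""

    async def ask(prompt: str) -> str:
        completion = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model=model,
        )
        return completion.choices[0].message.content

    # asyncio.gather returns results in the same order as the awaitables passed in.
    return list(await asyncio.gather(*(ask(p) for p in prompts)))
```

Inside an `async def main()`, this could be called as `replies = await ask_all(client, ["First question", "Second question"], "openai/gpt-oss-20b")`. Rate limits still apply, so very large prompt lists may need throttling.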

With aiohttp

By default, the async client uses httpx for HTTP requests. However, for improved concurrency performance you may also use aiohttp as the HTTP backend.

To use it, first install the aiohttp extra:

# install from PyPI
pip install groq[aiohttp]

Then you can enable it by instantiating the client with http_client=DefaultAioHttpClient():

import os
import asyncio
from groq import DefaultAioHttpClient
from groq import AsyncGroq

async def main() -> None:
    async with AsyncGroq(
        api_key=os.environ.get("GROQ_API_KEY"),  # This is the default and can be omitted
        http_client=DefaultAioHttpClient(),
    ) as client:
        chat_completion = await client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": "Explain the importance of low latency LLMs",
                }
            ],
            model="openai/gpt-oss-20b",
        )
        print(chat_completion.id)


asyncio.run(main())

Using types

Nested request parameters are TypedDicts. Responses are Pydantic models which also provide helper methods for things like:

  • Serializing back into JSON, model.to_json()
  • Converting to a dictionary, model.to_dict()

Typed requests and responses provide autocomplete and documentation within your editor. If you would like to see type errors in VS Code to help catch bugs earlier, set python.analysis.typeCheckingMode to basic.
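As a rough illustration of the helper methods, the sketch below turns a response into a small JSON summary. It assumes the dictionary returned by `to_dict()` mirrors the attribute layout used in the usage examples (`choices[0].message.content`); `summarize_response` is a hypothetical name, not part of the library.

```python
import json
from typing import Any


def summarize_response(completion: Any) -> str:
    """Return a compact JSON string holding the id and first reply of a chat completion."""
    data = completion.to_dict()  # plain dictionary view of the response model
    reply = data["choices"][0]["message"]["content"]
    return json.dumps({"id": data.get("id"), "reply": reply}, sort_keys=True)
```

When the whole response is wanted rather than a summary, `completion.to_json()` gives the full JSON string directly.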

Nested params

Nested parameters are dictionaries, typed using TypedDict, for example:

from groq import Groq

client = Groq()

chat_completion = client.chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "system",
        }
    ],
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    compound_custom={},
)
print(chat_completion.compound_custom)

File uploads

Request parameters that correspond to file uploads can be passed as bytes, a `PathLike` instance, or a tuple of (filename, contents, media type).

from pathlib import Path
from groq import Groq

client = Groq()

client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    file=Path("/path/to/file"),
)

The async client uses the exact same interface. If you pass a `PathLike` instance, the file contents will be read asynchronously automatically.
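The tuple form is useful when the filename or media type should be set explicitly. The following sketch builds such a tuple from a path using only the standard library; `as_upload_tuple` is a hypothetical helper, and the fallback media type is an assumption rather than something the library prescribes.

```python
import mimetypes
from pathlib import Path
from typing import Tuple


def as_upload_tuple(path: Path) -> Tuple[str, bytes, str]:
    """Build a (filename, contents, media type) tuple for a file on disk."""
    media_type, _encoding = mimetypes.guess_type(path.name)
    # Fall back to a generic binary type when the extension is not recognized (assumption).
    return (path.name, path.read_bytes(), media_type or "application/octet-stream")
```

The result could then be passed as `file=as_upload_tuple(Path("/path/to/file"))` in the transcription call above.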

Handling errors

When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of groq.APIConnectionError is raised.

When the API returns a non-success status code (that is, 4xx or 5xx response), a subclass of groq.APIStatusError is raised, containing status_code and response properties.

All errors inherit from groq.APIError.

import groq
from groq import Groq

client = Groq()

try:
    client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant.",
            },
            {
                "role": "user",
                "content": "Explain the importance of low latency LLMs",
            },
        ],
        model="openai/gpt-oss-20b",
    )
except groq.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except groq.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except groq.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)

Error codes are as follows:

| Status Code | Error Type                 |
| ----------- | -------------------------- |
| 400         | BadRequestError            |
| 401         | AuthenticationError        |
| 403         | PermissionDeniedError      |
| 404         | NotFoundError              |
| 422         | UnprocessableEntityError   |
| 429         | RateLimitError             |
| >=500       | InternalServerError        |
| N/A         | APIConnectionError         |
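For logging or tests it can help to have the table as a lookup. This is only a sketch of the mapping listed above; `error_name_for_status` is a hypothetical helper, and returning the base class name `APIStatusError` for unlisted codes is an assumption based on the statement that all non-success responses raise a subclass of it.

```python
# Status codes and error type names copied from the table above.
_STATUS_TO_ERROR = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def error_name_for_status(status_code: int) -> str:
    """Return the error type name the table associates with an HTTP status code."""
    if status_code in _STATUS_TO_ERROR:
        return _STATUS_TO_ERROR[status_code]
    if status_code >= 500:
        return "InternalServerError"
    # Not listed in the table: fall back to the base class name (assumption).
    return "APIStatusError"
```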

Retries

Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default.
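To make the retry rule concrete, here is an illustrative sketch of which status codes qualify and what an exponential backoff schedule might look like. The function names, the 0.5 second base, and the 8 second cap are assumptions for illustration; the library's actual delays (and any jitter it applies) may differ.

```python
from typing import List


def is_retryable(status_code: int) -> bool:
    """True for the status codes the text above lists as retried by default."""
    return status_code in (408, 409, 429) or status_code >= 500


def backoff_delays(max_retries: int = 2, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delay in seconds before each retry: doubles every attempt, bounded by `cap`.

    `base` and `cap` are illustrative values, not the library's documented settings.
    """
    return [min(base * (2 ** attempt), cap) for attempt in range(max_retries)]
```

With the default of 2 retries this schedule waits 0.5 s and then 1.0 s; a 404 is not retried, since `is_retryable(404)` is false.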

You can use the max_retries option to configure or disable retry settings:

from groq import Groq

# Configure the default for all requests:
client = Groq(...
