DigitalOcean Serverless Inference: A Deep Dive
By smehta
Updated: June 3, 2026 17 min read
"}], "max_tokens": 1024 }'
Model Context Protocol (MCP)
Connect to remote MCP servers — authenticated or unauthenticated — for live data access:
Shell
# The JSON body is single-quoted, so the shell would not expand a variable written inside it.
# The '"$DIGITALOCEAN_API_TOKEN"' sequence steps out of the single quotes to substitute the token.
curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "messages": [{"role": "user", "content": "Fetch my DigitalOcean account info."}],
    "tools": [{
      "type": "mcp",
      "server_label": "digitalocean",
      "server_url": "https://accounts.mcp.digitalocean.com/mcp",
      "authorization": "Bearer '"$DIGITALOCEAN_API_TOKEN"'",
      "allowed_tools": ["account-get-information"]
    }],
    "tool_choice": "required",
    "max_tokens": 512
  }'
Web Search
Give models access to real-time web content:
Shell
curl -X POST https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-4o",
    "input": "What are the latest DigitalOcean Droplet pricing changes?",
    "tools": [{"type": "web_search", "max_uses": 3, "max_results": 5}],
    "max_output_tokens": 1024
  }'
Agentic Workflows (Claude Code)
We offer full Anthropic tool-use compatibility through /v1/messages. Set ANTHROPIC_BASE_URL to https://inference.do-ai.run/v1/messages to run Claude Code and other agentic workflows on DigitalOcean:
Shell
curl https://inference.do-ai.run/v1/messages \
  -H "x-api-key: $MODEL_ACCESS_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "anthropic-claude-4.6-sonnet",
    "max_tokens": 4096,
    "tools": [{
      "name": "read_file",
      "description": "Read a file from the local filesystem.",
      "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}
    }],
    "messages": [{"role": "user", "content": "Refactor the authentication logic in src/auth.ts."}]
  }'
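In an agentic loop, the model typically answers the request above with a tool_use block, and the client runs the tool and sends the output back as a tool_result. The following is a minimal sketch of that follow-up body in the Anthropic Messages format, assuming the endpoint accepts it as the compatibility claim suggests. The function name build_tool_result_body, the tool_use id, and the file contents are hypothetical, and the sketch assumes the tool output needs no JSON escaping.

```shell
# Build the follow-up request body that returns a tool result to the model.
#   $1: tool_use id copied from the model's previous response (hypothetical here)
#   $2: tool output; assumed to contain no quotes, backslashes, or newlines
build_tool_result_body() {
  tool_use_id=$1
  tool_output=$2
  cat <<EOF
{
  "model": "anthropic-claude-4.6-sonnet",
  "max_tokens": 4096,
  "tools": [{
    "name": "read_file",
    "description": "Read a file from the local filesystem.",
    "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}
  }],
  "messages": [
    {"role": "user", "content": "Refactor the authentication logic in src/auth.ts."},
    {"role": "assistant", "content": [
      {"type": "tool_use", "id": "$tool_use_id", "name": "read_file", "input": {"path": "src/auth.ts"}}
    ]},
    {"role": "user", "content": [
      {"type": "tool_result", "tool_use_id": "$tool_use_id", "content": "$tool_output"}
    ]}
  ]
}
EOF
}

# Example values only; a real client would take both from the previous turn.
body=$(build_tool_result_body "toolu_example123" "export function login() {}")
printf '%s\n' "$body"

# To send it (needs MODEL_ACCESS_KEY and network access):
# curl https://inference.do-ai.run/v1/messages \
#   -H "x-api-key: $MODEL_ACCESS_KEY" \
#   -H "anthropic-version: 2023-06-01" \
#   -H "content-type: application/json" \
#   -d "$body"
```

The assistant turn is echoed back unchanged so that the tool_result can refer to it by id. A production client would build the JSON with a proper encoder rather than a here-document.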
Pricing
(current as of May 2026)
Knowledge base retrieval and MCP incur no additional charges beyond standard per-token inference costs. Web search is $10 per 1,000 requests.
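As a quick worked example of the web search rate, $10 per 1,000 requests comes to one cent per request. The sketch below computes only that surcharge; per-token inference costs are billed separately and are not modeled. The function name web_search_cost_usd is hypothetical. Whether one API call that allows several searches (for example "max_uses": 3) counts as one billed request or several is not stated here, so the input is the number of billed requests.

```shell
# $10 per 1,000 web search requests, expressed in cents to stay in integer math.
WEB_SEARCH_CENTS_PER_1000=1000

# Print the web search surcharge in USD for a number of billed requests.
web_search_cost_usd() {
  requests=$1
  # Integer cents; any fraction of a cent would be rounded down.
  cents=$(( requests * WEB_SEARCH_CENTS_PER_1000 / 1000 ))
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

web_search_cost_usd 2500   # prints 25.00
```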
Inference Router
We mentioned the Inference Router earlier as a key differentiator. Here’s how it works in practice.
The Inference Router classifies each incoming request against your configured tasks, then selects the best model from a pool. Each task has up to 3 models and a selection policy: Cost Efficiency (cheapest by token cost), Speed Optimization (fastest by TTFT), Manual Ranking (your specified order), or Optimal (DigitalOcean’s benchmarking, for pre-configured tasks).
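To make the selection policies concrete, here is a small sketch of how three of them could pick from a task's pool. It is an illustration only, not DigitalOcean's implementation: the model names, prices, and TTFT figures are made up, the task classification step is skipped, and the Optimal policy is left out because it depends on DigitalOcean's own benchmarking.

```shell
# Hypothetical pool for one task (up to 3 models): name, USD per 1M tokens, TTFT in ms.
# Rows are listed in the user's manual ranking order.
POOL='model-a 0.60 420
model-b 0.15 610
model-c 2.50 180'

# Print the model a given policy would choose from POOL.
select_model() {
  case "$1" in
    cost)   printf '%s\n' "$POOL" | LC_ALL=C sort -k2,2n | head -n 1 | cut -d' ' -f1 ;;  # cheapest by token cost
    speed)  printf '%s\n' "$POOL" | LC_ALL=C sort -k3,3n | head -n 1 | cut -d' ' -f1 ;;  # lowest TTFT
    manual) printf '%s\n' "$POOL" | head -n 1 | cut -d' ' -f1 ;;                          # first in ranked order
    *)      echo "unknown policy: $1" >&2; return 1 ;;
  esac
}

for policy in cost speed manual; do
  printf '%s -> %s\n' "$policy" "$(select_model "$policy")"
done
```

With this sample pool, each policy picks a different model, which is the point of configuring the policy per task.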
Using it is a one-line change — prefix the router name with router: in the model field:
Shell
curl -X POST https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $MODEL_ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "router:my-support-router",
    "messag