{"schema_version":"onlylabs.public_analysis.v1","url":"https://onlylabs.fyi/analysis/deepinfra","json_url":"https://onlylabs.fyi/analysis/deepinfra/analysis.json","evidence_json_url":"https://onlylabs.fyi/analysis/deepinfra/evidence.json","generated_at":"2026-06-27T22:31:05.465Z","analysis":{"org_slug":"deepinfra","url":"https://onlylabs.fyi/analysis/deepinfra","json_url":"https://onlylabs.fyi/analysis/deepinfra/analysis.json","evidence_json_url":"https://onlylabs.fyi/analysis/deepinfra/evidence.json","dossier_url":"https://onlylabs.fyi/labs/deepinfra","org":{"slug":"deepinfra","name":"DeepInfra","category":"neocloud","category_label":"Neocloud","homepage_url":"https://deepinfra.com"},"title":"DeepInfra analysis","summary":"DeepInfra is an inference-cloud provider exploiting the open-weight model boom, not a model-building lab. Its GitHub footprint reveals a company systematically forking and maintaining the full inference-serving stack — from CUDA kernels to serving engines to client SDKs — while its $107M Series B and targeted hiring confirm a bet on inference infrastructure as a standalone business. The org tracks frontier…","markdown":"## Thesis\n\nDeepInfra is an inference-cloud provider exploiting the open-weight model boom, not a model-building lab. Its GitHub footprint reveals a company systematically forking and maintaining the full inference-serving stack — from CUDA kernels to serving engines to client SDKs — while its $107M Series B [W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026) and targeted hiring [W5](https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra)[W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc) confirm a bet on inference infrastructure as a standalone business. 
The org tracks frontier open-weight releases (GLM-5.2, Step-3.7-Flash, Nemotron-3-Ultra) as they land [W1](https://www.linkedin.com/posts/deep-infra_zai-orgglm-52-demo-deepinfra-activity-7472759602959548416-jMF3)[W2](https://deepinfra.com/stepfun-ai/Step-3.7-Flash)[W4](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16), positioning itself as the neutral deployment layer for models it does not train.\n\n## Signal desks\n\n### Hiring\n\n- **Inference Optimization Engineer** — DeepInfra seeks GPU systems engineers explicitly for optimizing inference engines, implementing quantization/pruning, profiling across hardware, and building automated performance testing tooling. The role demands C++ and CUDA/OpenCL expertise [W5](https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra).\n- **AI Research Engineer** — A data-and-modeling role covering data pipeline construction, exploratory data analysis, statistical modeling, algorithm optimization, and production model deployment/monitoring [W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc).\n- Both roles point to infrastructure depth (CUDA-level optimization) and data pipeline buildout rather than pretraining or fundamental research hires [W5](https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra)[W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc).\n\n### Forks\n\n- **Inference serving engines** — DeepInfra forks every major open LLM serving framework: vLLM [P9](https://github.com/deepinfra/vllm)[E41](https://github.com/deepinfra/vllm), SGLang [E25](https://github.com/deepinfra/sglang), TensorRT-LLM [P14](https://github.com/deepinfra/TensorRT-LLM)[E37](https://github.com/deepinfra/TensorRT-LLM), tensorrtllm_backend (Triton) [P13](https://github.com/deepinfra/tensorrtllm_backend)[E38](https://github.com/deepinfra/tensorrtllm_backend), text-generation-inference (TGI, maintained as Apache 2.0 fork after upstream license change) 
[P6](https://github.com/deepinfra/text-generation-inference)[E4](https://github.com/deepinfra/text-generation-inference), Dynamo [E6](https://github.com/deepinfra/dynamo), and vllm-omni [E10](https://github.com/deepinfra/vllm-omni). This is the densest signal: DeepInfra is building and maintaining its own inference backends across the full spectrum.\n- **CUDA and kernel optimization** — Forks of flash-attention (vllm-project fork) [E22](https://github.com/deepinfra/flash-attention), CUTLASS (NVIDIA) [E23](https://github.com/deepinfra/cutlass), Model-Optimizer (NVIDIA) [E12](https://github.com/deepinfra/Model-Optimizer), TorchSpec [E1](https://github.com/deepinfra/TorchSpec), and SpecForge [E17](https://github.com/deepinfra/SpecForge) indicate hands-on GPU kernel and model optimization work.\n- **Evaluation and benchmarking** — Forks of EleutherAI/lm-evaluation-harness [E35](https://github.com/deepinfra/lm-evaluation-harness) and groq/openbench [E15](https://github.com/deepinfra/openbench) suggest internal evaluation infrastructure.\n- **Agent and LLM orchestration** — Forks of LangChain [P2](https://github.com/deepinfra/langchain)[E8](https://github.com/deepinfra/langchain), LangChain.js [E30](https://github.com/deepinfra/langchainjs), LiteLLM [P11](https://github.com/deepinfra/litellm)[E40](https://github.com/deepinfra/litellm), llama-stack [E27](https://github.com/deepinfra/llama-stack), Roo-Code [E18](https://github.com/deepinfra/Roo-Code), and kilocode [E19](https://github.com/deepinfra/kilocode) point to agent-framework and multi-provider routing interests.\n- **Audio and speech** — whisper-timestamped [P5](https://github.com/deepinfra/whisper-timestamped)[E50](https://github.com/deepinfra/whisper-timestamped), Kokoro-FastAPI [E21](https://github.com/deepinfra/Kokoro-FastAPI), and Zonos [E24](https://github.com/deepinfra/Zonos) reflect speech/audio inference product expansion.\n- **Vision and OCR** — olmOCR [E20](https://github.com/deepinfra/olmocr) and 
Pyramid-Flow [E26](https://github.com/deepinfra/Pyramid-Flow) signal multimodal and document-parsing inference use cases.\n- **Model frameworks and tokenization** — transformers [P4](https://github.com/deepinfra/transformers)[E51](https://github.com/deepinfra/transformers), sentence-transformers [P3](https://github.com/deepinfra/sentence-transformers)[E52](https://github.com/deepinfra/sentence-transformers), and tiktoken [E9](https://github.com/deepinfra/tiktoken) are foundational dependencies being tracked.\n- **Deployment and containers** — cog (Replicate) [P8](https://github.com/deepinfra/cog)[E45](https://github.com/deepinfra/cog) and cog-llama-2 [P15](https://github.com/deepinfra/cog-llama-2) suggest compatibility with container-based model packaging.\n- **Infrastructure and operations** — superfans-gpu-controller [P7](https://github.com/deepinfra/superfans-gpu-controller)[E49](https://github.com/deepinfra/superfans-gpu-controller) reveals bare-metal GPU server management (SUPERMICRO fan control via IPMI). 
ngx-http-auth-jwt-module [E28](https://github.com/deepinfra/ngx-http-auth-jwt-module) addresses API gateway authentication, while fetch-event-source [P10](https://github.com/deepinfra/fetch-event-source)[E43](https://github.com/deepinfra/fetch-event-source) and fetch-stream-parser [P12](https://github.com/deepinfra/fetch-stream-parser)[E39](https://github.com/deepinfra/fetch-stream-parser) address streaming concerns.\n- **Documentation and developer surfaces** — hub-docs [E11](https://github.com/deepinfra/hub-docs), huggingface.js [E16](https://github.com/deepinfra/huggingface.js), and full-stack-deep-learning-website [P1](https://github.com/deepinfra/full-stack-deep-learning-website)[E53](https://github.com/deepinfra/full-stack-deep-learning-website) reflect documentation and developer-education investments.\n\n### Releases\n\n- **deepctl CLI** — Multiple releases tracked: v0.3.8 [P24](https://github.com/deepinfra/deepctl/releases/tag/v0.3.8)[E48](https://github.com/deepinfra/deepctl/releases/tag/v0.3.8), v0.4.1 [P22](https://github.com/deepinfra/deepctl/releases/tag/v0.4.1)[E47](https://github.com/deepinfra/deepctl/releases/tag/v0.4.1), v0.4.2 [P25](https://github.com/deepinfra/deepctl/releases/tag/v0.4.2)[E46](https://github.com/deepinfra/deepctl/releases/tag/v0.4.2), v0.4.3 [P23](https://github.com/deepinfra/deepctl/releases/tag/v0.4.3)[E42](https://github.com/deepinfra/deepctl/releases/tag/v0.4.3), and v0.6.0 [E29](https://github.com/deepinfra/deepctl/releases/tag/v0.6.0). The CLI (Rust, 36 stars) is the primary user-facing deployment tool for DeepInfra's cloud inference service [P16](https://github.com/deepinfra/deepctl)[E2](https://github.com/deepinfra/deepctl). 
Release notes are absent for all versions [P22](https://github.com/deepinfra/deepctl/releases/tag/v0.4.1)[P23](https://github.com/deepinfra/deepctl/releases/tag/v0.4.3)[P24](https://github.com/deepinfra/deepctl/releases/tag/v0.3.8)[P25](https://github.com/deepinfra/deepctl/releases/tag/v0.4.2).\n- **deepinfra-node SDK** — Versioned releases from 1.6.2 through 2.0.2 [P26](https://github.com/deepinfra/deepinfra-node/releases/tag/1.6.2)[E36](https://github.com/deepinfra/deepinfra-node/releases/tag/1.6.2)[P27](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0-rc)[E34](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0-rc)[P28](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0)[E33](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0)[E31](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.2)[E32](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.1). v1.6.2 added text-to-image fixes and Cog model/SDXL support [P26](https://github.com/deepinfra/deepinfra-node/releases/tag/1.6.2). v2.0.0 introduced environment-variable API key support, image classification, and zero-shot image classification [P28](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0). The 2.0.0-rc framed the release as a \"better developer experience\" [P27](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0-rc).\n- No model-weight releases, research papers, or model cards attributed to DeepInfra as author appear in this evidence pack.\n\n### Talking\n\n- **Open-weight frontier model hosting as narrative** — A LinkedIn post showcasing GLM-5.2 on DeepInfra positions the company as the deployment layer for competitive open-weight models, highlighting architecture details (744B total / 40B active MoE, IndexShare trick) and benchmark results [W1](https://www.linkedin.com/posts/deep-infra_zai-orgglm-52-demo-deepinfra-activity-7472759602959548416-jMF3). 
The framing is explicitly: \"DeepInfra exists for this.\"\n- **Step-3.7-Flash launch page** — The model detail page for StepFun's MoE reasoning model (198B total / ~11B active) serves as both product listing and technical explainer, linking to Hugging Face weights and GitHub code [W2](https://deepinfra.com/stepfun-ai/Step-3.7-Flash).\n- **Series B announcement ($107M)** — Coverage frames DeepInfra as an inference-economy play backed by 500 Global and Georges Harik, with an existing NVIDIA collaborator relationship predating the round [W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026).\n- **Nemotron-3-Ultra listing** — Hosting NVIDIA's 550B-A55B frontier model on Hugging Face reinforces the pattern of carrying the latest open-weight releases [W4](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16).\n- No evidence of original research papers, technical blog posts authored by DeepInfra, or policy/alignment commentary in this pack.\n\n## Shipping\n\n- **deepctl** — A Rust CLI for the DeepInfra cloud ML inference service providing auth, model listing, deployment creation, and inference calls. Ships via shell installer and GitHub releases. 36 stars, 3 forks, 2 open issues [P16](https://github.com/deepinfra/deepctl)[E2](https://github.com/deepinfra/deepctl).\n- **deepinfra-node** — Official TypeScript SDK wrapping the DeepInfra Inference API with typed clients for text generation, embeddings, and image generation (SDXL). Published to npm (`deepinfra`). 20 stars, 3 forks, 8 open issues [P17](https://github.com/deepinfra/deepinfra-node)[E3](https://github.com/deepinfra/deepinfra-node).\n- **deepinfra-chat** — A Next.js sample chat app integrating DeepInfra models with Vercel AI SDK, deployable via Vercel one-click. 1 star, 2 forks [P18](https://github.com/deepinfra/deepinfra-chat)[E7](https://github.com/deepinfra/deepinfra-chat).\n- **ocr-tools** — Tutorial and script for using DeepInfra's olmOCR endpoint to parse PDFs. 
5 stars, 2 forks [P19](https://github.com/deepinfra/ocr-tools)[E5](https://github.com/deepinfra/ocr-tools).\n- **docs** — Mintlify-based platform documentation site in MDX, actively maintained [P20](https://github.com/deepinfra/docs)[E14](https://github.com/deepinfra/docs).\n- **cookbooks** — Jupyter Notebook tutorials with benchmarks and production examples, starting with Nemotron 3 Nano [P21](https://github.com/deepinfra/cookbooks)[E13](https://github.com/deepinfra/cookbooks).\n- **cog-llama-2** — A Cog-based container for running Llama 2 via llama.cpp server, released the day after the Cog fork [P15](https://github.com/deepinfra/cog-llama-2)[E44](https://github.com/deepinfra/cog-llama-2).\n- **text-generation-inference (fork)** — Apache 2.0 fork of HuggingFace's TGI, maintained after upstream license change, with an explicit call for community contributions. 9 stars, 2 forks, 6 open issues [P6](https://github.com/deepinfra/text-generation-inference)[E4](https://github.com/deepinfra/text-generation-inference).\n\n## Research themes\n\nNo cited evidence in this pack. DeepInfra does not publish original research papers, model weights, or model cards as a first-party author. Its research-adjacent activity is observational: tracking and deploying others' models through inference infrastructure. The only proximity to research is the AI Research Engineer role [W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc), which mentions \"exploratory data analysis and statistical modeling,\" but no associated publications or preprints are cited.\n\n## Hiring & scaling\n\nDeepInfra's open roles [W5](https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra)[W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc) plus its $107M Series B [W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026) signal a scaling phase focused on inference engineering depth rather than breadth. 
The Inference Optimization Engineer role targets CUDA-level optimization, quantization, and cross-hardware profiling — consistent with a company running bare-metal GPU fleets (the superfans-gpu-controller fork [P7](https://github.com/deepinfra/superfans-gpu-controller)[E49](https://github.com/deepinfra/superfans-gpu-controller) confirms physical server operations). The AI Research Engineer role adds data pipeline and model monitoring capabilities. Both roles are technical infrastructure hires. No GTM, sales, or product management roles appear in this evidence pack, though the deepinfra-chat [P18](https://github.com/deepinfra/deepinfra-chat) and cookbooks [P21](https://github.com/deepinfra/cookbooks) repos suggest developer-marketing investment. The NVIDIA collaborator relationship [W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026) and the density of NVIDIA-origin forks (TensorRT-LLM [P14](https://github.com/deepinfra/TensorRT-LLM), CUTLASS [E23](https://github.com/deepinfra/cutlass), Model-Optimizer [E12](https://github.com/deepinfra/Model-Optimizer), tensorrtllm_backend [P13](https://github.com/deepinfra/tensorrtllm_backend)) point to an NVIDIA-hardware-aligned infrastructure strategy.\n\n## Category implications\n\n- **Infrastructure strategy** — The fork portfolio reveals a multi-engine inference architecture spanning vLLM, SGLang, TensorRT-LLM, TGI, and Dynamo [P9](https://github.com/deepinfra/vllm)[E25](https://github.com/deepinfra/sglang)[P14](https://github.com/deepinfra/TensorRT-LLM)[P6](https://github.com/deepinfra/text-generation-inference)[E6](https://github.com/deepinfra/dynamo). This is not a single-backend shop; it implies an orchestration layer that routes or benchmarks across engines, consistent with the litellm fork [P11](https://github.com/deepinfra/litellm)[E40](https://github.com/deepinfra/litellm) and openbench fork [E15](https://github.com/deepinfra/openbench). 
The Apache 2.0 TGI fork with its explicit community callout [P6](https://github.com/deepinfra/text-generation-inference) suggests license-risk hedging as a deliberate tactic.\n- **Hardware posture** — NVIDIA-only signals dominate: TensorRT-LLM [P14](https://github.com/deepinfra/TensorRT-LLM), CUTLASS [E23](https://github.com/deepinfra/cutlass), tensorrtllm_backend [P13](https://github.com/deepinfra/tensorrtllm_backend), Model-Optimizer [E12](https://github.com/deepinfra/Model-Optimizer), superfans-gpu-controller (SUPERMICRO NVIDIA GPU servers) [P7](https://github.com/deepinfra/superfans-gpu-controller). No AMD ROCm, Intel, or TPU evidence appears. The NVIDIA collaborator relationship [W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026) reinforces this.\n- **Product surface** — DeepInfra ships a CLI [P16](https://github.com/deepinfra/deepctl), a TypeScript SDK [P17](https://github.com/deepinfra/deepinfra-node), REST API via TGI [P6](https://github.com/deepinfra/text-generation-inference), a Vercel-integrated chat demo [P18](https://github.com/deepinfra/deepinfra-chat), OCR tools [P19](https://github.com/deepinfra/ocr-tools), and cookbooks [P21](https://github.com/deepinfra/cookbooks). The product surface targets developers integrating inference into applications, not enterprise procurement.\n- **Model breadth vs. depth** — Evidence shows DeepInfra hosts text generation (LLMs), embeddings [P17](https://github.com/deepinfra/deepinfra-node), speech/audio [P5](https://github.com/deepinfra/whisper-timestamped)[E21](https://github.com/deepinfra/Kokoro-FastAPI)[E24](https://github.com/deepinfra/Zonos), image generation (SDXL) [P17](https://github.com/deepinfra/deepinfra-node), OCR [P19](https://github.com/deepinfra/ocr-tools)[E20](https://github.com/deepinfra/olmocr), and image classification [P28](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0). 
The model catalog spans modalities but all models are third-party open-weight, consistent with the inference-cloud rather than model-lab thesis [W1](https://www.linkedin.com/posts/deep-infra_zai-orgglm-52-demo-deepinfra-activity-7472759602959548416-jMF3)[W2](https://deepinfra.com/stepfun-ai/Step-3.7-Flash)[W4](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16).\n- **GTM and commercialization** — The Vercel integration [P18](https://github.com/deepinfra/deepinfra-chat), npm package [P17](https://github.com/deepinfra/deepinfra-node), shell installer [P16](https://github.com/deepinfra/deepctl), and Mintlify docs [P20](https://github.com/deepinfra/docs) are developer-onboarding investments. The cookbooks repo explicitly promises \"performance benchmarks, and production-ready code examples\" [P21](https://github.com/deepinfra/cookbooks). No enterprise sales or platform SLAs appear in cited evidence.\n- **Research implications** — None. DeepInfra produces no cited original research. Its competitive edge is operational (inference throughput, latency, cost) not scientific [W5](https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra)[W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc)[W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026).\n- **Hiring implications** — The two cited roles [W5](https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra)[W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc) concentrate on inference optimization and data infrastructure. 
The absence of pretraining, alignment/safety, or research scientist roles confirms the org is not competing on model capability R&D.\n\n## Traction highlights\n\n- **$107M Series B** led by 500 Global and Georges Harik, announced May 2026, positioned as one of the largest inference-infrastructure rounds [W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026).\n- **deepctl CLI** — 36 GitHub stars, 3 forks, active release cadence through mid-2024 [P16](https://github.com/deepinfra/deepctl)[E2](https://github.com/deepinfra/deepctl)[E29](https://github.com/deepinfra/deepctl/releases/tag/v0.6.0).\n- **deepinfra-node SDK** — 20 stars, published to npm, iterated from 1.6.2 to 2.0.2 across Q1-Q2 2024 [P17](https://github.com/deepinfra/deepinfra-node)[E3](https://github.com/deepinfra/deepinfra-node)[P26](https://github.com/deepinfra/deepinfra-node/releases/tag/1.6.2)[P28](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0)[E31](https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.2).\n- **text-generation-inference fork** — 9 stars, 2 forks, community-contribution posture [P6](https://github.com/deepinfra/text-generation-inference)[E4](https://github.com/deepinfra/text-generation-inference).\n- **ocr-tools** — 5 stars, practical utility [P19](https://github.com/deepinfra/ocr-tools)[E5](https://github.com/deepinfra/ocr-tools).\n- **Model catalog traction** — Hosting frontier open-weight models from NVIDIA (Nemotron-3-Ultra) [W4](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16), StepFun (Step-3.7-Flash) [W2](https://deepinfra.com/stepfun-ai/Step-3.7-Flash), and Zhipu AI (GLM-5.2) [W1](https://www.linkedin.com/posts/deep-infra_zai-orgglm-52-demo-deepinfra-activity-7472759602959548416-jMF3) signals DeepInfra as a go-to hosting target for major open-weight releases.\n- **Early NVIDIA collaborator relationship** predating the Series B 
[W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026).\n- Caveat: GitHub star counts are modest across all repos; the strongest traction signal is the Series B raise and the model-provider relationships, not community adoption of DeepInfra-authored OSS.\n\n## Sources\n\n- [P1](https://github.com/deepinfra/full-stack-deep-learning-website) deepinfra/full-stack-deep-learning-website — forked from the-full-stack/the-full-stack-website\n- [P2](https://github.com/deepinfra/langchain) deepinfra/langchain — forked from langchain-ai/langchain\n- [P3](https://github.com/deepinfra/sentence-transformers) deepinfra/sentence-transformers — forked from huggingface/sentence-transformers\n- [P4](https://github.com/deepinfra/transformers) deepinfra/transformers — forked from huggingface/transformers, actively pushed to 2025\n- [P5](https://github.com/deepinfra/whisper-timestamped) deepinfra/whisper-timestamped — forked from linto-ai/whisper-timestamped\n- [P6](https://github.com/deepinfra/text-generation-inference) deepinfra/text-generation-inference — Apache 2.0 fork of HuggingFace TGI, 9 stars\n- [P7](https://github.com/deepinfra/superfans-gpu-controller) deepinfra/superfans-gpu-controller — NVIDIA GPU fan control for SUPERMICRO servers\n- [P8](https://github.com/deepinfra/cog) deepinfra/cog — forked from replicate/cog\n- [P9](https://github.com/deepinfra/vllm) deepinfra/vllm — forked from vllm-project/vllm, pushed to Feb 2026\n- [P10](https://github.com/deepinfra/fetch-event-source) deepinfra/fetch-event-source — forked from Azure/fetch-event-source\n- [P11](https://github.com/deepinfra/litellm) deepinfra/litellm — forked from BerriAI/litellm\n- [P12](https://github.com/deepinfra/fetch-stream-parser) deepinfra/fetch-stream-parser — forked from talrasha007/fetch-stream-parser\n- [P13](https://github.com/deepinfra/tensorrtllm_backend) deepinfra/tensorrtllm_backend — forked from triton-inference-server/tensorrtllm_backend\n- 
[P14](https://github.com/deepinfra/TensorRT-LLM) deepinfra/TensorRT-LLM — forked from NVIDIA/TensorRT-LLM, pushed to Jun 2026\n- [P15](https://github.com/deepinfra/cog-llama-2) deepinfra/cog-llama-2 — llama.cpp-based Cog container for Llama 2\n- [P16](https://github.com/deepinfra/deepctl) deepinfra/deepctl — Rust CLI, 36 stars, primary user tool\n- [P17](https://github.com/deepinfra/deepinfra-node) deepinfra/deepinfra-node — TypeScript SDK, 20 stars\n- [P18](https://github.com/deepinfra/deepinfra-chat) deepinfra/deepinfra-chat — Next.js + Vercel AI SDK sample\n- [P19](https://github.com/deepinfra/ocr-tools) deepinfra/ocr-tools — olmOCR tutorial, 5 stars\n- [P20](https://github.com/deepinfra/docs) deepinfra/docs — Mintlify platform docs\n- [P21](https://github.com/deepinfra/cookbooks) deepinfra/cookbooks — Jupyter notebooks and benchmarks\n- deepctl releases v0.4.1, v0.4.3, v0.3.8, v0.4.2\n- deepinfra-node releases 1.6.2, 2.0.0-rc, 2.0.0\n- Event stream covering forks, releases, and repo creation\n- [W1](https://www.linkedin.com/posts/deep-infra_zai-orgglm-52-demo-deepinfra-activity-7472759602959548416-jMF3) LinkedIn post — GLM-5.2 demo on DeepInfra\n- [W2](https://deepinfra.com/stepfun-ai/Step-3.7-Flash) Step-3.7-Flash product page on DeepInfra\n- [W3](https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026) AI Tech Connect — $107M Series B coverage\n- [W4](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16) NVIDIA Nemotron-3-Ultra HF page\n- [W5](https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra) Inference Optimization Engineer job listing\n- [W6](https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc) AI Research Engineer job 
listing","generated_at":"2026-06-27T19:00:07.913+00:00","citations":[{"url":"https://github.com/deepinfra/full-stack-deep-learning-website","path":null,"label":"deepinfra/full-stack-deep-learning-website","type":"external"},{"url":"https://github.com/deepinfra/langchain","path":null,"label":"deepinfra/langchain","type":"external"},{"url":"https://github.com/deepinfra/sentence-transformers","path":null,"label":"deepinfra/sentence-transformers","type":"external"},{"url":"https://github.com/deepinfra/transformers","path":null,"label":"deepinfra/transformers","type":"external"},{"url":"https://github.com/deepinfra/whisper-timestamped","path":null,"label":"deepinfra/whisper-timestamped","type":"external"},{"url":"https://github.com/deepinfra/text-generation-inference","path":null,"label":"deepinfra/text-generation-inference","type":"external"},{"url":"https://github.com/deepinfra/superfans-gpu-controller","path":null,"label":"deepinfra/superfans-gpu-controller","type":"external"},{"url":"https://github.com/deepinfra/cog","path":null,"label":"deepinfra/cog","type":"external"},{"url":"https://github.com/deepinfra/vllm","path":null,"label":"deepinfra/vllm","type":"external"},{"url":"https://github.com/deepinfra/fetch-event-source","path":null,"label":"deepinfra/fetch-event-source","type":"external"},{"url":"https://github.com/deepinfra/litellm","path":null,"label":"deepinfra/litellm","type":"external"},{"url":"https://github.com/deepinfra/fetch-stream-parser","path":null,"label":"deepinfra/fetch-stream-parser","type":"external"},{"url":"https://github.com/deepinfra/tensorrtllm_backend","path":null,"label":"deepinfra/tensorrtllm_backend","type":"external"},{"url":"https://github.com/deepinfra/TensorRT-LLM","path":null,"label":"deepinfra/TensorRT-LLM","type":"external"},{"url":"https://github.com/deepinfra/cog-llama-2","path":null,"label":"deepinfra/cog-llama-2","type":"external"},{"url":"https://github.com/deepinfra/deepctl","path":null,"label":"deepinfra/deepctl","type":"ex
ternal"},{"url":"https://github.com/deepinfra/deepinfra-node","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-chat","path":null,"label":"deepinfra/deepinfra-chat","type":"external"},{"url":"https://github.com/deepinfra/ocr-tools","path":null,"label":"deepinfra/ocr-tools","type":"external"},{"url":"https://github.com/deepinfra/docs","path":null,"label":"deepinfra/docs","type":"external"},{"url":"https://github.com/deepinfra/cookbooks","path":null,"label":"deepinfra/cookbooks","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.4.1","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.4.3","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.3.8","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.4.2","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/1.6.2","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0-rc","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/TorchSpec","path":null,"label":"deepinfra/TorchSpec","type":"external"},{"url":"https://github.com/deepinfra/deepctl","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/text-generation-inference","path":null,"label":"deepinfra/text-generation-inference","type":"external"},{"url":"https://github.com/deepinfra/ocr-tool
s","path":null,"label":"deepinfra/ocr-tools","type":"external"},{"url":"https://github.com/deepinfra/dynamo","path":null,"label":"deepinfra/dynamo","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-chat","path":null,"label":"deepinfra/deepinfra-chat","type":"external"},{"url":"https://github.com/deepinfra/langchain","path":null,"label":"deepinfra/langchain","type":"external"},{"url":"https://github.com/deepinfra/tiktoken","path":null,"label":"deepinfra/tiktoken","type":"external"},{"url":"https://github.com/deepinfra/vllm-omni","path":null,"label":"deepinfra/vllm-omni","type":"external"},{"url":"https://github.com/deepinfra/hub-docs","path":null,"label":"deepinfra/hub-docs","type":"external"},{"url":"https://github.com/deepinfra/Model-Optimizer","path":null,"label":"deepinfra/Model-Optimizer","type":"external"},{"url":"https://github.com/deepinfra/cookbooks","path":null,"label":"deepinfra/cookbooks","type":"external"},{"url":"https://github.com/deepinfra/docs","path":null,"label":"deepinfra/docs","type":"external"},{"url":"https://github.com/deepinfra/openbench","path":null,"label":"deepinfra/openbench","type":"external"},{"url":"https://github.com/deepinfra/huggingface.js","path":null,"label":"deepinfra/huggingface.js","type":"external"},{"url":"https://github.com/deepinfra/SpecForge","path":null,"label":"deepinfra/SpecForge","type":"external"},{"url":"https://github.com/deepinfra/Roo-Code","path":null,"label":"deepinfra/Roo-Code","type":"external"},{"url":"https://github.com/deepinfra/kilocode","path":null,"label":"deepinfra/kilocode","type":"external"},{"url":"https://github.com/deepinfra/olmocr","path":null,"label":"deepinfra/olmocr","type":"external"},{"url":"https://github.com/deepinfra/Kokoro-FastAPI","path":null,"label":"deepinfra/Kokoro-FastAPI","type":"external"},{"url":"https://github.com/deepinfra/flash-attention","path":null,"label":"deepinfra/flash-attention","type":"external"},{"url":"https://github.com/deepinfra/cutlass","path":null,"
label":"deepinfra/cutlass","type":"external"},{"url":"https://github.com/deepinfra/Zonos","path":null,"label":"deepinfra/Zonos","type":"external"},{"url":"https://github.com/deepinfra/sglang","path":null,"label":"deepinfra/sglang","type":"external"},{"url":"https://github.com/deepinfra/Pyramid-Flow","path":null,"label":"deepinfra/Pyramid-Flow","type":"external"},{"url":"https://github.com/deepinfra/llama-stack","path":null,"label":"deepinfra/llama-stack","type":"external"},{"url":"https://github.com/deepinfra/ngx-http-auth-jwt-module","path":null,"label":"deepinfra/ngx-http-auth-jwt-module","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.6.0","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/langchainjs","path":null,"label":"deepinfra/langchainjs","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.2","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.1","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/2.0.0-rc","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/lm-evaluation-harness","path":null,"label":"deepinfra/lm-evaluation-harness","type":"external"},{"url":"https://github.com/deepinfra/deepinfra-node/releases/tag/1.6.2","path":null,"label":"deepinfra/deepinfra-node","type":"external"},{"url":"https://github.com/deepinfra/TensorRT-LLM","path":null,"label":"deepinfra/TensorRT-LLM","type":"external"},{"url":"https://github.com/deepinfra/tensorrtllm_backend","path":null,"label":"deepinfra/tensorrtllm_backend","type":"external"},{"url":"https://github.com/deepinfra/fetch-stream-parser","path":null,"labe
l":"deepinfra/fetch-stream-parser","type":"external"},{"url":"https://github.com/deepinfra/litellm","path":null,"label":"deepinfra/litellm","type":"external"},{"url":"https://github.com/deepinfra/vllm","path":null,"label":"deepinfra/vllm","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.4.3","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/fetch-event-source","path":null,"label":"deepinfra/fetch-event-source","type":"external"},{"url":"https://github.com/deepinfra/cog-llama-2","path":null,"label":"deepinfra/cog-llama-2","type":"external"},{"url":"https://github.com/deepinfra/cog","path":null,"label":"deepinfra/cog","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.4.2","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.4.1","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/deepctl/releases/tag/v0.3.8","path":null,"label":"deepinfra/deepctl","type":"external"},{"url":"https://github.com/deepinfra/superfans-gpu-controller","path":null,"label":"deepinfra/superfans-gpu-controller","type":"external"},{"url":"https://github.com/deepinfra/whisper-timestamped","path":null,"label":"deepinfra/whisper-timestamped","type":"external"},{"url":"https://github.com/deepinfra/transformers","path":null,"label":"deepinfra/transformers","type":"external"},{"url":"https://github.com/deepinfra/sentence-transformers","path":null,"label":"deepinfra/sentence-transformers","type":"external"},{"url":"https://github.com/deepinfra/full-stack-deep-learning-website","path":null,"label":"deepinfra/full-stack-deep-learning-website","type":"external"},{"url":"https://www.linkedin.com/posts/deep-infra_zai-orgglm-52-demo-deepinfra-activity-7472759602959548416-jMF3","path":null,"label":"linkedin.com/posts","type":"external"},{"url":"https://deepinfra.com/stepfun-ai/Step-3.7-Flash","pa
th":null,"label":"deepinfra.com/stepfun-ai","type":"external"},{"url":"https://aitechconnect.in/news/deepinfra-107m-series-b-inference-cloud-2026","path":null,"label":"aitechconnect.in/news","type":"external"},{"url":"https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16","path":null,"label":"nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16","type":"external"},{"url":"https://www.gravityer.com/jobs/inference-optimization-engineer-deepinfra","path":null,"label":"gravityer.com/jobs","type":"external"},{"url":"https://www.gravityer.com/jobs/ai-research-engineer-deepinfra-inc","path":null,"label":"gravityer.com/jobs","type":"external"}],"provenance":{"provider":"deepseek","model":"deepseek-v4-pro","workflow":"onlylabs-deepagents-analysis-v3","agent":"deepagents"},"evidence":{"total":87,"pages":28,"events":53,"web":6,"signal_desks":{"forks":36,"repos":7,"hiring":0,"talking":0,"releases":10},"data_radar_lanes":null,"data_radar_matches":null}},"signal_counts":{"total":53,"model_released":0,"release":10,"repo_new":7,"repo_forked":36,"post_published":0,"job_opened":0}}