{"schema_version":"onlylabs.public_analysis_evidence.v1","title":"Baseten analysis evidence pack","description":"Public onlylabs evidence pack for cited agent analysis: captured pages, ranked public signals, and stored web-search provenance used by the background analysis workflow.","url":"https://onlylabs.fyi/analysis/baseten","json_url":"https://onlylabs.fyi/analysis/baseten/evidence.json","generated_at":"2026-06-27T22:34:00.617Z","org":{"slug":"baseten","name":"Baseten","category":"neocloud","category_label":"Neocloud","dossier_url":"https://onlylabs.fyi/labs/baseten"},"analysis":{"url":"https://onlylabs.fyi/analysis/baseten","json_url":"https://onlylabs.fyi/analysis/baseten/analysis.json","generated_at":"2026-06-27T18:45:40.723+00:00"},"workflow":{"version":"onlylabs-deepagents-analysis-v3","provider":"deepseek","model":"deepseek-v4-pro","agent":"deepagents","public_pack_mode":"local-pages-and-events","live_web_fetches":false,"note":"Public evidence exports do not trigger live Exa calls; stored Exa provenance is included when analysis metadata contains it."},"stats":{"pages":28,"events":140,"web":0,"evidence":88,"signal_desks":{"hiring":13,"forks":12,"releases":16,"talking":18,"repos":1},"data_radar_lanes":null,"data_radar_matches":null,"stored_analysis_evidence":92,"stored_analysis_web":4,"stored_analysis_signal_desks":{"forks":12,"repos":1,"hiring":13,"talking":18,"releases":16},"stored_analysis_data_radar_lanes":null,"stored_analysis_data_radar_matches":null},"stored_web_provenance":{"queries":["\"Baseten\" frontier AI lab recent model release research hiring GitHub Hugging Face","\"Baseten\" AI lab what they are building talking about hiring releasing forking"],"request_ids":["062db842769fc1f8101ca2c1139564ea","1a3634939e2a212ae56caec2df586748"],"skipped":null},"evidence":[{"ref":"P1","kind":"page","title":"basetenlabs/run-report-action 
v1","date":"2026-06-11T04:05:53.476917+00:00","date_source":null,"source_url":"https://github.com/basetenlabs/run-report-action/releases/tag/v1","signal_url":null,"signal_json_url":null,"text":"# v1\n\nRepository: basetenlabs/run-report-action\n\nTag: v1\n\nPublished: 2026-02-18T19:41:33Z\n\nPrerelease: no\n\nRelease notes: none published."},{"ref":"P2","kind":"page","title":"basetenlabs/action-junit-report repository metadata","date":"2026-06-11T02:54:18.745062+00:00","date_source":null,"source_url":"https://github.com/basetenlabs/action-junit-report","signal_url":null,"signal_json_url":null,"text":"# basetenlabs/action-junit-report\n\nDescription: Reports junit test results as GitHub Pull Request Check\n\nLicense: Apache-2.0\n\nStars: 0\n\nForks: 0\n\nOpen issues: 6\n\nCreated: 2025-03-17T14:46:16Z\n\nPushed: 2026-02-07T12:48:30Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: mikepenz/action-junit-report\n\nArchived: no\n\nREADME:\n<div align=\"center\">\n:octocat:\n</div>\n<h1 align=\"center\">\naction-junit-report\n</h1>\n\n<p align=\"center\">\n... 
reports JUnit test results as GitHub pull request check.\n</p>\n\n<div align=\"center\">\n<img src=\".github/images/action.png\"/>\n</div>\n\n<div align=\"center\">\n<a href=\"https://github.com/mikepenz/action-junit-report\">\n<img src=\"https://github.com/mikepenz/action-junit-report/workflows/CI/badge.svg\"/>\n</a>\n</div>\n<br />\n\n-------\n\n<p align=\"center\">\n<a href=\"#whats-included-\">What's included 🚀</a> &bull;\n<a href=\"#setup\">Setup 🛠️</a> &bull;\n<a href=\"#sample-%EF%B8%8F\">Sample 🖥️</a> &bull;\n<a href=\"#contribute-\">Contribute 🧬</a> &bull;\n<a href=\"#license\">License 📓</a>\n</p>\n\n-------\n\n### What's included 🚀\n\n- Flexible JUnit parser with wide support\n- Supports nested test suites\n- Blazingly fast execution\n- Lighweight\n- Rich build log output\n\nThis action processes JUnit XML test reports on pull requests and shows the result as a PR check with summary and\nannotations.\n\nBased on action for [Surefire Reports by ScaCap](https://github.com/ScaCap/action-surefire-report)\n\n## Setup\n\n### Configure the workflow\n\n```yml\nname: build\non:\npull_request:\n\njobs:\nbuild:\nname: Build and Run Tests\nruns-on: ubuntu-latest\nsteps:\n- name: Checkout Code\nuses: actions/checkout@v4\n- name: Build and Run Tests\nrun: # execute your tests generating test results\n- name: Publish Test Report\nuses: mikepenz/action-junit-report@v5\nif: success() || failure() # always run even if the previous step fails\nwith:\nreport_paths: '**/build/test-results/test/TEST-*.xml'\n```\n\n### Inputs\n\n| **Input** | **Description** |\n|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `report_paths` | Optional. 
[Glob](https://"},{"ref":"P3","kind":"page","title":"basetenlabs/run-report-action repository metadata","date":"2026-06-11T02:43:10.121864+00:00","date_source":null,"source_url":"https://github.com/basetenlabs/run-report-action","signal_url":null,"signal_json_url":null,"text":"# basetenlabs/run-report-action\n\nDescription: A GitHub action for displaying a run report within a pull request.\n\nLanguage: TypeScript\n\nStars: 1\n\nForks: 0\n\nOpen issues: 7\n\nCreated: 2026-02-18T18:49:46Z\n\nPushed: 2026-05-22T10:43:40Z\n\nDefault branch: master\n\nFork: yes\n\nParent repository: moonrepo/run-report-action\n\nArchived: no\n\nREADME:\n# moon - CI run reports\n\nA GitHub action that reports the results of a `moon ci` run to a pull request as a comment and\nworkflow summary. The report will render all results, their final status, and time to completion, in\na [beautiful markdown table](#example).\n\nThe report will also include additional information about the environment, workflow matrix, and\ntouched files.\n\n## Installation\n\nThe action _must run after_ the `moon ci` command!\n\n```yaml\n# ...\njobs:\nci:\nname: CI\nruns-on: ubuntu-latest\nsteps:\n# ...\n- run: moon ci\n- uses: moonrepo/run-report-action@v1\nif: success() || failure()\nwith:\naccess-token: ${{ secrets.GITHUB_TOKEN }}\n```\n\nIf your workflow job is using a build matrix, you'll need to pass the entire matrix object as a JSON\nstring to the `matrix` input, otherwise the pull request comments will overwrite each other.\n\n```yaml\n# ...\njobs:\nci:\nname: CI\nruns-on: ${{ matrix.os }}\nstrategy:\nmatrix:\nos: [ubuntu-latest, windows-latest]\nnode-version: [16, 18]\nsteps:\n# ...\n- run: moon ci\n- uses: moonrepo/run-report-action@v1\nif: success() || failure()\nwith:\naccess-token: ${{ secrets.GITHUB_TOKEN }}\nmatrix: ${{ toJSON(matrix) }}\n```\n\n## Inputs\n\n- `access-token` (`string`) - REQUIRED: A GitHub access token that's used for posting comments on\nthe pull request.\n- `matrix` (`string`) - 
The workflow's build matrix as a JSON string. This is required for\ndifferentiating builds/comments.\n- `slow-threshold` (`number`) - Number of seconds before an action is to be considered slow.\nDefaults to 120 (2 minutes).\n- `sort-by` (`label | time`) - The field to sort the actions table on. If not defined (the default),\nwill display in the action graph's topological order.\n- `sort-dir` (`asc | desc`) - The direction to sort the actions table.\n- `workspace-root` (`string`) - Root of the moon workspace (if running in a sub-directory). Defaults\nto working "},{"ref":"P4","kind":"page","title":"Baseten Chains For Production Compound Ai Systems","date":"2026-06-27T08:01:04.599716+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/baseten-chains-for-production-compound-ai-systems/","signal_url":null,"signal_json_url":null,"text":"Baseten Chains is now GA for production compound AI systems \nAnnouncing our Series F . Learn more \n\nNews \n\nBaseten Chains is now GA for production compound AI systems\n\nBaseten Chains delivers ultra-low-latency compound AI at scale, with custom hardware per model and simplified model orchestration.\n\nAuthors\n\nMarius Killinger \n\nTyron Jung \n\nRachel Rapp \n\nLast updated\nFebruary 6, 2025\n\nShare\n\nTL;DR Baseten Chains is an SDK for deploying performant compound AI systems to production. Chains enables AI builders to ship ultra-low-latency compound AI with unique hardware and autoscaling for each step, eliminating performance bottlenecks and model orchestration headaches while keeping inference cost-efficient. With improved performance and developer tooling since beta, we’re thrilled to announce that Chains is now generally available!\n\nA speech-to-speech Chain with independent hardware and autoscaling for each step. \nDeploying compound AI systems requires a different toolset than single-model deployments.
Traditional approaches often force developers to build monolithic deployments where workflows and hardware are tightly coupled, and intricate model orchestration must be manually implemented. But this can be cumbersome and error-prone, leading to excess hardware, engineering, and maintenance costs, as well as performance bottlenecks.\nServing compound AI systems performantly in production brings unique challenges around:\nModel orchestration: Managing multiple AI models and processing steps, as well as their data exchange.\n\nLatency: Since models and processing steps need to pass data to one another, compound AI systems can easily incur excess latency.\n\nReliability: If one part of the system fails, the entire request fails.\n\nCost: Without sufficient composability, you can end up paying for idle GPU time and unnecessary egress costs. \n\nAfter working with industry leaders serving AI-native products at massive scale, we saw the need for a more efficient way to deploy compound AI systems in production. We set out to build a solution that lets you:\nDeploy ultra-low-latency compound AI systems , with efficient data exchange between models.\n\nReduce hardware costs and eliminate "},{"ref":"P5","kind":"page","title":"Testing Llama Inference Performance Nvidia Gh200 Lambda Cloud","date":"2026-06-27T08:01:04.349561+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/testing-llama-inference-performance-nvidia-gh200-lambda-cloud/","signal_url":null,"signal_json_url":null,"text":"Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud \nAnnouncing our Series F . 
Learn more \n\nInfrastructure \n\nTesting Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud\n\nThe NVIDIA GH200 Superchip combines an NVIDIA Hopper GPU with an ARM CPU via high-bandwidth interconnect\n\nAuthors\n\nPankaj Gupta \n\nPhilip Kiely \n\nLast updated\nFebruary 7, 2025\n\nShare\n\nThe NVIDIA GH200 Grace Hopper™ Superchip is a unique and interesting offering within NVIDIA’s datacenter hardware lineup. The NVIDIA Grace Hopper architecture combines an NVIDIA Hopper GPU with an ARM CPU via a high-bandwidth interconnect called NVLink-C2C. This is a similar architecture to the Grace Blackwell Superchip in GB200 NVL72 .\nThe NVIDIA GH200 provides a high-bandwidth connection between GPU and CPU resources. \nThis GPU-plus-CPU architecture is promising for AI inference workloads that require extremely large KV cache allocations. We leveraged a GH200 instance from our friends at Lambda to test how this architecture translates to real-world performance.\nIn this article, we’ll break down what makes the GH200 architecture interesting, what potential it has for high-performance inference, and the results from our early experiments serving Llama 3.3 70B on the 96GB GH200.\nGH200 vs H100 and H200 \nThe GH200 Superchip has the exact same compute profile as the NVIDIA H100 GPU and NVIDIA H200 GPU and has two different memory profiles available:\n\nHowever, the GH200 has the ARM CPU with a fast interconnect.
While a server built around an H100 GPU has up to 64 GB/s in one-way bandwidth between the GPU and CPU, the GH200 Superchip has up to a 450 GB/s interconnect in each direction between its onboard CPU and GPU.\nThanks to this high-speed interconnect on GH200, it’s feasible to offload parts of the KV cache in the abundant CPU memory rather than the limited GPU memory during inference.\nGH200 for model serving \nGH200s offer theoretical advantages over H100 GPUs on both phases of LLM inference:\nPrefill, which generates the first token, is often compute-bound. While the GH200 doesn’t offer any extra compute, offloading the KV cache to abundant CPU memory offers extra space for pr"},{"ref":"P6","kind":"page","title":"Baseten Chains Is Now Ga Deploy Ultra Low Latency Compound Ai At Scale","date":"2026-06-27T08:01:04.332843+00:00","date_source":null,"source_url":"https://www.baseten.co/resources/changelog/baseten-chains-is-now-ga-deploy-ultra-low-latency-compound-ai-at-scale/","signal_url":null,"signal_json_url":null,"text":"Baseten Chains is now GA: Deploy ultra-low-latency compound AI systems \nAnnouncing our Series F . Learn more \n\nchangelog / post \n\nBaseten Chains is now GA: Deploy ultra-low-latency compound AI systems\n\nFeb 10, 2025 \nGo back\n\nNow with improved performance, robustness, and an even more delightful DevEx since our beta launch, we’re thrilled to announce the general availability of Baseten Chains for production compound AI!\nChains enables you to:\nCall a sequence of models and processing steps without incurring excess latency\n\nModularize complex workflows (allocating custom hardware and autoscaling) while keeping them cohesive\n\nAbstract away complex model orchestration\n\nDeploy any compound AI system with Chains and gain the optimized model performance and elastic horizontal scaling we specialize in. 
Building complex, multi-model workflows is as simple as calling local, type-safe Python functions.\nCheck out our launch blog to learn more, and join us live on March 6th to see Chains in action!\n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra \n\nExplore all"},{"ref":"P7","kind":"page","title":"Introducing Baseten Embeddings Inference Bei","date":"2026-06-27T08:01:04.103072+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/introducing-baseten-embeddings-inference-bei/","signal_url":null,"signal_json_url":null,"text":"Introducing Baseten Embeddings Inference: The fastest embeddings solution available \nAnnouncing our Series F . Learn more \n\nNews \n\nIntroducing Baseten Embeddings Inference: The fastest embeddings solution available\n\nBaseten Embeddings Inference (BEI) delivers 2x higher throughput and 10% lower latency for production embedding, reranker and classification models at scale.\n\nAuthors\n\nMichael Feil \n\nRachel Rapp \n\nLast updated\nMarch 27, 2025\n\nShare\n\nTL;DR Baseten Embeddings Inference (BEI) is the most performant embeddings inference solution for high-throughput, low-latency production workloads. With over 2x higher throughput and 10% lower latency than previous industry standards, BEI powers embedding, reranker, and classifier inference for rapid responses even under heavy load. If you need high-performance inference for your embeddings workloads, reach out to talk to our engineers!\n\nFrom search and retrieval applications to agents and recommender systems, rapid responses are a must-have for an excellent user experience.
Companies building products that leverage embeddings in production need to ensure fast and reliable model performance, whether you’re processing an entire database worth of documents or handling 100,000 user requests.\nWe’re excited to announce Baseten Embeddings Inference (BEI), the fastest embeddings inference on the market, to provide users with the highest-throughput and lowest-latency embeddings inference at scale. With over 2x higher throughput and 10% lower latency than the next-best solution, BEI provides optimized inference performance out of the box for embedding, reranker, and classification models.\n\nBEI is tailored specifically for the needs of embedding workloads, which often receive high numbers of requests and require low latency for individual queries. Coupled with our optimized cold starts, elastic horizontal scale, and five nines uptime, you can use BEI with open-source, custom, or fine-tuned models or as part of compound AI systems for fast, reliable inference in production.\nIn this post, we’ll look at performance benchmarks and common use cases suited for BEI. If you’re looking for fast, reliable, and cost-efficient production infe"},{"ref":"P8","kind":"page","title":"Announcing Baseten 75m Series C","date":"2026-06-27T08:01:04.098916+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/announcing-baseten-75m-series-c/","signal_url":null,"signal_json_url":null,"text":"Announcing Baseten’s $75M Series C \nAnnouncing our Series F . Learn more \n\nNews \n\nAnnouncing Baseten’s $75M Series C\n\nBaseten raised a $75M Series C to power mission-critical AI inference for leading AI companies.\n\nAuthors\n\nTuhin Srivastava \n\nLast updated\nFebruary 19, 2025\n\nShare\n\nWe founded Baseten in 2019 to help builders bring the power of AI into every product. We believed AI was the next big thing back then, but deploying models into production was a massive challenge due to the lack of the right tooling. 
My co-founders and I experienced that pain firsthand throughout our careers. That’s why we set out to build the inference infrastructure, workflows, and tools necessary to bring AI to life at scale.\nFast forward to today and everything is accelerating faster than we could have imagined. AI has become the dominant force on the world stage, and every company has realized that it must become an AI company to survive. Even since our Series B just a year ago there have been huge changes in the market. \nReasoning models have taken the main stage, the gap between closed and open-source models has evaporated, and companies are increasingly waking up to the reality that inference is the biggest challenge left to solve.  \nBut today's models are bigger, faster, and far more complex than ever. You need the best possible software, tooling and knowledge to deliver world-class product experiences. We think about inference as three interweaving problem areas:\n1. Applied model performance research \nModern AI demands that models run at peak efficiency on every chip. Extracting maximum speed, quality, and reliability requires cutting-edge techniques like speculative decoding paired with knowledge of the latest hardware . This isn’t just about making models fast; it’s about squeezing out every ounce of performance for every model modality across a constantly evolving hardware space.\n2. Elastically scaling infrastructure \nIt’s not enough to optimize for a single chip. Our systems must scale reliably across thousands of nodes, regions, and clouds. Delivering consistent, mission-critical performance is a monumental infrastructure challenge.
Whether for compliance, cost, or "},{"ref":"P9","kind":"page","title":"How Multi Node Inference Works Llms Deepseek R1","date":"2026-06-27T08:01:04.082537+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/how-multi-node-inference-works-llms-deepseek-r1/","signal_url":null,"signal_json_url":null,"text":"How multi-node inference works for massive LLMs like DeepSeek-R1 \nAnnouncing our Series F . Learn more \n\nModel performance \n\nHow multi-node inference works for massive LLMs like DeepSeek-R1\n\nRunning DeepSeek-R1 on H100 GPUs requires multi-node inference to connect the 16 H100s needed to hold the model weights.\n\nAuthors\n\nPhil Howes \n\nPhilip Kiely \n\nLast updated\nFebruary 13, 2025\n\nShare\n\nTL;DR What do you do when you have a model like DeepSeek-R1 that’s too big to fit into an 8xH100 GPU node? Multi-node inference lets you recruit more than eight GPUs to serve a single model, but introduces new infrastructure and model performance challenges. At Baseten, we’ve built production-ready multi-node inference, and in this blog we’ll cover the key technical knowledge for understanding how it works.\n\nIf you want to run DeepSeek-R1 on H100 GPUs, you will very quickly encounter a major problem: even a full-size 8xH100 node does not have enough memory to run the model.\nDeepSeek weights are too large to fit in a single 8xH100 node \nTo run LLM inference in production, your model serving instance must have enough VRAM to not only load the model weights but also store the KV cache and activations – in this case, hundreds of gigabytes of headroom on top of DeepSeek-R1’s massive 671GB of model weights.\nCombining two H100 nodes – totaling 16 H100 GPUs – gets us 1280 GB of VRAM, plenty to run the model. H100 GPUs offer a great balance of performance and cost with increasing worldwide availability.
Updating our math from above, we can see that a multi-node H100 instance will let us serve DeepSeek-R1 in production .\nDeepSeek-R1 runs in production on 16 H100 GPUs in a multi-node configuration \nOn paper, this math looks great. But getting multi-node inference running in production is another story. Multi-node inference combines problems from two separate domains:\nInfrastructure : How do you ensure that the GPU nodes you provision have sufficient interconnects and establish consistent multi-cloud abstractions?\n\nModel performance : How do you make sure your LLM takes full advantage of these GPU resources for low-latency, high-throughput inference?\n\nAt Baseten, we can run our customers’"},{"ref":"P10","kind":"page","title":"Baseten Is Fully Openai Compatible","date":"2026-06-27T08:01:03.922553+00:00","date_source":null,"source_url":"https://www.baseten.co/resources/changelog/baseten-is-fully-openai-compatible/","signal_url":null,"signal_json_url":null,"text":"Baseten is now fully OpenAI compatible \nAnnouncing our Series F . Learn more \n\nchangelog / post \n\nBaseten is now fully OpenAI compatible\n\nMar 21, 2025 \nGo back\n\nThe OpenAI SDK has become a standard for interacting with AI models, making it extremely important in the inference space. We’re happy to announce official OpenAI-compatible APIs for both chat completions and completions on models deployed to Baseten!\nThis takes our previous “OpenAI Bridge” to the next level with full, baked-in support for OpenAI-compatible models out of the box. Migrating existing client code is now simpler than ever: after updating the URL, both synchronous and streaming functionality will work immediately.
Take a look at our examples in our model library to start using OpenAI-compatible models today.\n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra \n\nExplore all"},{"ref":"P11","kind":"page","title":"How We Built Bei High Throughput Embedding Inference","date":"2026-06-27T08:01:03.805374+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/how-we-built-bei-high-throughput-embedding-inference/","signal_url":null,"signal_json_url":null,"text":"How we built BEI: high-throughput embedding, reranker, and classifier inference \nAnnouncing our Series F . Learn more \n\nModel performance \n\nHow we built BEI: high-throughput embedding, reranker, and classifier inference\n\nDiscover how we optimized embedding, reranker, and classifier inference using TensorRT-LLM, doubling throughput and achieving ultra-low latency at scale.\n\nAuthors\n\nMichael Feil \n\nPhilip Kiely \n\nLast updated\nMarch 27, 2025\n\nShare\n\nTL;DR We built Baseten Embedding Inference (BEI), an optimized inference runtime leveraging TensorRT-LLM to significantly boost throughput and minimize latency for embedding, reranker, and classification models. In this piece, we'll show the benchmarking methodology behind our claims of 2x higher throughput and detail the challenges we overcame to create the world's fastest embedding runtime.\n\nSince the release of BERT in 2018, the humble embedding model has grown up in the shadow of LLMs, quietly powering critical AI tasks from search to reranking, classification, and retrieval.
In the past year, embeddings models have shifted from BERT-based architectures to building on top of smaller language models from families like Llama , Qwen , Gemma , and Mistral .\nThis new class of embeddings models is more accurate, scores better on benchmarks, and can handle more advanced use cases. However, they’re also substantially larger, going from a few hundred million parameters for BERT-based models to several billion for the most powerful LLM-based models.\nToday’s most powerful embeddings models are 10x-100x larger than previous architectures. \nThese larger models require new approaches for inference optimization. Improving the performance of embeddings models is a unique challenge within the model performance space because you have to optimize for two very different workload profiles:\nCorpus processing: An embedding inference runtime must be able to process extremely high-throughput workloads efficiently to handle multi-billion-token document processing tasks, or even multi-trillion-token training data preparation tasks.\n\nReal-time querying: The runtime must also be able to handle individual requests with millisecond-le"},{"ref":"P12","kind":"page","title":"A Checklist For Switching To Open Source Ml Models","date":"2026-06-27T08:01:03.656879+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/a-checklist-for-switching-to-open-source-ml-models/","signal_url":null,"signal_json_url":null,"text":"A checklist for switching to open source ML models \nAnnouncing our Series F . Learn more \n\nAI engineering \n\nA checklist for switching to open source ML models\n\nTransitioning from using ML models via closed source APIs to open source ML models? This checklist provides all necessary resources for the shift.\n\nAuthors\n\nPhilip Kiely \n\nLast updated\nApril 4, 2025\n\nShare\n\nSwitching from a closed source ecosystem where you consume ML models from API endpoints to the world of open source ML models can seem intimidating.
But this checklist will give you all of the resources you need to make the leap.\n\nPick an open source model \nThe biggest advantage of the open source ecosystem in ML is the sheer number and variety of models to choose from. But that amount of choice can be overwhelming. Here are some alternatives to closed-source models to get you started:\nLarge language models (LLMs):\nClosed source: GPT , Claude \n\nOpen source: Llama , DeepSeek \n\nText embedding models:\nClosed source: OpenAI text-embedding-3 \n\nOpen source: BAAI text embedding models \n\nSpeech to text (audio transcription) models:\nClosed source: Whisper from the Audio API \n\nOpen source: Whisper on your own infra \n\nText to speech (audio generation) models:\nClosed source: Audio API text to speech endpoint \n\nOpen source: Orpheus TTS \n\nChoose a GPU for model inference \nInference for most generative models like LLMs requires GPUs. Picking the right GPU is essential: you want the least expensive GPU powerful enough to run the model with acceptable performance.\nFor a 70 billion parameter LLM like Llama 3.3 70B , you need 2-4 H100 GPUs, but for the largest LLMs like DeepSeek-R1, you'll need H200 GPUs or multi-node inference. Partial H100 GPUs via multi-instance GPU (MIG) or smaller, cheaper L4 GPUs give great performance for smaller models like Whisper and embedding models.\nHere are some buyer’s guides to GPUs:\nH100 GPU guide .\n\nH200 GPU guide .\n\nMulti-node inference for DeepSeek models .\n\nFind optimizations relevant to your use case \nIf you’re just experimenting with open source models or you need to get something in production yesterday, you can skip this step.
But one of the most powerful things that switch"},{"ref":"P13","kind":"page","title":"Deployment And Inference For Open Source Text Embedding Models","date":"2026-06-27T08:01:03.647236+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/deployment-and-inference-for-open-source-text-embedding-models/","signal_url":null,"signal_json_url":null,"text":"Deployment and inference for open source text embedding models \nAnnouncing our Series F . Learn more \n\nAI engineering \n\nDeployment and inference for open source text embedding models\n\nText embedding models convert text into semantic vectors. Numerous open source models cater to search, recommendation, classification & LLM-augmented retrieval.\n\nAuthors\n\nPhilip Kiely \n\nLast updated\nApril 4, 2025\n\nShare\n\nTL;DR A text embedding model transforms text into a vector of numbers that represents the text’s semantic meaning. There are a number of high-quality open source text embedding models for different use cases across search, recommendation, classification, and retrieval-augmented generation with LLMs.\n\nText embedding models aren’t flashy like large language models, but they’re a foundational piece of the natural language processing field and a key component for building production-ready applications on LLMs.\nWhy create text embeddings? \nAt face value, turning nice human-readable text into a long list of numbers might seem pointless. One text embedding can’t be used for much. 
But creating embeddings from a corpus of text—say every post on your blog or every paragraph in your documentation—enables use cases like:\nSearch : given a query, create an embedding of that query and compare its similarity with embeddings from the data set, and return the most relevant content.\n\nRetrieval-augmented generation (RAG) : use embedding search to grab chunks of content to use as context for text generation with LLMs.\n\nRecommendations : surface related content like similar blog posts or podcast episodes.\n\nClassification and clustering : categorize text by similarity.\n\nAs each of these use cases relies on creating a set of embeddings, it’s important to use the same embeddings model for both the initial dataset and any subsequent embeddings (such as search queries).\nWhat is a text embedding? \nA text embedding encodes a chunk of text as a vector (a list of floating-point numbers). This vector represents the text’s meaning in an n-dimensional space.\nThis is difficult to visualize at the scale of real text embedding models, which have hundreds of dimensions, but here’s a simple example in t"},{"ref":"P14","kind":"page","title":"Docs Refresh","date":"2026-06-27T08:01:03.548554+00:00","date_source":null,"source_url":"https://www.baseten.co/resources/changelog/docs-refresh/","signal_url":null,"signal_json_url":null,"text":"Docs refresh \nAnnouncing our Series F . Learn more \n\nchangelog / post \n\nDocs refresh\n\nApr 7, 2025 \nGo back\n\nWe’ve overhauled the Baseten docs to make them more readable, structured, and easier to navigate for both new and returning users. 
Some highlights:\nNew homepage to help new users get started\n\nAll-new sidebar structure for better content hierarchy\n\nQuickstart section tailored to user goals\n\nTwo new concept pages explaining Baseten’s core value and how Baseten works\n\nExamples now live under their own tab for easier discovery\n\nAll API/CLI/SDK references are consolidated under the Reference tab\n\nNew Status tab showing real-time Baseten system status\n\nCheck it out \n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra \n\nExplore all"},{"ref":"P15","kind":"page","title":"Building Performant Embedding Workflows With Chroma And Baseten","date":"2026-06-27T08:01:03.425752+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/building-performant-embedding-workflows-with-chroma-and-baseten/","signal_url":null,"signal_json_url":null,"text":"Building performant embedding workflows with Chroma and Baseten \nAnnouncing our Series F . Learn more \n\nCommunity \n\nBuilding performant embedding workflows with Chroma and Baseten\n\nIntegrate Chroma’s open-source vector database with Baseten’s fast inference engine for efficient, real-time embedding inference in your AI-native apps.\n\nAuthors\n\nPhilip Kiely \n\nLast updated\nApril 11, 2025\n\nShare\n\nYou can now use Chroma , the open-source AI application database, with Baseten's inference platform to create AI-native apps—from agents to RAG pipelines and search backends. \nChroma is unique among vector databases because it is open-source: you can run it locally, self-host it in your cloud provider of choice, or use Chroma’s new hosted cloud offering .
This makes Chroma a natural choice for developers building with open models who want control over their entire AI infrastructure stack.\nVector databases like Chroma are powered by embedding models, which reduce inputs to numerical vectors that encode their semantic meaning. Baseten offers dedicated deployments of every open-source, fine-tuned, and custom embedding model (as well as all other generative AI models) on autoscaling infrastructure.\nRecently, Baseten announced Baseten Embedding Inference (BEI), the world’s fastest runtime for embedding models. BEI offers twice the throughput of the previous leading solutions for modern LLM-based embedding models.\nBEI outperforms the next-best embedding inference engine by up to 2.05x \nBEI is useful with Chroma in two ways:\nWhen filling the Chroma vector database with an initial corpus of data, BEI provides substantial speed and cost savings for embedding large corpora.\n\nWhen passing user queries to the Chroma database, BEI provides low-latency, real-time embedding inference and handles a large number of simultaneous users.\n\nYou can use BEI-optimized embedding models deployed on Baseten with Chroma via our official integration.\nHow to use Chroma with Baseten \nYou can call an embedding model running on Baseten using the Chroma Python SDK in less than five minutes.\nStep 1: Deploy an embedding model on Baseten \nIf you don’t yet have a Baseten account, you can sign up and you’"},{"ref":"P16","kind":"page","title":"Stream Baseten Logs From Terminal","date":"2026-06-27T08:01:03.409323+00:00","date_source":null,"source_url":"https://www.baseten.co/resources/changelog/stream-baseten-logs-from-terminal/","signal_url":null,"signal_json_url":null,"text":"Stream Baseten logs from the terminal \nAnnouncing our Series F . 
Learn more \n\nStream Baseten logs from the terminal\n\nApr 10, 2025 \n\nFor users who love working in the terminal, we're excited to announce truss push --tail, which streams Baseten logs directly to your command line.\nYou no longer need to switch context between your commands and browser: iterate on the entire model lifecycle—from build to deploy to debugging—while staying within your favorite shell.\nThis functionality is available as of Truss version 0.9.71.\n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra \n\nExplore all"},{"ref":"P17","kind":"page","title":"Flexible Instance Types Per Model Deployment","date":"2026-06-27T08:01:03.190502+00:00","date_source":null,"source_url":"https://www.baseten.co/resources/changelog/flexible-instance-types-per-model-deployment/","signal_url":null,"signal_json_url":null,"text":"Flexible instance types per model deployment \nAnnouncing our Series F . Learn more \n\nFlexible instance types per model deployment\n\nApr 14, 2025 \n\nModel deployments now support changing instance types, enabling you to experiment with different hardware configurations and use specific hardware for staging, development, and production environments.\nWe also added more environment details to the UI, highlighting autoscaling settings, promotion settings, and instance types. You can find a list of all supported instance types in our docs here.\nUpdated Baseten UI highlighting autoscaling settings, promotion settings, and instance type \nChanging instance types for new deployments \nUpdating the instance type for a deployment creates a new deployment with the specified instance type. 
When you run truss push from the CLI on an existing model, it will respect any changes made to the resources field in your config.yaml.\nChanging instance types on promotion \nWhen promoting a deployment to an environment, you now have the option to keep its instance type or use the instance type of the environment you're promoting to.\nThe instance type of the environment will be used by default. You can opt not to use the environment instance type via the promotion dialog in the Baseten UI, or in the latest version of the Truss CLI using the --no-preserve-env-instance-type flag (for example, truss push --environment production --no-preserve-env-instance-type).\nThe new promotion dialog: the instance type of the environment is used by default \n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra \n\nExplore all"},{"ref":"P18","kind":"page","title":"Accelerating Inference Nvidia B200 Gpus","date":"2026-06-27T08:01:03.171304+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/accelerating-inference-nvidia-b200-gpus/","signal_url":null,"signal_json_url":null,"text":"Accelerating inference with NVIDIA B200 GPUs \nAnnouncing our Series F . Learn more \n\nInfrastructure \n\nAccelerating inference with NVIDIA B200 GPUs\n\nNVIDIA B200 GPUs improve cost, throughput, and latency for use cases like code generation, search, reasoning, agents, and more.\n\nAuthors\n\nPhilip Kiely \n\nLast updated\nApril 18, 2025\n\nIn the past year, AI inference workloads have become substantially more demanding. LLMs are larger than ever, with DeepSeek-R1 tipping the scales at 671 billion parameters and Meta’s upcoming Llama 4 Behemoth model promising to be three times larger. 
\nAs inference has become mission-critical for AI-native products, latency, throughput, and cost-efficiency have become essential for running applications like code generation, search, reasoning agents, and more in production.\nHighly demanding models like DeepSeek-R1 and DeepSeek-V3 benefit most from NVIDIA B200 GPUs \nTo meet this demand, we introduced inference on NVIDIA B200 GPUs on Baseten. With B200s, our earliest users are already seeing: \n5x higher throughput for high-traffic endpoints\n\nMore than 50% lower cost per token with throughput-optimized deployments\n\nUp to 38% lower latency for serving the largest LLMs like DeepSeek-R1\n\nIn this piece, we will detail the technical advantages of B200 GPUs and the new use cases they unlock, from frontier LLMs to demanding workloads like video generation. NVIDIA B200 GPUs are available today on Baseten — contact us to get started running your workloads with B200s!\nPerformance boosts using B200s \nB200 GPUs are based on NVIDIA’s current-generation Blackwell architecture and can replace Hopper GPUs (the H100 and H200) for a wide range of workloads.\nBefore the B200, AI-native companies had limited options for high-throughput deployments of models like DeepSeek-R1. They could use H200s, which are more expensive and less available, but can fit the model in a single node, or connect multiple sets of smaller H100 GPUs together using multi-node inference. Now, with B200s, developers have a cost-efficient and high-performance solution for large reasoning models.\nLlama 4 Scout showcases how B200 GPUs can be a game-changer even for smaller models. W"},{"ref":"P19","kind":"page","title":"Early Access Announcing B200s On Baseten","date":"2026-06-27T08:01:03.092041+00:00","date_source":null,"source_url":"https://www.baseten.co/resources/changelog/early-access-announcing-b200s-on-baseten/","signal_url":null,"signal_json_url":null,"text":"Early Access: Announcing B200s on Baseten \nAnnouncing our Series F . 
Learn more \n\nEarly Access: Announcing B200s on Baseten \n\nApr 15, 2025 \n\nWe're thrilled to announce early access to NVIDIA B200 GPUs on Baseten!\nFrom benchmarks on models like DeepSeek R1, Llama 4, and Qwen, we’re already seeing 5x higher throughput, over 2x better cost per token, and 38% lower latency—powering use cases from code generation to search and more.\nIf you want to start using B200s, you can reach out to our team for access here. \n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra \n\nExplore all"},{"ref":"P20","kind":"page","title":"Day Zero Benchmarks For Qwen 3 With Sglang On Baseten","date":"2026-06-27T08:01:02.849882+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/day-zero-benchmarks-for-qwen-3-with-sglang-on-baseten/","signal_url":null,"signal_json_url":null,"text":"Day zero benchmarks for Qwen 3 with SGLang on Baseten \nAnnouncing our Series F . Learn more \n\nModel performance \n\nDay zero benchmarks for Qwen 3 with SGLang on Baseten\n\nQwen 3 235B: open-source MoE LLM brings frontier reasoning to 4 H100 GPUs. See benchmarks, SGLang setup, and FP8 tips for cost-efficient inferencing.\n\nAuthors\n\nYineng Zhang \n\nMichael Feil \n\nPhilip Kiely \n\nLast updated\nApril 29, 2025\n\nTL;DR Qwen 3, a new family of open-source LLMs by Alibaba, introduces Qwen 3 235B, a state-of-the-art reasoning model that rivals DeepSeek-R1 but requires a quarter of the hardware resources to run in production. 
Using SGLang, an open source fast inference framework, we were able to optimize and deploy Qwen 3 in production for customers within minutes of the weights dropping on Hugging Face.\n\nQwen 3 introduced eight new open-source LLMs, including Qwen 3 235B, a state-of-the-art reasoning model that requires only 4 H100 GPUs for inference. That’s a quarter of the hardware needed for DeepSeek-R1, making Qwen 3 235B a highly cost-efficient reasoning model.\nQwen 3 235B is useful for everything from agentic workflows to reasoning chat to code generation, while the smaller models in the family are great when workloads need to be fast and inexpensive, like code completion. Qwen 3 compares favorably on public benchmarks with models like OpenAI-o1 and OpenAI-o3-mini while staying competitive with much larger frontier models like DeepSeek-R1 and Gemini 2.5 Pro. When we spot-checked our implementation with the gsm8k benchmark, we found that Qwen 3 scored within a margin of error of DeepSeek-R1.\nAlibaba published benchmark results for Qwen showing competitive performance vs much larger leading models \nTo take full advantage of the model’s efficient performance in production, we need to serve the model with low latency and high throughput. That’s where SGLang, an open-source fast inference framework for LLMs, comes in. SGLang core maintainers worked closely with Qwen engineers to ensure day-zero support for the new family of models.\nWith SGLang, we were able to get production-ready performance the moment the Qwen 3 model weights were made public. However, as with any n"},{"ref":"P21","kind":"page","title":"Introducing Our New Brand","date":"2026-06-27T08:01:02.752892+00:00","date_source":null,"source_url":"https://www.baseten.co/resources/changelog/introducing-our-new-brand/","signal_url":null,"signal_json_url":null,"text":"Introducing our new brand \nAnnouncing our Series F . 
Learn more \n\nIntroducing our new brand\n\nMay 19, 2025 \n\nWe're thrilled to introduce our new brand! \nWe believe inference is the foundation of all AI going forward. That's what our new look is all about:\nBaseten is the building blocks of AI. \n\nBaseten is inference. \n\nAnd inference is everything. \n\nLots more to come this week. In the meantime, check out Tuhin's blog on the new brand.\n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra \n\nExplore all"},{"ref":"P22","kind":"page","title":"Canopy Labs Selects Baseten As Preferred Inference Provider For Orpheus Tts Model","date":"2026-06-27T08:01:02.71992+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/canopy-labs-selects-baseten-as-preferred-inference-provider-for-orpheus-tts-model/","signal_url":null,"signal_json_url":null,"text":"Canopy Labs selects Baseten as preferred inference provider for Orpheus TTS models \nAnnouncing our Series F . Learn more \n\nCommunity \n\nCanopy Labs selects Baseten as preferred inference provider for Orpheus TTS models\n\nCanopy Labs recommends that startups run Orpheus TTS, their open source speech synthesis model, on Baseten for production\n\nAuthors\n\nPhilip Kiely \n\nLast updated\nMay 5, 2025\n\nCanopy Labs, a foundation model company operating in San Francisco, is on a mission to build digital humans that are indistinguishable from real humans. In March, they released Orpheus TTS, an open-source model for real-time lifelike speech synthesis.\nOrpheus went viral, racking up over 100K downloads on Hugging Face as a top 5 trending model. 
As thousands of developers experimented with Orpheus, many began to ask how to run the model in production.\nTo help developers use Orpheus TTS in production, Canopy Labs has selected Baseten as its preferred inference provider. Baseten and Canopy have collaborated to create the world’s highest-performance Orpheus inference server based on NVIDIA’s TensorRT-LLM.\nWe’re excited to partner with Baseten to optimize and serve Orpheus TTS for demanding AI applications like real-time voice agents. \nBelow, we’ll dive into benchmarks, optimizations, and client code for Orpheus on Baseten. You can deploy the model in a couple of clicks from Baseten’s model library to get started building with Orpheus instantly.\nBenchmarking real-time speech synthesis \nBenchmarking performance for TTS models is interesting because you’re not as concerned with raw tokens per second as with LLMs. For Orpheus, 83 tokens is roughly equivalent to one second of audio. So once you can get to 83 tokens per second, you actually want to increase batch size rather than token speed to support more real-time connections on the same hardware, reducing cost.\nOur original implementation of Orpheus supported 7 simultaneous real-time streams on an H100 MIG GPU – effectively half of an H100 for cost-efficient performance. After optimizing the model with TensorRT-LLM, we see a base rate of 16 simultaneous streams or up to 24 for applications with stable traffic patterns.\nBase"},{"ref":"P23","kind":"page","title":"Ai Inference Explained","date":"2026-06-27T08:01:02.594788+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/ai-inference-explained/","signal_url":null,"signal_json_url":null,"text":"AI inference explained: The hidden process behind every prediction \nAnnouncing our Series F . 
Learn more \n\nFoundations \n\nAI inference explained: The hidden process behind every prediction\n\nWhat AI inference is, how the inference process works, and why it's challenging to build well.\n\nAuthors\n\nMadison Kanna \n\nLast updated\nMay 19, 2025\n\nTL;DR AI inference is when a trained AI model makes predictions on new data (like ChatGPT generating responses to an input). It's challenging because it must be fast, reliable, and cost-efficient — requirements that often conflict with each other. Success is measured by latency (speed), throughput (efficiency), and cost.\n\nEvery day, AI applications support millions of users, giving instant, seemingly magical answers. Behind every AI application is a process that’s invisible to end users but determines everything about their product experience: inference. But what is AI inference, and why is it so important for building scalable AI applications? \nThe two stages of AI: Training and inference \nWorking with AI models involves two distinct stages: \nThe training stage, during which a model learns how to perform a task (like recognizing images, generating text, or making decisions). \n\nThe inference stage, where the model puts what it has learned into practice.\n\nThink of training as the education phase: you're feeding the model massive amounts of data, adjusting its parameters through many iterations, and essentially teaching it to recognize patterns and relationships. During training, the model gradually improves its understanding by learning from its mistakes. This process is computationally intensive and can take days or weeks, depending on the model's complexity and the amount of data involved.\nAI inference is the process of using a trained AI model to make predictions on new data. In this phase, the model applies what it’s learned to become useful in the real world. 
Unlike training, inference must be fast and efficient, as it often occurs in real-time as users interact with AI applications. \n\nFor example, when you ask an LLM-powered application like ChatGPT a question and get a reply back, that is AI infere"},{"ref":"P24","kind":"page","title":"Introducing Our New Brand","date":"2026-06-27T08:01:02.576071+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/introducing-our-new-brand/","signal_url":null,"signal_json_url":null,"text":"Introducing our new brand \nAnnouncing our Series F . Learn more \n\nNews \n\nIntroducing our new brand\n\nWe believe inference is the foundation of all AI going forward. That's what our new look is all about.\n\nAuthors\n\nTuhin Srivastava \n\nLast updated\nMay 19, 2025\n\nWhen we founded Baseten in late 2019 to power the next generation of AI-powered products, we weren't sure exactly how AI would evolve. What we did know was that the success of AI would be inextricably tied to the infrastructure that supports it. \nFast-forward to 2025 and AI is everywhere. This is a moment like no other, and we are proud to power inference for some of the best AI experiences in the world. Leading AI companies like Abridge, OpenEvidence, Gamma, and Writer use Baseten to provide the best possible inference across millions of users. \nLeading AI companies use Baseten to power their inference. \nInference in production is the destination for most Gen AI models today. Doing inference well in production means it’s fast, reliable, and cost-effective. We’ve built an entire inference stack for every step of the journey to get there, and are excited to extend that this week with some exciting product announcements (watch this space!).\nAt Baseten, we obsess over details; from how we talk to our customers, to what our CLI looks like, to the in-app experience. 
Our visual identity has evolved over the last few years, but most recently we felt that our design system, logo, and style didn’t quite have the versatility and expressiveness that we’d like to convey those details. We worked with the talented team at Basement to rethink the building blocks of design we need for the next phase of our journey - we hope you like it!\n\nExplore Baseten today\nStart deploying Talk to an engineer \n\nRelated posts\nView all News \n\nNews Announcing our Series F \n\nTuhin Srivastava \n3 others\n\nNews Welcome, Gabe Stern! \n\nDannie Herzberg \n1 other\n\nNews Welcome, Sameer Paranjpye! \n\nAmir Haghighat \n\nPopular models\nGLM 5.2 \n\nKimi K2.7 Code \n\nDeepSeek V4 \n\nGPT OSS 120B \n\nWhisper Large V3 \n\nNVIDIA Nemotron 3 Ultra"},{"ref":"P25","kind":"page","title":"Introducing Model Apis And Training","date":"2026-06-27T08:01:02.419872+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/introducing-model-apis-and-training/","signal_url":null,"signal_json_url":null,"text":"Introducing Model APIs and Training \nAnnouncing our Series F . Learn more \n\nNews \n\nIntroducing Model APIs and Training\n\nToday we're launching two new products to serve the inference lifecycle: production-ready Model APIs and Training infrastructure.\n\nAuthors\n\nTuhin Srivastava \n\nLast updated\nMay 21, 2025\n\nOver the last few years, we’ve been steadfast in supporting dedicated infrastructure for our customers. Customers have come to us when they have trained their own models or wanted to use open-source models at sufficient scale, and care deeply about consistent, fast performance, reliable uptime, and best-in-class developer experience to manage models in production.\nBut there’s been a significant shift — open-source models are better than ever, and out-of-the-box are rivaling SOTA in most benchmarks. 
Today, we're excited to introduce two new products that help take those models into production: Model APIs and Training.\nModel APIs \nWe were excited to be the first provider that gave access to DeepSeek V3 and R1, and the ecosystem continues to get richer with recent drops of Llama 4 and Qwen 3. However, APIs for accessing these models have lagged; performance is variable, reliability is nonexistent, and the developer experiences have left a lot to be desired. For developers getting started using open-source models, you are forced to deal with non-production grade products with subpar experiences.\nWe think we can do better, and today we’re releasing Baseten Model APIs. Our goal with Model APIs is to provide an easy path to integrate open-source models into production. We’ve built it with developers in mind, and we aim to provide state-of-the-art performance, production-grade reliability and an easy path from getting started to using Model APIs to using Dedicated Infrastructure on Baseten. We’re launching with four great models today, and we’ll add new models across modalities as the landscape evolves going forward. Get started here today. Thank you to our great partners Retool, OpenRouter, and Poe for helping us get Model APIs launch ready. \n\nTraining \nAt the same time, we’ve seen a natural evolution of customers going from closed-source model"},{"ref":"P26","kind":"page","title":"Your Client Code Matters 10x Higher Embedding Throughput With Python And Rust","date":"2026-06-27T08:01:02.313091+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/your-client-code-matters-10x-higher-embedding-throughput-with-python-and-rust/","signal_url":null,"signal_json_url":null,"text":"Your client code matters: 12x higher embedding throughput with Python and Rust \nAnnouncing our Series F . 
Learn more \n\nAI engineering \n\nYour client code matters: 12x higher embedding throughput with Python and Rust\n\nBoost embedding throughput by up to 12x with the Baseten Performance Client, a Python library for processing high-volume workloads.\n\nAuthors\n\nMichael Feil \n\nLast updated\nJune 12, 2025\n\nThe Baseten Performance Client is a new open-source Python library (built with Rust) that massively improves throughput for high-volume embedding tasks, like standing up a new vector database or pre-processing text samples for model training. The client is OpenAI-compatible, so whether you are running models on Baseten (I hope you are) or using another inference provider, our Performance Client can improve embedding throughput by up to 12x.\nWe achieve this boost by freeing Python’s Global Interpreter Lock (GIL) during network-bound tasks – allowing true parallel request execution. The result is dramatically lower latencies under heavy loads; for example, in our benchmarks with large batches of embeddings, PerformanceClient delivered results in 1 minute and 11 seconds vs. more than 15 minutes for a standard AsyncOpenAI client at extreme scale (2,097,152 parallel inputs). It also fully utilizes multi-core CPUs to maximize throughput.\nThe Baseten Performance Client achieves 12x faster corpus backfill in an extremely high-volume test. \nUsing the client is as simple as pointing it to your embedding model’s endpoint (whether you’re running on Baseten, OpenAI, or any OpenAI-compatible service) and swapping in our client class. 
You’ll immediately benefit from higher throughput and lower latencies for tasks like semantic search (embeddings), retrieval-augmented generation (where you might embed and then rerank documents), or real-time classification of content.\nIn this post, we’ll explain how PerformanceClient works under the hood, compare it to a typical Python client, and show why it has such a high impact on high-volume embedding, reranking, classification, and custom batched workloads.\nHow the Baseten Performance Client removes inference performance bottlenecks \nPy"},{"ref":"P27","kind":"page","title":"Forward Deployed Engineering","date":"2026-06-27T08:01:02.092102+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/forward-deployed-engineering/","signal_url":null,"signal_json_url":null,"text":"Forward deployed engineering on the frontier of AI \nAnnouncing our Series F . Learn more \n\nCommunity \n\nForward deployed engineering on the frontier of AI \n\nI lead Forward Deployed Engineering at Baseten. This post covers how FDEs accelerate value, shape product, and how to build and scale an effective FDE team.\n\nAuthors\n\nVlad Shulman \n\nLast updated\nJune 10, 2025\n\nThe world’s fastest-growing AI dev tooling companies are all facing a similar question: how do you accelerate adoption of highly technical products when the market is evolving faster than any roadmap can capture?\nOne solution is building out a Forward Deployed Engineering (FDE) function. An FDE team sits within the engineering organizations and works directly with your customers’ engineers to accelerate time to value with your product.\nOver the years, as both a founder (Retain.ai, acquired by Dagster) and technical lead at fast-growing companies, I’ve been interested in the ways companies can organize around delivering product value. Today, I lead the FDE team at Baseten, an AI inference platform for mission-critical workloads. 
\nI wrote this piece to combine those perspectives and provide founders and engineering leaders with guidance on what makes FDE special, when FDE is a good fit for a company, and how to establish and scale an FDE function.\nWhat is FDE, and why is it not just consulting? \nNote to self: We should stop letting consultants ship so much code to production. \nThe first question I get about FDE is whether it’s just a different name for more commonly understood functions, such as consulting, solutions architecture, or sales engineering. These roles are valuable and are a better fit than FDE for many companies. While there are some tactical differences – FDE tends to be more hands-on than other functions – the main difference is strategic.\nForward Deployed Engineering is unique because it sits within the engineering organization and makes regular contributions to the product, both through building the product and having an outsized influence on the product roadmap. \nBy sitting within engineering and working directly with customers, FDE expands the product's perimeter. Where co"},{"ref":"P28","kind":"page","title":"How Baseten Multi Cloud Capacity Management Mcm Powers Cloud Self Hosted And Hybr","date":"2026-06-27T08:01:02.080903+00:00","date_source":null,"source_url":"https://www.baseten.co/blog/how-baseten-multi-cloud-capacity-management-mcm-powers-cloud-self-hosted-and-hybr/","signal_url":null,"signal_json_url":null,"text":"How Baseten multi-cloud capacity management (MCM) unifies deployments \nAnnouncing our Series F . 
Learn more \n\nInfrastructure \n\nHow Baseten multi-cloud capacity management (MCM) unifies deployments\n\nBaseten's MCM system unifies GPU capacity across clouds, letting you run inference in our cloud, yours, or both—99.99% uptime, low latency, compliance-ready.\n\nAuthors\n\nRachel Rapp \n\nAmir Haghighat \n\nLast updated\nJune 9, 2025\n\nTL;DR Baseten's MCM system is a unified control layer that provisions and scales thousands of GPUs across 10+ clouds and regions. Three deployment modes share the same inference stack:\nBaseten Cloud – fully managed, multi-cloud scale and latency optimisation.\n\nSelf-hosted – the full stack inside your VPC for strict data, security, or customization needs.\n\nHybrid – run core workloads self-hosted and burst to Baseten Cloud on demand.\n\nDelivers 99.99% uptime, lowest-possible latency, data-residency compliance (SOC 2 Type II, HIPAA, GDPR) and freedom from vendor lock-in.\n\nAt Baseten, we operate one of the most complex inference infrastructures in production today—thousands of GPUs distributed across 10+ cloud providers and multiple regions globally. This scale exposed fundamental limitations in traditional deployment approaches: single points of failure, regional and cloud-specific capacity constraints, and the operational nightmare of managing heterogeneous cloud environments.\nWe built our multi-cloud management (MCM) system to address these problems for our diverse customer base. Our MCM system is a set of automation, tools, and practices designed to manage compute across different cloud service providers (CSPs) from a single pane of glass. It comprises the core of the Baseten Inference Stack. We've used it for the past two years to power production workloads for Abridge, Writer, Patreon, and hundreds of others.\nBaseten supports Cloud, Self-hosted, and Hybrid deployments with Multi-cloud Capacity Management (MCM) for cloud-agnostic provisioning, orchestration, and scaling of resources. 
\nWe designed the MCM system for maximum flexibility, enabling customers to run in our cloud, their cloud, or a combination of both. This gi"},{"ref":"E1","kind":"event","title":"How We Built The Worlds Fastest Api For Glm 52","date":"2026-06-23T00:04:46+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/how-we-built-the-worlds-fastest-api-for-glm-52/","signal_url":"https://onlylabs.fyi/signals/054c390b-737d-4921-8f4a-eff7a4bcab14","signal_json_url":"https://onlylabs.fyi/signals/054c390b-737d-4921-8f4a-eff7a4bcab14/signal.json","text":"post_published · How We Built The Worlds Fastest Api For Glm 52 · signal_desk=talking · occurred_at=2026-06-23T00:04:46+00:00 · url=https://www.baseten.co/blog/how-we-built-the-worlds-fastest-api-for-glm-52/ · hn=6 points/0 comments"},{"ref":"E2","kind":"event","title":"Announcing Our Series F","date":"2026-06-22T12:55:00+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/announcing-our-series-f/","signal_url":"https://onlylabs.fyi/signals/85e1ee09-2dfa-427a-8129-33317966a472","signal_json_url":"https://onlylabs.fyi/signals/85e1ee09-2dfa-427a-8129-33317966a472/signal.json","text":"post_published · Announcing Our Series F · signal_desk=talking · occurred_at=2026-06-22T12:55:00+00:00 · url=https://www.baseten.co/blog/announcing-our-series-f/ · hn=5 points/0 comments"},{"ref":"E3","kind":"event","title":"basetenlabs/rlm","date":"2026-06-26T22:49:01+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/rlm","signal_url":"https://onlylabs.fyi/signals/7156ff0b-47f0-4027-8694-ac1e6f678d1a","signal_json_url":"https://onlylabs.fyi/signals/7156ff0b-47f0-4027-8694-ac1e6f678d1a/signal.json","text":"repo_forked · basetenlabs/rlm · signal_desk=forks · occurred_at=2026-06-26T22:49:01+00:00 · url=https://github.com/basetenlabs/rlm · raw={\"repo\":\"basetenlabs/rlm\",\"parent\":\"alexzhang13/rlm\"}"},{"ref":"E4","kind":"event","title":"basetenlabs/langchain-baseten 
libs/baseten/v0.2.1","date":"2026-06-26T20:22:48+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/langchain-baseten/releases/tag/libs/baseten/v0.2.1","signal_url":"https://onlylabs.fyi/signals/f20a5fca-2449-4a6c-b34a-2b63038d77c1","signal_json_url":"https://onlylabs.fyi/signals/f20a5fca-2449-4a6c-b34a-2b63038d77c1/signal.json","text":"release · basetenlabs/langchain-baseten libs/baseten/v0.2.1 · signal_desk=releases · occurred_at=2026-06-26T20:22:48+00:00 · url=https://github.com/basetenlabs/langchain-baseten/releases/tag/libs/baseten/v0.2.1 · raw={\"repo\":\"basetenlabs/langchain-baseten\"}"},{"ref":"E5","kind":"event","title":"basetenlabs/ucx","date":"2026-06-26T00:02:25+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/ucx","signal_url":"https://onlylabs.fyi/signals/33c3a757-62a7-40fc-a35d-f17a9110f607","signal_json_url":"https://onlylabs.fyi/signals/33c3a757-62a7-40fc-a35d-f17a9110f607/signal.json","text":"repo_forked · basetenlabs/ucx · signal_desk=forks · occurred_at=2026-06-26T00:02:25+00:00 · url=https://github.com/basetenlabs/ucx · raw={\"repo\":\"basetenlabs/ucx\",\"parent\":\"openucx/ucx\"}"},{"ref":"E6","kind":"event","title":"basetenlabs/ucxx","date":"2026-06-26T00:00:29+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/ucxx","signal_url":"https://onlylabs.fyi/signals/bdd69fe9-a78f-4366-b3b4-4993f760c246","signal_json_url":"https://onlylabs.fyi/signals/bdd69fe9-a78f-4366-b3b4-4993f760c246/signal.json","text":"repo_forked · basetenlabs/ucxx · signal_desk=forks · occurred_at=2026-06-26T00:00:29+00:00 · url=https://github.com/basetenlabs/ucxx · raw={\"repo\":\"basetenlabs/ucxx\",\"parent\":\"rapidsai/ucxx\"}"},{"ref":"E7","kind":"event","title":"Ai Training Vs 
Inference","date":"2026-06-25T20:33:01+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/ai-training-vs-inference/","signal_url":"https://onlylabs.fyi/signals/a3a2d7ae-29f8-43fe-883e-895d650d06ba","signal_json_url":"https://onlylabs.fyi/signals/a3a2d7ae-29f8-43fe-883e-895d650d06ba/signal.json","text":"post_published · Ai Training Vs Inference · signal_desk=talking · occurred_at=2026-06-25T20:33:01+00:00 · url=https://www.baseten.co/blog/ai-training-vs-inference/"},{"ref":"E8","kind":"event","title":"Async Log Downloads","date":"2026-06-25T18:46:49+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/async-log-downloads/","signal_url":"https://onlylabs.fyi/signals/d547d503-a119-49f0-8536-e189cc071ebd","signal_json_url":"https://onlylabs.fyi/signals/d547d503-a119-49f0-8536-e189cc071ebd/signal.json","text":"post_published · Async Log Downloads · signal_desk=talking · occurred_at=2026-06-25T18:46:49+00:00 · url=https://www.baseten.co/resources/changelog/async-log-downloads/"},{"ref":"E9","kind":"event","title":"Live Draft Model Training For Speculative Decoding","date":"2026-06-25T14:23:27+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/live-draft-model-training-for-speculative-decoding/","signal_url":"https://onlylabs.fyi/signals/7feda2eb-5ae5-4333-b689-4bf60aeb58a7","signal_json_url":"https://onlylabs.fyi/signals/7feda2eb-5ae5-4333-b689-4bf60aeb58a7/signal.json","text":"post_published · Live Draft Model Training For Speculative Decoding · signal_desk=talking · occurred_at=2026-06-25T14:23:27+00:00 · url=https://www.baseten.co/blog/live-draft-model-training-for-speculative-decoding/"},{"ref":"E10","kind":"event","title":"Model Deprecation Deepseek V31 Minimax 
M25","date":"2026-06-24T21:55:40+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/model-deprecation-deepseek-v31-minimax-m25/","signal_url":"https://onlylabs.fyi/signals/440dc7f5-0223-4590-9174-c34d28b0275c","signal_json_url":"https://onlylabs.fyi/signals/440dc7f5-0223-4590-9174-c34d28b0275c/signal.json","text":"post_published · Model Deprecation Deepseek V31 Minimax M25 · signal_desk=talking · occurred_at=2026-06-24T21:55:40+00:00 · url=https://www.baseten.co/resources/changelog/model-deprecation-deepseek-v31-minimax-m25/"},{"ref":"E11","kind":"event","title":"basetenlabs/truss v0.18.17","date":"2026-06-24T16:39:00+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.17","signal_url":"https://onlylabs.fyi/signals/a323f44d-b6b5-43d7-aaf6-a9b431b5f5ba","signal_json_url":"https://onlylabs.fyi/signals/a323f44d-b6b5-43d7-aaf6-a9b431b5f5ba/signal.json","text":"release · basetenlabs/truss v0.18.17 · signal_desk=releases · occurred_at=2026-06-24T16:39:00+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.17 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E12","kind":"event","title":"basetenlabs/baseten-cli v0.2.0","date":"2026-06-24T16:31:20+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/baseten-cli/releases/tag/v0.2.0","signal_url":"https://onlylabs.fyi/signals/acf563d6-4a9f-4b43-a7a0-97ba2ba3c8d2","signal_json_url":"https://onlylabs.fyi/signals/acf563d6-4a9f-4b43-a7a0-97ba2ba3c8d2/signal.json","text":"release · basetenlabs/baseten-cli v0.2.0 · signal_desk=releases · occurred_at=2026-06-24T16:31:20+00:00 · url=https://github.com/basetenlabs/baseten-cli/releases/tag/v0.2.0 · raw={\"repo\":\"basetenlabs/baseten-cli\"}"},{"ref":"E13","kind":"event","title":"Field Productivity & Enablement 
Lead","date":"2026-06-24T14:56:38.769+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/5a1c6228-3906-4ccc-9988-d9bd67383b9d","signal_url":"https://onlylabs.fyi/signals/af7ee7bd-1753-46b6-907e-5c1ef7505006","signal_json_url":"https://onlylabs.fyi/signals/af7ee7bd-1753-46b6-907e-5c1ef7505006/signal.json","text":"job_opened · Field Productivity & Enablement Lead · signal_desk=hiring · occurred_at=2026-06-24T14:56:38.769+00:00 · url=https://jobs.ashbyhq.com/baseten/5a1c6228-3906-4ccc-9988-d9bd67383b9d · raw={\"location\":\"San Francisco\",\"team\":\"Revenue Operations\",\"ats\":\"ashby\"}"},{"ref":"E14","kind":"event","title":"How To Run Glm 52 In Any Harness","date":"2026-06-24T00:00:00+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/how-to-run-glm-52-in-any-harness/","signal_url":"https://onlylabs.fyi/signals/9dd15ab6-22da-4153-a4d2-6bada3944234","signal_json_url":"https://onlylabs.fyi/signals/9dd15ab6-22da-4153-a4d2-6bada3944234/signal.json","text":"post_published · How To Run Glm 52 In Any Harness · signal_desk=talking · occurred_at=2026-06-24T00:00:00+00:00 · url=https://www.baseten.co/blog/how-to-run-glm-52-in-any-harness/"},{"ref":"E15","kind":"event","title":"Senior Analyst, Revenue Strategy & Operations","date":"2026-06-23T18:48:02.862+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/6d32aa11-ac93-4f90-8f62-bdeb79214ee5","signal_url":"https://onlylabs.fyi/signals/bcdcbe36-41de-45d2-a6f2-15c085a9ee4a","signal_json_url":"https://onlylabs.fyi/signals/bcdcbe36-41de-45d2-a6f2-15c085a9ee4a/signal.json","text":"job_opened · Senior Analyst, Revenue Strategy & Operations · signal_desk=hiring · occurred_at=2026-06-23T18:48:02.862+00:00 · url=https://jobs.ashbyhq.com/baseten/6d32aa11-ac93-4f90-8f62-bdeb79214ee5 · raw={\"location\":\"San Francisco\",\"team\":\"GTM\",\"ats\":\"ashby\"}"},{"ref":"E16","kind":"event","title":"Senior Frontend 
Engineer","date":"2026-06-23T16:06:39.411+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/3622a1ee-50a9-4c45-af6e-aa12bd5de22f","signal_url":"https://onlylabs.fyi/signals/a92806da-3a17-4f24-9f8e-1fa5b74c0732","signal_json_url":"https://onlylabs.fyi/signals/a92806da-3a17-4f24-9f8e-1fa5b74c0732/signal.json","text":"job_opened · Senior Frontend Engineer · signal_desk=hiring · occurred_at=2026-06-23T16:06:39.411+00:00 · url=https://jobs.ashbyhq.com/baseten/3622a1ee-50a9-4c45-af6e-aa12bd5de22f · raw={\"location\":\"San Francisco\",\"team\":\"Dedicated Inference\",\"ats\":\"ashby\"}"},{"ref":"E17","kind":"event","title":"basetenlabs/truss v0.18.16","date":"2026-06-23T15:27:26+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.16","signal_url":"https://onlylabs.fyi/signals/752a81fe-faec-4f28-bb1c-9e6b9fe8edd1","signal_json_url":"https://onlylabs.fyi/signals/752a81fe-faec-4f28-bb1c-9e6b9fe8edd1/signal.json","text":"release · basetenlabs/truss v0.18.16 · signal_desk=releases · occurred_at=2026-06-23T15:27:26+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.16 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E18","kind":"event","title":"Partnerships Product Marketing Manager","date":"2026-06-23T01:39:03.56+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/132295d6-eeb4-4655-9847-a7e9a586d273","signal_url":"https://onlylabs.fyi/signals/d12b4bc2-0d54-4033-a2ab-2eda45341cda","signal_json_url":"https://onlylabs.fyi/signals/d12b4bc2-0d54-4033-a2ab-2eda45341cda/signal.json","text":"job_opened · Partnerships Product Marketing Manager · signal_desk=hiring · occurred_at=2026-06-23T01:39:03.56+00:00 · url=https://jobs.ashbyhq.com/baseten/132295d6-eeb4-4655-9847-a7e9a586d273 · raw={\"location\":\"San Francisco\",\"team\":\"Product Marketing\",\"ats\":\"ashby\"}"},{"ref":"E19","kind":"event","title":"basetenlabs/truss 
v0.18.16rc0","date":"2026-06-23T01:21:09+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.16rc0","signal_url":"https://onlylabs.fyi/signals/7576b9d3-cb91-4427-a67a-59295122335b","signal_json_url":"https://onlylabs.fyi/signals/7576b9d3-cb91-4427-a67a-59295122335b/signal.json","text":"release · basetenlabs/truss v0.18.16rc0 · signal_desk=releases · occurred_at=2026-06-23T01:21:09+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.16rc0 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E20","kind":"event","title":"Nvidia Bionemo Agent Toolkit On Baseten","date":"2026-06-23T00:00:00+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/nvidia-bionemo-agent-toolkit-on-baseten/","signal_url":"https://onlylabs.fyi/signals/16827949-c3af-49b9-9f6a-8904fae40174","signal_json_url":"https://onlylabs.fyi/signals/16827949-c3af-49b9-9f6a-8904fae40174/signal.json","text":"post_published · Nvidia Bionemo Agent Toolkit On Baseten · signal_desk=talking · occurred_at=2026-06-23T00:00:00+00:00 · url=https://www.baseten.co/blog/nvidia-bionemo-agent-toolkit-on-baseten/"},{"ref":"E21","kind":"event","title":"basetenlabs/truss v0.18.15","date":"2026-06-22T20:27:29+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.15","signal_url":"https://onlylabs.fyi/signals/d45756e4-517c-4466-b3f3-65620cfafe97","signal_json_url":"https://onlylabs.fyi/signals/d45756e4-517c-4466-b3f3-65620cfafe97/signal.json","text":"release · basetenlabs/truss v0.18.15 · signal_desk=releases · occurred_at=2026-06-22T20:27:29+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.15 · 
raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E22","kind":"event","title":"basetenlabs/container-debug-support","date":"2026-06-22T18:19:56+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/container-debug-support","signal_url":"https://onlylabs.fyi/signals/a177c87e-07a8-4bc4-8b5b-fdafbc5c8518","signal_json_url":"https://onlylabs.fyi/signals/a177c87e-07a8-4bc4-8b5b-fdafbc5c8518/signal.json","text":"repo_forked · basetenlabs/container-debug-support · signal_desk=forks · occurred_at=2026-06-22T18:19:56+00:00 · url=https://github.com/basetenlabs/container-debug-support · raw={\"repo\":\"basetenlabs/container-debug-support\",\"parent\":\"GoogleContainerTools/container-debug-support\"}"},{"ref":"E23","kind":"event","title":"basetenlabs/sw-example-ci-cd","date":"2026-06-22T18:19:48+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/sw-example-ci-cd","signal_url":"https://onlylabs.fyi/signals/77b2e3ce-8a6c-4d8c-b854-270636d6056d","signal_json_url":"https://onlylabs.fyi/signals/77b2e3ce-8a6c-4d8c-b854-270636d6056d/signal.json","text":"repo_new · basetenlabs/sw-example-ci-cd · signal_desk=repos · occurred_at=2026-06-22T18:19:48+00:00 · url=https://github.com/basetenlabs/sw-example-ci-cd · raw={\"repo\":\"basetenlabs/sw-example-ci-cd\",\"description\":\"Example CI/CD with model config separated from vllm config\"}"},{"ref":"E24","kind":"event","title":"Filter And Stream Model Logs From The Cli","date":"2026-06-22T17:22:43+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/filter-and-stream-model-logs-from-the-cli/","signal_url":"https://onlylabs.fyi/signals/3390a6e8-f57f-4328-b14f-9f11694809ba","signal_json_url":"https://onlylabs.fyi/signals/3390a6e8-f57f-4328-b14f-9f11694809ba/signal.json","text":"post_published · Filter And Stream Model Logs From The Cli · signal_desk=talking · occurred_at=2026-06-22T17:22:43+00:00 · 
url=https://www.baseten.co/resources/changelog/filter-and-stream-model-logs-from-the-cli/"},{"ref":"E25","kind":"event","title":"basetenlabs/truss v0.18.14","date":"2026-06-22T16:26:06+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.14","signal_url":"https://onlylabs.fyi/signals/c40d39e0-1b9c-4cab-8510-191c80511f7d","signal_json_url":"https://onlylabs.fyi/signals/c40d39e0-1b9c-4cab-8510-191c80511f7d/signal.json","text":"release · basetenlabs/truss v0.18.14 · signal_desk=releases · occurred_at=2026-06-22T16:26:06+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.14 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E26","kind":"event","title":"basetenlabs/DeepGEMM","date":"2026-06-17T23:54:43+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/DeepGEMM","signal_url":"https://onlylabs.fyi/signals/fc3c9297-13c2-4354-947d-12be64637e4e","signal_json_url":"https://onlylabs.fyi/signals/fc3c9297-13c2-4354-947d-12be64637e4e/signal.json","text":"repo_forked · basetenlabs/DeepGEMM · signal_desk=forks · occurred_at=2026-06-17T23:54:43+00:00 · url=https://github.com/basetenlabs/DeepGEMM · raw={\"repo\":\"basetenlabs/DeepGEMM\",\"parent\":\"deepseek-ai/DeepGEMM\"}"},{"ref":"E27","kind":"event","title":"The Best Open Source Large Language Models Llms","date":"2026-06-17T21:35:24+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/the-best-open-source-large-language-models-llms/","signal_url":"https://onlylabs.fyi/signals/bed89352-49c4-4815-b936-faaa516e774f","signal_json_url":"https://onlylabs.fyi/signals/bed89352-49c4-4815-b936-faaa516e774f/signal.json","text":"post_published · The Best Open Source Large Language Models Llms · signal_desk=talking · occurred_at=2026-06-17T21:35:24+00:00 · url=https://www.baseten.co/blog/the-best-open-source-large-language-models-llms/"},{"ref":"E28","kind":"event","title":"basetenlabs/truss 
v0.18.13","date":"2026-06-17T11:19:23+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.13","signal_url":"https://onlylabs.fyi/signals/cf2eb190-7c5d-46dd-a944-1db1c085ea8a","signal_json_url":"https://onlylabs.fyi/signals/cf2eb190-7c5d-46dd-a944-1db1c085ea8a/signal.json","text":"release · basetenlabs/truss v0.18.13 · signal_desk=releases · occurred_at=2026-06-17T11:19:23+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.13 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E29","kind":"event","title":"Glm 52 Available On Baseten","date":"2026-06-16T23:52:09+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/glm-52-available-on-baseten/","signal_url":"https://onlylabs.fyi/signals/5691b77e-dedc-4d56-8b64-be8b9d2d905f","signal_json_url":"https://onlylabs.fyi/signals/5691b77e-dedc-4d56-8b64-be8b9d2d905f/signal.json","text":"post_published · Glm 52 Available On Baseten · signal_desk=talking · occurred_at=2026-06-16T23:52:09+00:00 · url=https://www.baseten.co/resources/changelog/glm-52-available-on-baseten/"},{"ref":"E30","kind":"event","title":"Kimi K27 Coder On","date":"2026-06-16T16:21:29+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/kimi-k27-coder-on/","signal_url":"https://onlylabs.fyi/signals/9ae215f7-499a-4587-8b28-4fcd13b43435","signal_json_url":"https://onlylabs.fyi/signals/9ae215f7-499a-4587-8b28-4fcd13b43435/signal.json","text":"post_published · Kimi K27 Coder On · signal_desk=talking · occurred_at=2026-06-16T16:21:29+00:00 · url=https://www.baseten.co/resources/changelog/kimi-k27-coder-on/"},{"ref":"E31","kind":"event","title":"basetenlabs/truss 
v0.18.12","date":"2026-06-16T15:20:13+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.12","signal_url":"https://onlylabs.fyi/signals/292ac27a-2393-4002-bb5b-2fa47124e88c","signal_json_url":"https://onlylabs.fyi/signals/292ac27a-2393-4002-bb5b-2fa47124e88c/signal.json","text":"release · basetenlabs/truss v0.18.12 · signal_desk=releases · occurred_at=2026-06-16T15:20:13+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.12 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E32","kind":"event","title":"basetenlabs/truss v0.18.11","date":"2026-06-16T02:15:45+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.11","signal_url":"https://onlylabs.fyi/signals/e58eea00-86d5-4b84-8b33-873c613e3fdc","signal_json_url":"https://onlylabs.fyi/signals/e58eea00-86d5-4b84-8b33-873c613e3fdc/signal.json","text":"release · basetenlabs/truss v0.18.11 · signal_desk=releases · occurred_at=2026-06-16T02:15:45+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.11 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E33","kind":"event","title":"Capacity Strategy & Operations Lead","date":"2026-06-16T00:09:06.835+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/918dde84-09c6-4ee7-a0b9-a3e3253ac4b0","signal_url":"https://onlylabs.fyi/signals/994efb97-efcd-4083-adad-b7088ee7bbea","signal_json_url":"https://onlylabs.fyi/signals/994efb97-efcd-4083-adad-b7088ee7bbea/signal.json","text":"job_opened · Capacity Strategy & Operations Lead · signal_desk=hiring · occurred_at=2026-06-16T00:09:06.835+00:00 · url=https://jobs.ashbyhq.com/baseten/918dde84-09c6-4ee7-a0b9-a3e3253ac4b0 · raw={\"location\":\"San Francisco\",\"team\":\"Compute\",\"ats\":\"ashby\"}"},{"ref":"E34","kind":"event","title":"basetenlabs/truss 
v0.18.10","date":"2026-06-15T23:06:28+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.10","signal_url":"https://onlylabs.fyi/signals/572e666f-9145-40e7-9ab1-1ad241c49977","signal_json_url":"https://onlylabs.fyi/signals/572e666f-9145-40e7-9ab1-1ad241c49977/signal.json","text":"release · basetenlabs/truss v0.18.10 · signal_desk=releases · occurred_at=2026-06-15T23:06:28+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.10 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E35","kind":"event","title":"Software Engineer - Capacity","date":"2026-06-12T20:06:27.481+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/902a7ddb-c21f-4272-aaab-879680697986","signal_url":"https://onlylabs.fyi/signals/7abc1e6e-10cd-47c8-80f2-a8b9ff12bafe","signal_json_url":"https://onlylabs.fyi/signals/7abc1e6e-10cd-47c8-80f2-a8b9ff12bafe/signal.json","text":"job_opened · Software Engineer - Capacity · signal_desk=hiring · occurred_at=2026-06-12T20:06:27.481+00:00 · url=https://jobs.ashbyhq.com/baseten/902a7ddb-c21f-4272-aaab-879680697986 · raw={\"location\":\"San Francisco\",\"team\":\"Internal Platform (Dev Tooling)\",\"ats\":\"ashby\"}"},{"ref":"E36","kind":"event","title":"Customer Marketing Manager ","date":"2026-06-12T17:36:14.12+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/6111c4a1-4e29-4fb8-aca3-1c8c8e0cbfb1","signal_url":"https://onlylabs.fyi/signals/45704b0d-6604-4a0d-b905-b55dd259f12a","signal_json_url":"https://onlylabs.fyi/signals/45704b0d-6604-4a0d-b905-b55dd259f12a/signal.json","text":"job_opened · Customer Marketing Manager  · signal_desk=hiring · occurred_at=2026-06-12T17:36:14.12+00:00 · url=https://jobs.ashbyhq.com/baseten/6111c4a1-4e29-4fb8-aca3-1c8c8e0cbfb1 · raw={\"location\":\"San Francisco\",\"team\":\"Marketing\",\"ats\":\"ashby\"}"},{"ref":"E37","kind":"event","title":"Rolling Deployments Zero Downtime Model 
Updates","date":"2026-06-12T16:25:28+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/rolling-deployments-zero-downtime-model-updates/","signal_url":"https://onlylabs.fyi/signals/d45222d4-556c-44af-8667-82259e8f3ce7","signal_json_url":"https://onlylabs.fyi/signals/d45222d4-556c-44af-8667-82259e8f3ce7/signal.json","text":"post_published · Rolling Deployments Zero Downtime Model Updates · signal_desk=talking · occurred_at=2026-06-12T16:25:28+00:00 · url=https://www.baseten.co/blog/rolling-deployments-zero-downtime-model-updates/"},{"ref":"E38","kind":"event","title":"New Sidebar Navigation","date":"2026-06-12T01:10:45+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/new-sidebar-navigation/","signal_url":"https://onlylabs.fyi/signals/2f023b95-b578-4ce3-af47-c3d3fa743e86","signal_json_url":"https://onlylabs.fyi/signals/2f023b95-b578-4ce3-af47-c3d3fa743e86/signal.json","text":"post_published · New Sidebar Navigation · signal_desk=talking · occurred_at=2026-06-12T01:10:45+00:00 · url=https://www.baseten.co/resources/changelog/new-sidebar-navigation/"},{"ref":"E39","kind":"event","title":"basetenlabs/truss v0.18.9","date":"2026-06-11T19:14:59+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.9","signal_url":"https://onlylabs.fyi/signals/15f16c54-184f-4344-881c-49fe54dbafda","signal_json_url":"https://onlylabs.fyi/signals/15f16c54-184f-4344-881c-49fe54dbafda/signal.json","text":"release · basetenlabs/truss v0.18.9 · signal_desk=releases · occurred_at=2026-06-11T19:14:59+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.9 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E40","kind":"event","title":"Container Restart 
Tracking","date":"2026-06-11T18:34:07+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/container-restart-tracking/","signal_url":"https://onlylabs.fyi/signals/1a6065f3-7cb2-4e41-af58-5f64dc70dcd6","signal_json_url":"https://onlylabs.fyi/signals/1a6065f3-7cb2-4e41-af58-5f64dc70dcd6/signal.json","text":"post_published · Container Restart Tracking · signal_desk=talking · occurred_at=2026-06-11T18:34:07+00:00 · url=https://www.baseten.co/resources/changelog/container-restart-tracking/"},{"ref":"E41","kind":"event","title":"Mercury 2 Is Now Available On Baseten","date":"2026-06-11T00:00:00+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/blog/mercury-2-is-now-available-on-baseten/","signal_url":"https://onlylabs.fyi/signals/73a8fdcb-b90a-4198-beec-e83889e8338f","signal_json_url":"https://onlylabs.fyi/signals/73a8fdcb-b90a-4198-beec-e83889e8338f/signal.json","text":"post_published · Mercury 2 Is Now Available On Baseten · signal_desk=talking · occurred_at=2026-06-11T00:00:00+00:00 · url=https://www.baseten.co/blog/mercury-2-is-now-available-on-baseten/"},{"ref":"E42","kind":"event","title":"Product Manager, Developer Experience","date":"2026-06-10T20:43:54.991+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/2d78fdcf-53e1-45d3-a047-2aefb5ad3153","signal_url":"https://onlylabs.fyi/signals/9fdd79ab-e89f-4aee-94e6-cd1cabd7542b","signal_json_url":"https://onlylabs.fyi/signals/9fdd79ab-e89f-4aee-94e6-cd1cabd7542b/signal.json","text":"job_opened · Product Manager, Developer Experience · signal_desk=hiring · occurred_at=2026-06-10T20:43:54.991+00:00 · url=https://jobs.ashbyhq.com/baseten/2d78fdcf-53e1-45d3-a047-2aefb5ad3153 · raw={\"location\":\"San Francisco\",\"team\":\"Product\",\"ats\":\"ashby\"}"},{"ref":"E43","kind":"event","title":"basetenlabs/truss 
v0.18.8","date":"2026-06-10T15:10:22+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.8","signal_url":"https://onlylabs.fyi/signals/bb9acec0-a183-422c-ac83-36af1986a992","signal_json_url":"https://onlylabs.fyi/signals/bb9acec0-a183-422c-ac83-36af1986a992/signal.json","text":"release · basetenlabs/truss v0.18.8 · signal_desk=releases · occurred_at=2026-06-10T15:10:22+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.8 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E44","kind":"event","title":"Vllm And Sglang Metrics","date":"2026-06-10T01:18:18+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/vllm-and-sglang-metrics/","signal_url":"https://onlylabs.fyi/signals/f907682f-00e0-4d92-b8ed-6ce33029e2a7","signal_json_url":"https://onlylabs.fyi/signals/f907682f-00e0-4d92-b8ed-6ce33029e2a7/signal.json","text":"post_published · Vllm And Sglang Metrics · signal_desk=talking · occurred_at=2026-06-10T01:18:18+00:00 · url=https://www.baseten.co/resources/changelog/vllm-and-sglang-metrics/"},{"ref":"E45","kind":"event","title":"Technical Program Manager, Infrastructure","date":"2026-06-10T01:04:24.882+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/7d9d5a1f-3834-434e-b22f-4bd62317be3c","signal_url":"https://onlylabs.fyi/signals/9309facb-aea2-4492-bd58-9f4cec23c926","signal_json_url":"https://onlylabs.fyi/signals/9309facb-aea2-4492-bd58-9f4cec23c926/signal.json","text":"job_opened · Technical Program Manager, Infrastructure · signal_desk=hiring · occurred_at=2026-06-10T01:04:24.882+00:00 · url=https://jobs.ashbyhq.com/baseten/7d9d5a1f-3834-434e-b22f-4bd62317be3c · raw={\"location\":\"San Francisco\",\"team\":\"Infrastructure\",\"ats\":\"ashby\"}"},{"ref":"E46","kind":"event","title":"basetenlabs/truss 
v0.18.7","date":"2026-06-09T21:42:48+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/truss/releases/tag/v0.18.7","signal_url":"https://onlylabs.fyi/signals/438bd1d6-1b54-48af-a4de-f1005cad5e5e","signal_json_url":"https://onlylabs.fyi/signals/438bd1d6-1b54-48af-a4de-f1005cad5e5e/signal.json","text":"release · basetenlabs/truss v0.18.7 · signal_desk=releases · occurred_at=2026-06-09T21:42:48+00:00 · url=https://github.com/basetenlabs/truss/releases/tag/v0.18.7 · raw={\"repo\":\"basetenlabs/truss\"}"},{"ref":"E47","kind":"event","title":"Engineering Manager, Cloud Platform","date":"2026-06-09T18:23:27.319+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/0870ed34-7365-4b9f-a50a-481783b8c266","signal_url":"https://onlylabs.fyi/signals/456a26b8-1d15-4aa8-8ef9-5509ab3feac8","signal_json_url":"https://onlylabs.fyi/signals/456a26b8-1d15-4aa8-8ef9-5509ab3feac8/signal.json","text":"job_opened · Engineering Manager, Cloud Platform · signal_desk=hiring · occurred_at=2026-06-09T18:23:27.319+00:00 · url=https://jobs.ashbyhq.com/baseten/0870ed34-7365-4b9f-a50a-481783b8c266 · raw={\"location\":\"San Francisco\",\"team\":\"Infrastructure\",\"ats\":\"ashby\"}"},{"ref":"E48","kind":"event","title":"Engineering Manager, Internal Platform","date":"2026-06-09T18:12:27.736+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/1e721b74-58b9-4b03-ac0f-4b1e5c32342e","signal_url":"https://onlylabs.fyi/signals/66e5e498-c460-4c04-b4fa-6cd9bab21eda","signal_json_url":"https://onlylabs.fyi/signals/66e5e498-c460-4c04-b4fa-6cd9bab21eda/signal.json","text":"job_opened · Engineering Manager, Internal Platform · signal_desk=hiring · occurred_at=2026-06-09T18:12:27.736+00:00 · url=https://jobs.ashbyhq.com/baseten/1e721b74-58b9-4b03-ac0f-4b1e5c32342e · raw={\"location\":\"San Francisco\",\"team\":\"Internal Platform (Dev Tooling)\",\"ats\":\"ashby\"}"},{"ref":"E49","kind":"event","title":"Engineering 
Manager, Runtime Fabric","date":"2026-06-09T18:02:59.642+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/aae72bd3-6f75-4238-9741-95fec11facb9","signal_url":"https://onlylabs.fyi/signals/bdf68b9a-56a7-427a-9d90-5401af51c02b","signal_json_url":"https://onlylabs.fyi/signals/bdf68b9a-56a7-427a-9d90-5401af51c02b/signal.json","text":"job_opened · Engineering Manager, Runtime Fabric · signal_desk=hiring · occurred_at=2026-06-09T18:02:59.642+00:00 · url=https://jobs.ashbyhq.com/baseten/aae72bd3-6f75-4238-9741-95fec11facb9 · raw={\"location\":\"San Francisco\",\"team\":\"Runtime Fabric\",\"ats\":\"ashby\"}"},{"ref":"E50","kind":"event","title":"Log Export To Otlp Endpoints","date":"2026-06-08T22:16:01+00:00","date_source":"sitemap.lastmod","source_url":"https://www.baseten.co/resources/changelog/log-export-to-otlp-endpoints/","signal_url":"https://onlylabs.fyi/signals/48883b91-0360-4101-bde9-fab028725f46","signal_json_url":"https://onlylabs.fyi/signals/48883b91-0360-4101-bde9-fab028725f46/signal.json","text":"post_published · Log Export To Otlp Endpoints · signal_desk=talking · occurred_at=2026-06-08T22:16:01+00:00 · url=https://www.baseten.co/resources/changelog/log-export-to-otlp-endpoints/"},{"ref":"E51","kind":"event","title":"Strategic Finance Associate / Sr. Associate","date":"2026-06-08T18:26:02.827+00:00","date_source":"ashby.publishedAt","source_url":"https://jobs.ashbyhq.com/baseten/71a011b6-0f17-4c0a-b0ba-38deffce1adb","signal_url":"https://onlylabs.fyi/signals/ad886de5-9ed0-420b-8fd8-5266d3eff536","signal_json_url":"https://onlylabs.fyi/signals/ad886de5-9ed0-420b-8fd8-5266d3eff536/signal.json","text":"job_opened · Strategic Finance Associate / Sr. 
Associate · signal_desk=hiring · occurred_at=2026-06-08T18:26:02.827+00:00 · url=https://jobs.ashbyhq.com/baseten/71a011b6-0f17-4c0a-b0ba-38deffce1adb · raw={\"location\":\"San Francisco\",\"team\":\"G&A\",\"ats\":\"ashby\"}"},{"ref":"E52","kind":"event","title":"basetenlabs/baseten-go v0.1.0","date":"2026-06-08T17:52:34+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/baseten-go/releases/tag/v0.1.0","signal_url":"https://onlylabs.fyi/signals/1cebec52-c245-4db2-bce1-7839deb3391b","signal_json_url":"https://onlylabs.fyi/signals/1cebec52-c245-4db2-bce1-7839deb3391b/signal.json","text":"release · basetenlabs/baseten-go v0.1.0 · signal_desk=releases · occurred_at=2026-06-08T17:52:34+00:00 · url=https://github.com/basetenlabs/baseten-go/releases/tag/v0.1.0 · raw={\"repo\":\"basetenlabs/baseten-go\"}"},{"ref":"E53","kind":"event","title":"basetenlabs/baseten-python v0.9.0","date":"2026-06-08T14:33:19+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/baseten-python/releases/tag/v0.9.0","signal_url":"https://onlylabs.fyi/signals/2c598b49-1126-4125-aaf2-5493dc18464d","signal_json_url":"https://onlylabs.fyi/signals/2c598b49-1126-4125-aaf2-5493dc18464d/signal.json","text":"release · basetenlabs/baseten-python v0.9.0 · signal_desk=releases · occurred_at=2026-06-08T14:33:19+00:00 · url=https://github.com/basetenlabs/baseten-python/releases/tag/v0.9.0 · raw={\"repo\":\"basetenlabs/baseten-python\"}"},{"ref":"E54","kind":"event","title":"basetenlabs/ideogram4","date":"2026-06-03T20:18:15+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/ideogram4","signal_url":"https://onlylabs.fyi/signals/2c538801-4357-4519-a15a-708e2ece2643","signal_json_url":"https://onlylabs.fyi/signals/2c538801-4357-4519-a15a-708e2ece2643/signal.json","text":"repo_forked · basetenlabs/ideogram4 · signal_desk=forks · occurred_at=2026-06-03T20:18:15+00:00 · url=https://github.com/basetenlabs/ideogram4 · 
raw={\"repo\":\"basetenlabs/ideogram4\",\"parent\":\"ideogram-oss/ideogram4\"}"},{"ref":"E55","kind":"event","title":"basetenlabs/tinker-cookbook","date":"2026-06-03T01:38:21+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/tinker-cookbook","signal_url":"https://onlylabs.fyi/signals/fd7262b9-0c91-4cb4-8988-e408d16e0f67","signal_json_url":"https://onlylabs.fyi/signals/fd7262b9-0c91-4cb4-8988-e408d16e0f67/signal.json","text":"repo_forked · basetenlabs/tinker-cookbook · signal_desk=forks · occurred_at=2026-06-03T01:38:21+00:00 · url=https://github.com/basetenlabs/tinker-cookbook · stars=1 · raw={\"repo\":\"basetenlabs/tinker-cookbook\",\"parent\":\"thinking-machines-lab/tinker-cookbook\"}"},{"ref":"E56","kind":"event","title":"basetenlabs/runc","date":"2026-06-02T16:22:10+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/runc","signal_url":"https://onlylabs.fyi/signals/df3d49f6-047e-433d-b76b-6b24aaf06198","signal_json_url":"https://onlylabs.fyi/signals/df3d49f6-047e-433d-b76b-6b24aaf06198/signal.json","text":"repo_forked · basetenlabs/runc · signal_desk=forks · occurred_at=2026-06-02T16:22:10+00:00 · url=https://github.com/basetenlabs/runc · raw={\"repo\":\"basetenlabs/runc\",\"parent\":\"opencontainers/runc\"}"},{"ref":"E57","kind":"event","title":"basetenlabs/compact-rl","date":"2026-05-21T16:40:17+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/compact-rl","signal_url":"https://onlylabs.fyi/signals/af6458c8-4673-489a-8d00-3f15546a303b","signal_json_url":"https://onlylabs.fyi/signals/af6458c8-4673-489a-8d00-3f15546a303b/signal.json","text":"repo_forked · basetenlabs/compact-rl · signal_desk=forks · occurred_at=2026-05-21T16:40:17+00:00 · url=https://github.com/basetenlabs/compact-rl · stars=7 · 
raw={\"repo\":\"basetenlabs/compact-rl\",\"parent\":\"PrimeIntellect-ai/prime-rl\"}"},{"ref":"E58","kind":"event","title":"basetenlabs/TorchSpec","date":"2026-05-19T16:37:18+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/TorchSpec","signal_url":"https://onlylabs.fyi/signals/a390e7b3-e28a-4cf0-8035-0fab6f120633","signal_json_url":"https://onlylabs.fyi/signals/a390e7b3-e28a-4cf0-8035-0fab6f120633/signal.json","text":"repo_forked · basetenlabs/TorchSpec · signal_desk=forks · occurred_at=2026-05-19T16:37:18+00:00 · url=https://github.com/basetenlabs/TorchSpec · raw={\"repo\":\"basetenlabs/TorchSpec\",\"parent\":\"lightseekorg/TorchSpec\"}"},{"ref":"E59","kind":"event","title":"basetenlabs/mcore-bridge","date":"2026-05-04T04:14:29+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/mcore-bridge","signal_url":"https://onlylabs.fyi/signals/066a3ef5-36ab-4da7-b762-211c6a0f083f","signal_json_url":"https://onlylabs.fyi/signals/066a3ef5-36ab-4da7-b762-211c6a0f083f/signal.json","text":"repo_forked · basetenlabs/mcore-bridge · signal_desk=forks · occurred_at=2026-05-04T04:14:29+00:00 · url=https://github.com/basetenlabs/mcore-bridge · raw={\"repo\":\"basetenlabs/mcore-bridge\",\"parent\":\"modelscope/mcore-bridge\"}"},{"ref":"E60","kind":"event","title":"basetenlabs/autocomp","date":"2026-05-01T17:27:12+00:00","date_source":"source","source_url":"https://github.com/basetenlabs/autocomp","signal_url":"https://onlylabs.fyi/signals/16b2dbbf-5bd2-4f2a-a1f8-21f0bbaa0bf9","signal_json_url":"https://onlylabs.fyi/signals/16b2dbbf-5bd2-4f2a-a1f8-21f0bbaa0bf9/signal.json","text":"repo_forked · basetenlabs/autocomp · signal_desk=forks · occurred_at=2026-05-01T17:27:12+00:00 · url=https://github.com/basetenlabs/autocomp · raw={\"repo\":\"basetenlabs/autocomp\",\"parent\":\"ucb-bar/autocomp\"}"}]}