DigitalOcean (GradientAI), published Apr 2, 2026

The Glue Problem in Modern AI Development


By James Skelton

AI/ML Technical Content Strategist

Updated: April 2, 2026 10 min read


AI is now central to modern software development. Teams across industries are turning to AI to solve product and workflow problems in software. But building production systems is still complex. The hardest part of deploying AI isn’t the model, it’s everything around it. That complexity becomes a glue-code problem when storage, compute, orchestration, networking, authentication, and inference live in separate systems with different operating models. The more seams a workflow crosses, the more developer effort shifts from building product logic to wiring services together.

A more integrated platform model reduces that burden. This article examines what it takes to deploy and operate AI applications in today’s cloud landscape by comparing two setups: a neocloud combined with a hyperscaler, and a vertically integrated cloud stack. While surface-level costs may look similar, the integrated model offers clear advantages in efficiency by reducing the time developers spend writing glue code and managing the problems that emerge as AI products scale.

Key Takeaways

The biggest cost in AI systems isn’t infrastructure: it’s integration. Fragmented, multi-provider stacks force developers to spend time writing and maintaining glue code instead of building product features, turning engineering effort into the real cost center.

Raw infrastructure pricing is no longer the differentiator; total cost of ownership is. Even when platform costs are nearly identical, the added complexity of cross-cloud orchestration increases operational overhead, failure points, and staffing requirements at scale.

The future of AI platforms is vertical integration, not more tools. Platforms that unify compute, storage, and inference reduce friction, accelerate development, and allow smaller teams to build and scale AI applications more efficiently.

The Real Problem Is Fragmentation

Consider the modern landscape for AI deployment. AI applications rely on far more than inference alone. Real workflows span object storage, compute, prompt transformation, model endpoints, persistence, and monitoring, each of which requires its own operational expertise.

These pieces are usually not connected natively. That is the current state of the cloud: siloed products, resources, and services that often connect to one another only through APIs. As a result, developers spend valuable time setting up and maintaining those connections. Bridging the gaps between services running in different clouds takes real, manual work, and the resulting system is harder to scale, secure, and debug.

Consider a neocloud like Baseten or Fireworks.AI paired with a hyperscaler like AWS. In this setup, the neocloud hosts the model while the hyperscaler orchestrates the surrounding application or workflow. Take an application that processes user-uploaded documents stored in a storage service like S3 and uses an LLM to summarize them. Developers often manage authentication between the two providers with API keys instead of shared cloud identity primitives. In AWS, a file upload could trigger a function such as Lambda through an S3 event. In this example, Lambda would download the file, convert its contents into a JSON prompt format, and send it via an HTTP request to the neocloud’s model endpoint using an API key. The model returns a response, which the Lambda function then parses and writes back to S3 or a database.
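The Lambda flow described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual API: the request schema in `build_prompt`, the `summary` response field, and the `summaries/` output prefix are all assumptions made for the example. Storage access is passed in as plain functions (`read_object`, `write_object`) so the sketch runs without AWS; in a real function these would wrap the S3 client.

```python
import json
import urllib.request
from urllib.parse import unquote_plus


def build_prompt(document_text: str) -> dict:
    # Hypothetical request schema; each inference provider defines its own.
    return {
        "prompt": f"Summarize the following document:\n\n{document_text}",
        "max_tokens": 256,
    }


def call_model(endpoint_url: str, api_key: str, payload: dict,
               timeout: float = 30.0) -> dict:
    # Plain HTTPS call authenticated with an API key, as in the scenario above.
    request = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def handle_upload(event, read_object, invoke_model, write_object):
    """Process every uploaded object referenced in an S3 event notification."""
    written_keys = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        text = read_object(bucket, key)
        response = invoke_model(build_prompt(text))
        summary = response["summary"]  # assumed response field
        out_key = f"summaries/{key}.json"  # assumed output location
        write_object(bucket, out_key,
                     json.dumps({"source": key, "summary": summary}))
        written_keys.append(out_key)
    return written_keys
```

Even in this stripped-down form, the handler owns event parsing, key decoding, prompt formatting, authentication, response parsing, and persistence. None of it is product logic.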

If the model is scaled to zero, the request may experience additional cold-start latency. When processing large batches, the developer must implement their own batching, retries, and error handling. None of this pipeline is managed natively by the neocloud, so the developer is responsible for orchestrating every step between AWS services and the external inference layer. None of this orchestration is the product’s differentiator, but it still has to be built and maintained.
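To make the batching and retry burden concrete, here is one common way such glue is written: fixed-size batches plus exponential backoff with jitter. This is a sketch under assumptions, not a prescribed implementation. `TransientError`, `call_with_retries`, `batched`, and the default delays are illustrative names and values, and the caller is assumed to map retryable failures (timeouts, throttling responses) onto `TransientError`.

```python
import random
import time


class TransientError(Exception):
    """Raised by the caller's request function for retryable failures."""


def call_with_retries(send, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff capped at max_delay, with full jitter so
            # concurrent workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


def batched(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A caller would typically loop over `batched(documents, n)` and wrap each request in `call_with_retries`, deciding separately what to do with batches that still fail after the final attempt.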

Developer time and compute resources are consumed by connecting services and maintaining the code that bridges them. That complexity directly increases cost.

What This Means for Developers

The operational consequences become clearer at scale. First, scaling gets harder. Each custom handoff adds another failure point that teams must monitor and maintain. As usage grows, queues, retries, timeouts, and concurrency limits all come under pressure, forcing teams to spend more time keeping the system stable.

Next, networking gets messier. Hosting an LLM often means exposing it through a public API endpoint. That makes the model behave more like an external SaaS dependency than native infrastructure, increasing concerns around security, egress, latency, and network boundaries.

Finally, data pipelines are not integrated. Without native connections to services like AWS’s S3 or Step Functions, developers must move and reshape data themselves, and spend engineering time making sure those transfers work reliably. Batch and event-driven workflows may require extra orchestration layers as well.

Costs become fragmented too. Different billing models across providers make spending harder to track, and forecasting and debugging cost spikes become more difficult when usage spans multiple systems. In practice, this can force companies to dedicate entire teams to operating a fragmented stack.
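One way teams cope with mixed billing models is to normalize every provider's line items into a single per-workflow view. The sketch below assumes a hypothetical, already-exported record shape (`workflow`, `provider`, `quantity`, `unit_price_usd`). Real billing exports differ by provider, and the quantities and prices in the example are placeholders, not actual rates.

```python
from collections import defaultdict


def cost_per_workflow(line_items):
    """Sum quantity * unit price per workflow, across all providers."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["workflow"]] += item["quantity"] * item["unit_price_usd"]
    return dict(totals)


# Placeholder figures only: one provider bills per invocation,
# the other per thousand tokens.
example_items = [
    {"workflow": "summarize", "provider": "hyperscaler",
     "quantity": 2, "unit_price_usd": 0.5},
    {"workflow": "summarize", "provider": "neocloud",
     "quantity": 4, "unit_price_usd": 0.25},
]
summary = cost_per_workflow(example_items)
```

The aggregation itself is trivial. The recurring work is in producing `line_items`: mapping each provider's export format, units, and tagging scheme onto a shared schema and keeping that mapping current.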

The Hidden Cost of Glue Code

A typical AI pipeline may require 5–10 integration points, each introducing latency, failure risk, and engineering overhead. At scale, teams often have to dedicate developers or even entire teams to maintaining these connections. This is where the hidden cost of glue code becomes unavoidable: not in infrastructure or inference, but in people.
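A quick calculation shows why each additional integration point matters. The sketch assumes that hops fail independently and that every hop succeeds 99.9% of the time. Both are simplifying assumptions chosen for illustration, not measurements from any real pipeline.

```python
def end_to_end_success(per_hop_success: float, hops: int) -> float:
    """Probability that a request survives every hop, assuming independence."""
    return per_hop_success ** hops


def end_to_end_latency_ms(per_hop_latency_ms) -> float:
    """Sequential hops add their latencies together."""
    return sum(per_hop_latency_ms)


five_hops = end_to_end_success(0.999, 5)
ten_hops = end_to_end_success(0.999, 10)
```

Under these assumptions, five hops succeed end to end about 99.5% of the time and ten hops about 99.0%, so roughly one request in a hundred fails somewhere in a ten-hop pipeline even though each individual hop looks reliable.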

Once a certain scale is achieved, inference will eventually need to be migrated from serverless to dedicated providers. As you scale, serverless inference can become less efficient because you trade simplicity for less control over cost, performance, and capacity. It works well for low or unpredictable traffic, but at higher volumes, cold starts, concurrency limits, variable latency, and per-request…
