Fork · Baseten · published Feb 11, 2026

basetenlabs/modelexpress

forked from ai-dynamo/modelexpress

Description: Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and improve overall performance.

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 3

Created: 2026-02-11T17:51:52Z

Pushed: 2026-04-24T17:10:31Z

Default branch: main

Fork: yes

Parent repository: ai-dynamo/modelexpress

Archived: no

README:

Dynamo Model Express

Model Express is a Rust-based model cache management service designed to be deployed as a sidecar alongside existing inference solutions such as NVIDIA Dynamo. Model Express accelerates overall inference performance by reducing the latency of artifact downloads and writes.

Project Overview

Although Model Express is a component of the Dynamo inference stack, it can also be deployed standalone to accelerate other inference solutions such as vLLM and SGLang, independent of Dynamo.

The current version of Model Express acts as a cache for HuggingFace, providing fast access to pre-trained models and reducing the need for repeated downloads across multiple servers. Model Express supports two deployment modes: shared storage (where client and server share a network drive) and distributed mode (where model files are transferred over gRPC when shared storage is not available). This enables flexible deployment in various infrastructure setups, from high-performance shared filesystem environments to distributed cloud deployments.
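
The choice between the two deployment modes can be pictured as a small decision on the client side. The following is a minimal sketch, not the `modelexpress_client` API: `FetchPlan` and `plan_fetch` are hypothetical names, and it assumes the client knows whether a shared cache root is mounted.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical type for illustration only; not part of modelexpress_client.
#[derive(Debug, PartialEq)]
enum FetchPlan {
    /// Shared storage mode: read the files straight from the shared volume.
    ReadFromSharedVolume(PathBuf),
    /// Distributed mode: ask the server to stream the files over gRPC.
    StreamFromServer { model: String },
}

/// Decide how a client obtains a model.
/// `shared_cache_root` is `Some` when client and server share a network drive.
fn plan_fetch(shared_cache_root: Option<&Path>, model: &str) -> FetchPlan {
    match shared_cache_root {
        Some(root) => FetchPlan::ReadFromSharedVolume(root.join(model)),
        None => FetchPlan::StreamFromServer {
            model: model.to_string(),
        },
    }
}
```

With a mounted root such as `/cache`, `plan_fetch(Some(Path::new("/cache")), "org/model")` points at `/cache/org/model`; with `None`, the plan falls back to streaming from the server.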

Model Express also shines in multi-node / multi-worker environments, where inference solutions may spawn multiple replicas that require model artifacts to be shared efficiently.

Future versions will expand support to additional model providers (AWS, Azure, NFS, etc.) and include features like model versioning, advanced caching strategies, advanced networking using NIXL, checkpoint storage, as well as a peer-to-peer model sharing system.

Architecture

The project is organized as a Rust workspace with the following components:

  • `modelexpress_server`: The main gRPC server that provides model services
  • `modelexpress_client`: Client library for interacting with the server
  • `modelexpress_common`: Shared code and constants between client and server

The diagram below gives a high-level overview of the Model Express architecture in shared storage mode. In this mode, the server and client both have access to the same persistent volume for model storage. Model Express also supports a distributed mode in which the client and server do not share storage; in that case, model files are transferred from the server to the client over gRPC streams. The architecture will evolve as new features and components are added.

architecture-beta
group MXS(cloud)[Model Express]

service db(database)[Database] in MXS
service disk(disk)[Persistent Volume Storage] in MXS
service server(server)[Server] in MXS

db:L -- R:server
disk:T -- B:server

group MXC(cloud)[Inference Server]

service client(server)[Client] in MXC
disk:T -- B:client

The client is either a library embedded in the inference server of your choice, or a CLI tool which can be used beforehand to hydrate the model cache.
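
In distributed mode, the client receives model files as a stream rather than reading the shared volume. The sketch below illustrates the general idea of chunked transfer with in-memory data. It is an assumption-laden illustration: the `FileChunk` shape, the function names, and the offset check are hypothetical, since the actual gRPC message definitions are not shown in this README.

```rust
/// One piece of a file in transit. Hypothetical shape for illustration;
/// the real gRPC messages are not shown in this README.
#[derive(Debug, Clone, PartialEq)]
struct FileChunk {
    offset: u64,
    data: Vec<u8>,
    is_last: bool,
}

/// Server side of the sketch: split a file's bytes into fixed-size chunks.
fn split_into_chunks(bytes: &[u8], chunk_size: usize) -> Vec<FileChunk> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let total = bytes.chunks(chunk_size).len();
    bytes
        .chunks(chunk_size)
        .enumerate()
        .map(|(i, part)| FileChunk {
            offset: (i * chunk_size) as u64,
            data: part.to_vec(),
            is_last: i + 1 == total,
        })
        .collect()
}

/// Client side of the sketch: append chunks in order, rejecting any chunk
/// whose offset does not match the number of bytes received so far.
fn reassemble(chunks: &[FileChunk]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for chunk in chunks {
        if chunk.offset != out.len() as u64 {
            return Err(format!(
                "expected offset {}, got {}",
                out.len(),
                chunk.offset
            ));
        }
        out.extend_from_slice(&chunk.data);
    }
    Ok(out)
}
```

Carrying an explicit offset in each chunk lets the receiver detect reordered or missing pieces before writing anything to disk.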

CLI Tool

The client library includes a command-line interface, meant to facilitate interaction with the Model Express server, and act as a HuggingFace CLI replacement. In the future, it will also abstract other model providers, making it a one-stop shop for interacting with various model APIs.

See [docs/CLI.md](docs/CLI.md) for detailed CLI documentation.

Prerequisites

  • Rust: Latest stable version (recommended: 1.90)
  • Cargo: Rust's package manager (included with Rust)
  • protoc: The Protocol Buffers compiler is expected to be installed and usable
  • Docker (optional): For containerized deployment

Quick Start

1. Clone the Repository

git clone <repository-url>
cd modelexpress

2. Build the Project

cargo build

3. Run the Server

cargo run --bin modelexpress-server

The server will start on 0.0.0.0:8001 by default.
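
To confirm that something is accepting connections on that port, a plain TCP probe is often enough. This is a minimal sketch using only the Rust standard library; `is_listening` is a hypothetical helper, and it checks TCP reachability only, not gRPC health.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if something accepts TCP connections at `addr`.
/// This is only a reachability probe, not a gRPC health check.
/// Hypothetical helper; not part of the Model Express code base.
fn is_listening(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        // connect_timeout fails if the port is closed or the timeout elapses.
        Ok(socket_addr) => TcpStream::connect_timeout(&socket_addr, timeout).is_ok(),
        // An unparsable address is treated as "not listening".
        Err(_) => false,
    }
}
```

For a locally running server, the call would look like `is_listening("127.0.0.1:8001", Duration::from_secs(1))`.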

Running Options

Option 1: Local Development

# Start the gRPC server
cargo run --bin modelexpress-server

# In another terminal, run tests
cargo test

# Run integration tests
./run_integration_tests.sh

Option 2: Docker Deployment

# Build and run with docker-compose
docker-compose up --build

# Or build and run manually
docker build -t model-express .
docker run -p 8001:8001 model-express

Option 3: Kubernetes Deployment

Prerequisites:

  • Kubernetes Cluster: With GPU support and kubectl configured to access your cluster
  • HuggingFace Token: Required to access HuggingFace models from within your cluster, supplied via a Kubernetes secret:
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
  • Docker Registry: Container registry accessible from your cluster (Docker Hub, private registry, or local registry)
  • Model Express Image: Built from the root directory of the repository and pushed to your registry:
# Build the Model Express image
docker build -t model-express:latest .

# Tag for your registry
docker tag model-express:latest your-registry/model-express:latest

# Push to your registry
docker push your-registry/model-express:latest
  • Update Image Reference: Update the image reference in your deployment files to match your registry
# In k8s-deployment.yaml or agg.yaml, update:
image: your-registry/model-express:latest

To deploy Model Express in your cluster, run:

kubectl apply -f k8s-deployment.yaml

Please follow the guide here to learn more about launching Model Express with Dynamo on Kubernetes.

Configuration

Model Express uses a layered configuration system that reads from multiple sources, in order of precedence:

1. Command line arguments (highest priority)
2. Environment variables
3. Configuration files (YAML)
4. Default values (lowest priority)
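
The precedence order above amounts to "take the first source that provides a value". A minimal sketch of that rule follows. `resolve_setting` and `resolve_port` are hypothetical helpers, not the project's configuration code, and ignoring an unparsable environment value is a choice made for this sketch only.

```rust
/// Pick the highest-priority value that is present:
/// command line, then environment, then config file, then the default.
/// Hypothetical helper that illustrates precedence only.
fn resolve_setting<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}

/// Example for a port setting. The raw environment value is a string and
/// is skipped if it does not parse as a port (an assumption of this sketch).
fn resolve_port(cli: Option<u16>, env_raw: Option<&str>, file: Option<u16>) -> u16 {
    const DEFAULT_PORT: u16 = 8001; // default port stated in this README
    let env = env_raw.and_then(|raw| raw.trim().parse::<u16>().ok());
    resolve_setting(cli, env, file, DEFAULT_PORT)
}
```

For example, `resolve_port(None, Some("7000"), Some(6000))` yields the environment value because no command line argument is given, while `resolve_port(None, None, None)` falls through to the default.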

Configuration File

Create a configuration file (supports YAML):

# Generate a sample configuration file
cargo run --bin config_gen -- --output model-express.yaml

Start the server with a configuration file:…
