Repo · OpenBMB (MiniCPM) · published Oct 9, 2025

OpenBMB/Scicore-Omics

Python


OpenBMB/Scicore-Omics

Description: SciCore-Omics is the first tri-modal foundation model linking histology images, spatial transcriptomics, and biological language.

Language: Python

License: Apache-2.0

Stars: 8

Forks: 2

Open issues: 0

Created: 2025-10-09T11:04:30Z

Pushed: 2026-06-03T01:02:56Z

Default branch: main

Fork: no

Archived: no

README:

---

News

  • 2026-06: SciCore-Omics model weights are publicly available on Hugging Face: `openbmb/SciCore-Omics`.
  • 2026-06: The online demo is available through Hugging Face Spaces: `Alkaidxxy/SciCore-Omics`.
  • 2026-06: Training and inference code has been released in this repository.

---

Overview

SciCore-Omics is a gene-aware multimodal foundation model for joint reasoning over histology images, spatial transcriptomic profiles, and biological language.

Built on a MiniCPM-V-style multimodal language-model stack, SciCore-Omics introduces a dedicated transcriptomic branch that encodes gene-expression profiles with NicheFormer, compresses gene representations through a Gene Q-Former, and projects them into the language-model token space through a Gene Projector.

The model is designed for spatial biology and pathology scenarios where tissue morphology and molecular states should be interpreted together rather than treated as isolated modalities.
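The data flow of the transcriptomic branch described above can be illustrated with a shape-level sketch. This is not the repository's implementation: the dimensions, the random weights, and the function names (`cross_attend`, `gene_branch`) are hypothetical, the NicheFormer encoder is replaced by random per-gene embeddings, and the Gene Q-Former is reduced to a single cross-attention step in which a fixed set of query vectors attends over the gene tokens. It only shows how a variable-length gene sequence can become a fixed number of vectors of the language-model width.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Random matrix used as a stand-in for learned weights or encoder output."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def cross_attend(queries, tokens):
    """Single-head cross-attention: each query becomes a weighted mean of tokens."""
    scale = math.sqrt(len(queries[0]))
    keys_t = [list(col) for col in zip(*tokens)]      # (enc_dim, num_tokens)
    scores = matmul(queries, keys_t)                  # (num_queries, num_tokens)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, tokens)                    # (num_queries, enc_dim)

def gene_branch(gene_tokens, queries, proj):
    """Compress gene tokens with the queries, then project to the LLM width."""
    compressed = cross_attend(queries, gene_tokens)
    return matmul(compressed, proj)                   # (num_queries, llm_dim)

rng = random.Random(0)
# All sizes below are hypothetical and chosen only to keep the example small.
NUM_GENES, ENC_DIM, NUM_QUERIES, LLM_DIM = 50, 16, 4, 32
gene_tokens = rand_matrix(NUM_GENES, ENC_DIM, rng)    # stand-in for encoder output
queries = rand_matrix(NUM_QUERIES, ENC_DIM, rng)      # stand-in for learned queries
proj = rand_matrix(ENC_DIM, LLM_DIM, rng)             # stand-in for the projector
gene_embeds = gene_branch(gene_tokens, queries, proj)
print(len(gene_embeds), len(gene_embeds[0]))          # prints "4 32"
```

The output shape depends only on the number of queries and the projector width, not on how many genes are expressed in a given spot, which is the usual motivation for a Q-Former-style bottleneck.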

---

Highlights

  • Tri-modal foundation model for spatial biology

SciCore-Omics links histology images, spatial transcriptomics, and biological language in a unified autoregressive modeling framework.

  • Dedicated gene branch

The model uses NicheFormer, a Gene Q-Former, and a Gene Projector to transform transcriptomic profiles into LLM-compatible embeddings.

  • Image-gene-text reasoning

SciCore-Omics supports image-only, gene-only, and image-gene joint inputs, enabling morphology-aware and molecule-aware biological reasoning.

  • Staged training pipeline

The repository provides separate training stages for gene-bridge distillation, Swift-based CPT/SFT, and GSPO/PPO-style reinforcement learning refinement.

  • Public release

Model weights, online demo, local inference code, and training entrypoints are publicly available.

---

What Can SciCore-Omics Do?

SciCore-Omics can be used for research tasks such as:

  • histology-conditioned biological description generation;
  • transcriptome-conditioned biological description generation;
  • joint image-gene reasoning over spatial omics spots;
  • spatial domain recognition;
  • gene-expression-related reasoning;
  • pathology and tissue-level question answering;
  • preliminary case-level molecular interpretation from histology images.

> Note: SciCore-Omics is released for research use. It is not a standalone clinical diagnostic system.

---

Model Release

| Item | Status | Link |
| --- | --- | --- |
| Model weights | Available | `openbmb/SciCore-Omics` |
| Online demo | Available | `Alkaidxxy/SciCore-Omics` |
| Source code | Available | `OpenBMB/Scicore-Omics` |
| License | Available | [Apache-2.0](LICENSE) |
| Training code | Available | train-distill-gene/, train-swift-cpt-sft/, train-rl/ |
| Local inference | Available | eval/ |

---

Repository Structure

Scicore-Omics/
├── model/ # Core model, processor, tokenizer, and gene branch definitions
├── eval/ # Minimal inference and evaluation examples
├── figs/ # Figures used in README and documentation
├── train-distill-gene/ # Gene bridge distillation scripts
├── train-swift-cpt-sft/ # Swift-based CPT/SFT training scripts
├── train-rl/ # GSPO/PPO-style RL refinement pipeline
├── environment.yml # Conda environment specification
├── LICENSE # Apache-2.0 license
└── README.md # Project documentation

If you are new to the codebase, the recommended reading order is:

1. eval/: run the model first;
2. model/: understand the architecture;
3. train-distill-gene/: understand gene-branch alignment;
4. train-swift-cpt-sft/: understand CPT/SFT training;
5. train-rl/: understand reinforcement learning refinement.

---

Quick Start

Option 1: Try the Online Demo

The fastest way to try SciCore-Omics is through the Hugging Face Space:

👉 https://huggingface.co/spaces/Alkaidxxy/SciCore-Omics

The demo answers scientific questions about uploaded gene-expression files, histology images, or both.

---

Option 2: Run Local Inference

1. Clone the repository

git clone https://github.com/OpenBMB/Scicore-Omics.git
cd Scicore-Omics

2. Create the environment

conda env create -f environment.yml
conda activate OMICS

The reference environment was developed on Linux with NVIDIA A800-SXM4-80GB GPUs.

flash-attn can be sensitive to CUDA, PyTorch, and GPU versions. If installation fails, please adjust the flash-attn version according to your local environment.
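When a flash-attn installation fails, it can help to record which versions are actually present before changing anything. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and `torch` and `flash-attn` are the PyPI distribution names assumed for the PyTorch and flash-attn packages mentioned above.

```python
import platform
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names are assumptions; extend the list to match environment.yml.
report = installed_versions(["torch", "flash-attn"])
print("python", platform.python_version())
for name, version in report.items():
    print(name, version or "not installed")
```

The printed versions can then be compared against the pins in environment.yml to decide which flash-attn release to try.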

3. Download the model weights

You can download the public weights from Hugging Face:

huggingface-cli download openbmb/SciCore-Omics \
--local-dir ./weights/SciCore-Omics

Alternatively, you can directly load the model by setting:

model_path = "openbmb/SciCore-Omics"
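The two options above can be combined so that a script prefers a local download and otherwise falls back to the Hugging Face repository ID. This is a small sketch, not part of the repository: `resolve_model_path` is a hypothetical helper, and it only checks that the directory exists and is non-empty, not that the weights are complete.

```python
from pathlib import Path

HUB_ID = "openbmb/SciCore-Omics"  # Hugging Face repository named in this README

def resolve_model_path(local_dir, hub_id=HUB_ID):
    """Return local_dir if it holds downloaded files, otherwise the Hub ID."""
    path = Path(local_dir)
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return hub_id

model_path = resolve_model_path("./weights/SciCore-Omics")
print(model_path)
```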

4. Run a minimal example

python eval/example.py \
--model_path ./weights/SciCore-Omics \
--image_path examples/assets/example.png \
--gene_path examples/assets/example.h5ad \
--prompt "Please describe the tissue morphology and molecular state of this sample."

Expected output:

The model generates a natural-language response describing tissue morphology,
transcriptomic context, and potentially relevant biological processes.
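For running the example over many samples, the command above can be assembled programmatically. The sketch below uses only the four flags shown in the command (`--model_path`, `--image_path`, `--gene_path`, `--prompt`). The helper `build_example_command` is hypothetical, and it assumes that leaving out `--image_path` or `--gene_path` selects gene-only or image-only inference, which this excerpt does not confirm.

```python
import shlex
import sys

def build_example_command(model_path, prompt, image_path=None, gene_path=None):
    """Assemble the argument list for eval/example.py from the flags shown above."""
    if image_path is None and gene_path is None:
        raise ValueError("provide an image, a gene file, or both")
    cmd = [sys.executable, "eval/example.py", "--model_path", model_path]
    # Assumption: a modality is skipped simply by omitting its flag.
    if image_path is not None:
        cmd += ["--image_path", image_path]
    if gene_path is not None:
        cmd += ["--gene_path", gene_path]
    cmd += ["--prompt", prompt]
    return cmd

cmd = build_example_command(
    model_path="./weights/SciCore-Omics",
    prompt="Please describe the tissue morphology and molecular state of this sample.",
    image_path="examples/assets/example.png",
    gene_path="examples/assets/example.h5ad",
)
print(shlex.join(cmd))
# To execute, pass the list to subprocess.run(cmd, check=True) from the repository root.
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or punctuation.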

---

Before You Run

Please make sure you have:

  • a CUDA-enabled GPU environment;
  • the SciCore-Omics model weights from Hugging Face;
  • a histology image in .png, .jpg, or .jpeg format;
  • a spatial transcriptomics file in .h5ad format;
  • gene names compatible with the gene tokenizer resources under model/gene_tokenizer/.
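Part of this checklist can be verified before loading the model. The following is a rough sketch under stated assumptions: `check_inputs` and `vocab_coverage` are hypothetical helpers, the file checks look only at extensions, and the gene identifiers are placeholders, since the format and naming scheme of the files under model/gene_tokenizer/ are not described in this excerpt.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # image formats listed above
GENE_SUFFIX = ".h5ad"                       # transcriptomics format listed above

def check_inputs(image_path=None, gene_path=None):
    """Return a list of problems found in the input paths (empty means OK)."""
    problems = []
    if image_path is not None and Path(image_path).suffix.lower() not in IMAGE_SUFFIXES:
        problems.append(f"unsupported image format: {image_path}")
    if gene_path is not None and Path(gene_path).suffix.lower() != GENE_SUFFIX:
        problems.append(f"gene file is not .h5ad: {gene_path}")
    return problems

def vocab_coverage(sample_genes, vocab_genes):
    """Fraction of the sample's gene names that appear in the tokenizer vocabulary."""
    sample = set(sample_genes)
    if not sample:
        return 0.0
    return len(sample & set(vocab_genes)) / len(sample)

print(check_inputs("slide.tiff", "spots.csv"))
# Placeholder identifiers; the real vocabulary may use another scheme, such as Ensembl IDs.
print(vocab_coverage(["EPCAM", "KRT8", "MADE_UP"], ["EPCAM", "KRT8", "CD3E"]))
```

In practice the sample gene names would typically come from the .h5ad file, for example via `anndata.read_h5ad(path).var_names`. A low coverage value suggests the gene identifiers need to be mapped to the tokenizer's naming scheme first.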

---

Examples

Image + Gene Input

python eval/example.py \
--model_path openbmb/SciCore-Omics \
--image_path examples/assets/example.png \
--gene_path…

