What does this repo signal mean?

InclusionAI (Ant Group) published inclusionAI/dFactory (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo inclusionAI/dFactory · language Python · Solid new repo with moderate traction.. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

InclusionAI (Ant Group) Repo: inclusionAI/dFactory

Captured source

source ↗

GitHub/github.com/inclusionAI/dFactory

inclusionAI/dFactory repository metadata

Source ↗

published Nov 5, 2025seen 5dcaptured 15hhttp 200method plain

inclusionAI/dFactory

Description: Easy and Efficient dLLM Fine-Tuning

Language: Python

License: Apache-2.0

Stars: 258

Forks: 15

Open issues: 8

Created: 2025-11-05T09:54:14Z

Pushed: 2026-03-02T06:00:00Z

Default branch: main

Fork: no

Archived: no

README:

dFactory: Easy and Efficient dLLM Fine-Tuning

Features

Various models: LLaDA2.0-mini (16B), LLaDA2.0-flash (100B)
Integrated methods: (Continous) supervised-finetuning (block-diffusion, full attention), etc.

Supported Models

| Model ID | Description | Size | Config Path | Hugging Face Link | | --- | --- | --- | --- | --- | | inclusionAI/LLaDA2.0-mini-preview | Instruction-tuned model, ready for downstream applications. | 16B | configs/model_configs/llada2_mini/ | 🤗 Model Card | | inclusionAI/LLaDA2.0-mini | Instruction-tuned model, ready for downstream applications. | 16B | configs/model_configs/llada2_mini/ | 🤗 Model Card | | inclusionAI/LLaDA2.0-flash-preview | Instruction-tuned model, ready for downstream applications. | 100B | configs/model_configs/llada2_flash/ | 🤗 Model Card | | inclusionAI/LLaDA2.0-flash | Instruction-tuned model, ready for downstream applications. | 100B | configs/model_configs/llada2_flash/ | 🤗 Model Card |

TODO

We are actively working on enhancing the project with new features and improvements. Our roadmap for the near future includes:

[☑️] Comprehensive Documentation: A full documentation site is underway, which will feature in-depth tutorials, API references, and best practices.
[☑️] Trainable Parallel Decoding: Integration of support for trainable parallel decoding to enable more advanced use cases.

Stay tuned for these updates!

Getting Started

0. Environment Setup

Option A: Use uv (Recommended)

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

git clone https://github.com/inclusionAI/dFactory.git --recursive
cd dFactory/VeOmni

# Install dependencies
uv sync --extra gpu

# Activate environment
source .venv/bin/activate

# Back to our workdir
cd ..

Option B: Use pip

git clone https://github.com/inclusionAI/dFactory.git --recursive
cd dFactory
pip install -e VeOmni/

1. Download and Merge Model Weights

Our training scripts require model weights in a "merged-expert" format for optimal performance. Before starting, you must download the standard weights and convert them.

1. Download the original model: We provide a helper script to download the weights from the Hugging Face Hub.

# Choose a destination for the original model files
python ./scripts/download_hf_model.py \
--repo_id inclusionAI/LLaDA2.0-mini-preview \
--local_dir /path/to/separate_expert_model

2. Convert to the merged format: Run the following script to create the merged checkpoint required for training.

# Use the path from the previous step as the source
python scripts/moe_convertor.py \
--input-path /path/to/separate_expert_model \
--output-path /path/to/save/merged_model \
--mode merge

The directory /path/to/save/merged_model is what you will use for the training script. For more details, see [MoE Expert Merging and Splitting Utilities](#moe-expert-merging-and-splitting-utilities)

2. Prepare Training Data

Before training, the dataset must be prepared. This tutorial uses the openai/gsm8k dataset and demonstrates how to convert it into the conversational format.

We provide an example script, ./scripts/build_gsm8k_dataset.py, for this purpose. You can adapt this script or write your own to process other datasets.

Running the following command executes the script. It converts the "question" and "answer" fields into a conversational messages field. The processed dataset is then saved to the ./gsm8k_datasets/ directory, split into two separate files: train.jsonl for training and test.jsonl for evaluation.

python ./scripts/build_gsm8k_dataset.py

3. Modify Training Configs

Edit configs/sft/llada2_mini_bd_sft.yaml:

model:
model_path: "/your/model/path"
data:
train_path: "/your/data/path"
train:
output_dir: "/your/output/path"

4. Run Training

With all preparations complete, you can now start the fine-tuning process with a single command:

PYTHONPATH=$(pwd)/VeOmni:$PYTHONPATH sh train.sh tasks/train_llada2_bd.py configs/sft/llada2_mini_bd_sft.yaml

5. Interacting with the Fine-Tuned Model

To interact with your fine-tuned model, you must complete two main steps: converting the checkpoint and copying the modeling file.

Step 1: Convert the Checkpoint

First, you need to convert the checkpoint from the merged format used during training back to the standard Mixture-of-Experts (MoE) structure.

> Important: Finding the Correct Input Path > > The --input-path for the conversion script is the path to the saved Hugging Face checkpoint, not the root output directory you specified during training. The checkpoint is typically located in a subdirectory like: > > TRAIN_OUTPUT_DIR/checkpoints/global_step_XXX/hf_ckpt/

Run the following command to perform the conversion:

python scripts/moe_convertor.py \
--input-path /path/to/merged_model \
--output-path /path/to/save/separate_expert_model \
--mode split

Step 2: Copy the Modeling File

After the conversion, a final manual step is required. You must copy the model's architecture file (e.g., modeling_llada2_moe.py) into the newly created separate_expert_model directory.

This file must come from the directory of your original base model — the one you started with before any merge or training operations. The training and conversion processes only update the model weights, not the architecture file, which is why the original version is needed.

# Example: Copying from the initial, pre-merge model directory
cp /path/to/original_base_model/modeling_llada2_moe.py /path/to/save/separate_expert_model/

With the model converted and the modeling file in place, you are now ready to chat! Follow the instructions on the official model card to start a conversation with your model.

MoE…

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

Solid new repo with moderate traction.