What does this repo signal mean?

InclusionAI (Ant Group) published inclusionAI/humming, a Python repository. Repo signals like this one surface tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo inclusionAI/humming · language Python · new repo with modest stars (134). onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Captured source

GitHub · github.com/inclusionAI/humming

inclusionAI/humming repository metadata

published Feb 11, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

inclusionAI/humming

Language: Python

License: Apache-2.0

Stars: 134

Forks: 18

Open issues: 2

Created: 2026-02-11T10:55:17Z

Pushed: 2026-06-10T02:13:51Z

Default branch: main

Fork: no

Archived: no

README:

Humming

Humming is a high-performance, lightweight, and highly flexible JIT (Just-In-Time) compiled GEMM kernel library specifically designed for quantized inference.

Key Features

High Flexibility
  Supports inference for any weight type of 8 bits or fewer across FP16 / BF16 / FP8 / FP4 / INT8 / INT4 activations (provided the activation's dynamic range covers the weight type).
  Supports various quantization strategies.
  Supports various scale types (BF16, FP16, E4M3, E5M2, and UE8M0).
  Supports both Dense GEMM and MoE GEMM.
High Compatibility: supports all NVIDIA GPUs from SM75 (Turing architecture) onward.
High Performance
  Delivers state-of-the-art (SOTA) throughput and efficiency across a wide range of computational scenarios.
Ultra-Lightweight
  Minimal dependencies: requires only PyTorch and NVCC.
  Compact footprint: the package is a little over 100 KB.
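The condition that the activation must "cover" the weight type can be made concrete with a small sketch. It assumes one plausible reading that the README does not state: an integer weight type is usable with a floating-point activation when every integer code is exactly representable in the activation's mantissa. The helper names are hypothetical, "dynamic zero point" types are treated as unsigned codes, and the exponent range is ignored for simplicity. Under these assumptions the sketch reproduces the integer limits listed for the FP8 and FP4 rows of the Support Matrix.

```python
def is_exact(value: int, man_bits: int) -> bool:
    """True if `value` fits in man_bits + 1 significant bits (implicit leading 1)."""
    value = abs(value)
    while value and value % 2 == 0:
        value //= 2  # trailing zero bits are absorbed by the exponent
    return value.bit_length() <= man_bits + 1


def max_int_bits(man_bits: int, signed: bool, limit: int = 8) -> int:
    """Widest integer type (up to `limit` bits) whose codes are all exact."""
    best = 0
    for bits in range(1, limit + 1):
        if signed:
            lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        else:
            lo, hi = 0, (1 << bits) - 1
        if all(is_exact(v, man_bits) for v in range(lo, hi + 1)):
            best = bits
    return best


if __name__ == "__main__":
    # Mantissa widths: e4m3 -> 3, e5m2 -> 2, e2m1 -> 1
    for name, man in [("FP8 (e4m3)", 3), ("FP8 (e5m2)", 2), ("FP4 (e2m1)", 1)]:
        print(name, max_int_bits(man, signed=True), max_int_bits(man, signed=False))
```

This is an illustration of why narrower activation formats admit fewer weight types, not a description of how Humming decides support internally.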

Support Matrix

| Activation Type | Supported Devices | Supported Weight Types |
| :--- | :--- | :--- |
| FP16 (e5m10) | SM75+ | • Symmetric INT1-8 • INT1-8 with dynamic zero point • Arbitrary signed FP (kBits ≤ 8, kExp ≤ 5) |
| BF16 (e8m7) | SM80+ | • Symmetric INT1-8 • INT1-8 with dynamic zero point • Arbitrary signed FP (kBits ≤ 8) |
| FP8 (e4m3) | SM89+ | • Symmetric INT1-5 • INT1-4 with dynamic zero point • Arbitrary signed FP (kExp ≤ 4, kMan ≤ 3) |
| FP8 (e5m2) | SM89+ | • Symmetric INT1-4 • INT1-3 with dynamic zero point • Arbitrary signed FP (kExp ≤ 5, kMan ≤ 2) |
| FP4 (e2m1) | SM120+ | • Symmetric INT1-3 • INT1-2 with dynamic zero point • Arbitrary signed FP (kExp ≤ 2, kMan ≤ 1) |
| INT8 | SM75+ | • Symmetric INT1-8 • INT1-7 with dynamic zero point |
| INT4 | SM80+ | • Symmetric INT1-4 • INT1-3 with dynamic zero point |
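The device column of the matrix can be turned into a quick lookup. The sketch below transcribes the minimum compute capability for each activation type and filters by a `(major, minor)` tuple. The dictionary keys and the function name are hypothetical labels chosen for illustration; they are not Humming identifiers.

```python
# Minimum compute capability per activation type, transcribed from the
# Support Matrix. Keys are illustrative labels, not Humming identifiers.
MIN_SM = {
    "fp16": (7, 5),
    "bf16": (8, 0),
    "fp8_e4m3": (8, 9),
    "fp8_e5m2": (8, 9),
    "fp4_e2m1": (12, 0),
    "int8": (7, 5),
    "int4": (8, 0),
}


def supported_activations(capability):
    """Return the activation types whose minimum SM is met by `capability`."""
    return [name for name, min_sm in MIN_SM.items() if tuple(capability) >= min_sm]


if __name__ == "__main__":
    print(supported_activations((7, 5)))  # a Turing-class GPU
    print(supported_activations((8, 9)))  # an Ada-class GPU
```

On a machine with PyTorch and a CUDA device, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed in directly.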

Getting Started

Installation

pip install git+https://github.com/inclusionAI/humming.git
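Because the library is JIT-compiled and lists PyTorch and NVCC as its only dependencies, it may be worth confirming both are visible before installing. The following is a minimal sketch using only the Python standard library. It assumes `nvcc` is discovered through `PATH`; the README does not say how Humming locates the compiler, so treat that as an assumption. The function name is hypothetical.

```python
import importlib.util
import shutil


def check_build_prerequisites():
    """Report whether PyTorch is importable and nvcc is on PATH."""
    return {
        # find_spec returns None when a top-level package is not installed
        "torch": importlib.util.find_spec("torch") is not None,
        # which returns None when no executable named nvcc is on PATH
        "nvcc": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, found in check_build_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```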

Usage Example

import torch
from humming.layer import HummingLayer

layer = HummingLayer(
    shape_n=8192,
    shape_k=8192,
    weight_config={"dtype": "int6"},
    torch_dtype=torch.float16,
).cuda()

weight = torch.randn((8192, 8192), dtype=torch.float16, device="cuda:0")
inputs = torch.randn((128, 8192), dtype=torch.float16, device="cuda:0")

# Load unquantized weight and quantize to layer quantization format
layer.load_from_unquantized(weight)
# Transform weight to humming format and prepare default kernels
layer.transform()

# Run quantized GEMM (tuning_config is optional, auto-selected by default)
output = layer(inputs)

print("Quantized GEMM Output:")
print(output)
print("\nReference Output:")
print(inputs.matmul(weight.T))
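The two printed tensors in the example will not match exactly, because the weight is stored as 6-bit integers. The sketch below shows a generic per-tensor symmetric quantization round trip in plain Python to illustrate where that error comes from. It is not Humming's implementation: the captured README does not describe the scheme used by `load_from_unquantized` (group size, scale type, rounding), and the function names here are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = (1 << (bits - 1)) - 1  # 31 for 6-bit codes
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_symmetric(weights, bits=6)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("codes:", codes)
    print("worst-case error:", worst, "(bounded by scale / 2 =", scale / 2, ")")
```

With this scheme each restored value is off by at most half a quantization step, which is why a quantized GEMM output should track the reference closely without reproducing it bit for bit.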

Acknowledgement

This project is heavily inspired by:

DeepGEMM
Marlin Kernel and vLLM Marlin Kernel
lmdeploy GEMM kernel
CUTLASS

Notability

notability 3.0/10

New repo with modest stars