mimalloc: A new, high-performance, scalable memory allocator for the modern era
Captured source
source ↗mimalloc: A new, high-performance, scalable memory allocator for the modern era - Microsoft Research
Skip to main content
Research
Publications Code & data People Microsoft Research blog
Artificial intelligence Audio & acoustics Computer vision Graphics & multimedia Human-computer interaction Human language technologies Search & information retrieval
Data platforms and analytics Hardware & devices Programming languages & software engineering Quantum computing Security, privacy & cryptography Systems & networking
Algorithms Mathematics
Ecology & environment Economics Medical, health & genomics Social sciences Technology for emerging markets
Academic programs Events & academic conferences Microsoft Research Forum
Behind the Tech podcast Microsoft Research blog Microsoft Research Forum Microsoft Research podcast
About Microsoft Research Careers & internships People Emeritus program News & awards Microsoft Research newsletter
Africa AI for Science AI Frontiers Asia-Pacific Cambridge Health Futures India Montreal New England New York City Redmond
Applied Sciences Mixed Reality & AI - Cambridge Mixed Reality & AI - Zurich
Register: Research Forum
Microsoft Security Azure Dynamics 365 Microsoft 365 Microsoft Teams Windows 365
Microsoft AI Azure Space Mixed reality Microsoft HoloLens Microsoft Viva Quantum computing Sustainability
Education Automotive Financial services Government Healthcare Manufacturing Retail
Find a partner Become a partner Partner Network Microsoft Marketplace Software companies
Blog Microsoft Advertising Developer Center Documentation Events Licensing Microsoft Learn Microsoft Research
View Sitemap
Return to Blog Home Microsoft Research Blog
At a glance
Today’s critical services and applications are often highly concurrent, using hundreds of threads. They also operate at large memory scales, frequently hundreds of gigabytes, especially when using large language models.
mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS primitives), bounded space overhead, low internal fragmentation, and minimal contention by relying almost exclusively on atomic operations.
mimalloc is available on GitHub (opens in new tab) and has over 12K stars.
mimalloc
At the RiSE group at Microsoft Research (MSR) , we conduct fundamental research into formal methods, programming languages, and software engineering (including emerging agentic systems), with a particular focus on systems that can be provably correct, secure, and performant. The mimalloc memory allocator was initially designed in 2020 as a fast allocator for the state-of-the-art Lean (opens in new tab) and Koka (opens in new tab) programming languages developed at RiSE, both of which use novel compiler-guided reference counting (see Perceus ).
The scalable design of mimalloc has also proved to work exceedingly well for large services at Microsoft. Through close cooperation with product teams, mimalloc has significantly improved the response times in services such as Bing. Today, mimalloc is widely used in large services and applications, both within and outside Microsoft. It serves as the allocator for NoGIL CPython 3.13+, is integrated into Unreal Engine, and is used in games such as Death Stranding . The project is open source on GitHub, with over 12K stars its Rust wrapper alone sees over 100K downloads per day.
mimalloc is effective across a wide range of scenarios; from small-scale applications like Koka or Lean, to large services with memory footprints exceeding 500 GiB and hundreds of threads.
Despite this range, the codebase is still compact, at around 12K lines of C. Reflecting its research origins, mimalloc emphasizes clear internal data structures with strong invariants, making it easier to understand and reason about than many industry allocators. As Fred Brooks already remarked in his famous book The Mythical Man-Month : “ S how me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t need your flowchart; it’ll be obvious.”
As a result, mimalloc has been ported to many platforms—Windows, macOS, Linux, FreeBSD, NetBSD, DragonFly, and various consoles—, and is easy to build and integrate into other projects. For example, the clear data structures enabled Sam Gross and others to adopt mimalloc as the concurrent allocator for NoGIL CPython. The design also makes it relatively straightforward to implement cyclic garbage collection on top of this.
The Fast Path
As with other scalable allocators (such as tcmalloc and jemalloc), a core design principle of mimalloc is that each thread maintains its own thread-local heap, which we call a “theap”. Each theap owns a set of mimalloc “pages,” which are usually 64 KiB. Each mimalloc page contains blocks of a fixed size, organized into size classes to reduce internal fragmentation. By giving each thread its own theap and set of mimalloc pages, memory allocation and deallocation typically proceed without synchronization. Atomic operations are only required when a thread frees a block allocated by another thread.
Moreover, in practice, most allocations are quite small, often less than 1 KiB. For such small allocations, mimalloc provides a fast path where the main allocation function looks like:
void * mi_malloc( size_t size ) { mi_theap_t * const theap = mi_get_thread_local_theap(); if (size > MI_MAX_SMALL_SIZE ) return mi_malloc_generic(theap,size); // slow generic path
const size_t index = (size + sizeof ( void *))/ sizeof ( void *); // round size mi_page_t * const page = theap->small_pages[index];
mi_block_t * const block = page->free; // head of free list if (block == NULL ) return mi_malloc_generic(theap,size); // slow generic path
page->free = block->next; // pop free list page->used++; return block; }
By using thread-local theaps, we need no atomic operations or thread synchronization. We also try to minimize the number of branches. In particular, the thread-local theap is never NULL , and we initialize it with a special empty theap with all empty pages. This way, we do not need a separate check if the theap is NULL . Similarly, the pointers in the small_pages array are never NULL , and we use again special empty pages (with page-…
Excerpt shown — open the source for the full document.
Notability
notability 6.0/10Solid performance allocator release with moderate HN traction.