2026.01.16-bloomberg-memray

Memray: Full-Stack Memory Profiling for Python and Native Extensions

Overview

Memray is Bloomberg's open-source memory profiler that tracks every allocation across the entire call stack: Python code, C/C++ native extensions, and the CPython interpreter itself. Unlike sampling profilers, which estimate hotspots from periodic snapshots, Memray hooks every malloc, realloc, and free call and produces a complete, deterministic record of memory behavior. For teams building data pipelines, ML inference services, or performance-sensitive Python applications, it closes a critical observability gap: it shows what is using memory and exactly which code path allocated it.

Key Features

Technical Architecture

Memray works by intercepting allocator calls inside the profiled process. On Linux it does not depend on LD_PRELOAD; it patches the symbol tables (the PLT/GOT relocation entries) of the loaded shared objects so that calls to malloc, calloc, realloc, posix_memalign, and free are routed through Memray's wrappers. On macOS it applies the equivalent patch to the Mach-O symbol pointers. Each intercepted call records the allocation size and address along with the Python stack that was active at the time. With --native, Memray additionally unwinds the native stack, using libunwind on Linux or a frame-pointer walk on macOS.
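To make the value of a complete allocation record concrete, here is a toy model of what a profiler can compute once every allocation and free has been logged. The Event type, the replay function, and the addresses are illustrative assumptions, not Memray's capture format or internals.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    kind: str        # "alloc" or "free"
    address: int     # pointer returned by, or passed to, the allocator
    size: int = 0    # bytes requested; unused for "free"


def replay(events: Iterable[Event]) -> Tuple[int, Dict[int, int]]:
    """Replay a trace and return (peak_bytes, blocks_still_live)."""
    live: Dict[int, int] = {}
    current = peak = 0
    for ev in events:
        if ev.kind == "alloc":
            live[ev.address] = ev.size
            current += ev.size
            peak = max(peak, current)
        elif ev.kind == "free":
            # A free of an address never seen allocated contributes 0 bytes.
            current -= live.pop(ev.address, 0)
    return peak, live


# Hypothetical trace: two allocations, one free, one more allocation.
trace = [
    Event("alloc", 0x1000, 256),
    Event("alloc", 0x2000, 1024),
    Event("free", 0x1000),
    Event("alloc", 0x3000, 512),
]
peak, live = replay(trace)
```

Because the trace is complete rather than sampled, the peak and the set of unfreed blocks are exact for that run. A sampling profiler can only estimate them.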

Native stack frames are stored as raw instruction pointers; symbol resolution is deliberately deferred to report-generation time rather than done during collection. This keeps profiling overhead down (typically a 10-30% slowdown for Python-only tracking, higher with --native). At analysis time, Memray converts instruction pointers to human-readable symbols using DWARF debug info when available, falling back to ELF symbol tables. It also integrates with debuginfod to fetch debug symbols from remote servers, which helps when distribution packages ship without debuginfo.
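The deferred-resolution idea can be sketched in a few lines. This is a conceptual illustration only: the symbol table, addresses, and function names below are made up, and Memray's real resolver reads DWARF and ELF data rather than a Python list.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start_address, name), sorted by address.
SYMBOLS = [(0x1000, "parse_header"), (0x1400, "decode_rows"), (0x2000, "np_alloc")]
STARTS = [start for start, _ in SYMBOLS]


def resolve(ip: int) -> str:
    """Map an instruction pointer to the last symbol starting at or before it."""
    idx = bisect.bisect_right(STARTS, ip) - 1
    return SYMBOLS[idx][1] if idx >= 0 else "<unknown>"


# Collection time: only integers are stored, as (instruction_pointer, size).
recorded = [(0x1410, 4096), (0x2008, 65536), (0x1450, 4096), (0x0800, 16)]

# Report time: resolve each distinct address once, then aggregate by symbol.
cache = {ip: resolve(ip) for ip, _ in recorded}
bytes_by_symbol = Counter()
for ip, size in recorded:
    bytes_by_symbol[cache[ip]] += size
```

The hot path only appends integers. The comparatively expensive name lookup runs once per distinct address, after the profiled program has finished.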

A key architectural decision: Memray uses the Tracker context manager API (with memray.Tracker("out.bin")) internally even in CLI mode. This makes the same engine available as a Python library for fine-grained, programmatic profiling of specific code regions.
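As a rough sketch of programmatic use, the helper below wraps an arbitrary callable in memray.Tracker and reads the capture back with memray.FileReader. The profile_region name and its parameters are hypothetical; Tracker, its native_traces option, FileReader, and get_high_watermark_allocation_records are taken from Memray's Python API, but check them against the version you have installed.

```python
import os


def profile_region(workload, out_path, native=False):
    """Run workload() under memray.Tracker and return peak-memory records."""
    # Imported lazily because Memray installs only on Linux and macOS.
    import memray

    # Tracker refuses to overwrite an existing capture file.
    if os.path.exists(out_path):
        os.remove(out_path)

    # native_traces=True is the library counterpart of the CLI's --native.
    with memray.Tracker(out_path, native_traces=native):
        workload()

    reader = memray.FileReader(out_path)
    # Allocations that were live at the moment of peak memory usage.
    return list(reader.get_high_watermark_allocation_records(merge_threads=True))
```

Calling profile_region(lambda: [bytearray(1024) for _ in range(1000)], "region.bin") should return records whose size and stack_trace() point at the list comprehension. The same region.bin file can also be passed to memray flamegraph.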

Use Cases

Pros & Cons

Pros: Complete, deterministic trace; native extension visibility unmatched by alternatives; fast enough for real workloads; rich visualization ecosystem; excellent pytest integration for CI.

Cons: Linux/macOS only (no Windows support); native mode overhead is higher (50-100% slowdown); symbol resolution requires debug symbols on the same machine; attaching to a running process carries a crash risk in edge cases; no built-in heap diffing across snapshots.

Alternatives

Memray occupies a unique niche: deterministic full-stack tracing with native extension support, in a package you can pip install without a PhD in tooling.

Who Should Use It

Python teams shipping performance-sensitive applications: ML inference services using NumPy/TensorFlow, data engineering pipelines, web services with C extensions, or anyone profiling memory in CI/CD. Requires Python 3.7+ on Linux (preferred) or macOS. System dependencies include libunwind and liblz4 (automatically handled via binary wheels on most platforms).

Getting Started

pip install memray
memray run -o output.bin my_script.py
memray flamegraph output.bin
# Writes memray-flamegraph-output.html; open it in a browser

For native mode:

memray run --native -o output.bin my_script.py

For CI integration, add pytest-memray and annotate tests:

import pytest

@pytest.mark.limit_memory("50 MB")
def test_allocation_bound():
    # expensive_function and validate stand in for your own code
    result = expensive_function()
    assert validate(result)