feat(server): NUMA awareness by tungtose · Pull Request #2412 · apache/iggy

tungtose · 2025-11-26T07:52:33Z

Flow

This PR addresses #2387.

Benchmark

I tried to launch an EC2 instance with multiple NUMA nodes, but AWS wouldn't allow it. They require me to submit a ticket and wait for approval.

Here is the bench on my local machine: Intel(R) Core(TM) Ultra 9 285H

Bench cmd: target/release/iggy-bench -m 200 -r 800MB pp tcp

Before:

2025-11-26T07:22:39.247127Z INFO bench_report::prints: Benchmark: Pinned Producer, 8 producers, 8 streams, 1 topic per stream, 1 partitions per topic, 8000000 messages, 1000 messages per batch, 8000 message batches, 100 bytes per message, 800MB of data processed
2025-11-26T07:22:39.247182Z INFO bench_report::prints: Producers Results: Total throughput: 790.38 MB/s, 7903786 messages/s, average throughput per Producer: 98.80 MB/s, p50 latency: 0.80 ms, p90 latency: 1.30 ms, p95 latency: 1.45 ms, p99 latency: 1.80 ms, p999 latency: 2.49 ms, p9999 latency: 15.00 ms, average latency: 0.84 ms, median latency: 0.80 ms, min: 0.26 ms, max: 8.52 ms, std dev: 0.06 ms, total time: 1.11 s

After:

Benchmark: Pinned Producer, 8 producers, 8 streams, 1 topic per stream, 1 partitions per topic, 8000000 messages, 1000 messages per batch, 8000 message batches, 100 bytes per message, 800MB of data processed
2025-11-26T07:29:38.128852Z INFO bench_report::prints: Producers Results: Total throughput: 798.21 MB/s, 7982140 messages/s, average throughput per Producer: 99.78 MB/s, p50 latency: 0.46 ms, p90 latency: 0.77 ms, p95 latency: 0.92 ms, p99 latency: 1.83 ms, p999 latency: 3.58 ms, p9999 latency: 4.32 ms, average latency: 0.51 ms, median latency: 0.46 ms, min: 0.17 ms, max: 1.78 ms, std dev: 0.04 ms, total time: 1.00 s
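
To make the before/after comparison easier to read, here is a small sketch that computes the relative change for a few of the reported metrics. The helper `percent_change` is a name made up for this illustration, and the numbers are copied from the two runs above. Each configuration was run once on a local machine, so small differences may well be noise.

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the metric went down.
/// (Hypothetical helper, not part of iggy-bench.)
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // (metric, before, after), copied from the two benchmark runs above.
    let metrics = [
        ("throughput MB/s", 790.38, 798.21),
        ("p50 latency ms", 0.80, 0.46),
        ("p99 latency ms", 1.80, 1.83),
        ("p999 latency ms", 2.49, 3.58),
        ("p9999 latency ms", 15.00, 4.32),
    ];
    for (name, before, after) in metrics {
        println!("{name}: {:+.1}%", percent_change(before, after));
    }
}
```

Read this way, throughput is roughly unchanged and median latency drops noticeably, while p99 and p999 are slightly worse in this particular run.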

hubcio · 2025-11-26T09:03:15Z

I have a Ryzen 9 9950X3D, and I see these logs (using the default config):

2025-11-26T08:16:44.969801Z  INFO main iggy_server: Using 0 shards with 0 NUMA node, 0 Core per node, and avoid hyperthread true
2025-11-26T08:16:44.969809Z  INFO main iggy_server: Using mimalloc allocator
2025-11-26T08:16:44.969812Z  INFO main iggy_server: Starting 16 shard(s)

Here's my NUMA topology (I just printed the NumaTopology, built on the hwlocality crate, in ShardAllocator::new()):

{
    topology: Topology {
        is_abi_compatible: true,
        build_flags: BuildFlags(
            0x0,
        ),
        is_this_system: true,
        feature_support: FeatureSupport {
            discovery: Some(
                DiscoverySupport {
                    pu_count: true,
                    numa_count: true,
                    numa_memory: true,
                },
            ),
            cpubind: Some(
                CpuBindingSupport {
                    set_current_process: true,
                    get_current_process: true,
                    set_process: true,
                    get_process: true,
                    set_current_thread: true,
                    get_current_thread: true,
                    set_thread: true,
                    get_thread: true,
                    get_current_process_last_cpu_location: true,
                    get_process_last_cpu_location: true,
                    get_current_thread_last_cpu_location: true,
                },
            ),
            membind: Some(
                MemoryBindingSupport {
                    set_current_process: false,
                    get_current_process: false,
                    set_process: false,
                    get_process: false,
                    set_current_thread: true,
                    get_current_thread: true,
                    set_area: true,
                    get_area: true,
                    get_area_memory_location: true,
                    allocate_bound: true,
                    first_touch_policy: true,
                    bind_policy: true,
                    interleave_policy: true,
                    next_touch_policy: false,
                    migrate_flag: true,
                },
            ),
        },
        type_filter: {
            "Bridge": KeepNone,
            "Core": KeepAll,
            "Group": KeepStructure,
            "L1Cache": KeepAll,
            "L1ICache": KeepNone,
            "L2Cache": KeepAll,
            "L2ICache": KeepNone,
            "L3Cache": KeepAll,
            "L3ICache": KeepNone,
            "L4Cache": KeepAll,
            "L5Cache": KeepAll,
            "Machine": KeepAll,
            "Misc": KeepNone,
            "NUMANode": KeepAll,
            "OSDevice": KeepNone,
            "PCIDevice": KeepNone,
            "PU": KeepAll,
            "Package": KeepAll,
        },
        objects_per_depth: [
            (
                "0",
                [
                    Machine with CpuSet(0-31) (
                      total=96336932KB,
                      DMIProductName=MS-7E59,
                      DMIProductVersion=2.0,
                      DMIBoardVendor="Micro-Star International Co., Ltd.",
                      DMIBoardName="MAG X870E TOMAHAWK WIFI (MS-7E59)",
                      DMIBoardVersion=2.0,
                      DMIBoardAssetTag="To be filled by O.E.M.",
                      DMIChassisVendor="Micro-Star International Co., Ltd.",
                      DMIChassisType=3,
                      DMIChassisVersion=2.0,
                      DMIChassisAssetTag="To be filled by O.E.M.",
                      DMIBIOSVendor="American Megatrends International, LLC.",
                      DMIBIOSVersion=2.A91,
                      DMIBIOSDate=09/09/2025,
                      DMISysVendor="Micro-Star International Co., Ltd.",
                      Backend=Linux,
                      LinuxCgroup=/user.slice/user-1000.slice/user@1000.service/app.slice/app-Alacritty@a6178dd85eea4b1e910e4663bd122a64.service,
                      OSName=Linux,
                      OSRelease=6.17.9-2-cachyos,
                      OSVersion="#1 SMP PREEMPT_DYNAMIC Tue, 25 Nov 2025 01:13:51 +0000",
                      HostName=atlas,
                      Architecture=x86_64,
                      hwlocVersion=2.12.2,
                      ProcessName=iggy-server
                    ),
                ],
            ),
            (
                "1",
                [
                    Package with CpuSet(0-31) (
                      total=96336932KB,
                      CPUVendor=AuthenticAMD,
                      CPUFamilyNumber=26,
                      CPUModelNumber=68,
                      CPUModel="AMD Ryzen 9 9950X3D 16-Core Processor          ",
                      CPUStepping=0
                    ),
                ],
            ),
            (
                "2",
                [
                    Die with CpuSet(0-7,16-23),
                    Die with CpuSet(8-15,24-31),
                ],
            ),
            (
                "3",
                [
                    L3Cache with CpuSet(0-7,16-23) (
                      size=98304KB,
                      linesize=64,
                      ways=16,
                      Inclusive=0
                    ),
                    L3Cache with CpuSet(8-15,24-31) (
                      size=32768KB,
                      linesize=64,
                      ways=16,
                      Inclusive=0
                    ),
                ],
            ),
            (
                "4",
                [
                    L2Cache with CpuSet(0,16) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(1,17) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(2,18) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(3,19) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(4,20) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(5,21) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(6,22) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(7,23) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(8,24) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(9,25) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(10,26) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(11,27) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(12,28) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(13,29) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(14,30) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                    L2Cache with CpuSet(15,31) (
                      size=1024KB,
                      linesize=64,
                      ways=16,
                      Inclusive=1
                    ),
                ],
            ),
            (
                "5",
                [
                    L1dCache with CpuSet(0,16) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(1,17) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(2,18) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(3,19) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(4,20) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(5,21) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(6,22) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(7,23) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(8,24) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(9,25) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(10,26) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(11,27) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(12,28) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(13,29) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(14,30) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                    L1dCache with CpuSet(15,31) (
                      size=48KB,
                      linesize=64,
                      ways=12,
                      Inclusive=0
                    ),
                ],
            ),
            (
                "6",
                [
                    Core with CpuSet(0,16),
                    Core with CpuSet(1,17),
                    Core with CpuSet(2,18),
                    Core with CpuSet(3,19),
                    Core with CpuSet(4,20),
                    Core with CpuSet(5,21),
                    Core with CpuSet(6,22),
                    Core with CpuSet(7,23),
                    Core with CpuSet(8,24),
                    Core with CpuSet(9,25),
                    Core with CpuSet(10,26),
                    Core with CpuSet(11,27),
                    Core with CpuSet(12,28),
                    Core with CpuSet(13,29),
                    Core with CpuSet(14,30),
                    Core with CpuSet(15,31),
                ],
            ),
            (
                "7",
                [
                    PU with CpuSet(0),
                    PU with CpuSet(16),
                    PU with CpuSet(1),
                    PU with CpuSet(17),
                    PU with CpuSet(2),
                    PU with CpuSet(18),
                    PU with CpuSet(3),
                    PU with CpuSet(19),
                    PU with CpuSet(4),
                    PU with CpuSet(20),
                    PU with CpuSet(5),
                    PU with CpuSet(21),
                    PU with CpuSet(6),
                    PU with CpuSet(22),
                    PU with CpuSet(7),
                    PU with CpuSet(23),
                    PU with CpuSet(8),
                    PU with CpuSet(24),
                    PU with CpuSet(9),
                    PU with CpuSet(25),
                    PU with CpuSet(10),
                    PU with CpuSet(26),
                    PU with CpuSet(11),
                    PU with CpuSet(27),
                    PU with CpuSet(12),
                    PU with CpuSet(28),
                    PU with CpuSet(13),
                    PU with CpuSet(29),
                    PU with CpuSet(14),
                    PU with CpuSet(30),
                    PU with CpuSet(15),
                    PU with CpuSet(31),
                ],
            ),
            (
                "<NUMANode>",
                [
                    NUMANode with CpuSet(0-31) (
                      local=96336932KB,
                      total=96336932KB
                    ),
                ],
            ),
        ],
        memory_parents_depth: Ok(
            PositiveInt(1),
        ),
        cpuset: CpuSet(0-31),
        complete_cpuset: CpuSet(0-31),
        allowed_cpuset: CpuSet(0-31),
        nodeset: NodeSet(0),
        complete_nodeset: NodeSet(0),
        allowed_nodeset: NodeSet(0),
        distances: Ok(
            [],
        ),
    },
    node_count: 1,
    physical_cores_per_node: [
        16,
    ],
    logical_cores_per_node: [
        32,
    ],
}
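
For reading the dump above: each Core object lists two PUs (n and n + 16), which is why 16 physical cores show up as 32 logical ones. A minimal sketch of what "avoid hyperthread" could mean in terms of pinning is below. It uses only the standard library, and `one_pu_per_core` is a hypothetical helper for illustration, not the PR's actual allocation code.

```rust
/// Pick one processing unit (PU) per physical core so that no two shards
/// share a core through SMT. Here the lowest PU id of each core is chosen.
/// `cores` holds the PU ids of each core, e.g. [0, 16] for "Core with CpuSet(0,16)".
/// (Hypothetical helper, assumed for illustration only.)
fn one_pu_per_core(cores: &[Vec<usize>]) -> Vec<usize> {
    cores
        .iter()
        .filter_map(|pus| pus.iter().min().copied())
        .collect()
}

fn main() {
    // Mirrors the dump above: core n owns PUs n and n + 16.
    let cores: Vec<Vec<usize>> = (0..16).map(|n| vec![n, n + 16]).collect();
    let pinned = one_pu_per_core(&cores);
    println!("{} shards pinned to PUs {:?}", pinned.len(), pinned);
}
```

With the topology above this yields 16 PUs, which matches the "Starting 16 shard(s)" line in the log.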

hubcio · 2025-11-26T09:11:32Z

But if I run this program on my PC:

use hwlocality::object::types::ObjectType;
use hwlocality::topology::Topology;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let topo = Topology::new()?;
    let numa_count = topo.objects_with_type(ObjectType::NUMANode).len();

    println!("NUMA nodes: {numa_count}");
    Ok(())
}

I get: NUMA nodes: 1

tungtose · 2025-11-26T09:20:24Z

Thanks @hubcio. It is indeed one node. The print logic was wrong: the log means it is using node number 0, not 0 nodes. Let me update that.
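
One way to avoid this kind of ambiguity is to log counts and node ids as separate, labeled fields. The sketch below is only an illustration of that idea: the struct `AllocationSummary`, its field names, and the message wording are all assumptions, not the code that ended up in the PR.

```rust
/// Hypothetical summary of a shard allocation; names are illustrative only.
struct AllocationSummary {
    shard_count: usize,
    /// Ids of the NUMA nodes in use (ids, not a count).
    node_ids: Vec<usize>,
    cores_per_node: usize,
    avoid_hyperthread: bool,
}

/// Build a log line that reports the node count and the node ids separately,
/// so "node 0" cannot be misread as "0 nodes".
fn describe(s: &AllocationSummary) -> String {
    format!(
        "Using {} shard(s) on {} NUMA node(s) (ids: {:?}), {} core(s) per node, avoid hyperthread: {}",
        s.shard_count,
        s.node_ids.len(),
        s.node_ids,
        s.cores_per_node,
        s.avoid_hyperthread
    )
}

fn main() {
    // Values assumed from the single-node topology dump earlier in the thread.
    let summary = AllocationSummary {
        shard_count: 16,
        node_ids: vec![0],
        cores_per_node: 16,
        avoid_hyperthread: true,
    };
    println!("{}", describe(&summary));
}
```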

hubcio · 2025-12-12T10:21:20Z

I checked your PR on AWS; unfortunately I'm not able to book EC2 instances with more than one NUMA node. However, on my PC it works fine.

I'm gonna need you to fix one more thing: when building for musl targets, please use vendored hwloc because we don't want to link with glibc.

[target.'cfg(target_env = "musl")'.dependencies]
hwlocality = { version = "1.0.0-alpha.11", features = ["vendored"] }

[target.'cfg(not(target_env = "musl"))'.dependencies]
hwlocality = { version = "1.0.0-alpha.11" }
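
The two manifest sections above are selected by the same `target_env` predicate that Rust code can query at compile time. As a rough sketch, the function below reports which of the two dependency variants applies to the target being compiled. The function name `hwloc_linking` and the returned labels are made up for illustration.

```rust
/// Which hwlocality dependency variant the manifest above would select
/// for the current compilation target.
/// (Hypothetical helper; the labels are illustrative.)
fn hwloc_linking() -> &'static str {
    // `cfg!` evaluates the same predicate Cargo uses in `[target.'cfg(...)']`.
    if cfg!(target_env = "musl") {
        "vendored"
    } else {
        "system"
    }
}

fn main() {
    println!("hwloc variant for this target: {}", hwloc_linking());
}
```

Building with a musl target such as `--target x86_64-unknown-linux-musl` would take the first branch.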

Other than that LGTM.

tungtose · 2025-12-14T10:21:17Z

Thanks @hubcio, I have updated it, along with the CI fix

tungtose added 12 commits November 26, 2025 13:02

feat(server): numa awareness

bdd812e

fix test

2b2bcfa

fix bench

f9b4b68

remove unwrap

4cbdedc

update config docs

d53f970

remove redundant code

81d53fc

add tests

2b97e2c

more logs

1d800d3

fix git hook

458c027

rebase master

c253485

fix licenses list

1968183

try fix ci

76199f4

tungtose force-pushed the numa-awareness branch from 47e5efe to 76199f4 Compare November 26, 2025 08:44

install missing deps

b75660a

tungtose and others added 8 commits November 26, 2025 17:40

fix missing deps bdd test

fafa9ff

resolve wrong log logic

27d5580

Merge branch 'master' into numa-awareness

c619197

resolve confict

c0573e3

Merge branch 'master' into numa-awareness

ecb9740

resolve merge conflict

fdabb6f

no op

daa9b1c

Merge branch 'master' into numa-awareness

66b7691

tungtose and others added 4 commits December 12, 2025 17:32

Merge branch 'master' into numa-awareness

f02faf6

Merge branch 'master' into numa-awareness

950f85a

resolve comments

6f6862a

fix ci

4bfd08d

tungtose added 5 commits December 14, 2025 16:01

don't run apt-get on MacOS runner

17978bc

don't run apt-get on MacOS runner

072b0c9

ci random failed?

065e2c3

run apt-install on linux only

f6987b8

try install hwlock on macOS

f58566e

Merge branch 'master' into numa-awareness

28677a2

hubcio approved these changes Dec 15, 2025


spetz approved these changes Dec 15, 2025


spetz merged commit a5d5694 into apache:master Dec 15, 2025
53 checks passed

hubcio mentioned this pull request Dec 15, 2025

Make iggy-server aware of NUMA #2387

Closed

spetz merged 31 commits into apache:master from tungtose:numa-awareness
