Mini Blog: Lazy Page Mapping and Why You Should Care
As a robotics engineer working on real-time applications, I'll explain these concepts with examples from my field, but they apply to any software where performance is critical.
We use `malloc`/`new` a lot in our applications for dynamic memory allocation. This gets us a pointer, we store our data, and we move on. But when you're processing large amounts of data, like lidar point clouds at high frequency, or handling camera feeds, every millisecond counts. There's a hidden cost in memory allocation that can be detrimental to your application's performance.
I ran some tests to dig into this, and the results were quite revealing.
So what did I do?
I wrote a simple test: allocate a buffer, write to every byte, then read from every byte. This simulates a common pitfall where developers allocate a new buffer for each piece of incoming data. You wouldn't do this in a well-designed, optimized system, but it's a good way to understand why not to. For the test, I measured two things: page faults and execution time.
What's a page fault? When you access freshly allocated memory for the first time, the OS hasn't actually mapped it to physical RAM yet. That first access triggers a CPU exception that traps into the OS, which then maps the page. This process is fast, but not free.
I tested two scenarios and ran the full suite twice to observe the memory allocator's behavior:
- Fresh allocation: `malloc` a new buffer for each test.
- Reused buffer: allocate one buffer at the start and reuse it.
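For the curious, here's a minimal sketch of what such a harness can look like, assuming Linux, where `getrusage()` reports the process's minor page fault count. The sizes and bookkeeping are illustrative, not my exact test code:

```cpp
// Minimal benchmark sketch (Linux). getrusage() reports the process's
// minor page fault count; sizes and bookkeeping here are illustrative.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/resource.h>

// Minor page faults incurred by this process so far.
static long minor_faults() {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

int main() {
    const size_t size = 1024 * 1024; // 1 MB, one of the tested sizes

    // "Fresh" scenario: a brand-new buffer, so no page is mapped yet.
    char* buf = static_cast<char*>(malloc(size));
    if (!buf) return 1;

    long faults_before = minor_faults();
    auto t0 = std::chrono::steady_clock::now();
    memset(buf, 0xAB, size);                         // write every byte
    auto t1 = std::chrono::steady_clock::now();
    long write_faults = minor_faults() - faults_before;

    char acc = 0;
    for (size_t i = 0; i < size; ++i) acc += buf[i]; // read every byte
    auto t2 = std::chrono::steady_clock::now();
    volatile char sink = acc;                        // keep the reads alive
    (void)sink;

    auto ms = [](auto a, auto b) {
        return std::chrono::duration<double, std::milli>(b - a).count();
    };
    printf("write faults: %ld, write: %.3f ms, read: %.3f ms\n",
           write_faults, ms(t0, t1), ms(t1, t2));

    free(buf);
    return 0;
}
```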
And what did I observe?
| Buffer Size | Test Type | Write Faults | Write Time (ms) | Read Time (ms) | Total Time (ms) | Speedup |
|---|---|---|---|---|---|---|
| **First Run** | | | | | | |
| 4 KB | Fresh | 1 | 0.004 | 0.003 | 0.007 | - |
| 4 KB | Reused | 0 | 0.002 | 0.003 | 0.005 | 1.5x |
| 16 KB | Fresh | 3 | 0.010 | 0.009 | 0.019 | - |
| 16 KB | Reused | 0 | 0.005 | 0.007 | 0.012 | 1.6x |
| 256 KB | Fresh | 65 | 0.170 | 0.151 | 0.321 | - |
| 256 KB | Reused | 0 | 0.078 | 0.144 | 0.222 | 1.4x |
| 1024 KB | Fresh | 257 | 0.782 | 0.493 | 1.275 | - |
| 1024 KB | Reused | 0 | 0.247 | 0.417 | 0.665 | 1.9x |
| **Second Run** | | | | | | |
| All sizes | Both | 0 | ~Same | ~Same | ~Same | ~1.1x |
And what did I learn?
1. The OS is Lazy (And That's a Good Thing)
In my test, page faults happened on the first write because that was the first time the program touched the memory. This "lazy allocation" is good because the OS doesn't waste time mapping memory you might never use. As a side effect, the test also confirms my system's page size of 4 KB: to store 1024 KB of data, we need 256 pages (1024 KB ÷ 4 KB = 256), and I recorded 257 page faults, almost exactly one fault per page.
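If you want to check the math on your own machine, the page size is one POSIX call away; a tiny sketch:

```cpp
// Confirm the page size on your own machine (POSIX).
#include <cstdio>
#include <unistd.h>

int main() {
    long page = sysconf(_SC_PAGESIZE);
    printf("page size: %ld bytes\n", page); // prints 4096 on my system
    return 0;
}
```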
2. Reuse Buffers Whenever You Can
Reusing buffers resulted in zero page faults and consistently better performance. For a 1 MB buffer, you save about 0.61 ms, a nearly 1.9x speedup. That's a significant saving, and when you're processing sensor data in a high-frequency loop, those milliseconds add up quickly.
3. The Memory Allocator Is Quite Smart
Notice the "Second Run" shows zero page faults even for "fresh" allocations? This is because malloc
is smart. When you free()
memory, the allocator doesn't immediately return it to the OS. It keeps those already mapped pages in a pool and hands them back to you on the next malloc()
. This is an optimization you get for free, just by running your program.
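Here's a toy sketch of that effect. It assumes glibc-like behavior, where allocations below the mmap threshold are served from the heap pool, so the second `malloc()` typically hands back the same, already-mapped memory:

```cpp
// Toy demonstration of allocator reuse (glibc-like behavior assumed).
// 64 KB stays below the default mmap threshold, so free() returns the
// block to the allocator's pool rather than to the OS, and the second
// malloc() typically hands back the same, already-mapped pages.
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const size_t size = 64 * 1024;

    char* a = static_cast<char*>(malloc(size));
    memset(a, 1, size); // first touch: pages fault in here
    printf("first  buffer: %p\n", static_cast<void*>(a));
    free(a);            // pages likely stay in the allocator's pool

    char* b = static_cast<char*>(malloc(size));
    memset(b, 2, size); // often fault-free: same pages handed back
    printf("second buffer: %p\n", static_cast<void*>(b));
    free(b);
    return 0;
}
```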
What This Means for Your Applications
- Pre-allocate sensor buffers: If you're constantly allocating memory for lidar scans or camera frames inside a loop, create the buffers once outside the loop and reuse them (see the sketch after this list).
- Batch your allocations: Do your memory-intensive setup during initialization, not in your real-time control loop.
- Profile your hot paths: Page faults might be adding hidden latency. Use a profiler to see if they are impacting your critical code paths; on Linux, `perf stat -e page-faults ./your_app` is a quick first check.
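To make the first point concrete, here's a minimal sketch of the pre-allocate-and-reuse pattern. The `Point` type, `kMaxPoints`, and the driver stand-in are assumptions for illustration, not a real API:

```cpp
// Sketch of the pre-allocation pattern for a high-frequency sensor loop.
// Point, fill_from_driver(), and kMaxPoints are illustrative placeholders.
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };

constexpr std::size_t kMaxPoints = 200000; // assumed worst-case scan size

// Stand-in for reading one scan from the sensor driver.
void fill_from_driver(std::vector<Point>& cloud) {
    for (std::size_t i = 0; i < 1000; ++i)
        cloud.push_back({0.f, 0.f, 0.f});
}

int main() {
    std::vector<Point> cloud;
    cloud.reserve(kMaxPoints); // allocate (and fault in) pages only once

    for (int frame = 0; frame < 100; ++frame) { // stands in for the loop
        cloud.clear();          // keeps capacity: no free/realloc,
                                // so no fresh page faults per frame
        fill_from_driver(cloud);
        // ... process(cloud) ...
    }
    return 0;
}
```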
I hope this helped you learn something new! Happy Coding :)