B-Heap vs. Binary Heap (2010)
A great reminder that spatial locality can be critical to performance. But that's actually quite well known, and maybe a more important lesson would be: don't take performance for granted, even in the standard libraries of well-established languages or in standard algorithms -- there might be much room for improvement for your use case (if you need it).
This said, the article feels quite arrogant. As other commenters say, the idea behind the optimization is nothing new, just some basic concepts well applied. That's a job well done, but it probably doesn't justify the tone (that I, at least, perceive) of the article.
It's pretty much the same reason why databases use B-trees and not binary trees --- when you have to consider the nonuniform latency of storage access, traditional algorithmic analysis that assumes every memory address is accessible in the same time as any other becomes extremely misleading.
I agree with his rant about "CS departments' algorithmic analysis" failing to cover these real-world hardware issues. That was certainly true of my ~1997 undergrad CS course at the University of St Andrews, which was otherwise exemplary.
The only thing I don't understand is why his data structure for finding the least-recently-used item wasn't just a linked list -- where each time you touch an object you move it to one end of the list, and when you want to discard an object you remove it from the other end. That'd be O(1) and would cause, at most, one page fault per update and none for the add and remove cases (since the two ends would surely be paged in). And it'd be a lot easier than his fancy thing. I guess I missed something.
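For concreteness, here's the kind of list I mean -- my own minimal sketch, not anything from the article or from Varnish:

```c
/* Intrusive doubly-linked LRU list (illustrative sketch only).
 * Touching an object moves it to the MRU end, eviction pops the
 * LRU end; both are O(1) pointer updates. */
#include <stddef.h>

struct lru_node {
    struct lru_node *prev, *next;
};

struct lru_list {
    struct lru_node *head;   /* most recently used */
    struct lru_node *tail;   /* least recently used */
};

static void lru_unlink(struct lru_list *l, struct lru_node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Link a new (or just-unlinked) object at the MRU end. */
static void lru_insert(struct lru_list *l, struct lru_node *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
}

/* Call on every access; n must already be in the list. */
static void lru_touch(struct lru_list *l, struct lru_node *n)
{
    if (l->head == n)
        return;
    lru_unlink(l, n);
    lru_insert(l, n);
}

/* Call when space is needed: detach and return the coldest object. */
static struct lru_node *lru_evict(struct lru_list *l)
{
    struct lru_node *n = l->tail;
    if (n != NULL)
        lru_unlink(l, n);
    return n;
}
```

Only the two ends and the node just touched are ever dereferenced, which is why I'd expect at most one fault per operation.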
It might be worth pointing out that NVMe or Intel Optane would solve his problem (1) without his fancy data structure too. Some might go as far as to say that while his fancy data structure is outdated now, the stuff that CS departments teach has become correct again. Maybe St Andrews had the right idea all along.
1) Or huge-pages or NUMA node pinning, to some extent.
A similar but more sophisticated use of this idea was Intel Research's seminal paper on the FAST tree (index), also from 2010.
If anyone wants to read more, there are (pre this article) papers on cache friendly/oblivious priority queues, e.g.
http://erikdemaine.org/papers/BufferTrees_STOC2002/paper.pdf
The rough gist is that if the only thing you need to be able to do is insert and pop-min, then you can buffer up roughly a page's worth of incomplete work for each page, getting amortized performance that tracks the optimal number of cache transfers at each level of the cache.
Further, the related work of the paper above points you to this citation:
"A. LaMarca and R. E. Ladner. The influence of caches on the performance of heaps. Journal of Experimental Algorithmics, 1, 1996."
which describes the author's "B-Heap" (at least, it describes a b-ary heap where you pick b to match your cache line/page size -- see the sketch below).
I'm guessing the problem isn't that academics don't understand these things so much as that practicing programmers don't always read very much.
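For anyone who doesn't want to dig into the paper, the LaMarca/Ladner idea mentioned above boils down to roughly this (my own toy sketch, not code from the paper or the article): pick the branching factor d so that all of a node's children share one cache line (or one page, in the VM case), so a sift-down touches about one line/page per level. I believe the paper also aligns each child group to a line boundary, which this sketch leaves out.

```c
/* Toy d-ary min-heap sift-down (illustrative only).
 * With 8-byte keys and 64-byte cache lines, D = 8 puts all of a node's
 * children on a single line; for the VM case you'd size D to a page. */
#include <stddef.h>

#define D 8  /* branching factor: cache line size / element size */

/* Children of node i live at indices D*i + 1 .. D*i + D. */
static void sift_down(long *a, size_t n, size_t i)
{
    for (;;) {
        size_t first = D * i + 1;                 /* first child */
        if (first >= n)
            return;                               /* i is a leaf */
        size_t end = (first + D < n) ? first + D : n;
        size_t best = first;
        for (size_t c = first + 1; c < end; c++)  /* one scan, one line */
            if (a[c] < a[best])
                best = c;
        if (a[i] <= a[best])
            return;                               /* heap property holds */
        long t = a[i]; a[i] = a[best]; a[best] = t;
        i = best;
    }
}
```

The win is a constant factor: the inner loop hits memory the hardware has already fetched, which is the same effect the article exploits at page granularity.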
> ..., not to mention wasting enormous amounts of hardware and electricity.
I wonder how legitimate this concern is from an environmental angle. On one hand, a lot of code in the world probably has much worse performance ratios than the one described in the article -- using Python is enough to cause an order-of-magnitude performance loss. On the other hand, data centers are using more renewable energy sources and reducing their carbon output.
Is this only a wasted-money problem? How worthwhile is it to write "green code"? Back-of-the-envelope calculations on how much developer comfort costs would be interesting.
I had a quick question about this part: """ Two details are worth noting:
* Once we leave a VM page through the bottom, it is important for performance that both child nodes live in the same VM page, because we are going to compare them both with their parent.
* Because of this, the tree fails to expand for one generation every time it enters a new VM page in order to use the first two elements in the page productively. """
Is the point just that you want to pack 8 elements into each page, but if the tree expands the first generation of a page, then you end up only being able to pack in 6?
Binary Heap was always my favorite data structure because of how it would improve my A* pathfinding performance. Guess I should have used B-Heap.
Despite the arrogance, I would be more interested if there were a more mathematically oriented explanation and/or a self-contained code sample, like for example the TinyQueue JavaScript implementation (any language is welcome, though).
Do I understand this correctly: to optimise for virtual memory, do the things you'd do to optimise for the CPU cache except on a larger scale?
There is a joke in there about economics but I can't think of it right now.