c - cache locality for a binary tree -


if have tree following

struct tree_t {     //data     tree_t *left;     tree_t *right; }; 

and want start allocating memory leaves, there way ensure when traverse tree leaves being cached? if using malloc think leaves scattered around in heap , there cache miss every time try access one.

others gave proper answer, fixed-size pool (https://en.wikipedia.org/wiki/memory_pool), there additional caveats merit more-thorough explanation. still possible or leaves allocated using memory pool have low cache hit rate. blocks of 8*n tree_t's aligned on 64-byte boundaries ideal, though there no benefit above n == 1024.

as side note, @ judy arrays, cache-optimized tree-ish data structure (https://en.wikipedia.org/wiki/judy_array).

it helpful review (briefly) how cache works. caches divided sets of fixed-size lines. typically line size 64 bytes in l1; intel , amd have used 64-byte l1 caches 15 years, , modern arm processors such a15 use 64-byte lines. associativity determines how many lines correspond set. when data brought cache, function hashes address set. data may stored in of lines in set. example, in 2-way set associative cache, given address can stored in 1 of 2 possible cache lines.

maximizing cacheability involves reducing cache line fetches:
1. organizing data cache-line-sized chunks.
2. aligning chunks on cache line boundary.
3. allocating chunks @ addresses map different cache sets.
4. storing data temporal locality (i.e. accessed around same time) in same cache line.
5. reducing size of data, if possible, increase density.

if don't (1), fetches bring in nearby, likely-unuseful data, reducing amount of space data care about. if don't (2), objects straddle cache lines, requiring twice many fetches. if don't (3), cache sets underutilized, similar effect (1). if don't (4), though you're maximizing cache utilization, of data fetched isn't useful when it's fetched, , line gets evicted before data becomes useful. (5) increases number of objects fit in cache packing them less space. example, if can guarantee have fewer 2^32 leaves, store uint32_t index tree_t[] array rather pointers, 100% improvement on 64-bit platforms.

note: malloc() typically returns 8-byte- or 16-byte-aligned blocks, unsuitable; use posix_memalign() on gcc or _aligned_malloc() on msvc.

in case, you're traversing tree, presumably in-order traversal. unless data set fits cache, leaves spread evenly throughout , ergo unlikely have temporal locality nodes in same cache line. in case, best can make sure objects not straddle cache lines allocating cache-line-sized , cache-line-aligned blocks.

i picked blocks of 8*n tree_t's on conservative assumption of 64-byte cache line , 4-byte pointers, results in 8-byte tree_t, , 64 / 8 = 8 tree_t/line. upper bound of n == 1024 due particular x86 cpu (which 1 escapes me @ moment) ignoring bit 18 of address purposes of picking set.


Comments

Popular posts from this blog

toolbar - How to add link to user registration inside toobar in admin joomla 3 custom component -

linux - disk space limitation when creating war file -