2015-02-27

Old String Set/Map Data Structure Benchmark (2012)

Just want to share some table from initial chapter of my thesis (early 2012), it's about modification (added lots of compression) of HAT-Trie for DNS suffix blocking. These tables are from chapter IV, since it's the only exciting part about it .__.)/||. That time I didn't know about Cedar yet (of course it's 2013 XD). This is the list of benchmarked data structure:


The data structure name that marked with "*" sign are those who can be set as nested per subdomain (that is should be a map/associative array). The strings tested are C++'s std::string, Qt's QString, csubdom (compressed subdomain), strcsubdom (compressed subdomain, stored in std::string), clabels (compressed full domain), strclables (compressed full domain name). Compressed in this term are packed characters (from 8-bit to 5/6-bit so it would use less memory). This benchmark performed on Ubuntu 64-bit Linux 3.2, GCC 4.6.3, AMD Phenom X9850, 8GB RAM and non-SSD disk, compile flag: g++ -c -m64 -pipe -O2 -Wall -W.


Notes about that table header "insert" is a benchmark about inserting 2.1 million blacklisted domain names, after it's completed, the data structure erased and the insert operation repeated until 30 seconds passes. The "misses" benchmark about checking if 68.4k domains that doesn't exists on the blacklist, the operation repeated until 2 seconds passes. The "exists" benchmark is about rechecking blacklisted domain names in sorted order, repeated until 8 seconds passes. The "random" benchmark is about checking random blacklisted items, repeated until 20 seconds passes. The value there are number of milliseconds required per domain name. Last column on the table is average bytes required to store one domain name.