11.20.07

Returning to Computer Science: Hashing It Up

Posted in Programming, Computer Science, Computer Science III at 4:53 pm by Nick

While I am in the midst of Thanksgiving break from school, I find myself not taking a break at all. I’m working, and when not working I’m attempting to get school work done, as I have much to do (as always).

In any event, I decided to jump into hashing, and hash tables more specifically. Hash tables are something I’m very used to because when I started programming with SMAUG it had a hash table built into it (if you have a thousand instances of a mob it is an awful waste of space to allocate the same string a thousand times). So I learned the in’s and out’s required to use it, but like all of my knowledge at the time, I knew how to make it work without knowing -why- it worked. That has changed.

Data Structure Techniques by Thomas A. Standish had a lot of good information on hash tables. It taught me many things, even with my preconceived notions of what a hash table did and what its purpose was. I never even considered adding a key by a method other then division (modulo). His text on the multiplicative method is very interesting, though it seems a little too narrow to be useful in a large number of instances.

I initially had trouble testing my different hashing functions for one very simple reason: my computer. Since my last computer bit the dust (the eMachine lasted 5 years with no problems, I’d say thats pretty good quality), I’ve finally invested the money into a -real- computer (more on that here). Simply put, with the original test cases my independent, worst-case variable, the list structure, performed so fast I had to come up with another solution.

As I was in node mood to figure out and implement the use of a high performance timer (known as QueryPerformanceCounter on Windows), I merely used google and grabbed an easy implementation (doesn’t get easier then one structure!). After implementing that, I increased the number of items hashed by about 100 fold, bring most of the searches up to 15 million items. This gave a sufficient look at the differences between hashing functions and their performance compared to a list.

To compare hash functions, I went to Wikipedia (I think) and grabbed a recommended hash function, which happened to be Jenkin’s One-At-A-Time Hash. I then implemented two more very basic hashes. One was based off the SMAUG version, which hashes a string based on its length. The other is based on the first letter of the string. Oddly enough I named that CaseHash, when really it has nothing to do with the case of the first letter, but the letter itself.

The data was rather astounding. Jenkin’s hash almost always performed as well as or better then all other methods. Sometimes it would take a little longer (we are talking less then hundredths of a second longer though), but almost always was the top performer. Note in the final screen shot how it outperforms the other has functions, or performs as well. When doing a particularly hard lookup it would perform much, much better.

Provided via the link below is the source and a couple of screen shots (some have different output because I was busy tinkering). The first time is provided using the time() function, the second time is using the performance timer (with a precision of 15).

Code and screen shots -> http://ew.xidus.net/download/cs3/hash

Notes:
1) I intended for this hash table implementation to be far more flexible, but after fighting with templates for quite some time I decided to merely call it quits and hard code the strings in.
2) In screen shots 1,2, and 3 it appears to be sorting numbers. These are not numbers, but in fact strings that contain incrementing numbers. This was the product of the program I used to generate the data.
3) The programs I used to generate the data are also included in the link above. They are small and simple.
4) All code listed above is public domain. Use it as you like, or don’t use it. Your choice.

Leave a Comment