11.30.07

Planting AA Trees

Posted in Programming, Computer Science, Computer Science III at 1:45 am by Nick

While researching Red-Black trees during my original coding homework sometime ago I stumbled upon the Arne Anderson (AA) tree. Looking back I don’t see why I didn’t push off the Red-Black trees and go with AA Trees. Who knows? Here is an analysis of my work with AA Trees:

AA Tree’s are very similar to Red-Black trees. They add one rule in addition to the traditional Red-Black tree rules, and that is that a Red node may only be a Right child. This eliminates a substantial amount of special cases, making this algorithm far easier to implement. Interestingly enough, AA trees perform as well as Red-Black trees. Since the AA algorithm is simpler, it takes less time to compute, but since it makes more rotations then Red-Black trees the performance evens out. Due to the nature of the algorithm (which shapes the tree), AA trees also tend to have better retrieval times then Red-Black trees. For those unfamiliar with the tree (which is probably most people), there are only two operations required for maintaining the tree (compared to the red-black tree’s five). A right rotation occurs when an insertion or deletion creates a red left child; this is known as a Skew. Split’s perform a (conditional) left rotation when a function creates two red links in a row.

My program implements a string to integer map using AA Trees. As an example, it counts word frequencies in a file. The string handling is rather rudimentary (converts to lower case and tosses out punctuation), so some of the words are kind of funky looking, but it works pretty well. I was surprised at how fast this program ran. I didn’t time it exactly, but it scanned a 11.7MB text file full of text in less than 30 seconds.

During my coding I hit several roadblocks. Coding the algorithm went well, but when I was programming my insertion function it all went screwy. I rewrote it three or four times only to realize that my initialization in my constructor was wrong (while cascading assignments for pointers I forgot the NULL at the end). While stepping through the program for an hour using GDB, I was rather clueless as to what was going on. Everything was wrong. None of the pointers were right, none had key data stored, and it crashed randomly. After fixing this I ran into an error with Skew running on my root pointer, which should never happen. Once I got passed these problems all remaining changes were pretty cosmetic.

I would love to package this up using templates and reuse this tree implementation elsewhere. It is pretty applicable, but again, the STL has a very good built-in map class.

About the code:
I did not flesh this code out much. The tree lacks a deletion function, and I show no examples of how to retrieve a word’s count using Lookup, though it is pretty straight forward. I use Lookup primarily as a wrapper for Insert (and for clarity), differentiating the names for their intended use within the code. There is also a large amount of pointer manipulation in this code, which I tried to document, but it may still be confusing to those new to pointers. Interestingly enough, while the algorithm itself contains a significant amount of references to red and black nodes (which this algorithm is based on), there is no reference to such in the code. The code assumes all Right links are red nodes, and instead everything is based off of the “level” of a node, which is the number of left links that must be taken to reach a NULL pointer.

It comes with a few text files as well. Data.txt is a smaller text file I used to test, being about 500KB or so, which resulted in out.txt. The file test.txt was a small test, and bigout.txt is the result of scanning the whole 11.7MB file.

Code -> http://ew.xidus.net/download/cs3/tree

Note: A wonderful summary, analysis, and comparison between some of the most common efficient data types can be found at http://www.upgrade-cepis.org/issues/2004/5/up5-5Mosaic.pdf. I recommend at least a glance at the diagrams. It is obvious that AA trees are outperformed on many levels, but they hold steady and beat data types on various challenges. Considering the simplicity of the algorithm and ease of implementation compared to the others, I’d say its far more useful.

Leave a Comment