PATRICIA
PATRICIA – Practical Algorithm to Retrieve Information Coded in Alphanumeric, D.R.Morrison (1968).
A PATRICIA tree is related to a Trie. The problem with Tries is that when the set of keys is sparse, i.e. when the actual keys form a small subset of the set of potential keys, as is very often the case, many (most) of the internal nodes in the Trie have only one descendant. This causes the Trie to have a high space-complexity.
A Trie uses every part (bit, character, ...) of the key, in turn, to determine which subtree to select. A PATRICIA tree instead nominates (by storing its position in the node) which element of the key will next be used to determine the branching. This removes the need for any nodes with just one descendant:
PATRICIA's index differs from Fredkin's Binary Trie structure in that the index records only true [i.e. genuine] branches; where a phrase has only one proper right extension, it is not recorded in the index. This fact reduces the number of index rows to only twice the number of starts, amd makes it independent of the length of the stored phrases.
— Morrison 1968 pp520.
It is easiest to create a PATRICIA tree for keys (strings) over an alphabet of size two: {a,b}, or {0,1}. However, strings over an alphabet of more than two elements, e.g. ascii, can be treated as strings over an alphabet of two by taking the bits within each character of the larger alphabet.
The following example shows the growth of a PATRICIA tree under a sequence of insertions:
empty -- initial state
12345 -- number character positions insert ababb -- the key
----> ababb
- insert ababa;
- search ends at ababb~=ababa;
- 1st difference is at position 5, so...
----> [5] -- i.e. test position #5 . . a. . b . . ababa ababb
- insert ba;
- has no position #5;
- can skip key positions but must test in order, so...
--------> [1] . . . . . . [5] ba . . . . . . ababa ababb
- insert aaabba;
- search ends at ababb~=aaabba;
- can skip key positions but must test in order, so...
--------> [1] . . . . . . [2] ba . . . . . . aaabba [5] . . . . . . ababa ababb
- insert ab;
- ab is also a prefix of ababa and ababb;
- must have ability to terminate at an intermediate node, as with Tries.
-------> [1] . . . . . . [2] ba . . . . . . aaabba [3]--->ab . . . [5] . . . . . . ababa ababb
Dealing with a key, such as `ab', which is the prefix of another key, e.g. `ababa', can be handled in various ways. An actual, or notional, terminating character, often denoted `$' (or `\0' in C and its relatives) can be considered to be a third character in the alphabet, but only allowed to occur at the ends of keys. This slight complication does not arise in the special case that all keys have the same length.
Notes
- D. R. Morrison.
PATRICIA -
Practical Algorithm to Retrieve Information Coded in Alphanumeric.
Jrnl. of the ACM, 15(4) pp514-534, Oct 1968.
Originally presented PATRICIA as an index for searching in marked-up text. - G. H. Gonnet. Handbook of Algorithms and Data Structures.
Addison-Wesley, International Computer Series, pp 109, 1984.
Contains a nice straightforward coding of the PATRICIA algorithms, and of many others. This book is a great resource. - R. Sedgewick.
Algorithms in C. Addison-Wesley, edn 1, pp253, 1990.
Stores elements (or pointers to elements) within internal `fork' nodes. - See Suffix Trees for string searching etc..