Hierarchical Lists

This chapter considers hierarchical lists after the style of the programming language Lisp. Where the lists of a previous chapter were flat or one-level. Hierarchical lists contain both atomic elements and sublists. This allows them to store structured data and they are general enough to be the only structured data type of the programming language Lisp. (Any serious computer scientist is strongly recommended to read The LISP 1.5 Programmer's Manual by J. McCarthy et al, M.I.T. Press 1962.)

Types:
   list e = atom e | nil | cons (list e × list e)

Constant:
   nil

Operations:
   atomic  :list e -> boolean
   null    :list e -> boolean
   head    :list e -> list e
   tail    :list e -> list e
   cons    :list e × list e -> list e

Rules, for all x:e and L:list e:
   atomic(atom(x)) = true
   atomic(nil) = atomic(cons(L,L')) = false

   null(nil)=true
   null(cons(L,L')) = false
   null(atom(x)) = false

   head(cons(L,L')) = L
   head(nil) = head(atom(x)) = error

   tail(cons(L,L')) = L'
   tail(nil) = tail(atom(x)) = error

Shorthand
  [L1, L2, ..., Ln]
    = cons(L1, cons(L2, ...,cons(Ln,nil) ...))

The Hierarchical List Abstract Data Type.

(It is also common to use "curried" versions of functions especially in functional languages.)

There is a convention that lists do not terminate in an atom. This could be enforced by making cons(L,atom(e)) an error.

Most of the operations on flat lists also work on hierarchical lists. Note however that a sublist is considered to be an element so that reversal of a list will not reverse its sublists.

reverse( [1, [2,3], 4] ) = [4, [2, 3], 1]

The map function, as previously defined, applies a function to every element of a list. There is no difficulty if those elements are themselves lists.
map(reverse, [[1,2,3], [4,5,6,7]])
  = [[3,2,1], [7,6,5,4]]

Hierarchical lists allow information with substructure to be stored. Some list operations are only applicable to hierarchical lists. For example, such a list can be flattened.

flatten(nil) = nil  |
flatten(atom(e))   = [e]    |
flatten(cons(L,L')) = append(flatten(L),flatten(L'))

eg. flatten [1, [2, 3], 4] = [1, 2, 3, 4]

Note that this version of flatten potentially takes O(n2) time (exercise: why?). An O(n) version which does not use append is possible using an accumulating parameter.

There are three alternative cases in the definition of the new list type - atomic, empty or non-empty. This means that most functions for manipulating lists have three cases in their definition.

f( atom(e) )    = ... |
f( nil )        = ... |
f( cons(L,L') ) = ...often f(L) and/or f(L')...

General Pattern.

If one of these cases is missing it is probably the sign of an error. The atomic and null cases are usually very simple, often returning a simple function of the atom or a constant value. This leaves only the constructed case. Here the result often involves a recursive call of the function on one or both of the sublists. As an example, consider counting the atoms in a list:
count( atom(e) )    = 1 |       % count atoms in list
count( nil )        = 0 |
count( cons(L,L') ) = count(L)+count(L')

Match this instance of a function against the general pattern given previously.

Implementation of Hierarchical Lists

The most natural way to implement the new list type is with pointers and records although arrays can also be used. The empty list can be represented by the nil pointer but there are two other cases and these must be implemented by a union.

There are two classes of node or cell in these lists; the tag `t' indicates the class of a cell. An atomic cell contains an element. A cons cell points to two sublists; by convention the second is always non-atomic.

If this implementation can be assumed, it is faster to substitute in-line code for the basic operations rather than to call the routines.

Because a list structure can now contain sublists to an arbitrary depth, printing a list becomes more complex.

Note that this routine uses a loop to iterate along the elements of the list. Recursion is used to print a sublist. The recursion can proceed to an arbitrary depth to match the list structure.

Because there is no implicit conversion from the element type to an atomic list, it is necessary to have a routine to construct an atomic cell. This necessitates trivial changes to some of the routines previously defined on flat lists.

Lists and Trees

A (rooted) tree structure consists of nodes (or vertices) and arcs (or edges). A special ancestral node is called the root. An arc points from a parent node to each of its child nodes, if any.

©
L
.
A
l
l
i
s
o
n
root---------------->1
                   . . .
                 .   .   .
               .     .     .
              2      3       4
             . .             .
            .   .            .
           5     6           7

Example Tree.

Trees can be used to represent hierarchical data, such as a business organisation. They share this ability with hierarchical lists and the connection is much stronger; there is a way of representing such a tree by a list.
root------------------>1 nil
                       |
                       |
                       |
                       2-------->3-------->4 nil
                       |         |         |
                       |         |         |
                       |         |         |
                       |         nil       |
                       |                   |
                       5-->6 nil           7 nil

Tree Transformed into a List.

The technique is to link siblings together into a list.

Trees are very important in computing. A common type is the binary tree where each node has at most two descendants; it is the subject of the next chapter.

Testing and Debugging

The techniques of testing and debugging that apply to flat lists are if anything even more useful in the case of hierarchical lists. Pre and postconditions and assertions should be used wherever possible. Flat lists can only have problems of contents and termination. A hierarchical list can also have problems of incorrect structure. For this reason an output routine is very important so as to be able to print list values when things go wrong.

There are just three types of list - atomic, empty and non-empty. This realisation reduces the apparent complexity of lists and makes systematic testing and debugging possible. The key cases are an atom, nil, and a list of a "few" elements. If a case is missing from a routine it probably indicates an error. Depending on the application, the third case may include a small number of examples of different nesting structure.

For example, we might have written the following incorrect function to count the atoms in a list:

badcount(atom(e)) = 1 |
badcount(nil) = 0 |
badcount(cons(L,L')) = 1+badcount(L')

Testing on [1,2,3,4] would give the correct answer 4. Testing on [1,[2,3],4] would produce the incorrect answer 3 rather than the correct answer 4 and a diagnosis should easily follow. Care is needed however because testing on [1,[2],3,4] would not show the error up.

In general, testing can show a program to be wrong but can never prove it to be correct. Nevertheless judicious testing as an adjunct to program verification can add to confidence in a program. Matching test cases to the structure of a data type and to the structure of the routines acting on it leads to fast systematic testing and debugging.

Exercises

  1. An element at the top level of a list is said to be at a depth of 1. If L' is a sublist at depth 1 of L and x is at depth d of L' then x is at depth d+1 of L. Write a function to find the depth of the deepest element of a list.
  2. Write a routine to `reflect' a hierarchical list so that the list and all sublists are reversed as if seen in a mirror.
  3. Write a fast O(n) time routine to flatten a hierarchical list of n cells.
    eg. flatten [1, [[2, 3], 4], 5] = [1, 2, 3, 4, 5].
  4. Use a hierarchical list to represent the structure of (part of) your university or institute. This may contain faculties, schools and departments or perhaps it has other "layers". It will probably be necessary to add extra information fields to the list cells.
  5. Modify the routine that prints a list so that it makes use of indentation to represent sublist structure.
    eg. [1, 2, [3, 4, 5], [6, [7, 8]]] -->
    
        [1, 2,
           [3, 4, 5],
           [6,
              [7, 8]
           ]
        ]
    

    Note that there is no single best answer to this problem; the ideal style is a matter of taste.
  6. Convert the list printing routine, PutList, into iterative, non-recursive form.
© L.Allison