Standard Factorization of a Word (String)
A Lyndon word is a word that is lexicographically (alphabetically) strictly smaller than all of its non-trivial cyclic rotations. For example, ‘a’, ‘ab’, ‘aabab’ and ‘abac’ are Lyndon words, and ‘aa’, ‘abab’, ‘ba’ and ‘abaac’ are not.
An arbitrary word, w, can be uniquely factorized, w = w0 w1 ... wm-1, into m factors, m≥1, such that each factor, wi, is a Lyndon word, and w0 ≥ w1 ≥ ... ≥ wm-1. Duval (1983) gave two linear-time algorithms for this task, one making 2|w| comparisons and using O(1)-space, the other making 3|w|/2 comparisons but using O(|w|)-space.
- The smallest proper suffix (smallest non-empty suffix)
of w is the smallest factor, that is the last factor.
- If the factorization of w is of the form t...uvk, k≥1, where factor u is greater than factor v, then the smallest cyclic rotation of w is vkt...u. (If the factorization is vk, k≥1, the smallest rotation is just w itself.)
- -- L.A., August 2009
- J. P. Duval, Factorizing words over an ordered alphabet, J. of Algorithms, 4, pp.363.381, 1983.