Standard Factorization of a Word (String)

LA home
Computing
Algorithms
 Glossary
 Strings
  Suffix array
  BWT
  Factors
  Suffix tree
  Factors

also see
  necklaces

A Lyndon word is a word that is lexicographically (alphabetically) strictly smaller than all of its non-trivial cyclic rotations. For example, ‘a’, ‘ab’, ‘aabab’ and ‘abac’ are Lyndon words, and ‘aa’, ‘abab’, ‘ba’ and ‘abaac’ are not.

An arbitrary word, w, can be uniquely factorized, w = w0 w1 ... wm-1, into m factors, m≥1, such that each factor, wi, is a Lyndon word, and w0 ≥ w1 ≥ ... ≥ wm-1. Duval (1983) gave two linear-time algorithms for this task, one making 2|w| comparisons and using O(1)-space, the other making 3|w|/2 comparisons but using O(|w|)-space.

w=
op
The smallest proper suffix (smallest non-empty suffix) of w is the smallest factor, that is the last factor.
If the factorization of w is of the form t...uvk, k≥1, where factor u is greater than factor v, then the smallest cyclic rotation of w is vkt...u. (If the factorization is vk, k≥1, the smallest rotation is just w itself.)
 
-- L.A., August 2009
 
J. P. Duval, Factorizing words over an ordered alphabet, J. of Algorithms, 4, pp.363.381, 1983.
www #ad:

↑ © L. Allison, www.allisons.org/ll/   (or as otherwise indicated).
Created with "vi (Linux)",  charset=iso-8859-1,   fetched Tuesday, 16-Apr-2024 07:23:02 UTC.

Free: Linux, Ubuntu operating-sys, OpenOffice office-suite, The GIMP ~photoshop, Firefox web-browser, FlashBlock flash on/off.