## On sufficient statistics of least-squares superposition of vector sets

#### Arun S. Konagurthu, Parthan Kasarapu, Lloyd Allison, James H. Collier, and Arthur M. Lesk

#### Journal of Computational Biology, 22(6), pp. 487-497, doi:10.1089/cmb.2014.0154, May 2015

**Abstract**:
The problem of superposition of two corresponding vector sets by
minimizing their sum-of-squares error under orthogonal transformation
is a fundamental task in many areas of science,
notably structural molecular biology.
This problem can be solved exactly using an algorithm whose time
complexity grows linearly with the number of correspondences.
This efficient solution has facilitated the widespread use of
the superposition task, particularly in studies involving macromolecular
structures.
This article formally derives a set of sufficient statistics for
the least-squares superposition problem.
These statistics are additive.
This permits a highly efficient (constant-time) computation of
superpositions (and sufficient statistics) of vector sets that
are composed from their constituent vector sets under addition or
deletion operations, where the sufficient statistics of the constituent
sets are already known (that is, the constituent vector sets have
been previously superposed).
This results in a drastic improvement in the run time of methods
that commonly superpose vector sets under addition or deletion operations,
where previously these operations were carried out ab initio
(ignoring the sufficient statistics).
We experimentally demonstrate the improvement our work offers in the
context of protein structural alignment programs that assemble a
reliable structural alignment from well-fitting (substructural)
fragment pairs.
A C++ library for this task is available online under an
open-source license.
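
To make the additivity property concrete, the following is a minimal Python sketch, not the paper's C++ library or its exact parameterization. It assumes one plausible choice of additive summary for corresponding 3D vector pairs: the count, the coordinate sums, the sum of squared norms, and the sum of outer products. The class name `SuperpositionStats` and its methods are hypothetical. Merging or removing a constituent set costs a fixed number of arithmetic operations regardless of how many vectors it contains. The final step of extracting the optimal rotation from the centered cross-covariance (for example by an SVD-based or quaternion-based solver) is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def _zeros3() -> List[float]:
    return [0.0, 0.0, 0.0]


def _zeros3x3() -> List[List[float]]:
    return [[0.0, 0.0, 0.0] for _ in range(3)]


@dataclass
class SuperpositionStats:
    """Hypothetical additive summary of n corresponding 3D pairs (x_i, y_i)."""
    n: int = 0
    sum_x: List[float] = field(default_factory=_zeros3)   # sum of x_i
    sum_y: List[float] = field(default_factory=_zeros3)   # sum of y_i
    sum_sq: float = 0.0                                   # sum of |x_i|^2 + |y_i|^2
    sum_xy: List[List[float]] = field(default_factory=_zeros3x3)  # sum of x_i y_i^T

    @classmethod
    def from_pairs(cls, xs: Sequence[Vec], ys: Sequence[Vec]) -> "SuperpositionStats":
        """Accumulate the statistics from scratch: linear in the number of pairs."""
        s = cls()
        for x, y in zip(xs, ys):
            s.n += 1
            for a in range(3):
                s.sum_x[a] += x[a]
                s.sum_y[a] += y[a]
                s.sum_sq += x[a] * x[a] + y[a] * y[a]
                for b in range(3):
                    s.sum_xy[a][b] += x[a] * y[b]
        return s

    def _combine(self, other: "SuperpositionStats", sign: int) -> "SuperpositionStats":
        # Fixed amount of work: independent of how many pairs either set holds.
        return SuperpositionStats(
            n=self.n + sign * other.n,
            sum_x=[p + sign * q for p, q in zip(self.sum_x, other.sum_x)],
            sum_y=[p + sign * q for p, q in zip(self.sum_y, other.sum_y)],
            sum_sq=self.sum_sq + sign * other.sum_sq,
            sum_xy=[[p + sign * q for p, q in zip(r, s)]
                    for r, s in zip(self.sum_xy, other.sum_xy)],
        )

    def __add__(self, other: "SuperpositionStats") -> "SuperpositionStats":
        """Statistics of the union of two disjoint sets of pairs."""
        return self._combine(other, +1)

    def __sub__(self, other: "SuperpositionStats") -> "SuperpositionStats":
        """Statistics after deleting a previously included subset."""
        return self._combine(other, -1)

    def centered(self) -> Tuple[List[List[float]], float]:
        """Return the centroid-corrected cross-covariance and sum of squares.

        These are the inputs a standard rotation solver would need.
        """
        n = self.n
        cov = [[self.sum_xy[a][b] - self.sum_x[a] * self.sum_y[b] / n
                for b in range(3)] for a in range(3)]
        sq = (self.sum_sq
              - sum(v * v for v in self.sum_x) / n
              - sum(v * v for v in self.sum_y) / n)
        return cov, sq
```

Under these assumptions, `SuperpositionStats.from_pairs(xs_a, ys_a) + SuperpositionStats.from_pairs(xs_b, ys_b)` yields the same statistics as recomputing over the concatenated sets, which is the behaviour a fragment-assembly alignment method would exploit when extending or trimming a candidate alignment.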