Arithmetic on bigInts

L. Allison, FIT, Monash University, Australia 3800.

16 August 2021

1. Introduction

An arbitrarily big integer, a bigInt, can be represented by a sign (±) and a vector (array) of digits in some base:
      sign D[d-1], ..., D[0]
where the base is chosen so that each digit fits in an integer that can be manipulated by basic computer operations. Addition (subtraction) of two bigInts can be performed digit by digit, taking account of any carries (borrows), e.g., in base ten, 9+9+1 = 19 = 9 carry 1; it is convenient if 'digit1+digit2+carry' cannot overflow. Multiplication of two bigInts can be performed by various algorithms that ultimately rely on basic computer operations to multiply individual digits and add any carries. Consider for example, in base ten,
      9×9+8 = 81+8 = 89 = 9 carry 8
and, in base 16 (each digit written out in binary),
      1111×1111+1110 = 11101111 = 1111 carry 1110
      (15×15+14 = 239 = 14×16+15 = 15 carry 14).
We see that the product of two b-bit "digits" can produce a 2b-bit result. If a 2b-bit integer fits in one computer Int then two digits can be multiplied by a basic computer multiply. Failing that, two digits must be multiplied by a small software procedure of a few computer operations. Division of bigInts can be performed by multiplication combined with other operations on bigInts.


Figure 1: Some Results

Timings of long division and long multiplication are reasonably consistent with quadratic time-complexity O(d²), d ~ log(N), div having a constant several times bigger (but this version is operating bit-wise, not digit-wise like '×'). Note that N.toString() does a special 'divide by 10' repeatedly (log10(N) times) and takes maybe O(d²)-time overall but is very lumpy (try 64K bits or more), while N.toBinary() is very fast of course. (And the garbage collector may kick in at any time.)

2. Addition and Subtraction

Addition and subtraction of d-digit integers take O(d) time.
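A sketch in Javascript, holding a bigInt as a digit vector, least-significant digit first (addBig is an illustrative name):

```javascript
// O(d)-time addition of two non-negative bigInts held as digit
// vectors, least-significant digit first, in the given base.
function addBig(a, b, base = 10) {
  const sum = [];
  let carry = 0;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const s = (a[i] || 0) + (b[i] || 0) + carry; // cannot overflow
    sum.push(s % base);
    carry = Math.floor(s / base);
  }
  if (carry > 0) sum.push(carry);
  return sum;
}
console.log(addBig([9, 9], [9, 9])); // 99 + 99 = 198, i.e. [8, 9, 1]
```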

3. Multiplication

3.1. Long-Multiplication

       1 2 3
     x 4 5 6
   ----------
   4 9 2     | 4
     6 1 5   | 5
       7 3 8 | 6
   ---------
   5 6 0 8 8
    Figure 2: Long-Multiplication.

For d-digit integers, long-multiplication takes O(d²)-time.
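The working of Figure 2 can be sketched in Javascript, again with digit vectors least-significant digit first (mulBig is an illustrative name):

```javascript
// O(d^2)-time long-multiplication: digit vectors least-significant first.
function mulBig(a, b, base = 10) {
  const prod = new Array(a.length + b.length).fill(0);
  for (let i = 0; i < a.length; i++) {         // one row of Figure 2
    let carry = 0;
    for (let j = 0; j < b.length; j++) {
      const p = prod[i + j] + a[i] * b[j] + carry;
      prod[i + j] = p % base;
      carry = Math.floor(p / base);
    }
    prod[i + b.length] += carry;
  }
  while (prod.length > 1 && prod[prod.length - 1] === 0) prod.pop();
  return prod;
}
console.log(mulBig([3, 2, 1], [6, 5, 4])); // 123 x 456 = [8, 8, 0, 6, 5]
```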

3.2. Karatsuba ×

Multiply two d-digit integers by multiplying three half-length, (d/2)-digit, integers [Kar62].

A;B and C;D are big integers.
A;B × C;D = A×C<< + (A×D+B×C)< + B×D
now (A×D+B×C) = −(A−B)×(C−D) + A×C + B×D   (also = (A+B)×(C+D) − A×C − B×D),   hence
A;B × C;D = A×C<< + A×C< + B×D< − (A−B)×(C−D)< + B×D
So to multiply A;B × C;D we only need to do three multiplications of half-size integers, A×C, B×D and (A−B)×(C−D), plus some additions, subtractions and left-shifts (<).

The algorithm has O(d^log2(3)) time complexity for d-digit multiplication, log2(3) ≈ 1.58, so it beats long-multiplication for big enough bigInts.
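A minimal sketch using Javascript's built-in BigInt, splitting at half the maximum bit-length (the 65536 cut-off and the names are illustrative only; a serious implementation would tune the threshold at which it falls back on the basic multiply):

```javascript
// Karatsuba: three half-size multiplies per level of recursion.
function bitLen(x) { return x === 0n ? 0 : x.toString(2).length; }
function karatsuba(x, y) {
  if (x < 0n) return -karatsuba(-x, y);        // push signs outwards
  if (y < 0n) return -karatsuba(x, -y);
  if (x < 65536n || y < 65536n) return x * y;  // small: multiply directly
  const half = BigInt(Math.max(bitLen(x), bitLen(y)) >> 1);
  const mask = (1n << half) - 1n;
  const A = x >> half, B = x & mask;           // x = A;B
  const C = y >> half, D = y & mask;           // y = C;D
  const AC = karatsuba(A, C);
  const BD = karatsuba(B, D);
  const mid = AC + BD - karatsuba(A - B, C - D); // = A*D + B*C
  return (AC << (2n * half)) + (mid << half) + BD;
}
console.log(karatsuba(12345678901234567890n, 98765432109876543210n)
            === 12345678901234567890n * 98765432109876543210n); // true
```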

3.2.1. The polynomial view:

An integer corresponds to a polynomial, the integer's digits being the coefficients of the polynomial. Multiplying two integers is mirrored by multiplying their corresponding polynomials to give the product polynomial. Recall that a polynomial of degree 'd' is defined by its values at d+1 points. In particular, a quadratic is defined by its values at three points, e.g., 0, 1 and −1, say.

f(x) × g(x) = (Ax + B) × (Cx + D) = Wx² + Yx + Z = p(x).

p(0)   = Z = f(0) × g(0) = (0+B)×(0+D) = BD
p(1)   = W + Y + Z = (A+B)×(C+D)
p(−1) = W − Y + Z = (−A+B)×(−C+D),

p(1)−p(−1) = 2Y = (A+B)×(C+D) − (−A+B)×(−C+D) = 2(AD + BC)
p(1)+p(−1) = 2(W + Z) = (A+B)×(C+D) + (−A+B)×(−C+D) = 2(AC + BD)
W = AC,   Y = AD+BC,   Z = BD, as required,
and, looking at p(−1),  Y = W+Z−(A−B)×(C−D)
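These identities are easy to spot-check on one set of digit values (A, B, C, D chosen arbitrarily):

```javascript
// Spot-check the Karatsuba identities above.
const [A, B, C, D] = [7, 2, 5, 3];
const W = A * C, Y = A * D + B * C, Z = B * D;
console.log(W + Y + Z === (A + B) * (C + D));    // p(1):  true
console.log(W - Y + Z === (-A + B) * (-C + D));  // p(-1): true
console.log(Y === W + Z - (A - B) * (C - D));    // true
```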

3.3. Toom-Cook × (Toom3)

Multiply two d-digit integers by doing five multiplications of (d/3)-digit integers; this version is after M. Bodrato [Bod07].

A;B;C × D;E;F
f(x) × g(x) = (Ax² + Bx + C) × (Dx² + Ex + F) = Ux⁴ + Vx³ + Wx² + Yx + Z = p(x)

p(0)   = Z = f(0) × g(0) = C×F
p(1)   = U+V+W+Y+Z = (A+B+C) × (D+E+F)
p(−1) = U−V+W−Y+Z = (A−B+C) × (D−E+F)
p(−2) = 16U−8V+4W−2Y+Z = (4A−2B+C) × (4D−2E+F)
p(∞):  U = A×D

Z = p(0) = CF ✓
U = p(∞) = AD ✓
r = (p(-2) − p(1))/3
    = (((4A−2B+C)×(4D−2E+F)) − ((A+B+C)×(D+E+F)))/3
    = 5AD−3AE+AF−3BD+BE−BF+CD−CE
s = (p(1) − p(-1))/2
    = (((A+B+C)×(D+E+F)) − ((A−B+C)×(D−E+F)))/2
    = AE+BD+BF+CE
t = p(-1) − Z
    = (A−B+C)×(D−E+F)−CF
    = AD−AE+AF−BD+BE−BF+CD−CE
V = (t − r)/2 + 2.U
    = AE+BD ✓
W = t + s − U
    = AF+BE+CD ✓
Y = s − V
    = BF+CE ✓
Note: five '×' (and some shifts, '+'s and '−'s) suffice to calculate p(0), p(1), p(−1), p(−2) and p(∞), and from them U, V, W, Y and Z.

The algorithm has O(d^log3(5)) time complexity (log3(5) ≈ 1.465). This beats Karatsuba asymptotically but the more extensive house-keeping makes for a larger constant of proportionality.
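The interpolation above can be spot-checked on small numbers (the digit values A..F are arbitrary; U..Z should come out equal to the coefficients of the product polynomial):

```javascript
// Check the Toom3 interpolation for one set of small digit values.
const [A, B, C] = [3, 1, 4], [D, E, F] = [1, 5, 9];
const p0   = C * F;                               // p(0)
const p1   = (A + B + C) * (D + E + F);           // p(1)
const pm1  = (A - B + C) * (D - E + F);           // p(-1)
const pm2  = (4*A - 2*B + C) * (4*D - 2*E + F);   // p(-2)
const pinf = A * D;                               // p(infinity)
const Z = p0, U = pinf;
const r = (pm2 - p1) / 3;
const s = (p1 - pm1) / 2;
const t = pm1 - Z;
const V = (t - r) / 2 + 2 * U;
const W = t + s - U;
const Y = s - V;
console.log(U === A*D, V === A*E + B*D, W === A*F + B*E + C*D,
            Y === B*F + C*E, Z === C*F); // all true
```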

3.4. Faster Multiplication

The Schonhage-Strassen algorithm [SS71], running in O(d log d log log d)-time, was the asymptotically fastest multiplication algorithm for more than four decades. Harvey and van der Hoeven [HV19] finally removed the log log d factor (as S and S had speculated would be possible); the improvement is of great theoretical interest but of little practical use.

        1   2   3      123 x 456
        <--------
    6 | 7   3   8
      |           .
    5 | 6   1   5   .
      |           .   .
    4 | 4   9   2   .  8
      v   .   .   .  8
            .   .  0
              .  6
               5

    123 x 456 = 56088
      Figure 4: Another Angle on Long-Multiplication.

Long multiplication is closely related to vector convolution, M conv N, in which we pair-wise multiply elements and add up the diagonals, and that is all; the carries are dealt with afterwards (this is the linear convolution below):

        1   2   3      123 conv 456
        <--------
    6 | 6  12  18
      |           .
    5 | 5  10  15   .
      |           .   .
    4 | 4   8  12   . 18
      v   .   .   . 27
            .   . 28
              . 13
               4

    4,13,28,27,18  >carries>  5, 6, 0, 8, 8, i.e., 56088
      Figure 5: Linear Convolution
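The convolution and carry steps of Figure 5 can be sketched as follows (conv and carries are illustrative names; digits are most-significant first, as in the figure):

```javascript
// Linear convolution of digit vectors (most-significant digit first,
// as in Figure 5), with the carries propagated afterwards.
function conv(m, n) {
  const c = new Array(m.length + n.length - 1).fill(0);
  for (let i = 0; i < m.length; i++)
    for (let j = 0; j < n.length; j++)
      c[i + j] += m[i] * n[j];
  return c;
}
function carries(c, base = 10) {
  const out = c.slice();
  for (let i = out.length - 1; i > 0; i--) {  // rightmost entry is least
    out[i - 1] += Math.floor(out[i] / base);  // significant; assumes the
    out[i] %= base;                           // leading digit won't overflow
  }
  return out;
}
console.log(conv([1, 2, 3], [4, 5, 6]));          // [4, 13, 28, 27, 18]
console.log(carries(conv([1, 2, 3], [4, 5, 6]))); // [5, 6, 0, 8, 8]
```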

Trouble is, the obvious way to do the above still takes quadratic time, O(d2), but there is a faster method.

The Schonhage-Strassen algorithm [SS71] relies on the Fast Fourier Transform (FFT) algorithm [CT65] which calculates the Discrete Fourier Transform (F) of a vector and on the fact that
      conv(X, Y) = F⁻¹( F(X) . F(Y) )
      where F(X) . F(Y) is element-wise multiplication.
The FFT and inverse FFT algorithms carry out O(d log d) arithmetic operations.
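The identity can be spot-checked with a naive O(d²) DFT standing in for the FFT (dft and convDFT are illustrative names; a real implementation uses the FFT, or works in a modular ring, and must manage floating-point rounding):

```javascript
// conv(X, Y) = inverse-F( F(X) . F(Y) ), demonstrated with a naive
// O(d^2) DFT; sign = -1 gives the forward transform, +1 the inverse.
function dft(re, im, sign) {
  const n = re.length, R = [], I = [];
  for (let k = 0; k < n; k++) {
    let sr = 0, si = 0;
    for (let j = 0; j < n; j++) {
      const a = sign * 2 * Math.PI * k * j / n;
      sr += re[j] * Math.cos(a) - im[j] * Math.sin(a);
      si += re[j] * Math.sin(a) + im[j] * Math.cos(a);
    }
    R.push(sr); I.push(si);
  }
  return [R, I];
}
function convDFT(x, y) {
  const n = x.length + y.length - 1;      // zero-pad to the full length
  const px = x.concat(Array(n - x.length).fill(0));
  const py = y.concat(Array(n - y.length).fill(0));
  const [xr, xi] = dft(px, Array(n).fill(0), -1);
  const [yr, yi] = dft(py, Array(n).fill(0), -1);
  const pr = xr.map((v, k) => v * yr[k] - xi[k] * yi[k]);  // element-wise
  const pi = xr.map((v, k) => v * yi[k] + xi[k] * yr[k]);  // complex multiply
  const [cr] = dft(pr, pi, 1);            // inverse transform ...
  return cr.map(v => Math.round(v / n));  // ... includes the 1/n factor
}
console.log(convDFT([1, 2, 3], [4, 5, 6])); // [4, 13, 28, 27, 18]
```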

4. Division

To divide M≥0 by N>0, M ÷ N, find quotient Q≥0 and remainder r∈[0, N−1] such that M = Q×N + r,  e.g., for 14÷4, Q=3 and r=2.

4.1. Long-Division

Long-division has O(d²) time complexity.

   e.g., 3219876 ÷ 394

            8172 rem 108
        .-------
   394  |3219876
    x8 = 3152
         ----
           678
    x1 =   394
           ---
           2847
    x7 =   2758
           ----
             896
    x2 =     788
             ---
             108
     Figure 3: Long-Division

Long-division can be carried out in any base. If the base of the stored integers is a power of two (2^k), the algorithm can still act in terms of binary (2^1), and this is the simplest version to implement because the only possible digit-multipliers are zero and one. However, the time-complexity does decrease as the base increases because there are fewer digits.
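A bit-wise sketch on Javascript BigInts (divMod and bitLen are illustrative names): each step brings down one bit of M, and the quotient bit is 1 exactly when the running remainder reaches N.

```javascript
// Bit-wise long-division: quotient 'digits' are just 0 or 1.
function bitLen(x) { return x === 0n ? 0 : x.toString(2).length; }
function divMod(M, N) {                // requires M >= 0n, N > 0n
  let Q = 0n, R = 0n;
  for (let i = bitLen(M) - 1; i >= 0; i--) {
    R = (R << 1n) | ((M >> BigInt(i)) & 1n);  // bring down next bit of M
    Q <<= 1n;
    if (R >= N) { R -= N; Q |= 1n; }          // subtract, set quotient bit
  }
  return [Q, R];                       // M = Q*N + R, 0 <= R < N
}
console.log(divMod(3219876n, 394n));   // [8172n, 108n], as in Figure 3
```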

4.2. Recursive Division

There are recursive algorithms [BZ98] to divide bigInts, e.g.,

To divide M by N, giving a quotient, Q, and a remainder:
Initialise Q := 0.
If M < N, return Q remainder M.
If M is "short", or if N is "nearly" as long as M, use long-division.
Otherwise M > N, M is "long" and M is longer than N.
If N is "long", set half-length integers N1 and N0 such that N = N1;N0 and set N1' = N1+1 (note that (N1' << |N0|) > N); otherwise N is "short", set N1' = N.
Set half-length integers M1 and M0 such that M = M1;M0; note that M1 << |M0| ≤ M. Set Q1 := (M1 ÷ N1') << (|M0|−|N0|) (← recursion); note that Q1 ≤ M ÷ N and Q1 > 0. Set Q := Q + Q1, and M := M − Q1×N.
Repeat, accumulating the quotient Q, until M < N, which leaves the remainder in M.
(<<k is left-shift by k digit places and |X| is the # of digits in X.)

Naturally this division algorithm avails itself of the fastest multiplication algorithm.

5. Conclusion

To make the very fastest implementation of operations on large integers it is best to use a programming language that is close to the underlying hardware, and C, C++ or assembler are obvious choices. Javascript is not that kind of choice but it does run in web browsers, which is convenient for interactive demonstrations of algorithms.

References

[AKS21] L. Allison, A. S. Konagurthu and D. F. Schmidt 'On universal codes for integers: Wallace Tree, Elias Omega and beyond', Data Compression Conference (DCC), IEEE, pp.313-322, March 2021 [more] [doi:10.1109/DCC50243.2021.00039]
An application of bigInts to encoding and decoding large integer values for data compression and machine learning.

[Bod07] M. Bodrato. 'Toward optimal Toom-Cook multiplication for univariate and multivariate polynomials in characteristic 2 and 0', Int. Workshop on Arith. & Finite Fields (WAIFI), Springer, 2007, [doi:10.1007/978-3-540-73074-3_10].
Also see Toom-Cook@[wikip]['21].

[BZ98] C. Burnikel and J. Ziegler, 'Fast recursive division', Im Stadtwald, D-Saarbrucken, 1998 [www]['21].

[CT65] J. W. Cooley and J. W. Tukey, 'An algorithm for the machine calculation of complex Fourier series', Math. Comp., 19, 297-301, 1965 [doi:10.1090/S0025-5718-1965-0178586-1]
Also see the FFT@[wikip]['21].

[HV19] D. Harvey and J. Van Der Hoeven, 'Integer multiplication in time O(n log n)', HALarchive, 2019 [www]['21].

[Kar62] A. Karatsuba and Yu. Ofman, 'Multiplication of many-digital numbers by automatic computers', Proc. USSR Acad. Scis, 145, pp.293-294, 1962
Translation in the acad. j. Physics-Doklady, 7, pp.595-596, 1963.
Also see algorithm@[wikip]['21].

[SS71] A. Schonhage and V. Strassen, 'Schnelle multiplikation grosser zahlen', Computing, 7, pp.281-292, 1971 [doi:10.1007/BF02242355]
O(n log n log log n)-time. Also see algorithm@[wikip]['21].