Basic Maths

Basic Mathematics

In algorithmic analysis, one is interested in the amount of resources used by an algorithm and in the algorithm's correctness. The time complexity measures the amount of time used. The space complexity measures the amount of space used. The best case, average case and worst case complexities are of interest. The asymptotic complexity when a problem becomes "large" is usually important and O( ) notation is used to quantify this. Logic is invaluable in program proof.

The mathematics required for this analysis is summarised here. A mathematically able reader should skip this chapter. Other readers should scan it quickly and refer back to it when the material is needed in later sections.

Recurrence Relations

Recurrence relations often arise in calculating the time and space complexity of algorithms. The amount of time (or space) that a program takes, T_k, is typically a function of the size, k, of the input data. A recurrence relation arises when T_k is some function of T_k' for k'<k. The solutions of recurrence relations are often best proved by induction as illustrated in some of the following sections.

Logarithmic Complexity

The following recurrence

T_k = T_k/2 + a if k~=1
   = b       if k=1

has the solution

T_n = a*log₂(n) + b

when n is a power of 2. The solution should be no surprise because when the problem size is doubled from k/2 to k the resource used only increase by a constant a. Repeated doubling, i.e. 1, 2, 4, 8, 16, 32, ..., reaches any limit n in about log₂(n) steps. Unless otherwise stated, logarithms are to base two.

The mathematical proof of the solution by induction has two stages. First the base case for k=1 is shown to be a particular case of the general solution. Secondly, assuming the solution up to n/2, it is then shown to hold for the next value, n.

(i) T₁ = a*log 1 + b
    = b
(ii) T_n = T_n/2 + a
     = (a*log(n/2) + b) + a
     = a*(log n - log 2) + b + a
     = a*log n + b

The binary-search algorithm over n (sorted) objects has this time-complexity. It is a very good complexity to have because T_n grows only slowly as n gets large.

Linear Complexity

The solution of the following recurrence

T_k = b + T_k-1   if k~=1
   = a     if k=1

can be found as follows

T_n = b + T_n-1
   = b + (b+T_n-2)
   = 2b + T_n-2
     ...
   = (n-1)b + T₁
   = (n-1)b + a
   = n*b + (a-b)

Many linear-algorithms, linearly-recursive or iterative, have this time-complexity. See the section on linear search. While not as good as logarithmic complexity, linear complexity is somehow "fair": double the size of the problem and the algorithm uses twice as much resource.

Exponential Complexity

A third class of algorithms has complexity given by the following recurrence relation.

T_k = 2*T_k-1 + a   if k~=0
   = b          if k=0

The solution for T_n is:

T_n = (a+b)*2ⁿ-a

The solution is by induction:

(i)  T₀ = (a+b)*2⁰ -a
     = (a+b)-a
     = b

(ii) T_n = 2*T_n-1+a

     = 2*((a+b)2^n-1-a)+a

     = (a+b)*2ⁿ - a

Many binary-recursive procedures have this time-complexity; see for example the (slow) Fibonacci routine. Exponential complexity grows very rapidly indeed with the size of the problem. Just adding one to the problem size causes the complexity to roughly double. If a problem requires algorithms with this complexity and the problem size is large then its solution by computer may be possible in principle but infeasible in practice.

Complexity and big O

The time or space used by an implementation of an algorithm can be measured exactly in seconds or in bytes for a particular type of computer but will be different on another computer. Thus the precise constant of proportionality for a particular computer is often not of interest. Alternatively, some operations may be chosen and the number of these executed can be used as a machine-independent measure of the time complexity. For example, the number of comparisons and assignments is usually chosen to compare sorting algorithms. It may be difficult to calculate the exact complexity of an algorithm but its long-term asymptotic behaviour for big problems may be easier to work out. Again, the constant of proportionality may be difficult to calculate exactly and unimportant in many cases.

To address the issues discussed above, big `O' notation is used. Given two functions f and g, f is said to be order g, or f is O(g), if f grows no more quickly than some constant times g past some fixed point. In symbols:

Given  f, g :real->real
Define f is O(g)
if there exist k and m such that
n≥m   =>   f(n) ≤ k*g(n)

Beyond m, f does not exceed k*g for some fixed k and m. Eg.

x is O(3x)
x is O(x²)
0.01x² is O(x²)
3x²+7x-8 is O(x²)

It can be seen that if m<n then x^m is O(xⁿ). Only the dominant term with the largest exponent is significant in a polynomial. Normally we write O( ) of a polynomial by giving the highest order term with a coefficient of one, eg. we write O(x³) rather than O(7x³+x²).

   n log2(n) n*log2(n)     n**2     n**3        2**n

   1    0.00      0.00        1        1           2
   2    1.00      2.00        4        8           4
   3    1.58      4.75        9       27           8
   4    2.00      8.00       16       64          16
   5    2.32     11.61       25      125          32
   6    2.58     15.51       36      216          64
   7    2.81     19.65       49      343         128
   8    3.00     24.00       64      512         256
   9    3.17     28.53       81      729         512
  10    3.32     33.22      100     1000        1024

  99    6.63    656.31     9801   970299 6.338253e29
 100    6.64    664.39    10000  1000000 1.267651e30

-- Some Important Complexity Functions --

L
.
A
l
l
i
s
o
n
.
1
9
8
4

Eventually the exponential functions outstrip any polynomial ones. As a rough classification, problems having polynomial-time algorithms are often called feasible and those with only exponential-time algorithms are called infeasible. Let us say we have a problem and current technology and budget allows us to solve it for sizes up to n. If a new machine with twice the power becomes available, then if the problem is O(n) the new machine enables problems of size 2n to be solved. If the problem is O(n²), problems of size 1.4n can now be solved. These are big steps forward. However, if the problem is O(2ⁿ), problems only up to size n+1 can be solved!

The time or space used by an algorithm generally depends on some characteristic of the input data, often its size or length. Looking ahead to various algorithms are defined in later chapters, the simple 2-loop sorting algorithms on n items take O(n²) time and the faster algorithms such as heap-sort take O(n*log(n)) time. Linear-search over n items takes O(n) time on average and in the worst case. Binary-search takes O(log(n)) time.

Care must be taken in specifying the yardstick for measuring a problem's size. For example, the length of an integer is the number of digits and is proportional to the logarithm of the integer's value. To avoid confusion it must be clear which quantity is taken as its size.

Notation

f is O(g) iff: there are positive constants k and m s.t. 0≤f(n)≤k*g(n) for all n≥m.
i.e. f grows no more rapidly than g, in the long run, give or take a multiplicative constant.
f is Ω(g) iff: there are positive constants k and m s.t. f(n)≥k*g(n)≥0 for all n≥m; equivalently g(n)≤(1/k)*f(n) so g is O(f).
i.e. f grows at least as quickly as g, in the long run, give or take a multiplicative constant.
f is Θ(g) iff: f is O(g) and f is Ω(g), or equivalently f is O(g) and g is O(f).
i.e. f and g grow at the same rate, in the long run, give or take a multiplicative constant.
f is o(g) iff: f is O(g) and f is not Θ(g).
i.e. f grows strictly more slowly than g, in the long run, give or take a multiplicative constant.

Note that slightly more general Computer Science definitions can be given where f and/or g may take negative values but the above are sufficient for time- and space-complexity functions. More general definitions have been used in mathematics. See Knuth 1976.

Some Common Forms

The complexity of a single loop is often obvious.

for i :1..n
   body
end for

If the body of the loop always takes (approximately) the same amount of time to execute, regardless of i or n, the loop's overall complexity is O(n). Many programs on one-dimensional arrays have this form. See for example the earlier section on swapping array segments. Note that what appears to be a single loop may actually be a nested loop if the body calls a procedure or function that contains a loop.

Nested loops often, but not always, have worse time complexity than a single loop.

for i :1..m
   for j :1..n
      body
   end for
end for

Again provided that the time to execute the body is independent of i, j, m and n, the complexity is O(m*n). Many programs on two-dimensional arrays have this form. For example the dynamic-programming algorithm on strings A and B takes O(m*n) time where m=|A| and n=|B|. If the inner and outer loops are both executed from 1 to n, the complexity is O(n²). If the inner loop is executed i times the complexity is of the same order.

for i :1..n
   for j :1..i  % same complexity for j :i..n
      body
   end for
end for

The body is executed 1+2+3+...+n = n*(n+1)/2 times which is still O(n²).

Estimation of Complexity

Sometimes it is not easy to get a formula for the complexity of an algorithm. In such cases it may be possible to estimate it by experiment. Counting-variables can be added to the program, incremented when some critical operation is carried out and the final totals printed. The running time can also be measured, either by a stop-watch or better by calling a routine to print the computer system's clock. The complexity might be inferred by examining how such measures vary with the problem size.

The accuracy of timing a program or an operation can be improved by timing a number of executions, perhaps in a loop, and dividing the total time taken by that number. A time-shared computer is used by many people simultaneously. The elapsed time taken by a program depends on the system load. Therefore any timing done on a shared machine must be based on the central processor time devoted to the particular program under study and not on the elapsed time.

Examining differences between adjacent terms in a series can indicate the form of the underlying function that defines the series. A linear function, T(n)=a*n+b gives rise to constant difference between T(n) and T(n-1):
D₁(n) = T(n)-T(n-1) = a*n+b-a*(n-1)-b = a
A quadratic function T(n)=a*n²+b*n+c gives rise to linear first-order differences:
D₁(n) = T(n)-T(n-1) = a*n²+b*n+c-a*(n-1)²-b*(n-1)-c = 2a*n-a+b
which gives rise to constant second-order differences D₂(n) = D₁(n)-D₁(n-1). In general, a polynomial of degree d is revealed by constant d^th-order differences.

Plotting graphs of measured quantities against the problem size is another useful technique. If the complexity has the form T(n) = a*n^d for some order d then log(T(n)) = log(n^d)+log(a) = d*log(n)+log(a). In this case a plot of log(T(n)) against log(n) should give a straight line with slope d. Log(1) is zero, so the line crosses n=1 at log(a). If T(n) is O(n^d) the line and its slope may only be evident asymptotically for large n. A good range of values of n must be chosen. If T(n) = a*n^d+b*n^e where d>e and a is much smaller than b, T(n) may appear to be O(n^e) from small values of n whereas it is really O(n^d).

If T(n) = aⁿ then log(T(n)) = log(aⁿ) = log(a)*n and a plot of log(T(n)) against n should give a straight line with slope log(a).

Extra care must be taken if the behaviour of an algorithm depends not only on the size of the input but also on other characteristics of it. For example, some sorting algorithms examined later run faster or slower depending on the amount of ordering in the input. The results of a large number of runs on random data may have to be averaged to enable conclusions to be drawn.

Logic and Boolean

The use of logic is indispensable in computer programming. It is essential to know that a program is correct. For this reason some programming languages provide forms to help in proving programs correct, e.g.

assert <boolean expression>


if p then
   ...
else [not p must be true here]
   ...
end if


loop
[invariant <boolean expression>]
 ...
end loop


while p do
[invariant]            % provided that
 body                  % [invariant] body [invariant]
end while
[invariant and not p]


procedure p
[pre  <boolean expression>]
[post <boolean expression>]
 ...
end p

These forms are used to include tests of conditions that are necessary to the correct functioning of a program. If a test fails, the system halts the program with an informative message rather than allowing the program to run amok. Most systems also have flags or switches that allow this checking to be turned off once confidence in the program has been established. When carefully chosen, the tests can also be used to prove that a program is correct. To do this it is necessary to prove that the program terminates and that it produces the correct results when it terminates. (If your programming language does not have `assert' etc. or equivalents, you can generally create your own with suitable procedures or macros.)

The tests that can be included in a program are boolean expressions.

Type:
  boolean

Constants:
  true, false

Operations:
  and, or, =>, =, not = :boolean² -> boolean
  not :boolean -> boolean

Rules, p, q, r :boolean :
  p and q     = q and p                    %and
  true and p  = p
  false and p = false

  p or q     = q or p                      %or
  true or p  = true
  false or p = p

  p=>q = not p or q                        %implies

  not true  = false                        %not
  not false = true


  p and (q or r) = p and q or p and r      %distributive
  p or (q and r) = (p or q) and (p or r)   %laws

-- The Abstract Data Type Boolean --

Operators vary between programming languages. Note that these rules are mathematical ideals for a language to attempt to implement. A point of difference is that in some languages, e.g. Turing

p and q    can be replaced by    q and p

only when both p and q always terminate correctly and evaluate to true or to false (and have no side-effects), similarly for `p && q' in C and its close relatives. For example

x > 0 and 1/x > x

will return true or false for all values of x. But

1/x > x and x > 0

is not defined when x is zero due to trying to divide one by zero. The former is making use of

(false and p) = false  where p is  1/x > x

to avoid evaluating p when x=0. The fact that C and some other programming languages do this "short-cut" evaluation is often very convenient. A similar shortcut is taken for "true or p". Some languages, such as Pascal, do not take these short-cuts.

Sometimes to prove a program it is necessary to use logical expressions that are not easily expressed as boolean expressions in the programming language. In such cases comments can be used. They are not checked automatically but they make very informative comments for another reader.

— © L.A. 1984

Also see the [Wff] pages for more on logic.

Exercises

Suppose that algorithm A takes 1000n³ steps and algorithm B takes 2ⁿ steps for a problem of size n. For what size of problem is algorithm A faster than B?
What is the time complexity of the traditional manual integer multiplication method, i.e. `long multiplication', as a function of the length (not value) of the integers to be multiplied?
What is the time complexity of the usual matrix multiplication algorithm if it is assumed that basic arithmetic operations on numbers take O(1) (constant) time?
Insert a counting variable L in the inner loop of the dynamic programming algorithm for the edit distance problem. Do runs on pairs of strings of equal length `n' for various values of n. Plot log(L) against log(n).
Below is a list of pairs of statements. For each pair, indicate if one statement is stronger than (implies) the other (& say which), or not.
- It's raining. It's pouring.
- N is a multiple of six. N is even.
- A lack of vitamin C causes scurvy. Eating vitamins makes you healthy.
- The animal is a marsupial. The animal is a kangaroo.