Instruction Sets

Below are the rudiments of typical computer instruction-sets as they affect code-generation by a compiler. The main kinds of machines are 0-address, 1-address, 2-address, and 3-address machines. A particular computer may have instructions of only one kind, or it may have a mixture of instructions of two or more kinds.

General

Typical operation-code mnemonics:
 
<opcd> ::= load | sto (store) | add | sub | rev_sub | mult | div | rev_div | and | or | compare |...
<op> ::= + | - | rev- | * | % | rev% | & | or | compare |...
<cond> ::= eq | ne | le | lt | ge | gt |...
 
note that (x `rev-` y) is (y-x),  and that (x `rev%` y) is (y%x).
 
Division (div) typically leaves the result and the remainder in a pair of registers on a 2-address machine, possibly an even-odd pair.
 
There may be instructions for
bit-shifting (left | right) and (arithmetic | logical | circular).
 
As well is general-purpose integer-|index-registers, there may also be separate registers for floating-point arithmetic (possibly single and double precision).
 
The following ignores byte- | half-word- | word- | double-word |... addressing. It assumes that memory is in units all of one size -- the word. In fact many machines use byte-addressing where a word is probably 4 bytes (?maybe 8 bytes on a 64-bit machine?) and is probably aligned on boundaries that are multiples of that size. In that case an op-code would also indicate if it stands for a half-word, word or double-word operation where necessary.

0-address machine (reverse Polish)

where the processor has a stack and some supporting hardware, at least a top of stack (TOS) pointer.

operation e.g. or comment
load_literal <int> effect:
TOS:=TOS+1; stack[TOS]:=<int>
load a constant onto the top of stack; this can be used in arithmetic or to get an address onto the stack for use by a load or a store instruction later (it is splitting hairs to argue whether the literal is an address or a constant which might happen to be used as an address elsewhere)
load effect:
stack[TOS]:=memory[stack[TOS]]
take the top-of-stack as an address, replace the top-of-stack with the contents of that address.
sto effect:
memory[stack[TOS-1]]:=stack[TOS]; TOS:=TOS-2
store contents of top of stack at the address in stack[TOS-1] then pop the value and the address
<opcd> where <opcd> is add | sub |...
effect:
stack[TOS-1] := stack[TOS-1] <op> stack[TOS];
TOS:=TOS-1
possible code generated for   x := y + z:
 
load_literal @x
load_literal @y
load
load_literal @z
load
add
sto
 
where @x is the address of x; the compiler retrieves the address of a variable from the compiler's lookup-table, e.g., @x might be 36, say.
 
Some possible extensions to the instruction set include:
load <addr>  (equiv. to load_literal <addr>; load);
sto <addr>   (similar);
instructions (such as permute, or duplicate) to operate on elements near the top of the stack (which can make up for the lack of reverse-subtract or reverse-divide)
 

1-address machine

where the processor has a special accumulator which is implicitly used in each arithmetic operation. The accumulator could be the only register, or there could also be one or more index-registers in which case there would very probably be accumulator-to-indexReg transfer operations. In the former case an address is limited to being an integer constant; in the latter case indexed addresses would be allowed.

operation alternative notation e.g. or comment
load <addr> acc:=<addr> load the word at the given address into the acc
load_indirect1 <addr> acc := [<addr>] use the word at the address as a pointer to the word to be loaded into acc; something like this is necessary if there are no index registers, or...
load_indirect2 acc := [acc] ... use the contents of the acc as a pointer to the word to be loaded into the acc; something like this is necessary if there are no index registers
sto <addr> <addr>:=acc store contents of accumulator at <addr>
store indirect, similar considerations to load indirect
<opcd> <addr> acc <op>= <addr> effect:
acc := acc <op> memory[<addr>]
possible code generated for   x := y + z:
 
load @y   --acc := y
add @z   --acc += z
sto @x   --x := acc
 
the compiler retrieves the address of a variable from the compiler's lookup-table.
 

2-address machine (?1.5-address?)

where
<reg> ::= R0 ... R15, say.
general-purpose index- | integer- registers.
<addr> ::= <offset> | <offset>[<reg>]
if present the register is an index and it and the offset are added to give the address.
 
In some machines only the early registers, e.g., R0-R3, can be used for indexing.
 
In some 2-address machines indexing with R0 is used to denote do not index, in which case <offset>[R0] is equivalent to <offset>.
operation alternative notation e.g. or comment
load <reg>, <reg> <reg>:=<reg> load R7, R3, or
R7:=R3
transfer from one register to another
<opcd> <reg>,<reg> <reg> <op>= <reg> add R7, R3, or
R7 += R3,
arithmetic operation on two registers
load <reg>, <addr> <reg>:=<addr> load R7,36[R2], or
R7 := 36[R2].
load the register with the contents of the word at the given address
sto <reg>, <addr> <addr>:=<reg> sto R7,36[R2], or
36[R2] := R2,
store the contents of the register in the word at the given address
<opcd> <reg>, <addr> <reg><op>=<addr> add R3, 36[R2], or
R3+=36[R2]
jmp <reg> jmp <reg> unconditional jump to the address held in the register
jmp <addr> jmp <addr> unconditional jump to the address
jmp <cond> <addr> jmp <cond> <addr> jump to the address if the condition has been set, e.g., by a compare instruction
BAL <reg>, <addr> BAL <reg>, <addr> Branch and link:
save the program counter in the register (e.g., for procedure return) and then jump to the address
possible code generated for   x := y + z:
 
load R5, @y   --R5 := y
add R5, @z   --R5 += z
sto R5, @x   --x := R5
 
the compiler retrieves the address of a variable from the compiler's lookup-table. Note that local, non-local and global variables in a block-structured language require different, perhaps more complex, code sequences to access them.

Each instruction above applies to a register and a "full" memory address. Arguably a register is a restricted kind of address, so you might call them 1.5-address instructions. There may also be some 2-full-address instructions. These are in fact necessary for any operation on arbitrarily long operands such as strings or decimal numbers; there is also the matter of specifying an operand's length -- either in the instruction, or in the operand, or in a register.

<opcd> <addr>,<addr> <addr> <op>= <addr> add 10[R3], 20[R6], or
10[R3] += 20[R6]


3-address machine

where three-address instructions act on values in memory, not in registers.

operation alternative notation e.g. or comment
<opcd> <addr> <addr> <addr> <addr> := <addr> <op> <addr> add 10[R1],20[R2],30[R3], or
10[R1]:=20[R2]+30[R3]
Combine the second and third operands using the operator (e.g., +) and store the result in the location indicated by the first address.
possible code generated for   x := y + z:
 
add @x, @y, @z
 

Register-file machine (3 register addresses)

has 2-address (1.5-address) instructions for loading from, and storing in, memory but the other instructions each specify three registers.

operation alternative notation e.g. or comment
load <reg> <addr> <reg> := <addr>  
sto <reg> <addr> <addr> := reg>  
<opcd> <reg><reg><reg> <reg> := <reg><op><reg> add R2 R3 R4, or
R2:=R3+R4
etc.
possible code generated for   x := y + z:
load R1, @y  --R1 := y
load R2, @z  --R2 := z
add R6, R1, R2   --R6 := R1 + R2
sto R6, @x  --x := R3

On the CDC 6600 machines, address-registers, A0-A7, were paired with general arithmetic registers, R0-R7 (floating-point values, or integer values stored as unnormalised floats). Loading an address into A0-A5 (A6-A7) caused the corresponding general arithmetic register to be loaded from (stored into) the memory word having that address. There were 3-address instructions on the general arithmetic registers:

possible code generated for   x := y + z:
A1 :=literal @y   --R1 := y
A2 :=literal @z   --R2 := z
R6 := R1 + R2
A6 :=literal @x   --x := R6 (!)

Note that this form of instruction is efficient for stepping through an array because adding one to an A-register causes the corresponding R-register to be loaded from (stored in) the array.


The zero-address machine needs more instructions for x := y + z than the 3-address machine, say, but they are shorter instructions so the 0-address machine does not necessarily have a performance disadvantage, and the 3-address machine does not necessarily have a performance advantage, because of this.

-- LA