Instruction Sets

Below are the rudiments of typical computer instruction-sets as they affect code-generation by a compiler. The main kinds of machines are 0-address, 1-address, 2-address, and 3-address machines. A particular computer may have instructions of only one kind, or it may have a mixture of instructions of two or more kinds.

General

Typical operation-code mnemonics:: <opcd> ::= load | sto (store) | add | sub | rev_sub | mult | div | rev_div | and | or | compare |...
<op> ::= + | - | rev- | * | % | rev% | & | or | compare |...
<cond> ::= eq | ne | le | lt | ge | gt |...; note that (x `rev-` y) is (y-x), and that (x `rev%` y) is (y%x).; Division (div) typically leaves the result and the remainder in a pair of registers on a 2-address machine, possibly an even-odd pair.
There may be instructions for: bit-shifting (left | right) and (arithmetic | logical | circular).
As well is general-purpose integer-|index-registers, there may also be separate registers for floating-point arithmetic (possibly single and double precision).
The following ignores byte- | half-word- | word- | double-word |... addressing. It assumes that memory is in units all of one size -- the word. In fact many machines use byte-addressing where a word is probably 4 bytes (?maybe 8 bytes on a 64-bit machine?) and is probably aligned on boundaries that are multiples of that size. In that case an op-code would also indicate if it stands for a half-word, word or double-word operation where necessary.

0-address machine (reverse Polish)
where the processor has a stack and some supporting hardware, at least a top of stack (TOS) pointer.
operation	e.g. or comment
load_literal <int>	effect: TOS:=TOS+1; stack[TOS]:=<int> load a constant onto the top of stack; this can be used in arithmetic or to get an address onto the stack for use by a load or a store instruction later (it is splitting hairs to argue whether the literal is an address or a constant which might happen to be used as an address elsewhere)
load	effect: stack[TOS]:=memory[stack[TOS]] take the top-of-stack as an address, replace the top-of-stack with the contents of that address.
sto	effect: memory[stack[TOS-1]]:=stack[TOS]; TOS:=TOS-2 store contents of top of stack at the address in stack[TOS-1] then pop the value and the address
<opcd>	where <opcd> is add \| sub \|... effect: stack[TOS-1] := stack[TOS-1] <op> stack[TOS]; TOS:=TOS-1
possible code generated for x := y + z: load_literal @x load_literal @y load load_literal @z load add sto where @x is the address of x; the compiler retrieves the address of a variable from the compiler's lookup-table, e.g., @x might be 36, say. Some possible extensions to the instruction set include: load <addr> (equiv. to load_literal <addr>; load); sto <addr> (similar); instructions (such as permute, or duplicate) to operate on elements near the top of the stack (which can make up for the lack of reverse-subtract or reverse-divide)

1-address machine
where the processor has a special accumulator which is implicitly used in each arithmetic operation. The accumulator could be the only register, or there could also be one or more index-registers in which case there would very probably be accumulator-to-indexReg transfer operations. In the former case an address is limited to being an integer constant; in the latter case indexed addresses would be allowed.
operation	alternative notation	e.g. or comment
load <addr>	acc:=<addr>	load the word at the given address into the acc
load_indirect₁ <addr>	acc := [<addr>]	use the word at the address as a pointer to the word to be loaded into acc; something like this is necessary if there are no index registers, or...
load_indirect₂	acc := [acc]	... use the contents of the acc as a pointer to the word to be loaded into the acc; something like this is necessary if there are no index registers
sto <addr>	<addr>:=acc	store contents of accumulator at <addr>
store indirect, similar considerations to load indirect
<opcd> <addr>	acc <op>= <addr>	effect: acc := acc <op> memory[<addr>]
possible code generated for x := y + z: load @y --acc := y add @z --acc += z sto @x --x := acc the compiler retrieves the address of a variable from the compiler's lookup-table.

2-address machine (?1.5-address?)
where <reg> ::= R0 ... R15, say. general-purpose index- \| integer- registers. <addr> ::= <offset> \| <offset>[<reg>] if present the register is an index and it and the offset are added to give the address. In some machines only the early registers, e.g., R0-R3, can be used for indexing. In some 2-address machines indexing with R0 is used to denote do not index, in which case <offset>[R0] is equivalent to <offset>.
operation	alternative notation	e.g. or comment
load <reg>, <reg>	<reg>:=<reg>	load R7, R3, or R7:=R3 transfer from one register to another
<opcd> <reg>,<reg>	<reg> <op>= <reg>	add R7, R3, or R7 += R3, arithmetic operation on two registers
load <reg>, <addr>	<reg>:=<addr>	load R7,36[R2], or R7 := 36[R2]. load the register with the contents of the word at the given address
sto <reg>, <addr>	<addr>:=<reg>	sto R7,36[R2], or 36[R2] := R2, store the contents of the register in the word at the given address
<opcd> <reg>, <addr>	<reg><op>=<addr>	add R3, 36[R2], or R3+=36[R2]
jmp <reg>	jmp <reg>	unconditional jump to the address held in the register
jmp <addr>	jmp <addr>	unconditional jump to the address
jmp <cond> <addr>	jmp <cond> <addr>	jump to the address if the condition has been set, e.g., by a compare instruction
BAL <reg>, <addr>	BAL <reg>, <addr>	Branch and link: save the program counter in the register (e.g., for procedure return) and then jump to the address
possible code generated for x := y + z: load R5, @y --R5 := y add R5, @z --R5 += z sto R5, @x --x := R5 the compiler retrieves the address of a variable from the compiler's lookup-table. Note that local, non-local and global variables in a block-structured language require different, perhaps more complex, code sequences to access them. Each instruction above applies to a register and a "full" memory address. Arguably a register is a restricted kind of address, so you might call them 1.5-address instructions. There may also be some 2-full-address instructions. These are in fact necessary for any operation on arbitrarily long operands such as strings or decimal numbers; there is also the matter of specifying an operand's length -- either in the instruction, or in the operand, or in a register.
<opcd> <addr>,<addr>	<addr> <op>= <addr>	add 10[R3], 20[R6], or 10[R3] += 20[R6]

3-address machine
where three-address instructions act on values in memory, not in registers.
operation	alternative notation	e.g. or comment
<opcd> <addr> <addr> <addr>	<addr> := <addr> <op> <addr>	add 10[R1],20[R2],30[R3], or 10[R1]:=20[R2]+30[R3] Combine the second and third operands using the operator (e.g., +) and store the result in the location indicated by the first address.
possible code generated for x := y + z: add @x, @y, @z

Register-file machine (3 register addresses)

has 2-address (1.5-address) instructions for loading from, and storing in, memory but the other instructions each specify three registers.

operation

alternative notation

e.g. or comment

load <reg> <addr>

sto <reg> <addr>

add R2 R3 R4, or
R2:=R3+R4
etc.

possible code generated for x := y + z:

load R1, @y	--R1 := y
load R2, @z	--R2 := z
add R6, R1, R2	--R6 := R1 + R2
sto R6, @x	--x := R3

On the CDC 6600 machines, address-registers, A0-A7, were paired with general arithmetic registers, R0-R7 (floating-point values, or integer values stored as unnormalised floats). Loading an address into A0-A5 (A6-A7) caused the corresponding general arithmetic register to be loaded from (stored into) the memory word having that address. There were 3-address instructions on the general arithmetic registers:

possible code generated for x := y + z:: A1 :=literal @y --R1 := y; A2 :=literal @z --R2 := z; R6 := R1 + R2; A6 :=literal @x --x := R6 (!)

Note that this form of instruction is efficient for stepping through an array because adding one to an A-register causes the corresponding R-register to be loaded from (stored in) the array.

The zero-address machine needs more instructions for x := y + z than the 3-address machine, say, but they are shorter instructions so the 0-address machine does not necessarily have a performance disadvantage, and the 3-address machine does not necessarily have a performance advantage, because of this.

-- LA

General

0-address machine (reverse Polish)

1-address machine

2-address machine (?1.5-address?)

3-address machine

Register-file machine (3 register addresses)