CSCU9A5 Week 3
By the end of this week you should have the ability to:
- Explain the differences between static, stack, and heap storage allocation
- Describe the purpose of frames in a stack
- Explain why garbage collection is needed and describe an approach to carrying it out
- Describe what a template is in the context of code generation, and give simple examples
- Implement simple additions to the visitor pattern within a compiler
- Reflect on where and when good commenting is needed, and why it is valuable
Runtime Organisation: Static Storage Allocation
Memory is basically like a very long list with each element having an address and a value.
If we have a variable that might change in size, an indirect representation is used: a pointer. This means that the variable can occupy a fixed-size space in memory, because it just points to the address where the data is really stored (usually in the heap).
Stack storage allocation
Stacks are last-in-first-out data structures: think of a stack of paper where you can only add or remove from the top. Stacks are an effective way to store local variables throughout the lifetime of a program.
Stack Storage Allocation:
- Variables are stored in frames; each frame contains the local variables for a routine
- Global variables are stored at the base of the stack
- Link data at the start of each frame contains:
  - Static link (reference to the start of the frame of the routine containing the current one)
  - Dynamic link (reference to the start of the frame for the previously active routine)
  - Return address (reference to the code instruction to jump back to when the routine is finished)
- When a routine is called, a new frame is pushed onto the stack; when it returns, the frame is removed
- Arguments are placed on the stack immediately before a routine is called
- When a routine returns, its arguments and frame are replaced by a return value on the stack
Example of Routines and returns
let
  var g: Integer;
  func F(m: Integer, n: Integer): Integer ~
    m * n
in
  begin
    getint(var g);
    putint(F(g, g+1))
  end
PUSH 1
LOADA 0[SB]
CALL getint
LOAD 0[SB]
LOAD 0[SB]
LOADL 1
CALL add
CALL(SB) F
CALL putint
POP
HALT
F:
LOAD -2[LB]
LOAD -1[LB]
CALL mult
RETURN(1) 2
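The call to F in the listing above can be traced with a small Java model of the stack. This is only a sketch, not the real TAM interpreter: the class FrameDemo and its methods are hypothetical names, and it assumes the three words of link data sit at offsets 0, 1 and 2 from LB (static link, dynamic link, return address). It shows the arguments being pushed, a frame being created on the call, and the frame plus arguments being replaced by the result on return.

```java
import java.util.Arrays;

/** A toy model of stack frames (hypothetical, not the real TAM interpreter). */
final class FrameDemo {
    static int[] stack = new int[64];
    static int ST = 0;   // stack top: index of the first free word
    static int LB = 0;   // local base: start of the topmost frame

    static void push(int value) { stack[ST++] = value; }
    static int pop() { return stack[--ST]; }

    /** Pushes the link data and makes the new frame the current one. */
    static void call(int staticLink, int returnAddress) {
        int frameStart = ST;
        push(staticLink);      // 0[LB]: static link (assumed layout)
        push(LB);              // 1[LB]: dynamic link, i.e. the caller's frame
        push(returnAddress);   // 2[LB]: return address
        LB = frameStart;
    }

    /** Models RETURN(resultSize) argsSize and returns the saved return address. */
    static int ret(int resultSize, int argsSize) {
        int[] result = Arrays.copyOfRange(stack, ST - resultSize, ST);
        int returnAddress = stack[LB + 2];
        int callerFrame = stack[LB + 1];
        ST = LB - argsSize;    // discard the frame and the arguments below it
        LB = callerFrame;      // the caller's frame is topmost again
        for (int word : result) push(word);
        return returnAddress;
    }

    /** Simulates F(g, g+1) up to the point where F has returned. */
    static int runF(int g) {
        ST = 0;
        LB = 0;
        push(g);                 // global g lives at 0[SB]
        push(stack[0]);          // argument m = g
        push(stack[0] + 1);      // argument n = g + 1
        call(0, 8);              // CALL(SB) F: static link is SB, resume at instruction 8
        push(stack[LB - 2]);     // LOAD -2[LB] (m)
        push(stack[LB - 1]);     // LOAD -1[LB] (n)
        push(pop() * pop());     // CALL mult
        ret(1, 2);               // RETURN(1) 2
        return stack[ST - 1];    // the result has replaced the arguments
    }

    public static void main(String[] args) {
        System.out.println(runF(6)); // prints 42
    }
}
```

After the return, only the global g and the result remain on the stack, which is exactly what putint then consumes.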
Heap Storage Allocation and Garbage Collection
Heap storage allocation is another way to organise memory, and is good for indirect storage of variables.
A heap variable is created by a special command called an allocator: in Java this is the keyword new, and in C it is malloc(). An allocator returns a pointer to the new variable in the heap. The variable exists in memory until it is deallocated (explicitly via free() in C, or automatically by the garbage collector in Java).
The heap is placed at the opposite end of memory to the stack:
[ Stack ] <-SB
[ a ]
[ b ]
[ ...... ] <-ST
[ ]
[ 8 ] <-HT
[ H ]
[ ]
[ Heap ] <-HB
Heap Storage Allocation:
- Heaps can indirectly store more complex data structures
- Heap variables are added to the heap when they are created
- They are either removed by explicit deallocation, or automatically by a garbage collector
- Gaps appear in the heap over time; these can be managed by:
  - Trying to match new variables to the closest size of available gap
  - Merging gaps when variables are deallocated
  - Compacting the heap periodically
- A heap variable becomes garbage when it is inaccessible, because no pointers to it remain in the program
- Explicit deallocation can lead to garbage (a variable is never freed) or dangling pointers (a variable is freed while pointers to it remain)
- Automatic deallocation runs periodically, deallocating inaccessible variables
- Instance variables for objects have a pointer to a class object that defines what methods are applicable to them
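One common way to carry out automatic deallocation is mark-and-sweep; the notes above do not name a specific algorithm, so treat this as one possible approach rather than the one used by any particular system. The sketch below is a minimal Java model in which MarkSweepDemo, Cell, roots and collect are all hypothetical names. It marks every heap variable reachable from the pointers held in the stack and globals (the roots), then sweeps away everything left unmarked.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** A toy mark-and-sweep collector (illustrative only). */
final class MarkSweepDemo {
    /** A heap variable that may hold pointers to other heap variables. */
    static final class Cell {
        final String name;
        final List<Cell> pointers = new ArrayList<>();
        Cell(String name) { this.name = name; }
    }

    final List<Cell> heap = new ArrayList<>();   // every allocated heap variable
    final List<Cell> roots = new ArrayList<>();  // pointers held in the stack and globals

    /** Plays the role of an allocator such as new or malloc(). */
    Cell allocate(String name) {
        Cell cell = new Cell(name);
        heap.add(cell);
        return cell;
    }

    /** Mark phase: follow pointers from a cell, visiting each cell once. */
    private void mark(Cell cell, Set<Cell> marked) {
        if (!marked.add(cell)) {
            return; // already marked, which also stops cycles looping forever
        }
        for (Cell next : cell.pointers) {
            mark(next, marked);
        }
    }

    /** Sweep phase: remove unmarked cells; returns how many were deallocated. */
    int collect() {
        Set<Cell> marked = new HashSet<>();
        for (Cell root : roots) {
            mark(root, marked);
        }
        int before = heap.size();
        heap.removeIf(cell -> !marked.contains(cell));
        return before - heap.size();
    }

    public static void main(String[] args) {
        MarkSweepDemo gc = new MarkSweepDemo();
        Cell a = gc.allocate("a");
        Cell b = gc.allocate("b");
        gc.allocate("c");            // nothing ever points to c, so it is garbage
        a.pointers.add(b);
        gc.roots.add(a);
        System.out.println(gc.collect()); // prints 1
    }
}
```

Because reachability is computed from the roots, a group of variables that only point at each other is still collected once no root leads to it.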
Code Generation: Code Selection
Code Selection is the process of choosing the specific machine code instructions that are needed to represent a high-level code structure. Code Templates are used for this purpose, though it is complicated by dealing with special cases.
Storage Allocation is the process of deciding the addresses for each variable in the program. This is why we have spent some time discussing stack and heap memory.
We have parsed the program to an AST and applied contextual analysis to that AST to ensure that all the types are valid! We are now ready to generate the target code for our program.
n := n + 1
LOAD n
LOAD 1
ADD
STORE n
Adding one to a variable is a very common thing to do, so we can instead use a dedicated instruction:
LOAD n
SUCC
STORE n
SUCC is the successor instruction: it increases the value on the top of the stack by one. This uses 3 instructions instead of 4, so the code is 25% shorter and, assuming each instruction costs about the same, around 25% faster.
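The special case above can be sketched as a code template in Java. This is an illustration only: AssignTemplateDemo and assignAdd are hypothetical names, not part of the Triangle compiler, and the instructions are the simplified ones used in these notes. The general template emits four instructions, while the special case for adding the literal 1 emits three.

```java
import java.util.List;

/** A toy code template for "target := source + literal" (illustrative only). */
final class AssignTemplateDemo {
    static List<String> assignAdd(String target, String source, int literal) {
        if (literal == 1) {
            // Special case: SUCC replaces the "LOAD 1" and "ADD" pair.
            return List.of("LOAD " + source, "SUCC", "STORE " + target);
        }
        // General template: evaluate both operands, add them, store the result.
        return List.of("LOAD " + source, "LOAD " + literal, "ADD", "STORE " + target);
    }

    public static void main(String[] args) {
        System.out.println(assignAdd("n", "n", 1)); // prints [LOAD n, SUCC, STORE n]
        System.out.println(assignAdd("n", "n", 5)); // prints [LOAD n, LOAD 5, ADD, STORE n]
    }
}
```

Each special case like this makes the generated code better but adds another branch to the code generator, which is why special cases complicate code selection.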
The Triangle Abstract Machine
The Triangle Abstract Machine (TAM) is a virtual machine designed as the target for our case study compiler. It is also implemented in Java, and you have a copy of it by virtue of cloning the Triangle-Tools project. TAM has the following features:
- Its memory is organised to have a stack at the low-address end and a heap at the high-address end. Low-level operations are provided for adding and removing data in these.
- Call and return instructions for routines handle frames automatically; return automatically replaces the arguments on the stack with the result of the function.
- All registers are dedicated to specific purposes, as we’ve described: SB, ST, HB and HT locate the stack and heap, LB points to the topmost frame on the stack, and so on. These are updated automatically by the instructions that add or remove things from memory.
- Several routines such as ADD, MULT, and NOT are provided for basic arithmetic and logic operations. There are also routines for reading and writing text on the console.
A full description of TAM is given in the extracts from Programming Language Processors in Java in the Canvas Reading List. You don’t need to become familiar with this, but the text is there to serve as reference material and you will likely want to refer to it when reading and extending the compiler.
From Programming Language Processors in Java: Compilers and Interpreters by Watt, D.A
Yarr link be here! https://www.cin.ufpe.br/~jml/programming-language-processors-in-java-compilers-and-interpreters.9780130257864.25356.pdf
- Both stack and heap can expand and contract. Storage exhaustion arises when ST and HT attempt to cross over.
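The storage exhaustion check can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not TAM's own code: MemoryDemo, pushWords and allocate are hypothetical names, the stack grows upward from address 0, and HT is taken to be the lowest address currently used by the heap.

```java
/** A toy model of a stack and heap sharing one block of memory (illustrative only). */
final class MemoryDemo {
    final int[] memory;
    int ST;   // first free word above the stack
    int HT;   // lowest word in use by the heap (assumed convention)

    MemoryDemo(int size) {
        memory = new int[size];
        ST = 0;       // empty stack at the low-address end
        HT = size;    // empty heap at the high-address end
    }

    int freeWords() {
        return HT - ST;
    }

    /** Expands the stack by n words, failing if it would cross into the heap. */
    void pushWords(int n) {
        if (ST + n > HT) {
            throw new IllegalStateException("storage exhausted: stack would cross the heap");
        }
        ST += n;
    }

    /** Expands the heap by n words and returns the address of the new variable. */
    int allocate(int n) {
        if (HT - n < ST) {
            throw new IllegalStateException("storage exhausted: heap would cross the stack");
        }
        HT -= n;
        return HT;
    }

    public static void main(String[] args) {
        MemoryDemo m = new MemoryDemo(10);
        m.pushWords(4);
        System.out.println(m.allocate(5));  // prints 5
        System.out.println(m.freeWords());  // prints 1
    }
}
```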
Layout of a TAM frame
- The static link points to an underlying frame associated with the routine that textually encloses R (the routine this frame belongs to) in the source program.
- The dynamic link points to the frame immediately underlying this one in the stack.
- The return address is the address of the instruction immediately following the call instruction that activated R.
TAM instruction format
All TAM instructions have a common format.
- op: the operation code [4 bits]
- r: a register number [4 bits]
- n: the size of the operand [8 bits]
- d: address displacement (possibly negative) [16 bits, signed]
[ op ][ r ][ n ][ d ]
4 bits 4 bits 8 bits 16 bits
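The four fields add up to 32 bits, so one instruction fits in a Java int. The sketch below packs and unpacks the fields with shifts and masks. It is an illustration under assumptions: the class TamInstructionDemo is hypothetical, placing op in the most significant bits is assumed from the left-to-right order of the diagram, and the field values in main are example numbers rather than real TAM opcodes or register numbers.

```java
/** Packs the op, r, n and d fields into one 32-bit word (illustrative only). */
final class TamInstructionDemo {
    static int encode(int op, int r, int n, int d) {
        return ((op & 0xF) << 28)    // 4-bit operation code
             | ((r & 0xF) << 24)     // 4-bit register number
             | ((n & 0xFF) << 16)    // 8-bit operand size
             | (d & 0xFFFF);         // 16-bit displacement, two's complement
    }

    static int op(int word) { return (word >>> 28) & 0xF; }
    static int r(int word)  { return (word >>> 24) & 0xF; }
    static int n(int word)  { return (word >>> 16) & 0xFF; }

    /** The cast to short keeps the low 16 bits and sign-extends them. */
    static int d(int word)  { return (short) word; }

    public static void main(String[] args) {
        int word = encode(0, 8, 1, -2);   // example field values only
        System.out.println(op(word) + " " + r(word) + " " + n(word) + " " + d(word));
        // prints 0 8 1 -2
    }
}
```

The displacement is the only signed field, which is why it is decoded differently from the other three.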

