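```java
public class Example {
    private int count;                 // field: one per Example object

    public int doStuff(int[] values) {
        int count = 0;                 // local variable: shadows the field
        // ...
        return count;                  // this count is the local one
    }
}
```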
Here there are two variables named count: a field and a local variable. How does the compiler know which is which?
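Whenever the compiler meets a use of an identifier, it could walk back up the abstract syntax tree to find the matching declaration and work out what it needs (type, size in memory, etc.). That would work, but it is slow (think back to tree walks). Instead, the compiler records each declaration in an identification (ID) table as it is processed.
For each identifier, the ID table might store a list of its attributes, or just a pointer to the place where the identifier was declared.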
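Declarations introduce identifiers: function declarations, class declarations in Java, or even simple variable declarations such as "int count;".
A block is any part of the program that limits the scope of a declaration (in Java, for example, the code between a pair of curly braces).
Each declaration has a scope: the part of the program in which it takes effect.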
A programming language exhibits a monolithic block structure if there is only one block:
```
program
    D
begin
    C
end
```
There is one block for the whole program (D stands for the declarations and C for the commands), so every declaration is global. The ID table may look like this:
| Identifier | Attribute |
|------------|-----------|
| b          | (1)       |
| n          | (2)       |
| c          | (3)       |
```
program
    (1) integer b = 10
    (2) integer n
    (3) char c
begin
    ...
    n = n * b
    ...
    write c
    ...
end
```
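With only one block, the ID table can be a single map from identifier to attributes. The Java sketch below assumes a hypothetical Decl record that stands in for the "pointer to the declaration" (here just its label and type). Because every declaration shares the one global scope, this sketch rejects a second declaration of the same name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical attribute record: stands in for a pointer to the declaration.
record Decl(String label, String type) {}

class MonolithicIdTable {
    private final Map<String, Decl> table = new HashMap<>();

    // Enter a declaration; with a single block, redeclaring a name is an error.
    void enter(String id, Decl decl) {
        if (table.putIfAbsent(id, decl) != null) {
            throw new IllegalStateException("identifier " + id + " already declared");
        }
    }

    // Look up a use of an identifier; null means it was never declared.
    Decl retrieve(String id) {
        return table.get(id);
    }

    public static void main(String[] args) {
        MonolithicIdTable ids = new MonolithicIdTable();
        ids.enter("b", new Decl("(1)", "integer"));
        ids.enter("n", new Decl("(2)", "integer"));
        ids.enter("c", new Decl("(3)", "char"));
        System.out.println(ids.retrieve("n")); // Decl[label=(2), type=integer]
    }
}
```

Lookup is a single hash-map access, which is why an ID table beats walking the AST for every use of an identifier.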
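A programming language exhibits a nested block structure if blocks can overlap, i.e. be nested inside one another, giving both local and global scopes.
Most programming languages fall into this structure (e.g. Java, C and Python).
There can be many scope levels: a declaration can be global in scope (scope level 1, the outermost scope) or local to an inner block.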
Example:
```
let
    (1) var a : Integer
    (2) var b : Boolean
in
    begin
        ....
        let
            (3) var b : Integer
            (4) var c : Boolean
        in
            begin
                ....
                let
                    (5) var d : Integer
                in
                    ....
            end
        let
            (6) var d : Boolean
            (7) var e : Integer
        in
            ....
    end
```
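| Level | Identifier | Attribute |
|-------|------------|-----------|
| 1     | a          | (1)       |
| 1     | b          | (2)       |
| 2     | b          | (3)       |
| 2     | c          | (4)       |
| 3     | d          | (5)       |
| 2     | d          | (6)       |
| 2     | e          | (7)       |

This table lists every entry made while analysing the example. In practice, the entries of a block are removed when that block closes, so (3), (4) and (5) are already gone by the time (6) and (7) are entered.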
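A nested ID table can be sketched as a stack of entries tagged with their scope level: openScope and closeScope bracket each block, and retrieve searches newest-first so the innermost declaration wins. The same innermost-first rule is what lets a compiler tell the local count from the field count in the Java example. The class, record and method names are hypothetical, chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical entry: identifier, scope level, and a label for its declaration.
record Entry(String id, int level, String decl) {}

class NestedIdTable {
    private final Deque<Entry> entries = new ArrayDeque<>(); // newest entry first
    private int level = 0;

    void openScope() { level++; }

    // Discard every entry declared at the current level, then leave it.
    void closeScope() {
        while (!entries.isEmpty() && entries.peekFirst().level() == level) {
            entries.removeFirst();
        }
        level--;
    }

    // Redeclaring a name is only an error within the same level.
    void enter(String id, String decl) {
        for (Entry e : entries) {
            if (e.level() < level) break; // reached entries of enclosing blocks
            if (e.id().equals(id)) {
                throw new IllegalStateException(id + " already declared at level " + level);
            }
        }
        entries.addFirst(new Entry(id, level, decl));
    }

    // Search newest-first, so the innermost visible declaration is found.
    Entry retrieve(String id) {
        for (Entry e : entries) {
            if (e.id().equals(id)) return e;
        }
        return null;
    }

    public static void main(String[] args) {
        NestedIdTable t = new NestedIdTable();
        t.openScope();                 // level 1
        t.enter("a", "(1)");
        t.enter("b", "(2)");
        t.openScope();                 // level 2
        t.enter("b", "(3)");
        t.enter("c", "(4)");
        System.out.println(t.retrieve("b").decl()); // (3): inner b shadows outer b
        t.closeScope();                // (3) and (4) are removed
        System.out.println(t.retrieve("b").decl()); // (2)
    }
}
```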
What do we need to store instead of the identifiers?
Type checking is a key feature of statically typed languages like Triangle or Java. It helps programmers avoid mistakes such as assigning a string to an int, and it happens at compile time.
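As a sketch of what such a compile-time check might look like, the Java code below checks an assignment against the variable's declared type. The Type enum, the AssignChecker class and the map of declared types are all hypothetical; a real checker would get the declared type from the ID table and compute the expression's type from the AST.

```java
import java.util.Map;

// Hypothetical set of types for a toy statically typed language.
enum Type { INT, BOOL, CHAR, STRING }

class AssignChecker {
    // Declared types of variables, as the ID table would record them.
    private final Map<String, Type> declared;

    AssignChecker(Map<String, Type> declared) { this.declared = declared; }

    // Compile-time check for "name := expr", given the expression's type.
    void checkAssign(String name, Type exprType) {
        Type varType = declared.get(name);
        if (varType == null) {
            throw new IllegalStateException(name + " is not declared");
        }
        if (varType != exprType) {
            throw new IllegalStateException(
                "cannot assign " + exprType + " to " + name + " of type " + varType);
        }
    }

    public static void main(String[] args) {
        AssignChecker checker = new AssignChecker(Map.of("n", Type.INT));
        checker.checkAssign("n", Type.INT); // accepted
        try {
            checker.checkAssign("n", Type.STRING);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // cannot assign STRING to n of type INT
        }
    }
}
```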
The picture is much more complex for dynamically typed languages such as JavaScript or Python, where types are checked at run time instead!
Principles to follow for data representation:
(1) Non-confusion: different values of a given type should have different representations.
(2) Uniqueness: each value should always have the same representation.
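To see what non-confusion rules out, here is a sketch of a deliberately bad representation, assuming we tried to store every Java int in a single byte. The cast keeps only the low 8 bits, so distinct values collide; the method name represent is hypothetical.

```java
class ByteRepresentation {
    // Squeeze an int into one byte: only the low 8 bits survive.
    static byte represent(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        // Non-confusion fails: two different ints share one representation.
        System.out.println(represent(0) == represent(256)); // true
        // Uniqueness holds: the same value always gets the same representation.
        System.out.println(represent(130) == represent(130)); // true
    }
}
```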
All values of a given type should occupy the same amount of space. For example, on a particular target machine every int might take 4 bytes and every boolean or char 1 byte; the exact sizes depend on the language and the machine (in Java, for instance, an int always occupies 4 bytes and a char 2 bytes).
The compiler can then work out efficiently where to place values in memory, because it knows from each variable's type how much space it needs.
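A minimal sketch of that planning step: given only each variable's type, the compiler can hand out consecutive addresses. The sizes below are assumptions for a hypothetical target machine, and alignment padding is ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StorageAllocator {
    // Assumed sizes in bytes for a hypothetical target machine.
    static int sizeOf(String type) {
        return switch (type) {
            case "int" -> 4;
            case "boolean", "char" -> 1;
            default -> throw new IllegalArgumentException("unknown type " + type);
        };
    }

    // Give each variable the next free address, in declaration order.
    static Map<String, Integer> allocate(Map<String, String> vars, int start) {
        Map<String, Integer> addresses = new LinkedHashMap<>();
        int next = start;
        for (Map.Entry<String, String> v : vars.entrySet()) {
            addresses.put(v.getKey(), next);
            next += sizeOf(v.getValue());
        }
        return addresses;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("b", "int");
        vars.put("flag", "boolean");
        vars.put("c", "char");
        System.out.println(allocate(vars, 0)); // {b=0, flag=4, c=5}
    }
}
```

This only works because every value of a type has the same size; a variable whose size could change at run time would need the indirect representation described next.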
Should values be represented directly or indirectly?
If a value is represented directly, the variable's own storage location holds the binary representation of the value itself (for a local variable, typically in the stack frame rather than on the heap).
```
| Address | Value |
|---------|-------|
| 0       | ?     |   int x = 130
| 1       | ?     |     |   (direct binary repr.)
| 2       | 130   | <---|
| 3       | ?     |
| ...     | ...   |
```
If a value is represented indirectly, the variable holds a handle: a pointer to a storage area (usually on the heap) where the binary representation of the value is kept.
```
| Address | Value  |
|---------|--------|
| 0       | ?      |   Object x = new Object()
| 1       | ?      |     |   (indirect binary repr.)
| 2       | 160    | <---|   (160 is an address in memory)
| 3       | ?      |
| ...     | ...    |
| 160     | Object |
| 161     | data   |
| 162     | here   |
| 163     | etc... |
```
Indirect representation is necessary for variables whose values can vary greatly in size, e.g. dynamic arrays or objects (think malloc() in C).
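Java makes the distinction visible at the language level: a variable of primitive type holds its value directly, while an array or object variable holds a reference (a handle). The sketch below shows the consequence when a variable is copied; the method names are illustrative only, and this describes Java's semantics rather than how a particular JVM lays out memory.

```java
class DirectVsIndirect {
    // int is represented directly: assignment copies the value itself.
    static int copyInt() {
        int x = 130;
        int y = x;
        y = y + 1;     // changes only y's own storage
        return x;
    }

    // An array is represented indirectly: assignment copies the handle.
    static int copyArrayHandle() {
        int[] a = {1, 2, 3};
        int[] b = a;   // b now refers to the same storage area as a
        b[0] = 99;
        return a[0];
    }

    public static void main(String[] args) {
        System.out.println(copyInt());         // 130
        System.out.println(copyArrayHandle()); // 99
    }
}
```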
Primitive types are stored directly, as this is more efficient than managing a pointer and allocating space in heap memory.
Integer, Boolean and Char values are supported directly by the target machine, which provides the corresponding operations (add, multiply, etc.).
Composite types are types whose values are made up of simpler values and can ultimately be broken down into collections of primitive values (e.g. records and arrays).
Record: a collection of fields, each of which has an identifier.
The simplest way to store a record in memory is to store its fields consecutively (like in a row).
```
type Date = record
    y : Integer
    m : Integer
    d : Integer
end;

type Details = record
    manager : Boolean
    joined : Date
    dep : Char
end;
```
For a variable declared as `var person : Details`, this could be stored in memory like so (each field of primitive type taking one storage unit):
```
| Address | Value |
|---------|-------|
| 1       | ??    |
| 2       | true  | <--- person.manager
| 3       | 1992  | <--- person.joined.y  ]
| 4       | 3     | <--- person.joined.m  ]> person.joined
| 5       | 1     | <--- person.joined.d  ]
| 6       | 'c'   | <--- person.dep
| 7       | ??    |
```
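Because record fields are stored consecutively and every field's size is known from its type, each field sits at a fixed offset from the start of the record, and that offset is known at compile time. A minimal sketch, assuming (as in the layout above) that every primitive value takes one storage unit; the RecordLayout class and its maps are hypothetical stand-ins for the compiler's type information.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RecordLayout {
    // Assumption: every primitive value (Integer, Boolean, Char) takes one unit.
    static final int UNIT = 1;

    // Fields of Date and Details in declaration order, with sizes in units.
    static final Map<String, Integer> DATE = new LinkedHashMap<>();
    static final Map<String, Integer> DETAILS = new LinkedHashMap<>();
    static {
        DATE.put("y", UNIT);
        DATE.put("m", UNIT);
        DATE.put("d", UNIT);
        DETAILS.put("manager", UNIT);
        DETAILS.put("joined", size(DATE)); // a nested record takes its whole size
        DETAILS.put("dep", UNIT);
    }

    // A record's size is the sum of its field sizes.
    static int size(Map<String, Integer> fields) {
        int total = 0;
        for (int s : fields.values()) total += s;
        return total;
    }

    // A field's offset is the total size of the fields declared before it.
    static int offset(Map<String, Integer> fields, String name) {
        int off = 0;
        for (Map.Entry<String, Integer> f : fields.entrySet()) {
            if (f.getKey().equals(name)) return off;
            off += f.getValue();
        }
        throw new IllegalArgumentException("no field " + name);
    }

    public static void main(String[] args) {
        int person = 2; // address of person in the layout above
        int m = person + offset(DETAILS, "joined") + offset(DATE, "m");
        System.out.println(m);             // 4
        System.out.println(size(DETAILS)); // 5
    }
}
```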
Array: consists of several elements which are all of the same type. Each element is referenced by an index (often an integer), and there is a one-to-one relationship between indexes and array elements. In most languages, indexes start at 0 (some terrible languages don't).
Here we are talking about static arrays (whose size is fixed at compile time), but some languages support dynamic arrays, whose length can vary at run time.
A dynamic array therefore needs an indirect representation: a handle that points to where the elements begin and end, with the elements themselves stored somewhere else in memory.
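The handle idea for dynamic arrays can be sketched in Java under stated assumptions: memory is simulated by a hypothetical MEMORY array with a simple bump allocator that never frees, and the handle records a start address and a length (equivalent to start and end addresses). Growing the array allocates a new area and copies the elements, so only the handle stored in the variable changes, and it always has the same size.

```java
class DynamicArrayHandle {
    // Hypothetical flat memory; array elements live here, not in the handle.
    static final char[] MEMORY = new char[32];
    static int nextFree = 0; // bump allocator: hands out fresh space, never frees

    // The handle is small and fixed-size: where the elements start, and how many.
    record Handle(int start, int length) {}

    static Handle allocate(int length) {
        if (nextFree + length > MEMORY.length) throw new IllegalStateException("out of memory");
        Handle h = new Handle(nextFree, length);
        nextFree += length;
        return h;
    }

    static void checkIndex(Handle h, int i) {
        if (i < 0 || i >= h.length()) throw new IndexOutOfBoundsException(i);
    }

    static char get(Handle h, int i) {
        checkIndex(h, i);
        return MEMORY[h.start() + i];
    }

    static void set(Handle h, int i, char c) {
        checkIndex(h, i);
        MEMORY[h.start() + i] = c;
    }

    // Growing copies the elements to a new area; only the handle changes.
    static Handle grow(Handle h, int newLength) {
        if (newLength < h.length()) throw new IllegalArgumentException("cannot shrink");
        Handle bigger = allocate(newLength);
        System.arraycopy(MEMORY, h.start(), MEMORY, bigger.start(), h.length());
        return bigger;
    }

    public static void main(String[] args) {
        Handle a = allocate(2);
        set(a, 0, 'h');
        set(a, 1, 'i');
        a = grow(a, 3);   // the variable a now holds a different handle
        set(a, 2, '!');
        System.out.println("" + get(a, 0) + get(a, 1) + get(a, 2)); // hi!
    }
}
```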
Static arrays could look like this:
```
| Address | Value |
|---------|-------|
| 1       | ??    |
| 2       | ??    |
| 3       | ??    |
| 4       | 'h'   | a[0]
| 5       | 'e'   | a[1]
| 6       | 'l'   | a[2]
| 7       | 'l'   | a[3]
| 8       | 'o'   | a[4]
| 9       | '!'   | a[5]
| 10      | ??    |
| 11      | ??    |
| ...     | ...   |
```
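Accessing an element of this array requires an extra computation at run time compared with accessing a single variable or a record field: the element's address depends on an index whose value is usually only known at run time, whereas a record field's offset is fixed at compile time.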
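That extra computation is an address calculation, address(a[i]) = address(a[0]) + i × element size, which languages such as Java pair with a bounds check. A minimal sketch using the array from the layout above; the method and parameter names are illustrative.

```java
class ArrayAddressing {
    // address(a[i]) = base + i * elementSize, after checking i against the bounds.
    static int elementAddress(int base, int elementSize, int length, int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + " out of bounds for length " + length);
        }
        return base + i * elementSize;
    }

    public static void main(String[] args) {
        // The array above: a[0] at address 4, one unit per char, 6 elements.
        System.out.println(elementAddress(4, 1, 6, 3)); // 7, where a[3] = 'l'
    }
}
```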
To summarise, there are broadly two approaches to storing temporary values: registers and a stack. Registers are fixed locations that can be referenced directly, but the compiler then faces the tricky task of choosing which registers should hold which values. A stack grows and shrinks to accommodate new values, which makes evaluating expressions simpler.
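To illustrate why a stack keeps expression evaluation simple: there are no registers to choose, since every intermediate result just goes on top of the stack. A minimal Java sketch, assuming the expression has already been converted to postfix (reverse Polish) form; it mirrors how a stack machine evaluates expressions but is not any particular machine's instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackEvaluator {
    // Operands are pushed; each operator pops two operands and pushes the result.
    static int eval(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String tok : postfix.trim().split("\\s+")) {
            switch (tok) {
                case "+", "-", "*" -> {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(switch (tok) {
                        case "+" -> left + right;
                        case "-" -> left - right;
                        default -> left * right;
                    });
                }
                default -> stack.push(Integer.parseInt(tok));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4 in postfix form
        System.out.println(eval("2 3 + 4 *")); // 20
    }
}
```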
Identification Table: associates identifiers with a list of attributes, or with the original declaration.
Declaration: something like a function declaration in Python, a class declaration in Java, or a variable declaration like “int a;” in Java.
Scope: each declaration has a scope, which is the portion of the program in which the declaration takes effect.
Block: any part of the program that limits the scope of a declaration (e.g. a pair of curly brackets in Java, or a function or class body in Python, where indented if and for blocks do not create a new scope). In Triangle, the let ... in ... command introduces a block.
Block structure: there are three types of block structure: monolithic, flat, and nested.
Non-confusion: different values of a given type should have different representations.
Uniqueness: each value should always have the same representation.
Two issues to remember in practice:
(1) All values of a given type should occupy the same amount of space. If all values of a given type occupy the same space (that is, the same number of bits or bytes), it is possible for the compiler to plan the allocation of space efficiently simply by knowing the type of each variable.
(2) Should values be represented directly or indirectly?