CSCU9A5 Week 2

TODO

Read Chapter 3 of Clean Code

Contextual Analysis

public class Example {
    private int count;      // field: in scope throughout the class

    public int doStuff(int[] values) {
        int count = 0;      // local variable: shadows the field inside doStuff
        ...
    }
}

Here there are two variables with the name count. How does the compiler know which is which?
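As a quick check of what Java itself does with such a clash, here is a minimal sketch. The class ShadowDemo and its method are hypothetical names made up for illustration. Inside the method the innermost declaration wins, and the field can still be reached through this.

```java
// Hypothetical demo class (not from the lecture): shows which declaration
// the name "count" refers to at each point.
class ShadowDemo {
    private int count = 10;     // field, declared in the class block

    int[] bothCounts() {
        int count = 0;          // local, declared in the method block
        count++;                // plain "count" means the local (innermost declaration)
        // "this.count" names the field explicitly, so both values are visible here.
        return new int[] { count, this.count };
    }
}
```

Calling bothCounts() gives the local value first and the field value second, so the two variables never interfere.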

Identification Tables

We could walk back up the abstract syntax tree to figure out what needs to go where in memory (size, type etc.). That would work, but it's slow. (Think back to Tree Walks.) Instead, the compiler records each declaration in an identification (ID) table as it goes.

The ID table might contain a list of these attributes, or just a pointer to the place where the identifier was declared.

Declarations include functions, classes in Java, or even variable declarations such as int a.

A block is any part of the program that limits the scope of a declaration, such as a pair of curly brackets in Java.

Each declaration has a scope.

Monolithic Block Structure

A programming language exhibits a monolithic block structure if there is only one block:

All declarations are global in scope
No identifier may be declared more than once
For every reference to an identifier, i, there must be a corresponding declaration of i.

program
    D
begin
    C
end

One block for the whole program. The ID table may look like this:

| Identity | Attribute |
|----------|-----------|
| b        | (1)       |
| n        | (2)       |
| c        | (3)       |

program
(1) integer b = 10
(2) integer n
(3) char c 
begin
...
n = n * b
...
write c
...
end
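The ID table for a monolithic block structure can be sketched as a single map. This is only an illustration under assumptions: the class name MonolithicIdTable and the methods enter and retrieve are made up here, and the attribute is simplified to the declaration number used in the table above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ID table for a language with one global block.
class MonolithicIdTable {
    // identifier -> attribute (here just the declaration number, e.g. b -> 1)
    private final Map<String, Integer> entries = new HashMap<>();

    // Records a declaration. Returns false if the identifier is already
    // declared, since no identifier may be declared more than once.
    boolean enter(String id, int attribute) {
        if (entries.containsKey(id)) {
            return false;
        }
        entries.put(id, attribute);
        return true;
    }

    // Looks up a reference to an identifier. Returns null if there is no
    // corresponding declaration, which the compiler would report as an error.
    Integer retrieve(String id) {
        return entries.get(id);
    }
}
```

With b, n and c entered as (1), (2) and (3), the statement n = n * b needs two lookups, and both succeed. A second declaration of b, or a reference to an undeclared x, would be rejected.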

Flat Block Structure

Several disjoint blocks, with two scope levels: global and local.

Nested Block Structure

Most programming languages fall into this structure (Java, C and Python).

There are many scope levels: a declaration can be global in scope (scope level 1, the outermost scope) or local in scope (level 2 and deeper).

Example:

let
    (1) var a : Integer
    (2) var b : Boolean
in
    begin
        ....
        let
            (3) var b : Integer
            (4) var c : Boolean 
        in
        begin
            ....
            let
                (5) var d : Integer
            in
            ....
        end
    let
        (6) var d : Boolean
        (7) var e : Integer
    in
    ....
end

| Level | Identity | Attribute |
|-------|----------|-----------|
| 1     | a        | (1)       |
| 1     | b        | (2)       |
| 2     | b        | (3)       |
| 2     | c        | (4)       |
| 3     | d        | (5)       |
| 2     | d        | (6)       |
| 2     | e        | (7)       |
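One possible way to handle the levels is a stack of maps, one map per open block. This is a sketch under assumptions, not the lecture's implementation: NestedIdTable, openScope, closeScope, enter and retrieve are hypothetical names, and the attribute is again just the declaration number.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical ID table for a nested block structure.
class NestedIdTable {
    // One map per open scope; the innermost scope sits at the head of the deque.
    private final Deque<Map<String, Integer>> scopes = new ArrayDeque<>();

    NestedIdTable() {
        openScope();                        // scope level 1: the global scope
    }

    void openScope()  { scopes.push(new HashMap<>()); }   // entering a let
    void closeScope() { scopes.pop(); }                   // leaving that block

    int level() { return scopes.size(); }

    // Declares id in the current (innermost) scope.
    // Returns false if it is already declared at this level.
    boolean enter(String id, int attribute) {
        return scopes.peek().putIfAbsent(id, attribute) == null;
    }

    // Searches from the innermost scope outwards, so an inner declaration
    // hides an outer one with the same name. Returns null if undeclared.
    Integer retrieve(String id) {
        for (Map<String, Integer> scope : scopes) {   // head (innermost) first
            Integer attribute = scope.get(id);
            if (attribute != null) {
                return attribute;
            }
        }
        return null;
    }
}
```

Following the example: after entering a and b at level 1 and then b and c at level 2, a lookup of b finds declaration (3). Once that block is closed, b resolves to (2) again and c is no longer visible.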

What do we need to store instead of the identifiers?

Type Checking

This is a key feature in a statically typed language like Triangle or Java. It helps programmers avoid mistakes (such as assigning a string to an int). This type checking happens at compile time.
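A compile-time check of this kind can be sketched as a function from operand types to a result type. Everything here is a simplified assumption for illustration: the class TypeChecker, the Type enum and the tiny set of operators are hypothetical and do not reflect the actual Triangle or Java compilers.

```java
// Hypothetical, heavily simplified type checker.
class TypeChecker {
    enum Type { INTEGER, BOOLEAN, CHAR, ERROR }

    // Result type of "left op right" for a few operators;
    // ERROR means the operand types do not fit the operator.
    static Type checkBinary(String op, Type left, Type right) {
        boolean bothIntegers = left == Type.INTEGER && right == Type.INTEGER;
        switch (op) {
            case "+":
            case "*":
                return bothIntegers ? Type.INTEGER : Type.ERROR;
            case "<":
                return bothIntegers ? Type.BOOLEAN : Type.ERROR;
            default:
                return Type.ERROR;          // unknown operator
        }
    }

    // An assignment is well typed only if both sides have the same valid type.
    static boolean checkAssign(Type variable, Type expression) {
        return variable != Type.ERROR && variable == expression;
    }
}
```

For an assignment such as n = n * b with both variables declared as integers, checkBinary gives INTEGER and checkAssign accepts it. Assigning that result to a char variable would be rejected before the program ever runs.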

The picture is much more complex for dynamically typed languages such as JavaScript or Python!

Runtime Organisation: Data Representation

Principles to follow with data representation:

(1) non-confusion: different values of a given type should have different representations.

(2) uniqueness: each value should always have the same representation.

All values of a given type should occupy the same amount of space. In Java, for example, every int takes 4 bytes and every char takes 2 bytes, whatever value is stored.

The compiler can then efficiently work out where to place values in memory, as it knows how much space each one needs.

Should values be represented directly or indirectly?

If a variable is directly represented, its storage location holds the binary representation of the variable's value itself.

| Address | Value |
|---------|-------|  
| 0       | ?     | int x = 130
| 1       | ?     |    | (direct binary repr.)
| 2       | 130   |<---|
| 3       | ?     |
| ...     | ...   |

If a value is represented indirectly, the variable maps to a handle: a pointer to a storage area where the binary representation of the value lives (most likely in a heap memory area).

| Address | Value |
|---------|-------|  
| 0       | ?     | Object x = new Object()
| 1       | ?     |    | (indirect binary repr.)
| 2       | 160   |<---| (160 is an addr. in memory)
| 3       | ?     |
| ...     | ...   |
| 160     |Object |
| 161     |  data |
| 162     |  here |
| 163     |etc... |

Indirect representation is necessary for variables whose values vary in size by a large amount, e.g. dynamic arrays or objects. (Think malloc() in C.)
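The difference shows up in Java when a variable is copied. A minimal sketch, with the hypothetical class name RepresentationDemo: copying an int copies the value, while copying an array variable copies only the handle, so both variables end up referring to the same storage.

```java
// Hypothetical demo of direct versus indirect representation in Java.
class RepresentationDemo {
    // Direct: the variable holds the value itself, so assignment copies the value.
    static int directCopy() {
        int x = 130;
        int y = x;          // y gets its own copy of 130
        y = 7;              // changing y leaves x alone
        return x;
    }

    // Indirect: the variable holds a handle, so assignment copies the handle only.
    static int indirectCopy() {
        int[] a = { 130 };
        int[] b = a;        // b now refers to the same array as a
        b[0] = 7;           // the change is visible through a as well
        return a[0];
    }
}
```

Here directCopy() returns 130 and indirectCopy() returns 7.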

Primitive Types

Primitive types are stored directly, as this is more efficient than managing a pointer and the allocation of space in heap memory.

Integer, Boolean and Char types are supported directly by the target machine with corresponding operations like add, multiply etc.

Composite Types

Composite types are types which can be broken down into collections of primitive types.

Record

Record: a collection of fields, each of which has an identifier.

Records in Triangle
Structs in C, Go etc.
Java Class with only public variables (closest anyway...)

The simplest way to store a record is to place its fields consecutively in memory (like in a row).

type Date = record
        y : Integer
        m : Integer
        d : Integer
        end;

type Details = record
        manager : Boolean
        joined  : Date
        dep     : Char
        end;

This could be stored in memory like so:

var person : Details

| Address | Value |
|---------|-------|
| 1       | ??    |
| 2       | true  | <-- person.manager
| 3       | 1992  | <---  person.joined.y ]
| 4       | 3     | <---  person.joined.m ]> person.joined
| 5       | 1     | <---  person.joined.d ]
| 6       | 'c'   | <-- person.dep
| 7       | ??    |
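Because the fields sit in a row, the compiler can work out each field's offset from the start of the record just by adding up the sizes of the fields before it. A rough sketch, assuming every Integer, Boolean and Char takes one address as in the diagram. The class RecordLayout and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that lays out record fields consecutively.
class RecordLayout {
    private final Map<String, Integer> offsets = new LinkedHashMap<>();
    private int size = 0;                   // total size so far, in addresses

    // Adds a field of the given size directly after the previous one.
    RecordLayout field(String name, int fieldSize) {
        offsets.put(name, size);            // the field starts where the record currently ends
        size += fieldSize;
        return this;
    }

    // Offset of a field from the start of the record.
    int offsetOf(String name) {
        Integer offset = offsets.get(name);
        if (offset == null) {
            throw new IllegalArgumentException("no such field: " + name);
        }
        return offset;
    }

    int size() { return size; }

    // Layouts for the Date and Details records from the notes.
    static RecordLayout date() {
        return new RecordLayout().field("y", 1).field("m", 1).field("d", 1);
    }

    static RecordLayout details() {
        return new RecordLayout()
                .field("manager", 1)
                .field("joined", date().size())   // a nested record takes the size of Date
                .field("dep", 1);
    }
}
```

If person starts at address 2, then person.joined.m is at 2 + details().offsetOf("joined") + date().offsetOf("m"), and person.dep is at 2 + details().offsetOf("dep"). All of this is known at compile time, so no extra work is needed at runtime to find a field.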

Array

Array: consists of several elements which are all of the same type. Each element is referenced by an index (often an integer), and there is a one-to-one relationship between indexes and array elements. In most languages array indexes start at 0 (some terrible languages don't).

We are talking about static arrays (whose fixed size is defined at compile time), but some languages support dynamic arrays, whose length can vary at runtime.

For dynamic arrays, we need a handle for the array that records where its elements begin and end. The elements themselves live somewhere else in memory.

Static arrays could look like this:

| Address | Value |
|---------|-------|
| 1       | ??    |
| 2       | ??    |
| 3       | ??    |
| 4       | 'h'   |  a[0]
| 5       | 'e'   |  a[1]
| 6       | 'l'   |  a[2]
| 7       | 'l'   |  a[3]
| 8       | 'o'   |  a[4]
| 9       | '!'   |  a[5]
| 10      | ??    |
| 11      | ??    |
| ...     | ...   |

Accessing the elements of this array requires an extra computation at runtime compared to accessing a single variable or a record field.
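The extra computation is the usual base-plus-offset calculation: address of a[i] = base address + i * element size. A small sketch, where ArrayAddressing and elementAddress are hypothetical names and the bounds check is an optional extra that some languages perform at runtime.

```java
// Hypothetical helper showing the address calculation for a static array.
class ArrayAddressing {
    // base: address of a[0]; elementSize: addresses per element;
    // length: number of elements; index: which element is wanted.
    static int elementAddress(int base, int elementSize, int length, int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + " out of range");
        }
        return base + index * elementSize;
    }
}
```

For the array in the table (base address 4, one address per char, six elements), a[2] is at 4 + 2 * 1 = 6. The index is usually only known at runtime, which is why this cannot all be done by the compiler the way a record field offset can.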

Expression Evaluation

To summarise, broadly there are two approaches to storing temporary values: registers and stack. Registers are fixed locations that can be referenced directly, but this means a tricky process of choosing which registers to hold which values. Stacks grow and shrink to accommodate new values, but lead to simpler evaluation of expressions.
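The stack approach can be sketched with a small evaluator for postfix expressions. This is an illustration under assumptions, not a real code generator: StackEvaluator is a hypothetical class, and it only handles integer literals and the operators +, - and *.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stack-based evaluator for space-separated postfix expressions.
class StackEvaluator {
    static int evaluate(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : postfix.trim().split("\\s+")) {
            if (token.equals("+") || token.equals("-") || token.equals("*")) {
                int right = stack.pop();    // operands come off in reverse order
                int left = stack.pop();
                stack.push(apply(token, left, right));
            } else {
                stack.push(Integer.parseInt(token));   // a literal: just push it
            }
        }
        return stack.pop();                 // the final result is the only value left
    }

    private static int apply(String op, int left, int right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            default:  return left * right;  // only "*" reaches here
        }
    }
}
```

Evaluating 2 + 3 * 4, written in postfix as "2 3 4 * +", pushes 2, 3 and 4, replaces 3 and 4 with 12, then replaces 2 and 12 with 14. No decision about which register holds which temporary is ever needed, because the stack grows and shrinks as required.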

Summary of Key Concepts

Identification Table: associates identifiers with a list of attributes, or the original declaration
Declaration: something like a function declaration in Python, a class declaration in Java, or a variable declaration like “int a;” in Java
Scope: each declaration has a scope, which is the portion of the program over which the declaration takes effect.
Block: any part of the program that limits the scope of a declaration (e.g. curly brackets in Java, indentation in Python). In Triangle, scope is determined by the let...in... command.
Block structure: there are three types of block structure: monolithic, flat, and nested
Non-confusion: different values of a given type should have different representations
Uniqueness: each value should always have the same representation.

Two issues to remember in practice:

All values of a given type should occupy the same amount of space. If they do (that is, the same number of bits or bytes), the compiler can plan the allocation of space efficiently simply by knowing the type of each variable.
Should values be represented directly or indirectly?