# CSCU9A5 Week 2

TODO

> Read Chapter 3 of Clean Code

## Contextual Analysis

```java
public class Example {
    private int count;               // field, visible throughout the class

    public int doStuff(int[] values) {
        int count = 0;               // local variable, hides the field inside this method
        ...
    }
}
```

Here there are two variables with the name **count**. How does the compiler know which is which?

### Identification Tables

We could walk back up the abstract syntax tree to find each identifier's declaration and figure out what needs to go where in memory (size, type etc.). That would work, but it's slow. (Think back to Tree Walks.) Instead, an identification (ID) table associates each identifier with its attributes.

The ID table might contain a list of these *attributes* or just a *pointer* to the place where the identifier was declared.

Declarations include function declarations, class declarations in Java, or even variable declarations such as `int a;`. Each declaration has a **scope**. A *block* is any part of the program that limits the scope of a declaration (e.g. curly brackets in Java).

### Monolithic Block Structure

A programming language exhibits a monolithic block structure if there is only one block:

* All declarations are **global in scope**
* No identifier may be declared more than once
* For every reference to an identifier, *i*, there must be a corresponding declaration of *i*.

```code
program
    D
begin
    C
end
```

One block for the **whole** program. The ID table may look like this:

| Identity | Attribute |
|----------|-----------|
| b        | (1)       |
| n        | (2)       |
| c        | (3)       |

```code
program
    (1) integer b = 10
    (2) integer n
    (3) char c
begin
    ...
    n = n * b
    ...
    write c
    ...
end
```

### Flat Block Structure

The program is divided into several disjoint blocks, giving two scope levels: local and global.

### Nested Block Structure

Most programming languages fall into this structure (Java, C and Python). There are many scope levels: declarations can be global in scope (scope level 1, the outermost scope) or local in scope.

Example:

```code
let
    (1) var a : Integer
    (2) var b : Boolean
in
    begin
        ....
        let
            (3) var b : Integer
            (4) var c : Boolean
        in
            begin
                ....
                let
                    (5) var d : Integer
                in ....
            end
        let
            (6) var d : Boolean
            (7) var e : Integer
        in ....
    end
```

| Level | Identity | Attribute |
|-------|----------|-----------|
| 1     | a        | (1)       |
| 1     | b        | (2)       |
| 2     | b        | (3)       |
| 2     | c        | (4)       |
| 3     | d        | (5)       |
| 2     | d        | (6)       |
| 2     | e        | (7)       |

What do we need to store instead of the identifiers?

## Type Checking

This is a key feature in a **statically** typed language like Triangle or Java. It helps programmers avoid mistakes (such as assigning a string to an int). This type checking happens at compile time.

The picture is much more complex for **dynamically** typed languages such as JavaScript or Python, where types are only known at runtime.

## Runtime Organisation: Data Representation

> Principles to follow with data representation:
> (1) nonconfusion: different values of a given type should have different representations
> (2) uniqueness: each value should always have the same representation.

All values of a given type should occupy the same amount of space. For example, an int might always occupy 8 bytes, while a boolean and a char each occupy 1 byte (the exact sizes depend on the language and the target machine). The compiler can then efficiently work out where to map the values in memory, as it knows how much space each one needs.

Should values be represented directly or indirectly?

If a variable is represented directly, it simply maps to the binary representation of the variable's value, stored at the variable's own location in memory.

```text
| Address | Value |
|---------|-------|
| 0       | ?     |      int x = 130
| 1       | ?     |       |  (direct binary repr.)
| 2       | 130   |<------|
| 3       | ?     |
| ...     | ...   |
```

If a value is represented indirectly, the variable maps to a **handle**: a pointer to a storage area where the binary representation of that value exists (most likely in a heap memory area).

```text
| Address | Value  |
|---------|--------|
| 0       | ?      |      Object x = new Object()
| 1       | ?      |       |  (indirect binary repr.)
| 2       | 160    |<------|  (160 is an addr. in memory)
| 3       | ?      |
| ...     | ...    |
| 160     | Object |
| 161     | data   |
| 162     | here   |
| 163     | etc... |
```

**Indirect representation** is necessary for variables whose values vary in size by a large amount, e.g.
dynamic arrays or objects. (Think `malloc()` in C.)

## Primitive Types

Primitive types are stored directly, as this is more efficient than managing a pointer and the allocation of space in heap memory.

Integer, Boolean and Char types are supported directly by the target machine, with corresponding operations like *add*, *multiply* etc.

## Composite Types

Composite types are types which can be broken down into collections of primitive types.

### Record

**Record**: a collection of variables, or fields, each of which has an identifier.

* Records in Triangle
* Structs in C, Go etc.
* Java class with only public variables (closest anyway...)

The simplest way to store a record is to place its fields consecutively in memory (like in a row).

```code
type Date = record
    y : Integer
    m : Integer
    d : Integer
end;

type Details = record
    manager : Boolean
    joined : Date
    dep : Char
end;
```

This could be stored in memory like so:

```text
var person : Details

| Address | Value |
|---------|-------|
| 1       | ??    |
| 2       | true  | <-- person.manager
| 3       | 1992  | <--- person.joined.y ]
| 4       | 3     | <--- person.joined.m ]> person.joined
| 5       | 1     | <--- person.joined.d ]
| 6       | 'c'   | <-- person.dep
| 7       | ??    |
```

### Array

**Array**: consists of several elements which are all of the same type. Each element is referenced by an index (often an integer), and there is a one-to-one relationship between indexes and array elements.

Arrays start at 0 (some terrible languages don't).

We are talking about static arrays (defined at compile time with a fixed size), but some languages support dynamic arrays whose length can vary at runtime. For a dynamic array, we need a handle that points to the beginning and end addresses of the elements, which are stored somewhere else in memory.

Static arrays could look like this:

```text
| Address | Value |
|---------|-------|
| 1       | ??    |
| 2       | ??    |
| 3       | ??    |
| 4       | 'h'   | a[0]
| 5       | 'e'   | a[1]
| 6       | 'l'   | a[2]
| 7       | 'l'   | a[3]
| 8       | 'o'   | a[4]
| 9       | '!'   | a[5]
| 10      | ??    |
| 11      | ??    |
| ...     | ...   |
```

Accessing the elements of this array requires an extra computation at runtime compared to accessing a single variable or a record field: the address of `a[i]` is the array's base address plus `i` times the element size, and `i` is generally not known until runtime.

## Expression Evaluation

To summarise, there are broadly two approaches to storing temporary values: registers and a stack. Registers are fixed locations that can be referenced directly, but this means a tricky process of choosing which registers hold which values. Stacks grow and shrink to accommodate new values, and lead to simpler evaluation of expressions.

---

## Summary of Key Concepts

* Identification Table: associates identifiers with a list of attributes, or the original declaration
* Declaration: something like a function declaration in Python, a class declaration in Java, or a variable declaration like `int a;` in Java
* Scope: each declaration has a scope, which is the portion of the program over which the declaration takes effect.
* Block: any part of the program that limits the scope of a declaration (e.g. curly brackets in Java, indentation in Python). In Triangle, scope is determined by the let...in... command.
* Block structure: there are three types of block structure: monolithic, flat, and nested
* Non-confusion: different values of a given type should have different representations
* Uniqueness: each value should always have the same representation.

Two issues to remember in practice:

* All values of a given type should occupy the same amount of space
* Should values be represented directly or indirectly?

If all values of a given type occupy the same space (that is, the same number of bits or bytes), it is possible for the compiler to plan the allocation of space efficiently simply by knowing the type of each variable.
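
The last point can be illustrated with a small sketch. It is not how any particular compiler does it: the class name `StaticAllocator`, the method names and the byte sizes (8 for `int`, 1 for `boolean` and `char`, the example figures used in these notes) are all assumptions made for illustration, and alignment and padding are ignored. Given only the declared type of each variable, it hands out consecutive offsets, using the declarations `b`, `n` and `c` from the monolithic block example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: plans storage for variables using nothing but their declared types. */
final class StaticAllocator {

    /** Hypothetical fixed size in bytes for each type (assumed figures, not a real target machine). */
    static int sizeOf(String type) {
        switch (type) {
            case "int":     return 8;
            case "boolean": return 1;
            case "char":    return 1;
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    /**
     * Takes declarations as {type, name} pairs and returns each variable's
     * offset from the start of the storage area, in declaration order.
     */
    static Map<String, Integer> allocate(String[][] declarations) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int next = 0;                      // next free offset
        for (String[] decl : declarations) {
            String type = decl[0];
            String name = decl[1];
            offsets.put(name, next);       // the variable starts at the next free offset
            next += sizeOf(type);          // every value of this type needs the same space
        }
        return offsets;
    }

    public static void main(String[] args) {
        String[][] declarations = { {"int", "b"}, {"int", "n"}, {"char", "c"} };
        System.out.println(allocate(declarations)); // prints {b=0, n=8, c=16}
    }
}
```

Because every value of a type has the same size, the offsets are fixed before the program runs. A variable whose size can change at runtime would break this scheme, which is why such values are represented indirectly through a fixed-size handle.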