CSCU9A5 Week 4

Read: Chapter 5 of Clean Code

Variable declarations: Martin suggests that declarations should be as close to the usage of the variables as possible. Others feel they should all be at the top, or at least together somewhere. What's your opinion?

When it comes to variables within a program - it's important to remember 'scope'. We need to make it clear what scope the variables are in.

If we are declaring variables at the top of the program (not in a main but at the very top), I would consider these global variables. They will be used throughout the entire program and are therefore rather important, e.g. a Window object. Such a variable will be altered throughout the program.

We've talked about global variables - let's talk about local ones.

I would say there are two types of local variables: ones used within a method and ones used within loops.

Method-level locals are the ones you would declare at the start of a function. These could be used to calculate a sum and then be returned. There is no need to put the variable elsewhere. In this situation, I believe you should use the variable right after declaring it.

Sometimes it's helpful to declare all the variables at the start - this helps give context to what the function will be doing and possibly what it is returning without the need of comments. I find this helpful for functions that handle complex maths equations.

The variables used within loops should come before the loop starts; this gives me as a programmer better control over the scope and also context. An example could be line number, character position and file name before a for loop. I know that these three variables will be used in this for loop for keeping track of where we are in the loop.
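As a rough illustration of the loop case above, here is a minimal Java sketch. The class name, method name and sample data are hypothetical; the point is only that the tracking variables sit together immediately before the loop that uses them.

```java
import java.util.List;

class PositionTracker {
    // Returns "file:line:column" for the first occurrence of target,
    // or null if target does not appear in any line.
    static String find(String fileName, List<String> lines, char target) {
        // Declared together, right before the loop that uses them.
        int lineNumber = 0;
        int charPosition = -1;

        for (String line : lines) {
            lineNumber++;
            charPosition = line.indexOf(target);
            if (charPosition >= 0) {
                // Columns are reported 1-based, like line numbers.
                return fileName + ":" + lineNumber + ":" + (charPosition + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(find("notes.txt", List.of("abc", "xyz"), 'y'));
    }
}
```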

Do you have any of your own formatting rules you follow (even if only sometimes?)

I try to reduce the amount of nesting I do - this falls under refactoring. I also like to keep using {} around one-line if statements.
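One common way to reduce nesting is to return early. The sketch below is only an example of that style, with hypothetical method names; both versions behave the same, and both keep {} around single-line bodies.

```java
class NestingDemo {
    // Nested version: every condition adds another level of indentation.
    static String describeNested(String name) {
        if (name != null) {
            if (!name.isEmpty()) {
                return "Hello, " + name;
            } else {
                return "empty";
            }
        } else {
            return "missing";
        }
    }

    // Flattened version: the special cases return early,
    // so the main path is not indented at all.
    static String describeFlat(String name) {
        if (name == null) {
            return "missing";
        }
        if (name.isEmpty()) {
            return "empty";
        }
        return "Hello, " + name;
    }

    public static void main(String[] args) {
        System.out.println(describeFlat("Ada"));
    }
}
```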

Code Generation: Algorithm

The Visitor Pattern is used to generate the necessary low level instructions.

The algorithm we are using follows the visitor pattern. We can already see this in the visualisation of the AST.

We specifically write visitNode methods to view the tree.

The Visitor pattern walks the AST calling emit() methods to generate machine code instructions as it goes. Lookups into the AST are used to decide things like the value of literals or specific operators to use.
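The walk described above can be sketched in a few lines of Java. This is not the Triangle compiler's code: the node classes, the visitor interface and the string-based emit() are hypothetical simplifications, and "add" is assumed as the name of the addition routine alongside the "mult" used in these notes.

```java
import java.util.ArrayList;
import java.util.List;

// A tiny, hypothetical AST: integer literals and binary operations.
interface AstNode {
    void accept(AstVisitor v);
}

interface AstVisitor {
    void visitIntLit(IntLit n);
    void visitBinOp(BinOp n);
}

final class IntLit implements AstNode {
    final int value;
    IntLit(int value) { this.value = value; }
    public void accept(AstVisitor v) { v.visitIntLit(this); }
}

final class BinOp implements AstNode {
    final char op;
    final AstNode left;
    final AstNode right;
    BinOp(char op, AstNode left, AstNode right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    public void accept(AstVisitor v) { v.visitBinOp(this); }
}

// Walks the tree and emits one instruction string per step.
final class EmitVisitor implements AstVisitor {
    final List<String> code = new ArrayList<>();

    private void emit(String instruction) { code.add(instruction); }

    public void visitIntLit(IntLit n) {
        // Lookup into the AST node decides the literal's value.
        emit("LOADL " + n.value);
    }

    public void visitBinOp(BinOp n) {
        // Operands first, then the operator (post-order walk).
        n.left.accept(this);
        n.right.accept(this);
        emit("CALL " + (n.op == '*' ? "mult" : "add"));
    }
}

class VisitorDemo {
    static List<String> generate(AstNode root) {
        EmitVisitor visitor = new EmitVisitor();
        root.accept(visitor);
        return visitor.code;
    }

    public static void main(String[] args) {
        AstNode tree = new BinOp('*', new IntLit(2), new IntLit(10));
        System.out.println(generate(tree));
    }
}
```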

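Backpatching is used for forward jumps: the jump instruction is emitted before its target address is known, and the address is filled in once the target has been reached (look at the visitIf command).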
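A minimal sketch of backpatching for an if/else, assuming a made-up instruction list rather than the real TAM instruction set. The opcode names (EVAL_COND, JUMPIF_FALSE, THEN_BODY, ELSE_BODY) are placeholders standing in for whatever code the condition and branches would generate.

```java
import java.util.ArrayList;
import java.util.List;

class BackpatchDemo {
    // Hypothetical instruction: an opcode plus one operand that can be patched.
    static final class Instr {
        final String op;
        int operand;
        Instr(String op, int operand) {
            this.op = op;
            this.operand = operand;
        }
        @Override
        public String toString() { return op + " " + operand; }
    }

    final List<Instr> code = new ArrayList<>();

    // Appends an instruction and returns its address (its index in the list).
    int emit(String op, int operand) {
        code.add(new Instr(op, operand));
        return code.size() - 1;
    }

    // Address the next emitted instruction will get.
    int nextAddr() { return code.size(); }

    // Fills in the target of a jump that was emitted earlier.
    void patch(int jumpAddr, int target) {
        code.get(jumpAddr).operand = target;
    }

    // Generates code for: if <cond> then <then part> else <else part>
    void genIf() {
        emit("EVAL_COND", 0);
        int jumpIfFalse = emit("JUMPIF_FALSE", -1); // target not known yet
        emit("THEN_BODY", 0);
        int jumpOverElse = emit("JUMP", -1);        // target not known yet
        patch(jumpIfFalse, nextAddr());             // else part starts here
        emit("ELSE_BODY", 0);
        patch(jumpOverElse, nextAddr());            // first address after the if
    }

    public static void main(String[] args) {
        BackpatchDemo demo = new BackpatchDemo();
        demo.genIf();
        demo.code.forEach(System.out::println);
    }
}
```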

Code Generation: Resource Allocation

let
    const b ~ 10;
    var i : Integer
in
    i := i * b

b is bound to 10 and i is bound to an address large enough to hold an integer.

When b is used in the program, the compiler should translate it to the literal 10. Each time i is used, it should be translated to a memory address.

In this example, the address for i is 4. The machine code could look like this:

LOAD    4[SB]
LOADL   10
CALL    mult    
STORE   4[SB]
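To trace what these four instructions do, here is a toy stack machine in Java. It is a simplification and not the real TAM interpreter: it keeps the evaluation stack separate from the data store, ignores the [SB] register part of the address, and treats every CALL as mult.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyStackMachine {
    // Runs the program against the given data store and returns the store.
    static int[] run(String[][] program, int[] store) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String[] instr : program) {
            switch (instr[0]) {
                case "LOAD":   // push the word held at an address
                    stack.push(store[Integer.parseInt(instr[1])]);
                    break;
                case "LOADL":  // push a literal
                    stack.push(Integer.parseInt(instr[1]));
                    break;
                case "CALL":   // only mult is modelled in this sketch
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                case "STORE":  // pop the top of the stack into an address
                    store[Integer.parseInt(instr[1])] = stack.pop();
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction " + instr[0]);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        int[] store = new int[8];
        store[4] = 3; // pretend i currently holds 3
        String[][] program = {
            { "LOAD", "4" }, { "LOADL", "10" }, { "CALL", "mult" }, { "STORE", "4" }
        };
        System.out.println(run(program, store)[4]); // prints 30
    }
}
```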

In any declaration, identifiers can be bound to values or addresses, and these might be known or unknown at compile time. So at any declaration there is one of four possible cases: known value, unknown value, known address or unknown address.

Within the Triangle Compiler

Known Value

...
public KnownValue(int size, int value){ ... }
...
public void encodeFetch(Emitter ...){
 emitter.emit(OpCode.LOADL, 0, value);
}
...

A Known value (KnownValue.java) is simple. Size is the amount of memory taken up by this value and Value is the value itself. encodeFetch loads the literal value onto the stack (0).

Unknown Value

...
public UnknownValue(int size, int level, int displacement){ ... }
...
public void encodeFetch(Emitter ...){
 if (vname.indexed){
  emitter.emit(OpCode.LOADA ...
  emitter.emit(OpCode.CALL ...
  emitter.emit(OpCode.LOADI ...
 } else {
  emitter.emit(OpCode.LOAD ...
 }
}
...

An Unknown Value is made up of two parts - level and displacement. Level is how deeply nested the routine containing this declaration is. Displacement is where the entity is located relative to the base of the frame (i.e. how many words it is from the start of the frame).

We can use the current frame size as the displacement; that's the top of the stack.
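One way to read this is that each new variable is placed at the current top of the frame, and the frame then grows by the variable's size. The Java sketch below shows that reading only; the class and method names are hypothetical and it is not taken from the Triangle compiler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FrameAllocator {
    // Number of words allocated so far in the current frame.
    private int frameSize = 0;
    private final Map<String, Integer> displacements = new LinkedHashMap<>();

    // Binds name to the next free word and grows the frame by size words.
    int declare(String name, int size) {
        int displacement = frameSize; // current top of the frame
        displacements.put(name, displacement);
        frameSize += size;
        return displacement;
    }

    int displacementOf(String name) { return displacements.get(name); }

    int frameSize() { return frameSize; }

    public static void main(String[] args) {
        FrameAllocator frame = new FrameAllocator();
        frame.declare("a", 1);
        frame.declare("b", 3);
        System.out.println(frame.displacementOf("b") + " " + frame.frameSize());
    }
}
```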

Code Generation: Procedures and Functions

How do we handle procedures and functions? These both translate to low level routines. A routine is a series of instructions and the template might look something like this:

elaborate [proc I () ~ C] =
    JUMP g
e:  execute C
    RETURN (0) 0
g:

Compiler Optimisations

Compiler optimisation can happen in a few places:

while(j < k) {
    a[j] := b + c;
    j++;
}

In this example, b + c does not change inside the loop, so it can be moved outside the loop and computed only once. We could turn the above code into this:

if (j < k){
    tmp = b + c;
    while (j < k){
        a[j] := tmp;
        j++;
    }
}
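The same transformation written as runnable Java, as a sketch to check that both versions fill the array identically. The method names are hypothetical, and a real compiler or JIT may well do this hoisting for you.

```java
import java.util.Arrays;

class LoopInvariantDemo {
    // Original form: b + c is recomputed on every iteration.
    static int[] fillNaive(int[] a, int j, int k, int b, int c) {
        while (j < k) {
            a[j] = b + c;
            j++;
        }
        return a;
    }

    // Hoisted form: b + c is computed once, and only if the loop will run.
    static int[] fillHoisted(int[] a, int j, int k, int b, int c) {
        if (j < k) {
            int tmp = b + c;
            while (j < k) {
                a[j] = tmp;
                j++;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        int[] naive = fillNaive(new int[5], 1, 4, 2, 3);
        int[] hoisted = fillHoisted(new int[5], 1, 4, 2, 3);
        System.out.println(Arrays.equals(naive, hoisted));
    }
}
```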

Optional reading: More examples are mentioned in the book Introduction to Compiler Design (Mogensen) cited in the module home page; also here https://compileroptimizations.com

Interpreters and Native Code, JIT

Python is an interpreted language, meaning that its instructions are translated and executed at run time - this makes the program slower than one written in a compiled language like C.

Java is in the middle. Javac generates bytecode which is targeted at a virtual machine. This is then interpreted, so there is still some overhead involved when running with Java.

Our toy language compiles into a TAM file. This is an intermediate language similar to bytecode in Java.

Just-in-time compilation, HotSpot, and GI

JIT (Just-in-Time) compilation was introduced to help speed up Java's slow interpreted bytecode. Java's compiler doesn't do much optimisation itself; the idea of JIT is to compile the parts of the program that are expensive to interpret into native machine code at run time. Only parts of the program are compiled in this way rather than the whole thing (most of it won't be run often enough to make it worthwhile).

Targeting this extra compilation step is where HotSpot (Oracle's implementation of the Java virtual machine) gets its name: the targeted code is where the program runs hot.

Simply put, there's a count of the number of times each method, loop and numerous other structures are executed. If that number reaches a particular threshold, the relevant block of code is compiled natively on the machine!
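The counting idea can be modelled with a toy counter. This is not how HotSpot is implemented; the class, the block names and the threshold value are all hypothetical and only illustrate "count executions, trigger once at a threshold".

```java
import java.util.HashMap;
import java.util.Map;

class HotCounter {
    private final int threshold;
    private final Map<String, Integer> counts = new HashMap<>();

    HotCounter(int threshold) { this.threshold = threshold; }

    // Records one execution of the named block. Returns true exactly once,
    // on the execution where the count reaches the threshold.
    boolean recordExecution(String blockName) {
        int count = counts.merge(blockName, 1, Integer::sum);
        return count == threshold;
    }

    public static void main(String[] args) {
        HotCounter counter = new HotCounter(3);
        for (int i = 1; i <= 4; i++) {
            if (counter.recordExecution("loopBody")) {
                System.out.println("would compile loopBody after " + i + " executions");
            }
        }
    }
}
```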

The JIT will also monitor branches of code and do the heavy lifting of leaving blocks that are never executed out of the compilation process.

When and Where to Optimise Your Code

We must not prematurely optimise! We should write straightforward, clean code. We don't want to write hard-to-read code: it slows down development, introduces bugs that are hard to track down and makes maintenance much harder.

log.log(Level.FINE, "..." + calcX() ... + calcY() ... );

This simple line has to build the message string and evaluate the method calls every time it runs, i.e. moving values on and off the stack and following the calls around the program. If the logger only reports levels higher than FINE, all that computation is wasted! We can rewrite this to:

if (log.isLoggable(Level.FINE)) {
 log.log(Level.FINE, "..." + calcX() ... + calcY() ... );
}

This is much more efficient.
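A small runnable check of this, assuming java.util.logging and a hypothetical calcX() that counts how often it is called. It also shows the Logger.log overload that takes a Supplier<String>, which defers building the message in a similar way without the explicit if.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LogGuardDemo {
    static int calls = 0;

    // Stand-in for an expensive calculation; counts its own invocations.
    static int calcX() {
        calls++;
        return 42;
    }

    // Unguarded: the message (and calcX) is evaluated before log() is called.
    static void logUnguarded(Logger log) {
        log.log(Level.FINE, "x = " + calcX());
    }

    // Guarded: calcX only runs if FINE messages would actually be reported.
    static void logGuarded(Logger log) {
        if (log.isLoggable(Level.FINE)) {
            log.log(Level.FINE, "x = " + calcX());
        }
    }

    // Lazy: the supplier is only invoked if the level is loggable.
    static void logLazy(Logger log) {
        log.log(Level.FINE, () -> "x = " + calcX());
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setLevel(Level.INFO); // FINE is below INFO, so FINE is not reported
        logUnguarded(log);
        logGuarded(log);
        logLazy(log);
        System.out.println("calcX was called " + calls + " time(s)");
    }
}
```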

Sometimes we want to swap values around. We could do it like this:

temp = a
a = b
b = temp

This can be done better though. Maybe we can remove the temp variable. Let's have a look at the XOR binary function.

XOR works like this:

| Inputs | Result |
| ------ | ------ |
| 0 XOR 0 | 0 |
| 0 XOR 1 | 1 |
| 1 XOR 0 | 1 |
| 1 XOR 1 | 0 |

Turns out, if we apply XOR to the two variables three times - they swap!
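The three XOR steps can be written out in Java like this. It is a sketch with a hypothetical helper that returns the swapped pair, since Java cannot swap the caller's local variables directly.

```java
class XorSwap {
    // Swaps a and b using three XORs and no temporary variable.
    static int[] swap(int a, int b) {
        a = a ^ b; // a now holds (original a) XOR (original b)
        b = a ^ b; // b now holds the original a
        a = a ^ b; // a now holds the original b
        return new int[] { a, b };
    }

    public static void main(String[] args) {
        int[] result = swap(3, 5);
        System.out.println(result[0] + " " + result[1]); // prints 5 3
    }
}
```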


Unfortunately, it turns out that in practice this will at best perform the same as using a temporary variable. On many modern CPUs, copies between registers are extremely fast, and some provide low-level instructions to swap values anyway.

In summary: