2026-03-07 16:31:25 +00:00


CSCU9A5 Week 1

Read Chapters 1 and 2 of Clean Code

You get the drift. Indeed, the ratio of time spent reading vs. writing is well over 10:1. We are constantly reading old code as part of the effort to write new code. Because this ratio is so high, we want the reading of code to be easy, even if it makes the writing harder. Of course there's no way to write code without reading it, so making it easy to read actually makes it easier to write.

Class names

Classes and objects should have noun or noun phrase names like Customer, WikiPage, Account, and AddressParser. Avoid words like Manager, Processor, Data, or Info in the name of a class. A class name should not be a verb.

Method names

Methods should have verb or verb phrase names like postPayment, deletePage, or save. Accessors, mutators, and predicates should be named for their value and prefixed with get, set, and is according to the JavaBean standard.

String name = employee.getName();
customer.setName("mike");
if (paycheck.isPosted())...

When constructors are overloaded, use static factory methods with names that describe the arguments. For example,

Complex fulcrumPoint = Complex.FromRealNumber(23.0);  

is generally better than

Complex fulcrumPoint = new Complex(23.0);  

Consider enforcing their use by making the corresponding constructors private.
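As a minimal sketch of that advice, the class below hides its constructor and exposes named factories instead. The class body, the accessor names and the second factory (fromImaginaryNumber) are assumptions made for illustration; only the idea of a "from real number" factory comes from the example above, and the method name is written in lower camel case to follow the usual Java convention.

```java
// Hypothetical Complex class: only the "from real number" factory idea
// comes from the notes, everything else is assumed for illustration.
final class Complex {
    private final double real;
    private final double imaginary;

    // Private constructor: callers are forced to use the named factories.
    private Complex(double real, double imaginary) {
        this.real = real;
        this.imaginary = imaginary;
    }

    // The method name states what the single argument means.
    static Complex fromRealNumber(double real) {
        return new Complex(real, 0.0);
    }

    // Assumed second factory. As constructors, both of these would have the
    // signature Complex(double), so they could not be overloaded at all.
    static Complex fromImaginaryNumber(double imaginary) {
        return new Complex(0.0, imaginary);
    }

    double real() { return real; }
    double imaginary() { return imaginary; }
}
```

With this in place, `new Complex(23.0)` no longer compiles outside the class, so every call site has to say which kind of number it is passing.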

Note: Listing 2-1 and his solution in Listing 2-2 make me cringe

It's too long! Random methods with very long names make it hard to read on screens, such as in IDEs. You also end up jumping back and forth around the code, trying to figure out which method returns what and where the next bit of code goes. There is also the issue of creating entire methods for one case of something happening. You could easily handle the decision making within the one method with better coding: you can check whether the count is more than 0 and then handle what happens there. We already have access to all the variables we need, passed as parameters, so why split this off into many methods? I hate it.

Extended BNF (EBNF)

  • We can use * to make things appear 0-to-many times by wrapping them in brackets
N ::== a | (b | c)*

This can be read as "N may consist of a, or alternatively of a series of zero to many characters, each of which is b or c". Valid strings matching this rule include:

  • "a"
  • an empty string (the * allows this, as it means 0 to many)
  • "b"
  • "c"
  • "bc"
  • "cb"
  • "cbcbcbc"
  • or "ccccccb"
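Because this particular rule has no recursion, it can be checked with an ordinary regular expression. The sketch below is one way to try the strings listed above; the class name RuleN and its matches method are made up for this example.

```java
import java.util.regex.Pattern;

final class RuleN {
    // N ::== a | (b | c)*  written as a regular expression.
    private static final Pattern N = Pattern.compile("a|(b|c)*");

    // True only if the whole string matches the rule, not just a part of it.
    static boolean matches(String s) {
        return N.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"a", "", "b", "cbcbcbc", "ccccccb", "ab", "aa"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + matches(s));
        }
    }
}
```

The last two samples ("ab" and "aa") are rejected: a is only allowed on its own, never mixed into the repeated part.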

Example

We have a simple programming language that allows the user to write arithmetic sums for some existing variables a, b, c, d, or e. We can use +, -, /, * and () on the variables too.

Expression              ::== primary-Expression (Operator primary-Expression)*
primary-Expression      ::== Identifier
                            | (Expression)
Identifier              ::== a | b | c | d | e
Operator                ::== + | - | / | *

For example:

a + (b * c)
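One way to check a string against this grammar is to write one method per rule, with each method consuming the characters its rule describes. The sketch below only recognises (accepts or rejects) input and builds no tree. The class and method names are invented for this example, and stripping all whitespace up front is a simplifying assumption.

```java
final class ExpressionRecognizer {
    private final String text;  // the input with all whitespace removed
    private int pos = 0;        // index of the next unread character

    private ExpressionRecognizer(String source) {
        this.text = source.replaceAll("\\s+", "");
    }

    // True if the whole input is a valid Expression.
    static boolean accepts(String source) {
        ExpressionRecognizer r = new ExpressionRecognizer(source);
        return r.expression() && r.pos == r.text.length();
    }

    // Expression ::== primary-Expression (Operator primary-Expression)*
    private boolean expression() {
        if (!primaryExpression()) return false;
        // Operator ::== + | - | / | *
        while (pos < text.length() && "+-/*".indexOf(text.charAt(pos)) >= 0) {
            pos++;  // consume the Operator
            if (!primaryExpression()) return false;
        }
        return true;
    }

    // primary-Expression ::== Identifier | ( Expression )
    private boolean primaryExpression() {
        if (pos >= text.length()) return false;
        char c = text.charAt(pos);
        if (c >= 'a' && c <= 'e') {  // Identifier ::== a | b | c | d | e
            pos++;
            return true;
        }
        if (c == '(') {
            pos++;
            if (!expression()) return false;
            if (pos < text.length() && text.charAt(pos) == ')') {
                pos++;
                return true;
            }
        }
        return false;
    }
}
```

For instance, `ExpressionRecognizer.accepts("a + (b * c)")` returns true, while `accepts("a +")` and `accepts("(a")` return false.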

Here you can view the Java specifications as well as Python's! Go has a very good breakdown of its grammar, with lots of documentation!

Triangle

Features of Triangle

  • Three primitive types of variables:

    • Boolean (i.e. true or false)
    • Char (a single character; same as the Java char type)
    • Integer (values between -32767 and 32767)
  • Two types of composite variable:

    • Records (a bit like dictionaries in Python)
    • Arrays
  • There are no strings or floating point types built-in

  • Let blocks are used to declare local constants, variables, procedures and functions (see repl.tri and factorials.tri below)

  • The only conditional command is if and the only loop command is while

  • if commands look like this:

    • if x < y then dostuff else dootherstuff
    • the “else” is needed, but can just be followed by ; (do nothing)
  • while commands look like this:

    • while x < y do dostuff
  • begin and end signify a block of commands; the same as {} in Java or indentation in Python

  • Assignment of a value to a variable is done using the := operator like this

    • a := 10
  • We can also bind values within declarations by using ~

  • put() and get(), and similarly named functions, are used to write and read from the console

Examples of Triangle (.tri)

! hi.tri
begin
    put('H'); put('i'); put('!')
end

Output: Hi!

! while.tri
let
    var a : Integer
in
begin
    a := 0;
    while a < 5 do
    begin
        put('a');
        a := a + 1;
    end
end

Output: aaaaa

! str.tri
let
    type Str ~ array 10 of Char;

    func replicate (c: Char): Str ~
        [c,c,c,c,c,c,c,c,c,c];

    var s: Str
in

begin
    s := replicate('*');
    put (s[0]); put(s[9]); puteol()
end

Output: **

AST Examples

See OneNote folder (Year\ 3/AST/) for the answers

Syntactic Analysis

Let's have a look at Java's syntax

int myNumber = 55;

Kind is the category (Identifier, Integer Literal, etc.). Spelling is the actual text used in the code.

Let's break it down into tokens:

  • int - of kind Identifier (name of type) (spelling "int")
  • myNumber - of kind Identifier (name of variable) (spelling "myNumber")
  • = - of kind Becomes (spelling "=")
  • 55 - of kind Integer Literal (spelling "55")
  • ; - of kind Semicolon (spelling ";")

Below is the same example in Triangle:

let
    var myNumber : Integer
in 
    myNumber := 55

Let's break it down into tokens again:

  • let, :, var, in - each is a kind of its own: let, :, var, in (spellings "let", ":", "var", "in")
  • := - of kind Becomes (spelling ":=")
  • myNumber - of kind Identifier (spelling "myNumber")
  • Integer - of kind Identifier (spelling "Integer")
  • 55 - of kind Integer Literal (spelling "55")

Scanning a string into tokens

The basic idea is that we have a loop (implemented using recursion) that gobbles up characters from the text until they match a known template for a token.

myNumber := 55+ 10
^

Our starting point is char 'm', so we assume that we are working along an Identifier. We continue working our way through to the end of the word (until we hit a space). At this stage, it is impossible to know whether we are dealing with a method name, a variable name, a reserved word or anything of that nature.

myNumber := 55+ 10
~~~~~~~~^

We now check that string against a list of reserved words (such as if, end, while). If it were, say, if, the kind of this token would be set to If. We don't match any known reserved words, so this token is labelled as an Identifier with the spelling "myNumber".

myNumber := 55+ 10
~~~~~~~~~^

We now keep moving along the spaces until we hit something other than a space. We've hit a colon. A colon can start two different tokens in Triangle: a plain colon, which separates a variable name from its type, or a colon-equals (the Becomes token). Let's take the next character, as we will know which of the two it is after that.

myNumber := 55+ 10
~~~~~~~~~~^

It is an equals sign, so together with the colon we have a colon-equals. That tells us that the token is Becomes with the spelling ":=".

myNumber := 55+ 10
~~~~~~~~~~~~^

We hit a 5. Triangle only supports Integer numbers (handy!), so we know that we can carry along this number until we hit something that isn't a digit (to get the full spelling of the Integer). In this case, we move along until we hit the plus!

myNumber := 55+ 10
~~~~~~~~~~~~~~^

We've hit something that isn't a digit! This means that this token is of kind Integer Literal and has the spelling "55".

myNumber := 55+ 10
~~~~~~~~~~~~~~^

The next character is a plus. Operators can start with this character, so we keep taking characters until reaching something that's not an operator character. The next character is a space, so the token is an Operator with the spelling "+".

myNumber := 55+ 10
~~~~~~~~~~~~~~~~~~^

We hit another Integer (10!) and reach the EOL (end of line). Most methods use recursion to keep reading until we reach the end of a line or the end of the file.
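The walkthrough above can be sketched as a small hand-written scanner. This is an illustrative sketch, not the actual Triangle scanner: it uses a plain while loop rather than recursion, kinds are plain strings, and the reserved-word list and the set of operator characters are assumed subsets. The names MiniScanner, Token and scan are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MiniScanner {
    // A token pairs a kind (its category) with a spelling (the source text).
    static final class Token {
        final String kind;
        final String spelling;
        Token(String kind, String spelling) { this.kind = kind; this.spelling = spelling; }
        @Override public String toString() { return kind + "(\"" + spelling + "\")"; }
    }

    // Assumed subset of reserved words; a real scanner would list them all.
    private static final List<String> RESERVED =
        Arrays.asList("if", "then", "else", "while", "do", "let", "in", "var", "begin", "end");

    // Assumed set of characters that can appear in an operator.
    private static boolean isOperatorChar(char c) {
        return "+-*/<>=".indexOf(c) >= 0;
    }

    static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;  // skip separators between tokens
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                // Reserved words get their own kind; anything else is an Identifier.
                tokens.add(new Token(RESERVED.contains(word) ? word : "Identifier", word));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token("IntegerLiteral", src.substring(start, i)));
            } else if (c == ':') {
                i++;
                // One character of lookahead decides between ":" and ":=".
                if (i < src.length() && src.charAt(i) == '=') {
                    i++;
                    tokens.add(new Token("Becomes", ":="));
                } else {
                    tokens.add(new Token("Colon", ":"));
                }
            } else if (isOperatorChar(c)) {
                while (i < src.length() && isOperatorChar(src.charAt(i))) i++;
                tokens.add(new Token("Operator", src.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("myNumber := 55+ 10"));
    }
}
```

Running main on the example line yields five tokens: an Identifier "myNumber", a Becomes ":=", an IntegerLiteral "55", an Operator "+" and an IntegerLiteral "10".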

Syntactic Analysis: Parsing into an AST

Let's create a theoretical language, Micro-English. Here is the EBNF:

Sentence    ::== Subject Verb Object .
Subject     ::== I | a Noun | the Noun
Object      ::== me | a Noun | the Noun
Noun        ::== cat | mat | rat
Verb        ::== like | is | see | sees
(terminals are English words, e.g. like, the, etc.)

Bottom Up

   Noun
    |
the cat sees a rat.

Subject
 ____
 | Noun
 |  |
the cat sees a rat.

  S
 ____
 |  N   Verb
 |  |    |
the cat sees a rat.

  S
 ____
 |  N    V      N
 |  |    |      |
the cat sees a rat.

  S          Obj
 ____        ____
 |  N    V   |  N
 |  |    |   |  |
the cat sees a rat.

     Sentence
___________________
  |      |    |   |
  S      |   Obj  |
 ____    |   ____ |
 |  N    V   |  N |
 |  |    |   |  | |
the cat sees a rat.
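The same bottom-up steps can be sketched as a tiny shift-reduce recogniser: each word is shifted onto a stack, and whenever the top of the stack matches the right-hand side of a rule it is replaced by the rule's left-hand side. This is a hand-rolled illustration for Micro-English only, not a general parsing algorithm. The class and method names are invented, symbols are plain strings, and the input is assumed to contain only the grammar's own words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MicroEnglish {
    private static final List<String> NOUNS = Arrays.asList("cat", "mat", "rat");
    private static final List<String> VERBS = Arrays.asList("like", "is", "see", "sees");
    private static final List<String> SUBJECT_VERB = Arrays.asList("Subject", "Verb");
    private static final List<String> SENTENCE_BODY =
        Arrays.asList("Subject", "Verb", "Object", ".");

    // Returns a snapshot of the stack after every shift and every reduction.
    static List<String> parse(String sentence) {
        List<String> trace = new ArrayList<>();
        List<String> stack = new ArrayList<>();
        // Treat the full stop as a word of its own.
        for (String word : sentence.replace(".", " .").trim().split("\\s+")) {
            stack.add(word);            // shift
            trace.add(stack.toString());
            while (reduce(stack)) {     // reduce as far as possible
                trace.add(stack.toString());
            }
        }
        return trace;
    }

    // The input is a Sentence if the stack ends up holding just that symbol.
    static boolean accepts(String sentence) {
        List<String> trace = parse(sentence);
        return trace.get(trace.size() - 1).equals("[Sentence]");
    }

    // Applies one grammar rule to the top of the stack, if any rule fits.
    private static boolean reduce(List<String> stack) {
        int n = stack.size();
        String top = stack.get(n - 1);
        if (NOUNS.contains(top)) { stack.set(n - 1, "Noun"); return true; }
        if (VERBS.contains(top)) { stack.set(n - 1, "Verb"); return true; }
        if (top.equals("Noun") && n >= 2
                && (stack.get(n - 2).equals("a") || stack.get(n - 2).equals("the"))) {
            stack.remove(n - 1);
            stack.remove(n - 2);
            // "a Noun" / "the Noun" is an Object only after Subject Verb.
            stack.add(stack.equals(SUBJECT_VERB) ? "Object" : "Subject");
            return true;
        }
        if (top.equals("I") && n == 1) { stack.set(0, "Subject"); return true; }
        if (top.equals("me") && n == 3 && stack.subList(0, 2).equals(SUBJECT_VERB)) {
            stack.set(2, "Object");
            return true;
        }
        if (stack.equals(SENTENCE_BODY)) {
            stack.clear();
            stack.add("Sentence");
            return true;
        }
        return false;
    }
}
```

For "the cat sees a rat." the reductions happen in the same order as the diagrams above: Noun, Subject, Verb, Noun, Object, and finally Sentence.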

Key definitions for compilers

  • Syntactic analysis: scanning and parsing, which takes the text of the source code and transforms it into an abstract syntax tree

  • Contextual analysis: checks things like variable types and scope, and creates the connections within the AST so we can later look up declarations for named identifiers like variables, constants and functions.

  • Code generation: generates the output code, which might be machine code, another high-level language, or an intermediate language (as is the case with Java bytecode). Might also include optimisations of the code.