Simon Kellet 023ae00e71 init
2026-03-07 11:02:36 +00:00

# CSCU9A5 Week 1
## Read Chapters 1 and 2 of Clean Code
> You get the drift. Indeed, the ratio of time spent reading vs. writing is well over 10:1. We are constantly reading old code as part of the effort to write new code. Because this ratio is so high, we want the reading of code to be easy, even if it makes the writing harder. Of course there's no way to write code without reading it, so making it easy to read actually makes it easier to write.
### Class names
> Classes and objects should have noun or noun phrase names like Customer, WikiPage, Account, and AddressParser. Avoid words like Manager, Processor, Data, or Info in the name of a class. A class name should not be a verb.
### Method names
> Methods should have verb or verb phrase names like postPayment, deletePage, or save. Accessors, mutators, and predicates should be named for their value and prefixed with get, set, and is according to the javabean standard.
```java
String name = employee.getName();
customer.setName("mike");
if (paycheck.isPosted())...
```
> When constructors are overloaded, use static factory methods with names that describe the arguments. For example,
```java
Complex fulcrumPoint = Complex.FromRealNumber(23.0);
```
> is generally better than
```java
Complex fulcrumPoint = new Complex(23.0);
```
> Consider enforcing their use by making the corresponding constructors private.
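
The advice above can be sketched as follows. This is only an illustration: `FromRealNumber` comes from the quote, while `FromImaginaryNumber` and the accessor names are hypothetical additions for the example.

```java
// A minimal sketch of static factory methods guarding a private constructor.
final class Complex {
    private final double real;
    private final double imaginary;

    // Private, so callers are forced to use a factory whose name describes the argument.
    private Complex(double real, double imaginary) {
        this.real = real;
        this.imaginary = imaginary;
    }

    // Name taken from the quoted example.
    static Complex FromRealNumber(double real) {
        return new Complex(real, 0.0);
    }

    // Hypothetical second factory: same argument type, different meaning.
    static Complex FromImaginaryNumber(double imaginary) {
        return new Complex(0.0, imaginary);
    }

    double real() {
        return real;
    }

    double imaginary() {
        return imaginary;
    }
}
```

Both factories take a single `double`, which two overloaded constructors could not do, so the method name is what tells the reader which part of the number is being set.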
### Note: Listings 2-1 and his solution in 2-2 make me cringe
It's too long! Lots of small methods with very long names make the code hard to read on screen, for example in an IDE. You also end up jumping back and forth through the code, trying to figure out which method returns what and where the next bit of logic lives. There is also the issue of creating an entire method for a single case. The decision making could easily stay within one method with better coding: check whether the count is more than 0 and handle what happens there. We already have access to all the variables we need as parameters, so why split this off into many methods? I hate it.
## Extended BNF (EBNF)
* We can use `*` to make things **appear 0-to-many times** by wrapping them in brackets
```text
N ::= a | (b | c)*
```
This can be read as **"N may consist of a or alternatively a series of characters comprising zero to many b or c"**. Valid strings matching this rule include:
* "a"
* an empty string (the `*` allows this, as it means 0 to many)
* "b"
* "c"
* "bc"
* "cb"
* "cbcbcbc"
* or "ccccccb"
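
Because this particular rule has no recursion, it can be checked with a regular expression. The sketch below is one way to try the strings listed above; the class name `RuleN` and the method name are made up for the example.

```java
import java.util.regex.Pattern;

// A minimal sketch: the rule N ::= a | (b | c)* written as a regular expression.
final class RuleN {
    // Either a single "a", or zero or more characters drawn from b and c.
    private static final Pattern N = Pattern.compile("a|[bc]*");

    // True only when the whole string matches the rule.
    static boolean matches(String text) {
        return N.matcher(text).matches();
    }
}
```

For example, `RuleN.matches("cbcbcbc")` is true, while `RuleN.matches("ab")` is false because `a` can only appear on its own.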
## Example
We have a simple programming language that allows the user to write arithmetic sums for some existing variables **a, b, c, d, or e**. We can use `+`, `-`, `/`, `*` and `()` on the variables too.
```text
Expression         ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier
                     | ( Expression )
Identifier         ::= a | b | c | d | e
Operator           ::= + | - | / | *
```
For example:
```text
a + (b * c)
```
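A grammar like this maps quite directly onto code: one method per rule, with the `(...)*` part becoming a loop. The following is a rough recursive-descent recogniser under that assumption. It only answers yes or no and builds no tree; the class and method names are invented for the example.

```java
// A minimal sketch of a recogniser for the expression grammar above.
final class ExpressionRecognizer {
    private final String input; // source text with all whitespace removed
    private int pos = 0;        // index of the next unread character

    private ExpressionRecognizer(String text) {
        this.input = text.replaceAll("\\s+", "");
    }

    // True when the whole text is a valid Expression.
    static boolean isValid(String text) {
        ExpressionRecognizer r = new ExpressionRecognizer(text);
        try {
            r.expression();
            return r.pos == r.input.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Expression ::= primary-Expression (Operator primary-Expression)*
    private void expression() {
        primaryExpression();
        while (pos < input.length() && "+-/*".indexOf(input.charAt(pos)) >= 0) {
            pos++; // consume the Operator
            primaryExpression();
        }
    }

    // primary-Expression ::= Identifier | ( Expression )
    private void primaryExpression() {
        if (pos >= input.length()) {
            throw new IllegalStateException("unexpected end of input");
        }
        char c = input.charAt(pos);
        if (c >= 'a' && c <= 'e') {
            pos++; // Identifier
        } else if (c == '(') {
            pos++;
            expression();
            if (pos >= input.length() || input.charAt(pos) != ')') {
                throw new IllegalStateException("expected ')'");
            }
            pos++;
        } else {
            throw new IllegalStateException("unexpected character " + c);
        }
    }
}
```

`ExpressionRecognizer.isValid("a + (b * c)")` returns true, while `isValid("a +")` and `isValid("f")` return false.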
Here you can view the [Java](https://docs.oracle.com/javase/specs/jls/se18/html/jls-19.html) specifications as well as [Python's](https://docs.python.org/3/reference/grammar.html)! [Go](https://go.dev/ref/spec) has a very good breakdown of its grammar, with lots of documentation!
## Triangle
## Features of Triangle:
* Three primitive types of variables:
  * Boolean (i.e. true or false)
  * Char (a single character; same as the Java char type)
  * Integer (values between -32767 and 32767)
* Two types of composite variable:
  * Records (a bit like dictionaries in Python)
  * Arrays
* There are no built-in string or floating point types
* Let blocks are used to declare local constants, variables, procedures and functions (see repl.tri and factorials.tri below)
* The only conditional command is **if** and the only loop command is **while**
* **if** commands look like this:
  * if x < y then *dostuff* else *dootherstuff*
  * the else is needed, but can just be followed by ; (do nothing)
* **while** commands look like this:
  * while x < y do *dostuff*
* **begin** and **end** signify a block of commands; the same as {} in Java or indentation in Python
* Assignment of a value to a variable is done using the := operator like this:
  * a := 10
* We can also bind values within declarations by using **~**
* **put()** and **get()**, and similarly named functions, are used to write and read from the console
### Examples of Triangle (.tri)
```
! hi.tri
begin
  put('H'); put('i'); put('!')
end
```
Output: **Hi!**
```
! while.tri
let
  var a : Integer
in
  begin
    a := 0;
    while a < 5 do
      begin
        put('a');
        a := a + 1;
      end
  end
```
Output: **aaaaa**
```
! str.tri
let
  type Str ~ array 10 of Char;
  func replicate (c: Char): Str ~
    [c,c,c,c,c,c,c,c,c,c];
  var s: Str
in
  begin
    s := replicate('*');
    put(s[0]); put(s[9]); puteol()
  end
```
Output: `**`
## AST Examples
***See OneNote folder (Year\ 3/AST/) for the answers***
## Syntactic Analysis
Let's have a look at Java's syntax
```java
int myNumber = 55;
```
* **Kind** is the category (Identifier, Integer Literal, etc.)
* **Spelling** is the actual text used in the code

Let's break it down into tokens:
* **int** - of kind Identifier (name of type) (spelling "int"). Strictly, `int` is a reserved keyword in Java, but here it plays the same role as the type name Integer in the Triangle example below
* **myNumber** - of kind Identifier (name of variable) (spelling "myNumber")
* **=** - of kind Becomes (spelling "=")
* **55** - of kind Integer Literal (spelling "55")
* **;** - of kind Semicolon (spelling ";")
Below is the same example in Triangle:
```
let
var myNumber : Integer
in
myNumber := 55
```
Let's break it down into tokens again:
* **let, :, var, in** each have their own kind, named after the token (let, :, var, in), with the matching spelling ("let", ":", "var", "in")
* **:=** of kind Becomes (spelling ":=")
* **myNumber** of kind Identifier (spelling "myNumber")
* **Integer** of kind Identifier (spelling "Integer")
* **55** of kind Integer Literal (spelling "55")
## Scanning a string into tokens
The basic idea is that we have a loop, implemented using recursion, that gobbles up characters from the text until they match a known template for a token.
```
myNumber := 55+ 10
^
```
Our starting point is the character 'm', so we assume that we are working along an Identifier. We continue working our way through to the end of the word (until we hit a space). At this stage, it is impossible to know whether we are looking at a variable name, a method name, a reserved word or anything of that nature.
```
myNumber := 55+ 10
~~~~~~~~^
```
We now check that string against a list of reserved words (such as *if*, *end*, *while*). If it was, say, **if**, the kind of this token would be set to If. We don't match any known reserved word, so this token is labelled as an Identifier with the spelling "myNumber".
```
myNumber := 55+ 10
~~~~~~~~~^
```
We now keep moving along the spaces until we hit something other than a space. We've hit a colon. Two tokens start with a colon in Triangle: a plain colon, which separates a variable's name from its type, and colon-equals (the Becomes token). Let's take the next character, as we will know what to do once we have seen it.
```
myNumber := 55+ 10
~~~~~~~~~~^
```
It is an equals sign, which completes a colon-equals. That tells us that the token is **Becomes** with the spelling ":=".
```
myNumber := 55+ 10
~~~~~~~~~~~~^
```
We hit a 5. Triangle's only numeric type is Integer (handy!), so we know that we can carry on along this number until we hit something that isn't a digit (to get the full Integer). In this case, we move along until we hit the plus sign!
```
myNumber := 55+ 10
~~~~~~~~~~~~~~^
```
We've hit something that isn't a digit! This means that this token is of kind Integer Literal and has the spelling "55".
```
myNumber := 55+ 10
~~~~~~~~~~~~~~^
```
The next character is a plus. Operators can start with this character, so we keep taking characters until we reach something that can't be part of an operator. The next character is a space, so this token is of kind Operator with the spelling "+".
```
myNumber := 55+ 10
~~~~~~~~~~~~~~~~~~^
```
We hit another Integer Literal (10!) and reach the EOL (end of line). Most implementations use recursion to keep reading until they reach the end of a line or the end of the file.
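The walk-through above can be sketched in code. The version below is a simplification rather than the Triangle compiler's own scanner: it uses a plain loop instead of recursion, and the class name, the list of reserved words and the set of operator characters are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// A minimal sketch of a scanner that turns text into (kind, spelling) tokens.
final class MiniScanner {
    static final class Token {
        final String kind;
        final String spelling;

        Token(String kind, String spelling) {
            this.kind = kind;
            this.spelling = spelling;
        }

        @Override
        public String toString() {
            return kind + "(" + spelling + ")";
        }
    }

    // Assumed subset of reserved words; a reserved word's kind is the word itself.
    private static final Set<String> RESERVED =
            Set.of("if", "then", "else", "while", "do", "let", "in", "var", "begin", "end");
    // Assumed set of characters that may appear in an operator.
    private static final String OPERATOR_CHARS = "+-*/<>=\\";

    static List<Token> scan(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) { // skip the gaps between tokens
                i++;
                continue;
            }
            int start = i;
            String kind;
            if (Character.isLetter(c)) { // a word: reserved word or Identifier
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                String word = text.substring(start, i);
                kind = RESERVED.contains(word) ? word : "Identifier";
            } else if (Character.isDigit(c)) { // a run of digits
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                kind = "IntegerLiteral";
            } else if (c == ':') { // look one character ahead to tell : from :=
                i++;
                if (i < text.length() && text.charAt(i) == '=') {
                    i++;
                    kind = "Becomes";
                } else {
                    kind = "Colon";
                }
            } else if (OPERATOR_CHARS.indexOf(c) >= 0) { // a run of operator characters
                while (i < text.length() && OPERATOR_CHARS.indexOf(text.charAt(i)) >= 0) i++;
                kind = "Operator";
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            tokens.add(new Token(kind, text.substring(start, i)));
        }
        return tokens;
    }
}
```

Calling `MiniScanner.scan("myNumber := 55+ 10")` produces the same five tokens as the walk-through: an Identifier, a Becomes, an Integer Literal, an Operator and a second Integer Literal.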
## Syntactic Analysis: Parsing into an AST
Let's create a theoretical language, Micro-English. Here is the EBNF:
```
Sentence ::= Subject Verb Object .
Subject  ::= I | a Noun | the Noun
Object   ::= me | a Noun | the Noun
Noun     ::= cat | mat | rat
Verb     ::= like | is | see | sees
(terminals are English words, mostly lowercase, e.g. like, the, etc.)
```
### Bottom Up
```
    Noun
     |
the cat sees a rat.
```
```
Subject
 _____
 |  Noun
 |   |
the cat sees a rat.
```
```
   S
 _____
 |   N  Verb
 |   |   |
the cat sees a rat.
```
```
   S
 _____
 |   N   V      N
 |   |   |      |
the cat sees a rat.
```
```
   S         Obj
 _____       ____
 |   N   V   |  N
 |   |   |   |  |
the cat sees a rat.
```
```
       Sentence
   ________________
   |     |    |   |
   S     |   Obj  |
 _____   |   ____ |
 |   N   V   |  N |
 |   |   |   |  | |
the cat sees a rat.
```
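The bottom-up steps drawn above can be mimicked with a small shift-reduce loop: shift one word onto a stack, then reduce whatever the top of the stack now matches. The sketch below is hand-written for Micro-English only and is not a general parsing algorithm. The class name and the bracketed tree format are invented for the example, and it decides between Subject and Object by checking whether a Verb has already been seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// A minimal sketch of bottom-up (shift-reduce) parsing for Micro-English.
final class MicroEnglishParser {
    private static final Set<String> NOUNS = Set.of("cat", "mat", "rat");
    private static final Set<String> VERBS = Set.of("like", "is", "see", "sees");

    private static final class Node {
        final String label; // null while the word has not been reduced yet
        final String text;  // bracketed rendering of the subtree

        Node(String label, String text) {
            this.label = label;
            this.text = text;
        }

        boolean is(String wanted) {
            return wanted.equals(label);
        }

        boolean isWord(String word) {
            return label == null && text.equals(word);
        }
    }

    // Returns the tree as a bracketed string, or throws if the sentence is invalid.
    static String parse(String sentence) {
        List<Node> stack = new ArrayList<>();
        for (String word : sentence.replace(".", " .").trim().split("\\s+")) {
            boolean seenVerb = stack.stream().anyMatch(n -> n.is("Verb"));
            stack.add(reduceWord(word, stack.isEmpty(), seenVerb)); // shift
            reducePhrase(stack, seenVerb);                          // reduce
        }
        if (stack.size() == 1 && stack.get(0).is("Sentence")) {
            return stack.get(0).text;
        }
        throw new IllegalArgumentException("not a Micro-English sentence: " + sentence);
    }

    // Reduce a single word to Noun, Verb, Subject or Object where possible.
    private static Node reduceWord(String word, boolean first, boolean seenVerb) {
        if (NOUNS.contains(word)) return new Node("Noun", "Noun(" + word + ")");
        if (VERBS.contains(word)) return new Node("Verb", "Verb(" + word + ")");
        if (word.equals("I") && first) return new Node("Subject", "Subject(I)");
        if (word.equals("me") && seenVerb) return new Node("Object", "Object(me)");
        return new Node(null, word);
    }

    // Reduce "a Noun" / "the Noun", then Subject Verb Object "." if complete.
    private static void reducePhrase(List<Node> stack, boolean seenVerb) {
        int n = stack.size();
        if (n >= 2 && stack.get(n - 1).is("Noun")
                && (stack.get(n - 2).isWord("a") || stack.get(n - 2).isWord("the"))) {
            String label = seenVerb ? "Object" : "Subject";
            Node noun = stack.remove(n - 1);
            Node article = stack.remove(n - 2);
            stack.add(new Node(label, label + "(" + article.text + " " + noun.text + ")"));
        }
        if (stack.size() == 4 && stack.get(0).is("Subject") && stack.get(1).is("Verb")
                && stack.get(2).is("Object") && stack.get(3).isWord(".")) {
            String text = "Sentence(" + stack.get(0).text + " " + stack.get(1).text
                    + " " + stack.get(2).text + " .)";
            stack.clear();
            stack.add(new Node("Sentence", text));
        }
    }
}
```

`MicroEnglishParser.parse("the cat sees a rat.")` returns `Sentence(Subject(the Noun(cat)) Verb(sees) Object(a Noun(rat)) .)`, which has the same shape as the final diagram.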
---
### Key definitions for compilers
* **Syntactic analysis**: scanning and parsing, which takes the text of the source code and transforms it into an abstract syntax tree
* **Contextual analysis**: checks things like variable types and scope, and creates the connections within the AST so we can later look up declarations for named identifiers like variables, constants and functions.
* **Code generation**: generates the output code, which might be machine code, another high-level language, or an intermediate language (as is the case with Java bytecode). Might also include optimisations of the code.