Sungwon Chung

Sungwon Chung@sungpi

Type Lattice

C++ 에서의 자료형type 을 크게 네 가지로 생각해 보면 다음과 같다:

  • bool
  • char
  • int
  • double

이 네 가지 자료형Type 에는 우선 순위가 있다. 이를 Type Lattice라고 한다.
자료형 변환Cast을 할 때나, 연산을 할 때 이 우선 순위가 적용된다:

bool to char
bool to int
bool to double
char to int
char to double
int to double
위와 같은 경우는 자료형을 변환할 때 안전하다고 한다. (Safe conversion)
이 외의 경우에는 안전하지 않다고 한다. 이는 자료형Type안의 값Value가 파괴될 수 있기 때문이다. C++에서는 변수를 선언할 때나 변환할 때, { } 를 사용함으로써 컴파일 단계에서 이 안전하지 않은 변환에서 에러를 낼 수 있다. 그리고 이 에러를 Narrowing이라고 한다.


int a{4.2};
int b = 4.2;
1 + 2.5;

이 코드는 int a에 double 값인 4.2가 들어감으로써 narrow error를 발생시킨다.
반면, int b = 4.2는 b에 4를 넣어주는 강제 형변환이 된다.

연산

연산 시에는 Type Lattice에 따라, 안전하게 변환이 되게 컴파일러가 계산을 한다.
예를 들어 1 + 2.5 는 int와 double의 +연산이다. 이 경우에 int + int로 계산하거나, double + double로 계산하는 두 가지 경우가 있다. 전자는 int + double->int의 연산이고, 후자는 int->double + double의 연산이다. 전자는 double->int가 unsafe conversion이고, 후자는 int->double이 safe conversion이므로 컴파일러는 자동으로 후자를 택해서 연산을 하게 된다.

Testcode


int a;
std::cout << a;

dho dkseh

int a;
std::cout << a;

왜??

import urllib

print ‘Hello World'

Other Concepts

Synthesized Translation

The term synthesized translation includes two ideas:

  • A complex expression is constructed from subexpressions
  • The translation of a complex expression is constructed in a mechanical way form the translations of its subexpressions
    if x > max then max := x
    is constructed of (x > max) and (max := x), therefore, ( if (> x max) (:= max x) )

Controllability and Observability

These are central concepts from control theory. We will define them as :

  • Controllability: the ability to chance the behavior of a system by changing its parameters.
  • Observability: the ability to observe the behavior of a system well enough to control it.

In order to control a system, both controllability and observability are required.

The implications For Large Software Systems
  • Aspects of software that cannot easily be observed will never be debugged
  • All Large Software Systems must have observability built in
  • Observability is a requirement, not a luxury. The time spent building in observability will be well repaid.
  • In-process traces can be turned on by setting bits in a bit vector.

Comparison of Parsing Techniques

Recursive Descent and Operator Precedence
Advantages:
  • easy to control, understand
  • Good error messages can be produced
Disadvantages:
  • More code to write
  • Grammar not as clear
Parser Generator (yacc)
Advantages
  • Less code to write
  • Grammar is clear and easy to change
Disadvantages:
  • The programmer may get stuck on grammar errors that are hard to understand
  • Error messages during compilation may be unhelpful and far from the actual error

Lexical Scoping

Scope is the region of program text over which a symbol can be referenced.

With lexical scoping, a name is defined when it can be looked up in the lexicon at the point of use.

Structure References

  • Every expression has type
  • Types from a tree structure (a graph when pointers to types are included, but still treated as a tree)
  • The structure references in source code specify a traversal down the type tree
  • A reference to a part of a structure depends on the type of the structure; the results of the reference are:
    • An address expression for the substructure
    • A new type for the substructure

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Address Calculations

For Array References (Know the Fomulars)

A simple array declaration has a form such as:
* array[ $low_1$ .. $high_1$] of type

  • Find the symbol table entry for type
  • Make a symbol table entry for the array type. The total size of the array is: ($high_1$ - $low_1$ + 1) * size(type)

Address Calculations for Access to Fields of a Record

Determining Types of Accesses To Parts of a Structure

Be Able to Calculate the Size of An Array Records

Be Able to Calculate the Address Offsets For Access to an Array Element and Record Field

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Data Area and Base-Offset Addressing

Storage Allocation Algorithm

Data area is contiguous region of storage specified by its base address and size.
Two kinds of data are are arrays and records.

Allocation of Data in Records

Allocation of storage is done as an offset to a base address, which is associated with a block of storage. Assignment of storage locations is done sequentially by a simple algorithm:

  • next = 0
offset = next;
next = next + n;
return offset;
  • Finally, next gives the total size of the block.

Word Alignment and Padding

storage alignment: Certain items must be allocated at restricted locations.
In such cases, next is advanced to next available boundary if needed, and the intervening(사이에 오는) storage is wasted; this is called padding

wordaddress(next, m) = ( (next + m - 1) / m) *m

Variant Records

A record declaration has a form such as : record $field_1$, …, $field_n$ : $type_1$ ; .. end *
Such a declaration is processed as follows :

  • Initialize offset within the record to be 0.
  • For each entry group,
    • find the symbol table entry for the type
    • allocate storage within the record using the storage allocation algorithm and size of type
    • make a symbol table entry for each field, filling in its print name, type, offset in the record, and size.
    • link the entries for the field to an entry for the record
  • The size of the record is the total size given by the storage allocation algorithm, rounded up to whole words.
  • Variant records simply restart the storage allocation at the place where the variant part begins. Total size is the maximum size of the variants.

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Symbol Tables

Definition

Symbol table is a data structure that associates names with information about the objects that are denoted by the names.
Must be well organized:

  • For fast lookup of symbols
  • To reflect the organization of the program (block structure)
    쉽게 생각하면, Lexicon = dictionary = symbol table.

Operations : insert / search / search-insert

Method of Organization

  • Think of key-value dictionary.
  • Linear, unsorted: Symbols are stored in an array; new symbols are inserted at the end.
  • Linear, sorted: Symbols are stored in an array that is sorted.
  • Hash table: typical hash table :: do you remember collision and rehash?
  • Indexed Buckets: hashed with buckets / alphabetized buckets
  • Tree of symbol table
  • Stack symbol table
    Tree and stack formats both have concepts of scoping: inheritance.
Speed
  • Linear, unsorted: Insert O(1) Search O(n)
  • Linear, sorted: Insert: O(n) Search(log n) binary search
  • Hash table: Insert: O(1) Search(1)
Advantages
  • Binary tree symbol tables have log n search time, so build AVL trees.
  • Hash tables are super fast, don’t have to sort!
Disadvantages
  • AVL tree is bit complicated.
  • Hash Table is more complex, and hard to find hashing function, also complexity of rehashing.
Comparison

Be Able to select a method for a particular application.

Search is More Frequent than Insertion

Symbol Tables for Block-Structured Languages

Tree, Stack structure

What Things Are Put in Symbol Tables?

Symbols: statement labels, variables, constants, subprogram names

  • Symbolic constants
  • Enumerated types
  • Subrange types
  • Scalar types
  • Variables
  • Arrays
  • Records
  • Functions and procedures
  • Labels
  • Objects

Examples of Symbols in a Compiler

|variable|description|
|:———:|:—————:|
|link|link to next symbol|
|name string|name of symbol|

Deletion of Symbols

When it is needed
Relation to Table Organization Methods

Methods of Strong Identifier Names

User of Indirection or Auxiliary Table Structures

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Shift-Reduce Parser

Use of Stack

Use stack to save statements

Four Actions of Shift-Reduce Parser

  • Shift : Shift the current input onto the stack and go to the specified next state s.
  • Reduce : Given the production r : A -> b, remove enough items from the stack to match b, produce output structure from those items, and put the resulting structure A back on the stack. The next state is found from the goto table.
  • Accept : Parsing is completed! return top of stack as result.
  • Error : The input cannot be parsed according to the grammar; call an error recovery routine. Error entries are denoted by blanks in the parsing table.

Relationship to Canonical Derivation

x

How Parse Tree is Constructed Using Shift-Reduce Parser

Conflicts

  • Reduce/reduce: resolved by choosing the first listed production
  • Shift/reduce: resolved in favor of shift
    Reduce if the precedence of the production is greater than that of the terminal or if they are equal and the associativity of the production is left.
  • reduce의 결과가 터미널의 결과보다 우선순위가 높을 때.
  • 만약 그 결과들이 같으면 associativity는 왼 쪽.
    Shift otherwise.

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Top-down Parsing

Definition

Parser begins with the Sentence symbol S, expands a production for S, and so on recursively until words are reached. (Terminals) If the string of words matches the input, a parsing has been found.

Why Backup May Be Required

This approach to parsing might seem hopelessly inefficient. However, top-down filtering, that is testing whether the next word in the input string could begin the phrase about to be tried, can prune many failing paths early.

For What Kinds of Languages Does It Work Well

Languages with keywords, such as programming languages or natural language application. Easy to program.

Recursive Descent Parser

A parse for some context-free languages can be written using the technique of recursive descent. The basic idea is to write a procedure to parse each kind of statement or expression in the grammar. When such procedures are written in a recursive language, they can call each other as needed.
It works well for a well-structured language such as Pascal. In Pascal each statement other than assignment statements begin with a unique reserved word; thus, it is easy for a “big switch" program to determine which statement parser to call.

Problems with Left Recursion

It may be necessary to restructure a grammar to avoid left recursion, which can cause a recursive descent parser to go into a loop.

Left Factoring

A method of modifying a grammar to eliminate left recursion

Implementation of Top-down Parser by A Recursive Program

be able to write a procedure for a construct such as an IF statement

Avoiding Ambiguity while keeping the syntax

Pascal and C follows this convention.

  • If parser is SLR, LR(1), or LALR LR parser generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict.
  • If the parser is produced by a pruning and deep pruning LR generator, one can issue directives that prune away the ambiguities completely.
  • If the parser is hand-written, the programmer may use a non-ambiguous context-free grammar. Alternatively, one may rely on a non-context-free grammar or a parsing expression grammar.
Avoiding Ambiguity By Changing the Syntax
  • Have a “end if" statement.
  • Disallow following then to be if.
  • Require a bracelet when else follows if
  • Require every if to be paired with an else

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Operator Precedence Grammar and Parser

Definitions

Expressions could be written in an unambiguous, fully parenthesized form. However, this is less convenient for the programmer.

Precedence numbers

Precedence specifies which operations in a flat expression are to be performed first.
Usually,

  • .
  • ^
  • unary -
  • * /
    • -
  • = <> >= > <= <
  • not
  • and
  • or
  • :=
Associativity

Associativity specifies which operations are to be performed first when adjacent operators have the same precedence.

How operator precedence parser works(give example)

Shift & Reduce

  • Operands
  • Operator
  • (
  • )
  • End

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Derivation From a Context-Free Grammar

Derivation is the process of deriving a sentence from the start symbol S according to a grammar.

Relationship to Parse Tree

Notations for a Derivation

Symbol => is used to denote a derivation step.
=>* is used to denote derivation in zero or more steps.
=>+ is used to denote derivation in one or more steps.

Definition of a language in Terms of Derivations

Leftmost derivation : the leftmost nonterminal is replaced in each step. If S=>*(lm) a then a is called a left-sentential form of the grammar G. A leftmost derivation is traced by a predictive, top-down parser.
Rightmost derivation canonical derivation : the right most nonterminal is replaced in each step. A shift-reduce parser traces a rightmost derivation “backwards"

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.