Sungwon Chung

Type Lattice

C++ 에서의 자료형type 을 크게 네 가지로 생각해 보면 다음과 같다:

bool
char
int
double

이 네 가지 자료형Type 에는 우선 순위가 있다. 이를 Type Lattice라고 한다.
자료형 변환Cast을 할 때나, 연산을 할 때 이 우선 순위가 적용된다:

bool to char
bool to int
bool to double
char to int
char to double
int to double
위와 같은 경우는 자료형을 변환할 때 안전하다고 한다. (Safe conversion)
이 외의 경우에는 안전하지 않다고 한다. 이는 자료형Type안의 값Value가 파괴될 수 있기 때문이다. C++에서는 변수를 선언할 때나 변환할 때, { } 를 사용함으로써 컴파일 단계에서 이 안전하지 않은 변환에서 에러를 낼 수 있다. 그리고 이 에러를 Narrowing이라고 한다.


int a{4.2};
int b = 4.2;
1 + 2.5;

이 코드는 int a에 double 값인 4.2가 들어감으로써 narrow error를 발생시킨다.
반면, int b = 4.2는 b에 4를 넣어주는 강제 형변환이 된다.

연산

연산 시에는 Type Lattice에 따라, 안전하게 변환이 되게 컴파일러가 계산을 한다.
예를 들어 1 + 2.5 는 int와 double의 +연산이다. 이 경우에 int + int로 계산하거나, double + double로 계산하는 두 가지 경우가 있다. 전자는 int + double->int의 연산이고, 후자는 int->double + double의 연산이다. 전자는 double->int가 unsafe conversion이고, 후자는 int->double이 safe conversion이므로 컴파일러는 자동으로 후자를 택해서 연산을 하게 된다.

Sungwon Chung Posted on March 27th, 2015

Testcode


int a;
std::cout << a;

dho dkseh

int a;
std::cout << a;

왜??

import urllib

print ‘Hello World'

Sungwon Chung Posted on March 13th, 2015

Other Concepts

Synthesized Translation

The term synthesized translation includes two ideas:

A complex expression is constructed from subexpressions
The translation of a complex expression is constructed in a mechanical way form the translations of its subexpressions
if x > max then max := x
is constructed of (x > max) and (max := x), therefore, ( if (> x max) (:= max x) )

Controllability and Observability

These are central concepts from control theory. We will define them as :

Controllability: the ability to chance the behavior of a system by changing its parameters.
Observability: the ability to observe the behavior of a system well enough to control it.

In order to control a system, both controllability and observability are required.

The implications For Large Software Systems

Aspects of software that cannot easily be observed will never be debugged
All Large Software Systems must have observability built in
Observability is a requirement, not a luxury. The time spent building in observability will be well repaid.
In-process traces can be turned on by setting bits in a bit vector.

Comparison of Parsing Techniques

Recursive Descent and Operator Precedence

Advantages:

easy to control, understand
Good error messages can be produced

Disadvantages:

More code to write
Grammar not as clear

Parser Generator (yacc)

Advantages

Less code to write
Grammar is clear and easy to change

Disadvantages:

The programmer may get stuck on grammar errors that are hard to understand
Error messages during compilation may be unhelpful and far from the actual error

Lexical Scoping

Scope is the region of program text over which a symbol can be referenced.

With lexical scoping, a name is defined when it can be looked up in the lexicon at the point of use.

Structure References

Every expression has type
Types from a tree structure (a graph when pointers to types are included, but still treated as a tree)
The structure references in source code specify a traversal down the type tree
A reference to a part of a structure depends on the type of the structure; the results of the reference are:
- An address expression for the substructure
- A new type for the substructure

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Address Calculations

For Array References (Know the Fomulars)

A simple array declaration has a form such as:
* array[ $low_1$ .. $high_1$] of type

Find the symbol table entry for type
Make a symbol table entry for the array type. The total size of the array is: ($high_1$ - $low_1$ + 1) * size(type)

Address Calculations for Access to Fields of a Record

Determining Types of Accesses To Parts of a Structure

Be Able to Calculate the Size of An Array Records

Be Able to Calculate the Address Offsets For Access to an Array Element and Record Field

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Data Area and Base-Offset Addressing

Storage Allocation Algorithm

Data area is contiguous region of storage specified by its base address and size.
Two kinds of data are are arrays and records.

Allocation of Data in Records

Allocation of storage is done as an offset to a base address, which is associated with a block of storage. Assignment of storage locations is done sequentially by a simple algorithm:

next = 0

offset = next;
next = next + n;
return offset;

Finally, next gives the total size of the block.

Word Alignment and Padding

storage alignment: Certain items must be allocated at restricted locations.
In such cases, next is advanced to next available boundary if needed, and the intervening(사이에 오는) storage is wasted; this is called padding

wordaddress(next, m) = ( (next + m - 1) / m) *m

Variant Records

A record declaration has a form such as : record $field_1$, …, $field_n$ : $type_1$ ; .. end *
Such a declaration is processed as follows :

Initialize offset within the record to be 0.
For each entry group,
- find the symbol table entry for the type
- allocate storage within the record using the storage allocation algorithm and size of type
- make a symbol table entry for each field, filling in its print name, type, offset in the record, and size.
- link the entries for the field to an entry for the record
The size of the record is the total size given by the storage allocation algorithm, rounded up to whole words.
Variant records simply restart the storage allocation at the place where the variant part begins. Total size is the maximum size of the variants.

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Symbol Tables

Definition

Symbol table is a data structure that associates names with information about the objects that are denoted by the names.
Must be well organized:

For fast lookup of symbols
To reflect the organization of the program (block structure)
쉽게 생각하면, Lexicon = dictionary = symbol table.

Operations : insert / search / search-insert

Method of Organization

Think of key-value dictionary.
Linear, unsorted: Symbols are stored in an array; new symbols are inserted at the end.
Linear, sorted: Symbols are stored in an array that is sorted.
Hash table: typical hash table :: do you remember collision and rehash?
Indexed Buckets: hashed with buckets / alphabetized buckets
Tree of symbol table
Stack symbol table
Tree and stack formats both have concepts of scoping: inheritance.

Speed

Linear, unsorted: Insert O(1) Search O(n)
Linear, sorted: Insert: O(n) Search(log n) binary search
Hash table: Insert: O(1) Search(1)

Advantages

Binary tree symbol tables have log n search time, so build AVL trees.
Hash tables are super fast, don’t have to sort!

Disadvantages

AVL tree is bit complicated.
Hash Table is more complex, and hard to find hashing function, also complexity of rehashing.

Comparison

Be Able to select a method for a particular application.

Search is More Frequent than Insertion

Symbol Tables for Block-Structured Languages

Tree, Stack structure

What Things Are Put in Symbol Tables?

Symbols: statement labels, variables, constants, subprogram names

Symbolic constants
Enumerated types
Subrange types
Scalar types
Variables
Arrays
Records
Functions and procedures
Labels
Objects

Examples of Symbols in a Compiler

Deletion of Symbols

When it is needed

Relation to Table Organization Methods

Methods of Strong Identifier Names

User of Indirection or Auxiliary Table Structures

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Shift-Reduce Parser

Use of Stack

Use stack to save statements

Four Actions of Shift-Reduce Parser

Shift : Shift the current input onto the stack and go to the specified next state s.
Reduce : Given the production r : A -> b, remove enough items from the stack to match b, produce output structure from those items, and put the resulting structure A back on the stack. The next state is found from the goto table.
Accept : Parsing is completed! return top of stack as result.
Error : The input cannot be parsed according to the grammar; call an error recovery routine. Error entries are denoted by blanks in the parsing table.

Relationship to Canonical Derivation

x

How Parse Tree is Constructed Using Shift-Reduce Parser

Conflicts

Reduce/reduce: resolved by choosing the first listed production
Shift/reduce: resolved in favor of shift
Reduce if the precedence of the production is greater than that of the terminal or if they are equal and the associativity of the production is left.
reduce의 결과가 터미널의 결과보다 우선순위가 높을 때.
만약 그 결과들이 같으면 associativity는 왼 쪽.
Shift otherwise.

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Top-down Parsing

Definition

Parser begins with the Sentence symbol S, expands a production for S, and so on recursively until words are reached. (Terminals) If the string of words matches the input, a parsing has been found.

Why Backup May Be Required

This approach to parsing might seem hopelessly inefficient. However, top-down filtering, that is testing whether the next word in the input string could begin the phrase about to be tried, can prune many failing paths early.

For What Kinds of Languages Does It Work Well

Languages with keywords, such as programming languages or natural language application. Easy to program.

Recursive Descent Parser

A parse for some context-free languages can be written using the technique of recursive descent. The basic idea is to write a procedure to parse each kind of statement or expression in the grammar. When such procedures are written in a recursive language, they can call each other as needed.
It works well for a well-structured language such as Pascal. In Pascal each statement other than assignment statements begin with a unique reserved word; thus, it is easy for a “big switch" program to determine which statement parser to call.

Problems with Left Recursion

It may be necessary to restructure a grammar to avoid left recursion, which can cause a recursive descent parser to go into a loop.

Left Factoring

A method of modifying a grammar to eliminate left recursion

Implementation of Top-down Parser by A Recursive Program

be able to write a procedure for a construct such as an IF statement

Avoiding Ambiguity while keeping the syntax

Pascal and C follows this convention.

If parser is SLR, LR(1), or LALR LR parser generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict.
If the parser is produced by a pruning and deep pruning LR generator, one can issue directives that prune away the ambiguities completely.
If the parser is hand-written, the programmer may use a non-ambiguous context-free grammar. Alternatively, one may rely on a non-context-free grammar or a parsing expression grammar.

Avoiding Ambiguity By Changing the Syntax

Have a “end if" statement.
Disallow following then to be if.
Require a bracelet when else follows if
Require every if to be paired with an else

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Operator Precedence Grammar and Parser

Definitions

Expressions could be written in an unambiguous, fully parenthesized form. However, this is less convenient for the programmer.

Precedence numbers

Precedence specifies which operations in a flat expression are to be performed first.
Usually,

.
^
unary -
* /
- -
= <> >= > <= <
not
and
or
:=

Associativity

Associativity specifies which operations are to be performed first when adjacent operators have the same precedence.

How operator precedence parser works(give example)

Shift & Reduce

Operands
Operator
(
)
End

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Derivation From a Context-Free Grammar

Derivation is the process of deriving a sentence from the start symbol S according to a grammar.

Relationship to Parse Tree

Notations for a Derivation

Symbol => is used to denote a derivation step.
=>* is used to denote derivation in zero or more steps.
=>+ is used to denote derivation in one or more steps.

Definition of a language in Terms of Derivations

Leftmost derivation : the leftmost nonterminal is replaced in each step. If S=>*(lm) a then a is called a left-sentential form of the grammar G. A leftmost derivation is traced by a predictive, top-down parser.
Rightmost derivation canonical derivation : the right most nonterminal is replaced in each step. A shift-reduce parser traces a rightmost derivation “backwards"

Citation:

Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
UT Austin CS375: Compilers by G.Novak.

Sungwon Chung Posted on March 12th, 2015

Sungwon Chung@sungpi

C++ 에서의 자료형type 을 크게 네 가지로 생각해 보면 다음과 같다:

연산

Synthesized Translation

Controllability and Observability

The implications For Large Software Systems

Comparison of Parsing Techniques

Recursive Descent and Operator Precedence

Advantages:

Disadvantages:

Parser Generator (yacc)

Advantages

Disadvantages:

Lexical Scoping

Structure References

Citation:

For Array References (Know the Fomulars)

Address Calculations for Access to Fields of a Record

Determining Types of Accesses To Parts of a Structure

Be Able to Calculate the Size of An Array Records

Be Able to Calculate the Address Offsets For Access to an Array Element and Record Field

Citation:

Storage Allocation Algorithm

Allocation of Data in Records

Word Alignment and Padding

Variant Records

Citation:

Definition

Method of Organization

Speed

Advantages

Disadvantages

Comparison

Be Able to select a method for a particular application.

Search is More Frequent than Insertion

Symbol Tables for Block-Structured Languages

What Things Are Put in Symbol Tables?

Examples of Symbols in a Compiler

Deletion of Symbols

When it is needed

Relation to Table Organization Methods

Methods of Strong Identifier Names

User of Indirection or Auxiliary Table Structures

Citation:

Use of Stack

Four Actions of Shift-Reduce Parser

Relationship to Canonical Derivation

How Parse Tree is Constructed Using Shift-Reduce Parser

Conflicts

Citation:

Definition

Why Backup May Be Required

For What Kinds of Languages Does It Work Well

Recursive Descent Parser

Problems with Left Recursion

Left Factoring

Implementation of Top-down Parser by A Recursive Program

Avoiding Ambiguity while keeping the syntax

Avoiding Ambiguity By Changing the Syntax

Citation:

Definitions

Precedence numbers

Associativity

How operator precedence parser works(give example)

Citation:

Relationship to Parse Tree

Notations for a Derivation

Definition of a language in Terms of Derivations

Citation: