Sungwon Chung

Sungwon Chung@sungpi

Showing all posts tagged "Compiler"

Type Systems & Type Checking

Dynamic Type and Static Type

  • Runtime vs Compile time
  • Slow vs fast
  • flexible vs inflexible

Strong Typing

  • Sound type system: guarantees that a no type errors can occur at runtime
  • cons
    • Some checks can only be done dynamically (Array bounds checking)
    • Such checking can cause serious performance degradation
    • Even so-called strongly typed languages often have “holes" in the type system (e.g. variant records in Pascal)

Type Equivalence

  • If types are required to match, the question of what types are considered to be “equivalent" arises
  • Structural equivalence requires equivalent structures
    • Identical basic types
    • Same type constructor applied to equivalent types
  • Name equivalence considered types to be equal only if the same names are used
  • C uses structural equivalence except for records

Type Signatures

  • The signature of a procedure is a description of the types of its arguments and type of its result.
  • i.e) sqrt : real -> real, mod: integer X integer -> integer

Polymorphic Procedures

  • A procedure that has multiple type signatures is called polymorphic

Failure Of Type Checking Due To Aliasing, Variant Records, Subroutine Calls

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools

  • UT Austin CS375: Compilers by G.Novak.

Record Structure Declarations and References

What Information Is Stored In Symbol Table From Declaration

  • A record declaration has a form such as : record $field_1$, …, $field_n$ : $type_1$ ; … end
    1. Initialize offset within the record to be 0.
    1. For each entry group,
    2. (a) Find the symbol table entry for the type
    3. (b) Allocate storage within the record using the storage allocation algorithm
    4. (c) Make a symbol table entry for each field, filling in its print name, type, offset in the record, and size
    5. (d) Link the entries for the fields to an entry for the record
    1. The size of the record is the total size given by the storage allocation algorithm, rounded up to whole words, (e.g. multiple of 8)
    1. Variant records simply restart the storage allocation at the place where the variant part begins. Total size is the maximum size of the variants

Generation Of Intermediate Code For References To Records

type date = record mo : 1..12;
day : 1..31;
year : integer end;
person = record name : alfa;
ss : integer;
birth : date end;

var people : array[1..100] of person;

  • people[i].birth.day
  • people[i] : (aref people (i-1)32 ) = (aref people (+ -32 ( i 32) ) )

References Using Pointers

Records With Variant Parts

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools

  • UT Austin CS375: Compilers by G.Novak.

Array References:

Methods For Storing Arrays

Effect of Array Order and Loop Order On Efficiency

Computation Of Effective Address(Formula)

Why Optimization Of Array References Is Important

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools

  • UT Austin CS375: Compilers by G.Novak.

Topics In Parsing Programming Language Constructs:

Current Status Variable

Type Coercion

  • When a binary operator is reduced, it is necessary to check the types of the arguments, possibly to coerce an argument to the correct type, and to infer the result type.
  • char < int < float < double
  • For most operators, right-most type takes precedence
  • For assignment, left-most type takes precedence

Conversion of Names With Constant Compile-Time Values To Constants

Intermediate Code for Non-Arithmetic Statements

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools

  • UT Austin CS375: Compilers by G.Novak.

Example Questions For Final Exam

Adderessing

From Midterm!

Show how an operator precedence parser would parse the string (ex. A-(B/C-D)/E+F)

  • Show the contents of the stacks at each step; produce a tree as output

Give one advantage and disadvantage of hashing as a method of symbol table organization

Consider the regular expression (a|b)b+b. What is the simplest regular expression that denotes the same language?

Give the allowable form of productions for a Regular grammar.

Consider the following grammar :

  • S -> a S
  • S -> S b
  • S -> b
  • What kind of grammar is this in the Chomsky hierarchy?
  • What kind of language does it denote?
  • Is there a simpler kind of grammar that denotes the same language? If so, give the grammar, if not, explain why not.

Briefly and clearly define the following terms : vocab

Other Questions

A robot moves on a square grid. The robot can go forward, turn left or turn right. Give a grammar to describe the language of all sequences of moves that leave the robot pointing in the same direction as when it started

What kind of grammar is the above(in the Chomsky hierarchy)?

Describe a kind of local optimization

Describe what it means for a subexpression to be (a) available (b) busy (c) killed

Describe sources of extra run-time overhead in (a) time and (b)space in an OOP language

Draw boxes around the following code to show the basic blocks:

n := k*2;
if n < j then write(‘less’) else begin k := k - 1; write(‘more’) end;
writeln(k);

  • Number the blocks and draw a flow graph; give the matrix form of the flow graph and its transitive closure

Give an advantage and a disadvantage for (a) call by reference (b) call by value

What are the most important things to optimize in a scientific program? Why?

Give three examples of computer architecture innovations that require compiler innovations to be effective

Briefly and clearly define the following terms: … 20 terms chosen from the vocabulary list on the study guide

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools

  • UT Austin CS375: Compilers by G.Novak.

Other Concepts

Synthesized Translation

The term synthesized translation includes two ideas:

  • A complex expression is constructed from subexpressions
  • The translation of a complex expression is constructed in a mechanical way form the translations of its subexpressions
    if x > max then max := x
    is constructed of (x > max) and (max := x), therefore, ( if (> x max) (:= max x) )

Controllability and Observability

These are central concepts from control theory. We will define them as :

  • Controllability: the ability to chance the behavior of a system by changing its parameters.
  • Observability: the ability to observe the behavior of a system well enough to control it.

In order to control a system, both controllability and observability are required.

The implications For Large Software Systems
  • Aspects of software that cannot easily be observed will never be debugged
  • All Large Software Systems must have observability built in
  • Observability is a requirement, not a luxury. The time spent building in observability will be well repaid.
  • In-process traces can be turned on by setting bits in a bit vector.

Comparison of Parsing Techniques

Recursive Descent and Operator Precedence
Advantages:
  • easy to control, understand
  • Good error messages can be produced
Disadvantages:
  • More code to write
  • Grammar not as clear
Parser Generator (yacc)
Advantages
  • Less code to write
  • Grammar is clear and easy to change
Disadvantages:
  • The programmer may get stuck on grammar errors that are hard to understand
  • Error messages during compilation may be unhelpful and far from the actual error

Lexical Scoping

Scope is the region of program text over which a symbol can be referenced.

With lexical scoping, a name is defined when it can be looked up in the lexicon at the point of use.

Structure References

  • Every expression has type
  • Types from a tree structure (a graph when pointers to types are included, but still treated as a tree)
  • The structure references in source code specify a traversal down the type tree
  • A reference to a part of a structure depends on the type of the structure; the results of the reference are:
    • An address expression for the substructure
    • A new type for the substructure

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Address Calculations

For Array References (Know the Fomulars)

A simple array declaration has a form such as:
* array[ $low_1$ .. $high_1$] of type

  • Find the symbol table entry for type
  • Make a symbol table entry for the array type. The total size of the array is: ($high_1$ - $low_1$ + 1) * size(type)

Address Calculations for Access to Fields of a Record

Determining Types of Accesses To Parts of a Structure

Be Able to Calculate the Size of An Array Records

Be Able to Calculate the Address Offsets For Access to an Array Element and Record Field

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Data Area and Base-Offset Addressing

Storage Allocation Algorithm

Data area is contiguous region of storage specified by its base address and size.
Two kinds of data are are arrays and records.

Allocation of Data in Records

Allocation of storage is done as an offset to a base address, which is associated with a block of storage. Assignment of storage locations is done sequentially by a simple algorithm:

  • next = 0
offset = next;
next = next + n;
return offset;
  • Finally, next gives the total size of the block.

Word Alignment and Padding

storage alignment: Certain items must be allocated at restricted locations.
In such cases, next is advanced to next available boundary if needed, and the intervening(사이에 오는) storage is wasted; this is called padding

wordaddress(next, m) = ( (next + m - 1) / m) *m

Variant Records

A record declaration has a form such as : record $field_1$, …, $field_n$ : $type_1$ ; .. end *
Such a declaration is processed as follows :

  • Initialize offset within the record to be 0.
  • For each entry group,
    • find the symbol table entry for the type
    • allocate storage within the record using the storage allocation algorithm and size of type
    • make a symbol table entry for each field, filling in its print name, type, offset in the record, and size.
    • link the entries for the field to an entry for the record
  • The size of the record is the total size given by the storage allocation algorithm, rounded up to whole words.
  • Variant records simply restart the storage allocation at the place where the variant part begins. Total size is the maximum size of the variants.

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Symbol Tables

Definition

Symbol table is a data structure that associates names with information about the objects that are denoted by the names.
Must be well organized:

  • For fast lookup of symbols
  • To reflect the organization of the program (block structure)
    쉽게 생각하면, Lexicon = dictionary = symbol table.

Operations : insert / search / search-insert

Method of Organization

  • Think of key-value dictionary.
  • Linear, unsorted: Symbols are stored in an array; new symbols are inserted at the end.
  • Linear, sorted: Symbols are stored in an array that is sorted.
  • Hash table: typical hash table :: do you remember collision and rehash?
  • Indexed Buckets: hashed with buckets / alphabetized buckets
  • Tree of symbol table
  • Stack symbol table
    Tree and stack formats both have concepts of scoping: inheritance.
Speed
  • Linear, unsorted: Insert O(1) Search O(n)
  • Linear, sorted: Insert: O(n) Search(log n) binary search
  • Hash table: Insert: O(1) Search(1)
Advantages
  • Binary tree symbol tables have log n search time, so build AVL trees.
  • Hash tables are super fast, don’t have to sort!
Disadvantages
  • AVL tree is bit complicated.
  • Hash Table is more complex, and hard to find hashing function, also complexity of rehashing.
Comparison

Be Able to select a method for a particular application.

Search is More Frequent than Insertion

Symbol Tables for Block-Structured Languages

Tree, Stack structure

What Things Are Put in Symbol Tables?

Symbols: statement labels, variables, constants, subprogram names

  • Symbolic constants
  • Enumerated types
  • Subrange types
  • Scalar types
  • Variables
  • Arrays
  • Records
  • Functions and procedures
  • Labels
  • Objects

Examples of Symbols in a Compiler

|variable|description|
|:———:|:—————:|
|link|link to next symbol|
|name string|name of symbol|

Deletion of Symbols

When it is needed
Relation to Table Organization Methods

Methods of Strong Identifier Names

User of Indirection or Auxiliary Table Structures

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.

Shift-Reduce Parser

Use of Stack

Use stack to save statements

Four Actions of Shift-Reduce Parser

  • Shift : Shift the current input onto the stack and go to the specified next state s.
  • Reduce : Given the production r : A -> b, remove enough items from the stack to match b, produce output structure from those items, and put the resulting structure A back on the stack. The next state is found from the goto table.
  • Accept : Parsing is completed! return top of stack as result.
  • Error : The input cannot be parsed according to the grammar; call an error recovery routine. Error entries are denoted by blanks in the parsing table.

Relationship to Canonical Derivation

x

How Parse Tree is Constructed Using Shift-Reduce Parser

Conflicts

  • Reduce/reduce: resolved by choosing the first listed production
  • Shift/reduce: resolved in favor of shift
    Reduce if the precedence of the production is greater than that of the terminal or if they are equal and the associativity of the production is left.
  • reduce의 결과가 터미널의 결과보다 우선순위가 높을 때.
  • 만약 그 결과들이 같으면 associativity는 왼 쪽.
    Shift otherwise.

Citation:

  • Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
  • UT Austin CS375: Compilers by G.Novak.