Showing all posts tagged "Compiler"
Top-down Parsing
Definition
Parser begins with the Sentence symbol S, expands a production for S, and so on recursively until words are reached. (Terminals) If the string of words matches the input, a parsing has been found.
Why Backup May Be Required
This approach to parsing might seem hopelessly inefficient. However, top-down filtering, that is testing whether the next word in the input string could begin the phrase about to be tried, can prune many failing paths early.
For What Kinds of Languages Does It Work Well
Languages with keywords, such as programming languages or natural language application. Easy to program.
Recursive Descent Parser
A parse for some context-free languages can be written using the technique of recursive descent. The basic idea is to write a procedure to parse each kind of statement or expression in the grammar. When such procedures are written in a recursive language, they can call each other as needed.
It works well for a well-structured language such as Pascal. In Pascal each statement other than assignment statements begin with a unique reserved word; thus, it is easy for a “big switch" program to determine which statement parser to call.
Problems with Left Recursion
It may be necessary to restructure a grammar to avoid left recursion, which can cause a recursive descent parser to go into a loop.
Left Factoring
A method of modifying a grammar to eliminate left recursion
Implementation of Top-down Parser by A Recursive Program
be able to write a procedure for a construct such as an IF statement
Avoiding Ambiguity while keeping the syntax
Pascal and C follows this convention.
- If parser is SLR, LR(1), or LALR LR parser generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict.
- If the parser is produced by a pruning and deep pruning LR generator, one can issue directives that prune away the ambiguities completely.
- If the parser is hand-written, the programmer may use a non-ambiguous context-free grammar. Alternatively, one may rely on a non-context-free grammar or a parsing expression grammar.
Avoiding Ambiguity By Changing the Syntax
- Have a “end if" statement.
- Disallow following then to be if.
- Require a bracelet when else follows if
- Require every if to be paired with an else
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Operator Precedence Grammar and Parser
Definitions
Expressions could be written in an unambiguous, fully parenthesized form. However, this is less convenient for the programmer.
Precedence numbers
Precedence specifies which operations in a flat expression are to be performed first.
Usually,
- .
- ^
- unary -
- * /
-
- -
- = <> >= > <= <
- not
- and
- or
- :=
Associativity
Associativity specifies which operations are to be performed first when adjacent operators have the same precedence.
How operator precedence parser works(give example)
Shift & Reduce
- Operands
- Operator
- (
- )
- End
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Derivation From a Context-Free Grammar
Derivation is the process of deriving a sentence from the start symbol S according to a grammar.
Relationship to Parse Tree
Notations for a Derivation
Symbol => is used to denote a derivation step.
=>* is used to denote derivation in zero or more steps.
=>+ is used to denote derivation in one or more steps.
Definition of a language in Terms of Derivations
Leftmost derivation : the leftmost nonterminal is replaced in each step. If S=>*(lm) a then a is called a left-sentential form of the grammar G. A leftmost derivation is traced by a predictive, top-down parser.
Rightmost derivation canonical derivation : the right most nonterminal is replaced in each step. A shift-reduce parser traces a rightmost derivation “backwards"
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Context-free Languages
Grammar Definition
A - a
A $\in$ N
a $\in$ V*
Since left-hand-side of each production is a single nonterminal, every derivation is a tree.
Memory Needed for Parsing.
Recursive program = a stack for temporary storage
Notations for defining the syntax of a language :
BNF
Pascal-style transition diagrams.
Ambiguous grammars VS unambiguous grammars
Ambiguous : grammar for which some sentence has more than one parse tree is ambiguous. There’s two way to deal with this. Re-write the grammar and User disambiguating rules to guide the parser.
Augmented transition network;Yacc
- Arbitrary tests : can be added to the arcs.
- Structure-building actions : can be added to the arcs. These actions may save information in registers to be used later by the parser, or to build the representation of the meaning of the sentence. Transformations, can also be handled.
- Phrase names : as well as part-of-speech names, may appear on arcs. This allows a grammar to be called as a subroutine.
What kinds of language constructs are context-free?
Arithmetic expressions, noun pharses.
Any language that requires counting must be at least context free.
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Lexical Analyzer
What constructs are parsed
Type of grammar used
Use regular expressions.
Need for speed?
NFA to DFA, and converted to tables to control a table-driven parser.
lex
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Regular Expression
Definition
Regular expresssions are a more convenient way(than a regular grammar) to specify a regular language.
Construct a Regular Expression for a Language
in lex rule.
[chars] : Any number of the set of characters chars
[c1-c2] : Any character from c1 through c2
[^chars] : not chars
{category} : An instance of a previously named category
“string" : Exactly the specified string.
s1 | s2 : s1 or s2
a : zero or more a
a+ : one or more a
a? : zero or one a
* a{2,5} : 2 to 5 a
Construct the Language Denoted by a Regular Expression
Follow the rule.
Relation Between Regular Expression and Finite Automata
Always can change into DFA.
Automatic Construction of Finite Automata from Regular Expressions
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Regular Languages 정규 언어
Grammar?
Formal Definition of a Grammar
Grammar specifies the legal syntax of a language. A grammar specifies a set of productions; non-terminal symbols. A grammar describes the structure of the sentences of a language in terms of components or phrases. The mathematical description of phrase structure grammars is due to Chomsky.
문법은 언어의 syntax(the rules by which legitimate statements can be constructed)를 나타내는 규칙이다. 문법은 production으로 이루어져 있다.
Formally, a Grammar is a four-tuple G = (T, N, S, P) where:
- T is the set of terminal symbols or words of the language.
- N is the set of nonterminal symbols or phrase names that are used in specifying the grammar. We say V = T $\cup$ N is the vocabulary of the grammar.
- S is a distinguished element of N called the start symbol.
- P is a set of productions, P $\subseteq V^NV^ \texttimes V^*$. We write productions in the form a $\textrightarrow$ b where a is a string of symbols from V containing at lest one nonterminal and b is any string of symbols from V.
- V = Vocabulrary
문법은 언어의 구조가 올바른지를 판단하는 거다.
Chart about Grammar Classes.
- General Phrase-structure
- Context Sensitive
- Context Free pushdown stack O($n^3$)
- Regular finite storage O(n)
We’ll focus on Context-Free and Regular Language.
Regular Languages
Grammar definitions
A - xB.
A - x.
A, B $\in$ N
x $\in T^*$
Only one nonterminal can appear in any derived string, and it must appear at the rightmost.
Uses
In compilers, we use to parse the stream of characters into tokens. Cause parser never has to back up or do search. O(n); Linear parsing time. Indentifiers, numbers, word forms.
Local Ambiguity
Examples
A - xB
B - x
Way to Handle
Make test cases to pre-process it.
Relationship of finite automata and regular grammars
Equivalent to a deterministic finite automaton
Linear nature of a derivation from a regular grammar
Since only one nonterminal can appear in any derived string, and the rightmost end, parse tree has only one step to evaluate it, produces right-biased tree.
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Major Phases of a Compiler 컴파일러의 주요 단계들
What the phases do 각 단계는 무엇을 하나?
Lexical Analyzer
or scanning
Reads the stream of characters and group the characters into meaningful sequences called lexemes.
글자들을 읽어서 글자들에 의미를 부여하는 단계; 문장에서 단어를 잘라준다고 생각하면 편하다.
ex a = 1 + 2 는 a,=,1,+,2 5개의 token이 된다.
Syntax Analyzer
or parsing
The parser uses the first components of the tokens produced by the lexical analyzer to produce a tree-like intermediate representation that describes the grammatical structure of the token stream.
Parser는 lexer가 만들어준 토큰을 가지고, 문장에 의미를 부여하기 시작한다. (컴퓨터 과학에서는 tree로 token을 엮어낸다.)
Semantic Analyzer
Semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
Parser가 만들어준 트리로, 이 것이 과연 문법적으로 맞는 놈인지 문법을 검사한다.
Intermediate Code Generator
Machine-independent Code optimizer
Code Generator
Machine-dependent Code Optimizer
Data flow between phases 각 단계별 데이터의 흐름
- Character Stream : Before lexical analysis ; lexer는 character들을 받아서
- Token : Before syntax analysis ; token을 생성하고
- Syntax Tree : before intermediate code generation ; parser는 syntax tree를 만들며
- Intermediate representation : before code generation ; 코드 제네레이터는 중간 코드를 생성하고
- Target Machine Code : after code generation ; 코드 제네레이터는 컴퓨터가 읽을수 있는 형태로 뽑아준다.
Citation:
- Aho, Lam, Sethi, & Ullman, Compilers: Principles, Techniques, and Tools
- UT Austin CS375: Compilers by G.Novak.
Posted on March 12th, 2015
Lexical Analysis
Lexical Analysis
3.1 The Role of the Lexical Analyzer
-
To read the input characters of the source program
-
Group them into lexemes
-
Produce as output a sequence of tokens for each lexeme in the source program
Scanning and Lexical analysis
3.1.1 Lexical Analysis Versus Parsing
Why do we separate lexer and parser?
-
Simplicity of design is the most important consideration
-
Compiler efficiency is improved
-
Compiler portability is enhanced
(only lexer has to deal with device-input)
3.1.2 Tokens, Patterns, and Lexemes
ex printf(“Hello World!");
printf and score are lexemes matching the pattern for token id.
Posted on March 9th, 2015