1 alphabet

\Sigma = \{a^{(1)}, \cdots, a^{(\rho)}\} Non-empty!

1.1 words

Finite sequence of elements (letters)

Empty word: w = \varepsilon

A set of all words over an alphabet is infinite and countable: \Sigma^*

We can enumerate all words as follows (canonical order):

shorter words proceed longer ones
equal lengths are sorted alphabetically

1.2 language

A language over an alphabet is any subset of \Sigma^*, there is an infinite uncountable number of languages

A, B, C, \cdots - nonterminals
a, b, c, \cdots, 0, 1, 2, \cdots, +, -, \cdots - terminals
\alpha, \beta, \gamma, \cdots - strings of terminals and nonterminals
z, y, x, w, v, \cdots - strings of terminals
Z, Y, X, W, \cdots - either terminal or nonterminal

1.3 grammars

G = (V, T, P, S)

V - finite set of symbols called nonterminals
T - finite set of symbols called terminals
P - finite set of productions
S - nonterminal (distinguished) called initial symbol of a grammar

Production is a pair (\alpha, \beta), \alpha, \beta \in (V \cup T)^*, \alpha \ne \varepsilon

By default S, E are the initial symbols of a grammar

1.3.1 direct derivation

\gamma \alpha \delta \implies \gamma \beta \delta, \alpha, \beta, \gamma, \delta \in (V \cup T)^* and \alpha \to \beta is a production

1.3.2 derivation

Application of direct derivation several times

A language L(G) generated by a grammar G is the set of all words over T that can be derived from S

1.3.3 regular grammars

1.3.3.1 left linear

Productions of the form A \to Bw or A \to w

1.3.3.2 right linear

Productions of the form A \to wB or A \to w

1.4 operations

set theoretical operations: union, intersection, complement
concatenation
1. of words: u \circ v
2. of languages: L_1 \circ L_2
* closure, Kleene closure: L^* = \bigcup_{k=0}^\infty L^k

\begin{cases} L^0 = \{\varepsilon\} \\ L^{k+1} = L^k \circ L \\ \end{cases}

1.5 Chomsky’s hierarchy of languages

regular \subsetneq context free \subsetneq context sensitive \subsetneq recursive \subsetneq recursively enumerable \subsetneq all

1.6 regular expressions

Finite strings of symbols over alphabet

\emptyset, \varepsilon and \forall_{a \in \Sigma} a
let r and s be regular expressions: (r+s), (r \circ s), (r^*) (operator precedence is as in the presented order)
apply the rules above finite amount of times

Regular expressions generate languages

- \emptyset generate \emptyset
- \varepsilon generate \{\varepsilon\}
- \forall_{a \in \Sigma} a generates \{a\}
if r and s generate languages R and S then
- (r+s) generates R \cup S
- (r \circ s) generates R \circ S
- (r^*) generates R^*

Regular languages are those generated by regular expressions and only those. Regular grammars generate regular languages and only regular languages.

1.6.1 equivalent expressions

\emptyset + r \equiv r
r + s \equiv s + r
(r+s) + t \equiv r + (s+t)
\varepsilon r \equiv r\varepsilon \equiv r
r(s+t) \equiv rs + rt
(r^*)^* \equiv r^*
(r^*s^*)^* \equiv (r+s)^*
(\varepsilon + r)^* \equiv r^*
(s + r)^* \equiv (r^* + s^*)^*

1.6.2 tools to check if a language is regular

regular grammars
regular expressions
pumping lemma
Myhill-Nerode lemma

1.6.2.1 pumping lemma

If a language L is regular then there exists a constant n_L s.t. for any z \in L:

|z| \ge n_L \implies (\exists_{u,v,w \in \Sigma^*, z = uvw, |uv| \le n_L, |v| \ge 1})(\forall_{i=0,1,2,\cdots})z_i= uv^iw\in L

1.6.2.1.1 contraposition

If for any constant N there exists z \in L s.t. the following conditions holds:

|z| \ge N \land (\forall_{u,v,w \in \Sigma^*, z= uvw, |uv| \le N, |v| \ge 1})(\exists_{i=0,1,2,\cdots})z_i=uv^iw \notin L

then L is not regular

1.6.2.2 Myhill-Nerode

A language L is regular iff the relation R_L induced by L has finite index (finite number of equivalence classes)

Where uR_Lv \equiv \big [(\forall_{z\in \Sigma^*})uz\in L \equiv vz \in L\big]

let \rho \subset \Sigma^*\times\Sigma^*, u,v\in \Sigma^*, u\rho v \equiv |supp(u) \cap supp(v)| = 2 where supp(u) = \text{the set of letters in u}

1.7 context-free grammars and languages

A grammar is context-free iff it’s productions are of a form A \mapsto \alpha where A \in V, \alpha \in (V\cup T)^*

Context-free languages are languages generated by context-free grammars and only those.

1.7.1 simplification

removing useless symbols¹ \to remove productions using those symbols
removing unit productions
1. delete productions of the form A \mapsto A
2. replace A \mapsto B with A \mapsto \text{right hand side of B-production}
changing status of nullable symbols (nonterminals which can generate an empty word)

1.7.1.1 identification of useless symbols

1.7.1.1.1 not reachable

TODO: wording here is very wrong

V_r = \{s\}, T_r = \emptyset
production starts with a reachable left hand symbol and symbols on its right hand side to the two sets above
repeat step 2. as long as V_r, T_r are being changed
remove productions that do not include symbols from sets 1.

1.7.1.1.2 not generative

V_g = \emptyset, T_g = T
for each production if its right hand side is empty or has only generative symbols then add its left hand side to V_g
repeat step 2. as lon as V_g is being changed
remove non generative symbols and productions using those

1.7.1.1.3 nullables

V_n = \emptyset the set of nullable symbols
for each production if its right hand side is empty or is a string of nullable symbols then add its left hand side to V_n
repeat step 2. as long as V_n is being changed
change status of nullable symbols and delete nullable productions

A \mapsto X_1, \cdots, X_k, \cdots X_n

if X_k is nullable then replace this production with

A \mapsto X_1, \cdots, X_k, \cdots X_n
A \mapsto X_1, \cdots, X_{k-1}, X_{k+1} \cdots X_n

Finally, delete productions to empty word

If a grammar doesn’t generate the empty word then the generated language does not change

1.7.2 normal form

Any context-free grammar not generating empty word can be transformed to normal forms

1.7.2.1 Chomsky

The productions are A \mapsto a or A \mapsto BC

simplify a grammar
this yields productions A \mapsto a or A \mapsto x_1, \cdots , x_n where n \ge 2
in all productions A \mapsto x_1, \cdots , x_n where n \ge 2 replace terminals with a nonterminal and add a new production A_a \mapsto a
this yields productions A \mapsto a, A \mapsto BC, and A \mapsto B_1, \cdots, B_n where n > 2
replace A \mapsto B_1, \cdots, B_n where n > 2 with A \mapsto B_1, \cdots, B_{n-2}, B_{n-1,n} and B_{n-1,n} \mapsto B_{n-1}B_n
repeat step 5. until all productions are in Chomsky’s form

1.7.2.2 Greibach

The productions are A \mapsto a\alpha where \alpha \in V^*

transform a grammar to Chomsky’s form (obligatory in automatic transformation)
enumerate all nonterminals A_1, A_2, \cdots A_N
all productions have to satisfy the star condition:

(*) \begin{cases} A_i \mapsto a\alpha & a \in T, \alpha \in V^* \\ A_i \mapsto A_j\alpha & j > i, \alpha \in V^*, |\alpha| > 0\\ \end{cases}

assume that all A_i productions satisfy (*) for i < k and A_k productions do not, thus are A_k \mapsto a\alpha, A_k \mapsto A_i\alpha where i < k, and A_k \mapsto A_k\alpha
replace in A_k \mapsto A_i\alpha where i < k the A_i with right-hand side of A_i productions. After this operation if we still have of the same form they will have the first nonterminal of greater index than i. Repeating this step will make it satisfy (*)
assume that we have productions A_k \mapsto A_k\alpha_1 | A_k\alpha_2 | \cdots | A_k\alpha_m | \beta_1 | \cdots | \beta_n. Add another nonterminal B_k and replace productions with A_k \mapsto \beta_1 | \cdots | \beta_n | B_k\beta_1 | \cdots | B_k\beta_n and B_k \mapsto \alpha_1 | \cdots | \alpha_m | \alpha_1B_k|\cdots | \alpha_mB_k. B_k proceeds all A_i nonterminals and B nonterminals are arranged according to their indices. Repeat to A_{k+1}, A_N
A_N productions are in the Greibach form
A_{N-1} productions are either A_{N-1} \mapsto a\alpha or A_{N-1} A_N \alpha. Replace with right-hand side of A_N. Repeat for A_{N-l}.

1.7.3 derivation trees

A derivation tree of a word in a grammar is a tree with the following properties:

its internal nodes are labelled by nonterminals
its leaves are labelled by an empty word or terminals
its root is labelled by the initial symbol of the grammar
for each internal node its children are labelled by symbol of right hand side of a production from this node’s label (these children must appear in the order defined by the right hand side derivations)
if a node has a child labelled by an empty word then it is the only child of this node
the crop² of such a tree is the derived word

1.7.3.1 unique derivation

We say that a word has unique derivation in a grammar if there is exactly one derivation tree

1.7.4 pumping lemma

If a language L is context free then there exists a constant n_L such that for each z \in L the following holds

|z| \ge \implies (\exists_{u,v,w,x,y \in \Sigma^*}, z=uvwxy, |vwx| \le n_L, |vx| \ge 1)(\forall_{i=0,1,2,\cdots})z_i = uv^iwx^iy \in L

1.7.4.1 contraposition

If for any constant N there exists z \in L s.t. the following conditions holds:

|z| \ge N \land (\forall_{u,v,w,x,y \in \Sigma^*, z = uvwxy, |vwx| \le N, |vx| \ge 1})(\exists_{i=0,1,2,\cdots})z_i= uv^iwx^iy \notin L

then L is not context free

1.8 Cocke-Younger-Kasami algorithm

Algorithm for checking whether a word belongs to the language generated by a grammar to grammar in Chomsky’s normal form.

1.9 translation grammars

G = (V, T, T_r, P, S) where T_r denote the translation symbols

1.10 LL(1) grammars

Recall context free productions: A \mapsto \alpha where \alpha \in (V \cup T)^*

FIRST: a set of terminals that begin any intermediate word of any derivation from \alpha to a string of terminals
FOLLOW: it is a set of terminals which can directly follow A in any intermediate word of any derivation in this grammar
SELECT: FIRST or FIRST \cup FOLLOW. SELECT is FIRST iff production is non-nullable

A grammar is a LL(1) iff for each nonterminal A sets SELECT are pairwise disjoint for all A-productions

1.11 Context-sensitive grammars

Productions of the form:

\alpha \mapsto \beta
\alpha, \beta \in (V \cup T)^*

Where 1 \le |\alpha| \le |\beta|

So the empty word does not belong to context-sensitive languages (CSL)

1.11.1 Normal form

Productions are \delta_1 A \delta_2 \mapsto \delta_1 \alpha \delta_2 where A \in V, \alpha \in (V \cup T)^*, |\alpha| \ge 1, and \delta_1, \delta_2 \in (V \cup T)^*

1.12 Recursive languages

Those that are accepted by turing machines with a stop property. There is no class of grammars generating these languages.

useless symbols - Not reachable (does not appear in any derivation), not generative (nonterminals that cannot generate a string of terminals)↩︎
sequence of leaves↩︎