1 alphabet

\Sigma = \{a^{(1)}, \cdots, a^{(\rho)}\} Non-empty!

1.1 words

Finite sequence of elements (letters)

Empty word: w = \varepsilon

A set of all words over an alphabet is infinite and countable: \Sigma^*

We can enumerate all words as follows (canonical order):

  1. shorter words proceed longer ones
  2. equal lengths are sorted alphabetically

1.2 language

A language over an alphabet is any subset of \Sigma^*, there is an infinite uncountable number of languages

1.3 grammars

G = (V, T, P, S)

Production is a pair (\alpha, \beta), \alpha, \beta \in (V \cup T)^*, \alpha \ne \varepsilon

By default S, E are the initial symbols of a grammar

1.3.1 direct derivation

\gamma \alpha \delta \implies \gamma \beta \delta, \alpha, \beta, \gamma, \delta \in (V \cup T)^* and \alpha \to \beta is a production

1.3.2 derivation

Application of direct derivation several times

A language L(G) generated by a grammar G is the set of all words over T that can be derived from S

1.3.3 regular grammars

1.3.3.1 left linear

Productions of the form A \to Bw or A \to w

1.3.3.2 right linear

Productions of the form A \to wB or A \to w

1.4 operations

\begin{cases} L^0 = \{\varepsilon\} \\ L^{k+1} = L^k \circ L \\ \end{cases}

1.5 Chomsky’s hierarchy of languages

regular \subsetneq context free \subsetneq context sensitive \subsetneq recursive \subsetneq recursively enumerable \subsetneq all

1.6 regular expressions

Finite strings of symbols over alphabet

  1. \emptyset, \varepsilon and \forall_{a \in \Sigma} a
  2. let r and s be regular expressions: (r+s), (r \circ s), (r^*) (operator precedence is as in the presented order)
  3. apply the rules above finite amount of times

Regular expressions generate languages

  1. if r and s generate languages R and S then

Regular languages are those generated by regular expressions and only those. Regular grammars generate regular languages and only regular languages.

1.6.1 equivalent expressions

1.6.2 tools to check if a language is regular

  1. regular grammars
  2. regular expressions
  3. pumping lemma
  4. Myhill-Nerode lemma

1.6.2.1 pumping lemma

If a language L is regular then there exists a constant n_L s.t. for any z \in L:

|z| \ge n_L \implies (\exists_{u,v,w \in \Sigma^*, z = uvw, |uv| \le n_L, |v| \ge 1})(\forall_{i=0,1,2,\cdots})z_i= uv^iw\in L

1.6.2.1.1 contraposition

If for any constant N there exists z \in L s.t. the following conditions holds:

|z| \ge N \land (\forall_{u,v,w \in \Sigma^*, z= uvw, |uv| \le N, |v| \ge 1})(\exists_{i=0,1,2,\cdots})z_i=uv^iw \notin L

then L is not regular

1.6.2.2 Myhill-Nerode

A language L is regular iff the relation R_L induced by L has finite index (finite number of equivalence classes)

Where uR_Lv \equiv \big [(\forall_{z\in \Sigma^*})uz\in L \equiv vz \in L\big]

let \rho \subset \Sigma^*\times\Sigma^*, u,v\in \Sigma^*, u\rho v \equiv |supp(u) \cap supp(v)| = 2 where supp(u) = \text{the set of letters in u}

1.7 context-free grammars and languages

A grammar is context-free iff it’s productions are of a form A \mapsto \alpha where A \in V, \alpha \in (V\cup T)^*

Context-free languages are languages generated by context-free grammars and only those.

1.7.1 simplification

  1. removing useless symbols1 \to remove productions using those symbols
  2. removing unit productions
    1. delete productions of the form A \mapsto A
    2. replace A \mapsto B with A \mapsto \text{right hand side of B-production}
  3. changing status of nullable symbols (nonterminals which can generate an empty word)

1.7.1.1 identification of useless symbols

1.7.1.1.1 not reachable

TODO: wording here is very wrong

  1. V_r = \{s\}, T_r = \emptyset
  2. production starts with a reachable left hand symbol and symbols on its right hand side to the two sets above
  3. repeat step 2. as long as V_r, T_r are being changed
  4. remove productions that do not include symbols from sets 1.
1.7.1.1.2 not generative
  1. V_g = \emptyset, T_g = T
  2. for each production if its right hand side is empty or has only generative symbols then add its left hand side to V_g
  3. repeat step 2. as lon as V_g is being changed
  4. remove non generative symbols and productions using those
1.7.1.1.3 nullables
  1. V_n = \emptyset the set of nullable symbols
  2. for each production if its right hand side is empty or is a string of nullable symbols then add its left hand side to V_n
  3. repeat step 2. as long as V_n is being changed
  4. change status of nullable symbols and delete nullable productions

A \mapsto X_1, \cdots, X_k, \cdots X_n

if X_k is nullable then replace this production with

Finally, delete productions to empty word

If a grammar doesn’t generate the empty word then the generated language does not change

1.7.2 normal form

Any context-free grammar not generating empty word can be transformed to normal forms

1.7.2.1 Chomsky

The productions are A \mapsto a or A \mapsto BC

  1. simplify a grammar
  2. this yields productions A \mapsto a or A \mapsto x_1, \cdots , x_n where n \ge 2
  3. in all productions A \mapsto x_1, \cdots , x_n where n \ge 2 replace terminals with a nonterminal and add a new production A_a \mapsto a
  4. this yields productions A \mapsto a, A \mapsto BC, and A \mapsto B_1, \cdots, B_n where n > 2
  5. replace A \mapsto B_1, \cdots, B_n where n > 2 with A \mapsto B_1, \cdots, B_{n-2}, B_{n-1,n} and B_{n-1,n} \mapsto B_{n-1}B_n
  6. repeat step 5. until all productions are in Chomsky’s form

1.7.2.2 Greibach

The productions are A \mapsto a\alpha where \alpha \in V^*

  1. transform a grammar to Chomsky’s form (obligatory in automatic transformation)
  2. enumerate all nonterminals A_1, A_2, \cdots A_N
  3. all productions have to satisfy the star condition:

(*) \begin{cases} A_i \mapsto a\alpha & a \in T, \alpha \in V^* \\ A_i \mapsto A_j\alpha & j > i, \alpha \in V^*, |\alpha| > 0\\ \end{cases}

  1. assume that all A_i productions satisfy (*) for i < k and A_k productions do not, thus are A_k \mapsto a\alpha, A_k \mapsto A_i\alpha where i < k, and A_k \mapsto A_k\alpha
  2. replace in A_k \mapsto A_i\alpha where i < k the A_i with right-hand side of A_i productions. After this operation if we still have of the same form they will have the first nonterminal of greater index than i. Repeating this step will make it satisfy (*)
  3. assume that we have productions A_k \mapsto A_k\alpha_1 | A_k\alpha_2 | \cdots | A_k\alpha_m | \beta_1 | \cdots | \beta_n. Add another nonterminal B_k and replace productions with A_k \mapsto \beta_1 | \cdots | \beta_n | B_k\beta_1 | \cdots | B_k\beta_n and B_k \mapsto \alpha_1 | \cdots | \alpha_m | \alpha_1B_k|\cdots | \alpha_mB_k. B_k proceeds all A_i nonterminals and B nonterminals are arranged according to their indices. Repeat to A_{k+1}, A_N
  4. A_N productions are in the Greibach form
  5. A_{N-1} productions are either A_{N-1} \mapsto a\alpha or A_{N-1} A_N \alpha. Replace with right-hand side of A_N. Repeat for A_{N-l}.

1.7.3 derivation trees

A derivation tree of a word in a grammar is a tree with the following properties:

  1. its internal nodes are labelled by nonterminals
  2. its leaves are labelled by an empty word or terminals
  3. its root is labelled by the initial symbol of the grammar
  4. for each internal node its children are labelled by symbol of right hand side of a production from this node’s label (these children must appear in the order defined by the right hand side derivations)
  5. if a node has a child labelled by an empty word then it is the only child of this node
  6. the crop2 of such a tree is the derived word

1.7.3.1 unique derivation

We say that a word has unique derivation in a grammar if there is exactly one derivation tree

1.7.4 pumping lemma

If a language L is context free then there exists a constant n_L such that for each z \in L the following holds

|z| \ge \implies (\exists_{u,v,w,x,y \in \Sigma^*}, z=uvwxy, |vwx| \le n_L, |vx| \ge 1)(\forall_{i=0,1,2,\cdots})z_i = uv^iwx^iy \in L

1.7.4.1 contraposition

If for any constant N there exists z \in L s.t. the following conditions holds:

|z| \ge N \land (\forall_{u,v,w,x,y \in \Sigma^*, z = uvwxy, |vwx| \le N, |vx| \ge 1})(\exists_{i=0,1,2,\cdots})z_i= uv^iwx^iy \notin L

then L is not context free

1.8 Cocke-Younger-Kasami algorithm

Algorithm for checking whether a word belongs to the language generated by a grammar to grammar in Chomsky’s normal form.

1.9 translation grammars

G = (V, T, T_r, P, S) where T_r denote the translation symbols

1.10 LL(1) grammars

Recall context free productions: A \mapsto \alpha where \alpha \in (V \cup T)^*

A grammar is a LL(1) iff for each nonterminal A sets SELECT are pairwise disjoint for all A-productions

1.11 Context-sensitive grammars

Productions of the form:

Where 1 \le |\alpha| \le |\beta|

So the empty word does not belong to context-sensitive languages (CSL)

1.11.1 Normal form

Productions are \delta_1 A \delta_2 \mapsto \delta_1 \alpha \delta_2 where A \in V, \alpha \in (V \cup T)^*, |\alpha| \ge 1, and \delta_1, \delta_2 \in (V \cup T)^*

1.12 Recursive languages

Those that are accepted by turing machines with a stop property. There is no class of grammars generating these languages.


  1. useless symbols - Not reachable (does not appear in any derivation), not generative (nonterminals that cannot generate a string of terminals)↩︎

  2. sequence of leaves↩︎