|
Notes > Models of Computation > Regular Expressions
|
Regular expressions allow a dynamic look at the sequences of "states" which a program can reach given a small number of basic rules. Compilers use BNF and the theory involved in regular expressions and they take the form of finite-state machines.
Sequence, selection and iteration are fundamental constructs in any programming language these constructs can be modelled by defining a basic set of operators. These operators make up the regular expression notation.
There are four basic symbols for regular expressions:
- "|" meaning "or"
- "*" meaning zero, one or more occurrences
- "e" which denotes empty
- "Sequencing" which refers to the order of production of the strings in the program.
Regular expression notation can then be used to create language constructs. x|y|z means that an "x" or a "y" or a "z" can exist. x* can be an empty string, or a single "x", or as many occurences of "x" as desired.
The concept of sequencing is simply applied by following the sequence from left to right. For example, the regular expression a*b | b* | c, would mean that "aaab" is valid, but "caaa" is not.
Set notation can be used to express the possible produced strings in a program. For example, x|y|z is written as {x,y,z} in set notation. For the expression x*, the equivalent set notation is {e,x,xx,xxx,xxxx,xxxxx...}. The number of different strings that can be produced is infinite and it is also worth noting that the different possible strings are created by recursion.
Taking the example of x*|y* as a regular expression, this could allow for any number of occurences of "x" (including zero) OR any number of occurrences of "y". The sets for x and y would therefore be {e,x,xx,xxx,xxxx,xxxxx...} or {e,y,yy,yyy,yyyy,yyyyy...}. By combining these sets the following set can be gained: {e,x,xx,xxx...y,yy,yyy...}.
Brackets can be used in the constructing of Regular Expressions. Brackets are always evaluated first. For example, (xy)*|z
is different to xy*|z. The set for (xy)*|z is {e,xy,xyxy,xyxyxy,...,z} whereas the set for xy*|z is {e,xy,xyy,xyyy,...,z}
The combination ((xy)*|y*)* which involves two sets of brackets can be simplified because it is actually the same as ((xy)*|y*). Both expresions mean that (xy)* or y* can occur as many times as required.
Recursive statements can be made in algebraic form for Regular Expressions. For example, B -> bbbB means that a "bbb" can be followed by any amount of occurences of "bbb". The "B" at the end of the statement refers back to the start of the definition and therefore an infinite string of "b"s would be created.
A "prefix" is a group of letters that is attached to the front of a word. For example, "com" is a prefix of "computing". A "suffix" is a string obtained by removing any number of leading symbols of string. For example a suffix of "computing" is "omputing". A "subsequence" is any string that can be obtained by deleting any number of symbols (including zero) from a given string. For example, "apple" could become "ale" if two "p"s are removed.
Search for "Regular Expressions" on:
Google |
Kelkoo |
Amazon |
eBay (UK) |
eBay (US)
Search for "Regular Expressions" on the rest of Computing Students: Regular Expressions
|
|
|