| |
Abstract Syntax Trees
Description
Various programs (such as compilers and calculators), use
a data structure called an Abstract Syntax Tree (AST)
to represent syntax that has been recognized by the program.
NOTE:
An Abstract Syntax Tree is not the same
as a Binary Search Tree.
In a Binary Search Tree, both
interior and leaf nodes can contain data, and the nodes
are ordered in such a way that the data can be found by
searching the tree in a prescribed order. In an Abstract
Syntax Tree, the interior nodes contain operator information,
while the leaf nodes contain operand information.
The
order of the nodes within a Binary Search Tree is
determined by the contents of each node. The order of the
nodes in an AST is determined by the expression that the
AST represents, and by the precedence of the expression's
components.
For example, the arithmetic expression:
-5+12*4
would be represented in Abstract Syntax Tree form as:

Note:
Conventionally, trees in Computer Science are drawn
upside down when compared to trees in the real world.
Thus, the root of the tree is at the top,
not the bottom.
An arithmetic expression uses a convention to
distinguish between the different possible interpretations.
For example, the expression:
-5+12*4
could be interpreted as either:
(-5+12)*4
or:
-5+(12*4)
The latter is the conventionally correct interpretation,
because the multiplication operator conventionally has higher
precedence than the addition operator. However, the
Abstract Syntax Tree representation is unambiguous,
which is one of the reasons it is so popular.
In the above AST diagram, each element in the tree is
connected to other items in the tree by lines. An element in
an Abstract Syntax Tree is usually called a Node.
Nodes can either be interior nodes or leaf
nodes. Interior nodes are those that have one or more
lines connecting them with other nodes further down the tree.
Leaf nodes have no such connections; they are the lowest
nodes in that particular branch of the tree. We say that
interior nodes have child nodes, while leaf nodes do
not.
In this example, interior nodes are used to represent the arithmetic
operators (+,-,*,/,
etc.). Arithmetic operators can be unary (operating on
a single operand), or binary (operating on two
operands). A unary operator has exactly one child node, while
a binary operator has exactly two child nodes. For example,
the minus sign in the above expression represents the unary
minus operator, while the asterisk represents the binary
multiplication operator, and the plus sign represents the
binary addition operator.
A node below an operand in the tree can represent a value
such as 5 or 12 (in the above diagram, all
values are integer values). If the node represents a simple
value, then it must be a leaf node. If it represents an
expression (consisting of some combination of operators and
operands), then the node must represent an operator, and
therefore must be an interior node.
For example, imagine you wish to modify the original
expression to look like this:
-(5/23)+12*(4-2)
The Abstract Syntax Tree for this new expression looks
like:
Note that we have replaced the original leaf nodes for 5 and 4
by subtrees which represent the new subexpressions.
The top node in each of these subexpressions represents an operator; which operator depends on the relative precedence
of the operators in the subexpression.
A program can evaluate such a tree by walking
the tree in a prescribed order: at each interior node it
determines the value from each subtree, and applies the
operator to those values to produce a value for the tree
represented by that interior node. A leaf node provides a
value directly. A program can also walk the tree in the same
fashion to produce a printout of the equivalent arithmetic
expression (although it would have to insert parentheses
appropriately to maintain the proper operator precedence).
The Assignment
Your job is to write a set of classes which will allow you
to construct such an abstract syntax tree, and then walk it,
and evaluate it. Here's an abstract Node class to
start with:
#ifndef _NODES_H
#define _NODES_H
//
// nodes.h
//
// The Node base class
class Node
{
public:
virtual ~Node() {}
virtual double Evaluate() const = 0;
virtual void Walk() const = 0;
};
#endif
|
This class is to be used as a base class for a family of
node classes which can be used to construct an AST and then
evaluate and walk it. Cut it from this web page and paste it into your
editor.
The member function Evaluate()
returns the value for the (sub)tree, starting from that node
on down. (Notice that it returns a double,
not an int. This is because
we would like to be able to evaluate expressions that contain
more than integer values -- see later.) For example, calling Evaluate() for the node that
represents the binary multiplication operator, *, in the original expression, will
return a value of 48.0 (12*4). Calling Evaluate() for the node at the top
of the tree will return the value of the entire expression.
The member function Walk()
prints out (on cout ) the
arithmetic expression represented by that (sub)tree. In the
original example expression, calling Walk()
for the node that represents the unary minus operator will
print out:
-5
while calling Walk() for
the node that represents the binary multiplication operator
will print out:
12 * 4
Calling Walk() for the
node at the top of the tree will print out the entire
expression:
-5 + 12 * 4
Questions:
- What
makes the Node class
abstract?
- Why
must Node::~Node() be virtual?
Part A: Integer, UnaryMinus, Plus and Times
The first task is to write a set of classes: Integer, UnaryMinus, Plus and Times, which use single
inheritance and virtual functions, and which
can be used in the following main program:
//
// syntax.cpp
//
// Test evaluation of an abstract syntax tree.
//
#include <iostream>
using namespace std;
#include "nodes.h"
int main()
{
Node *expression = // AST represents: -5 + 12 * 4
new Plus( // +
new UnaryMinus( // -
new Integer(5) // 5
),
new Times( // *
new Integer(12), // 12
new Integer(4) // 4
)
);
double result = expression->Evaluate();
cout << "Result of ";
expression->Walk();
cout << " = " << result << endl;
delete expression; // Clean up expression AST
// Test additional expression functionality
expression = // AST represents: -(5 + 12 * 4)
new UnaryMinus(
new Plus(
new Integer(5),
new Times(
new Integer(12),
new Integer(4)
)
)
);
result = expression->Evaluate();
cout << "Result of ";
expression->Walk();
cout << " = " << result << endl;
delete expression; // Clean up expression AST
return 0;
}
|
Note:
In all of the classes you write, be sure to implement
any non-trivial member functions in
a separate .cpp file,
not in the .h file.
In Part 1, this will be in file nodes.cpp ; in Part 2, in file
extnodes.cpp . Any code depending on
<iostream> should be placed in the .cpp file.
Sequence of steps to perform:
- In nodes.h, write a class Integer which is publically
derived from class Node.
The class will have a data member which will contain the integer value
for that instance of Integer. The
class will also have a constructor which allows an
integer value to be passed in as a parameter, and virtual functions Evaluate() and Walk(). Instances of class Integer
will be leaf nodes in the Abstract Syntax Tree. Any non-inline member
function implementations should go in file nodes.cpp.
- Now it's time to look at interior (non-leaf) nodes: the
operators. As mentioned before, these break down into two subclasses -- unary
and binary operators. However, all operators share some
degree of commonality, so, first, create an abstract
class Operator,
which is publically derived from Node, and which contains
nothing but a single protected pure virtual member
function:
virtual const char *Name() const = 0;
This will force every non-abstract class that derives from
the Operator class
to implement a virtual function with this signature. The Name() virtual member function is
used to output the operator's symbol (+, -,
*, etc.). The Name() function will be called from
the Walk() function (see
later). (Note that you can add appropriate spacing around the operator name
to obtain pleasing results when the expression is output.)
- All unary operators share some common
attributes. Naturally, every unary operator is an Operator. In particular,
each unary operator needs only a single pointer
to its operand (child) node. Create an abstract
class, UnaryOperator,
derived from Operator.
This class will serve as the base class for all unary
operators. It will contain data members and member
functions common to all unary operators.
The UnaryOperator
class should have a single data member to point to its operand. What type
should this data member be? (Hint: read the above paragraph carefully.) Make
this data member private!
Provide a protected member
function (with an appropriate name) to allow derived classes
to access, but not modify it. Make the
constructor and destructor for UnaryOperator
be protected.
Note:
The test program assumes that the tree may be cleaned
up by performing a delete. The easiest way to
accomplish this is to assume that every Node will be allocated (new) from the heap, and so can
be validly deleted. Do the appropriate delete(s) in the
UnaryOperator destructor.
Implement the Walk()
virtual member function in the UnaryOperator
class. [Why there?] Make it protected.
Hint:
This Walk()
function will make recursive calls to itself and other
virtual Walk()
member functions.
- Create the concrete (non-abstract) class UnaryMinus,
derived from UnaryOperator.
It should have a very simple public
constructor (with what for arguments?).
Implement the Evaluate()
and Name() virtual
functions for this class.
- All binary operators share some common
attributes. Naturally, every binary operator is an Operator. In addition,
each binary operator needs two pointers -- one
to its left operand (child) node, and one to its right
operand (child) node. Create an abstract class, BinaryOperator,
publically derived from Operator.
This class will serve as the base class for all binary
operators. It will contain data members and member
functions common to all binary operators.
The BinaryOperator class should have a two data
members. What type should each one be? Hint:
read the above paragraph carefully.) Make these data
members private. Provide protected access member functions
(with appropriate names) to allow derived classes to
access, but not modify them. Make the constructor and
destructor for BinaryOperator
be protected.
Note:
The test program assumes that the tree may be cleaned
up by performing a delete. The easiest way to
accomplish this is to assume that every Node will be allocated (new) from the heap, and so can
be validly deleted. Do the appropriate delete(s) in the
UnaryOperator destructor.
Implement the Walk()
virtual member function in the BinaryOperator
class. Make it protected.
Hint:
This Walk()
function will make recursive calls to itself and other
virtual Walk()
member functions.
- Create the classes Plus
and Times,
each publically derived from BinaryOperator.
They should have a very simple public constructor (with
what for arguments?) Implement the Evaluate() and Name() virtual functions for
these classes.
- By now, you should be able to compile and link
the entire syntax
program, and run it. (Note that you will have to compile
and link together two .cpp
files)
Does it work? If so, do
you understand how?
My version produced the following output for the first expression:
Result of -5 + 12 * 4 = 43.0
Part B: Adding More Node Types
Requirement: From now on,
consider all the .h and .cpp
files you developed in Part A locked. This means that you
cannot add or change anything in those files.
Create new files extnodes.h and extnodes.cpp, and add the
additional node classes to them (not
to nodes.h or nodes.cpp):
- A Real class that will hold floating
point (double)
values.
- A UnaryPlus class.
- A Minus class.
- A DividedBy class
Here's an extended test program to test your work:
//
// syntax.cpp
//
// Test evaluation of an abstract syntax tree.
//
#include <iostream>
using namespace std;
#include "nodes.h"
#include "extnodes.h" // For Real, UnaryPlus, Minus and DividedBy
int main()
{
Node *expression = // AST represents: -5 + 12 * 4
new Plus( // +
new UnaryMinus( // -
new Integer(5) // 5
),
new Times( // *
new Integer(12), // 12
new Integer(4) // 4
)
);
double result = expression->Evaluate();
cout << "Result of ";
expression->Walk();
cout << " = " << result << endl;
delete expression; // Clean up expression AST
// Test additional expression functionality
expression = // AST represents: 53.5 / 45 - 73.54 * (+3 + 8)
new Minus(
new DividedBy(
new Real(53.5),
new Integer(45)
),
new Times(
new Real(73.54),
new Plus(
new UnaryPlus(
new Integer(3)
),
new Integer(8)
)
)
);
result = expression->Evaluate();
cout << "Result of ";
expression->Walk();
cout << " = " << result << endl;
delete expression; // Clean up expression AST
// Another expression
expression = // AST represents: (53.5 - 45) * (73.54 + -3 + 8)
new Times(
new Minus(
new Real(53.5),
new Integer(45)
),
new Plus(
new Real(73.54),
new Plus(
new UnaryMinus(
new Integer(3)
),
new Integer(8)
)
)
);
result = expression->Evaluate();
cout << "Result of ";
expression->Walk();
cout << " = " << result << endl;
delete expression; // Clean up expression AST
return 0;
}
|
When you have completed the additional classes, compile this new syntax.cpp
file and run it to get the results of the three expressions. (Cut the above
code from this webpage and paste it into your editor.)
Note: You may
notice that the second and third expressions in the above class are not
correctly parenthesized when printed. Don't worry about that for now;
we'll fix that later, in Part C.
Note that, now, you'll have three .cpp
files to compile and link together.
How hard was it
to add these?
What did you
learn from this exercise?
Part C: Teaching Walk() to Correctly Parenthesize
Note: Before starting to work on this part, you'll need to change a setting
in your Microsoft Visual C++ project. See here to find out how to do
that. Of course, if you're not using Visual C++, maybe you can ignore
this; however, you may have to figure out something equivalent in your
C++ environment.
You probably noticed in Part B that there are certain expressions (such as
the second and third expressions in our enhanced main program) where the
output isn't quite right -- it needs extra parentheses to represent the
correct operator precedence.
Let's fix this. Here are the steps we'll follow:
- Modify the Operator
class as follows:
- At this point, if you compile your classes again, you'll find that they
won't compile -- they generate errors. That's because we've added a
requirement that all concrete classes that derive from Operator
implement the Precedence() virtual
member function, but we haven't actually implemented those functions yet.
To solve this, you'll have to add an implementation of the Precedence()
member function to every concrete class that extends Operator.
- Have each member function return one of the values defined in Operator
(remember that Operator
is a base class, and so each derived class inherits these values):
- The UnaryPlus
and UnaryMinus
classes should return PREC_HIGHEST,
- The Times and DividedBy
classes should return PREC_HIGHER,
and
- The Plus and Minus
classes should return PREC_MEDIUM
- Once you've implemented all these Precedence()
member functions, the entire project should build again. However, we
haven't actually used the member functions anywhere yet, so the behavior
of our classes will still be as it was before.
So, let's finish the job:
- In each of the Walk()
member functions, you'll have to do the following:
- For each operand, find out whether the operand is an Operator.
How?
Hint: Take a look at the dynamic_cast
operator.
Note: In Visual C++ 6.0, and
perhaps later versions thereof, you'll have to turn on the
Run-Time Type Information (RTTI) to ensure that the compiler
generates the proper code for this. (See the above note for a
link.)
- If it is an Operator,
compare its precedence level with the precedence level of the
current operator.
- If the precedence level of the operand operator is lower than
that of the current operator, insert parentheses around the Walk()
call for that operand.
- That should do it! Rebuild, and try it.
In my implementation, the second and third expressions printed out as:
Result of 53.5 / 45 - 73.54 * (+3 + 8) = -807.751
Result of (53.5 - 45) * (73.54 + -3 + 8) = 667.59
|