Assignment 7


Assignment 7
Home ]Up ]Assignment 1 ]Assignment 2 ]Assignment 3 ]Assignment 4 ]Assignment 5 ]Assignment 6 ][ Assignment 7 ]Old Assignments ]

 






Abstract Syntax Trees


Description


Various programs (such as compilers and calculators), use
a data structure called an Abstract Syntax Tree (AST)
to represent syntax that has been recognized by the program.

NOTE:
An Abstract Syntax Tree is not the same
as a Binary Search Tree. 

In a Binary Search Tree, both
interior and leaf nodes can contain data, and the nodes
are ordered in such a way that the data can be found by
searching the tree in a prescribed order. In an Abstract
Syntax Tree, the interior nodes contain operator information,
while the leaf nodes contain operand information. 

The
order of the nodes within a Binary Search Tree is
determined by the contents of each node. The order of the
nodes in an AST is determined by the expression that the
AST represents, and by the precedence of the expression’s
components.

For example, the arithmetic expression:

-5+12*4

would be represented in Abstract Syntax Tree form as:


Note:
Conventionally, trees in Computer Science are drawn
upside down when compared to trees in the real world.
Thus, the root of the tree is at the top,
not the bottom.


An arithmetic expression uses a convention to
distinguish between the different possible interpretations.
For example, the expression:


-5+12*4


could be interpreted as either:

(-5+12)*4	

or:

-5+(12*4)

The latter is the conventionally correct interpretation,
because the multiplication operator conventionally has higher
precedence
than the addition operator. However, the
Abstract Syntax Tree representation is unambiguous,
which is one of the reasons it is so popular.

In the above AST diagram, each element in the tree is
connected to other items in the tree by lines. An element in
an Abstract Syntax Tree is usually called a Node.
Nodes can either be interior nodes or leaf
nodes
. Interior nodes are those that have one or more
lines connecting them with other nodes further down the tree.
Leaf nodes have no such connections; they are the lowest
nodes in that particular branch of the tree. We say that
interior nodes have child nodes, while leaf nodes do
not.

In this example, interior nodes are used to represent the arithmetic
operators
(+,-,*,/,
etc.). Arithmetic operators can be unary (operating on
a single operand), or binary (operating on two
operands). A unary operator has exactly one child node, while
a binary operator has exactly two child nodes. For example,
the minus sign in the above expression represents the unary
minus
operator, while the asterisk represents the binary
multiplication
operator, and the plus sign represents the
binary addition operator.

A node below an operand in the tree can represent a value
such as 5 or 12 (in the above diagram, all
values are integer values). If the node represents a simple
value, then it must be a leaf node. If it represents an
expression (consisting of some combination of operators and
operands), then the node must represent an operator, and
therefore must be an interior node.

For example, imagine you wish to modify the original
expression to look like this:

-(5/23)+12*(4-2)

The Abstract Syntax Tree for this new expression looks
like:



Note that we have replaced the original leaf nodes for 5 and 4
by subtrees which represent the new subexpressions.
The top node in each of these subexpressions represents an operator; which operator depends on the relative precedence
of the operators in the subexpression.

A program can evaluate such a tree by walking
the tree in a prescribed order: at each interior node it
determines the value from each subtree, and applies the
operator to those values to produce a value for the tree
represented by that interior node. A leaf node provides a
value directly. A program can also walk the tree in the same
fashion to produce a printout of the equivalent arithmetic
expression (although it would have to insert parentheses
appropriately to maintain the proper operator precedence).


The Assignment


Your job is to write a set of classes which will allow you
to construct such an abstract syntax tree, and then walk it,
and evaluate it. Here’s an abstract Node class to
start with:


#ifndef _NODES_H
#define _NODES_H
//
// nodes.h
//
// The Node base class

class Node
{
  public:
    virtual ~Node() {}

    virtual double Evaluate() const = 0; 
    virtual void Walk() const = 0;
};
#endif

This class is to be used as a base class for a family of
node classes which can be used to construct an AST and then
evaluate and walk it.  Cut it from this web page and paste it into your
editor.

The member function Evaluate()
returns the value for the (sub)tree, starting from that node
on down. (Notice that it returns a double,
not an int. This is because
we would like to be able to evaluate expressions that contain
more than integer values — see later.) For example, calling Evaluate() for the node that
represents the binary multiplication operator, *, in the original expression, will
return a value of 48.0 (12*4). Calling Evaluate() for the node at the top
of the tree will return the value of the entire expression.

The member function Walk()
prints out (on cout) the
arithmetic expression represented by that (sub)tree. In the
original example expression, calling Walk()
for the node that represents the unary minus operator will
print out:

-5

while calling Walk() for
the node that represents the binary multiplication operator
will print out:

12 * 4

Calling Walk() for the
node at the top of the tree will print out the entire
expression:

-5 + 12 * 4

Questions:

  • What
    makes the Node class
    abstract?
  • Why
    must Node::~Node() be virtual?

Part A: Integer, UnaryMinus, Plus and Times


The first task is to write a set of classes: Integer, UnaryMinus, Plus and Times, which use single
inheritance
and virtual functions, and which
can be used in the following main program:


//
// syntax.cpp
//
// Test evaluation of an abstract syntax tree.
//

#include <iostream>
using namespace std;
#include "nodes.h"

int main()
{
    Node *expression = // AST represents: -5 + 12 * 4
             new Plus(                    // +
                   new UnaryMinus(        //   -
                         new Integer(5)   //    5
                         ),
                   new Times(             //   *
                         new Integer(12), //    12
                         new Integer(4)   //    4
                         )
                   );

    double result = expression->Evaluate();

    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST
 
    // Test additional expression functionality

    expression = // AST represents: -(5 + 12 * 4)
	    new UnaryMinus(
		   new Plus(
			new Integer(5),
			new Times(
				new Integer(12),
				new Integer(4)
				)
			)
		);

    result = expression->Evaluate();
    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST

    return 0;
}


Note:
In all of the classes you write, be sure to implement
any non-trivial member functions
in
a separate .cpp file
,
not in the .h file.  
In Part 1, this will be in file nodes.cpp; in Part 2, in file
extnodes.cpp. Any code depending on

<iostream>
should be placed in the .cpp file.


Sequence of steps to perform:


  1. In nodes.h, write a class Integer which is publically
    derived
    from class Node
    The  class will have a data member which will contain the integer value
    for that instance of Integer. The
    class will also have a constructor which allows an
    integer value to be passed in as a parameter, and virtual functions Evaluate() and Walk().  Instances of class Integer
    will be leaf nodes in the Abstract Syntax Tree.  Any non-inline member
    function implementations should go in file nodes.cpp.
  1. Now it’s time to look at interior (non-leaf) nodes: the
    operators. As mentioned before, these break down into two subclasses — unary
    and binary operators. However, all operators share some
    degree of commonality, so, first, create an abstract
    class Operator,
    which is publically derived from Node, and which contains
    nothing but a single protected pure virtual member
    function
    :

virtual const char *Name() const = 0; 


This will force every non-abstract class that derives from
the Operator class
to implement a virtual function with this signature. The Name() virtual member function is
used to output the operator’s symbol (+, ,
*, etc.). The Name() function will be called from
the Walk() function (see
later). (Note that you can add appropriate spacing around the operator name
to obtain pleasing results when the expression is output.)


  1. All unary operators share some common
    attributes. Naturally, every unary operator is an Operator. In particular,
    each unary operator needs only a single pointer
    to its operand (child) node. Create an abstract
    class, UnaryOperator,
    derived from Operator.
    This class will serve as the base class for all unary
    operators
    . It will contain data members and member
    functions common to all unary operators.

The UnaryOperator
class should have a single data member to point to its operand. What type
should this data member be? (Hint: read the above paragraph carefully.)
Make
this data member private!
Provide a protected member
function (with an appropriate name) to allow derived classes
to access, but not modify it. Make the
constructor and destructor for UnaryOperator
be protected.


Note:
The test program assumes that the tree may be cleaned
up by performing a delete. The easiest way to
accomplish this is to assume that every Node will be allocated (new) from the heap, and so can
be validly deleted. Do the appropriate delete(s) in the
UnaryOperator destructor.


Implement the Walk()
virtual member function in the UnaryOperator
class. [Why there?]  Make it protected

Hint:
This
Walk()
function will make recursive calls to itself and other
virtual
Walk()
member functions.


  1. Create the concrete (non-abstract) class UnaryMinus,
    derived from UnaryOperator.
    It should have a very simple public
    constructor (with what for arguments?).
    Implement the Evaluate()
    and Name() virtual
    functions for this class.
  1. All binary operators share some common
    attributes. Naturally, every binary operator is an Operator. In addition,
    each binary operator needs two pointers — one
    to its left operand (child) node, and one to its right
    operand (child) node. Create an abstract class, BinaryOperator,
    publically derived from Operator.
    This class will serve as the base class for all binary
    operators. It will contain data members and member
    functions common to all binary operators.

The BinaryOperator class should have a two data
members. What type should each one be? Hint:
read the above paragraph carefully.)
Make these data
members private. Provide protected access member functions
(with appropriate names) to allow derived classes to
access, but not modify them. Make the constructor and
destructor for BinaryOperator
be protected.


Note:
The test program assumes that the tree may be cleaned
up by performing a delete. The easiest way to
accomplish this is to assume that every Node will be allocated (new) from the heap, and so can
be validly deleted. Do the appropriate delete(s) in the
UnaryOperator destructor.


Implement the Walk()
virtual member function in the BinaryOperator
class. Make it protected

Hint:
This
Walk()
function will make recursive calls to itself and other
virtual
Walk()
member functions.


  1. Create the classes Plus
    and Times,
    each publically derived from BinaryOperator.
    They should have a very simple public constructor (with
    what for arguments?
    )   Implement the Evaluate() and Name() virtual functions for
    these classes.
  1. By now, you should be able to compile and link
    the entire syntax
    program, and run it. (Note that you will have to compile
    and link together two .cpp
    files)  

    Does it work? If so, do
    you understand how?

    My version produced the following output for the first expression:

    Result of -5 + 12 * 4 = 43.0

Part B: Adding More Node Types


Requirement: From now on,
consider all the .h and .cpp
files you developed in Part A locked. This means that you
cannot add or change anything in those files
.

Create new files extnodes.h and extnodes.cpp, and add the
additional node classes to them (not
to
nodes.h or nodes.cpp):

  1. A Real class that will hold floating
    point (double)
    values.
  1. A UnaryPlus class.
  1. A Minus class.
  1. A DividedBy class


Here’s an extended test program to test your work:


//
// syntax.cpp
//
// Test evaluation of an abstract syntax tree.
//

#include <iostream>
using namespace std;
#include "nodes.h"
#include "extnodes.h"   // For Real, UnaryPlus, Minus and DividedBy

int main()
{
    Node *expression = // AST represents: -5 + 12 * 4
             new Plus(                    // +
                   new UnaryMinus(        //   -
                         new Integer(5)   //    5
                         ),
                   new Times(             //   *
                         new Integer(12), //    12
                         new Integer(4)   //    4
                         )
                   );

    double result = expression->Evaluate();

    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST
 
     // Test additional expression functionality

    expression = // AST represents: 53.5 / 45 - 73.54 * (+3 + 8)
             new Minus(
                   new DividedBy(
                         new Real(53.5),
                         new Integer(45)
                         ),
                   new Times(
                         new Real(73.54),
                         new Plus(
                               new UnaryPlus(
                                     new Integer(3)
                                     ),
                               new Integer(8)
                               )
                         )
                   );

    result = expression->Evaluate();
    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST

     // Another expression

    expression = // AST represents: (53.5 - 45) * (73.54 + -3 + 8)
             new Times(
                   new Minus(
                         new Real(53.5),
                         new Integer(45)
                         ),
                   new Plus(
                         new Real(73.54),
                         new Plus(
                               new UnaryMinus(
                                     new Integer(3)
                                     ),
                               new Integer(8)
                               )
                         )
                   );

    result = expression->Evaluate();
    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST

    return 0;
}

When you have completed the additional classes, compile this new syntax.cpp
file and run it to get the results of the three expressions. (Cut the above
code from this webpage and paste it into your editor.)

Note: You may
notice that the second and third expressions in the above class are not
correctly parenthesized when printed. Don’t worry about that for now;
we’ll fix that later, in Part C.

Note that, now, you’ll have three .cpp
files to compile and link together.

How hard was it
to add these?

What did you
learn from this exercise?


Part C: Teaching Walk() to Correctly Parenthesize


Note: Before starting to work on this part, you’ll need to change a setting
in your Microsoft Visual C++ project.  See here to find out how to do
that.  Of course, if you’re not using Visual C++, maybe you can ignore
this;  however, you may have to figure out something equivalent in your
C++ environment.

You probably noticed in Part B that there are certain expressions (such as
the second and third expressions in our enhanced main program) where the
output isn’t quite right — it needs extra parentheses to represent the
correct operator precedence.

Let’s fix this. Here are the steps we’ll follow:

  • Modify the Operator
    class as follows:

    • Add a set of public const static data
      members (why static
      and why const?) of type int,
      to indicate various operator precedence levels:
      PREC_HIGHEST, PREC_HIGHER,
      PREC_MEDIUM, PREC_LOWER, PREC_LOWEST
      Note that the actual values of these data members
      don’t matter, except that the values should represent the relative
      precedence values appropriately. I suggest that PREC_LOWEST
      be assigned a value of 0, and that you increase the value by 5 or 10
      for each precedence level above that. That way, if you later needed to
      insert an additional precedence level, you have room to do so.
    • Add a public pure
      virtual member function Precedence() 

      virtual int Precedence() const = 0;

      to force every class that is derived from the Operator
      class to implement this member function.

  • At this point, if you compile your classes again, you’ll find that they
    won’t compile — they generate errors. That’s because we’ve added a
    requirement that all concrete classes that derive from Operator
    implement the Precedence() virtual
    member function, but we haven’t actually implemented those functions yet.

    To solve this, you’ll have to add an implementation of the Precedence()
    member function to every concrete class that extends Operator.

    • Have each member function return one of the values defined in Operator
      (remember that Operator
      is a base class, and so each derived class inherits these values):

      • The UnaryPlus
        and UnaryMinus
        classes should return PREC_HIGHEST,
      • The Times and DividedBy
        classes should return PREC_HIGHER,
        and
      • The Plus and Minus
        classes should return PREC_MEDIUM
  • Once you’ve implemented all these Precedence()
    member functions, the entire project should build again. However, we
    haven’t actually used the member functions anywhere yet, so the behavior
    of our classes will still be as it was before.

    So, let’s finish the job:

    • In each of the Walk()
      member functions, you’ll have to do the following:

      • For each operand, find out whether the operand is an Operator.
        How?

        Hint: Take a look at the dynamic_cast
        operator.

        Note: In Visual C++ 6.0, and
        perhaps later versions thereof, you’ll have to turn on the
        Run-Time Type Information (RTTI) to ensure that the compiler
        generates the proper code for this. (See the above note for a
        link.)

      • If it is an Operator,
        compare its precedence level with the precedence level of the
        current operator.
      • If the precedence level of the operand operator is lower than
        that of the current operator, insert parentheses around the Walk()
        call for that operand.
  • That should do it! Rebuild, and try it.

    In my implementation, the second and third expressions printed out as:

Result of 53.5 / 45 - 73.54 * (+3 + 8) = -807.751
Result of (53.5 - 45) * (73.54 + -3 + 8) = 667.59


 


 




This page was last changed on 30 Apr 2006