Assignment 7
Home ] Up ] Assignment 1 ] Assignment 2 ] Assignment 3 ] Assignment 4 ] Assignment 5 ] Assignment 6 ] [ Assignment 7 ] Old Assignments ]

 

Abstract Syntax Trees

Description

Various programs (such as compilers and calculators), use a data structure called an Abstract Syntax Tree (AST) to represent syntax that has been recognized by the program.

NOTE: An Abstract Syntax Tree is not the same as a Binary Search Tree. 

In a Binary Search Tree, both interior and leaf nodes can contain data, and the nodes are ordered in such a way that the data can be found by searching the tree in a prescribed order. In an Abstract Syntax Tree, the interior nodes contain operator information, while the leaf nodes contain operand information. 

The order of the nodes within a Binary Search Tree is determined by the contents of each node. The order of the nodes in an AST is determined by the expression that the AST represents, and by the precedence of the expression's components.

For example, the arithmetic expression:

-5+12*4

would be represented in Abstract Syntax Tree form as:

Note: Conventionally, trees in Computer Science are drawn upside down when compared to trees in the real world. Thus, the root of the tree is at the top, not the bottom.

An arithmetic expression uses a convention to distinguish between the different possible interpretations. For example, the expression:

-5+12*4

could be interpreted as either:

(-5+12)*4	

or:

-5+(12*4)

The latter is the conventionally correct interpretation, because the multiplication operator conventionally has higher precedence than the addition operator. However, the Abstract Syntax Tree representation is unambiguous, which is one of the reasons it is so popular.

In the above AST diagram, each element in the tree is connected to other items in the tree by lines. An element in an Abstract Syntax Tree is usually called a Node. Nodes can either be interior nodes or leaf nodes. Interior nodes are those that have one or more lines connecting them with other nodes further down the tree. Leaf nodes have no such connections; they are the lowest nodes in that particular branch of the tree. We say that interior nodes have child nodes, while leaf nodes do not.

In this example, interior nodes are used to represent the arithmetic operators (+,-,*,/, etc.). Arithmetic operators can be unary (operating on a single operand), or binary (operating on two operands). A unary operator has exactly one child node, while a binary operator has exactly two child nodes. For example, the minus sign in the above expression represents the unary minus operator, while the asterisk represents the binary multiplication operator, and the plus sign represents the binary addition operator.

A node below an operand in the tree can represent a value such as 5 or 12 (in the above diagram, all values are integer values). If the node represents a simple value, then it must be a leaf node. If it represents an expression (consisting of some combination of operators and operands), then the node must represent an operator, and therefore must be an interior node.

For example, imagine you wish to modify the original expression to look like this:

-(5/23)+12*(4-2)

The Abstract Syntax Tree for this new expression looks like:

Note that we have replaced the original leaf nodes for 5 and 4 by subtrees which represent the new subexpressions. The top node in each of these subexpressions represents an operator; which operator depends on the relative precedence of the operators in the subexpression.

A program can evaluate such a tree by walking the tree in a prescribed order: at each interior node it determines the value from each subtree, and applies the operator to those values to produce a value for the tree represented by that interior node. A leaf node provides a value directly. A program can also walk the tree in the same fashion to produce a printout of the equivalent arithmetic expression (although it would have to insert parentheses appropriately to maintain the proper operator precedence).

The Assignment

Your job is to write a set of classes which will allow you to construct such an abstract syntax tree, and then walk it, and evaluate it. Here's an abstract Node class to start with:

#ifndef _NODES_H
#define _NODES_H
//
// nodes.h
//
// The Node base class

class Node
{
  public:
    virtual ~Node() {}

    virtual double Evaluate() const = 0; 
    virtual void Walk() const = 0;
};
#endif

This class is to be used as a base class for a family of node classes which can be used to construct an AST and then evaluate and walk it.  Cut it from this web page and paste it into your editor.

The member function Evaluate() returns the value for the (sub)tree, starting from that node on down. (Notice that it returns a double, not an int. This is because we would like to be able to evaluate expressions that contain more than integer values -- see later.) For example, calling Evaluate() for the node that represents the binary multiplication operator, *, in the original expression, will return a value of 48.0 (12*4). Calling Evaluate() for the node at the top of the tree will return the value of the entire expression.

The member function Walk() prints out (on cout) the arithmetic expression represented by that (sub)tree. In the original example expression, calling Walk() for the node that represents the unary minus operator will print out:

-5

while calling Walk() for the node that represents the binary multiplication operator will print out:

12 * 4

Calling Walk() for the node at the top of the tree will print out the entire expression:

-5 + 12 * 4

Questions:

  • What makes the Node class abstract?
  • Why must Node::~Node() be virtual?

Part A: Integer, UnaryMinus, Plus and Times

The first task is to write a set of classes: Integer, UnaryMinus, Plus and Times, which use single inheritance and virtual functions, and which can be used in the following main program:

//
// syntax.cpp
//
// Test evaluation of an abstract syntax tree.
//

#include <iostream>
using namespace std;
#include "nodes.h"

int main()
{
    Node *expression = // AST represents: -5 + 12 * 4
             new Plus(                    // +
                   new UnaryMinus(        //   -
                         new Integer(5)   //    5
                         ),
                   new Times(             //   *
                         new Integer(12), //    12
                         new Integer(4)   //    4
                         )
                   );

    double result = expression->Evaluate();

    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST
 
    // Test additional expression functionality

    expression = // AST represents: -(5 + 12 * 4)
	    new UnaryMinus(
		   new Plus(
			new Integer(5),
			new Times(
				new Integer(12),
				new Integer(4)
				)
			)
		);

    result = expression->Evaluate();
    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST

    return 0;
}

Note: In all of the classes you write, be sure to implement any non-trivial member functions in a separate .cpp file, not in the .h file.   In Part 1, this will be in file nodes.cpp; in Part 2, in file extnodes.cpp. Any code depending on <iostream> should be placed in the .cpp file.

Sequence of steps to perform:

  1. In nodes.h, write a class Integer which is publically derived from class Node.  The  class will have a data member which will contain the integer value for that instance of Integer. The class will also have a constructor which allows an integer value to be passed in as a parameter, and virtual functions Evaluate() and Walk().  Instances of class Integer will be leaf nodes in the Abstract Syntax Tree.  Any non-inline member function implementations should go in file nodes.cpp.
  1. Now it's time to look at interior (non-leaf) nodes: the operators. As mentioned before, these break down into two subclasses -- unary and binary operators. However, all operators share some degree of commonality, so, first, create an abstract class Operator, which is publically derived from Node, and which contains nothing but a single protected pure virtual member function:
virtual const char *Name() const = 0; 

This will force every non-abstract class that derives from the Operator class to implement a virtual function with this signature. The Name() virtual member function is used to output the operator's symbol (+, -, *, etc.). The Name() function will be called from the Walk() function (see later). (Note that you can add appropriate spacing around the operator name to obtain pleasing results when the expression is output.)

  1. All unary operators share some common attributes. Naturally, every unary operator is an Operator. In particular, each unary operator needs only a single pointer to its operand (child) node. Create an abstract class, UnaryOperator, derived from Operator. This class will serve as the base class for all unary operators. It will contain data members and member functions common to all unary operators.

The UnaryOperator class should have a single data member to point to its operand. What type should this data member be? (Hint: read the above paragraph carefully.) Make this data member private! Provide a protected member function (with an appropriate name) to allow derived classes to access, but not modify it. Make the constructor and destructor for UnaryOperator be protected.

Note: The test program assumes that the tree may be cleaned up by performing a delete. The easiest way to accomplish this is to assume that every Node will be allocated (new) from the heap, and so can be validly deleted. Do the appropriate delete(s) in the UnaryOperator destructor.

Implement the Walk() virtual member function in the UnaryOperator class. [Why there?]  Make it protected

Hint: This Walk() function will make recursive calls to itself and other virtual Walk() member functions.

  1. Create the concrete (non-abstract) class UnaryMinus, derived from UnaryOperator. It should have a very simple public constructor (with what for arguments?). Implement the Evaluate() and Name() virtual functions for this class.
  1. All binary operators share some common attributes. Naturally, every binary operator is an Operator. In addition, each binary operator needs two pointers -- one to its left operand (child) node, and one to its right operand (child) node. Create an abstract class, BinaryOperator, publically derived from Operator. This class will serve as the base class for all binary operators. It will contain data members and member functions common to all binary operators.

The BinaryOperator class should have a two data members. What type should each one be? Hint: read the above paragraph carefully.) Make these data members private. Provide protected access member functions (with appropriate names) to allow derived classes to access, but not modify them. Make the constructor and destructor for BinaryOperator be protected.

Note: The test program assumes that the tree may be cleaned up by performing a delete. The easiest way to accomplish this is to assume that every Node will be allocated (new) from the heap, and so can be validly deleted. Do the appropriate delete(s) in the UnaryOperator destructor.

Implement the Walk() virtual member function in the BinaryOperator class. Make it protected

Hint: This Walk() function will make recursive calls to itself and other virtual Walk() member functions.

  1. Create the classes Plus and Times, each publically derived from BinaryOperator. They should have a very simple public constructor (with what for arguments?)   Implement the Evaluate() and Name() virtual functions for these classes.
  1. By now, you should be able to compile and link the entire syntax program, and run it. (Note that you will have to compile and link together two .cpp files)  

    Does it work? If so, do you understand how?

    My version produced the following output for the first expression:

    Result of -5 + 12 * 4 = 43.0

Part B: Adding More Node Types

Requirement: From now on, consider all the .h and .cpp files you developed in Part A locked. This means that you cannot add or change anything in those files.

Create new files extnodes.h and extnodes.cpp, and add the additional node classes to them (not to nodes.h or nodes.cpp):

  1. A Real class that will hold floating point (double) values.
  1. A UnaryPlus class.
  1. A Minus class.
  1. A DividedBy class

Here's an extended test program to test your work:

//
// syntax.cpp
//
// Test evaluation of an abstract syntax tree.
//

#include <iostream>
using namespace std;
#include "nodes.h"
#include "extnodes.h"   // For Real, UnaryPlus, Minus and DividedBy

int main()
{
    Node *expression = // AST represents: -5 + 12 * 4
             new Plus(                    // +
                   new UnaryMinus(        //   -
                         new Integer(5)   //    5
                         ),
                   new Times(             //   *
                         new Integer(12), //    12
                         new Integer(4)   //    4
                         )
                   );

    double result = expression->Evaluate();

    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST
 
     // Test additional expression functionality

    expression = // AST represents: 53.5 / 45 - 73.54 * (+3 + 8)
             new Minus(
                   new DividedBy(
                         new Real(53.5),
                         new Integer(45)
                         ),
                   new Times(
                         new Real(73.54),
                         new Plus(
                               new UnaryPlus(
                                     new Integer(3)
                                     ),
                               new Integer(8)
                               )
                         )
                   );

    result = expression->Evaluate();
    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST

     // Another expression

    expression = // AST represents: (53.5 - 45) * (73.54 + -3 + 8)
             new Times(
                   new Minus(
                         new Real(53.5),
                         new Integer(45)
                         ),
                   new Plus(
                         new Real(73.54),
                         new Plus(
                               new UnaryMinus(
                                     new Integer(3)
                                     ),
                               new Integer(8)
                               )
                         )
                   );

    result = expression->Evaluate();
    cout << "Result of ";
    expression->Walk();
    cout << " = " << result << endl;

    delete expression; // Clean up expression AST

    return 0;
}

When you have completed the additional classes, compile this new syntax.cpp file and run it to get the results of the three expressions. (Cut the above code from this webpage and paste it into your editor.)

Note: You may notice that the second and third expressions in the above class are not correctly parenthesized when printed. Don't worry about that for now; we'll fix that later, in Part C.

Note that, now, you'll have three .cpp files to compile and link together.

How hard was it to add these?

What did you learn from this exercise?

Part C: Teaching Walk() to Correctly Parenthesize

Note: Before starting to work on this part, you'll need to change a setting in your Microsoft Visual C++ project.  See here to find out how to do that.  Of course, if you're not using Visual C++, maybe you can ignore this;  however, you may have to figure out something equivalent in your C++ environment.

You probably noticed in Part B that there are certain expressions (such as the second and third expressions in our enhanced main program) where the output isn't quite right -- it needs extra parentheses to represent the correct operator precedence.

Let's fix this. Here are the steps we'll follow:

  • Modify the Operator class as follows:
    • Add a set of public const static data members (why static and why const?) of type int, to indicate various operator precedence levels:
      PREC_HIGHEST, PREC_HIGHER, PREC_MEDIUM, PREC_LOWER, PREC_LOWEST
      Note that the actual values of these data members don't matter, except that the values should represent the relative precedence values appropriately. I suggest that PREC_LOWEST be assigned a value of 0, and that you increase the value by 5 or 10 for each precedence level above that. That way, if you later needed to insert an additional precedence level, you have room to do so.
    • Add a public pure virtual member function Precedence() 
      virtual int Precedence() const = 0;

      to force every class that is derived from the Operator class to implement this member function.

  • At this point, if you compile your classes again, you'll find that they won't compile -- they generate errors. That's because we've added a requirement that all concrete classes that derive from Operator implement the Precedence() virtual member function, but we haven't actually implemented those functions yet.

    To solve this, you'll have to add an implementation of the Precedence() member function to every concrete class that extends Operator.

    • Have each member function return one of the values defined in Operator (remember that Operator is a base class, and so each derived class inherits these values):
      • The UnaryPlus and UnaryMinus classes should return PREC_HIGHEST,
      • The Times and DividedBy classes should return PREC_HIGHER, and
      • The Plus and Minus classes should return PREC_MEDIUM
  • Once you've implemented all these Precedence() member functions, the entire project should build again. However, we haven't actually used the member functions anywhere yet, so the behavior of our classes will still be as it was before.

    So, let's finish the job:

    • In each of the Walk() member functions, you'll have to do the following:
      • For each operand, find out whether the operand is an Operator. How?

        Hint: Take a look at the dynamic_cast operator.

        Note: In Visual C++ 6.0, and perhaps later versions thereof, you'll have to turn on the Run-Time Type Information (RTTI) to ensure that the compiler generates the proper code for this. (See the above note for a link.)

      • If it is an Operator, compare its precedence level with the precedence level of the current operator.
      • If the precedence level of the operand operator is lower than that of the current operator, insert parentheses around the Walk() call for that operand.
  • That should do it! Rebuild, and try it.
    In my implementation, the second and third expressions printed out as:
Result of 53.5 / 45 - 73.54 * (+3 + 8) = -807.751
Result of (53.5 - 45) * (73.54 + -3 + 8) = 667.59

 

 

This page was last changed on 30 Apr 2006