Assignment 6
Home ] Up ] Assignment 1 ] Assignment 2 ] Assignment 3 ] Assignment 4 ] Assignment 5 ] [ Assignment 6 ] Assignment 7 ] Assignment 8 ]

 

Assignment 6a
Assignment 6b
Assignment 6c

 

Abstract Syntax Trees

Description

Various programs (such as compilers and calculators), use a data structure called an Abstract Syntax Tree (AST) to represent syntax that has been recognized by the program.

Note: An Abstract Syntax Tree is not the same as a Binary Search Tree.

In a Binary Search Tree, both interior and leaf nodes can contain data, and the nodes are ordered in such a way that the data can be found by searching the tree in a prescribed order. In an Abstract Syntax Tree, the interior nodes contain operator information, while the leaf nodes contain operand information.

The order of the nodes within a Binary Search Tree is determined by the contents of each node. The order of the nodes in an AST is determined by the expression that the AST represents, and by the precedence of the expression's components.

For example, the arithmetic expression:

-5+12*4

would be represented in Abstract Syntax Tree form as:

Note: Conventionally, trees in Computer Science are drawn upside down when compared to trees in the real world. Thus, the root of the tree is at the top, not the bottom.

An arithmetic expression uses a convention to distinguish between the different possible interpretations. For example, the expression:

-5+12*4

could be interpreted as either:

(-5+12)*4

or:

-5+(12*4)

The latter is the conventionally correct interpretation, because the multiplication operator conventionally has higher precedence than the addition operator. However, the Abstract Syntax Tree representation is inherently unambiguous, which is one of the reasons it is so popular.

In the above AST diagram, each element in the tree is connected to other items in the tree by lines. An element in an Abstract Syntax Tree is usually called a Node. Nodes can either be interior nodes or leaf nodes. Interior nodes are those that have one or more lines connecting them with other nodes further down the tree. Leaf nodes have no such connections; they are the lowest nodes in that particular branch of the tree. We say that interior nodes have child nodes, while leaf nodes do not.

In this example, interior nodes are used to represent the arithmetic operators (+, -, *, /, etc.). Arithmetic operators can be unary (operating on a single operand), or binary (operating on two operands). A unary operator has exactly one child node, while a binary operator has exactly two child nodes. For example, the minus sign in the above expression represents the unary minus operator, while the asterisk represents the binary multiplication operator, and the plus sign represents the binary addition operator.

A node below an operand in the tree can represent a value such as 5 or 12 (in the above diagram, all values are integer values). If the node represents a simple value, then it must be a leaf node. If it represents an expression (consisting of some combination of operators and operands), then the node must represent an operator, and therefore must be an interior node.

For example, imagine you wish to modify the original expression to look like this:

-(5/23)+12*(4-2)

The Abstract Syntax Tree for this new expression looks like:

Note that we have replaced the original leaf nodes for 5 and 4 by subtrees which represent the new subexpressions. The top node in each of these subexpressions represents an operator; which operator depends on the relative precedence of the operators in the subexpression.

A program can evaluate such a tree by walking the tree in a prescribed order: at each interior node it determines the value from each subtree, and applies the operator to those values to produce a value for the tree represented by that interior node. A leaf node provides a value directly. A program can also walk the tree in the same fashion to produce a printout of the equivalent arithmetic expression (although it would have to insert parentheses appropriately to maintain the operator precedence).

The Assignment

Your job is to write a set of classes which will allow you to construct such an abstract syntax tree, and then walk it, and evaluate it. Here's an abstract Node class to start with:

package abstractSyntaxTree;

/**
 *  A class to represent a node in an Abstract Syntax Tree.
 *  <p>
 *  @author: Bryan J. Higgs, 5 September 1998
 *  <p>
 *  @version 1.0
 */

public abstract class Node
{
    public abstract double evaluate();
    public abstract void walk();
}

This class is to be used as a base class for a family of node classes which can be used to construct an AST and then evaluate and walk it. Cut it from this web page and paste it into your editor.

The evaluate() method returns the value for the (sub)tree, starting from that node on down. (Notice that it returns a double, not an int. This is because we would like to be able to evaluate expressions that contain more than integer values -- see later.) For example, calling evaluate() for the node that represents the binary multiplication operator, *, in the original expression, will return a value of 48.0 (12*4). Calling evaluate() for the node at the top of the tree will return the value of the entire expression.

The walk() method prints out (on System.out) the arithmetic expression represented by that (sub)tree. In the original example expression, calling walk() for the node that represents the unary minus operator will print out:

-5

while calling walk() for the node that represents the binary multiplication operator will print out:

12 * 4

Calling walk() for the node at the top of the tree will print out the entire expression:

-5 + 12 * 4

We will break the assignment into three parts:

  1. Integer, UnaryMinus, Plus and Times
  2. Adding More Node Types
  3. Teaching walk() to Correctly Parenthesize
 

This page was last modified on 02 October, 2007