SUNY Geneseo, Department of Computer Science


Homework 9

CSci 342, Fall 2001

Prof. Doug Baldwin

Complete by Monday, November 26
Grade by Thursday, November 29

Purpose

This exercise is intended to help you appreciate the practical applications of context-free languages and push-down automata.

The exercise does these things by asking you to write an LL(1) parser for a subset of HTML.

Background

A Web browser is essentially an interpreter for HTML, i.e., a program that reads a stream of HTML text and interprets the tags in that stream in order to produce a nice rendering of the text. Interpreters of any sort often use parse trees as their central representation of whatever they are interpreting, since a parse tree makes operator-operand relationships explicit, and implies a great deal about the meaning of the text being interpreted. Thus a parser for HTML can form an essential front-end for a Web browser.

We discussed LL(1) parsing in class, in the meetings of October 31, November 2, and November 5. In addition to these lectures, any compiler design textbook will discuss LL(1) parsing and construction of LL(1) parse tables.

Exercise

Your assignment consists of two steps. For both of these steps, I assume that you have a scanner from homework 5 available -- either your own, one borrowed from a classmate, or mine.

Grammar

First, devise an LL(1) grammar for a subset of HTML. A subset containing between 5 and 10 tag types (counting an opening tag and the corresponding closing tag as one "tag type") should be big enough to be interesting and yet not so big as to be overwhelming. You can work with a larger subset if you wish. Within these guidelines, I don't particularly care what subset you choose.

Your grammar can treat as terminals any tokens identifiable by the scanner you will use -- presumably this means things like tags of various types and text blocks.

Parser

Second, write an LL(1) parser for the HTML subset you chose above. You can write your parser in any language you like. However, you may not use language features or other tools that act as parser generators or similar context free language processors.

Your parser should be a function (or message) that returns a parse tree for a Web page. The parser can take whatever arguments you find useful. We didn't explicitly discuss building parse trees in the LL(1) parsing algorithm we developed in class. However, there are two general approaches you can take:

I expect that your parser will call your scanner in order to fetch tokens from a Web page. If the scanner is written right, it should provide some function that the parser can call any time it needs a new token, i.e., wherever the LL(1) parsing pseudocode developed in our lectures says something like "advance input" or "get next input symbol".

The easiest way to demonstrate that your parser works is to write a driver that uses the parser to generate a parse tree from a Web page, and that then prints that parse tree.

Follow-Up

I will grade your parser in the usual fifteen minute face-to-face meeting. I would like to both read and run the parser during this meeting, so have your parser ready to go at the time of the meeting. I can go to a lab with you, telnet to a remote computer, see the parser on a laptop computer you bring to the meeting, etc. However, do not expect me to be able to copy the parser to my own computer and get it running during a fifteen minute meeting.

I will also want to see the grammar you developed during your meeting. You can bring it on a separate sheet of paper, or place it in comments in your program.