CS 232

Assignment 5

Due: Friday, Oct. 27

Problem 1: Top-Down and Bottom-Up Parsers

Complete the exercises in Labs #6 and #7 on the top-down and bottom-up language parsers.

Problem 2: Top-Down Parsing using a Context-Free Grammar

Given the context-free grammar and lexicon shown below, parse the following sentence using the Top-Down Parser presented in class and illustrated on pages 49 and 51 of the excerpt from the Allen text:

   the old man can hold the green can

Be sure to apply the grammar rules in the order that they are listed below:

Grammar:     Lexicon:
S --> NP VP     the: ART
NP --> N     old: N, ADJ
NP --> ART N     man: N, V
NP --> ART ADJ N     can: N, AUX, V
VP --> V NP     hold: N, V
VP --> AUX VP     green: ADJ, N

Problem 3: Simulating the Bottom-Up Chart Parsing Algorithm

Using the method for hand-simulating the bottom-up chart parsing algorithm described in class and in section 3.4 of the excerpt from the Allen text, parse the following sentence:

   the slithy toves gymbled on the wabe

Assume the following grammar and possible categories for the words in the sentence:

Grammar:     Lexicon:
S --> NP VP     the: ART
NP --> N     slithy: ADJ N
NP --> ART N     toves: N V
NP --> ART ADJ N     gymbled: ADJ N V
NP --> NP PP     on: P
VP --> V     wabe: N
VP --> V NP  
VP --> V PP  
VP --> VP PP  
PP --> P NP  

Problem 4: Context-Free vs. RTN Grammars

Part a: Transform the following context-free grammar into an equivalent recursive transition network that represents the same grammar. Your recursive transition network grammar should use at most four networks, for S, NP, VP and PP. Try to make your networks as small as possible.

S --> NP VP     NP2 --> ADJ NP2
VP --> V     NP2 --> NP3 PREPS
VP --> V NP     NP3 --> N
VP --> V PP     PREPS --> PP
NP --> ART NP2     PREPS --> PP PREPS
NP --> NP2     PP --> P NP
NP2 --> N  

Part b: Consider the partial context-free grammar and recursive transition network grammar shown below:

NP --> ART NP1
NP1 --> ADJ N PPS
PPS --> PP
PPS --> PP PPS
PP --> P NP

    

(1) State two ways in which the languages described by these two grammars differ. For each, give a sample sentence that is recognized by one grammar but not the other and that demonstrates the difference.

(2) Write a new context-free grammar equivalent to the recursive transition network grammar shown above.

(3) Write a new recursive transition network grammar equivalent to the context-free grammar shown above.

Problem 4: Parsing with an RTN Grammar

Using the recursive transition network grammar shown below and the hand-simulation method described in class and illustrated on pages 64 and 65 of the excerpt of the Allen text, parse the following sentence:

   the orange hit the table

Assume that the word orange is either an adjective or a noun, and that the two words hit and table can be either a noun or verb.

Problem 5: Implementing an RTN Parser

In this problem, you will complete an implementation of a syntactic parser that uses a recursive transition network to represent the grammar. To begin, copy the file ~cs232/download/nlp/RTN.lisp to your individual directory. A particular grammar is defined in this file, based on the RTN illustrated below. At the beginning of the RTN.lisp file, there is a global variable defined for each state of the network. Each variable is assigned to a list of two-element lists, where each two-element list contains the label of an arc that leads away from the corresponding state and the new state that the arc leads to. For example, in the RTN below, the state NP1 has two arcs leading away from it: a noun arc leading to state NP2 and an adj arc leading back to NP1. The variable NP1 is therefore assigned to the list ((noun NP2) (adj NP1)). The order of the two inner lists corresponds to the numbering of the two arcs in the diagram.

There are two functions already defined in the RTN.lisp file, parser and parse. The top-level function is parser, which has a single input that is a list of words assumed to represent a complete sentence. The parser function returns the value VALID if the sentence is a legal sentence, given this particular RTN grammar, and otherwise returns the value NOT-VALID (this simple implementation does not record the full parse tree that was created in the previous parsers that you worked with). The main parsing process is carried out by the function parse. At each moment during the parsing process, the possible arcs that can be followed are stored in a list that is provided as input to the parse function. Similar to the search code that we worked with, this list serves as a queue in the sense that the parsing process continues by considering the next arc that is located at the front of the list of possibilities. The lists on the queue consist of four elements:

   (arc-label new-state remaining-words return-list)

arc-label is the label of an arc that can be followed next. Traversing this arc leads to the state new-state. remaining-words is the list of words that remain to be parsed, and return-list is a list of states to return to when pop arcs are reached. Suppose, for example, that (1) the parsing process is currently in state NP1, (2) the list of words (red ball hit the bat) remains to be parsed, and (3) when traversal of the NP network is complete, the parsing process should return to state S1. At this point, the following list should appear at the front of the queue:

   (ADJ NP1 (RED BALL HIT THE BOAT) (S1))

The parse function currently includes a print statement near the beginning that prints the current value of queue. A printout of the results of a completed implementation of the parser is given below, showing the progression of the value of queue as the parsing proceeds.

To complete this implementation, you need to complete the definitions of two functions, word-categories and get-next-actions. Your code solution will be evaluated not only on whether it works correctly, but also on quality. Points will be deducted if your code is overly cumbersome or unnecessarily repetitive.

? (parser '(the large man hit the green ball with a red bat))
(QUEUE ((NP S1 (THE LARGE MAN HIT THE GREEN BALL WITH A RED BAT)
        NIL)))
(QUEUE ((ART NP1 (THE LARGE MAN HIT THE GREEN BALL WITH A RED BAT)
        (S1))))
(QUEUE ((ADJ NP1 (LARGE MAN HIT THE GREEN BALL WITH A RED BAT) (S1))))
(QUEUE ((NOUN NP2 (MAN HIT THE GREEN BALL WITH A RED BAT) (S1))))
(QUEUE ((POP S1 (HIT THE GREEN BALL WITH A RED BAT) NIL)))
(QUEUE ((VP S2 (HIT THE GREEN BALL WITH A RED BAT) NIL)))
(QUEUE ((VERB VP1 (HIT THE GREEN BALL WITH A RED BAT) (S2))))
(QUEUE ((POP S2 (THE GREEN BALL WITH A RED BAT) NIL)
        (NP VP2 (THE GREEN BALL WITH A RED BAT) (S2))))
(QUEUE ((NP VP2 (THE GREEN BALL WITH A RED BAT) (S2))))
(QUEUE ((ART NP1 (THE GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((NOUN NP2 (GREEN BALL WITH A RED BAT) (VP2 S2))
        (ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((POP VP2 (BALL WITH A RED BAT) (S2))
        (ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((POP S2 (BALL WITH A RED BAT) NIL)
        (PP VP2 (BALL WITH A RED BAT) (S2))
        (ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((PP VP2 (BALL WITH A RED BAT) (S2))
        (ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((NOUN NP2 (BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((POP VP2 (WITH A RED BAT) (S2))))
(QUEUE ((POP S2 (WITH A RED BAT) NIL)
        (PP VP2 (WITH A RED BAT) (S2))))
(QUEUE ((PP VP2 (WITH A RED BAT) (S2))))
(QUEUE ((PREP PP1 (WITH A RED BAT) (VP2 S2))))
(QUEUE ((NP PP2 (A RED BAT) (VP2 S2))))
(QUEUE ((ART NP1 (A RED BAT) (PP2 VP2 S2))))
(QUEUE ((ADJ NP1 (RED BAT) (PP2 VP2 S2))))
(QUEUE ((NOUN NP2 (BAT) (PP2 VP2 S2))))
(QUEUE ((POP PP2 NIL (VP2 S2))))
(QUEUE ((POP VP2 NIL (S2))))
(QUEUE ((POP S2 NIL NIL)))
VALID

Part a: Complete the definition of the function word-categories. This function has a single input that is a word, and should return a list of the possible word categories for this word. For example, (word-categories 'green) should return the list (ADJ NOUN). The order of the categories in this list does not matter. Consider the above comments about the design of your code, and do not use the simple brute force approach that provides a separate LISP expression for each word recognized by the parser.

Part b: Complete the definition of the function get-next-actions. The input to this function is a four-element list of information about an arc to follow in the parsing process, as described above. This function should return a list of four-element lists that provide information about new arcs that could be followed in the parsing process, after the input arc is traversed. For example, if the input action is:

   (ART NP1 (THE GREEN BALL HIT THE BOAT) (S1))

then the get-next-actions function should return a list consisting of two possible new actions:

   ((NOUN NP2 (GREEN BALL HIT THE BOAT) (S1))
   (ADJ NP1 (GREEN BALL HIT THE BOAT) (S1)))

The word green can be an adjective or noun, so both arcs leading from the state NP1 should be considered, and according to the grammar, the noun arc should be considered first. Note that you can test the get-next-actions function directly. When testing it with the parser function, start with simple sentences.

Part c: Thoroughly test your completed implementation with sentences that test all aspects of the code. You can create your own vocabulary, but be sure to include words that have multiple word categories, and sentence examples that require backing up during the parsing process. Include your test sentences in the RTN.lisp code file.

Part d: An RTN parser can be implemented in a way that uses either depth-first or breadth-first search through the network. Given the details of the current implementation, what search method is used? What modification is needed to change the implementation to use the "other" search method?

Part e: In class, we observed that there are two ways to parse the following sentence:

   the man hit the table with the ball

Given the current RTN grammar used in this problem, can both syntactic structures be identified? Why or why not?

Submission Details: Hand in an electronic copy of your completed RTN.lisp code file using submit:

   submit cs232 RTN RTN.lisp

Also hand in a hardcopy of this code file, together with your answers to the above questions.