|
CS 232
Assignment 5 Due: Friday, Oct. 27 |
|
Complete the exercises in Labs #6 and #7 on the top-down and bottom-up language parsers.
Given the context-free grammar and lexicon shown below, parse the following sentence using the Top-Down Parser presented in class and illustrated on pages 49 and 51 of the excerpt from the Allen text:
the old man can hold the green can
Be sure to apply the grammar rules in the order that they are listed below:
| Grammar: | Lexicon: |
| S --> NP VP | the: ART |
| NP --> N | old: N, ADJ |
| NP --> ART N | man: N, V |
| NP --> ART ADJ N | can: N, AUX, V |
| VP --> V NP | hold: N, V |
| VP --> AUX VP | green: ADJ, N |
Using the method for hand-simulating the bottom-up chart parsing algorithm described in class and in section 3.4 of the excerpt from the Allen text, parse the following sentence:
the slithy toves gymbled on the wabe
Assume the following grammar and possible categories for the words in the sentence:
| Grammar: | Lexicon: |
| S --> NP VP | the: ART |
| NP --> N | slithy: ADJ N |
| NP --> ART N | toves: N V |
| NP --> ART ADJ N | gymbled: ADJ N V |
| NP --> NP PP | on: P |
| VP --> V | wabe: N |
| VP --> V NP | |
| VP --> V PP | |
| VP --> VP PP | |
| PP --> P NP |
Part a: Transform the following context-free grammar into an equivalent recursive transition network that represents the same grammar. Your recursive transition network grammar should use at most four networks, for S, NP, VP and PP. Try to make your networks as small as possible.
| S --> NP VP | NP2 --> ADJ NP2 |
| VP --> V | NP2 --> NP3 PREPS |
| VP --> V NP | NP3 --> N |
| VP --> V PP | PREPS --> PP |
| NP --> ART NP2 | PREPS --> PP PREPS |
| NP --> NP2 | PP --> P NP |
| NP2 --> N |
Part b: Consider the partial context-free grammar and recursive transition network grammar shown below:
|
|
(1) State two ways in which the languages described by these two grammars differ. For each, give a sample sentence that is recognized by one grammar but not the other and that demonstrates the difference.
(2) Write a new context-free grammar equivalent to the recursive transition network grammar shown above.
(3) Write a new recursive transition network grammar equivalent to the context-free grammar shown above.
Using the recursive transition network grammar shown below and the hand-simulation method described in class and illustrated on pages 64 and 65 of the excerpt of the Allen text, parse the following sentence:
the orange hit the table
Assume that the word orange is either an adjective or a noun, and that the two words hit and table can be either a noun or verb.

In this problem, you will complete an implementation of a syntactic parser that uses
a recursive transition network to represent the grammar. To begin, copy the file
~cs232/download/nlp/RTN.lisp to your individual directory. A particular
grammar is defined in this file, based on the RTN illustrated below. At the beginning of
the RTN.lisp file, there is a global variable defined for each state of the
network. Each variable is assigned to a list of two-element lists, where each two-element
list contains the label of an arc that leads away from the corresponding state and the
new state that the arc leads to. For example, in the RTN below, the state NP1
has two arcs leading away from it: a noun arc leading to state NP2
and an adj arc leading back to NP1. The variable NP1
is therefore assigned to the list ((noun NP2) (adj NP1)). The order of the two
inner lists corresponds to the numbering of the two arcs in the diagram.

There are two functions already defined in the RTN.lisp file,
parser and parse. The top-level function is parser,
which has a single input that is a list of words assumed to represent a complete sentence.
The parser function returns the value VALID if the sentence is a
legal sentence, given this particular RTN grammar, and otherwise returns the value
NOT-VALID (this simple implementation does not record the full parse tree that
was created in the previous parsers that you worked with). The main parsing process is
carried out by the function parse. At each moment during the parsing process,
the possible arcs that can be followed are stored in a list that is provided as input to
the parse function. Similar to the search code that we worked with, this list
serves as a queue in the sense that the parsing process continues by considering the next
arc that is located at the front of the list of possibilities. The lists on the queue
consist of four elements:
(arc-label new-state remaining-words return-list)
arc-label is the label of an arc that can be followed next. Traversing
this arc leads to the state new-state. remaining-words is the list of
words that remain to be parsed, and return-list is a list of states to
return to when pop
arcs are reached. Suppose, for example, that (1) the parsing process
is currently in state NP1, (2) the list of words (red ball hit the
bat) remains to be parsed, and (3) when traversal of the NP network
is complete, the parsing process should return to state S1. At this point,
the following list should appear at the front of the queue:
(ADJ NP1 (RED BALL HIT THE BOAT) (S1))
The parse function currently includes a print statement near
the beginning that prints the current value of queue. A printout of the
results of a completed implementation of the parser is given below, showing the progression
of the value of queue as the parsing proceeds.
To complete this implementation, you need to complete the definitions of two functions,
word-categories and get-next-actions. Your code solution will
be evaluated not only on whether it works correctly, but also on quality. Points will be
deducted if your code is overly cumbersome or unnecessarily repetitive.
? (parser '(the large man hit the green ball with a red bat))
(QUEUE ((NP S1 (THE LARGE MAN HIT THE GREEN BALL WITH A RED BAT)
NIL)))
(QUEUE ((ART NP1 (THE LARGE MAN HIT THE GREEN BALL WITH A RED BAT)
(S1))))
(QUEUE ((ADJ NP1 (LARGE MAN HIT THE GREEN BALL WITH A RED BAT) (S1))))
(QUEUE ((NOUN NP2 (MAN HIT THE GREEN BALL WITH A RED BAT) (S1))))
(QUEUE ((POP S1 (HIT THE GREEN BALL WITH A RED BAT) NIL)))
(QUEUE ((VP S2 (HIT THE GREEN BALL WITH A RED BAT) NIL)))
(QUEUE ((VERB VP1 (HIT THE GREEN BALL WITH A RED BAT) (S2))))
(QUEUE ((POP S2 (THE GREEN BALL WITH A RED BAT) NIL)
(NP VP2 (THE GREEN BALL WITH A RED BAT) (S2))))
(QUEUE ((NP VP2 (THE GREEN BALL WITH A RED BAT) (S2))))
(QUEUE ((ART NP1 (THE GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((NOUN NP2 (GREEN BALL WITH A RED BAT) (VP2 S2))
(ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((POP VP2 (BALL WITH A RED BAT) (S2))
(ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((POP S2 (BALL WITH A RED BAT) NIL)
(PP VP2 (BALL WITH A RED BAT) (S2))
(ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((PP VP2 (BALL WITH A RED BAT) (S2))
(ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((ADJ NP1 (GREEN BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((NOUN NP2 (BALL WITH A RED BAT) (VP2 S2))))
(QUEUE ((POP VP2 (WITH A RED BAT) (S2))))
(QUEUE ((POP S2 (WITH A RED BAT) NIL)
(PP VP2 (WITH A RED BAT) (S2))))
(QUEUE ((PP VP2 (WITH A RED BAT) (S2))))
(QUEUE ((PREP PP1 (WITH A RED BAT) (VP2 S2))))
(QUEUE ((NP PP2 (A RED BAT) (VP2 S2))))
(QUEUE ((ART NP1 (A RED BAT) (PP2 VP2 S2))))
(QUEUE ((ADJ NP1 (RED BAT) (PP2 VP2 S2))))
(QUEUE ((NOUN NP2 (BAT) (PP2 VP2 S2))))
(QUEUE ((POP PP2 NIL (VP2 S2))))
(QUEUE ((POP VP2 NIL (S2))))
(QUEUE ((POP S2 NIL NIL)))
VALID
Part a: Complete the definition of the function
word-categories. This function has a single input that is a word, and should
return a list of the possible word categories for this word. For example,
(word-categories 'green) should return the list (ADJ NOUN). The
order of the categories in this list does not matter. Consider the above comments about
the design of your code, and do not use the simple brute force approach that provides a
separate LISP expression for each word recognized by the parser.
Part b: Complete the definition of the function
get-next-actions. The input to this function is a four-element list
of information about an arc to follow in the parsing process, as described above.
This function should return a list of four-element lists that provide information about
new arcs that could be followed in the parsing process, after the input arc is traversed.
For example, if the input action is:
(ART NP1 (THE GREEN BALL HIT THE BOAT) (S1))
then the get-next-actions function should return a list consisting of two
possible new actions:
((NOUN NP2 (GREEN BALL HIT THE BOAT) (S1))
(ADJ NP1 (GREEN BALL HIT THE BOAT) (S1)))
The word green can be an adjective or noun, so both arcs leading from the
state NP1 should be considered, and according to the grammar, the
noun arc should be considered first. Note that you can test the
get-next-actions function directly. When testing it with the
parser function, start with simple sentences.
Part c: Thoroughly test your completed implementation with sentences
that test all aspects of the code. You can create your own vocabulary, but be sure to
include words that have multiple word categories, and sentence examples that require
backing up during the parsing process. Include your test sentences in the
RTN.lisp code file.
Part d: An RTN parser can be implemented in a way that uses either depth-first or breadth-first search through the network. Given the details of the current implementation, what search method is used? What modification is needed to change the implementation to use the "other" search method?
Part e: In class, we observed that there are two ways to parse the following sentence:
the man hit the table with the ball
Given the current RTN grammar used in this problem, can both syntactic structures be identified? Why or why not?
Submission Details: Hand in an electronic copy of your completed
RTN.lisp code file using submit:
submit cs232 RTN RTN.lisp
Also hand in a hardcopy of this code file, together with your answers to the above questions.