Writing Response #7: Vision & Language
The following papers are required reading for the week of November 2nd:
Berzak, Y., Barbu, A., Harari, D., Katz, B., Ullman, S. (2015)
Do You See What I Mean? Visual Resolution of Linguistic Ambiguities,
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing,
Lisbon, Portugal, 1477-1487.
Siddharth, N., Barbu, A., Siskind, J. M. (2014)
Seeing What You're Told: Sentence-Guided Activity Recognition in Video,
IEEE Conference on Computer Vision and Pattern Recognition.
In advance of Wednesday's presentation and discussion of this reading, you should submit a response to the following question:
The work by Berzak et al. (2015) extends the sentence recognition model described in the Siddharth et al. (2014) paper, which computes a numerical score that captures whether a given sentence describes an event portrayed in a given video. The model described in the Berzak et al. paper "simultaneously tracks the participants in the events described by the sentence while recognizing the events themselves" (page 1481) and "simultaneously finds the best tracks and the best state sequences for every predicate" (Figure 3 legend) contained in the interpretation of a sentence that may be depicted in a video clip. Why are all of these computations performed simultaneously, rather than sequentially? An alternative approach, for example, is to detect and track all the objects in the scene prior to performing recognition, and to recognize components of the sentence interpretation sequentially. Briefly summarize how the model integrates simultaneous tracking of objects in the video and recognition of the full sentence interpretation, and describe the benefits of this approach.
You should write at most one page (500 words) in response to this question. In addition, you should submit a question of your own, related to the reading. This writing assignment is due on Tuesday, November 3 at midnight and should be submitted to the 9.523 course site on Stellar using your private account: https://stellar.mit.edu/S/course/9/fa15/9.523/. If you have a problem submitting to Stellar, please send your response to Ellen Hildreth at ehildreth@wellesley.edu.