
NAME:

CS-5340/6340, Natural Language Processing Final Exam – SOLUTIONS, Fall 2006

  1. (15 pts) Indicate whether each sequence of part-of-speech tags would be accepted by the grammar below.

Grammar:

S → NP VP
NP → art NP1
NP → NP2
NP1 → adj NP1
NP1 → NP2
NP2 → noun NP2
NP2 → noun NP3
NP2 → noun
NP3 → prep NP
NP3 → prep NP NP3
VP → modal VP1
VP → aux VP1
VP1 → adv VP1
VP1 → verb VP2
VP1 → verb
VP2 → verb VP2
VP2 → verb
VP2 → adv

(a) noun prep noun modal verb yes

(b) noun prep noun prep noun modal verb yes

(c) art noun modal verb yes

(d) art noun prep noun modal verb yes

(e) art noun prep noun modal verb adv yes

(f) art noun prep noun modal verb adv adv no

(g) noun prep noun modal adv adv verb yes

(h) noun prep noun verb adv no

(i) noun noun noun prep noun noun aux verb yes

(j) noun noun prep noun aux adv verb adv yes

(k) noun noun prep noun aux aux adv verb no

(l) noun noun prep noun modal aux adv verb no

(m) art noun noun prep art noun noun aux verb yes

(n) adj noun noun prep art noun aux verb no

(o) art noun prep art adj adj noun noun aux verb verb yes
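These judgments can be checked mechanically. Below is a minimal sketch of a memoized top-down recognizer for the grammar above; the encoding and the names GRAMMAR and accepts are mine, not part of the exam. No rule is left-recursive, so the recursion terminates.

```python
from functools import lru_cache

# Nonterminals map to lists of right-hand sides; any symbol not in the
# dict (art, adj, noun, prep, modal, aux, adv, verb) is a terminal tag.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["art", "NP1"], ["NP2"]],
    "NP1": [["adj", "NP1"], ["NP2"]],
    "NP2": [["noun", "NP2"], ["noun", "NP3"], ["noun"]],
    "NP3": [["prep", "NP"], ["prep", "NP", "NP3"]],
    "VP":  [["modal", "VP1"], ["aux", "VP1"]],
    "VP1": [["adv", "VP1"], ["verb", "VP2"], ["verb"]],
    "VP2": [["verb", "VP2"], ["verb"], ["adv"]],
}

def accepts(tags):
    """True iff the tag sequence is derivable from S."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        # All positions j such that `symbol` derives tags[i:j].
        if symbol not in GRAMMAR:  # terminal tag
            ok = i < len(tags) and tags[i] == symbol
            return frozenset([i + 1]) if ok else frozenset()
        result = set()
        for rhs in GRAMMAR[symbol]:
            positions = {i}
            for sym in rhs:  # thread end positions through the RHS
                positions = {j for p in positions for j in ends(sym, p)}
            result |= positions
        return frozenset(result)

    return len(tags) in ends("S", 0)

print(accepts("noun prep noun modal verb".split()))              # (a) True
print(accepts("art noun prep noun modal verb adv adv".split()))  # (f) False
```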

  2. (15 pts) Suppose you want to use Yarowsky’s word sense disambiguation algorithm to disambiguate between 3 different senses of the word “star”: the astronomy sense (e.g., a star in the sky), the celebrity sense (e.g., Tom Cruise is a star), and the shape sense (e.g., the snowflake was a 7-pointed star). Assume that a thesaurus lists the following words for each category:

astronomy   celebrity     shape
galaxy      famous        angle
moon        important     circle
planet      money         geometry
cluster     performance   polygon
sky         reputation
universe

Use the 15 sentences in the box below as your “text corpus”. This corpus contains exactly 200 words. The words from the thesaurus appear in boldface.

The universe may have been similar in composition to a neutron star.
The children drop the circle, square, and triangle shapes into a box.
Geometry tells you a lot about the planet and its distance from the star.
Tatum O’Neal, known for her performance in Paper Moon, got married today.
It passes bright star Spica in the eastern sky.
The five-pointed star is one of the most important symbols in history.
The Milky Way gets its name from a Greek myth about the goddess Hera who sprayed milk across the sky.
The hour angle is measured along the equator between the HC of the star and the observer’s meridian.
Appearing in movies can make you famous and give you power.
First you get the money, then you get the star power.
Hubble found an important planet that orbits its star every 10 hours.
The famous star cluster omega Cen is the largest in the galaxy.
The vertices of a star are sorted by angle.
Broadway blitzed the showbiz world by announcing that movie star Tom Cruise will appear in a performance of Grease.
Volume rendering can be performed more quickly than polygon rendering but has the reputation for being slower.

Compute the Salience value of each word below for the category given. You should assume that the context window for a word spans the entire sentence containing the word but does not cross sentence boundaries. All calculations should be case insensitive (i.e., “case”, “Case”, and “CASE” should all be treated as the same word). Ignore punctuation marks. Please leave your answers in fractional form and show all of your work!

  • salience(“power”), with respect to celebrity

P(power | CELEBRITY) = 2/8
P(power) = 2/200
Salience = (2/8) / (2/200) = 400/16 = 25

  • salience(“power”), with respect to astronomy

P(power | ASTRONOMY) = 0/7
P(power) = 2/200
Salience = (0/7) / (2/200) = 0

  • salience(“star”), with respect to celebrity

P(star | CELEBRITY) = 5/8
P(star) = 10/200
Salience = (5/8) / (10/200) = 1000/80 = 25/2
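These values can be reproduced with a short script. The sketch below is my own encoding of the exam's setup: THESAURUS and CORPUS transcribe the tables above, a context window is a whole sentence, and the stated corpus size (200 words) is passed in rather than recounted, since an exact token count depends on tokenization details.

```python
import re

THESAURUS = {
    "astronomy": {"galaxy", "moon", "planet", "cluster", "sky", "universe"},
    "celebrity": {"famous", "important", "money", "performance", "reputation"},
    "shape":     {"angle", "circle", "geometry", "polygon"},
}

CORPUS = [
    "The universe may have been similar in composition to a neutron star.",
    "The children drop the circle, square, and triangle shapes into a box.",
    "Geometry tells you a lot about the planet and its distance from the star.",
    "Tatum O'Neal, known for her performance in Paper Moon, got married today.",
    "It passes bright star Spica in the eastern sky.",
    "The five-pointed star is one of the most important symbols in history.",
    "The Milky Way gets its name from a Greek myth about the goddess Hera "
    "who sprayed milk across the sky.",
    "The hour angle is measured along the equator between the HC of the star "
    "and the observer's meridian.",
    "Appearing in movies can make you famous and give you power.",
    "First you get the money, then you get the star power.",
    "Hubble found an important planet that orbits its star every 10 hours.",
    "The famous star cluster omega Cen is the largest in the galaxy.",
    "The vertices of a star are sorted by angle.",
    "Broadway blitzed the showbiz world by announcing that movie star "
    "Tom Cruise will appear in a performance of Grease.",
    "Volume rendering can be performed more quickly than polygon rendering "
    "but has the reputation for being slower.",
]

def salience(word, category, sentences, total_words=200):
    """Salience(w, C) = P(w | C) / P(w). A sentence is a context window
    for C if it contains any thesaurus word for C; P(w | C) is the count
    of w across C's windows divided by the number of windows."""
    word = word.lower()
    tokens = [re.findall(r"[a-z]+(?:'[a-z]+)?", s.lower()) for s in sentences]
    windows = [t for t in tokens if THESAURUS[category] & set(t)]
    p_w_given_c = sum(t.count(word) for t in windows) / len(windows)
    p_w = sum(t.count(word) for t in tokens) / total_words
    return p_w_given_c / p_w

print(salience("power", "celebrity", CORPUS))  # (2/8) / (2/200)  = 25.0
print(salience("power", "astronomy", CORPUS))  # (0/7) / (2/200)  = 0.0
print(salience("star",  "celebrity", CORPUS))  # (5/8) / (10/200) = 12.5
```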

  3. (12 pts)

(a) Write a script that would represent the typical experience of traveling on a commercial airplane.

One possible script would be:

  1. Go to airport.
  2. Wait in ticketing line.
  3. Get boarding pass and check luggage.
  4. Go to gate.
  5. Wait at gate until boarding begins.
  6. Board plane.
  7. Put on seat belt.
  8. Sit in cramped seat during flight.
  9. Depart plane.
  10. Go to baggage claim.
  11. Get baggage.
  12. Go to destination.

(b) List 3 roles that would be relevant to an airplane travel script.

Some possible answers are: pilot, flight attendants, ticket taker, passengers, security agents

(c) List 3 props that would be relevant to an airplane travel script.

Some possible answers are: airplane, seat, seat belt, luggage, ticket, boarding pass, oxygen mask

(d) List 3 settings that would be relevant to an airplane travel script.

Some possible answers are: parking garage, origin airport, security area (metal detectors), inside of airplane, destination airport

  4. (8 pts) For each problem below, state whether it is best characterized as an information retrieval task, an information extraction task, or a named entity recognition task. You can assume that each problem would be applied to an on-line archive of newspaper articles.

(a) Identifying mentions of buildings or property that were damaged in a hurricane.

Information Extraction

(b) Identifying articles written about the Utah Jazz.

Information Retrieval

(c) Identifying mentions of colleges and universities.

Named Entity Recognition

(d) Identifying the names of companies that were acquired by another company.

Information Extraction

(e) Identifying references to currency (i.e., money amounts).

Named Entity Recognition

(f) Identifying the names of people who won a lawsuit.

Information Extraction

(g) Identifying articles that review Chinese restaurants.

Information Retrieval

(h) Identifying the names of cities.

Named Entity Recognition

  5. (12 pts) The table below shows a sentence, its correct part-of-speech tags (TRUTH), and part-of-speech tags assigned to it by an initial state annotator (INIT).

        Bo    Zo    plans  to    buy   Mary  a    hat  and   take  Mary
TRUTH   n     n     verb   inf   verb  n     art  n    conj  verb  n
INIT    n     n     n      prep  verb  n     art  n    conj  n     n

        to    Alta  because  the  ski  runs  will  open  after  the  storm
TRUTH   prep  n     conj     art  n    n     mod   verb  prep   art  n
INIT    prep  n     conj     art  n    verb  n     verb  prep   art  verb

(a) Consider a transformation-based learning (TBL) system that uses only this template: Change tag X to tag Y if the previous word has tag Z. Show all rules that would be generated from this template that would fix at least one POS tagging error.

Change tag n to verb if the previous word has tag n
Change tag prep to inf if the previous word has tag n
Change tag n to verb if the previous word has tag conj
Change tag verb to n if the previous word has tag n
Change tag n to mod if the previous word has tag verb
Change tag verb to n if the previous word has tag art

(b) Consider a transformation-based learning (TBL) system that uses only one template: Change tag X to tag Y if the previous word is Z. Show all rules that would be generated from this template that would fix at least one POS tagging error.

Change tag n to verb if the previous word is Zo
Change tag prep to inf if the previous word is plans
Change tag n to verb if the previous word is and
Change tag verb to n if the previous word is ski
Change tag n to mod if the previous word is runs
Change tag verb to n if the previous word is the
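Both rule lists fall out of instantiating the template at every position where INIT disagrees with TRUTH. A minimal sketch (my own encoding, not the exam's) is below; note that, matching the solutions, the context is the previous word's current INIT tag, not its true tag, which is why the rule changing n to mod conditions on tag verb (the mistagged "runs").

```python
WORDS = ("Bo Zo plans to buy Mary a hat and take Mary "
         "to Alta because the ski runs will open after the storm").split()
TRUTH = ("n n verb inf verb n art n conj verb n "
         "prep n conj art n n mod verb prep art n").split()
INIT  = ("n n n prep verb n art n conj n n "
         "prep n conj art n verb n verb prep art verb").split()

def candidate_rules(words, truth, init, use_word=False):
    """Instantiate the template at every position where INIT != TRUTH.
    use_word=False -> part (a): condition on the previous word's INIT tag.
    use_word=True  -> part (b): condition on the previous word itself."""
    rules = []
    for i in range(1, len(words)):
        if init[i] != truth[i]:
            z = words[i - 1] if use_word else init[i - 1]
            rules.append((init[i], truth[i], z))
    return rules

for x, y, z in candidate_rules(WORDS, TRUTH, INIT):
    print(f"Change tag {x} to {y} if the previous word has tag {z}")
for x, y, z in candidate_rules(WORDS, TRUTH, INIT, use_word=True):
    print(f"Change tag {x} to {y} if the previous word is {z}")
```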

  6. (4 pts) In statistical approaches to NLP, smoothing techniques can help to generate better statistics for which of the following situations? Answer yes if smoothing will likely help, answer no if smoothing will not necessarily help. (No further explanation is necessary.)

(a) words that did not occur in a large training corpus yes

(b) words that occurred exactly once in a large training corpus yes

(c) words that occurred exactly twice in a large training corpus yes

(d) words that occurred many times in a large training corpus no
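The pattern behind these answers is that smoothing matters where counts are zero or tiny. As one illustration (add-one/Laplace smoothing is my choice here; the exam does not specify a particular method):

```python
from collections import Counter

def mle_prob(word, counts):
    # Unsmoothed MLE: unseen words get probability zero, and words seen
    # once or twice get unreliable estimates.
    return counts[word] / sum(counts.values())

def laplace_prob(word, counts, vocab_size):
    # Add-one smoothing: every word, seen or unseen, gets a nonzero
    # estimate; frequent words are barely affected.
    return (counts[word] + 1) / (sum(counts.values()) + vocab_size)

counts = Counter({"the": 1000, "aardvark": 1})
print(mle_prob("unseen", counts))             # 0.0 -- smoothing helps here
print(laplace_prob("unseen", counts, 50000))  # small but nonzero
print(laplace_prob("the", counts, 50000))     # close to the MLE estimate
```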

  7. (12 pts total) Short-answer questions.

(a) (3 pts) What would be a good baseline method to use when evaluating a text summarization system designed to summarize news articles?

A good baseline system would be to use the “position-based” method and just use the first N sentences of the article as a summary for that article.
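As a sketch, assuming sentence splitting is done upstream, this baseline is a one-liner:

```python
def lead_n_summary(sentences, n=3):
    # Position-based baseline: the first n sentences of the article.
    return " ".join(sentences[:n])
```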

(b) (3 pts) Consider the following two sentences:

(1) Bill Smith reluctantly flew to New York City in 2005.

(2) William Smith took a plane last year to NYC.

Pretend that sentence (1) and sentence (2) are translations of each other in two different natural languages. Show how the words in these sentences should be aligned by a perfect word alignment algorithm.

A perfect alignment pairs: Bill ↔ William, Smith ↔ Smith, flew ↔ took a plane, to ↔ to, New York City ↔ NYC, and in 2005 ↔ last year. “Reluctantly” has no counterpart in sentence (2) and is left unaligned.

IMPORTANT: Question #11 is for CS-6340 students ONLY!

  11. (16 pts total) This question relates to the Collins & Singer bootstrapping method for named entity recognition. The predicate Contains(w) is satisfied if a sequence of words contains the word w.

TABLE 1

NP                    CONTEXT                      CLASS
Michael Jordan        basketball player            person
Apple Computer        a computer company           company
Jeff Jordan           chief executive officer      person
Jeff Jordan           president of PayPal          person
Jeff Citron           chairman and CEO of Vonage   person
Jeff Citron           a CEO                        person
Jordan                small country                location
Apple                 Gwyneth’s child              person

(a) (8 pts) Using only the Contains(w) predicate, fill in the table below with all of the rules that would be generated from the NPs in TABLE 1 and compute the probability of each rule. Leave the probabilities in fractional form!

NP Rule                              Probability
If Contains(“Michael”) → person      1/1
If Contains(“Jordan”) → person       3/4
If Contains(“Apple”) → company       1/2
If Contains(“Computer”) → company    1/1
If Contains(“Jeff”) → person         4/4
If Contains(“Citron”) → person       2/2
If Contains(“Jordan”) → location     1/4
If Contains(“Apple”) → person        1/2

(b) (3 pts) Using only the NP rules in your previous table that have a probability of 1.0, apply those rules to the instances in TABLE 2 and fill in TABLE 2 with the class that would be assigned to each instance. If no class would be assigned to an example, simply put none.

TABLE 2

NP                    CONTEXT                 CLASS
Jeff Jones            CEO                     person
Citron Inc            car company             person
River Jordan Inc      internet corporation    none
Maine Apple Orchard   family farm             none
Dell Computer Inc     computer manufacturer   company

(c) (5 pts) Using only the Contains(w) predicate, fill in the table below with all of the Context Rules that would be generated from the instances in TABLE 2 and compute the probability of each rule based on the instances in both TABLE 1 and TABLE 2. Leave the probabilities in fractional form!

Context Rule                             Probability
If Contains(“CEO”) → person              3/3
If Contains(“car”) → person              1/1
If Contains(“company”) → person          1/2
If Contains(“computer”) → company        2/2
If Contains(“manufacturer”) → company    1/1
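The part (a) probabilities follow directly from counting over TABLE 1: for each rule, the numerator is the number of NPs of that class containing the word, and the denominator is the number of NPs containing the word at all. A minimal sketch (SEED and contains_rules are my own encoding, not Collins & Singer's code):

```python
from collections import Counter

# (NP, class) pairs transcribed from TABLE 1
SEED = [
    ("Michael Jordan", "person"),
    ("Apple Computer", "company"),
    ("Jeff Jordan", "person"),
    ("Jeff Jordan", "person"),
    ("Jeff Citron", "person"),
    ("Jeff Citron", "person"),
    ("Jordan", "location"),
    ("Apple", "person"),
]

def contains_rules(labeled_nps):
    """For each (word, class) pair, return (numerator, denominator) where
    P(class | Contains(word)) = numerator / denominator."""
    in_any, in_class = Counter(), Counter()
    for np, cls in labeled_nps:
        for w in set(np.split()):  # Contains(w) fires at most once per NP
            in_any[w] += 1
            in_class[(w, cls)] += 1
    return {(w, c): (k, in_any[w]) for (w, c), k in in_class.items()}

for (w, c), (num, den) in sorted(contains_rules(SEED).items()):
    print(f'If Contains("{w}") -> {c}   {num}/{den}')
```

Running this reproduces the NP rule table for part (a), e.g. Contains("Jordan") → person with 3/4 and Contains("Apple") → company with 1/2.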