
Question Answering

  • The task of answering a (natural language) question
  • One of the oldest NLP tasks (punched card systems in 1961)
    • Idea: doing dependency parsing on the question and searching for the most similar answer

image

image

image

QUESTION ANSWERING

- Factoid QA: the answer is a single entity / numeric - "who wrote the book Dracula?"
- Non-factoid QA: answer is free text - "why is Dracula so evil?"

QA subtypes (could be factoid or non-factoid):

  • Semantic parsing: question is mapped to a logical form which is then executed over some database
    • “how many people did Dracula bite?”
  • Reading comprehension: answer is a span of text within a document
  • Community-based QA: question is answered by multiple web users
  • Visual QA: questions about images

Reading Comprehension

image

Textual Question Answering

image

Conversational Question Answering

image

Long-form Question answering

image

Open-domain Question Answering

image

Knowledge Base Question Answering

image

Table-based Question Answering

image

Visual Question Answering

image

Stanford Question Answering Dataset (SQuAD)

image

image

METHODS FOR QUESTION ANSWERING

Feature Based Methods

image

Bi-Directional Attention Flow for Machine Comprehension image

Coattention Encoder image

BiLSTM-based Models

  • Encode the question using word/char embeddings
    • pass them to a biLSTM encoder
  • Encode the passage similarly
  • Passage-to-question and question-to-passage attention
  • Modeling layer: another BiLSTM layer
  • Output layer: two classifiers for predicting start and end points
  • The entire model can be trained in an end-to-end way (see the sketch below)
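As a concrete illustration, here is a minimal PyTorch sketch of such a BiLSTM span-prediction pipeline. The class name, hidden sizes, and the simplified dot-product passage-to-question attention are illustrative assumptions, not the exact BiDAF architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMSpanReader(nn.Module):
    """Minimal sketch of a BiLSTM reading-comprehension model:
    encode question and passage, attend, re-encode, predict a span."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.q_enc = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.p_enc = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        # modeling layer: re-encode the attention-augmented passage
        self.model_enc = nn.LSTM(4 * hidden, hidden, bidirectional=True, batch_first=True)
        # output layer: two classifiers for start / end positions
        self.start_clf = nn.Linear(2 * hidden, 1)
        self.end_clf = nn.Linear(2 * hidden, 1)

    def forward(self, passage_ids, question_ids):
        p, _ = self.p_enc(self.embed(passage_ids))      # (B, Lp, 2H)
        q, _ = self.q_enc(self.embed(question_ids))     # (B, Lq, 2H)
        # passage-to-question attention (simplified dot-product similarity)
        scores = torch.bmm(p, q.transpose(1, 2))        # (B, Lp, Lq)
        p2q = torch.bmm(F.softmax(scores, dim=-1), q)   # (B, Lp, 2H)
        m, _ = self.model_enc(torch.cat([p, p2q], dim=-1))
        start_logits = self.start_clf(m).squeeze(-1)    # (B, Lp)
        end_logits = self.end_clf(m).squeeze(-1)        # (B, Lp)
        return start_logits, end_logits
```

Training is end-to-end: the sum of cross-entropy losses of the start and end classifiers against the gold answer span is minimized.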

BERT-based Models

  • Concatenate question and passage as one single sequence separated with a [SEP] token, then pass it to the BERT encoder
  • Train two classifiers on top of the passage tokens to predict start and end positions (see the sketch below)
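As a hedged sketch, the same recipe can be written with the Hugging Face transformers library. The checkpoint name below is only a placeholder; it would need to be fine-tuned on a QA dataset such as SQuAD before the predicted spans are meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Placeholder checkpoint; any BERT-style model fine-tuned for extractive QA works.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "Who wrote the book Dracula?"
passage = "Dracula is an 1897 Gothic horror novel by Irish author Bram Stoker."

# The tokenizer builds one sequence: [CLS] question [SEP] passage [SEP]
inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Two classifiers on top of the token representations give start/end logits.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits)
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)
```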

image

Experiments on SQuAD image

Is Reading Comprehension Solved? image

image

Question Answering Datasets

  • Reading comprehension
    • CNN/Daily Mail, CoQA, HotpotQA, QuAC, RACE, SQuAD, SWAG, RecipeQA, NarrativeQA, DROP, Story Cloze Test, …
  • Open-domain question answering
    • DuReader, Quasar, SearchQA, …
  • Knowledge base question answering
  • More datasets
    • http://nlpprogress.com/english/question_answering.html

READING COMPREHENSION

image

CNN Article image

  • Dataset Statistics
    • Articles were collected from April 2007 for CNN and from June 2010 for the Daily Mail, until the end of April 2015
    • Validation data is from March 2015, test data from April 2015
  • Question Difficulty
    • Distribution (in percent) of queries over category and over the number of context sentences required to answer them, based on a subset of the CNN validation data

image

  • Simple Baselines

image

Reading vs Encoding

Use neural encoding models to estimate the probability that word type a from document d answers query q:

p(a|d,q) ∝ exp(W(a) g(d,q))

where

  • W(a) indexes row a of weight matrix W
  • g(d,q) returns a vector embedding of a document and query pair
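A small numeric sketch of this scoring rule, assuming the encoder output g(d, q) is already computed; the sizes and random tensors are placeholders.

```python
import torch
import torch.nn.functional as F

# Sketch: turn the joint document/query embedding g(d, q) into a distribution
# over candidate answer word types, p(a|d,q) ∝ exp(W(a) · g(d,q)).
vocab_size, emb_dim = 50_000, 256
W = torch.randn(vocab_size, emb_dim)   # one row W(a) per answer word type
g_dq = torch.randn(emb_dim)            # g(d, q): encoder output (assumed given)

logits = W @ g_dq                      # W(a) · g(d,q) for every a
p_a = F.softmax(logits, dim=0)         # normalize with the softmax
answer = torch.argmax(p_a)             # most probable answer word type
```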

Deep LSTM Reader

image

image

image

Attentive Reader

image

image

image

Attentive Reader Training image

image

Impatient Reader

image

image

image

image

Summary

  • Supervised machine reading is a viable research direction with the available data
  • LSTM-based recurrent networks constantly surprise with their ability to encode dependencies in sequences
  • Attention is a very effective and flexible modeling technique

OPEN-DOMAIN QUESTION ANSWERING

  • Simmons et al. (1964) did the first exploration of answering questions from expository text, based on matching dependency parses of a question and answer
  • Murax (Kupiec, 1993) aimed to answer questions over an online encyclopedia using IR and shallow linguistic processing
  • The NIST TREC QA track, begun in 1999, first rigorously investigated answering fact questions over a large collection of documents
  • IBM’s Jeopardy! system (DeepQA, 2011) brought attention to a version of the problem; it used an ensemble of many methods
  • DrQA (Chen et al., 2017) uses IR followed by neural reading comprehension to bring deep learning to open-domain QA

IBM’s Watson and Jeopardy! Challenge image

Compared to Reading Comprehension?

  • Combines challenges of both large-scale open-domain QA and machine comprehension
  • The question can be any open-domain question
    • Instead of questions posed after reading the passage

Traditional QA System

image

QA System

image

DrQA: Two-Stage Retriever and Reader image

Document Retriever: Two Steps

  1. TF-IDF bag-of-words vector
  2. Efficient bigram hashing (Weinberger et al., 2009)
    • Map each bigram to 2^24 bins with an unsigned murmur3 hash
    • Preserves speed and memory efficiency - MurmurHash3 maps a word or string to a 32-bit or 128-bit value (see the retrieval sketch below)
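A rough approximation of this retriever can be put together with scikit-learn, whose HashingVectorizer also uses MurmurHash3 internally. The tiny corpus, the TF-IDF weighting step, and the cosine-similarity ranking below are illustrative choices, not the exact DrQA implementation.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Dracula is an 1897 Gothic horror novel by Bram Stoker.",
    "Frankenstein was written by Mary Shelley in 1818.",
]

# Unigrams + bigrams hashed into 2^24 bins (HashingVectorizer uses MurmurHash3
# internally), then re-weighted with TF-IDF.
hasher = HashingVectorizer(ngram_range=(1, 2), n_features=2**24,
                           norm=None, alternate_sign=False)
tfidf = TfidfTransformer()
doc_vecs = tfidf.fit_transform(hasher.transform(docs))

# Rank documents by cosine similarity to the hashed, TF-IDF-weighted query.
query_vec = tfidf.transform(hasher.transform(["who wrote the book Dracula?"]))
scores = cosine_similarity(query_vec, doc_vecs)
print(docs[scores.argmax()])
```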

Document Reader

image

Paragraph Encoding

image

  • Represent tokens in a paragraph as a sequence of feature vectors
    • Word embedding
    • Exact match
    • Token features
    • Aligned question embedding
  • Pass the features as the input to an RNN (multi-layer bidirectional LSTM); a feature-construction sketch follows below
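A minimal sketch of how these per-token features might be assembled, assuming the embeddings are already given. The simplified dot-product alignment stands in for DrQA's learned projection, and the POS/NER/frequency token features are left out for brevity.

```python
import torch
import torch.nn.functional as F

def paragraph_features(p_emb, q_emb, p_tokens, q_tokens):
    """Sketch of DrQA-style per-token paragraph features.
    p_emb: (Lp, D) paragraph word embeddings, q_emb: (Lq, D) question embeddings.
    Token features (POS, NER, term frequency) are omitted here for brevity."""
    # exact match: 1 if the paragraph token also appears in the question
    q_set = set(q_tokens)
    exact = torch.tensor([[1.0] if t in q_set else [0.0] for t in p_tokens])

    # aligned question embedding: soft attention over question words,
    # using embedding dot products as similarity scores
    scores = p_emb @ q_emb.T                      # (Lp, Lq)
    align = F.softmax(scores, dim=-1) @ q_emb     # (Lp, D)

    # concatenate everything into the per-token input of the BiLSTM
    return torch.cat([p_emb, exact, align], dim=-1)
```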

Question Encoding

image

  • Apply another RNN on top of the word embeddings of the question words qi, obtaining hidden units qj
  • Combine the resulting hidden units into one single question vector (see the sketch below) image
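A compact PyTorch sketch of this question encoder; the embedding and hidden sizes are placeholder values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionEncoder(nn.Module):
    """Sketch: run a BiLSTM over the question word embeddings and collapse
    the hidden states {q_j} into one vector q with learned attention weights
    b_j ∝ exp(w · q_j)."""
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.w = nn.Linear(2 * hidden, 1, bias=False)

    def forward(self, q_word_emb):                 # (B, Lq, emb_dim)
        q_hidden, _ = self.rnn(q_word_emb)         # (B, Lq, 2H)
        b = F.softmax(self.w(q_hidden), dim=1)     # (B, Lq, 1) attention weights
        return (b * q_hidden).sum(dim=1)           # (B, 2H) single question vector
```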

Prediction

image

image

Dataset and Example Training Data & Result

image

image

Summary: DrQA

  • DrQA was the first attempt to scale up reading comprehension to open-domain question answering, by combining IR techniques and neural reading comprehension models
  • Although it achieved good accuracy on SQuAD in 2017, the final open-domain QA accuracy still remains low
  • Distant supervision + multi-task learning helps

Limitations of Current Models image

Latent Retrieval for Weakly Supervised Open Domain Question Answering

image image

Information Retrieval (IR) image

Reader (QA) image

Result image

MULTI-HOP QUESTION ANSWERING

  • Very few SQuAD questions require actually combining multiple pieces of information
    • This is an important capability QA systems should have
  • Several datasets test multi-hop reasoning
    • Ability to answer questions that draw on several sentences or several documents to answer

WikiHop

  • Annotators were shown Wikipedia and asked to pose a simple question linking two entities that requires a third (bridging) entity to associate
  • A model shouldn’t be able to answer these without doing some reasoning about the intermediate entity
  • image

HotpotQA

image

Multi-Hop Reasoning

  • This is an idealized version of multi-hop reasoning
  • Do models need to do this to do well on this task?

image

  • Model can ignore the bridging entity and directly predict the answer

image

  • No simple lexical overlap
  • But only one government position appears in the context

image

NEW TYPES OF QA

DROP

  • Let’s build QA datasets to help the community focus on modeling particular things

image

  • Question types: subtraction, comparison (which did he visit first), counting and sorting (which kicker kicked more field goals)
  • Invites ad hoc solutions (structure the model around predicting differences between numbers)

NarrativeQA

  • Humans see a summary of a book
    • … Peter’s former girlfriend Dana Barrett has had a son, Oscar …
  • Question: How is Oscar related to Dana?
  • Answering these questions from the source text (not summary) requires complex inferences and is extremely challenging

image

Summary

  • Lots of problems with current QA settings, lots of new datasets
  • Models can often work well for one QA task but don’t generalize
  • We still do not have (solvable) QA settings which seem to require really complex reasoning as opposed to surface-level pattern recognition
