[NLP] Question Answering
Question Answering
- The task of answering a (natural language) question
- One of the oldest NLP tasks (punched card systems in 1961)
- Idea: do dependency parsing on the question and search for the most similar answer
QUESTION ANSWERING
- Non-factoid QA: answer is free text - "why is Dracula so evil?"
QA subtypes (could be factoid or non-factoid):
- Semantic parsing: question is mapped to a logical form which is then executed over some database
- “how many people did Dracula bite?”
- Reading comprehension: answer is a span of text within a document
- Community-based QA: question is answered by multiple web users
- Visual QA: questions about images
Reading Comprehension
Textual Question Answering
Conversational Question Answering
Long-form Question answering
Open-domain Question Answering
Knowledge Base Question Answering
Table-based Question Answering
Visual Question Answering
Stanford Question Answering Dataset (SQuAD)
METHODS FOR QUESTION ANSWERING
Feature Based Methods
Bi-Directional Attention Flow for Machine Comprehension
Coattention Encoder
BiLSTM-based Models
- Encode the question using word/char embeddings
- Pass it through a BiLSTM encoder
- Encode the passage similarly
- Passage-to-question and question-to-passage attention
- Modeling layer: another BiLSTM layer
- Output layer: two classifiers for predicting start and end points
- The entire model can be trained in an end-to-end way (see the sketch after this list)
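A minimal PyTorch sketch of this BiLSTM span-prediction pipeline, with the full bidirectional attention flow simplified to a basic bilinear passage-to-question attention; all module names and dimensions are illustrative assumptions, not taken from the original papers:

```python
import torch
import torch.nn as nn

class BiLSTMSpanPredictor(nn.Module):
    """Simplified BiLSTM reader: encode question and passage,
    attend, model, then predict answer start/end positions."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.q_enc = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.p_enc = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        # Modeling layer: another BiLSTM over the attention-augmented passage
        self.model_enc = nn.LSTM(4 * hidden, hidden, bidirectional=True, batch_first=True)
        self.start_clf = nn.Linear(2 * hidden, 1)
        self.end_clf = nn.Linear(2 * hidden, 1)

    def forward(self, passage_ids, question_ids):
        p, _ = self.p_enc(self.embed(passage_ids))   # (B, Lp, 2H)
        q, _ = self.q_enc(self.embed(question_ids))  # (B, Lq, 2H)
        # Passage-to-question attention: each passage token attends over question tokens
        scores = torch.bmm(p, q.transpose(1, 2))     # (B, Lp, Lq)
        attn = torch.softmax(scores, dim=-1)
        q_aware = torch.bmm(attn, q)                 # (B, Lp, 2H)
        m, _ = self.model_enc(torch.cat([p, q_aware], dim=-1))
        start_logits = self.start_clf(m).squeeze(-1)  # (B, Lp)
        end_logits = self.end_clf(m).squeeze(-1)      # (B, Lp)
        return start_logits, end_logits
```

The two output heads give per-token start and end logits; training with cross-entropy against the gold span positions is what makes the whole model end-to-end trainable, as noted above.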
BERT-based Models
- Concatenate question and passage as one single sequence separated with a [SEP] token, then pass it to the BERT encoder
- Train two classifiers on top of the passage token representations to predict the answer start and end (see the sketch below)
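A minimal sketch using the Hugging Face transformers pipeline API; the checkpoint name is just one example of a BERT-style extractive-QA model, and any similar checkpoint works:

```python
from transformers import pipeline

# The pipeline concatenates question and passage as
# [CLS] question [SEP] passage [SEP] and scores every passage
# token as a candidate answer start/end, as described above.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")  # example checkpoint

result = qa(question="How many people did Dracula bite?",
            context="According to the story, Dracula bit three people in the village.")
print(result["answer"], round(result["score"], 3))
```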
Experiments on SQuAD
Is Reading Comprehension Solved?
Question Answering Datasets
- Reading comprehension
- CNN/Daily Mail, CoQA, HotpotQA, QuAC, RACE, SQuAD, SWAG, RecipeQA, NarrativeQA, DROP, Story Cloze Test, …
- Open-domain question answering
- DuReader, Quasar, SearchQA, …
- Knowledge base question answering
- More datasets
- http://nlpprogress.com/english/question_answering.html
READING COMPREHENSION
CNN Article
- Dataset Statistics
- Articles were collected from April 2007 for CNN and June 2010 for Daily Mail, until the end of April 2015
- Validation data is from March 2015, test data from April 2015
- Question Difficulty
- Distribution (in percent) of queries over category and number of context sentences required to answer them, based on a subset of the CNN validation data
- Simple Baselines
Reading vs Encoding
Use neural encoding models for estimating the probability of word type a from document d answering query q:

p(a|d,q) ∝ exp(W(a) g(d,q)), where
- W(a) indexes row a of weight matrix W
- g(d,q) returns a vector embedding of a document and query pair
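A worked sketch of this scoring rule in Python; the dimensions are illustrative, and `g_dq` stands in for the output of whichever encoder computes g(d,q):

```python
import numpy as np

def answer_distribution(W, g_dq):
    """p(a|d,q) ∝ exp(W(a) · g(d,q)): a softmax over the
    candidate-answer vocabulary, one row of W per word type."""
    logits = W @ g_dq            # (vocab_size,)
    logits -= logits.max()       # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

vocab_size, dim = 10_000, 256
W = np.random.randn(vocab_size, dim) * 0.01   # answer embedding matrix
g_dq = np.random.randn(dim)                   # encoder output g(d, q)
p = answer_distribution(W, g_dq)
print(p.argmax(), p.max())  # most probable answer word type and its probability
```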
Deep LSTM Reader
Attentive Reader
Attentive Reader Training
Impatient Reader
Summary
- Supervised machine reading is a viable research direction with the available data
- LSTM-based recurrent networks constantly surprise with their ability to encode dependencies in sequences
- Attention is a very effective and flexible modeling technique
OPEN-DOMAIN QUESTION ANSWERING
- Simmons et al. (1964) conducted the first exploration of answering questions from an expository text, based on matching dependency parses of a question and answer
- Murax (Kupiec, 1993) aimed to answer questions over an online encyclopedia using IR and shallow linguistic processing
- The NIST TREC QA track, begun in 1999, first rigorously investigated answering fact questions over a large collection of documents
- IBM's Jeopardy! system (DeepQA, 2011) brought attention to a version of the problem; it used an ensemble of many methods
- DrQA (Chen et al., 2017) uses IR followed by neural reading comprehension to bring deep learning to open-domain QA
IBM’s Watson and Jeopardy! Challenge
Compared to Reading Comprehension?
- Challenges from both large-scale open-domain QA and machine comprehension
- The question can be any open-domain question
- instead of a question posed after reading a given passage
Traditional QA System
QA System
DrQA: Two-Stage Retriever and Reader
Document Retriever: Two Steps
- TF-IDF bag-of-words vector
- Efficient bigram hashing (Weinberger et al., 2009)
- Map each bigram to 2^24 bins with an unsigned murmur3 hash
- Preserves speed and memory efficiency
- Murmur3 maps a word or string to a 32-bit or 128-bit value (see the hashing sketch below)
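A minimal sketch of the hashed-bigram featurization, using the third-party mmh3 MurmurHash3 binding; the function name and tokenization here are illustrative, not DrQA's actual code:

```python
import mmh3  # pip install mmh3 -- MurmurHash3 bindings

NUM_BINS = 2 ** 24  # feature-hashing space for bigrams

def hashed_bigram_ids(tokens):
    """Map each token bigram to one of 2^24 bins with an unsigned
    murmur3 hash (feature hashing, Weinberger et al., 2009)."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return [mmh3.hash(bg, signed=False) % NUM_BINS for bg in bigrams]

tokens = "how many people did dracula bite".split()
print(hashed_bigram_ids(tokens))  # sparse column indices for the TF-IDF matrix
```

Hashing keeps the feature matrix a fixed size regardless of how many distinct bigrams the corpus contains, which is what preserves speed and memory efficiency.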
Document Reader
Paragraph Encoding
- Represent tokens in a paragraph as a sequence of feature vectors
- Word embedding
- Exact match
- Token features
- Aligned question embedding
- Pass the features as input to an RNN (multi-layer bidirectional LSTM); see the sketch below
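A minimal sketch of assembling these per-token feature vectors and feeding them to a BiLSTM; the token-feature set and all dimensions are illustrative assumptions, not the exact DrQA features:

```python
import torch
import torch.nn as nn

def paragraph_features(p_emb, q_emb, exact_match, token_feats):
    """Concatenate DrQA-style per-token features for one paragraph.
    p_emb:       (Lp, D)  word embeddings of paragraph tokens
    q_emb:       (Lq, D)  word embeddings of question tokens
    exact_match: (Lp, 1)  1.0 if the token also appears in the question
    token_feats: (Lp, F)  e.g. POS/NER one-hots, term frequency
    """
    # Aligned question embedding: soft attention of each paragraph
    # token over the question words in embedding space
    scores = p_emb @ q_emb.T                          # (Lp, Lq)
    aligned = torch.softmax(scores, dim=-1) @ q_emb   # (Lp, D)
    return torch.cat([p_emb, exact_match, token_feats, aligned], dim=-1)

Lp, Lq, D, F = 30, 8, 300, 20
feats = paragraph_features(torch.randn(Lp, D), torch.randn(Lq, D),
                           torch.randint(0, 2, (Lp, 1)).float(),
                           torch.randn(Lp, F))
rnn = nn.LSTM(feats.size(-1), 128, num_layers=3,
              bidirectional=True, batch_first=True)
p_hidden, _ = rnn(feats.unsqueeze(0))  # (1, Lp, 256) contextual token vectors
```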
Question Encoding
- Apply another RNN on top of the question word embeddings {q_i} to get hidden units {q_j}
- Combine the resulting hidden units into one single vector q = Σ_j b_j q_j, where the weights b_j are learned attention scores (see the sketch below)
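A minimal sketch of this self-attentive pooling; module names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """Run a BiLSTM over question embeddings, then pool the hidden
    states into one vector q = sum_j b_j * q_j, where
    b_j = softmax_j(w . q_j) for a learned scoring vector w."""
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.w = nn.Linear(2 * hidden, 1, bias=False)  # scoring vector w

    def forward(self, q_emb):                  # (B, Lq, emb_dim)
        h, _ = self.rnn(q_emb)                 # (B, Lq, 2H)
        b = torch.softmax(self.w(h), dim=1)    # (B, Lq, 1) attention weights
        return (b * h).sum(dim=1)              # (B, 2H) single question vector

enc = QuestionEncoder()
q_vec = enc(torch.randn(4, 8, 300))  # batch of 4 questions, 8 tokens each
print(q_vec.shape)                   # torch.Size([4, 256])
```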
Prediction
Dataset and Example Training Data & Result
Summary: DrQA
- DrQA was the first attempt to scale up reading comprehension to open-domain question answering, by combining IR techniques and neural reading comprehension models
- Although we achieved good accuracy on SQuAD in 2017, the final QA accuracy still remains low:
- Distant supervision + multi-task learning helps
Limitations of Current Models
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Information Retrieval (IR)
Reader (QA)
Result
MULTI-HOP QUESTION ANSWERING
- Very few SQuAD questions actually require combining multiple pieces of information
- This is an important capability QA systems should have
- Several datasets test multi-hop reasoning
- Ability to answer questions that draw on several sentences or several documents to answer
WikiHop
- Annotators were shown Wikipedia and asked to pose a simple question linking two entities that requires a third (bridging) entity to associate them
- A model shouldn’t be able to answer these without doing some reasoning about the intermediate entity
HotpotQA
Multi-Hop Reasoning
- This is an idealized version of multi-hop reasoning
- Do models need to do this to do well on this task?
- A model can ignore the bridging entity and directly predict the answer:
- there is no simple lexical overlap,
- but only one government position appears in the context
NEW TYPES OF QA
DROP
- Let’s build QA datasets to help the community focus on modeling particular things
- Question types: subtraction, comparison (which did he visit first), counting and sorting (which kicker kicked more field goals), …
- Invites ad hoc solutions (structure the model around predicting differences between numbers)
NarrativeQA
- Humans see a summary of a book
- … Peter’s former girlfriend Dana Barrett has had a son, Oscar …
- Question: How is Oscar related to Dana?
- Answering these questions from the source text (not the summary) requires complex inferences and is extremely challenging
Summary
- Lots of problems with current QA settings, lots of new datasets
- Models can often work well for one QA task but don’t generalize
- We still do not have (solvable) QA settings which seem to require really complex reasoning as opposed to surface-level pattern recognition