[NLP] Machine Translation
Machine Translation
The task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language)
Challenges
- Ambiguities
- Word
- Morphology
- Syntax
- Semantics
- Pragmatics
- Gaps in data
- Availability of parallel corpora
- Commonsense knowledge
- Understanding of context, connotation, social norms, etc.
When I look at an article in Russian, I say: “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.” (Warren Weaver, 1947)
Noisy Channel Models
- A pattern for modeling a pair of random variables, W and A
- W is the plaintext, the true message, the missing information
- A is the ciphertext, the garbled message, the observable evidence
- Decoding: select w given A = a
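Concretely, decoding applies Bayes' rule, so the search decomposes into two simpler models (a sketch of the standard derivation; in MT terms, w is the target sentence and a is the observed source sentence):

```latex
\hat{w} = \arg\max_{w} p(w \mid a)
        = \arg\max_{w} \frac{p(a \mid w)\, p(w)}{p(a)}
        = \arg\max_{w} \underbrace{p(a \mid w)}_{\text{channel model}} \; \underbrace{p(w)}_{\text{language model}}
```

Since p(w) is an ordinary target language model, it can be estimated from monolingual data alone.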
MT as Direct Modeling
- one model does everything
- Trained to reproduce a corpus of translations
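As an equation, the direct view models the conditional distribution itself, typically factored left to right (a sketch of the standard formulation):

```latex
\hat{y} = \arg\max_{y} p(y \mid x; \theta)
        = \arg\max_{y} \prod_{t=1}^{|y|} p(y_t \mid y_{<t}, x; \theta)
```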
Two Views of MT
- Noisy channel model
- I know the target language
- I have example translated texts (example enciphered data)
- Direct model
- I have really good learning algorithms and a bunch of example inputs (source language sentences) and outputs (target language translations)
- Noisy channel model
- Easy to use monolingual target language data
- Search happens under a product of two models
- Individual models can be simple, product can be powerful
- Direct model
- Directly model the process you care about
- Model must be very powerful
Now?
- Direct modeling is where most of the action is
- Neural networks are very good at generalizing and conceptually very simple
- Inference in “product of two models” is hard
- Noisy channel ideas are incredibly important and still play a big role in how we think about translation
Parallel Corpora
- Europarl (proceedings of European parliament, 50M words/language)
- http://www.statmt.org/europarl/
- UN Corpus (United Nations documents, six languages, 300M words/language)
- http://www.euromatrixplus.net/multi-un
- Common Crawl (web documents, long tail of language pairs)
Challenges
Word Translation Ambiguity
- What is the best translation?
- Solution intuition: use counts in parallel corpus
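A minimal sketch of that counting intuition in Python; the three sentence pairs and the `translation_counts` helper are invented for illustration (real systems use proper word alignment, e.g., IBM Model 1, rather than raw co-occurrence):

```python
from collections import Counter

# Toy "parallel corpus" of (source, target) sentence pairs.
corpus = [
    ("das Haus ist klein", "the house is small"),
    ("das Haus ist alt", "the house is old"),
    ("die Bank ist alt", "the bank is old"),
]

def translation_counts(src_word, corpus):
    """Count target words co-occurring with a source word."""
    counts = Counter()
    for src, tgt in corpus:
        if src_word in src.split():
            counts.update(tgt.split())
    return counts

# "Haus" co-occurs most often with "house" (plus noise words).
print(translation_counts("Haus", corpus).most_common(3))
```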
Word Order
- Problem: different languages organize words in different order to express the same idea
- Solution intuition: language modeling!
Output Language Fluency
- What is most fluent?
- Solution intuition: a language modeling problem!
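A minimal sketch of the language-modeling intuition behind both word order and fluency: estimate a bigram model from toy target-language text (invented here) and compare candidate outputs. The more fluent order scores higher.

```python
import math
from collections import Counter

# Toy target-language data for estimating a bigram language model.
text = "the house is small . the house is old . the bank is old .".split()
bigrams = Counter(zip(text, text[1:]))
unigrams = Counter(text)

def score(sentence):
    """Log-probability under a bigram LM with add-one smoothing."""
    tokens = sentence.split()
    V = len(unigrams)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
        for a, b in zip(tokens, tokens[1:])
    )

print(score("the house is small"))   # higher (fluent order)
print(score("house the small is"))   # lower (scrambled order)
```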
How Good is Machine Translation Today?
MT History
Neural Machine Translation
Encoder-decoder framework
Encoder
- Reads the source sentence and encodes it into hidden vector representations
Decoder
- Generates the target sentence one token at a time, conditioned on the encoder output
- We have a model; how can we generate translations?
- Answers
- Sampling: generate a random sentence according to probability distribution
- Argmax: generate sentence with highest probability
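The two answers differ only in how the next token is chosen from the decoder's distribution. A minimal sketch with an invented next-token distribution over a toy vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical next-token distribution from one decoder step.
vocab = ["the", "a", "house", "Haus", "</s>"]
probs = np.array([0.45, 0.20, 0.25, 0.05, 0.05])

# Sampling: draw a random token according to the distribution.
sampled = rng.choice(vocab, p=probs)

# Argmax: always pick the single most probable token.
best = vocab[int(np.argmax(probs))]

print(sampled, best)
```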
Inference Methods
- Greedy inference
- We just start at the left, and use our classifier at each position to assign a label
- One by one, pick single highest probability word
- Problems
- Often generates easy words first
- Often prefers multiple common words to rare words
- Beam inference
- At each position keep the top k complete sequences
- Extend each sequence in every possible local way (each possible next word)
- The extensions compete for the k slots at the next position
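A minimal beam search sketch; `step_probs` is a hand-written stand-in for a real NMT decoder's next-word distribution:

```python
import math

def beam_search(step_probs, k=2, max_len=10, eos="</s>"):
    """Keep the k best partial sequences by total log-probability."""
    beams = [([], 0.0)]  # (tokens, log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            if tokens and tokens[-1] == eos:        # finished hypothesis
                candidates.append((tokens, logp))
                continue
            for tok, p in step_probs(tokens).items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        # Extensions compete for the k slots at the next position.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(t and t[-1] == eos for t, _ in beams):
            break
    return beams

# Toy deterministic "model" for illustration.
def step_probs(prefix):
    if not prefix:
        return {"the": 0.6, "a": 0.4}
    if prefix[-1] in ("the", "a"):
        return {"house": 0.7, "cat": 0.3}
    return {"</s>": 1.0}

for tokens, logp in beam_search(step_probs):
    print(" ".join(tokens), round(logp, 3))
```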
Google’s NMT System 2016
- Deep LSTM encoder-decoder with attention (Wu et al., 2016)
Transformer NMT 2017
- Encoder and decoder are both transformers
- The decoder consumes the previously generated token (and attends to the input), but has no recurrent state
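For intuition, the attention operation can be sketched as scaled dot-product attention, the core operation of the transformer (toy dimensions, NumPy only):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 8))   # one decoder query
K = rng.normal(size=(5, 8))   # five encoder states
V = rng.normal(size=(5, 8))
print(attention(Q, K, V).shape)  # (1, 8)
```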
Evaluation
- How good is a given machine translation system?
- Many different translations acceptable
- Evaluation metric
- Subjective judgments by human evaluators
- Automatic evaluation metrics
- Task-based evaluation
Adequacy and Fluency
- Adequacy: does the output preserve the meaning of the source sentence?
- Fluency: is the output fluent, grammatical text in the target language?
Automatic Evaluation Metrics
- Goal: computer program that computes quality of translations
- Advantages: low cost, optimizable, consistent
- Basic strategy
- Given: MT output
- Given: human reference translation
- Task: compute similarity between them
Precision and Recall of Words
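In the standard formulation, with correct the number of output words that also occur in the reference translation:

```latex
\text{precision} = \frac{\text{correct}}{\text{output-length}}, \qquad
\text{recall} = \frac{\text{correct}}{\text{reference-length}}, \qquad
F = \frac{\text{precision} \cdot \text{recall}}{(\text{precision} + \text{recall})/2}
```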
Bilingual Evaluation Understudy (BLEU)
- N-gram overlap between machine translation output and reference translation
- Compute precision for n-grams of size 1 to 4
- Add brevity penalty (for too short translations)
- Typically computed over the entire corpus, not single sentences
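A toy sentence-level BLEU sketch under simplifying assumptions (add-one smoothing, a single reference; real BLEU pools n-gram counts over the whole corpus and allows multiple references):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, reference, max_n=4):
    """Clipped n-gram precision for n = 1..4, geometric mean,
    times a brevity penalty for too-short hypotheses."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp = Counter(ngrams(hypothesis, n))
        ref = Counter(ngrams(reference, n))
        clipped = sum((hyp & ref).values())             # clip by reference counts
        precisions.append((clipped + 1) / (sum(hyp.values()) + 1))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * geo_mean

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(hyp, ref), 3))  # ~0.49 on this toy pair
```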
Drawbacks of Automatic Metrics
- All words are treated as equally relevant
- Operate on local level
- Scores are meaningless (absolute value not informative)
- Human translators score low on BLEU
BLEU Correlates with Human Judgement