
Machine Translation

The task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language)

image

Challenges

  • Ambiguities
    • Word
    • Morphology
    • Syntax
    • Semantics
    • Pragmatics
  • Gaps in data
    • Availability of corpora
    • Commonsense knowledge
  • Understanding of context, connotation, social norms, etc.

image

When I look at an article in Russian, I say: “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode”

image

Noisy Channel Models

  • A pattern for modeling a pair of random variables, W and A
  • W is the plaintext, the true message, the missing information
  • A is the ciphertext, the garbled message, the observable evidence

image

  • Decoding: select w given A = a

image

image
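In symbols, decoding selects the most probable plaintext given the observed ciphertext; by Bayes' rule this factors into a channel model times a source model (the standard restatement of the rule above):

```latex
w^* = \arg\max_{w} \; p(w \mid a)
    = \arg\max_{w} \; \frac{p(a \mid w)\, p(w)}{p(a)}
    = \arg\max_{w} \; \underbrace{p(a \mid w)}_{\text{channel model}} \; \underbrace{p(w)}_{\text{source model}}
```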

MT as Direct Modeling

  • one model does everything
  • Trained to reproduce a corpus of translations

image

Two Views of MT

image

  • Noisy channel model
    • I know the target language
    • I have example translated texts (example enciphered data)
  • Direct model
    • I have really good learning algorithms and a bunch of example inputs (source language sentences) and outputs (target language translations)

  • Noisy channel model
    • Easy to use monolingual target language data
    • Search happens under a product of two models
      • Individual models can be simple, product can be powerful
  • Direct model
    • Directly model the process you care about
    • Model must be very powerful
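In MT terms, with source sentence x and target sentence y, the two views above correspond to two different search objectives (standard formulations, independent of any particular system):

```latex
\begin{aligned}
\text{Direct model:} \quad & \hat{y} = \arg\max_{y} \; p(y \mid x) \\
\text{Noisy channel:} \quad & \hat{y} = \arg\max_{y} \; p(x \mid y)\, p(y)
\end{aligned}
```

The second line is the “product of two models”: p(x | y) can be a comparatively simple translation model, while p(y) is a language model that can be trained on monolingual target-language data.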

Now?

  • Direct modeling is where most of the action is
    • Neural networks are very good at generalizing and conceptually very simple
    • Inference in “product of two models” is hard
  • Noisy channel ideas are incredibly important and still play a big role in how we think about translation

Parallel Corpora

image

image

  • Europarl (proceedings of the European Parliament, 50M words/language)
    • http://www.statmt.org/europarl/
  • UN Corpus (United Nations documents, six languages, 300M words/language)
    • http://www.euromatrixplus.net/multi-un
  • Common Crawl (web documents, long tail of language pairs)

Challenges

Word Translation Ambiguity

  • What is the best translation?

image

  • Solution intuition: use counts in parallel corpus
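A rough sketch of that counting intuition (toy, hand-made word alignments; not a real word aligner):

```python
from collections import Counter

# Toy word-aligned (source, target) pairs, invented for illustration:
# English "bank" aligned to German "Bank" (financial) or "Ufer" (river bank).
aligned_pairs = [
    ("bank", "Bank"), ("bank", "Bank"), ("bank", "Bank"),
    ("bank", "Bank"), ("bank", "Ufer"), ("bank", "Ufer"),
]

counts = Counter(aligned_pairs)
total = sum(c for (src, _), c in counts.items() if src == "bank")

# Relative frequencies as a crude estimate of p(target | source = "bank").
for (src, tgt), c in counts.most_common():
    print(f"p({tgt} | {src}) = {c / total:.2f}")
```

Real systems estimate such counts from automatically aligned parallel corpora rather than hand-made pairs, but the intuition is the same.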

Word Order

  • Problem: different languages organize words in different orders to express the same idea

image

  • Solution intuition: language modeling!

Output Language Fluency

  • What is most fluent?

image

  • Solution intuition: a language modeling problem!
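A minimal sketch of the language-modeling intuition (toy bigram probabilities, invented numbers): a model of the target language scores fluent word choices and word orders higher.

```python
import math

# Toy bigram log-probabilities for English (numbers invented for illustration).
bigram_logp = {
    ("the", "small"): math.log(0.10), ("small", "house"): math.log(0.20),
    ("the", "little"): math.log(0.15), ("little", "house"): math.log(0.25),
    ("house", "small"): math.log(0.001), ("small", "the"): math.log(0.001),
}

def lm_score(sentence: str, unk_logp: float = math.log(1e-4)) -> float:
    """Sum of bigram log-probabilities; higher means more fluent under this toy LM."""
    words = sentence.split()
    return sum(bigram_logp.get(pair, unk_logp) for pair in zip(words, words[1:]))

for cand in ["the little house", "the small house", "house small the"]:
    print(f"{cand!r}: {lm_score(cand):.2f}")
```

The same scoring idea addresses the word-order problem above: reorderings that are unnatural in the target language simply get a low language-model score.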

How Good is Machine Translation Today?

image

image

MT History

image

Neural Machine Translation

Encoder-decoder framework

image

Encoder

image

Decoder

image

  • We have a model; how can we generate translations?
  • Answers
    • Sampling: generate a random sentence according to the probability distribution
    • Argmax: generate the sentence with the highest probability
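A tiny illustration of the two answers, using a toy next-word distribution (invented probabilities) in place of a real decoder:

```python
import random

# Toy distribution over the next word given some context (invented numbers).
next_word_probs = {"house": 0.5, "home": 0.3, "building": 0.15, "shack": 0.05}

# Sampling: draw a random word according to the distribution.
words, probs = zip(*next_word_probs.items())
sampled = random.choices(words, weights=probs, k=1)[0]

# Argmax: pick the single most probable word.
argmax = max(next_word_probs, key=next_word_probs.get)

print("sampled:", sampled)
print("argmax: ", argmax)
```

In practice this step is repeated token by token; taking the argmax at every step is exactly the greedy inference described next.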

Inference Methods

  • Greedy inference
    • We just start at the left, and use our classifier at each position to assign a label
    • One by one, pick single highest probability word
  • Problems
    • Often generates easy words first
    • Often prefers multiple common words to rare words

image

Beam inference

  • At each position keep the top k complete sequences
  • Extend each sequence in each local way
  • The extensions compete for the k slots at the next position

image
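A self-contained sketch of the beam procedure above; step_log_probs is a hypothetical stand-in for a real decoder's next-token distribution, and the toy numbers exist only to show the mechanics.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

def beam_search(step_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
                beam_size: int = 3,
                max_len: int = 10,
                eos: str = "</s>") -> List[Tuple[float, Tuple[str, ...]]]:
    """At each position keep the top-k sequences, extend each one with every
    candidate next token, and let the extensions compete for the k slots."""
    beams = [(0.0, ("<s>",))]  # (cumulative log-probability, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:
                candidates.append((score, seq))  # finished hypotheses carry over
                continue
            for tok, lp in step_log_probs(seq).items():
                candidates.append((score + lp, seq + (tok,)))
        # The extensions compete for the k slots at the next position.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        if all(seq[-1] == eos for _, seq in beams):
            break
    return beams

# Toy next-token distribution (invented numbers).
def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    if len(prefix) >= 3:
        return {"</s>": math.log(0.9), "house": math.log(0.1)}
    return {"the": math.log(0.5), "a": math.log(0.3), "house": math.log(0.2)}

for score, seq in beam_search(toy_step, beam_size=2, max_len=5):
    print(round(score, 2), " ".join(seq))
```

Greedy inference is the special case beam_size = 1.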

Neural Machine Translation

Google’s NMT System 2016

image

  • Encoder and decoder are both transformers
  • Decoder consumes the previously generated tokens (and attends to the input), but has no recurrent state

image

Evaluation

  • How good is a given machine translation system?
  • Many different translations acceptable
  • Evaluation metric
    • Subjective judgments by human evaluators
    • Automatic evaluation metrics
    • Task-based evaluation

Adequacy and Fluency

image

image

Automatic Evaluation Metrics

  • Goal: computer program that computes quality of translations
  • Advantages: low cost, optimizable, consistent
  • Basic strategy
    • Given: MT output
    • Given: human reference translation
    • Task: compute similarity between them

Precision and Recall of Words

image

image
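For reference, the standard word-level definitions (counting words of the system output that also appear in the reference translation):

```latex
\text{precision} = \frac{\#\,\text{correct words}}{\text{output length}}, \qquad
\text{recall} = \frac{\#\,\text{correct words}}{\text{reference length}}, \qquad
F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```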

Bilingual Evaluation Understudy (BLEU)

  • N-gram overlap between machine translation output and reference translation
  • Compute precision for n-grams of size 1 to 4
  • Add a brevity penalty (for translations that are too short)

image

  • Typically computed over the entire corpus, not single sentences
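One common formulation (the usual corpus-level definition; implementations differ in small details), where p_n is the modified n-gram precision, c the total length of the system output, and r the reference length:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{4} \tfrac{1}{4} \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```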

Drawbacks of Automatic Metrics

  • All words are treated as equally relevant
  • Operate on local level
  • Scores are meaningless (the absolute value is not informative)
  • Human translators score low on BLEU

BLEU Correlates with Human Judgement

image

BERTScore

image
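BERTScore compares candidate and reference through contextual token embeddings rather than exact n-gram matches. A minimal usage sketch, assuming the bert-score package from PyPI (its documented score() helper returns precision, recall, and F1 tensors):

```python
# pip install bert-score
from bert_score import score

candidates = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

# Returns per-sentence precision, recall, and F1 as tensors.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```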
