[NLP] Word Representation (2)
Word Representations
- Count-based
- Created by a simple function of the counts of nearby words (tf-idf, PPMI)
- Class-based
- Created through hierarchical clustering (Brown clusters)
- Distributed prediction-based embeddings
- Created by training a classifier to distinguish nearby and far-away words (Word2vec, Fasttext)
- Distributed contextual embeddings from language models
- Embeddings from language model (ELMo, BERT, GPT)
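To make the count-based row above concrete, here is a minimal sketch (the toy corpus, window size, and all names are illustrative, not from the original notes) that turns counts of nearby words into PPMI scores:

```python
import numpy as np

# Count-based representation sketch: co-occurrence counts within a +/-1 word
# window, converted to PPMI = max(0, log P(w,c) / (P(w)P(c))).
corpus = ["i like riding a bicycle", "i like riding a horse"]
vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in (i - 1, i + 1):                 # nearby words (window = 1)
            if 0 <= j < len(words):
                counts[idx[w], idx[words[j]]] += 1

total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0)                        # keep only positive PMI
ppmi[~np.isfinite(ppmi)] = 0.0                   # zero out log(0) cells

print(dict(zip(vocab, np.round(ppmi[idx["riding"]], 2))))  # row vector for "riding"
```

The resulting row is a word vector built from a simple function of counts, in contrast to the prediction-based and contextual embeddings listed above.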
Language Models
- Probability distributions over sentences
- P(W) = P(w1, w2, w3, …, wk)
- Ex) Probability of “I like riding a bicycle”
- Can use them to generate strings
- P(wk | w1, w2, w3, …, wk-1)
- Ex) Probability of “bicycle” given the string “I like riding a”
- Rank possible sentences
- Ex) P(“I like riding a bicycle”) > P(“like a I bicycle riding”)
- Ex) P(“I like riding a bicycle”) > P(“I like riding a computer”)
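For reference, the sentence probability P(W) above factors by the chain rule (standard language-model algebra, not spelled out in the original notes), which is what both the generation and ranking uses rely on:

$$
P(w_1, w_2, \dots, w_k) = \prod_{i=1}^{k} P(w_i \mid w_1, \dots, w_{i-1})
$$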
Application
- N-gram model (see the sketch after this list)
- P(wk | w1, w2, w3, …, wk-1) ≈ P(wk | wk-n+1, …, wk-1)
- Unigram, Bigram, Trigram, …
- Neural language models
- RNN
- ELMo
- BERT
- GPT
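A minimal sketch of the n-gram approximation (bigram case, n = 2), with a toy corpus and plain maximum-likelihood counts chosen only for illustration; it reproduces the kind of sentence ranking shown above:

```python
from collections import Counter, defaultdict

# Bigram model: P(w_k | w_{k-1}) estimated as count(w_{k-1}, w_k) / count(w_{k-1}).
corpus = ["<s> i like riding a bicycle </s>",
          "<s> i like riding a horse </s>"]

bigram_counts = defaultdict(Counter)
for sent in corpus:
    words = sent.split()
    for prev, cur in zip(words, words[1:]):
        bigram_counts[prev][cur] += 1

def p(cur, prev):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][cur] / total if total else 0.0

def sentence_prob(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        prob *= p(cur, prev)           # chain rule with the bigram approximation
    return prob

print(sentence_prob("i like riding a bicycle"))   # 0.5
print(sentence_prob("like a i bicycle riding"))   # 0.0: contains unseen bigrams
```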
RNN (Recurrent Neural Network)
A family of neural networks for processing sequential data
Limitation of naive RNNs: the long-term dependency problem
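A minimal sketch of the RNN recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b); the sizes and random weights are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5
W_x = rng.normal(size=(hidden_dim, input_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                  # initial hidden state
for t in range(seq_len):                  # must step through time sequentially
    x_t = rng.normal(size=input_dim)      # stand-in for the t-th word vector
    h = np.tanh(W_x @ x_t + W_h @ h + b)  # same weights reused at every step
print(h)                                  # final hidden state summarizes the sequence
```

Because the same W_h is multiplied in at every step, the influence of early inputs tends to vanish (or explode) over long sequences, which is the long-term dependency problem noted above.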
ELMo
Builds word embeddings from two separate directional LSTMs (forward and backward)
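This is not real ELMo (which trains deep LSTM language models on a large corpus); it is only a minimal sketch of the "two separate directional LSTMs" idea, with made-up sizes, assuming PyTorch is available:

```python
import torch
import torch.nn as nn

class TinyBiLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.fwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # left-to-right
        self.bwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # right-to-left

    def forward(self, token_ids):                    # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        fwd_out, _ = self.fwd_lstm(x)
        bwd_out, _ = self.bwd_lstm(torch.flip(x, dims=[1]))
        bwd_out = torch.flip(bwd_out, dims=[1])      # re-align to the original order
        # ELMo-style contextual embedding: concatenate both directions per token
        return torch.cat([fwd_out, bwd_out], dim=-1)

tokens = torch.randint(0, 1000, (1, 5))
print(TinyBiLM()(tokens).shape)   # torch.Size([1, 5, 128])
```

Each direction only ever sees one side of the context, which is why the transformer section below calls this kind of bidirectionality "shallow".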
BERT
Builds word embeddings from a transformer encoder
Attention
Humans pay attention in order to correlate words within a sentence or different regions of an image
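A minimal sketch of scaled dot-product self-attention, the mechanism the transformer below is built from; the shapes and random vectors are illustrative only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each word attends to the others
    weights = softmax(scores, axis=-1)   # attention distribution over the sentence
    return weights @ V                   # weighted mix of the value vectors

seq_len, d_k = 5, 8                      # e.g. a 5-word sentence
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(seq_len, d_k))   # self-attention: Q, K, V come from the same words
print(attention(Q, K, V).shape)          # (5, 8): one contextualized vector per word
```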
Transformer
- Limitation of RNNs: cannot run in parallel (sequential modeling)
- A deep model with a sequence of attention-based transformer blocks
- Self-attention model
- Language understanding is bidirectional (forward and backward)
- ELMo models bidirectionality only shallowly
- Let’s use a bidirectional encoder to encode text
- But RNNs are too slow
- Let’s use a transformer to run the encoder fast
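A minimal sketch, assuming PyTorch: a stack of transformer encoder layers processes every position of the sequence in one parallel pass (no step-by-step recurrence), which is why it is faster than an RNN encoder; the dimensions are illustrative only:

```python
import torch
import torch.nn as nn

# One self-attention-based encoder block, stacked twice.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

tokens = torch.randn(1, 10, 64)   # (batch, seq_len, d_model): 10 word embeddings
out = encoder(tokens)             # every position attends to every other position at once
print(out.shape)                  # torch.Size([1, 10, 64])
```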
How to train the model?
- Language understanding is bidirectional (forward and backward)
- Let’s use transformer to encode text
- Let’s mask out some input words, and then predict the masked words
A pretrained BERT model performs well on various NLP tasks
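A minimal usage sketch, assuming the Hugging Face `transformers` package and the public `bert-base-uncased` checkpoint: the pretrained masked language model fills in a masked word, which is exactly the training objective described above:

```python
from transformers import pipeline

# Ask pretrained BERT to predict the masked token.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("I like riding a [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```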
GPT-2
- Uses transformer decoder blocks (BERT uses encoder blocks)
- Trained by predicting the next word given the preceding words
- Uses a large dataset (40GB) and a large model (1,500M parameters)
- Also performs well on various NLP tasks
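A minimal usage sketch, assuming the Hugging Face `transformers` package and the public `gpt2` checkpoint (the small 124M model, not the full 1,500M one): the model continues a prompt by repeatedly predicting the next word:

```python
from transformers import pipeline

# Generate a continuation of the prompt with pretrained GPT-2.
generator = pipeline("text-generation", model="gpt2")
result = generator("I like riding a", max_length=20, num_return_sequences=1)
print(result[0]["generated_text"])
```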
GPT-3
- Introduced in “Language Models are Few-Shot Learners”: a much larger GPT-style model that performs many NLP tasks from only a few examples given in the prompt, without fine-tuning
References
- http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- https://colah.github.io/posts/2015-08-Understanding-LSTMs/
- http://jalammar.github.io/illustrated-bert/
- https://lilianweng.github.io/posts/2018-06-24-attention/
- http://jalammar.github.io/illustrated-transformer/
- https://medium.com/analytics-vidhya/openai-gpt-3-language-models-are-few-shot-learners-82531b3d3122
- https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html
- https://blog.pingpong.us/gpt3-review/