[NLP] Text Classification
Text Classification
Artificial Intelligence
- The intelligence exhibited by machines
- How to create computers and computer software that are capable of intelligent behavior
Machine Learning
- Subfield of artificial intelligence
- Study of pattern recognition and computational learning theory
- Creating programs that can automatically learn rules from data
“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)
- Traditional: Write programs using hard-coded (fixed) rules
- Machine Learning: Learn rules by looking at some training data
- Supervised Learning
- Predictive approach
- To learn a mapping from inputs to outputs
- Example) classification, regression
- Unsupervised Learning
- Descriptive approach
- To find interesting patterns in the data
- Example) clustering, dimensionality reduction
Supervised Learning
- Given: Training data as labeled instances {(x^(1), y^(1)), …, (x^(N), y^(N))}
- Goal: Learn a rule (f:x → y) to predict outputs y for new inputs x
- Example)
- Data: ((Blue, Square, 10), yes), …, ((Red, Ellipse, 20.7), yes)
- Task: For new inputs (Blue, Crescent, 10), (Yellow, Circle, 12), are they yes/no?
- Classification: discrete-valued outputs
- Examples)
- Data: Size and label {(Height, Weight), Cat/Dog}
- Task: Predict whether an animal is a cat or dog given new size information
- Method: Finding a linear or nonlinear separator
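A minimal sketch of the cat/dog example above, assuming made-up (height, weight) measurements and scikit-learn; a logistic regression fits a linear separator between the two classes.

```python
# Toy supervised learning: predict cat vs. dog from (height, weight).
# The numbers below are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data: labeled instances {(x^(1), y^(1)), ..., (x^(N), y^(N))}
X_train = np.array([[25.0, 4.0], [23.0, 3.5], [27.0, 5.0],      # cats
                    [60.0, 25.0], [55.0, 20.0], [70.0, 30.0]])  # dogs
y_train = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

# Learn a rule f: x -> y (here, a linear separator)
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict labels for new inputs
print(clf.predict([[26.0, 4.2], [65.0, 28.0]]))  # expected: ['cat' 'dog']
```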
Classification
- A mapping h from input data x to a label y
- x ∈ X, X is the instance space (i.e. all documents)
- y ∈ Y, Y is an enumerable output space (i.e. categories)
- x: a single document
- y: politics
- Image ⇒ Digit
- Mail ⇒ spam or not
- Text ⇒ Gender of the author
- Movie review ⇒ Rating
- Document ⇒ Category
Text Classification Problem
Given a text w = (w1, w2, …, wT), each wi ∈ V, predict a label y ∈ Y
Classifier
- Naive Bayes
- Perceptron
- Logistic regression
- Support Vector Machine
- Random Forests
- Deep learning models
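As a sketch of the text classification setup, the snippet below trains one of the classifiers listed above (multinomial Naive Bayes) with scikit-learn; the documents and labels are made up for illustration.

```python
# Text classification: map a document w to a label y in Y.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training documents and labels
docs   = ["the senate passed the bill",
          "the team won the final match",
          "parliament debated the new law",
          "the striker scored two goals"]
labels = ["politics", "sports", "politics", "sports"]

# Bag-of-words features + Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)

print(model.predict(["the minister proposed a law"]))  # expected: ['politics']
```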
Natural Language Processing
- Word Representations
- Count-based
- created by a simple function of the counts of nearby words
- tf-idf, PPMI
- Class-based
- created through hierarchical clustering
- Brown clusters
- Distributed prediction-based embeddings
- created by training a classifier to distinguish nearby and far-away words
- Word2vec, fastText (see the sketch after this list)
- Distributed contextual embeddings from language models
- Embeddings from language model
- ELMo, BERT, GPT
- Document Representations
- Count-based
- Bag-of-words
- Neural network based
- RNN
- Neural language model
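A small sketch of the distributed prediction-based embeddings mentioned above, assuming the gensim library (version 4 or later); the corpus is far too small to learn meaningful vectors, but the workflow is representative.

```python
# Distributed prediction-based word embeddings with word2vec (gensim).
from gensim.models import Word2Vec

# Tiny made-up corpus: a list of tokenized sentences
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"],
          ["cats", "and", "dogs", "are", "animals"]]

# Train a small word2vec model (skip-gram/CBOW classifier over nearby words)
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=50)

vec = model.wv["cat"]                         # 50-dimensional embedding for "cat"
print(vec.shape)                              # (50,)
print(model.wv.most_similar("cat", topn=3))   # nearest neighbours in embedding space
```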
Bag-of-Words
- One challenge is that the sequential representation (w1, w2, …, wT) may have a different length T for every document
- The bag-of-words is a fixed-length representation consisting of a vector of word counts x, where xj is the number of times word j appears in the document
- The length of x is equal to the size of the vocabulary V
- For each x, there may be many possible w, depending on word order
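A minimal sketch of a bag-of-words vector using only the Python standard library; the two example sentences are made up.

```python
# Bag-of-words: a fixed-length vector of word counts over the vocabulary V.
from collections import Counter

docs = ["the cat sat on the mat", "the dog chased the cat"]

# Build the vocabulary V from the corpus
vocab = sorted({w for d in docs for w in d.split()})

def bag_of_words(text: str) -> list[int]:
    """Return x, where x[j] counts how often vocab[j] occurs in the text."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

for d in docs:
    print(bag_of_words(d))
# Both vectors have length |V| regardless of the document length T,
# and word order is discarded: many different w can map to the same x.
```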