[NLP] Natural Language Processing
Natural Language Processing
Introduction to NLP
Communication with People
Communication with Computers
- Making machines understand human language
- Communicating with humans
- Accessing the wealth of information about the world
- Automating natural language (NL) processing
  - Analysis: NL => Representation (R)
  - Generation: R => NL
  - Acquisition of R from knowledge and data
NLP Key Idea
Representation
Human Brain
- How is the language/knowledge expressed (in the brain)?
Computer
- How is the human language/knowledge expressed (in a computer)?
- How is the meaning of words expressed (in a computer)?
Distributional Hypothesis
The meaning of a word is its use in the language [Wittgenstein, PI §43, 1953]
If A and B have almost identical environments we say that they are synonyms [Harris 1954]
You shall know a word by the company it keeps [Firth 1957]
Question) What is the meaning of the word “tesgüino”?
Situation)
- No dictionary
- Never seen before
- We only know that the word is used in the following contexts:
  - A bottle of ____ is on the table.
  - Everybody likes ____.
  - Don’t have ____ before you drive.
  - We make ____ out of corn.
Word | Context 1 | Context 2 | Context 3 | Context 4 |
---|---|---|---|---|
Tesgüino | Yes | Yes | Yes | Yes |
Loud | No | No | No | No |
Tortillas | No | Yes | No | Yes |
Wine | Yes | Yes | Yes | No |
- Tesgüino is an artisanal corn beer produced by several Uto-Aztecan peoples.
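The Yes/No table can be read directly as vectors, one per word. A minimal sketch, assuming Yes = 1 and No = 0 and using cosine similarity as the closeness measure (the notes do not name a measure), shows why the table points toward a drink like wine:

```python
import math

# Binary context vectors from the table (Yes = 1, No = 0).
# Context order: "A bottle of ___", "Everybody likes ___",
# "Don't have ___ before you drive", "We make ___ out of corn".
CONTEXTS = {
    "tesgüino": [1, 1, 1, 1],
    "loud": [0, 0, 0, 0],
    "tortillas": [0, 1, 0, 1],
    "wine": [1, 1, 1, 0],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = {w: cosine(target, v) for w, v in vectors.items() if w != word}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for w, s in nearest("tesgüino", CONTEXTS):
    print(f"{w:10s} {s:.3f}")
```

Running it ranks wine (0.866) above tortillas (0.707) and loud (0.000): the word that shares the most contexts with tesgüino ends up closest to it.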
Q) How is the meaning of words expressed (in a computer)?
A) Each word is represented as one vector.
- Similar words are close to each other in the vector space
- The vectors are learned from the contexts in which words are used in large text corpora
Word | Context 1 | Context 2 | Context 3 | … | Context 100,000,000 |
---|---|---|---|---|---|
Tesgüino | Yes | Yes | Yes | … | 0.731 |
Loud | No | No | No | … | 0.273 |
Tortillas | No | Yes | No | … | 0.276 |
Wine | Yes | Yes | Yes | … | 0.836 |
… | - | - | - | … | - |
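A word-by-context table like this can be filled in by counting from raw text. A minimal sketch, assuming a hypothetical five-sentence toy corpus, whitespace tokenization, and a symmetric window of 2 words (all illustrative choices, not from the original), counts the neighbours of each word and compares the resulting sparse vectors:

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words that appear within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus; real models count over very large corpora.
corpus = [
    "a bottle of tesgüino is on the table",
    "a bottle of wine is on the table",
    "everybody likes tesgüino",
    "everybody likes wine",
    "we make tortillas out of corn",
]

vecs = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vecs["tesgüino"], vecs["wine"]), 3))       # → 1.0
print(round(cosine(vecs["tesgüino"], vecs["tortillas"]), 3))  # → 0.204
```

In this toy corpus tesgüino and wine occur in exactly the same contexts, so their vectors coincide; with real data the counts are usually reweighted (for example with PMI) or replaced by learned dense embeddings.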
NLP Applications
Applications
- Machine Translation
- Information Retrieval
- Question Answering
- Dialogue Systems
- Information Extraction
- Summarization
- Sentiment Analysis
- …
Machine Translation
The task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language).
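One common way to state this task formally (an assumption here; the notes describe it only in words) is as a search for the most probable target sentence given the source. Applying Bayes' rule gives the classical noisy-channel decomposition into a translation model $P(x \mid y)$ and a target-language model $P(y)$:

```latex
\hat{y} = \arg\max_{y} P(y \mid x)
        = \arg\max_{y} \frac{P(x \mid y)\, P(y)}{P(x)}
        = \arg\max_{y} P(x \mid y)\, P(y)
```

The last step holds because $P(x)$ does not depend on $y$, so it does not change which $y$ wins the $\arg\max$. Neural MT systems usually model $P(y \mid x)$ directly instead.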
Information Retrieval
The task of finding the information (e.g., documents) that satisfies a user's information need, usually expressed as a query.
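The notes do not fix a retrieval method. As one classic example, here is a minimal sketch of TF-IDF weighting with cosine ranking; the three documents, the whitespace tokenizer, and the `search` helper are hypothetical:

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, docs, top_k=2):
    """Rank documents by cosine similarity between TF-IDF vectors."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms never seen in the collection carry no weight.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:top_k]]


docs = [
    "tesgüino is a corn beer",
    "tortillas are made from corn",
    "wine is made from grapes",
]
print(search("corn beer", docs, top_k=1))  # → ['tesgüino is a corn beer']
```

IDF down-weights terms that appear in many documents ("corn" here), so the rarer query term "beer" decides the ranking.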
Question Answering
The task of answering a question posed in natural language. One of the oldest NLP tasks (punched-card systems in 1961).
Idea: run dependency parsing on the question and search for the most similar answer.
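A full dependency parser is beyond a short example, so this sketch covers only the "search for the most similar answer" step and swaps in a plain bag-of-words comparison: it picks the candidate sentence whose content words have the highest Jaccard overlap with the question. The stopword list and candidate sentences are hypothetical; a parse-based system would compare syntactic relations instead of word sets.

```python
import re

# Small hypothetical stopword list; real systems use larger ones.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "what", "who", "which", "by", "in", "it", "from"}


def content_words(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-zü]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def best_answer(question, candidates):
    """Return the candidate whose content words overlap most with the question."""
    q = content_words(question)
    return max(candidates, key=lambda c: jaccard(q, content_words(c)))


candidates = [
    "Tesgüino is made from corn.",
    "Tortillas are a staple food in Mexico.",
    "Wine is made from grapes.",
]
print(best_answer("What is tesgüino made from?", candidates))  # → Tesgüino is made from corn.
```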
Dialogue Systems
The task of generating responses in order to hold a conversation with a human.
Turing test
The ability to understand and generate language is taken as a sign of intelligence: “Can machines think?”
Information Extraction
The task of extracting structured information from unstructured documents.
Named Entity Recognition
- Find entities in text
- Classify entities in text
- Example)
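The two NER steps, finding entity spans and classifying them, can be illustrated with a toy dictionary (gazetteer) lookup. This is a minimal sketch with a hypothetical gazetteer and a longest-match rule; practical NER systems learn both steps from annotated data rather than relying on a fixed list.

```python
# Hypothetical gazetteer mapping lowercase token sequences to entity types.
GAZETTEER = {
    ("new", "york"): "LOCATION",
    ("mexico",): "LOCATION",
    ("alan", "turing"): "PERSON",
    ("stanford", "university"): "ORGANIZATION",
}
MAX_LEN = max(len(k) for k in GAZETTEER)


def find_entities(text):
    """Find entity spans (longest match first) and label each one with its type."""
    tokens = text.split()
    lowered = [t.lower().strip(".,") for t in tokens]
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                span = " ".join(t.strip(".,") for t in tokens[i:i + n])
                entities.append((span, GAZETTEER[key]))
                i += n
                break
        else:
            # No entity starts at this token; move on.
            i += 1
    return entities


print(find_entities("A researcher from Stanford University flew from New York to Mexico."))
# → [('Stanford University', 'ORGANIZATION'), ('New York', 'LOCATION'), ('Mexico', 'LOCATION')]
```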
Summarization
The task of creating a summary that represents the most important or relevant information within the original text.
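One simple way to select "the most important information" is extractive summarization by word frequency: score each sentence by how frequent its content words are in the whole text and keep the top sentences. This is a minimal sketch of that baseline (the stopword list and sample text are hypothetical), not the only approach; abstractive systems generate new sentences instead.

```python
import re
from collections import Counter

# Small hypothetical stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "by", "it", "from"}


def content_tokens(text):
    return [w for w in re.findall(r"[a-zü]+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Frequency-based extractive summary: keep the highest-scoring sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_tokens(text))

    def score(sentence):
        tokens = content_tokens(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Output the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


text = (
    "Summarization selects the important sentences of a text. "
    "An important sentence carries the main information of the text. "
    "The weather was sunny."
)
print(summarize(text, 1))  # → Summarization selects the important sentences of a text.
```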
Sentiment Analysis
The task of classifying the sentiment or emotions expressed in subjective text (e.g., reviews).
- Fine-grained sentiment analysis
  - Identify a category of sentiment
  - Ex) very positive, positive, neutral, negative, very negative
- Emotion detection
  - Identify emotions
  - Ex) happiness, frustration, anger, sadness, …
- Aspect-based sentiment analysis
  - Identify a category of sentiment for each aspect
  - Ex) “The CPU is fast. The battery runs fast.” (the same word “fast” is positive for the CPU but negative for the battery)
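Fine-grained and aspect-based analysis can be contrasted with a toy lexicon approach. In this minimal sketch, both lexicons and their scores are hypothetical. A summed word score is mapped to the five fine-grained classes, and an aspect-specific lexicon lets the same word ("fast") carry opposite polarity for different aspects, as in the CPU/battery example.

```python
# Hypothetical toy lexicons; real systems learn polarity from labeled data.
WORD_POLARITY = {"great": 2, "good": 1, "okay": 0, "bad": -1, "terrible": -2}

# Aspect-dependent polarity: the same word can flip sign depending on the aspect.
ASPECT_POLARITY = {
    ("cpu", "fast"): 1,  # a fast CPU is good
    ("battery", "fast"): -1,  # a battery that runs down fast is bad
}


def fine_grained(text):
    """Map the summed lexicon score to one of five sentiment classes."""
    score = sum(WORD_POLARITY.get(w.strip(".,!?").lower(), 0) for w in text.split())
    if score >= 2:
        return "very positive"
    if score == 1:
        return "positive"
    if score == 0:
        return "neutral"
    if score == -1:
        return "negative"
    return "very negative"


def aspect_sentiment(text):
    """Label each aspect mentioned in a sentence using the aspect-specific lexicon."""
    results = {}
    for sentence in text.split("."):
        words = [w.lower() for w in sentence.split()]
        for (aspect, word), polarity in ASPECT_POLARITY.items():
            if aspect in words and word in words:
                results[aspect] = "positive" if polarity > 0 else "negative"
    return results


print(fine_grained("The food was great and the service was good."))  # → very positive
print(aspect_sentiment("The CPU is fast. The battery runs fast."))
# → {'cpu': 'positive', 'battery': 'negative'}
```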
Challenges in NLP
Learning biases
Lack of reasoning
Generating rude responses
Why is NLP hard?
- Ambiguity
- Expressivity
- Scale
- Variation
- Sparsity
- Unmodeled variables
- Unknown representations
References
- https://www.parentmap.com/article/mind-boggling-new-discoveries-about-the-brain
- https://dictionary.cambridge.org/dictionary/english/word
- https://en.wikipedia.org/wiki/Tesg%C3%BCino