NLP in general

Interdisciplinary field concerned with the interactions between computers and human (natural) language.

Natural languages are the languages people speak, such as Italian or Chinese.

It draws on both computer science and linguistics. The linguistic levels involved are:

  • phonetics
    • sounds
  • phonology
    • sound systems
  • morphology
    • formation and internal structure of words
  • syntax
    • formation and internal structure of sentences
  • semantics
    • meaning of sentences
  • pragmatics
    • how sentences, with their semantic meaning, are used to achieve communicative goals

Some Applications

Siri, Cortana, autocomplete, spell checking, machine translation

Milestones

NLP Milestones are split into:

See NLP Milestones for more.

Text normalization

  • Before any further processing, the text must be normalized:
  1. Tokenizing words (see Word Segmentation)
  2. Normalizing word formats
  3. Segmenting sentences
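The three normalization steps above can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a standard pipeline: the regular expressions, the function names (`segment_sentences`, `tokenize`, `normalize_token`, `normalize_text`) and the use of case folding as the only word-format normalization are all hypothetical choices. The sketch segments sentences first so that each sentence yields its own token list, and its naive boundary rule will mis-split abbreviations such as "Dr.".

```python
import re

# Naive sentence boundary (assumption): split on whitespace that follows
# '.', '!' or '?'. Abbreviations such as "Dr." will be split incorrectly.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# Token pattern (assumption): a run of word characters, optionally with one
# internal apostrophe part (e.g. "can't"), or a single punctuation mark.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def segment_sentences(text: str) -> list[str]:
    """Split raw text into sentences using SENTENCE_BOUNDARY."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Split one sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)


def normalize_token(token: str) -> str:
    """Normalize a word's format; here only case folding is applied."""
    return token.casefold()


def normalize_text(text: str) -> list[list[str]]:
    """Segment, tokenize and normalize: one token list per sentence."""
    return [
        [normalize_token(t) for t in tokenize(sentence)]
        for sentence in segment_sentences(text)
    ]


if __name__ == "__main__":
    sample = "Siri can't read this. Machine translation is hard!"
    for tokens in normalize_text(sample):
        print(tokens)
```

Running the script prints `['siri', "can't", 'read', 'this', '.']` followed by `['machine', 'translation', 'is', 'hard', '!']`. Real systems usually replace the regular expressions with a trained tokenizer and sentence splitter, and may add lemmatization or stemming to the word-format step.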