Students will learn about the most important phenomena in natural language at different levels of granularity, from the combination of sounds up to the meaning of words, sentences, and texts. You will get an introduction to the main symbolic and statistical approaches to modeling these phenomena. All theoretical topics will be accompanied by exercises that deal with these phenomena and demonstrate their use in practical applications, such as spelling correction, auto-completion, keyword extraction, topic detection, named entity recognition, relation extraction, and synonym detection.
The objective of this course is to introduce research questions from computational linguistics that can (and need to) be solved with large amounts of language data. Along the way, we introduce the relevant basic linguistic phenomena and standard ways of describing them. We then discuss a number of tools and approaches that are based on large amounts of language data. Participants are familiarized with the notions, issues, and approaches in the presentation part of the course (08:30 - 10:00) and can then experiment with data and tools themselves, in order to contribute to a common descriptive and analytic task. Examples and data are taken from English (or paraphrased in English).
This year we will focus on "Computing Meaning". You can therefore take this course to extend your knowledge of NLP even if you already attended the NLP course of summer 2019 ("Project Seminar: Analyzing large amounts of language data"); however, that course is not a prerequisite. Topics include: corpora, basic text processing (POS tagging, NER, pattern extraction), collocation extraction, semantic relations, distributional semantics, word embeddings, contextualized word embeddings, evaluation of word embeddings, and applications of word embeddings in neural NLP.