Information
Canvas
Piazza
Time: Term 1 (Sep-Dec 2023), Tuesdays and Thursdays, 9:30-11:00
Location: Woodward IRC 1
Instructor: Vered Shwartz | Office hours: By appointment
TAs:
Shruthi Chockkalingam | Office hours: TBD
Tanzila Rahman | Office hours: TBD
Course Description
Natural Language Processing (NLP) is one of the fastest-growing sub-areas of Artificial Intelligence, with applications in all sectors of our society, including healthcare, business, science and government. In this course, students with a solid background in computer science will learn how to analyze and apply fundamental NLP algorithms and techniques. They will learn to combine traditional and neural models to meet a task's requirements while weighing the trade-offs between accuracy, time/space efficiency and the interpretability of the model's output. In particular, the course will teach the fundamentals of modern data-driven natural language processing. This includes applications (such as question answering and machine translation), text representations (word embeddings, language models), and various approaches to natural language understanding and generation (classification, tagging, parsing, encoder-decoder models). Importantly, students will also learn how to perform informed error analysis and revise preliminary solutions to address the most common sources of error (e.g., by adding more data or using different learning methods). Finally, special emphasis will be placed on the critical skill of reasoning about what will happen when a model is deployed in a given context. This includes learning how ethical issues intersect with NLP considerations (e.g., societal bias in text representations, fake news generation, sustainable models). The course will involve attending class, completing homework assignments, taking a midterm and a final exam, and reading and discussing papers.
Tentative Syllabus
Course Overview + Introduction to NLP
Finite State Text Processing + Morphology
Text Normalization + Spelling Correction
Language Models: Traditional vs. Neural
Text Classification: Traditional Methods + Sentiment Analysis
Text Classification: Neural Methods
Sequence Labeling: Traditional Methods + POS Tagging and NER
Sequence Labeling: Neural Methods (RNN, LSTM)
Sequence-to-Sequence: Encoder-Decoder + Attention
Transformers
Pre-trained Language Models
Syntax + Context-Free Grammars and Parsing
Chunking + Dependency Parsing + Treebanks
Traditional and Neural Methods for Constituency and Dependency Parsing
Semantics: Lexical Semantics, Semantic Role Labeling, Semantic Parsing
Topic Modeling (LDA)
Discourse + Coreference
Summarization
Advanced Topics (e.g., prompting, ethics, efficient NLP, commonsense reasoning)
Grading
Component | Weight
Final Exam | 35%*
Midterm | 30%
Assignments | 25%
Readings | 10%
The instructor reserves the right to modify this scheme at any time; however, the final scheme will likely remain similar to the one stated here.
*You must pass the final exam in order to pass the course.