Course Syllabus
Course Overview
A graduate-level course covering technical foundations of modern natural language processing (NLP). The course will cast NLP as an application of machine learning, in particular deep learning, and focus on deriving general mathematical principles that underlie state-of-the-art NLP systems today.
Course survey (due May 6)
Prerequisites for undergraduates: linear algebra (M250), probability (CS206 or M477/S379), and data structures (CS112). Recommended: multivariable calculus (M251) and machine learning (533).
Project Google Sheet (15-minute meeting scheduling link)
Instructor: Karl Stratos (karl.stratos@rutgers.edu)
Instructor Office Hours: Tuesday 4-5pm (Zoom link, passcode "rutgersnlp")
Teaching Assistant: Wenyue Hua (wh302@scarletmail.rutgers.edu)
Teaching Assistant Office Hours: Thursday 3-4pm (Zoom link, passcode "nlpta")
Textbooks (for optional reading):
- Natural Language Processing (Eisenstein)
- A Primer on Neural Network Models for Natural Language Processing (Goldberg)
- Deep Learning (Goodfellow, Bengio, and Courville)
LaTeX templates:
Course Schedule
Week 1
Tuesday, January 19
Lecture: General introduction (video, slides)
Entrance Quiz: 3:20-4pm (the TA will be available via the office-hour Zoom link during this window)
Optional reading: Chapter 1 (Eisenstein); linear algebra review (Kolter)
Week 2
Tuesday, January 26
Lecture: Linear classification (video, slides)
Optional reading: Chapter 2.5, 2.6 (Eisenstein)
Assignment 1 assigned (due in 3 weeks, on February 16)
Jupyter Notebook on projections
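As a concrete companion to this week's lecture, below is a minimal sketch of a multiclass linear classifier (softmax regression) trained by gradient descent. All names, data, and hyperparameters are illustrative, not from the course materials.

```python
import numpy as np

# Minimal multiclass linear classifier (softmax regression) trained by
# gradient descent on the average cross-entropy loss.

def softmax(scores):
    # Subtract the row max for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)

def train(X, y, num_classes, lr=0.1, steps=500):
    n, d = X.shape
    W = np.zeros((d, num_classes))       # weight matrix, one column per class
    for _ in range(steps):
        probs = softmax(X @ W)           # (n, num_classes) predicted distributions
        probs[np.arange(n), y] -= 1.0    # gradient of cross-entropy wrt scores
        W -= lr * (X.T @ probs) / n      # average gradient step
    return W

# Toy usage: two linearly separable clusters in 2D.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([0, 0, 1, 1])
W = train(X, y, num_classes=2)
print(softmax(X @ W).argmax(axis=1))     # expected: [0 0 1 1]
```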
Week 3
Tuesday, February 2
Lecture: Optimization, introduction to deep learning (video, slides)
Optional reading: Notes on feedforward networks (Collins), notes on backpropagation
Jupyter Notebook on separable encodings
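For readers who want a concrete anchor for the optimization material, here is gradient descent on the least-squares objective, small enough to verify by hand. The data, step size, and iteration count are all illustrative.

```python
import numpy as np

# Gradient descent on f(w) = ||Xw - y||^2 / (2n).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true                            # noiseless targets, so w_true is optimal

w = np.zeros(5)
lr = 0.1
for _ in range(200):
    grad = X.T @ (X @ w - y) / len(y)     # exact gradient of the objective
    w -= lr * grad                        # descent step

print(np.allclose(w, w_true, atol=1e-3))  # should print True
```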
Week 4
Tuesday, February 9
Lecture: Feedforward networks, universality, backpropagation (video, slides)
Optional reading: Chapter 3.1-3.3 (Eisenstein), notes on Xavier initialization (Stanford), notes on gradient-based optimization algorithms (Ruder)
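To make the backpropagation material concrete, here is a sketch of a one-hidden-layer network with a manual backward pass and Xavier (Glorot) initialization, checked against a finite difference. Shapes, names, and the squared-error loss are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, n = 3, 8, 2, 16

# Xavier initialization: variance 2 / (fan_in + fan_out).
W1 = rng.normal(0.0, np.sqrt(2 / (d_in + d_hid)), (d_in, d_hid))
W2 = rng.normal(0.0, np.sqrt(2 / (d_hid + d_out)), (d_hid, d_out))

X = rng.normal(size=(n, d_in))
Y = rng.normal(size=(n, d_out))

# Forward pass: X -> tanh(X W1) -> H W2, with squared-error loss.
H = np.tanh(X @ W1)
P = H @ W2
loss = ((P - Y) ** 2).sum() / (2 * n)

# Backward pass: chain rule, layer by layer.
dP = (P - Y) / n             # dL/dP
dW2 = H.T @ dP               # dL/dW2
dH = dP @ W2.T               # dL/dH
dA = dH * (1 - H ** 2)       # through tanh: tanh'(a) = 1 - tanh(a)^2
dW1 = X.T @ dA               # dL/dW1

# Sanity check: compare one coordinate of dW1 to a finite difference.
eps = 1e-6
W1[0, 0] += eps
loss2 = ((np.tanh(X @ W1) @ W2 - Y) ** 2).sum() / (2 * n)
print(np.isclose((loss2 - loss) / eps, dW1[0, 0], rtol=1e-3))  # True
```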
Thursday, February 11
Quiz 1: 30 minutes (available 1-6pm)
Week 5
Tuesday, February 16
Lecture: Convolutional, recurrent and attention-based architectures (video, slides)
Optional reading: Chapter 3.4 (Eisenstein), Olah's blog posts on LSTMs and attention, notes on transformers
Assignment 2 assigned (due March 9)
Assignment 1 due
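Since attention is the thread running through the rest of the schedule, here is a minimal sketch of scaled dot-product attention reduced to a single head; names and sizes are illustrative.

```python
import torch
import torch.nn.functional as F

# Scaled dot-product attention, single head.
def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # query-key similarities
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ V                             # convex combination of values

n, d = 4, 8
x = torch.randn(n, d)        # a toy "sentence" of n vectors
out = attention(x, x, x)     # self-attention: queries = keys = values
print(out.shape)             # torch.Size([4, 8])
```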
Week 6
Tuesday, February 23
Lecture: Language models, beam search, text generation (video, slides)
Optional reading: RNN LM PyTorch example, the generate function in Hugging Face transformers, a top-p/top-k sampling implementation
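For reference, a self-contained sketch of top-p (nucleus) sampling; this is illustrative and not the Hugging Face implementation linked above.

```python
import torch

# Top-p (nucleus) sampling from next-token logits: keep the smallest set of
# tokens whose cumulative probability exceeds p, renormalize, and sample.
def top_p_sample(logits, p=0.9):
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Drop a token once the mass *before* it already exceeds p; this always
    # keeps at least the single most probable token.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs /= sorted_probs.sum()
    return sorted_ids[torch.multinomial(sorted_probs, 1)]

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
print(top_p_sample(logits, p=0.9).item())  # index of a sampled token
```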
Week 7
Tuesday, March 2
Lecture: Conditional language models, machine translation (video, slides)
Optional reading: Chapter 18.1 (Eisenstein), Google NMT and multilingual translation papers, T5 paper
Thursday, March 4
Quiz 2: 40 minutes (available 1-6pm)
Week 8
Tuesday, March 9
Lecture: Copy mechanism, relation-aware self-attention, hidden Markov models (video, slides)
Optional reading: Gulcehre et al. (2016), Shaw et al. (2018), notes on hidden Markov models (Collins), example of a neural HMM (Chiu and Rush, 2020)
Assignment 3 assigned (due April 6)
Assignment 2 due
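A compact reference for the HMM material: the forward algorithm, which computes the sequence likelihood by dynamic programming in O(TK^2) time. The toy parameters are illustrative.

```python
import numpy as np

def forward(pi, A, B, x):
    # pi: (K,) initial distribution; A: (K, K) transitions, A[i, j] = p(j | i);
    # B: (K, V) emissions, B[i, o] = p(o | state i); x: observation indices.
    alpha = pi * B[:, x[0]]               # alpha_1(i) = pi_i * p(x_1 | i)
    for t in range(1, len(x)):
        alpha = (alpha @ A) * B[:, x[t]]  # sum out the previous state
    return alpha.sum()                    # marginalize the final state

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward(pi, A, B, x=[0, 1, 0]))     # p(x), the sequence likelihood
```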
Spring Recess (March 12-20)
Week 9
Tuesday, March 23
Lecture: Marginal decoding, conditional random fields (video, slides)
Optional reading: Chapter 7.5.3 (Eisenstein), Lample et al. (2016), notes on graphical models (Blei), notes on belief propagation
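The same dynamic program underlies CRF training, where the forward recursion runs in log space to compute the log-partition function. A sketch with illustrative score tensors (emission scores assumed folded into the transition scores):

```python
import torch

def log_partition(unary0, scores):
    # unary0: (K,) scores of the first tag; scores: (T-1, K, K) with
    # scores[t, i, j] the score of tag i at step t followed by tag j.
    alpha = unary0
    for t in range(scores.size(0)):
        # logsumexp over the previous tag, for every current tag.
        alpha = torch.logsumexp(alpha.unsqueeze(1) + scores[t], dim=0)
    return torch.logsumexp(alpha, dim=0)

K, T = 3, 5
logZ = log_partition(torch.randn(K), torch.randn(T - 1, K, K))
print(logZ.item())  # log of the sum of exp-scores over all K^T tag sequences
```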
Week 10
Tuesday, March 30
Lecture: Natural language understanding, pretrained language models (video, slides)
Optional reading: the word2vec paper (also a blog post), the ELMo paper, the BERT paper, a paper analyzing commonsense reasoning performance (Trichelair et al., 2019), a paper on the effects of pretraining scale (Zhang et al., 2020)
Jupyter notebook on how to use BERT
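In the spirit of the notebook above, these are the few lines needed to pull contextual vectors out of pretrained BERT with Hugging Face transformers:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Natural language processing", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per wordpiece, including [CLS] and [SEP].
print(outputs.last_hidden_state.shape)  # (1, num_wordpieces, 768)
```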
Thursday, April 1
Quiz 3: 30 minutes (available 1-6pm)
Week 11
Tuesday, April 6
Lecture: More pretrained transformers, latent-variable generative models (video, slides)
Optional reading: the BART paper, Section 1 and Appendix A of this note, additional notes, VAEs applied to text generation and document hashing
Project proposal due
Assignment 3 due
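As a pointer into the VAE reading, a sketch of the two ingredients every Gaussian VAE shares: the reparameterization trick and the closed-form KL to the standard normal prior. Shapes and the stand-in reconstruction term are illustrative.

```python
import torch

mu = torch.randn(4, 16, requires_grad=True)       # encoder mean (stand-in)
log_var = torch.randn(4, 16, requires_grad=True)  # encoder log-variance

# Reparameterization: z = mu + sigma * eps keeps sampling differentiable.
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * log_var) * eps

# KL( N(mu, sigma^2) || N(0, I) ) in closed form, summed over dimensions.
kl = 0.5 * (mu ** 2 + log_var.exp() - 1 - log_var).sum(dim=-1)

recon = (z ** 2).sum(dim=-1)      # stand-in for a real reconstruction loss
(recon + kl).mean().backward()    # gradients reach mu and log_var through z
print(mu.grad is not None, log_var.grad is not None)  # True True
```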
Week 12
Tuesday, April 13
Lecture: More variational autoencoders, discrete latent variables (video, slides)
Optional reading: notes on the Gumbel-softmax trick (Appendix A of this note; you may have to refresh the page), Li et al. (2019)
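A sketch of the Gumbel-softmax relaxation from the reading: categorical sampling made differentiable by perturbing logits with Gumbel noise and softening the argmax with a temperature. All values are illustrative.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    # Gumbel(0, 1) noise via -log(-log(U)) with U ~ Uniform(0, 1);
    # the small constants guard against log(0).
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # As tau -> 0 the output approaches a one-hot sample.
    return F.softmax((logits + gumbel) / tau, dim=-1)

logits = torch.tensor([1.0, 0.5, -0.5], requires_grad=True)
sample = gumbel_softmax_sample(logits, tau=0.5)
values = torch.tensor([2.0, -1.0, 0.5])
(sample * values).sum().backward()  # a toy downstream objective
print(sample, logits.grad)          # gradients flow through the relaxed sample
```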
Week 13
Tuesday, April 20
Lecture: Knowledge-intensive language tasks (video, slides)
Optional reading: Notes on noise contrastive estimation, Lee et al. (2019), Cheng et al. (2020), Wu et al. (2020)
Milestone due
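Related to the NCE reading: dense retrievers in this line of work are commonly trained with an in-batch softmax objective, a close cousin of NCE. A sketch with random vectors standing in for the query and passage encoders:

```python
import torch
import torch.nn.functional as F

batch, dim = 8, 128
queries = F.normalize(torch.randn(batch, dim), dim=-1)   # stand-in encoder outputs
passages = F.normalize(torch.randn(batch, dim), dim=-1)

# Score every query against every passage; the diagonal pairs are positives
# and the other in-batch passages act as negatives.
scores = queries @ passages.T        # (batch, batch) similarity matrix
labels = torch.arange(batch)         # positive passage i for query i
loss = F.cross_entropy(scores, labels)
print(loss.item())
```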
Week 14
Tuesday, April 27
Lecture: Coreference resolution, review (video, slides)
Optional reading: Section 4.2 of Marquez et al. (2012), the LEA metric, end-to-end coreference (Lee et al., 2017) and its extension, coreference with BERT and SpanBERT, CorefQA
Monday, May 10
Project presentation video and final report due
Course Summary:
Date | Details | Due |
---|---|---|
Tuesday, January 19 | Entrance Quiz | 3:20-4pm |
Thursday, February 11 | Quiz 1 (30 minutes) | available 1-6pm |
Tuesday, February 16 | Assignment 1 | due |
Thursday, March 4 | Quiz 2 (40 minutes) | available 1-6pm |
Tuesday, March 9 | Assignment 2 | due |
Thursday, April 1 | Quiz 3 (30 minutes) | available 1-6pm |
Tuesday, April 6 | Assignment 3; project proposal | due |
Tuesday, April 20 | Project milestone | due |
Thursday, May 6 | Course survey | due |
Monday, May 10 | Project presentation video and final report | due |