Course Syllabus
Course Overview
A graduate-level course covering the technical foundations of modern natural language processing (NLP). The course casts NLP as an application of machine learning, in particular deep learning, and focuses on deriving the general mathematical principles that underlie today's state-of-the-art NLP systems.
Course links:
- Reading list
- Projects (previous: 2021)
Course format: Flipped classroom. Each week, students watch a lecture video online and study the associated slides at their own pace (note: the slides may be updated to incorporate more recent developments and may differ slightly from the video, so please re-download the latest slides after the lecture time). Even though all materials are made available upfront, students are not expected to "study ahead"; they are responsible only for materials covered up to that week (e.g., for quizzes and assignments). Then, in class, we will discuss a research paper chosen from the reading list above. Every student is required to present a paper and lead the discussion (10% of the grade).
- Meeting each week on Wednesday 1:00-2:30pm.
- Until January 30th: Synchronous online
- After January 30th: In-person at SEC 207 BUS
- Occasionally, there will be online quizzes on Canvas. They will be scheduled within the course time slot (12:10-3:10pm Wednesday) so that there is no time conflict with other courses.
Prerequisites for undergraduates: linear algebra (M250), probability (CS206, or M477/S379), data structures (CS112). Recommended: multivariable calculus (M251), machine learning (533).
Instructor: Karl Stratos (karl.stratos@rutgers.edu)
Instructor Office Hours: Wednesday 4-5pm (Zoom link, passcode "rutgersnlp")
Teaching Assistant: Wenzheng "Vincent" Zhang (wenzheng.zhang@rutgers.edu)
Teaching Assistant Office Hours: Thursday 11am-12pm (Zoom link)
Grader: SohailAbbas "Sohail" Saiyed (ss3723@scarletmail.rutgers.edu)
Textbooks (for optional reading):
- Natural Language Processing (Eisenstein)
- A Primer on Neural Network Models for Natural Language Processing (Goldberg)
- Deep Learning (Goodfellow, Bengio, and Courville)
LaTeX templates:
Course Schedule
Week 1
Wednesday, January 19
Lecture: General introduction (asynchronous remote, Zoom link, slides, recording)
Entrance Quiz: 2:30-3:10pm
Optional reading: Chapter 1 (Eisenstein); linear algebra review (Kolter)
Week 2
Wednesday, January 26
Lecture: Linear classification (video, slides)
Paper Discussion (asynchronous remote, Zoom link)
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
Optional reading: Chapter 2.5, 2.6 (Eisenstein)
Assignment 1 assigned (due in 2 weeks)
Jupyter Notebook on projections
Week 3
Wednesday, February 2
Lecture: Optimization, introduction to deep learning (video, slides)
Paper Discussion
Simple Local Attentions Remain Competitive for Long-Context Tasks
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine (video)
Optional reading: Notes on feedforward networks (Collins), notes on backpropagation
Jupyter Notebook on separable encodings
Week 4
Wednesday, February 9
Lecture: Feedforward networks, universality, backpropagation (video, slides)
Paper Discussion
From RankNet to LambdaRank to LambdaMART: An Overview
Making Transformers Solve Compositional Tasks
Optional reading: Chapter 3.1-3.3 (Eisenstein), notes on Xavier initialization (Stanford), notes on gradient-based optimization algorithms (Ruder)
Quiz 1: 30 minutes
Assignment 1 due
Week 5
Wednesday, February 16
Lecture: Convolutional, recurrent and attention-based architectures (video, slides)
Paper Discussion
Efficient Nearest Neighbor Language Models (followup)
Step-unrolled Denoising Autoencoders for Text Generation
Optional reading: Chapter 3.4 (Eisenstein), Olah's blogs on LSTMs and attention, notes on transformers
Assignment 2 assigned
Office hours: immediately after class
Week 6
Wednesday, February 23
Lecture: Language models, beam search, text generation (video, slides)
Paper Discussion
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Optional reading: RNN LM PyTorch example, generate function in Hugging Face transformers, top-p/top-k sampling implementation
Office hours: immediately after class
Week 7
Wednesday, March 2
Lecture: Conditional language models, machine translation (video, slides)
Paper Discussion
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Entity Linking and Discovery via Arborescence-based Supervised Clustering
Optional reading: Chapter 18.1 (Eisenstein), Google NMT and multilingual translation papers, T5 paper
Quiz 2: 30 minutes
Assignment 3 assigned
Assignment 2 due
Week 8
Wednesday, March 9
Lecture: Natural language understanding, pretrained language models (video, slides)
(Lecture order changed: watch the beginning part of Week 12's lecture to finish the materials on language modeling and Transformers - copy mechanism and relative position embeddings)
Paper Discussion
A White Box Analysis of ColBERT
On the Power of Saturated Transformers: A View from Circuit Complexity
Optional reading: The word2vec paper (also a blog), the ELMo paper, the BERT paper, paper analyzing commonsense reasoning performance (Trichelair et al., 2019), paper about effects of pretraining scale (Zhang et al., 2020)
Jupyter notebook on how to use BERT
Spring Recess (March 12-20)
Week 9
Office hours: Tuesday (March 22) 9pm
Wednesday, March 23
(No in-person class this week, paper presentations will be available online)
Lecture: More pretrained transformers, latent-variable generative models (video, slides)
Paper Discussion
Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss (presentation)
VAE Approximation Error: ELBO and Exponential Families (presentation)
Optional reading: The BART paper, Section 1 and Appendix A of this note, additional notes, VAEs applied to text generation and document hashing
Assignment 3 due (deadline postponed to Saturday 8pm)
Week 10
Project proposal 1-1 meetings on Wed/Fri 3:30-5:30: sign up here (no additional office hours)
Wednesday, March 30
Lecture: More variational autoencoders, discrete latent variables (video, slides)
Paper Discussion
Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer
Optional reading: Notes on Gumbel (Appendix A of this note, you may have to refresh the page), Li et al. (2019)
Week 11
Wednesday, April 6
Lecture: Knowledge-intensive language tasks (video, slides)
Paper Discussion
LM-Critic: Language Models for Unsupervised Grammatical Error Correction
Towards a Unified View of Parameter-Efficient Transfer Learning
Optional reading: Notes on noise contrastive estimation, Lee et al. (2019), Cheng et al. (2020), Wu et al. (2020)
Project proposal due
Week 12
Wednesday, April 13
Lecture: Copy mechanism, relation-aware self-attention, hidden Markov models (video, slides)
Improving Compositional Generalization with Latent Structure and Data Augmentation
Investigating the Effect of Background Knowledge on Natural Questions
How Do Vision Transformers Work?
Paper Discussion
Optional reading: Gulcehre et al. (2016), Shaw et al. (2018), notes on hidden Markov models (Collins), example of neural HMM (Chui and Rush, 2020)
Week 13
Wednesday, April 20
Lecture: Marginal decoding, conditional random fields (video, slides)
Paper Discussion
A Contrastive Framework for Neural Text Generation
Revisiting Over-smoothing in BERT from the Perspective of Graph
The MultiBERTs: BERT Reproductions for Robustness Analysis
Optional reading: Chapter 7.5.3 (Eisenstein), Lample et al. (2016), notes on graphical models (Blei), notes on belief propagation
Quiz 3: 30 minutes
Milestone due
Week 14
Wednesday, April 27
(Class time changed to 1:30-3pm)
Lecture: Coreference resolution, review (video, slides)
Paper Discussion
Perceiver IO: A General Architecture for Structured Inputs & Outputs
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Optional reading: Section 4.2 of Marquez et al. (2012), LEA, end-to-end coref (Lee et al., 2017) and its extension, coref with BERT and SpanBERT, CorefQA
Tuesday, May 10
Project presentation video and final report due