Course Syllabus

Course Overview

A graduate-level course covering the technical foundations of modern natural language processing (NLP). The course will cast NLP as an application of machine learning, in particular deep learning, and focus on deriving general mathematical principles that underlie state-of-the-art NLP systems today.

Course links:

Course format: Flipped classroom. Each week, students watch a lecture video online and study the associated slides at their own pace. (Note: the slides may be updated to incorporate more recent developments and may differ slightly from the video, so please re-download the latest slides after the lecture time.) Although all materials are made available upfront, students are not expected to "study ahead"; they are responsible only for the material covered up to that week (e.g., for quizzes and assignments). Then, in class, we will discuss a research paper chosen from the reading list above. Every student is required to present a paper and lead the discussion (10% of the grade).

  • Meeting each week on Wednesday 1:00-2:30pm.  
    • Until January 30th: Synchronous online 
    • After January 30th: In-person at SEC 207 BUS
  • Occasionally, there will be online quizzes on Canvas. They will be scheduled within the course time slot (12:10-3:10pm Wednesday) so that there is no time conflict with other courses. 

Prerequisites for undergraduates: linear algebra (M250), probability (CS206, or M477/S379), data structures (CS112). Recommended: multivariable calculus (M251), machine learning (533).

Instructor: Karl Stratos (

Instructor Office Hours: Wednesday 4-5pm (Zoom link, passcode "rutgersnlp") 

Teaching Assistant: Wenzheng "Vincent" Zhang (

Teaching Assistant Office Hours: Thursday 11am-12pm (Zoom link)

Grader: SohailAbbas "Sohail" Saiyed (

Textbooks (for optional reading):

LaTeX templates:

Course Schedule

Week 1

Wednesday, January 19

Lecture: General introduction (asynchronous remote, Zoom link, slides, recording)

Entrance Quiz: 2:30-3:10pm

Optional reading: Chapter 1 (Eisenstein); linear algebra review (Kolter)

Week 2

Wednesday, January 26

Lecture: Linear classification (video, slides)

Paper Discussion (asynchronous remote, Zoom link)

Learning Transferable Visual Models From Natural Language Supervision (CLIP)

Optional reading: Chapter 2.5-2.6 (Eisenstein)

Assignment 1 assigned (due in 2 weeks) 

Jupyter Notebook on projections 

Week 3

Wednesday, February 2

Lecture: Optimization, introduction to deep learning (video, slides)

Paper Discussion

Simple Local Attentions Remain Competitive for Long-Context Tasks

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Every Model Learned by Gradient Descent Is Approximately a Kernel Machine (video)

Optional reading: Notes on feedforward networks (Collins), notes on backpropagation

Jupyter Notebook on separable encodings 

Week 4 

Wednesday, February 9 

Lecture: Feedforward networks, universality, backpropagation (video, slides)

Paper Discussion

From RankNet to LambdaRank to LambdaMART: An Overview

Making Transformers Solve Compositional Tasks

Optional reading: Chapter 3.1-3.3 (Eisenstein), notes on Xavier initialization (Stanford), notes on gradient-based optimization algorithms (Ruder)

Quiz 1: 30 minutes 

Assignment 1 due

Week 5

Wednesday, February 16  

Lecture: Convolutional, recurrent and attention-based architectures (video, slides)

Paper Discussion

Efficient Nearest Neighbor Language Models (follow-up)

Step-unrolled Denoising Autoencoders for Text Generation

Optional reading: Chapter 3.4 (Eisenstein), Olah's blog posts on LSTMs and attention, notes on transformers

Assignment 2 assigned

Office hours: Immediately after class

Week 6

Wednesday, February 23

Lecture: Language models, beam search, text generation (video, slides)

Paper Discussion

SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Optional reading: RNN LM PyTorch example, the generate function in Hugging Face Transformers, an implementation of top-p/top-k sampling

Office hours: Immediately after class

Week 7 

Wednesday, March 2

Lecture: Conditional language models, machine translation (video, slides)

Paper Discussion

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Entity Linking and Discovery via Arborescence-based Supervised Clustering

Optional reading: Chapter 18.1 (Eisenstein), Google NMT and multilingual translation papers, T5 paper  

Quiz 2: 30 minutes 

Assignment 3 assigned

Assignment 2 due

Week 8

Wednesday, March 9

Lecture: Natural language understanding, pretrained language models (video, slides)

(Lecture order changed: watch the beginning of Week 12's lecture to finish the material on language modeling and Transformers, namely the copy mechanism and relative position embeddings.)

Paper Discussion

A White Box Analysis of ColBERT

On the Power of Saturated Transformers: A View from Circuit Complexity

Optional reading: The word2vec paper (also a blog post), the ELMo paper, the BERT paper, a paper analyzing commonsense reasoning performance (Trichelair et al., 2019), a paper about the effects of pretraining scale (Zhang et al., 2020)

Jupyter notebook on how to use BERT 

Spring Recess (March 12-20)

Week 9

Office hours: Tuesday (March 22) 9pm

Wednesday, March 23

(No in-person class this week; paper presentations will be available online.)

Lecture: More pretrained transformers, latent-variable generative models (video, slides)

Paper Discussion

Partially Supervised Named Entity Recognition via the Expected Entity Ratio Loss (presentation)

VAE Approximation Error: ELBO and Exponential Families (presentation)

Optional reading: The BART paper, Section 1 and Appendix A of this note, additional notes, VAEs applied to text generation and document hashing   

Assignment 3 due (deadline postponed to Saturday 8pm)

Week 10

Project proposal one-on-one meetings on Wednesday/Friday 3:30-5:30pm: sign up here (no additional office hours)

Wednesday, March 30

Lecture: More variational autoencoders, discrete latent variables (video, slides)

Paper Discussion

Memorizing Transformers

Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

Optional reading: Notes on Gumbel (Appendix A of this note; you may have to refresh the page), Li et al. (2019)

Week 11 

Wednesday, April 6

Lecture: Knowledge-intensive language tasks (video, slides)

Paper Discussion

LM-Critic: Language Models for Unsupervised Grammatical Error Correction

Towards a Unified View of Parameter-Efficient Transfer Learning

Optional reading: Notes on noise contrastive estimation, Lee et al. (2019), Cheng et al. (2020), Wu et al. (2020) 

Project proposal due 

Week 12 

Wednesday, April 13

Lecture: Copy mechanism, relation-aware self-attention, hidden Markov models (video, slides)

Paper Discussion

Improving Compositional Generalization with Latent Structure and Data Augmentation

Investigating the Effect of Background Knowledge on Natural Questions

How Do Vision Transformers Work?

Optional reading: Gulcehre et al. (2016), Shaw et al. (2018), notes on hidden Markov models (Collins), an example of a neural HMM (Chiu and Rush, 2020)

Week 13

Wednesday, April 20

Lecture: Marginal decoding, conditional random fields (video, slides)

Paper Discussion

A Contrastive Framework for Neural Text Generation

Revisiting Over-smoothing in BERT from the Perspective of Graph

The MultiBERTs: BERT Reproductions for Robustness Analysis

Optional reading: Chapter 7.5.3 (Eisenstein), Lample et al. (2016), notes on graphical models (Blei), notes on belief propagation

Quiz 3: 30 minutes 

Milestone due

Week 14

Wednesday, April 27

(Class time changed to 1:30-3pm)

Lecture: Coreference resolution, review (video, slides)

Paper Discussion

Perceiver IO: A General Architecture for Structured Inputs & Outputs

BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Optional reading: Section 4.2 of Marquez et al. (2012), LEA, end-to-end coref (Lee et al., 2017) and its extension, coref with BERT and SpanBERT, CorefQA    

Tuesday, May 10

Project presentation video and final report due 
