Course Staff

Chenhao Tan (Instructor)
Qirun Dai (Teaching Assistant)
Dang Nguyen (Teaching Assistant)
Darin Keng (Teaching Assistant)

Logistics

Content

What is this course about?

This course will introduce fundamental concepts in natural language processing (NLP). It will cover the basics of enabling computers to understand and generate language, including word embeddings, language modeling, transformers, and an overview of large language models. It will also cover connections with other disciplines, such as linguistics and the social sciences.
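To give a concrete flavor of the first few topics (tokenization and language modeling), here is a minimal, self-contained Python sketch. It is an illustration only, not part of the course materials; the toy corpus and function names are made up for this example.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug ."

# Tokenization: split the raw text into tokens (here, naively on whitespace).
tokens = corpus.split()

# Bigram language model: estimate P(next word | previous word) by counting.
bigram_counts = defaultdict(Counter)
for prev, curr in zip(tokens, tokens[1:]):
    bigram_counts[prev][curr] += 1

def next_word_probs(prev_word):
    """Return the estimated distribution over words following prev_word."""
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))
# {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```

The course covers these ideas in much greater depth, from subword tokenizers to neural and transformer-based language models.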

Prerequisites


Coursework

Assignments

Project

Exams

Compute

Modal has generously offered compute to each student. See details on Ed.

Textbook

Many related resources are available online, and we will provide readings and pointers throughout the course. A recommended textbook is Speech and Language Processing by Dan Jurafsky and James H. Martin.

Honor Code

We expect students not to look up solutions or implementations online. As in all other classes at UChicago, we take academic honesty very seriously. Please make sure to read the UChicago Academic Honesty page.

Collaboration policy

For individual assignments, collaboration with fellow students is encouraged as long as it is properly disclosed for each submission. However, you should not share any written work or code for your assignments. After discussing a problem with others, you should write up the solution by yourself. For the final project, you are expected to work in groups of 2-3.

AI tools policy

Using generative AI tools such as Claude Code and ChatGPT is allowed as long as their use is properly disclosed for each submission. For individual assignments, we encourage you to implement solutions on your own to maximize your learning, but using AI tools to learn the content is acceptable. In fact, we encourage creative use of these tools, treating them as collaborators in the learning process.

Additional course policies can be found on Canvas.

Submitting Coursework

Late Days


Preliminary Schedule

#  | Date         | Topic                                                          | Materials                                                                                                                                                                        | Deadlines
1  | Tues Jan 6   | Introduction and tokenization                                  | lecture, notebook (html version)                                                                                                                                                 | Assignment 1 out
2  | Thurs Jan 8  | Word vectors                                                   | lecture, notebook (html version); Readings: Efficient Estimation of Word Representations in Vector Space by Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean                  |
3  | Tues Jan 13  | Text Classification                                            |                                                                                                                                                                                  | Assignment 1 due (Tuesday night); Assignment 2 out
4  | Thurs Jan 15 | N-gram Language Modeling, Neural Language Models               |                                                                                                                                                                                  |
5  | Tues Jan 20  | The NLP Recipe                                                 |                                                                                                                                                                                  | Assignment 3 out
6  | Thurs Jan 22 | Attention                                                      |                                                                                                                                                                                  | Assignment 2 due (Friday night)
7  | Tues Jan 27  | Transformers                                                   |                                                                                                                                                                                  |
8  | Thurs Jan 29 | Pretraining and modern NLP pipeline                            |                                                                                                                                                                                  |
9  | Tues Feb 3   | Benchmarks and Evaluation                                      |                                                                                                                                                                                  | Assignment 4 out
10 | Thurs Feb 5  | Midterm                                                        |                                                                                                                                                                                  | Assignment 3 due (Friday night); Project Proposal due (Friday night)
11 | Tues Feb 10  | Decoding LLMs                                                  |                                                                                                                                                                                  |
12 | Thurs Feb 12 | Prompting                                                      |                                                                                                                                                                                  |
13 | Tues Feb 17  | Post-training                                                  |                                                                                                                                                                                  |
14 | Thurs Feb 19 | Reasoning and Agents                                           |                                                                                                                                                                                  | Assignment 4 due (Friday night); Blog Entry 1 due
15 | Tues Feb 24  | Guest Lecture: Hypothesis Generation with Large Language Models |                                                                                                                                                                                 |
16 | Thurs Feb 26 | Guest Lecture: Multimodal NLP                                  |                                                                                                                                                                                  | Blog Entry 2 due
17 | Tues Mar 3   | Alignment and Safety                                           |                                                                                                                                                                                  |
18 | Thurs Mar 5  | Final Project Presentation                                     |                                                                                                                                                                                  | Blog Entry 3 due
19 | TBD          | Final Exam                                                     |                                                                                                                                                                                  |

Acknowledgments

This course website is adapted from the Stanford CS336 course website. The course builds on prior offerings by Mina Lee and Chenhao Tan.