Course Staff

Chenhao Tan (Instructor)
Qirun Dai (Teaching Assistant)
Dang Nguyen (Teaching Assistant)
Darin Keng (Teaching Assistant)

Logistics

Content

What is this course about?

This course will introduce fundamental concepts in natural language processing (NLP). It will cover the basics of enabling computers to understand and generate language, including word embeddings, language modeling, transformers, and an overview of large language models. It will also cover connections with other disciplines, such as linguistics and the social sciences.
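To give a concrete flavor of the first few topics (tokenization and language modeling), here is a minimal, self-contained Python sketch. It is an illustration only, not part of the course materials; the toy corpus and function names are made up for this example.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug ."

# Tokenization: split the raw text into tokens (here, naively on whitespace).
tokens = corpus.split()

# Bigram language model: estimate P(next word | previous word) by counting.
bigram_counts = defaultdict(Counter)
for prev, curr in zip(tokens, tokens[1:]):
    bigram_counts[prev][curr] += 1

def next_word_probs(prev_word):
    """Return the estimated distribution over words following prev_word."""
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))
# {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```

The course covers these ideas in much greater depth, from subword tokenizers to neural and transformer-based language models.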

Prerequisites


Coursework

Assignments

Project

Exams

Compute

Modal has generously offered compute to each student. See details on Ed.

Textbook

Many related resources are available online, and we will provide readings and pointers throughout the course. A recommended textbook is Speech and Language Processing by Dan Jurafsky and James H. Martin.

Honor Code

We expect students not to look up solutions or implementations online. As in all other classes at UChicago, we take academic honesty very seriously. Please make sure to read the UChicago Academic Honesty page.

Collaboration policy

For individual assignments, collaboration with fellow students is encouraged as long as it is properly disclosed for each submission. However, you should not share any written work or code for your assignments. After discussing a problem with others, you should write up the solution by yourself. For the final project, you are expected to work in groups of 2-3.

AI tools policy

Using generative AI tools such as Claude Code and ChatGPT is allowed as long as their use is properly disclosed for each submission. For individual assignments, we encourage you to implement solutions on your own to maximize your learning, but using AI tools to learn the content is acceptable. In fact, we encourage creative use of these tools, treating them as collaborators in the learning process.

Additional course policies can be found on Canvas.

Submitting Coursework

Late Days


Preliminary Schedule

#  | Date         | Topic                                                          | Materials                                                                                                                                                                        | Deadlines
1  | Tues Jan 6   | Introduction and tokenization                                  | lecture, notebook (html version)                                                                                                                                                 | Assignment 1 out
2  | Thurs Jan 8  | Word vectors                                                   | lecture, notebook (html version); Readings: Efficient Estimation of Word Representations in Vector Space by Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean                  |
3  | Tues Jan 13  | Text Classification                                            |                                                                                                                                                                                  | Assignment 1 due (Tuesday night); Assignment 2 out
4  | Thurs Jan 15 | N-gram Language Modeling, Neural Language Models               |                                                                                                                                                                                  |
5  | Tues Jan 20  | The NLP Recipe                                                 |                                                                                                                                                                                  | Assignment 3 out
6  | Thurs Jan 22 | Attention                                                      |                                                                                                                                                                                  | Assignment 2 due (Friday night)
7  | Tues Jan 27  | Transformers                                                   |                                                                                                                                                                                  |
8  | Thurs Jan 29 | Pretraining and modern NLP pipeline                            |                                                                                                                                                                                  |
9  | Tues Feb 3   | Benchmarks and Evaluation                                      |                                                                                                                                                                                  | Assignment 4 out
10 | Thurs Feb 5  | Midterm                                                        |                                                                                                                                                                                  | Assignment 3 due (Friday night); Project Proposal due (Friday night)
11 | Tues Feb 10  | Decoding LLMs                                                  |                                                                                                                                                                                  |
12 | Thurs Feb 12 | Prompting                                                      |                                                                                                                                                                                  |
13 | Tues Feb 17  | Post-training                                                  |                                                                                                                                                                                  |
14 | Thurs Feb 19 | Reasoning and Agents                                           |                                                                                                                                                                                  | Assignment 4 due (Friday night); Blog Entry 1 due
15 | Tues Feb 24  | Guest Lecture: Hypothesis Generation with Large Language Models |                                                                                                                                                                                 |
16 | Thurs Feb 26 | Guest Lecture: Multimodal NLP                                  |                                                                                                                                                                                  | Blog Entry 2 due
17 | Tues Mar 3   | Alignment and Safety                                           |                                                                                                                                                                                  |
18 | Thurs Mar 5  | Final Project Presentation                                     |                                                                                                                                                                                  | Blog Entry 3 due
19 | TBD          | Final Exam                                                     |                                                                                                                                                                                  |

Acknowledgments

This course website is adapted from the Stanford CS336 course website. The course builds on prior offerings by Mina Lee and Chenhao Tan.