Course Overview
Description
This graduate-level course offers an in-depth exploration of the cutting-edge research and engineering principles behind large language models (LLMs). We will cover the foundational architectures that power modern AI, such as the Transformer, and delve into the complex systems required to train and serve these massive models at scale. Topics include advanced parallel computing paradigms (data, tensor, and pipeline parallelism), specialized hardware such as GPUs and other accelerators, modern model architectures such as Mixture-of-Experts (MoE), and the latest techniques in model optimization and alignment. A significant portion of the course is dedicated to student-led presentations and a final project, allowing students to engage directly with state-of-the-art literature and apply their knowledge to a practical problem.
Prerequisites
A strong background in machine learning and deep learning, along with proficiency in Python programming, is expected. Familiarity with deep learning frameworks such as PyTorch or TensorFlow is highly recommended.
Schedule & Topics
Part 1: Foundations & Parallelism (Instructor-Led)
Week 1 (Sep 2-5): Introduction & Logistics
- Course overview, grading, and project expectations.
- The modern AI landscape.
Week 2 (Sep 8-12): Foundation Model Architectures
A deep dive into the Transformer architecture. Required readings: Attention Is All You Need (Vaswani et al., 2017), BERT (Devlin et al., 2018), and GPT-2 (Radford et al., 2019).
Week 3 (Sep 15-19): Parallelisms I
Data, tensor, and pipeline parallelism. Readings: Megatron-LM (Shoeybi et al., 2019) and GPipe (Huang et al., 2018).
Week 4 (Sep 22-26): Parallelisms II
Advanced architectures, including Mixture-of-Experts (MoE) and state-space models (SSMs), and their parallelization strategies. Readings: ZeRO (Rajbhandari et al., 2019) and ST-MoE (Zoph et al., 2022).
Week 5 (Sep 29 - Oct 3): Hardware & Systems
High-performance computing hardware (GPUs and other accelerators), networking, and storage.
Part 2: Advanced Topics (Student-Led)
Weeks 6-14 (Oct 6 - Dec 5): Student-Led Paper Presentations
Each week, students will lead discussions on selected papers from the course reading list. Please sign up for a presentation slot using the link below. Note: the schedule for Thanksgiving week (Nov 24-28) will be adjusted.
Part 3: Final Projects
Weeks 15-16 (Dec 8-19): Final Project Presentations
Student teams will present their final course projects.
Reading List for Student Presentations
Efficient Attention
Efficient Inference & Serving
- vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention (Kwon et al., 2023)
- Fast Inference from Transformers via Speculative Decoding (Leviathan et al., 2022)
Efficient Training & Fine-Tuning
- Mixed Precision Training (Micikevicius et al., 2017)
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
Model Alignment
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
Alternative Architectures
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu & Dao, 2023)
Scientific Applications
- Highly accurate protein structure prediction with AlphaFold (Jumper et al., 2021)
Final Project
The final project is a significant component of this course. Students will work in teams to apply their knowledge to a practical problem in the field of large language models. This could involve reproducing results from a recent paper, exploring a novel model architecture, or developing a new application. Project presentations will take place during the final two weeks of the semester (Dec 8-19).