Course Overview
Description
This graduate-level course offers an in-depth exploration of the cutting-edge research and engineering principles behind large language models (LLMs). We will cover the foundational architectures that power modern AI, such as the Transformer, and delve into the complex systems required to train and serve these massive models at scale. Topics include advanced parallel computing paradigms (data, tensor, and pipeline parallelism), specialized hardware such as GPUs and other accelerators, modern model architectures such as Mixture-of-Experts (MoE), and the latest techniques in model optimization and alignment. A significant portion of the course is dedicated to student-led presentations and a final project, allowing students to engage directly with state-of-the-art literature and apply their knowledge to a practical problem.
Prerequisites
A strong background in machine learning and deep learning, along with proficiency in Python programming, is expected. Familiarity with deep learning frameworks such as PyTorch or TensorFlow is highly recommended.
Schedule & Topics
Part 1: Foundations & Parallelism (Instructor-Led)
Week 1 (Sep 2-5): Introduction & Logistics
- Course overview, grading, and project expectations.
- The modern AI landscape.
Week 2 (Sep 8-12): Foundation Model Architectures
A deep dive into the Transformer architecture. Required readings: Attention Is All You Need (Vaswani et al., 2017), BERT (Devlin et al., 2018), and GPT-2 (Radford et al., 2019).
Week 3 (Sep 15-19): Parallelisms I
Data, tensor, and pipeline parallelism. Readings: Megatron-LM (Shoeybi et al., 2019) and GPipe (Huang et al., 2018).
Week 4 (Sep 22-26): Parallelisms II
Advanced architectures, including Mixture-of-Experts (MoE) and state-space models (SSMs), and their parallelization strategies. Readings: ZeRO (Rajbhandari et al., 2019) and ST-MoE (Zoph et al., 2022).
Week 5 (Sep 29 - Oct 3): Hardware & Systems
High-performance computing hardware (GPUs and other accelerators), networking, and storage.
Part 2: Advanced Topics (Student-Led)
Weeks 6-14 (Oct 6 - Dec 5): Student-Led Paper Presentations
Each week, students will lead discussions on selected papers from the course reading list. Please sign up for a presentation slot using the link below. Note: the schedule for Thanksgiving week (Nov 24-28) will be adjusted.
Part 3: Final Projects
Weeks 15-16 (Dec 8-19): Final Project Presentations
Student teams will present their final course projects.
Reading List for Student Presentations
Efficient Attention
Efficient Inference & Serving
- vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention (Kwon et al., 2023)
- Fast Inference from Transformers via Speculative Decoding (Leviathan et al., 2022)
Efficient Training & Fine-Tuning
- Mixed Precision Training (Micikevicius et al., 2017)
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
Model Alignment
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
Alternative Architectures
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu & Dao, 2023)
Scientific Applications
- Highly accurate protein structure prediction with AlphaFold (Jumper et al., 2021)
Final Project
The final project is a significant component of this course. Students will work in teams to apply their knowledge to a practical problem in the field of large language models. This could involve reproducing results from a recent paper, exploring a novel model architecture, or developing a new application. Project presentations will take place during the final two weeks of the semester (Dec 8-19).