Junda Chen 2026-02-01 20:04:08 -08:00
parent cefe6c8b13
commit b96dd61c6b
2 changed files with 6 additions and 0 deletions


@@ -20,6 +20,7 @@ This course focuses on the design of end-to-end large language model (LLM) syste
The course can be more accurately divided into three parts (with several additional guest lectures):
Part 1. Foundations: modern deep learning and computational representations
- Modern deep learning and computation graphs (framework and system fundamentals)
- Automatic differentiation and an overview of ML system architectures
- Tensor formats, in-depth matrix multiplication, and hardware accelerators
@@ -27,6 +28,7 @@ Part 1. Foundations: modern deep learning and computational representations
Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- GPUs and CUDA (including basic performance models)
- GPU matrix multiplication and operator-level compilation
- Triton programming, graph optimization, and compilation
@@ -35,6 +37,7 @@ Part 2. Systems and performance optimization: from GPU kernels to compilation an
Part 3. LLM systems: training and inference
- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
- LLM fundamentals: Transformers, Attention, and MoE
- LLM training optimizations (e.g., FlashAttention-style techniques)


@@ -22,12 +22,14 @@
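The course can be more accurately divided into three parts (plus several guest lectures):
Part 1. Foundations: modern deep learning and computational representations
- Modern DL and computation graphs (computational graph / framework fundamentals)
- Autodiff and an overview of ML system architectures
- Tensor formats, in-depth MatMul, and hardware accelerators
Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- GPUs & CUDA (including basic performance models)
- GPU MatMul and operator compilation
- Triton programming, graph optimization, and compilation
@@ -36,6 +38,7 @@ Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
Part 3. LLM systems: training and inference
- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
- LLM fundamentals: Transformers, Attention, and MoE
- LLM training optimizations (FlashAttention and others)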