diff --git a/docs/机器学习系统/CSE234.en.md b/docs/机器学习系统/CSE234.en.md
index 8ee74f7d..ac17d0b5 100644
--- a/docs/机器学习系统/CSE234.en.md
+++ b/docs/机器学习系统/CSE234.en.md
@@ -20,23 +20,27 @@ This course focuses on the design of end-to-end large language model (LLM) syste
 The course can be more accurately divided into three parts (with several additional guest lectures):
 
 Part 1. Foundations: modern deep learning and computational representations
- - Modern deep learning and computation graphs (framework and system fundamentals)
- - Automatic differentiation and an overview of ML system architectures
- - Tensor formats, in-depth matrix multiplication, and hardware accelerators
+- Modern deep learning and computation graphs (framework and system fundamentals)
+- Automatic differentiation and an overview of ML system architectures
+- Tensor formats, in-depth matrix multiplication, and hardware accelerators
+
+
 
 Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- - GPUs and CUDA (including basic performance models)
- - GPU matrix multiplication and operator-level compilation
- - Triton programming, graph optimization, and compilation
- - Memory management (including practical issues and techniques in training and inference)
- - Quantization methods and system-level deployment
+- GPUs and CUDA (including basic performance models)
+- GPU matrix multiplication and operator-level compilation
+- Triton programming, graph optimization, and compilation
+- Memory management (including practical issues and techniques in training and inference)
+- Quantization methods and system-level deployment
+
 
 Part 3. LLM systems: training and inference
- - Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
- - LLM fundamentals: Transformers, Attention, and MoE
- - LLM training optimizations (e.g., FlashAttention-style techniques)
- - LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
- - Scaling laws
+- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
+- LLM fundamentals: Transformers, Attention, and MoE
+- LLM training optimizations (e.g., FlashAttention-style techniques)
+- LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
+- Scaling laws
+
 
 (Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)
 
diff --git a/docs/机器学习系统/CSE234.md b/docs/机器学习系统/CSE234.md
index eb150142..e03e665b 100644
--- a/docs/机器学习系统/CSE234.md
+++ b/docs/机器学习系统/CSE234.md
@@ -22,23 +22,26 @@
 课程可以更准确地分为三个部分(外加若干 guest lecture):
 
 Part 1. 基础:现代深度学习与计算表示
- - Modern DL 与计算图(computational graph / framework 基础)
- - Autodiff 与 ML system 架构概览
- - Tensor format、MatMul 深入与硬件加速器(accelerators)
+- Modern DL 与计算图(computational graph / framework 基础)
+- Autodiff 与 ML system 架构概览
+- Tensor format、MatMul 深入与硬件加速器(accelerators)
+
 
 Part 2. 系统与性能优化:从 GPU Kernel 到编译与内存
- - GPUs & CUDA(含基本性能模型)
- - GPU MatMul 与算子编译(operator compilation)
- - Triton 编程、图优化与编译(graph optimization & compilation)
- - Memory(含训练/推理中的内存问题与技巧)
- - Quantization(量化方法与系统落地)
+- GPUs & CUDA(含基本性能模型)
+- GPU MatMul 与算子编译(operator compilation)
+- Triton 编程、图优化与编译(graph optimization & compilation)
+- Memory(含训练/推理中的内存问题与技巧)
+- Quantization(量化方法与系统落地)
+
 
 Part 3. LLM系统:训练与推理
- - 并行策略:模型并行、collective communication、intra-/inter-op、自动并行化
- - LLM 基础:Transformer、Attention、MoE
- - LLM 训练优化:FlashAttention 等
- - LLM 推理:continuous batching、paged attention、disaggregated prefill/decoding
- - Scaling law
+- 并行策略:模型并行、collective communication、intra-/inter-op、自动并行化
+- LLM 基础:Transformer、Attention、MoE
+- LLM 训练优化:FlashAttention 等
+- LLM 推理:continuous batching、paged attention、disaggregated prefill/decoding
+- Scaling law
+
 
 (Guest lectures:ML compiler、LLM pretraining/open science、fast inference、tool use & agents 等,作为补充与扩展。)
 