diff --git a/docs/机器学习系统/CSE234.en.md b/docs/机器学习系统/CSE234.en.md
index cff42376..8ee74f7d 100644
--- a/docs/机器学习系统/CSE234.en.md
+++ b/docs/机器学习系统/CSE234.en.md
@@ -19,19 +19,19 @@ This course focuses on the design of end-to-end large language model (LLM) syste
 
 The course can be more accurately divided into three parts (with several additional guest lectures):
 
-1. Foundations: modern deep learning and computational representations
+Part 1. Foundations: modern deep learning and computational representations
 - Modern deep learning and computation graphs (framework and system fundamentals)
 - Automatic differentiation and an overview of ML system architectures
 - Tensor formats, in-depth matrix multiplication, and hardware accelerators
 
-2. Systems and performance optimization: from GPU kernels to compilation and memory
+Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
 - GPUs and CUDA (including basic performance models)
 - GPU matrix multiplication and operator-level compilation
 - Triton programming, graph optimization, and compilation
 - Memory management (including practical issues and techniques in training and inference)
 - Quantization methods and system-level deployment
 
-3. LLM systems: training and inference
+Part 3. LLM systems: training and inference
 - Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
 - LLM fundamentals: Transformers, Attention, and MoE
 - LLM training optimizations (e.g., FlashAttention-style techniques)
diff --git a/docs/机器学习系统/CSE234.md b/docs/机器学习系统/CSE234.md
index e1786ab5..eb150142 100644
--- a/docs/机器学习系统/CSE234.md
+++ b/docs/机器学习系统/CSE234.md
@@ -21,19 +21,19 @@
 
 课程可以更准确地分为三个部分(外加若干 guest lecture):
 
-1. 基础:现代深度学习与计算表示
+Part 1. 基础:现代深度学习与计算表示
 - Modern DL 与计算图(computational graph / framework 基础)
 - Autodiff 与 ML system 架构概览
 - Tensor format、MatMul 深入与硬件加速器(accelerators)
 
-2. 系统与性能优化:从 GPU Kernel 到编译与内存
+Part 2. 系统与性能优化:从 GPU Kernel 到编译与内存
 - GPUs & CUDA(含基本性能模型)
 - GPU MatMul 与算子编译(operator compilation)
 - Triton 编程、图优化与编译(graph optimization & compilation)
 - Memory(含训练/推理中的内存问题与技巧)
 - Quantization(量化方法与系统落地)
 
-3. LLM系统:训练与推理
+Part 3. LLM系统:训练与推理
 - 并行策略:模型并行、collective communication、intra-/inter-op、自动并行化
 - LLM 基础:Transformer、Attention、MoE
 - LLM 训练优化:FlashAttention 等