update

2026-06-22 09:27:22 +08:00 · 2026-02-01 20:03:11 -08:00 · 2026-02-01 20:03:11 -08:00 · cefe6c8b13
commit cefe6c8b13
parent c0d6bac60f
2 changed files with 33 additions and 26 deletions
--- a/docs/机器学习系统/CSE234.en.md
+++ b/docs/机器学习系统/CSE234.en.md
@@ -24,6 +24,8 @@ Part 1. Foundations: modern deep learning and computational representations
 - Automatic differentiation and an overview of ML system architectures  
 - Tensor formats, in-depth matrix multiplication, and hardware accelerators  
 Part 2. Systems and performance optimization: from GPU kernels to compilation and memory  
 - GPUs and CUDA (including basic performance models)  
 - GPU matrix multiplication and operator-level compilation  
@@ -31,6 +33,7 @@ Part 2. Systems and performance optimization: from GPU kernels to compilation an
 - Memory management (including practical issues and techniques in training and inference)  
 - Quantization methods and system-level deployment  
 Part 3. LLM systems: training and inference  
 - Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization  
 - LLM fundamentals: Transformers, Attention, and MoE  
@@ -38,6 +41,7 @@ Part 3. LLM systems: training and inference
 - LLM inference: continuous batching, paged attention, disaggregated prefill/decoding  
 - Scaling laws
 (Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)
 The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting. The course emphasizes real-world system design trade-offs and engineering constraints, rather than remaining at the level of algorithms or API usage. Assignments often require students to directly confront performance issues (such as memory bandwidth limits, communication overhead, and kernel fusion) and address them through Triton or system-level optimizations. Overall, the learning experience is fairly intensive, and a solid background in systems and parallel computing is important. For self-study, it is strongly recommended to prepare CUDA, parallel programming, and core systems knowledge in advance; otherwise, the learning curve becomes noticeably steep in the later parts of the course, especially around LLM optimization and inference. That said, once you can keep up with the pace, the course offers strong long-term value for those pursuing work in LLM infrastructure, ML systems, or AI compilers.
--- a/docs/机器学习系统/CSE234.md
+++ b/docs/机器学习系统/CSE234.md
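@@ -26,6 +26,7 @@ Part 1. Foundations: modern deep learning and computational representations
 - Autodiff and an overview of ML system architecture
 - Tensor formats, MatMul in depth, and hardware accelerators
 Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
 - GPUs & CUDA (including basic performance models)
 - GPU MatMul and operator compilation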
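@@ -33,6 +34,7 @@ Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
 - Memory (including memory issues and techniques in training/inference)
 - Quantization (quantization methods and system-level deployment)
 Part 3. LLM systems: training and inference
 - Parallelization strategies: model parallelism, collective communication, intra-/inter-op, auto-parallelization
 - LLM fundamentals: Transformer, Attention, MoE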
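@@ -40,6 +42,7 @@ Part 3. LLM systems: training and inference
 - LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
 - Scaling law
 (Guest lectures: ML compiler, LLM pretraining/open science, fast inference, tool use & agents, and more, as supplements and extensions.)
 The defining feature of CSE234 is its tight focus on LLM systems as the core application setting, emphasizing the trade-offs and engineering constraints of real system design rather than stopping at algorithms or API usage. Assignments usually require confronting performance bottlenecks directly (such as memory bandwidth, communication overhead, and kernel fusion) and resolving them with Triton or system-level optimizations, which helps a great deal in understanding why certain LLM systems are designed the way they are. The learning experience is hardcore overall, and the early part demands a strong background in systems and parallel computing. For self-study, it is advisable to fill in CUDA/parallel programming and basic systems knowledge beforehand; otherwise the learning curve feels noticeably steep in the second half (especially the content on LLM optimization and inference). But once you keep up with the pace, the course has strong long-term value for those working in LLM Infra / ML Systems / AI Compiler.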