From b96dd61c6bc83e6c59fd553bd4e83e72b2882b4b Mon Sep 17 00:00:00 2001
From: Junda Chen <32371474+GindaChen@users.noreply.github.com>
Date: Sun, 1 Feb 2026 20:04:08 -0800
Subject: [PATCH] update

---
 docs/机器学习系统/CSE234.en.md | 3 +++
 docs/机器学习系统/CSE234.md    | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/docs/机器学习系统/CSE234.en.md b/docs/机器学习系统/CSE234.en.md
index ac17d0b5..7bc175d2 100644
--- a/docs/机器学习系统/CSE234.en.md
+++ b/docs/机器学习系统/CSE234.en.md
@@ -20,6 +20,7 @@ This course focuses on the design of end-to-end large language model (LLM) syste
 The course can be more accurately divided into three parts (with several additional guest lectures):
 
 Part 1. Foundations: modern deep learning and computational representations
+
 - Modern deep learning and computation graphs (framework and system fundamentals)
 - Automatic differentiation and an overview of ML system architectures
 - Tensor formats, in-depth matrix multiplication, and hardware accelerators
@@ -27,6 +28,7 @@ Part 1. Foundations: modern deep learning and computational representations
 
 
 Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
+
 - GPUs and CUDA (including basic performance models)
 - GPU matrix multiplication and operator-level compilation
 - Triton programming, graph optimization, and compilation
@@ -35,6 +37,7 @@ Part 2. Systems and performance optimization: from GPU kernels to compilation an
 
 
 Part 3. LLM systems: training and inference
+
 - Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
 - LLM fundamentals: Transformers, Attention, and MoE
 - LLM training optimizations (e.g., FlashAttention-style techniques)
diff --git a/docs/机器学习系统/CSE234.md b/docs/机器学习系统/CSE234.md
index e03e665b..e228b9e6 100644
--- a/docs/机器学习系统/CSE234.md
+++ b/docs/机器学习系统/CSE234.md
@@ -22,12 +22,14 @@
 课程可以更准确地分为三个部分(外加若干 guest lecture):
 
 Part 1. 基础:现代深度学习与计算表示
+
 - Modern DL 与计算图(computational graph / framework 基础)
 - Autodiff 与 ML system 架构概览
 - Tensor format、MatMul 深入与硬件加速器(accelerators)
 
 
 Part 2. 系统与性能优化:从 GPU Kernel 到编译与内存
+
 - GPUs & CUDA(含基本性能模型)
 - GPU MatMul 与算子编译(operator compilation)
 - Triton 编程、图优化与编译(graph optimization & compilation)
@@ -36,6 +38,7 @@ Part 2. 系统与性能优化:从 GPU Kernel 到编译与内存
 
 
 Part 3. LLM系统:训练与推理
+
 - 并行策略:模型并行、collective communication、intra-/inter-op、自动并行化
 - LLM 基础:Transformer、Attention、MoE
 - LLM 训练优化:FlashAttention 等