update contents to contain more details

2026-06-21 00:47:16 +08:00 · 2026-01-31 22:32:03 -08:00 · 8e1b6d82b3
commit 8e1b6d82b3
parent 5be29a7378
2 changed files with 69 additions and 28 deletions
--- a/docs/机器学习系统/CSE234.en.md
+++ b/docs/机器学习系统/CSE234.en.md
@@ -1,35 +1,54 @@
 # CSE234: Data Systems for Machine Learning

-## Descriptions
+## Course Overview

- Offered by: UCSD
- Prerequisites: Linear Algebra, Deep Learning, Operating Systems
- Programming Languages: Python, Triton
- Difficulty: 🌟🌟🌟
- Class Hour: 80 hours
+- University: UCSD  
+- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems  
+- Programming Languages: Python, Triton  
+- Difficulty: 🌟🌟🌟  
+- Estimated Workload: ~120 hours  

-<!-- 
-        Introduce the course in a paragraph or two, including but not limited to:
-        (1) The technical knowledge covered in lectures
-        (2) Its differences and features compared to similar courses
-        (3) Your personal experiences and feelings after studying this course
-        (4) Caveats about studying this course on your own (pitfalls, difficulty warnings, etc.)
-        (5) ... ...
+<!-- Introduce the course in one or two paragraphs, including but not limited to:
+     (1) The scope of technical topics covered
+     (2) Its advantages and distinguishing features compared to similar courses
+     (3) Personal learning experience and impressions
+     (4) Caveats and difficulty warnings for self-study
 -->

+This course focuses on the design of end-to-end large language model (LLM) systems, serving as an introductory yet comprehensive guide to building efficient LLM systems in practice.

-This course is focused on designing a wholistic LLM System class as an introduction to design efficient systems for LLM. 
+The course is organized into three main parts (plus several guest lectures):

-The class into three parts, covering the following topics.
+1. Foundations: modern deep learning and computational representations  
+   - Modern deep learning and computation graphs (framework and system basics)  
+   - Automatic differentiation and ML system architecture overview  
+   - Tensor formats, in-depth matrix multiplication, and hardware accelerators  

-1. Basics: deep learning, autodiff, CUDA programming, ML hardware
-2. ML systems and optimizations: Dataflow graph systems, ML compilation, memory and graph optimization, ML parallelism, auto-parallelization
-3. LLM systems: LLM training, data curation, inference and serving, attention optimization, scaling law, RAG, LLM agents
+2. Systems and performance optimization: from GPU kernels to compilation and memory  
+   - GPUs and CUDA (including basic performance models)  
+   - GPU matrix multiplication and operator-level compilation  
+   - Triton programming, graph optimization, and compilation  
+   - Memory management in training and inference  
+   - Quantization techniques and system-level deployment  

+3. LLM systems: training and inference  
+   - Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization  
+   - LLM fundamentals: Transformers, Attention, and MoE  
+   - LLM training optimizations (e.g., FlashAttention-style techniques)  
+   - LLM inference: continuous batching, paged attention, disaggregated prefill/decoding  
+   - Scaling laws, test-time compute and reasoning, and “LLM + X” applications (RAG, search, multimodality, tool use, agents, etc.)
+
+(Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)
+
+The defining characteristic of CSE234 is its focus on LLM systems as the core application setting. The course emphasizes real-world system design trade-offs and engineering constraints rather than stopping at algorithms or API usage. Assignments often require students to confront performance issues directly (such as memory bandwidth, communication overhead, and kernel fusion) and to address them with Triton or system-level optimizations. This makes the course particularly helpful for understanding *why modern LLM systems are designed the way they are*. The learning experience is intensive: a solid background in systems and parallel computing is important, and self-learners are strongly advised to study CUDA, parallel programming, and core systems topics in advance. Otherwise, the learning curve becomes noticeably steep in the later parts of the course, especially around LLM optimization and inference. That said, once you have caught up with the pace, the course offers substantial long-term value for those pursuing work in LLM infrastructure, ML systems, or AI compilers.

 ## Course Resources

- Course Website: https://hao-ai-lab.github.io/cse234-w25/
- Recordings: https://hao-ai-lab.github.io/cse234-w25/
- Textbooks: https://hao-ai-lab.github.io/cse234-w25/resources/
- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/
+- Course Website: https://hao-ai-lab.github.io/cse234-w25/  
+- Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/  
+- Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/  
+- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/  
+
+## Resource Summary
+
+All course materials are released in open-source form. However, the online grading infrastructure and reference solutions for assignments have not been made public.
--- a/docs/机器学习系统/CSE234.md
+++ b/docs/机器学习系统/CSE234.md
@@ -4,10 +4,10 @@
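 ## Course Overview

 - University: UCSD
- Prerequisites: Linear Algebra, Deep Learning, Operating Systems
+- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
 - Programming Languages: Python, Triton
 - Difficulty: 🌟🌟🌟
- Estimated Workload: 80 hours
+- Estimated Workload: 120 hours

 <!-- Introduce the course in one or two paragraphs, including but not limited to:
     (1) The scope of topics covered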
@@ -19,11 +19,30 @@

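 This course is designed as a comprehensive class on large language model (LLM) systems, serving as an introduction to designing efficient LLM systems.

-The course is divided into three parts, covering the following topics:
+The course is best divided into three parts (plus several guest lectures):

-1. Basics: deep learning, automatic differentiation, CUDA programming, ML hardware
-2. ML systems and optimizations: dataflow graph systems, ML compilation, memory and graph optimization, ML parallelism, auto-parallelization
-3. LLM systems: LLM training, data curation, inference and serving, attention optimization, scaling laws, retrieval-augmented generation (RAG), agents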
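+1. Foundations: modern deep learning and computational representations
+   - Modern DL and computation graphs (computational graph / framework basics)
+   - Autodiff and an overview of ML system architecture
+   - Tensor formats, MatMul in depth, and hardware accelerators
+
+2. Systems and performance optimization: from GPU kernels to compilation and memory
+   - GPUs & CUDA (including basic performance models)
+   - GPU MatMul and operator compilation
+   - Triton programming, graph optimization & compilation
+   - Memory (including memory issues and techniques in training and inference)
+   - Quantization (quantization methods and system-level deployment)
+
+3. LLM systems: training and inference
+   - Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, auto-parallelization
+   - LLM fundamentals: Transformer, Attention, MoE
+   - LLM training optimizations: FlashAttention and related techniques
+   - LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
+   - Scaling laws, test-time compute / reasoning, and "LLM + X" (RAG / search / multimodality / tool use / agents, etc.)
+
+(Guest lectures: ML compilers, LLM pretraining / open science, fast inference, tool use & agents, and more, as supplements and extensions.)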
+
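+The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting, emphasizing the trade-offs and engineering constraints of real system design rather than stopping at algorithms or API usage. Assignments usually require confronting performance issues directly (such as memory bandwidth, communication overhead, and kernel fusion) and resolving them with Triton or system-level optimizations, which is very helpful for understanding "why certain LLM systems are designed the way they are". The learning experience is fairly hardcore: the course demands a solid background in systems and parallel computing early on, and self-learners are advised to study CUDA / parallel programming and basic systems knowledge in advance; otherwise the learning curve feels noticeably steep in the second half, especially in the material on LLM optimization and inference. Once you keep up with the pace, however, the course offers strong long-term value for anyone working on LLM Infra / ML Systems / AI Compilers.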


 ## Course Resources
@@ -33,3 +52,6 @@
 - Textbooks: https://hao-ai-lab.github.io/cse234-w25/resources/
 - Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/

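+## Resource Summary
+
+All course materials have been released as open source, but the online grading system and the reference solutions for assignments have not yet been made public.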