From 8e1b6d82b3e29f50d1b71d8fa4a8ab87c4714813 Mon Sep 17 00:00:00 2001
From: Junda Chen <32371474+GindaChen@users.noreply.github.com>
Date: Sat, 31 Jan 2026 22:32:03 -0800
Subject: [PATCH] update contents to contain more details

---
 docs/机器学习系统/CSE234.en.md | 63 ++++++++++++++++++++++------------
 docs/机器学习系统/CSE234.md    | 34 ++++++++++++++----
 2 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/docs/机器学习系统/CSE234.en.md b/docs/机器学习系统/CSE234.en.md
index c37e67f6..e62d844e 100644
--- a/docs/机器学习系统/CSE234.en.md
+++ b/docs/机器学习系统/CSE234.en.md
@@ -1,35 +1,54 @@
 # CSE234: Data Systems for Machine Learning
 
-## Descriptions
+## Course Overview
 
-- Offered by: UCSD
-- Prerequisites: Linear Algebra, Deep Learning, Operating Systems
-- Programming Languages: Python, Triton
-- Difficulty: 🌟🌟🌟
-- Class Hour: 80 hours
+- University: UCSD
+- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
+- Programming Languages: Python, Triton
+- Difficulty: 🌟🌟🌟
+- Estimated Workload: ~120 hours
 
-
+This course focuses on the design of end-to-end large language model (LLM) systems, serving as an introductory yet comprehensive guide to building efficient LLM systems in practice.
 
-This course is focused on designing a wholistic LLM System class as an introduction to design efficient systems for LLM.
+The course can be more accurately divided into three main parts (with several additional guest lectures):
 
-The class into three parts, covering the following topics.
+1. Foundations: modern deep learning and computational representations
+    - Modern deep learning and computation graphs (framework and system basics)
+    - Automatic differentiation and ML system architecture overview
+    - Tensor formats, in-depth matrix multiplication, and hardware accelerators
 
-1. Basics: deep learning, autodiff, CUDA programming, ML hardware
-2. ML systems and optimizations: Dataflow graph systems, ML compilation, memory and graph optimization, ML parallelism, auto-parallelization
-3. LLM systems: LLM training, data curation, inference and serving, attention optimization, scaling law, RAG, LLM agents
+2. Systems and performance optimization: from GPU kernels to compilation and memory
+    - GPUs and CUDA (including basic performance models)
+    - GPU matrix multiplication and operator-level compilation
+    - Triton programming, graph optimization, and compilation
+    - Memory management in training and inference
+    - Quantization techniques and system-level deployment
+3. LLM systems: training and inference
+    - Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
+    - LLM fundamentals: Transformers, Attention, and MoE
+    - LLM training optimizations (e.g., FlashAttention-style techniques)
+    - LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
+    - Scaling laws, test-time compute and reasoning, and “LLM + X” applications (RAG, search, multimodality, tool use, agents, etc.)
+
+(Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)
+
+The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting. The course emphasizes real-world system design trade-offs and engineering constraints, rather than remaining at the level of algorithms or API usage. Assignments often require students to directly confront performance bottlenecks—such as memory bandwidth limitations, communication overheads, and kernel fusion—and address them through Triton or system-level optimizations. This makes the course particularly helpful for understanding *why modern LLM systems are designed the way they are*. The learning experience is overall quite intensive: a solid background in systems and parallel computing is important, and for self-study it is strongly recommended to prepare CUDA, parallel programming, and core systems knowledge in advance. Otherwise, the learning curve becomes noticeably steep in the later parts of the course, especially around LLM optimization and inference. That said, once the pace is manageable, the course offers substantial long-term value for those pursuing work in LLM infrastructure, ML systems, or AI compilers.
 
 ## Course Resources
 
-- Course Website: https://hao-ai-lab.github.io/cse234-w25/
-- Recordings: https://hao-ai-lab.github.io/cse234-w25/
-- Textbooks: https://hao-ai-lab.github.io/cse234-w25/resources/
-- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/
+- Course Website: https://hao-ai-lab.github.io/cse234-w25/
+- Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/
+- Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/
+- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/
+
+## Resource Summary
+
+All course materials are released in open-source form. However, the online grading infrastructure and reference solutions for assignments have not been made public.
\ No newline at end of file
diff --git a/docs/机器学习系统/CSE234.md b/docs/机器学习系统/CSE234.md
index 15c94822..c88cdc78 100644
--- a/docs/机器学习系统/CSE234.md
+++ b/docs/机器学习系统/CSE234.md
@@ -4,10 +4,10 @@
 ## Course Overview
 
 - University: UCSD
-- Prerequisites: Linear Algebra, Deep Learning, Operating Systems
+- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
 - Programming Languages: Python, Triton
 - Difficulty: 🌟🌟🌟
-- Estimated Workload: 80 hours
+- Estimated Workload: 120 hours