Mirror of https://github.com/PKUFlyingPig/cs-self-learning.git, synced 2026-06-22 01:17:17 +08:00
update
parent c0d6bac60f
commit cefe6c8b13
2 changed files with 33 additions and 26 deletions
@@ -20,23 +20,27 @@ This course focuses on the design of end-to-end large language model (LLM) syste
The course can be more accurately divided into three parts (with several additional guest lectures):
Part 1. Foundations: modern deep learning and computational representations
- Modern deep learning and computation graphs (framework and system fundamentals)
- Automatic differentiation and an overview of ML system architectures
- Tensor formats, in-depth matrix multiplication, and hardware accelerators
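A small sketch can make the automatic differentiation topic in Part 1 concrete. This is a minimal reverse-mode example that assumes scalar values and supports only `+` and `*`. The `Value` class and its methods are hypothetical names chosen for illustration. They are not the course's framework or any library's API. Each operation is recorded as a node in a computation graph, and the chain rule is then applied while walking the graph in reverse topological order.

```python
class Value:
    """A scalar node in a computation graph (hypothetical, for illustration)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad into the parents' .grad

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b, d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Post-order DFS puts every node after its inputs; reversing it
        # guarantees a node's gradient is complete before it is propagated.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(output)/d(output)
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


x, y = Value(3.0), Value(4.0)
z = x * y + x          # z = xy + x
z.backward()
print(x.grad, y.grad)  # 5.0 3.0  (dz/dx = y + 1, dz/dy = x)
```

Gradients are accumulated with `+=` instead of assigned. This matters when a value feeds several operations, as `x` does here, because its gradient is the sum over all paths through the graph.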
Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- GPUs and CUDA (including basic performance models)
- GPU matrix multiplication and operator-level compilation
- Triton programming, graph optimization, and compilation
- Memory management (including practical issues and techniques in training and inference)
- Quantization methods and system-level deployment
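Here is a minimal sketch for the quantization topic: symmetric per-tensor int8 quantization. It assumes a single scale derived from the largest absolute value (absmax / 127) and a zero point of 0. The function names are hypothetical. Real systems often use per-channel or per-group scales instead, so this only illustrates the basic round-and-rescale idea.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8: one shared scale, zero point fixed at 0."""
    absmax = max((abs(v) for v in values), default=0.0)
    scale = absmax / 127.0 if absmax > 0 else 1.0  # avoid dividing by zero
    # Round to the nearest code and clamp to the symmetric range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.031]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_err)
```

The round-trip error per element is at most half a quantization step (`scale / 2`). This is why a single large outlier costs precision for every other value: it inflates `scale` for the whole tensor.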
Part 3. LLM systems: training and inference
- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
- LLM fundamentals: Transformers, Attention, and MoE
- LLM training optimizations (e.g., FlashAttention-style techniques)
- LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
- Scaling laws
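Paged attention, listed under LLM inference, stores each sequence's KV cache in fixed-size blocks that need not be contiguous. A per-sequence block table tracks where those blocks live. The toy sketch below models only that bookkeeping, with no attention math and no KV tensors. `PagedKVCache`, `append_token`, and `free` are hypothetical names, not vLLM's or any other library's API.

```python
class PagedKVCache:
    """Toy block-table bookkeeping for a paged KV cache (hypothetical)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size                # tokens per block
        self.free_blocks = list(range(num_blocks))  # physical block pool
        self.block_tables = {}                      # seq_id -> [physical block ids]
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or last block is full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())  # take from the end of the pool
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    print(cache.append_token("req-0"))  # (3, 0), then (3, 1), then (2, 0)
print(cache.block_tables["req-0"])       # [3, 2]: two blocks hold three tokens
cache.free("req-0")                      # blocks go back to the pool
```

In this model, a finished request returns its blocks to the shared pool right away. A continuous-batching scheduler could therefore admit a new request as soon as enough blocks are free, rather than reserving memory for a maximum sequence length up front.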
(Guest lectures cover complementary topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents.)
@@ -22,23 +22,26 @@
The course can be more accurately divided into three parts (plus several guest lectures):
Part 1. Foundations: modern deep learning and computational representations
- Modern DL and computation graphs (computational graph / framework fundamentals)
- Autodiff and an overview of ML system architecture
- Tensor formats, an in-depth look at MatMul, and hardware accelerators
Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- GPUs & CUDA (including basic performance models)
- GPU MatMul and operator compilation
- Triton programming, graph optimization & compilation
- Memory (including memory issues and techniques in training/inference)
- Quantization (quantization methods and their deployment in real systems)
Part 3. LLM systems: training and inference
- Parallelization strategies: model parallelism, collective communication, intra-/inter-op, auto-parallelization
- LLM fundamentals: Transformer, Attention, MoE
- LLM training optimizations: FlashAttention, etc.
- LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
- Scaling law
(Guest lectures: ML compiler, LLM pretraining/open science, fast inference, tool use & agents, etc., as supplements and extensions.)