Mirror of https://github.com/PKUFlyingPig/cs-self-learning.git
Synced 2026-02-03 02:24:53 +08:00
[COURSE] Add UCSD CSE234 Data Systems for Machine Learning (#713)
* Add CSE234 UCSD
* update contents to contain more details
* update extended materials
* Update mkdocs.yml
* change a bit
* make it better
* update
* update
75 docs/机器学习系统/CSE234.en.md Normal file
@@ -0,0 +1,75 @@
# CSE234: Data Systems for Machine Learning

## Course Overview

- University: UCSD
- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
- Programming Languages: Python, Triton
- Difficulty: 🌟🌟🌟
- Estimated Workload: ~120 hours

<!-- Introduce the course in one or two paragraphs, including but not limited to:
(1) The scope of technical topics covered
(2) Its distinguishing features compared to similar courses
(3) Personal learning experience and impressions
(4) Caveats and difficulty warnings for self-study
-->

This course focuses on the design of end-to-end large language model (LLM) systems and serves as an introduction to building efficient LLM systems in practice.

The course divides naturally into three parts (plus several guest lectures):

Part 1. Foundations: modern deep learning and computational representations

- Modern deep learning and computation graphs (framework and system fundamentals)
- Automatic differentiation and an overview of ML system architectures (a minimal sketch follows this list)
- Tensor formats, in-depth matrix multiplication, and hardware accelerators
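
The automatic differentiation topic above is essentially what micrograd demonstrates. As a rough orientation, here is a minimal reverse-mode autodiff sketch over scalars; the `Value` class and its fields are illustrative and not taken from the course materials.

```python
# Minimal reverse-mode automatic differentiation, in the spirit of micrograd.
# The Value class below is an illustrative sketch, not course-provided code.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                  # scalar payload
        self.grad = 0.0                   # accumulated dL/dself
        self._parents = parents           # nodes this value depends on
        self._local_grads = local_grads   # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically sort the computation graph, then apply the chain rule.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, g in zip(v._parents, v._local_grads):
                p.grad += g * v.grad

x = Value(2.0); y = Value(3.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # 4.0 (= y + 1), 2.0 (= x)
```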

Part 2. Systems and performance optimization: from GPU kernels to compilation and memory

- GPUs and CUDA (including basic performance models)
- GPU matrix multiplication and operator-level compilation
- Triton programming, graph optimization, and compilation (a minimal kernel sketch follows this list)
- Memory management (including practical issues and techniques in training and inference)
- Quantization methods and system-level deployment
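
To give a feel for the Triton programming model taught in this part, below is a minimal element-wise addition kernel in the style of the official Triton tutorials. It assumes the `triton` and `torch` packages and a CUDA-capable GPU, and is only a sketch, not an assignment solution.

```python
# Minimal Triton kernel sketch: element-wise vector addition.
# Assumes triton + torch are installed and a CUDA GPU is available.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                   # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```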

Part 3. LLM systems: training and inference

- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
- LLM fundamentals: Transformers, Attention, and MoE (a plain attention reference follows this list)
- LLM training optimizations (e.g., FlashAttention-style techniques)
- LLM inference: continuous batching, paged attention, and disaggregated prefill/decoding
- Scaling laws
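
For orientation on the LLM fundamentals in this part, here is plain scaled-dot-product attention in PyTorch. The shapes and tensor names are illustrative; FlashAttention-style kernels compute the same result with tiling so the full score matrix is never materialized.

```python
# Plain scaled-dot-product attention for reference; shapes and names are
# illustrative. FlashAttention computes the same output without building
# the full (seq, seq) score matrix in GPU memory.
import math
import torch

def attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    weights = torch.softmax(scores, dim=-1)   # (batch, heads, seq, seq)
    return weights @ v                        # (batch, heads, seq, head_dim)

q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
print(attention(q, k, v).shape)  # torch.Size([1, 8, 128, 64])
```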

(Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)

The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting. The course emphasizes real-world system design trade-offs and engineering constraints rather than stopping at algorithms or API usage. Assignments often require students to confront performance bottlenecks directly, such as memory bandwidth limits, communication overheads, and missed kernel fusion, and to resolve them through Triton or system-level optimizations. The overall experience is fairly intensive, and a solid background in systems and parallel computing matters. For self-study, it is strongly recommended to brush up on CUDA, parallel programming, and core systems knowledge in advance; otherwise the learning curve steepens noticeably in the later parts of the course, especially around LLM optimization and inference. Once the pace is manageable, though, the course offers strong long-term value for anyone pursuing LLM infrastructure, ML systems, or AI compilers.
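
As a back-of-the-envelope example of the bottleneck analysis the assignments push you toward, the sketch below estimates arithmetic intensity to judge whether an operation is compute-bound or memory-bound. The peak-throughput and bandwidth numbers are illustrative assumptions, not figures from the course.

```python
# Roofline-style back-of-the-envelope check: compute-bound or memory-bound?
# PEAK_FLOPS and PEAK_BW below are illustrative hardware assumptions.
PEAK_FLOPS = 300e12            # assumed peak throughput, FLOP/s
PEAK_BW = 2e12                 # assumed memory bandwidth, bytes/s
RIDGE = PEAK_FLOPS / PEAK_BW   # FLOP/byte needed to become compute-bound

def matmul_intensity(m, n, k, bytes_per_el=2):
    flops = 2 * m * n * k                            # multiply-adds
    bytes_moved = bytes_per_el * (m * k + k * n + m * n)
    return flops / bytes_moved

def elementwise_intensity(n, bytes_per_el=2):
    return n / (bytes_per_el * 2 * n)                # 1 FLOP per element, read + write

print(matmul_intensity(4096, 4096, 4096))  # ~1365 FLOP/byte -> compute-bound
print(elementwise_intensity(1 << 20))      # 0.25 FLOP/byte  -> memory-bound
print(RIDGE)                               # 150 under these assumptions
```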

## Recommended Learning Path

The course itself is well-structured and progressive. However, for students without prior experience in systems and parallel computing, the transition into the second part may feel somewhat steep. The core of the course is spending significant time implementing and optimizing systems in practice, so it is highly recommended to explore relevant open-source projects on GitHub while reading the papers, and to implement the related systems or kernels yourself to deepen understanding.

- Foundations: consider studying alongside open-source projects such as [micrograd](https://github.com/karpathy/micrograd)
- Systems & performance optimization and LLM systems: consider pairing with projects such as [nanoGPT](https://github.com/karpathy/nanoGPT) and [nano-vllm](https://github.com/GeeeekExplorer/nano-vllm)

The course website provides a curated list of additional references and materials: [Book-related documentation and courses](https://hao-ai-lab.github.io/cse234-w25/resources/#book-related-documentation-and-courses)

## Course Resources

- Course Website: https://hao-ai-lab.github.io/cse234-w25/
- Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/
- Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/
- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/

## Resource Summary

All course materials are released in open-source form. However, the online grading infrastructure and reference solutions for the assignments have not been made public.

## Additional Resources / Further Reading

- [GPUMode](https://www.youtube.com/@GPUMODE): offers in-depth explanations of GPU kernels and systems. Topics referenced in the course, such as [DistServe](https://www.youtube.com/watch?v=tIPDwUepXcA), [FlashAttention](https://www.youtube.com/watch?v=VPslgC9piIw), and [Triton](https://www.youtube.com/watch?v=njgow_zaJMw), all have excellent extended talks available.

77 docs/机器学习系统/CSE234.md Normal file
@@ -0,0 +1,77 @@
# CSE234: Data Systems for Machine Learning

## Course Overview

- University: UCSD
- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
- Programming Languages: Python, Triton
- Difficulty: 🌟🌟🌟
- Estimated Workload: ~120 hours

<!-- Introduce the course in one or two paragraphs, including but not limited to:
(1) The scope of topics covered
(2) Its advantages and distinguishing features compared to similar courses
(3) The experience of taking the course
(4) Caveats for self-study (pitfalls, difficulty warnings, etc.)
(5) ...
-->

This course offers a comprehensive treatment of large language model (LLM) systems and serves as an introduction to designing efficient LLM systems.

The course divides naturally into three parts (plus several guest lectures):

Part 1. Foundations: modern deep learning and computational representations

- Modern DL and computation graphs (computational graph / framework fundamentals)
- Autodiff and an overview of ML system architectures
- Tensor formats, an in-depth look at MatMul, and hardware accelerators

Part 2. Systems and performance optimization: from GPU kernels to compilation and memory

- GPUs & CUDA (including basic performance models)
- GPU MatMul and operator compilation
- Triton programming, graph optimization, and compilation
- Memory (including memory issues and techniques in training/inference)
- Quantization (methods and system-level deployment; a minimal sketch follows this list)
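
For the quantization topic above, here is a minimal symmetric int8 weight-quantization sketch. The per-tensor scale granularity and the helper names are illustrative assumptions, not the course's reference method.

```python
# Minimal symmetric int8 quantization sketch (per-tensor scale); the
# granularity and names are illustrative, not the course's reference method.
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                 # map max |w| onto the int8 range
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(1024, 1024)
q, scale = quantize_int8(w)
print((dequantize(q, scale) - w).abs().max())     # quantization error, roughly scale/2
```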

Part 3. LLM systems: training and inference

- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization (a toy illustration follows this list)
- LLM fundamentals: Transformer, Attention, MoE
- LLM training optimizations: FlashAttention and related techniques
- LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
- Scaling laws
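
To make the parallelization strategies above concrete, here is a toy column-wise tensor-parallel MatMul simulated on a single device. The two-way split is an illustrative assumption; real systems place each shard on a different GPU and gather the partial outputs with collective communication.

```python
# Toy illustration of column-wise tensor parallelism, simulated on one device.
# Real systems put each shard on a separate GPU and use an all-gather
# collective; the two-way split here is only an illustration.
import torch

x = torch.randn(8, 512)             # activations, replicated on every "device"
w = torch.randn(512, 1024)          # weight matrix to be sharded column-wise

w_shards = torch.chunk(w, 2, dim=1)             # each "device" holds half the columns
partial_outputs = [x @ shard for shard in w_shards]
y_parallel = torch.cat(partial_outputs, dim=1)  # the all-gather step, done locally here

assert torch.allclose(y_parallel, x @ w, atol=1e-5)
```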

(Guest lectures: ML compilers, LLM pretraining / open science, fast inference, tool use & agents, and more, serving as complementary extensions.)

The defining characteristic of CSE234 is its tight focus on LLM systems as the core application setting, emphasizing the trade-offs and engineering constraints of real system design rather than stopping at algorithms or API usage. Assignments typically require confronting performance bottlenecks directly (such as memory bandwidth, communication overhead, and kernel fusion) and resolving them with Triton or system-level optimizations, which is very helpful for understanding why certain LLM systems are designed the way they are. The overall experience is fairly hardcore and demands a solid background in systems and parallel computing early on. For self-study, it is advisable to fill in CUDA, parallel programming, and basic systems knowledge in advance; otherwise the second half of the course (especially the LLM optimization and inference material) will feel noticeably steep. Once you keep up with the pace, however, the course offers strong long-term value for anyone heading toward LLM infrastructure, ML systems, or AI compilers.

## Recommended Learning Path

The course itself is fairly progressive, but students without a background in systems and parallel computing may find the second part somewhat steep. The core of the course is spending a lot of time implementing and optimizing systems by hand, so while reading the papers it is recommended to find related open-source projects on GitHub and implement the corresponding systems or kernels yourself to deepen understanding.

- Foundations: consider studying alongside open-source projects such as [micrograd](https://github.com/karpathy/micrograd)
- Systems & performance optimization and LLM systems: consider pairing with open-source projects such as [nanoGPT](https://github.com/karpathy/nanoGPT) and [nano-vllm](https://github.com/GeeeekExplorer/nano-vllm)

The course page itself provides additional references and resources: [Book related documentation and courses](https://hao-ai-lab.github.io/cse234-w25/resources/#book-related-documentation-and-courses)

## Course Resources

- Course Website: https://hao-ai-lab.github.io/cse234-w25/
- Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/
- Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/
- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/

## Resource Summary

All course materials have open-source releases, but the online grading infrastructure and reference solutions for the assignments have not been made public.

## Additional Resources / Further Reading

- [GPUMode](https://www.youtube.com/@GPUMODE): offers many in-depth explanations of GPU kernels and systems. Topics mentioned in the course, including [DistServe](https://www.youtube.com/watch?v=tIPDwUepXcA), [FlashAttention](https://www.youtube.com/watch?v=VPslgC9piIw), and [Triton](https://www.youtube.com/watch?v=njgow_zaJMw), all have excellent extended talks.
@@ -276,6 +276,7 @@ nav:
      - "CMU 10-414/714: Deep Learning Systems": "机器学习系统/CMU10-414.md"
      - "MIT6.5940: TinyML and Efficient Deep Learning Computing": "机器学习系统/EML.md"
      - "Machine Learning Compilation": "机器学习系统/MLC.md"
      - "UCSD CSE234: Data Systems for Machine Learning": "机器学习系统/CSE234.md"
  - 深度学习:
      - "Coursera: Deep Learning": "深度学习/CS230.md"
      - "国立台湾大学: 李宏毅机器学习": "深度学习/LHY.md"