[COURSE] Add UCSD CSE234 Data Systems for Machine Learning (#713)

* Add CSE234 UCSD

* update contents to contain more details

* update extended materials

* Update mkdocs.yml

* change a bit

* make it better

* update

* update
Author: Junda Chen
Date: 2026-02-01 20:07:11 -08:00 (committed by GitHub)
Parent: fbf8f26a2b
Commit: 9519b3a667
3 changed files with 153 additions and 0 deletions


@@ -0,0 +1,75 @@
# CSE234: Data Systems for Machine Learning
## Course Overview
- University: UCSD
- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
- Programming Languages: Python, Triton
- Difficulty: 🌟🌟🌟
- Estimated Workload: ~120 hours
<!-- Introduce the course in one or two paragraphs, including but not limited to:
(1) The scope of technical topics covered
(2) Its distinguishing features compared to similar courses
(3) Personal learning experience and impressions
(4) Caveats and difficulty warnings for self-study
-->
This course covers the design of end-to-end large language model (LLM) systems and serves as a practical introduction to building efficient LLM systems.
More precisely, the course divides into three parts (plus several guest lectures):
Part 1. Foundations: modern deep learning and computational representations
- Modern deep learning and computation graphs (framework and system fundamentals)
- Automatic differentiation and an overview of ML system architectures (a minimal autodiff sketch follows this outline)
- Tensor formats, in-depth matrix multiplication, and hardware accelerators
Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- GPUs and CUDA (including basic performance models)
- GPU matrix multiplication and operator-level compilation
- Triton programming, graph optimization, and compilation
- Memory management (including practical issues and techniques in training and inference)
- Quantization methods and system-level deployment
Part 3. LLM systems: training and inference
- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
- LLM fundamentals: Transformers, Attention, and MoE
- LLM training optimizations (e.g., FlashAttention-style techniques; an online-softmax sketch follows this outline)
- LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
- Scaling laws
(Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)
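Part 1's automatic-differentiation material maps directly onto tiny educational projects like micrograd (recommended below). As a taste, here is a minimal reverse-mode sketch; the `Value` class and its methods are illustrative, not taken from the course assignments:

```python
# Minimal reverse-mode autodiff sketch (micrograd-style); names are illustrative.
class Value:
    def __init__(self, data, parents=(), backward_fn=lambda: None):
        self.data = data          # scalar payload
        self.grad = 0.0           # accumulated dL/dself
        self._parents = parents   # nodes this value depends on
        self._backward = backward_fn

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then propagate gradients in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(3.0)
loss = a * b + a       # dloss/da = b + 1 = 4, dloss/db = a = 2
loss.backward()
print(a.grad, b.grad)  # 4.0 2.0
```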
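Likewise, the core trick behind the FlashAttention-style techniques in Part 3 fits in a few lines: an online softmax that streams over the scores with a running max and normalizer, so the full score vector never has to be materialized at once. This is a generic sketch of the standard algorithm, not course code:

```python
# Online (streaming) softmax: process scores chunk by chunk while tracking a
# running max m and normalizer s; the key trick in FlashAttention-style kernels.
import math

def online_softmax(scores, chunk=4):
    m, s = float("-inf"), 0.0
    for i in range(0, len(scores), chunk):
        c = scores[i:i + chunk]
        m_new = max(m, max(c))
        # Rescale the old normalizer to the new max, then add the chunk's terms.
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in c)
        m = m_new
    return [math.exp(x - m) / s for x in scores]

probs = online_softmax([1.0, 2.0, 3.0, 4.0, 5.0])
print(sum(probs))  # ~1.0, matches an ordinary softmax
```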
The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting. The course emphasizes real-world system design trade-offs and engineering constraints, rather than remaining at the level of algorithms or API usage. Assignments often require students to directly confront performance bottlenecks—such as memory bandwidth limitations, communication overheads, and kernel fusion—and address them through Triton or system-level optimizations. Overall, the learning experience is fairly intensive: a solid background in systems and parallel computing is important. For self-study, it is strongly recommended to prepare CUDA, parallel programming, and core systems knowledge in advance; otherwise, the learning curve becomes noticeably steep in the later parts of the course, especially around LLM optimization and inference. That said, once the pace is manageable, the course offers strong long-term value for those pursuing work in LLM infrastructure, ML systems, or AI compilers.
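As a flavor of that kind of assignment work, here is a generic Triton kernel that fuses an elementwise add and ReLU into a single pass over memory; it assumes a CUDA device with `torch` and `triton` installed, and is a sketch rather than actual assignment code:

```python
# Fused add + ReLU in one memory pass instead of two; a generic sketch.
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements            # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    z = x + y
    tl.store(out_ptr + offsets, tl.maximum(z, 0.0), mask=mask)  # ReLU fused in

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(fused_add_relu(x, y), torch.relu(x + y))
```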
## Recommended Learning Path
The course itself is relatively well-structured and progressive. However, for students without prior experience in systems and parallel computing, the transition into the second part of the course may feel somewhat steep. A key aspect of this course is spending significant time implementing and optimizing systems in practice. Therefore, it is highly recommended to explore relevant open-source projects on GitHub while reading papers, and to implement related systems or kernels hands-on to deepen understanding.
- Foundations: consider studying alongside open-source projects such as [micrograd](https://github.com/karpathy/micrograd)
- Systems & performance optimization and LLM systems: consider pairing with projects such as [nanoGPT](https://github.com/karpathy/nanoGPT) and [nano-vllm](https://github.com/GeeeekExplorer/nano-vllm) (a toy paged-KV-cache sketch follows below)
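To preview the bookkeeping that paged attention (and projects like nano-vllm) perform, the toy block table below maps logical token positions to fixed-size physical KV blocks; every name and size here is invented for illustration:

```python
# Toy paged KV cache: logical token positions -> fixed-size physical blocks.
# All names/sizes are illustrative; real systems (e.g. vLLM) add ref-counting,
# copy-on-write sharing, and GPU-side gather kernels.
BLOCK_SIZE = 16  # tokens per KV block

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # physical block pool
        self.block_tables = {}                      # seq_id -> [physical block ids]
        self.seq_lens = {}                          # seq_id -> tokens written

    def append_token(self, seq_id):
        """Reserve space for one more token; allocate a block on a boundary."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:                # current block full (or none yet)
            table.append(self.free_blocks.pop())    # grab a free physical block
        self.seq_lens[seq_id] = length + 1
        block, offset = table[length // BLOCK_SIZE], length % BLOCK_SIZE
        return block, offset                        # where this token's K/V live

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(20):                  # 20 tokens -> 2 blocks for seq 0
    cache.append_token(seq_id=0)
print(cache.block_tables[0])         # two physical block ids
cache.free(0)                        # blocks recycled, no large copy
```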
The course website itself provides a curated list of additional references and materials, which can be found here:
[Book-related documentation and courses](https://hao-ai-lab.github.io/cse234-w25/resources/#book-related-documentation-and-courses)
## Course Resources
- Course Website: https://hao-ai-lab.github.io/cse234-w25/
- Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/
- Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/
- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/
## Resource Summary
All course materials are released in open-source form. However, the online grading infrastructure and reference solutions for assignments have not been made public.
## Additional Resources / Further Reading
- [GPUMode](https://www.youtube.com/@GPUMODE): offers in-depth explanations of GPU kernels and systems. Topics referenced in the course—such as [DistServe](https://www.youtube.com/watch?v=tIPDwUepXcA), [FlashAttention](https://www.youtube.com/watch?v=VPslgC9piIw), and [Triton](https://www.youtube.com/watch?v=njgow_zaJMw)—all have excellent extended talks available.


@@ -0,0 +1,77 @@
# CSE234: Data Systems for Machine Learning
## Course Overview
- University: UCSD
- Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
- Programming Languages: Python, Triton
- Difficulty: 🌟🌟🌟
- Estimated Workload: ~120 hours
<!-- Introduce the course in one or two paragraphs, including but not limited to:
(1) The scope of topics covered
(2) Its strengths and distinguishing features compared with similar courses
(3) The experience and impressions of taking the course
(4) Caveats for self-study: pitfalls encountered, difficulty warnings, etc.
(5) ...
-->
This course focuses on end-to-end large language model (LLM) systems and serves as an introduction to designing efficient LLM systems in practice.
More precisely, the course divides into three parts (plus several guest lectures):
Part 1. Foundations: modern deep learning and computational representations
- Modern deep learning and computation graphs (framework fundamentals)
- Automatic differentiation and an overview of ML system architectures
- Tensor formats, matrix multiplication in depth, and hardware accelerators
Part 2. Systems and performance optimization: from GPU kernels to compilation and memory
- GPUs and CUDA (including basic performance models)
- GPU matrix multiplication and operator compilation
- Triton programming, graph optimization, and compilation
- Memory (including memory issues and techniques in training/inference)
- Quantization (methods and system-level deployment; see the sketch after this outline)
Part 3. LLM systems: training and inference
- Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
- LLM fundamentals: Transformer, Attention, and MoE
- LLM training optimizations (FlashAttention, etc.)
- LLM inference: continuous batching, paged attention, disaggregated prefill/decoding
- Scaling laws
(Guest lectures: ML compilers, LLM pretraining/open science, fast inference, tool use & agents, etc., as complementary extensions.)
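To make Part 2's quantization bullet concrete, here is a minimal per-tensor symmetric int8 quantize/dequantize round trip in PyTorch; the scale rule and function names are illustrative rather than a recipe from the course:

```python
# Minimal per-tensor symmetric int8 quantization sketch; the scale choice and
# names are illustrative, not a production recipe.
import torch

def quantize_int8(x: torch.Tensor):
    scale = x.abs().max() / 127.0            # one scale for the whole tensor
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale       # recover an approximation of x

x = torch.randn(4, 4)
q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)
print((x - x_hat).abs().max())               # small quantization error
```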
The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting, emphasizing the trade-offs and engineering constraints of real system design rather than stopping at algorithms or API usage. Assignments typically require confronting performance bottlenecks directly (memory bandwidth, communication overhead, kernel fusion, and so on) and resolving them through Triton or system-level optimizations, which is very helpful for understanding why certain LLM system designs look the way they do. The learning experience is fairly hardcore overall: the course demands a solid background in systems and parallel computing early on, so self-learners should fill in CUDA, parallel programming, and core systems knowledge in advance; otherwise the learning curve becomes noticeably steep in the second half, especially around LLM optimization and inference. Once you keep up with the pace, however, the course offers strong long-term value for anyone heading into LLM infrastructure, ML systems, or AI compilers.
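Since communication overhead recurs throughout the parallelization material, a back-of-the-envelope model is useful; the 2(p-1)/p traffic term below is the standard ring all-reduce bandwidth cost, while the concrete sizes are invented for illustration:

```python
# Back-of-the-envelope ring all-reduce time: each of the p ranks sends the
# tensor twice in (p-1)/p sized chunks (reduce-scatter + all-gather).
# The numbers below are invented for illustration.
def ring_allreduce_seconds(tensor_bytes, num_ranks, link_bandwidth_bytes_per_s):
    traffic = 2 * (num_ranks - 1) / num_ranks * tensor_bytes
    return traffic / link_bandwidth_bytes_per_s

grad_bytes = 7e9 * 2          # e.g. a 7B-parameter model's fp16 gradients
print(ring_allreduce_seconds(grad_bytes, num_ranks=8,
                             link_bandwidth_bytes_per_s=100e9))  # ~0.245 s
```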
## Recommended Learning Path
The course itself is fairly progressive, but students without a systems and parallel-computing background may find the second part somewhat steep. The real core of the course is spending a lot of time implementing and optimizing systems hands-on, so while reading the papers it is worth finding related open-source projects on GitHub and implementing the corresponding systems or kernels yourself to deepen understanding.
- Foundations: consider studying alongside open-source projects such as [micrograd](https://github.com/karpathy/micrograd)
- Systems & performance optimization and LLM systems: consider pairing with open-source projects such as [nanoGPT](https://github.com/karpathy/nanoGPT) and [nano-vllm](https://github.com/GeeeekExplorer/nano-vllm) (a toy continuous-batching loop follows below)
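As a taste of what continuous batching buys over static batching, the toy scheduler loop below lets finished sequences leave the batch and queued requests join between decode steps; everything here is invented for illustration and is far simpler than a real engine such as nano-vllm:

```python
# Toy continuous batching loop: finished sequences exit and waiting requests
# join between decode steps, so the batch stays full. Purely illustrative.
from collections import deque

def continuous_batching(requests, max_batch=4):
    waiting = deque(requests)      # (request_id, tokens_to_generate)
    running, step = [], 0
    while waiting or running:
        # Admit new requests into any free batch slots.
        while waiting and len(running) < max_batch:
            running.append(list(waiting.popleft()))
        step += 1                  # one decode step for the whole batch
        for seq in running:
            seq[1] -= 1            # each running sequence emits one token
        done = [seq for seq in running if seq[1] == 0]
        for seq in done:
            print(f"step {step}: request {seq[0]} finished")
            running.remove(seq)    # its slot is reused immediately
    return step

# Short requests are not stuck behind long ones, unlike static batching.
continuous_batching([("a", 2), ("b", 8), ("c", 2), ("d", 3), ("e", 1)])
```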
The course page itself provides further references and resources: [Book related documentation and courses](https://hao-ai-lab.github.io/cse234-w25/resources/#book-related-documentation-and-courses)
## Course Resources
- Course Website: https://hao-ai-lab.github.io/cse234-w25/
- Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/
- Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/
- Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/
## Resource Summary
All course materials are released in open-source form, but the online grading infrastructure and the assignments' reference solutions have not been open-sourced.
## Additional Resources / Further Reading
- [GPUMode](https://www.youtube.com/@GPUMODE): offers many in-depth talks on GPU kernels and systems. Topics touched on in the course, including [DistServe](https://www.youtube.com/watch?v=tIPDwUepXcA), [FlashAttention](https://www.youtube.com/watch?v=VPslgC9piIw), and [Triton](https://www.youtube.com/watch?v=njgow_zaJMw), all have excellent extended talks.


@@ -276,6 +276,7 @@ nav:
- "CMU 10-414/714: Deep Learning Systems": "机器学习系统/CMU10-414.md"
- "MIT6.5940: TinyML and Efficient Deep Learning Computing": "机器学习系统/EML.md"
- "Machine Learning Compilation": "机器学习系统/MLC.md"
- "UCSD CSE234: Data Systems for Machine Learning": "机器学习系统/CSE234.md"
- 深度学习:
- "Coursera: Deep Learning": "深度学习/CS230.md"
- "国立台湾大学: 李宏毅机器学习": "深度学习/LHY.md"