中文 | English


Machine Learning Systems: Design and Implementation

An open-source book explaining the design principles and implementation experience of modern machine learning systems, covering the complete technology stack from programming interfaces and computational graphs to compilers and distributed training.

English version 1 (stable): openmlsys.github.io/html-en/

English version 2: Under reconstruction.

Table of Contents

Target Audience

  • Students: Those who have mastered machine learning fundamentals and want to deeply understand the design and implementation of modern ML systems.
  • Researchers: Those who need to develop custom operators or leverage distributed execution for large model development.
  • Engineers: Those who are responsible for building ML infrastructure and need to tune system performance or customize ML systems for business needs.

Content Overview

The book (2nd edition) consists of 9 chapters:

| Chapter | Content |
| --- | --- |
| Chapter 1: Introduction | Overview of ML system architecture and technology stack |
| Chapter 2: Programming Interfaces and Computational Graphs | Tensor abstraction, automatic differentiation, graph representation and execution |
| Chapter 3: AI Accelerators and Programming | GPU architecture and CUDA/Triton/CUTLASS programming models |
| Chapter 4: AI Compilers and Runtime Systems | IR design, graph optimization, kernel generation, and runtime execution |
| Chapter 5: Data Processing Systems | Data loading, data pipelines, and distributed data processing |
| Chapter 6: Training Systems | Single-node and distributed training, parallelism strategies, and training optimization |
| Chapter 7: Model Serving | Inference optimization, online serving, and model management |
| Chapter 8: RL Systems | Reinforcement learning pipelines, environment interaction, and RL system design |
| Chapter 9: Large-scale GPU Cluster Management | GPU scheduling, resource management, and large-scale training infrastructure |

Changelog

| Date | Event |
| --- | --- |
| 2022-01 | Project initialized; Chinese content writing begins |
| 2022-05 | Extension chapters released (Federated Learning, RL Systems, Explainable AI) |
| 2023-05 | Codebase adapted to MindSpore 2.0 |
| 2026-03 | Bilingual (CN/EN) build architecture refactored; English version launched |

Build Guide

Prerequisites

  • curl
  • git
  • Python 3
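
Before installing, it can help to confirm that the tools listed above are on your `PATH`. The following is a minimal sketch, not part of the repository: the helper name `check_prereqs` is hypothetical, and it assumes the Python 3 interpreter is exposed as `python3` (on some systems it is `python`).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the repository):
# prints the names of any requested tools that are not found on PATH.
check_prereqs() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Assumption: Python 3 is installed under the executable name "python3".
missing_tools=$(check_prereqs curl git python3)
if [ -n "$missing_tools" ]; then
    echo "Missing prerequisites: $missing_tools" >&2
else
    echo "All prerequisites found"
fi
```

If anything is reported missing, install it with your system's package manager before continuing.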

Installation

# Clone the repository
git clone https://github.com/openmlsys/openmlsys-zh.git
cd openmlsys-zh

# Install Rust toolchain (Linux/macOS)
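curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Make cargo available in the current shell (rustup's default install location)
. "$HOME/.cargo/env"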

# Install mdbook
cargo install mdbook

Build HTML

sh build_mdbook_v2.sh
# English output: .mdbook-v2/book
# Chinese output: .mdbook-v2-zh/book
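
To check the result in a browser, one option is to serve an output directory with Python's built-in HTTP server, since Python 3 is already a prerequisite. This is a sketch under assumptions: the function name `preview_book` and the default port 8000 are hypothetical, the interpreter is assumed to be available as `python3` (3.7 or later, for `--directory`), and the output directory is assumed to contain an `index.html` at its top level.

```shell
# Hypothetical helper (not shipped with the repository):
# serves a built book directory over HTTP for local preview.
# Usage: preview_book [book_dir] [port]
preview_book() {
    book_dir="${1:-.mdbook-v2/book}"   # English output path from this README
    port="${2:-8000}"                  # assumed default port
    # Assumption: a successful build leaves index.html in the output directory.
    if [ ! -f "$book_dir/index.html" ]; then
        echo "No index.html in $book_dir; run the build first" >&2
        return 1
    fi
    # Blocks until interrupted with Ctrl-C.
    python3 -m http.server "$port" --directory "$book_dir"
}
```

After running the build, `preview_book` serves the English output at http://localhost:8000/, and `preview_book .mdbook-v2-zh/book 8001` serves the Chinese output on a second port.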

For more details, see the Build Guide.

Contributing

We welcome all forms of contributions. For the full workflow, see the Contributing Guide.

Before contributing, please read:

Community

WeChat group QR code
Join our WeChat group by scanning the QR code

Citation

If this book has been helpful to your research or work, please cite it as:

Plain text:

OpenMLSys Team. Machine Learning Systems: Design and Implementation. 2022. https://openmlsys.github.io/

BibTeX:

@book{openmlsys2022,
  title     = {Machine Learning Systems: Design and Implementation},
  author    = {OpenMLSys Team},
  year      = {2022},
  url       = {https://openmlsys.github.io/},
  note      = {Open-source textbook, \url{https://github.com/openmlsys/openmlsys-zh}}
}

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
