中文 | English

---

# Machine Learning Systems: Design and Implementation

An open-source book explaining the design principles and implementation experience of modern machine learning systems, covering the complete technology stack from programming interfaces and computational graphs to compilers and distributed training.

**Read Online:** [openmlsys.github.io](https://openmlsys.github.io/)

## Table of Contents

- [Target Audience](#target-audience)
- [Content Overview](#content-overview)
- [Build Guide](#build-guide)
- [Contributing](#contributing)
- [Community](#community)
- [License](#license)

## Target Audience

- **Students**: Those who have mastered machine learning fundamentals and want to understand in depth how modern ML systems are designed and implemented.
- **Researchers**: Those who need to develop custom operators or leverage distributed execution for large model development.
- **Engineers**: Those who are responsible for building ML infrastructure and need to tune system performance or customize ML systems for business needs.

## Content Overview

The book is organized into three parts: Fundamentals, Advanced Topics, and Extensions.

### Part I: Fundamentals

| Chapter | Content |
|---------|---------|
| [Programming Interface](chapter_programming_interface/) | Framework API design, ML workflows, deep learning model definition, C/C++ framework development |
| [Computational Graph](chapter_computational_graph/) | Graph components, generation methods, scheduling strategies, automatic differentiation |

### Part II: Advanced Topics

| Chapter | Content |
|---------|---------|
| [Compiler Frontend & IR](chapter_frontend_and_ir/) | Type inference, intermediate representation (IR), automatic differentiation, common optimization passes |
| [Compiler Backend & Runtime](chapter_backend_and_runtime/) | Graph optimization, operator selection, memory allocation, compute scheduling and execution |
| [Hardware Accelerators](chapter_accelerator/) | GPU/Ascend architecture, high-performance programming interfaces (CUDA/CANN) |
| [Data Processing](chapter_data_processing/) | Usability, efficiency, order preservation, distributed data processing |
| [Model Deployment](chapter_model_deployment/) | Model conversion, compression, inference, and security |
| [Distributed Training](chapter_distributed_training/) | Data parallelism, model parallelism, pipeline parallelism, collective communication, parameter servers |

### Part III: Extensions

| Chapter | Content |
|---------|---------|
| [Recommender Systems](chapter_recommender_system/) | Recommendation principles, large-scale industrial architecture |
| [Federated Learning](chapter_federated_learning/) | Federated learning methods, privacy protection, system implementation |
| [Reinforcement Learning Systems](chapter_reinforcement_learning/) | Single-agent and multi-agent RL systems |
| [Explainable AI Systems](chapter_explainable_AI/) | XAI methods and production practices |
| [Robot Learning Systems](chapter_rl_sys/) | Robot perception, planning, control, and system safety |

## Build Guide

### Prerequisites

- Python >= 3.10
- pandoc >= 2.19

### Installation

```bash
# Clone the repository
git clone https://github.com/openmlsys/openmlsys-zh.git
cd openmlsys-zh

# Install d2lbook
git clone https://github.com/openmlsys/d2l-book.git
cd d2l-book && pip install . && cd ..

# Install Python dependencies
pip install -r requirements.txt
```

### Build HTML

```bash
sh build_html.sh
# Output is in _build/html/
```

For more details, see the [Build Guide](info/info.md).

## Contributing

We welcome all forms of contributions, including:

- **Errata**: If you find text or figure errors, please open an Issue and @ the [chapter editors](info/editors.md), or submit a PR directly.
- **Content updates**: Submit PRs to update or add Markdown files.
- **New chapters**: We welcome community contributions on topics such as meta-learning systems, automatic parallelism, cluster scheduling, green AI, and graph learning.

Before contributing, please read:

- [Writing Style Guide](info/style.md)
- [Terminology Guide](info/terminology.md)

## Community

Join our WeChat group by scanning the QR code in [info/mlsys_group.png](info/mlsys_group.png).

## License

This project is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/).