mirror of
https://github.com/openmlsys/openmlsys-zh.git
synced 2026-04-01 09:50:23 +08:00
<p align="center">
<img src="static/logo-with-text.png" alt="OpenMLSys Logo" width="400"/>
</p>

<p align="center">
<a href="https://github.com/openmlsys/openmlsys-zh/actions/workflows/main.yml">
<img src="https://github.com/openmlsys/openmlsys-zh/actions/workflows/main.yml/badge.svg" alt="CI"/>
</a>
<a href="https://openmlsys.github.io/">
<img src="https://img.shields.io/badge/book-online-blue" alt="Book Online"/>
</a>
<a href="https://github.com/openmlsys/openmlsys-zh/blob/main/LICENSE">
<img src="https://img.shields.io/github/license/openmlsys/openmlsys-zh" alt="License"/>
</a>
<a href="https://github.com/openmlsys/openmlsys-zh/stargazers">
<img src="https://img.shields.io/github/stars/openmlsys/openmlsys-zh?style=social" alt="GitHub Stars"/>
</a>
</p>

<p align="center">
<a href="README.md">中文</a> | <b>English</b>
</p>

---

# Machine Learning Systems: Design and Implementation

An open-source book that explains the design principles and implementation practice of modern machine learning systems, covering the full technology stack from programming interfaces and computational graphs to compilers and distributed training.

**Read Online:** [openmlsys.github.io](https://openmlsys.github.io/)

## Table of Contents

- [Target Audience](#target-audience)
- [Content Overview](#content-overview)
- [Build Guide](#build-guide)
- [Contributing](#contributing)
- [Community](#community)
- [License](#license)

## Target Audience

- **Students**: Those who have mastered machine learning fundamentals and want a deep understanding of how modern ML systems are designed and implemented.
- **Researchers**: Those who need to develop custom operators or use distributed execution for large-model development.
- **Engineers**: Those who build ML infrastructure and need to tune system performance or customize ML systems for business needs.

## Content Overview

The book is organized into three parts: Fundamentals, Advanced Topics, and Extensions.

### Part I: Fundamentals

| Chapter | Content |
|---------|---------|
| [Programming Interface](chapter_programming_interface/) | Framework API design, ML workflows, deep learning model definition, C/C++ framework development |
| [Computational Graph](chapter_computational_graph/) | Graph components, generation methods, scheduling strategies, automatic differentiation |

### Part II: Advanced Topics

| Chapter | Content |
|---------|---------|
| [Compiler Frontend & IR](chapter_frontend_and_ir/) | Type inference, intermediate representation (IR), automatic differentiation, common optimization passes |
| [Compiler Backend & Runtime](chapter_backend_and_runtime/) | Graph optimization, operator selection, memory allocation, compute scheduling and execution |
| [Hardware Accelerators](chapter_accelerator/) | GPU/Ascend architecture, high-performance programming interfaces (CUDA/CANN) |
| [Data Processing](chapter_data_processing/) | Usability, efficiency, order preservation, distributed data processing |
| [Model Deployment](chapter_model_deployment/) | Model conversion, compression, inference, and security |
| [Distributed Training](chapter_distributed_training/) | Data parallelism, model parallelism, pipeline parallelism, collective communication, parameter servers |

### Part III: Extensions

| Chapter | Content |
|---------|---------|
| [Recommender Systems](chapter_recommender_system/) | Recommendation principles, large-scale industrial architecture |
| [Federated Learning](chapter_federated_learning/) | Federated learning methods, privacy protection, system implementation |
| [Reinforcement Learning Systems](chapter_reinforcement_learning/) | Single-agent and multi-agent RL systems |
| [Explainable AI Systems](chapter_explainable_AI/) | XAI methods and production practices |
| [Robot Learning Systems](chapter_rl_sys/) | Robot perception, planning, control, and system safety |

## Build Guide

### Prerequisites

- Python >= 3.10
- pandoc >= 2.19

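Before installing, it can help to confirm that both tools meet the minimum versions listed above. The following is a minimal sketch, not part of the repository: the `version_ge` helper is hypothetical, and it assumes a `sort` that supports `-V` (as in GNU coreutils) and that `python3` and `pandoc` are the command names on your system.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Read the installed versions; an empty value means the tool was not found.
py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)
pandoc_ver=$(pandoc --version 2>/dev/null | head -n 1 | awk '{print $2}')

# Compare against the minimums from the Prerequisites list.
if version_ge "${py_ver:-0}" 3.10; then
    echo "Python $py_ver: OK"
else
    echo "Python >= 3.10 not found"
fi

if version_ge "${pandoc_ver:-0}" 2.19; then
    echo "pandoc $pandoc_ver: OK"
else
    echo "pandoc >= 2.19 not found"
fi
```

If either check fails, install or upgrade that tool before continuing with the steps below.
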
### Installation

```bash
# Clone the repository
git clone https://github.com/openmlsys/openmlsys-zh.git
cd openmlsys-zh

# Install d2lbook
git clone https://github.com/openmlsys/d2l-book.git
cd d2l-book && pip install . && cd ..

# Install Python dependencies
pip install -r requirements.txt
```

### Build HTML

```bash
sh build_html.sh
# Output is in _build/html/
```

For more details, see the [Build Guide](info/info.md).

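Once `build_html.sh` has finished, one way to check the result is to serve the output directory locally. This is a sketch under stated assumptions rather than a documented project workflow: `preview_book` is a hypothetical helper, port 8000 is an arbitrary choice, and it relies only on Python's standard `http.server` module.

```shell
# Hypothetical helper: serve the generated HTML over a local HTTP server.
# $1 = output directory (defaults to _build/html), $2 = port (defaults to 8000).
preview_book() {
    out_dir="${1:-_build/html}"
    port="${2:-8000}"
    # Refuse to start if the build output is missing.
    if [ ! -d "$out_dir" ]; then
        echo "No build output at $out_dir; run 'sh build_html.sh' first." >&2
        return 1
    fi
    echo "Serving $out_dir at http://localhost:$port (Ctrl+C to stop)"
    python3 -m http.server "$port" --directory "$out_dir"
}
```

Run `preview_book` from the repository root and open the printed address in a browser. Pass a different directory or port as arguments if the defaults do not suit your setup.
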
## Contributing

We welcome contributions of all kinds, including:

- **Errata**: If you find an error in the text or figures, open an issue and mention (@) the [chapter editors](info/editors.md), or submit a PR directly.
- **Content updates**: Submit PRs to update or add Markdown files.
- **New chapters**: We welcome community contributions on topics such as meta-learning systems, automatic parallelism, cluster scheduling, green AI, and graph learning.

Before contributing, please read:
- [Writing Style Guide](info/style.md)
- [Terminology Guide](info/terminology.md)

## Community

Join our WeChat group by scanning the QR code in [info/mlsys_group.png](info/mlsys_group.png).

## License

This project is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/).