Machine Learning Framework Architecture

Figure :numref:intro/framework-architecture shows the basic architecture of a typical, complete machine learning framework.

Architecture of a machine learning framework 🏷️intro/framework-architecture

  1. Programming interfaces: A machine learning framework needs to provide programming interfaces, usually in high-level programming languages (such as Python), to cater to the diverse backgrounds of machine learning developers. At the same time, the framework also needs a system implementation written mainly in low-level programming languages (e.g., C and C++) so that operating system features (e.g., thread management and network communication) and various hardware accelerators can be used efficiently for optimized performance.

  2. Computational graph: Machine learning applications, though implemented through different programming interfaces, need to share the same backend when the applications run. The computational graph technology is key to realizing this backend. A computational graph, which defines a user's machine learning application, includes many graph nodes that represent computational operations. These nodes are connected by edges, which represent computational dependencies.

  3. Compiler frontend: Once a computational graph is built, the machine learning framework analyzes and optimizes it (or the corresponding application) through the compiler frontend. The compiler frontend provides key functions such as intermediate representation, automatic differentiation, type derivation, and static analysis.

  4. Compiler backend and runtime: After analyzing and optimizing the computational graph, the machine learning framework uses the compiler backend and runtime to optimize it for different types of underlying hardware. In addition to optimizing the selection or scheduling sequence of operators, common optimization techniques usually take into account the L2/L3 cache size and the instruction pipeline length to match hardware specifications.

  5. Heterogeneous processors: A machine learning application is co-executed by central processing units (CPUs) and hardware accelerators (such as NVIDIA GPUs, Huawei Ascend processors, and Google TPUs). During execution, non-matrix operations (e.g., complex data preprocessing and computational graph scheduling) are handled by CPUs, whereas matrix operations and certain frequently used machine learning operators (e.g., Transformer operators and convolution operators) are performed by hardware accelerators.

  6. Data processing: A machine learning application needs to perform complex preprocessing on raw data and manage a large number of training, validation, and test datasets. The data processing module (e.g., the tf.data module of TensorFlow, or the DataLoader module of PyTorch) is responsible for such data-centered operations.

  7. Model deployment: In addition to model training, model deployment is another key function needed in a machine learning framework. Model compression technologies (such as model conversion, quantization, and distillation) enable us to run models on hardware with limited memory. It is also necessary to optimize model operators for specific hardware inference platforms (e.g., NVIDIA Orin). Furthermore, to ensure the security of a model (e.g., to prevent unauthorized users from reading it), model obfuscation must be considered in the framework's design.

  8. Distributed training: A machine learning model is usually trained in parallel on distributed compute nodes. Common parallel training methods include data parallelism, model parallelism, pipeline parallelism, and hybrid parallelism, all of which are usually implemented on top of remote procedure calls (RPC), collective communication, or parameter servers.
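
To make the computational graph in item 2 more concrete, the following is a minimal sketch in plain Python, not taken from any real framework. The names `Node`, `topological_order`, and `run` are hypothetical and chosen only for illustration. The sketch assumes the graph is acyclic and that each node produces a single scalar. Nodes hold operations, the `inputs` tuples play the role of dependency edges, and execution visits nodes so that every operation runs after the operations it depends on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """One graph node: an operation plus the names of the nodes it depends on."""
    name: str
    op: Callable[..., float]
    inputs: Tuple[str, ...] = ()  # incoming edges (computational dependencies)


def topological_order(graph: Dict[str, Node]) -> List[str]:
    """Return node names so that every node appears after its inputs.

    Assumes the graph has no cycles (illustrative simplification).
    """
    order: List[str] = []
    visited = set()

    def visit(name: str) -> None:
        if name in visited:
            return
        visited.add(name)
        for dep in graph[name].inputs:
            visit(dep)
        order.append(name)

    for name in graph:
        visit(name)
    return order


def run(graph: Dict[str, Node]) -> Dict[str, float]:
    """Execute every node once, in dependency order, and return all values."""
    values: Dict[str, float] = {}
    for name in topological_order(graph):
        node = graph[name]
        values[name] = node.op(*(values[dep] for dep in node.inputs))
    return values


# y = (x * w) + b, expressed as five nodes and four dependency edges.
nodes = [
    Node("x", lambda: 3.0),
    Node("w", lambda: 2.0),
    Node("b", lambda: 1.0),
    Node("mul", lambda a, c: a * c, ("x", "w")),
    Node("y", lambda a, c: a + c, ("mul", "b")),
]
graph = {n.name: n for n in nodes}
result = run(graph)
print(result["y"])  # 3.0 * 2.0 + 1.0
```

Because the graph is just data, the same structure could be built from different programming interfaces and then handed to one shared backend for analysis, optimization, and scheduling, which is the role the list above assigns to the compiler frontend, backend, and runtime.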