
Chapter Summary

  1. The advent of large-scale machine learning models has sparked an exponential increase in the need for computational power and memory, leading to the emergence of distributed training systems.

  2. Distributed training systems often utilize data parallelism, model parallelism, or a combination of both, depending on memory and compute constraints.

  3. Pipeline parallelism is another technique adopted by distributed training systems, which involves partitioning a mini-batch into micro-batches and overlapping the forward and backward propagation of different micro-batches.

  4. Although distributed training systems usually run in compute clusters, these clusters' networks sometimes lack sufficient bandwidth to transmit the substantial gradients produced during training.

  5. To meet the demand for high communication bandwidth, machine learning clusters integrate heterogeneous high-performance interconnects, such as NVLink, NVSwitch, and InfiniBand.

  6. To accomplish synchronous training of a machine learning model, distributed training systems frequently employ a range of collective communication operators, among which the AllReduce operator is widely used for aggregating the gradients computed by distributed nodes.

  7. Parameter servers play a crucial role in facilitating asynchronous training and sparse model training. Moreover, they leverage model replication to mitigate data hotspots and tolerate server failures.
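The micro-batch overlap in point 3 can be sketched as a schedule. The sketch below is a minimal, illustrative model of a GPipe-style forward pipeline (the function name and time-step model are our own, not from this chapter): stage s can begin micro-batch m only after stage s-1 has finished it, so at any given step several stages work on different micro-batches concurrently.

```python
# Minimal sketch of a GPipe-style forward schedule: a mini-batch is split
# into micro-batches, and pipeline stage s starts micro-batch m at time
# step s + m, so forward passes of different micro-batches overlap.

def forward_schedule(num_stages, num_microbatches):
    """Map each time step to the (stage, micro-batch) pairs active at it."""
    schedule = {}
    for s in range(num_stages):
        for m in range(num_microbatches):
            t = s + m  # stage s can only start micro-batch m at step s + m
            schedule.setdefault(t, []).append((s, m))
    return schedule

# With 3 stages and 4 micro-batches, the forward pipeline drains in
# 3 + 4 - 1 = 6 time steps instead of 3 * 4 = 12 sequential steps.
sched = forward_schedule(3, 4)
```

Note how the middle steps keep all stages busy at once (e.g. step 2 runs stages 0, 1, and 2 simultaneously), which is exactly the overlap that makes micro-batching worthwhile.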
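The semantics of the AllReduce operator in point 6 can be illustrated without any networking. The following is a single-process simulation only (the function name is ours): real systems such as NCCL implement the same result with ring or tree algorithms over NVLink or InfiniBand, but the outcome is identical — every worker ends up holding the element-wise sum of all workers' gradients.

```python
# Single-process simulation of a sum-AllReduce: each worker holds a local
# gradient vector, and after the operation every worker holds the
# element-wise sum of all gradients.

def all_reduce(worker_grads):
    """Return the per-worker result of a sum-AllReduce.

    worker_grads: one gradient vector (list of floats) per worker.
    Returns a list of the same shape, where every worker's vector is the
    element-wise sum across all workers.
    """
    aggregated = [sum(vals) for vals in zip(*worker_grads)]
    return [list(aggregated) for _ in worker_grads]

# Example: three workers each computed a gradient over their data shard.
grads = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
result = all_reduce(grads)
# Every worker now holds the same aggregated gradient (~[0.9, 1.2]),
# which the optimizer then applies identically on all nodes to keep
# model replicas synchronized.
```

In synchronous data parallelism this aggregation happens once per training step, which is why the interconnect bandwidth discussed in points 4 and 5 matters so much.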