Hardware Accelerators
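The previous chapter discussed computational graph optimization, operator selection, and memory allocation in the backend. Most mainstream deep learning models today are built on neural networks, and both training and inference generate massive computational workloads, especially from compute-intensive operators such as matrix multiplication. General-purpose processors such as CPUs, however, are typically slow at executing these operators and struggle to meet the demands of training and inference. Industry and academia have therefore turned to domain-specific accelerator chips to address the shortage of computing power.
This chapter focuses on the basic components and working principles of accelerators. Using matrix multiplication as a running example, it then introduces how accelerators are programmed and how code for them is optimized.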
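The learning objectives of this chapter are:
- Understand the basic components of an accelerator
- Master common optimization techniques for matrix multiplication
- Understand the design philosophy behind the programming APIs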
:maxdepth: 2
accelerator_introduction
accelerator_architecture
accelerator_programming
accelerator_practise
summary