fix: fix equation rendering by changing the toolchain to mathjax (#493)

* docs: update README and build guide

* fix: escape * and _ inside math to prevent markdown emphasis corruption

* fix: configure MathJax to use TeX (Computer Modern) font

* feat: enhance markdown processing with label and figure collection

* fix: remove duplicate bibliography directives from chapter summaries

References are already handled at the chapter level, so the
:bibliography: directives in summary pages are redundant and cause
rendering issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
anyin233
2026-03-12 06:21:56 +00:00
committed by GitHub
parent ec03af6862
commit 00db02dbfd
26 changed files with 642 additions and 1037 deletions
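
Note: the math handling named in the commit message (escaping `*` and `_` inside math and rewriting `$...$` delimiters for MathJax) can be illustrated with a minimal sketch. It assumes only the behaviour asserted by `ConvertMathToMathjaxTests` in this commit; the real `convert_math_to_mathjax` in `tools/prepare_mdbook.py` is not shown in this diff and may differ. The name `convert_math_sketch` and the regular expressions are hypothetical.

```python
import re

# Hypothetical sketch, not the repository's implementation.
# Code is matched first so that dollar signs inside it are left untouched.
_TOKEN_RE = re.compile(
    r"(?P<fence>`{3}.*?`{3})"       # fenced code block
    r"|(?P<code>`[^`\n]*`)"         # inline code span
    r"|\$\$(?P<display>.+?)\$\$"    # display math
    r"|\$(?P<inline>[^$\n]+?)\$",   # inline math
    re.DOTALL,
)
_CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def _escape_math_body(body: str) -> str:
    # Double backslashes first, then escape * and _ so that the markdown
    # renderer does not read them as emphasis markers.
    body = body.replace("\\", "\\\\")
    return body.replace("*", "\\*").replace("_", "\\_")


def convert_math_sketch(text: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group("fence") is not None or m.group("code") is not None:
            return m.group(0)  # leave code untouched
        display = m.group("display")
        if display is not None:
            return "\\\\[" + _escape_math_body(display) + "\\\\]"
        inline = m.group("inline")
        if _CJK_RE.search(inline):
            return inline  # a $...$ span with CJK text is treated as prose
        return "\\\\(" + _escape_math_body(inline) + "\\\\)"

    return _TOKEN_RE.sub(repl, text)
```

Backslashes are doubled before `*` and `_` are escaped, so the escapes added for markdown are not doubled themselves. The intent, as far as the tests suggest, is that the markdown renderer consumes one level of escaping and MathJax receives the original TeX.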


@@ -23,20 +23,12 @@ jobs:
with:
mdbook-version: 'latest'
- name: Cache mdbook-typst-math binary
uses: actions/cache@v4
with:
path: .mdbook-bin
key: mdbook-typst-math-v0.3.0-linux-x86_64
- name: Run mdBook regression tests
run: |
python3 -m unittest discover -s tests -p 'test_prepare_mdbook.py'
python3 -m unittest discover -s tests -p 'test_prepare_mdbook_zh.py'
python3 -m unittest discover -s tests -p 'test_assemble_docs_publish_tree.py'
python3 -m unittest discover -s tests -p 'test_ensure_book_resources.py'
python3 -m unittest discover -s tests -p 'test_mdbook_typst_math.py'
python3 -m unittest discover -s tests -p 'test_ensure_mdbook_typst_math.py'
python3 -m unittest discover -s tests -p 'test_update_docs_workflow.py'
- name: Build English HTML with mdBook


@@ -17,12 +17,6 @@ jobs:
with:
python-version: '3.10'
- name: Cache mdbook-typst-math binary
uses: actions/cache@v4
with:
path: .mdbook-bin
key: mdbook-typst-math-v0.3.0-linux-x86_64
- name: Setup mdBook
uses: peaceiris/actions-mdbook@v2
with:
@@ -34,8 +28,6 @@ jobs:
python3 -m unittest discover -s tests -p 'test_prepare_mdbook_zh.py'
python3 -m unittest discover -s tests -p 'test_assemble_docs_publish_tree.py'
python3 -m unittest discover -s tests -p 'test_ensure_book_resources.py'
python3 -m unittest discover -s tests -p 'test_mdbook_typst_math.py'
python3 -m unittest discover -s tests -p 'test_ensure_mdbook_typst_math.py'
- name: Build English HTML with mdBook
run: bash build_mdbook.sh


@@ -89,8 +89,8 @@
### Prerequisites
- Python >= 3.10
- pandoc >= 2.19
- curl
- git
### Installation
@@ -99,19 +99,18 @@
git clone https://github.com/openmlsys/openmlsys-zh.git
cd openmlsys-zh
# Install d2lbook
git clone https://github.com/openmlsys/d2l-book.git
cd d2l-book && pip install . && cd ..
# Install the Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install Python dependencies
pip install -r requirements.txt
# Install mdbook
cargo install mdbook
```
### Build HTML
### Build HTML
```bash
sh build_html.sh
# Output is in _build/html/
sh build_mdbook_zh.sh
# Output is in .mdbook-zh/book
```
For more details, see the [Build Guide](info/info.md).


@@ -91,8 +91,8 @@ The book is organized into three parts: Fundamentals, Advanced Topics, and Exten
### Prerequisites
- Python >= 3.10
- pandoc >= 2.19
- curl
- git
### Installation
@@ -101,19 +101,18 @@ The book is organized into three parts: Fundamentals, Advanced Topics, and Exten
git clone https://github.com/openmlsys/openmlsys-zh.git
cd openmlsys-zh
# Install d2lbook
git clone https://github.com/openmlsys/d2l-book.git
cd d2l-book && pip install . && cd ..
# Install Rust toolchain (Linux/macOS)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install Python dependencies
pip install -r requirements.txt
# Install mdbook
cargo install mdbook
```
### Build HTML
```bash
sh build_html.sh
# Output is in _build/html/
sh build_mdbook.sh
# Output is in .mdbook/book
```
For more details, see the [Build Guide](info/info.md).


@@ -11,9 +11,8 @@ create-missing = false
[preprocessor.openmlsys]
command = "python3 tools/mdbook_preprocessor.py"
[preprocessor.typst-math]
[output.html]
mathjax-support = true
git-repository-url = "https://github.com/openmlsys/openmlsys-zh"
preferred-dark-theme = "navy"
additional-css = ["theme/dark-mode-images.css", "theme/typst.css"]
additional-css = ["theme/dark-mode-images.css"]


@@ -11,9 +11,8 @@ create-missing = false
[preprocessor.openmlsys-zh]
command = "python3 ../../tools/mdbook_zh_preprocessor.py"
[preprocessor.typst-math]
[output.html]
mathjax-support = true
git-repository-url = "https://github.com/openmlsys/openmlsys-zh"
preferred-dark-theme = "navy"
additional-css = ["theme/dark-mode-images.css", "theme/typst.css"]
additional-css = ["theme/dark-mode-images.css"]

books/zh/theme/head.hbs Normal file

@@ -0,0 +1,12 @@
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
    "HTML-CSS": {
        availableFonts: ["TeX"],
        preferredFont: "TeX",
        webFont: "TeX"
    },
    SVG: {
        font: "TeX"
    }
});
</script>


@@ -14,10 +14,6 @@ if ! command -v mdbook >/dev/null 2>&1; then
exit 1
fi
MDBOOK_TYPST_MATH_BIN_DIR="${ROOT}/.mdbook-bin"
"${PYTHON_BIN}" "${ROOT}/tools/ensure_mdbook_typst_math.py" --output-dir "${MDBOOK_TYPST_MATH_BIN_DIR}" >/dev/null
export PATH="${MDBOOK_TYPST_MATH_BIN_DIR}:${PATH}"
"${PYTHON_BIN}" "${ROOT}/tools/ensure_book_resources.py" --chapter-dir "${ROOT}/en_chapters"
"${PYTHON_BIN}" "${ROOT}/tools/prepare_mdbook.py" \
--source "${ROOT}/en_chapters" \


@@ -14,10 +14,6 @@ if ! command -v mdbook >/dev/null 2>&1; then
exit 1
fi
MDBOOK_TYPST_MATH_BIN_DIR="${ROOT}/.mdbook-bin"
"${PYTHON_BIN}" "${ROOT}/tools/ensure_mdbook_typst_math.py" --output-dir "${MDBOOK_TYPST_MATH_BIN_DIR}" >/dev/null
export PATH="${MDBOOK_TYPST_MATH_BIN_DIR}:${PATH}"
# ── Create resource links ─────────────────────────────────────────────────────
"${PYTHON_BIN}" "${ROOT}/tools/ensure_book_resources.py" --chapter-dir "${ROOT}/zh_chapters"


@@ -1,13 +1,10 @@
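## Environment Setup
The machine learning systems book is deployed to GitHub using the d2lbook tool, so we first need to install d2lbook.
The machine learning systems book is deployed to GitHub using the mdbook tool. We recommend installing mdbook with cargo, Rust's native package manager.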
```bash
git clone https://github.com/openmlsys/d2l-book.git
cd d2l-book
python setup.py install
# Install the Rust toolchain to get cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install mdbook
```
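Building HTML with d2lbook requires `pandoc`, which can be installed with `conda install pandoc` (on macOS, Homebrew also works). The pandoc release in the apt repositories is old and may convert tables incorrectly, so please use a newer version of pandoc where possible.
If the book contains SVG images, building the PDF requires LibRsvg to convert them. `librsvg` can be installed with `apt-get install librsvg` (on macOS, use Homebrew).
Building the PDF also requires LaTeX, for example [TeX Live](https://www.tug.org/texlive/).
## Building the HTML Version
Before building, clone [openmlsys-zh](https://github.com/openmlsys/openmlsys-zh). All build commands are run from inside this directory.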
@@ -15,16 +12,16 @@ python setup.py install
git clone https://github.com/openmlsys/openmlsys-zh.git
cd openmlsys-zh
```
Build the HTML with d2lbook. Please use the build_html.sh script where possible, so that the front page is merged into the book correctly.
```
sh build_html.sh
Build the HTML with mdbook. Please use the build_mdbook.sh script where possible, so that the front page is merged into the book correctly.
```bash
sh build_mdbook.sh
# Chinese version
sh build_mdbook_zh.sh
```
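The generated HTML is placed in `_build/html`.
The generated HTML is placed in `.mdbook/book` or `.mdbook-zh/book`. At this point we can use `tools/assemble_docs_publish_tree.py` to assemble the final bilingual release, then copy it into the docs directory of openmlsys.github.io to publish it.
At this point we copy the entire contents of the built html folder into the docs directory of openmlsys.github.io to publish it.
Note that the .nojekyll file in the docs directory must not be deleted; otherwise the pages will not render.
For the exact workflow, see `.github/workflows/update_docs.yml`.
## Style Guide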


@@ -1,134 +0,0 @@
from __future__ import annotations

import gzip
import hashlib
import os
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch

from tools.ensure_mdbook_typst_math import (
    ASSET_SHA256,
    VERSION,
    build_download_url,
    ensure_binary,
    resolve_asset_name,
    resolve_binary_path,
    resolve_version_path,
)


class ResolveAssetNameTests(unittest.TestCase):
    def test_resolve_asset_name_for_supported_targets(self) -> None:
        self.assertEqual(
            resolve_asset_name(system="Darwin", machine="arm64"),
            "mdbook-typst-math-aarch64-apple-darwin.gz",
        )
        self.assertEqual(
            resolve_asset_name(system="Darwin", machine="x86_64"),
            "mdbook-typst-math-x86_64-apple-darwin.gz",
        )
        self.assertEqual(
            resolve_asset_name(system="Linux", machine="aarch64"),
            "mdbook-typst-math-aarch64-unknown-linux-gnu.gz",
        )
        self.assertEqual(
            resolve_asset_name(system="Linux", machine="AMD64"),
            "mdbook-typst-math-x86_64-unknown-linux-gnu.gz",
        )
        self.assertEqual(
            resolve_asset_name(system="Windows", machine="AMD64"),
            "mdbook-typst-math-x86_64-pc-windows-msvc.exe",
        )

    def test_resolve_asset_name_rejects_unsupported_targets(self) -> None:
        with self.assertRaises(ValueError):
            resolve_asset_name(system="Linux", machine="riscv64")


class EnsureBinaryTests(unittest.TestCase):
    def test_ensure_binary_downloads_and_extracts_gzip_release(self) -> None:
        payload = b"linux-binary"
        asset_name = "mdbook-typst-math-x86_64-unknown-linux-gnu.gz"
        with tempfile.TemporaryDirectory() as tmpdir:
            output_dir = Path(tmpdir)
            urls: list[str] = []

            def fake_downloader(url: str) -> bytes:
                urls.append(url)
                return gzip.compress(payload)

            with patch.dict(ASSET_SHA256, {asset_name: hashlib.sha256(gzip.compress(payload)).hexdigest()}):
                binary_path = ensure_binary(
                    output_dir,
                    system="Linux",
                    machine="x86_64",
                    downloader=fake_downloader,
                )

            self.assertEqual(binary_path, resolve_binary_path(output_dir, VERSION, asset_name))
            self.assertEqual(binary_path.name, "mdbook-typst-math")
            self.assertEqual(binary_path.read_bytes(), payload)
            self.assertEqual(resolve_version_path(output_dir).read_text(encoding="utf-8"), VERSION)
            self.assertEqual(urls, [build_download_url(VERSION, asset_name)])
            self.assertTrue(os.access(binary_path, os.X_OK))

    def test_ensure_binary_uses_cached_file_without_downloading(self) -> None:
        with tempfile.TemporaryDirectory() as tmpdir:
            output_dir = Path(tmpdir)
            asset_name = "mdbook-typst-math-x86_64-unknown-linux-gnu.gz"
            cached_binary = resolve_binary_path(output_dir, VERSION, asset_name)
            output_dir.mkdir(parents=True, exist_ok=True)
            cached_binary.write_bytes(b"cached")
            cached_binary.chmod(0o755)
            resolve_version_path(output_dir).write_text(VERSION, encoding="utf-8")

            def fail_downloader(_: str) -> bytes:
                raise AssertionError("downloader should not be called for cached binary")

            binary_path = ensure_binary(
                output_dir,
                system="Linux",
                machine="x86_64",
                downloader=fail_downloader,
            )

            self.assertEqual(binary_path, cached_binary)
            self.assertEqual(binary_path.read_bytes(), b"cached")

    def test_ensure_binary_keeps_windows_extension(self) -> None:
        payload = b"windows-binary"
        asset_name = "mdbook-typst-math-x86_64-pc-windows-msvc.exe"
        with tempfile.TemporaryDirectory() as tmpdir:
            output_dir = Path(tmpdir)

            def fake_downloader(_: str) -> bytes:
                return payload

            with patch.dict(ASSET_SHA256, {asset_name: hashlib.sha256(payload).hexdigest()}):
                binary_path = ensure_binary(
                    output_dir,
                    system="Windows",
                    machine="AMD64",
                    downloader=fake_downloader,
                )

            self.assertEqual(binary_path.name, "mdbook-typst-math.exe")
            self.assertEqual(binary_path.read_bytes(), payload)

    def test_ensure_binary_rejects_checksum_mismatch(self) -> None:
        with tempfile.TemporaryDirectory() as tmpdir:
            with self.assertRaises(ValueError):
                ensure_binary(
                    Path(tmpdir),
                    system="Linux",
                    machine="x86_64",
                    downloader=lambda _: gzip.compress(b"bad-binary"),
                )


if __name__ == "__main__":
    unittest.main()


@@ -1,38 +0,0 @@
from __future__ import annotations

import unittest
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parents[1]
BOOK_PATHS = (
    REPO_ROOT / "book.toml",
    REPO_ROOT / "books" / "zh" / "book.toml",
)
BUILD_SCRIPTS = (
    REPO_ROOT / "build_mdbook.sh",
    REPO_ROOT / "build_mdbook_zh.sh",
)


class MdBookTypstMathConfigTests(unittest.TestCase):
    def test_books_use_typst_math_without_mathjax(self) -> None:
        for path in BOOK_PATHS:
            config = path.read_text(encoding="utf-8")
            self.assertIn("[preprocessor.typst-math]", config, path.as_posix())
            self.assertIn("theme/typst.css", config, path.as_posix())
            self.assertNotIn("mathjax-support = true", config, path.as_posix())

    def test_build_scripts_bootstrap_prebuilt_typst_math_binary(self) -> None:
        for path in BUILD_SCRIPTS:
            script = path.read_text(encoding="utf-8")
            self.assertIn("ensure_mdbook_typst_math.py", script, path.as_posix())
            self.assertIn("MDBOOK_TYPST_MATH_BIN_DIR", script, path.as_posix())
            self.assertIn("export PATH=", script, path.as_posix())
            self.assertNotIn("cargo install mdbook-typst-math", script, path.as_posix())


if __name__ == "__main__":
    unittest.main()


@@ -4,7 +4,17 @@ import tempfile
import unittest
from pathlib import Path
from tools.prepare_mdbook import build_title_cache, rewrite_markdown, write_summary
from tools.prepare_mdbook import (
    _relative_chapter_path,
    build_title_cache,
    collect_figure_labels,
    collect_labels,
    convert_math_to_mathjax,
    normalize_directives,
    process_figure_captions,
    rewrite_markdown,
    write_summary,
)
REPO_ROOT = Path(__file__).resolve().parents[1]
@@ -233,5 +243,287 @@ Reference :cite:`smith2024`.
self.assertIn("width: 100%;", frontpage)


class CollectLabelsTests(unittest.TestCase):
    def test_standalone_label(self) -> None:
        md = ":label:`my_fig`\n"
        self.assertEqual(collect_labels(md), ["my_fig"])

    def test_inline_table_label(self) -> None:
        md = "|:label:`tbl`|||\n"
        self.assertEqual(collect_labels(md), ["tbl"])

    def test_escaped_underscores(self) -> None:
        md = ":label:`ros2\\_topics`\n"
        self.assertEqual(collect_labels(md), ["ros2\\_topics"])

    def test_empty(self) -> None:
        md = "No labels here.\n"
        self.assertEqual(collect_labels(md), [])

    def test_multiple_labels(self) -> None:
        md = ":label:`fig1`\nsome text\n:label:`fig2`\n"
        self.assertEqual(collect_labels(md), ["fig1", "fig2"])


class LabelToAnchorTests(unittest.TestCase):
    def test_standalone_label_becomes_anchor(self) -> None:
        result = normalize_directives(":label:`ROS2_arch`\n")
        self.assertIn('<a id="ROS2_arch"></a>', result)
        self.assertNotIn(":label:", result)

    def test_table_row_label_becomes_anchor(self) -> None:
        result = normalize_directives("|:label:`tbl`|||\n")
        self.assertIn('|<a id="tbl"></a>|||', result)

    def test_width_line_removed(self) -> None:
        result = normalize_directives(":width:`800px`\n")
        self.assertNotIn(":width:", result)
        self.assertNotIn("800px", result)


class NumrefToLinkTests(unittest.TestCase):
    def test_same_file_link(self) -> None:
        ref_map = {"my_fig": "chapter/page.md"}
        result = normalize_directives(
            "See :numref:`my_fig`.\n",
            ref_label_map=ref_map,
            current_source_path="chapter/page.md",
        )
        self.assertIn("[my_fig](#my_fig)", result)

    def test_cross_file_link(self) -> None:
        ref_map = {"my_fig": "other_ch/file.md"}
        result = normalize_directives(
            "See :numref:`my_fig`.\n",
            ref_label_map=ref_map,
            current_source_path="chapter/page.md",
        )
        self.assertIn("[my_fig](../other_ch/file.md#my_fig)", result)

    def test_unknown_label_fallback(self) -> None:
        result = normalize_directives(
            "See :numref:`unknown`.\n",
            ref_label_map={},
            current_source_path="chapter/page.md",
        )
        self.assertIn("`unknown`", result)
        self.assertNotIn("[unknown]", result)

    def test_no_ref_map_fallback(self) -> None:
        result = normalize_directives("See :numref:`foo`.\n")
        self.assertIn("`foo`", result)

    def test_escaped_underscores_in_numref(self) -> None:
        ref_map = {"ros2\\_topics": "chapter/ros.md"}
        result = normalize_directives(
            "See :numref:`ros2\\_topics`.\n",
            ref_label_map=ref_map,
            current_source_path="chapter/ros.md",
        )
        # _strip_latex_escapes_outside_math removes \_ → _, producing consistent IDs
        self.assertIn("[ros2_topics](#ros2_topics)", result)


class RelativeChapterPathTests(unittest.TestCase):
    def test_same_file(self) -> None:
        self.assertEqual(_relative_chapter_path("ch/page.md", "ch/page.md"), "")

    def test_same_dir(self) -> None:
        result = _relative_chapter_path("ch/a.md", "ch/b.md")
        self.assertEqual(result, "b.md")

    def test_different_dir(self) -> None:
        result = _relative_chapter_path("ch1/page.md", "ch2/other.md")
        self.assertEqual(result, "../ch2/other.md")


class CollectFigureLabelsTests(unittest.TestCase):
    def test_image_followed_by_label(self) -> None:
        md = "![cap](img.png)\n:label:`fig1`\n"
        self.assertEqual(collect_figure_labels(md), ["fig1"])

    def test_image_with_width_and_label(self) -> None:
        md = "![cap](img.png)\n:width:`800px`\n:label:`fig1`\n"
        self.assertEqual(collect_figure_labels(md), ["fig1"])

    def test_image_with_blank_lines(self) -> None:
        md = "![cap](img.png)\n\n:width:`800px`\n\n:label:`fig1`\n"
        self.assertEqual(collect_figure_labels(md), ["fig1"])

    def test_table_label_not_collected(self) -> None:
        md = "|:label:`tbl`|||\n"
        self.assertEqual(collect_figure_labels(md), [])

    def test_standalone_label_without_image(self) -> None:
        md = "# Heading\n:label:`sec1`\n"
        self.assertEqual(collect_figure_labels(md), [])

    def test_multiple_figures(self) -> None:
        md = "![a](a.png)\n:label:`f1`\n\n![b](b.png)\n:label:`f2`\n"
        self.assertEqual(collect_figure_labels(md), ["f1", "f2"])


class ProcessFigureCaptionsTests(unittest.TestCase):
    def test_figure_with_number_and_caption(self) -> None:
        md = "![量化原理](img.png)\n:width:`800px`\n:label:`fig1`\n"
        result = process_figure_captions(md, fig_number_map={"fig1": "8.1"})
        self.assertIn('<a id="fig1"></a>', result)
        self.assertIn("![量化原理](img.png)", result)
        self.assertIn('<p align="center">图8.1 量化原理</p>', result)
        self.assertNotIn(":width:", result)
        self.assertNotIn(":label:", result)

    def test_figure_without_number_map(self) -> None:
        md = "![caption](img.png)\n:label:`fig1`\n"
        result = process_figure_captions(md)
        self.assertIn('<a id="fig1"></a>', result)
        self.assertIn("![caption](img.png)", result)
        self.assertIn('<p align="center">caption</p>', result)

    def test_image_without_label_passthrough(self) -> None:
        md = "![caption](img.png)\nSome text\n"
        result = process_figure_captions(md)
        self.assertIn("![caption](img.png)", result)
        self.assertNotIn('<a id=', result)
        self.assertNotIn('<p align="center">', result)

    def test_figure_empty_caption(self) -> None:
        md = "![](img.png)\n:label:`fig1`\n"
        result = process_figure_captions(md, fig_number_map={"fig1": "1.1"})
        self.assertIn('<p align="center">图1.1</p>', result)


class NumrefWithFigureNumberTests(unittest.TestCase):
    def test_numref_shows_figure_number(self) -> None:
        result = normalize_directives(
            "See :numref:`my_fig`.\n",
            ref_label_map={"my_fig": "ch/page.md"},
            current_source_path="ch/page.md",
            fig_number_map={"my_fig": "8.1"},
        )
        self.assertIn("[图8.1](#my_fig)", result)

    def test_numref_cross_file_with_figure_number(self) -> None:
        result = normalize_directives(
            "See :numref:`my_fig`.\n",
            ref_label_map={"my_fig": "other/page.md"},
            current_source_path="ch/page.md",
            fig_number_map={"my_fig": "3.2"},
        )
        self.assertIn("[图3.2](../other/page.md#my_fig)", result)

    def test_numref_without_figure_number_shows_name(self) -> None:
        result = normalize_directives(
            "See :numref:`tbl`.\n",
            ref_label_map={"tbl": "ch/page.md"},
            current_source_path="ch/page.md",
            fig_number_map={},
        )
        self.assertIn("[tbl](#tbl)", result)


class LabelNumrefIntegrationTests(unittest.TestCase):
    def test_rewrite_markdown_with_label_map(self) -> None:
        with tempfile.TemporaryDirectory() as tmpdir:
            page = Path(tmpdir) / "chapter" / "page.md"
            page.parent.mkdir()
            page.write_text(
                "# Title\n\n:label:`my_fig`\n\nSee :numref:`my_fig`.\n",
                encoding="utf-8",
            )
            rewritten = rewrite_markdown(
                page.read_text(encoding="utf-8"),
                page.resolve(),
                {page.resolve(): "Title"},
                ref_label_map={"my_fig": "chapter/page.md"},
                current_source_path="chapter/page.md",
            )
        self.assertIn('<a id="my_fig"></a>', rewritten)
        self.assertIn("[my_fig](#my_fig)", rewritten)

    def test_rewrite_markdown_cross_file_numref(self) -> None:
        with tempfile.TemporaryDirectory() as tmpdir:
            page = Path(tmpdir) / "ch1" / "page.md"
            page.parent.mkdir()
            page.write_text(
                "# Title\n\nSee :numref:`other_fig`.\n",
                encoding="utf-8",
            )
            rewritten = rewrite_markdown(
                page.read_text(encoding="utf-8"),
                page.resolve(),
                {page.resolve(): "Title"},
                ref_label_map={"other_fig": "ch2/file.md"},
                current_source_path="ch1/page.md",
            )
        self.assertIn("[other_fig](../ch2/file.md#other_fig)", rewritten)

    def test_rewrite_markdown_figure_with_number_and_caption(self) -> None:
        with tempfile.TemporaryDirectory() as tmpdir:
            page = Path(tmpdir) / "ch" / "page.md"
            page.parent.mkdir()
            page.write_text(
                "# Title\n\n![量化原理](img.png)\n:width:`800px`\n:label:`qfig`\n\nSee :numref:`qfig`.\n",
                encoding="utf-8",
            )
            rewritten = rewrite_markdown(
                page.read_text(encoding="utf-8"),
                page.resolve(),
                {page.resolve(): "Title"},
                ref_label_map={"qfig": "ch/page.md"},
                current_source_path="ch/page.md",
                fig_number_map={"qfig": "8.1"},
            )
        self.assertIn('<a id="qfig"></a>', rewritten)
        self.assertIn("![量化原理](img.png)", rewritten)
        self.assertIn('<p align="center">图8.1 量化原理</p>', rewritten)
        self.assertIn("[图8.1](#qfig)", rewritten)


class ConvertMathToMathjaxTests(unittest.TestCase):
    def test_display_math(self) -> None:
        result = convert_math_to_mathjax("before $$x^2$$ after")
        self.assertEqual(result, "before \\\\[x^2\\\\] after")

    def test_inline_math(self) -> None:
        result = convert_math_to_mathjax("before $x^2$ after")
        self.assertEqual(result, "before \\\\(x^2\\\\) after")

    def test_backslash_doubling_inside_math(self) -> None:
        result = convert_math_to_mathjax("$$a \\\\ b$$")
        self.assertEqual(result, "\\\\[a \\\\\\\\ b\\\\]")

    def test_math_inside_code_block_not_converted(self) -> None:
        text = "```\n$x^2$\n```"
        result = convert_math_to_mathjax(text)
        self.assertEqual(result, text)

    def test_math_inside_inline_code_not_converted(self) -> None:
        text = "use `$x$` for math"
        result = convert_math_to_mathjax(text)
        self.assertEqual(result, text)

    def test_cjk_dollar_spans_stripped(self) -> None:
        result = convert_math_to_mathjax("price $100美元$ done")
        self.assertEqual(result, "price 100美元 done")

    def test_no_math_passthrough(self) -> None:
        text = "No math here at all."
        self.assertEqual(convert_math_to_mathjax(text), text)

    def test_mixed_display_and_inline(self) -> None:
        text = "Inline $a$ and display $$b$$."
        result = convert_math_to_mathjax(text)
        self.assertEqual(result, "Inline \\\\(a\\\\) and display \\\\[b\\\\].")

    def test_asterisk_escaped_inside_math(self) -> None:
        result = convert_math_to_mathjax("$$n*CHW$$")
        self.assertEqual(result, "\\\\[n\\*CHW\\\\]")

    def test_underscore_escaped_inside_math(self) -> None:
        result = convert_math_to_mathjax("$x_i$")
        self.assertEqual(result, "\\\\(x\\_i\\\\)")


if __name__ == "__main__":
    unittest.main()
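
Note: the `:numref:` link resolution exercised by `RelativeChapterPathTests`, `NumrefToLinkTests` and `NumrefWithFigureNumberTests` can be sketched as follows. This is a minimal illustration that assumes only the behaviour asserted in those tests; the function names `relative_chapter_path_sketch` and `numref_link_sketch` are hypothetical, and the real helpers in `tools/prepare_mdbook.py` are not shown in this diff and may differ (for example, this sketch does not strip `\_` escapes from labels).

```python
import posixpath


def relative_chapter_path_sketch(current: str, target: str) -> str:
    """Path from the page `current` to the page `target`; empty for the same page."""
    if current == target:
        return ""
    # Resolve relative to the directory that contains the current page.
    start = posixpath.dirname(current) or "."
    return posixpath.relpath(target, start=start)


def numref_link_sketch(label, current, ref_label_map=None, fig_number_map=None):
    """Render one :numref: reference as a markdown link, or as inline code if unknown."""
    target = (ref_label_map or {}).get(label)
    if target is None:
        return f"`{label}`"  # fallback: keep the bare label visible
    number = (fig_number_map or {}).get(label)
    text = f"图{number}" if number else label  # figure number when one is known
    return f"[{text}]({relative_chapter_path_sketch(current, target)}#{label})"
```

For example, `numref_link_sketch("my_fig", "chapter/page.md", {"my_fig": "other_ch/file.md"})` yields `[my_fig](../other_ch/file.md#my_fig)`, matching the cross-file case in the tests.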

theme/head.hbs Normal file

@@ -0,0 +1,12 @@
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
    "HTML-CSS": {
        availableFonts: ["TeX"],
        preferredFont: "TeX",
        webFont: "TeX"
    },
    SVG: {
        font: "TeX"
    }
});
</script>


@@ -1,16 +0,0 @@
.typst-inline {
    display: inline-flex;
    vertical-align: -0.2em;
}

.typst-display {
    display: flex;
    justify-content: center;
    margin: 1rem 0;
    overflow-x: auto;
}

.typst-doc {
    color: var(--fg);
    max-width: 100%;
}


@@ -1,132 +0,0 @@
from __future__ import annotations

import argparse
import gzip
import hashlib
import os
import platform
import urllib.request
from pathlib import Path

VERSION = "v0.3.0"
RELEASE_BASE_URL = "https://github.com/duskmoon314/mdbook-typst-math/releases/download"
ASSET_SHA256 = {
    "mdbook-typst-math-aarch64-apple-darwin.gz": "9c7a94113e16a465edd1324010e2cc432be3c0794320c13d6a44d9523f069384",
    "mdbook-typst-math-aarch64-unknown-linux-gnu.gz": "bbcf4574e380663400af74dda76dd6ecafd36aff185d653a2e24e294c45321c3",
    "mdbook-typst-math-x86_64-apple-darwin.gz": "8bb36eb558fc438c55162b442975eca588a7654b8069860526e46cc08c2aee6a",
    "mdbook-typst-math-x86_64-pc-windows-msvc.exe": "b5d3e07108a7286007d153c66efe434d06ab6caf43fcd22f78b4e6af8a294314",
    "mdbook-typst-math-x86_64-unknown-linux-gnu.gz": "3b785a42fb3a93bcd3f80106e6ded5c55bb0bcd4cd0634edf8232d14444b6987",
}
SUPPORTED_ASSETS = {
    ("darwin", "aarch64"): "mdbook-typst-math-aarch64-apple-darwin.gz",
    ("darwin", "x86_64"): "mdbook-typst-math-x86_64-apple-darwin.gz",
    ("linux", "aarch64"): "mdbook-typst-math-aarch64-unknown-linux-gnu.gz",
    ("linux", "x86_64"): "mdbook-typst-math-x86_64-unknown-linux-gnu.gz",
    ("windows", "x86_64"): "mdbook-typst-math-x86_64-pc-windows-msvc.exe",
}


def normalize_machine(machine: str) -> str:
    normalized = machine.strip().lower()
    if normalized in {"arm64", "aarch64"}:
        return "aarch64"
    if normalized in {"amd64", "x86_64", "x64"}:
        return "x86_64"
    return normalized


def normalize_system(system: str) -> str:
    normalized = system.strip().lower()
    if normalized.startswith("mingw") or normalized.startswith("msys") or normalized.startswith("cygwin"):
        return "windows"
    return normalized


def resolve_asset_name(system: str | None = None, machine: str | None = None) -> str:
    resolved_system = normalize_system(system or platform.system())
    resolved_machine = normalize_machine(machine or platform.machine())
    asset_name = SUPPORTED_ASSETS.get((resolved_system, resolved_machine))
    if asset_name is None:
        raise ValueError(
            f"Unsupported platform for mdbook-typst-math: system={resolved_system!r}, machine={resolved_machine!r}"
        )
    return asset_name


def build_download_url(version: str, asset_name: str) -> str:
    return f"{RELEASE_BASE_URL}/{version}/{asset_name}"


def resolve_binary_path(output_dir: Path, version: str, asset_name: str) -> Path:
    binary_name = "mdbook-typst-math.exe" if asset_name.endswith(".exe") else "mdbook-typst-math"
    return output_dir / binary_name


def resolve_version_path(output_dir: Path) -> Path:
    return output_dir / ".mdbook-typst-math.version"


def download_bytes(url: str) -> bytes:
    request = urllib.request.Request(url, headers={"User-Agent": "openmlsys-mdbook-bootstrap/1.0"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def ensure_binary(
    output_dir: Path,
    *,
    version: str = VERSION,
    system: str | None = None,
    machine: str | None = None,
    downloader=download_bytes,
) -> Path:
    asset_name = resolve_asset_name(system=system, machine=machine)
    binary_path = resolve_binary_path(output_dir, version, asset_name)
    version_path = resolve_version_path(output_dir)
    if binary_path.exists() and version_path.exists() and version_path.read_text(encoding="utf-8").strip() == version:
        if binary_path.suffix != ".exe":
            binary_path.chmod(binary_path.stat().st_mode | 0o111)
        return binary_path

    expected_sha256 = ASSET_SHA256[asset_name]
    download_url = build_download_url(version, asset_name)
    archive_bytes = downloader(download_url)
    digest = hashlib.sha256(archive_bytes).hexdigest()
    if digest != expected_sha256:
        raise ValueError(
            f"Checksum mismatch for {asset_name}: expected {expected_sha256}, got {digest}"
        )

    binary_path.parent.mkdir(parents=True, exist_ok=True)
    payload = gzip.decompress(archive_bytes) if asset_name.endswith(".gz") else archive_bytes
    output_dir.mkdir(parents=True, exist_ok=True)
    temporary_path = binary_path.with_suffix(f"{binary_path.suffix}.tmp")
    temporary_path.write_bytes(payload)
    if binary_path.suffix != ".exe":
        temporary_path.chmod(0o755)
    os.replace(temporary_path, binary_path)
    version_path.write_text(version, encoding="utf-8")
    return binary_path


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download a pinned mdbook-typst-math release binary.")
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=Path(".mdbook-bin"),
        help="Directory used to cache the downloaded mdbook-typst-math binary.",
    )
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    binary_path = ensure_binary(args.output_dir.resolve())
    print(binary_path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -1,614 +0,0 @@
"""Convert LaTeX math notation to Typst math notation within markdown content.

This module provides a best-effort converter for the LaTeX math subset used in
the OpenMLSys textbook. It is **not** a general-purpose LaTeX→Typst transpiler;
only the commands that actually appear in the zh_chapters are handled.
"""

from __future__ import annotations

import re

# ---------------------------------------------------------------------------
# Brace-matching helper
# ---------------------------------------------------------------------------


def _find_brace_group(s: str, pos: int) -> tuple[str, int] | None:
    """Return ``(content, end_pos)`` for the ``{…}`` group starting at *pos*.

    Skips leading whitespace. Returns ``None`` when no opening brace is found
    or braces are unbalanced.
    """
    while pos < len(s) and s[pos] in " \t":
        pos += 1
    if pos >= len(s) or s[pos] != "{":
        return None
    depth = 0
    start = pos + 1
    for i in range(pos, len(s)):
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                return (s[start:i], i + 1)
    return None


# ---------------------------------------------------------------------------
# Command tables
# ---------------------------------------------------------------------------

# Commands whose Typst name equals the LaTeX name (just drop the backslash).
SIMPLE_COMMANDS: set[str] = {
    # Greek
    "alpha", "beta", "gamma", "delta", "Delta",
    "epsilon", "zeta", "eta", "theta", "Theta",
    "iota", "kappa", "lambda", "Lambda",
    "mu", "nu", "xi", "Xi",
    "pi", "Pi", "rho",
    "sigma", "Sigma", "tau",
    "upsilon", "Upsilon",
    "phi", "Phi", "chi", "psi", "Psi",
    "omega", "Omega",
    # Operators / relations
    "times", "partial", "nabla", "in",
    "top", "prime",
    "forall", "exists", "approx", "equiv",
    "subset", "supset",
    # Big operators / functions
    "log", "ln", "exp", "sin", "cos", "tan",
    "min", "max", "lim", "sum",
    "det", "dim", "ker", "inf", "sup",
}

# Commands that map to a *different* Typst identifier.
RENAMED_COMMANDS: dict[str, str] = {
    "cdot": "dot.c",
    "cdots": "dots.c",
    "ldots": "dots",
    "dots": "dots",
    "to": "->",
    "rightarrow": "->",
    "leftarrow": "<-",
    "Rightarrow": "=>",
    "rightsquigarrow": "arrow.r.squiggly",
    "leq": "<=",
    "geq": ">=",
    "prod": "product",
    "notag": "",
    "quad": "quad",
    "qquad": "wide",
    "label": "",  # consumed by :eqlabel: already
    "sim": "tilde.op",
    "infty": "infinity",
    "neq": "eq.not",
    "ast": "ast.op",
    "vdots": "dots.v",
    "ddots": "dots.down",
    "lVert": "||",
    "rVert": "||",
    "vert": "|",
    "lvert": "|",
    "rvert": "|",
    "mid": "|",
    "cap": "inter",
    "cup": "union",
    "le": "<=",
    "ge": ">=",
    "odot": "dot.o",
    "oplus": "plus.circle",
    "otimes": "times.circle",
}

# \cmd{arg} → typst_func(arg)
ONE_ARG_COMMANDS: dict[str, str] = {
    "boldsymbol": "bold",
    "mathcal": "cal",
    "mathbf": "bold",
    "mathbb": "bb",
    "hat": "hat",
    "bar": "overline",
    "dot": "dot",
    "tilde": "tilde",
    "sqrt": "sqrt",
    "overline": "overline",
    "pmb": "bold",
    "textbf": "bold",
    "textit": "italic",
    "bm": "bold",
}

# \cmd{arg1}{arg2} → typst_func(arg1, arg2)
TWO_ARG_COMMANDS: dict[str, str] = {
    "frac": "frac",
    "binom": "binom",
}

# Delimiter-sizing commands to strip (the delimiter char after them is kept).
_SIZING_COMMANDS: set[str] = {
    "left", "right", "bigg", "Bigg", "big", "Big", "biggl", "biggr",
}


# ---------------------------------------------------------------------------
# Core single-pass converter
# ---------------------------------------------------------------------------


def _last_char(out: list[str]) -> str:
    """Return the last non-empty character in the output buffer, or ``""``."""
    for part in reversed(out):
        if part:
            return part[-1]
    return ""


def _emit(out: list[str], text: str) -> None:
    """Append *text* to *out*, adding a space separator when needed.

    In Typst math, consecutive letters form a multi-letter identifier which
    will error if unknown. Similarly, letter→digit transitions form tokens
    like ``W1``. This helper inserts spaces to prevent such merging, matching
    LaTeX math semantics where adjacent characters are separate symbols.
    """
    if not text:
        out.append(text)
        return
    lc = _last_char(out)
    fc = text[0]
    if lc and (
        # letter→letter (e.g. "ou" → "o u")
        (lc.isalpha() and fc.isalpha())
        # letter→digit (e.g. "W1" → "W 1")
        or (lc.isalpha() and fc.isdigit())
        # digit→letter (e.g. "2x" → "2 x")
        or (lc.isdigit() and fc.isalpha())
        # )→letter/digit (e.g. "bold(X)y" → "bold(X) y")
        or (lc == ")" and (fc.isalpha() or fc.isdigit()))
    ):
        out.append(" ")
    out.append(text)


def _convert(s: str) -> str:
    """Convert a single LaTeX math expression to Typst math."""
    out: list[str] = []
    i = 0
    n = len(s)
    while i < n:
        ch = s[i]
        # ---- backslash commands ----
        if ch == "\\" and i + 1 < n:
            nxt = s[i + 1]
            # Double backslash: either markdown-escaped bracket or line-break
            if nxt == "\\":
                # \\{ \\} \\[ \\] \\( \\) → markdown-escaped LaTeX delimiters
                if i + 2 < n and s[i + 2] in "{}[]()":
                    out.append(s[i + 2])
                    i += 3
                    continue
                out.append(" \\\n")
                i += 2
                continue
            # Escaped characters: \{ \} \[ \] \( \) \, \; \! \ \.
            if nxt in "{}[]()":
                out.append(nxt)
                i += 2
                continue
            if nxt == ",":
                out.append("thin ")  # thin space
                i += 2
                continue
            if nxt == ";":
                out.append("med ")  # medium space
                i += 2
                continue
            if nxt == "!":
                out.append("")  # negative thin space → ignore
                i += 2
                continue
            if nxt == " ":
                out.append(" ")
                i += 2
                continue
            if nxt == "\n":
                out.append(" ")
                i += 2
                continue
            # Try to match an alphabetic command name
            m = re.match(r"[a-zA-Z]+", s[i + 1:])
            if not m:
                # Bare backslash before non-alpha → keep the char after
                out.append(nxt)
                i += 2
                continue
            cmd = m.group()
            after = i + 1 + m.end()
            # -- environments --
            if cmd == "begin":
                g = _find_brace_group(s, after)
                if g:
                    env_name, env_pos = g
                    end_marker = f"\\end{{{env_name}}}"
                    end_idx = s.find(end_marker, env_pos)
                    if end_idx != -1:
                        body = s[env_pos:end_idx]
                        i = end_idx + len(end_marker)
                        _emit(out, _convert_environment(env_name, body))
                        continue
                # Fallthrough: couldn't parse, skip \begin
                i = after
                continue
            if cmd == "end":
                # Stray \end (shouldn't happen if \begin matched)
                g = _find_brace_group(s, after)
                i = g[1] if g else after
                continue
            # -- special multi-arg commands --
            if cmd == "underset":
                # \underset{below}{base} → attach(base, b: below)
                g1 = _find_brace_group(s, after)
                if g1:
                    below, p1 = g1
                    g2 = _find_brace_group(s, p1)
                    if g2:
                        base, p2 = g2
                        _emit(out, f"attach({_convert(base)}, b: {_convert(below)})")
                        i = p2
                        continue
            if cmd == "overset":
                # \overset{above}{base} → attach(base, t: above)
                g1 = _find_brace_group(s, after)
                if g1:
                    above, p1 = g1
                    g2 = _find_brace_group(s, p1)
                    if g2:
                        base, p2 = g2
                        _emit(out, f"attach({_convert(base)}, t: {_convert(above)})")
                        i = p2
                        continue
            if cmd == "operatorname":
                # \operatorname{name} → op("name")
                g = _find_brace_group(s, after)
                if g:
                    name, pos = g
                    _emit(out, f'op("{name}")')
                    i = pos
                    continue
            if cmd == "tag":
                # \tag{n} → visual equation number
                g = _find_brace_group(s, after)
                if g:
                    content, pos = g
                    _emit(out, f'quad upright("({content})")')
                    i = pos
                    continue
            if cmd == "eqref":
                # \eqref{name} → show label name as fallback
g = _find_brace_group(s, after)
if g:
content, pos = g
_emit(out, f'upright("({content})")')
i = pos
continue
if cmd in ("mathrm", "text"):
# \mathrm{text} → upright("text") — treat as text, not math
g = _find_brace_group(s, after)
if g:
content, pos = g
stripped = content.strip()
if stripped:
_emit(out, f'upright("{stripped}")')
# else: empty mathrm (spacing hack) → drop
i = pos
continue
# -- two-arg commands --
if cmd in TWO_ARG_COMMANDS:
g1 = _find_brace_group(s, after)
if g1:
c1, p1 = g1
g2 = _find_brace_group(s, p1)
if g2:
c2, p2 = g2
func = TWO_ARG_COMMANDS[cmd]
_emit(out, f"{func}({_convert(c1)}, {_convert(c2)})")
i = p2
continue
# Fallthrough
_emit(out, cmd)
i = after
continue
# -- one-arg commands --
if cmd in ONE_ARG_COMMANDS:
g = _find_brace_group(s, after)
if g:
content, pos = g
func = ONE_ARG_COMMANDS[cmd]
_emit(out, f"{func}({_convert(content)})")
i = pos
continue
# Fallthrough: no brace group → just emit the typst name
_emit(out, ONE_ARG_COMMANDS[cmd])
i = after
continue
# -- \rm (LaTeX applies it to the rest of the current scope; here only the next word is wrapped) --
if cmd == "rm":
raw_rest = s[after:]
leading = len(raw_rest) - len(raw_rest.lstrip())
rest = raw_rest.lstrip()
# Grab one "word"
wm = re.match(r"[A-Za-z0-9]+", rest)
if wm:
word = wm.group()
_emit(out, f"upright({word})")
i = after + leading + len(word)
continue
_emit(out, "upright")
i = after
continue
# -- delimiter sizing --
if cmd in _SIZING_COMMANDS:
# Skip the command; keep whatever delimiter follows.
i = after
continue
# -- simple (same name) --
if cmd in SIMPLE_COMMANDS:
_emit(out, cmd)
# Also add right-side space when next char would merge
if after < n and (s[after].isalnum() or s[after] == "\\"):
out.append(" ")
i = after
continue
# -- renamed --
if cmd in RENAMED_COMMANDS:
repl = RENAMED_COMMANDS[cmd]
if repl:
_emit(out, repl)
if after < n and s[after].isalnum():
out.append(" ")
# If repl is empty the command is silently dropped.
# For \label{...} consume the brace group too.
if cmd == "label":
g = _find_brace_group(s, after)
if g:
i = g[1]
continue
i = after
continue
# -- unknown command → emit name without backslash --
_emit(out, cmd)
if after < n and s[after].isalnum():
out.append(" ")
i = after
continue
# ---- brace groups (not consumed by a command) ----
if ch == "{":
g = _find_brace_group(s, i)
if g:
content, end = g
# Check if preceded by ^ or _ → superscript/subscript grouping
if out and out[-1] and out[-1][-1] in "^_":
out.append(f"({_convert(content)})")
i = end
continue
# Check for {\rm ...} pattern
rm_m = re.match(r"\\rm\s+", content)
if rm_m:
inner = content[rm_m.end():]
_emit(out, f"upright({_convert(inner)})")
i = end
continue
# Otherwise, just emit the converted content (braces act as
# invisible grouping in LaTeX — no Typst equivalent needed
# in most contexts).
_emit(out, _convert(content))
i = end
continue
# Unmatched brace — emit as-is
out.append(ch)
i += 1
continue
# ---- everything else (digits, letters, operators, whitespace) ----
# Use _emit so consecutive raw letters get spaces inserted,
# matching LaTeX math semantics where adjacent letters are
# separate variables (e.g. "out" → "o u t" in Typst).
_emit(out, ch)
i += 1
result = "".join(out)
# Typst math requires a base before ^ or _; add an invisible base
# when the expression starts with a script marker (e.g. $^2$).
if result and result.lstrip() and result.lstrip()[0] in "^_":
result = '""' + result.lstrip()
return result
# ---------------------------------------------------------------------------
# Environment converters
# ---------------------------------------------------------------------------
def _convert_environment(name: str, body: str) -> str:
"""Convert a ``\\begin{name}\\end{name}`` block to Typst."""
if name in ("matrix", "bmatrix", "pmatrix", "vmatrix"):
return _convert_matrix_env(name, body)
if name == "cases":
return _convert_cases_env(body)
if name in ("aligned", "split"):
# Just unwrap — Typst math handles & alignment and \ line-breaks.
converted = _convert(body)
return converted
if name == "figure":
# Not real math; pass through as-is.
return f"\\begin{{{name}}}{body}\\end{{{name}}}"
# Unknown environment — pass through converted content
return _convert(body)
def _convert_matrix_env(name: str, body: str) -> str:
"""Convert matrix/bmatrix/pmatrix/vmatrix to ``mat(…)``."""
delim_map = {
"matrix": "",
"bmatrix": '"["',
"pmatrix": '"("',
"vmatrix": '"|"',
}
# Split rows on \\, columns on &
rows: list[str] = []
for row_text in re.split(r"\\\\", body):
row_text = row_text.strip()
if not row_text:
continue
cells = [_convert(c.strip()) for c in row_text.split("&")]
rows.append(", ".join(cells))
inner = "; ".join(rows)
delim = delim_map.get(name, "")
if delim:
return f"mat(delim: {delim}, {inner})"
return f"mat({inner})"
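For illustration, the row and column splitting can be exercised without the rest of the converter. The sketch below uses a hypothetical `matrix_to_typst` helper that skips the recursive `_convert` call on each cell, so it is only meaningful for cells that are already valid Typst.

```python
import re


def matrix_to_typst(body: str, delim: str = "") -> str:
    """Turn a LaTeX matrix body into a Typst ``mat(...)`` call (cells unconverted)."""
    rows: list[str] = []
    # Rows are separated by a double backslash, columns by "&".
    for row_text in re.split(r"\\\\", body):
        row_text = row_text.strip()
        if not row_text:
            continue  # ignore a trailing row separator
        rows.append(", ".join(cell.strip() for cell in row_text.split("&")))
    inner = "; ".join(rows)
    if delim:
        return f"mat(delim: {delim}, {inner})"
    return f"mat({inner})"


print(matrix_to_typst(r"a & b \\ c & d", '"["'))
```

Typst separates cells with commas and rows with semicolons, which is why the two LaTeX separators map to `, ` and `; `.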
def _convert_cases_env(body: str) -> str:
"""Convert cases environment to ``cases(…)``."""
branches: list[str] = []
for branch_text in re.split(r"\\\\", body):
branch_text = branch_text.strip()
if not branch_text:
continue
branches.append(_convert(branch_text))
return "cases(" + ", ".join(branches) + ")"
# ---------------------------------------------------------------------------
# Markdown-level math-span detection
# ---------------------------------------------------------------------------
_FENCE_RE = re.compile(r"^(`{3,}|~{3,})", re.MULTILINE)
def _iter_math_spans(content: str):
"""Yield ``(start, end, is_display)`` for every math span.
Skips spans inside fenced code blocks and inline code.
"""
n = len(content)
i = 0
in_fence: str | None = None # fence marker when inside a code block
while i < n:
# Track fenced code blocks
if content[i] == "`" or content[i] == "~":
m = _FENCE_RE.match(content, i)
if m and (i == 0 or content[i - 1] == "\n"):
marker = m.group(1)
if in_fence is None:
in_fence = marker[0] # opening
i = content.index("\n", i) + 1 if "\n" in content[i:] else n
continue
elif marker[0] == in_fence:
in_fence = None # closing
i = m.end()
continue
if in_fence:
i += 1
continue
# Skip inline code
if content[i] == "`":
end_tick = content.find("`", i + 1)
if end_tick != -1:
i = end_tick + 1
continue
# Display math $$...$$
if content[i:i + 2] == "$$":
start = i
close = content.find("$$", i + 2)
if close != -1:
yield (start + 2, close, True)
i = close + 2
continue
# Inline math $...$
if content[i] == "$":
start = i
# Find closing $ — any next $ closes the span (even if followed
# by another $, which starts a NEW span).
j = i + 1
while j < n:
if content[j] == "$":
if j > i + 1: # non-empty
yield (start + 1, j, False)
j += 1
break
if content[j] == "\n" and not content[i + 1:j].strip():
break # only whitespace between "$" and the newline → not math
j += 1
i = j
continue
i += 1
_CJK_RE = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]")
def convert_latex_math_to_typst(content: str) -> str:
"""Replace LaTeX math with Typst math throughout *content* (markdown)."""
spans = list(_iter_math_spans(content))
if not spans:
return content
parts: list[str] = []
prev = 0
for start, end, is_display in spans:
delim = "$$" if is_display else "$"
delim_len = len(delim)
delim_start = start - delim_len
latex = content[start:end]
# Spans containing CJK characters are almost certainly mismatched $.
# Strip the $ delimiters and emit the raw text.
if _CJK_RE.search(latex):
parts.append(content[prev:delim_start])
parts.append(latex) # emit without $ delimiters
prev = end + delim_len
continue
parts.append(content[prev:delim_start])
converted = _convert(latex)
# Strip leading/trailing whitespace from inline math so that
# ``$ text$`` (space after opening $) never occurs — CommonMark
# and mdbook-typst-math treat that as non-math.
if not is_display:
converted = converted.strip()
parts.append(f"{delim}{converted}{delim}")
prev = end + delim_len
parts.append(content[prev:])
return "".join(parts)


@@ -5,9 +5,9 @@ import sys
from pathlib import Path
try:
from tools.prepare_mdbook import build_title_cache, parse_bib, rewrite_markdown
from tools.prepare_mdbook import build_title_cache, collect_figure_labels, collect_labels, convert_math_to_mathjax, parse_bib, rewrite_markdown
except ModuleNotFoundError:
from prepare_mdbook import build_title_cache, parse_bib, rewrite_markdown
from prepare_mdbook import build_title_cache, collect_figure_labels, collect_labels, convert_math_to_mathjax, parse_bib, rewrite_markdown
PLACEHOLDER_PREFIX = "[TODO: src = zh_chapters/"
@@ -43,7 +43,25 @@ def main() -> int:
for key, fields in parse_bib(extra_bib).items():
bib_db.setdefault(key, fields)
for chapter in iter_chapters(book.get("items", [])):
chapters = iter_chapters(book.get("items", []))
# Pass 1: collect all :label: directives and figure labels
ref_label_map: dict[str, str] = {}
fig_number_map: dict[str, str] = {}
for chapter in chapters:
source_path = chapter.get("source_path") or chapter.get("path")
if not source_path:
continue
for label in collect_labels(chapter["content"]):
ref_label_map.setdefault(label, source_path)
number = chapter.get("number")
if number:
prefix = ".".join(str(n) for n in number)
for idx, label in enumerate(collect_figure_labels(chapter["content"]), 1):
fig_number_map[label] = f"{prefix}.{idx}"
# Pass 2: rewrite markdown with cross-reference linking
for chapter in chapters:
source_path = chapter.get("source_path") or chapter.get("path")
if not source_path:
continue
@@ -56,7 +74,11 @@ def main() -> int:
bibliography_title=BIBLIOGRAPHY_TITLE,
frontpage_switch_label=FRONTPAGE_SWITCH_LABEL,
frontpage_switch_href=FRONTPAGE_SWITCH_HREF,
ref_label_map=ref_label_map,
current_source_path=source_path,
fig_number_map=fig_number_map,
)
chapter["content"] = convert_math_to_mathjax(chapter["content"])
json.dump(book, sys.stdout, ensure_ascii=False)
return 0


@@ -5,11 +5,9 @@ import sys
from pathlib import Path
try:
from tools.prepare_mdbook import build_title_cache, parse_bib, rewrite_markdown
from tools.latex_to_typst import convert_latex_math_to_typst
from tools.prepare_mdbook import build_title_cache, collect_figure_labels, collect_labels, convert_math_to_mathjax, parse_bib, rewrite_markdown
except ModuleNotFoundError:
from prepare_mdbook import build_title_cache, parse_bib, rewrite_markdown
from latex_to_typst import convert_latex_math_to_typst
from prepare_mdbook import build_title_cache, collect_figure_labels, collect_labels, convert_math_to_mathjax, parse_bib, rewrite_markdown
BIBLIOGRAPHY_TITLE = "参考文献"
@@ -44,7 +42,25 @@ def main() -> int:
for key, fields in parse_bib(extra_bib).items():
bib_db.setdefault(key, fields)
for chapter in iter_chapters(book.get("items", [])):
chapters = iter_chapters(book.get("items", []))
# Pass 1: collect all :label: directives and figure labels
ref_label_map: dict[str, str] = {}
fig_number_map: dict[str, str] = {}
for chapter in chapters:
source_path = chapter.get("source_path") or chapter.get("path")
if not source_path:
continue
for label in collect_labels(chapter["content"]):
ref_label_map.setdefault(label, source_path)
number = chapter.get("number")
if number:
prefix = ".".join(str(n) for n in number)
for idx, label in enumerate(collect_figure_labels(chapter["content"]), 1):
fig_number_map[label] = f"{prefix}.{idx}"
# Pass 2: rewrite markdown with cross-reference linking
for chapter in chapters:
source_path = chapter.get("source_path") or chapter.get("path")
if not source_path:
continue
@@ -57,8 +73,11 @@ def main() -> int:
bibliography_title=BIBLIOGRAPHY_TITLE,
frontpage_switch_label=FRONTPAGE_SWITCH_LABEL,
frontpage_switch_href=FRONTPAGE_SWITCH_HREF,
ref_label_map=ref_label_map,
current_source_path=source_path,
fig_number_map=fig_number_map,
)
chapter["content"] = convert_latex_math_to_typst(chapter["content"])
chapter["content"] = convert_math_to_mathjax(chapter["content"])
json.dump(book, sys.stdout, ensure_ascii=False)
return 0


@@ -4,13 +4,16 @@ import argparse
import os
import re
from dataclasses import dataclass
from pathlib import Path
from pathlib import Path, PurePosixPath
TOC_FENCE = "toc"
EVAL_RST_FENCE = "eval_rst"
OPTION_LINE_RE = re.compile(r"^:(width|label):`[^`]+`\s*$", re.MULTILINE)
WIDTH_LINE_RE = re.compile(r"^:width:`[^`]+`\s*$", re.MULTILINE)
LABEL_RE = re.compile(r":label:`([^`]+)`")
NUMREF_RE = re.compile(r":numref:`([^`]+)`")
IMAGE_LINE_RE = re.compile(r"^!\[([^\]]*)\]\(([^)]+)\)\s*$")
LABEL_LINE_RE = re.compile(r"^:label:`([^`]+)`\s*$")
EQREF_RE = re.compile(r":eqref:`([^`]+)`")
EQLABEL_LINE_RE = re.compile(r"^:eqlabel:`([^`]+)`\s*$")
CITE_RE = re.compile(r":cite:`([^`]+)`")
@@ -343,12 +346,110 @@ def process_equation_labels(markdown: str) -> tuple[str, dict[str, int]]:
return "\n".join(result), label_map
def collect_labels(markdown: str) -> list[str]:
"""Extract all label names from :label: directives."""
return LABEL_RE.findall(markdown)
def collect_figure_labels(markdown: str) -> list[str]:
"""Return label names for figures (image lines followed by :label:)."""
labels: list[str] = []
lines = markdown.splitlines()
for i, line in enumerate(lines):
if not IMAGE_LINE_RE.match(line.strip()):
continue
j = i + 1
while j < len(lines):
s = lines[j].strip()
if not s or WIDTH_LINE_RE.match(s):
j += 1
continue
m = LABEL_LINE_RE.match(s)
if m:
labels.append(m.group(1))
break
return labels
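The interaction between `collect_figure_labels` and the per-chapter numbering in the preprocessor scripts can be sketched in isolation. This is a simplified restatement rather than the module itself: the regexes are copied from the hunk above, while `figure_labels`, `number_figures`, and the sample markdown are hypothetical.

```python
import re

IMAGE_LINE_RE = re.compile(r"^!\[([^\]]*)\]\(([^)]+)\)\s*$")
WIDTH_LINE_RE = re.compile(r"^:width:`[^`]+`\s*$")
LABEL_LINE_RE = re.compile(r"^:label:`([^`]+)`\s*$")


def figure_labels(markdown: str) -> list[str]:
    """Return the label that follows each image line, in document order."""
    labels: list[str] = []
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        if not IMAGE_LINE_RE.match(line.strip()):
            continue
        for follow in lines[i + 1:]:
            s = follow.strip()
            if not s or WIDTH_LINE_RE.match(s):
                continue  # blank lines and :width: may sit before :label:
            m = LABEL_LINE_RE.match(s)
            if m:
                labels.append(m.group(1))
            break  # the first other line ends the look-ahead
    return labels


def number_figures(markdown: str, chapter_number: list[int]) -> dict[str, str]:
    """Map each figure label to '<chapter prefix>.<index>'."""
    prefix = ".".join(str(n) for n in chapter_number)
    return {
        label: f"{prefix}.{idx}"
        for idx, label in enumerate(figure_labels(markdown), 1)
    }


sample = (
    "![A](a.png)\n:width:`800px`\n:label:`fig_a`\n\ntext\n\n"
    "![B](b.png)\n\n![C](c.png)\n:label:`fig_c`\n"
)
print(number_figures(sample, [5, 2]))
```

In this sketch an image without a `:label:` (figure B) does not consume a number, so `fig_c` becomes `5.2.2` rather than `5.2.3`.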
def process_figure_captions(
markdown: str,
fig_number_map: dict[str, str] | None = None,
) -> str:
"""Convert image+label blocks into figures with anchors and captions."""
lines = markdown.splitlines()
result: list[str] = []
i = 0
while i < len(lines):
img_match = IMAGE_LINE_RE.match(lines[i].strip())
if img_match:
caption = img_match.group(1)
img_line = lines[i]
# Look ahead for :width: and :label:
j = i + 1
label = None
while j < len(lines):
s = lines[j].strip()
if not s or WIDTH_LINE_RE.match(s):
j += 1
continue
m = LABEL_LINE_RE.match(s)
if m:
label = m.group(1)
j += 1
break
if label:
fig_num = (fig_number_map or {}).get(label)
result.append(f'<a id="{label}"></a>')
result.append("")
result.append(img_line)
if fig_num and caption:
result.append("")
result.append(f'<p align="center">图{fig_num} {caption}</p>')
elif fig_num:
result.append("")
result.append(f'<p align="center">图{fig_num}</p>')
elif caption:
result.append("")
result.append(f'<p align="center">{caption}</p>')
i = j
continue
result.append(lines[i])
i += 1
return "\n".join(result)
def _relative_chapter_path(from_path: str, to_path: str) -> str:
"""Compute relative path between two mdbook source_paths."""
if from_path == to_path:
return ""
from_dir = str(PurePosixPath(from_path).parent)
return PurePosixPath(os.path.relpath(to_path, start=from_dir)).as_posix()
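A small sketch of what the relative-path helper is expected to return for typical mdBook source paths. It swaps `os.path.relpath` for `posixpath.relpath` so that the result uses forward slashes on any platform; that substitution, the name `relative_chapter_path`, and the example paths are assumptions made for illustration.

```python
import posixpath
from pathlib import PurePosixPath


def relative_chapter_path(from_path: str, to_path: str) -> str:
    """Return the link target from one chapter file to another ("" if same file)."""
    if from_path == to_path:
        return ""
    from_dir = str(PurePosixPath(from_path).parent)
    return posixpath.relpath(to_path, start=from_dir)


# Cross-chapter reference: climb out of the current chapter directory.
print(relative_chapter_path("chapter_a/intro.md", "chapter_b/summary.md"))
# Same directory: only the file name is needed.
print(relative_chapter_path("chapter_a/intro.md", "chapter_a/summary.md"))
```

An empty result signals a same-page reference, which is why `_numref_replace` emits a bare `#label` anchor in that case.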
def normalize_directives(
markdown: str,
label_map: dict[str, int] | None = None,
ref_label_map: dict[str, str] | None = None,
current_source_path: str | None = None,
fig_number_map: dict[str, str] | None = None,
) -> str:
normalized = OPTION_LINE_RE.sub("", markdown)
normalized = NUMREF_RE.sub(lambda match: f"`{match.group(1)}`", normalized)
normalized = WIDTH_LINE_RE.sub("", markdown)
normalized = LABEL_RE.sub(lambda m: f'<a id="{m.group(1)}"></a>', normalized)
def _numref_replace(match: re.Match[str]) -> str:
name = match.group(1)
if ref_label_map and current_source_path and name in ref_label_map:
target_path = ref_label_map[name]
rel = _relative_chapter_path(current_source_path, target_path)
display = f"{fig_number_map[name]}" if fig_number_map and name in fig_number_map else name
if rel:
return f"[{display}]({rel}#{name})"
return f"[{display}](#{name})"
return f"`{name}`"
normalized = NUMREF_RE.sub(_numref_replace, normalized)
if label_map:
normalized = EQREF_RE.sub(
lambda m: f"({label_map[m.group(1)]})" if m.group(1) in label_map else f"$\\eqref{{{m.group(1)}}}$",
@@ -509,6 +610,121 @@ def process_citations(
return processed
_FENCE_RE = re.compile(r"^(`{3,}|~{3,})", re.MULTILINE)
_CJK_RE = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf\uf900-\ufaff]")
def _iter_math_spans(content: str):
"""Yield ``(start, end, is_display)`` for every math span.
Skips spans inside fenced code blocks and inline code.
"""
n = len(content)
i = 0
in_fence: str | None = None # fence marker when inside a code block
while i < n:
# Track fenced code blocks
if content[i] == "`" or content[i] == "~":
m = _FENCE_RE.match(content, i)
if m and (i == 0 or content[i - 1] == "\n"):
marker = m.group(1)
if in_fence is None:
in_fence = marker[0] # opening
i = content.index("\n", i) + 1 if "\n" in content[i:] else n
continue
elif marker[0] == in_fence:
in_fence = None # closing
i = m.end()
continue
if in_fence:
i += 1
continue
# Skip inline code
if content[i] == "`":
end_tick = content.find("`", i + 1)
if end_tick != -1:
i = end_tick + 1
continue
# Display math $$...$$
if content[i:i + 2] == "$$":
start = i
close = content.find("$$", i + 2)
if close != -1:
yield (start + 2, close, True)
i = close + 2
continue
# Inline math $...$
if content[i] == "$":
start = i
j = i + 1
while j < n:
if content[j] == "$":
if j > i + 1: # non-empty
yield (start + 1, j, False)
j += 1
break
if content[j] == "\n" and not content[i + 1:j].strip():
break # only whitespace between "$" and the newline → not math
j += 1
i = j
continue
i += 1
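To see the shape of the tuples that `_iter_math_spans` yields, a much simplified regex-based approximation is enough. Unlike the scanner above, this sketch does not skip fenced or inline code and does not replicate its handling of unbalanced `$`; `MATH_RE` and `find_math_spans` are hypothetical names.

```python
import re

# Display math first, so "$$...$$" is not read as two inline delimiters.
MATH_RE = re.compile(r"\$\$(.+?)\$\$|\$([^$\n]+?)\$", re.DOTALL)


def find_math_spans(text: str) -> list[tuple[int, int, bool]]:
    """Return (start, end, is_display) for the content between delimiters."""
    spans: list[tuple[int, int, bool]] = []
    for m in MATH_RE.finditer(text):
        if m.group(1) is not None:
            spans.append((m.start(1), m.end(1), True))
        else:
            spans.append((m.start(2), m.end(2), False))
    return spans


text = "a $x$ and $$y+z$$"
for start, end, is_display in find_math_spans(text):
    print(text[start:end], is_display)
```

As in the real scanner, `start` and `end` delimit the math content only, so the caller recovers the delimiter position by subtracting its length from `start`.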
def convert_math_to_mathjax(content: str) -> str:
"""Replace ``$``/``$$`` delimited math with MathJax ``\\(…\\)``/``\\[…\\]``.
Inside math content, ``\\`` (LaTeX newline) is doubled to ``\\\\`` so that
mdBook's markdown processing (which consumes one level of backslash
escaping) delivers the correct ``\\`` to MathJax. ``*`` and ``_`` are
backslash-escaped as well so that markdown does not read them as emphasis.
"""
spans = list(_iter_math_spans(content))
if not spans:
return content
parts: list[str] = []
prev = 0
for start, end, is_display in spans:
delim = "$$" if is_display else "$"
delim_len = len(delim)
delim_start = start - delim_len
math = content[start:end]
# Spans containing CJK characters are almost certainly mismatched $.
# Strip the $ delimiters and emit the raw text.
if _CJK_RE.search(math):
parts.append(content[prev:delim_start])
parts.append(math)
prev = end + delim_len
continue
parts.append(content[prev:delim_start])
# Double backslashes inside math so that after mdBook markdown
# processing (which eats one backslash layer) MathJax sees the
# original LaTeX. Escape * and _ as well so that markdown does not
# treat them as emphasis markers.
math = math.replace("\\\\", "\\\\\\\\")
math = math.replace("*", "\\*")
math = math.replace("_", "\\_")
if is_display:
parts.append(f"\\\\[{math}\\\\]")
else:
parts.append(f"\\\\({math}\\\\)")
prev = end + delim_len
parts.append(content[prev:])
return "".join(parts)
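The escaping step is easiest to check on a single span. The sketch below isolates it in a hypothetical `wrap_for_mathjax` helper; the three replacements and the delimiters mirror the function above, while the sample inputs are made up.

```python
def wrap_for_mathjax(math: str, is_display: bool) -> str:
    """Escape one math span for markdown and wrap it in MathJax delimiters."""
    math = math.replace("\\\\", "\\\\\\\\")  # LaTeX line break: 2 backslashes -> 4
    math = math.replace("*", "\\*")          # keep markdown from seeing emphasis
    math = math.replace("_", "\\_")
    if is_display:
        return f"\\\\[{math}\\\\]"
    return f"\\\\({math}\\\\)"


print(wrap_for_mathjax("C*H*W", False))
print(wrap_for_mathjax(r"a \\ b_i", True))
```

Each escape is written so that the markdown renderer strips exactly one backslash, which should leave `\(C*H*W\)` and `\[a \\ b_i\]` in the generated HTML for MathJax to pick up.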
def resolve_raw_html_file(current_file: Path, filename: str) -> Path:
direct = (current_file.parent / filename).resolve()
if direct.exists():
@@ -628,6 +844,9 @@ def rewrite_markdown(
bibliography_title: str = DEFAULT_BIBLIOGRAPHY_TITLE,
frontpage_switch_label: str | None = None,
frontpage_switch_href: str | None = None,
ref_label_map: dict[str, str] | None = None,
current_source_path: str | None = None,
fig_number_map: dict[str, str] | None = None,
) -> str:
output: list[str] = []
lines = markdown.splitlines()
@@ -676,7 +895,14 @@ def rewrite_markdown(
raw = "\n".join(output) + "\n"
result, label_map = process_equation_labels(raw)
result = normalize_directives(result, label_map=label_map)
result = process_figure_captions(result, fig_number_map=fig_number_map)
result = normalize_directives(
result,
label_map=label_map,
ref_label_map=ref_label_map,
current_source_path=current_source_path,
fig_number_map=fig_number_map,
)
result = process_citations(result, bib_db or {}, bibliography_title=bibliography_title)
return result


@@ -14,8 +14,3 @@
- CUDA programming guide: [CUDA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
- Ascend community: [Ascend](https://gitee.com/ascend)
- Progress in MLIR applications: [MLIR](https://mlir.llvm.org/talks)
## References
:bibliography:`../references/accelerator.bib`


@@ -28,15 +28,19 @@
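However, a computer cannot place such a matrix in memory directly; it must first be flattened into one dimension. This raises the question of how a logical index maps to a memory index, that is, how to compute the one-dimensional memory index from the logical data index.
For NCHW data, elements are taken first along the W axis, then along the H axis, then along the C axis, and finally along the N axis. The mapping between logical and physical storage is: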
$$offsetnchw(n,c,h,w) = n*CHW + c*HW + h*W +w$$
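As shown in :numref:`nchw`, this format unrolls along the lowest dimension, the W axis, so elements adjacent along W are also adjacent in memory. To fetch the element at the same position in the next image, one must skip an entire image of size$C*H*W$. For example, with eight 32\*32 RGB images, $N=8,C=3,H=32,W=32$. Storing them in memory means unrolling along the W axis first and then along the H axis, which completes one channel; the next channel is handled in the same way. Once all channels are stored, the next image follows. PyTorch and MindSpore use the NCHW format by default.
As shown in :numref:`nchw`, this format unrolls along the lowest dimension, the W axis, so elements adjacent along W are also adjacent in memory. To fetch the element at the same position in the next image, one must skip an entire image of size $C*H*W$. For example, with eight 32\*32 RGB images, $N=8,C=3,H=32,W=32$. Storing them in memory means unrolling along the W axis first and then along the H axis, which completes one channel; the next channel is handled in the same way. Once all channels are stored, the next image follows. PyTorch and MindSpore use the NCHW format by default.
![NHWC data format for an RGB image](../img/ch05/nchw.png)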
:width:`800px`
:label:`nchw`
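Similarly, the NHWC format takes elements first along the C axis, then along W, then along H, and finally along N. NHWC is the default data format in TensorFlow. In PyTorch this format is called Channel-Last.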
$$offsetnhwc(n,h,w,c) = n*HWC + h*WC + w*C +c$$
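The two offset formulas can be checked with a short worked example. The function names and argument order below are illustrative choices; the shape $C=3, H=32, W=32$ is the one used in the text.

```python
def offset_nchw(n: int, c: int, h: int, w: int, C: int, H: int, W: int) -> int:
    """Flat index of element (n, c, h, w) when W varies fastest."""
    return n * C * H * W + c * H * W + h * W + w


def offset_nhwc(n: int, h: int, w: int, c: int, C: int, H: int, W: int) -> int:
    """Flat index of element (n, h, w, c) when C varies fastest."""
    return n * H * W * C + h * W * C + w * C + c


C, H, W = 3, 32, 32
print(offset_nchw(0, 0, 0, 1, C, H, W))  # 1: next element along W is adjacent
print(offset_nchw(0, 1, 0, 0, C, H, W))  # 1024: next channel is H*W elements away
print(offset_nchw(1, 0, 0, 0, C, H, W))  # 3072: next image is C*H*W elements away
print(offset_nhwc(0, 0, 0, 1, C, H, W))  # 1: next channel is adjacent
print(offset_nhwc(0, 0, 1, 0, C, H, W))  # 3: next pixel along W is C elements away
```

The same logical element therefore sits at different strides in the two layouts: moving one step along W costs 1 element in NCHW but C elements in NHWC.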
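:numref:`nchwandnhwc` shows how the logical layout maps to the physical memory layout under the different data formats. \[x:1\] denotes the index transition from the innermost dimension to the next dimension. For example, \[a:1\] means that once the current row along the W axis ends, the layout continues with the next position along the H axis, and \[b:1\] means that once the innermost C axis is laid out, the layout continues along the W axis.
![NCHW and NHWC data storage formats](../img/ch05/nchwandnhwc.png)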


@@ -8,8 +8,4 @@
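- Introduction to message queues: [What is a message queue](https://aws.amazon.com/message-queue/)
- Introduction to feature stores: [What is a feature store in machine learning](https://www.featurestore.org/what-is-a-feature-store)
## References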
:bibliography:`../references/recommender.bib`
- Introduction to feature stores: [What is a feature store in machine learning](https://www.featurestore.org/what-is-a-feature-store)


@@ -1,7 +1,3 @@
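## Summary
This chapter briefly introduced the basic concepts of reinforcement learning, including single-agent and multi-agent reinforcement learning algorithms and single-node and distributed reinforcement learning systems, to give readers a basic understanding of reinforcement learning problems. Reinforcement learning is currently a fast-growing branch of deep learning, and many practical problems may be solved as its algorithms develop further. On the other hand, the particular setup of reinforcement learning problems (such as the need to sample by interacting with an environment) places higher demands on computing systems: How can sample collection and policy training be better balanced? How can the capabilities of different computing hardware such as CPUs and GPUs be balanced? How can reinforcement learning agents be deployed effectively on large-scale distributed systems? Answering these questions requires a better understanding of how computer systems are designed and used.
## References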
:bibliography:`../references/reinforcement.bib`
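This chapter briefly introduced the basic concepts of reinforcement learning, including single-agent and multi-agent reinforcement learning algorithms and single-node and distributed reinforcement learning systems, to give readers a basic understanding of reinforcement learning problems. Reinforcement learning is currently a fast-growing branch of deep learning, and many practical problems may be solved as its algorithms develop further. On the other hand, the particular setup of reinforcement learning problems (such as the need to sample by interacting with an environment) places higher demands on computing systems: How can sample collection and policy training be better balanced? How can the capabilities of different computing hardware such as CPUs and GPUs be balanced? How can reinforcement learning agents be deployed effectively on large-scale distributed systems? Answering these questions requires a better understanding of how computer systems are designed and used.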


@@ -4,7 +4,7 @@
:width:`800px`
:label:`ROS2\_arch`
:label:`ROS2_arch`
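This section gives an overview of the Robot Operating System (ROS). ROS originated in a robotics project at the Stanford Artificial Intelligence Laboratory. It is a free, open-source framework that provides interfaces and tools for building advanced robots. As robotics grows rapidly in scope and complexity, the need for code reuse and modularity keeps increasing, and ROS suits the multi-node, multi-task scenarios typical of robots. Some robots, drones, and even driverless cars have begun to adopt ROS as their development platform. For robot learning, ROS/ROS2 can be combined with deep learning: developers have built deep learning nodes for ROS/ROS2 that support NVIDIA Jetson and TensorRT. NVIDIA Jetson is an embedded system that NVIDIA developed for autonomous machines, a modular system comprising a CPU, GPU, PMIC, DRAM, and flash storage that speeds up the software running on autonomous machines. TensorRT is a machine learning framework released by NVIDIA for running machine learning inference on its hardware.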
@@ -12,19 +12,19 @@
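ROS provides many built-in tools. The 3D visualizer rviz visualizes robots, the environments they work in, and their sensor data; it is highly configurable, with many visualization types and plugins. catkin is the ROS build system, similar to CMake on Linux, and a Catkin Workspace is the directory in which catkin packages are created, modified, and compiled. roslaunch is a tool for starting multiple ROS nodes locally and remotely and for setting parameters on the ROS parameter server. There are also the robot simulator Gazebo and MoveIt!, a mobile manipulation software and planning framework. ROS gives robot developers interfaces in several programming languages, such as roscpp for C++ and rospy for Python. ROS provides Unified Robot Description Format (URDF) files for many robots; URDF describes a robot in XML. ROS also has room for improvement: its real-time communication performance is limited, and it still falls short of the system stability required in industrial settings.
The ROS2 project was announced at ROSCon 2014, and the first ROS2 release, Ardent Apalone, came out in 2017. ROS2 adds support for multi-robot systems and improves network performance for communication among robots. It also supports microcontrollers and multiple platforms: besides existing x86 and ARM systems it will support embedded microcontrollers such as MCUs, and besides Linux it adds support for Windows, macOS, RTOS, and other systems. More importantly, ROS2 adds real-time control support, which improves control timeliness and overall robot performance. The ROS2 communication system is based on DDS (Data Distribution Service), as shown in :numref:`ROS2\_arch`.
The ROS2 project was announced at ROSCon 2014, and the first ROS2 release, Ardent Apalone, came out in 2017. ROS2 adds support for multi-robot systems and improves network performance for communication among robots. It also supports microcontrollers and multiple platforms: besides existing x86 and ARM systems it will support embedded microcontrollers such as MCUs, and besides Linux it adds support for Windows, macOS, RTOS, and other systems. More importantly, ROS2 adds real-time control support, which improves control timeliness and overall robot performance. The ROS2 communication system is based on DDS (Data Distribution Service), as shown in :numref:`ROS2_arch`.
ROS2 relies on combining workspaces through the shell environment. "Workspace" is a ROS term for the location on the system where ROS2 development takes place. The core ROS2 workspace is called the underlay, and subsequent workspaces are called overlays. When developing with ROS2, several workspaces are usually active at once. Next we describe the core concepts of ROS2 in detail; this part draws on [^1].
### ROS2 Nodes
The ROS graph is a network of ROS2 elements that process data together at the same time. It includes all executables and the connections between them. Each node in ROS2 should be responsible for a single, modular purpose (for example, one node controls the wheel motors and another controls a laser range finder). Each node can send data to and receive data from other nodes through topics, services, actions, or parameters. A complete robot system consists of many nodes working together, as shown in :numref:`ros2\_graph`. In ROS2, a single executable (a C++ program, a Python program, and so on) can contain one or more nodes.
The ROS graph is a network of ROS2 elements that process data together at the same time. It includes all executables and the connections between them. Each node in ROS2 should be responsible for a single, modular purpose (for example, one node controls the wheel motors and another controls a laser range finder). Each node can send data to and receive data from other nodes through topics, services, actions, or parameters. A complete robot system consists of many nodes working together, as shown in :numref:`ros2_graph`. In ROS2, a single executable (a C++ program, a Python program, and so on) can contain one or more nodes.
![A complete robot system consists of many nodes working together](../img/ch13/ros2_graph.png)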
:width:`800px`
:label:`ros2\_graph`
:label:`ros2_graph`
Nodes discover each other through the middleware underlying ROS2. The process is summarized as follows:


@@ -1,7 +1,3 @@
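### Summary
This chapter briefly introduced the basic concepts of robot systems, including the general-purpose robot operating system and the perception, planning, and control systems, to give readers a basic understanding of robotics problems. For the general-purpose robot operating system, we reviewed its basic concepts and used code examples to give readers hands-on experience with ROS and a taste of the fun of building a simple robot system. Robotics is currently a fast-growing branch of artificial intelligence, and many practical problems await further advances in robot algorithms and system design.
## References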
:bibliography:`../references/rlsys.bib`
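This chapter briefly introduced the basic concepts of robot systems, including the general-purpose robot operating system and the perception, planning, and control systems, to give readers a basic understanding of robotics problems. For the general-purpose robot operating system, we reviewed its basic concepts and used code examples to give readers hands-on experience with ROS and a taste of the fun of building a simple robot system. Robotics is currently a fast-growing branch of artificial intelligence, and many practical problems await further advances in robot algorithms and system design.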