13 Commits

Author SHA1 Message Date
Luo Mai
5ac39809e9 Fix AllGather. (#442)
* Update collective.md

* Update overview.md

---------

Co-authored-by: Tanzhipeng <Rudysheeppig@users.noreply.github.com>
2023-03-31 18:39:08 +08:00
Luo Mai
4541da5d58 Update the distributed training chapter. (#436)
* WIP: distributed.

* Update the distributed section.

* Fix line numbers.

* Update.

* Update collective.md
2023-03-29 17:26:29 +01:00
Yao Fu
f5c3b21e12 Yao recsys dev (#438)
* add rec system

* fix

* add new figures

* fix

* update

* fix

* fix

* fix
2023-03-29 00:28:49 +08:00
Shaoyu Wang
ec6ccd4686 fix typo (#427)
fix typo
2023-03-24 10:45:52 +08:00
Shuangchi He
ef2031d8f9 Fix some typos. (#365)
* Fix some typos.

* Fix typo.

Co-authored-by: Cheng Lai <laicheng_VIP@163.com>
Co-authored-by: Jiarong Han <73918561+hanjr92@users.noreply.github.com>
2022-06-10 15:29:06 +08:00
Peiyuan Liao
3788ff67ad [Content supplement and extension] Collective communication (#334)
* add initial content on collective communication

* Update mlsys.bib

* update megatron-lm/dall-e citations

* [collective] basic definition

* Update collective.md

* [collective] Broadcast

* [collective] reduce

* [collective] Reduce, Allreduce, Gather, All Gather, Scatter, ReduceScatter

* [collective] reorganize op section

* Update collective.md

* [collective] format

* [collective] calculating bandwidth

* [collective] ZeRO

* [collective] ZeRO and DALL-E

* Update collective.md

* [collective] remove topology section

* [collective] ZeRO and DALL-E

* [collective] abstraction

* Update collective.md

* [collective] abstractions & allreduce to extension

* [collective] bandwidth calculation

* [collective] move comm interface to summary

* [collective] typo

* [collective] typo

* Update mlsys.bib

* Update references (#335)

* update ch03 (#338)

* update (#339)

Co-authored-by: Jiankai-Sun <jkaisun1@gmail.com>

* Fix ch10 figures (#341)

* fix #264

* Fix figures

* Add extended readings (fix #282)

* Remove extra spaces

* Fix typo

* fix #183

* update fonts in figures

* fix #184 #263

* fix #184 #263

* fix a bug

* fix a bug

* fix 183

* fix a bug

* fix a text

* Merge

* add overview figure fix #263

* fix #263

* fix the overview figure

Co-authored-by: Dalong <39682259+eedalong@users.noreply.github.com>

* Recsys fix (#340)

* fix text (#325)

* fix reference

* update images of explainable ai (#267) (#328)

* update explainable ai

* update explainable ai

* fix citation errors (#60)

* fix reference error

* update explainable ai

* update explainable ai

* fix citation errors (#60)

* fix reference error

* fetch upstream

* update explainable ai

* fix citation errors (#60)

* fix reference error

* update explainable ai

* remove redundant content

* update img of explainable AI(#267)

* fix bug in mlsys.bib

* fix bug2 in mlsys.bib

* rewrite mlsys.bib

Co-authored-by: lhy <hlicn@connect.ust.hk>
Co-authored-by: Dalong <39682259+eedalong@users.noreply.github.com>
Co-authored-by: Luo Mai <luo.mai.cs@gmail.com>

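* Remove the invalid image path from the Section 6.2.1 heading (#337)

The image referenced in the Section 6.2.1 heading already appears later in the text, so the invalid image path is removed from the heading.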

Co-authored-by: Luo Mai <luo.mai.cs@gmail.com>
Co-authored-by: Cheng Lai <laicheng_VIP@163.com>

* add extension (#331)

Co-authored-by: Luo Mai <luo.mai.cs@gmail.com>

* add explainable extension (#343)

Co-authored-by: lixiaohui <lixiaohui33@huawei.com>
Co-authored-by: Luo Mai <luo.mai.cs@gmail.com>

* Update RL chapter (#349)

* fix chap12 render

* add distributed rl chapter

* fix bug

* fix issue #212

* fix typo

* update imgs

* fix chinese

* fix svg img

* update contents in rl chapter

* update marl sys

* fix a fig

* fix ref

* fix error

Co-authored-by: Dalong <39682259+eedalong@users.noreply.github.com>

* [collective] add references

* [collective] fix references & add equations

* [collective] fix reference and inline comments

* [collective] fix code

* Update collective.md

Co-authored-by: Cheng Lai <laicheng_VIP@163.com>
Co-authored-by: Jiarong Han <73918561+hanjr92@users.noreply.github.com>
Co-authored-by: Jack <sjkai1@126.com>
Co-authored-by: Jiankai-Sun <jkaisun1@gmail.com>
Co-authored-by: Yao Fu <fy38607203@163.com>
Co-authored-by: Dalong <39682259+eedalong@users.noreply.github.com>
Co-authored-by: HaoyangLI <417493727@qq.com>
Co-authored-by: lhy <hlicn@connect.ust.hk>
Co-authored-by: Luo Mai <luo.mai.cs@gmail.com>
Co-authored-by: theseed <feiyuxin1000@sina.com>
Co-authored-by: huygens12 <59854698+huygens12@users.noreply.github.com>
Co-authored-by: lixiaohui <lixiaohui33@huawei.com>
Co-authored-by: Zihan Ding <1402434478@qq.com>
2022-05-23 13:34:50 -04:00
Yao Fu
b2fa0b156d Fix chapter_distributed_training figures (#321)
* fix #264

* Fix figures

* Add extended readings (fix #282)

* Remove extra spaces

* Fix typo

* fix #183

* update fonts in figures

* fix #184 #263

* fix #184 #263

* fix a bug

* fix a bug

* fix 183

* fix a bug

* fix a text

* Merge

* add overview figure fix #263

* fix #263

Co-authored-by: Dalong <39682259+eedalong@users.noreply.github.com>
2022-04-28 16:38:49 +01:00
maqy
d03c547cea fix typos(issue243) (#244)
* fix ch6 (issue220)

* fix typos

Co-authored-by: Dalong <39682259+eedalong@users.noreply.github.com>
2022-04-04 09:57:28 +08:00
Luo Mai
7f18af395f Update further reading (#232)
* update further reading

* format
2022-03-31 09:45:30 +08:00
cailun01
4c1b2b61f8 change content in chapter 10 parameter server (#194)
Co-authored-by: Luo Mai <luo.mai.cs@gmail.com>
2022-03-23 15:36:40 +00:00
Cheng Lai
aefdd40ee6 fix ch10 (#124)
Co-authored-by: Dalong <39682259+eedalong@users.noreply.github.com>
2022-03-20 10:12:27 +08:00
Cheng Lai
dc96a6faab Update Chapter 10 distribute training (#56)
2022-03-10 08:15:39 +00:00
Luo Mai
1f1f009cea update structure (#28)
2022-03-03 08:54:20 +00:00