From 3add33ca50ed6224929b8d1c350b142dffb59e03 Mon Sep 17 00:00:00 2001
From: <>
Date: Sat, 16 Dec 2023 04:15:51 +0000
Subject: [PATCH] Deployed af9526f8 with MkDocs version: 1.5.3

---
 en/CS学习规划/index.html | 2 +-
 en/index.html | 2 +-
 en/体系结构/CA/index.html | 2 +-
 en/使用指南/index.html | 2 +-
 en/必学工具/Github/index.html | 35 +++
 en/必学工具/Latex/index.html | 35 +++
 en/必学工具/Scoop/index.html | 12 +-
 en/必学工具/thesis/index.html | 2 +-
 en/必学工具/tools/index.html | 2 +-
 en/必学工具/workflow/index.html | 2 +-
 en/必学工具/信息检索/index.html | 2 +-
 en/必学工具/翻墙/index.html | 2 +-
 en/操作系统/NJUOS/index.html | 2 +-
 en/数据库系统/15799/index.html | 2 +-
 en/数据库系统/CS122/index.html | 2 +-
 en/数据库系统/CS346/index.html | 2 +-
 en/机器学习进阶/CMU10-708/index.html | 2 +-
 en/机器学习进阶/CS229M/index.html | 2 +-
 en/机器学习进阶/STA4273/index.html | 2 +-
 en/机器学习进阶/STAT8201/index.html | 2 +-
 en/机器学习进阶/roadmap/index.html | 2 +-
 en/编程入门/AUT1400/index.html | 2 +-
 en/编程入门/DeCal/index.html | 2 +-
 en/计算机图形学/CS148/index.html | 2 +-
 en/计算机图形学/GAMES101/index.html | 2 +-
 en/计算机图形学/GAMES103/index.html | 2 +-
 en/计算机图形学/GAMES202/index.html | 2 +-
 search/search_index.json | 2 +-
 sitemap.xml | 448 ++++++++++++++-------------
 sitemap.xml.gz | Bin 4497 -> 4516 bytes
 使用指南/index.html | 2 +-
 机器学习进阶/CMU10-708/index.html | 2 +-
 机器学习进阶/CS229M/index.html | 2 +-
 机器学习进阶/STA4273/index.html | 2 +-
 机器学习进阶/STAT8201/index.html | 2 +-
 机器学习进阶/roadmap/index.html | 2 +-
 36 files changed, 337 insertions(+), 255 deletions(-)
 create mode 100644 en/必学工具/Github/index.html
 create mode 100644 en/必学工具/Latex/index.html

diff --git a/en/CS学习规划/index.html b/en/CS学习规划/index.html
index b1184177..c059deab 100644
--- a/en/CS学习规划/index.html
+++ b/en/CS学习规划/index.html
@@ -1,4 +1,4 @@
- Guideline - csdiy.wiki

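A Reference Guide for CS Learning

The field of computer science is vast and complex, with a seemingly endless sea of knowledge, and every specialized area can absorb a lifetime of study if you pursue it deeply. A clear study plan is therefore very important. I took quite a few detours over years of self-study and finally distilled the following content for your reference.

Before you start, I strongly recommend a popular-science video series for beginners: Crash Course: Computer Science. In just 8 hours, it vividly and comprehensively covers many aspects of computer science: the history of computers, how computers work, the important modules that make up a computer, key ideas in computer science, and much more. As its slogan says, Computers are not magic! I hope that after watching it, you will have a big-picture sense of computer science and approach the more detailed and in-depth material below with genuine interest.

Essential Tools

As the saying goes, sharpening your axe will not delay your job of chopping wood. If you are a complete beginner to computers, learning some tools will make you far more efficient.

Learn to ask questions: You may be surprised that asking questions counts as an essential computing skill at all, let alone that it comes first. I think that in the open-source community, knowing how to ask questions is a very important ability, and it involves two things. First, it indirectly trains you to solve problems on your own. The cycle of forming a question, describing and posting it, waiting for answers, and finally understanding them is very long. If you want someone to walk you through every trivial issue over remote desktop, the world of computing is not for you. Second, if you have genuinely tried and still cannot solve a problem, you can turn to the open-source community. At that point, it becomes crucial to explain your situation and goal concisely enough that others understand them immediately. I recommend reading How To Ask Questions The Smart Way. It will not only raise the odds and speed of getting your problem solved but also keep the people who answer questions for free in the open-source community in a good mood.

MIT-Missing-Semester: This course covers most of these tools and gives quite detailed usage instructions. I strongly recommend it to beginners. Note, however, that the course occasionally mentions terms related to the development process, so it is best taken after you have finished at least an introductory computer science course.

Bypassing the GFW: For well-known reasons, sites like Google and GitHub are not accessible in mainland China. Yet Google and StackOverflow can solve 99% of the problems you run into during development, so learning to get around the firewall is almost an essential skill for a mainland CSer. (Considering legal issues, the method provided in this document only works for users with a Peking University email address.)

Command Line: Fluency with the command line is often overlooked or considered hard to master, but in reality it greatly increases your flexibility and productivity as an engineer. The Art of Command Line is a classic tutorial. It started as a question on Quora, but with contributions from many experts it has become a top GitHub project with 100,000 stars and has been translated into more than a dozen languages. The tutorial is short, and I highly recommend reading it several times and internalizing it through practice. Shell scripting is also a skill you should not neglect; you can refer to this tutorial.

IDE (Integrated Development Environment): Simply put, it is where you write code. An IDE's importance to a programmer goes without saying, but many IDEs are designed for large engineering projects, so they are bulky and overloaded with features. Nowadays, lightweight text editors with rich plugin ecosystems can cover most everyday lightweight programming needs. My personal favorites are VS Code and Sublime. The former's plugins are very easy to configure, while the latter's are a bit more complex but it looks great. For large projects I still use somewhat heavier IDEs, such as Pycharm (Python), IDEA (Java), and so on. (Disclaimer: every IDE is the best IDE in the world.)

Vim: A command-line text editor. Vim has a somewhat steep learning curve, but I think learning it is well worth it, because it will greatly improve your development efficiency. Most IDEs now also support Vim plugins, letting you enjoy a modern development environment while keeping your geek cool (cringe).

Emacs: A classic editor as famous as Vim, equally efficient and even more extensible. It can be configured as a lightweight editor or extended into a personalized IDE, and it can even be bent to all sorts of clever tricks.

Git: A version control tool for your code. Git's learning curve may be even steeper, but Git, written by Linus, the father of Linux, is definitely one of the must-have tools every CS student should master.

GitHub: A Git-based code hosting platform. It is the world's largest open-source community and a gathering place for experts.

GNU Make: A build tool. Using GNU Make well will help you develop the habit of modularizing your code and familiarize you with the compilation and linking process of large projects.

CMake: A more powerful build tool than GNU Make. I recommend learning it after you have mastered GNU Make.

LaTeX: A typesetting tool for papers that instantly makes them look more sophisticated.

Docker: A software packaging and deployment tool that is more lightweight than a virtual machine.

Practical Toolkit: Besides the tools above, which are used constantly in development, I have also collected many practical and fun free tools, such as download tools, design tools, learning websites, and more.

Thesis: A tutorial on writing a graduation thesis in Word.

Recommended Books

I believe a good textbook should be reader-centered rather than a showy pile of theory. Telling readers "what" something is certainly matters, but it is better still when the author weaves decades of experience in the field into the book and patiently explains "why" and "how" to proceed.

Link here

Environment Setup

Development as you imagine it: coding frantically in an IDE for hours.

Development in reality: spending days setting up the environment before writing a single line of code.

PC Environment Setup

If you are a Mac user, you're in luck: this guide will walk you step by step through setting up a complete development environment. If you are a Windows user, thanks to the efforts of the open-source community, you can get an experience similar to other platforms with Scoop.

You can also refer to an environment setup guide inspired by 6.NULL MIT-Missing-Semester, which focuses on terminal beautification. It also covers speeding up or replacing common software sources (such as GitHub, Anaconda, PyPI) and configuration and activation tutorials for some IDEs.

Server-Side Environment Setup

Server-side operations require basic Linux (or other Unix-like system) skills and basic system concepts such as processes, devices, and networking. Beginners can refer to the online notes Linux 101, compiled by the Linux User Association of the University of Science and Technology of China. To go deeper into system administration, you can refer to the course Aspects of System Administration.

In addition, if you need to learn a specific concept or tool, I recommend an excellent GitHub project, DevOps-Guide, which covers a great deal of foundational knowledge and tutorials on operations, such as Docker, Kubernetes, Linux, CI-CD, GitHub Actions, and more.

Course Map

As mentioned at the beginning of this chapter, this course map is only a reference plan. As an undergraduate nearing graduation, I feel I have neither the right nor the ability to preach to others about "how one should learn". So if you find anything unreasonable in the course categories and selections below, I fully accept the criticism and sincerely apologize. You can build your own course map in the next section, Customize Your Own Course Map.

Apart from categories labeled basic or introductory, there is no explicit order among the categories below. As long as you meet a course's prerequisites, you are free to pick courses according to your own needs and interests.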

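Mathematical Foundations

Calculus and Linear Algebra

As a freshman, learning calculus and linear algebra well is at least as important as learning to code. Countless people before me have made this point, but I will stress it once more: mastering calculus and linear algebra really matters! You may complain that you forget all of it right after the exam. I would say that means you have not grasped their essence, and your understanding has not yet gone deep enough to stick. If the lectures feel obscure, try the notes from MIT's Calculus Course and 18.06: Linear Algebra. For me, at least, they deeply clarified much of the essence of calculus and linear algebra. I also highly recommend the math YouTuber 3Blue1Brown, whose channel has many videos that explain the core of mathematical ideas with vivid animations. They offer both depth and breadth and are of very high quality.

Introduction to Information Theory

As a computer science student, I think it is very beneficial to learn some basics of information theory early on. However, most information theory courses target senior undergraduates or even graduate students and are very unfriendly to newcomers. MIT's 6.050J: Information theory and Entropy, by contrast, is tailored for freshmen and has almost no prerequisites. It covers coding, compression, communication, information entropy, and more, and it is a lot of fun.

Advanced Mathematics

Discrete Mathematics and Probability Theory

Set theory, graph theory, probability theory, and the like are important tools for deriving and proving algorithms, and they lay the foundation for later, higher-level math courses. But I think these courses easily fall into the trap of being overly theoretical and formal. Lectures become a pile of theorems and conclusions that fails to convey the essence of the theory, which leads to the vicious cycle of memorizing for the exam and forgetting right after. If theory lessons are interspersed with examples of algorithms in action, students can expand their algorithm knowledge while glimpsing the power and charm of the theory.

UCB CS70: Discrete Math and Probability Theory and UCB CS126: Probability Theory are UC Berkeley's probability courses. The former covers the basics of discrete mathematics and probability theory, while the latter covers stochastic processes and deeper theoretical content. Both place great emphasis on combining theory with practice and include plenty of real-world algorithm examples. The latter also has many Python programming assignments that let students apply probability theory to practical problems.

Numerical Analysis

As a computer science student, developing computational thinking is important: modeling and discretizing real-world problems, and then simulating and analyzing them on a computer, is a crucial skill. In the past couple of years, the Julia programming language, created at MIT, has taken off. With its C-like speed and friendly, Python-like syntax, it is poised to dominate numerical computing. Many MIT math courses have also started using Julia as a teaching tool, presenting difficult mathematical theory through intuitive, clear code.

ComputationalThinking is an introductory course on computational thinking offered by MIT. All course materials are open source and can be accessed directly on the course website. Using the Julia programming language, the course works through three topics: image processing, social science and data science, and climate modeling. Along the way it guides students through algorithms, mathematical modeling, data analysis, interactive design, and visualization, letting them experience the beautiful combination of computation and science. The content is not difficult, but what impressed me most is this: the appeal of science lies not in mystifying, esoteric theory or impenetrable jargon, but in vivid, intuitive examples and concise, insightful language that every ordinary person can understand.

If you are still hungry for more after this taster course, try MIT's 18.330: Introduction to Numerical Analysis. Its programming assignments also use Julia, but it is a step up in both difficulty and depth. It covers floating-point encoding, root finding, linear systems, differential equations, and more. The central goal of the course is to use discrete computer representations to estimate and approximate mathematically continuous concepts. The professor has also written an accompanying open-source textbook, Fundamentals of Numerical Computation, with abundant Julia code examples and rigorous derivations.

If that still isn't enough, MIT's graduate numerical analysis course 18.335: Introduction to Numerical Methods is also available for reference.

Differential Equations

How cool would it be if the motion and development of everything in the world could be described by equations! Although almost no school's CS curriculum includes a required course on differential equations, I still think mastering them gives you a new perspective from which to view the world.

Since differential equations often draw on complex analysis, you can use the notes from MIT18.04: Complex Variables Functions to fill in the prerequisites.

MIT18.03: Differential Equations mainly covers solving ordinary differential equations. Building on it, MIT18.152: Partial Differential Equations goes deeper into modeling with and solving partial differential equations. With differential equations as a powerful tool, you should become much better at modeling real-world problems and develop a stronger intuition for picking out what matters among many noisy variables.

Advanced Mathematical Topics

As a computer science student, I often hear the claim that math is useless. I don't agree, but I'm in no position to refute it either. Besides, insisting on sorting everything into useful and useless is rather dull. So take whatever interests you from the following math courses, which are aimed at senior undergraduates and even graduate students.

Convex Optimization

Stanford EE364A: Convex Optimization

Information Theory

MIT6.441: Information Theory

Applied Statistics

MIT18.650: Statistics for Applications

Elementary Number Theory

MIT18.781: Theory of Numbers

Cryptography

Stanford CS255: Cryptography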

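Programming Fundamentals

Languages are tools, and you choose the right tool for the right job. Since there's no universally perfect tool, there's no universally perfect language.

Shell

Python

C++

Rust

OCaml

Electronics Fundamentals

Basics of Circuits

As a computer science student, it is very helpful for later study and for developing computational thinking to know some basic circuit theory and to see the entire pipeline from sensor data collection to data analysis to algorithmic prediction. EE16A&B: Designing Information Devices and Systems I&II are the introductory freshman courses for Berkeley EE students. EE16A focuses on collecting and analyzing data from the real world through circuits, while EE16B focuses on analyzing the collected data and making predictions from it.

Signals and Systems

Signals and Systems is a course I think is well worth taking. I first studied it just to satisfy my curiosity about the Fourier transform. Afterwards I couldn't help marveling at how the Fourier transform gave me a whole new way of looking at the world. Like differential equations, it immerses you in the elegance and wonder of describing the world precisely with mathematics.

MIT 6.003: Signals and Systems provides all lecture recordings, written assignments, and their solutions. You can also check out the ancient version of this course.

UCB EE120: Signals and Systems has excellent notes on the Fourier transform and provides 6 very interesting Python programming assignments that let you apply the theory and algorithms of signals and systems in practice.

Data Structures and Algorithms

Algorithms are the core of computer science and the foundation of almost every advanced course. The eternal theme of algorithm courses is how to turn a real-world problem into an algorithmic one through mathematical abstraction, and then pick suitable data structures to solve it within time and memory limits. If you are fed up with teachers reading off the slides, I highly recommend Berkeley's UCB CS61B: Data Structures and Algorithms and Princeton's Coursera: Algorithms I & II. Both courses explain deep ideas in an accessible way and come with rich, interesting programming labs that put the theory into practice.

Both courses use Java. If you want a C/C++ version, see Stanford's data structures and basic algorithms course Stanford CS106B/X: Programming Abstractions. Students who prefer Python can take MIT's introductory algorithms course MIT 6.006: Introduction to Algorithms.

Those interested in more advanced algorithms and NP problems can take Berkeley's algorithm design and analysis course UCB CS170: Efficient Algorithms and Intractable Problems or MIT's advanced algorithms course MIT 6.046: Design and Analysis of Algorithms.

Software Engineering

Introductory Course

There is a fundamental difference between code that merely "runs" and high-quality, industrial-grade code. That is why I strongly recommend that students in their first years take MIT 6.031: Software Construction. Based on Java, it uses rich, detailed readings and carefully designed programming exercises to teach you how to write high-quality code that resists bugs, is easy to understand, and is easy to maintain and change. The lessons range from high-level data structure design down to how to write comments. Following the details and experience distilled by those who came before will benefit your entire programming career.

Professional Course

Of course, if you want a systematic software engineering course, I recommend Berkeley's UCB CS169: Software Engineering. Be aware, though, that it differs from the software engineering courses at most schools (including yours). It does not follow the traditional design and document model, with its emphasis on class diagrams, flowcharts, and design documents. Instead, it adopts the small-team, rapid-iteration Agile Development model that has become popular in recent years, together with the Software as a Service model built on cloud platforms.

Computer Architecture

Introductory Course

Ever since I was a kid, I'd heard that the world of computers is made of 0s and 1s. I didn't understand it, but I was deeply impressed. If you share that curiosity, consider spending one to two months on Coursera: Nand2Tetris, a computer course with no prerequisites. Small as it is, this course has everything: starting from 0s and 1s, it has you build a computer with your own hands and run a Tetris game on it. In a single course it covers compilers, virtual machines, assembly, architecture, digital circuits, logic gates, and more, from top to bottom and from software to hardware. Its difficulty is carefully designed: it omits many complex details of modern computers and distills the most essential core, aiming to make everything understandable to everyone. Building a bird's-eye view of the whole computer system early in your studies is extremely valuable.

Professional Course

Of course, if you want to dig into the complex details of modern computer architecture, you need a university-level course such as UCB CS61C: Great Ideas in Computer Architecture. As the birthplace of the RISC-V architecture, UC Berkeley is second to none in computer architecture. The course is very hands-on. In the projects you will hand-write assembly to build a neural network and construct a CPU from scratch. All of this will give you a much deeper understanding of computer architecture than rote memorization of "fetch, decode, execute, memory access, write back".

Introduction to Computer Systems

Computer systems are a vast and profound topic. Before diving into any specific area, you should gain a high-level conceptual understanding of each field and learn some general design principles. That way your later, deeper study keeps reinforcing the most central, even philosophical, ideas, instead of leaving you bogged down in complex internal details and assorted tricks. In my view, the key to learning systems is to grasp these core ideas so that you can design and implement systems of your own.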

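MIT6.033: System Engineering is MIT's introductory systems course, covering operating systems, networking, distributed systems, and system security. Beyond the technical content, it also teaches writing and presentation skills, so that you learn how to design your own system and how to present and analyze it for others. The accompanying textbook, Principles of Computer System Design: An Introduction, is also very well written and worth reading.

CMU 15-213: Introduction to Computer System is CMU's introductory systems course. It covers computer architecture, operating systems, linking, parallelism, networking, and more, with both breadth and depth. The accompanying textbook, Computer Systems: A Programmer's Perspective, is also of exceptionally high quality, and I strongly recommend reading it.

Operating Systems

Nothing deepens your understanding of operating systems like writing a kernel yourself.

An operating system virtualizes diverse and complex hardware into a clean, elegant set of abstractions and provides rich functionality to all application software. Understanding the design principles and internals of operating systems is highly valuable for any programmer who is not content to just glue libraries together. Out of my love for operating systems, I have taken many OS courses from universities in China and abroad. Each has its own focus, strengths, and weaknesses, so pick according to your interests.

MIT 6.S081: Operating System Engineering, from MIT's famous PDOS lab. Across 11 projects, you add all kinds of feature modules to xv6, an elegantly implemented Unix-like operating system. This course also made me deeply realize that systems are not built by reading slides aloud. They are built up bit by bit from tens of thousands of lines of code.

UCB CS162: Operating System, Berkeley's operating systems course, uses the same project as Stanford: Pintos, an operating system built for teaching. I was a teaching assistant for the honors section of Peking University's operating systems course in the spring semesters of 2022 and 2023, where I introduced and improved this project. All course resources will be open-sourced; see the course website for details.

NJU: Operating System Design and Implementation, the operating systems course taught by Yanyan Jiang at Nanjing University. With his distinctive systems perspective and abundant code examples, Prof. Jiang explains many OS concepts in a way that is both deep and accessible. All course content is in Chinese, which makes it very convenient for Chinese-speaking learners.

HIT OS: Operating System, a Chinese-language operating systems course taught by Zhijun Li at Harbin Institute of Technology. Prof. Li's course is based on the Linux 0.11 source code, places strong emphasis on hands-on coding, and explains the ins and outs of operating systems from a student's point of view.

Parallel and Distributed Systems

The phrase you have probably heard most often at CS talks in recent years is "Moore's law is coming to an end". That is true: as single-core performance reaches its limits, multi-core and even many-core architectures are booming. Changes in hardware force the programming models above them to adapt, and writing parallel programs has become almost an essential skill for programmers who want to make full use of the hardware. At the same time, the rise of deep learning has pushed demands for computing power and storage to unprecedented levels, making the deployment and optimization of large-scale clusters a hot technical topic.

Parallel Computing

CMU 15-418/Stanford CS149: Parallel Computing

Distributed Systems

MIT 6.824: Distributed System

System Security

I don't know whether you chose computer science because of a teenage dream of becoming a hacker, but the reality is that the road to becoming a hacker is long and hard.

Theory Courses

UCB CS161: Computer Security is Berkeley's security course, covering stack attacks, cryptography, web security, network security, and more.

ASU CSE365: Introduction to Cybersecurity is Arizona State University's web security course, mainly covering injection, assembly, and cryptography.

ASU CSE466: Computer Systems Security is Arizona State University's systems security course, with comprehensive coverage. It has a high barrier to entry and requires solid familiarity with Linux, C, and Python.

SU SEED Labs is Syracuse University's cybersecurity course. With $1.3 million in funding from the NSF, the project has developed hands-on lab exercises (called SEED Labs) for security education. The course balances theory and hands-on practice. It offers detailed open-source lecture notes, video tutorials, a textbook (printed in multiple languages), and ready-to-use attack-and-defense environments based on virtual machines and Docker. It is currently used by 1,050 institutions worldwide and covers a wide range of topics in computer and information security, including software security, network security, web security, operating system security, and mobile app security.

Practice Courses

After mastering the theory, you still need to develop and hone these "hacker skills" through practice. CTF (Capture The Flag) competitions are popular security contests whose challenges test how well you understand and apply knowledge across all areas of computing. Peking University also successfully held its 0th and 1st editions this year, and I encourage everyone to take part in future ones and improve through practice. Below are some resources I use for learning (and slacking off):

Computer Networks

Nothing deepens your understanding of computer networks like writing a TCP/IP stack yourself.

The renowned Stanford CS144: Computer Network has you implement an entire TCP/IP stack across 8 projects.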
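
A TCP/IP implementation has to get many low-level details exactly right. One small example is the 16-bit ones'-complement Internet checksum that IPv4, TCP, and UDP use, described in RFC 1071. The minimal Python sketch below was written for illustration only. It is not code from CS144, whose labs are written in C++, and the function name `internet_checksum` is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071 (IPv4/TCP/UDP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    # Fold any carries above bit 15 back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF  # ones' complement of the folded sum


if __name__ == "__main__":
    sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])  # example bytes from RFC 1071
    checksum = internet_checksum(sample)
    print(hex(checksum))  # 0x220d
    # A receiver re-sums the data together with the checksum; 0 means no error was detected.
    print(internet_checksum(sample + checksum.to_bytes(2, "big")))  # 0
```

The end-around carry makes the sum independent of byte order within 16-bit words. That property is one reason the checksum is cheap to update incrementally when a router rewrites a header field.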

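If you just want a theoretical understanding of computer networks, I recommend Computer Networking: A Top-Down Approach, the learning resources that accompany the famous textbook of the same name.

Database Systems

Nothing deepens your understanding of database systems like writing a relational database yourself.

CMU's famous database course CMU 15-445: Introduction to Database System has you add all kinds of features to bustub, a relational database built for teaching, across 4 projects. The lab grading framework is also open source and free, which makes the course ideal for self-study. The labs also use many new features of C++11, so they are a good opportunity to sharpen your C++ skills.

As the birthplace of the well-known open-source database Postgres, Berkeley does not lag behind. UCB CS186: Introduction to Database System has you implement, in Java, a relational database that supports concurrent SQL queries, B+ tree indexes, and crash recovery.

Compilers

Nothing deepens your understanding of compilers like writing a compiler yourself.

Stanford CS143: Compilers walks you through writing a compiler by hand.

Web Development

Front-end and back-end development is rarely emphasized in CS curricula, but the skill has many benefits. You could build your own personal homepage, for example, or create an impressive showcase website for your course projects.

Two-Week Crash Course

MIT web development course

Systematic Study

Stanford CS142: Web Applications

Computer Graphics

Data Science

Data science is closely related to machine learning and deep learning, but it tends to be more practice-oriented. Berkeley's UCB Data100: Principles and Techniques of Data Science uses plenty of programming exercises to help you master a variety of data analysis tools and algorithms. It guides you through extracting the results you want from massive datasets and making predictions about future data or user behavior. It is only a foundational course, though. If you want to learn industrial-grade data mining and analysis techniques, try Stanford's big data mining course CS246: Mining Massive Data Sets.

Artificial Intelligence

Artificial intelligence has arguably been the hottest field in computing over the past decade. You may not be content to just follow media reports on the latest AI progress and want to truly understand it. If so, I highly recommend the AI course from Harvard's celebrated CS50 series, Harvard CS50: Introduction to AI with Python. The course is short but substantial, covers the major branches of traditional AI, and comes with rich, fun Python programming exercises to consolidate your understanding of AI algorithms. Its one drawback is that, being aimed at online self-learners, its content is fairly condensed and does not go deep into the mathematical theory. For systematic, in-depth study you will still need an undergraduate-level course, such as Berkeley's UCB CS188: Introduction to Artificial Intelligence. Its projects recreate the classic game Pac-Man and have you apply AI algorithms to play it, which is a lot of fun.

Machine Learning

The most important development in machine learning in recent years is deep learning, the branch based on neural networks, but many algorithms based on statistical learning are still widely used in data analysis. If you have never studied machine learning before and don't want to get bogged down in difficult mathematical proofs right away, start with Andrew Ng's Coursera: Machine Learning. Nearly everyone in the field knows this course. With his deep theoretical foundation and excellent teaching, Andrew Ng makes many difficult algorithms easy to understand and very practical. Its assignments are also of excellent quality and will help you get started quickly.

However, this course only gives you a high-level overview of machine learning. You may want to truly understand the mathematics behind those "magical" algorithms, or even do research in the field. For that you will need a more "mathematical" course, such as Stanford CS229: Machine Learning or UCB CS189: Introduction to Machine Learning.

Deep Learning

The excitement around AlphaGo a few years ago brought deep learning into the public eye, and many universities even set up dedicated majors for it. Many other areas of computing also use deep learning techniques in their research, so whatever you work on, you will almost certainly need some knowledge of neural networks and deep learning. For a quick start, I again recommend Andrew Ng's Coursera: Deep Learning. Its quality speaks for itself, and it is one of the rare courses on Coursera with a perfect rating. If you find English-language courses difficult, I recommend Hung-yi Lee's National Taiwan University: Machine Learning. Despite its machine learning title, it covers nearly every direction in deep learning and is very comprehensive, so it is well suited for getting a broad overview of the field. The professor is also very funny, and his lectures are full of memorable one-liners.

Deep learning is evolving very rapidly and already has many research branches. If you want to go further, you can study the representative courses listed below as needed:

Computer Vision

UMich EECS 498-007 / 598-005: Deep Learning for Computer Vision

Stanford CS231n: CNN for Visual Recognition

Natural Language Processing

Stanford CS224n: Natural Language Processing

Graph Neural Networks

Stanford CS224w: Machine Learning with Graphs

Reinforcement Learning

UCB CS285: Deep Reinforcement Learning

Customize Your Own Course Map

Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.

The course plan above inevitably reflects strong personal preferences and may not suit everyone. It is meant more to get the ball rolling. If you want to choose directions and content that interest you, you can refer to the resources I list below.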

A Reference Guide for CS Learning

The field of computer science is vast and complex, with a seemingly endless sea of knowledge. Each specialized area can lead to limitless learning if pursued deeply. Therefore, a clear and definite study plan is very important. I've taken some detours in my years of self-study and finally distilled the following content for your reference.

Before you start learning, I highly recommend a popular science video series for beginners: Crash Course: Computer Science. In just 8 hours, it vividly and comprehensively covers various aspects of computer science: the history of computers, how computers operate, the important modules that make up a computer, key ideas in computer science, and so on. As its slogan says, Computers are not magic! I hope that after watching this series, everyone will have a big-picture sense of computer science and approach the more detailed and in-depth material below with genuine interest.

Essential Tools

As the saying goes: sharpening your axe will not delay your job of chopping wood. If you are a pure beginner in the world of computers, learning some tools will make you more efficient.

Learn to ask questions: You might be surprised that asking questions counts as an essential skill at all, let alone that it comes first on the list. I think that in the open-source community, knowing how to ask questions is a very important ability, and it involves two aspects. First, it indirectly cultivates your ability to solve problems independently, because the cycle of forming a question, describing and posting it, getting answers from others, and then understanding those answers is quite long. If you expect others to walk you through every trivial issue over remote desktop, the world of computers may not be for you. Second, if you have tried and still can't solve a problem, you can seek help from the open-source community. At that point, explaining your situation and goal concisely enough that others grasp them immediately becomes particularly important. I recommend reading the article How To Ask Questions The Smart Way. It not only raises the odds and speed of getting your problem solved but also keeps the people who volunteer answers in the open-source community in a good mood.

Learn to be a hacker: MIT-Missing-Semester covers many useful tools for a hacker and provides detailed usage instructions. I strongly recommend beginners to study this course. However, one thing to note is that the course occasionally refers to terms related to the development process. Therefore, it is recommended to study it at least after completing an introductory computer science course.

GFW: For well-known reasons, sites like Google and GitHub are not accessible in mainland China. However, in many cases, Google and StackOverflow can solve 99% of the problems encountered during development. Therefore, learning to use a VPN is almost an essential skill for a mainland CSer. (Considering legal issues, the methods provided in this book are only applicable to users with a Peking University email address).

Command Line: Proficiency with the command line is often overlooked or considered difficult to master, but in reality it greatly enhances your flexibility and productivity as an engineer. The Art of Command Line is a classic tutorial. It started as a question on Quora, but with contributions from many experts it has become a top GitHub project with over 100,000 stars and has been translated into more than a dozen languages. The tutorial is short, and I highly recommend reading it repeatedly and internalizing it through practice. Shell scripting is also a skill you should not neglect; you can refer to this tutorial.

IDE (Integrated Development Environment): Simply put, it's where you write your code. The importance of an IDE for a programmer goes without saying, but many IDEs are designed for large-scale projects, so they are bulky and overloaded with features. Nowadays, lightweight text editors with rich plugin ecosystems can cover most everyday lightweight programming needs. My personal favorites are VS Code and Sublime. The former's plugins are very easy to configure, while the latter's are a bit more complex but it looks great. For large projects, I still use somewhat heavier IDEs, such as Pycharm (Python), IDEA (Java), etc. (Disclaimer: every IDE is the best IDE in the world.)

Vim: A command-line editor. Vim has a somewhat steep learning curve, but mastering it, I think, is very necessary because it will greatly improve your development efficiency. Most modern IDEs also support Vim plugins, allowing you to retain the coolness of a geek while enjoying a modern development environment.

Emacs: A classic editor as renowned as Vim, with equally high development efficiency and even greater extensibility. It can be configured as a lightweight editor, extended into a personalized IDE, or even bent to far more exotic uses.

Git: A version control tool for your projects. Git's learning curve may be even steeper, but Git, created by Linus Torvalds, the father of Linux, is definitely one of the must-have tools for every CS student.

GitHub: A code hosting platform based on Git. The world's largest open-source community and a gathering place for CS experts.

GNU Make: An engineering build tool. Proficiency in GNU Make will help you develop a habit of modularizing your code and familiarize you with the compilation and linking processes of large projects.

CMake: A more powerful build tool than GNU Make, recommended for study after mastering GNU Make.

LaTeX: A typesetting tool for papers that instantly makes them look more sophisticated.

Docker: A lighter-weight software packaging and deployment tool compared to virtual machines.

Practical Toolkit: In addition to the tools mentioned above that are frequently used in development, I have also collected many practical and interesting free tools, such as download tools, design tools, learning websites, etc.

Thesis: Tutorial for writing graduation thesis in Word.

Recommended Books

I believe a good textbook should be reader-centered rather than a showy pile of theory. Telling readers "what it is" certainly matters, but it is better still when the author weaves decades of experience in the field into the book and patiently explains "why" and "how" to proceed.

Link here

Environment Setup

Development as you imagine it: coding frantically in an IDE for hours.

Development in reality: spending days setting up the environment before writing a single line of code.

PC Environment Setup

If you are a Mac user, you're in luck, as this guide will walk you through setting up the entire development environment. If you are a Windows user, thanks to the efforts of the open-source community, you can enjoy a similar experience with Scoop.

Additionally, you can refer to an environment setup guide inspired by 6.NULL MIT-Missing-Semester, focusing on terminal beautification. It also includes common software sources (such as GitHub, Anaconda, PyPI) for acceleration and replacement, as well as some IDE configuration and activation tutorials.

Server-Side Environment Setup

Server-side operation and maintenance require basic use of Linux (or other Unix-like systems) and fundamental concepts like processes, devices, networks, etc. Beginners can refer to the Linux 101 online notes compiled by the Linux User Association of the University of Science and Technology of China. If you want to delve deeper into system operation and maintenance, you can refer to the Aspects of System Administration course.

Additionally, if you need to learn a specific concept or tool, I recommend a great GitHub project, DevOps-Guide, which covers a lot of foundational knowledge and tutorials in the administration field, such as Docker, Kubernetes, Linux, CI-CD, GitHub Actions, and more.

Course Map

As mentioned at the beginning of this chapter, this course map is merely a reference guide for course planning, from my perspective as an undergraduate nearing graduation. I am acutely aware that I neither have the right nor the capability to preach to others about “how one should learn”. Therefore, if you find any issues with the course categorization and selection below, I fully accept and deeply apologize for them. You can tailor your own course map in the next section Customize Your Own Course Map.

Apart from courses labeled as basic or introductory, there is no explicit sequence in the following categories. As long as you meet the prerequisites for a course, you are free to choose any course according to your needs and interests.

Mathematical Foundations

Calculus and Linear Algebra

As a freshman, mastering calculus and linear algebra is at least as important as learning to code. Countless predecessors have made this point, but I feel compelled to emphasize it again: mastering calculus and linear algebra is really important! You might complain that you forget these subjects right after the exam, but I believe that means you have not grasped their essence deeply enough. If the content taught in class feels obscure, consider the course notes of MIT's Calculus Course and 18.06: Linear Algebra. For me, they greatly deepened my understanding of the essence of calculus and linear algebra. I also highly recommend the math YouTuber 3Blue1Brown, whose channel explains the core of mathematical ideas with vivid animations. The videos have both depth and breadth and are of very high quality.

Introduction to Information Theory

For computer science students, gaining some foundational knowledge in information theory early on is beneficial. However, most information theory courses are targeted towards senior or even graduate students, making them quite inaccessible to beginners. MIT’s 6.050J: Information theory and Entropy is tailored for freshmen, with almost no prerequisites, covering coding, compression, communication, information entropy, and more, which is very interesting.
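
Information entropy can be made concrete in a few lines. For a source whose symbols appear with probabilities p, the Shannon entropy H = sum of p * log2(1/p) measures the average number of bits of information per symbol. The minimal sketch below estimates each p from symbol frequencies in a string. That estimation choice is an assumption made for illustration, `shannon_entropy` is a hypothetical helper, and none of this is taken from 6.050J.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Estimate entropy in bits per symbol, using each symbol's relative frequency as its probability."""
    if not message:
        return 0.0
    n = len(message)
    counts = Counter(message)
    # H = sum over symbols of p * log2(1/p), which equals -sum of p * log2(p)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("aaaa"))      # 0.0: one symbol, no uncertainty
    print(shannon_entropy("abab"))      # 1.0: two equally likely symbols
    print(shannon_entropy("abcdefgh"))  # 3.0: eight equally likely symbols
```

This ties directly to compression. For symbols drawn independently from a fixed distribution, no lossless code can use fewer bits per symbol on average than the entropy. That bound is Shannon's source coding theorem.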

Advanced Mathematics

Discrete Mathematics and Probability Theory

Set theory, graph theory, and probability theory are essential tools for algorithm derivation and proof, as well as foundations for more advanced mathematical courses. However, the teaching of these subjects often falls into a rut of being overly theoretical and formalistic, turning classes into mere recitations of theorems and conclusions without helping students grasp the essence of these theories. If theory teaching can be interspersed with examples of algorithm application, students can expand their algorithm knowledge while appreciating the power and charm of theory.

UCB CS70: Discrete Math and Probability Theory and UCB CS126: Probability Theory are UC Berkeley’s probability courses. The former covers the basics of discrete mathematics and probability theory, while the latter delves into stochastic processes and more advanced theoretical content. Both emphasize the integration of theory and practice and feature abundant examples of algorithm application, with the latter including numerous Python programming assignments to apply probability theory to real-world problems.

Numerical Analysis

For computer science students, developing computational thinking is crucial. Modeling and discretizing real-world problems, and simulating and analyzing them on computers, are vital skills. Recently, the Julia programming language, created at MIT, has become popular in numerical computation thanks to its C-like speed and friendly, Python-like syntax. Many MIT mathematics courses have started using Julia as a teaching tool, presenting complex mathematical theories through clear and intuitive code.

ComputationalThinking is an introductory course in computational thinking offered by MIT. All course materials are open source and accessible on the course website. Using the Julia programming language, the course covers image processing, social science and data science, and climatology modeling, helping students understand algorithms, mathematical modeling, data analysis, interactive design, and graph presentation. The course content, though not difficult, profoundly impressed me with the idea that the allure of science lies not in obscure theories or jargon but in presenting complex concepts through vivid examples and concise, deep language.

After completing this taster course, if you're still eager for more, consider MIT's 18.330: Introduction to Numerical Analysis. This course also uses Julia for its programming assignments but is more challenging and in-depth. It covers floating-point encoding, root finding, linear systems, differential equations, and more. Its main goal is to use discrete computer representations to estimate and approximate continuous mathematical concepts. The professor has also written an accompanying open-source textbook, Fundamentals of Numerical Computation, which includes abundant Julia code examples and rigorous derivations.
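
Root finding is a good example of approximating a continuous problem with discrete steps. The course works in Julia, but the sketch below uses Python and implements plain bisection, one of the simplest root-finding methods. It assumes that f is continuous and changes sign on the starting interval. The name `bisect_root` is hypothetical. Production code would normally reach for a library routine such as SciPy's `scipy.optimize.bisect`.

```python
def bisect_root(f, lo: float, hi: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate a root of a continuous function f on [lo, hi] by bisection.

    Assumes f(lo) and f(hi) have opposite signs, so a root lies in between.
    Each iteration halves the bracketing interval.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if (f_lo < 0) == (f_hi < 0):
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep whichever half still brackets the sign change.
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
    print(root)  # an approximation of sqrt(2)
```

Because the interval halves every step, the error bound shrinks by a factor of 2 per iteration. Bisection is reliable but slow compared with methods such as Newton's, which is exactly the kind of trade-off a numerical analysis course examines.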

If you’re still not satisfied, MIT’s graduate course in numerical analysis, 18.335: Introduction to Numerical Methods, is also available for reference.

Differential Equations

Wouldn't it be cool if the motion and development of everything in the world could be described and depicted with equations? Although almost no school's CS curriculum requires a course on differential equations, I believe mastering them gives you a new perspective from which to view the world.

Since differential equations often involve complex variable functions, you can refer to MIT18.04: Complex Variables Functions course notes to fill in prerequisite knowledge.

MIT18.03: Differential Equations mainly covers the solution of ordinary differential equations, and on this basis, MIT18.152: Partial Differential Equations dives into the modeling and solving of partial differential equations. With the powerful tool of differential equations, you will gain enhanced capabilities in modeling real-world problems and intuitively grasping the essence among various noisy variables.
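
A differential equation becomes computable once it is discretized. As a minimal illustration, the sketch applies the forward Euler method to the decay model dy/dt = -k*y. It does not come from 18.03, which focuses on analytic solutions. The exact solution y(t) = y0 * e^(-k*t) lets us measure the error directly. The function name `forward_euler` and the choice k = 1 are assumptions made for the demo.

```python
import math


def forward_euler(f, y0: float, t0: float, t_end: float, steps: int) -> float:
    """Approximate y(t_end) for dy/dt = f(t, y), y(t0) = y0, using `steps` forward-Euler steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)  # follow the slope at the current point for one step of size h
        t = t + h
    return y


if __name__ == "__main__":
    k = 1.0  # decay rate (an arbitrary choice for the demo)
    exact = math.exp(-k * 1.0)  # exact y(1) when y(0) = 1
    for steps in (10, 100, 1000):
        approx = forward_euler(lambda t, y: -k * y, 1.0, 0.0, 1.0, steps)
        print(f"{steps:5d} steps: error = {abs(approx - exact):.2e}")
```

Running it shows the error dropping by roughly a factor of 10 each time the number of steps grows tenfold. This is the first-order behavior expected of forward Euler, and it explains why more accurate schemes such as Runge-Kutta are usually preferred in practice.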

Advanced Mathematical Topics

As a computer science student, I often hear arguments about the uselessness of mathematics. While I neither agree nor have the authority to oppose such views, if everything is forcibly categorized as useful or useless, it indeed becomes quite dull. Therefore, the following advanced mathematics courses, aimed at senior and even graduate students, are available for those interested.

Convex Optimization

Stanford EE364A: Convex Optimization

Information Theory

MIT6.441: Information Theory

Applied Statistics

MIT18.650: Statistics for Applications

Elementary Number Theory

MIT18.781: Theory of Numbers

Cryptography

Stanford CS255: Cryptography

Programming Fundamentals

Languages are tools, and you choose the right tool for the right job. Since there's no universally perfect tool, there's no universally perfect language.

Shell

Python

C++

Rust

OCaml

Electronics Fundamentals

Basics of Circuits

For computer science students, understanding basic circuit knowledge and experiencing the entire pipeline from sensor data collection to data analysis and algorithmic prediction can be very helpful for future learning and for developing computational thinking. EE16A&B: Designing Information Devices and Systems I&II are UC Berkeley's introductory freshman courses for electrical engineering students. EE16A focuses on collecting and analyzing data from the real world through circuits, while EE16B focuses on analyzing the collected data to make predictions.

Signals and Systems

Signals and Systems is a course I find very worthwhile. Initially, I studied it out of curiosity about Fourier Transform, but after completing it, I was amazed at how Fourier Transform provided a new perspective to view the world, just like differential equations, immersing you in the elegance and magic of precisely depicting the world with mathematics.

MIT 6.003: Signals and Systems provides all lecture recordings, written assignments, and their solutions. You can also check out this course's ancient version.

UCB EE120: Signals and Systems has very well-written notes on the Fourier transform and provides 6 interesting Python programming assignments that let you apply the theories and algorithms of signals and systems in practice.
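
To make the Fourier transform concrete, the minimal Python sketch below implements the discrete Fourier transform straight from its definition, X[k] = sum over n of x[n] * e^(-2*pi*i*k*n/N). This naive version costs O(N^2) operations. Real code would use a fast Fourier transform such as `numpy.fft.fft`, and the function name `dft` here is hypothetical, not taken from the EE120 assignments.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


if __name__ == "__main__":
    N = 8
    # A cosine that completes exactly 2 cycles over N samples.
    signal = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
    magnitudes = [abs(X) for X in dft(signal)]
    # Expect peaks of height N/2 = 4 at bins 2 and N - 2 = 6, and roughly 0 elsewhere.
    print([round(m, 6) for m in magnitudes])
```

The two peaks are the "new perspective" in miniature. A signal that looks like eight scattered samples in time turns out to be a single frequency component. The mirrored peak at bin N - 2 appears because the input is real-valued.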

Data Structures and Algorithms

Algorithms are the core of computer science and the foundation for almost all advanced courses. The eternal theme of algorithm courses is how to abstract real-world problems into algorithmic problems mathematically and solve them under time and memory constraints using appropriate data structures. If you are fed up with your teacher reading from the textbook, I highly recommend UC Berkeley's UCB CS61B: Data Structures and Algorithms and Princeton's Coursera: Algorithms I & II. Both courses explain deep material in an accessible way and come with rich, interesting programming labs that put the theory into practice.

Both of these courses are based on Java. If you prefer C/C++, you can refer to Stanford's data structure and basic algorithm course Stanford CS106B/X: Programming Abstractions. For those who prefer Python, you can learn MIT's introductory algorithm course MIT 6.006: Introduction to Algorithms.

For those interested in more advanced algorithms and NP problems, consider UC Berkeley's course on algorithm design and analysis UCB CS170: Efficient Algorithms and Intractable Problems or MIT's advanced algorithms course MIT 6.046: Design and Analysis of Algorithms.

Software Engineering

Introductory Course

There is a fundamental difference between code that merely "works" and high-quality, industrial-grade code. Therefore, I highly recommend that students in their first years take MIT 6.031: Software Construction. Based on Java, this course uses rich, detailed reading materials and well-designed programming exercises to teach how to write high-quality code that is bug-resistant, clear, and easy to maintain and modify. The lessons range from high-level data structure design to small details like how to write comments. Following the details and experience distilled by predecessors will greatly benefit your future programming career.

Professional Course

If you want a systematic software engineering course, I recommend UC Berkeley’s UCB CS169: Software Engineering. Unlike most software engineering courses, however, it does not follow the traditional design-and-document model, with its emphasis on class diagrams, flowcharts, and design documents. Instead, it adopts Agile development, which has become popular in recent years and features rapid iteration in small teams. It also covers the Software as a Service (SaaS) model built on cloud platforms.

Computer Architecture

Introductory Course

Since childhood, I had heard that the world of computers is made of 0s and 1s. I didn't understand it, but it left a deep impression on me. If you share this curiosity, consider spending one to two months on the prerequisite-free course Coursera: Nand2Tetris. Starting from 0s and 1s, this comprehensive course has you build a computer by hand and run Tetris on it. It spans the whole stack from software down to hardware, covering compilation, virtual machines, assembly, architecture, digital circuits, logic gates, and more. Its difficulty is carefully calibrated: it omits many complex details of modern computers and distills the core essentials so that anyone can follow it. For lower-year students, building a bird's-eye view of the whole computer system is very beneficial.
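The following minimal Python sketch gives a feel for the "everything from NAND" idea at the heart of Nand2Tetris. The course itself has you write chips in its own hardware description language, so this is only an analogy, and all function names are hypothetical. Every gate below is derived from a single `nand` primitive. The gates are then composed into a half adder, a full adder, and a ripple-carry adder for unsigned integers.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 exactly when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b) == NAND(NOT a, NOT b)
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # Classic four-NAND construction of XOR
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; return (sum, carry)."""
    return xor(a, b), and_(a, b)


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits by chaining two half adders; return (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, carry_in)
    return s, or_(c1, c2)


def add_bits(x: int, y: int, width: int = 4) -> int:
    """Ripple-carry addition of two unsigned ints using only the gates above.

    The final carry is discarded, so the result wraps modulo 2**width.
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i  # shifts and masks only move bits; the gates do the adding
    return result
```

For example, `add_bits(5, 6)` returns 11. `add_bits(9, 9)` wraps around to 2 because only 4 bits are kept.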

Professional Course

If you want to dig into the complex details of modern computer architecture, you still need a university-level course such as UCB CS61C: Great Ideas in Computer Architecture. The course emphasizes practice: in its projects you will hand-write assembly to build a neural network, build a CPU from scratch, and more. All of this will give you a much deeper understanding of computer architecture than the rote formula of "fetch, decode, execute, memory access, write back."
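A toy interpreter can make the "fetch, decode, execute, memory access, write back" sequence concrete. This minimal Python sketch rests on loudly stated assumptions: the four-instruction ISA, its tuple encoding, and the function `run` are invented for illustration and are not the RISC-V subset used in CS61C. Each loop iteration walks one instruction through the five steps in order. A real pipelined CPU instead overlaps these stages across different instructions in hardware.

```python
# Hypothetical toy ISA (invented for illustration, not RISC-V):
#   ("add",  rd,  rs1, rs2)  regs[rd] = regs[rs1] + regs[rs2]
#   ("addi", rd,  rs1, imm)  regs[rd] = regs[rs1] + imm
#   ("lw",   rd,  rs1, imm)  regs[rd] = memory[regs[rs1] + imm]
#   ("sw",   rs2, rs1, imm)  memory[regs[rs1] + imm] = regs[rs2]
#   ("halt",)                stop and return the register file


def run(program: list[tuple], memory: list[int], num_regs: int = 8) -> list[int]:
    regs = [0] * num_regs
    pc = 0
    while True:
        # 1. Fetch: read the instruction the program counter points to.
        inst = program[pc]
        # 2. Decode: split it into an opcode and operand fields.
        op, *args = inst
        if op == "halt":
            return regs
        # 3. Execute: the ALU computes either a result or an effective address.
        if op == "add":
            _, rs1, rs2 = args
            alu = regs[rs1] + regs[rs2]
        elif op in ("addi", "lw", "sw"):
            _, rs1, imm = args
            alu = regs[rs1] + imm
        else:
            raise ValueError(f"unknown opcode: {op!r}")
        # 4. Memory access: only loads and stores touch memory.
        if op == "lw":
            value = memory[alu]
        elif op == "sw":
            memory[alu] = regs[args[0]]
        else:
            value = alu
        # 5. Write back: every instruction except a store writes a register.
        if op != "sw":
            regs[args[0]] = value
        regs[0] = 0  # register 0 is hard-wired to zero, loosely mimicking RISC-V's x0
        pc += 1


# Usage: memory[2] = memory[0] + memory[1], using register 0 as a zero base.
example_program = [
    ("lw", 1, 0, 0),   # r1 = memory[0]
    ("lw", 2, 0, 1),   # r2 = memory[1]
    ("add", 3, 1, 2),  # r3 = r1 + r2
    ("sw", 3, 0, 2),   # memory[2] = r3
    ("halt",),
]
```

Calling `run(example_program, mem)` with `mem = [20, 22, 0]` leaves 42 in `mem[2]` and in register 3.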

Introduction to Computer Systems

Computer systems are a vast and profound topic. Before diving into any specific area, get a high-level conceptual understanding of each field and of some general design principles. This will reinforce the core, even philosophical, concepts throughout your later in-depth study, so that you are not shackled by complex internal details and assorted tricks. In my opinion, the key to learning systems is to grasp these core concepts and use them to design and implement your own systems.

MIT 6.033: System Engineering is MIT's introductory systems course, covering operating systems, networks, distributed systems, system security, and more. Beyond the theory, it also teaches writing and communication skills, helping you learn how to design, present, and analyze your own systems. The accompanying textbook, Principles of Computer System Design: An Introduction, is also very well written and worth reading.

CMU 15-213: Introduction to Computer System is CMU’s introductory systems course, covering architecture, operating systems, linking, parallelism, networks, etc., with both breadth and depth. The accompanying textbook Computer Systems: A Programmer's Perspective is also of very high quality and strongly recommended for reading.

Operating Systems

There’s nothing like writing your own kernel to deepen your understanding of operating systems.

Operating systems provide a set of elegant abstractions that virtualize complex underlying hardware and give all application software rich functional support. Understanding the design principles and internal mechanisms of operating systems greatly benefits any programmer who wants to be more than just a coder. Out of my love for operating systems, I have taken many OS courses from different universities, each with its own focus and merits. You can choose based on your interests.

MIT 6.S081: Operating System Engineering, offered by MIT's famous PDOS lab, features 11 projects in which you modify xv6, an elegantly implemented Unix-like operating system. This course made me realize that doing systems is not about reading slides; it's about writing tens of thousands of lines of code.

UCB CS162: Operating System, UC Berkeley's operating systems course, uses the same project as Stanford: Pintos, an educational operating system. As a teaching assistant for Peking University's Operating Systems course in the spring semesters of 2022 and 2023, I introduced this project into the course and improved it. All course resources are open source, with details on the course website.

NJU: Operating System Design and Implementation, offered by Professor Yanyan Jiang at Nanjing University, provides an in-depth and accessible explanation of various operating system concepts, combining a unique system perspective with rich code examples. All course content is in Chinese, making it very convenient for students.

HIT OS: Operating System, taught by Professor Zhijun Li at Harbin Institute of Technology, is a Chinese-language operating systems course. Built around the Linux 0.11 source code, it places great emphasis on hands-on coding and explains the intricacies of operating systems from the student's perspective.

Parallel and Distributed Systems

In recent years, the phrase heard most often in CS lectures is "Moore's Law is coming to an end." As single-core performance reaches its limits, multi-core and many-core architectures are becoming increasingly important. These hardware changes force corresponding changes in how the software above them is written. Writing parallel programs has become an almost mandatory skill for programmers who want to make full use of the hardware. Meanwhile, the rise of deep learning has created unprecedented demands for computing power and storage, making the deployment and optimization of large-scale clusters a hot topic.

Parallel Computing

CMU 15-418/Stanford CS149: Parallel Computing

Distributed Systems

MIT 6.824: Distributed System

System Security

You may have chosen computer science because of a youthful dream of becoming a hacker, but the reality is that becoming a hacker is a long and difficult journey.

Theoretical Courses

UCB CS161: Computer Security at UC Berkeley covers stack-based attacks, cryptography, website security, network security, and more.

ASU CSE365: Introduction to Cybersecurity at Arizona State University focuses mainly on injection attacks, assembly, and cryptography.

ASU CSE466: Computer Systems Security at Arizona State University covers a wide range of topics in system security. It has a high barrier to entry, requiring familiarity with Linux, C, and Python.

SU SEED Labs at Syracuse University, supported by a $1.3 million grant from the NSF, has developed hands-on experimental exercises (called SEED Labs) for cybersecurity education. The course emphasizes both theoretical teaching and practical exercises, including detailed open-source lectures, video tutorials, textbooks (printed in multiple languages), and a ready-to-use virtual machine and Docker-based attack-defense environment. This project is currently used by 1,050 institutions worldwide and covers a wide range of topics in computer and information security, including software security, network security, web security, operating system security, and mobile app security.

Practical Courses

After mastering the theory, you need to cultivate and hone these "hacker skills" in practice. CTF competitions are a popular way to comprehensively test your understanding and application of computer knowledge across many fields. Peking University has also successfully held its 0th and 1st editions, and I encourage you to take part and sharpen your skills through practice. Here are some resources I use for learning (and relaxing):

Computer Networks

There’s nothing like writing your own TCP/IP protocol stack to deepen your understanding of computer networks.

The renowned Stanford CS144: Computer Network includes 8 projects that guide you in implementing the entire TCP/IP protocol stack.

If you're just looking to understand computer networks theoretically, I recommend the famous networking textbook "A Top-Down Approach" and its accompanying learning resources Computer Networking: A Top-Down Approach.

Database Systems

There’s nothing like building your own relational database to deepen your understanding of database systems.

CMU's famous database course CMU 15-445: Introduction to Database System guides you through 4 projects that add various features to BusTub, an educational relational database. The projects' evaluation framework is also open-source, which makes the course very suitable for self-learning. The projects also use many features introduced in C++11, offering a great opportunity to strengthen your C++ skills.

Berkeley, the birthplace of the famous open-source database PostgreSQL, has its own course, UCB CS186: Introduction to Database System. In it you implement a relational database in Java that supports concurrent SQL queries, B+ tree indexing, and crash recovery.

Compiler Theory

There’s nothing like writing your own compiler to deepen your understanding of compilers.

Stanford CS143: Compilers guides you through the process of writing a compiler.

Web Development

Front-end development is often overlooked in computer science curricula, but mastering these skills has many benefits, such as building your personal website or creating an impressive presentation website for your course projects.

Two-Week Crash Course

MIT web development course

Systematic Study Version

Stanford CS142: Web Applications

Computer Graphics

Data Science

Data science, machine learning, and deep learning are closely related, with a focus on practical application. Berkeley's UCB Data100: Principles and Techniques of Data Science lets you master various data analysis tools and algorithms through extensive programming exercises. The course guides you through extracting desired results from massive datasets and making predictions about future data or user behavior. For those looking to learn industrial-level data mining and analysis techniques, Stanford's big data mining course CS246: Mining Massive Data Sets is an option.

Artificial Intelligence

Artificial intelligence has been one of the hottest fields in computer science over the past decade. If you're not content with hearing about AI advances in the media and want to dig into the subject, I highly recommend Harvard's renowned CS50-series AI course Harvard CS50: Introduction to AI with Python. The course is concise and covers several major branches of traditional AI, with rich and interesting Python programming exercises that reinforce your understanding of AI algorithms. However, its content is somewhat simplified for online learners and does not go deep into the mathematical theory. For more systematic and in-depth study, consider an undergraduate course such as Berkeley's UCB CS188: Introduction to Artificial Intelligence. Its projects are built around the classic game Pac-Man, letting you use AI algorithms to play the game, which is a lot of fun.

Machine Learning

The most significant recent progress in machine learning is the emergence of deep learning, a branch based on deep neural networks. However, many algorithms based on statistical learning are still widely used in data analysis. If you're new to machine learning and don't want to get bogged down in complex mathematical proofs, start with Andrew Ng's Coursera: Machine Learning. The course is well known in the field. Andrew Ng's profound theoretical knowledge and excellent presentation skills make many complex algorithms accessible and practical. The accompanying assignments are also of high quality and help you get started quickly.

However, completing this course will only give you a general understanding of the field of machine learning. To truly understand the mathematical principles behind these "magical" algorithms or to engage in related research, you need a more "mathematical" course, such as Stanford CS229: Machine Learning or UCB CS189: Introduction to Machine Learning.

Deep Learning

The popularity of AlphaGo a few years ago brought deep learning into the public eye, and many universities have since established related majors. Many other areas of computer science also draw on deep learning in their research, so whatever your field, you will likely encounter needs related to neural networks and deep learning. For a quick introduction, I again recommend Andrew Ng's Coursera: Deep Learning, one of the top-rated courses on Coursera. If you find English-language courses challenging, consider Professor Hung-yi Lee's course National Taiwan University: Machine Learning. Although titled "Machine Learning," it covers almost all areas of deep learning and is very comprehensive, making it well suited for getting a broad overview of the field. The professor is also very humorous and frequently makes witty remarks in class.

Due to the rapid development of deep learning, there are now many research branches. For further in-depth study, consider the following representative courses:

Computer Vision

Natural Language Processing

Graph Neural Networks

Reinforcement Learning

Customize Your Course Map

Teaching someone to fish is better than giving them a fish.

The course map above inevitably reflects strong personal preferences and may not suit everyone; it is meant more as a starting point for exploration. If you want to choose your own areas of interest to study, you can refer to the following resources:


Foreword

The English version is still under development; please check this issue if you want to contribute.

This is a self-learning guide to computer science, and a memento of my three years of self-learning at university.

It is also a gift to the young students at Peking University. It would be a great encouragement and comfort to me if this book could be of even the slightest help to you in your college life.

The book is currently organized into the following sections. If you have other good suggestions or would like to join the contributors, please feel free to email zhongyinmin@pku.edu.cn or open an issue.

  • Productivity Toolkit: IDE, VPN, StackOverflow, Git, GitHub, Vim, LaTeX, GNU Make, and so on.
  • Environment configuration: PC/server development environment setup, DevOps tutorials, and so on.
  • Book recommendations: Anyone who has read CSAPP will have realized the importance of good books. I will list links to books and resources in different areas of computer science that I find rewarding to read.
  • List of high-quality CS courses: I will sort all the high-quality foreign CS courses I have taken into categories and give relevant self-learning advice. Most of them will have a separate repository containing relevant resources as well as my homework/project implementations.

The place where dreams start: CS61A

In my freshman year, I was a novice who knew nothing about computers. I installed the giant IDE Visual Studio and fought with the OJ (online judge) every day. Thanks to my high school math background, I did pretty well in math courses, but I struggled with the courses in my major. When it came to programming, all I could do was open that clunky IDE and create a new project without knowing exactly what it was for. Then I wrote cin, cout, and for loops, and cycled through CE, RE, and WA (compile error, runtime error, wrong answer). I was desperate to learn well but didn't know how to learn. I listened carefully in class but couldn't solve the homework problems. I spent almost all my spare time on homework, but the results were disappointing. I still have the source code of my project for the Introduction to Computing course. It is a single 1200-line C++ file with no header files, no class abstraction, no unit tests, no makefile, and no version control. Its only virtue is that it runs; everything else about it is a flaw. For a while I wondered whether I was cut out for computer science, as my first semester had completely shattered my childhood imaginings of geekiness.

Everything turned around during the winter break of my freshman year, when I had a sudden urge to learn Python. I overheard someone recommend CS61A, UC Berkeley's introductory course for freshmen, taught in Python. I'll never forget the day I opened the CS61A course website. Like Columbus discovering a new continent, I had opened the door to a new world.

I finished the course in 3 weeks. For the first time I felt that CS could be so fulfilling and interesting, and I was amazed that such a great course existed.

To avoid any suspicion of fawning over foreign courses, I will describe my experience with CS61A purely from a student's perspective.

  • Course website developed by the course staff: The website brings all the course resources together in one place. It has a well-organized schedule, links to all slides, recorded videos, and homework, a detailed and clear syllabus, and a list of past exams with solutions. Aesthetics aside, the website is extremely convenient for students.

  • Textbook written by the course instructor: The instructor has adapted the classic MIT textbook Structure and Interpretation of Computer Programs (SICP) into Python (the original was based on Scheme). This keeps the lectures consistent with the textbook while adding more detail. The entire book is open source and can be read online.

  • Varied, comprehensive, and interesting assignments: There are 14 labs to reinforce what you learn in class, 10 homework assignments for practice, and 4 projects of thousands of lines of code each. All of them come with well-organized skeleton code and step-by-step instructions. Unlike old-school OJ problems and Word-document assignments, each lab, homework, and project has a detailed handout and fully automated grading scripts. The CS61A staff have even developed an automated assignment submission and grading system. Of course, someone might ask, "How much can you learn from a project where most of the code is written by the teaching assistants?" But some students are new to CS and may even stumble over installing Python. For them, well-developed skeleton code makes it possible to focus on reinforcing the core knowledge from class. It also gives them the satisfaction of building a small game after only a month of learning Python. It also gives them the chance to read and learn from other people's high-quality code, which they can reuse later. I think this kind of skeleton code is absolutely beneficial in the freshman year. The only downside, perhaps, falls on the instructors and teaching assistants, since developing such assignments must take a considerable amount of time.

  • Weekly discussion sessions: The teaching assistants explain the difficult material from lectures and add supplementary content that may not be covered in class. There are also exercises taken from previous years' exams. All the exercises are written in LaTeX and come with solutions.

In CS61A, you don't need any CS prerequisites at all. You just need to pay attention, put in the time, and work hard. The feeling of not knowing what to do, of getting nothing back for all the time you put in, is gone. The course suited me so well that I fell in love with self-learning.

Imagine someone breaking down hard material and presenting it to you vividly and clearly, with many fancy and varied projects to reinforce the theory. You would feel that they were truly doing their best to help you master the course, and that not learning it well would almost be an insult to the people who built it.

If you think I'm exaggerating, start with CS61A, because it's where my dreams began.

Why write this book?

In the fall semester of 2020, I worked as a teaching assistant for the course Introduction to Computer Systems at Peking University. By then, I had been studying entirely on my own for over a year, and I enjoyed this style of learning immensely. To share that joy, I made a CS Self-learning Materials List for the students in my seminar. It was purely a whim at the time, as I wouldn't have dared to encourage my students to skip classes and study on their own.

After another year of maintenance, the list became quite comprehensive, covering most courses in Computer Science, Artificial Intelligence, and Software Engineering. I also built a separate repository for each course to collect the self-learning materials I used.

In my final year of college, I opened my program's curriculum handbook and realized that it was already a subset of my self-learning list. This was only two and a half years after I had started my self-learning journey. Then a bold idea came to me: perhaps I could write a self-learning book. It would record the difficulties I encountered and the joys I found during these years, and I hoped it would help other students who might also enjoy self-learning start their own wonderful journey.

Suppose that, in less than three years, you build up a complete CS foundation. You have relatively solid mathematical skills and coding ability, experience from dozens of projects with thousands of lines of code, and command of mainstream programming languages such as C/C++/Java/JS/Python/Go/Rust. You also have a good understanding of algorithms, circuits, architecture, networks, operating systems, compilers, artificial intelligence, machine learning, computer vision, natural language processing, reinforcement learning, cryptography, information theory, game theory, numerical analysis, statistics, distributed systems, parallel computing, database systems, computer graphics, web development, cloud computing, supercomputing, and more. Then I think you will be confident enough to choose the area you are interested in, and you will be quite competitive in both industry and academia.

I firmly believe that if you have read this far, you do not lack the ability or commitment to learn CS well; you just need a good teacher to teach you a good course. I will try my best to pick such courses for you, based on my three years of experience.

Pros

For me, the biggest advantage of self-learning is that I can set the pace entirely according to my own progress. For difficult parts, I can rewatch the videos again and again, search Google, and ask questions on StackOverflow until I have figured everything out. For parts I master relatively quickly, I can skim through at 2x or even 3x speed.

Another great thing about self-learning is that you can learn from different perspectives. I have taken core courses such as computer architecture, networking, operating systems, and compilers from different universities. Different instructors may take different views of the same material, which broadens your horizons.

A third advantage of self-learning is that you don't have to go to class and sit through boring lectures.

Cons

Of course, as a big fan of self-learning, I have to admit that it has its disadvantages.

The first is the difficulty of communication. I'm actually a very keen questioner, and I like to follow up on every point I don't understand. But when you're facing a screen and hear a teacher talking about something you don't understand, you can't reach through the network and ask them for clarification. I try to mitigate this by thinking independently and making good use of Google, but it would be great to have a few friends to study with. See the README for more information on joining a community group.

The second is that these courses are almost entirely in English, from the videos to the slides to the assignments. You may struggle at first, but I think overcoming this challenge is extremely rewarding. As reluctant as I am to admit it, a great deal of the high-quality documentation, forums, and websites in computer science are in English.

The third, and I think the most difficult, is self-discipline. Having no deadlines can be really scary, especially as you go deeper, since many foreign courses are quite difficult. You have to be self-driven enough to make yourself settle down, read dozens of pages of project handouts, understand thousands of lines of skeleton code, and endure hours of debugging. There are no credits, no grades, no teachers, and no classmates, just one belief: that you are getting better.

Who is this book for?

As I said at the beginning, anyone interested in learning computer science on their own can refer to this book. If you already have some basic skills and are only interested in a particular area, you can pick and choose what to study. You may be a novice who knows nothing about computers, as I was back then, and just beginning your college journey. If so, I hope this book will be your cheat sheet for gaining the knowledge and skills you need in the least amount of time. In a way, this book is more like a course search engine organized by my experience, helping you learn high-quality CS courses from the world's top universities without leaving home.

Of course, as an undergraduate who has not yet graduated, I do not feel I am in a position to preach any one way of learning. I just hope this material helps those who are also self-motivated and persistent to have a richer, more varied, and more satisfying college life.

Special thanks

I would like to express my sincere gratitude to all the professors who have made their courses freely available. These courses are the culmination of decades of teaching, and the professors have selflessly chosen to make such high-quality CS education available to all. Without them, my university life would not have been nearly as fulfilling and enjoyable. Many professors even replied to my thank-you emails with messages hundreds of words long, which touched me beyond words. They have also constantly inspired me: if you decide to do something, do it with all your heart and soul.

Want to join as a contributor?

There is a limit to what one person can do, and I wrote this book alongside a heavy research schedule, so it inevitably has imperfections. In addition, since I work in systems, many of the courses focus on systems, and there is relatively little content on advanced mathematics, theory of computation, and advanced algorithms. If you would like to share your self-learning experience and resources in other areas, you can open a Pull Request in the project directly or contact me by email (zhongyinmin@pku.edu.cn).

Image title

Foreword

The English version is still under development, please check this issue if you want to contribute.

This is a self-learning guide to computer science, and a memento of my three years of self-learning at university.

It is also a gift to the young students at Peking University. It would be a great encouragement and comfort to me if this book could be of even the slightest help to you in your college life.

The book is currently organized to include the following sections (if you have other good suggestions, or would like to join the ranks of contributors, please feel free to email zhongyinmin@pku.edu.cn or ask questions in the issue).

  • Productivity Toolkit: IDE, VPN, StackOverflow, Git, Github, Vim, Latex, GNU Make and so on.
  • Environment configuration: PC/Server development environment setup, DevOps tutorials and so on.
  • Book recommendations: Those who have read the CSAPP must have realized the importance of good books. I will list links to books and resources in different areas of Computer Science that I find rewarding to read.
  • List of high quality CS courses: I will summarize all the high quality foreign CS courses I have taken into different categories and give relevant self-learning advice. Most of them will have a separate repository containing relevant resources as well as my homework/project implementations.

The place where dreams start —— CS61A

In my freshman year, I was a novice who knew nothing about computers. I installed a giant IDE Visual Studio and fight with OJ every day. With my high school maths background, I did pretty well in maths courses, but I felt struggled to learn courses in my major. When it came to programming, all I could do was open up that clunky IDE, create a new project that I didn't know exactly what it was for, and then cin, cout, for loops, and then CE, RE, WA loops. I was in a state where I was desperately trying to learn well but I didn't know how to learn. I listened carefully in class but I couldn't solve the homework problems. I spent almost all my spare time doing the homework after class, but the results were disappointing. I still retain the source code of the project for Introduction to Computing course —— a single 1200-line C++ file with no header files, no class abstraction, no unit tests, no makefile, no version control. The only good thing is that it can run, the disadvantage is the complement of "can run". For a while I wondered if I wasn't cut out for computer science, as all my childhood imaginings of geekiness had been completely ruined by my first semester's experience.

It all turned around during the winter break of my freshman year, when I had a hankering to learn Python. I overheard someone recommend CS61A, a freshman introductory course at UC Berkeley on Python. I'll never forget that day, when I opened the CS61A course website. It was like Columbus discovering a new continent, and I opened the door to a new world.

I finished the course in 3 weeks and for the first time I felt that CS could be so fulfilling and interesting, and I was shocked that there existed such a great course in the world.

To avoid any suspicion of pandering to foreign courses, I will tell you about my experience of studying CS61A from the perspective of a pure student.

  • Course website developed by course staffs: The course website integrates all the course resources into one, with a well organised course schedule, links to all slides, recorded videos and homework, detailed and clear syllabus, list of exams and solutions from previous years. Aesthetics aside, this website is so convenient for students.

  • Textbook written by course instructor: The course instructor has adapted the classic MIT textbook Structure and Interpretation of Computer Programs (SICP) into Python (the original textbook was based on Scheme). This is a great way to ensure that the classroom content is consistent with the textbook, while adding more details. The entire book is open source and can be read directly online.

  • Various, comprehensive and interesting homework: There are 14 labs to reinforce the knowledge gained in class, 10 homework assignments to practice, and 4 projects each with thousands of lines of code, all with well-organized skeleton code and babysitting instructions. Unlike the old-school OJ and Word document assignments, each lab/homework/project has a detailed handout document, fully automated grading scripts, and CS61A staffs have even developed an automated assignment submission and grading system. Of course, one might say "How much can you learn from a project where most of code are written by your teaching assistants?" . For someone who is new to CS and even stumbling over installing Python, this well-developed skeleton code allows students to focus on reinforcing the core knowledge they've learned in class, but also gives them a sense of achievement that they already can make a little game despite of learning Python only for a month. It also gives them the opportunity to read and learn from other people's high quality code so that they can reuse it later. I think in the freshman year, this kind of skeleton code is absolutely beneficial. The only bad thing perhaps is for the instructors and teaching assistants, as developing such assignments can conceivably require a considerable time commitment.

  • Weekly discussion sessions: The teaching assistants go over the difficult concepts from lectures and add supplementary material that may not be covered in class. There are also exercises drawn from previous years' exams, all typeset in LaTeX with solutions.
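To make the "skeleton code plus automatic grading" idea concrete, here is a minimal sketch in Python. It is not actual CS61A material. The function `sum_digits` and its doctests are a hypothetical exercise in the spirit of an early lab. Python's standard `doctest` module stands in for the course's own grading tools. The docstring examples act as both the specification and the tests. In a real skeleton file, the function body would be left blank for students to fill in.

```python
import doctest


def sum_digits(n):
    """Return the sum of the decimal digits of a non-negative integer n.

    >>> sum_digits(10)
    1
    >>> sum_digits(4224)
    12
    >>> sum_digits(0)
    0
    """
    # A skeleton file would ship with only the docstring and leave this body empty.
    total = 0
    while n > 0:
        n, last_digit = n // 10, n % 10
        total += last_digit
    return total


if __name__ == "__main__":
    # Check every ">>>" example in this module's docstrings; prints nothing if all pass.
    doctest.testmod()
```

Running the file with no output means every example passed. A failing example is reported together with the expected and actual values, which is the kind of instant feedback the automated graders give.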

In CS61A, you don't need any prior CS background at all; you just need to pay attention, put in the time, and work hard. The familiar feeling of not knowing what to do, of getting nothing back for all the time you put in, disappears. The course suited me so well that I fell in love with self-learning.

Imagine someone digesting difficult material for you and presenting it vividly and plainly, with plenty of polished and varied projects to reinforce the theory. You can tell they are doing their very best to help you master the course. Failing to learn it well would almost feel like an insult to the people who built it.

If you think I'm exaggerating, start with CS61A, because it's where my dreams began.

Why write this book?

In the Fall 2020 semester, I worked as a teaching assistant for Introduction to Computer Systems at Peking University. By then I had been studying entirely on my own for over a year, and I enjoyed this style of learning immensely. To share that joy, I made a CS Self-learning Materials List for the students in my seminar. It was purely a whim at the time, since I wouldn't have dared to encourage my students to skip classes and study on their own.

After another year of maintenance, the list became quite comprehensive, covering most courses in Computer Science, Artificial Intelligence, and Software Engineering. I also built a separate repository for each course, summarizing the self-learning materials I used.

In my final year of college, I opened my program's curriculum and realized it was already a subset of my self-learning list. This was only two and a half years after I had started my self-learning journey. Then a bold idea came to mind: perhaps I could write a self-learning book. In it I would record the difficulties I encountered and the joys I found over these years. I hoped this would make it easier for students who may also enjoy self-learning to begin their own wonderful journey.

Suppose that in less than three years you could do all of the following:

  • build a complete CS foundation;
  • develop relatively solid math and coding skills;
  • work through dozens of projects with thousands of lines of code;
  • master at least C/C++/Java/JS/Python/Go/Rust and other mainstream programming languages;
  • gain a good understanding of algorithms, circuits, architecture, networks, operating systems, compilers, artificial intelligence, machine learning, computer vision, natural language processing, reinforcement learning, cryptography, information theory, game theory, numerical analysis, statistics, distributed systems, parallel computing, database systems, computer graphics, web development, cloud computing, supercomputing, and more.

Then I think you would be confident enough to choose the area that interests you, and you would be quite competitive in both industry and academia.

I firmly believe that if you have read this far, you do not lack the ability or commitment to learn CS well; you just need a good teacher and a good course. I will do my best to pick such courses for you, based on my three years of experience.

Pros

For me, the biggest advantage of self-learning is that I can set the pace entirely according to my own progress. For difficult parts, I can rewatch the videos over and over, search online, and ask questions on StackOverflow until I have everything figured out. For material I grasp quickly, I can play the videos at 2x or even 3x speed.

Another great thing about self-learning is that you can learn from different perspectives. I have taken core courses such as architecture, networking, operating systems, and compilers from different universities. Different instructors may have different views on the same material, which broadens your horizons.

A third advantage of self-learning is that you don't have to attend class and sit through boring lectures.

Cons

Of course, as a big fan of self-learning, I have to admit that it has its disadvantages.

The first is the difficulty of communication. I'm actually a very keen questioner, and I like to follow up on every point I don't understand. But when you're facing a screen and the instructor says something you don't understand, you can't reach through the network and ask for clarification. I try to mitigate this by thinking independently and making good use of Google, but it would be great to have a few friends to study with. See the README for more information on joining a community study group.

The second is that these courses are almost all in English, from the videos to the slides to the assignments. You may struggle at first, but it is a challenge that pays off enormously once you overcome it. Reluctant as I am to say so, I have to admit that in computer science much of the high-quality documentation, forums, and websites are in English.

The third, and I think the hardest, is self-discipline. Having no deadlines can be genuinely scary, especially as you go deeper, since many courses from abroad are quite difficult. You have to be self-driven enough to make yourself settle down and read dozens of pages of project handouts. You also have to understand thousands of lines of skeleton code and endure hours of debugging. There are no credits, no grades, no teachers, and no classmates, just one belief: that you are getting better.

Who is this book for?

As I said at the beginning, anyone interested in learning computer science on their own can use this book. If you already have some basic skills and are only interested in a particular area, you can pick and choose what to study. You may instead be a complete beginner, as I was back then, just starting your college journey. If so, I hope this book will be your cheat sheet for gaining the knowledge and skills you need in the least amount of time. In a way, this book is more like a course search engine curated by my experience. It helps you learn high-quality CS courses from the world's top universities without leaving home.

Of course, as an undergraduate who has not yet graduated, I am in no position to preach any particular way of learning. I simply hope this material helps those who are also self-motivated and persistent to have a richer, more varied, and more satisfying college life.

Special thanks

I would like to express my sincere gratitude to all the professors who have made their courses freely available. These courses are the culmination of decades of teaching, and these professors have selflessly chosen to make such high-quality CS education available to everyone. Without them, my university life would not have been nearly as fulfilling and enjoyable. Many of them even replied with messages hundreds of words long after I sent them a thank-you email, which touched me beyond words. They have always inspired me to put my whole heart and soul into whatever I decide to do.

Want to join as a contributor?

There is a limit to how much one person can do. I wrote this book alongside a heavy research schedule, so imperfections are inevitable. In addition, because I work in systems, many of the courses focus on systems. There is relatively little content on advanced mathematics, theory of computation, and advanced algorithms. If you would like to share your self-learning experience and resources in other areas, you can open a Pull Request on the project directly. You are also welcome to contact me by email (zhongyinmin@pku.edu.cn).

ETH: Computer Architecture

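Course Overview

  • University: ETH Zurich
  • Prerequisites: DDCA
  • Programming Languages: C/C++, Verilog
  • Difficulty: 🌟🌟🌟🌟
  • Estimated Hours: 70+

This course covers computer architecture and is taught by Professor Onur Mutlu. According to its course description, it is a follow-up to DDCA. Its goals are to learn:

  • how to design the control and datapath hardware for a MIPS-like processor;
  • how to make machine instructions execute simultaneously through pipelining and simple superscalar execution;
  • how to design fast memory and storage systems.

According to student feedback, the course is harder than CS61C, and some of its content is very cutting-edge. The Bilibili uploader who re-posted the lectures suggests using it as a supplement to Carnegie Mellon University's 18-447. The reading materials are very extensive, amounting to a whole semester's worth of lectures.

The official website introduces the course as follows: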

We will learn the fundamental concepts of the different parts of modern computing systems, as well as the latest major research topics in Industry and Academia. We will extensively cover memory systems (including DRAM and new Non-Volatile Memory technologies, memory controllers, flash memory), new paradigms like processing-in-memory, parallel computing systems (including multicore processors, coherence and consistency, GPUs), heterogeneous computing, interconnection networks, specialized systems for major data-intensive workloads (e.g. graph analytics, bioinformatics, machine learning), etc. We will focus on fundamentals as well as cutting-edge research. Significant attention will be given to real-life examples and tradeoffs, as well as critical analysis of modern computing systems.

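The programming assignments use Verilog to design and simulate a register-transfer (RT) implementation of a MIPS-like pipelined processor, to reinforce the theory from lectures. For this reason, the first few labs involve pipelined CPU programming in Verilog. Students also develop a cycle-accurate processor simulator in C and use it to explore processor design options.

Course Resources

Resource Summary

Some universities in China have introduced this course, so students who need materials can find resources by searching online.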

ETH: Computer Architecture

Course Overview

  • University: ETH Zurich
  • Prerequisites: DDCA
  • Programming Language: C/C++, Verilog
  • Difficulty Level: 🌟🌟🌟🌟
  • Estimated Study Time: 70+ hours

This course, taught by Professor Onur Mutlu, covers computer architecture. According to its course description, it is an advanced follow-up to DDCA. It teaches:

  • how to design the control and datapath hardware for a MIPS-like processor;
  • how to execute machine instructions concurrently through pipelining and simple superscalar execution;
  • how to design fast memory and storage systems.

According to student feedback, the course is harder than CS61C, and some of its content is cutting-edge. The Bilibili uploader who re-posted the lectures recommends it as a supplement to Carnegie Mellon University's 18-447. The reading materials are extensive, amounting to a whole semester's worth of lectures.

The official website description is as follows:

"We will learn the fundamental concepts of the different parts of modern computing systems, as well as the latest major research topics in Industry and Academia. We will extensively cover memory systems (including DRAM and new Non-Volatile Memory technologies, memory controllers, flash memory), new paradigms like processing-in-memory, parallel computing systems (including multicore processors, coherence and consistency, GPUs), heterogeneous computing, interconnection networks, specialized systems for major data-intensive workloads (e.g., graph analytics, bioinformatics, machine learning), etc. We will focus on fundamentals as well as cutting-edge research. Significant attention will be given to real-life examples and tradeoffs, as well as critical analysis of modern computing systems."

The programming assignments use Verilog to design and simulate a register-transfer (RT) implementation of a MIPS-like pipelined processor, to reinforce the theory from lectures. The first few labs therefore involve pipelined CPU programming in Verilog. Students also develop a cycle-accurate processor simulator in C and use it to explore processor design options.
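As a toy illustration of what "cycle-by-cycle" simulation means, here is a short Python sketch. It is not the course's simulator, which is written in C and models far more detail. It assumes an ideal five-stage pipeline (IF, ID, EX, MEM, WB) with no hazards or stalls, and the function names `simulate` and `show` are hypothetical. Each loop iteration is one clock cycle: every instruction moves one stage forward, and a new instruction is fetched if any remain.

```python
from typing import List, Optional

# Stage names of a classic five-stage MIPS-style pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def simulate(num_instructions: int) -> List[List[Optional[int]]]:
    """Simulate an ideal pipeline cycle by cycle.

    Returns one snapshot per cycle; snapshot[s] is the index of the
    instruction occupying stage s, or None if that stage is empty.
    Assumes no hazards: one instruction enters per cycle and nothing stalls.
    """
    pipeline: List[Optional[int]] = [None] * len(STAGES)
    next_fetch = 0
    retired = 0
    trace = []
    while retired < num_instructions:
        # Every instruction advances one stage; whatever was in WB has already left.
        pipeline = [None] + pipeline[:-1]
        # Fetch the next instruction into IF if any remain.
        if next_fetch < num_instructions:
            pipeline[0] = next_fetch
            next_fetch += 1
        trace.append(list(pipeline))
        # The instruction in WB completes at the end of this cycle.
        if pipeline[-1] is not None:
            retired += 1
    return trace


def show(trace: List[List[Optional[int]]]) -> None:
    """Print one line per cycle, e.g. 'cycle 3: IF:I2 ID:I1 EX:I0 MEM:- WB:-'."""
    for cycle, snapshot in enumerate(trace, start=1):
        cells = []
        for name, instr in zip(STAGES, snapshot):
            cells.append(name + ":" + ("-" if instr is None else "I" + str(instr)))
        print(f"cycle {cycle}: " + " ".join(cells))


if __name__ == "__main__":
    trace = simulate(3)
    show(trace)
    # With k = 5 stages and n = 3 instructions, an ideal pipeline needs k + n - 1 = 7 cycles.
    print(len(trace))
```

Exploring design options in a real simulator means changing rules like these. For example, you could insert stall cycles when an instruction needs a result that is not ready yet, and then measure how the cycle count changes.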

Course Resources

Resource Summary

Some universities in China have introduced this course, so interested students can find additional resources through online searches.

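How to Use This Book

As contributors keep joining, the content of this book keeps expanding. Trying to finish every course in the book is neither realistic nor necessary. It may even backfire, costing a great deal of effort for little return. To make this book truly work for you, I have roughly divided readers into the following three groups according to their needs. This way, you can plan a self-study program that precisely matches your own situation.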

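Just Starting College

You may have just entered university or still be in your first year or two, majoring in computer science or wanting to switch into it. If so, you are lucky. Studying is your main job, so you have ample time and freedom to learn whatever interests you. You have no work pressure or daily chores, and you don't need to agonize over utilitarian questions like "Is this useful?" or "Will this help me get a job?".

So how should you plan your studies? I think the first priority is to break out of the passive, step-by-step learning habit formed in high school. As a "small-town exam grinder" myself, I know that most high schools in China schedule every minute of your day. You just passively follow the timetable and complete one predetermined task after another, and as long as you are diligent, the results won't be too bad.

Once you enter university, however, your freedom suddenly grows enormously. Almost all of your time outside class is yours to manage. No one organizes the key points or summarizes outlines for you, and exams are far less formulaic than in high school. If you keep the "good student" mindset from high school and simply follow the prescribed steps, the results may not be what you hope for. The official curriculum is not necessarily reasonable, and the teaching is not necessarily conscientious. Attending every lecture does not guarantee understanding, and the exams may not even relate to what was taught. Half-jokingly, you may feel that the whole world is against you and you can only count on yourself.

That is simply how things are. If you want to change it, you first have to survive it and build enough ability to question it. In your early years, laying a solid foundation is crucial, and this foundation is all-round. In-class knowledge certainly matters, but computer science puts heavy emphasis on practice. Many skills beyond the textbook need to be developed, and this is exactly where undergraduate CS education in China falls short. Based on my own experience, I have summarized the following suggestions for your reference.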

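First, learn how to write "elegant" code. Many first-year introductory programming courses in China are taught as extremely dull syntax classes. They are even less effective than having students read the official documentation directly. In fact, when students first start programming, it is very beneficial to explore what elegant code looks like and what code "has bad taste". Introductory courses usually begin with procedural programming (e.g., C), but even there, modularity and encapsulation are extremely important. Suppose you only care about getting your code accepted on OpenJudge, and you cut corners with large blocks of copy-pasted code and a bloated main function. Then your code quality will stay at that level, and on a slightly larger project, endless debugging plus communication and maintenance costs will swallow you. So keep asking yourself while you code:

  • Is there a lot of duplicated code?
  • Is this function too complex (Linux advocates that each function should do one thing well)?
  • Can this snippet be abstracted into a function?

This may feel awkward at first, and you may wonder whether such a simple problem deserves so much effort. But remember that good habits are priceless. Middle school students can learn C, so why would a company hire you as a programmer?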

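After procedural programming, the second semester of the first year usually covers object-oriented programming (e.g., C++ or Java). Here I highly recommend the notes of MIT 6.031: Software Construction. They use Java to explain in great detail how to write "elegant" code, covering test-driven development, function specification design, exception handling, and much more. Once you are working with objects, it is also worth learning some common design patterns. Object-oriented courses in China can just as easily turn into dull syntax classes. They make students fret over inheritance syntax, or even pose brain-teaser questions, without realizing that such things almost never come up in real-world development. The essence of object-oriented programming is learning to abstract real problems into classes and the relationships between them. Design patterns are the abstractions that earlier developers have distilled from experience. I recommend the book 大话设计模式 (Big Talk Design Patterns), which is very accessible.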

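Second, try to learn tools and skills that boost productivity, such as Git, the shell, and Vim. I strongly recommend the MIT Missing Semester course. These tools may feel awkward at first, but force yourself to use them, and your development efficiency will skyrocket once you are fluent. Many applications can also greatly improve your productivity. One rule of thumb: any operation that requires your hands to leave the keyboard should be eliminated. For example, switching applications, opening files, and browsing the web all have plugins that make them keyboard-driven (such as Alfred on the Mac). If some operation is used every day and takes more than 1 second, find a way to cut it down to 0.1 seconds. After all, you will be working with computers for decades, and a smooth workflow pays off many times over. Finally, learn to touch type! If you still have to look at the keyboard while typing, find a tutorial online right away and learn touch typing. It will greatly improve your development efficiency.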

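Third, balance coursework and self-study. We question the status quo, but we still have to play by the rules, since GPA still matters a great deal for postgraduate recommendation. So in your first year, I still suggest studying mostly according to your own timetable, supplemented by some high-quality external resources. For calculus and linear algebra, for example, you can refer to the course notes of MIT 18.01/18.02 and MIT 18.06. During vacations, you can learn Python through UCB CS61A. At the same time, put the first two points above into practice and build good programming habits and practical skills. In my experience, first-year math courses account for a large share of credits. The content of math exams also varies widely, since different schools and teachers have very different styles. Self-study may help you grasp the essence of mathematics, but it will not necessarily get you a good grade. So before exams, it is best to work through past papers in a targeted way and prepare thoroughly.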

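Once you become a sophomore, most of your courses will be core CS courses, and you can fully let yourself go and enter the world of self-learning. For details, see A Reference Guide for CS Learning. This is a complete guide I distilled from three years of self-learning, with a brief introduction to each course's features and why you should take it. For every course on your timetable, this guide should have a corresponding course from abroad, and I believe those courses outclass their counterparts in almost every respect. The core knowledge of computer science is essentially the same everywhere, and high-quality courses help you understand concepts from first principles. That puts them on a completely different level from the read-from-the-textbook teaching common in China. Generally, you can get a decent exam score without much trouble: just cram for two days with the slides the teacher "painstakingly" read aloud all semester. If a course has a major project, you can try adapting a lab or project from the foreign course to meet the course requirements.

When I took operating systems, I found the teacher was still using lab assignments that universities abroad had long since abandoned. So I emailed the teacher and switched to the xv6 project from MIT 6.S081, which I was studying at the time. This made my own self-study easier and, unintentionally, pushed the course toward reform. In short, flexibility is the first principle. Your goal is to master the material in the most convenient and efficient way, and any so-called rule that works against this goal can be "worked around" by whatever means. With this knack for working around things, I barely attended in-person classes after my junior year (I also spent most of my sophomore year at home due to the pandemic). It had no effect at all on my GPA.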

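Finally, I hope everyone can be less restless and utilitarian, with more patience and genuine aspiration. Many people email me to ask whether self-learning requires strong self-discipline. I think it really depends on what you want. Maybe you still cling to the fantasy that knowing one programming language will land you a monthly salary of over 10,000 yuan and a share of the internet boom. If so, nothing I say will matter. In fact, my self-learning began with little utilitarian motivation; it was pure curiosity and an instinctive thirst for knowledge. Nor did it involve the legendary extremes of "tying one's hair to the roof beam and jabbing one's thigh with an awl" to stay awake. I ate when I should eat and played when I should play, and before I knew it, I had accumulated all this material. Today, confrontation between China and the US has become a trend, yet we are still "humbly" learning from the West. While admiring the high-quality courses abroad, I often feel a sense of crisis. Who will change all this? You, who are just entering the field. So, go for it, young friends!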

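Simplify the Complex

You may have already graduated and started graduate school or a job. Or you may work in another field and want to switch to coding in your spare time. In either case, you may not have enough spare time to systematically work through everything in A Reference Guide for CS Learning, yet you still want to make up for the foundations you missed as an undergraduate. These readers usually have some programming experience, so there is no need to repeat introductory courses. And from a practical standpoint, the general direction of your work is already settled. There is really no need to study every branch of computer science in depth; you should focus more on general principles and skills. Drawing on my own experience, I have therefore selected the few core courses that I consider the most important and of the highest quality, in the hope of deepening readers' understanding of computers. After finishing these courses, whatever your job, I believe you will not end up as an ordinary "package caller" who merely glues libraries together. Instead, you will have a deeper understanding of how computers work at the lowest level.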

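Course Area | Course
--- | ---
Discrete Mathematics and Probability Theory | UCB CS70: Discrete Math and Probability Theory
Data Structures and Algorithms | Coursera: Algorithms I & II
Software Engineering | MIT 6.031: Software Construction
Full-Stack Development | MIT web development course
Introduction to Computer Systems | CMU CS15213: CSAPP
Introductory Computer Architecture | Coursera: Nand2Tetris
Advanced Computer Architecture | CS61C: Great Ideas in Computer Architecture
Database Systems | CMU 15-445: Introduction to Database Systems
Computer Networking | Computer Networking: A Top-Down Approach
Artificial Intelligence | Harvard CS50: Introduction to AI with Python
Deep Learning | Coursera: Deep Learning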

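Focused and Specialized

If you have a solid command of the core computer science courses and have already settled on your work or research direction, there are many more courses in this book for you to explore that are not mentioned in A Reference Guide for CS Learning.

As contributors keep joining, new branches will keep appearing in the table of contents on the left, such as Advanced Machine Learning and Machine Learning Systems. Each branch contains several courses of the same type from different universities, with different emphases and labs. For example, the Operating Systems branch includes courses from MIT, UC Berkeley, Nanjing University, and Harbin Institute of Technology. If you want to dig deep into one field, studying these similar courses will give you different perspectives on similar knowledge. The author also plans to invite researchers in related fields to share research learning paths for specific subfields. This way, the CS Self-learning Guide can gain depth while pursuing breadth.

If you would like to contribute content of this kind, you are welcome to email the author at zhongyinmin@pku.edu.cn.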

How to Use This Book

As contributors keep joining, the content of this book keeps expanding. Trying to complete every course in the book is impractical and unnecessary, and may even be counterproductive, costing much effort for little reward. To make this book truly useful for you, I have roughly divided readers into the following three categories based on their needs. This way, you can plan a self-study program that fits your own situation.

Freshmen

You may have just entered university or be in your first year or two, studying computer science or planning to switch into it. If so, you are lucky. Studying is your main task, so you have ample time and freedom to learn whatever interests you, without the pressure of work or the chores of daily life. You needn't be overly concerned with utilitarian questions like "Is it useful?" or "Will it help me find a job?".

So how should you arrange your studies? The first step is to break away from the passive learning style formed in high school. As a "small-town exam grinder" myself, I know that most Chinese high schools fill every minute of your day with tasks, and you just need to passively follow the schedule. As long as you are diligent, the results won't be too bad.

Once you enter university, however, you have much more freedom. All your time outside class is yours to use. No one will organize key points or summarize outlines for you, and exams are not as formulaic as in high school. If you keep the mentality of a "good high school student" and simply follow everything step by step, the results may not be what you expect. The official curriculum may not be reasonable, and the teaching may not be conscientious. Attending classes does not guarantee understanding, and the exam content may not even relate to what was taught. Half-jokingly, you might feel that the whole world is against you and you can only rely on yourself.

Given this reality, if you want to change it, you must first survive it and gain the ability to question it. In your early years, it's important to lay a solid foundation. This foundation is comprehensive, covering both in-class knowledge and practical skills, and the latter are often lacking in undergraduate computer science education in China. Based on personal experience, I offer the following suggestions for your reference.

First, learn how to write "elegant" code. Many introductory programming courses in China are taught as extremely boring syntax classes, which are less effective than simply reading the official documentation. When students first start programming, it is very beneficial for them to understand what makes code elegant and what code "has bad taste". Introductory courses usually start with procedural programming (e.g., C), but even here, modularity and encapsulation are crucial. You might write code just to get it accepted on OpenJudge, relying on lengthy copy-pasting and bloated main functions. If so, your code quality will stay poor, and on larger projects, endless debugging and maintenance costs will overwhelm you. So constantly ask yourself:

  • Is there a lot of repetitive code?
  • Is this function too complex (the Linux philosophy is that each function should do one thing well)?
  • Can this code be abstracted into a function?

At first this may seem cumbersome for simple problems, but remember that good habits are invaluable. Even middle school students can learn C, so why should a company hire you as a software engineer?
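As a small illustration of the "one function, one job" advice, here is a sketch in Python. The same idea applies directly to C. All names (`average`, `report`, the course data) are hypothetical. The copy-paste version would repeat the averaging formula and its empty-list check once per course inside `main`. Here, that logic lives in a single helper, so a bug fix or a change in rounding happens in exactly one place.

```python
def average(scores):
    """Return the arithmetic mean of scores, or 0.0 for an empty list."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def report(courses):
    """Return one formatted summary line per course."""
    return [f"{name}: {average(scores):.1f}" for name, scores in courses.items()]


def main():
    # main only wires things together; each helper above does one job.
    # The copy-paste alternative would repeat the averaging formula
    # (and its empty-list check) once per course right here.
    courses = {"math": [90, 85, 77], "physics": [88, 92], "chemistry": []}
    for line in report(courses):
        print(line)


if __name__ == "__main__":
    main()
```

Running it prints `math: 84.0`, `physics: 90.0`, and `chemistry: 0.0`. Adding a new course is then a one-line data change rather than another pasted block.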

After procedural programming, the second semester of the freshman year usually introduces object-oriented programming (e.g., C++ or Java). I highly recommend the notes of MIT 6.031: Software Construction. They use Java (the course switched to TypeScript around 2022) to explain in detail how to write "elegant" code, covering test-driven development, function specification design, exception handling, and more. When learning object-oriented programming, it is also worth understanding common design patterns. Object-oriented courses in China can easily become dull syntax classes that dwell on inheritance syntax and brain-teaser questions, even though such things are rarely used in real-world development. The essence of object-oriented programming is learning to abstract real problems into classes and the relationships between them. Design patterns are proven abstractions distilled by earlier developers. I recommend the book "Big Talk Design Patterns" (大话设计模式), which is very easy to understand.
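To give a feel for "specification first, tests first", here is a minimal Python sketch. The 6.031 notes themselves use Java/TypeScript. The `requires`/`effects` layout only loosely mirrors how those readings write specifications, and the function `find_first` and its test cases are hypothetical. The test cases are chosen by partitioning the inputs (empty list, target absent, target at index 0, target repeated) rather than picked at random.

```python
import unittest


def find_first(items, target):
    """Return the index of the first occurrence of target in items.

    Spec (written before the body):
      requires: items is a list, possibly empty
      effects:  returns the smallest i such that items[i] == target,
                or -1 if target does not occur in items
    """
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1


class FindFirstTest(unittest.TestCase):
    # Cases chosen by partitioning the inputs: empty list, target absent,
    # target at index 0, and target appearing more than once.

    def test_empty_list(self):
        self.assertEqual(find_first([], 3), -1)

    def test_target_absent(self):
        self.assertEqual(find_first([1, 2], 3), -1)

    def test_target_first(self):
        self.assertEqual(find_first([3, 1], 3), 0)

    def test_duplicates_return_smallest_index(self):
        self.assertEqual(find_first([1, 3, 3], 3), 1)
```

To run it, save the code in a file (for example `find_first_test.py`, a hypothetical name) and run `python -m unittest find_first_test`. In a test-driven workflow, you would write the docstring spec and the test class first, watch the tests fail, and only then fill in the body.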

Second, try to learn productivity-enhancing tools and skills, such as Git, the shell, and Vim. I strongly recommend the MIT Missing Semester course. These tools may feel awkward at first, but force yourself to use them, and your development efficiency will skyrocket once you are fluent. Many applications can also greatly increase your productivity. A rule of thumb: any action that requires your hands to leave the keyboard should be eliminated. For example, switching applications, opening files, and browsing the web can all be done through keyboard-driven plugins (like Alfred on the Mac). If an operation you perform every day takes more than 1 second, try to reduce it to 0.1 seconds. After all, you'll be working with computers for decades, so a smooth workflow pays off many times over. Lastly, learn to touch type! If you still need to look at the keyboard while typing, find a tutorial online and learn to type without looking. This will significantly increase your development efficiency.

Third, balance coursework and self-learning. We may question the status quo, but we still have to follow the rules, since GPA remains important for postgraduate recommendation. Therefore, in the first year, I suggest mostly following your curriculum, complemented by high-quality external resources. For example, for calculus and linear algebra, refer to the notes of MIT 18.01/18.02 and MIT 18.06. During holidays, learn Python through UCB CS61A. Also, focus on the good programming habits and practical skills mentioned above. In my experience, math courses carry a large share of your first-year GPA, and math exams vary greatly between schools and teachers. Self-learning may help you understand the essence of mathematics, but it does not guarantee good grades. So before exams, it's best to work through past papers in a targeted way.

In your sophomore year, as computer science courses become the majority, you can fully immerse yourself in self-learning. Refer to A Reference Guide for CS Learning, a guide I created based on three years of self-learning, which introduces each course and explains why it matters. For every course in your curriculum, this guide should have a counterpart, and I believe those counterparts are of higher quality. If a course has a project, try to adapt a lab or project from these self-learning courses. For example, when I took an operating systems course, I found the teacher was still using labs long abandoned by UC Berkeley. So I emailed the teacher and switched to the MIT 6.S081 xv6 project I was studying. This let me keep self-learning while inadvertently promoting curriculum reform. In short, be flexible. Your goal is to master knowledge in the most convenient and efficient way, and anything that contradicts this goal can be "fudged" as necessary. With this attitude, I barely attended in-person classes after my junior year (I also spent most of my sophomore year at home due to the pandemic), and it had no impact on my GPA.

Finally, I hope everyone can be less impetuous and more patient in their pursuits. Many people ask whether self-learning requires strong self-discipline. It depends on what you want. If you still believe that mastering one programming language will earn you a high salary and a share of the internet boom, then nothing I say will matter. My own motivation was pure curiosity and a natural desire for knowledge, not utilitarian goals. The process didn't involve any "extraordinary efforts": I spent my college days as usual and gradually accumulated this wealth of material. Now, as US-China confrontation becomes a trend, we are still humbly learning techniques from the West. Who will change this? You, the newcomers. So go for it, young people!

Simplify the Complex

You may have graduated and started postgraduate studies or begun working. Or you may be in another field and want to learn coding in your spare time. In these cases, you may not have enough time to systematically complete the materials in A Reference Guide for CS Learning, but you still want to fill the gaps in your undergraduate foundation. These readers usually have some programming experience, so there is no need to repeat introductory courses. From a practical standpoint, the general direction of your work is already determined, so there is no need to study every branch of computer science deeply. Instead, focus on general principles and skills. Based on my own experience, I've selected the most important and highest-quality core courses to deepen readers' understanding of computer science. After completing these courses, whatever your specific job, I believe you won't be just an ordinary coder. You will have a deeper understanding of how computers work underneath.

Course Area | Course
--- | ---
Discrete Mathematics and Probability Theory | UCB CS70: Discrete Math and Probability Theory
Data Structures and Algorithms | Coursera: Algorithms I & II
Software Engineering | MIT 6.031: Software Construction
Full-Stack Development | MIT Web Development Course
Introduction to Computer Systems | CMU CS15213: CSAPP
Introductory Computer Architecture | Coursera: Nand2Tetris
Advanced Computer Architecture | CS61C: Great Ideas in Computer Architecture
Principles of Databases | CMU 15-445: Introduction to Database Systems
Computer Networking | Computer Networking: A Top-Down Approach
Artificial Intelligence | Harvard CS50: Introduction to AI with Python
Deep Learning | Coursera: Deep Learning

Focused and Specialized

If you have a solid grasp of the core professional courses in computer science and have already determined your work or research direction, then there are many courses in the book not mentioned in A Reference Guide for CS Learning for you to explore.

As the number of contributors increases, new branches such as Advanced Machine Learning and Machine Learning Systems will be added to the navigation bar. Under each branch, there are several similar courses from different schools with different emphases and experiments, such as the Operating Systems branch, which includes courses from MIT, UC Berkeley, Nanjing University, and Harbin Institute of Technology. If you want to delve into a field, studying these similar courses will give you different perspectives on similar knowledge. Additionally, I plan to contact researchers in related fields to share research learning paths in specific subfields, enhancing the depth of the CS Self-learning Guide while pursuing breadth.

If you want to contribute in this area, feel free to contact the author via email zhongyinmin@pku.edu.cn.

GitHub

What is GitHub

Functionally, GitHub is an online platform for hosting code. You can host your local Git repositories on GitHub so that a team can develop and maintain them collaboratively. However, GitHub's significance has grown far beyond that. It has become a very active, resource-rich open-source community, where developers from all over the world share a wide variety of open-source software. You can find everything from industrial-grade deep learning frameworks like PyTorch and TensorFlow to practical scripts of just a few lines. GitHub also offers hardcore knowledge sharing and beginner-friendly tutorials, and many technical books are open-sourced there (like the one you're reading now). Browsing GitHub has become part of my daily life.

On GitHub, stars are the ultimate affirmation for a project. If you find this book useful, you are welcome to enter the repository's homepage via the link in the upper right corner and give your precious star✨.

How to Use GitHub

If you have never created your own remote repository on GitHub or cloned someone else's code, I suggest you start your open-source journey with GitHub's official tutorial.

If you want to keep up with some interesting open-source projects on GitHub, I highly recommend the HelloGitHub website. It regularly features GitHub's recently trending or very interesting open-source projects, giving you the opportunity to access various quality resources firsthand.

I believe GitHub's success comes from the "one for all, all for one" spirit of open source and the joy of sharing knowledge. Perhaps you also want to become the next revered open-source developer or the author of a project with tens of thousands of stars. If so, turn the ideas that spark during development into code and share them on GitHub.

However, the open-source community is not lawless. Much open-source software may not be freely copied, redistributed, or sold. Understanding the various open-source licenses and complying with them is not only a legal requirement but also the responsibility of every member of the open-source community.

\ No newline at end of file
diff --git a/en/必学工具/Latex/index.html b/en/必学工具/Latex/index.html
new file mode 100644
index 00000000..e9769116
--- /dev/null
+++ b/en/必学工具/Latex/index.html
@@ -0,0 +1,35 @@
+ LaTeX - csdiy.wiki

LaTeX

Why Learn LaTeX

If you need to write academic papers, please skip directly to the next section, as learning LaTeX is not just a choice but a necessity.

LaTeX is a typesetting system built on TeX. LaTeX was developed by Turing Award winner Leslie Lamport, and TeX was originally created by Donald Knuth; both are giants of computer science. Of course, the developers' prowess is not why we learn LaTeX. The biggest difference between LaTeX and WYSIWYG (What You See Is What You Get) editors such as Word is that in LaTeX, users only need to focus on the content, leaving the typesetting entirely to the software. This allows people without any typesetting experience to produce papers and articles with highly professional formatting.

Berkeley computer science professor Christos Papadimitriou once jokingly said:

Every time I read a LaTeX document, I think, wow, this must be correct!

How to Learn LaTeX

The recommended learning path is as follows:

  • Setting up the LaTeX environment can be a headache. If you encounter problems with configuring LaTeX locally, consider using Overleaf, an online LaTeX editor. The site not only offers a variety of LaTeX templates to choose from but also eliminates the difficulty of environment setup.
  • Read the following three tutorials: Part-1, Part-2, Part-3.
  • The best way to learn LaTeX is, of course, by writing papers. However, starting with a math class and using LaTeX for homework is also a good choice.
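As a minimal sketch of what a homework write-up might look like, consider the source below (the problem text is made up for illustration). It only marks up structure and math; LaTeX decides the fonts, spacing, and equation numbering. It uses only the standard `article` class and the widely used `amsmath` package, so it should compile on Overleaf or with `pdflatex` from a local TeX Live installation.

```latex
\documentclass{article}
\usepackage{amsmath} % extra math environments and \eqref

\title{Homework 1}
\author{Your Name}

\begin{document}
\maketitle

\section{Problem 1}
The roots of $ax^2 + bx + c = 0$ with $a \neq 0$ are
\begin{equation}
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
  \label{eq:quadratic}
\end{equation}
Equation~\eqref{eq:quadratic} is numbered and cross-referenced automatically.

\end{document}
```

If you later add another equation above this one, every number and every `\eqref` reference updates on the next compile. That is the "focus on content, leave layout to the software" idea in practice.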

Other recommended introductory materials include:

  • A brief guide to installing LaTeX [GitHub] or the TeX Live Guide (texlive-zh-cn) [PDF] can help you with installation and environment setup.
  • A (not so) brief introduction to LaTeX2ε (lshort-zh-cn) [PDF] [GitHub], translated by the CTeX development team, helps you get started quickly and accurately. It's recommended to read it thoroughly.
  • Liu Haiyang's "Introduction to LaTeX" can serve as a reference book to consult when you have specific questions. Skip the section on the CTeX suite.
  • Modern LaTeX Introduction Seminar
  • A Very Short LaTeX Introduction Document
\ No newline at end of file
diff --git a/en/必学工具/Scoop/index.html b/en/必学工具/Scoop/index.html
index ca2f9829..565e4341 100644
--- a/en/必学工具/Scoop/index.html
+++ b/en/必学工具/Scoop/index.html
@@ -1,14 +1,14 @@
- Scoop - csdiy.wiki

Scoop

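Why Use Scoop

Setting up a development environment on Windows has always been complicated and difficult. Because there is no unified standard, installation methods differ enormously between development environments, costing a lot of unnecessary time. Scoop lets you install and manage common development software in a uniform way. It spares you tedious steps such as manual downloads, installation, and configuring environment variables.

For example, installing Python and Node.js only takes: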

scoop install python
+ Scoop - csdiy.wiki      

Scoop

Why Use Scoop

Setting up a development environment in Windows has always been a complex and challenging task. The lack of a unified standard means that the installation methods for different development environments vary greatly, resulting in unnecessary time costs. Scoop helps you uniformly install and manage common development software, eliminating the need for manual downloads, installations, and environment variable configurations.

For example, to install Python and Node.js, you just need to execute:

scoop install python
 scoop install nodejs
-

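Installing Scoop

Scoop requires Windows PowerShell 5.1 or PowerShell as its runtime. If you use Windows 10 or later, Windows PowerShell is built into the system. The Windows PowerShell that ships with Windows 7 is too old, so you need to install a newer version of PowerShell manually.

Many students use a Chinese username when they set up their Windows account, so their user directory also has a Chinese name. By default, Scoop installs software into the user directory, and in that case some software may fail to run. We therefore recommend installing to a custom directory. For other installation methods, see: ScoopInstaller/Install

# Set PowerShell execution policy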
+

Installing Scoop

Scoop requires Windows PowerShell 5.1 or PowerShell as its runtime environment. If you are using Windows 10 or later, Windows PowerShell is built into the system. However, the version of Windows PowerShell built into Windows 7 is outdated, and you will need to manually install a newer version of PowerShell.

Many students set up their Windows accounts with Chinese usernames, so their user directories also have Chinese names. By default, Scoop installs software into the user directory, and in that case some programs may fail to run. Therefore, we recommend installing to a custom directory. For other installation methods, please refer to: ScoopInstaller/Install

# Set PowerShell execution policy
 Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
-# Download the installation script
+# Download the installation script
 irm get.scoop.sh -outfile 'install.ps1'
-# Run the installer; the -ScoopDir parameter sets the Scoop installation path
+# Run the installer; the -ScoopDir parameter sets the Scoop installation path
 .\install.ps1 -ScoopDir 'C:\Scoop'
-

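Using Scoop

Scoop's official documentation is very beginner-friendly. Rather than repeating it here, we recommend reading the official documentation or the Quick Start guide.

Q&A

Can Scoop Use Mirror Sources?

The Scoop community only maintains the installation manifests. All software is downloaded from the official links provided by each software's publisher, so no mirror sources can be offered. If downloads keep failing because of your network environment, you will need a little "magic" (a proxy).

Why Can't I Find Java 8?

Same reason as above: the official source no longer provides download links for Java 8. We recommend ojdkbuild8 as a substitute.

How Do I Install Python 2?

The Scoop community removes outdated, deprecated software from ScoopInstaller/Main and adds it to ScoopInstaller/Versions. If you need such software, add that bucket manually: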

scoop bucket add versions
+

Using Scoop

Scoop's official documentation is very user-friendly for beginners. Instead of elaborating here, it is recommended to read the official documentation or the Quick Start guide.

Q&A

Can Scoop Configure Mirror Sources?

The Scoop community only maintains the installation manifests. All software is downloaded from the official links provided by each software's publisher, so no mirror sources are provided. If your network environment causes repeated download failures, you may need a bit of "magic" (a proxy).

Why Can't I Find Java 8?

For the same reason: the official source no longer provides download links for Java 8. We recommend ojdkbuild8 as a substitute.

How Do I Install Python 2?

The Scoop community removes outdated or deprecated software from ScoopInstaller/Main and adds it to ScoopInstaller/Versions. To install such software, first add that bucket manually:

scoop bucket add versions
 scoop install python27
-

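Graduation Thesis

Why I Wrote This Tutorial

In 2022, I finished my undergraduate degree. When I started writing my thesis, I realized, embarrassingly, that I could only use Word's foolproof features, such as changing fonts and saving or exporting. I thought about switching to LaTeX, but the thesis's paragraph formatting requirements were still easier to handle in Word. After a painful struggle, I got through writing and defending the thesis without disaster. I have organized the relevant resources into a ready-to-use document so that those who come after me don't repeat my mistakes.

How to Write a Graduation Thesis in Word

Putting an elephant in a fridge takes three steps, and so does writing a graduation thesis in Word:

  • Determine the format requirements: your school usually issues thesis format requirements, such as fonts and sizes for each heading level and formats for figure captions and citations. A more considerate school may even provide a thesis template; if so, skip straight to the next step. Unfortunately, my school issued no standard format requirements. Instead, it gave me a messy, nearly useless template. Out of necessity, I found the thesis format requirements for Peking University graduate students and made a template that follows them. Help yourself if you need it, but I accept no responsibility for anything, including failure to graduate.

  • Learn Word formatting: readers who reach this step fall into two groups. Some already have a standard template from their school; others only have a vague set of format requirements. Either way, the urgent task now is to learn basic Word formatting. If you have a template, learn to use it; if not, learn to make one. Whatever you do, don't ambitiously pick a Word course that runs a dozen hours and start cramming. Producing a piece of academic junk good enough to graduate takes only half an hour of learning. I watched a short, practical Bilibili tutorial that gets you started in half an hour.

  • Produce academic junk: the easiest step. Like the Eight Immortals crossing the sea, everyone can use their own talents. Good luck graduating!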

Thesis Writing

Why I Wrote This Tutorial

In 2022, I finished my undergraduate degree. When I started writing my thesis, I embarrassingly realized that my command of Word was limited to basic functions like adjusting fonts and saving documents. I considered switching to LaTeX, but the thesis's paragraph formatting requirements were easier to adjust in Word. After a painful struggle, I finally completed the writing and defense of my thesis. To spare others the same trouble, I compiled relevant resources into a ready-to-use document for everyone's reference.

How to Write a Graduation Thesis in Word

Just as it takes three steps to put an elephant in a fridge, writing a graduation thesis in Word also requires three simple steps:

  1. Determine the Format Requirements of the Thesis: Colleges usually issue thesis formatting requirements (fonts and sizes for each heading level, formats for figure captions and citations, etc.). If you're lucky, they might even provide a thesis template (if so, jump to the next step). Unfortunately, my college did not issue standard format requirements and provided a chaotic and almost useless template. Out of desperation, I found the thesis format requirements for Peking University graduate students and created a template based on their guidelines. Feel free to use it, but I accept no responsibility for any consequences, including failing to graduate.

  2. Learn Word Formatting: At this stage, you either have a standard template provided by your college or just a vague set of formatting requirements. The priority now is to learn basic Word formatting skills: if you have a template, learn to use it; if not, learn to create one. There's no need to ambitiously start with a dozen-hour Word course. Half an hour of learning is enough to produce a thesis that gets you through graduation. I watched a concise, practical half-hour Bilibili tutorial that is ideal for a quick start.

  3. Produce Academic Work: The easiest step. Everyone has their own way, so unleash your creativity. Best wishes for a smooth graduation!

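Practical Toolbox

Download Tools

Design Tools

  • excalidraw: a hand-drawn-style drawing tool, great for diagrams in course reports or slides.
  • tldraw: a drawing tool suited to flowcharts, architecture diagrams, etc.
  • draw.io: a powerful yet clean online diagramming site supporting flowcharts, UML diagrams, architecture diagrams, prototypes, and more. It can save to OneDrive, Google Drive, and GitHub, and it also offers an offline client.
  • origamiway: step-by-step origami instructions.
  • thingiverse: 2D/3D design resources of all kinds; its downloadable STL files can be 3D-printed directly.
  • iconfont: the largest icon and illustration library in China, useful for development or for drawing system architecture diagrams.
  • turbosquid: buy all kinds of models.
  • flaticon: download free, high-quality icons.
  • Standard Map Service System: download official standard maps.
  • PlantUML: write UML diagrams quickly as code.

Programming

  • sqlfiddle: a simple online SQL playground.
  • sqlzoo: practice SQL statements online.
  • godbolt: a very handy compiler explorer. Write a snippet of C/C++, pick a compiler, and inspect the assembly it generates.
  • explainshell: ever been puzzled by a piece of shell code, and still lost after reading the man page for ages? Try this site!
  • regex101: a regular-expression debugging site that supports the matching rules of various programming languages.
  • typingtom: typing practice and speed tests for programmers.
  • wrk: a website load-testing tool.
  • gbmb: data unit conversion.
  • tools: a collection of online tools.
  • github1s: read GitHub code online in a web version of VS Code.
  • visualgo: an algorithm visualization site.
  • DataStructureVisual: a data structure visualization site.
  • Data Structure Visualizations: visualizations of data structures and algorithms.
  • learngitbranching: learn Git visually.
  • UnicodeCharacter: a Unicode character set site.

Learning Websites

Encyclopedia/Dictionary Websites

Communication Platforms

  • GitHub: the hosting platform for many open-source projects and the main place where many of them communicate; reading a project's issues can solve many problems.
  • StackExchange: a network of 181 Q&A communities, including Stack Overflow.
  • StackOverflow: a Q&A site for programming and IT questions.
  • Gitee: a code hosting platform similar to GitHub; look in a project's issues for answers to common problems.
  • Zhihu: a Q&A community similar to Quora where you can ask questions; some answers contain computer science knowledge.
  • Cnblogs: a knowledge-sharing community for developers with blog posts on common problems. Accuracy is not guaranteed, so use it with caution.
  • CSDN: blog posts on common problems. Accuracy is not guaranteed, so use it with caution.

Miscellaneous

  • tophub: a collection of trending news lists (aggregating Zhihu, Weibo, Baidu, WeChat, etc.).
  • feedly: the well-known RSS feed reader.
  • speedtest: an online network speed test.
  • public-apis: a list of public APIs.
  • numberempire: a tool for differentiating functions.
  • sustech-application: an experience-sharing site from Southern University of Science and Technology.
  • vim-adventures: an online game based on vim keyboard shortcuts.
  • vimsnake: play Snake with vim.
  • keybr: a site for learning touch typing.
  • Awesome C++: a curated list of awesome C/C++ frameworks, libraries, and resources.
  • HelloGitHub: shares interesting, beginner-level open-source projects on GitHub.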

Practical Toolbox

Download Tools

Design Tools

  • excalidraw: A hand-drawn style drawing tool, great for creating diagrams in course reports or PPTs.
  • tldraw: A drawing tool suitable for flowcharts, architecture diagrams, etc.
  • draw.io: A powerful yet clean online diagramming site that supports flowcharts, UML diagrams, architecture diagrams, prototypes, etc. It can save to OneDrive, Google Drive, and GitHub, and it also offers an offline client.
  • origamiway: Step-by-step origami tutorials.
  • thingiverse: Includes various 2D/3D design resources, with STL files ready for 3D printing.
  • iconfont: The largest icon and illustration library in China, useful for development or drawing system architecture diagrams.
  • turbosquid: A platform to purchase various models.
  • flaticon: A site to download free and high-quality icons.
  • Standard Map Service System: Official standard map downloads.
  • PlantUML: Quickly write UML diagrams using code.

Programming

  • sqlfiddle: An easy-to-use online SQL Playground.
  • sqlzoo: Practice SQL statements online.
  • godbolt: A convenient compiler exploration tool. Write some C/C++ code, choose a compiler, and observe the specific assembly code generated.
  • explainshell: Struggling with the meaning of a shell command? Try this site!
  • regex101: A regex debugging site supporting various programming language standards.
  • typingtom: Typing practice/speed test site for programmers.
  • wrk: Website stress testing tool.
  • gbmb: Data unit conversion tool.
  • tools: A collection of online tools.
  • github1s: Read GitHub code online with a web-based VS Code.
  • visualgo: Algorithm visualization website.
  • DataStructureVisual: Data structure visualization website.
  • Data Structure Visualizations: Visualization website for data structures and algorithms.
  • learngitbranching: Learn Git branching visually.
  • UnicodeCharacter: Unicode character set website.

Learning Websites

Encyclopedia/Dictionary Websites

Communication Platforms

  • GitHub: The hosting platform for many open-source projects and a main communication channel for them; searching a project's issues can solve many problems.
  • StackExchange: A network of 181 Q&A communities (including Stack Overflow).
  • StackOverflow: A Q&A site for programming questions.
  • Gitee: A code hosting platform similar to GitHub, where you can find solutions to common questions in the issues of corresponding projects.
  • Zhihu: A Q&A community similar to Quora, where you can ask questions; some answers cover computer science.
  • Cnblogs: A knowledge-sharing community for developers, containing blogs on common questions. Accuracy is not guaranteed, please use with caution.
  • CSDN: Contains blogs on common questions. Accuracy is not guaranteed, please use with caution.

Miscellaneous

  • tophub: A collection of trending news headlines (aggregating from Zhihu, Weibo, Baidu, WeChat, etc.).
  • feedly: A famous RSS feed reader.
  • speedtest: An online network speed testing website.
  • public-apis: A collective list of free APIs for development.
  • numberempire: A tool for calculating derivatives of functions.
  • sustech-application: Southern University of Science and Technology experience sharing website.
  • vim-adventures: An online game based on vim keyboard shortcuts.
  • vimsnake: Play the snake game using vim commands.
  • keybr: A website for learning touch typing.
  • Awesome C++: A curated list of awesome C/C++ frameworks, libraries, resources.
  • HelloGitHub: Shares interesting and beginner-friendly open-source projects on GitHub.

Notes Workflow

Contributed by @HardwayLinka

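Knowledge in computing covers a broad range and changes quickly, so a habit of lifelong learning matters. In daily development and study, however, our sources of knowledge are varied and fragmented. Some are documentation manuals hundreds of pages long and some are blog posts of a few lines. Even a news item or a WeChat official-account article we scroll past on our phones may contain knowledge we care about. So it is especially important to use existing tools to build a learning workflow that suits you. The workflow should pull these fragments from different sources into your own knowledge base, where you can look them up and review them later. After two years of studying outside work, I have settled into the following learning workflow:

Core Logic

When I first learned something new, I consulted Chinese blogs, but when I put the code into practice I often found gaps and bugs. I gradually realized the information I relied on might be wrong. The bar for publishing a blog post is low, and articles are not very reliable. So I started reading related Chinese books.

Chinese books do explain topics comprehensively and systematically. However, computer technology moves fast and the US has long been the beacon of CS, so the content of Chinese books generally lags behind the latest knowledge. Following along with them, I ran into software version mismatches. That is when I began to realize the importance of first-hand information. Some Chinese books are translations of English books. Translating a book usually takes a year or two, which delays the information, and some information is lost in translation. Even a Chinese book that is not a translation has most likely drawn on other books, and in the process the author may misread the meaning of the English originals.

So I naturally started reading English books, and I have to say their quality is generally higher than that of Chinese books. As my studies went deeper, I judged sources by timeliness and completeness and found that source code > official documentation > English books > English blogs > Chinese blogs. This led me to draw an "information loss chart".

First-hand information matters, but the N-th-hand information further down the chain is not worthless. It contains the author's transformation of the source knowledge. Some authors organize the material along a logic (flowcharts, mind maps); others add their own understanding (abstractions, analogies, extensions to other topics). These transformations help us grasp and consolidate the core of the material faster, much like the study guides we used in middle and high school. Exchanging ideas with others is also very important while learning, and N-th-hand material works as a conversation with other authors, letting us draw on many perspectives. So when you learn a topic, prefer higher-quality sources with less information loss, and also consult several sources to make your understanding more complete and accurate.

Learning in real work and life rarely progresses from shallow to deep around a single topic, as it does at school. You constantly run into other topics along the way: new jargon, a classic paper you haven't read, unfamiliar code, and so on. This requires us to think diligently, get to the bottom of things by learning "recursively", and build connections between topics.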

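Choosing the Right Note-taking Software

The workflow's skeleton rests on two core ideas: use multiple references for each topic, and ask questions diligently to connect topics to one another. Academic papers actually follow the same logic: they usually have footnotes explaining key terms and a list of references at the end. Our everyday notes are much more casual, though, so we need a more flexible approach.

When writing code, I'm used to jumping to definitions in the IDE with one click, which ties related functions and implementations together nicely. You might wish notes could be navigated like code too. Bidirectional-link note-taking apps solve exactly this pain point, for example Roam Research, Logseq, Notion, and Obsidian. Roam Research and Logseq are both outline-based, and the outline structure is what put me off them. Outline-style notes tend to grow very long vertically, and deep nesting eats up horizontal space. Notion opens pages slowly, so I dropped it. In the end I chose Obsidian, for the following reasons:

  • Obsidian works on local files, opens quickly, and can store many e-books. My laptop is a first-generation ASUS Tianxuan with 32 GB of RAM, and Obsidian runs blazingly fast on it.
  • Obsidian is based on Markdown. This is also an advantage. If a note app stores notes in its own proprietary format, third-party extensions are harder to build and the notes can't easily be opened in other software. Songs downloaded from QQ Music, for example, use their own format that other players can't play, which is really annoying.
  • Obsidian has a rich plugin ecosystem that is both large and active: there are many plugins, popular ones have many stars, developers respond to user issues, and new versions keep coming. With these plugins, Obsidian can become an all-in-one tool that brings all kinds of knowledge sources together in one place.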

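Information Sources

Plugins let Obsidian support PDF, and it supports Markdown natively. For an all-in-one setup, we can build on these two formats and convert files in other formats to PDF or Markdown. That raises two questions:

  • What formats are there?
  • How to convert them to PDF or Markdown?

What Formats Are There

File formats depend on the platform that presents them, so before looking at formats, let me list where I usually get information.

They fall into four categories: articles, papers, e-books, and courses. The main formats are web pages, PDF, MOBI, AZW, and AZW3.

How to Convert to PDF or Markdown

Online articles and courses are mostly web pages, and a web clipper can turn a web article into files in various text formats, Markdown included. The tool I chose is SimpRead, which clips articles from almost every platform into Markdown and imports them into Obsidian.

For papers and e-books, if the format is already PDF, all is well; otherwise, calibre can convert them.

Obsidian's PDF plugin and native Markdown support now let you take notes freely. They also let you link seamlessly to the corresponding sections of these documents (for details, see the "Information Processing" section below).

How to Manage Information Sources in One Place

File resources such as PDFs can be stored locally or in the cloud. Web pages can be sorted into browser bookmark folders or clipped into Markdown notes. However, desktop browser bookmarks don't capture pages found on a phone. To bookmark across devices I use Cubox: when I see an interesting page on my phone, a single swipe saves it to the same place. The free version only holds 100 pages, but that is actually enough. When it fills up, it pushes me to clip and digest those pages quickly so they don't just gather dust.

Also consider the pages we usually bookmark. Many of them are not on full-featured blog platforms like Zhihu or Juejin but on small personal sites. These sites often have no mobile app, so we never see them while browsing on our phones. Put in browser bookmarks, they are easy to forget, and we aren't notified when a new article is published. This is where a protocol called RSS comes in.

RSS (RDF Site Summary or Really Simple Syndication) is a web feed format specification. It aggregates updates from multiple websites and automatically notifies subscribers. On the desktop, RSSHub Radar helps you quickly discover and generate RSS feeds, which you can then subscribe to in Feedly (both RSSHub Radar and Feedly have official Chrome extensions).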

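At this point, the information-gathering process is fairly complete. But however much material you have and however neatly it's categorized, it only helps once you truly internalize it. So after collecting information, you need to process it further. Read it, and work out its meaning if it's in English. Bold and highlight key sentences and paragraphs, mark questions, and brainstorm related topics. Finally, write your own summary. What tools does this process need?

Information Processing

English Sources

For English material, I used to rely on three tools: Youdao Dictionary for words, Google Translate for sentences, and DeepL for long paragraphs. Over time I found this made reading English literature too slow. It would be better if one tool could translate selected words, sentences, and paragraphs alike. Graduate students must read English papers all the time, so I searched for "graduate student + translation software". From the results I chose the Quicker + Saladict combination for select-to-translate.

This combination translates selected text even in software outside the browser. It handles words, sentences, and paragraphs and shows results from several translation services each time. By the way, if you're not in a hurry when looking up a word, check the Collins Advanced dictionary entry. It explains English in English and gives multiple contexts to aid understanding. That also helps with vocabulary, since explaining English in English is closer to thinking in English.

Multimedia Information

Once text is handled, we also need to think about multimedia. By multimedia I specifically mean English videos: I don't learn from podcasts or recordings, and I hardly watch Chinese tutorials anymore. Many open courses from top universities abroad are videos, so would it help to take notes on them? I often wished the lecturer's words could be turned into text, because we usually read faster than a teacher speaks. As it happens, Language Reactor can export the subtitles of YouTube and Netflix videos, together with Chinese translations.

We can copy the subtitles exported by Language Reactor into Obsidian and read them as an article. Beyond study, you can also turn on this extension whenever you watch YouTube. It shows Chinese and English subtitles at the same time, and clicking an unfamiliar word in the English subtitles shows its definition.

But for some abstract concepts, reading text isn't the most efficient way to learn. As the saying goes, a picture is worth a thousand words. Could we link a passage of text to the corresponding image, or even to the on-screen actions in a video? While browsing Obsidian's plugin marketplace, I found a plugin called Media Extended. It lets you add links in your notes that jump to a specific time in a video, effectively connecting your notes to the video! This pairs perfectly with the bilingual subtitles mentioned above: each subtitle line has a time, and you can jump to that point in the video. So if an article needs to show a video of some procedure, you don't have to clip the segment yourself; you can jump to it right from the article!

Obsidian has another very powerful plugin, Annotator, which lets you jump from a note to the original text in a PDF.

Obsidian's built-in bidirectional links let notes jump to one another, and the two plugins above let notes jump to multimedia, so information processing is complete. Learning is usually like climbing a mountain and coming back down. When you first learn something, it's the ascent, unfamiliar and strenuous. Reviewing and practicing, as in the saying "learn and review it often", is the descent: no longer unfamiliar, not necessarily easy, but it has to be walked. So how do we fit review into the workflow?

Information Review

Obsidian already has a plugin that connects to Anki, the famous spaced-repetition memory software. With it, you can send excerpts of your notes to Anki as cards, and each card contains a link back to the original note.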

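Summary

This workflow took shape gradually over two years of learning in my spare time. I grew tired of certain repetitive steps, and that weariness turned into specific needs. Tools I came across while browsing the web happened to meet them. Don't force tools into your workflow for an empty sense of satisfaction; life is short, and getting real work done matters most.

By the way, this article explains how the workflow evolved. If you're interested in the implementation details, I suggest reading the following articles in order after finishing this one:

  1. A Learning Workflow Built from 3000+ Hours
  2. Advanced Obsidian Usage | Building Notes That Can Jump to Files in Any Format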
Notes Workflow

Contributed by @HardwayLinka

The field of computer science is vast and rapidly evolving, making lifelong learning crucial. However, our sources of knowledge in daily development and learning are complex and fragmented: long documentation manuals, brief blog posts, and even news items and WeChat official-account articles we scroll past on our phones may contain interesting knowledge. Therefore, it's vital to use various tools to create a learning workflow that suits you, integrating these knowledge fragments into your personal knowledge base for easy reference and review. After two years of learning alongside work, I have developed the following learning workflow:

Core Logic

Initially, when learning new knowledge, I referred to Chinese blogs but often found bugs and gaps in my code practice. Gradually, I realized that the information I referred to might be incorrect, as the threshold for posting blogs is low and their credibility is not high. So, I started consulting some related Chinese books.

Chinese books indeed provide a comprehensive and systematic explanation of concepts. However, computer technology evolves rapidly and the US has long led the field, so content in Chinese books often lags behind the latest knowledge, and following them I ran into software version mismatches. This led me to realize the importance of firsthand information. Some Chinese books are translations of English ones. Translating a book can take a year or two, which delays the information, and some of it is lost in translation. If a Chinese book is not a translation, it likely draws on other books, which introduces biases in how the original English text is interpreted.

Therefore, I naturally started reading English books, whose quality is generally higher than that of Chinese ones. As I delved deeper and judged sources by timeliness and completeness, I arrived at this ranking: source code > official documentation > English books > English blogs > Chinese blogs. This led me to draw an "information loss chart".

Although firsthand information is crucial, subsequent iterations (N-th hand information) are not useless. They include the author's transformation of the source knowledge — such as logical organization (flow charts, mind maps) or personal interpretations (abstractions, analogies, extensions to other knowledge points). These transformations can help us quickly grasp and consolidate core knowledge, like using guidebooks in school. Moreover, interacting with others' interpretations during learning is important, allowing us to benefit from various perspectives. Hence, it's advisable to first choose high-quality, less distorted sources of information while also considering multiple sources for a more comprehensive and accurate understanding.

In real-life work and study, learning rarely follows a linear, deep dive into a single topic. Often, it involves other knowledge points, such as new jargon, classic papers not yet read, or unfamiliar code snippets. This requires us to think deeply and "recursively" learn, establishing connections between multiple knowledge points.

Choosing the Right Note-taking Software

The backbone of the workflow is built around the core logic of "multiple references for a single knowledge point and building connections among various points." This is similar to writing academic papers. Papers usually have footnotes explaining keywords and multiple references at the end. But our daily notes are much more casual, hence the need for a more flexible method.

I'm accustomed to jumping to related functions and implementations in an IDE, and it would be great if notes could be interlinked like code. Bidirectional-link ("double-link") note-taking apps such as Roam Research, Logseq, Notion, and Obsidian address this need. I chose Obsidian for the following reasons:

  • Obsidian works with local files, opens quickly, and can store many e-books. My laptop, a first-generation ASUS Tianxuan with 32 GB of RAM, runs Obsidian very smoothly.
  • Obsidian is Markdown-based. This is an advantage because if a note-taking software uses a proprietary format, it's inconvenient for third-party extensions and opening notes with other software.
  • Obsidian has a rich and active plugin ecosystem, allowing for an "all in one" effect, meaning various knowledge sources can be integrated in one place.

Information Sources

With plugins, Obsidian supports PDF, and it supports Markdown natively. To achieve "all in one," you can convert other file formats to PDF or Markdown. This presents two questions:

  • What formats are there?
  • How to convert them to PDF or Markdown?

Formats

File formats depend on their display platforms. Before considering formats, let's list the sources of information I usually access:

![](https://cdn.sspai.com/2022/10/11/07e97f372850054958d4961a3787a93f.png)

The main categories are articles, papers, e-books, and courses, primarily including formats like web pages, PDFs, MOBI, AZW, and AZW3.

Conversion to PDF or Markdown

Online articles and courses are mostly presented as web pages. To convert web pages to Markdown, I use the web clipper SimpRead, which can clip articles from nearly all platforms into Markdown and import them into Obsidian.

For papers and e-books, if the format is already PDF, it's straightforward. Otherwise, I use calibre to convert them.
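As a rough sketch of that conversion step, here is a small Python wrapper. It assumes calibre is installed and that its `ebook-convert` command-line tool is on `PATH`. The helper names (`build_convert_command`, `convert`) are hypothetical, and `ebook-convert` picks the input and output formats from the file extensions.

```python
import shutil
import subprocess
from pathlib import Path

# E-book formats from the list above that need converting before import into Obsidian.
CONVERTIBLE = {".mobi", ".azw", ".azw3"}


def build_convert_command(src: Path, out_dir: Path) -> list[str]:
    """Return an ebook-convert command line that turns src into a PDF in out_dir."""
    if src.suffix.lower() not in CONVERTIBLE:
        raise ValueError(f"unsupported format: {src.suffix}")
    dst = out_dir / (src.stem + ".pdf")
    # ebook-convert infers the input and output formats from the extensions.
    return ["ebook-convert", str(src), str(dst)]


def convert(src: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path of the generated PDF."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; install calibre first")
    cmd = build_convert_command(src, out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[2])
```

For example, `convert(Path("downloads/book.azw3"), Path("vault/books"))` would place `book.pdf` inside a (hypothetical) Obsidian vault folder. Keeping command construction separate from execution makes the command easy to inspect before anything runs.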

Now, using Obsidian's PDF plugin and native Markdown support, I can seamlessly take notes and reference across these documents (see "Information Processing" below for details).

Managing Information Sources

For file resources like PDFs, I use local or cloud storage. For web resources, I categorize and save them in browser bookmarks or clip them into Markdown notes. However, desktop browser bookmarks don't capture pages I come across on my phone. To bookmark across devices, I use Cubox: with a swipe on my phone, I can save interesting web pages in one place. The free version is limited to 100 bookmarks, but that's usually enough. When it fills up, it prompts me to clip and digest those pages instead of letting them gather dust.

Moreover, many of the web pages we bookmark are not from fully-featured blog platforms like Zhihu or Juejin but personal sites without mobile apps. These can be easily overlooked in browser bookmarks, and we might miss new article notifications. Here, RSS comes into play.

RSS (RDF Site Summary or Really Simple Syndication) is a web feed format that aggregates updates from multiple websites and automatically notifies subscribers. On desktops, RSSHub Radar helps discover and generate RSS feeds, which can be subscribed to in Feedly (both have official Chrome extensions).
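To show what a feed reader does with an RSS feed, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes an RSS 2.0 feed, where each post is an `<item>` with `<title>`, `<link>` and an optional `<pubDate>`. Atom feeds use different, namespaced elements and are not handled. The sample feed and helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hand-written RSS 2.0 sample; a real reader would download this XML.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 02 Oct 2023 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> list[dict[str, str]]:
    """Extract title, link and (optional) pubDate from every <item>."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


def unseen(items: list[dict[str, str]], seen_links: set[str]) -> list[dict[str, str]]:
    """Keep only entries whose link was not seen on a previous poll."""
    return [item for item in items if item["link"] not in seen_links]
```

Polling a feed periodically and keeping only `unseen` entries is roughly how new-article notifications arise, though real readers such as Feedly also deduplicate by GUID and handle many more edge cases.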

With this, the information collection process is comprehensive. But no matter how well categorized, information needs to be internalized to be useful. After collecting information, the next step is processing it — reading, understanding the semantics (especially for English sources), highlighting key sentences or paragraphs, noting queries, brainstorming related knowledge points, and writing summaries. What tools are needed for this process?

Information Processing

English Sources

For English materials, I initially used Youdao Dictionary for words, Google Translate for sentences, and DeepL for paragraphs. Eventually, I realized this was too slow: three tools for a single need. Ideally, one tool would handle word, sentence, and paragraph translation. After searching for the tools graduate students use, I chose Quicker + Saladict.

This combo allows translation outside browsers and supports words, sentences, and paragraphs, offering results from multiple translation platforms. For non-urgent word lookups, the "Collins Advanced" dictionary is helpful as it explains English words in English, providing context to aid understanding.

Multimedia Information

After processing text-based information, it's important to consider how to handle multimedia information. Specifically, I'm referring to English videos, as I don't have a habit of learning through podcasts or recordings and I rarely watch Chinese tutorials anymore. Many renowned universities offer open courses in video format. Wouldn't it be helpful if you could take notes on these videos? Have you ever thought it would be great if you could convert the content of a lecture into text, since we usually read faster than a lecturer speaks? Fortunately, the software Language Reactor can export subtitles from YouTube and Netflix videos, along with Chinese translations.

We can copy the subtitles exported by Language Reactor into Obsidian and read them as articles. Besides learning purposes, you can also use this plugin while watching YouTube videos. It displays subtitles in both English and Chinese, and you can click on unfamiliar words in the subtitles to see their definitions.

However, reading texts isn't always the most efficient way to learn about some abstract concepts. As the saying goes, "A picture is worth a thousand words." What if we could link a segment of text to corresponding images or even video operations? While browsing the Obsidian plugin marketplace, I discovered a plugin called Media Extended. This plugin allows you to add links in your notes that jump to specific times in a video, effectively connecting your notes to the video! This works well with the video subtitles mentioned earlier, where each line of subtitles corresponds to a time stamp, allowing for jumps to specific parts of the video. This means you don't have to cut specific video segments; instead, you can jump directly within the article!
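Turning a subtitle line into a timestamp link takes two steps: parse the subtitle timestamp, then build the link. The sketch below handles SRT (`00:01:23,456`) and WebVTT (`00:01:23.456`) timestamps. It assumes Media Extended accepts an Obsidian wikilink with a W3C Media Fragments style `#t=<seconds>` suffix. Treat that link syntax as an assumption and check it against the plugin's own documentation. `timestamp_link` is a hypothetical helper.

```python
import re

# SRT timestamps look like 00:01:23,456; WebVTT uses a dot: 00:01:23.456.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(ts: str) -> float:
    """Convert an SRT/VTT timestamp to seconds."""
    match = TIMESTAMP.fullmatch(ts.strip())
    if match is None:
        raise ValueError(f"not a subtitle timestamp: {ts!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def timestamp_link(video: str, ts: str, label: str) -> str:
    """Build a wikilink with a #t=<seconds> suffix (assumed link syntax)."""
    # Keep millisecond precision but drop trailing zeros: 120.000 -> 120.
    secs = f"{to_seconds(ts):.3f}".rstrip("0").rstrip(".")
    return f"[[{video}#t={secs}|{label}]]"
```

For example, `timestamp_link("lecture01.mp4", "00:02:00,000", "LRU demo")` yields `[[lecture01.mp4#t=120|LRU demo]]`, so each exported subtitle line can become a clickable jump into the video.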

Obsidian also has a powerful plugin called Annotator, which allows you to jump from notes to the corresponding section in a PDF.

Now, Obsidian's built-in bidirectional links let notes link to each other, and the plugins above extend these links to multimedia. This completes the information-processing stage. Learning is like climbing a mountain and coming back down. The first pass is unfamiliar and strenuous. Review and practice are the descent: familiar but not necessarily easy, and they cannot be skipped. So, how can we incorporate the review process into this workflow?

Information Review

Obsidian already has a plugin that connects to Anki, the renowned spaced repetition-based memory software. With this plugin, you can export segments of your notes to Anki as flashcards, each containing a link back to the original note.
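To give a feel for what "spaced repetition" means when a card is reviewed, here is a simplified scheduler in the style of SuperMemo's SM-2. This is an illustrative assumption, not Anki's actual scheduler: Anki uses a modified algorithm with extra rules. Each review is graded 0 to 5. Successful reviews push the next review further out, and a failure restarts the sequence.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0   # consecutive successful reviews
    interval: int = 0      # days until the next review
    ease: float = 2.5      # SM-2 "easiness factor"


def review(card: Card, quality: int) -> Card:
    """Return the updated card after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval * card.ease)
        repetitions = card.repetitions + 1
    else:
        # Forgotten: restart the sequence and review again tomorrow.
        repetitions, interval = 0, 1
    # Easy answers raise the ease factor, hard ones lower it (never below 1.3).
    ease = card.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(repetitions, interval, max(1.3, ease))
```

Three perfect reviews in a row schedule the card after 1, 6 and then 16 days, which is why a well-known card costs very little review time.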

Conclusion

This workflow evolved over two years of learning in my spare time. Frustration with repetitive processes led to specific needs, which were fortunately met by tools I discovered online. Don't force tools into your workflow just for the sake of satisfaction; life is short, so focus on what's truly important.

By the way, this article discusses the evolution of the workflow. If you're interested in the details of how this workflow is implemented, I recommend reading the following articles in order after this one:

  1. 3000+ Hours Accumulated Learning Workflow
  2. Advanced Techniques in Obsidian | Creating Notes that Link to Any File Format

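Information Retrieval

Introduction

When you run into a problem, the first thing to do is read the documentation. Don't start by searching or asking someone; reading the FAQ may quickly give you the answer.

Information retrieval, as I understand it, means using search engines flexibly to find the information you need quickly, for programming and beyond.

The most important skills in programming are STFW (search the fucking web) and RTFM (read the fucking manual): first read the documentation, then learn to search. There are so many resources online, and information retrieval is how you make use of them.

To search well, we first need to understand how search engines work:

How Search Engines Work

The work of a search engine can roughly be divided into three stages:[^1]

  1. Crawling and fetching: search engine spiders follow links to visit web pages and store the pages' HTML in a database.
  2. Preprocessing: the indexer processes the fetched pages (text extraction, Chinese word segmentation, indexing, etc.) so the ranking program can use them.
  3. Ranking: after the user enters keywords, the ranking program queries the index, computes relevance, and generates the search results page in a given format.

The first step is the web crawler you often hear about; paid Python courses love to hype it. Put simply, an automated program downloads the text, images, and other information on websites and stores it on local disk.

The second step is the core of a search engine, but it matters less to us as users. Roughly, it cleans the data and stores the pages in an index, attaching keywords and other information to each page so they can be queried.

The third step is what concerns us. Whatever the search site (Google, Baidu, Bing), you enter keywords or a query and the search engine returns results. This article teaches you how to get better results.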
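To make stage 1 concrete, here is a minimal single-threaded crawler sketch that uses only Python's standard library. `fetch` is a caller-supplied function (a hypothetical stand-in for an HTTP download), so the example runs offline. A real crawler would also download pages over HTTP, respect robots.txt (for example via `urllib.robotparser`), and throttle its requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from the <a href> tags of one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop "#fragment" parts.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        if url.startswith(("http://", "https://")) and url not in self.links:
            self.links.append(url)


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl(start: str, fetch, limit: int = 10) -> dict[str, str]:
    """Breadth-first crawl; fetch(url) -> html is supplied by the caller."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue and len(pages) < limit:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html  # stage 1 output: raw HTML kept for the indexer
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The `seen` set keeps the spider from revisiting pages that link to each other, and `limit` bounds the crawl. Both are simplifications of the politeness and scheduling logic that production crawlers need.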

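Basic Search Techniques

Given how search engines work, we can treat a search engine as a fairly smart database: the better your query conditions, the faster you find what you want. Here are some search techniques:

Use English

First, know that for programming it's best to search in English, for several reasons:

  1. In programming and software operation, English material is of higher quality than Chinese material or material in other languages, and English is more universal.
  2. Because of translation issues, English terms are more accurate and standard than Chinese ones.
  3. In Chinese searches, inaccurate word segmentation causes ambiguity; for example, a Chinese search on Google may return only a few useful results.

If your English is weak, Baidu Translate or Sogou Translate is good enough.

Of course, for convenience the examples below still use Chinese.

Refine Keywords

Don't search for a whole sentence. Search engines segment the query automatically, but whole-sentence and keyword searches still give very different results in accuracy and ordering. A search engine is a machine, not your teacher or colleague. As the process above shows, a search actually queries the database the engine has crawled, so keywords are faster and more accurate than fuzzy matching.

We need to distill the question and pin down exactly what problem we need to solve.

For example, I might want to know how to integrate vcpkg into a project rather than globally. A long query such as vcpkg如何集成到工程上而不是全局中 ("how to integrate vcpkg into a project rather than globally") may find nothing relevant. It's better to split it into words: vcpkg 集成到 工程 全局 ("vcpkg integrate-into project global"). This is only an illustration, and this particular query does return relevant results. But the more specific the question, the more likely the machine's word segmentation is to go wrong, so split it into keywords and search with phrases or fragments.

Replace Keywords

Continuing the example above: if nothing turns up, try replacing 工程 with 项目 (both mean "project"), or drop 集成 ("integrate"). If that still fails, try advanced search.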

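Advanced Search

Most general-purpose search engines support advanced search, including Google, Bing, Baidu, and Ecosia, although the syntax may differ. The common operators are:

  • Exact match: guarantees that the search terms are matched exactly; usually written by wrapping them in double quotes
  • For example, to search for 线性代数 (linear algebra), type "线性代数" in the search box. The engine will only match pages containing the full phrase "线性代数", not pages where it is split into the two words 线性 (linear) and 代数 (algebra)
  • Exclude keywords: prefix a keyword with a minus sign (-) to exclude distracting terms
  • Include keywords: join keywords with a plus sign (+)
  • Specific file type: filetype:pdf searches for PDF files directly
  • Specific site: site:stackoverflow.com searches only pages within that site

For details, see each engine's documentation: Advanced Search for Baidu, and the advanced search keywords and advanced search options pages for Bing.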
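The operators above can be combined mechanically. The following sketch assembles a query string from them and percent-encodes it into a search URL. It assumes Bing's `https://www.bing.com/search?q=` URL form. `build_query` and `bing_url` are hypothetical helpers, and operator support varies between engines.

```python
from urllib.parse import urlencode


def build_query(keywords, exact=None, exclude=(), site=None, filetype=None):
    """Combine plain keywords with the common advanced-search operators."""
    parts = list(keywords)
    if exact:
        parts.append(f'"{exact}"')              # exact-phrase match
    parts += [f"-{word}" for word in exclude]   # exclude distracting words
    if site:
        parts.append(f"site:{site}")            # only pages on this site
    if filetype:
        parts.append(f"filetype:{filetype}")    # only this file type
    return " ".join(parts)


def bing_url(query: str) -> str:
    """Percent-encode the query into a Bing search URL."""
    return "https://www.bing.com/search?" + urlencode({"q": query})
```

For instance, `build_query(["vcpkg", "integrate", "project"], exclude=["global"], site="stackoverflow.com")` produces `vcpkg integrate project -global site:stackoverflow.com`. The same string can also be typed directly into the search box.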

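GitHub Advanced Search

You can use the advanced search page directly, or write queries following the GitHub query syntax. A few examples:

  • in:name <keyword>: repositories whose name contains the keyword
  • in:description <keyword>: repositories whose description contains the keyword
  • in:readme <keyword>: repositories whose README contains the keyword
  • stars(forks):>(=)<number> <keyword>: repositories with more than (or at least) the given number of stars or forks
  • stars(forks):10..20 <keyword>: repositories with 10 to 20 stars or forks
  • size:>=5000 <keyword>: repositories of at least 5000 KB
  • pushed(created):>2019-11-15 <keyword>: repositories last pushed to (or created) after 2019-11-15
  • license:apache-2.0 <keyword>: repositories licensed under Apache-2.0
  • language:java <keyword>: repositories whose language is Java
  • user:<username>: repositories of a given user
  • org:<org-name>: repositories of a given organization

These qualifiers can be combined. You can also start by finding an awesome repository for a topic and look for related resources in it. GitHub has many curated collection repositories, and checking existing collections first can sometimes save a lot of time.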
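The same qualifiers also work programmatically. The sketch below joins a keyword and qualifiers into one `q` string and builds a URL for GitHub's REST repository-search endpoint (`https://api.github.com/search/repositories`, with `sort` and `order` parameters). Only a subset of qualifiers is covered, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

SEARCH_API = "https://api.github.com/search/repositories"


def repo_query(keyword, *, in_field=None, language=None, min_stars=None,
               pushed_after=None, license_id=None, user=None):
    """Join a keyword with the qualifiers listed above into one q= string."""
    terms = [keyword]
    if in_field:
        terms.append(f"in:{in_field}")            # name, description or readme
    if language:
        terms.append(f"language:{language}")
    if min_stars is not None:
        terms.append(f"stars:>={min_stars}")
    if pushed_after:
        terms.append(f"pushed:>{pushed_after}")   # date as YYYY-MM-DD
    if license_id:
        terms.append(f"license:{license_id}")
    if user:
        terms.append(f"user:{user}")
    return " ".join(terms)


def search_url(query: str, sort: str = "stars") -> str:
    """URL for the REST search endpoint, most-starred results first."""
    return f"{SEARCH_API}?{urlencode({'q': query, 'sort': sort, 'order': 'desc'})}"
```

To actually run the search, one could pass the URL to `urllib.request.urlopen` and read the JSON `items` field. Note that unauthenticated requests are subject to a lower rate limit than authenticated ones.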

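More Tips

In practice, I go to specific sites for certain kinds of questions:

  • For questions about the language itself, e.g. how to implement some feature in C++/Qt/OpenGL, just add site:stackoverflow.com
  • For questions about a specific business domain, development environment, or piece of software, look first in the bug list, the issue list, or the relevant forum. For example, take Qt questions straight to the Qt forum, and search Stack Exchange for QGIS or GDAL questions
  • QQ groups are another place to ask, but your question must be meaningful or most people won't reply, and replies in QQ groups are slow.
  • Zhihu columns, Jianshu, Cnblogs, and CSDN hold lots of Chinese notes. These are pre-chewed material, mostly other people's accounts of the pitfalls they fell into

About Baidu

Most programmers will tell you not to use Baidu and to use Google or the international version of Bing instead. But Bing isn't very accurate for Chinese searches, and Google requires a proxy. If you really need alternatives, try search engines such as Ecosia or Yandex. And for Chinese searches, Baidu may genuinely be the best.

Baidu's main problem is its ranking algorithm: two whole pages of results may contain nothing useful. Still, its coverage is somewhat better than Bing's, and sometimes even better than Google's. (Baidu used to ignore robots.txt and crawl every page, so some personal sites even blocked Baidu specifically.) Baidu also indexes more Chinese content than Google and Bing do. If your problem is Chinese-related and you really can't find anything, use Baidu. A search engine is a tool; what matters is using it well.

Code Search

Besides looking up problems with search engines, we may also need to search for code, either our own or code in a project. Here are some tools.

There are two kinds of code search: searching local code, and searching the web when you need to implement some algorithm.

Local Code Search

  • ack or ack2: a veteran search tool written in Perl
  • The Silver Searcher: written in C
  • The Platinum Searcher: written in Go
  • FreeCommander: its built-in search is reasonably fast on an SSD
  • IDE built-in search: sometimes not very convenient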
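The core idea behind tools like ack and The Silver Searcher fits in a few lines of Python. The sketch below walks a directory tree, skips version-control and dependency folders, and prints every line matching a regular expression. It deliberately omits .gitignore handling and the speed optimizations that make the real tools fast. `search_code` and the default extension list are illustrative choices.

```python
import re
from pathlib import Path

# Folders to skip, in the spirit of ack/ag ignoring version-control directories.
SKIP_DIRS = {".git", ".svn", "node_modules"}


def search_code(root, pattern, exts=(".c", ".cpp", ".h", ".hpp", ".py")):
    """Yield (path, line_number, line) for each line matching the regex."""
    regex = re.compile(pattern)
    root = Path(root)
    for path in sorted(root.rglob("*")):
        # Check only the part of the path below root for skipped folders.
        if any(part in SKIP_DIRS for part in path.relative_to(root).parts):
            continue
        if not path.is_file() or path.suffix not in exts:
            continue
        with path.open(encoding="utf-8", errors="ignore") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")
```

Usage: `for path, n, line in search_code(".", r"\bvcpkg\b"): print(f"{path}:{n}: {line}")` prints matches in the familiar `file:line: text` format.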

Open-Source Code Search

Information Retrieval

Introduction

When encountering a problem, remember the first thing is to read the documentation. Don't start by searching online or asking others directly. Reviewing FAQs may quickly provide the answer.

Information retrieval, as I understand it, is essentially about skillfully using search engines to quickly find the information you need, including but not limited to programming.

The most important thing in programming is STFW (search the fucking web) and RTFM (read the fucking manual). First, you should read the documentation, and second, learn to search. With so many resources online, how you use them depends on your information retrieval skills.

To understand how to search effectively, we first need to understand how search engines work.

How Search Engines Work

The working process of a search engine can generally be divided into three stages:[^1]

  1. Crawling and Fetching: Search engine spiders visit web pages by tracking links, obtain the HTML code of the pages, and store it in a database.
  2. Preprocessing: The indexing program processes the fetched web page data by extracting text, segmenting Chinese words, indexing, etc., preparing for the ranking program.
  3. Ranking: When users enter keywords, the ranking program uses the indexed data to calculate relevance and then generates the search results page in a specific format.

The first step involves web crawlers, which paid Python courses love to hype. Put simply, an automated program downloads all text, images, and related information from websites and stores them locally.

The second step is the core of a search engine, but not critical for users to understand. It can be roughly understood as cleaning data and indexing pages, each with keywords for easy querying.

The third step is closely related to us. Whether it's Google, Baidu, Bing, or others, you input keywords or queries, and the search engine returns results. This article teaches you how to obtain better results.
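Stages 2 and 3 can be illustrated with a toy inverted index. Preprocessing maps each term to the pages containing it, and ranking scores pages for a query. The TF-IDF-style score below is an assumption chosen for simplicity. Real engines combine many more signals (links, freshness, user behavior) and use proper word segmentation, especially for Chinese.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; real engines add language-specific segmentation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, int]]:
    """Stage 2 (preprocessing): map each term to {page_id: term frequency}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][page_id] = tf
    return index


def rank(index: dict[str, dict[str, int]], n_pages: int, query: str) -> list[str]:
    """Stage 3 (ranking): order pages by a simple TF-IDF sum over query terms."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms get a higher weight (inverse document frequency).
        idf = math.log(n_pages / len(postings)) + 1.0
        for page_id, tf in postings.items():
            scores[page_id] += tf * idf
    return [page_id for page_id, _ in scores.most_common()]
```

This also shows why precise keywords help. A page matching a rare, specific query term ("project" below) outranks pages that only share a common one ("vcpkg").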

Basic Search Techniques

Given this workflow, a search engine can roughly be treated as a smart database: the better your query conditions, the faster you will find the information you need. Here are some search techniques:

Use English

First, it's important to know that in programming, it's best to search in English. Reasons include:

  1. For programming and software operations, English resources are generally of higher quality than those in Chinese or other languages.
  2. Because of translation inconsistencies, English terms are more precise and more universally used than their Chinese counterparts.
  3. Chinese word segmentation in search engines can introduce ambiguity; for example, Google searches in Chinese often return fewer useful results.

If your English is not strong, translation tools such as Baidu Translate or Sogou Translate are sufficient.

Refine Keywords

Don't search with whole sentences. Search engines do segment queries automatically, but a whole sentence and a set of keywords can still produce noticeably different results, in both accuracy and ordering. A search engine is a machine, not your teacher or colleague. As explained above, searching is really querying the database the engine has crawled, so break your question down into keywords or phrases.

For example, suppose you want to know how to integrate vcpkg into a single project instead of globally. Searching for the full sentence "如何将vcpkg集成到项目中而不是全局" ("how to integrate vcpkg into a project instead of globally") may not yield relevant results. It's better to break it down into keywords such as "vcpkg 集成 项目 全局" ("vcpkg integrate project global").
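The effect of filler words can be shown with an even cruder toy model, assumed here purely for illustration: a matcher that requires every query term to appear in a document. Real engines rank results rather than strictly filtering them, but extra words can still skew the ranking. The example uses an English analogue of the vcpkg query, and the documents are made up.

```python
import re


def terms(text: str) -> set[str]:
    # Toy tokenizer: the set of lowercase words in the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def strict_match(docs: dict[str, str], query: str) -> list[str]:
    # A document matches only if it contains *every* query term.
    wanted = terms(query)
    return sorted(doc_id for doc_id, text in docs.items() if wanted <= terms(text))


docs = {
    "answer": "Use vcpkg manifest mode to integrate dependencies per project, not globally.",
    "other": "How to install Python on Windows",
}
sentence = "how do I integrate vcpkg into a project instead of globally"
keywords = "vcpkg integrate project globally"
```

Here `strict_match(docs, sentence)` returns an empty list, because the relevant document never contains "how", "into", or "instead". `strict_match(docs, keywords)` returns `["answer"]`.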

Replace Keywords

If you still can't find what you're looking for, try replacing "项目" (project) with its synonym "工程", or drop "集成" (integrate). If that doesn't work either, try advanced search.

Advanced Searching

Most search engines support advanced searching, including Google, Bing, Baidu, Ecosia, etc. Common formats include:

  • Exact Match: Enclose the search term in quotes for precise matching.
  • Exclude Keywords: Use a minus sign (-) to exclude specific words.
  • Include Keywords: On engines that support it (e.g. Bing), use a plus sign (+) to require a keyword. Google dropped the + operator in 2011; use quotes there instead.
  • Search Specific File Types: Use filetype:pdf to search for PDF files directly.
  • Search Specific Websites: Use site:stackoverflow.com to search within a specific site.

Refer to the website instructions for specific syntax, such as Baidu Advanced Search or Bing Advanced Search Keywords.
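These operators are just text inside the query, so they can be assembled programmatically. The sketch below assumes Google's `https://www.google.com/search?q=...` URL form. Other engines use different URLs and support different subsets of these operators, so treat the helper names and the URL as illustrative.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrase=None, exclude=(), site=None, filetype=None) -> str:
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')            # exact match
    parts += [f"-{word}" for word in exclude]  # exclude keywords
    if site:
        parts.append(f"site:{site}")           # restrict to one website
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict to one file type
    return " ".join(parts)


def google_search_url(query: str) -> str:
    # urlencode percent-encodes quotes and colons and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `build_query(["vcpkg"], phrase="manifest mode", exclude=["cmake"], site="stackoverflow.com")` produces `vcpkg "manifest mode" -cmake site:stackoverflow.com`.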

Use GitHub's Advanced Search page or refer to GitHub's query syntax documentation for advanced searches on GitHub. You can filter by repository name, description, README, star count, fork count, size, update/creation date, license, language, user, and organization, and these filters can be used in combination.

More Tips

Depending on the context, I recommend these places for certain kinds of queries:

  • For language- or library-specific questions (e.g., C++/Qt/OpenGL), add site:stackoverflow.com.
  • For problems with a specific business or development environment or a particular piece of software, check the project's bug list, issue tracker, or relevant forums first.
  • QQ groups are also a place to ask questions, but make sure your question is well thought out.
  • Chinese platforms such as Zhihu, Jianshu, cnblogs (Blog Park), and CSDN have a wealth of Chinese notes and experience reports.

About Baidu

Many programmers advise against Baidu and prefer Google or Bing International. If Google is not an option for you, alternatives such as Ecosia or Yandex are also worth trying. For Chinese-language searches, however, Baidu may actually be the best choice because of its database and indexing policies.

Besides using search engines, you may also need to search code, whether your own or other projects'. Here are some recommended tools:

  • ack (including ack 2), a well-established code search tool written in Perl.
  • The Silver Searcher (ag), implemented in C.
  • The Platinum Searcher (pt), implemented in Go.
  • FreeCommander's built-in search, which is efficient on solid-state drives.
  • Your IDE's built-in search, though it is not always the most user-friendly.
  • Searchcode, a fast search engine for open-source code.
  • 一行代码, a useful Chinese code search tool.
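The dedicated tools above differ mainly in speed and in conveniences such as respecting ignore files, but the core idea fits in a few lines. Below is a minimal Python sketch, not a replacement for ag or pt. It assumes UTF-8 text files, and the extension filter is an arbitrary illustrative choice.

```python
import os
import re


def search_code(root: str, pattern: str, exts=(".c", ".h", ".cpp", ".hpp", ".py")):
    """Yield (path, line_number, line) for every line under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git and visit the rest in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```

Typical use is `for path, lineno, line in search_code(".", r"\bmalloc\("): print(f"{path}:{lineno}: {line}")`. This prints grep-style `file:line:` output.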

GFW

This link appears here purely as a random combination of binary bits and has nothing to do with me.

NJU OS: Operating System Design and Implementation

Course Introduction

  • University: Nanjing University
  • Prerequisites: Computer Architecture + Solid C programming skills
  • Programming Language: C
  • Course Difficulty: 🌟🌟🌟🌟
  • Estimated Study Time: 150 hours

I had long heard that Professor Yanyan Jiang's operating systems course at Nanjing University was excellent, and seeing it for myself confirmed it: this semester I watched his lectures on Bilibili and learned a great deal. Jiang is a young professor with extensive hands-on coding experience, and he teaches in a true hacker style. In class he often starts writing code in the terminal on the spot, and many important concepts come with vivid, straightforward code examples. What impressed me most was how he explained the design behind dynamic linking: he implemented a miniature executable format together with a set of binary tools. This answered many questions that had puzzled me for years.

Prof. Jiang starts from the view that "programs are state machines" and uses it to build a state-machine transition model for concurrent programs, the "root of all evil." On that basis, he covers common concurrency-control techniques and strategies for dealing with concurrency bugs. He then views the operating system as a set of objects (processes/threads, address spaces, files, devices, etc.) plus the APIs that operate on them (system calls). Through many practical examples, he shows how the operating system uses these objects to virtualize hardware resources and provide services to applications. In the final part, on persistence, he starts from 1-bit storage media and builds up various storage devices step by step. He then uses device drivers to abstract a set of interfaces that make file systems convenient to design and implement. I had taken many operating systems courses before, but this approach is truly one of a kind and gave me many fresh perspectives on system software.
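The "programs are state machines" view can be tried out in a few lines. The sketch below is illustrative and is not taken from the course. Each thread runs `x = x + 1` as two separate steps: a load into a private register, then a store. A state is the tuple (program counters, registers, shared x). A depth-first search over all interleavings collects every reachable final value of x.

```python
def step(state, tid):
    """Run one instruction of thread `tid`; return the next state, or None if it has finished."""
    pcs, regs, x = state
    pc = pcs[tid]
    if pc == 0:        # load:  t = x   (copy shared x into the thread's register)
        regs = regs[:tid] + (x,) + regs[tid + 1:]
    elif pc == 1:      # store: x = t + 1
        x = regs[tid] + 1
    else:
        return None    # thread already finished
    pcs = pcs[:tid] + (pc + 1,) + pcs[tid + 1:]
    return (pcs, regs, x)


def final_values(n_threads: int = 2) -> set[int]:
    """Explore every interleaving and return all possible final values of x."""
    init = ((0,) * n_threads, (0,) * n_threads, 0)
    seen, stack, finals = {init}, [init], set()
    while stack:
        state = stack.pop()
        successors = [s for s in (step(state, t) for t in range(n_threads)) if s is not None]
        if not successors:           # every thread has finished
            finals.add(state[2])
        for s in successors:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals
```

`final_values(2)` returns `{1, 2}`. The value 1 is the classic lost update: both threads load 0 before either one stores.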

Besides its innovative take on theory, the course places strong emphasis on practice, a hallmark of Prof. Jiang's teaching. Through lectures and programming assignments, he deliberately cultivates the ability to read source code and consult manuals, which are essential skills for any computing professional. While working on the fifth MiniLab, I read Microsoft's FAT file system manual carefully for the first time, which was a very valuable experience.
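Reading the FAT specification mostly means decoding fixed little-endian fields. The sketch below is an illustration, not the MiniLab's required code. It uses `struct` to parse a few BIOS Parameter Block fields of a FAT32 boot sector, then maps a cluster number to its first sector. The offsets follow Microsoft's FAT specification. FAT12/16 details, validation beyond the 0x55AA signature, and long file names are left out.

```python
import struct


def parse_fat32_bpb(sector: bytes) -> dict:
    """Decode selected BPB fields from a 512-byte FAT32 boot sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a boot sector: missing 0x55AA signature")
    # Offsets 11..20: BytsPerSec, SecPerClus, RsvdSecCnt, NumFATs, RootEntCnt, TotSec16.
    (bytes_per_sec, sec_per_clus, rsvd_sec_cnt,
     num_fats, root_ent_cnt, tot_sec16) = struct.unpack_from("<HBHBHH", sector, 11)
    # Offset 32: TotSec32; offset 36: FATSz32 (a FAT32-only field).
    tot_sec32, fat_sz32 = struct.unpack_from("<II", sector, 32)
    (root_clus,) = struct.unpack_from("<I", sector, 44)
    # Fixed root-directory region; 0 on FAT32 because RootEntCnt is 0.
    root_dir_sectors = (root_ent_cnt * 32 + bytes_per_sec - 1) // bytes_per_sec
    return {
        "bytes_per_sector": bytes_per_sec,
        "sectors_per_cluster": sec_per_clus,
        "reserved_sectors": rsvd_sec_cnt,
        "num_fats": num_fats,
        "fat_size_sectors": fat_sz32,
        "total_sectors": tot_sec16 or tot_sec32,
        "root_cluster": root_clus,
        "first_data_sector": rsvd_sec_cnt + num_fats * fat_sz32 + root_dir_sectors,
    }


def cluster_to_sector(bpb: dict, cluster: int) -> int:
    """First sector of a data cluster; data clusters are numbered from 2."""
    return (cluster - 2) * bpb["sectors_per_cluster"] + bpb["first_data_sector"]
```

On a real image, you would read the first 512 bytes of the file and pass them to `parse_fat32_bpb`. From there, follow the FAT chain starting at `root_cluster` to list the root directory.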

The programming assignments consist of 5 MiniLabs and 4 OSLabs. Unfortunately, the grading system is not open to students outside Nanjing University, though Professor Jiang generously let me audit the course after I pestered him by email. With limited spare time, I only completed the 5 MiniLabs, and the overall experience was excellent. The second one, the coroutine lab, left the deepest impression: in an experiment of fewer than a hundred lines, I experienced both the beauty and the "terror" of context switching. Moreover, all the MiniLabs can easily be tested locally, so the lack of access to the grader does not hinder self-study. Please don't flood the professor with requests for access.
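The coroutine lab itself is written in C and performs context switching at a much lower level. Its scheduling idea, though, can be previewed with Python generators. Here `yield` plays the role of voluntarily giving up the CPU, and `next()` resumes a coroutine where it left off. Below is a minimal round-robin sketch; the names are illustrative.

```python
from collections import deque


def worker(name: str, steps: int, log: list):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # voluntarily give up the CPU; execution resumes here later


def run_round_robin(coroutines) -> None:
    ready = deque(coroutines)
    while ready:
        co = ready.popleft()
        try:
            next(co)          # resume until its next yield
        except StopIteration:
            continue          # finished: drop it
        ready.append(co)      # still runnable: back of the queue
```

Running `worker("a", 2, log)` and `worker("b", 3, log)` together leaves `log` as `["a0", "b0", "a1", "b1", "b2"]`. The two workers alternate until "a" finishes.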

Finally, I want to thank Professor Jiang again for designing and opening up such an excellent operating systems course, the first course in this book developed independently by a Chinese university. Young teachers like Professor Jiang put their passion into teaching despite the heavy pressure of tenure-track evaluation, and thanks to them countless students get an unforgettable undergraduate experience. I look forward to more high-quality courses like this in China, and I will add them to this book as soon as possible so that more people can benefit.

Course Resources

Resource Summary

As per Professor Jiang's request, my assignment implementations are not open-sourced.

CMU 15-799: Special Topics in Database Systems

Course Introduction

  • University: Carnegie Mellon University (CMU)
  • Prerequisites: CMU 15-445
  • Programming Language: C++
  • Course Difficulty: 🌟🌟🌟
  • Estimated Study Time: 80 hours

This course has only been offered twice so far, in Fall 2013 and Spring 2022, and it discusses some cutting-edge topics in the field of databases. The Fall 2013 session covered topics like Streaming, Graph DB, NVM, etc., while the Spring 2022 session mainly focused on Self-Driving DBMS, with relevant papers provided.

The tasks for the Spring 2022 version of the course included:

  1. Task One: Manually tune the performance of PostgreSQL.
  2. Task Two: Improve the NoisePage Pilot self-driving DBMS; students may work on any feature.

The course is run more like a series of lectures, with few programming assignments. For most students it is a way to broaden their horizons; for those specializing in databases, it may be considerably more helpful.

Course Resources

Caltech CS 122: Database System Implementation

Course Introduction

  • University: California Institute of Technology (Caltech)
  • Prerequisites: None
  • Programming Language: Java
  • Course Difficulty: 🌟🌟🌟🌟🌟
  • Estimated Study Time: 150 hours

Unlike CMU 15-445, which does not cover the SQL layer, Caltech's CS122 focuses its labs on implementing the SQL layer. The labs cover the modules of a query optimizer: SQL parsing, translation, join implementation, statistics and cost estimation, subqueries, and aggregation and GROUP BY. There are also labs on B+ trees and write-ahead logging (WAL). The course suits students who have finished CMU 15-445 and are interested in query optimization.

Below is an overview of what the first three assignments (labs) ask you to implement:

Assignment 1

  • Provide support for delete and update statements in NanoDB.
  • Add appropriate pin/unpin code to the Buffer Pool Manager.
  • Improve the performance of insert statements without excessively inflating the size of the database file.
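Pinning matters because the buffer pool must never evict a page that some operator is still using. Here is a minimal, language-neutral sketch of the idea in Python. NanoDB itself is written in Java and has its own classes, so everything below, including the LRU policy and the lack of dirty-page write-back, is an assumption for illustration.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: LRU eviction that never evicts a pinned page."""

    def __init__(self, capacity: int, read_page):
        self.capacity = capacity
        self.read_page = read_page      # callable page_no -> data; stands in for disk I/O
        self.frames = OrderedDict()     # page_no -> [data, pin_count], least recently used first

    def pin(self, page_no: int):
        """Bring a page into memory (if needed), pin it, and return its data."""
        if page_no in self.frames:
            self.frames.move_to_end(page_no)           # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_no] = [self.read_page(page_no), 0]
        frame = self.frames[page_no]
        frame[1] += 1
        return frame[0]

    def unpin(self, page_no: int) -> None:
        """Release one pin; the page becomes evictable when its count reaches 0."""
        frame = self.frames[page_no]
        if frame[1] == 0:
            raise RuntimeError(f"page {page_no} unpinned more often than pinned")
        frame[1] -= 1

    def _evict(self) -> None:
        victim = next((p for p, (_, pins) in self.frames.items() if pins == 0), None)
        if victim is None:
            raise RuntimeError("buffer pool full and every page is pinned")
        del self.frames[victim]   # a real pool would first write back a dirty page
```

Forgetting an `unpin` has a cost: the page can never be evicted, and once every frame is pinned, `pin` fails. Tracking down missing unpin calls is what the "add appropriate pin/unpin code" task is about.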

Assignment 2

  • Implement a simple plan generator to convert various parsed SQL statements into executable plans.
  • Implement join plan nodes that support inner and outer joins using the nested-loop join algorithm.
  • Add unit tests to ensure the correct implementation of inner and outer joins.
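The nested-loop join itself is short. For every left row, it rescans the entire right input, which costs |left| × |right| predicate evaluations. Below is a language-neutral Python sketch over lists of tuples, not NanoDB's Java plan-node API. It supports inner and left outer joins; a right outer join can be obtained by swapping the inputs and reordering the output columns.

```python
def nested_loop_join(left, right, pred, kind="inner", right_width=None):
    """Join two lists of tuples; kind is "inner" or "left_outer".

    For every left row, rescan the whole right input and emit the concatenated
    rows that satisfy `pred`. A left outer join additionally emits left rows
    with no match, padded with None (SQL NULL) in place of the right columns.
    """
    if kind not in ("inner", "left_outer"):
        raise ValueError(f"unsupported join kind: {kind!r}")
    if right_width is None:
        right_width = len(right[0]) if right else 0
    out = []
    for lrow in left:                 # outer loop
        matched = False
        for rrow in right:            # inner loop: full rescan per outer row
            if pred(lrow, rrow):
                out.append(lrow + rrow)
                matched = True
        if kind == "left_outer" and not matched:
            out.append(lrow + (None,) * right_width)
    return out
```

Unit tests of the kind Assignment 2 asks for can compare both join kinds on a small fixture. One useful fixture includes a left row whose join key is NULL: it must disappear from the inner join but survive, NULL-padded, in the outer join.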

Assignment 3

  • Collect table statistics.
  • Compute the cost of the various plan nodes.
  • Calculate the selectivity of various predicates that may appear in the execution plan.
  • Update the tuple statistics of the plan nodes' outputs based on predicates.
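Selectivity estimation usually rests on two simplifying assumptions: values are distributed uniformly, and predicates are independent. The sketch below uses common textbook heuristics of that kind. NanoDB's exact formulas and edge cases may differ, so treat the function names and details as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    num_distinct: int
    min_val: float
    max_val: float


def sel_equals(stats: ColumnStats) -> float:
    """col = constant: assume rows are spread evenly over the distinct values."""
    return 1.0 / stats.num_distinct if stats.num_distinct > 0 else 0.0


def sel_greater(stats: ColumnStats, v: float) -> float:
    """col > v: assume a uniform distribution between min and max."""
    if stats.max_val == stats.min_val:
        return 1.0 if v < stats.min_val else 0.0
    frac = (stats.max_val - v) / (stats.max_val - stats.min_val)
    return min(1.0, max(0.0, frac))


def sel_and(*sels: float) -> float:
    """AND of predicates: multiply, assuming they are independent."""
    result = 1.0
    for s in sels:
        result *= s
    return result


def sel_or(a: float, b: float) -> float:
    """a OR b by inclusion-exclusion, again assuming independence."""
    return a + b - a * b


def sel_not(s: float) -> float:
    return 1.0 - s
```

Consider a 10,000-row table whose column has 50 distinct values in [0, 100]. The predicate `col > 75` gets selectivity 0.25, so the plan node's output estimate is about 2,500 tuples. That estimate is what feeds the cost of the nodes above it.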

For the remaining assignments and challenges, see the course description. I recommend opening the project in IntelliJ IDEA and building it with Maven; pay attention to the logging configuration.

Course Resources

Stanford CS 346: Database System Implementation

Course Introduction

  • University: Stanford
  • Prerequisites: None
  • Programming Language: C++
  • Course Difficulty: 🌟🌟🌟🌟🌟
  • Estimated Study Time: 150 hours

RedBase, the CS346 project, is a highly structured implementation of a simplified database system. It can be divided into the following parts. The first four correspond to the four labs to be completed, and the fifth is an extension of your choice:

  1. The Record Management Component: Stores and manages records in files.

  2. The Index Component: Focuses on the management of B+ tree indexing.

  3. The System Management Component: Deals with DDL statements, command-line tools, data loading commands, and metadata management.

  4. The Query Language Component: Implements RQL, the RedBase query language, including select, insert, delete, and update statements.

  5. Extension Component: Beyond the basic components of a database system, students must implement an extension component, which could be a Blob type, network module, join algorithms, CBO optimizer, OLAP, transactions, etc.
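Record managers of this kind typically store fixed-length records in pages and track free slots with a per-page bitmap. Below is a minimal in-memory sketch of that idea. RedBase's real interface is a set of C++ classes, so every name below is hypothetical, and there is no disk I/O or buffer management.

```python
class RecordFile:
    """Toy heap file of fixed-length records, addressed by RID = (page_no, slot_no)."""

    def __init__(self, record_size: int, page_size: int = 4096):
        # A page holds a bitmap (1 bit per slot: 1 = used) plus the slots themselves;
        # pick the largest slot count for which both fit in one page.
        n = page_size // record_size
        while n > 0 and (n + 7) // 8 + n * record_size > page_size:
            n -= 1
        if n < 1:
            raise ValueError("record too large for a page")
        self.record_size = record_size
        self.slots_per_page = n
        self.pages = []  # each page is [bitmap: bytearray, data: bytearray]

    def _used(self, bitmap, slot: int) -> bool:
        return bool(bitmap[slot // 8] & (1 << (slot % 8)))

    def insert(self, record: bytes) -> tuple[int, int]:
        if len(record) != self.record_size:
            raise ValueError("record has the wrong length")
        # First fit: reuse the first free slot in any existing page.
        for page_no, (bitmap, data) in enumerate(self.pages):
            free = [s for s in range(self.slots_per_page) if not self._used(bitmap, s)]
            if free:
                slot = free[0]
                break
        else:  # no free slot anywhere: append a new, empty page
            bitmap = bytearray((self.slots_per_page + 7) // 8)
            data = bytearray(self.slots_per_page * self.record_size)
            self.pages.append([bitmap, data])
            page_no, slot = len(self.pages) - 1, 0
        bitmap[slot // 8] |= 1 << (slot % 8)
        off = slot * self.record_size
        data[off:off + self.record_size] = record
        return page_no, slot

    def get(self, rid: tuple[int, int]) -> bytes:
        page_no, slot = rid
        bitmap, data = self.pages[page_no]
        if not self._used(bitmap, slot):
            raise KeyError(f"no record at {rid}")
        off = slot * self.record_size
        return bytes(data[off:off + self.record_size])

    def delete(self, rid: tuple[int, int]) -> None:
        page_no, slot = rid
        bitmap, _ = self.pages[page_no]
        if not self._used(bitmap, slot):
            raise KeyError(f"no record at {rid}")
        bitmap[slot // 8] &= ~(1 << (slot % 8)) & 0xFF
```

Because deletion only clears a bit, RIDs of other records stay stable. This stability is what lets an index component store RIDs as pointers into the record file.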

RedBase is an ideal follow-up project for students who have completed CMU 15-445 and wish to learn other components of a database system. Due to its manageable codebase, it allows for convenient expansion as needed. Furthermore, as it is entirely written in C++, it also serves as good practice for C++ programming skills.

Course Resources

CMU 10-708: Probabilistic Graphical Models

Course Introduction

  • University: Carnegie Mellon University (CMU)
  • Prerequisites: Machine Learning, Deep Learning, Reinforcement Learning
  • Course Difficulty: 🌟🌟🌟🌟🌟
  • Course Website: https://sailinglab.github.io/pgm-spring-2019/
  • Course Resources: The course website includes slides, notes, videos, homework, and project materials.

CMU's course on Probabilistic Graphical Models, taught by Eric P. Xing, is a foundational and advanced course on graphical models. The curriculum covers the basics of graphical models, their integration with neural networks, applications in reinforcement learning, and non-parametric methods, making it a highly rigorous and comprehensive course.

The course is aimed at students with a solid background in machine learning, deep learning, and reinforcement learning, and all of its materials are available on the course website.

STATS214 / CS229M: Machine Learning Theory

Course Introduction

  • University: Stanford
  • Prerequisites: Machine Learning, Deep Learning, Statistics
  • Course Difficulty: 🌟🌟🌟🌟🌟🌟
  • Course Website: STATS214 / CS229M

This course combines classical learning theory with the latest developments in deep learning theory and is very demanding. It was previously taught by Percy Liang and is now taught by Tengyu Ma.

It is designed for students with a solid foundation in machine learning, deep learning, and statistics who want to understand the theoretical principles behind these fields, both classical and contemporary.

STA 4273 Winter 2021: Minimizing Expectations

Course Introduction

  • University: University of Toronto
  • Prerequisites: Bayesian Inference, Reinforcement Learning
  • Course Difficulty: 🌟🌟🌟🌟🌟🌟🌟
  • Course Website: STA 4273 Winter 2021

"Minimizing Expectations" is an advanced Ph.D. level research course, focusing on the interplay between inference and control. The course is taught by Chris Maddison, a founding member of AlphaGo and a NeurIPS 2014 best paper awardee.

The course is notably challenging and assumes a strong background in Bayesian inference and reinforcement learning.

It is best suited to Ph.D. students and researchers who want to deepen their understanding of the relationship between inference and control. The course website provides the course materials.

Columbia STAT 8201: Deep Generative Models

Course Introduction

  • University: Columbia University
  • Prerequisites: Machine Learning, Deep Learning, Graphical Models
  • Course Difficulty: 🌟🌟🌟🌟🌟🌟
  • Course Website: STAT 8201

"Deep Generative Models" is a Ph.D. level seminar course at Columbia University, taught by John Cunningham. This course is structured around weekly paper presentations and discussions, focusing on deep generative models, which represent the intersection of graphical models and neural networks and are one of the most important directions in modern machine learning.

The course is designed to explore the latest advancements and theoretical foundations in deep generative models. Participants engage in in-depth discussions about current research papers, fostering a deep understanding of the subject matter. This format not only helps students keep abreast of the latest developments in this rapidly evolving field but also sharpens their critical thinking and research skills.

Given the advanced nature of the course, it is ideal for Ph.D. students and researchers who have a solid foundation in machine learning, deep learning, and graphical models, and are looking to delve into the cutting-edge of deep generative models. The course website provides a valuable resource for accessing the curriculum and related materials.

Advanced Machine Learning

This learning path is intended for students, typically senior undergraduates or junior graduate students, who meet three conditions. They have already studied the basics of machine learning (ML, NLP, CV, RL). They have published at least one paper at a top conference (NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, CVPR, ICCV). And they want to pursue a research career in machine learning.

The goal of this path is to lay the theoretical groundwork for understanding and publishing papers at top machine learning conferences, especially in the track of Probabilistic Methods.

There are many possible advanced learning paths in machine learning. This one reflects what the author, Yao Fu, considers the best path: it focuses on probabilistic modeling in the Bayesian tradition and also draws on related disciplines.

Essential Textbooks

  • PRML: Pattern Recognition and Machine Learning by Christopher Bishop
  • AoS: All of Statistics by Larry Wasserman

PRML is a classic textbook of the Bayesian school and AoS a classic of the frequentist school, so the two complement each other nicely.

Reference Books

  • MLAPP: Machine Learning: A Probabilistic Perspective by Kevin Murphy
  • Convex Optimization by Stephen Boyd and Lieven Vandenberghe

Advanced Books

  • W&J: Graphical Models, Exponential Families, and Variational Inference by Martin Wainwright and Michael Jordan
  • Theory of Point Estimation by E. L. Lehmann and George Casella

Reading Guidelines

How to Approach

  • Essential textbooks are a must-read.
  • Reference books are like dictionaries: consult them when encountering unfamiliar concepts (instead of Wikipedia).
  • Advanced books should be approached after completing the essential textbooks, which should be read multiple times for thorough understanding.
  • Contrastive-comparative reading is crucial: open the chapters of two books that cover the same topic side by side and compare their similarities, differences, and connections.
  • Recall previously read papers during reading and compare them with textbook content.

Basic Pathway

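  1. Start with AoS Chapter 6 (Models, Statistical Inference, and Learning) as a basic introduction.
  2. Read PRML Chapters 10 and 11. Chapter 10 covers Variational Inference and Chapter 11 covers MCMC, the two main routes for Bayesian inference.
  3. If you meet an unfamiliar term while reading PRML, look back through the earlier chapters; the definition is most likely in Chapter 3 or 4. If it is missing or not detailed enough, consult MLAPP.
  4. AoS Chapter 8 (Parametric Inference) and Chapter 11 (Bayesian Inference) can also serve as references, and the best method is comparative reading across books. Suppose you meet the unfamiliar term "posterior inference" in PRML Chapter 10. Flip back to PRML Chapter 3 (Linear Models for Regression), which presents the simplest posterior. Then find the description of the posterior in AoS Chapter 11. Finally, compare the three treatments for similarities, differences, and connections.
  5. After PRML Chapters 10 and 11, read AoS Chapter 24 (Simulation Methods) and compare it with PRML Chapter 11; both cover MCMC.
  6. If foundational concepts are still unclear, go back to PRML Chapter 3 and compare it with AoS Chapter 11. Again, comparative reading is very important: keeping similar content from different books side by side significantly strengthens memory.
  7. Read PRML Chapter 13 (skipping Chapter 12) and compare it with MLAPP Chapters 17 and 18. MLAPP Chapter 17 is a detailed version of PRML Section 13.2 (HMMs), and Chapter 18 a detailed version of PRML Section 13.3 (LDS).
  8. After PRML Chapter 13, move on to PRML Chapter 8 (Graphical Models); by then it should read easily.
  9. Cross-reference these topics with the CMU 10-708 PGM course materials.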
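Here is a minimal random-walk Metropolis sketch, the simplest member of the Metropolis-Hastings family, to connect the MCMC chapters (PRML Chapter 11, AoS Chapter 24) with something runnable. The target density, step size, and seed are illustrative choices.

```python
import math
import random


def metropolis_hastings(log_target, x0: float, n_samples: int,
                        step: float = 1.0, seed: int = 0) -> list[float]:
    """Random-walk Metropolis sampler for a 1-D density known up to a constant.

    log_target(x) returns log p(x) + const; the constant cancels in the
    acceptance ratio, which is why MCMC works with unnormalized posteriors.
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)    # symmetric Gaussian proposal
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)                      # on rejection, x is repeated
    return samples
```

With `log_target = lambda x: -0.5 * x * x` (an unnormalized standard normal), the chain's samples have mean close to 0 and variance close to 1 after discarding a burn-in prefix. Comparing this with the variational approach of PRML Chapter 10 makes the trade-off concrete: MCMC is asymptotically exact but needs many correlated samples, while VI gives a fast but biased approximation.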

By this point, you should have a grasp of:

  • Basic definitions of probabilistic models
  • Exact inference - Sum-Product
  • Approximate inference - MCMC
  • Approximate inference - VI
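Exact inference by sum-product is easiest to see on a chain. The sketch below is an illustration, not taken from the roadmap's sources. It computes the likelihood of an observation sequence under a small discrete HMM (the model of PRML Section 13.2) using the forward recursion. It then checks the result against brute-force summation over all hidden paths. The parameters `pi`, `A`, and `B` are made-up example values.

```python
import itertools


def forward_likelihood(pi, A, B, obs):
    """p(obs) via the forward recursion: O(T * K^2) instead of O(K^T).

    alpha[k] = p(o_1..o_t, z_t = k); each step sums out the previous hidden
    state, which is the sum-product message passed along the chain.
    """
    K = len(pi)
    alpha = [pi[k] * B[k][obs[0]] for k in range(K)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][k] for j in range(K)) * B[k][o] for k in range(K)]
    return sum(alpha)


def brute_force_likelihood(pi, A, B, obs):
    """Sum p(z, obs) over every hidden path z; exponential, for checking only."""
    total = 0.0
    for z in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[z[0]] * B[z[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[z[t - 1]][z[t]] * B[z[t]][obs[t]]
        total += p
    return total


pi = [0.6, 0.4]                          # initial state distribution
A = [[0.7, 0.3], [0.4, 0.6]]             # A[j][k] = p(z_t = k | z_{t-1} = j)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[k][o] = p(o | z = k)
```

Both functions agree on any observation sequence, but only the forward recursion stays cheap as the sequence grows. The same "sum out one variable at a time" idea generalizes to trees, and it underlies the graphical-model chapters.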

Afterward, you can proceed to more advanced topics.

Amirkabir University of Technology 1400-2: Advanced Programming Course

Course Introduction

  • Affiliated University: Amirkabir University of Technology
  • Prerequisites: None
  • Programming Language: C++
  • Course Difficulty: 🌟🌟🌟🌟🌟
  • Estimated Study Time: 50 hours

I stumbled upon this C++ course by chance. Its homework is of very high quality: each assignment is self-contained, simply structured, and backed by thorough unit tests, which makes the course well suited to learning C++ programming. There are 7 homework assignments in total:

  1. Implement a Matrix class and related functions.

  2. Implement a program that simulates the operation of a cryptocurrency client/server.

  3. Implement a Binary Search Tree (BST).

  4. Implement SharedPtr and UniquePtr smart pointers in C++.

  5. Use inheritance and polymorphism to implement multiple classes.

  6. Solve 4 problems using the STL library.

  7. A Python project, for those interested.
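HW3 is a C++ assignment with its own required interface. As a language-neutral preview of the core logic only, here is a minimal Python sketch of BST insertion, lookup, and in-order traversal. The class and method names are hypothetical, and deletion is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class BST:
    """Unbalanced binary search tree: left subtree < key < right subtree."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: int) -> bool:
        """Insert key; return False if it was already present."""
        if self.root is None:
            self.root = Node(key)
            return True
        cur = self.root
        while True:
            if key == cur.key:
                return False
            if key < cur.key:
                if cur.left is None:
                    cur.left = Node(key)
                    return True
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = Node(key)
                    return True
                cur = cur.right

    def contains(self, key: int) -> bool:
        cur = self.root
        while cur is not None and cur.key != key:
            cur = cur.left if key < cur.key else cur.right
        return cur is not None

    def inorder(self) -> list[int]:
        """Keys in sorted order (iterative in-order traversal)."""
        out, stack, cur = [], [], self.root
        while stack or cur is not None:
            while cur is not None:
                stack.append(cur)
                cur = cur.left
            cur = stack.pop()
            out.append(cur.key)
            cur = cur.right
        return out
```

In C++, the same structure raises the questions the course is really about: who owns each node (raw pointers versus the smart pointers of HW4) and how to free the tree correctly.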

I could not find the course homepage; only the homework source code is available on GitHub (the repositories named AP1400-2-HW).

Course Resources

UCB: Sysadmin DeCal

Course Introduction

  • Affiliated University: UCB
  • Prerequisites: None
  • Programming Language: Shell
  • Course Difficulty: 🌟🌟🌟
  • Estimated Study Time: 20 hours

This is an introductory course on Linux from UCB, which I find more systematic and clearer than MIT's similarly aimed open course, Missing Semester. This is the main reason I recommend it. While Missing Semester seems more like a course for filling gaps for students who have started programming but haven't systematically used these tools, DeCal is more suitable for absolute beginners. The twelve-week course covers Linux basics, shell programming (including tmux and vim), package management, services, basic computer networks, network services, security (key management), Git, Docker, Kubernetes, Puppet, and CUDA. It's ideal for newcomers to understand and get started with the Linux environment.

A slight drawback is that some course assignments require operations on remote servers, like exercises on ssh, which need UCB internal account access. However, most assignments can be practiced by setting up a virtual machine and using tools like Xshell or directly using a Linux desktop version. After completing the full course and assignments, you should have a basic understanding of Linux.

To make up for the lack of access to the remote servers and to get comfortable with the Linux command line, I recommend bandit, a wargame from the OverTheWire website that offers CTF enthusiasts a free practice environment. The first 15 levels of bandit involve only basic Linux operations and require no CTF knowledge. They also happen to cover the parts of DeCal that outside students cannot access, mainly remote connections and file permissions.

Course Resources

  • Course Website: Official Site
  • Course Videos: Available on the official course website; an incomplete re-upload covering only the first part is available on Bilibili.
  • Course Textbook: No specified textbook, but each week's labs contain enough reading material for in-depth study.
  • Course Assignments: Available on the official course website.

Stanford CS148

Course Introduction

  • University: Stanford
  • Prerequisites: Linear Algebra, Advanced Mathematics, Python
  • Programming Language: Python
  • Course Difficulty: 🌟🌟🌟
  • Estimated Study Time: 40 hours

Official Description:

This introductory course in computer graphics begins with using Blender to generate images and understanding the underlying mathematical concepts, including triangles, normals, interpolation, texture mapping, bump mapping, and more. It then delves into light and color and how they affect computer displays and printing. The course also covers BRDF and some basic lighting and shading models. Towards the end, topics like ray tracing, anti-aliasing, and acceleration structures are introduced.
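The ray tracing and interpolation topics can be made concrete with one standard ray-triangle intersection test, the Möller-Trumbore algorithm, sketched here in plain Python. The course may present a different formulation. Besides the hit distance `t`, the test returns the barycentric coordinates `(u, v)`. Those same coordinates are reused to interpolate per-vertex attributes such as normals or texture coordinates.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def ray_triangle(orig, d, v0, v1, v2, eps=1e-9):
    """Return (t, u, v) where the ray orig + t*d hits the triangle, or None.

    The hit point is (1 - u - v) * v0 + u * v1 + v * v2.
    """
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:             # ray parallel to the triangle's plane
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    if t <= eps:                   # intersection behind the ray origin
        return None
    return t, u, v


def interpolate(u, v, a0, a1, a2):
    """Barycentric interpolation of per-vertex attributes (e.g. normals)."""
    w = 1.0 - u - v
    return tuple(w * x0 + u * x1 + v * x2 for x0, x1, x2 in zip(a0, a1, a2))
```

For a ray from `(0.25, 0.25, 1)` pointing down the -z axis at the triangle `(0,0,0), (1,0,0), (0,1,0)`, the function returns `(1.0, 0.25, 0.25)`. This is the hit at distance 1 with barycentric weights (0.5, 0.25, 0.25).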

For more detailed information, you can visit the course website.

This course is somewhat shallower than GAMES101 and uses Python, which makes it friendlier for students who are not familiar with C++.

Course Resources

GAMES101

Course Introduction

  • University: University of California, Santa Barbara (UCSB)
  • Prerequisites: Linear Algebra, Advanced Mathematics, C++
  • Programming Language: C++
  • Course Difficulty: 🌟🌟🌟
  • Estimated Study Time: 80 hours

Official Introduction:

This course comprehensively and systematically introduces the four major components of modern computer graphics: (1) rasterization, (2) geometric representation, (3) the theory of light transport, and (4) animation and simulation. Each topic is covered from basic principles through practical applications, together with an introduction to cutting-edge theoretical research. Through this course, you can learn the mathematics and physics behind computer graphics and sharpen your practical programming skills. As an introduction, the course aims to cover as many aspects of graphics as possible and to explain the basic concepts of each part clearly, giving you a complete, top-down understanding of computer graphics. This global understanding is crucial: after completing the course, you will see that graphics is neither just OpenGL nor just ray tracing, but a set of methods for generating entire virtual worlds. The word "modern" in the course title also indicates that everything it teaches is up-to-date knowledge and the graphics foundation that today's industry needs.

GAMES101 is a well-known graphics course in China. Unlike the common image of graphics as a subject packed with math and algorithms, this course opens the door to the field in a very engaging way.

None of the projects involves much code, but all of them are a lot of fun. Along the way, you will implement simple rasterization to render a basic model, and then ray tracing to achieve higher rendering quality. Each project also has optional extensions that let your renderer produce better quality at higher speed.
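To give a feel for what "simple rasterization" means, here is a minimal sketch of the core coverage test. It is not the GAMES101 starter code (the assignments are in C++ inside their own framework). The function names, the pixel-center sampling convention (x + 0.5, y + 0.5), and the inclusive treatment of pixels lying exactly on an edge are assumptions made for illustration. It uses edge functions to decide which pixel centers fall inside a 2D triangle, scanning only the triangle's bounding box.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p); the sign tells which side of edge ab p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return the set of (x, y) pixels whose centers lie inside the 2D triangle v0-v1-v2."""
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return set()  # degenerate triangle: covers no pixels
    # Only scan the triangle's bounding box, clamped to the screen.
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    x_min, x_max = max(int(min(xs)), 0), min(int(max(xs)) + 1, width)
    y_min, y_max = max(int(min(ys)), 0), min(int(max(ys)) + 1, height)
    covered = set()
    for y in range(y_min, y_max):
        for x in range(x_min, x_max):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside when all edge functions share the sign of the triangle's area,
            # which handles both clockwise and counter-clockwise vertex order.
            if (area > 0 and min(w0, w1, w2) >= 0) or (area < 0 and max(w0, w1, w2) <= 0):
                covered.add((x, y))
    return covered
```

For the triangle (0, 0), (4, 0), (0, 4) in a 4x4 image, the covered pixels are exactly those with x + y <= 3, ten in total. A fuller renderer would also divide w0, w1, w2 by `area` to get barycentric coordinates and use them to interpolate depth and color across the triangle.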

If you enjoy gaming, you might be familiar with real-time ray tracing, a technology that the course instructor, Lingqi Yan, has directly contributed to. By following the course videos and completing each project, you'll likely develop a strong interest in graphics and modern rendering techniques, just as I did.

Course Resources

Resource Summary

All resources and homework implementations used by @ysj1173886760 during the course are compiled at ysj1173886760/Learning: graphics/GAMES101 - GitHub.

GAMES103

Course Introduction

  • University: Style3D / Oregon State University (OSU)
  • Prerequisites: Linear Algebra, Advanced Mathematics, College Physics, Programming Skills, Basic Graphics Knowledge
  • Programming Language: C#
  • Course Difficulty: 🌟🌟🌟🌟
  • Estimated Study Time: 50 hours

Official Introduction:

This course is an introduction to physics-based computer animation, focusing on the fundamental techniques of physical simulation for animation.

The course mainly covers four areas: 1) rigid body simulation; 2) mass-spring systems, constraints, and cloth simulation; 3) elastic body simulation based on the finite element method; 4) fluid simulation.

The course content will not delve into specific physical simulation engines but will discuss the technologies behind various engines and their pros and cons. Since developing and learning physical simulations requires a solid mathematical foundation, the initial stages of the course will also spend some time reviewing necessary mathematical concepts. Upon successful completion of the course, students should have a deep understanding of basic physical simulation techniques and some exposure to advanced simulation technologies.

Graphics can be roughly divided into three areas: rendering, simulation, and geometry. While GAMES101 and GAMES202 focus mainly on rendering, GAMES103 is an excellent resource for learning physical simulation.
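As a small taste of the spring-based systems listed among the course topics, here is a minimal sketch that is not taken from the course (GAMES103 assignments are written in C#). It simulates two particles joined by a single Hooke's-law spring and advances them with symplectic (semi-implicit) Euler: velocities are updated from forces first, then positions from the new velocities. The function name and all parameter values (mass, stiffness, rest length, time step) are illustrative assumptions.

```python
import math


def spring_step(x, v, m, k, rest_len, dt):
    """Advance two particles joined by one spring by a single time step.

    x, v: lists of two (px, py) tuples holding positions and velocities.
    m: mass of each particle; k: spring stiffness; rest_len: rest length.
    """
    (x0, y0), (x1, y1) = x
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    # Hooke's law: force proportional to the stretch beyond the rest length,
    # directed along the spring (pulls the particles together when stretched).
    f = k * (length - rest_len)
    fx, fy = f * dx / length, f * dy / length
    forces = [(fx, fy), (-fx, -fy)]  # equal and opposite forces
    new_x, new_v = [], []
    for (px, py), (vx, vy), (gx, gy) in zip(x, v, forces):
        # Symplectic Euler: new velocity first, then position from the new velocity.
        vx += dt * gx / m
        vy += dt * gy / m
        new_v.append((vx, vy))
        new_x.append((px + dt * vx, py + dt * vy))
    return new_x, new_v
```

Starting from a spring stretched to length 1.5 (rest length 1) and stepping repeatedly, the length oscillates between roughly 0.5 and 1.5 while the center of mass stays put. Symplectic Euler is chosen here because, unlike plain explicit Euler, it keeps the energy of such an oscillator bounded instead of letting it grow steadily.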

Course Resources

  • Course Website: GAMES103
  • Course Videos: Bilibili
  • Reference Materials: Course PPT
  • Course Assignments: Four assignments, available on the official BBS mini-app or the unofficial Repo: GAMES103 HW

Resource Summary

All resources and homework requirements used by @indevn during the course are compiled at GAMES103 Unofficial. For detailed implementations of the assignments, there are many articles on Zhihu that provide in-depth explanations and can be referenced.

GAMES202

Course Introduction

  • University: University of California, Santa Barbara (UCSB)
  • Prerequisites: Linear Algebra, Advanced Mathematics, C++, GAMES101
  • Programming Language: C++
  • Course Difficulty: 🌟🌟🌟🌟
  • Estimated Study Time: 60 hours

Official Introduction:

This course comprehensively introduces the key issues and solutions in modern real-time rendering. Since real-time rendering (>30 FPS) demands high speed, the focus of this course is on how to break the trade-off between speed and quality under strict time constraints, ensuring both high-speed real-time performance and photorealism.

The course is organized by topic and covers cutting-edge content from both academia and industry, including: (1) real-time soft shadow rendering; (2) environment lighting; (3) global illumination with or without precomputation; (4) physically based shading models and methods; (5) real-time ray tracing; (6) anti-aliasing and supersampling; as well as various common acceleration methods.

In addition to the latest and most comprehensive content, an important distinction of this course from other real-time rendering tutorials is that it does not teach the use of game engines or emphasize specific shader implementation techniques. Instead, it focuses on the science and knowledge behind real-time rendering. The goal of this course is to provide you with a solid foundation to develop your own real-time rendering engine upon completion.

As the follow-up to GAMES101, GAMES202 is somewhat harder, but not by much; anyone who has completed GAMES101 should be able to handle it. None of the projects requires much code, but each takes some careful thought.

Course Resources

  • Course Website: GAMES202
  • Course Videos: Bilibili
  • Course Textbook: Real-Time Rendering, 4th edition.
  • Course Assignments: 5 Projects

How to Use This Book

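As the number of contributors grows, the book keeps expanding, and trying to finish every course in it is neither realistic nor necessary; it may even backfire, costing a great deal of effort for little return. To make the book fit your needs and genuinely work for you, I have roughly divided readers into the three categories below. Use your own circumstances to plan a self-study path that suits you.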

Just Starting University

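If you have just entered university or are still in your first years, and you are majoring in computer science or want to switch into it, you are lucky: studying is your main job. You have plenty of time and freedom to learn whatever interests you, without work pressure or the chores of daily life, and you don't need to agonize over utilitarian questions like "Is this useful?" or "Will this get me a job?" So how should you plan your studies? I think the first priority is to break the passive, step-by-step learning habits formed in high school. As a "small-town exam grinder" myself, I know that most Chinese high schools fill every minute of your day; you just passively follow the timetable and complete one predefined task after another, and as long as you are diligent, the results won't be too bad. Once you enter university, however, your freedom increases dramatically. Almost all of your time outside class is yours to manage, nobody organizes the key points or writes outlines for you, and exams are not as formulaic as in high school. If you keep the high-school "good student" mindset and simply follow the rules step by step, the results may not be what you hope for: the program's curriculum is not necessarily reasonable, instructors are not necessarily conscientious, attending every lecture does not guarantee you will understand it, and exams may not even relate to what was taught. Half-jokingly, you may feel that the whole world is against you and you can only count on yourself.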

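That is simply how things are. If you want to change it, you first have to get through it and become capable enough to challenge it. In your early years, building a solid foundation matters, and that foundation is all-round. In-class knowledge certainly matters, but computer science is largely a practical discipline, so many skills beyond the textbook need to be developed, which is exactly where undergraduate CS education in China falls short. Based on my own experience, I have summarized the following suggestions for your reference.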

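First, learn what "elegant" code looks like. Many first-year introductory programming courses in China turn into extremely dull syntax lessons that are less useful than simply having students read the official documentation. In fact, when students first start programming, it is highly beneficial for them to try to understand what makes code elegant and what code "has bad taste". Introductory courses usually start with procedural programming (e.g., C), but even in procedural programming, modularity and encapsulation are extremely important. If all you care about is getting your code accepted on OpenJudge, and you take shortcuts with large copy-pasted chunks and a bloated main function, your code quality will stay that way. Once you work on a slightly larger project, endless debugging plus communication and maintenance costs will swallow you up. So keep asking yourself as you write: Is there a lot of duplicated code? Is the current function too complex (Linux advocates that each function should do just one thing well)? Can this piece of code be abstracted into a function? At first this may feel awkward, and you may wonder whether such a simple problem deserves so much effort. But remember that good habits are priceless. Even middle school students can learn C, so why would a company hire you as a programmer?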
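Here is a toy illustration of the questions above. It is hypothetical code, written in Python rather than C to keep it short, and the function names are made up for this example. The first version copy-pastes the same averaging logic for each course; the second extracts that logic into one small function that does one thing, so the reporting code no longer repeats itself.

```python
# Before: the same averaging logic is copy-pasted once per course.
def report_copy_paste(math_scores, cs_scores):
    total = 0
    for s in math_scores:
        total += s
    math_avg = total / len(math_scores) if math_scores else 0.0
    total = 0
    for s in cs_scores:
        total += s
    cs_avg = total / len(cs_scores) if cs_scores else 0.0
    return f"math: {math_avg:.1f}, cs: {cs_avg:.1f}"


# After: the repeated part becomes one small function that does one thing.
def average(scores):
    """Mean of a list of numbers; 0.0 for an empty list."""
    return sum(scores) / len(scores) if scores else 0.0


def report(courses):
    """Format each course's average, e.g. {"math": [90, 80]} -> "math: 85.0"."""
    return ", ".join(f"{name}: {average(scores):.1f}" for name, scores in courses.items())
```

Both versions produce the same text for the same data, but in the second one, adding a third course needs no new code, and a fix to the averaging rule (say, for empty lists) happens in exactly one place.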

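After procedural programming, the second semester of the first year usually covers object-oriented programming (e.g., C++ or Java). I highly recommend the notes of MIT 6.031: Software Construction, which use Java (switched to TypeScript in 2022) to explain in great detail how to write "elegant" code, covering test-driven development, designing function specifications, exception handling, and much more. Now that you have met object orientation, it is also worth learning some common design patterns. OOP courses in China easily turn into dull syntax lessons too, making students obsess over the syntax of various forms of inheritance or even posing pointless brain-teaser questions, when such things are almost never used in real-world development. The essence of object orientation is learning to abstract real problems into classes and the relationships between them, and design patterns are proven abstraction techniques distilled by those who came before. I recommend the book 大话设计模式 (in Chinese), which is written in a very accessible way.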

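Second, learn tools and skills that boost productivity, such as Git, Shell, and Vim. I strongly recommend the MIT Missing Semester course. These tools may feel awkward at first, but force yourself to use them; once you are fluent, your development efficiency will soar. Many applications can also greatly boost your productivity. One rule of thumb: any operation that makes your hands leave the keyboard should be eliminated if at all possible. For example, switching applications, opening files, and browsing the web can all be done through shortcut tools (such as Alfred on Mac). If an operation you perform every day takes more than 1 second, find a way to cut it down to 0.1 seconds. You will be working with computers for decades, so building a smooth workflow pays off many times over. Finally, learn touch typing! If you still need to look at the keyboard while typing, find an online tutorial and learn touch typing right away; it will greatly improve your development efficiency.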

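Third, balance coursework and self-study. We question the status quo, but we still have to play by the rules; after all, GPA matters a lot for recommendation-based admission to graduate school. So in your first year, I still suggest following your own course schedule as closely as possible, supplemented by high-quality external resources. For example, for calculus and linear algebra you can refer to the course notes of MIT 18.01/18.02 and MIT 18.06, and during breaks you can learn Python through UCB CS61A. At the same time, follow the first and second points above and focus on building good programming habits and practical skills. In my experience, first-year math courses carry a large share of the credits, and math exams vary enormously, with very different styles across schools and instructors. Self-study may help you grasp the essence of mathematics, but it will not necessarily earn you a good grade. So before exams, it is best to work through past papers in a targeted way and prepare thoroughly for the test itself.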

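From your second year on, most of your courses will be CS major courses, and you can fully let loose and dive into self-study. For details, see A Reference CS Study Plan, a complete guide I distilled from three years of self-study, with a brief introduction to each course's characteristics and why it is worth taking. For every course on your schedule, this plan should list a corresponding course from a foreign university, and I believe those courses are better in practically every respect. Since core CS knowledge is essentially the same everywhere, and high-quality courses help you understand the material from first principles, they completely outclass the mostly read-from-the-textbook teaching common in China. Usually, cramming for two days before the exam with the slides the instructor "painstakingly" read aloud all semester is enough to get a decent exam score. If a course has a major project, you can adapt a Lab or Project from a foreign course to meet the requirement. When I took operating systems, I found the instructor was still using labs that foreign universities had long abandoned, so I emailed the instructor and switched to the xv6 Project from MIT 6.S081, which I was studying at the time. This made my self-study easier and, unexpectedly, helped push the course toward reform. In short, flexibility is key: your goal is to master the material in the most convenient and efficient way, and any so-called rule that conflicts with this goal can be "worked around" one way or another. With this knack for working around things, I barely attended in-person classes after my third year began (I spent most of my second year at home because of the pandemic), and my GPA did not suffer at all.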

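Finally, I hope everyone can be less restless and less utilitarian, and bring more patience and ambition. Many people email me asking whether self-study requires strong self-discipline. I think the key is what you want. If you still cling to the fantasy that knowing one programming language will land you a monthly salary above ten thousand yuan and a share of the internet boom, then nothing I say will help. In fact, my own self-study did not start with any utilitarian motive; it was simply curiosity and an instinctive thirst for knowledge. There was no grueling self-torture either: I ate and played as usual, and only later realized how much material I had accumulated. Today, rivalry between China and the US has become a trend, yet we are still "humbly" "learning from foreigners' strengths"; while admiring the high-quality courses abroad, I often feel a sense of crisis. Who will change all this? You, who are just entering the field. So keep going, young ones!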

Cutting to the Essentials

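If you have already graduated and started graduate school or a job, or you work in another field and want to move into programming in your spare time, you may not have enough time to work systematically through A Reference CS Study Plan, yet still want to make up for the foundations you missed as an undergraduate. Since these readers usually have some programming experience, there is no need to repeat introductory courses. Practically speaking, because your general career direction is already set, there is little need to dig deeply into every branch of computer science; it is better to focus on general principles and skills. So, based on my own experience, I have selected a few core courses that I consider the most important and of the highest quality, in the hope of deepening your understanding of computers. After completing them, whatever your job, I believe you will not end up as someone who merely calls libraries, but will have a deeper understanding of how computers work under the hood.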

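  • Discrete Math and Probability Theory: UCB CS70: Discrete Math and Probability Theory
  • Data Structures and Algorithms: Coursera: Algorithms I & II
  • Software Engineering: MIT 6.031: Software Construction
  • Full-Stack Development: MIT web development course
  • Introduction to Computer Systems: CMU CS15213: CSAPP
  • Introductory Computer Architecture: Coursera: Nand2Tetris
  • Advanced Computer Architecture: CS61C: Great Ideas in Computer Architecture
  • Database Systems: CMU 15-445: Introduction to Database System
  • Computer Networks: Computer Networking: A Top-Down Approach
  • Artificial Intelligence: Harvard CS50: Introduction to AI with Python
  • Deep Learning: Coursera: Deep Learning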

Finding Your Focus

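If you already have a solid grasp of the core CS courses and have settled on a career or research direction, the book contains many more courses not mentioned in A Reference CS Study Plan for you to explore.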

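As contributors keep joining, new branches will continue to appear in the table of contents on the left, such as Advanced Machine Learning and Machine Learning Systems. Each branch contains several courses of the same kind from different universities, with different emphases and labs. For example, the Operating Systems branch includes courses from MIT, UC Berkeley, Nanjing University, and Harbin Institute of Technology. If you want to go deep into a field, studying these parallel courses will give you different perspectives on similar material. The author also plans to invite researchers in related fields to share research learning paths for specific subfields, so that CS自学指南 gains depth as well as breadth.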

If you would like to contribute content of this kind, feel free to email the author at zhongyinmin@pku.edu.cn

CMU 10-708: Probabilistic Graphical Models

Course Introduction

  • University: CMU
  • Prerequisites: Machine Learning, Deep Learning, Reinforcement Learning
  • Course Difficulty: 🌟🌟🌟🌟🌟
  • Course Website: https://sailinglab.github.io/pgm-spring-2019/
  • The course website contains all the resources: slides, notes, videos, homework, and project

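This is CMU's foundational-plus-advanced course on graphical models, taught by Eric P. Xing. It covers the fundamentals of graphical models, their combination with neural networks, their applications in reinforcement learning, and nonparametric methods. It is quite hardcore.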

STATS214 / CS229M: Machine Learning Theory

Course Introduction

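Classical learning theory plus the latest deep learning theory; very hardcore. The course was previously taught by Percy Liang and is now taught by Tengyu Ma.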

STA 4273 Winter 2021: Minimizing Expectations

Course Introduction

This is a fairly advanced Ph.D. research course centered on the relationship between inference and control. It is taught by Chris Maddison (AlphaGo founding member, NeurIPS 14 best paper).

Columbia STAT 8201: Deep Generative Models

Course Introduction

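This course is a PhD seminar in which each week's session consists of presenting and discussing papers. It is taught by John Cunningham. Deep Generative Models combine graphical models with neural networks and are one of the most important directions in modern machine learning.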

Advanced Machine Learning

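This roadmap is intended for students (senior undergraduates or junior graduate students) who have already studied the basics of machine learning (ML, NLP, CV, RL), have published at least one paper at a top conference (NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, CVPR, ICCV), and want to pursue machine learning research.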

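The goal of this roadmap is to build the theoretical foundation for reading and publishing top-conference machine learning papers, especially papers in the Probabilistic Methods track.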

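There may be many different paths into advanced machine learning; this roadmap represents only what the author, Yao Fu, considers the best path. It focuses on probabilistic modeling from the Bayesian school and also touches on knowledge from related disciplines.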

Required Textbooks

  • PRML: Pattern Recognition and Machine Learning. Christopher Bishop
  • AoS: All of Statistics. Larry Wasserman

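These two books are classic textbooks of the Bayesian school and the frequentist school respectively, so they complement each other nicely.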

Dictionaries

  • MLAPP: Machine Learning: A Probabilistic Perspective. Kevin Murphy
  • Convex Optimization. Stephen Boyd and Lieven Vandenberghe

Advanced Books

  • W&J: Graphical Models, Exponential Families, and Variational Inference. Martin Wainwright and Michael Jordan
  • Theory of Point Estimation. E. L. Lehmann and George Casella

How to Read

Guidelines

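  • Required textbooks are the ones you must read.
  • Dictionaries are books you normally leave alone; when you run into a concept you don't understand, you look it up there (rather than on Wikipedia).
  • Don't read the advanced books yet; finish the required ones first. A required book usually only counts as finished after you have gone back and forth through it N times.
  • The most important reading technique is contrastive-comparative reading: open the chapters of two books that cover the same topic side by side, then compare their similarities, differences, and connections.
  • While reading, try to recall papers you have read before and compare how the papers and the textbooks treat the same ideas.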

Basic Path

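  • First read AoS Chapter 6: Models, Statistical Inference and Learning, which is the most basic introduction.
  • Then read PRML Chapters 10 and 11.
  • Chapter 10 covers Variational Inference and Chapter 11 covers MCMC; these are the two main routes to Bayesian inference.
  • If you meet any unfamiliar term while reading PRML, flip back to earlier chapters. You will very likely find the corresponding definition in Chapters 3 and 4; if you can't find it, or it isn't detailed enough, look it up in MLAPP.
  • AoS Chapter 8 (Parametric Inference) and Chapter 11 (Bayesian Inference) are also useful references. The best approach is to read several books side by side, as follows:
    • Suppose that while reading PRML Chapter 10, I come across an unfamiliar term: posterior inference.
    • So I flip back to Chapter 3 (Linear Models for Regression) and see the simplest posterior.
    • Then I turn to AoS Chapter 11, which also describes the posterior.
    • Then I compare the three treatments of the posterior in PRML Chapter 10, PRML Chapter 3, and AoS Chapter 11, looking at their similarities, differences, and connections.
  • After PRML Chapters 10 and 11, read AoS Chapter 24 (Simulation Methods) and compare it with PRML Chapter 11; both cover MCMC.
  • If at this point some basic concepts are still unclear, go back to PRML Chapter 3 and read it alongside AoS Chapter 11.
  • Again, comparative reading is crucial: always put similar content from different books in front of you at the same time and compare them, which significantly strengthens retention.
  • Then read PRML Chapter 13 (skipping Chapter 12), which can be read alongside MLAPP Chapters 17 and 18.
  • MLAPP Chapter 17 is a detailed version of PRML Section 13.2 and mainly covers HMMs.
  • MLAPP Chapter 18 is a detailed version of PRML Section 13.3 and mainly covers LDS (linear dynamical systems).
  • After PRML Chapter 13, read PRML Chapter 8 (Graphical Models); by now this part should read easily.
  • All of the above can be further cross-referenced with the CMU 10-708 PGM course materials.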

By this point, you should have mastered:

  • The basic definitions of probabilistic models
  • Exact inference: Sum-Product
  • Approximate inference: MCMC
  • Approximate inference: VI

Then you can move on to more advanced material.
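To make "exact inference: Sum-Product" concrete, here is a minimal sketch on the simplest graphical model: a chain of discrete variables with unary and pairwise potentials. It is not code from any of the books above. The function names and the potential tables used to check it are made-up assumptions. Forward and backward messages give every node's marginal in O(n k^2) time for n variables with k states each. The sketch verifies them against brute-force enumeration of all k^n joint assignments, which is exactly the exponential cost that message passing avoids.

```python
import itertools


def chain_marginals(unary, pairwise):
    """Sum-product on a chain x_0 - x_1 - ... - x_{n-1} of discrete variables.

    unary[i][a]: nonnegative potential for x_i = a.
    pairwise[i][a][b]: potential between x_i = a and x_{i+1} = b.
    Returns the normalized marginal distribution p(x_i) for every node.
    """
    n, k = len(unary), len(unary[0])
    # fwd[i][b]: message into node i from the left (x_0..x_{i-1} summed out).
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        fwd[i] = [sum(fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b] for a in range(k))
                  for b in range(k)]
    # bwd[i][a]: message into node i from the right (x_{i+1}..x_{n-1} summed out).
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(pairwise[i][a][b] * unary[i + 1][b] * bwd[i + 1][b] for b in range(k))
                  for a in range(k)]
    marginals = []
    for i in range(n):
        belief = [fwd[i][a] * unary[i][a] * bwd[i][a] for a in range(k)]
        z = sum(belief)  # normalizing constant (the partition function)
        marginals.append([p / z for p in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """The same marginals by enumerating every joint assignment (exponential; for checking)."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for xs in itertools.product(range(k), repeat=n):
        w = 1.0
        for i, a in enumerate(xs):
            w *= unary[i][a]
        for i in range(n - 1):
            w *= pairwise[i][xs[i]][xs[i + 1]]
        for i, a in enumerate(xs):
            totals[i][a] += w
    return [[t / sum(row) for t in row] for row in totals]
```

On trees, the same message-passing idea generalizes directly; for models with loops, exact inference becomes expensive, which is where the approximate methods in the list above (MCMC and VI) come in.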