Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
This commit is contained in:
Thomas Simonini
2022-12-05 02:28:23 +01:00
committed by GitHub
parent 8e17dcae02
commit 738c012e44
2 changed files with 3 additions and 3 deletions

View File

@@ -126,7 +126,7 @@ Which is equivalent to:
However, in reality, **we cant just add them like that.** The rewards that come sooner (at the beginning of the game) **are more likely to happen** since they are more predictable than the long-term future reward.
Lets say your agent is this tiny mouse that can move one tile each time step, and your opponent is the cat (that can move too). Your goal is **to eat the maximum amount of cheese before being eaten by the cat.**
Lets say your agent is this tiny mouse that can move one tile each time step, and your opponent is the cat (that can move too). The mouse's goal is **to eat the maximum amount of cheese before being eaten by the cat.**
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_3.jpg" alt="Rewards" width="100%">
@@ -142,5 +142,5 @@ To discount the rewards, we proceed like this:
2. Then, each reward will be discounted by gamma to the exponent of the time step. As the time step increases, the cat gets closer to us, **so the future reward is less and less likely to happen.**
Our discounted cumulative expected rewards is:
Our discounted expected cumulative reward is:
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_4.jpg" alt="Rewards" width="100%">

View File

@@ -1,6 +1,6 @@
# Summary [[summary]]
That was a lot of information, if we summarize:
That was a lot of information! Let's summarize:
- Reinforcement Learning is a computational approach of learning from action. We build an agent that learns from the environment **by interacting with it through trial and error** and receiving rewards (negative or positive) as feedback.