diff --git a/.github/workflows/delete_doc_comment.yml b/.github/workflows/delete_doc_comment.yml
deleted file mode 100644
index 72801c8..0000000
--- a/.github/workflows/delete_doc_comment.yml
+++ /dev/null
@@ -1,13 +0,0 @@
-name: Delete doc comment
-
-on:
- workflow_run:
- workflows: ["Delete doc comment trigger"]
- types:
- - completed
-
-jobs:
- delete:
- uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
- secrets:
- comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
\ No newline at end of file
diff --git a/.github/workflows/delete_doc_comment_trigger.yml b/.github/workflows/delete_doc_comment_trigger.yml
deleted file mode 100644
index 5e39e25..0000000
--- a/.github/workflows/delete_doc_comment_trigger.yml
+++ /dev/null
@@ -1,12 +0,0 @@
-name: Delete doc comment trigger
-
-on:
- pull_request:
- types: [ closed ]
-
-
-jobs:
- delete:
- uses: huggingface/doc-builder/.github/workflows/delete_doc_comment_trigger.yml@main
- with:
- pr_number: ${{ github.event.number }}
\ No newline at end of file
diff --git a/notebooks/bonus-unit1/bonus-unit1.ipynb b/notebooks/bonus-unit1/bonus-unit1.ipynb
index 58f20cf..39e32cb 100644
--- a/notebooks/bonus-unit1/bonus-unit1.ipynb
+++ b/notebooks/bonus-unit1/bonus-unit1.ipynb
@@ -217,25 +217,25 @@
]
},
{
- "cell_type": "code",
+ "cell_type": "markdown",
"source": [
- "!wget --load-cookies /tmp/cookies.txt \"https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\\1\\n/p')&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF\" -O ./trained-envs-executables/linux/Huggy.zip && rm -rf /tmp/cookies.txt"
+ "We downloaded the file Huggy.zip from https://github.com/huggingface/Huggy using `wget`"
],
"metadata": {
- "id": "EB-G-80GsxYN"
+ "id": "IHh_LXsRrrbM"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!wget \"https://github.com/huggingface/Huggy/raw/main/Huggy.zip\" -O ./trained-envs-executables/linux/Huggy.zip"
+ ],
+ "metadata": {
+ "id": "8xNAD1tRpy0_"
},
"execution_count": null,
"outputs": []
},
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "jsoZGxr1MIXY"
- },
- "source": [
- "Download the file Huggy.zip from https://drive.google.com/uc?export=download&id=1zv3M95ZJTWHUVOWT6ckq_cm98nft8gdF using `wget`. Check out the full solution to download large files from GDrive [here](https://bcrf.biochem.wisc.edu/2021/02/05/download-google-drive-files-using-wget/)"
- ]
- },
{
"cell_type": "code",
"execution_count": null,
@@ -441,7 +441,7 @@
},
"outputs": [],
"source": [
- "!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id=\"Huggy\" --no-graphics"
+ "!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id=\"Huggy2\" --no-graphics"
]
},
{
diff --git a/notebooks/unit1/unit1.ipynb b/notebooks/unit1/unit1.ipynb
index eb02ad4..858fe5a 100644
--- a/notebooks/unit1/unit1.ipynb
+++ b/notebooks/unit1/unit1.ipynb
@@ -115,7 +115,7 @@
"\n",
"🔲 📝 **[Read Unit 0](https://huggingface.co/deep-rl-course/unit0/introduction)** that gives you all the **information about the course and helps you to onboard** 🤗\n",
"\n",
- "🔲 📚 **Develop an understanding of the foundations of Reinforcement learning** (MC, TD, Rewards hypothesis...) by [reading Unit 1](https://huggingface.co/deep-rl-course/unit1/introduction)."
+ "🔲 📚 **Develop an understanding of the foundations of Reinforcement learning** (RL process, Rewards hypothesis...) by [reading Unit 1](https://huggingface.co/deep-rl-course/unit1/introduction)."
]
},
{
diff --git a/notebooks/unit5/unit5.ipynb b/notebooks/unit5/unit5.ipynb
index cef401d..633d204 100644
--- a/notebooks/unit5/unit5.ipynb
+++ b/notebooks/unit5/unit5.ipynb
@@ -206,7 +206,7 @@
},
"outputs": [],
"source": [
- "%%capture\n",
+ "\n",
"# Go inside the repository and install the package\n",
"%cd ml-agents\n",
"!pip3 install -e ./ml-agents-envs\n",
@@ -600,58 +600,38 @@
},
{
"cell_type": "markdown",
- "metadata": {
- "id": "NyqYYkLyAVMK"
- },
"source": [
- "Download the file Pyramids.zip from https://drive.google.com/uc?export=download&id=1UiFNdKlsH0NTu32xV-giYUEVKV4-vc7H using `wget`. Check out the full solution to download large files from GDrive [here](https://bcrf.biochem.wisc.edu/2021/02/05/download-google-drive-files-using-wget/)"
- ]
+ "We downloaded the file Pyramids.zip from from https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip using `wget`"
+ ],
+ "metadata": {
+ "id": "x2C48SGZjZYw"
+ }
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "AxojCsSVAVMP"
- },
- "outputs": [],
"source": [
- "!wget --load-cookies /tmp/cookies.txt \"https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1UiFNdKlsH0NTu32xV-giYUEVKV4-vc7H' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\\1\\n/p')&id=1UiFNdKlsH0NTu32xV-giYUEVKV4-vc7H\" -O ./training-envs-executables/linux/Pyramids.zip && rm -rf /tmp/cookies.txt"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "bfs6CTJ1AVMP"
- },
- "source": [
- "**OR** Download directly to local machine and then drag and drop the file from local machine to `./training-envs-executables/linux`"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "H7JmgOwcSSmF"
- },
- "source": [
- "Wait for the upload to finish and then run the command below.\n",
- "\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "source": [
- "Unzip it"
+ "!wget \"https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip\" -O ./training-envs-executables/linux/Pyramids.zip"
],
"metadata": {
- "id": "iWUUcs0_794U"
+ "id": "eWh8Pl3sjZY2"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We unzip the executable.zip file"
+ ],
+ "metadata": {
+ "id": "V5LXPOPujZY3"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "id": "i2E3K4V2AVMP"
+ "id": "SmNgFdXhjZY3"
},
"outputs": [],
"source": [
@@ -662,7 +642,7 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "KmKYBgHTAVMP"
+ "id": "T1jxwhrJjZY3"
},
"source": [
"Make sure your file is accessible"
@@ -672,7 +652,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
- "id": "Im-nwvLPAVMP"
+ "id": "6fDd03btjZY3"
},
"outputs": [],
"source": [
@@ -834,4 +814,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
-}
\ No newline at end of file
+}
diff --git a/units/en/unit0/discord101.mdx b/units/en/unit0/discord101.mdx
index 962c766..1c3d440 100644
--- a/units/en/unit0/discord101.mdx
+++ b/units/en/unit0/discord101.mdx
@@ -17,7 +17,7 @@ Then click next, you'll then get to **introduce yourself in the `#introduce-your
They are in the reinforcement learning category. **Don't forget to sign up to these channels** by clicking on 🤖 Reinforcement Learning in `role-assigment`.
-- `rl-announcements`: where we give the **lastest information about the course**.
+- `rl-announcements`: where we give the **latest information about the course**.
- `rl-discussions`: where you can **exchange about RL and share information**.
- `rl-study-group`: where you can **ask questions and exchange with your classmates**.
- `rl-i-made-this`: where you can **share your projects and models**.
diff --git a/units/en/unit2/bellman-equation.mdx b/units/en/unit2/bellman-equation.mdx
index f401ccc..6f85eed 100644
--- a/units/en/unit2/bellman-equation.mdx
+++ b/units/en/unit2/bellman-equation.mdx
@@ -27,7 +27,7 @@ Instead of calculating the expected return for each state or each state-action p
The Bellman equation is a recursive equation that works like this: instead of starting for each state from the beginning and calculating the return, we can consider the value of any state as:
-**The immediate reward \\(R_{t+1}\\) + the discounted value of the state that follows ( \\(gamma * V(S_{t+1}) \\) ) .**
+**The immediate reward \\(R_{t+1}\\) + the discounted value of the state that follows ( \\(\gamma * V(S_{t+1}) \\) ) .**
diff --git a/units/en/unit2/quiz2.mdx b/units/en/unit2/quiz2.mdx
index 961d477..3ab4f51 100644
--- a/units/en/unit2/quiz2.mdx
+++ b/units/en/unit2/quiz2.mdx
@@ -19,12 +19,11 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
},
{
text: "An algorithm that determines the value of being at a particular state and taking a specific action at that state",
- explain: "",
- correct: true
+ explain: "Q-function is the function that determines the value of being at a particular state and taking a specific action at that state.",
},
{
text: "A table",
- explain: "Q-function is not a Q-table. The Q-function is the algorithm that will feed the Q-table."
+ explain: "Q-learning is not a Q-table. The Q-function is the algorithm that will feed the Q-table."
}
]}
/>
diff --git a/units/en/unit4/pg-theorem.mdx b/units/en/unit4/pg-theorem.mdx
index 7b393cc..de35adb 100644
--- a/units/en/unit4/pg-theorem.mdx
+++ b/units/en/unit4/pg-theorem.mdx
@@ -27,9 +27,12 @@ We then multiply every term in the sum by \\(\frac{P(\tau;\theta)}{P(\tau;\theta
\\( = \sum_{\tau} \frac{P(\tau;\theta)}{P(\tau;\theta)}\nabla_\theta P(\tau;\theta)R(\tau) \\)
-We can simplify further this since \\( \frac{P(\tau;\theta)}{P(\tau;\theta)}\nabla_\theta P(\tau;\theta)\\). Thus we can rewrite the sum as \\( = P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)} \\)
-\\(= \sum_{\tau} P(\tau;\theta) \frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}R(\tau) \\)
+We can simplify further this since \\( \frac{P(\tau;\theta)}{P(\tau;\theta)}\nabla_\theta P(\tau;\theta)\\).
+
+Thus we can rewrite the sum as \\( = P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)} \\)
+
+\\( P(\tau;\theta)\frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}= \sum_{\tau} P(\tau;\theta) \frac{\nabla_\theta P(\tau;\theta)}{P(\tau;\theta)}R(\tau) \\)
We can then use the *derivative log trick* (also called *likelihood ratio trick* or *REINFORCE trick*), a simple rule in calculus that implies that \\( \nabla_x log f(x) = \frac{\nabla_x f(x)}{f(x)} \\)
diff --git a/units/en/unit4/policy-gradient.mdx b/units/en/unit4/policy-gradient.mdx
index e329e02..9ea6b3b 100644
--- a/units/en/unit4/policy-gradient.mdx
+++ b/units/en/unit4/policy-gradient.mdx
@@ -2,7 +2,7 @@
## Getting the big picture
-We just learned that policy-gradient methods aim to find parameters \\( \theta \\) that **maximize the expected return**.
+We just learned that policy-gradient methods aim to find parameters \\( \theta \\) that **maximize the expected return**.
The idea is that we have a *parameterized stochastic policy*. In our case, a neural network outputs a probability distribution over actions. The probability of taking each action is also called the *action preference*.
@@ -20,7 +20,7 @@ But **how are we going to optimize the weights using the expected return**?
The idea is that we're going to **let the agent interact during an episode**. And if we win the episode, we consider that each action taken was good and must be more sampled in the future
since they lead to win.
-So for each state-action pair, we want to increase the \\(P(a|s)\\): the probability of taking that action at that state. Or decrease if we lost.
+So for each state-action pair, we want to increase the \\(P(a|s)\\): the probability of taking that action at that state. Or decrease if we lost.
The Policy-gradient algorithm (simplified) looks like this:
@@ -31,15 +31,15 @@ Now that we got the big picture, let's dive deeper into policy-gradient methods.
## Diving deeper into policy-gradient methods
-We have our stochastic policy \\(\pi\\) which has a parameter \\(\theta\\). This \\(\pi\\), given a state, **outputs a probability distribution of actions**.
+We have our stochastic policy \\(\pi\\) which has a parameter \\(\theta\\). This \\(\pi\\), given a state, **outputs a probability distribution of actions**.
-Where \\(\pi_\theta(a_t|s_t)\\) is the probability of the agent selecting action \\(a_t\\) from state \\(s_t\\) given our policy.
+Where \\(\pi_\theta(a_t|s_t)\\) is the probability of the agent selecting action \\(a_t\\) from state \\(s_t\\) given our policy.
-**But how do we know if our policy is good?** We need to have a way to measure it. To know that, we define a score/objective function called \\(J(\theta)\\).
+**But how do we know if our policy is good?** We need to have a way to measure it. To know that, we define a score/objective function called \\(J(\theta)\\).
### The objective function
@@ -48,19 +48,20 @@ The *objective function* gives us the **performance of the agent** given a traje
Let's give some more details on this formula:
-- The *expected return* (also called expected cumulative reward), is the weighted average (where the weights are given by \\(P(\tau;\theta)\\) of all possible values that the return \\(R(\tau)\\) can take).
+- The *expected return* (also called expected cumulative reward), is the weighted average (where the weights are given by \\(P(\tau;\theta)\\) of all possible values that the return \\(R(\tau)\\) can take).
- \\(R(\tau)\\) : Return from an arbitrary trajectory. To take this quantity and use it to calculate the expected return, we need to multiply it by the probability of each possible trajectory.
-- \\(P(\tau;\theta)\\) : Probability of each possible trajectory \\(\tau\\) (that probability depends on \\( \theta\\) since it defines the policy that it uses to select the actions of the trajectory which has an impact of the states visited).
+
+- \\(P(\tau;\theta)\\) : Probability of each possible trajectory \\(\tau\\) (that probability depends on \\(\theta\\) since it defines the policy that it uses to select the actions of the trajectory which has an impact of the states visited).
-- \\(J(\theta)\\) : Expected return, we calculate it by summing for all trajectories, the probability of taking that trajectory given \\(\theta \\) multiplied by the return of this trajectory.
+- \\(J(\theta)\\) : Expected return, we calculate it by summing for all trajectories, the probability of taking that trajectory given \\(\theta \\) multiplied by the return of this trajectory.
-Our objective then is to maximize the expected cumulative reward by finding the \\(\theta \\) that will output the best action probability distributions:
+Our objective then is to maximize the expected cumulative reward by finding the \\(\theta \\) that will output the best action probability distributions:
@@ -68,7 +69,7 @@ Our objective then is to maximize the expected cumulative reward by finding the
## Gradient Ascent and the Policy-gradient Theorem
-Policy-gradient is an optimization problem: we want to find the values of \\(\theta\\) that maximize our objective function \\(J(\theta)\\), so we need to use **gradient-ascent**. It's the inverse of *gradient-descent* since it gives the direction of the steepest increase of \\(J(\theta)\\).
+Policy-gradient is an optimization problem: we want to find the values of \\(\theta\\) that maximize our objective function \\(J(\theta)\\), so we need to use **gradient-ascent**. It's the inverse of *gradient-descent* since it gives the direction of the steepest increase of \\(J(\theta)\\).
(If you need a refresher on the difference between gradient descent and gradient ascent [check this](https://www.baeldung.com/cs/gradient-descent-vs-ascent) and [this](https://stats.stackexchange.com/questions/258721/gradient-ascent-vs-gradient-descent-in-logistic-regression)).
@@ -76,9 +77,9 @@ Our update step for gradient-ascent is:
\\( \theta \leftarrow \theta + \alpha * \nabla_\theta J(\theta) \\)
-We can repeatedly apply this update in the hopes that \\(\theta \\) converges to the value that maximizes \\(J(\theta)\\).
+We can repeatedly apply this update in the hopes that \\(\theta \\) converges to the value that maximizes \\(J(\theta)\\).
-However, there are two problems with computing the derivative of \\(J(\theta)\\):
+However, there are two problems with computing the derivative of \\(J(\theta)\\):
1. We can't calculate the true gradient of the objective function since it requires calculating the probability of each possible trajectory, which is computationally super expensive.
So we want to **calculate a gradient estimation with a sample-based estimate (collect some trajectories)**.
@@ -97,18 +98,20 @@ If you want to understand how we derive this formula for approximating the gradi
The Reinforce algorithm, also called Monte-Carlo policy-gradient, is a policy-gradient algorithm that **uses an estimated return from an entire episode to update the policy parameter** \\(\theta\\):
In a loop:
-- Use the policy \\(\pi_\theta\\) to collect an episode \\(\tau\\)
-- Use the episode to estimate the gradient \\(\hat{g} = \nabla_\theta J(\theta)\\)
+- Use the policy \\(\pi_\theta\\) to collect an episode \\(\tau\\)
+- Use the episode to estimate the gradient \\(\hat{g} = \nabla_\theta J(\theta)\\)
-- Update the weights of the policy: \\(\theta \leftarrow \theta + \alpha \hat{g}\\)
+- Update the weights of the policy: \\(\theta \leftarrow \theta + \alpha \hat{g}\\)
We can interpret this update as follows:
+
- \\(\nabla_\theta log \pi_\theta(a_t|s_t)\\) is the direction of **steepest increase of the (log) probability** of selecting action \\(a_t\\) from state \\(s_t\\).
This tells us **how we should change the weights of policy** if we want to increase/decrease the log probability of selecting action \\(a_t\\) at state \\(s_t\\).
+
- \\(R(\tau)\\): is the scoring function:
- If the return is high, it will **push up the probabilities** of the (state, action) combinations.
- Otherwise, if the return is low, it will **push down the probabilities** of the (state, action) combinations.
diff --git a/units/en/unit5/hands-on.mdx b/units/en/unit5/hands-on.mdx
index cd4a157..1b9dfc3 100644
--- a/units/en/unit5/hands-on.mdx
+++ b/units/en/unit5/hands-on.mdx
@@ -288,17 +288,16 @@ Now let's try a more challenging environment called Pyramids.
- We need to download it and place it into `./training-envs-executables/linux/`
- We use a linux executable because we're using colab, and the colab machine's OS is Ubuntu (linux)
-Download the file Pyramids.zip from https://drive.google.com/uc?export=download&id=1UiFNdKlsH0NTu32xV-giYUEVKV4-vc7H using `wget`. Check out the full solution to download large files from GDrive [here](https://bcrf.biochem.wisc.edu/2021/02/05/download-google-drive-files-using-wget/)
+We downloaded the file Pyramids.zip from from https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip using `wget`
```python
-!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1UiFNdKlsH0NTu32xV-giYUEVKV4-vc7H' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1UiFNdKlsH0NTu32xV-giYUEVKV4-vc7H" -O ./training-envs-executables/linux/Pyramids.zip && rm -rf /tmp/cookies.txt
+wget "https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip" -O ./training-envs-executables/linux/Pyramids.zip
```
Unzip it
```python
-%%capture
-!unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/Pyramids.zip
+unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/Pyramids.zip
```
Make sure your file is accessible
diff --git a/units/en/unit5/quiz.mdx b/units/en/unit5/quiz.mdx
index 7b9ec0c..ccb8f40 100644
--- a/units/en/unit5/quiz.mdx
+++ b/units/en/unit5/quiz.mdx
@@ -78,13 +78,15 @@ The best way to learn and [to avoid the illusion of competence](https://www.cour
### Q3: Fill the missing letters
-- In Unity ML-Agents, the Policy of an Agent is called a b _ _ _ n
-- The component in charge of orchestrating the agents is called the _ c _ _ _ m _
+- In Unity ML-Agents, the Policy of an Agent is called a b \_ \_ \_ n
+- The component in charge of orchestrating the agents is called the \_ c \_ \_ \_ m \_
Solution
-- b r a i n
-- a c a d e m y
+
+
b r a i n
+
a c a d e m y
+
### Q4: Define with your own words what is a `raycast`
diff --git a/units/en/unit6/variance-problem.mdx b/units/en/unit6/variance-problem.mdx
index 9ce3d8e..1fbbe9c 100644
--- a/units/en/unit6/variance-problem.mdx
+++ b/units/en/unit6/variance-problem.mdx
@@ -27,4 +27,5 @@ However, increasing the batch size significantly **reduces sample efficiency**.
If you want to dive deeper into the question of variance and bias tradeoff in Deep Reinforcement Learning, you can check out these two articles:
- [Making Sense of the Bias / Variance Trade-off in (Deep) Reinforcement Learning](https://blog.mlreview.com/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565)
- [Bias-variance Tradeoff in Reinforcement Learning](https://www.endtoend.ai/blog/bias-variance-tradeoff-in-reinforcement-learning/)
+- [High Variance in Policy gradients](https://balajiai.github.io/high_variance_in_policy_gradients)
---
diff --git a/units/en/unitbonus3/envs-to-try.mdx b/units/en/unitbonus3/envs-to-try.mdx
index 6bc4ba9..ce372e5 100644
--- a/units/en/unitbonus3/envs-to-try.mdx
+++ b/units/en/unitbonus3/envs-to-try.mdx
@@ -2,6 +2,45 @@
Here we provide a list of interesting environments you can try to train your agents on:
+## DIAMBRA Arena
+
+
+
+
+DIAMBRA Arena is a software package featuring a collection of high-quality environments for Reinforcement Learning research and experimentation. It provides a standard interface to popular arcade emulated video games, offering a Python API fully compliant with OpenAI Gym/Gymnasium format, that makes its adoption smooth and straightforward.
+
+It supports all major Operating Systems (Linux, Windows and MacOS) and can be easily installed via [Python PIP](https://pypi.org/project/diambra-arena/). It is completely free to use, the user only needs to register on the [official website](https://diambra.ai/register/).
+
+In addition, its [GitHub repository](https://github.com/diambra/) provides a collection of examples covering main use cases of interest that can be run in just a few steps.
+
+#### Main Features
+
+All environments are episodic Reinforcement Learning tasks, with discrete actions (gamepad buttons) and observations composed by screen pixels plus additional numerical data (RAM values like characters health bars or characters stage side).
+
+They all support both single player (1P) as well as two players (2P) mode, making them the perfect resource to explore Standard RL, Competitive Multi-Agent, Competitive Human-Agent, Self-Play, Imitation Learning and Human-in-the-Loop.
+
+[Interfaced games](https://docs.diambra.ai/envs/games/) have been selected among the most popular fighting retro-games. While sharing the same fundamental mechanics, they provide different challenges, with specific features such as different type and number of characters, how to perform combos, health bars recharging, etc.
+
+DIAMBRA Arena is built to maximize compatibility will all major Reinforcement Learning libraries. It natively provides interfaces with the two most important packages: [Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/) and [Ray RLlib](https://docs.ray.io/en/latest/rllib/index.html), while Stable Baselines is also available but deprecated. Their usage is illustrated in the [official documentation](https://docs.diambra.ai/) and in the [DIAMBRA Agents examples repository](https://github.com/diambra/agents). It can easily be interfaced with any other package in a similar way.
+
+### Competition Platform
+
+DIAMBRA also provides a competition platform fully integrated with the Hugging Face Hub, on which you can submit your trained agents and compete with other coders around the globe in epic video games tournaments!
+
+It features a public leaderboard where users are ranked by the best score achieved by their agents in our different environments.
+
+It also offers the possibility to unlock cool achievements depending on the performances of your agent.
+
+Submitted agents are evaluated and their episodes are streamed on [DIAMBRA Twitch channel](https://www.twitch.tv/diambra_ai).
+
+#### References
+
+To start using this environment, check these resources:
+- [Official Docs](https://docs.diambra.ai/)
+- [Competition Platform](https://diambra.ai)
+- [GitHub](https://github.com/diambra/)
+- [Discord](https://diambra.ai/discord)
+
## MineRL