Creating an RL environment is like designing a video game or simulation. Before writing any code, you need to think through the learning problem you want to solve. This design phase is crucial - a poorly designed environment will make learning difficult or impossible, no matter how good your algorithm is.
For our tutorial example, we'll create a simple GridWorld environment:
- **🎯 Skill**: Navigate efficiently to a target location
- **👀 Information**: Agent position and target position on a grid
- **🎮 Actions**: Move up, down, left, or right
- **🏆 Success**: Reach the target in minimum steps
- **⏰ End**: When agent reaches target (or optional time limit)
This provides a clear learning problem that's simple enough to understand but non-trivial to solve optimally.
---
This page gives a short, complete walkthrough of creating a custom environment with Gymnasium. For a longer tutorial that also covers rendering, see the [complete tutorial](../tutorials/gymnasium_basics/environment_creation).
We recommend that you familiarise yourself with the [basic usage](basic_usage) before reading this page!
We will implement our GridWorld game as a 2-dimensional square grid of fixed size. The agent can move vertically or horizontally between grid cells in each timestep, and the goal is to navigate to a target that has been placed randomly at the beginning of the episode.
Like all environments, our custom environment will inherit from :class:`gymnasium.Env`, which defines the structure all environments must follow. One of the requirements is defining the observation and action spaces, which declare what inputs (actions) and outputs (observations) are valid for this environment.
As outlined in our design, our agent has four discrete actions (move in the cardinal directions), so we'll use the ``Discrete(4)`` space.
For our observation, we have several options. We could represent the full grid as a 2D array, or use coordinate positions, or even a 3D array with separate "layers" for agent and target. For this tutorial, we'll use a simple dictionary format like ``{"agent": array([1, 0]), "target": array([0, 3])}`` where the arrays represent x,y coordinates.
This choice makes the observation human-readable and easy to debug. We'll declare this as a :class:`Dict` space with the agent and target spaces being :class:`Box` spaces that contain integer coordinates.
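As a minimal sketch (the ``size`` parameter and the internal ``self._agent_location`` / ``self._target_location`` attributes are naming choices for this tutorial), the constructor might declare these spaces like this:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class GridWorldEnv(gym.Env):
    def __init__(self, size: int = 5):
        self.size = size  # side length of the square grid

        # Internal (x, y) positions, filled in by reset()
        self._agent_location = np.array([-1, -1], dtype=np.int64)
        self._target_location = np.array([-1, -1], dtype=np.int64)

        # Observations are dictionaries with the agent's and target's location
        self.observation_space = spaces.Dict(
            {
                "agent": spaces.Box(0, size - 1, shape=(2,), dtype=np.int64),
                "target": spaces.Box(0, size - 1, shape=(2,), dtype=np.int64),
            }
        )

        # Four discrete actions: right, up, left, down
        self.action_space = spaces.Discrete(4)

        # Map each action index to a movement on the grid
        self._action_to_direction = {
            0: np.array([1, 0]),   # right
            1: np.array([0, 1]),   # up
            2: np.array([-1, 0]),  # left
            3: np.array([0, -1]),  # down
        }
```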
Since we need to compute observations in both :meth:`Env.reset` and :meth:`Env.step`, it's convenient to have a helper method ``_get_obs`` that translates the environment's internal state into the observation format. This keeps our code DRY (Don't Repeat Yourself) and makes it easier to modify the observation format later.
We can also implement a similar method for auxiliary information returned by :meth:`Env.reset` and :meth:`Env.step`. In our case, we'll provide the Manhattan distance between agent and target - this can be useful for debugging and understanding agent progress, but shouldn't be used by the learning algorithm itself.
Sometimes info will contain data that's only available inside :meth:`Env.step` (like individual reward components, action success/failure, etc.). In those cases, we'd update the dictionary returned by ``_get_info`` directly in the step method.
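Continuing the class above, a sketch of both helpers might look like this:

```python
    # Methods of GridWorldEnv, continuing the class above
    def _get_obs(self):
        """Translate internal state into the observation dictionary."""
        return {"agent": self._agent_location, "target": self._target_location}

    def _get_info(self):
        """Auxiliary diagnostics: the Manhattan (L1) distance to the target."""
        return {
            "distance": np.linalg.norm(
                self._agent_location - self._target_location, ord=1
            )
        }
```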
The :meth:`reset` method starts a new episode. It takes two optional parameters: ``seed`` for reproducible random generation and ``options`` for additional configuration. The first thing it must do is call ``super().reset(seed=seed)`` to properly initialize the random number generator.
In our GridWorld environment, :meth:`reset` randomly places the agent and target on the grid, ensuring they don't start in the same location. We return both the initial observation and info as a tuple.
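Under the assumptions above, a sketch of :meth:`reset` could look like this:

```python
    # Method of GridWorldEnv, continuing the class above
    def reset(self, seed: int | None = None, options: dict | None = None):
        # Seed self.np_random so random placement is reproducible
        super().reset(seed=seed)

        # Place the agent uniformly at random on the grid
        self._agent_location = self.np_random.integers(
            0, self.size, size=2, dtype=np.int64
        )

        # Resample the target until it differs from the agent's position
        self._target_location = self._agent_location
        while np.array_equal(self._target_location, self._agent_location):
            self._target_location = self.np_random.integers(
                0, self.size, size=2, dtype=np.int64
            )

        return self._get_obs(), self._get_info()
```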
The :meth:`step` method contains the core environment logic. It takes an action, updates the environment state, and returns the results. This is where the physics, game rules, and reward logic live.
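For our GridWorld, a sketch of :meth:`step` moves the agent, clips the move to the grid, and ends the episode at the target. One simple reward choice (not the only option) is a sparse ``+1`` on success and ``0`` otherwise:

```python
    # Method of GridWorldEnv, continuing the class above
    def step(self, action):
        # Map the action index (0-3) to a direction and clip to the grid bounds
        direction = self._action_to_direction[action]
        self._agent_location = np.clip(
            self._agent_location + direction, 0, self.size - 1
        )

        # The episode ends when the agent reaches the target
        terminated = np.array_equal(self._agent_location, self._target_location)
        truncated = False  # time limits are better handled by a TimeLimit wrapper
        reward = 1 if terminated else 0  # sparse binary reward (one possible choice)

        return self._get_obs(), reward, terminated, truncated, self._get_info()
```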
While you can use your custom environment immediately, it's more convenient to register it with Gymnasium so you can create it with :meth:`gymnasium.make` just like built-in environments.
The environment ID has three components: an optional namespace (here: ``gymnasium_env``), a mandatory name (here: ``GridWorld``), and an optional but recommended version (here: ``v0``). You could register it as ``GridWorld-v0``, ``GridWorld``, or ``gymnasium_env/GridWorld``, but the full format is recommended for clarity.
Since this tutorial isn't part of a Python package, we pass the class directly as the entry point. In real projects, you'd typically use a string like ``"my_package.envs:GridWorldEnv"``.
For a more complete guide on registering custom environments (including with string entry points), please read the full [create environment](../tutorials/gymnasium_basics/environment_creation) tutorial.
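As a sketch, registering the class directly looks like this (``max_episode_steps`` is optional and wraps the environment in a ``TimeLimit``):

```python
import gymnasium as gym

gym.register(
    id="gymnasium_env/GridWorld-v0",
    entry_point=GridWorldEnv,  # pass the class directly (no package needed)
    max_episode_steps=300,     # optional: adds a TimeLimit wrapper automatically
)
```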
Once registered, you can check all available environments with :meth:`gymnasium.pprint_registry` and create instances with :meth:`gymnasium.make`. You can also create vectorized versions with :meth:`gymnasium.make_vec`.
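A quick usage sketch (the exact printout will depend on your installed version and registered environments):

```python
import gymnasium as gym

gym.pprint_registry()  # lists every registered environment ID, including ours

env = gym.make("gymnasium_env/GridWorld-v0")
print(env.observation_space)  # Dict('agent': Box(...), 'target': Box(...))
print(env.action_space)       # Discrete(4)

# Several copies of the environment stepped together
vec_env = gym.make_vec("gymnasium_env/GridWorld-v0", num_envs=3)
```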
Sometimes you want to modify your environment's behavior without changing the core implementation. Wrappers are perfect for this - they let you add functionality like changing observation formats, adding time limits, or modifying rewards without touching your original environment code.
This is particularly useful when working with algorithms that expect specific input formats (like neural networks that need 1D arrays instead of dictionaries).
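For instance, Gymnasium's built-in :class:`FlattenObservation` wrapper turns our Dict observation into a single flat array; a quick sketch:

```python
import gymnasium as gym
from gymnasium.wrappers import FlattenObservation

env = gym.make("gymnasium_env/GridWorld-v0")
print(env.observation_space)      # Dict('agent': Box(...), 'target': Box(...))

# Flatten the Dict observation into a single 1D Box, as most networks expect
wrapped = FlattenObservation(env)
print(wrapped.observation_space)  # Box of shape (4,)
obs, info = wrapped.reset()
print(obs)                        # e.g. [3 0 2 1]: agent (x, y) then target (x, y)
```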
## Advanced Environment Features
Once you have the basics working, you might want to add more sophisticated features:
### Adding Rendering
A minimal ASCII renderer (a sketch, assuming the agent and target positions are stored in ``self._agent_location`` and ``self._target_location`` as above) might look like this:

```python
    # Method of GridWorldEnv, continuing the class above
    def render(self):
        """Render a simple ASCII view of the grid for human viewing."""
        if self.render_mode == "human":
            for y in range(self.size - 1, -1, -1):  # print the top row first
                row = [
                    "A" if np.array_equal(self._agent_location, [x, y])        # agent
                    else "T" if np.array_equal(self._target_location, [x, y])  # target
                    else "."
                    for x in range(self.size)
                ]
                print(" ".join(row))
```
## Real-World Environment Design Tips
### Start Simple, Add Complexity Gradually
1. **First**: Get basic movement and goal-reaching working
2. **Then**: Add obstacles, multiple goals, or time pressure
3. **Finally**: Add complex dynamics, partial observability, or multi-agent interactions
### Design for Learning
- **Clear Success Criteria**: Agent should know when it's doing well
- **Reasonable Difficulty**: Not too easy (trivial) or too hard (impossible)
- **Consistent Rules**: Same action in same state should have same effect
- **Informative Observations**: Include everything needed for optimal decisions
### Think About Your Research Question
- **Navigation**: Focus on spatial reasoning and path planning
- **Control**: Emphasize dynamics, stability, and continuous actions
- **Strategy**: Include partial information, opponent modeling, or long-term planning
- **Optimization**: Design clear trade-offs and resource constraints
## Next Steps
Congratulations! You now know how to create custom RL environments. Here's what to explore next:
1. **Add rendering** to visualize your environment ([complete tutorial](../tutorials/gymnasium_basics/environment_creation))
2. **Train an agent** on your custom environment ([training guide](train_agent))
3. **Experiment with different reward functions** to see how they affect learning
4. **Try wrapper combinations** to modify your environment's behavior
5. **Create more complex environments** with obstacles, multiple agents, or continuous actions
The key to good environment design is iteration - start simple, test thoroughly, and gradually add complexity as needed for your research or application goals.