Gymnasium/gymnasium/functional.py

"""Base class and definitions for an alternative, functional backend for gym envs, particularly suitable for hardware accelerated and otherwise transformed environments."""

from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar

import numpy as np

from gymnasium import Space


StateType = TypeVar("StateType")
ActType = TypeVar("ActType")
ObsType = TypeVar("ObsType")
RewardType = TypeVar("RewardType")
TerminalType = TypeVar("TerminalType")
RenderStateType = TypeVar("RenderStateType")
Params = TypeVar("Params")


class FuncEnv(
    Generic[
        StateType, ObsType, ActType, RewardType, TerminalType, RenderStateType, Params
    ]
):
    """Base class (template) for functional envs.

    This API is meant to be used in a stateless manner, with the environment state being passed around explicitly.
    That being said, nothing here prevents users from using the environment statefully; it is just not recommended.
    A functional env consists of the following functions (in this case, instance methods):

     * initial: returns the initial state of the POMDP
     * observation: returns the observation in a given state
     * transition: returns the next state after taking an action in a given state
     * reward: returns the reward for a given (state, action, next_state) tuple
     * terminal: returns whether a given state is terminal
     * state_info: optional, returns a dict of info about a given state
     * transition_info: optional, returns a dict of info about a given (state, action, next_state) tuple

    The class-based structure allows environment constants to be defined on the class
    and then referenced by name in the code itself.

    For the moment, this is predominantly for internal use. This API is likely to change, but in the future
    we intend to flesh it out and officially expose it to end users.
    """

    observation_space: Space
    action_space: Space

    def __init__(self, options: dict[str, Any] | None = None):
        """Initialize the environment constants."""
        self.__dict__.update(options or {})
        self.default_params = self.get_default_params()

    def initial(self, rng: Any, params: Params | None = None) -> StateType:
        """Generates the initial state of the environment with a random number generator."""
        raise NotImplementedError

    def transition(
        self, state: StateType, action: ActType, rng: Any, params: Params | None = None
    ) -> StateType:
        """Updates (transitions) the state with an action and random number generator."""
        raise NotImplementedError

    def observation(
        self, state: StateType, rng: Any, params: Params | None = None
    ) -> ObsType:
        """Generates an observation for a given state of an environment."""
        raise NotImplementedError

    def reward(
        self,
        state: StateType,
        action: ActType,
        next_state: StateType,
        rng: Any,
        params: Params | None = None,
    ) -> RewardType:
        """Computes the reward for the transition from `state` to `next_state` under `action`."""
        raise NotImplementedError

    def terminal(
        self, state: StateType, rng: Any, params: Params | None = None
    ) -> TerminalType:
        """Returns whether the state is a terminal state."""
        raise NotImplementedError

    def state_info(self, state: StateType, params: Params | None = None) -> dict:
        """Info dict about a single state."""
        return {}

    def transition_info(
        self,
        state: StateType,
        action: ActType,
        next_state: StateType,
        params: Params | None = None,
    ) -> dict:
        """Info dict about a full transition."""
        return {}

    def transform(self, func: Callable[[Callable], Callable]):
        """Applies the functional transformation `func` to each core environment method in place."""
        self.initial = func(self.initial)
        self.transition = func(self.transition)
        self.observation = func(self.observation)
        self.reward = func(self.reward)
        self.terminal = func(self.terminal)
        self.state_info = func(self.state_info)
        self.transition_info = func(self.transition_info)

    def render_image(
        self,
        state: StateType,
        render_state: RenderStateType,
        params: Params | None = None,
    ) -> tuple[RenderStateType, np.ndarray]:
        """Renders the state, returning the updated render state and the image."""
        raise NotImplementedError

    def render_init(self, params: Params | None = None, **kwargs) -> RenderStateType:
        """Initialize the render state."""
        raise NotImplementedError

    def render_close(self, render_state: RenderStateType, params: Params | None = None):
        """Close the render state."""
        raise NotImplementedError

    def get_default_params(self, **kwargs) -> Params | None:
        """Get the default params."""
        return None
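
The stateless usage described in the class docstring can be sketched as follows. This is a minimal illustration, not part of the library: `CounterEnv` and `rollout` are hypothetical names, the toy class only mirrors the `FuncEnv` method layout rather than subclassing it (so the sketch runs without gymnasium installed), and a standard-library `random.Random` stands in for whatever `rng` object a real backend would pass.

```python
from __future__ import annotations

import random
from typing import Any


class CounterEnv:
    """Hypothetical toy env following the FuncEnv method layout.

    In real code this would subclass FuncEnv; it is kept standalone here
    so the sketch has no third-party dependencies.
    """

    goal = 3  # environment constant, referenced by name below

    def initial(self, rng: Any, params: Any = None) -> int:
        # The state is just an integer counter starting at zero.
        return 0

    def transition(self, state: int, action: int, rng: Any, params: Any = None) -> int:
        # Pure function: returns a new state instead of mutating anything.
        return state + action

    def observation(self, state: int, rng: Any, params: Any = None) -> int:
        # Fully observable: the observation is the state itself.
        return state

    def reward(
        self, state: int, action: int, next_state: int, rng: Any, params: Any = None
    ) -> float:
        return 1.0 if next_state == self.goal else 0.0

    def terminal(self, state: int, rng: Any, params: Any = None) -> bool:
        return state >= self.goal


def rollout(env: CounterEnv, actions: list[int], seed: int = 0):
    """Run one episode, passing the state around explicitly."""
    rng = random.Random(seed)
    state = env.initial(rng)
    observations = [env.observation(state, rng)]
    total_reward = 0.0
    for action in actions:
        next_state = env.transition(state, action, rng)
        total_reward += env.reward(state, action, next_state, rng)
        state = next_state
        observations.append(env.observation(state, rng))
        if env.terminal(state, rng):
            break
    return observations, total_reward


observations, total_reward = rollout(CounterEnv(), [1, 1, 1, 1])
```

Because the env object holds no episode state, the same instance can be reused across rollouts; only the `state` value threaded through `rollout` changes.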