import numpy as np

from gym import Env, spaces
from gym.utils import seeding

def categorical_sample(prob_n, np_random):
    """
    Sample an index from a categorical distribution.

    prob_n holds the class probabilities; np_random is a random number
    generator that provides rand().
    """
    prob_n = np.asarray(prob_n)
    csprob_n = np.cumsum(prob_n)
    # Index of the first cumulative probability that exceeds a uniform draw.
    return (csprob_n > np_random.rand()).argmax()
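
# Illustrative sketch, not part of the original module: the same
# cumulative-sum idea as categorical_sample, written with the standard
# library only and with the uniform draw passed in, so the chosen index can
# be checked by hand. The name `categorical_sample_from_uniform` and the
# parameter `u` are hypothetical and exist only for this example.
from itertools import accumulate


def categorical_sample_from_uniform(prob_n, u):
    """Return the first index whose cumulative probability exceeds u."""
    for i, cumulative in enumerate(accumulate(prob_n)):
        if cumulative > u:
            return i
    # Mirrors argmax over an all-False array, which returns 0.
    return 0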

class DiscreteEnv(Env):
    """
    Has the following members:
    - nS: number of states
    - nA: number of actions
    - P: transitions (*)
    - isd: initial state distribution (**)

    (*) dictionary of lists, where
      P[s][a] == [(probability, nextstate, reward, done), ...]
    (**) list or array of length nS
    """

    def __init__(self, nS, nA, P, isd):
        self.P = P
        self.isd = isd
        self.lastaction = None  # for rendering
        self.nS = nS
        self.nA = nA

        self.action_space = spaces.Discrete(self.nA)
        self.observation_space = spaces.Discrete(self.nS)

        self.seed()
        self.s = categorical_sample(self.isd, self.np_random)

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        self.s = categorical_sample(self.isd, self.np_random)
        self.lastaction = None
        return int(self.s)

    def step(self, a):
        # Possible outcomes of taking action a in the current state.
        transitions = self.P[self.s][a]
        # Pick one outcome according to its probability.
        i = categorical_sample([t[0] for t in transitions], self.np_random)
        p, s, r, d = transitions[i]
        self.s = s
        self.lastaction = a
        return (int(s), r, d, {"prob": p})
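
# Illustrative sketch, not part of the original module: a check that a
# transition table has the layout described in the DiscreteEnv docstring,
# i.e. P[s][a] is a list of (probability, nextstate, reward, done) tuples.
# It assumes the probabilities for each (state, action) pair are meant to
# sum to 1, which the sampling in step() relies on but the module does not
# enforce. The names `check_transitions` and `EXAMPLE_P` and the tolerance
# `tol` are hypothetical and exist only for this example.
import math


def check_transitions(P, nS, nA, tol=1e-8):
    """Raise ValueError if P does not match the documented layout."""
    for s in range(nS):
        for a in range(nA):
            transitions = P[s][a]
            total = sum(prob for prob, _, _, _ in transitions)
            if not math.isclose(total, 1.0, abs_tol=tol):
                raise ValueError(f"P[{s}][{a}] probabilities sum to {total}, not 1")
            for _, next_state, _, _ in transitions:
                if not 0 <= next_state < nS:
                    raise ValueError(f"P[{s}][{a}] leads to unknown state {next_state}")


# A two-state, one-action chain: from state 0 the action reaches state 1
# with probability 0.75 and stays in state 0 otherwise; state 1 is terminal.
EXAMPLE_P = {
    0: {0: [(0.75, 1, 1.0, True), (0.25, 0, 0.0, False)]},
    1: {0: [(1.0, 1, 0.0, True)]},
}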