Initial release. Hello world :).

This commit is contained in:
Greg Brockman
2016-04-27 08:00:58 -07:00
commit e8f2980603
97 changed files with 6500 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,31 @@
*.swp
*.pyc
*.py~
.DS_Store
# Setuptools distribution and build folders.
/dist/
/build
# Virtualenv
/env
# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
*.sublime-project
*.sublime-workspace
logs/
.ipynb_checkpoints
ghostdriver.log
junk
MUJOCO_LOG.txt
mujoco-bundle
rllab_mujoco
tutorial/*.html

.travis.yml Normal file

@@ -0,0 +1,32 @@
dist: trusty
sudo: required
cache:
apt: true
pip: false
language: python
python:
- "2.7"
# - "3.2"
# Install numpy and scipy so we don't need to compile them
addons:
apt:
packages:
- python-numpy
- python-matplotlib
- python-tk
before_install:
- Xvfb :12 -screen 0 800x600x24 +extension RANDR &
- mkdir -p ~/.mujoco
- curl https://openai-public.s3-us-west-2.amazonaws.com/mujoco/$MUJOCO_KEY_BUNDLE.tar.gz | tar xz -C ~/.mujoco
env:
- DISPLAY=:12
install: pip install -r requirements.txt
script: nose2
notifications:
slack:
secure: h/Mxm8K+avH/2W0818zCHmLloRPMFN4NJL01+VShvAkH80/acfjeq/+mMdWXXPL/oOB6kSHDk+GDhwR6+s03ZcPMn5INTFvFYqUc6UWmT+NXtOPxGTN0xda6MdYUkWQUKaMyjFrweZQOMOASFBIzPOq4XeVbM5aB8s4EJhnfAcYZhp/idwKbToVihN4KZgxlvZIFc8iEp1o9uSl5qrsaeYYYXRkb6mauacAwOo4/Chu+cOnoLUOnvhBFE3rV3doDNrbnoalO8XiExtgx5CIAYWrlMni7r2Q+LlzgwdyTH19ZtybPxJTZIIWSBQ2UtcoYdIEDcc36GcUwz1VUGg32mLJJnY2xw80CWR4ixFPpLwwP5Y99WTn8v094B4nmFTWOwNWXp3EkqtTN9XcJoRBqXB5ArucIPqrx57dOCljSKx22gL6WaF2p3stSAxIGFektGyGnisaELrFZG1C63aHoUPicj3gUlijmAoUmYaDRf6P1wnpXqBpKDAWWhAMSatvx1ekmEJgR7OQklQnnfjx9kENDUygNUWS4IQwN2qYieuzHFL3of7/30mTM43+Vt/vWN8GI7j01BXu6FNGGloHxjH1pt3bLP/+uj5BJsT2HWF+Z8XR4VE6cyVuKsQAFgCXwOkoDHALbcwsspONDIt/9ixkesgh1oFt4CzU3UuU5wYs=
on_success: change

CODE_OF_CONDUCT.rst Normal file

@@ -0,0 +1,13 @@
OpenAI Gym is dedicated to providing a harassment-free experience for
everyone, regardless of gender, gender identity and expression, sexual
orientation, disability, physical appearance, body size, age, race, or
religion. We do not tolerate harassment of participants in any form.
This code of conduct applies to all OpenAI Gym spaces (including Gist
comments) both online and off. Anyone who violates this code of
conduct may be sanctioned or expelled from these spaces at the
discretion of the OpenAI team.
We may add additional rules over time, which will be made clearly
available to participants. Participants are responsible for knowing
and abiding by these rules.

Dockerfile Normal file

@@ -0,0 +1,35 @@
# A Dockerfile that sets up a full Gym install
FROM ubuntu:14.04
RUN apt-get update \
&& apt-get install -y xorg-dev \
libgl1-mesa-dev \
xvfb \
libxinerama1 \
libxcursor1 \
libglu1-mesa \
libav-tools \
python-numpy \
python-scipy \
python-pyglet \
python-setuptools \
libpq-dev \
libjpeg-dev \
curl \
cmake \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& easy_install pip
WORKDIR /usr/local/gym
RUN mkdir gym && touch gym/__init__.py
COPY ./gym/version.py ./gym
COPY ./requirements.txt .
COPY ./setup.py .
RUN pip install -r requirements.txt
# Finally, upload our actual code!
COPY . /usr/local/gym
WORKDIR /root
ENTRYPOINT ["/usr/local/gym/bin/docker_entrypoint"]

Makefile Normal file

@@ -0,0 +1,7 @@
.PHONY: install test
install:
pip install -r requirements.txt
test:
nose2

README.rst Normal file

@@ -0,0 +1,208 @@
gym
******
**OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.** This is the ``gym`` open-source library, which gives you access to an ever-growing variety of environments.
``gym`` makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as Tensorflow or Theano. You can use it from Python code, and soon from other languages.
If you're not sure where to start, we recommend beginning with the
`docs <https://gym.openai.com/docs>`_ on our site.
.. contents:: **Contents of this document**
:depth: 2
Basics
======
There are two basic concepts in reinforcement learning: the
environment (namely, the outside world) and the agent (namely, the
algorithm you are writing). The agent sends `actions` to the
environment, and the environment replies with `observations` and
`rewards` (that is, a score).
The core `gym` interface is `Env
<https://github.com/openai/gym/blob/master/gym/core.py>`_, which is
the unified environment interface. There is no interface for agents;
that part is left to you. The following are the ``Env`` methods you
should know:
- `reset(self)`: Reset the environment's state. Returns `observation`.
- `step(self, action)`: Step the environment by one timestep. Returns `observation`, `reward`, `done`, `info`.
- `render(self, mode='human', close=False)`: Render one frame of the environment. The default mode will do something human friendly, such as pop up a window. Passing the `close` flag signals the renderer to close any such windows.
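A minimal agent-environment loop built from these methods (a sketch using
``CartPole-v0``, which ships with this release):

.. code:: python

    import gym
    env = gym.make('CartPole-v0')
    observation = env.reset()
    for _ in range(100):
        env.render()
        action = env.action_space.sample()  # take a random action
        observation, reward, done, info = env.step(action)
        if done:
            observation = env.reset()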
Installation
============
You can perform a minimal install of ``gym`` with:
.. code:: shell
git clone git@github.com:openai/gym.git
cd gym
pip install -e .
You'll be able to run a few environments right away:
- `algorithmic <https://gym.openai.com/envs#algorithmic>`_
- `toy_text <https://gym.openai.com/envs#toy_text>`_
- `classic_control <https://gym.openai.com/envs#classic_control>`_ (you'll need ``pyglet`` to render though)
We recommend playing with those environments at first, and then later
installing the dependencies for the remaining environments.
Installing everything
---------------------
Once you're ready to install everything, run ``pip install -e .[all]``.
MuJoCo has a proprietary dependency we can't set up for you. Follow
the
`instructions <https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
in the ``mujoco-py`` package for help.
For the install to succeed, you'll need to have some system packages
installed. We'll build out the list here over time; please let us know
what you end up installing on your platform.
On Ubuntu 14.04:
.. code:: shell
apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl
Supported systems
-----------------
We currently support Python 2.7 on Linux and OS X.
We will expand support to Python 3 and Windows based on demand. We
will also soon ship a Docker container exposing OpenAI Gym as an API
callable from any platform.
Pip version
-----------
To run ``pip install -e .[all]``, you'll need a semi-recent pip.
Please make sure your pip is at least at version ``1.5.0``. You can
upgrade using the following: ``pip install --ignore-installed
pip``. Alternatively, you can open `setup.py
<https://github.com/openai/gym/blob/master/setup.py>`_ and
install the dependencies by hand.
Installing dependencies for specific environments
-------------------------------------------------
If you'd like to install the dependencies for only specific
environments, see `setup.py
<https://github.com/openai/gym/blob/master/setup.py>`_. We
maintain the lists of dependencies on a per-environment group basis.
Environments
============
The code for each environment group is housed in its own subdirectory
`gym/envs
<https://github.com/openai/gym/blob/master/gym/envs>`_. The
specification of each task is in `gym/envs/__init__.py
<https://github.com/openai/gym/blob/master/gym/envs/__init__.py>`_. It's
worth browsing through both.
Algorithmic
-----------
These are a variety of algorithmic tasks, such as learning to copy a
sequence.
.. code:: python
import gym
env = gym.make('Copy-v0')
env.reset()
env.render()
Atari
-----
The Atari environments are a variety of Atari video games. If you didn't do the full install, you can install dependencies via ``pip install -e .[atari]`` and then get started as follows:
.. code:: python
import gym
env = gym.make('SpaceInvaders-v0')
env.reset()
env.render()
This will install ``atari-py``, which automatically compiles the `Arcade Learning Environment <http://www.arcadelearningenvironment.org/>`_. This can take quite a while (a few minutes on a decent laptop), so just be prepared.
Board games
-----------
The board game environments are a variety of board games. If you didn't do the full install, you can install dependencies via ``pip install -e .[board_game]`` and then get started as follows:
.. code:: python
import gym
env = gym.make('Go9x9-v0')
env.reset()
env.render()
Classic control
---------------
These are a variety of classic control tasks, which would appear in a typical reinforcement learning textbook. If you didn't do the full install, you will need to run ``pip install -e .[classic_control]`` to enable rendering. You can get started with them via:
.. code:: python
import gym
env = gym.make('CartPole-v0')
env.reset()
env.render()
MuJoCo
------
`MuJoCo <http://www.mujoco.org/>`_ is a physics engine which can do
very detailed, efficient simulations with contacts. It's not
open-source, so you'll have to follow the instructions in `mujoco-py
<https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
to set it up. You'll have to also run ``pip install -e .[mujoco]`` if you didn't do the full install.
.. code:: python
import gym
env = gym.make('Humanoid-v0')
env.reset()
env.render()
Toy text
--------
Toy environments which are text-based. There's no extra dependency to install, so to get started, you can just do:
.. code:: python
import gym
env = gym.make('FrozenLake-v0')
env.reset()
env.render()
Examples
========
See the ``examples`` directory.
- Run `examples/agents/random_agent.py <https://github.com/openai/gym/blob/master/examples/agents/random_agent.py>`_ to run a simple random agent and upload the results to the scoreboard.
- Run `examples/agents/cem.py <https://github.com/openai/gym/blob/master/examples/agents/cem.py>`_ to run an actual learning agent (using the cross-entropy method) and upload the results to the scoreboard.
- Run `examples/scripts/list_envs <https://github.com/openai/gym/blob/master/examples/scripts/list_envs>`_ to generate a list of all environments. (You can also just `browse <https://gym.openai.com/docs>`_ the list on our site.)
- Run `examples/scripts/upload <https://github.com/openai/gym/blob/master/examples/scripts/upload>`_ to upload the recorded output from ``random_agent.py`` or ``cem.py``. Make sure to obtain an `API key <https://gym.openai.com/settings/profile>`_.
Testing
=======
We are using `nose2 <https://github.com/nose-devs/nose2>`_ for tests. You can run them via
.. code:: shell
nose2
You can also run tests in a specific directory by using the ``-s`` option, or by passing in the specific name of the test. See the `nose2 docs <http://nose2.readthedocs.org/en/latest/usage.html#naming-tests>`_ for more details.
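For example (the directory and dotted test name below are hypothetical; adjust them to your checkout):

.. code:: shell

    nose2 -s gym/envs                # only run tests discovered under one directory
    nose2 gym.envs.tests.test_envs   # run a single test module by dotted name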

bin/docker_entrypoint Executable file

@@ -0,0 +1,12 @@
#!/bin/sh
# This script is the entrypoint for our Docker image.
set -e
# Set up display; otherwise rendering will cause segfaults
rm -f /tmp/.X12-lock
Xvfb :12 -screen 0 800x600x24 +extension RANDR 2>/dev/null &
export DISPLAY=:12
exec "$@"

examples/agents/_policies.py Normal file

@@ -0,0 +1,19 @@
# Support code for cem.py
class BinaryActionLinearPolicy(object):
def __init__(self, theta):
self.w = theta[:-1]
self.b = theta[-1]
def act(self, ob):
y = ob.dot(self.w) + self.b
a = int(y < 0)
return a
class ContinuousActionLinearPolicy(object):
def __init__(self, theta, n_in, n_out):
assert len(theta) == (n_in + 1) * n_out
self.W = theta[0 : n_in * n_out].reshape(n_in, n_out)
self.b = theta[n_in * n_out :].reshape(1, n_out)
def act(self, ob):
a = ob.dot(self.W) + self.b
return a

examples/agents/cem.py Normal file

@@ -0,0 +1,92 @@
import gym
import logging
import numpy as np
import json, sys, cPickle, os
from os import path
from _policies import BinaryActionLinearPolicy # Different file so it can be unpickled
import argparse
def cem(f, th_mean, batch_size, n_iter, elite_frac, initial_std=1.0):
"""
Generic implementation of the cross-entropy method for maximizing a black-box function
f: a function mapping from vector -> scalar
th_mean: initial mean over input distribution
batch_size: number of samples of theta to evaluate per batch
n_iter: number of batches
elite_frac: each batch, select this fraction of the top-performing samples
initial_std: initial standard deviation over parameter vectors
"""
n_elite = int(np.round(batch_size*elite_frac))
th_std = np.ones_like(th_mean) * initial_std
for _ in xrange(n_iter):
ths = np.array([th_mean + dth for dth in th_std[None,:]*np.random.randn(batch_size, th_mean.size)])
ys = np.array([f(th) for th in ths])
elite_inds = ys.argsort()[::-1][:n_elite]
elite_ths = ths[elite_inds]
th_mean = elite_ths.mean(axis=0)
th_std = elite_ths.std(axis=0)
yield {'ys' : ys, 'theta_mean' : th_mean, 'y_mean' : ys.mean()}
def do_rollout(agent, env, num_steps, render=False):
total_rew = 0
ob = env.reset()
for t in xrange(num_steps):
a = agent.act(ob)
(ob, reward, done, _info) = env.step(a)
total_rew += reward
if render and t%3==0: env.render()
if done: break
return total_rew, t+1
if __name__ == '__main__':
logger = logging.getLogger()
logger.setLevel(logging.INFO)
parser = argparse.ArgumentParser()
parser.add_argument('--display', action='store_true')
args = parser.parse_args()
np.random.seed(0)
env = gym.make('CartPole-v0')
params = dict(n_iter=10, batch_size=25, elite_frac = 0.2)
num_steps = 200
# You provide the directory to write to (can be an existing
# directory, but can't contain previous monitor results). You can
# also dump to a tempdir if you'd like: tempfile.mkdtemp().
outdir = '/tmp/cem-agent-results'
env.monitor.start(outdir, force=True)
# Prepare snapshotting
# ----------------------------------------
def writefile(fname, s):
with open(path.join(outdir, fname), 'w') as fh: fh.write(s)
info = {}
info['params'] = params
info['argv'] = sys.argv
info['env_id'] = env.spec.id
# ------------------------------------------
def noisy_evaluation(theta):
agent = BinaryActionLinearPolicy(theta)
rew, T = do_rollout(agent, env, num_steps)
return rew
# Train the agent, and snapshot each stage
for (i, iterdata) in enumerate(
cem(noisy_evaluation, np.zeros(env.observation_space.shape[0]+1), **params)):
print 'Iteration %2i. Episode mean reward: %7.3f'%(i, iterdata['y_mean'])
agent = BinaryActionLinearPolicy(iterdata['theta_mean'])
if args.display: do_rollout(agent, env, 200, render=True)
writefile('agent-%.4i.pkl'%i, cPickle.dumps(agent, -1))
# Write out the env at the end so we store the parameters of this
# environment.
writefile('info.json', json.dumps(info))
env.monitor.close()
logger.info("Successfully ran RandomAgent. Now trying to upload results to the scoreboard. If it breaks, you can always just try re-uploading the same results.")
gym.upload(outdir, algorithm_id='cem')

examples/agents/random_agent.py Normal file

@@ -0,0 +1,50 @@
import logging
import os
import gym
# The world's simplest agent!
class RandomAgent(object):
def __init__(self, action_space):
self.action_space = action_space
def act(self, observation, reward, done):
return self.action_space.sample()
if __name__ == '__main__':
# You can optionally set up the logger. Also fine to set the level
# to logging.DEBUG or logging.WARN if you want to change the
# amount of output.
logger = logging.getLogger()
logger.setLevel(logging.INFO)
env = gym.make('CartPole-v0')
agent = RandomAgent(env.action_space)
# You provide the directory to write to (can be an existing
# directory, but can't contain previous monitor results). You can
# also dump to a tempdir if you'd like: tempfile.mkdtemp().
outdir = '/tmp/random-agent-results'
env.monitor.start(outdir, force=True)
episode_count = 200
max_steps = 100
reward = 0
done = False
for i in xrange(episode_count):
ob = env.reset()
for j in xrange(max_steps):
action = agent.act(ob, reward, done)
ob, reward, done, _ = env.step(action)
if done:
break
# Dump result info to disk
env.monitor.close()
# Upload to the scoreboard. We could also do this from another
# process if we wanted.
logger.info("Successfully ran RandomAgent. Now trying to upload results to the scoreboard. If it breaks, you can always just try re-uploading the same results.")
gym.upload(outdir, algorithm_id='random')

examples/agents/tabular_q_agent.py Normal file

@@ -0,0 +1,44 @@
from collections import defaultdict
import numpy as np
from gym.spaces import discrete
# (UnsupportedSpace, raised below, is assumed to be importable from gym's
# error definitions.)
class TabularQAgent(object):
"""
Agent implementing tabular Q-learning.
"""
def __init__(self, observation_space, action_space, **userconfig):
if not isinstance(observation_space, discrete.Discrete):
raise UnsupportedSpace('Observation space {} incompatible with {}. (Only supports Discrete observation spaces.)'.format(observation_space, self))
if not isinstance(action_space, discrete.Discrete):
raise UnsupportedSpace('Action space {} incompatible with {}. (Only supports Discrete action spaces.)'.format(action_space, self))
self.observation_space = observation_space
self.action_space = action_space
self.action_n = action_space.n
self.config = {
"init_mean" : 0.0, # Initialize Q values with this mean
"init_std" : 0.0, # Initialize Q values with this standard deviation
"learning_rate" : 0.1,
"eps": 0.05, # Epsilon in epsilon greedy policies
"discount": 0.95,
"n_iter": 10000} # Number of iterations
self.config.update(userconfig)
self.q = defaultdict(lambda: self.config["init_std"] * np.random.randn(self.action_n) + self.config["init_mean"])
def act(self, observation, eps=None):
if eps is None:
eps = self.config["eps"]
# epsilon greedy.
action = np.argmax(self.q[observation.item()]) if np.random.random() > eps else self.action_space.sample()
return action
def learn(self, env):
config = self.config
obs = env.reset()
q = self.q
for t in xrange(config["n_iter"]):
action = self.act(obs)
obs2, reward, done, _ = env.step(action)
future = 0.0
if not done:
future = np.max(q[obs2.item()])
q[obs.item()][action] -= \
self.config["learning_rate"] * (q[obs.item()][action] - reward - config["discount"] * future)
obs = obs2
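# A minimal usage sketch (not part of the original file). It assumes a
# Discrete-observation environment such as FrozenLake-v0 from this release,
# and that observations arrive as numpy scalars (the code above calls .item()):
#
#   import gym
#   env = gym.make('FrozenLake-v0')
#   agent = TabularQAgent(env.observation_space, env.action_space, n_iter=5000)
#   agent.learn(env)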

examples/scripts/list_envs Executable file

@@ -0,0 +1,5 @@
#!/usr/bin/env python
from gym import envs
envids = [spec.id for spec in envs.registry.all()]
for envid in sorted(envids):
print(envid)

examples/scripts/play_go Executable file

@@ -0,0 +1,35 @@
#!/usr/bin/env python
import argparse
import pachi_py
import gym
from gym import spaces, envs
from gym.envs.board_game import go
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--raw_actions', action='store_true')
args = parser.parse_args()
env = envs.make('Go9x9-v0')
env.reset()
while True:
s = env._state
env._render()
colorstr = pachi_py.color_to_str(s.color)
if args.raw_actions:
a = int(raw_input('{} (raw)> '.format(colorstr)))
else:
coordstr = raw_input('{}> '.format(colorstr))
a = go.str_to_action(s.board, coordstr)
_, r, done, _ = env.step(a)
if done:
break
print
print 'You win!' if r > 0 else 'Opponent wins!'
print 'Final score:', env._state.board.official_score
if __name__ == '__main__':
main()

examples/scripts/sim_env Executable file

@@ -0,0 +1,69 @@
#!/usr/bin/env python
import gym
from gym import spaces, envs
import argparse
import numpy as np
import itertools
import time
parser = argparse.ArgumentParser()
parser.add_argument("env")
parser.add_argument("--mode", choices=["noop", "random", "static", "human"],
default="random")
parser.add_argument("--max_steps", type=int, default=0)
parser.add_argument("--fps",type=float)
parser.add_argument("--once", action="store_true")
parser.add_argument("--ignore_done", action="store_true")
args = parser.parse_args()
env = envs.make(args.env)
ac_space = env.action_space
fps = args.fps or env.metadata.get('video.frames_per_second') or 100
if args.max_steps == 0: args.max_steps = env.spec.timestep_limit
if args.mode == "human":
if isinstance(ac_space, spaces.Discrete):
print("Press keys 0-{} to choose the agent's actions".format(ac_space.n-1))
import cv2
else:
raise ValueError("Can only use human on discrete action space. Got {}".format(type(ac_space)))
while True:
env.reset()
print("Starting a new trajectory")
for t in xrange(args.max_steps) if args.max_steps else itertools.count():
done = False
if args.mode == "noop":
if isinstance(ac_space, spaces.Box):
a = np.zeros(ac_space.shape)
elif isinstance(ac_space, spaces.Discrete):
a = 0
else:
raise NotImplementedError("noop not implemented for class {}".format(type(ac_space)))
_, _, done, _ = env.step(a)
time.sleep(1.0/fps)
elif args.mode == "random":
a = ac_space.sample()
_, _, done, _ = env.step(a)
time.sleep(1.0/fps)
elif args.mode == "static":
time.sleep(1.0/fps)
elif args.mode == "human":
if t == 0:
a = 0
else:
key = cv2.waitKey(-1)
a = key - ord('0')
if a >= ac_space.n:
print("WARNING: ignoring illegal action {}.".format(a))
a = 0
_, _, done, _ = env.step(a)
env.render()
if done and not args.ignore_done: break
print("Done after {} steps".format(t+1))
if args.once:
break
else:
raw_input("Press enter to continue")

examples/scripts/upload Executable file

@@ -0,0 +1,44 @@
#!/usr/bin/env python
#
# This script assumes you have set an OPENAI_GYM_API_KEY environment
# variable. You can find your API key in the web interface:
# https://gym.openai.com/settings/profile.
import argparse
import logging
import os
import sys
import gym
# In modules, use `logger = logging.getLogger(__name__)`
logger = logging.getLogger()
class Uploader(object):
def __init__(self, training_dir, algorithm_id, writeup):
self.training_dir = training_dir
self.algorithm_id = algorithm_id
self.writeup = writeup
def run(self):
gym.upload(self.training_dir, algorithm_id=self.algorithm_id, writeup=self.writeup)
def main():
parser = argparse.ArgumentParser(description=None)
parser.add_argument('-t', '--training-dir', required=True, help='What directory to upload.')
parser.add_argument('-a', '--algorithm_id', help='Set the algorithm id.')
parser.add_argument('-w', '--writeup', help='Writeup to attach.')
parser.add_argument('-v', '--verbose', action='count', dest='verbosity', default=0, help='Set verbosity.')
args = parser.parse_args()
if args.verbosity == 0:
logger.setLevel(logging.INFO)
elif args.verbosity >= 1:
logger.setLevel(logging.DEBUG)
runner = Uploader(training_dir=args.training_dir, algorithm_id=args.algorithm_id, writeup=args.writeup)
runner.run()
return 0
if __name__ == '__main__':
sys.exit(main())

gym/__init__.py Normal file

@@ -0,0 +1,16 @@
import logging
import sys
from gym.core import Env, Space
from gym.configuration import logger_setup, undo_logger_setup
from gym.envs import make, spec
from gym.scoreboard.api import upload
logger = logging.getLogger(__name__)
# We automatically configure a logger with a simple stderr handler. If
# you'd rather customize logging yourself, run undo_logger_setup.
logger_setup(logger)
del logger_setup
__all__ = ["Env", "Space", "make", "spec", "upload"]

gym/configuration.py Normal file

@@ -0,0 +1,87 @@
import hashlib
import numpy as np
import logging
import os
import random
import struct
import sys
import gym
logger = logging.getLogger(__name__)
root_logger = logging.getLogger()
requests_logger = logging.getLogger('requests')
# Set up the default handler
formatter = logging.Formatter('[%(asctime)s] %(message)s')
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(formatter)
# We need to take in the gym logger explicitly since this is called
# at initialization time.
def logger_setup(gym_logger):
root_logger.addHandler(handler)
gym_logger.setLevel(logging.INFO)
# When set to INFO, this will print out the hostname of every
# connection it makes.
# requests_logger.setLevel(logging.WARN)
def undo_logger_setup():
"""Undoes the automatic logging setup done by OpenAI Gym. You should call
this function if you want to manually configure logging
yourself. Typical usage would involve putting something like the
following at the top of your script:
gym.undo_logger_setup()
logger = logging.getLogger()
logger.addHandler(logging.StreamHandler(sys.stderr))
"""
root_logger.removeHandler(handler)
gym.logger.setLevel(logging.NOTSET)
requests_logger.setLevel(logging.NOTSET)
def seed(a=None):
"""Seeds the 'random' and 'numpy.random' generators. By default,
Python seeds these with the system time. Call this if you are
using multiple processes.
Notes:
SECURITY SENSITIVE: a bug here would allow people to generate fake results. Please let us know if you find one :).
Args:
a (Optional[int, str]): None or no argument seeds from an operating system specific randomness source. If an int or str is passed, then all of its bits are used.
"""
# Adapted from https://svn.python.org/projects/python/tags/r32/Lib/random.py
if a is None:
a = bigint_from_bytes(os.urandom(32))
if isinstance(a, str):
a = a.encode('utf8')
a += hashlib.sha512(a).digest()
a = bigint_from_bytes(a)
# Actually seed the generators
random.seed(a)
np.random.seed(int_list_from_bigint(a))
return a
# TODO: don't hardcode sizeof_int here
def bigint_from_bytes(bytes):
sizeof_int = 4
padding = sizeof_int - len(bytes) % sizeof_int
bytes += '\0' * padding
int_count = len(bytes) / sizeof_int
unpacked = struct.unpack("{}I".format(int_count), bytes)
accum = 0
for i, val in enumerate(unpacked):
accum += 2 ** (sizeof_int * 8 * i) * val
return accum
def int_list_from_bigint(bigint):
ints = []
while bigint > 0:
bigint, mod = divmod(bigint, 2 ** 32)
ints.append(mod)
return ints
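# Usage sketch (not part of the original file): both generators can be
# seeded deterministically, e.g.
#
#   import gym.configuration
#   gym.configuration.seed(42)      # seeds `random` and `numpy.random` from an int
#   gym.configuration.seed('gym')   # a str is mixed with its SHA-512 digest first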

gym/core.py Normal file

@@ -0,0 +1,173 @@
import logging
import numpy as np
from gym import error, monitoring
# Env-related abstractions
class Env(object):
"""The main OpenAI Gym class. It encapsulates an environment with
arbitrary behind-the-scenes dynamics.
When implementing an environment, override the following methods
in your subclass:
_step
_reset
_render
And set the following attributes:
action_space: The Space object corresponding to valid actions
observation_space: The Space object corresponding to valid observations
The methods are accessed publicly as "step", "reset", etc. The
non-underscored versions are wrapper methods to which we may add
functionality over time.
"""
# Set this in SOME subclasses
metadata = {'render.modes': []}
# Set these in ALL subclasses
action_space = None
observation_space = None
# Override in ALL subclasses
def _step(self, action): raise NotImplementedError
def _reset(self): raise NotImplementedError
def _render(self, mode='human', close=False):
if close:
return
raise NotImplementedError
# Will be automatically set when creating an environment via
# 'make'.
spec = None
@property
def monitor(self):
if not hasattr(self, '_monitor'):
self._monitor = monitoring.Monitor(self)
return self._monitor
def step(self, action):
"""
Run one timestep of the environment's dynamics. When end of episode
is reached, the environment will automatically reset its internal state.
Input
-----
action : an action provided by the agent
Outputs
-------
(observation, reward, done, info)
observation (object): agent's observation of the current environment
reward (float) : amount of reward due to the previous action
done (boolean): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
"""
self.monitor._before_step(action)
observation, reward, done, info = self._step(action)
done = self.monitor._after_step(observation, reward, done, info)
return observation, reward, done, info
def reset(self):
"""
Resets the state of the environment and returns an initial observation.
Outputs
-------
observation (object): the initial observation of the space. (Initial reward is assumed to be 0.)
"""
self.monitor._before_reset()
observation = self._reset()
self.monitor._after_reset(observation)
return observation
def render(self, mode='human', close=False):
"""Renders the environment.
The set of supported modes varies per environment. (And some
environments do not support rendering at all.) By convention,
if mode is:
- human: render to the current display or terminal and
return nothing. Usually for human consumption.
- rgb_array: Return a numpy.ndarray with shape (x, y, 3),
representing RGB values for an x-by-y pixel image, suitable
for turning into a video.
- ansi: Return a string (str) or StringIO.StringIO containing a
terminal-style text representation. The text can include newlines
and ANSI escape sequences (e.g. for colors).
Note:
Make sure that your class's metadata 'render.modes' key includes
the list of supported modes. It's recommended to call super()
in implementations to use the functionality of this method.
Args:
mode (str): the mode to render with
close (bool): close all open renderings
Example:
class MyEnv(Env):
metadata = {'render.modes': ['human', 'rgb_array']}
def render(self, mode='human'):
if mode == 'rgb_array':
return np.array(...) # return RGB frame suitable for video
elif mode == 'human':
... # pop up a window and render
else:
super(MyEnv, self).render(mode=mode) # just raise an exception
"""
if close:
return self._render(close=close)
# This code can be useful for calling super() in a subclass.
modes = self.metadata.get('render.modes', [])
if len(modes) == 0:
raise error.UnsupportedMode('{} does not support rendering (requested mode: {})'.format(self, mode))
elif mode not in modes:
raise error.UnsupportedMode('Unsupported rendering mode: {}. (Supported modes for {}: {})'.format(mode, self, modes))
return self._render(mode=mode, close=close)
def __str__(self):
return '<{} instance>'.format(type(self).__name__)
# Space-related abstractions
class Space(object):
"""
Provides a classification of state spaces and action spaces,
so you can write generic code that applies to any Environment,
e.g. to choose a random action.
"""
def sample(self, seed=0):
"""
Uniformly sample a random element of this space
"""
raise NotImplementedError
def contains(self, x):
"""
Return boolean specifying if x is a valid
member of this space
"""
raise NotImplementedError
def to_jsonable(self, sample_n):
"""Convert a batch of samples from this space to a JSONable data type."""
# By default, assume identity is JSONable
return sample_n
def from_jsonable(self, sample_n):
"""Convert a JSONable data type to a batch of samples from this space."""
# By default, assume identity is JSONable
return sample_n
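# Illustrative sketch (not part of the original file): the subclassing
# contract from the Env docstring above, for a trivial one-step environment.
# `spaces.Discrete` here is assumed to come from gym.spaces.
#
#   from gym import spaces
#
#   class CoinGuessEnv(Env):
#       metadata = {'render.modes': []}
#       def __init__(self):
#           self.action_space = spaces.Discrete(2)
#           self.observation_space = spaces.Discrete(2)
#       def _reset(self):
#           self._coin = np.random.randint(2)
#           return self._coin
#       def _step(self, action):
#           reward = 1.0 if action == self._coin else 0.0
#           return self._coin, reward, True, {}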

gym/envs/README.md Normal file

@@ -0,0 +1,20 @@
# Envs
These are the core integrated environments. Note that we may later
restructure any of the files, but will keep the environments available
at the relevant package's top-level. So for example, you should access
`AntEnv` as follows:
```
# Will be supported in future releases
from gym.envs import mujoco
mujoco.AntEnv
```
Rather than:
```
# May break in future releases
from gym.envs.mujoco import ant
ant.AntEnv
```

gym/envs/__init__.py Normal file

@@ -0,0 +1,208 @@
from gym.envs.registration import registry, register, make, spec
# Algorithmic
# ----------------------------------------
register(
id='Copy-v0',
entry_point='gym.envs.algorithmic:CopyEnv',
timestep_limit=200,
reward_threshold=25.0,
)
register(
id='RepeatCopy-v0',
entry_point='gym.envs.algorithmic:RepeatCopyEnv',
timestep_limit=200,
reward_threshold=75.0,
)
register(
id='ReversedAddition-v0',
entry_point='gym.envs.algorithmic:ReversedAdditionEnv',
kwargs={'rows' : 2},
timestep_limit=200,
reward_threshold=25.0,
)
register(
id='ReversedAddition3-v0',
entry_point='gym.envs.algorithmic:ReversedAdditionEnv',
kwargs={'rows' : 3},
timestep_limit=200,
reward_threshold=25.0,
)
register(
id='DuplicatedInput-v0',
entry_point='gym.envs.algorithmic:DuplicatedInputEnv',
timestep_limit=200,
reward_threshold=9.0,
)
register(
id='Reverse-v0',
entry_point='gym.envs.algorithmic:ReverseEnv',
timestep_limit=200,
reward_threshold=25.0,
)
# Classic
# ----------------------------------------
register(
id='CartPole-v0',
entry_point='gym.envs.classic_control:CartPoleEnv',
timestep_limit=200,
reward_threshold=195,
)
register(
id='MountainCar-v0',
entry_point='gym.envs.classic_control:MountainCarEnv',
timestep_limit=200,
)
register(
id='Pendulum-v0',
entry_point='gym.envs.classic_control:PendulumEnv',
timestep_limit=200,
)
register(
id='Acrobot-v0',
entry_point='gym.envs.classic_control:AcrobotEnv',
timestep_limit=200,
)
# Toy Text
# ----------------------------------------
register(
id='FrozenLake-v0',
entry_point='gym.envs.toy_text:FrozenLakeEnv',
kwargs={'map_name' : '4x4'},
timestep_limit=100,
)
register(
id='FrozenLake8x8-v0',
entry_point='gym.envs.toy_text:FrozenLakeEnv',
kwargs={'map_name' : '8x8'},
timestep_limit=200,
)
register(
id='Roulette-v0',
entry_point='gym.envs.toy_text:RouletteEnv',
timestep_limit=100,
)
register(
id='Taxi-v0',
entry_point='gym.envs.toy_text.taxi:TaxiEnv',
timestep_limit=200,
)
# Mujoco
# ----------------------------------------
# 2D
register(
id='Reacher-v0',
entry_point='gym.envs.mujoco:ReacherEnv',
timestep_limit=50
)
register(
id='InvertedPendulum-v0',
entry_point='gym.envs.mujoco:InvertedPendulumEnv',
)
register(
id='InvertedDoublePendulum-v0',
entry_point='gym.envs.mujoco:InvertedDoublePendulumEnv',
)
register(
id='HalfCheetah-v0',
entry_point='gym.envs.mujoco:HalfCheetahEnv',
)
register(
id='Hopper-v0',
entry_point='gym.envs.mujoco:HopperEnv',
)
register(
id='Swimmer-v0',
entry_point='gym.envs.mujoco:SwimmerEnv',
)
register(
id='Walker2d-v0',
entry_point='gym.envs.mujoco:Walker2dEnv',
)
register(
id='Ant-v0',
entry_point='gym.envs.mujoco:AntEnv',
)
register(
id='Humanoid-v0',
entry_point='gym.envs.mujoco:HumanoidEnv',
)
# Atari
# ----------------------------------------
# # print ', '.join(["'{}'".format(name.split('.')[0]) for name in atari_py.list_games()])
for game in ['air_raid', 'alien', 'amidar', 'assault', 'asterix', 'asteroids', 'atlantis',
'bank_heist', 'battle_zone', 'beam_rider', 'berzerk', 'bowling', 'boxing', 'breakout', 'carnival',
'centipede', 'chopper_command', 'crazy_climber', 'demon_attack', 'double_dunk',
'elevator_action', 'enduro', 'fishing_derby', 'freeway', 'frostbite', 'gopher', 'gravitar',
'ice_hockey', 'jamesbond', 'journey_escape', 'kangaroo', 'krull', 'kung_fu_master',
'montezuma_revenge', 'ms_pacman', 'name_this_game', 'phoenix', 'pitfall', 'pong', 'pooyan',
'private_eye', 'qbert', 'riverraid', 'road_runner', 'robotank', 'seaquest', 'skiing',
'solaris', 'space_invaders', 'star_gunner', 'tennis', 'time_pilot', 'tutankham', 'up_n_down',
'venture', 'video_pinball', 'wizard_of_wor', 'yars_revenge', 'zaxxon']:
for obs_type in ['image', 'ram']:
# space_invaders should yield SpaceInvaders-v0 and SpaceInvaders-ram-v0
name = ''.join([g.capitalize() for g in game.split('_')])
if obs_type == 'ram':
name = '{}-ram'.format(name)
register(
id='{}-v0'.format(name),
entry_point='gym.envs.atari:AtariEnv',
kwargs={'game': game, 'obs_type': obs_type},
timestep_limit=10000,
)
# Board games
# ----------------------------------------
register(
id='Go9x9-v0',
entry_point='gym.envs.board_game:GoEnv',
kwargs={
'player_color': 'black',
'opponent': 'pachi:uct:_2400',
'observation_type': 'image3c',
'illegal_move_mode': 'lose',
'board_size': 9,
},
)
register(
id='Go19x19-v0',
entry_point='gym.envs.board_game:GoEnv',
kwargs={
'player_color': 'black',
'opponent': 'pachi:uct:_2400',
'observation_type': 'image3c',
'illegal_move_mode': 'lose',
'board_size': 19,
},
)
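# Usage sketch (not part of the original file): once registered, ids resolve
# through the registry, e.g.
#
#   import gym
#   env = gym.make('SpaceInvaders-ram-v0')  # produced by the Atari loop above
#   # assuming the spec object exposes its registration kwargs:
#   print(gym.spec('CartPole-v0').timestep_limit)  # -> 200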

gym/envs/algorithmic/README.md Normal file

@@ -0,0 +1,3 @@
# Algorithmic tasks
Not yet ready for prime-time. We'll shore these up soon.

gym/envs/algorithmic/__init__.py Normal file

@@ -0,0 +1,5 @@
from gym.envs.algorithmic.copy import CopyEnv
from gym.envs.algorithmic.repeat_copy import RepeatCopyEnv
from gym.envs.algorithmic.duplicated_input import DuplicatedInputEnv
from gym.envs.algorithmic.reverse import ReverseEnv
from gym.envs.algorithmic.reversed_addition import ReversedAdditionEnv

gym/envs/algorithmic/algorithmic_env.py Normal file

@@ -0,0 +1,203 @@
from gym import Env
from gym.spaces import Discrete, Tuple
from gym.utils import colorize
import numpy as np
import random
import StringIO
import sys
import math
hash_base = None
def ha(array):
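# Hash an inp_dim-length position array into a scalar key for the content
# dict (the +5 offset keeps small negative tape positions distinct).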
return (hash_base * (array + 5)).sum()
class AlgorithmicEnv(Env):
metadata = {'render.modes': ['human', 'ansi']}
def __init__(self, inp_dim=1, base=10, chars=False):
global hash_base
hash_base = 50 ** np.arange(inp_dim)
self.base = base
self.last = 10
self.total_reward = 0
self.sum_reward = 0
AlgorithmicEnv.sum_rewards = []
self.chars = chars
self.inp_dim = inp_dim
AlgorithmicEnv.current_length = 2
tape_control = []
self.action_space = Tuple(([Discrete(2 * inp_dim), Discrete(2), Discrete(self.base)]))
self.observation_space = Discrete(self.base + 1)
self.reset()
def _get_obs(self, pos=None):
if pos is None:
pos = self.x
assert(isinstance(pos, np.ndarray) and pos.shape[0] == self.inp_dim)
if ha(pos) not in self.content:
self.content[ha(pos)] = self.base
return self.content[ha(pos)]
def _get_str_obs(self, pos=None):
ret = self._get_obs(pos)
if ret == self.base:
return " "
else:
if self.chars:
return chr(ret + ord('A'))
return str(ret)
def _get_str_target(self, pos=None):
if pos not in self.target:
return " "
else:
ret = self.target[pos]
if self.chars:
return chr(ret + ord('A'))
return str(ret)
def _render_observation(self):
x = self.x
if self.inp_dim == 1:
x_str = "Observation Tape : "
for i in range(-2, self.total_len + 2):
if i == x:
x_str += colorize(self._get_str_obs(np.array([i])), 'green', highlight=True)
else:
x_str += self._get_str_obs(np.array([i]))
x_str += "\n"
return x_str
elif self.inp_dim == 2:
label = "Observation Grid : "
x_str = ""
for j in range(-1, 3):
if j != -1:
x_str += " " * len(label)
for i in range(-2, self.total_len + 2):
if i == x[0] and j == x[1]:
x_str += colorize(self._get_str_obs(np.array([i, j])), 'green', highlight=True)
else:
x_str += self._get_str_obs(np.array([i, j]))
x_str += "\n"
x_str = label + x_str
return x_str
else:
assert(False)
def _render(self, mode='human', close=False):
if close:
# Nothing interesting to close
return
outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout
inp = "Total length of input instance: %d, step: %d\n" % (self.total_len, self.time)
outfile.write(inp)
x, y, action = self.x, self.y, self.last_action
if action is not None:
inp_act, out_act, pred = action
outfile.write("=" * (len(inp) - 1) + "\n")
y_str = "Output Tape : "
target_str = "Targets : "
if action is not None:
if self.chars:
pred_str = chr(pred + ord('A'))
else:
pred_str = str(pred)
x_str = self._render_observation()
max_len = int(self.total_reward) + 1
for i in range(-2, max_len):
if i not in self.target:
y_str += " "
continue
target_str += self._get_str_target(i)
if i < y - 1:
y_str += self._get_str_target(i)
elif i == (y - 1):
if action is not None and out_act == 1:
if pred == self.target[i]:
y_str += colorize(pred_str, 'green', highlight=True)
else:
y_str += colorize(pred_str, 'red', highlight=True)
else:
y_str += self._get_str_target(i)
outfile.write(x_str)
outfile.write(y_str + "\n")
outfile.write(target_str + "\n\n")
if action is not None:
outfile.write("Current reward : %.3f\n" % self.reward)
outfile.write("Cumulative reward : %.3f\n" % self.sum_reward)
move = ""
if inp_act == 0:
move = "left"
elif inp_act == 1:
move = "right"
elif inp_act == 2:
move += "up"
elif inp_act == 3:
move += "down"
outfile.write("Action : Tuple(move over input: %s,\n" % move)
if out_act == 1:
out_act = "True"
else:
out_act = "False"
outfile.write(" write to the output tape: %s,\n" % out_act)
outfile.write(" prediction: %s)\n" % pred_str)
else:
outfile.write("\n" * 5)
return outfile
def _step(self, action):
self.last_action = action
inp_act, out_act, pred = action
done = False
reward = 0.0
# We are outside the sample.
self.time += 1
if self.y not in self.target:
reward = -10.0
done = True
else:
if out_act == 1:
if pred == self.target[self.y]:
reward = 1.0
else:
reward = -0.5
done = True
self.y += 1
if self.y not in self.target:
done = True
if inp_act == 0:
self.x[0] -= 1
elif inp_act == 1:
self.x[0] += 1
elif inp_act == 2:
self.x[1] -= 1
elif inp_act == 3:
self.x[1] += 1
if self.time > self.total_len + self.total_reward + 4:
reward = -1.0
done = True
obs = self._get_obs()
self.reward = reward
self.sum_reward += reward
return (obs, reward, done, {})
def _reset(self):
self.last_action = None
self.x = np.zeros(self.inp_dim).astype(np.int)
self.y = 0
AlgorithmicEnv.sum_rewards.append(self.sum_reward - self.total_reward)
AlgorithmicEnv.sum_rewards = AlgorithmicEnv.sum_rewards[-self.last:]
if len(AlgorithmicEnv.sum_rewards) == self.last and \
min(AlgorithmicEnv.sum_rewards) >= -1.0 and \
AlgorithmicEnv.current_length < 30:
AlgorithmicEnv.current_length += 1
AlgorithmicEnv.sum_rewards = []
self.sum_reward = 0.0
self.time = 0
self.total_len = random.randrange(3) + AlgorithmicEnv.current_length
self.set_data()
return self._get_obs()

gym/envs/algorithmic/copy.py Normal file

@@ -0,0 +1,24 @@
"""
Task is to copy content from the input tape to
the output tape. http://arxiv.org/abs/1511.07275
"""
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha
class CopyEnv(algorithmic_env.AlgorithmicEnv):
def __init__(self, base=5):
algorithmic_env.AlgorithmicEnv.__init__(self,
inp_dim=1,
base=base,
chars=True)
def set_data(self):
self.content = {}
self.target = {}
for i in range(self.total_len):
val = random.randrange(self.base)
self.content[ha(np.array([i]))] = val
self.target[i] = val
self.total_reward = self.total_len

gym/envs/algorithmic/duplicated_input.py Normal file

@@ -0,0 +1,27 @@
"""
Task is to return every second character from the input tape.
http://arxiv.org/abs/1511.07275
"""
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha
class DuplicatedInputEnv(algorithmic_env.AlgorithmicEnv):
def __init__(self, duplication=2, base=5):
self.duplication = duplication
algorithmic_env.AlgorithmicEnv.__init__(self,
inp_dim=1,
base=base,
chars=True)
def set_data(self):
self.content = {}
self.target = {}
copies = int(self.total_len / self.duplication)
for i in range(copies):
val = random.randrange(self.base)
self.target[i] = val
for d in range(self.duplication):
self.content[ha(np.array([i * self.duplication + d]))] = val
self.total_reward = self.total_len / self.duplication

gym/envs/algorithmic/repeat_copy.py Normal file

@@ -0,0 +1,29 @@
"""
Task is to copy content multiple-times from the input tape to
the output tape. http://arxiv.org/abs/1511.07275
"""
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha
class RepeatCopyEnv(algorithmic_env.AlgorithmicEnv):
def __init__(self, base=5):
algorithmic_env.AlgorithmicEnv.__init__(self,
inp_dim=1,
base=base,
chars=True)
self.last = 50
def set_data(self):
self.content = {}
self.target = {}
unique = set()
for i in range(self.total_len):
val = random.randrange(self.base)
self.content[ha(np.array([i]))] = val
self.target[i] = val
self.target[2 * self.total_len - i - 1] = val
self.target[2 * self.total_len + i] = val
self.total_reward = 3.0 * self.total_len + 0.9

gym/envs/algorithmic/reverse.py Normal file

@@ -0,0 +1,27 @@
"""
Task is to reverse content over the input tape.
http://arxiv.org/abs/1511.07275
"""
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha
class ReverseEnv(algorithmic_env.AlgorithmicEnv):
def __init__(self, base=2):
algorithmic_env.AlgorithmicEnv.__init__(self,
inp_dim=1,
base=base,
chars=True)
algorithmic_env.AlgorithmicEnv.current_length = 1
self.last = 50
def set_data(self):
self.content = {}
self.target = {}
for i in range(self.total_len):
val = random.randrange(self.base)
self.content[ha(np.array([i]))] = val
self.target[self.total_len - i - 1] = val
self.total_reward = self.total_len + 0.9

gym/envs/algorithmic/reversed_addition.py Normal file

@@ -0,0 +1,30 @@
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha
class ReversedAdditionEnv(algorithmic_env.AlgorithmicEnv):
def __init__(self, rows=2, base=3):
self.rows = rows
algorithmic_env.AlgorithmicEnv.__init__(self,
inp_dim=2,
base=base,
chars=False)
def set_data(self):
self.content = {}
self.target = {}
curry = 0
for i in range(self.total_len):
vals = []
for k in range(self.rows):
val = random.randrange(self.base)
self.content[ha(np.array([i, k]))] = val
vals.append(val)
total = sum(vals) + curry
self.target[i] = total % self.base
curry = total / self.base
if curry > 0:
self.target[self.total_len] = curry
self.total_reward = self.total_len

gym/envs/atari/__init__.py Normal file

@@ -0,0 +1 @@
from gym.envs.atari.atari_env import AtariEnv

gym/envs/atari/atari_env.py Normal file

@@ -0,0 +1,121 @@
import numpy as np
import os
import gym
from gym import error, spaces
from gym import utils
try:
import atari_py
except ImportError as e:
raise error.DependencyNotInstalled("{}. (HINT: you can install Atari dependencies with 'pip install gym[atari]'.)".format(e))
import logging
logger = logging.getLogger(__name__)
def to_rgb(ale):
(screen_width,screen_height) = ale.getScreenDims()
arr = np.zeros((screen_height, screen_width, 4), dtype=np.uint8)
ale.getScreenRGB(arr) # says rgb but actually bgr
return arr[:,:,[2, 1, 0]].copy()
def to_ram(ale):
ram_size = ale.getRAMSize()
ram = np.zeros((ram_size),dtype=np.uint8)
ale.getRAM(ram)
return ram
class AtariEnv(gym.Env, utils.EzPickle):
metadata = {'render.modes': ['human', 'rgb_array']}
def __init__(self, game='pong', obs_type='ram'):
utils.EzPickle.__init__(self, game, obs_type)
assert obs_type in ('ram', 'image')
game_path = atari_py.get_game_path(game)
if not os.path.exists(game_path):
raise IOError('You asked for game %s but path %s does not exist'%(game, game_path))
self.ale = atari_py.ALEInterface()
self.ale.loadROM(game_path)
self._obs_type = obs_type
self._action_set = self.ale.getMinimalActionSet()
self.viewer = None
(screen_width,screen_height) = self.ale.getScreenDims()
self.action_space = spaces.Discrete(len(self._action_set))
if self._obs_type == 'ram':
self.observation_space = spaces.Box(low=np.zeros(128), high=np.zeros(128)+255)
elif self._obs_type == 'image':
self.observation_space = spaces.Box(low=0, high=255, shape=(screen_height, screen_width, 3))
else:
raise error.Error('Unrecognized observation type: {}'.format(self._obs_type))
def _step(self, a):
reward = 0.0
action = self._action_set[a]
num_steps = np.random.randint(2, 5)
for _ in xrange(num_steps):
reward += self.ale.act(action)
ob = self._get_obs()
return ob, reward, self.ale.game_over(), {}
def _get_image(self):
return to_rgb(self.ale)
def _get_ram(self):
return to_ram(self.ale)
@property
def _n_actions(self):
return len(self._action_set)
def _get_obs(self):
if self._obs_type == 'ram':
return self._get_ram()
elif self._obs_type == 'image':
img = self._get_image()
return img
# return: (states, observations)
def _reset(self):
self.ale.reset_game()
return self._get_obs()
def _render(self, mode='human', close=False):
if close:
if self.viewer is not None:
self.viewer.close()
return
img = self._get_image()
if mode == 'rgb_array':
return img
elif mode == 'human':
from gym.envs.classic_control import rendering
if self.viewer is None:
self.viewer = rendering.SimpleImageViewer()
self.viewer.imshow(img)
def get_action_meanings(self):
return [ACTION_MEANING[i] for i in self._action_set]
ACTION_MEANING = {
0 : "NOOP",
1 : "FIRE",
2 : "UP",
3 : "RIGHT",
4 : "LEFT",
5 : "DOWN",
6 : "UPRIGHT",
7 : "UPLEFT",
8 : "DOWNRIGHT",
9 : "DOWNLEFT",
10 : "UPFIRE",
11 : "RIGHTFIRE",
12 : "LEFTFIRE",
13 : "DOWNFIRE",
14 : "UPRIGHTFIRE",
15 : "UPLEFTFIRE",
16 : "DOWNRIGHTFIRE",
17 : "DOWNLEFTFIRE",
}
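# Usage sketch (not part of the original file): mapping minimal-action
# indices back to names, e.g.
#
#   env = AtariEnv(game='pong', obs_type='image')
#   print(env.get_action_meanings())  # e.g. ['NOOP', 'FIRE', 'RIGHT', ...]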

gym/envs/board_game/__init__.py Normal file

@@ -0,0 +1 @@
from gym.envs.board_game.go import GoEnv

gym/envs/board_game/go.py Normal file

@@ -0,0 +1,233 @@
from gym import error
try:
import pachi_py
except ImportError as e:
# The dependency group [pachi] should match the name in setup.py.
raise error.DependencyNotInstalled('{}. (HINT: you may need to install the Go dependencies via "pip install gym[pachi]".)'.format(e))
import numpy as np
import gym
from gym import spaces
import StringIO
import sys
# The coordinate representation of Pachi (and pachi_py) is defined on a board
# with extra rows and columns on the margin of the board, so positions on the board
# are not numbers in [0, board_size**2) as one would expect. For this Go env, we instead
# use an action representation that does fall in this more natural range.
def _coord_to_action(board, c):
'''Converts Pachi coordinates to actions'''
if c == pachi_py.PASS_COORD: return board.size**2 # pass
if c == pachi_py.RESIGN_COORD: return board.size**2 + 1 # resign
i, j = board.coord_to_ij(c)
return i*board.size + j
def _action_to_coord(board, a):
'''Converts actions to Pachi coordinates'''
if a == board.size**2: return pachi_py.PASS_COORD
if a == board.size**2 + 1: return pachi_py.RESIGN_COORD
return board.ij_to_coord(a // board.size, a % board.size)
def str_to_action(board, s):
return _coord_to_action(board, board.str_to_coord(s))
class GoState(object):
'''
Go game state. Consists of a current player and a board.
Actions are exposed as integers in [0, num_actions), which is different
from Pachi's internal "coord_t" encoding.
'''
def __init__(self, board, color):
'''
Args:
board: current board
color: color of current player
'''
assert color in [pachi_py.BLACK, pachi_py.WHITE], 'Invalid player color'
self.board, self.color = board, color
def act(self, action):
'''
Executes an action for the current player
Returns:
a new GoState with the new board and the player switched
'''
return GoState(
self.board.play(_action_to_coord(self.board, action), self.color),
pachi_py.stone_other(self.color))
def __repr__(self):
return 'To play: {}\n{}'.format(pachi_py.color_to_str(self.color), repr(self.board))
### Adversary policies ###
def random_policy(curr_state, prev_state, prev_action):
b = curr_state.board
legal_coords = b.get_legal_coords(curr_state.color)
return _coord_to_action(b, np.random.choice(legal_coords))
def make_pachi_policy(board, engine_type='uct', threads=1, pachi_timestr=''):
engine = pachi_py.PyPachiEngine(board, engine_type, 'threads=%d' % threads)
def pachi_policy(curr_state, prev_state, prev_action):
if prev_state is not None:
assert engine.curr_board == prev_state.board, 'Engine internal board is inconsistent with provided board. The Pachi engine must be called consistently as the game progresses.'
prev_coord = _action_to_coord(prev_state.board, prev_action)
engine.notify(prev_coord, prev_state.color)
engine.curr_board.play_inplace(prev_coord, prev_state.color)
out_coord = engine.genmove(curr_state.color, pachi_timestr)
out_action = _coord_to_action(curr_state.board, out_coord)
engine.curr_board.play_inplace(out_coord, curr_state.color)
return out_action
return pachi_policy
def _play(black_policy_fn, white_policy_fn, board_size=19):
'''
Samples a trajectory for two player policies.
Args:
black_policy_fn, white_policy_fn: functions that maps a GoState to a move coord (int)
'''
moves = []
prev_state, prev_action = None, None
curr_state = GoState(pachi_py.CreateBoard(board_size), pachi_py.BLACK)
while not curr_state.board.is_terminal:
a = (black_policy_fn if curr_state.color == pachi_py.BLACK else white_policy_fn)(curr_state, prev_state, prev_action)
next_state = curr_state.act(a)
moves.append((curr_state, a, next_state))
prev_state, prev_action = curr_state, a
curr_state = next_state
return moves
class GoEnv(gym.Env):
'''
Go environment. Play against a fixed opponent.
'''
metadata = {"render.modes": ["human", "ansi"]}
def __init__(self, player_color, opponent, observation_type, illegal_move_mode, board_size):
'''
Args:
player_color: Stone color for the agent. Either 'black' or 'white'
opponent: An opponent policy
observation_type: State encoding
illegal_move_mode: What to do when the agent makes an illegal move. Choices: 'raise' or 'lose'
'''
assert isinstance(board_size, int) and board_size >= 1, 'Invalid board size: {}'.format(board_size)
self.board_size = board_size
colormap = {
'black': pachi_py.BLACK,
'white': pachi_py.WHITE,
}
try:
self.player_color = colormap[player_color]
except KeyError:
raise error.Error("player_color must be 'black' or 'white', not {}".format(player_color))
self.opponent_policy = None
self.opponent = opponent
assert observation_type in ['image3c']
self.observation_type = observation_type
assert illegal_move_mode in ['lose', 'raise']
self.illegal_move_mode = illegal_move_mode
# One action for each board position, pass, and resign
self.action_space = spaces.Discrete(self.board_size**2 + 2)
if self.observation_type == 'image3c':
shape = pachi_py.CreateBoard(self.board_size).encode().shape
self.observation_space = spaces.Box(np.zeros(shape), np.ones(shape))
else:
raise error.Error('Unsupported observation type: {}'.format(self.observation_type))
self.reset()
def _reset(self):
self.state = GoState(pachi_py.CreateBoard(self.board_size), pachi_py.BLACK)
# (re-initialize) the opponent
# necessary because a pachi engine is attached to a game via internal data in a board
# so with a fresh game, we need a fresh engine
self._reset_opponent(self.state.board)
# Let the opponent play if it's not the agent's turn
if self.state.color != self.player_color:
self.state = self._exec_opponent_play(self.state, None, None)
assert self.state.color == self.player_color
self.done = self.state.board.is_terminal
return self.state.board.encode()
def _render(self, mode="human", close=False):
if close:
return
outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout
outfile.write(repr(self.state) + '\n')
return outfile
def _step(self, action):
assert self.state.color == self.player_color
# If already terminal, then don't do anything
if self.done:
return self.state.board.encode(), 0., True, {'state': self.state}
# Play
prev_state = self.state
try:
self.state = self.state.act(action)
except pachi_py.IllegalMove:
if self.illegal_move_mode == 'raise':
raise
elif self.illegal_move_mode == 'lose':
# Automatic loss on illegal move
self.done = True
return self.state.board.encode(), -1., True, {'state': self.state}
else:
raise error.Error('Unsupported illegal move action: {}'.format(self.illegal_move_mode))
# Opponent play
if not self.state.board.is_terminal:
self.state = self._exec_opponent_play(self.state, prev_state, action)
# After opponent play, we should be back to the original color
assert self.state.color == self.player_color
# Reward: 0 if nonterminal, 1 if won, -1 if lost
if self.state.board.is_terminal:
self.done = True
white_wins = self.state.board.official_score > 0
player_wins = (white_wins and self.player_color == pachi_py.WHITE) or (not white_wins and self.player_color == pachi_py.BLACK)
reward = 1. if player_wins else -1.
else:
self.done = False
reward = 0.
return self.state.board.encode(), reward, self.done, {'state': self.state}
def _exec_opponent_play(self, curr_state, prev_state, prev_action):
assert curr_state.color != self.player_color
opponent_action = self.opponent_policy(curr_state, prev_state, prev_action)
return curr_state.act(opponent_action)
@property
def _state(self):
return self.state
def _reset_opponent(self, board):
if self.opponent == 'random':
self.opponent_policy = random_policy
elif self.opponent == 'pachi:uct:_2400':
self.opponent_policy = make_pachi_policy(board=board, engine_type='uct', pachi_timestr='_2400') # TODO: strength as argument
else:
raise error.Error('Unrecognized opponent policy {}'.format(self.opponent))

gym/envs/classic_control/__init__.py Normal file

@@ -0,0 +1,5 @@
from gym.envs.classic_control.cartpole import CartPoleEnv
from gym.envs.classic_control.mountain_car import MountainCarEnv
from gym.envs.classic_control.pendulum import PendulumEnv
from gym.envs.classic_control.acrobot import AcrobotEnv

gym/envs/classic_control/acrobot.py Normal file

@@ -0,0 +1,288 @@
"""classic Acrobot task"""
from gym import core, spaces
import numpy as np
import time
__copyright__ = "Copyright 2013, RLPy http://acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
"William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = "Christoph Dann <cdann@cdann.de>"
# SOURCE:
# https://github.com/rlpy/rlpy/blob/master/rlpy/Domains/Acrobot.py
class AcrobotEnv(core.Env):
"""
Acrobot is a 2-link pendulum with only the second joint actuated
Initially, both links point downwards. The goal is to swing the
end-effector at a height at least the length of one link above the base.
Both links can swing freely and can pass by each other, i.e., they don't
collide when they have the same angle.
**STATE:**
The state consists of the two rotational joint angles and their velocities
[theta1 theta2 thetaDot1 thetaDot2]. An angle of 0 corresponds to
the respective link pointing downwards (angles are in world coordinates).
**ACTIONS:**
The action is either applying +1, 0 or -1 torque on the joint between
the two pendulum links.
.. note::
The dynamics equations were missing some terms in the NIPS paper which
are present in the book. R. Sutton confirmed in personal correspondence
that the experimental results shown in the paper and the book were
generated with the equations shown in the book.
However, there is the option to run the domain with the paper equations
by setting book_or_nips = 'nips'
**REFERENCE:**
.. seealso::
R. Sutton: Generalization in Reinforcement Learning:
Successful Examples Using Sparse Coarse Coding (NIPS 1996)
.. seealso::
R. Sutton and A. G. Barto:
Reinforcement learning: An introduction.
Cambridge: MIT press, 1998.
.. warning::
This version of the domain uses the Runge-Kutta method for integrating
the system dynamics and is more realistic, but also considerably harder
than the original version which employs Euler integration,
see the AcrobotLegacy class.
"""
metadata = {
'render.modes': ['human', 'rgb_array'],
'video.frames_per_second' : 15
}
dt = .2
LINK_LENGTH_1 = 1. # [m]
LINK_LENGTH_2 = 1. # [m]
LINK_MASS_1 = 1. #: [kg] mass of link 1
LINK_MASS_2 = 1. #: [kg] mass of link 2
LINK_COM_POS_1 = 0.5 #: [m] position of the center of mass of link 1
LINK_COM_POS_2 = 0.5 #: [m] position of the center of mass of link 2
LINK_MOI = 1. #: moments of inertia for both links
MAX_VEL_1 = 4 * np.pi
MAX_VEL_2 = 9 * np.pi
AVAIL_TORQUE = [-1., 0., +1.]
torque_noise_max = 0.
#: use dynamics equations from the nips paper or the book
book_or_nips = "book"
action_arrow = None
domain_fig = None
actions_num = 3
def __init__(self):
high = np.array([np.pi, np.pi, self.MAX_VEL_1, self.MAX_VEL_2])
low = -high
self.observation_space = spaces.Box(low, high)
self.action_space = spaces.Discrete(3)
self.viewer = None
def _reset(self):
self.state = np.random.uniform(low=-0.1, high=0.1, size=(4,))
return self.state
def _step(self, a):
s = self.state
torque = self.AVAIL_TORQUE[a]
# Add noise to the force action
if self.torque_noise_max > 0:
torque += np.random.uniform(-self.torque_noise_max, self.torque_noise_max)
# Now, augment the state with our force action so it can be passed to
# _dsdt
s_augmented = np.append(s, torque)
ns = rk4(self._dsdt, s_augmented, [0, self.dt])
# only care about final timestep of integration returned by integrator
ns = ns[-1]
ns = ns[:4] # omit action
# ODEINT IS TOO SLOW!
# ns_continuous = integrate.odeint(self._dsdt, self.s_continuous, [0, self.dt])
# self.s_continuous = ns_continuous[-1] # We only care about the state
# at the ''final timestep'', self.dt
ns[0] = wrap(ns[0], -np.pi, np.pi)
ns[1] = wrap(ns[1], -np.pi, np.pi)
ns[2] = bound(ns[2], -self.MAX_VEL_1, self.MAX_VEL_1)
ns[3] = bound(ns[3], -self.MAX_VEL_2, self.MAX_VEL_2)
self.state = ns.copy()
terminal = self._terminal()
reward = -1. if not terminal else 0.
return (np.array(self.state), reward, terminal, {})
def _terminal(self):
s = self.state
return bool(-np.cos(s[0]) - np.cos(s[1] + s[0]) > 1.)
def _dsdt(self, s_augmented, t):
m1 = self.LINK_MASS_1
m2 = self.LINK_MASS_2
l1 = self.LINK_LENGTH_1
lc1 = self.LINK_COM_POS_1
lc2 = self.LINK_COM_POS_2
I1 = self.LINK_MOI
I2 = self.LINK_MOI
g = 9.8
a = s_augmented[-1]
s = s_augmented[:-1]
theta1 = s[0]
theta2 = s[1]
dtheta1 = s[2]
dtheta2 = s[3]
d1 = m1 * lc1 ** 2 + m2 * \
(l1 ** 2 + lc2 ** 2 + 2 * l1 * lc2 * np.cos(theta2)) + I1 + I2
d2 = m2 * (lc2 ** 2 + l1 * lc2 * np.cos(theta2)) + I2
phi2 = m2 * lc2 * g * np.cos(theta1 + theta2 - np.pi / 2.)
phi1 = - m2 * l1 * lc2 * dtheta2 ** 2 * np.sin(theta2) \
- 2 * m2 * l1 * lc2 * dtheta2 * dtheta1 * np.sin(theta2) \
+ (m1 * lc1 + m2 * l1) * g * np.cos(theta1 - np.pi / 2) + phi2
if self.book_or_nips == "nips":
# the following line is consistent with the description in the
# paper
ddtheta2 = (a + d2 / d1 * phi1 - phi2) / \
(m2 * lc2 ** 2 + I2 - d2 ** 2 / d1)
else:
# the following line is consistent with the java implementation and the
# book
ddtheta2 = (a + d2 / d1 * phi1 - m2 * l1 * lc2 * dtheta1 ** 2 * np.sin(theta2) - phi2) \
/ (m2 * lc2 ** 2 + I2 - d2 ** 2 / d1)
ddtheta1 = -(d2 * ddtheta2 + phi1) / d1
return (dtheta1, dtheta2, ddtheta1, ddtheta2, 0.)
def _render(self, mode='human', close=False):
from gym.envs.classic_control import rendering
if close:
if self.viewer is not None:
self.viewer.close()
return
s = self.state
if self.viewer is None:
self.viewer = rendering.Viewer(500,500)
self.viewer.set_bounds(-2.2,2.2,-2.2,2.2)
p1 = [-self.LINK_LENGTH_1 *
np.cos(s[0]), self.LINK_LENGTH_1 * np.sin(s[0])]
p2 = [p1[0] - self.LINK_LENGTH_2 * np.cos(s[0] + s[1]),
p1[1] + self.LINK_LENGTH_2 * np.sin(s[0] + s[1])]
xys = np.array([[0,0], p1, p2])[:,::-1]
thetas = [s[0]-np.pi/2, s[0]+s[1]-np.pi/2]
self.viewer.draw_line((-2.2, 1), (2.2, 1))
for ((x,y),th) in zip(xys, thetas):
l,r,t,b = 0, 1, .1, -.1
jtransform = rendering.Transform(rotation=th, translation=(x,y))
link = self.viewer.draw_polygon([(l,b), (l,t), (r,t), (r,b)])
link.add_attr(jtransform)
link.set_color(0,.8, .8)
circ = self.viewer.draw_circle(.1)
circ.set_color(.8, .8, 0)
circ.add_attr(jtransform)
self.viewer.render()
if mode == 'rgb_array':
return self.viewer.get_array()
elif mode == 'human':
pass
def wrap(x, m, M):
"""
:param x: a scalar
:param m: minimum possible value in range
:param M: maximum possible value in range
Wraps ``x`` so m <= x <= M; but unlike ``bound()`` which
truncates, ``wrap()`` wraps x around the coordinate system defined by m,M.
For example, m = -180, M = 180 (degrees), x = 360 --> returns 0.
"""
diff = M - m
while x > M:
x = x - diff
while x < m:
x = x + diff
return x
def bound(x, m, M=None):
"""
:param x: scalar
Either pass m and M as scalars, so bound(x, m, M) returns m <= x <= M, *OR*
pass m as a length-2 vector, so bound(x, m) returns m[0] <= x <= m[1].
"""
if M is None:
M = m[1]
m = m[0]
# bound x between min (m) and Max (M)
return min(max(x, m), M)
def rk4(derivs, y0, t, *args, **kwargs):
"""
Integrate 1D or ND system of ODEs using 4-th order Runge-Kutta.
This is a toy implementation which may be useful if you find
yourself stranded on a system w/o scipy. Otherwise use
:func:`scipy.integrate`.
*y0*
initial state vector
*t*
sample times
*derivs*
returns the derivative of the system and has the
signature ``dy = derivs(yi, ti)``
*args*
additional arguments passed to the derivative function
*kwargs*
additional keyword arguments passed to the derivative function
Example 1 ::
## 2D system
def derivs6(x,t):
d1 = x[0] + 2*x[1]
d2 = -3*x[0] + 4*x[1]
return (d1, d2)
dt = 0.0005
t = arange(0.0, 2.0, dt)
y0 = (1,2)
yout = rk4(derivs6, y0, t)
Example 2::
## 1D system
alpha = 2
def derivs(x,t):
return -alpha*x + exp(-t)
y0 = 1
yout = rk4(derivs, y0, t)
If you have access to scipy, you should probably be using the
scipy.integrate tools rather than this function.
"""
try:
Ny = len(y0)
except TypeError:
yout = np.zeros((len(t),), np.float_)
else:
yout = np.zeros((len(t), Ny), np.float_)
yout[0] = y0
for i in np.arange(len(t) - 1):
thist = t[i]
dt = t[i + 1] - thist
dt2 = dt / 2.0
y0 = yout[i]
k1 = np.asarray(derivs(y0, thist, *args, **kwargs))
k2 = np.asarray(derivs(y0 + dt2 * k1, thist + dt2, *args, **kwargs))
k3 = np.asarray(derivs(y0 + dt2 * k2, thist + dt2, *args, **kwargs))
k4 = np.asarray(derivs(y0 + dt * k3, thist + dt, *args, **kwargs))
yout[i + 1] = y0 + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
return yout
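For reference, the book-variant dynamics that _dsdt above implements, transcribed into LaTeX; the symbols follow the code (theta_1, theta_2 the joint angles, a the applied torque, and the masses, lengths and inertias as defined in the class constants):

\begin{aligned}
d_1 &= m_1 l_{c1}^2 + m_2\left(l_1^2 + l_{c2}^2 + 2 l_1 l_{c2}\cos\theta_2\right) + I_1 + I_2 \\
d_2 &= m_2\left(l_{c2}^2 + l_1 l_{c2}\cos\theta_2\right) + I_2 \\
\phi_2 &= m_2 l_{c2}\, g \cos\!\left(\theta_1 + \theta_2 - \tfrac{\pi}{2}\right) \\
\phi_1 &= -m_2 l_1 l_{c2}\,\dot\theta_2^2\sin\theta_2 - 2 m_2 l_1 l_{c2}\,\dot\theta_1\dot\theta_2\sin\theta_2 + (m_1 l_{c1} + m_2 l_1)\, g\cos\!\left(\theta_1 - \tfrac{\pi}{2}\right) + \phi_2 \\
\ddot\theta_2 &= \frac{a + \frac{d_2}{d_1}\phi_1 - m_2 l_1 l_{c2}\,\dot\theta_1^2\sin\theta_2 - \phi_2}{m_2 l_{c2}^2 + I_2 - d_2^2/d_1} \\
\ddot\theta_1 &= -\frac{d_2\ddot\theta_2 + \phi_1}{d_1}
\end{aligned}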

Binary file not shown (image asset, 6.8 KiB)


@@ -0,0 +1,118 @@
"""
Classic cart-pole system implemented by Rich Sutton et al.
Copied from https://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c
"""
import math
import gym
from gym import spaces
import numpy as np
class CartPoleEnv(gym.Env):
metadata = {
'render.modes': ['human', 'rgb_array'],
'video.frames_per_second' : 50
}
def __init__(self):
self.gravity = 9.8
self.masscart = 1.0
self.masspole = 0.1
self.total_mass = (self.masspole + self.masscart)
self.length = 0.5 # actually half the pole's length
self.polemass_length = (self.masspole * self.length)
self.force_mag = 10.0
self.tau = 0.02 # seconds between state updates
# Angle at which to fail the episode
self.theta_threshold_radians = 12 * 2 * math.pi / 360
self.x_threshold = 2.4
self.reset()
self.viewer = None
high = np.array([self.x_threshold, np.inf, self.theta_threshold_radians, np.inf])
self.action_space = spaces.Discrete(2)
self.observation_space = spaces.Box(-high, high)
def _step(self, action):
assert action==0 or action==1, "%r (%s) invalid"%(action, type(action))
state = self.state
x, x_dot, theta, theta_dot = state
force = self.force_mag if action==1 else -self.force_mag
costheta = math.cos(theta)
sintheta = math.sin(theta)
temp = (force + self.polemass_length * theta_dot * theta_dot * sintheta) / self.total_mass
thetaacc = (self.gravity * sintheta - costheta* temp) / (self.length * (4.0/3.0 - self.masspole * costheta * costheta / self.total_mass))
xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass
x = x + self.tau * x_dot
x_dot = x_dot + self.tau * xacc
theta = theta + self.tau * theta_dot
theta_dot = theta_dot + self.tau * thetaacc
self.state = (x,x_dot,theta,theta_dot)
done = x < -self.x_threshold \
or x > self.x_threshold \
or theta < -self.theta_threshold_radians \
or theta > self.theta_threshold_radians
done = bool(done)
reward = 1.0
return np.array(self.state), reward, done, {}
def _reset(self):
self.state = np.random.uniform(low=-0.05, high=0.05, size=(4,))
return np.array(self.state)
def _render(self, mode='human', close=False):
if close:
if self.viewer is not None:
self.viewer.close()
return
screen_width = 600
screen_height = 400
world_width = self.x_threshold*2
scale = screen_width/world_width
carty = 100 # TOP OF CART
polewidth = 10.0
polelen = scale * 1.0
cartwidth = 50.0
cartheight = 30.0
if self.viewer is None:
from gym.envs.classic_control import rendering
self.viewer = rendering.Viewer(screen_width, screen_height)
l,r,t,b = -cartwidth/2, cartwidth/2, cartheight/2, -cartheight/2
axleoffset =cartheight/4.0
cart = rendering.FilledPolygon([(l,b), (l,t), (r,t), (r,b)])
self.carttrans = rendering.Transform()
cart.add_attr(self.carttrans)
self.viewer.add_geom(cart)
l,r,t,b = -polewidth/2,polewidth/2,polelen-polewidth/2,-polewidth/2
pole = rendering.FilledPolygon([(l,b), (l,t), (r,t), (r,b)])
pole.set_color(.8,.6,.4)
self.poletrans = rendering.Transform(translation=(0, axleoffset))
pole.add_attr(self.poletrans)
pole.add_attr(self.carttrans)
self.viewer.add_geom(pole)
self.axle = rendering.make_circle(polewidth/2)
self.axle.add_attr(self.poletrans)
self.axle.add_attr(self.carttrans)
self.axle.set_color(.5,.5,.8)
self.viewer.add_geom(self.axle)
self.track = rendering.Line((0,carty), (screen_width,carty))
self.track.set_color(0,0,0)
self.viewer.add_geom(self.track)
x = self.state
cartx = x[0]*scale+screen_width/2.0 # MIDDLE OF CART
self.carttrans.set_translation(cartx, carty)
self.poletrans.set_rotation(-x[2])
self.viewer.render()
if mode == 'rgb_array':
return self.viewer.get_array()
elif mode == 'human':
pass
else:
return super(CartPoleEnv, self).render(mode=mode)
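The explicit Euler update in _step above implements the standard cart-pole equations of motion. Written out in LaTeX, exactly as the code computes them (F the applied force, m_c and m_p the cart and pole masses, l the half-pole length, tau the timestep):

\begin{aligned}
\text{temp} &= \frac{F + m_p l\,\dot\theta^2\sin\theta}{m_c + m_p} \\
\ddot\theta &= \frac{g\sin\theta - \cos\theta\cdot\text{temp}}{l\left(\tfrac{4}{3} - \tfrac{m_p\cos^2\theta}{m_c + m_p}\right)} \\
\ddot x &= \text{temp} - \frac{m_p l\,\ddot\theta\cos\theta}{m_c + m_p} \\
x &\leftarrow x + \tau\dot x, \quad \dot x \leftarrow \dot x + \tau\ddot x, \quad \theta \leftarrow \theta + \tau\dot\theta, \quad \dot\theta \leftarrow \dot\theta + \tau\ddot\theta
\end{aligned}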


@@ -0,0 +1,119 @@
"""
https://webdocs.cs.ualberta.ca/~sutton/MountainCar/MountainCar1.cp
"""
import math
import gym
from gym import spaces
import numpy as np
class MountainCarEnv(gym.Env):
metadata = {
'render.modes': ['human', 'rgb_array'],
'video.frames_per_second': 30
}
def __init__(self):
self.viewer = None
self.reset()
self.min_position = -1.2
self.max_position = 0.6
self.max_speed = 0.07
self.goal_position = 0.5
self.low = np.array([self.min_position, -self.max_speed])
self.high = np.array([self.max_position, self.max_speed])
self.action_space = spaces.Discrete(3)
self.observation_space = spaces.Box(self.low, self.high)
def _step(self, action):
# action = np.sign((self.state[0]+math.pi/2) * self.state[1])+1
position, velocity = self.state
velocity += (action-1)*0.001 + math.cos(3*position)*(-0.0025)
if (velocity > self.max_speed): velocity = self.max_speed
if (velocity < -self.max_speed): velocity = -self.max_speed
position += velocity
if (position > self.max_position): position = self.max_position
if (position < self.min_position): position = self.min_position
if (position==self.min_position and velocity<0): velocity = 0
done = bool(position >= self.goal_position)
reward = -1.0
self.state = (position, velocity)
return np.array(self.state), reward, done, {}
def _reset(self):
self.state = np.array([np.random.uniform(low=-0.6, high=-0.4), 0])
return np.array(self.state)
def _height(self, xs):
return np.sin(3 * xs)*.45+.55
def _render(self, mode='human', close=False):
if close:
if self.viewer is not None:
self.viewer.close()
return
screen_width = 600
screen_height = 400
world_width = self.max_position - self.min_position
scale = screen_width/world_width
carwidth=40
carheight=20
if self.viewer is None:
from gym.envs.classic_control import rendering
self.viewer = rendering.Viewer(screen_width, screen_height)
xs = np.linspace(self.min_position, self.max_position, 100)
ys = self._height(xs)
xys = zip((xs-self.min_position)*scale, ys*scale)
self.track = rendering.make_polyline(xys)
self.track.set_linewidth(4)
self.viewer.add_geom(self.track)
clearance = 10
l,r,t,b = -carwidth/2, carwidth/2, carheight, 0
car = rendering.FilledPolygon([(l,b), (l,t), (r,t), (r,b)])
car.add_attr(rendering.Transform(translation=(0, clearance)))
self.cartrans = rendering.Transform()
car.add_attr(self.cartrans)
self.viewer.add_geom(car)
frontwheel = rendering.make_circle(carheight/2.5)
frontwheel.set_color(.5, .5, .5)
frontwheel.add_attr(rendering.Transform(translation=(carwidth/4,clearance)))
frontwheel.add_attr(self.cartrans)
self.viewer.add_geom(frontwheel)
backwheel = rendering.make_circle(carheight/2.5)
backwheel.add_attr(rendering.Transform(translation=(-carwidth/4,clearance)))
backwheel.add_attr(self.cartrans)
backwheel.set_color(.5, .5, .5)
self.viewer.add_geom(backwheel)
flagx = (self.goal_position-self.min_position)*scale
flagy1 = self._height(self.goal_position)*scale
flagy2 = flagy1 + 50
flagpole = rendering.Line((flagx, flagy1), (flagx, flagy2))
self.viewer.add_geom(flagpole)
flag = rendering.FilledPolygon([(flagx, flagy2), (flagx, flagy2-10), (flagx+25, flagy2-5)])
flag.set_color(.8,.8,0)
self.viewer.add_geom(flag)
pos = self.state[0]
self.cartrans.set_translation((pos-self.min_position)*scale, self._height(pos)*scale)
self.cartrans.set_rotation(math.cos(3 * pos))
self.viewer.render()
if mode == 'rgb_array':
return self.viewer.get_array()
elif mode == 'human':
pass
else:
return super(MountainCarEnv, self).render(mode=mode)
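A minimal sketch of driving this class directly, using the obvious energy-pumping heuristic of pushing in the direction of the current velocity. It assumes the public reset()/step() wrappers that gym.Env provides around _reset/_step; everything else comes from the class above.

env = MountainCarEnv()
obs = env.reset()
done, steps = False, 0
while not done and steps < 10000:
    position, velocity = obs
    action = 2 if velocity > 0 else 0  # 2 pushes right, 0 pushes left, 1 coasts
    obs, reward, done, _ = env.step(action)
    steps += 1
# done becomes True once position >= goal_position (0.5)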


@@ -0,0 +1,89 @@
import gym
from gym import spaces
import numpy as np
from os import path
class PendulumEnv(gym.Env):
metadata = {
'render.modes' : ['human', 'rgb_array'],
'video.frames_per_second' : 30
}
def __init__(self):
self.max_speed=8
self.max_torque=2.
self.dt=.05
self.viewer = None
high = np.array([1., 1., self.max_speed])
self.action_space = spaces.Box(low=-self.max_torque, high=self.max_torque, shape=(1,))
self.observation_space = spaces.Box(low=-high, high=high)
def _step(self,u):
th, thdot = self.state # th := theta
g = 10.
m = 1.
l = 1.
dt = self.dt
self.last_u = u # for rendering
u = np.clip(u, -self.max_torque, self.max_torque)[0]
costs = angle_normalize(th)**2 + .1*thdot**2 + .001*(u**2)
newthdot = thdot + (-3*g/(2*l) * np.sin(th + np.pi) + 3./(m*l**2)*u) * dt
newth = th + newthdot*dt
newthdot = np.clip(newthdot, -self.max_speed, self.max_speed) #pylint: disable=E1111
self.state = np.array([newth, newthdot])
return self._get_obs(), -costs, False, {}
def _reset(self):
high = np.array([np.pi, 1])
self.state = np.random.uniform(low=-high, high=high)
self.last_u = None
return self._get_obs()
def _get_obs(self):
theta, thetadot = self.state
return np.array([np.cos(theta), np.sin(theta), thetadot])
def _render(self, mode='human', close=False):
if close:
if self.viewer is not None:
self.viewer.close()
return
if self.viewer is None:
from gym.envs.classic_control import rendering
self.viewer = rendering.Viewer(500,500)
self.viewer.set_bounds(-2.2,2.2,-2.2,2.2)
rod = rendering.make_capsule(1, .2)
rod.set_color(.8, .3, .3)
self.pole_transform = rendering.Transform()
rod.add_attr(self.pole_transform)
self.viewer.add_geom(rod)
axle = rendering.make_circle(.05)
axle.set_color(0,0,0)
self.viewer.add_geom(axle)
fname = path.join(path.dirname(__file__), "assets/clockwise.png")
self.img = rendering.Image(fname, 1., 1.)
self.imgtrans = rendering.Transform()
self.img.add_attr(self.imgtrans)
self.viewer.add_onetime(self.img)
self.pole_transform.set_rotation(self.state[0] + np.pi/2)
if self.last_u:
self.imgtrans.scale = (-self.last_u/2, np.abs(self.last_u)/2)
self.viewer.render()
if mode == 'rgb_array':
return self.viewer.get_array()
elif mode == 'human':
pass
else:
return super(PendulumEnv, self).render(mode=mode)
def angle_normalize(x):
return (((x+np.pi) % (2*np.pi)) - np.pi)
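Written out, the update in _step above is one explicit Euler step with g = 10, m = l = 1, dt = 0.05, torque clipped to [-2, 2], and a cost quadratic in the normalized angle, the velocity, and the torque (the returned reward is -c). Note the new angle uses the pre-clip velocity, matching the order of the code:

\begin{aligned}
c &= \operatorname{normalize}(\theta)^2 + 0.1\,\dot\theta^2 + 0.001\,u^2 \\
\dot\theta_{\mathrm{new}} &= \dot\theta + \left(-\frac{3g}{2l}\sin(\theta+\pi) + \frac{3}{ml^2}\,u\right)dt \\
\theta_{\mathrm{new}} &= \theta + \dot\theta_{\mathrm{new}}\,dt, \qquad \dot\theta_{\mathrm{new}} \leftarrow \operatorname{clip}\!\left(\dot\theta_{\mathrm{new}}, -8, 8\right)
\end{aligned}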


@@ -0,0 +1,292 @@
"""
2D rendering framework
"""
from __future__ import division
import os, sys
if "Apple" in sys.version:
if 'DYLD_FALLBACK_LIBRARY_PATH' in os.environ:
os.environ['DYLD_FALLBACK_LIBRARY_PATH'] += ':/usr/lib'
# (JDS 2016/04/15): avoid bug on Anaconda 2.3.0 / Yosemite
from gym import error
import pyglet
try:
from pyglet.gl import *
except ImportError as e:
raise error.DependencyNotInstalled("""{} (while running: from pyglet.gl import *).
(HINT: make sure you have OpenGL installed. On Ubuntu, you can run 'apt-get install python-opengl'. If you're running on a server, you may need a virtual frame buffer; something like this should work: 'xvfb-run -s "-screen 0 1400x900x24" <your script here>')""".format(e))
import math
import numpy as np
RAD2DEG = 57.29577951308232
class Viewer(object):
def __init__(self, width, height):
self.width = width
self.height = height
self.window = pyglet.window.Window(width=width, height=height)
self.geoms = []
self.onetime_geoms = []
self.transform = Transform()
glEnable(GL_BLEND)
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
def close(self):
self.window.close()
def set_bounds(self, left, right, bottom, top):
assert right > left and top > bottom
scalex = self.width/(right-left)
scaley = self.height/(top-bottom)
self.transform = Transform(
translation=(-left*scalex, -bottom*scaley),
scale=(scalex, scaley))
def add_geom(self, geom):
self.geoms.append(geom)
def add_onetime(self, geom):
self.onetime_geoms.append(geom)
def render(self):
glClearColor(1,1,1,1)
self.window.clear()
self.window.switch_to()
self.window.dispatch_events()
self.transform.enable()
for geom in self.geoms:
geom.render()
for geom in self.onetime_geoms:
geom.render()
self.transform.disable()
self.window.flip()
self.onetime_geoms = []
# Convenience
def draw_circle(self, radius=10, res=30, filled=True, **attrs):
geom = make_circle(radius=radius, res=res, filled=filled)
_add_attrs(geom, attrs)
self.add_onetime(geom)
return geom
def draw_polygon(self, v, filled=True, **attrs):
geom = make_polygon(v=v, filled=filled)
_add_attrs(geom, attrs)
self.add_onetime(geom)
return geom
def draw_polyline(self, v, **attrs):
geom = make_polyline(v=v)
_add_attrs(geom, attrs)
self.add_onetime(geom)
return geom
def draw_line(self, start, end, **attrs):
geom = Line(start, end)
_add_attrs(geom, attrs)
self.add_onetime(geom)
return geom
def get_array(self):
self.window.flip()
image_data = pyglet.image.get_buffer_manager().get_color_buffer().get_image_data()
self.window.flip()
arr = np.fromstring(image_data.data, dtype=np.uint8, sep='')
arr = arr.reshape(self.height, self.width, 4)
return arr[::-1,:,0:3]
def _add_attrs(geom, attrs):
if "color" in attrs:
geom.set_color(attrs["color"])
if "linewidth" in attrs:
geom.set_linewidth(attrs["linewidth"])
class Geom(object):
def __init__(self):
self._color=Color((0, 0, 0, 1.0))
self.attrs = [self._color]
def render(self):
for attr in reversed(self.attrs):
attr.enable()
self.render1()
for attr in self.attrs:
attr.disable()
def render1(self):
raise NotImplementedError
def add_attr(self, attr):
self.attrs.append(attr)
def set_color(self, r, g, b):
self._color.vec4 = (r, g, b, 1)
class Attr(object):
def enable(self):
raise NotImplementedError
def disable(self):
pass
class Transform(Attr):
def __init__(self, translation=(0.0, 0.0), rotation=0.0, scale=(1,1)):
self.set_translation(*translation)
self.set_rotation(rotation)
self.set_scale(*scale)
def enable(self):
glPushMatrix()
glTranslatef(self.translation[0], self.translation[1], 0) # translate to the GL location
glRotatef(RAD2DEG * self.rotation, 0, 0, 1.0)
glScalef(self.scale[0], self.scale[1], 1)
def disable(self):
glPopMatrix()
def set_translation(self, newx, newy):
self.translation = (float(newx), float(newy))
def set_rotation(self, new):
self.rotation = float(new)
def set_scale(self, newx, newy):
self.scale = (float(newx), float(newy))
class Color(Attr):
def __init__(self, vec4):
self.vec4 = vec4
def enable(self):
glColor4f(*self.vec4)
class LineStyle(Attr):
def __init__(self, style):
self.style = style
def enable(self):
glEnable(GL_LINE_STIPPLE)
glLineStipple(1, self.style)
def disable(self):
glDisable(GL_LINE_STIPPLE)
class LineWidth(Attr):
def __init__(self, stroke):
self.stroke = stroke
def enable(self):
glLineWidth(self.stroke)
class Point(Geom):
def __init__(self):
Geom.__init__(self)
def render1(self):
glBegin(GL_POINTS) # draw point
glVertex3f(0.0, 0.0, 0.0)
glEnd()
class FilledPolygon(Geom):
def __init__(self, v):
Geom.__init__(self)
self.v = v
def render1(self):
if len(self.v) == 4 : glBegin(GL_QUADS)
elif len(self.v) > 4 : glBegin(GL_POLYGON)
else: glBegin(GL_TRIANGLES)
for p in self.v:
glVertex3f(p[0], p[1],0) # draw each vertex
glEnd()
def make_circle(radius=10, res=30, filled=True):
points = []
for i in xrange(res):
ang = 2*math.pi*i / res
points.append((math.cos(ang)*radius, math.sin(ang)*radius))
if filled:
return FilledPolygon(points)
else:
return PolyLine(points, True)
def make_polygon(v, filled=True):
if filled: return FilledPolygon(v)
else: return PolyLine(v, True)
def make_polyline(v):
return PolyLine(v, False)
def make_capsule(length, width):
l, r, t, b = 0, length, width/2, -width/2
box = make_polygon([(l,b), (l,t), (r,t), (r,b)])
circ0 = make_circle(width/2)
circ1 = make_circle(width/2)
circ1.add_attr(Transform(translation=(length, 0)))
geom = Compound([box, circ0, circ1])
return geom
class Compound(Geom):
def __init__(self, gs):
Geom.__init__(self)
self.gs = gs
for g in self.gs:
g.attrs = [a for a in g.attrs if not isinstance(a, Color)]
def render1(self):
for g in self.gs:
g.render()
class PolyLine(Geom):
def __init__(self, v, close):
Geom.__init__(self)
self.v = v
self.close = close
self.linewidth = LineWidth(1)
self.add_attr(self.linewidth)
def render1(self):
glBegin(GL_LINE_LOOP if self.close else GL_LINE_STRIP)
for p in self.v:
glVertex3f(p[0], p[1],0) # draw each vertex
glEnd()
def set_linewidth(self, x):
self.linewidth.stroke = x
class Line(Geom):
def __init__(self, start=(0.0, 0.0), end=(0.0, 0.0)):
Geom.__init__(self)
self.start = start
self.end = end
self.linewidth = LineWidth(1)
self.add_attr(self.linewidth)
def render1(self):
glBegin(GL_LINES)
glVertex2f(*self.start)
glVertex2f(*self.end)
glEnd()
class Image(Geom):
def __init__(self, fname, width, height):
Geom.__init__(self)
self.width = width
self.height = height
img = pyglet.image.load(fname)
self.img = img
self.flip = False
def render1(self):
self.img.blit(-self.width/2, -self.height/2, width=self.width, height=self.height)
# ================================================================
class SimpleImageViewer(object):
def __init__(self):
self.window = None
self.isopen = False
def imshow(self, arr):
if self.window is None:
height, width, channels = arr.shape
self.window = pyglet.window.Window(width=width, height=height)
self.width = width
self.height = height
self.isopen = True
assert arr.shape == (self.height, self.width, 3), "You passed in an image with the wrong shape"
image = pyglet.image.ImageData(self.width, self.height, 'RGB', arr.tobytes(), pitch=self.width * -3)
self.window.clear()
self.window.switch_to()
self.window.dispatch_events()
image.blit(0,0)
self.window.flip()
def close(self):
if self.isopen:
self.window.close()
self.isopen = False
def __del__(self):
self.close()
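A minimal sketch of using the Viewer directly, limited to calls defined above (Viewer, set_bounds, draw_circle, Transform, render, close). It assumes a working display; on a headless box you would run it under xvfb, as the dependency hint earlier suggests.

viewer = Viewer(400, 400)
viewer.set_bounds(-1.0, 1.0, -1.0, 1.0)
shift = Transform(translation=(0.5, 0.0))
for _ in range(60):
    circ = viewer.draw_circle(radius=0.1)  # onetime geom, so redraw it each frame
    circ.add_attr(shift)
    circ.set_color(0.8, 0.3, 0.3)
    viewer.render()
viewer.close()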

1
gym/envs/mujoco/.gitignore vendored Normal file

@@ -0,0 +1 @@
mujoco-bundle


@@ -0,0 +1,12 @@
from gym.envs.mujoco.mujoco_env import MujocoEnv
# ^^^^^ so that user gets the correct error
# message if mujoco is not installed correctly
from gym.envs.mujoco.ant import AntEnv
from gym.envs.mujoco.half_cheetah import HalfCheetahEnv
from gym.envs.mujoco.hopper import HopperEnv
from gym.envs.mujoco.walker2d import Walker2dEnv
from gym.envs.mujoco.humanoid import HumanoidEnv
from gym.envs.mujoco.inverted_pendulum import InvertedPendulumEnv
from gym.envs.mujoco.inverted_double_pendulum import InvertedDoublePendulumEnv
from gym.envs.mujoco.reacher import ReacherEnv
from gym.envs.mujoco.swimmer import SwimmerEnv

46
gym/envs/mujoco/ant.py Normal file

@@ -0,0 +1,46 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
mujoco_env.MujocoEnv.__init__(self, 'ant.xml', 5)
utils.EzPickle.__init__(self)
self.finalize()
def _step(self, a):
xposbefore = self.get_body_com("torso")[0]
self.do_simulation(a, self.frame_skip)
xposafter = self.get_body_com("torso")[0]
forward_reward = (xposafter - xposbefore)/self.dt
ctrl_cost = .5 * np.square(a).sum()
contact_cost = 0.5 * 1e-3 * np.sum(
np.square(np.clip(self.model.data.cfrc_ext, -1, 1)))
survive_reward = 1.0
reward = forward_reward - ctrl_cost - contact_cost + survive_reward
state = self._state
notdone = np.isfinite(state).all() \
and state[2] >= 0.2 and state[2] <= 1.0
done = not notdone
ob = self._get_obs()
return ob, reward, done, dict(
reward_forward=forward_reward,
reward_ctrl=-ctrl_cost,
reward_contact=-contact_cost,
reward_survive=survive_reward)
def _get_obs(self):
return np.concatenate([
self.model.data.qpos.flat[2:],
self.model.data.qvel.flat,
np.clip(self.model.data.cfrc_ext, -1, 1).flat,
])
def _reset(self):
self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
self.model.data.qvel = self.init_qvel + np.random.randn(self.model.nv,1)*.1
self.reset_viewer_if_necessary()
return self._get_obs()
def viewer_setup(self):
self.viewer.cam.distance = self.model.stat.extent * 0.5


@@ -0,0 +1,80 @@
<mujoco model="ant">
<compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
<option integrator="RK4" timestep="0.01"/>
<custom>
<numeric data="0.0 0.0 0.55 1.0 0.0 0.0 0.0 0.0 1.0 0.0 -1.0 0.0 -1.0 0.0 1.0" name="init_qpos"/>
</custom>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="0" condim="3" density="5.0" friction="1 0.5 0.5" margin="0.01" rgba="0.8 0.6 0.4 1"/>
</default>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
<body name="torso" pos="0 0 0.75">
<geom name="torso_geom" pos="0 0 0" size="0.25" type="sphere"/>
<joint armature="0" damping="0" limited="false" margin="0.01" name="root" pos="0 0 0" type="free"/>
<body name="front_left_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="aux_1_geom" size="0.08" type="capsule"/>
<body name="aux_1" pos="0.2 0.2 0">
<joint axis="0 0 1" name="hip_1" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="left_leg_geom" size="0.08" type="capsule"/>
<body pos="0.2 0.2 0">
<joint axis="-1 1 0" name="ankle_1" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.4 0.4 0.0" name="left_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="front_right_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="aux_2_geom" size="0.08" type="capsule"/>
<body name="aux_2" pos="-0.2 0.2 0">
<joint axis="0 0 1" name="hip_2" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="right_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 0.2 0">
<joint axis="1 1 0" name="ankle_2" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 0.4 0.0" name="right_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="back_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="aux_3_geom" size="0.08" type="capsule"/>
<body name="aux_3" pos="-0.2 -0.2 0">
<joint axis="0 0 1" name="hip_3" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="back_leg_geom" size="0.08" type="capsule"/>
<body pos="-0.2 -0.2 0">
<joint axis="-1 1 0" name="ankle_3" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.4 -0.4 0.0" name="third_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
<body name="right_back_leg" pos="0 0 0">
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="aux_4_geom" size="0.08" type="capsule"/>
<body name="aux_4" pos="0.2 -0.2 0">
<joint axis="0 0 1" name="hip_4" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="rightback_leg_geom" size="0.08" type="capsule"/>
<body pos="0.2 -0.2 0">
<joint axis="1 1 0" name="ankle_4" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.4 -0.4 0.0" name="fourth_ankle_geom" size="0.08" type="capsule"/>
</body>
</body>
</body>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_4" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_4" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_1" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_1" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_2" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_2" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_3" gear="150"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_3" gear="150"/>
</actuator>
</mujoco>


@@ -0,0 +1,95 @@
<!-- Cheetah Model
The state space is populated with joints in the order that they are
defined in this file. The actuators also operate on joints.
State-Space (name/joint/parameter):
- rootx slider position (m)
- rootz slider position (m)
- rooty hinge angle (rad)
- bthigh hinge angle (rad)
- bshin hinge angle (rad)
- bfoot hinge angle (rad)
- fthigh hinge angle (rad)
- fshin hinge angle (rad)
- ffoot hinge angle (rad)
- rootx slider velocity (m/s)
- rootz slider velocity (m/s)
- rooty hinge angular velocity (rad/s)
- bthigh hinge angular velocity (rad/s)
- bshin hinge angular velocity (rad/s)
- bfoot hinge angular velocity (rad/s)
- fthigh hinge angular velocity (rad/s)
- fshin hinge angular velocity (rad/s)
- ffoot hinge angular velocity (rad/s)
Actuators (name/actuator/parameter):
- bthigh hinge torque (N m)
- bshin hinge torque (N m)
- bfoot hinge torque (N m)
- fthigh hinge torque (N m)
- fshin hinge torque (N m)
- ffoot hinge torque (N m)
-->
<mujoco model="cheetah">
<compiler angle="radian" coordinate="local" inertiafromgeom="true" settotalmass="14"/>
<default>
<joint armature=".1" damping=".01" limited="true" solimplimit="0 .8 .03" solreflimit=".02 1" stiffness="8"/>
<geom conaffinity="0" condim="3" contype="1" friction=".4 .1 .1" rgba="0.8 0.6 .4 1" solimp="0.0 0.8 0.01" solref="0.02 1"/>
<motor ctrllimited="true" ctrlrange="-1 1"/>
</default>
<size nstack="300000" nuser_geom="1"/>
<option gravity="0 0 -9.81" timestep="0.01"/>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
<body name="torso" pos="0 0 .7">
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 0" stiffness="0" type="hinge"/>
<geom fromto="-.5 0 0 .5 0 0" name="torso" size="0.046" type="capsule"/>
<geom axisangle="0 1 0 .87" name="head" pos=".6 0 .1" size="0.046 .15" type="capsule"/>
<!-- <site name='tip' pos='.15 0 .11'/>-->
<body name="bthigh" pos="-.5 0 0">
<joint axis="0 1 0" damping="6" name="bthigh" pos="0 0 0" range="-.52 1.05" stiffness="240" type="hinge"/>
<geom axisangle="0 1 0 -3.8" name="bthigh" pos=".1 0 -.13" size="0.046 .145" type="capsule"/>
<body name="bshin" pos=".16 0 -.25">
<joint axis="0 1 0" damping="4.5" name="bshin" pos="0 0 0" range="-.785 .785" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 -2.03" name="bshin" pos="-.14 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .15" type="capsule"/>
<body name="bfoot" pos="-.28 0 -.14">
<joint axis="0 1 0" damping="3" name="bfoot" pos="0 0 0" range="-.4 .785" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.27" name="bfoot" pos=".03 0 -.097" rgba="0.9 0.6 0.6 1" size="0.046 .094" type="capsule"/>
</body>
</body>
</body>
<body name="fthigh" pos=".5 0 0">
<joint axis="0 1 0" damping="4.5" name="fthigh" pos="0 0 0" range="-1 .7" stiffness="180" type="hinge"/>
<geom axisangle="0 1 0 .52" name="fthigh" pos="-.07 0 -.12" size="0.046 .133" type="capsule"/>
<body name="fshin" pos="-.14 0 -.24">
<joint axis="0 1 0" damping="3" name="fshin" pos="0 0 0" range="-1.2 .87" stiffness="120" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="fshin" pos=".065 0 -.09" rgba="0.9 0.6 0.6 1" size="0.046 .106" type="capsule"/>
<body name="ffoot" pos=".13 0 -.18">
<joint axis="0 1 0" damping="1.5" name="ffoot" pos="0 0 0" range="-.5 .5" stiffness="60" type="hinge"/>
<geom axisangle="0 1 0 -.6" name="ffoot" pos=".045 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .07" type="capsule"/>
</body>
</body>
</body>
</body>
</worldbody>
<actuator>
<motor gear="120" joint="bthigh" name="bthigh"/>
<motor gear="90" joint="bshin" name="bshin"/>
<motor gear="60" joint="bfoot" name="bfoot"/>
<motor gear="120" joint="fthigh" name="fthigh"/>
<motor gear="60" joint="fshin" name="fshin"/>
<motor gear="30" joint="ffoot" name="ffoot"/>
</actuator>
</mujoco>


@@ -0,0 +1,44 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
</body>
</body>
</body>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>


@@ -0,0 +1,120 @@
<mujoco model="humanoid">
<compiler angle="degree" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" iterations="50" solver="PGS" timestep="0.003">
<!-- <flags solverstat="enable" energy="enable"/>-->
</option>
<size nkey="5" nuser_geom="1"/>
<visual>
<map fogend="5" fogstart="3"/>
</visual>
<asset>
<texture builtin="gradient" height="100" rgb1=".4 .5 .6" rgb2="0 0 0" type="skybox" width="100"/>
<!-- <texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>-->
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom condim="3" friction="1 .1 .1" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 0.125" type="plane"/>
<!-- <geom condim="3" material="MatPlane" name="floor" pos="0 0 0" size="10 10 0.125" type="plane"/>-->
<body name="torso" pos="0 0 1.4">
<joint armature="0" damping="0" limited="false" name="root" pos="0 0 0" stiffness="0" type="free"/>
<geom fromto="0 -.07 0 0 .07 0" name="torso1" size="0.07" type="capsule"/>
<geom name="head" pos="0 0 .19" size=".09" type="sphere" user="258"/>
<geom fromto="-.01 -.06 -.12 -.01 .06 -.12" name="uwaist" size="0.06" type="capsule"/>
<body name="lwaist" pos="-.01 0 -0.260" quat="1.000 0 -0.002 0">
<geom fromto="0 -.06 0 0 .06 0" name="lwaist" size="0.06" type="capsule"/>
<joint armature="0.02" axis="0 0 1" damping="5" name="abdomen_z" pos="0 0 0.065" range="-45 45" stiffness="20" type="hinge"/>
<joint armature="0.02" axis="0 1 0" damping="5" name="abdomen_y" pos="0 0 0.065" range="-75 30" stiffness="10" type="hinge"/>
<body name="pelvis" pos="0 0 -0.165" quat="1.000 0 -0.002 0">
<joint armature="0.02" axis="1 0 0" damping="5" name="abdomen_x" pos="0 0 0.1" range="-35 35" stiffness="10" type="hinge"/>
<geom fromto="-.02 -.07 0 -.02 .07 0" name="butt" size="0.09" type="capsule"/>
<body name="right_thigh" pos="0 -0.1 -0.04">
<joint armature="0.01" axis="1 0 0" damping="5" name="right_hip_x" pos="0 0 0" range="-25 5" stiffness="10" type="hinge"/>
<joint armature="0.01" axis="0 0 1" damping="5" name="right_hip_z" pos="0 0 0" range="-60 35" stiffness="10" type="hinge"/>
<joint armature="0.0080" axis="0 1 0" damping="5" name="right_hip_y" pos="0 0 0" range="-110 20" stiffness="20" type="hinge"/>
<geom fromto="0 0 0 0 0.01 -.34" name="right_thigh1" size="0.06" type="capsule"/>
<body name="right_shin" pos="0 0.01 -0.403">
<joint armature="0.0060" axis="0 -1 0" name="right_knee" pos="0 0 .02" range="-160 -2" type="hinge"/>
<geom fromto="0 0 0 0 0 -.3" name="right_shin1" size="0.049" type="capsule"/>
<body name="right_foot" pos="0 0 -0.45">
<geom name="right_foot" pos="0 0 0.1" size="0.075" type="sphere" user="0"/>
</body>
</body>
</body>
<body name="left_thigh" pos="0 0.1 -0.04">
<joint armature="0.01" axis="-1 0 0" damping="5" name="left_hip_x" pos="0 0 0" range="-25 5" stiffness="10" type="hinge"/>
<joint armature="0.01" axis="0 0 -1" damping="5" name="left_hip_z" pos="0 0 0" range="-60 35" stiffness="10" type="hinge"/>
<joint armature="0.01" axis="0 1 0" damping="5" name="left_hip_y" pos="0 0 0" range="-120 20" stiffness="20" type="hinge"/>
<geom fromto="0 0 0 0 -0.01 -.34" name="left_thigh1" size="0.06" type="capsule"/>
<body name="left_shin" pos="0 -0.01 -0.403">
<joint armature="0.0060" axis="0 -1 0" name="left_knee" pos="0 0 .02" range="-160 -2" stiffness="1" type="hinge"/>
<geom fromto="0 0 0 0 0 -.3" name="left_shin1" size="0.049" type="capsule"/>
<body name="left_foot" pos="0 0 -0.45">
<geom name="left_foot" type="sphere" size="0.075" pos="0 0 0.1" user="0" />
</body>
</body>
</body>
</body>
</body>
<body name="right_upper_arm" pos="0 -0.17 0.06">
<joint armature="0.0068" axis="2 1 1" name="right_shoulder1" pos="0 0 0" range="-85 60" stiffness="1" type="hinge"/>
<joint armature="0.0051" axis="0 -1 1" name="right_shoulder2" pos="0 0 0" range="-85 60" stiffness="1" type="hinge"/>
<geom fromto="0 0 0 .16 -.16 -.16" name="right_uarm1" size="0.04 0.16" type="capsule"/>
<body name="right_lower_arm" pos=".18 -.18 -.18">
<joint armature="0.0028" axis="0 -1 1" name="right_elbow" pos="0 0 0" range="-90 50" stiffness="0" type="hinge"/>
<geom fromto="0.01 0.01 0.01 .17 .17 .17" name="right_larm" size="0.031" type="capsule"/>
<geom name="right_hand" pos=".18 .18 .18" size="0.04" type="sphere"/>
<camera pos="0 0 0"/>
</body>
</body>
<body name="left_upper_arm" pos="0 0.17 0.06">
<joint armature="0.0068" axis="2 -1 1" name="left_shoulder1" pos="0 0 0" range="-60 85" stiffness="1" type="hinge"/>
<joint armature="0.0051" axis="0 1 1" name="left_shoulder2" pos="0 0 0" range="-60 85" stiffness="1" type="hinge"/>
<geom fromto="0 0 0 .16 .16 -.16" name="left_uarm1" size="0.04 0.16" type="capsule"/>
<body name="left_lower_arm" pos=".18 .18 -.18">
<joint armature="0.0028" axis="0 -1 -1" name="left_elbow" pos="0 0 0" range="-90 50" stiffness="0" type="hinge"/>
<geom fromto="0.01 -0.01 0.01 .17 -.17 .17" name="left_larm" size="0.031" type="capsule"/>
<geom name="left_hand" pos=".18 -.18 .18" size="0.04" type="sphere"/>
</body>
</body>
</body>
</worldbody>
<tendon>
<fixed name="left_hipknee">
<joint coef="-1" joint="left_hip_y"/>
<joint coef="1" joint="left_knee"/>
</fixed>
<fixed name="right_hipknee">
<joint coef="-1" joint="right_hip_y"/>
<joint coef="1" joint="right_knee"/>
</fixed>
</tendon>
<actuator>
<motor gear="100" joint="abdomen_y" name="abdomen_y"/>
<motor gear="100" joint="abdomen_z" name="abdomen_z"/>
<motor gear="100" joint="abdomen_x" name="abdomen_x"/>
<motor gear="100" joint="right_hip_x" name="right_hip_x"/>
<motor gear="100" joint="right_hip_z" name="right_hip_z"/>
<motor gear="300" joint="right_hip_y" name="right_hip_y"/>
<motor gear="200" joint="right_knee" name="right_knee"/>
<motor gear="100" joint="left_hip_x" name="left_hip_x"/>
<motor gear="100" joint="left_hip_z" name="left_hip_z"/>
<motor gear="300" joint="left_hip_y" name="left_hip_y"/>
<motor gear="200" joint="left_knee" name="left_knee"/>
<motor gear="25" joint="right_shoulder1" name="right_shoulder1"/>
<motor gear="25" joint="right_shoulder2" name="right_shoulder2"/>
<motor gear="25" joint="right_elbow" name="right_elbow"/>
<motor gear="25" joint="left_shoulder1" name="left_shoulder1"/>
<motor gear="25" joint="left_shoulder2" name="left_shoulder2"/>
<motor gear="25" joint="left_elbow" name="left_elbow"/>
</actuator>
</mujoco>


@@ -0,0 +1,47 @@
<!-- Cartpole Model
The state space is populated with joints in the order that they are
defined in this file. The actuators also operate on joints.
State-Space (name/joint/parameter):
- cart slider position (m)
- pole hinge angle (rad)
- cart slider velocity (m/s)
- pole hinge angular velocity (rad/s)
Actuators (name/actuator/parameter):
- cart motor force x (N)
-->
<mujoco model="cartpole">
<compiler coordinate="local" inertiafromgeom="true"/>
<custom>
<numeric data="2" name="frame_skip"/>
</custom>
<default>
<joint damping="0.05"/>
<geom contype="0" friction="1 0.1 0.1" rgba="0.7 0.7 0 1"/>
</default>
<option gravity="1e-5 0 -9.81" integrator="RK4" timestep="0.01"/>
<size nstack="3000"/>
<worldbody>
<geom name="floor" pos="0 0 -3.0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
<geom name="rail" pos="0 0 0" quat="0.707 0 0.707 0" rgba="0.3 0.3 0.7 1" size="0.02 1" type="capsule"/>
<body name="cart" pos="0 0 0">
<joint axis="1 0 0" limited="true" margin="0.01" name="slider" pos="0 0 0" range="-1 1" type="slide"/>
<geom name="cart" pos="0 0 0" quat="0.707 0 0.707 0" size="0.1 0.1" type="capsule"/>
<body name="pole" pos="0 0 0">
<joint axis="0 1 0" name="hinge" pos="0 0 0" type="hinge"/>
<geom fromto="0 0 0 0 0 0.6" name="cpole" rgba="0 0.7 0.7 1" size="0.045 0.3" type="capsule"/>
<body name="pole2" pos="0 0 0.6">
<joint axis="0 1 0" name="hinge2" pos="0 0 0" type="hinge"/>
<geom fromto="0 0 0 0 0 0.6" name="cpole2" rgba="0 0.7 0.7 1" size="0.045 0.3" type="capsule"/>
<site name="tip" pos="0 0 .6" size="0.01 0.01"/>
</body>
</body>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1 1" gear="500" joint="slider" name="slide"/>
</actuator>
</mujoco>


@@ -0,0 +1,27 @@
<mujoco model="inverted pendulum">
<compiler inertiafromgeom="true"/>
<default>
<joint armature="0" damping="1" limited="true"/>
<geom contype="0" friction="1 0.1 0.1" rgba="0.7 0.7 0 1"/>
<tendon/>
<motor ctrlrange="-3 3"/>
</default>
<option gravity="0 0 -9.81" integrator="RK4" timestep="0.02"/>
<size nstack="3000"/>
<worldbody>
<!--geom name="ground" type="plane" pos="0 0 0" /-->
<geom name="rail" pos="0 0 0" quat="0.707 0 0.707 0" rgba="0.3 0.3 0.7 1" size="0.02 1" type="capsule"/>
<body name="cart" pos="0 0 0">
<joint axis="1 0 0" limited="true" name="slider" pos="0 0 0" range="-1 1" type="slide"/>
<geom name="cart" pos="0 0 0" quat="0.707 0 0.707 0" size="0.1 0.1" type="capsule"/>
<body name="pole" pos="0 0 0">
<joint axis="0 1 0" name="hinge" pos="0 0 0" range="-90 90" type="hinge"/>
<geom fromto="0 0 0 0.001 0 0.6" name="cpole" rgba="0 0.7 0.7 1" size="0.049 0.3" type="capsule"/>
<!-- <body name="pole2" pos="0.001 0 0.6"><joint name="hinge2" type="hinge" pos="0 0 0" axis="0 1 0"/><geom name="cpole2" type="capsule" fromto="0 0 0 0 0 0.6" size="0.05 0.3" rgba="0.7 0 0.7 1"/><site name="tip2" pos="0 0 .6"/></body>-->
</body>
</body>
</worldbody>
<actuator>
<motor gear="100" joint="slider" name="slide"/>
</actuator>
</mujoco>


@@ -0,0 +1,31 @@
<mujoco>
<compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
<option integrator="RK4" timestep="0.02"/>
<default>
<joint armature="0" damping="0" limited="false"/>
<geom conaffinity="0" condim="3" density="100" friction="1 0.5 0.5" margin="0" rgba="0.8 0.6 0.4 1"/>
</default>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="30 30" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
<body name="torso" pos="0 0 0">
<geom name="pointbody" pos="0 0 0.5" size="0.5" type="sphere"/>
<geom name="pointarrow" pos="0.6 0 0.5" size="0.5 0.1 0.1" type="box"/>
<joint axis="1 0 0" name="ballx" pos="0 0 0" type="slide"/>
<joint axis="0 1 0" name="bally" pos="0 0 0" type="slide"/>
<joint axis="0 0 1" limited="false" name="rot" pos="0 0 0" type="hinge"/>
</body>
</worldbody>
<actuator>
<!-- Those are just dummy actuators for providing ranges -->
<motor ctrllimited="true" ctrlrange="-1 1" joint="ballx"/>
<motor ctrllimited="true" ctrlrange="-0.25 0.25" joint="rot"/>
</actuator>
</mujoco>


@@ -0,0 +1,39 @@
<mujoco model="reacher">
<compiler angle="radian" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom contype="0" friction="1 0.1 0.1" rgba="0.7 0.7 0 1"/>
</default>
<option gravity="0 0 -9.81" integrator="RK4" timestep="0.01"/>
<worldbody>
<!-- Arena -->
<geom conaffinity="0" contype="0" name="ground" pos="0 0 0" rgba="0.9 0.9 0.9 1" size="1 1 10" type="plane"/>
<geom conaffinity="0" fromto="-.3 -.3 .01 .3 -.3 .01" name="sideS" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
<geom conaffinity="0" fromto=" .3 -.3 .01 .3 .3 .01" name="sideE" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
<geom conaffinity="0" fromto="-.3 .3 .01 .3 .3 .01" name="sideN" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
<geom conaffinity="0" fromto="-.3 -.3 .01 -.3 .3 .01" name="sideW" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
<!-- Arm -->
<geom conaffinity="0" contype="0" fromto="0 0 0 0 0 0.02" name="root" rgba="0.9 0.4 0.6 1" size=".011" type="cylinder"/>
<body name="body0" pos="0 0 .01">
<geom fromto="0 0 0 0.1 0 0" name="link0" rgba="0.0 0.4 0.6 1" size=".01" type="capsule"/>
<joint axis="0 0 1" limited="false" name="joint0" pos="0 0 0" type="hinge"/>
<body name="body1" pos="0.1 0 0">
<joint axis="0 0 1" limited="true" name="joint1" pos="0 0 0" range="-3.0 3.0" type="hinge"/>
<geom fromto="0 0 0 0.1 0 0" name="link1" rgba="0.0 0.4 0.6 1" size=".01" type="capsule"/>
<body name="fingertip" pos="0.11 0 0">
<geom contype="0" name="fingertip" pos="0 0 0" rgba="0.0 0.8 0.6 1" size=".01" type="sphere"/>
</body>
</body>
</body>
<!-- Target -->
<body name="target" pos=".1 -.1 .01">
<joint armature="0" axis="1 0 0" damping="0" limited="true" name="target_x" pos="0 0 0" range="-.27 .27" ref=".1" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="true" name="target_y" pos="0 0 0" range="-.27 .27" ref="-.1" stiffness="0" type="slide"/>
<geom conaffinity="0" contype="0" name="target" pos="0 0 0" rgba="0.9 0.2 0.2 1" size=".009" type="sphere"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="joint0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="joint1"/>
</actuator>
</mujoco>


@@ -0,0 +1,38 @@
<mujoco model="swimmer">
<compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
<option collision="predefined" density="4000" integrator="RK4" timestep="0.01" viscosity="0.1"/>
<default>
<geom conaffinity="1" condim="1" contype="1" material="geom" rgba="0.8 0.6 .4 1"/>
<joint armature='0.1' />
</default>
<asset>
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="30 30" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 -0.1" rgba="0.8 0.9 0.8 1" size="40 40 0.1" type="plane"/>
<!-- ================= SWIMMER ================= /-->
<body name="torso" pos="0 0 0">
<geom density="1000" fromto="1.5 0 0 0.5 0 0" size="0.1" type="capsule"/>
<joint axis="1 0 0" name="slider1" pos="0 0 0" type="slide"/>
<joint axis="0 1 0" name="slider2" pos="0 0 0" type="slide"/>
<joint axis="0 0 1" name="rot" pos="0 0 0" type="hinge"/>
<body name="mid" pos="0.5 0 0">
<geom density="1000" fromto="0 0 0 -1 0 0" size="0.1" type="capsule"/>
<joint axis="0 0 1" limited="true" name="rot2" pos="0 0 0" range="-100 100" type="hinge"/>
<body name="back" pos="-1 0 0">
<geom density="1000" fromto="0 0 0 -1 0 0" size="0.1" type="capsule"/>
<joint axis="0 0 1" limited="true" name="rot3" pos="0 0 0" range="-100 100" type="hinge"/>
</body>
</body>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1 1" gear="150.0" joint="rot2"/>
<motor ctrllimited="true" ctrlrange="-1 1" gear="150.0" joint="rot3"/>
</actuator>
</mujoco>


@@ -0,0 +1,61 @@
<mujoco model="walker2d">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="0.01" damping=".1" limited="true"/>
<geom conaffinity="0" condim="3" contype="1" density="1000" friction=".7 .1 .1" rgba="0.8 0.6 .4 1"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.2/2 0 0.1">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="0.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
</body>
</body>
</body>
<!-- copied and then replace thigh->thigh_left, leg->leg_left, foot->foot_right -->
<body name="thigh_left" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_left_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_left_geom" rgba=".7 .3 .6 1" size="0.05" type="capsule"/>
<body name="leg_left" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_left_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_left_geom" rgba=".7 .3 .6 1" size="0.04" type="capsule"/>
<body name="foot_left" pos="0.2/2 0 0.1">
<joint axis="0 -1 0" name="foot_left_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="1.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_left_geom" rgba=".7 .3 .6 1" size="0.06" type="capsule"/>
</body>
</body>
</body>
</body>
</worldbody>
<actuator>
<!-- <motor joint="torso_joint" ctrlrange="-100.0 100.0" isctrllimited="true"/>-->
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_left_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_left_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_left_joint"/>
<!-- <motor joint="finger2_rot" ctrlrange="-20.0 20.0" isctrllimited="true"/>-->
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>
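
A model file like this can be loaded and stepped directly with the mujoco_py bindings of this era (a minimal sketch, assuming the MjModel API used in gym/envs/mujoco/mujoco_env.py below and a local copy of the XML; the path is illustrative):

import mujoco_py

# Parse and compile the MJCF model above.
model = mujoco_py.MjModel('walker2d.xml')
print(model.nq, model.nv)  # sizes of the generalized position/velocity vectors
for _ in range(100):
    model.step()           # advance the physics by one 0.002s timestep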

View File

@@ -0,0 +1,35 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
class HalfCheetahEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
mujoco_env.MujocoEnv.__init__(self, 'half_cheetah.xml', 5)
utils.EzPickle.__init__(self)
self.finalize()
def _step(self, action):
xposbefore = self.model.data.qpos[0,0]
self.do_simulation(action, self.frame_skip)
xposafter = self.model.data.qpos[0,0]
ob = self._get_obs()
reward_ctrl = - 0.1 * np.square(action).sum()
reward_run = (xposafter - xposbefore)/self.dt
reward = reward_ctrl + reward_run
done = False
return ob, reward, done, dict(reward_run = reward_run, reward_ctrl=reward_ctrl)
def _get_obs(self):
return np.concatenate([
self.model.data.qpos.flat[1:],
self.model.data.qvel.flat,
])
def _reset(self):
self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
self.model.data.qvel = self.init_qvel + np.random.randn(self.model.nv,1)*.1
self.reset_viewer_if_necessary()
return self._get_obs()
def viewer_setup(self):
self.viewer.cam.distance = self.model.stat.extent * 0.5

41
gym/envs/mujoco/hopper.py Normal file
View File

@@ -0,0 +1,41 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
mujoco_env.MujocoEnv.__init__(self, 'hopper.xml', 4)
utils.EzPickle.__init__(self)
self.finalize()
def _step(self, a):
posbefore = self.model.data.qpos[0,0]
self.do_simulation(a, self.frame_skip)
posafter,height,ang = self.model.data.qpos[0:3,0]
alive_bonus = 1.0
reward = (posafter - posbefore) / self.dt
reward += alive_bonus
reward -= 1e-3 * np.square(a).sum()
s = self._state
done = not (np.isfinite(s).all() and (np.abs(s[2:]) < 100).all() and
(height > .7) and (abs(ang) < .2))
ob = self._get_obs()
return ob, reward, done, {}
def _get_obs(self):
return np.concatenate([
self.model.data.qpos.flat[1:],
np.clip(self.model.data.qvel.flat,-10,10)
])
def _reset(self):
self.model.data.qpos = self.init_qpos + np.random.rand(self.model.nq,1)*.01-.005
self.model.data.qvel = self.init_qvel + np.random.rand(self.model.nv,1)*.01-.005
self.reset_viewer_if_necessary()
return self._get_obs()
def viewer_setup(self):
self.viewer.cam.trackbodyid = 2
self.viewer.cam.distance = self.model.stat.extent * 0.75
self.viewer.cam.lookat[2] += .8
self.viewer.cam.elevation = -20

View File

@@ -0,0 +1,53 @@
import numpy as np
from gym.envs.mujoco import mujoco_env
from gym import utils
def mass_center(model):
mass = model.body_mass
xpos = model.data.xipos
return (np.sum(mass * xpos, 0) / np.sum(mass))[0]
class HumanoidEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self, initial_randomness=0.01):
mujoco_env.MujocoEnv.__init__(self, 'humanoid.xml', 5)
utils.EzPickle.__init__(self)
self.initial_randomness = initial_randomness
self.finalize()
def _get_obs(self):
data = self.model.data
return np.concatenate([data.qpos.flat[2:],
data.qvel.flat,
data.cinert.flat,
data.cvel.flat,
data.qfrc_actuator.flat,
data.cfrc_ext.flat])
def _step(self, a):
pos_before = mass_center(self.model)
self.do_simulation(a, self.frame_skip)
pos_after = mass_center(self.model)
alive_bonus = 5.0
data = self.model.data
lin_vel_cost = 0.25 * (pos_after - pos_before) / self.model.opt.timestep
quad_ctrl_cost = 0.1 * np.square(data.ctrl).sum()
quad_impact_cost = .5e-6 * np.square(data.cfrc_ext).sum()
quad_impact_cost = min(quad_impact_cost, 10)
reward = lin_vel_cost - quad_ctrl_cost - quad_impact_cost + alive_bonus
qpos = self.model.data.qpos
done = bool((qpos[2] < 1.0) or (qpos[2] > 2.0))
return self._get_obs(), reward, done, dict(reward_linvel=lin_vel_cost, reward_quadctrl=-quad_ctrl_cost, reward_alive=alive_bonus, reward_impact=-quad_impact_cost)
# TODO: requires more complicated reset.
def _reset(self):
self.model.data.qpos = self.init_qpos + (np.random.rand(self.model.nq,1)-0.5)*2*self.initial_randomness
self.model.data.qvel = self.init_qvel + (np.random.rand(self.model.nv,1)-0.5)*2*self.initial_randomness
self.model.forward()
self.reset_viewer_if_necessary()
return self._get_obs()
def viewer_setup(self):
self.viewer.cam.trackbodyid = 1
self.viewer.cam.distance = self.model.stat.extent * 1.0
self.viewer.cam.lookat[2] += .8
self.viewer.cam.elevation = -20

View File

@@ -0,0 +1,43 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
class InvertedDoublePendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
mujoco_env.MujocoEnv.__init__(self, 'inverted_double_pendulum.xml', 5)
utils.EzPickle.__init__(self)
self.finalize()
def _step(self, action):
self.do_simulation(action, self.frame_skip)
ob = self._get_obs()
x, _, y = self.model.data.site_xpos[0]
dist_penalty = 0.01 * x ** 2 + (y - 2) ** 2
v1, v2 = self.model.data.qvel[1:3]
vel_penalty = 1e-3 * v1**2 + 5e-3 * v2**2
alive_bonus = 10
r = (alive_bonus - dist_penalty - vel_penalty)[0]
done = bool(y <= 1)
return ob, r, done, {}
def _get_obs(self):
return np.concatenate([
self.model.data.qpos[:1], # cart x pos
np.sin(self.model.data.qpos[1:]), # link angles
np.cos(self.model.data.qpos[1:]),
np.clip(self.model.data.qvel, -10, 10),
np.clip(self.model.data.qfrc_constraint, -10, 10)
]).ravel()
def _reset(self):
self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
self.model.data.qvel = self.init_qvel + np.random.randn(self.model.nv,1)*.1
self.reset_viewer_if_necessary()
return self._get_obs()
def viewer_setup(self):
v = self.viewer
v.cam.trackbodyid=0
v.cam.distance = v.model.stat.extent * 0.5
v.cam.lookat[2] += 3  # v.model.stat.center[2]

View File

@@ -0,0 +1,31 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
class InvertedPendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
utils.EzPickle.__init__(self)
mujoco_env.MujocoEnv.__init__(self, 'inverted_pendulum.xml', 2)
self.finalize()
def _step(self, a):
reward = 1.0
self.do_simulation(a, self.frame_skip)
ob = self._get_obs()
notdone = np.isfinite(ob).all() and (np.abs(ob[1]) <= .2)
done = not notdone
return ob, reward, done, {}
def _reset(self):
self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1), low=-0.01, high=0.01)
self.model.data.qvel = self.init_qvel + np.random.uniform(size=(self.model.nv,1), low=-0.01, high=0.01)
self.reset_viewer_if_necessary()
return self._get_obs()
def _get_obs(self):
return np.concatenate([self.model.data.qpos, self.model.data.qvel]).ravel()
def viewer_setup(self):
v = self.viewer
v.cam.trackbodyid=0
v.cam.distance = v.model.stat.extent

View File

@@ -0,0 +1,109 @@
import os.path
import numpy as np
import gym
from gym import error, spaces
try:
import mujoco_py
except ImportError as e:
raise error.DependencyNotInstalled("{}. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)".format(e))
BIG = 10000
class MujocoEnv(gym.Env):
def __init__(self, model_path, frame_skip):
if model_path.startswith("/"):
fullpath = model_path
else:
fullpath = os.path.join(os.path.dirname(__file__), "assets", model_path)
if not os.path.exists(fullpath):
raise IOError("File %s does not exist"%fullpath)
self.frame_skip = frame_skip
self.model = mujoco_py.MjModel(fullpath)
self.data = self.model.data
self.viewer = None
self.metadata = {
'render.modes': ['human', 'rgb_array'],
'video.frames_per_second' : int(np.round(1.0 / self.dt))
}
@property
def dt(self):
return self.model.opt.timestep * self.frame_skip
def do_simulation(self, ctrl, n_frames):
self.model.data.ctrl = ctrl
for _ in range(n_frames):
self.model.step()
def finalize(self):
self.init_qpos = self.model.data.qpos.copy()
self.init_qvel = self.model.data.qvel.copy()
self.ctrl_dim = self.model.data.ctrl.size
observation, _reward, done, _info = self.step(np.zeros(self.ctrl_dim))
assert not done
self.obs_dim = observation.size
high = np.ones(self.ctrl_dim)
low = -high
self.action_space = spaces.Box(low, high)
high = BIG*np.ones(self.obs_dim)
low = -high
self.observation_space = spaces.Box(low, high)
def _render(self, mode='human', close=False):
if close:
self._get_viewer().finish()
return
if mode == 'rgb_array':
self._get_viewer().render()
data, width, height = self._get_viewer().get_image()
return np.fromstring(data, dtype='uint8').reshape(height, width, 3)[::-1,:,:]
elif mode == 'human':
self._get_viewer().loop_once()
def _get_viewer(self):
if self.viewer is None:
self.viewer = mujoco_py.MjViewer()
self.viewer.start()
self.viewer.set_model(self.model)
self.viewer_setup()
return self.viewer
def viewer_setup(self):
pass
def reset_viewer_if_necessary(self):
if self.viewer is not None:
self.viewer.autoscale()
self.viewer_setup()
def get_body_com(self, body_name):
idx = self.model.body_names.index(body_name)
return self.model.data.com_subtree[idx]
def get_body_comvel(self, body_name):
idx = self.model.body_names.index(body_name)
return self.model.body_comvels[idx]
def get_body_xmat(self, body_name):
idx = self.model.body_names.index(body_name)
return self.model.data.xmat[idx].reshape((3, 3))
@property
def action_bounds(self):
bounds = self.model.actuator_ctrlrange
lb = bounds[:, 0]
ub = bounds[:, 1]
return lb, ub
@property
def _state(self):
return np.concatenate([
self.model.data.qpos.flat,
self.model.data.qvel.flat
])
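
All of the environments in this directory follow the same subclassing pattern. A minimal hypothetical example (sketch only; 'point.xml' is a placeholder model name and the reward is left trivial):

from gym import utils
from gym.envs.mujoco import mujoco_env

class PointEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self):
        mujoco_env.MujocoEnv.__init__(self, 'point.xml', 2)  # model file, frame_skip
        utils.EzPickle.__init__(self)
        self.finalize()  # derive action/observation spaces from the loaded model

    def _step(self, a):
        self.do_simulation(a, self.frame_skip)
        return self._get_obs(), 0.0, False, {}

    def _get_obs(self):
        return self._state  # qpos and qvel concatenated

    def _reset(self):
        self.model.data.qpos = self.init_qpos
        self.model.data.qvel = self.init_qvel
        self.reset_viewer_if_necessary()
        return self._get_obs()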

View File

@@ -0,0 +1,45 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
class ReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
utils.EzPickle.__init__(self)
mujoco_env.MujocoEnv.__init__(self, 'reacher.xml', 2)
self.finalize()
def _step(self, a):
vec = self.get_body_com("fingertip")-self.get_body_com("target")
reward_dist = - np.linalg.norm(vec)
reward_ctrl = - np.square(a).sum()
reward = reward_dist + reward_ctrl
self.do_simulation(a, self.frame_skip)
ob = self._get_obs()
done = False
return ob, reward, done, dict(reward_dist=reward_dist, reward_ctrl=reward_ctrl)
def viewer_setup(self):
self.viewer.cam.trackbodyid=0
def _reset(self):
qpos = np.random.uniform(low=-0.1, high=0.1, size=(self.model.nq,1)) + self.init_qpos
while True:
self.goal = np.random.uniform(low=-.2, high=.2, size=(2,1))
if np.linalg.norm(self.goal) < .2: break
qpos[-2:] = self.goal
self.model.data.qpos = qpos
qvel = self.init_qvel + np.random.rand(self.model.nv,1)*.01-.005
qvel[-2:] = 0
self.model.data.qvel = qvel
self.reset_viewer_if_necessary()
return self._get_obs()
def _get_obs(self):
theta = self.model.data.qpos.flat[:2]
return np.concatenate([
np.cos(theta),
np.sin(theta),
self.model.data.qpos.flat[2:],
self.model.data.qvel.flat[:2],
self.get_body_com("fingertip") - self.get_body_com("target")
])

View File

@@ -0,0 +1,35 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
class SwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
mujoco_env.MujocoEnv.__init__(self, 'swimmer.xml', 4)
utils.EzPickle.__init__(self)
self.ctrl_cost_coeff = 0.0001
self.finalize()
def _step(self, a):
xposbefore = self.model.data.qpos[0,0]
self.do_simulation(a, self.frame_skip)
xposafter = self.model.data.qpos[0,0]
reward_fwd = (xposafter - xposbefore) / self.dt
reward_ctrl = - self.ctrl_cost_coeff * np.square(a).sum()
reward = reward_fwd + reward_ctrl
ob = self._get_obs()
return ob, reward, False, dict(reward_fwd = reward_fwd, reward_ctrl=reward_ctrl)
def _get_obs(self):
qpos = self.model.data.qpos
qvel = self.model.data.qvel
return np.concatenate([
qpos.flat[2:],
qvel.flat
])
def _reset(self):
self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
self.model.data.qvel = self.init_qvel + np.random.uniform(size=(self.model.nv,1),low=-.1,high=.1)
self.reset_viewer_if_necessary()
return self._get_obs()

View File

@@ -0,0 +1,41 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
# copied from hopper
class Walker2dEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self):
mujoco_env.MujocoEnv.__init__(self, "walker2d.xml", 4)
utils.EzPickle.__init__(self)
self.finalize()
def _step(self, a):
posbefore = self.model.data.qpos[0,0]
self.do_simulation(a, self.frame_skip)
posafter,height,ang = self.model.data.qpos[0:3,0]
alive_bonus = 1.0
reward = ((posafter - posbefore) / self.dt )
reward += alive_bonus
reward -= 1e-3 * np.square(a).sum()
done = not (height > 0.8 and height < 2.0
and ang > -1.0 and ang < 1.0)
ob = self._get_obs()
return ob, reward, done, {}
def _get_obs(self):
qpos = self.model.data.qpos
qvel = self.model.data.qvel
return np.concatenate([qpos[1:], np.clip(qvel,-10,10)]).ravel()
def _reset(self):
self.model.data.qpos = self.init_qpos + np.random.rand(self.model.nq,1)*.01-.005
self.model.data.qvel = self.init_qvel + np.random.rand(self.model.nv,1)*.01-.005
self.reset_viewer_if_necessary()
return self._get_obs()
def viewer_setup(self):
self.viewer.cam.trackbodyid = 2
self.viewer.cam.distance = self.model.stat.extent * 0.5
self.viewer.cam.lookat[2] += .8
self.viewer.cam.elevation = -20

115
gym/envs/registration.py Normal file
View File

@@ -0,0 +1,115 @@
import logging
import pkg_resources
import re
import six
import sys
from gym import error
logger = logging.getLogger(__name__)
# This format is true today, but it's *not* an official spec.
env_id_re = re.compile(r'^([\w:-]+)-v(\d+)$')
def load(name):
entry_point = pkg_resources.EntryPoint.parse('x={}'.format(name))
try:
result = entry_point.load(False)
except ImportError as e:
_, _, traceback = sys.exc_info()
new_e = ImportError("{} (while loading {})".format(e, name))
six.reraise(type(new_e), new_e, traceback)
else:
return result
class EnvSpec(object):
"""A specification for a particular instance of the environment. Used
to register the parameters for official evaluations.
Args:
id (str): The official environment ID
entry_point (str): The Python entrypoint of the environment class (e.g. module.name:Class)
timestep_limit (int): The max number of timesteps per episode during training
trials (int): The number of trials to average reward over
reward_threshold (Optional[int]): The reward threshold before the task is considered solved
kwargs (dict): The kwargs to pass to the environment class
Attributes:
id (str): The official environment ID
timestep_limit (int): The max number of timesteps per episode in official evaluation
trials (int): The number of trials run in official evaluation
"""
def __init__(self, id, entry_point, timestep_limit=1000, trials=100, reward_threshold=None, kwargs=None):
self.id = id
# Evaluation parameters
self.timestep_limit = timestep_limit
self.trials = trials
self.reward_threshold = reward_threshold
# We may make some of these other parameters public if they're
# useful.
match = env_id_re.search(id)
if not match:
raise error.Error('Attempted to register malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id, env_id_re.pattern))
self._entry_point = entry_point
self._kwargs = {} if kwargs is None else kwargs
def make(self):
"""Instantiates an instance of the environment with appropriate kwargs"""
cls = load(self._entry_point)
try:
env = cls(**self._kwargs)
except TypeError as e:
type, value, traceback = sys.exc_info()
# This likely indicates unsupported kwargs
six.reraise(type, """Could not 'make' {} ({}): {}.
(For reference, the environment was instantiated with kwargs: {}).""".format(self.id, cls, e.message, self._kwargs), traceback)
# Make the environment aware of which spec it came from.
env.spec = self
return env
def __repr__(self):
return "EnvSpec({})".format(self.id)
class EnvRegistry(object):
"""Register an env by ID. IDs remain stable over time and are
guaranteed to resolve to the same environment dynamics (or be
desupported). The goal is that results on a particular environment
should always be comparable, and not depend on the version of the
code that was running.
"""
def __init__(self):
self.env_specs = {}
def make(self, id):
logger.info('Making new env: %s', id)
spec = self.spec(id)
return spec.make()
def all(self):
return self.env_specs.values()
def spec(self, id):
match = env_id_re.search(id)
if not match:
raise error.Error('Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id.encode('utf-8'), env_id_re.pattern))
try:
return self.env_specs[id]
except KeyError:
raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
def register(self, id, entry_point, **kwargs):
if id in self.env_specs:
raise error.Error('Cannot re-register id: {}'.format(id))
self.env_specs[id] = EnvSpec(id, entry_point, **kwargs)
# Have a global registry
registry = EnvRegistry()
register = registry.register
make = registry.make
spec = registry.spec
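
Putting the registry to work is a two-step affair: register an ID once, then make instances from it. A sketch ('MyEnv-v0' and its entry point are hypothetical):

from gym.envs.registration import register, make

register(
    id='MyEnv-v0',                        # must match the ^name-vN$ pattern
    entry_point='my_package.envs:MyEnv',  # module.path:ClassName
    timestep_limit=200,
    reward_threshold=195.0,
)
env = make('MyEnv-v0')  # loads the entry point and instantiates it with kwargs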

View File

@@ -0,0 +1,35 @@
import numpy as np
from nose2 import tools
from gym import envs
# This runs a smoketest on each official registered env. We may want
# to try also running environments which are not officially registered
# envs.
specs = [spec for spec in envs.registry.all() if (not spec.id.startswith("atari")) or ("space_invaders" in spec.id)] # only test space invaders out of atari games
@tools.params(*specs)
def test_env(spec):
env = spec.make()
ob_space = env.observation_space
act_space = env.action_space
ob = env.reset()
assert ob_space.contains(ob), 'Reset observation: {!r} not in space'.format(ob)
a = act_space.sample()
observation, reward, done, _info = env.step(a)
assert ob_space.contains(observation), 'Step observation: {!r} not in space'.format(observation)
assert np.isscalar(reward), "{} is not a scalar for {}".format(reward, env)
assert isinstance(done, bool), "Expected {} to be a boolean".format(done)
for mode in env.metadata.get('render.modes'):
env.render(mode=mode)
# Run a longer rollout on some environments
def test_random_rollout():
for env in [envs.make('CartPole-v0'), envs.make('FrozenLake-v0')]:
agent = lambda ob: env.action_space.sample()
ob = env.reset()
for _ in xrange(10):
assert env.observation_space.contains(ob)
a = agent(ob)
assert env.action_space.contains(a)
(ob, _reward, done, _info) = env.step(a)
if done: break

View File

@@ -0,0 +1,35 @@
# -*- coding: utf-8 -*-
from gym import error, envs
from gym.envs import registration
from gym.envs.classic_control import cartpole
def test_make():
env = envs.make('CartPole-v0')
assert env.spec.id == 'CartPole-v0'
assert isinstance(env, cartpole.CartPoleEnv)
def test_spec():
spec = envs.spec('CartPole-v0')
assert spec.id == 'CartPole-v0'
def test_missing_lookup():
registry = registration.EnvRegistry()
registry.register(id='Test-v0', entry_point=None)
registry.register(id='Test-v15', entry_point=None)
registry.register(id='Test-v9', entry_point=None)
registry.register(id='Other-v100', entry_point=None)
try:
registry.spec('Test-v1')
except error.UnregisteredEnv:
pass
else:
assert False
def test_malformed_lookup():
registry = registration.EnvRegistry()
try:
registry.spec(u'“Breakout-v0”')
except error.Error as e:
assert 'malformed environment ID' in e.message, 'Unexpected message: {}'.format(e)
else:
assert False

View File

@@ -0,0 +1,2 @@
from gym.envs.toy_text.roulette import RouletteEnv
from gym.envs.toy_text.frozen_lake import FrozenLakeEnv

View File

@@ -0,0 +1,40 @@
from gym import Env
from gym import spaces
import numpy as np
def categorical_sample(prob_n):
"""
Sample from categorical distribution
Each row specifies class probabilities
"""
prob_n = np.asarray(prob_n)
csprob_n = np.cumsum(prob_n)
return (csprob_n > np.random.rand()).argmax()
class DiscreteEnv(Env):
def __init__(self, nS, nA, P, isd):
"""
Compute a transition probabilities, of the form
P[s][a] == [(probability, nextstate, reward, done)]
also compute initial state distribution
"""
self.action_space = spaces.Discrete(nA)
self.observation_space = spaces.Discrete(nS)
self.nA = nA
self.P = P
self.isd = isd
self.lastaction=None # for rendering
def _reset(self):
self.s = categorical_sample(self.isd)
return self.s
def _step(self, a):
transitions = self.P[self.s][a]
i = categorical_sample([t[0] for t in transitions])
p, s, r, d= transitions[i]
self.s = s
self.lastaction=a
return (s, r, d, {"prob" : p})

View File

@@ -0,0 +1,127 @@
import numpy as np
import StringIO, sys
from gym import utils
from gym.envs.toy_text import discrete
LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3
MAPS = {
"4x4": [
"SFFF",
"FHFH",
"FFFH",
"HFFG"
],
"8x8": [
"SFFFFFFF",
"FFFFFFFF",
"FFFHFFFF",
"FFFFFHFF",
"FFFHFFFF",
"FHHFFFHF",
"FHFFHFHF",
"FFFHFFFG"
],
}
class FrozenLakeEnv(discrete.DiscreteEnv):
"""
Winter is here. You and your friends were tossing around a frisbee at the park
when you made a wild throw that left the frisbee out in the middle of the lake.
The water is mostly frozen, but there are a few holes where the ice has melted.
If you step into one of those holes, you'll fall into the freezing water.
At this time, there's an international frisbee shortage, so it's absolutely imperative that
you navigate across the lake and retrieve the disc.
However, the ice is slippery, so you won't always move in the direction you intend.
The surface is described using a grid like the following
SFFF
FHFH
FFFH
HFFG
S : starting point, safe
F : frozen surface, safe
H : hole, fall to your doom
G : goal, where the frisbee is located
The episode ends when you reach the goal or fall in a hole.
You receive a reward of 1 if you reach the goal, and zero otherwise.
"""
metadata = {'render.modes': ['human', 'ansi']}
def __init__(self, desc=None, map_name="4x4",is_slippery=True):
if desc is None and map_name is None:
raise ValueError('Must provide either desc or map_name')
elif desc is None:
desc = MAPS[map_name]
self.desc = desc = np.asarray(desc,dtype='c')
self.nrow, self.ncol = nrow, ncol = desc.shape
nA = 4
nS = nrow * ncol
isd = (desc == 'S').ravel().astype('float64')
isd /= isd.sum()
P = {s : {a : [] for a in xrange(nA)} for s in xrange(nS)}
def to_s(row, col):
return row*ncol + col
def inc(row, col, a):
if a==0:
col = max(col-1,0)
elif a==1:
row = min(row+1,nrow-1)
elif a==2:
col = min(col+1,ncol-1)
elif a==3:
row = max(row-1,0)
return (row, col)
for row in xrange(nrow):
for col in xrange(ncol):
s = to_s(row, col)
for a in xrange(4):
li = P[s][a]
if is_slippery:
for b in [(a-1)%4, a, (a+1)%4]:
newrow, newcol = inc(row, col, b)
newstate = to_s(newrow, newcol)
letter = desc[newrow, newcol]
done = letter in 'GH'
rew = float(letter == 'G')
li.append((1.0/3.0, newstate, rew, done))
else:
newrow, newcol = inc(row, col, a)
newstate = to_s(newrow, newcol)
letter = desc[newrow, newcol]
done = letter in 'GH'
rew = float(letter == 'G')
li.append((1.0, newstate, rew, done))
super(FrozenLakeEnv, self).__init__(nrow * ncol, 4, P, isd)
def _render(self, mode='human', close=False):
if close:
return
outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout
row, col = self.s // self.ncol, self.s % self.ncol
desc = self.desc.tolist()
desc[row][col] = utils.colorize(desc[row][col], "red", highlight=True)
outfile.write("\n".join("".join(row) for row in desc)+"\n")
if self.lastaction is not None:
outfile.write(" ({})\n".format(["Left","Down","Right","Up"][self.lastaction]))
else:
outfile.write("\n")
return outfile
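
A quick smoke run of the 4x4 map (each step moves in the intended direction with only 1/3 probability because of the slippery ice):

from gym.envs.toy_text.frozen_lake import FrozenLakeEnv

env = FrozenLakeEnv(map_name="4x4")
ob = env.reset()
for _ in range(5):
    ob, reward, done, info = env.step(env.action_space.sample())
    env.render()
    if done:
        break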

View File

@@ -0,0 +1,40 @@
import numpy as np
import gym
from gym import spaces
class RouletteEnv(gym.Env):
"""Simple roulette environment
The roulette wheel has 37 spots. If the bet is 0 and a 0 comes up,
you win a reward of 35. If the parity of your bet matches the parity
of the spin, you win 1. Otherwise you receive a reward of -1.
The long run reward for playing 0 should be -1/37 for any state
The final action (index 37 with the default 37 spots) stops the rollout for a return of 0 (walking away)
"""
def __init__(self, spots=37):
self.n = spots + 1
self.action_space = spaces.Discrete(self.n)
self.observation_space = spaces.Discrete(1)
def _step(self, action):
assert(action >= 0 and action < self.n)
if action == self.n - 1:
# observation, reward, done, info
return 0, 0, True, {}
# N.B. np.random.randint draws from [A, B) while random.randint draws from [A,B]
val = np.random.randint(0, self.n - 1)
if val == action == 0:
reward = self.n - 3.0  # 35:1 payout on the default 37 spots, consistent with the docstring's -1/37 long-run reward
elif val != 0 and action != 0 and val % 2 == action % 2:
reward = 1.0
else:
reward = -1.0
return 0, reward, False, {}
def _reset(self):
return 0
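
The docstring's -1/37 claim is easy to check empirically (a sketch using the default 37 spots, betting 0 every spin):

import numpy as np
from gym.envs.toy_text.roulette import RouletteEnv

env = RouletteEnv()
env.reset()
rewards = []
for _ in range(100000):
    _, reward, _, _ = env.step(0)  # always bet on 0
    rewards.append(reward)
print(np.mean(rewards))            # should approach -1/37 ~= -0.027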

135
gym/envs/toy_text/taxi.py Normal file
View File

@@ -0,0 +1,135 @@
import numpy as np
import StringIO, sys
from gym import spaces, utils
from gym.envs.toy_text import discrete
MAP = [
"+---------+",
"|R: | : :G|",
"| : : : : |",
"| : : : : |",
"| | :F| : |",
"|Y| : |B: |",
"+---------+",
]
class TaxiEnv(discrete.DiscreteEnv):
"""
The Taxi Problem
from "Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition"
by Tom Dietterich
rendering:
- blue: passenger
- magenta: destination
- yellow: empty taxi
- green: full taxi
- other letters: locations
"""
metadata = {'render.modes': ['human', 'ansi']}
def __init__(self):
self.desc = np.asarray(MAP,dtype='c')
self.locs = locs = [(0,0), (0,4), (4,0), (3,2), (4,3)]
nS = 500
nR = 5
nC = 5
maxR = nR-1
maxC = nC-1
isd = np.zeros(nS)
nA = 6
P = {s : {a : [] for a in xrange(nA)} for s in xrange(nS)}
for row in xrange(5):
for col in xrange(5):
for passidx in xrange(5):
for destidx in xrange(4):
for a in xrange(nA):
state = self.encode(row, col, passidx, destidx)
# defaults
newrow, newcol, newpassidx = row, col, passidx
reward = -1
done = False
taxiloc = (row, col)
if a==0:
newrow = min(row+1, maxR)
elif a==1:
newrow = max(row-1, 0)
if a==2 and self.desc[1+row,2*col+2]==":":
newcol = min(col+1, maxC)
elif a==3 and self.desc[1+row,2*col]==":":
newcol = max(col-1, 0)
elif a==4: # pickup
if (taxiloc == locs[passidx]):
newpassidx = 4
else:
reward = -10
elif a==5: # dropoff
if (taxiloc == locs[destidx]) and passidx==4:
done = True
elif (taxiloc in locs) and passidx==4:
newpassidx = locs.index(taxiloc)
else:
reward = -10
newstate = self.encode(newrow, newcol, newpassidx, destidx)
if passidx < 4: isd[state] += 1
P[state][a].append((1.0, newstate, reward, done))
isd /= isd.sum()
discrete.DiscreteEnv.__init__(self, nS, nA, P, isd)
self.observation_space = spaces.Discrete(500)
self.action_space = spaces.Discrete(6)
def encode(self, taxirow, taxicol, passloc, destidx):
# (5) 5, 5, 4
i = taxirow
i *= 5
i += taxicol
i *= 5
i += passloc
i *= 4
i += destidx
return i
def decode(self, i):
out = []
out.append(i % 4)
i = i // 4
out.append(i % 5)
i = i // 5
out.append(i % 5)
i = i // 5
out.append(i)
assert 0 <= i < 5
return reversed(out)
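# Worked example of the mixed-radix encoding above:
#   encode(3, 1, 2, 0) = ((3*5 + 1)*5 + 2)*4 + 0 = 328
# and decode(328) peels the digits back off in reverse order:
#   328 % 4 = 0 (destidx); 328 // 4 = 82, 82 % 5 = 2 (passloc);
#   82 // 5 = 16, 16 % 5 = 1 (taxicol); 16 // 5 = 3 (taxirow)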
def _render(self, mode='human', close=False):
if close:
return
outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout
out = self.desc.copy().tolist()
taxirow, taxicol, passidx, destidx = self.decode(self.s)
def ul(x): return "_" if x == " " else x
if passidx < 4:
out[1+taxirow][2*taxicol+1] = utils.colorize(out[1+taxirow][2*taxicol+1], 'yellow', highlight=True)
pi, pj = self.locs[passidx]
out[1+pi][2*pj+1] = utils.colorize(out[1+pi][2*pj+1], 'blue', bold=True)
else: # passenger in taxi
out[1+taxirow][2*taxicol+1] = utils.colorize(ul(out[1+taxirow][2*taxicol+1]), 'green', highlight=True)
di, dj = self.locs[destidx]
out[1+di][2*dj+1] = utils.colorize(out[1+di][2*dj+1], 'magenta')
outfile.write("\n".join(["".join(row) for row in out])+"\n")
if self.lastaction is not None:
outfile.write(" ({})\n".format(["North", "South", "East", "West", "Pickup", "Dropoff"][self.lastaction]))
else: outfile.write("\n")
# No need to return anything for human
if mode != 'human':
return outfile

97
gym/error.py Normal file
View File

@@ -0,0 +1,97 @@
import sys
class Error(Exception):
pass
# Local errors
class UnregisteredEnv(Error):
"""Raised when the user requests an env from the registry that does
not actually exist.
"""
pass
class DependencyNotInstalled(Error):
pass
class UnsupportedMode(Exception):
"""Raised when the user requests a rendering mode not supported by the
environment.
"""
pass
class ResetNeeded(Exception):
"""When the monitor is active, raised when the user tries to step an
environment that's already done.
"""
pass
class ResetNotAllowed(Exception):
"""When the monitor is active, raised when the user tries to step an
environment that's not yet done.
"""
pass
# API errors
class APIError(Error):
def __init__(self, message=None, http_body=None, http_status=None,
json_body=None, headers=None):
super(APIError, self).__init__(message)
if http_body and hasattr(http_body, 'decode'):
try:
http_body = http_body.decode('utf-8')
except:
http_body = ('<Could not decode body as utf-8. '
'Please report to gym@openai.com>')
self._message = message
self.http_body = http_body
self.http_status = http_status
self.json_body = json_body
self.headers = headers or {}
self.request_id = self.headers.get('request-id', None)
def __unicode__(self):
if self.request_id is not None:
msg = self._message or "<empty message>"
return u"Request {0}: {1}".format(self.request_id, msg)
else:
return self._message
if sys.version_info > (3, 0):
def __str__(self):
return self.__unicode__()
else:
def __str__(self):
return unicode(self).encode('utf-8')
class APIConnectionError(APIError):
pass
class InvalidRequestError(APIError):
def __init__(self, message, param, http_body=None,
http_status=None, json_body=None, headers=None):
super(InvalidRequestError, self).__init__(
message, http_body, http_status, json_body,
headers)
self.param = param
class AuthenticationError(APIError):
pass
class RateLimitError(APIError):
pass
# Video errors
class VideoRecorderError(Error):
pass
class InvalidFrame(Error):
pass

View File

@@ -0,0 +1,3 @@
from gym.monitoring.monitor import Monitor, load_results, monitors as _monitors
from gym.monitoring.stats_recorder import StatsRecorder
from gym.monitoring.video_recorder import VideoRecorder

328
gym/monitoring/monitor.py Normal file
View File

@@ -0,0 +1,328 @@
import atexit
import logging
import json
import numpy as np
import os
import six
import sys
import threading
import weakref
from gym import error, version
from gym.monitoring import stats_recorder, video_recorder
logger = logging.getLogger(__name__)
FILE_PREFIX = 'openaigym'
MANIFEST_PREFIX = FILE_PREFIX + '.manifest'
i = -1
lock = threading.Lock()
def next_monitor_id():
global i
with lock:
i += 1
return i
def detect_training_manifests(training_dir):
return [os.path.join(training_dir, f) for f in os.listdir(training_dir) if f.startswith(MANIFEST_PREFIX + '.')]
def detect_monitor_files(training_dir):
return [os.path.join(training_dir, f) for f in os.listdir(training_dir) if f.startswith(FILE_PREFIX + '.')]
def clear_monitor_files(training_dir):
files = detect_monitor_files(training_dir)
if len(files) == 0:
return
logger.info('Clearing %d monitor files from previous run (because force=True was provided)', len(files))
for file in files:
os.unlink(file)
def capped_cubic_video_schedule(episode_id):
if episode_id < 1000:
return int(round(episode_id ** (1. / 3))) ** 3 == episode_id
else:
return episode_id % 1000 == 0
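# Under this schedule the recorded episodes are the perfect cubes
# 0, 1, 8, 27, 64, 125, 216, 343, 512, 729, and then every 1000th
# episode (1000, 2000, ...) once past the first thousand.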
# Monitors will automatically close themselves when garbage collected
# (via __del__) or when the program exits (via close_all_monitors's
# atexit behavior).
monitors = weakref.WeakValueDictionary()
def ensure_close_at_exit(monitor):
monitors[monitor.monitor_id] = monitor
@atexit.register
def close_all_monitors():
for key, monitor in monitors.items():
monitor.close()
class Monitor(object):
"""A configurable monitor for your training runs.
Every env has an attached monitor, which you can access as
'env.monitor'. Simple usage is just to call 'monitor.start(dir)'
to begin monitoring and 'monitor.close()' when training is
complete. This will record stats and will periodically record a video.
For finer-grained control over how often videos are collected, use the
video_callable argument, e.g.
'monitor.start(video_callable=lambda count: count % 100 == 0)'
to record every 100 episodes. ('count' is how many episodes have completed)
Depending on the environment, video can slow down execution. You
can also use 'monitor.configure(video=lambda count: False)' to disable
video.
Monitor supports multiple threads and multiple processes writing
to the same directory of training data. The data will later be
joined by scoreboard.upload_training_data and on the server.
Args:
env (gym.Env): The environment instance to monitor.
Attributes:
id (Optional[str]): The ID of the monitored environment
"""
def __init__(self, env):
self.env = env
self.videos = []
self.stats_recorder = None
self.video_recorder = None
self.enabled = False
self.episode_id = 0
self.monitor_id = next_monitor_id()
ensure_close_at_exit(self)
def start(self, directory, video_callable=None, force=False):
"""Start monitoring.
Args:
directory (str): A per-training run directory where to record stats.
video_callable: function that takes in the index of the episode and outputs a boolean, indicating whether we should record a video on this episode. The default is to take perfect cubes.
force (bool): Clear out existing training data from this directory (by deleting every file prefixed with "openaigym.").
"""
if self.env.spec is None:
logger.warn("Trying to monitor an environment which has no 'spec' set. This usually means you did not create it via 'gym.make', and is recommended only for advanced users.")
if not os.path.exists(directory):
logger.info('Creating monitor directory %s', directory)
os.makedirs(directory)
if video_callable is None:
video_callable = capped_cubic_video_schedule
# Check on whether we need to clear anything
if force:
clear_monitor_files(directory)
else:
training_manifests = detect_training_manifests(directory)
if len(training_manifests) > 0:
raise error.Error('''Trying to write to monitor directory {} with existing monitor files: {}.
You should use a unique directory for each training run, or use 'force=True' to automatically clear previous monitor files.'''.format(directory, ', '.join(training_manifests[:5])))
self.enabled = True
self.directory = os.path.abspath(directory)
# We use the 'openai-gym' prefix to determine if a file is
# ours
self.file_prefix = FILE_PREFIX
self.file_infix = str(self.monitor_id)
self.stats_recorder = stats_recorder.StatsRecorder(directory, '{}.episode_batch.{}'.format(self.file_prefix, self.file_infix))
self.configure(video_callable=video_callable)
if not os.path.exists(directory):
os.mkdir(directory)
def close(self):
"""Flush all monitor data to disk and close any open rending windows."""
if not self.enabled:
return
stats_file = None
if self.stats_recorder:
stats_file = self.stats_recorder.close()
if self.video_recorder is not None:
self._close_video_recorder()
# Note we'll close the env's rendering window even if we did
# not open it. There isn't a particular great way to know if
# we did, since some environments will have a window pop up
# during video recording.
try:
self.env.render(close=True)
except Exception:
type, value, traceback = sys.exc_info()
if self.env.spec:
key = self.env.spec.id
else:
key = self.env
# This likely indicates unsupported kwargs
six.reraise(type, '{} (when closing {})'.format(value, key), traceback)
# Give it a very distinguished name, since we need to pick it
# up from the filesystem later.
path = os.path.join(self.directory, '{}.manifest.{}.{}.manifest.json'.format(self.file_prefix, self.file_infix, os.getpid()))
logger.debug('Writing training manifest file to %s', path)
with open(path, 'w') as f:
# We need to write relative paths here since people may
# move the training_dir around. It would be cleaner to
# already have the basenames rather than basename'ing
# manually, but this works for now.
json.dump({
'stats': os.path.basename(stats_file),
'videos': [(os.path.basename(v), os.path.basename(m))
for v, m in self.videos],
'env_info': self._env_info(),
}, f)
self.enabled = False
# Stop tracking this for autoclose
del monitors[self.monitor_id]
logger.info('''Finished writing results. You can upload them to the scoreboard via gym.upload(%r)''', self.directory)
def configure(self, video_callable=None):
"""Reconfigure the monitor.
video_callable (function): Whether to record video to upload to the scoreboard.
"""
if video_callable is not None:
self.video_callable = video_callable
def _before_step(self, action):
if not self.enabled: return
self.stats_recorder.before_step(action)
def _after_step(self, observation, reward, done, info):
if not self.enabled: return done
# Add 1 since about to take another step
if self.env.spec and self.stats_recorder.steps+1 >= self.env.spec.timestep_limit:
logger.info('Ending episode %i because it reached the timestep limit of %i.', self.episode_id, self.env.spec.timestep_limit)
done = True
# Record stats
self.stats_recorder.after_step(observation, reward, done, info)
# Record video
self.video_recorder.capture_frame()
return done
def _before_reset(self):
if not self.enabled: return
self.stats_recorder.before_reset()
def _after_reset(self, observation):
if not self.enabled: return
# Reset the stat count
self.stats_recorder.after_reset(observation)
# Close any existing video recorder
if self.video_recorder:
self._close_video_recorder()
# Start recording the next video.
self.video_recorder = video_recorder.VideoRecorder(
env=self.env,
base_path=os.path.join(self.directory, '{}.video.{}.{}.video{:06}'.format(self.file_prefix, self.file_infix, os.getpid(), self.episode_id)),
metadata={'episode_id': self.episode_id},
enabled=self._video_enabled(),
)
self.video_recorder.capture_frame()
# Bump *after* all reset activity has finished
self.episode_id += 1
def _close_video_recorder(self):
self.video_recorder.close()
if self.video_recorder.functional:
self.videos.append((self.video_recorder.path, self.video_recorder.metadata_path))
def _video_enabled(self):
return self.video_callable(self.episode_id)
def _env_info(self):
if self.env.spec:
return {
'env_id': self.env.spec.id,
'gym_version': version.VERSION,
}
else:
return {}
def __del__(self):
# Make sure we've closed up shop when garbage collecting
self.close()
def load_results(training_dir):
if not os.path.exists(training_dir):
return
manifests = detect_training_manifests(training_dir)
if not manifests:
return
logger.debug('Uploading data from manifest %s', ', '.join(manifests))
# Load up stats + video files
stats_files = []
videos = []
env_infos = []
for manifest in manifests:
with open(manifest) as f:
contents = json.load(f)
# Make these paths absolute again
stats_files.append(os.path.join(training_dir, contents['stats']))
videos += [(os.path.join(training_dir, v), os.path.join(training_dir, m))
for v, m in contents['videos']]
env_infos.append(contents['env_info'])
env_info = collapse_env_infos(env_infos, training_dir)
timestamps, episode_lengths, episode_rewards = merge_stats_files(stats_files)
return {
'manifests': manifests,
'env_info': env_info,
'timestamps': timestamps,
'episode_lengths': episode_lengths,
'episode_rewards': episode_rewards,
'videos': videos,
}
def merge_stats_files(stats_files):
timestamps = []
episode_lengths = []
episode_rewards = []
for path in stats_files:
with open(path) as f:
content = json.load(f)
timestamps += content['timestamps']
episode_lengths += content['episode_lengths']
episode_rewards += content['episode_rewards']
idxs = np.argsort(timestamps)
timestamps = np.array(timestamps)[idxs].tolist()
episode_lengths = np.array(episode_lengths)[idxs].tolist()
episode_rewards = np.array(episode_rewards)[idxs].tolist()
return timestamps, episode_lengths, episode_rewards
def collapse_env_infos(env_infos, training_dir):
assert len(env_infos) > 0
first = env_infos[0]
for other in env_infos[1:]:
if first != other:
raise error.Error('Found two unequal env_infos: {} and {}. This usually indicates that your training directory {} has commingled results from multiple runs.'.format(first, other, training_dir))
for key in ['env_id', 'gym_version']:
if key not in first:
raise error.Error("env_info {} from training directory {} is missing expected key {}. This is unexpected and likely indicates a bug in gym.".format(first, training_dir, key))
return first
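
End to end, the monitor is driven through the env it is attached to, as described in the class docstring above (a sketch; the output directory is arbitrary):

import gym
from gym.monitoring import load_results

env = gym.make('CartPole-v0')
env.monitor.start('/tmp/cartpole-run-1', force=True)
for episode in range(20):
    ob = env.reset()
    done = False
    while not done:
        ob, reward, done, info = env.step(env.action_space.sample())
env.monitor.close()

results = load_results('/tmp/cartpole-run-1')
print(results['episode_rewards'])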

View File

@@ -0,0 +1,56 @@
import json
import os
import time
from gym import error
class StatsRecorder(object):
def __init__(self, directory, file_prefix):
self.directory = directory
self.file_prefix = file_prefix
self.episode_lengths = []
self.episode_rewards = []
self.timestamps = []
self.steps = None
self.rewards = None
self.done = None
def before_step(self, action):
if self.done:
raise error.ResetNeeded("Trying to step environment which is currently done. While the monitor is active, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.")
elif self.steps is None:
raise error.ResetNeeded("Trying to step an environment before reset. While the monitor is active, you must call 'env.reset()' before taking an initial step.")
def after_step(self, observation, reward, done, info):
self.steps += 1
self.rewards += reward
if done:
self.done = True
def before_reset(self):
self.done = False
def after_reset(self, observation):
self.flush()
def flush(self):
if self.steps is not None:
self.episode_lengths.append(self.steps)
self.episode_rewards.append(self.rewards)
self.timestamps.append(time.time())
self.steps = 0
self.rewards = 0
def close(self):
self.flush()
filename = '{}.{}.stats.json'.format(self.file_prefix, os.getpid())
path = os.path.join(self.directory, filename)
with open(path, 'w') as f:
json.dump({
'timestamps': self.timestamps,
'episode_lengths': self.episode_lengths,
'episode_rewards': self.episode_rewards,
}, f)
return path

View File

@@ -0,0 +1,67 @@
import json
import os
import shutil
import tempfile
import numpy as np
from nose2 import tools
import gym
from gym.monitoring import VideoRecorder
class BrokenRecordableEnv(object):
metadata = {'render.modes': [None, 'rgb_array']}
def render(self, mode=None):
pass
class UnrecordableEnv(object):
metadata = {'render.modes': [None]}
def render(self, mode=None):
pass
# TODO(jonas): disabled until we have ffmpeg on travis
# def test_record_simple():
# rec = VideoRecorder()
# env, id = gym.make("CartPole")
# rec.capture_frame(env)
# rec.close()
# assert not rec.empty
# assert not rec.broken
# assert os.path.exists(rec.path)
# f = open(rec.path)
# assert os.fstat(f.fileno()).st_size > 100
def test_no_frames():
env = BrokenRecordableEnv()
rec = VideoRecorder(env)
rec.close()
assert rec.empty
assert rec.functional
assert not os.path.exists(rec.path)
def test_record_unrecordable_method():
env = UnrecordableEnv()
rec = VideoRecorder(env)
assert not rec.enabled
rec.close()
def test_record_breaking_render_method():
env = BrokenRecordableEnv()
rec = VideoRecorder(env)
rec.capture_frame()
rec.close()
assert rec.empty
assert rec.broken
assert not os.path.exists(rec.path)
def test_text_envs():
env = gym.make('FrozenLake-v0')
video = VideoRecorder(env)
try:
env.reset()
video.capture_frame()
video.close()
finally:
os.remove(video.path)

View File

@@ -0,0 +1,290 @@
import logging
import json
import os
import subprocess
import tempfile
import os.path
import distutils.spawn
import numpy as np
import StringIO
from gym import error
logger = logging.getLogger(__name__)
def touch(path):
open(path, 'a').close()
class VideoRecorder(object):
"""VideoRecorder renders a nice movie of a rollout, frame by frame. It
comes with an `enabled` option so you can still use the same code
on episodes where you don't want to record video.
Note:
You are responsible for calling `close` on a created
VideoRecorder, or else you may leak an encoder process.
Args:
env (Env): Environment to take video of.
path (Optional[str]): Path to the video file; will be randomly chosen if omitted.
base_path (Optional[str]): Alternatively, path to the video file without extension, which will be added.
metadata (Optional[dict]): Contents to save to the metadata file.
enabled (bool): Whether to actually record video, or just no-op (for convenience)
"""
def __init__(self, env, path=None, metadata=None, enabled=True, base_path=None):
modes = env.metadata.get('render.modes', [])
self.ansi_mode = False
if 'rgb_array' not in modes:
if 'ansi' in modes:
self.ansi_mode = True
else:
logger.info('Disabling video recorder because {} neither supports video mode "rgb_array" nor "ansi".'.format(env))
enabled = False
if path is not None and base_path is not None:
raise error.Error("You can pass at most one of `path` or `base_path`.")
self.enabled = enabled
self.last_frame = None
if not self.enabled:
return
self.env = env
required_ext = '.json' if self.ansi_mode else '.mp4'
if path is None:
if base_path is not None:
# Base path given, append ext
path = base_path + required_ext
else:
# Otherwise, just generate a unique filename
with tempfile.NamedTemporaryFile(suffix=required_ext, delete=False) as f:
path = f.name
self.path = path
path_base, actual_ext = os.path.splitext(self.path)
if actual_ext != required_ext:
hint = " HINT: The environment is text-only, therefore we're recording its text output in a structured JSON format." if self.ansi_mode else ''
raise error.Error("Invalid path given: {} -- must have file extension {}.{}".format(self.path, required_ext, hint))
# Touch the file in any case, so we know it's present. (This
# corrects for platform differences: using ffmpeg on OS X, the
# file is precreated, but not on Linux.)
touch(path)
self.frames_per_sec = env.metadata.get('video.frames_per_second', 30)
self.encoder = None # lazily start the process
self.broken = False
# Dump metadata
self.metadata = metadata or {}
self.metadata['content_type'] = 'video/vnd.openai.ansivid' if self.ansi_mode else 'video/mp4'
self.metadata_path = '{}.meta.json'.format(path_base)
self.write_metadata()
logger.info('Starting new video recorder writing to %s', self.path)
self.empty = True
@property
def functional(self):
return self.enabled and not self.broken
def capture_frame(self):
"""Render the given `env` and add the resulting frame to the video."""
if not self.functional: return
logger.debug('Capturing video frame: path=%s', self.path)
render_mode = 'ansi' if self.ansi_mode else 'rgb_array'
frame = self.env.render(mode=render_mode)
if frame is None:
# Indicates a bug in the environment: don't want to raise
# an error here.
logger.warn('Env returned None on render(). Disabling further rendering for video recorder by marking as disabled: path=%s metadata_path=%s', self.path, self.metadata_path)
self.broken = True
else:
self.last_frame = frame
if self.ansi_mode:
self._encode_ansi_frame(frame)
else:
self._encode_image_frame(frame)
def close(self):
"""Make sure to manually close, or else you'll leak the encoder process"""
if not self.enabled:
return
if self.encoder:
logger.debug('Closing video encoder: path=%s', self.path)
self.encoder.close()
self.encoder = None
else:
# No frames captured. Set metadata, and remove the empty output file.
os.remove(self.path)
if self.metadata is None:
self.metadata = {}
self.metadata['empty'] = True
# If broken, get rid of the output file, otherwise we'd leak it.
if self.broken:
logger.info('Cleaning up paths for broken video recorder: path=%s metadata_path=%s', self.path, self.metadata_path)
# Might have crashed before even starting the output file, don't try to remove in that case.
if os.path.exists(self.path):
os.remove(self.path)
if self.metadata is None:
self.metadata = {}
self.metadata['broken'] = True
self.write_metadata()
def write_metadata(self):
with open(self.metadata_path, 'w') as f:
json.dump(self.metadata, f)
def _encode_ansi_frame(self, frame):
if not self.encoder:
self.encoder = TextEncoder(self.path, self.frames_per_sec)
self.metadata['encoder_version'] = self.encoder.version_info
self.encoder.capture_frame(frame)
self.empty = False
def _encode_image_frame(self, frame):
if not self.encoder:
self.encoder = ImageEncoder(self.path, frame.shape, self.frames_per_sec)
self.metadata['encoder_version'] = self.encoder.version_info
try:
self.encoder.capture_frame(frame)
except error.InvalidFrame as e:
logger.warn('Tried to pass invalid video frame, marking as broken: %s', e)
self.broken = True
else:
self.empty = False
class TextEncoder(object):
"""Store a moving picture made out of ANSI frames. Format adapted from
https://github.com/asciinema/asciinema/blob/master/doc/asciicast-v1.md"""
def __init__(self, output_path, frames_per_sec):
self.output_path = output_path
self.frames_per_sec = frames_per_sec
self.frames = []
def capture_frame(self, frame):
string = None
if isinstance(frame, str):
string = frame
elif isinstance(frame, StringIO.StringIO):
string = frame.getvalue()
else:
raise error.InvalidFrame('Wrong type {} for {}: text frame must be a string or StringIO'.format(type(frame), frame))
if string[-1] != '\n':
raise error.InvalidFrame('Frame must end with a newline: """{}"""'.format(string))
if '\r\n' in string:
raise error.InvalidFrame('Frame contains carriage returns (only newlines are allowed): """{}"""'.format(string))
self.frames.append(string)
def close(self):
#frame_duration = float(1) / self.frames_per_sec
frame_duration = .5
# Turn frames into events: clear screen beforehand
# https://rosettacode.org/wiki/Terminal_control/Clear_the_screen#Python
# https://rosettacode.org/wiki/Terminal_control/Cursor_positioning#Python
clear_code = "%c[2J\033[1;1H" % (27)
events = [ (frame_duration, clear_code+frame.replace('\n','\r\n')) for frame in self.frames ]
# Calculate frame size from the largest frames.
# Add some padding since we'll get cut off otherwise.
height = max([frame.count('\n') for frame in self.frames]) + 1
width = max([max([len(line) for line in frame.split('\n')]) for frame in self.frames]) + 2
data = {
"version": 1,
"width": width,
"height": height,
"duration": len(self.frames)*frame_duration,
"command": "-",
"title": "gym VideoRecorder episode",
"env": {}, # could add some env metadata here
"stdout": events,
}
with open(self.output_path, 'w') as f:
json.dump(data, f)
@property
def version_info(self):
return {'backend':'TextEncoder','version':1}
class ImageEncoder(object):
def __init__(self, output_path, frame_shape, frames_per_sec):
self.proc = None
self.output_path = output_path
# Frame shape should be lines-first, so w and h are swapped
h, w, pixfmt = frame_shape
if pixfmt != 3 and pixfmt != 4:
raise error.InvalidFrame("Your frame has shape {}, but we require (w,h,3) or (w,h,4), i.e. RGB values for a w-by-h image, with an optional alpha channl.".format(frame_shape))
self.wh = (w,h)
self.includes_alpha = (pixfmt == 4)
self.frame_shape = frame_shape
self.frames_per_sec = frames_per_sec
if distutils.spawn.find_executable('ffmpeg') is not None:
self.backend = 'ffmpeg'
elif distutils.spawn.find_executable('avconv') is not None:
self.backend = 'avconv'
else:
raise error.DependencyNotInstalled("""Found neither the ffmpeg nor avconv executables. On OS X, you can install ffmpeg via `brew install ffmpeg`. On most Ubuntu variants, `sudo apt-get install ffmpeg` should do it. On Ubuntu 14.04, however, you'll need to install avconv with `sudo apt-get install libav-tools`.""")
self.start()
@property
def version_info(self):
return {'backend':self.backend,'version':subprocess.check_output([self.backend, '-version']),'cmdline':self.cmdline}
def start(self):
self.cmdline = (self.backend,
'-nostats',
'-loglevel', 'error', # suppress warnings
'-y',
'-r', '%d' % self.frames_per_sec,
# input
'-f', 'rawvideo',
'-s:v', '{}x{}'.format(*self.wh),
'-pix_fmt',('rgb32' if self.includes_alpha else 'rgb24'),
'-i', '/dev/stdin',
# output
'-vcodec', 'libx264',
'-pix_fmt', 'yuv420p',
self.output_path
)
logger.debug('Starting ffmpeg with "%s"', ' '.join(self.cmdline))
self.proc = subprocess.Popen(self.cmdline, stdin=subprocess.PIPE)
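# For reference, the subprocess above is equivalent to piping raw
# frames into a command line like (size, rate, and path illustrative):
#   ffmpeg -nostats -loglevel error -y -r 30 -f rawvideo -s:v 600x400 \
#     -pix_fmt rgb24 -i /dev/stdin -vcodec libx264 -pix_fmt yuv420p out.mp4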
def capture_frame(self, frame):
if not isinstance(frame, (np.ndarray, np.generic)):
raise error.InvalidFrame('Wrong type {} for {} (must be np.ndarray or np.generic)'.format(type(frame), frame))
if frame.shape != self.frame_shape:
raise error.InvalidFrame("Your frame has shape {}, but the VideoRecorder is configured for shape {}.".format(frame_shape, self.frame_shape))
if frame.dtype != np.uint8:
raise error.InvalidFrame("Your frame has data type {}, but we require uint8 (i.e. RGB values from 0-255).".format(frame.dtype))
self.proc.stdin.write(frame.tobytes())
def close(self):
self.proc.stdin.close()
ret = self.proc.wait()
if ret != 0:
logger.error("VideoRecorder encoder exited with status {}".format(ret))

View File

@@ -0,0 +1,9 @@
import os
from gym.scoreboard.client.resource import FileUpload, Evaluation
# Discover API key from the environment. (You should never have to
# change api_base / web_base.)
api_key = os.environ.get('OPENAI_GYM_API_KEY')
api_base = os.environ.get('OPENAI_GYM_API_BASE', 'https://gym-api.openai.com')
web_base = os.environ.get('OPENAI_GYM_WEB_BASE', 'https://gym.openai.com')

181
gym/scoreboard/api.py Normal file
View File

@@ -0,0 +1,181 @@
import logging
import json
import os
import re
import tarfile
import tempfile
from gym import error, monitoring
from gym.scoreboard.client import resource, util
MAX_VIDEOS = 100
logger = logging.getLogger(__name__)
video_name_re = re.compile('^[\w.-]+\.(mp4|avi|json)$')
metadata_name_re = re.compile('^[\w.-]+\.meta\.json$')
def upload(training_dir, algorithm_id=None, writeup=None, api_key=None):
"""Upload the results of training (as automatically recorded by your
env's monitor) to OpenAI Gym.
Args:
training_dir (Optional[str]): A directory containing the results of a training run.
algorithm_id (Optional[str]): An arbitrary string indicating the particular version of the algorithm (including choices of parameters) you are running.
writeup (Optional[str]): A Gist URL (of the form https://gist.github.com/<user>/<id>) containing your writeup for this evaluation.
api_key (Optional[str]): Your OpenAI API key. Can also be provided as an environment variable (OPENAI_GYM_API_KEY).
"""
open_monitors = monitoring._monitors.values()
if open_monitors:
envs = [m.env.spec.id if m.env.spec else '(unknown)' for m in open_monitors]
raise error.Error("Still have an open monitor on {}. You must run 'env.monitor.close()' before uploading.".format(', '.join(envs)))
env_info, training_episode_batch, training_video = upload_training_data(training_dir, api_key=api_key)
training_episode_batch_id = training_video_id = None
if training_episode_batch:
training_episode_batch_id = training_episode_batch.id
if training_video:
training_video_id = training_video.id
if logger.level <= logging.INFO:
if training_episode_batch_id is not None and training_video_id is not None:
logger.info('Creating evaluation object from %s with learning curve and training video', training_dir)
elif training_episode_batch_id is not None:
logger.info('Creating evaluation object from %s with learning curve', training_dir)
elif training_video_id is not None:
logger.info('Creating evaluation object from %s with training video', training_dir)
else:
raise error.Error("You didn't have any recorded training data in {}. Once you've used 'env.monitor.start(training_dir)' to start recording, you need to actually run some rollouts. Please join the community chat on https://gym.openai.com if you have any issues.".format(training_dir))
evaluation = resource.Evaluation.create(
training_episode_batch=training_episode_batch_id,
training_video=training_video_id,
env=env_info['env_id'],
algorithm={
'id': algorithm_id,
},
writeup=writeup,
gym_version=env_info['gym_version'],
api_key=api_key,
)
logger.info(
"""
****************************************************
You successfully uploaded your agent evaluation to
OpenAI Gym! You can find it at:
%s
****************************************************
""".rstrip(), evaluation.web_url())
return evaluation
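# Typical call once a monitored training run has been closed
# (directory, gist URL, and key here are illustrative):
#   gym.upload('/tmp/cartpole-run-1',
#              writeup='https://gist.github.com/user/abc123',
#              api_key='sk-...')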
def upload_training_data(training_dir, api_key=None):
# Could have multiple manifests
results = monitoring.load_results(training_dir)
if not results:
raise error.Error('''Could not find any manifest files in {}.
(HINT: this usually means you did not yet close() your env.monitor and have not yet exited the process. You should call 'env.monitor.start(training_dir)' at the start of training and 'env.monitor.close()' at the end, or exit the process.)'''.format(training_dir))
manifests = results['manifests']
env_info = results['env_info']
timestamps = results['timestamps']
episode_lengths = results['episode_lengths']
episode_rewards = results['episode_rewards']
videos = results['videos']
logger.debug('Uploading data from manifest %s', ', '.join(manifests))
# Do the relevant uploads
if len(episode_lengths) > 0:
training_episode_batch = upload_training_episode_batch(episode_lengths, episode_rewards, timestamps, api_key)
else:
training_episode_batch = None
if len(videos) > MAX_VIDEOS:
logger.warn('You recorded videos for {} episodes, but the scoreboard only supports up to {}. We will automatically subsample for you, but you also might wish to adjust your video recording rate.'.format(len(videos), MAX_VIDEOS))
skip = -(-len(videos) // MAX_VIDEOS)  # ceiling division; len(videos) / (MAX_VIDEOS - 1) could yield skip=1 and leave more than MAX_VIDEOS entries
videos = videos[::skip]
if len(videos) > 0:
training_video = upload_training_video(videos, api_key)
else:
training_video = None
return env_info, training_episode_batch, training_video
def upload_training_episode_batch(episode_lengths, episode_rewards, timestamps, api_key=None):
logger.info('Uploading %d episodes of training data', len(episode_lengths))
file_upload = resource.FileUpload.create(purpose='episode_batch', api_key=api_key)
file_upload.put({
'episode_lengths': episode_lengths,
'episode_rewards': episode_rewards,
'timestamps': timestamps,
})
return file_upload
def upload_training_video(videos, api_key=None):
"""videos: should be list of (video_path, metadata_path) tuples"""
with tempfile.TemporaryFile() as archive_file:
write_archive(videos, archive_file)
archive_file.seek(0)
logger.info('Uploading videos of %d training episodes (%d bytes)', len(videos), util.file_size(archive_file))
file_upload = resource.FileUpload.create(purpose='video', content_type='application/vnd.openai.video+x-compressed', api_key=api_key)
file_upload.put(archive_file, encode=None)
return file_upload
def write_archive(videos, archive_file):
if len(videos) > MAX_VIDEOS:
raise error.Error('Trying to upload {} videos, but there is a limit of {} currently. If you actually want to upload this many videos, please email gym@openai.com with your use-case.'.format(len(videos), MAX_VIDEOS))
logger.debug('Preparing an archive of %d videos: %s', len(videos), videos)
# Double check that there are no collisions
basenames = set()
manifest = {
'version': 0,
'videos': []
}
with tarfile.open(fileobj=archive_file, mode='w:gz') as tar:
for video_path, metadata_path in videos:
video_name = os.path.basename(video_path)
metadata_name = os.path.basename(metadata_path)
if not os.path.exists(video_path):
raise error.Error('No such video file {}. (HINT: Your video recorder may have broken midway through the run. You can check this with `video_recorder.functional`.)'.format(video_path))
elif not os.path.exists(metadata_path):
raise error.Error('No such metadata file {}. (HINT: this should be automatically created when using a VideoRecorder instance.)'.format(metadata_path))
# Do some sanity checking
if video_name in basenames:
raise error.Error('Duplicated video name {} in video list: {}'.format(video_name, videos))
elif metadata_name in basenames:
raise error.Error('Duplicated metadata file name {} in video list: {}'.format(metadata_name, videos))
elif not video_name_re.search(video_name):
raise error.Error('Invalid video name {} (must match {})'.format(video_name, video_name_re.pattern))
elif not metadata_name_re.search(metadata_name):
raise error.Error('Invalid metadata file name {} (must match {})'.format(metadata_name, metadata_name_re.pattern))
# Record that we've seen these names; add to manifest
basenames.add(video_name)
basenames.add(metadata_name)
manifest['videos'].append((video_name, metadata_name))
# Import the files into the archive
tar.add(video_path, arcname=video_name, recursive=False)
tar.add(metadata_path, arcname=metadata_name, recursive=False)
# Actually write the manifest file
with tempfile.NamedTemporaryFile() as f:
json.dump(manifest, f)
f.flush()
tar.add(f.name, arcname='manifest.json')
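The resulting archive is a gzipped tarball whose manifest.json pairs each video with its metadata file. An illustrative manifest (the filenames here are hypothetical)::

    {
      "version": 0,
      "videos": [
        ["episode.0.mp4", "episode.0.meta.json"]
      ]
    }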

4
gym/scoreboard/client/README.md Normal file
View File

@@ -0,0 +1,4 @@
# Client
This client was forked from the [Stripe
Python](https://github.com/stripe/stripe-python) bindings.

6
gym/scoreboard/client/__init__.py Normal file
View File

@@ -0,0 +1,6 @@
import logging
import os
from gym import error
logger = logging.getLogger(__name__)

158
gym/scoreboard/client/api_requestor.py Normal file
View File

@@ -0,0 +1,158 @@
import json
import platform
import urlparse
from gym import error, version
import gym.scoreboard.client
from gym.scoreboard.client import http_client
verify_ssl_certs = True # [SECURITY CRITICAL] only turn this off while debugging
http_client = http_client.RequestsClient(verify_ssl_certs=verify_ssl_certs)
def _build_api_url(url, query):
scheme, netloc, path, base_query, fragment = urlparse.urlsplit(url)
if base_query:
query = '%s&%s' % (base_query, query)
return urlparse.urlunsplit((scheme, netloc, path, query, fragment))
def _strip_nulls(params):
if isinstance(params, dict):
stripped = {}
for key, value in params.iteritems():
value = _strip_nulls(value)
if value is not None:
stripped[key] = value
return stripped
else:
return params
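A quick illustration of these two helpers, worked by hand from the code above::

    _build_api_url('https://gym-api.openai.com/v1/evaluations?page=2', 'limit=10')
    # -> 'https://gym-api.openai.com/v1/evaluations?page=2&limit=10'

    _strip_nulls({'algorithm': None, 'env': 'CartPole-v0'})
    # -> {'env': 'CartPole-v0'}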
class APIRequestor(object):
def __init__(self, key=None, api_base=None):
self.api_base = api_base or gym.scoreboard.api_base
self.api_key = key
self._client = http_client
def request(self, method, url, params=None, headers=None):
rbody, rcode, rheaders, my_api_key = self.request_raw(
method.lower(), url, params, headers)
resp = self.interpret_response(rbody, rcode, rheaders)
return resp, my_api_key
def handle_api_error(self, rbody, rcode, resp, rheaders):
# Rate limits were previously coded as 400's with code 'rate_limit'
if rcode == 429:
raise error.RateLimitError(
resp.get('detail'), rbody, rcode, resp, rheaders)
elif rcode in [400, 404]:
type = resp.get('type')
if type == 'about:blank':
type = None
raise error.InvalidRequestError(
resp.get('detail'), type,
rbody, rcode, resp, rheaders)
elif rcode == 401:
raise error.AuthenticationError(
resp.get('detail'), rbody, rcode, resp,
rheaders)
else:
detail = resp.get('detail')
# This information will only be returned to developers of
# the OpenAI Gym Scoreboard.
dev_info = resp.get('dev_info')
if dev_info:
detail = "{}\n\n<dev_info>\n{}\n</dev_info>".format(detail, dev_info['traceback'])
raise error.APIError(detail, rbody, rcode, resp,
rheaders)
def request_raw(self, method, url, params=None, supplied_headers=None):
"""
Mechanism for issuing an API call
"""
if self.api_key:
my_api_key = self.api_key
else:
my_api_key = gym.scoreboard.api_key
if my_api_key is None:
raise error.AuthenticationError("""You must provide an OpenAI Gym API key.
(HINT: Set your API key using "gym.scoreboard.api_key = .." or "export OPENAI_GYM_API_KEY=..."). You can find your API key in the OpenAI Gym web interface: https://gym.openai.com/settings/profile.""")
abs_url = '%s%s' % (self.api_base, url)
if params:
encoded_params = json.dumps(_strip_nulls(params))
else:
encoded_params = None
if method == 'get' or method == 'delete':
if params:
abs_url = _build_api_url(abs_url, encoded_params)
post_data = None
elif method == 'post':
post_data = encoded_params
else:
raise error.APIConnectionError(
'Unrecognized HTTP method %r. This may indicate a bug in the '
'OpenAI Gym bindings. Please contact gym@openai.com for '
'assistance.' % (method,))
ua = {
'bindings_version': version.VERSION,
'lang': 'python',
'publisher': 'openai',
'httplib': self._client.name,
}
for attr, func in [['lang_version', platform.python_version],
['platform', platform.platform]]:
try:
val = func()
except Exception as e:
val = "!! %s" % (e,)
ua[attr] = val
headers = {
'Openai-Gym-User-Agent': json.dumps(ua),
'User-Agent': 'Openai-Gym/v1 PythonBindings/%s' % (version.VERSION,),
'Authorization': 'Bearer %s' % (my_api_key,)
}
if method == 'post':
headers['Content-Type'] = 'application/json'
if supplied_headers is not None:
for key, value in supplied_headers.items():
headers[key] = value
rbody, rcode, rheaders = self._client.request(
method, abs_url, headers, post_data)
return rbody, rcode, rheaders, my_api_key
def interpret_response(self, rbody, rcode, rheaders):
content_type = rheaders.get('Content-Type', '')
if content_type.startswith('text/plain'):
# Pass through plain text
resp = rbody
if not (200 <= rcode < 300):
self.handle_api_error(rbody, rcode, {}, rheaders)
else:
# TODO: Be strict about other Content-Types
try:
if hasattr(rbody, 'decode'):
rbody = rbody.decode('utf-8')
resp = json.loads(rbody)
except Exception:
raise error.APIError(
"Invalid response body from API: %s "
"(HTTP response code was %d)" % (rbody, rcode),
rbody, rcode, rheaders)
if not (200 <= rcode < 300):
self.handle_api_error(rbody, rcode, resp, rheaders)
return resp
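A hedged sketch of driving the requestor directly; the resource classes further below normally do this for you, and the endpoint and key here are illustrative::

    requestor = APIRequestor(key='...', api_base='https://gym-api.openai.com')
    resp, api_key = requestor.request('get', '/v1/evaluations', params={'limit': 10})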

93
gym/scoreboard/client/http_client.py Normal file
View File

@@ -0,0 +1,93 @@
# Forked from the Stripe Python bindings: https://github.com/stripe/stripe-python
import logging
import requests
import textwrap
from gym import error
from gym.scoreboard.client import util
logger = logging.getLogger(__name__)
warned = False
def render_post_data(post_data):
if hasattr(post_data, 'fileno'): # todo: is this the right way of checking if it's a file?
return '%r (%d bytes)' % (post_data, util.file_size(post_data))
elif isinstance(post_data, basestring):
return '%r (%d bytes)' % (post_data, len(post_data))
else:
return None
class RequestsClient(object):
name = 'requests'
def __init__(self, verify_ssl_certs=True):
self._verify_ssl_certs = verify_ssl_certs
self.session = requests.Session()
def request(self, method, url, headers, post_data=None, files=None):
kwargs = {}
# Really, really only turn this off while debugging.
if not self._verify_ssl_certs:
global warned  # needed so the assignment below updates the module-level flag instead of raising UnboundLocalError
if not warned:
logger.warn('You have disabled SSL cert verification in OpenAI Gym, so we will not verify SSL certs. This means an attacker with control of your network could snoop on or modify your data in transit.')
warned = True
kwargs['verify'] = False
try:
try:
result = self.session.request(method,
url,
headers=headers,
data=post_data,
timeout=200,
files=files,
**kwargs)
except TypeError as e:
raise TypeError(
'Warning: It looks like your installed version of the '
'"requests" library is not compatible with OpenAI Gym\'s'
'usage thereof. (HINT: The most likely cause is that '
'your "requests" library is out of date. You can fix '
'that by running "pip install -U requests".) The '
'underlying error was: %s' % (e,))
# This causes the content to actually be read, which could cause
# e.g. a socket timeout. TODO: The other fetch methods probably
# are susceptible to the same and should be updated.
content = result.content
status_code = result.status_code
except Exception as e:
# Would catch just requests.exceptions.RequestException, but can
# also raise ValueError, RuntimeError, etc.
self._handle_request_error(e, method, url)
if util.logger.level <= logging.DEBUG:
util.logger.debug(
"""API request to %s returned (response code, response body) of
(%d, %r)
Request body was: %s""", url, status_code, content, render_post_data(post_data))
elif util.logger.level <= logging.INFO:
util.logger.info('HTTP request: %s %s %d', method.upper(), url, status_code)
return content, status_code, result.headers
def _handle_request_error(self, e, method, url):
if isinstance(e, requests.exceptions.RequestException):
msg = ("Unexpected error communicating with OpenAI Gym "
"(while calling {} {}). "
"If this problem persists, let us know at "
"gym@openai.com.".format(method, url))
err = "%s: %s" % (type(e).__name__, str(e))
else:
msg = ("Unexpected error communicating with OpenAI Gym. "
"It looks like there's probably a configuration "
"issue locally. If this problem persists, let us "
"know at gym@openai.com.")
err = "A %s was raised" % (type(e).__name__,)
if str(e):
err += " with error message %s" % (str(e),)
else:
err += " with no error message"
msg = textwrap.fill(msg, width=140) + "\n\n(Network error: %s)" % (err,)
raise error.APIConnectionError(msg)
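Usage sketch (the URL is illustrative). Note the client returns the raw (body, status code, headers) triple; interpreting errors is left to APIRequestor::

    client = RequestsClient(verify_ssl_certs=True)
    body, code, headers = client.request('get', 'https://gym-api.openai.com/v1/evaluations', headers={})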

378
gym/scoreboard/client/resource.py Normal file
View File

@@ -0,0 +1,378 @@
import json
import urllib
import warnings
import sys
import gym
from gym import error
from gym.scoreboard.client import api_requestor, util
def convert_to_gym_object(resp, api_key):
types = {
'evaluation': Evaluation,
'file': FileUpload,
}
if isinstance(resp, list):
return [convert_to_gym_object(i, api_key) for i in resp]
elif isinstance(resp, dict) and not isinstance(resp, GymObject):
resp = resp.copy()
klass_name = resp.get('object')
if isinstance(klass_name, basestring):
klass = types.get(klass_name, GymObject)
else:
klass = GymObject
return klass.construct_from(resp, api_key)
else:
return resp
def populate_headers(idempotency_key):
if idempotency_key is not None:
return {"Idempotency-Key": idempotency_key}
return None
def _compute_diff(current, previous):
if isinstance(current, dict):
previous = previous or {}
diff = current.copy()
for key in set(previous.keys()) - set(diff.keys()):
diff[key] = ""
return diff
return current if current is not None else ""
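A worked example of the diff semantics: keys whose values disappeared are sent as empty strings so the server clears them::

    diff = _compute_diff({'a': 1}, {'a': 1, 'b': 2})
    # diff == {'a': 1, 'b': ''} -- the removed key 'b' becomes an empty string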
class GymObject(dict):
def __init__(self, id=None, api_key=None, **params):
super(GymObject, self).__init__()
self._unsaved_values = set()
self._transient_values = set()
self._retrieve_params = params
self._previous = None
object.__setattr__(self, 'api_key', api_key)
if id:
self['id'] = id
def update(self, update_dict):
for k in update_dict:
self._unsaved_values.add(k)
return super(GymObject, self).update(update_dict)
def __setattr__(self, k, v):
if k[0] == '_' or k in self.__dict__:
return super(GymObject, self).__setattr__(k, v)
else:
self[k] = v
def __getattr__(self, k):
if k[0] == '_':
raise AttributeError(k)
try:
return self[k]
except KeyError as err:
raise AttributeError(*err.args)
def __delattr__(self, k):
if k[0] == '_' or k in self.__dict__:
return super(GymObject, self).__delattr__(k)
else:
del self[k]
def __setitem__(self, k, v):
if v == "":
raise ValueError(
"You cannot set %s to an empty string. "
"We interpret empty strings as None in requests."
"You may set %s.%s = None to delete the property" % (
k, str(self), k))
super(GymObject, self).__setitem__(k, v)
# Allows for unpickling in Python 3.x
if not hasattr(self, '_unsaved_values'):
self._unsaved_values = set()
self._unsaved_values.add(k)
def __getitem__(self, k):
try:
return super(GymObject, self).__getitem__(k)
except KeyError as err:
if k in self._transient_values:
raise KeyError(
"%r. HINT: The %r attribute was set in the past."
"It was then wiped when refreshing the object with "
"the result returned by Rl_Gym's API, probably as a "
"result of a save(). The attributes currently "
"available on this object are: %s" %
(k, k, ', '.join(self.keys())))
else:
raise err
def __delitem__(self, k):
super(GymObject, self).__delitem__(k)
# Allows for unpickling in Python 3.x
if hasattr(self, '_unsaved_values'):
self._unsaved_values.remove(k)
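GymObject lets attribute access fall through to the underlying dict; a small illustration (the id is illustrative)::

    obj = GymObject('eval_123')
    obj.foo = 'bar'              # attribute writes go into the dict...
    assert obj['foo'] == 'bar'   # ...and are tracked in _unsaved_values
    assert obj.id == 'eval_123'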
@classmethod
def construct_from(cls, values, key):
instance = cls(values.get('id'), api_key=key)
instance.refresh_from(values, api_key=key)
return instance
def refresh_from(self, values, api_key=None, partial=False):
self.api_key = api_key or getattr(values, 'api_key', None)
# Wipe old state before setting new. This is useful for e.g.
# updating a customer, where there is no persistent card
# parameter. Mark those values which don't persist as transient
if partial:
self._unsaved_values = (self._unsaved_values - set(values))
else:
removed = set(self.keys()) - set(values)
self._transient_values = self._transient_values | removed
self._unsaved_values = set()
self.clear()
self._transient_values = self._transient_values - set(values)
for k, v in values.iteritems():
super(GymObject, self).__setitem__(
k, convert_to_gym_object(v, api_key))
self._previous = values
@classmethod
def api_base(cls):
return None
def request(self, method, url, params=None, headers=None):
if params is None:
params = self._retrieve_params
requestor = api_requestor.APIRequestor(
key=self.api_key, api_base=self.api_base())
response, api_key = requestor.request(method, url, params, headers)
return convert_to_gym_object(response, api_key)
def __repr__(self):
ident_parts = [type(self).__name__]
if isinstance(self.get('object'), basestring):
ident_parts.append(self.get('object'))
if isinstance(self.get('id'), basestring):
ident_parts.append('id=%s' % (self.get('id'),))
unicode_repr = '<%s at %s> JSON: %s' % (
' '.join(ident_parts), hex(id(self)), str(self))
if sys.version_info[0] < 3:
return unicode_repr.encode('utf-8')
else:
return unicode_repr
def __str__(self):
return json.dumps(self, sort_keys=True, indent=2)
def to_dict(self):
warnings.warn(
'The `to_dict` method is deprecated and will be removed in '
'version 2.0 of the OpenAI Gym bindings. The GymObject is '
'itself now a subclass of `dict`.',
DeprecationWarning)
return dict(self)
@property
def gym_id(self):
return self.id
def serialize(self, previous):
params = {}
unsaved_keys = self._unsaved_values or set()
previous = previous or self._previous or {}
for k, v in self.items():
if k == 'id' or (isinstance(k, str) and k.startswith('_')):
continue
elif isinstance(v, APIResource):
continue
elif hasattr(v, 'serialize'):
params[k] = v.serialize(previous.get(k, None))
elif k in unsaved_keys:
params[k] = _compute_diff(v, previous.get(k, None))
return params
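serialize only sends values changed since the last refresh; a sketch (the id and Gist URL are illustrative)::

    obj = GymObject.construct_from({'id': 'eval_123', 'writeup': None}, None)
    obj.writeup = 'https://gist.github.com/user/abc123'
    obj.serialize(None)   # -> {'writeup': 'https://gist.github.com/user/abc123'}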
class APIResource(GymObject):
@classmethod
def retrieve(cls, id, api_key=None, **params):
instance = cls(id, api_key, **params)
instance.refresh()
return instance
def refresh(self):
self.refresh_from(self.request('get', self.instance_path()))
return self
@classmethod
def class_name(cls):
if cls == APIResource:
raise NotImplementedError(
'APIResource is an abstract class. You should perform '
'actions on its subclasses (e.g. Charge, Customer)')
return str(urllib.quote_plus(cls.__name__.lower()))
@classmethod
def class_path(cls):
cls_name = cls.class_name()
return "/v1/%ss" % (cls_name,)
def instance_path(self):
id = self.get('id')
if not id:
raise error.InvalidRequestError(
'Could not determine which URL to request: %s instance '
'has invalid ID: %r' % (type(self).__name__, id), 'id')
id = util.utf8(id)
base = self.class_path()
extn = urllib.quote_plus(id)
return "%s/%s" % (base, extn)
class ListObject(GymObject):
def list(self, **params):
return self.request('get', self['url'], params)
def all(self, **params):
warnings.warn("The `all` method is deprecated and will"
"be removed in future versions. Please use the "
"`list` method instead",
DeprecationWarning)
return self.list(**params)
def auto_paging_iter(self):
page = self
params = dict(self._retrieve_params)
while True:
item_id = None
for item in page:
item_id = item.get('id', None)
yield item
if not getattr(page, 'has_more', False) or item_id is None:
return
params['starting_after'] = item_id
page = self.list(**params)
def create(self, idempotency_key=None, **params):
headers = populate_headers(idempotency_key)
return self.request('post', self['url'], params, headers)
def retrieve(self, id, **params):
base = self.get('url')
id = util.utf8(id)
extn = urllib.quote_plus(id)
url = "%s/%s" % (base, extn)
return self.request('get', url, params)
def __iter__(self):
return getattr(self, 'data', []).__iter__()
# Classes of API operations
class ListableAPIResource(APIResource):
@classmethod
def all(cls, *args, **params):
warnings.warn("The `all` class method is deprecated and will"
"be removed in future versions. Please use the "
"`list` class method instead",
DeprecationWarning)
return cls.list(*args, **params)
@classmethod
def auto_paging_iter(cls, *args, **params):
return cls.list(*args, **params).auto_paging_iter()
@classmethod
def list(cls, api_key=None, idempotency_key=None, **params):
requestor = api_requestor.APIRequestor(api_key)
url = cls.class_path()
response, api_key = requestor.request('get', url, params)
return convert_to_gym_object(response, api_key)
class CreateableAPIResource(APIResource):
@classmethod
def create(cls, api_key=None, idempotency_key=None, **params):
requestor = api_requestor.APIRequestor(api_key)
url = cls.class_path()
headers = populate_headers(idempotency_key)
response, api_key = requestor.request('post', url, params, headers)
return convert_to_gym_object(response, api_key)
class UpdateableAPIResource(APIResource):
def save(self, idempotency_key=None):
updated_params = self.serialize(None)
headers = populate_headers(idempotency_key)
if updated_params:
self.refresh_from(self.request('post', self.instance_path(),
updated_params, headers))
else:
util.logger.debug("Trying to save already saved object %r", self)
return self
class DeletableAPIResource(APIResource):
def delete(self, **params):
self.refresh_from(self.request('delete', self.instance_path(), params))
return self
## Our resources
class FileUpload(ListableAPIResource):
@classmethod
def class_name(cls):
return 'file'
@classmethod
def create(cls, api_key=None, **params):
requestor = api_requestor.APIRequestor(
api_key, api_base=cls.api_base())
url = cls.class_path()
response, api_key = requestor.request(
'post', url, params=params)
return convert_to_gym_object(response, api_key)
def put(self, contents, encode='json'):
supplied_headers = {
"Content-Type": self.content_type
}
if encode == 'json':
contents = json.dumps(contents)
elif encode is None:
pass
else:
raise error.Error('Encode request for put must be "json" or None, not {}'.format(encode))
files = {'file': contents}
body, code, headers = api_requestor.http_client.request(
'post', self.post_url, post_data=self.post_fields, files=files, headers={})
if code != 204:
raise error.Error("Upload to S3 failed. If error persists, please contact us at gym@openai.com this message. S3 returned '{} -- {}'. Tried 'POST {}' with fields {}.".format(code, body, self.post_url, self.post_fields))
class Evaluation(CreateableAPIResource):
def web_url(self):
return "%s/evaluations/%s" % (gym.scoreboard.web_base, self.get('id'))

0
gym/scoreboard/client/tests/__init__.py Normal file
View File

32
gym/scoreboard/client/tests/helper.py Normal file
View File

@@ -0,0 +1,32 @@
import mock
import unittest
import uuid
def fake_id(prefix):
entropy = ''.join([a for a in str(uuid.uuid4()) if a.isalnum()])
return '{}_{}'.format(prefix, entropy)
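For example (the suffix comes from uuid4, so it varies from run to run)::

    fake_id('file')   # -> e.g. 'file_7f3a9c1b2d4e...'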
class APITestCase(unittest.TestCase):
def setUp(self):
super(APITestCase, self).setUp()
self.requestor_patcher = mock.patch('gym.scoreboard.client.api_requestor.APIRequestor')
requestor_class_mock = self.requestor_patcher.start()
self.requestor_mock = requestor_class_mock.return_value
def mock_response(self, res):
self.requestor_mock.request = mock.Mock(return_value=(res, 'reskey'))
class TestData(object):
@classmethod
def file_upload_response(cls):
return {
'id': fake_id('file'),
'object': 'file',
}
@classmethod
def evaluation_response(cls):
return {
'id': fake_id('eval'),
'object': 'evaluation',
}

16
gym/scoreboard/client/tests/test_evaluation.py Normal file
View File

@@ -0,0 +1,16 @@
from gym.scoreboard.client.tests import helper
from gym import scoreboard
class EvaluationTest(helper.APITestCase):
def test_create_evaluation(self):
self.mock_response(helper.TestData.evaluation_response())
evaluation = scoreboard.Evaluation.create()
assert isinstance(evaluation, scoreboard.Evaluation)
self.requestor_mock.request.assert_called_with(
'post',
'/v1/evaluations',
{},
None
)

15
gym/scoreboard/client/tests/test_file_upload.py Normal file
View File

@@ -0,0 +1,15 @@
from gym.scoreboard.client.tests import helper
from gym import scoreboard
class FileUploadTest(helper.APITestCase):
def test_create_file_upload(self):
self.mock_response(helper.TestData.file_upload_response())
file_upload = scoreboard.FileUpload.create()
assert isinstance(file_upload, scoreboard.FileUpload), 'File upload is: {!r}'.format(file_upload)
self.requestor_mock.request.assert_called_with(
'post',
'/v1/files',
params={},
)

14
gym/scoreboard/client/util.py Normal file
View File

@@ -0,0 +1,14 @@
import logging
import os
import sys
logger = logging.getLogger(__name__)
def utf8(value):
if sys.version_info < (3, 0) and isinstance(value, unicode):  # version check first: `unicode` is undefined on Python 3
return value.encode('utf-8')
else:
return value
def file_size(f):
return os.fstat(f.fileno()).st_size

123
gym/scoreboard/scoring.py Normal file
View File

@@ -0,0 +1,123 @@
"""This is the actual code we use to score people's solutions
server-side. The interfaces here are not yet stable, but we include
them so that people can reproduce our scoring calculations
independently.
We correspondingly do not currently import this module.
"""
import numpy as np
import requests
import gym
def score_from_remote(url):
result = requests.get(url)
parsed = result.json()
episode_lengths = parsed['episode_lengths']
episode_rewards = parsed['episode_rewards']
timestamps = parsed['timestamps']
env_id = parsed['env_id']
spec = gym.spec(env_id)
return score_from_merged(episode_lengths, episode_rewards, timestamps, spec.trials, spec.reward_threshold)
def score_from_merged(episode_lengths, episode_rewards, timestamps, trials, reward_threshold):
"""Method to calculate the score from merged monitor files.
"""
# Make sure everything is a float -- no pesky ints.
episode_rewards = np.array(episode_rewards, dtype='float64')
episode_t_value = timestep_t_value = mean = error = time_in_seconds = None
if len(timestamps) > 2:
# This is: time from the first *step* to the last *step*.
time_in_seconds = timestamps[-1] - timestamps[0]
if len(episode_rewards) >= trials:
means = running_mean(episode_rewards, trials)
if reward_threshold is not None:
# Compute t-value by finding the first index above the
# threshold. It comes out as a singleton tuple.
(indexes_above_threshold, ) = np.where(means > reward_threshold)
if len(indexes_above_threshold) > 0:
# Grab the first episode index that is above the threshold value
episode_t_value = indexes_above_threshold[0]
# Find timestep corresponding to this episode
cumulative_timesteps = np.cumsum(np.insert(episode_lengths, 0, 0))
# Convert that into timesteps
timestep_t_value = cumulative_timesteps[episode_t_value]
# Find the window with the best mean
best_idx = np.argmax(means)
best_rewards = episode_rewards[best_idx:best_idx+trials]
mean = np.mean(best_rewards)
error = np.std(best_rewards) / np.sqrt(trials - 1)  # standard error of the mean
return {
'episode_t_value': episode_t_value,
'timestep_t_value': timestep_t_value,
'mean': mean,
'error': error,
'number_episodes': len(episode_rewards),
'number_timesteps': sum(episode_lengths),
'time_in_seconds': time_in_seconds,
}
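A worked example of the scoring logic, with numbers chosen so the arithmetic is easy to check by hand::

    result = score_from_merged(
        episode_lengths=[10, 10, 10, 10, 10],
        episode_rewards=[0, 0, 100, 100, 100],
        timestamps=[0, 1, 2, 3, 4],
        trials=3,
        reward_threshold=50)
    # The 3-episode running means are [33.3, 66.7, 100.0], so the threshold
    # of 50 is first exceeded at episode index 1, which begins at timestep 10:
    assert result['episode_t_value'] == 1
    assert result['timestep_t_value'] == 10
    assert result['mean'] == 100.0   # best 3-episode window is the last one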
def running_mean(x, N):
x = np.array(x, dtype='float64')
cumsum = np.cumsum(np.insert(x, 0, 0))
return (cumsum[N:] - cumsum[:-N]) / N
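running_mean computes a length-N sliding average via cumulative sums; for instance::

    running_mean([0, 1, 2, 3], 2)   # -> array([0.5, 1.5, 2.5])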
def compute_graph_stats(episode_lengths, episode_rewards, timestamps, buckets):
"""Method to compute the aggregates for the graphs."""
# Not a dependency of OpenAI Gym generally.
import scipy.stats
num_episodes = len(episode_lengths)
episode_rewards = np.array(episode_rewards)
episode_lengths = np.array(episode_lengths)
# The index of the start of each episode
x_timestep = np.cumsum(np.insert(episode_lengths, 0, 0))[:-1]
assert len(x_timestep) == num_episodes
# Nothing to compute here
x_timestamp = timestamps
# The index of each episode
x_episode = range(num_episodes)
# Calculate the appropriate x/y statistics
x_timestep_y_reward = scipy.stats.binned_statistic(x_timestep, episode_rewards, 'median', buckets)
x_timestep_y_length = scipy.stats.binned_statistic(x_timestep, episode_lengths, 'median', buckets)
x_episode_y_reward = scipy.stats.binned_statistic(x_episode, episode_rewards, 'median', buckets)
x_episode_y_length = scipy.stats.binned_statistic(x_episode, episode_lengths, 'median', buckets)
x_timestamp_y_reward = scipy.stats.binned_statistic(x_timestamp, episode_rewards, 'median', buckets)
x_timestamp_y_length = scipy.stats.binned_statistic(x_timestamp, episode_lengths, 'median', buckets)
return {
'x_timestep_y_reward': graphable_binned_statistic(x_timestep_y_reward),
'x_timestep_y_length': graphable_binned_statistic(x_timestep_y_length),
'x_episode_y_reward': graphable_binned_statistic(x_episode_y_reward),
'x_episode_y_length': graphable_binned_statistic(x_episode_y_length),
'x_timestamp_y_reward': graphable_binned_statistic(x_timestamp_y_reward),
'x_timestamp_y_length': graphable_binned_statistic(x_timestamp_y_length),
}
def graphable_binned_statistic(binned):
x = running_mean(binned.bin_edges, 2)
y = binned.statistic
assert len(x) == len(y)
# Get rid of nasty NaNs
valid = np.logical_not(np.isnan(x)) & np.logical_not(np.isnan(y))
x = x[valid]
y = y[valid]
return {
'x': x,
'y': y,
}

5
gym/spaces/__init__.py Normal file
View File

@@ -0,0 +1,5 @@
from .box import Box
from .discrete import Discrete
from .tuple_space import Tuple
__all__ = ["Box", "Discrete", "Tuple"]

39
gym/spaces/box.py Normal file
View File

@@ -0,0 +1,39 @@
from gym import Space
import numpy as np
class Box(Space):
"""
A box in R^n.
I.e., each coordinate is bounded.
"""
def __init__(self, low, high, shape=None):
"""
Two kinds of valid input:
Box(-1.0, 1.0, (3,4)) # low and high are scalars, and shape is provided
Box(np.array([-1.0,-2.0]), np.array([2.0,4.0])) # low and high are arrays of the same shape
"""
if shape is None:
assert low.shape == high.shape
self.low = low
self.high = high
else:
assert np.isscalar(low) and np.isscalar(high)
self.low = low + np.zeros(shape)
self.high = high + np.zeros(shape)
def sample(self):
return np.random.uniform(low=self.low, high=self.high, size=self.low.shape)
def contains(self, x):
return x.shape == self.shape and (x >= self.low).all() and (x <= self.high).all()
def to_jsonable(self, sample_n):
return np.array(sample_n).tolist()
def from_jsonable(self, sample_n):
return [np.asarray(sample) for sample in sample_n]
@property
def shape(self):
return self.low.shape
def __repr__(self):
return "Box" + str(self.shape)
def __eq__(self, other):
return np.allclose(self.low, other.low) and np.allclose(self.high, other.high)
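A small illustration of Box sampling and membership::

    import numpy as np
    from gym.spaces import Box

    box = Box(-1.0, 1.0, shape=(2,))
    box.sample()                                    # e.g. array([ 0.31, -0.82])
    assert box.contains(np.array([0.5, -0.5]))
    assert not box.contains(np.array([2.0, 0.0]))   # outside the upper bound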

17
gym/spaces/discrete.py Normal file
View File

@@ -0,0 +1,17 @@
import numpy as np
from gym import Space
class Discrete(Space):
"""
{0,1,...,n-1}
"""
def __init__(self, n):
self.n = n
def sample(self):
return np.random.randint(self.n)
def contains(self, x):
return isinstance(x, int) and x >= 0 and x < self.n
def __repr__(self):
return "Discrete(%d)" % self.n
def __eq__(self, other):
return self.n == other.n
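For instance::

    from gym.spaces import Discrete

    space = Discrete(3)              # the set {0, 1, 2}
    assert space.contains(2)
    assert not space.contains(3)     # out of range
    assert not space.contains(1.5)   # not an int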

30
gym/spaces/tests/test_spaces.py Normal file
View File

@@ -0,0 +1,30 @@
import json # note: ujson fails this test due to float equality
import numpy as np
from nose2 import tools
from gym.spaces import Tuple, Box, Discrete
@tools.params(Discrete(3),
Tuple([Discrete(5), Discrete(10)]),
Tuple([Discrete(5), Box(np.array([0,0]),np.array([1,5]))]),
Tuple((Discrete(5), Discrete(2), Discrete(2)))
)
def test_roundtripping(space):
sample_1 = space.sample()
sample_2 = space.sample()
assert space.contains(sample_1)
assert space.contains(sample_2)
json_rep = space.to_jsonable([sample_1, sample_2])
json_roundtripped = json.loads(json.dumps(json_rep))
samples_after_roundtrip = space.from_jsonable(json_roundtripped)
sample_1_prime, sample_2_prime = samples_after_roundtrip
s1 = space.to_jsonable([sample_1])
s1p = space.to_jsonable([sample_1_prime])
s2 = space.to_jsonable([sample_2])
s2p = space.to_jsonable([sample_2_prime])
assert s1 == s1p, "Expected {} to equal {}".format(s1, s1p)
assert s2 == s2p, "Expected {} to equal {}".format(s2, s2p)

26
gym/spaces/tuple_space.py Normal file
View File

@@ -0,0 +1,26 @@
from gym import Space
class Tuple(Space):
"""
A tuple (i.e., product) of simpler spaces
"""
def __init__(self, spaces):
self.spaces = spaces
def sample(self):
return tuple([space.sample() for space in self.spaces])
def contains(self, x):
return isinstance(x, tuple) and len(x) == len(self.spaces) and all(
space.contains(part) for (space,part) in zip(self.spaces,x))
def __repr__(self):
return "Tuple(" + ", ". join([str(s) for s in self.spaces]) + ")"
def to_jsonable(self, sample_n):
# serialize as list-repr of tuple of vectors
return [space.to_jsonable([sample[i] for sample in sample_n]) \
for i, space in enumerate(self.spaces)]
def from_jsonable(self, sample_n):
return zip(*[space.from_jsonable(sample_n[i]) for i, space in enumerate(self.spaces)])
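A small illustration of composing spaces::

    from gym.spaces import Tuple, Box, Discrete

    space = Tuple([Discrete(2), Box(-1.0, 1.0, shape=(1,))])
    action = space.sample()          # e.g. (1, array([ 0.23]))
    assert space.contains(action)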

55
gym/utils.py Normal file
View File

@@ -0,0 +1,55 @@
"""A set of common utilities used within the environments. These are
not intended as API functions, and will not remain stable over time.
"""
color2num = dict(
gray=30,
red=31,
green=32,
yellow=33,
blue=34,
magenta=35,
cyan=36,
white=37,
crimson=38
)
def colorize(string, color, bold=False, highlight=False):
"""Return string surrounded by appropriate terminal color codes to
print colorized text. Valid colors: gray, red, green, yellow,
blue, magenta, cyan, white, crimson
"""
attr = []
num = color2num[color]
if highlight: num += 10
attr.append(unicode(num))
if bold: attr.append('1')
return '\x1b[%sm%s\x1b[0m' % (';'.join(attr), string)
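For example::

    colorize('danger', 'red', bold=True)   # -> '\x1b[31;1mdanger\x1b[0m'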
class EzPickle(object):
"""Objects that are pickled and unpickled via their constructor
arguments.
Example usage:
class Dog(Animal, EzPickle):
def __init__(self, furcolor, tailkind="bushy"):
Animal.__init__(self)
EzPickle.__init__(self, furcolor, tailkind)
...
When this object is unpickled, a new Dog will be constructed by passing the provided
furcolor and tailkind into the constructor. However, philosophers are still not sure
whether it is still the same dog.
This is generally needed only for environments which wrap C/C++ code, such as MuJoCo
and Atari.
"""
def __init__(self, *args, **kwargs):
self._ezpickle_args = args
self._ezpickle_kwargs = kwargs
def __getstate__(self):
return {"_ezpickle_args" : self._ezpickle_args, "_ezpickle_kwargs": self._ezpickle_kwargs}
def __setstate__(self, d):
out = type(self)(*d["_ezpickle_args"], **d["_ezpickle_kwargs"])
self.__dict__.update(out.__dict__)
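A minimal sketch of the intended subclassing pattern (the Dog class is hypothetical)::

    import pickle

    class Dog(EzPickle):
        def __init__(self, furcolor, tailkind='bushy'):
            EzPickle.__init__(self, furcolor, tailkind)
            self.furcolor = furcolor
            self.tailkind = tailkind

    dog = pickle.loads(pickle.dumps(Dog('brown')))
    assert dog.furcolor == 'brown'   # reconstructed via the constructor arguments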

1
gym/version.py Normal file
View File

@@ -0,0 +1 @@
VERSION = '0.0.1'

5
requirements.txt Normal file
View File

@@ -0,0 +1,5 @@
# Testing
nose2
mock
-e .[all]

34
setup.py Normal file
View File

@@ -0,0 +1,34 @@
from setuptools import setup, find_packages
import sys, os.path
# Don't import gym module here, since deps may not be installed
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'gym'))
from version import VERSION
setup(name='gym',
version=VERSION,
description='The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.',
url='https://github.com/openai/gym',
author='OpenAI',
author_email='gym@openai.com',
license='',
packages=[package for package in find_packages()
if package.startswith('gym')],
zip_safe=False,
install_requires=[
'numpy>=1.10.4', 'requests', 'six'
],
extras_require={
'all': ['atari_py>=0.0.14', 'Pillow', 'pyglet',
'pachi-py>=0.0.16',
'mujoco_py>=0.4.0', 'imageio'],
# Environment-specific dependencies. Keep these in sync with
# 'all'!
'atari': ['atari_py>=0.0.14', 'Pillow', 'pyglet'],
'board_game' : ['pachi-py>=0.0.16'],
'classic_control': ['pyglet'],
'mujoco': ['mujoco_py>=0.4.0', 'imageio'],
},
tests_require=['nose2', 'mock'],
)
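With extras declared this way, users can install only the dependency groups they need; for example (illustrative)::

    pip install gym            # core dependencies only
    pip install gym[atari]     # core plus the Atari environments
    pip install -e .[all]      # everything, as requirements.txt does for development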