mirror of
https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-07-31 05:44:31 +00:00
Initial release. Hello world :).
31
.gitignore
vendored
Normal file
@@ -0,0 +1,31 @@
*.swp
*.pyc
*.py~
.DS_Store

# Setuptools distribution and build folders.
/dist/
/build

# Virtualenv
/env

# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info

*.sublime-project
*.sublime-workspace

logs/

.ipynb_checkpoints
ghostdriver.log

junk
MUJOCO_LOG.txt
mujoco-bundle


rllab_mujoco

tutorial/*.html
32
.travis.yml
Normal file
@@ -0,0 +1,32 @@
|
||||
dist: trusty
|
||||
sudo: required
|
||||
cache:
|
||||
apt: true
|
||||
pip: false
|
||||
language: python
|
||||
python:
|
||||
- "2.7"
|
||||
# - "3.2"
|
||||
|
||||
# Install numpy and scipy so we don't need to compile them
|
||||
addons:
|
||||
apt:
|
||||
packages:
|
||||
- python-numpy
|
||||
- python-matplotlib
|
||||
- python-tk
|
||||
|
||||
before_install:
|
||||
- Xvfb :12 -screen 0 800x600x24 +extension RANDR &
|
||||
- mkdir -p ~/.mujoco
|
||||
- curl https://openai-public.s3-us-west-2.amazonaws.com/mujoco/$MUJOCO_KEY_BUNDLE.tar.gz | tar xz -C ~/.mujoco
|
||||
env:
|
||||
- DISPLAY=:12
|
||||
|
||||
install: pip install -r requirements.txt
|
||||
script: nose2
|
||||
|
||||
notifications:
|
||||
slack:
|
||||
secure: h/Mxm8K+avH/2W0818zCHmLloRPMFN4NJL01+VShvAkH80/acfjeq/+mMdWXXPL/oOB6kSHDk+GDhwR6+s03ZcPMn5INTFvFYqUc6UWmT+NXtOPxGTN0xda6MdYUkWQUKaMyjFrweZQOMOASFBIzPOq4XeVbM5aB8s4EJhnfAcYZhp/idwKbToVihN4KZgxlvZIFc8iEp1o9uSl5qrsaeYYYXRkb6mauacAwOo4/Chu+cOnoLUOnvhBFE3rV3doDNrbnoalO8XiExtgx5CIAYWrlMni7r2Q+LlzgwdyTH19ZtybPxJTZIIWSBQ2UtcoYdIEDcc36GcUwz1VUGg32mLJJnY2xw80CWR4ixFPpLwwP5Y99WTn8v094B4nmFTWOwNWXp3EkqtTN9XcJoRBqXB5ArucIPqrx57dOCljSKx22gL6WaF2p3stSAxIGFektGyGnisaELrFZG1C63aHoUPicj3gUlijmAoUmYaDRf6P1wnpXqBpKDAWWhAMSatvx1ekmEJgR7OQklQnnfjx9kENDUygNUWS4IQwN2qYieuzHFL3of7/30mTM43+Vt/vWN8GI7j01BXu6FNGGloHxjH1pt3bLP/+uj5BJsT2HWF+Z8XR4VE6cyVuKsQAFgCXwOkoDHALbcwsspONDIt/9ixkesgh1oFt4CzU3UuU5wYs=
|
||||
on_success: change
|
13
CODE_OF_CONDUCT.rst
Normal file
@@ -0,0 +1,13 @@
OpenAI Gym is dedicated to providing a harassment-free experience for
everyone, regardless of gender, gender identity and expression, sexual
orientation, disability, physical appearance, body size, age, race, or
religion. We do not tolerate harassment of participants in any form.

This code of conduct applies to all OpenAI Gym spaces (including Gist
comments) both online and off. Anyone who violates this code of
conduct may be sanctioned or expelled from these spaces at the
discretion of the OpenAI team.

We may add additional rules over time, which will be made clearly
available to participants. Participants are responsible for knowing
and abiding by these rules.
35
Dockerfile
Normal file
@@ -0,0 +1,35 @@
# A Dockerfile that sets up a full Gym install
FROM ubuntu:14.04

RUN apt-get update \
    && apt-get install -y xorg-dev \
    libgl1-mesa-dev \
    xvfb \
    libxinerama1 \
    libxcursor1 \
    libglu1-mesa \
    libav-tools \
    python-numpy \
    python-scipy \
    python-pyglet \
    python-setuptools \
    libpq-dev \
    libjpeg-dev \
    curl \
    cmake \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && easy_install pip

WORKDIR /usr/local/gym
RUN mkdir gym && touch gym/__init__.py
COPY ./gym/version.py ./gym
COPY ./requirements.txt .
COPY ./setup.py .
RUN pip install -r requirements.txt

# Finally, upload our actual code!
COPY . /usr/local/gym

WORKDIR /root
ENTRYPOINT ["/usr/local/gym/bin/docker_entrypoint"]
7
Makefile
Normal file
@@ -0,0 +1,7 @@
.PHONY: install test

install:
	pip install -r requirements.txt

test:
	nose2
208
README.rst
Normal file
@@ -0,0 +1,208 @@
|
||||
gym
|
||||
******
|
||||
|
||||
**OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.** This is the ``gym`` open-source library, which gives you access to an ever-growing variety of environments.
|
||||
|
||||
``gym`` makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages.
|
||||
|
||||
If you're not sure where to start, we recommend beginning with the
|
||||
`docs <https://gym.openai.com/docs>`_ on our site.
|
||||
|
||||
.. contents:: **Contents of this document**
|
||||
:depth: 2
|
||||
|
||||
Basics
|
||||
======
|
||||
|
||||
There are two basic concepts in reinforcement learning: the
|
||||
environment (namely, the outside world) and the agent (namely, the
|
||||
algorithm you are writing). The agent sends `actions` to the
|
||||
environment, and the environment replies with `observations` and
|
||||
`rewards` (that is, a score).
|
||||
|
||||
The core `gym` interface is `Env
|
||||
<https://github.com/openai/gym/blob/master/gym/core.py>`_, which is
|
||||
the unified environment interface. There is no interface for agents;
|
||||
that part is left to you. The following are the ``Env`` methods you
|
||||
should know:
|
||||
|
||||
- `reset(self)`: Reset the environment's state. Returns `observation`.
|
||||
- `step(self, action)`: Step the environment by one timestep. Returns `observation`, `reward`, `done`, `info`.
|
||||
- `render(self, mode='human', close=False)`: Render one frame of the environment. The default mode will do something human friendly, such as pop up a window. Passing the `close` flag signals the renderer to close any such windows.
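
As a rough illustration of the interface above, the sketch below drives one episode against a stand-in environment. ``CoinFlipEnv`` and ``run_episode`` are hypothetical names invented for this example and are not part of ``gym``; only the ``reset``/``step`` call shapes follow the text, with ``step`` assumed to return ``(observation, reward, done, info)`` as documented in ``gym/core.py``.

```python
import random


class CoinFlipEnv(object):
    """Hypothetical stand-in environment (not part of gym): guess the coin shown."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward 1.0 when the action matches the coin that was observed.
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)
        done = self.t >= self.episode_length
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Drive one episode; `policy` maps an observation to an action."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


# A policy that echoes the observation always guesses correctly.
print(run_episode(CoinFlipEnv(), lambda ob: ob))
```

With a real environment, the same loop should apply after replacing ``CoinFlipEnv()`` with the result of ``gym.make(...)``.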
|
||||
|
||||
Installation
|
||||
============
|
||||
|
||||
You can perform a minimal install of ``gym`` with:
|
||||
|
||||
.. code:: shell
|
||||
|
||||
git clone git@github.com:openai/gym
|
||||
cd gym
|
||||
pip install -e .
|
||||
|
||||
You'll be able to run a few environments right away:
|
||||
|
||||
- `algorithmic <https://gym.openai.com/envs#algorithmic>`_
|
||||
- `toy_text <https://gym.openai.com/envs#toy_text>`_
|
||||
- `classic_control <https://gym.openai.com/envs#classic_control>`_ (you'll need ``pyglet`` to render though)
|
||||
|
||||
We recommend playing with those environments at first, and then later
|
||||
installing the dependencies for the remaining environments.
|
||||
|
||||
Installing everything
|
||||
---------------------
|
||||
|
||||
Once you're ready to install everything, run ``pip install -e .[all]``.
|
||||
|
||||
MuJoCo has a proprietary dependency we can't set up for you. Follow
|
||||
the
|
||||
`instructions <https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
|
||||
in the ``mujoco-py`` package for help.
|
||||
|
||||
For the install to succeed, you'll need to have some system packages
|
||||
installed. We'll build out the list here over time; please let us know
|
||||
what you end up installing on your platform.
|
||||
|
||||
On Ubuntu 14.04:
|
||||
|
||||
.. code:: shell
|
||||
|
||||
apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl
|
||||
|
||||
Supported systems
|
||||
-----------------
|
||||
|
||||
We currently support Python 2.7 on Linux and OSX.
|
||||
|
||||
We will expand support to Python 3 and Windows based on demand. We
|
||||
will also soon ship a Docker container exposing OpenAI Gym as an API
|
||||
callable from any platform.
|
||||
|
||||
Pip version
|
||||
-----------
|
||||
|
||||
To run ``pip install -e .[all]``, you'll need a semi-recent pip.
|
||||
Please make sure your pip is at least at version ``1.5.0``. You can
|
||||
upgrade using the following: ``pip install --ignore-installed
|
||||
pip``. Alternatively, you can open `setup.py
|
||||
<https://github.com/openai/gym/blob/master/setup.py>`_ and
|
||||
install the dependencies by hand.
|
||||
|
||||
Installing dependencies for specific environments
|
||||
-------------------------------------------------
|
||||
|
||||
If you'd like to install the dependencies for only specific
|
||||
environments, see `setup.py
|
||||
<https://github.com/openai/gym/blob/master/setup.py>`_. We
|
||||
maintain the lists of dependencies on a per-environment group basis.
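
To give a feel for what "per-environment group" could mean in practice, here is a small sketch of how such lists might be organised and combined into an ``all`` group. The dictionary below is hypothetical: the group names follow the ``pip install -e .[...]`` extras mentioned in this README, but the package lists are illustrative guesses, not the contents of the real ``setup.py``.

```python
# Hypothetical per-group dependency lists; consult setup.py for the real ones.
extras = {
    'atari': ['atari-py'],
    'board_game': ['pachi-py'],
    'classic_control': ['pyglet'],
    'mujoco': ['mujoco-py'],
}

# One way an "all" group could be derived: the de-duplicated union of the others.
extras['all'] = sorted(set(dep for group in extras.values() for dep in group))

print(extras['all'])
```

A dictionary of this shape is what setuptools accepts as ``extras_require``, which is the mechanism behind ``pip install -e .[atari]``.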
|
||||
|
||||
Environments
|
||||
============
|
||||
|
||||
The code for each environment group is housed in its own subdirectory of
|
||||
`gym/envs
|
||||
<https://github.com/openai/gym/blob/master/gym/envs>`_. The
|
||||
specification of each task is in `gym/envs/__init__.py
|
||||
<https://github.com/openai/gym/blob/master/gym/envs/__init__.py>`_. It's
|
||||
worth browsing through both.
|
||||
|
||||
Algorithmic
|
||||
-----------
|
||||
|
||||
These are a variety of algorithmic tasks, such as learning to copy a
|
||||
sequence.
|
||||
|
||||
.. code:: python
|
||||
|
||||
import gym
|
||||
env = gym.make('Copy-v0')
|
||||
env.reset()
|
||||
env.render()
|
||||
|
||||
Atari
|
||||
-----
|
||||
|
||||
The Atari environments are a variety of Atari video games. If you didn't do the full install, you can install dependencies via ``pip install -e .[atari]`` and then get started as follows:
|
||||
|
||||
.. code:: python
|
||||
|
||||
import gym
|
||||
env = gym.make('SpaceInvaders-v0')
|
||||
env.reset()
|
||||
env.render()
|
||||
|
||||
This will install ``atari-py``, which automatically compiles the `Arcade Learning Environment <http://www.arcadelearningenvironment.org/>`_. This can take quite a while (a few minutes on a decent laptop), so just be prepared.
|
||||
|
||||
Board games
|
||||
-----------
|
||||
|
||||
The board game environments are a variety of board games. If you didn't do the full install, you can install dependencies via ``pip install -e .[board_game]`` and then get started as follows:
|
||||
|
||||
.. code:: python
|
||||
|
||||
import gym
|
||||
env = gym.make('Go9x9-v0')
|
||||
env.reset()
|
||||
env.render()
|
||||
|
||||
Classic control
|
||||
---------------
|
||||
|
||||
These are a variety of classic control tasks, which would appear in a typical reinforcement learning textbook. If you didn't do the full install, you will need to run ``pip install -e .[classic_control]`` to enable rendering. You can get started with them via:
|
||||
|
||||
.. code:: python
|
||||
|
||||
import gym
|
||||
env = gym.make('CartPole-v0')
|
||||
env.reset()
|
||||
env.render()
|
||||
|
||||
MuJoCo
|
||||
------
|
||||
|
||||
`MuJoCo <http://www.mujoco.org/>`_ is a physics engine which can do
|
||||
very detailed efficient simulations with contacts. It's not
|
||||
open-source, so you'll have to follow the instructions in `mujoco-py
|
||||
<https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key>`_
|
||||
to set it up. You'll have to also run ``pip install -e .[mujoco]`` if you didn't do the full install.
|
||||
|
||||
.. code:: python
|
||||
|
||||
import gym
|
||||
env = gym.make('Humanoid-v0')
|
||||
env.reset()
|
||||
env.render()
|
||||
|
||||
Toy text
|
||||
--------
|
||||
|
||||
These are toy environments that are text-based. There's no extra dependency to install, so to get started, you can just do:
|
||||
|
||||
.. code:: python
|
||||
|
||||
import gym
|
||||
env = gym.make('FrozenLake-v0')
|
||||
env.reset()
|
||||
env.render()
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
See the ``examples`` directory.
|
||||
|
||||
- Run `examples/agents/random_agent.py <https://github.com/openai/gym/blob/master/examples/agents/random_agent.py>`_ to run a simple random agent and upload the results to the scoreboard.
|
||||
- Run `examples/agents/cem.py <https://github.com/openai/gym/blob/master/examples/agents/cem.py>`_ to run an actual learning agent (using the cross-entropy method) and upload the results to the scoreboard.
|
||||
- Run `examples/scripts/list_envs <https://github.com/openai/gym/blob/master/examples/scripts/list_envs>`_ to generate a list of all environments. (You can also just `browse <https://gym.openai.com/docs>`_ the list on our site.)
|
||||
- Run `examples/scripts/upload <https://github.com/openai/gym/blob/master/examples/scripts/upload>`_ to upload the recorded output from ``random_agent.py`` or ``cem.py``. Make sure to obtain an `API key <https://gym.openai.com/settings/profile>`_.
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
We are using `nose2 <https://github.com/nose-devs/nose2>`_ for tests. You can run them via
|
||||
|
||||
.. code:: shell
|
||||
|
||||
nose2
|
||||
|
||||
You can also run tests in a specific directory by using the ``-s`` option, or by passing in the specific name of the test. See the `nose2 docs <http://nose2.readthedocs.org/en/latest/usage.html#naming-tests>`_ for more details.
|
12
bin/docker_entrypoint
Executable file
@@ -0,0 +1,12 @@
#!/bin/sh

# This script is the entrypoint for our Docker image.

set -e

# Set up display; otherwise rendering will cause segfaults
rm -f /tmp/.X12-lock
Xvfb :12 -screen 0 800x600x24 +extension RANDR 2>/dev/null &
export DISPLAY=:12

exec "$@"
19
examples/agents/_policies.py
Normal file
@@ -0,0 +1,19 @@
# Support code for cem.py

class BinaryActionLinearPolicy(object):
    def __init__(self, theta):
        self.w = theta[:-1]
        self.b = theta[-1]
    def act(self, ob):
        y = ob.dot(self.w) + self.b
        a = int(y < 0)
        return a

class ContinuousActionLinearPolicy(object):
    def __init__(self, theta, n_in, n_out):
        assert len(theta) == (n_in + 1) * n_out
        self.W = theta[0 : n_in * n_out].reshape(n_in, n_out)
        self.b = theta[n_in * n_out : None].reshape(1, n_out)
    def act(self, ob):
        a = ob.dot(self.W) + self.b
        return a
|
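The two policies in ``_policies.py`` both unpack a flat parameter vector ``theta`` into weights and a bias. The following is a minimal pure-Python sketch of that layout, written without numpy so it can be run anywhere. The function names ``binary_action`` and ``continuous_action`` are hypothetical and are not part of the repository, and the continuous variant returns a flat list rather than the ``(1, n_out)`` array the original produces.

```python
def binary_action(theta, ob):
    """Threshold a linear score: the last entry of theta is the bias."""
    w, b = theta[:-1], theta[-1]
    y = sum(o * wi for o, wi in zip(ob, w)) + b
    # Same convention as BinaryActionLinearPolicy: action 1 when the score is negative.
    return int(y < 0)


def continuous_action(theta, ob, n_in, n_out):
    """Affine map: the first n_in * n_out entries form W (row-major), the rest form b."""
    assert len(theta) == (n_in + 1) * n_out
    W = [theta[i * n_out:(i + 1) * n_out] for i in range(n_in)]
    b = theta[n_in * n_out:]
    return [sum(ob[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]


print(binary_action([1.0, -2.0, 0.5], [1.0, 1.0]))
print(continuous_action([1.0, 0.0, 0.0, 1.0, 0.5, -0.5], [2.0, 3.0], 2, 2))
```

This layout explains why ``cem.py`` sizes its initial mean as ``env.observation_space.shape[0]+1``: one weight per observation dimension plus the bias.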
92
examples/agents/cem.py
Normal file
@@ -0,0 +1,92 @@
|
||||
import gym
|
||||
import logging
|
||||
import numpy as np
|
||||
import json, sys, cPickle, os
|
||||
from os import path
|
||||
from _policies import BinaryActionLinearPolicy # Different file so it can be unpickled
|
||||
import argparse
|
||||
|
||||
def cem(f, th_mean, batch_size, n_iter, elite_frac, initial_std=1.0):
|
||||
"""
|
||||
Generic implementation of the cross-entropy method for maximizing a black-box function
|
||||
|
||||
f: a function mapping from vector -> scalar
|
||||
th_mean: initial mean over input distribution
|
||||
batch_size: number of samples of theta to evaluate per batch
|
||||
n_iter: number of batches
|
||||
elite_frac: each batch, select this fraction of the top-performing samples
|
||||
initial_std: initial standard deviation over parameter vectors
|
||||
"""
|
||||
n_elite = int(np.round(batch_size*elite_frac))
|
||||
th_std = np.ones_like(th_mean) * initial_std
|
||||
|
||||
for _ in xrange(n_iter):
|
||||
ths = np.array([th_mean + dth for dth in th_std[None,:]*np.random.randn(batch_size, th_mean.size)])
|
||||
ys = np.array([f(th) for th in ths])
|
||||
elite_inds = ys.argsort()[::-1][:n_elite]
|
||||
elite_ths = ths[elite_inds]
|
||||
th_mean = elite_ths.mean(axis=0)
|
||||
th_std = elite_ths.std(axis=0)
|
||||
yield {'ys' : ys, 'theta_mean' : th_mean, 'y_mean' : ys.mean()}
|
||||
|
||||
def do_rollout(agent, env, num_steps, render=False):
|
||||
total_rew = 0
|
||||
ob = env.reset()
|
||||
for t in xrange(num_steps):
|
||||
a = agent.act(ob)
|
||||
(ob, reward, done, _info) = env.step(a)
|
||||
total_rew += reward
|
||||
if render and t%3==0: env.render()
|
||||
if done: break
|
||||
return total_rew, t+1
|
||||
|
||||
if __name__ == '__main__':
|
||||
logger = logging.getLogger()
|
||||
logger.setLevel(logging.INFO)
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('--display', action='store_true')
|
||||
args = parser.parse_args()
|
||||
|
||||
np.random.seed(0)
|
||||
env = gym.make('CartPole-v0')
|
||||
params = dict(n_iter=10, batch_size=25, elite_frac = 0.2)
|
||||
num_steps = 200
|
||||
|
||||
# You provide the directory to write to (can be an existing
|
||||
# directory, but can't contain previous monitor results). You can
|
||||
# also dump to a tempdir if you'd like: tempfile.mkdtemp().
|
||||
outdir = '/tmp/cem-agent-results'
|
||||
env.monitor.start(outdir, force=True)
|
||||
|
||||
# Prepare snapshotting
|
||||
# ----------------------------------------
|
||||
def writefile(fname, s):
|
||||
with open(path.join(outdir, fname), 'w') as fh: fh.write(s)
|
||||
info = {}
|
||||
info['params'] = params
|
||||
info['argv'] = sys.argv
|
||||
info['env_id'] = env.spec.id
|
||||
# ------------------------------------------
|
||||
|
||||
def noisy_evaluation(theta):
|
||||
agent = BinaryActionLinearPolicy(theta)
|
||||
rew, T = do_rollout(agent, env, num_steps)
|
||||
return rew
|
||||
|
||||
# Train the agent, and snapshot each stage
|
||||
for (i, iterdata) in enumerate(
|
||||
cem(noisy_evaluation, np.zeros(env.observation_space.shape[0]+1), **params)):
|
||||
print 'Iteration %2i. Episode mean reward: %7.3f'%(i, iterdata['y_mean'])
|
||||
agent = BinaryActionLinearPolicy(iterdata['theta_mean'])
|
||||
if args.display: do_rollout(agent, env, 200, render=True)
|
||||
writefile('agent-%.4i.pkl'%i, cPickle.dumps(agent, -1))
|
||||
|
||||
# Write out the env at the end so we store the parameters of this
|
||||
# environment.
|
||||
writefile('info.json', json.dumps(info))
|
||||
|
||||
env.monitor.close()
|
||||
|
||||
logger.info("Successfully ran the CEM agent. Now trying to upload results to the scoreboard. If it breaks, you can always just try re-uploading the same results.")
|
||||
gym.upload(outdir, algorithm_id='cem')
|
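For readers who want to try the cross-entropy method without gym or numpy, here is a minimal standard-library sketch of the same loop as ``cem()`` above: sample parameter vectors around a mean, keep the top ``elite_frac`` by score, and refit the mean and standard deviation to those elites. The function name ``cem_maximize``, the ``seed`` argument, and the quadratic test objective are assumptions made for this illustration; unlike the original generator, it simply returns the final mean.

```python
import random
import statistics


def cem_maximize(f, mean, batch_size=50, n_iter=30, elite_frac=0.2,
                 initial_std=1.0, seed=0):
    """Maximize f over real vectors with a diagonal-Gaussian cross-entropy method."""
    rng = random.Random(seed)
    n_elite = int(round(batch_size * elite_frac))
    std = [initial_std] * len(mean)
    for _ in range(n_iter):
        # Sample a batch of candidate parameter vectors around the current mean.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(batch_size)]
        # Keep the best-scoring fraction of the batch.
        samples.sort(key=f, reverse=True)
        elite = samples[:n_elite]
        # Refit the sampling distribution to the elites, one dimension at a time.
        columns = list(zip(*elite))
        mean = [statistics.mean(col) for col in columns]
        std = [statistics.pstdev(col) for col in columns]
    return mean


def objective(th):
    # Hypothetical objective with its maximum at (1.0, -0.5).
    return -((th[0] - 1.0) ** 2 + (th[1] + 0.5) ** 2)


best = cem_maximize(objective, [0.0, 0.0])
print([round(x, 3) for x in best])
```

``statistics.pstdev`` is used because ``numpy``'s ``std`` in the original also computes the population standard deviation by default.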
50
examples/agents/random_agent.py
Normal file
@@ -0,0 +1,50 @@
|
||||
import logging
|
||||
import os
|
||||
|
||||
import gym
|
||||
|
||||
# The world's simplest agent!
|
||||
class RandomAgent(object):
|
||||
def __init__(self, action_space):
|
||||
self.action_space = action_space
|
||||
|
||||
def act(self, observation, reward, done):
|
||||
return self.action_space.sample()
|
||||
|
||||
if __name__ == '__main__':
|
||||
# You can optionally set up the logger. Also fine to set the level
|
||||
# to logging.DEBUG or logging.WARN if you want to change the
|
||||
# amount of output.
|
||||
logger = logging.getLogger()
|
||||
logger.setLevel(logging.INFO)
|
||||
|
||||
env = gym.make('CartPole-v0')
|
||||
agent = RandomAgent(env.action_space)
|
||||
|
||||
# You provide the directory to write to (can be an existing
|
||||
# directory, but can't contain previous monitor results). You can
|
||||
# also dump to a tempdir if you'd like: tempfile.mkdtemp().
|
||||
outdir = '/tmp/random-agent-results'
|
||||
env.monitor.start(outdir, force=True)
|
||||
|
||||
episode_count = 200
|
||||
max_steps = 100
|
||||
reward = 0
|
||||
done = False
|
||||
|
||||
for i in xrange(episode_count):
|
||||
ob = env.reset()
|
||||
|
||||
for j in xrange(max_steps):
|
||||
action = agent.act(ob, reward, done)
|
||||
ob, reward, done, _ = env.step(action)
|
||||
if done:
|
||||
break
|
||||
|
||||
# Dump result info to disk
|
||||
env.monitor.close()
|
||||
|
||||
# Upload to the scoreboard. We could also do this from another
|
||||
# process if we wanted.
|
||||
logger.info("Successfully ran RandomAgent. Now trying to upload results to the scoreboard. If it breaks, you can always just try re-uploading the same results.")
|
||||
gym.upload(outdir, algorithm_id='random')
|
44
examples/agents/tabular_q_agent.py
Normal file
@@ -0,0 +1,44 @@
from collections import defaultdict

import numpy as np

# Assumption: the original file has no imports; Discrete is taken from
# gym/spaces/discrete.py.
from gym.spaces import discrete


# Assumption: the original never defines or imports UnsupportedSpace, so a
# minimal local exception is declared here.
class UnsupportedSpace(Exception):
    pass


class TabularQAgent(object):
    """
    Agent implementing tabular Q-learning.
    """

    def __init__(self, observation_space, action_space, **userconfig):
        if not isinstance(observation_space, discrete.Discrete):
            raise UnsupportedSpace('Observation space {} incompatible with {}. (Only supports Discrete observation spaces.)'.format(observation_space, self))
        if not isinstance(action_space, discrete.Discrete):
            raise UnsupportedSpace('Action space {} incompatible with {}. (Only supports Discrete action spaces.)'.format(action_space, self))
        self.observation_space = observation_space
        self.action_space = action_space
        self.action_n = action_space.n
        self.config = {
            "init_mean" : 0.0,      # Initialize Q values with this mean
            "init_std" : 0.0,       # Initialize Q values with this standard deviation
            "learning_rate" : 0.1,
            "eps": 0.05,            # Epsilon in epsilon greedy policies
            "discount": 0.95,
            "n_iter": 10000}        # Number of iterations
        self.config.update(userconfig)
        self.q = defaultdict(lambda: self.config["init_std"] * np.random.randn(self.action_n) + self.config["init_mean"])

    def act(self, observation, eps=None):
        if eps is None:
            eps = self.config["eps"]
        # epsilon greedy.
        action = np.argmax(self.q[observation.item()]) if np.random.random() > eps else self.action_space.sample()
        return action

    def learn(self, env):
        config = self.config
        obs = env.reset()
        q = self.q
        for t in xrange(config["n_iter"]):
            # act() returns a single action, so there is nothing to unpack.
            action = self.act(obs)
            obs2, reward, done, _ = env.step(action)
            future = 0.0
            if not done:
                future = np.max(q[obs2.item()])
            # Move Q(obs, action) toward reward + discount * max_a Q(obs2, a).
            q[obs.item()][action] -= \
                self.config["learning_rate"] * (q[obs.item()][action] - reward - config["discount"] * future)

            obs = obs2
|
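The update at the heart of ``learn()`` can be tried in isolation. Below is a minimal sketch of that single step using plain lists instead of numpy arrays. The helper name ``q_update`` and the two-action table are assumptions for illustration; the default ``learning_rate`` and ``discount`` mirror the agent's config above.

```python
from collections import defaultdict


def q_update(q, obs, action, reward, next_obs, done,
             learning_rate=0.1, discount=0.95):
    """Move Q(obs, action) toward reward + discount * max_a Q(next_obs, a)."""
    # No bootstrapping from the next state once the episode has ended.
    future = 0.0 if done else max(q[next_obs])
    q[obs][action] -= learning_rate * (q[obs][action] - reward - discount * future)


# Two actions per state, all Q values starting at zero.
q = defaultdict(lambda: [0.0, 0.0])

# A terminal transition with reward 1.0 raises Q(0, 1) by learning_rate * 1.0.
q_update(q, obs=0, action=1, reward=1.0, next_obs=1, done=True)
print(q[0])

# A non-terminal transition into state 0 bootstraps from max(Q(0, .)).
q_update(q, obs=2, action=0, reward=0.0, next_obs=0, done=False)
print(q[2])
```

The second call shows how value propagates backwards: state 2 gains a small positive estimate only because state 0 already has one.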
5
examples/scripts/list_envs
Executable file
@@ -0,0 +1,5 @@
#!/usr/bin/env python
from gym import envs
envids = [spec.id for spec in envs.registry.all()]
for envid in sorted(envids):
    print(envid)
35
examples/scripts/play_go
Executable file
@@ -0,0 +1,35 @@
|
||||
#!/usr/bin/env python
|
||||
import argparse
|
||||
import pachi_py
|
||||
import gym
|
||||
from gym import spaces, envs
|
||||
from gym.envs.board_game import go
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument('--raw_actions', action='store_true')
|
||||
args = parser.parse_args()
|
||||
|
||||
env = envs.make('Go9x9-v0')
|
||||
env.reset()
|
||||
while True:
|
||||
s = env._state
|
||||
env._render()
|
||||
|
||||
colorstr = pachi_py.color_to_str(s.color)
|
||||
if args.raw_actions:
|
||||
a = int(raw_input('{} (raw)> '.format(colorstr)))
|
||||
else:
|
||||
coordstr = raw_input('{}> '.format(colorstr))
|
||||
a = go.str_to_action(s.board, coordstr)
|
||||
|
||||
_, r, done, _ = env.step(a)
|
||||
if done:
|
||||
break
|
||||
|
||||
print
|
||||
print 'You win!' if r > 0 else 'Opponent wins!'
|
||||
print 'Final score:', env._state.board.official_score
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
69
examples/scripts/sim_env
Executable file
@@ -0,0 +1,69 @@
|
||||
#!/usr/bin/env python
|
||||
import gym
|
||||
from gym import spaces, envs
|
||||
import argparse
|
||||
import numpy as np
|
||||
import itertools
|
||||
import time
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("env")
|
||||
parser.add_argument("--mode", choices=["noop", "random", "static", "human"],
|
||||
default="random")
|
||||
parser.add_argument("--max_steps", type=int, default=0)
|
||||
parser.add_argument("--fps",type=float)
|
||||
parser.add_argument("--once", action="store_true")
|
||||
parser.add_argument("--ignore_done", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
env = envs.make(args.env)
|
||||
ac_space = env.action_space
|
||||
|
||||
fps = args.fps or env.metadata.get('video.frames_per_second') or 100
|
||||
if args.max_steps == 0: args.max_steps = env.spec.timestep_limit
|
||||
|
||||
if args.mode == "human":
|
||||
if isinstance(ac_space, spaces.Discrete):
|
||||
print("Press keys 0-{} to choose the agent's actions".format(ac_space.n-1))
|
||||
import cv2
|
||||
else:
|
||||
raise ValueError("Can only use human on discrete action space. Got {}".format(type(ac_space)))
|
||||
|
||||
while True:
|
||||
env.reset()
|
||||
print("Starting a new trajectory")
|
||||
for t in xrange(args.max_steps) if args.max_steps else itertools.count():
|
||||
done = False
|
||||
if args.mode == "noop":
|
||||
if isinstance(ac_space, spaces.Box):
|
||||
a = np.zeros(ac_space.shape)
|
||||
elif isinstance(ac_space, spaces.Discrete):
|
||||
a = 0
|
||||
else:
|
||||
raise NotImplementedError("noop not implemented for class {}".format(type(ac_space)))
|
||||
_, _, done, _ = env.step(a)
|
||||
time.sleep(1.0/fps)
|
||||
elif args.mode == "random":
|
||||
a = ac_space.sample()
|
||||
_, _, done, _ = env.step(a)
|
||||
time.sleep(1.0/fps)
|
||||
elif args.mode == "static":
|
||||
time.sleep(1.0/fps)
|
||||
elif args.mode == "human":
|
||||
if t == 0:
|
||||
a = 0
|
||||
else:
|
||||
key = cv2.waitKey(-1)
|
||||
a = key - ord('0')
|
||||
if a >= ac_space.n:
|
||||
print("WARNING: ignoring illegal action {}.".format(a))
|
||||
a = 0
|
||||
_, _, done, _ = env.step(a)
|
||||
|
||||
env.render()
|
||||
if done and not args.ignore_done: break
|
||||
print("Done after {} steps".format(t+1))
|
||||
if args.once:
|
||||
break
|
||||
else:
|
||||
raw_input("Press enter to continue")
|
44
examples/scripts/upload
Executable file
@@ -0,0 +1,44 @@
|
||||
#!/usr/bin/env python
|
||||
#
|
||||
# This script assumes you have set an OPENAI_GYM_API_KEY environment
|
||||
# variable. You can find your API key in the web interface:
|
||||
# https://gym.openai.com/settings/profile.
|
||||
import argparse
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
|
||||
import gym
|
||||
|
||||
# In modules, use `logger = logging.getLogger(__name__)`
|
||||
logger = logging.getLogger()
|
||||
|
||||
class Uploader(object):
|
||||
def __init__(self, training_dir, algorithm_id, writeup):
|
||||
self.training_dir = training_dir
|
||||
self.algorithm_id = algorithm_id
|
||||
self.writeup = writeup
|
||||
|
||||
def run(self):
|
||||
gym.upload(self.training_dir, algorithm_id=self.algorithm_id, writeup=self.writeup)
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description=None)
|
||||
parser.add_argument('-t', '--training-dir', required=True, help='What directory to upload.')
|
||||
parser.add_argument('-a', '--algorithm_id', help='Set the algorithm id.')
|
||||
parser.add_argument('-w', '--writeup', help='Writeup to attach.')
|
||||
parser.add_argument('-v', '--verbose', action='count', dest='verbosity', default=0, help='Set verbosity.')
|
||||
args = parser.parse_args()
|
||||
|
||||
if args.verbosity == 0:
|
||||
logger.setLevel(logging.INFO)
|
||||
elif args.verbosity >= 1:
|
||||
logger.setLevel(logging.DEBUG)
|
||||
|
||||
runner = Uploader(training_dir=args.training_dir, algorithm_id=args.algorithm_id, writeup=args.writeup)
|
||||
runner.run()
|
||||
|
||||
return 0
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
16
gym/__init__.py
Normal file
@@ -0,0 +1,16 @@
import logging
import sys

from gym.core import Env, Space
from gym.configuration import logger_setup, undo_logger_setup
from gym.envs import make, spec
from gym.scoreboard.api import upload

logger = logging.getLogger(__name__)

# We automatically configure a logger with a simple stderr handler. If
# you'd rather customize logging yourself, run undo_logger_setup.
logger_setup(logger)
del logger_setup

__all__ = ["Env", "Space", "make", "spec", "upload"]
|
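If you do opt out of the automatic setup with ``undo_logger_setup``, you will need to attach a handler yourself. The sketch below shows one possible manual configuration using only the standard ``logging`` module. The logger name ``my_experiment`` and the format string are arbitrary choices for this example, and the call to ``gym.undo_logger_setup()`` is left out so the snippet runs without gym installed.

```python
import logging
import sys

# Hypothetical logger name; any name (or the root logger) would work.
logger = logging.getLogger('my_experiment')

# Send records to stderr with a timestamp and level prefix.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter('[%(asctime)s] %(levelname)s %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Avoid duplicate lines if the root logger also has a handler attached.
logger.propagate = False

logger.debug('custom logging configured')
```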
87
gym/configuration.py
Normal file
@@ -0,0 +1,87 @@
|
||||
import hashlib
|
||||
import numpy as np
|
||||
import logging
|
||||
import os
|
||||
import random
|
||||
import struct
|
||||
import sys
|
||||
|
||||
import gym
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
root_logger = logging.getLogger()
|
||||
requests_logger = logging.getLogger('requests')
|
||||
|
||||
# Set up the default handler
|
||||
formatter = logging.Formatter('[%(asctime)s] %(message)s')
|
||||
handler = logging.StreamHandler(sys.stderr)
|
||||
handler.setFormatter(formatter)
|
||||
|
||||
# We need to take in the gym logger explicitly since this is called
|
||||
# at initialization time.
|
||||
def logger_setup(gym_logger):
|
||||
root_logger.addHandler(handler)
|
||||
gym_logger.setLevel(logging.INFO)
|
||||
# When set to INFO, this will print out the hostname of every
|
||||
# connection it makes.
|
||||
# requests_logger.setLevel(logging.WARN)
|
||||
|
||||
def undo_logger_setup():
|
||||
"""Undoes the automatic logging setup done by OpenAI Gym. You should call
|
||||
this function if you want to manually configure logging
|
||||
yourself. Typical usage would involve putting something like the
|
||||
following at the top of your script:
|
||||
|
||||
gym.undo_logger_setup()
|
||||
logger = logging.getLogger()
|
||||
logger.addHandler(logging.StreamHandler(sys.stderr))
|
||||
"""
|
||||
root_logger.removeHandler(handler)
|
||||
gym.logger.setLevel(logging.NOTSET)
|
||||
requests_logger.setLevel(logging.NOTSET)
|
||||
|
||||
def seed(a=None):
|
||||
"""Seeds the 'random' and 'numpy.random' generators. By default,
|
||||
Python seeds these with the system time. Call this if you are
|
||||
using multiple processes.
|
||||
|
||||
Notes:
|
||||
SECURITY SENSITIVE: a bug here would allow people to generate fake results. Please let us know if you find one :).
|
||||
|
||||
Args:
|
||||
a (Optional[int, str]): None or no argument seeds from an operating system specific randomness source. If an int or str is passed, then all of its bits are used.
|
||||
"""
|
||||
# Adapted from https://svn.python.org/projects/python/tags/r32/Lib/random.py
|
||||
if a is None:
|
||||
a = bigint_from_bytes(os.urandom(32))
|
||||
|
||||
if isinstance(a, str):
|
||||
a = a.encode('utf8')
|
||||
a += hashlib.sha512(a).digest()
|
||||
a = bigint_from_bytes(a)
|
||||
|
||||
# Actually seed the generators
|
||||
random.seed(a)
|
||||
np.random.seed(int_list_from_bigint(a))
|
||||
|
||||
return a
|
||||
|
||||
# TODO: don't hardcode sizeof_int here
|
||||
def bigint_from_bytes(bytes):
|
||||
sizeof_int = 4
|
||||
padding = sizeof_int - len(bytes) % sizeof_int
|
||||
bytes += '\0' * padding
|
||||
int_count = len(bytes) / sizeof_int
|
||||
unpacked = struct.unpack("{}I".format(int_count), bytes)
|
||||
accum = 0
|
||||
for i, val in enumerate(unpacked):
|
||||
accum += 2 ** (sizeof_int * 8 * i) * val
|
||||
return accum
|
||||
|
||||
def int_list_from_bigint(bigint):
|
||||
ints = []
|
||||
while bigint > 0:
|
||||
bigint, mod = divmod(bigint, 2 ** 32)
|
||||
ints.append(mod)
|
||||
return ints
|
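The two helpers at the end of ``configuration.py`` convert a byte string into one large integer and then split that integer into 32-bit words for ``np.random.seed``. Here is a sketch of the same round trip adapted to run on Python 3. Two things differ from the original and should be treated as assumptions: the padding uses a ``bytes`` literal with floor division, and the ``struct`` format is forced to little-endian (``<``) so the result does not depend on the host platform.

```python
import struct


def bigint_from_bytes(data):
    """Interpret a byte string as a sequence of 32-bit words, least significant first."""
    sizeof_int = 4
    # As in the original, input that is already aligned still gains one zero word.
    padding = sizeof_int - len(data) % sizeof_int
    data += b'\0' * padding
    int_count = len(data) // sizeof_int
    unpacked = struct.unpack("<{}I".format(int_count), data)
    accum = 0
    for i, val in enumerate(unpacked):
        accum += 2 ** (sizeof_int * 8 * i) * val
    return accum


def int_list_from_bigint(bigint):
    """Split a non-negative integer into 32-bit words, least significant first."""
    ints = []
    while bigint > 0:
        bigint, mod = divmod(bigint, 2 ** 32)
        ints.append(mod)
    return ints


seed = bigint_from_bytes(b'\x01\x00\x00\x00\x02')
print(seed)
print(int_list_from_bigint(seed))
```

Note that ``int_list_from_bigint(0)`` returns an empty list, so a zero seed would need separate handling before being passed on.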
173
gym/core.py
Normal file
@@ -0,0 +1,173 @@
import logging
import numpy as np

from gym import error, monitoring

# Env-related abstractions

class Env(object):
    """The main OpenAI Gym class. It encapsulates an environment with
    arbitrary behind-the-scenes dynamics.

    When implementing an environment, override the following methods
    in your subclass:

        _step
        _reset
        _render

    And set the following attributes:

        action_space: The Space object corresponding to valid actions
        observation_space: The Space object corresponding to valid observations

    The methods are accessed publicly as "step", "reset", etc. The
    non-underscored versions are wrapper methods to which we may add
    functionality over time.
    """

    # Set this in SOME subclasses
    metadata = {'render.modes': []}

    # Set these in ALL subclasses
    action_space = None
    observation_space = None

    # Override in ALL subclasses
    def _step(self, action): raise NotImplementedError
    def _reset(self): raise NotImplementedError
    def _render(self, mode='human', close=False):
        if close:
            return
        raise NotImplementedError

    # Will be automatically set when creating an environment via
    # 'make'.
    spec = None

    @property
    def monitor(self):
        # Created lazily on first access
        if not hasattr(self, '_monitor'):
            self._monitor = monitoring.Monitor(self)
        return self._monitor

    def step(self, action):
        """
        Run one timestep of the environment's dynamics. When end of episode
        is reached, the environment will automatically reset its internal state.

        Input
        -----
        action : an action provided by the agent

        Outputs
        -------
        (observation, reward, done, info)

        observation (object): agent's observation of the current environment
        reward (float): amount of reward due to the previous action
        done (boolean): whether the episode has ended, in which case further step() calls will return undefined results
        info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
        """
        self.monitor._before_step(action)
        observation, reward, done, info = self._step(action)
        done = self.monitor._after_step(observation, reward, done, info)
        return observation, reward, done, info

    def reset(self):
        """
        Resets the state of the environment and returns an initial observation.

        Outputs
        -------
        observation (object): the initial observation of the environment. (Initial reward is assumed to be 0.)
        """
        self.monitor._before_reset()
        observation = self._reset()
        self.monitor._after_reset(observation)
        return observation

    def render(self, mode='human', close=False):
        """Renders the environment.

        The set of supported modes varies per environment. (And some
        environments do not support rendering at all.) By convention,
        if mode is:

        - human: render to the current display or terminal and
          return nothing. Usually for human consumption.
        - rgb_array: Return a numpy.ndarray with shape (x, y, 3),
          representing RGB values for an x-by-y pixel image, suitable
          for turning into a video.
        - ansi: Return a string (str) or StringIO.StringIO containing a
          terminal-style text representation. The text can include newlines
          and ANSI escape sequences (e.g. for colors).

        Note:
            Make sure that your class's metadata 'render.modes' key includes
              the list of supported modes. It's recommended to call super()
              in implementations to use the functionality of this method.

        Args:
            mode (str): the mode to render with
            close (bool): close all open renderings

        Example:

        class MyEnv(Env):
            metadata = {'render.modes': ['human', 'rgb_array']}

            def render(self, mode='human'):
                if mode == 'rgb_array':
                    return np.array(...) # return RGB frame suitable for video
                elif mode == 'human':
                    ... # pop up a window and render
                else:
                    super(MyEnv, self).render(mode=mode) # just raise an exception
        """
        if close:
            return self._render(close=close)

        # This code can be useful for calling super() in a subclass.
        modes = self.metadata.get('render.modes', [])
        if len(modes) == 0:
            raise error.UnsupportedMode('{} does not support rendering (requested mode: {})'.format(self, mode))
        elif mode not in modes:
            raise error.UnsupportedMode('Unsupported rendering mode: {}. (Supported modes for {}: {})'.format(mode, self, modes))

        return self._render(mode=mode, close=close)

    def __str__(self):
        return '<{} instance>'.format(type(self).__name__)

# Space-related abstractions

class Space(object):
    """
    Defines the observation and action spaces, so you can write
    generic code that applies to any Environment.
    E.g. to choose a random action.
    """

    def sample(self, seed=0):
        """
        Uniformly randomly sample an element of this space
        """
        raise NotImplementedError

    def contains(self, x):
        """
        Return boolean specifying if x is a valid
        member of this space
        """
        raise NotImplementedError

    def to_jsonable(self, sample_n):
        """Convert a batch of samples from this space to a JSONable data type."""
        # By default, assume identity is JSONable
        return sample_n

    def from_jsonable(self, sample_n):
        """Convert a JSONable data type to a batch of samples from this space."""
        # By default, assume identity is JSONable
        return sample_n
|
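The `Env` docstring asks subclasses to override the underscored methods and to set `action_space` and `observation_space`. The sketch below illustrates that convention with a stand-in base class so that it runs without gym installed. Everything in it is hypothetical: the stand-in `Env` omits the monitor hooks, and `DiscreteSpace` and `CountdownEnv` are invented names, not part of this commit.

```python
import random


class Env(object):
    """Stand-in for the base class above (hypothetical, no monitor hooks)."""

    def _step(self, action):
        raise NotImplementedError

    def _reset(self):
        raise NotImplementedError

    # Public wrappers delegate to the underscored implementations.
    def step(self, action):
        return self._step(action)

    def reset(self):
        return self._reset()


class DiscreteSpace(object):
    """Hypothetical space holding the integers 0..n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class CountdownEnv(Env):
    """Toy environment: the episode ends after `start` steps."""

    def __init__(self, start=3):
        self.start = start
        self.action_space = DiscreteSpace(2)
        self.observation_space = DiscreteSpace(start + 1)
        self.remaining = start

    def _reset(self):
        self.remaining = self.start
        return self.remaining

    def _step(self, action):
        assert self.action_space.contains(action)
        self.remaining -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.remaining == 0
        return self.remaining, reward, done, {}


env = CountdownEnv(start=3)
observation = env.reset()
done = False
total_reward = 0.0
while not done:
    # Always choose action 1, which earns 1.0 per step in this toy task.
    observation, reward, done, info = env.step(1)
    total_reward += reward
print(observation, total_reward)  # 0 3.0
```

Callers only ever touch `reset` and `step`, which is what lets the real base class add monitoring around `_reset` and `_step` without changes to subclasses.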
20
gym/envs/README.md
Normal file
@@ -0,0 +1,20 @@
# Envs

These are the core integrated environments. Note that we may later
restructure any of the files, but will keep the environments available
at the relevant package's top-level. So for example, you should access
`AntEnv` as follows:

```
# Will be supported in future releases
from gym.envs import mujoco
mujoco.AntEnv
```

Rather than:

```
# May break in future releases
from gym.envs.mujoco import ant
ant.AntEnv
```
208
gym/envs/__init__.py
Normal file
@@ -0,0 +1,208 @@
from gym.envs.registration import registry, register, make, spec

# Algorithmic
# ----------------------------------------

register(
    id='Copy-v0',
    entry_point='gym.envs.algorithmic:CopyEnv',
    timestep_limit=200,
    reward_threshold=25.0,
)

register(
    id='RepeatCopy-v0',
    entry_point='gym.envs.algorithmic:RepeatCopyEnv',
    timestep_limit=200,
    reward_threshold=75.0,
)

register(
    id='ReversedAddition-v0',
    entry_point='gym.envs.algorithmic:ReversedAdditionEnv',
    kwargs={'rows' : 2},
    timestep_limit=200,
    reward_threshold=25.0,
)

register(
    id='ReversedAddition3-v0',
    entry_point='gym.envs.algorithmic:ReversedAdditionEnv',
    kwargs={'rows' : 3},
    timestep_limit=200,
    reward_threshold=25.0,
)

register(
    id='DuplicatedInput-v0',
    entry_point='gym.envs.algorithmic:DuplicatedInputEnv',
    timestep_limit=200,
    reward_threshold=9.0,
)

register(
    id='Reverse-v0',
    entry_point='gym.envs.algorithmic:ReverseEnv',
    timestep_limit=200,
    reward_threshold=25.0,
)

# Classic
# ----------------------------------------

register(
    id='CartPole-v0',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    timestep_limit=200,
    reward_threshold=195,
)

register(
    id='MountainCar-v0',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    timestep_limit=200,
)

register(
    id='Pendulum-v0',
    entry_point='gym.envs.classic_control:PendulumEnv',
    timestep_limit=200,
)

register(
    id='Acrobot-v0',
    entry_point='gym.envs.classic_control:AcrobotEnv',
    timestep_limit=200,
)

# Toy Text
# ----------------------------------------

register(
    id='FrozenLake-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name' : '4x4'},
    timestep_limit=100,
)

register(
    id='FrozenLake8x8-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name' : '8x8'},
    timestep_limit=200,
)

register(
    id='Roulette-v0',
    entry_point='gym.envs.toy_text:RouletteEnv',
    timestep_limit=100,
)

register(
    id='Taxi-v0',
    entry_point='gym.envs.toy_text.taxi:TaxiEnv',
    timestep_limit=200,
)

# Mujoco
# ----------------------------------------

# 2D

register(
    id='Reacher-v0',
    entry_point='gym.envs.mujoco:ReacherEnv',
    timestep_limit=50
)

register(
    id='InvertedPendulum-v0',
    entry_point='gym.envs.mujoco:InvertedPendulumEnv',
)

register(
    id='InvertedDoublePendulum-v0',
    entry_point='gym.envs.mujoco:InvertedDoublePendulumEnv',
)

register(
    id='HalfCheetah-v0',
    entry_point='gym.envs.mujoco:HalfCheetahEnv',
)

register(
    id='Hopper-v0',
    entry_point='gym.envs.mujoco:HopperEnv',
)

register(
    id='Swimmer-v0',
    entry_point='gym.envs.mujoco:SwimmerEnv',
)

register(
    id='Walker2d-v0',
    entry_point='gym.envs.mujoco:Walker2dEnv',
)

register(
    id='Ant-v0',
    entry_point='gym.envs.mujoco:AntEnv',
)

register(
    id='Humanoid-v0',
    entry_point='gym.envs.mujoco:HumanoidEnv',
)

# Atari
# ----------------------------------------

# # print ', '.join(["'{}'".format(name.split('.')[0]) for name in atari_py.list_games()])
for game in ['air_raid', 'alien', 'amidar', 'assault', 'asterix', 'asteroids', 'atlantis',
    'bank_heist', 'battle_zone', 'beam_rider', 'berzerk', 'bowling', 'boxing', 'breakout', 'carnival',
    'centipede', 'chopper_command', 'crazy_climber', 'demon_attack', 'double_dunk',
    'elevator_action', 'enduro', 'fishing_derby', 'freeway', 'frostbite', 'gopher', 'gravitar',
    'ice_hockey', 'jamesbond', 'journey_escape', 'kangaroo', 'krull', 'kung_fu_master',
    'montezuma_revenge', 'ms_pacman', 'name_this_game', 'phoenix', 'pitfall', 'pong', 'pooyan',
    'private_eye', 'qbert', 'riverraid', 'road_runner', 'robotank', 'seaquest', 'skiing',
    'solaris', 'space_invaders', 'star_gunner', 'tennis', 'time_pilot', 'tutankham', 'up_n_down',
    'venture', 'video_pinball', 'wizard_of_wor', 'yars_revenge', 'zaxxon']:
    for obs_type in ['image', 'ram']:
        # space_invaders should yield SpaceInvaders-v0 and SpaceInvaders-ram-v0
        name = ''.join([g.capitalize() for g in game.split('_')])
        if obs_type == 'ram':
            name = '{}-ram'.format(name)
        register(
            id='{}-v0'.format(name),
            entry_point='gym.envs.atari:AtariEnv',
            kwargs={'game': game, 'obs_type': obs_type},
            timestep_limit=10000,
        )

# Board games
# ----------------------------------------

register(
    id='Go9x9-v0',
    entry_point='gym.envs.board_game:GoEnv',
    kwargs={
        'player_color': 'black',
        'opponent': 'pachi:uct:_2400',
        'observation_type': 'image3c',
        'illegal_move_mode': 'lose',
        'board_size': 9,
    },
)

register(
    id='Go19x19-v0',
    entry_point='gym.envs.board_game:GoEnv',
    kwargs={
        'player_color': 'black',
        'opponent': 'pachi:uct:_2400',
        'observation_type': 'image3c',
        'illegal_move_mode': 'lose',
        'board_size': 19,
    },
)
|
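The Atari loop above derives two environment ids from each snake_case game name. A small standalone sketch of that naming rule follows; `atari_env_ids` is a hypothetical helper introduced here for illustration and does not exist in the commit.

```python
def atari_env_ids(game):
    """Return the (image, ram) environment ids for a snake_case game name."""
    # Capitalize each underscore-separated part and join them without separators.
    name = ''.join(part.capitalize() for part in game.split('_'))
    return '{}-v0'.format(name), '{}-ram-v0'.format(name)


print(atari_env_ids('space_invaders'))  # ('SpaceInvaders-v0', 'SpaceInvaders-ram-v0')
print(atari_env_ids('jamesbond'))       # ('Jamesbond-v0', 'Jamesbond-ram-v0')
```

Names that contain no underscore get only their first letter capitalized, which is why `jamesbond` becomes `Jamesbond-v0` rather than `JamesBond-v0`.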
3
gym/envs/algorithmic/README.md
Normal file
@@ -0,0 +1,3 @@
# Algorithmic tasks

Not yet ready for prime-time. We'll shore these up soon.
5
gym/envs/algorithmic/__init__.py
Normal file
@@ -0,0 +1,5 @@
from gym.envs.algorithmic.copy import CopyEnv
from gym.envs.algorithmic.repeat_copy import RepeatCopyEnv
from gym.envs.algorithmic.duplicated_input import DuplicatedInputEnv
from gym.envs.algorithmic.reverse import ReverseEnv
from gym.envs.algorithmic.reversed_addition import ReversedAdditionEnv
203
gym/envs/algorithmic/algorithmic_env.py
Normal file
@@ -0,0 +1,203 @@
from gym import Env
from gym.spaces import Discrete, Tuple
from gym.utils import colorize
import numpy as np
import random
import StringIO
import sys
import math

hash_base = None
def ha(array):
    return (hash_base * (array + 5)).sum()

class AlgorithmicEnv(Env):

    metadata = {'render.modes': ['human', 'ansi']}

    def __init__(self, inp_dim=1, base=10, chars=False):
        global hash_base
        hash_base = 50 ** np.arange(inp_dim)
        self.base = base
        self.last = 10
        self.total_reward = 0
        self.sum_reward = 0
        AlgorithmicEnv.sum_rewards = []
        self.chars = chars
        self.inp_dim = inp_dim
        AlgorithmicEnv.current_length = 2
        tape_control = []
        self.action_space = Tuple(([Discrete(2 * inp_dim), Discrete(2), Discrete(self.base)]))
        self.observation_space = Discrete(self.base + 1)
        self.reset()

    def _get_obs(self, pos=None):
        if pos is None:
            pos = self.x
        assert(isinstance(pos, np.ndarray) and pos.shape[0] == self.inp_dim)
        if ha(pos) not in self.content:
            self.content[ha(pos)] = self.base
        return self.content[ha(pos)]

    def _get_str_obs(self, pos=None):
        ret = self._get_obs(pos)
        if ret == self.base:
            return " "
        else:
            if self.chars:
                return chr(ret + ord('A'))
            return str(ret)

    def _get_str_target(self, pos=None):
        if pos not in self.target:
            return " "
        else:
            ret = self.target[pos]
            if self.chars:
                return chr(ret + ord('A'))
            return str(ret)

    def _render_observation(self):
        x = self.x
        if self.inp_dim == 1:
            x_str = "Observation Tape : "
            for i in range(-2, self.total_len + 2):
                if i == x:
                    x_str += colorize(self._get_str_obs(np.array([i])), 'green', highlight=True)
                else:
                    x_str += self._get_str_obs(np.array([i]))
            x_str += "\n"
            return x_str
        elif self.inp_dim == 2:
            label = "Observation Grid : "
            x_str = ""
            for j in range(-1, 3):
                if j != -1:
                    x_str += " " * len(label)
                for i in range(-2, self.total_len + 2):
                    if i == x[0] and j == x[1]:
                        x_str += colorize(self._get_str_obs(np.array([i, j])), 'green', highlight=True)
                    else:
                        x_str += self._get_str_obs(np.array([i, j]))
                x_str += "\n"
            x_str = label + x_str
            return x_str
        else:
            assert(False)

    def _render(self, mode='human', close=False):
        if close:
            # Nothing interesting to close
            return

        outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout
        inp = "Total length of input instance: %d, step: %d\n" % (self.total_len, self.time)
        outfile.write(inp)
        x, y, action = self.x, self.y, self.last_action
        if action is not None:
            inp_act, out_act, pred = action
        outfile.write("=" * (len(inp) - 1) + "\n")
        y_str = "Output Tape : "
        target_str = "Targets : "
        if action is not None:
            if self.chars:
                pred_str = chr(pred + ord('A'))
            else:
                pred_str = str(pred)
        x_str = self._render_observation()
        max_len = int(self.total_reward) + 1
        for i in range(-2, max_len):
            if i not in self.target:
                y_str += " "
                continue
            target_str += self._get_str_target(i)
            if i < y - 1:
                y_str += self._get_str_target(i)
            elif i == (y - 1):
                if action is not None and out_act == 1:
                    if pred == self.target[i]:
                        y_str += colorize(pred_str, 'green', highlight=True)
                    else:
                        y_str += colorize(pred_str, 'red', highlight=True)
                else:
                    y_str += self._get_str_target(i)
        outfile.write(x_str)
        outfile.write(y_str + "\n")
        outfile.write(target_str + "\n\n")

        if action is not None:
            outfile.write("Current reward : %.3f\n" % self.reward)
            outfile.write("Cumulative reward : %.3f\n" % self.sum_reward)
            move = ""
            if inp_act == 0:
                move = "left"
            elif inp_act == 1:
                move = "right"
            elif inp_act == 2:
                move += "up"
            elif inp_act == 3:
                move += "down"
            outfile.write("Action : Tuple(move over input: %s,\n" % move)
            if out_act == 1:
                out_act = "True"
            else:
                out_act = "False"
            outfile.write(" write to the output tape: %s,\n" % out_act)
            outfile.write(" prediction: %s)\n" % pred_str)
        else:
            outfile.write("\n" * 5)
        return outfile

    def _step(self, action):
        self.last_action = action
        inp_act, out_act, pred = action
        done = False
        reward = 0.0
        # We are outside the sample.
        self.time += 1
        if self.y not in self.target:
            reward = -10.0
            done = True
        else:
            if out_act == 1:
                if pred == self.target[self.y]:
                    reward = 1.0
                else:
                    reward = -0.5
                    done = True
                self.y += 1
                if self.y not in self.target:
                    done = True
            if inp_act == 0:
                self.x[0] -= 1
            elif inp_act == 1:
                self.x[0] += 1
            elif inp_act == 2:
                self.x[1] -= 1
            elif inp_act == 3:
                self.x[1] += 1
            if self.time > self.total_len + self.total_reward + 4:
                reward = -1.0
                done = True
        obs = self._get_obs()
        self.reward = reward
        self.sum_reward += reward
        return (obs, reward, done, {})

    def _reset(self):
        self.last_action = None
        self.x = np.zeros(self.inp_dim, dtype=int)
        self.y = 0
        AlgorithmicEnv.sum_rewards.append(self.sum_reward - self.total_reward)
        AlgorithmicEnv.sum_rewards = AlgorithmicEnv.sum_rewards[-self.last:]
        if len(AlgorithmicEnv.sum_rewards) == self.last and \
          min(AlgorithmicEnv.sum_rewards) >= -1.0 and \
          AlgorithmicEnv.current_length < 30:
            AlgorithmicEnv.current_length += 1
            AlgorithmicEnv.sum_rewards = []
        self.sum_reward = 0.0
        self.time = 0
        self.total_len = random.randrange(3) + AlgorithmicEnv.current_length
        self.set_data()
        return self._get_obs()
|
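The tape contents above are stored in a dictionary keyed by `ha(pos)`, which folds a coordinate array into a single integer. The sketch below restates that key in pure Python and shows a lookup with a blank default. `position_key`, `read`, `BLANK` and the sample tape are hypothetical names chosen for illustration, and `read` uses `dict.get` instead of writing the blank back into the dictionary as `_get_obs` does.

```python
def position_key(pos, base=50, offset=5):
    """Fold a coordinate sequence into one integer, mirroring ha() without NumPy."""
    return sum(base ** k * (p + offset) for k, p in enumerate(pos))


# A one-dimensional tape holding the symbols 3, 1, 4 at positions 0..2.
BLANK = 5  # plays the role of self.base, the "nothing here" observation
tape = {position_key([i]): symbol for i, symbol in enumerate([3, 1, 4])}


def read(pos):
    """Return the symbol at pos, or BLANK when the cell was never written."""
    return tape.get(position_key(pos), BLANK)


print(position_key([0]))     # 5
print(position_key([1, 2]))  # 356
print(read([1]))             # 1
print(read([-1]))            # 5
```

In this sketch, keys are unique only while every shifted coordinate `p + offset` stays within `[0, base)`, that is, coordinates from -5 to 44 with the defaults.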
24
gym/envs/algorithmic/copy.py
Normal file
@@ -0,0 +1,24 @@
"""
Task is to copy content from the input tape to
the output tape. http://arxiv.org/abs/1511.07275
"""
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha

class CopyEnv(algorithmic_env.AlgorithmicEnv):
    def __init__(self, base=5):
        algorithmic_env.AlgorithmicEnv.__init__(self,
                                                inp_dim=1,
                                                base=base,
                                                chars=True)
    def set_data(self):
        self.content = {}
        self.target = {}
        for i in range(self.total_len):
            val = random.randrange(self.base)
            self.content[ha(np.array([i]))] = val
            self.target[i] = val
        self.total_reward = self.total_len

27
gym/envs/algorithmic/duplicated_input.py
Normal file
@@ -0,0 +1,27 @@
"""
Task is to return every second character from the input tape.
http://arxiv.org/abs/1511.07275
"""

import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha

class DuplicatedInputEnv(algorithmic_env.AlgorithmicEnv):
    def __init__(self, duplication=2, base=5):
        self.duplication = duplication
        algorithmic_env.AlgorithmicEnv.__init__(self,
                                                inp_dim=1,
                                                base=base,
                                                chars=True)
    def set_data(self):
        self.content = {}
        self.target = {}
        # Integer division: only whole groups of duplicated symbols are written
        copies = self.total_len // self.duplication
        for i in range(copies):
            val = random.randrange(self.base)
            self.target[i] = val
            for d in range(self.duplication):
                self.content[ha(np.array([i * self.duplication + d]))] = val
        self.total_reward = self.total_len // self.duplication
29
gym/envs/algorithmic/repeat_copy.py
Normal file
@@ -0,0 +1,29 @@
"""
Task is to copy content multiple times from the input tape to
the output tape. http://arxiv.org/abs/1511.07275
"""
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha

class RepeatCopyEnv(algorithmic_env.AlgorithmicEnv):
    def __init__(self, base=5):
        algorithmic_env.AlgorithmicEnv.__init__(self,
                                                inp_dim=1,
                                                base=base,
                                                chars=True)
        self.last = 50

    def set_data(self):
        self.content = {}
        self.target = {}
        unique = set()
        for i in range(self.total_len):
            val = random.randrange(self.base)
            self.content[ha(np.array([i]))] = val
            # The target is the input forward, then reversed, then forward again
            self.target[i] = val
            self.target[2 * self.total_len - i - 1] = val
            self.target[2 * self.total_len + i] = val
        self.total_reward = 3.0 * self.total_len + 0.9

|
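The three index assignments in `RepeatCopyEnv.set_data` lay the input out forward, then reversed, then forward again. A standalone sketch of that layout, using a hypothetical helper `repeat_copy_target` that returns the target as a list rather than a dictionary:

```python
def repeat_copy_target(values):
    """Build the RepeatCopy target for an input sequence, as a list."""
    n = len(values)
    target = {}
    for i, val in enumerate(values):
        target[i] = val              # first pass, forward
        target[2 * n - i - 1] = val  # second pass, reversed
        target[2 * n + i] = val      # third pass, forward
    return [target[k] for k in range(3 * n)]


print(repeat_copy_target([0, 1, 2]))  # [0, 1, 2, 2, 1, 0, 0, 1, 2]
```

The reversed middle pass means a read head that walks right, then left, then right over the input can emit the whole target without jumping.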
27
gym/envs/algorithmic/reverse.py
Normal file
@@ -0,0 +1,27 @@
"""
Task is to reverse content over the input tape.
http://arxiv.org/abs/1511.07275
"""

import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha

class ReverseEnv(algorithmic_env.AlgorithmicEnv):
    def __init__(self, base=2):
        algorithmic_env.AlgorithmicEnv.__init__(self,
                                                inp_dim=1,
                                                base=base,
                                                chars=True)
        algorithmic_env.AlgorithmicEnv.current_length = 1
        self.last = 50

    def set_data(self):
        self.content = {}
        self.target = {}
        for i in range(self.total_len):
            val = random.randrange(self.base)
            self.content[ha(np.array([i]))] = val
            self.target[self.total_len - i - 1] = val
        self.total_reward = self.total_len + 0.9
30
gym/envs/algorithmic/reversed_addition.py
Normal file
@@ -0,0 +1,30 @@
import random
import numpy as np
from gym.envs.algorithmic import algorithmic_env
from gym.envs.algorithmic.algorithmic_env import ha

class ReversedAdditionEnv(algorithmic_env.AlgorithmicEnv):
    def __init__(self, rows=2, base=3):
        self.rows = rows
        algorithmic_env.AlgorithmicEnv.__init__(self,
                                                inp_dim=2,
                                                base=base,
                                                chars=False)
    def set_data(self):
        self.content = {}
        self.target = {}
        carry = 0
        for i in range(self.total_len):
            vals = []
            for k in range(self.rows):
                val = random.randrange(self.base)
                self.content[ha(np.array([i, k]))] = val
                vals.append(val)
            # Column sum plus incoming carry, least significant column first
            total = sum(vals) + carry
            self.target[i] = total % self.base
            carry = total // self.base
        if carry > 0:
            self.target[self.total_len] = carry
        self.total_reward = self.total_len

|
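`ReversedAdditionEnv.set_data` adds the rows column by column, least significant digit first, carrying into the next column. The sketch below isolates that arithmetic. `reversed_addition_target` is a hypothetical helper that takes the digit rows as plain lists instead of reading them from the tape dictionary.

```python
def reversed_addition_target(rows, base):
    """Add numbers given as digit lists (least significant digit first)."""
    target = []
    carry = 0
    for column in zip(*rows):
        total = sum(column) + carry
        target.append(total % base)
        carry = total // base
    if carry > 0:
        # A final carry becomes one extra, most significant digit.
        target.append(carry)
    return target


# Base 3: digits [2, 1, 2] encode 2 + 1*3 + 2*9 = 23, and [2, 2, 1] encode 17.
print(reversed_addition_target([[2, 1, 2], [2, 2, 1]], base=3))  # [1, 1, 1, 1]
```

The result `[1, 1, 1, 1]` reads as 1 + 3 + 9 + 27 = 40, which matches 23 + 17.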
1
gym/envs/atari/__init__.py
Normal file
@@ -0,0 +1 @@
from gym.envs.atari.atari_env import AtariEnv
121
gym/envs/atari/atari_env.py
Normal file
@@ -0,0 +1,121 @@
import numpy as np
import os
import gym
from gym import error, spaces
from gym import utils

try:
    import atari_py
except ImportError as e:
    raise error.DependencyNotInstalled("{}. (HINT: you can install Atari dependencies with 'pip install gym[atari]'.)".format(e))

import logging
logger = logging.getLogger(__name__)

def to_rgb(ale):
    (screen_width,screen_height) = ale.getScreenDims()
    arr = np.zeros((screen_height, screen_width, 4), dtype=np.uint8)
    ale.getScreenRGB(arr) # says rgb but actually bgr
    return arr[:,:,[2, 1, 0]].copy()

def to_ram(ale):
    ram_size = ale.getRAMSize()
    ram = np.zeros((ram_size),dtype=np.uint8)
    ale.getRAM(ram)
    return ram

class AtariEnv(gym.Env, utils.EzPickle):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def __init__(self, game='pong', obs_type='ram'):
        utils.EzPickle.__init__(self, game, obs_type)
        assert obs_type in ('ram', 'image')
        game_path = atari_py.get_game_path(game)
        if not os.path.exists(game_path):
            raise IOError('You asked for game %s but path %s does not exist'%(game, game_path))
        self.ale = atari_py.ALEInterface()
        self.ale.loadROM(game_path)
        self._obs_type = obs_type
        self._action_set = self.ale.getMinimalActionSet()
        self.viewer = None

        (screen_width,screen_height) = self.ale.getScreenDims()

        self.action_space = spaces.Discrete(len(self._action_set))
        if self._obs_type == 'ram':
            self.observation_space = spaces.Box(low=np.zeros(128), high=np.zeros(128)+255)
        elif self._obs_type == 'image':
            self.observation_space = spaces.Box(low=0, high=255, shape=(screen_height, screen_width, 3))
        else:
            raise error.Error('Unrecognized observation type: {}'.format(self._obs_type))

    def _step(self, a):
        reward = 0.0
        action = self._action_set[a]
        # Repeat the action for a random number of frames (2, 3 or 4)
        num_steps = np.random.randint(2, 5)
        for _ in xrange(num_steps):
            reward += self.ale.act(action)
        ob = self._get_obs()

        return ob, reward, self.ale.game_over(), {}

    def _get_image(self):
        return to_rgb(self.ale)
    def _get_ram(self):
        return to_ram(self.ale)

    @property
    def _n_actions(self):
        return len(self._action_set)

    def _get_obs(self):
        if self._obs_type == 'ram':
            return self._get_ram()
        elif self._obs_type == 'image':
            img = self._get_image()
        return img

    # Returns the initial observation
    def _reset(self):
        self.ale.reset_game()
        return self._get_obs()

    def _render(self, mode='human', close=False):
        if close:
            if self.viewer is not None:
                self.viewer.close()
            return
        img = self._get_image()
        if mode == 'rgb_array':
            return img
        elif mode == 'human':
            from gym.envs.classic_control import rendering
            if self.viewer is None:
                self.viewer = rendering.SimpleImageViewer()
            self.viewer.imshow(img)

    def get_action_meanings(self):
        return [ACTION_MEANING[i] for i in self._action_set]


ACTION_MEANING = {
    0 : "NOOP",
    1 : "FIRE",
    2 : "UP",
    3 : "RIGHT",
    4 : "LEFT",
    5 : "DOWN",
    6 : "UPRIGHT",
    7 : "UPLEFT",
    8 : "DOWNRIGHT",
    9 : "DOWNLEFT",
    10 : "UPFIRE",
    11 : "RIGHTFIRE",
    12 : "LEFTFIRE",
    13 : "DOWNFIRE",
    14 : "UPRIGHTFIRE",
    15 : "UPLEFTFIRE",
    16 : "DOWNRIGHTFIRE",
    17 : "DOWNLEFTFIRE",
}
1
gym/envs/board_game/__init__.py
Normal file
@@ -0,0 +1 @@
from gym.envs.board_game.go import GoEnv
233
gym/envs/board_game/go.py
Normal file
@@ -0,0 +1,233 @@
from gym import error
try:
    import pachi_py
except ImportError as e:
    # The dependency group [pachi] should match the name in setup.py.
    raise error.DependencyNotInstalled('{}. (HINT: you may need to install the Go dependencies via "pip install gym[pachi]".)'.format(e))

import numpy as np
import gym
from gym import spaces
import StringIO
import sys


# The coordinate representation of Pachi (and pachi_py) is defined on a board
# with extra rows and columns on the margin of the board, so positions on the board
# are not numbers in [0, board_size**2) as one would expect. For this Go env, we instead
# use an action representation that does fall in this more natural range.

def _coord_to_action(board, c):
    '''Converts Pachi coordinates to actions'''
    if c == pachi_py.PASS_COORD: return board.size**2 # pass
    if c == pachi_py.RESIGN_COORD: return board.size**2 + 1 # resign
    i, j = board.coord_to_ij(c)
    return i*board.size + j


def _action_to_coord(board, a):
    '''Converts actions to Pachi coordinates'''
    if a == board.size**2: return pachi_py.PASS_COORD
    if a == board.size**2 + 1: return pachi_py.RESIGN_COORD
    return board.ij_to_coord(a // board.size, a % board.size)

def str_to_action(board, s):
    return _coord_to_action(board, board.str_to_coord(s))
|
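The comment and the two converters above describe an action encoding in which board points occupy `[0, board_size**2)` and two extra integers stand for pass and resign. The sketch below shows only that integer layout, without `pachi_py`. `ij_to_action`, `action_to_move` and the `'pass'` / `'resign'` strings are hypothetical stand-ins for the Pachi coordinate constants.

```python
def ij_to_action(board_size, i, j):
    """Encode row i, column j as a single action index."""
    return i * board_size + j


def action_to_move(board_size, action):
    """Decode an action index into (i, j), 'pass' or 'resign'."""
    if action == board_size ** 2:
        return 'pass'
    if action == board_size ** 2 + 1:
        return 'resign'
    return action // board_size, action % board_size


print(ij_to_action(9, 2, 3))   # 21
print(action_to_move(9, 21))   # (2, 3)
print(action_to_move(9, 81))   # pass
print(action_to_move(9, 82))   # resign
```

Under this layout a board of size n has n**2 + 2 distinct actions, for example 83 on a 9x9 board.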
||||
|
||||
class GoState(object):
|
||||
'''
|
||||
Go game state. Consists of a current player and a board.
|
||||
Actions are exposed as integers in [0, num_actions), which is different
|
||||
from Pachi's internal "coord_t" encoding.
|
||||
'''
|
||||
def __init__(self, board, color):
|
||||
'''
|
||||
Args:
|
||||
board: current board
|
||||
color: color of current player
|
||||
'''
|
||||
assert color in [pachi_py.BLACK, pachi_py.WHITE], 'Invalid player color'
|
||||
self.board, self.color = board, color
|
||||
|
||||
def act(self, action):
|
||||
'''
|
||||
Executes an action for the current player
|
||||
|
||||
Returns:
|
||||
a new GoState with the new board and the player switched
|
||||
'''
|
||||
return GoState(
|
||||
self.board.play(_action_to_coord(self.board, action), self.color),
|
||||
pachi_py.stone_other(self.color))
|
||||
|
||||
def __repr__(self):
|
||||
return 'To play: {}\n{}'.format(pachi_py.color_to_str(self.color), repr(self.board))
|
||||
|
||||
|
||||
### Adversary policies ###
|
||||
def random_policy(curr_state, prev_state, prev_action):
|
||||
b = curr_state.board
|
||||
legal_coords = b.get_legal_coords(curr_state.color)
|
||||
return _coord_to_action(b, np.random.choice(legal_coords))
|
||||
|
||||
def make_pachi_policy(board, engine_type='uct', threads=1, pachi_timestr=''):
|
||||
engine = pachi_py.PyPachiEngine(board, engine_type, 'threads=%d' % threads)
|
||||
|
||||
def pachi_policy(curr_state, prev_state, prev_action):
|
||||
if prev_state is not None:
|
||||
assert engine.curr_board == prev_state.board, 'Engine internal board is inconsistent with provided board. The Pachi engine must be called consistently as the game progresses.'
|
||||
prev_coord = _action_to_coord(prev_state.board, prev_action)
|
||||
engine.notify(prev_coord, prev_state.color)
|
||||
engine.curr_board.play_inplace(prev_coord, prev_state.color)
|
||||
out_coord = engine.genmove(curr_state.color, pachi_timestr)
|
||||
out_action = _coord_to_action(curr_state.board, out_coord)
|
||||
engine.curr_board.play_inplace(out_coord, curr_state.color)
|
||||
return out_action
|
||||
|
||||
return pachi_policy
|
||||
|
||||
|
||||
def _play(black_policy_fn, white_policy_fn, board_size=19):
|
||||
'''
|
||||
Samples a trajectory for two player policies.
|
||||
Args:
|
||||
black_policy_fn, white_policy_fn: functions that map (curr_state, prev_state, prev_action) to an action (int)
|
||||
'''
|
||||
moves = []
|
||||
|
||||
prev_state, prev_action = None, None
|
||||
curr_state = GoState(pachi_py.CreateBoard(board_size), pachi_py.BLACK)
|
||||
|
||||
while not curr_state.board.is_terminal:
|
||||
a = (black_policy_fn if curr_state.color == pachi_py.BLACK else white_policy_fn)(curr_state, prev_state, prev_action)
|
||||
next_state = curr_state.act(a)
|
||||
moves.append((curr_state, a, next_state))
|
||||
|
||||
prev_state, prev_action = curr_state, a
|
||||
curr_state = next_state
|
||||
|
||||
return moves
|
||||
|
||||
|
||||
class GoEnv(gym.Env):
|
||||
'''
|
||||
Go environment. Play against a fixed opponent.
|
||||
'''
|
||||
metadata = {"render.modes": ["human", "ansi"]}
|
||||
|
||||
def __init__(self, player_color, opponent, observation_type, illegal_move_mode, board_size):
|
||||
'''
|
||||
Args:
|
||||
player_color: Stone color for the agent. Either 'black' or 'white'
|
||||
opponent: An opponent policy
|
||||
observation_type: State encoding
|
||||
illegal_move_mode: What to do when the agent makes an illegal move. Choices: 'raise' or 'lose'
board_size: Side length of the square board (a positive integer)
|
||||
'''
|
||||
assert isinstance(board_size, int) and board_size >= 1, 'Invalid board size: {}'.format(board_size)
|
||||
self.board_size = board_size
|
||||
|
||||
colormap = {
|
||||
'black': pachi_py.BLACK,
|
||||
'white': pachi_py.WHITE,
|
||||
}
|
||||
try:
|
||||
self.player_color = colormap[player_color]
|
||||
except KeyError:
|
||||
raise error.Error("player_color must be 'black' or 'white', not {}".format(player_color))
|
||||
|
||||
self.opponent_policy = None
|
||||
self.opponent = opponent
|
||||
|
||||
assert observation_type in ['image3c']
|
||||
self.observation_type = observation_type
|
||||
|
||||
assert illegal_move_mode in ['lose', 'raise']
|
||||
self.illegal_move_mode = illegal_move_mode
|
||||
|
||||
# One action for each board position, pass, and resign
|
||||
self.action_space = spaces.Discrete(self.board_size**2 + 2)
|
||||
|
||||
if self.observation_type == 'image3c':
|
||||
shape = pachi_py.CreateBoard(self.board_size).encode().shape
|
||||
self.observation_space = spaces.Box(np.zeros(shape), np.ones(shape))
|
||||
else:
|
||||
raise error.Error('Unsupported observation type: {}'.format(self.observation_type))
|
||||
|
||||
self.reset()
|
||||
|
||||
def _reset(self):
|
||||
self.state = GoState(pachi_py.CreateBoard(self.board_size), pachi_py.BLACK)
|
||||
|
||||
# (re-initialize) the opponent
|
||||
# necessary because a pachi engine is attached to a game via internal data in a board
|
||||
# so with a fresh game, we need a fresh engine
|
||||
self._reset_opponent(self.state.board)
|
||||
|
||||
# Let the opponent play if it's not the agent's turn
|
||||
if self.state.color != self.player_color:
|
||||
self.state = self._exec_opponent_play(self.state, None, None)
|
||||
assert self.state.color == self.player_color
|
||||
|
||||
self.done = self.state.board.is_terminal
|
||||
return self.state.board.encode()
|
||||
|
||||
def _render(self, mode="human", close=False):
|
||||
if close:
|
||||
return
|
||||
outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout
|
||||
outfile.write(repr(self.state) + '\n')
|
||||
return outfile
|
||||
|
||||
def _step(self, action):
|
||||
assert self.state.color == self.player_color
|
||||
|
||||
# If already terminal, then don't do anything
|
||||
if self.done:
|
||||
return self.state.board.encode(), 0., True, {'state': self.state}
|
||||
|
||||
# Play
|
||||
prev_state = self.state
|
||||
try:
|
||||
self.state = self.state.act(action)
|
||||
except pachi_py.IllegalMove:
|
||||
if self.illegal_move_mode == 'raise':
|
||||
raise
|
||||
elif self.illegal_move_mode == 'lose':
|
||||
# Automatic loss on illegal move
|
||||
self.done = True
|
||||
return self.state.board.encode(), -1., True, {'state': self.state}
|
||||
else:
|
||||
raise error.Error('Unsupported illegal move action: {}'.format(self.illegal_move_mode))
|
||||
|
||||
# Opponent play
|
||||
if not self.state.board.is_terminal:
|
||||
self.state = self._exec_opponent_play(self.state, prev_state, action)
|
||||
# After opponent play, we should be back to the original color
|
||||
assert self.state.color == self.player_color
|
||||
|
||||
# Reward: 0 if nonterminal, 1 if won, -1 if lost
|
||||
if self.state.board.is_terminal:
|
||||
self.done = True
|
||||
white_wins = self.state.board.official_score > 0
|
||||
reward = 1. if white_wins == (self.player_color == pachi_py.WHITE) else -1.
|
||||
else:
|
||||
self.done = False
|
||||
reward = 0.
|
||||
return self.state.board.encode(), reward, self.done, {'state': self.state}
|
||||
|
||||
def _exec_opponent_play(self, curr_state, prev_state, prev_action):
|
||||
assert curr_state.color != self.player_color
|
||||
opponent_action = self.opponent_policy(curr_state, prev_state, prev_action)
|
||||
return curr_state.act(opponent_action)
|
||||
|
||||
@property
|
||||
def _state(self):
|
||||
return self.state
|
||||
|
||||
def _reset_opponent(self, board):
|
||||
if self.opponent == 'random':
|
||||
self.opponent_policy = random_policy
|
||||
elif self.opponent == 'pachi:uct:_2400':
|
||||
self.opponent_policy = make_pachi_policy(board=board, engine_type='uct', pachi_timestr='_2400') # TODO: strength as argument
|
||||
else:
|
||||
raise error.Error('Unrecognized opponent policy {}'.format(self.opponent))
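The action encoding used by GoEnv above (one integer per board position, then pass, then resign) can be illustrated without pachi_py. The following is a minimal sketch under stated assumptions: `action_to_move` and `move_to_action` are hypothetical helper names that do not exist in the source, and labelling the (i, j) pair passed to `ij_to_coord` as (row, col) is an assumption made for readability.

```python
def action_to_move(board_size, action):
    """Decode a flat action index (hypothetical helper, not part of gym).

    Mirrors the arithmetic in _action_to_coord: indices below board_size**2
    are board positions, board_size**2 is pass, board_size**2 + 1 is resign.
    """
    n = board_size ** 2
    if not 0 <= action < n + 2:
        raise ValueError('action out of range: {}'.format(action))
    if action == n:
        return ('pass',)
    if action == n + 1:
        return ('resign',)
    # Same split as ij_to_coord(a // size, a % size) in the source.
    return ('play', action // board_size, action % board_size)


def move_to_action(board_size, row, col):
    """Encode a board position as a flat, row-major action index."""
    return row * board_size + col


if __name__ == '__main__':
    size = 9
    for a in (0, 31, size ** 2, size ** 2 + 1):
        print(a, action_to_move(size, a))
```

This matches the size of the action space declared in the environment, `spaces.Discrete(board_size**2 + 2)`.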
|
5
gym/envs/classic_control/__init__.py
Normal file
@@ -0,0 +1,5 @@
|
||||
from gym.envs.classic_control.cartpole import CartPoleEnv
|
||||
from gym.envs.classic_control.mountain_car import MountainCarEnv
|
||||
from gym.envs.classic_control.pendulum import PendulumEnv
|
||||
from gym.envs.classic_control.acrobot import AcrobotEnv
|
||||
|
288
gym/envs/classic_control/acrobot.py
Normal file
@@ -0,0 +1,288 @@
|
||||
"""classic Acrobot task"""
|
||||
from gym import core, spaces
|
||||
import numpy as np
|
||||
import time
|
||||
|
||||
__copyright__ = "Copyright 2013, RLPy http://acl.mit.edu/RLPy"
|
||||
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
|
||||
"William Dabney", "Jonathan P. How"]
|
||||
__license__ = "BSD 3-Clause"
|
||||
__author__ = "Christoph Dann <cdann@cdann.de>"
|
||||
|
||||
# SOURCE:
|
||||
# https://github.com/rlpy/rlpy/blob/master/rlpy/Domains/Acrobot.py
|
||||
|
||||
class AcrobotEnv(core.Env):
|
||||
|
||||
"""
|
||||
Acrobot is a 2-link pendulum with only the second joint actuated
|
||||
Initially, both links point downwards. The goal is to swing the
|
||||
end-effector at a height at least the length of one link above the base.
|
||||
Both links can swing freely and can pass by each other, i.e., they don't
|
||||
collide when they have the same angle.
|
||||
**STATE:**
|
||||
The state consists of the two rotational joint angles and their velocities
|
||||
[theta1 theta2 thetaDot1 thetaDot2]. An angle of 0 corresponds
|
||||
to the respective link pointing downwards (angles are in world coordinates).
|
||||
**ACTIONS:**
|
||||
The action is either applying +1, 0 or -1 torque on the joint between
|
||||
the two pendulum links.
|
||||
.. note::
|
||||
The dynamics equations were missing some terms in the NIPS paper which
|
||||
are present in the book. R. Sutton confirmed in personal correspondence
|
||||
that the experimental results shown in the paper and the book were
|
||||
generated with the equations shown in the book.
|
||||
However, there is the option to run the domain with the paper equations
|
||||
by setting book_or_nips = 'nips'
|
||||
**REFERENCE:**
|
||||
.. seealso::
|
||||
R. Sutton: Generalization in Reinforcement Learning:
|
||||
Successful Examples Using Sparse Coarse Coding (NIPS 1996)
|
||||
.. seealso::
|
||||
R. Sutton and A. G. Barto:
|
||||
Reinforcement learning: An introduction.
|
||||
Cambridge: MIT press, 1998.
|
||||
.. warning::
|
||||
This version of the domain uses the Runge-Kutta method for integrating
|
||||
the system dynamics and is more realistic, but also considerably harder
|
||||
than the original version which employs Euler integration,
|
||||
see the AcrobotLegacy class.
|
||||
"""
|
||||
|
||||
metadata = {
|
||||
'render.modes': ['human', 'rgb_array'],
|
||||
'video.frames_per_second' : 15
|
||||
}
|
||||
|
||||
dt = .2
|
||||
|
||||
LINK_LENGTH_1 = 1. # [m]
|
||||
LINK_LENGTH_2 = 1. # [m]
|
||||
LINK_MASS_1 = 1. #: [kg] mass of link 1
|
||||
LINK_MASS_2 = 1. #: [kg] mass of link 2
|
||||
LINK_COM_POS_1 = 0.5 #: [m] position of the center of mass of link 1
|
||||
LINK_COM_POS_2 = 0.5 #: [m] position of the center of mass of link 2
|
||||
LINK_MOI = 1. #: moments of inertia for both links
|
||||
|
||||
MAX_VEL_1 = 4 * np.pi
|
||||
MAX_VEL_2 = 9 * np.pi
|
||||
|
||||
AVAIL_TORQUE = [-1., 0., +1]
|
||||
|
||||
torque_noise_max = 0.
|
||||
|
||||
#: use dynamics equations from the nips paper or the book
|
||||
book_or_nips = "book"
|
||||
action_arrow = None
|
||||
domain_fig = None
|
||||
actions_num = 3
|
||||
|
||||
def __init__(self):
|
||||
high = np.array([np.pi, np.pi, self.MAX_VEL_1, self.MAX_VEL_2])
|
||||
low = -high
|
||||
self.observation_space = spaces.Box(low, high)
|
||||
self.action_space = spaces.Discrete(3)
|
||||
self.viewer = None
|
||||
|
||||
def _reset(self):
|
||||
self.state = np.random.uniform(low=-0.1, high=0.1, size=(4,))
|
||||
return self.state
|
||||
|
||||
def _step(self, a):
|
||||
s = self.state
|
||||
torque = self.AVAIL_TORQUE[a]
|
||||
|
||||
# Add noise to the force action
|
||||
if self.torque_noise_max > 0:
|
||||
torque += np.random.uniform(-self.torque_noise_max, self.torque_noise_max)
|
||||
|
||||
# Now, augment the state with our force action so it can be passed to
|
||||
# _dsdt
|
||||
s_augmented = np.append(s, torque)
|
||||
|
||||
ns = rk4(self._dsdt, s_augmented, [0, self.dt])
|
||||
# only care about final timestep of integration returned by integrator
|
||||
ns = ns[-1]
|
||||
ns = ns[:4] # omit action
|
||||
# ODEINT IS TOO SLOW!
|
||||
# ns_continuous = integrate.odeint(self._dsdt, self.s_continuous, [0, self.dt])
|
||||
# self.s_continuous = ns_continuous[-1] # We only care about the state
|
||||
# at the ''final timestep'', self.dt
|
||||
|
||||
ns[0] = wrap(ns[0], -np.pi, np.pi)
|
||||
ns[1] = wrap(ns[1], -np.pi, np.pi)
|
||||
ns[2] = bound(ns[2], -self.MAX_VEL_1, self.MAX_VEL_1)
|
||||
ns[3] = bound(ns[3], -self.MAX_VEL_2, self.MAX_VEL_2)
|
||||
self.state = ns.copy()
|
||||
terminal = self._terminal()
|
||||
reward = -1. if not terminal else 0.
|
||||
return (np.array(self.state), reward, terminal, {})
|
||||
|
||||
def _terminal(self):
|
||||
s = self.state
|
||||
return bool(-np.cos(s[0]) - np.cos(s[1] + s[0]) > 1.)
|
||||
|
||||
def _dsdt(self, s_augmented, t):
|
||||
m1 = self.LINK_MASS_1
|
||||
m2 = self.LINK_MASS_2
|
||||
l1 = self.LINK_LENGTH_1
|
||||
lc1 = self.LINK_COM_POS_1
|
||||
lc2 = self.LINK_COM_POS_2
|
||||
I1 = self.LINK_MOI
|
||||
I2 = self.LINK_MOI
|
||||
g = 9.8
|
||||
a = s_augmented[-1]
|
||||
s = s_augmented[:-1]
|
||||
theta1 = s[0]
|
||||
theta2 = s[1]
|
||||
dtheta1 = s[2]
|
||||
dtheta2 = s[3]
|
||||
d1 = m1 * lc1 ** 2 + m2 * \
|
||||
(l1 ** 2 + lc2 ** 2 + 2 * l1 * lc2 * np.cos(theta2)) + I1 + I2
|
||||
d2 = m2 * (lc2 ** 2 + l1 * lc2 * np.cos(theta2)) + I2
|
||||
phi2 = m2 * lc2 * g * np.cos(theta1 + theta2 - np.pi / 2.)
|
||||
phi1 = - m2 * l1 * lc2 * dtheta2 ** 2 * np.sin(theta2) \
|
||||
- 2 * m2 * l1 * lc2 * dtheta2 * dtheta1 * np.sin(theta2) \
|
||||
+ (m1 * lc1 + m2 * l1) * g * np.cos(theta1 - np.pi / 2) + phi2
|
||||
if self.book_or_nips == "nips":
|
||||
# the following line is consistent with the description in the
|
||||
# paper
|
||||
ddtheta2 = (a + d2 / d1 * phi1 - phi2) / \
|
||||
(m2 * lc2 ** 2 + I2 - d2 ** 2 / d1)
|
||||
else:
|
||||
# the following line is consistent with the java implementation and the
|
||||
# book
|
||||
ddtheta2 = (a + d2 / d1 * phi1 - m2 * l1 * lc2 * dtheta1 ** 2 * np.sin(theta2) - phi2) \
|
||||
/ (m2 * lc2 ** 2 + I2 - d2 ** 2 / d1)
|
||||
ddtheta1 = -(d2 * ddtheta2 + phi1) / d1
|
||||
return (dtheta1, dtheta2, ddtheta1, ddtheta2, 0.)
|
||||
|
||||
def _render(self, mode='human', close=False):
|
||||
from gym.envs.classic_control import rendering
|
||||
if close:
|
||||
if self.viewer is not None:
|
||||
self.viewer.close()
|
||||
return
|
||||
|
||||
s = self.state
|
||||
|
||||
if self.viewer is None:
|
||||
self.viewer = rendering.Viewer(500,500)
|
||||
self.viewer.set_bounds(-2.2,2.2,-2.2,2.2)
|
||||
|
||||
p1 = [-self.LINK_LENGTH_1 *
|
||||
np.cos(s[0]), self.LINK_LENGTH_1 * np.sin(s[0])]
|
||||
|
||||
p2 = [p1[0] - self.LINK_LENGTH_2 * np.cos(s[0] + s[1]),
|
||||
p1[1] + self.LINK_LENGTH_2 * np.sin(s[0] + s[1])]
|
||||
|
||||
xys = np.array([[0,0], p1, p2])[:,::-1]
|
||||
thetas = [s[0]-np.pi/2, s[0]+s[1]-np.pi/2]
|
||||
|
||||
self.viewer.draw_line((-2.2, 1), (2.2, 1))
|
||||
for ((x,y),th) in zip(xys, thetas):
|
||||
l,r,t,b = 0, 1, .1, -.1
|
||||
jtransform = rendering.Transform(rotation=th, translation=(x,y))
|
||||
link = self.viewer.draw_polygon([(l,b), (l,t), (r,t), (r,b)])
|
||||
link.add_attr(jtransform)
|
||||
link.set_color(0,.8, .8)
|
||||
circ = self.viewer.draw_circle(.1)
|
||||
circ.set_color(.8, .8, 0)
|
||||
circ.add_attr(jtransform)
|
||||
|
||||
self.viewer.render()
|
||||
if mode == 'rgb_array':
|
||||
return self.viewer.get_array()
|
||||
elif mode == 'human':
|
||||
pass
|
||||
|
||||
def wrap(x, m, M):
|
||||
"""
|
||||
:param x: a scalar
|
||||
:param m: minimum possible value in range
|
||||
:param M: maximum possible value in range
|
||||
Wraps ``x`` so m <= x <= M; but unlike ``bound()`` which
|
||||
truncates, ``wrap()`` wraps x around the coordinate system defined by m,M.
|
||||
For example, m = -180, M = 180 (degrees), x = 360 --> returns 0.
|
||||
"""
|
||||
diff = M - m
|
||||
while x > M:
|
||||
x = x - diff
|
||||
while x < m:
|
||||
x = x + diff
|
||||
return x
|
||||
|
||||
def bound(x, m, M=None):
|
||||
"""
|
||||
:param x: scalar
|
||||
Either have m as scalar, so bound(x,m,M) which returns m <= x <= M *OR*
|
||||
have m as length 2 vector, bound(x,m, <IGNORED>) returns m[0] <= x <= m[1].
|
||||
"""
|
||||
if M is None:
|
||||
M = m[1]
|
||||
m = m[0]
|
||||
# bound x between min (m) and Max (M)
|
||||
return min(max(x, m), M)
|
||||
|
||||
|
||||
def rk4(derivs, y0, t, *args, **kwargs):
|
||||
"""
|
||||
Integrate 1D or ND system of ODEs using 4-th order Runge-Kutta.
|
||||
This is a toy implementation which may be useful if you find
|
||||
yourself stranded on a system w/o scipy. Otherwise use
|
||||
:func:`scipy.integrate`.
|
||||
*y0*
|
||||
initial state vector
|
||||
*t*
|
||||
sample times
|
||||
*derivs*
|
||||
returns the derivative of the system and has the
|
||||
signature ``dy = derivs(yi, ti)``
|
||||
*args*
|
||||
additional arguments passed to the derivative function
|
||||
*kwargs*
|
||||
additional keyword arguments passed to the derivative function
|
||||
Example 1 ::
|
||||
## 2D system
|
||||
def derivs6(x,t):
|
||||
d1 = x[0] + 2*x[1]
|
||||
d2 = -3*x[0] + 4*x[1]
|
||||
return (d1, d2)
|
||||
dt = 0.0005
|
||||
t = arange(0.0, 2.0, dt)
|
||||
y0 = (1,2)
|
||||
yout = rk4(derivs6, y0, t)
|
||||
Example 2::
|
||||
## 1D system
|
||||
alpha = 2
|
||||
def derivs(x,t):
|
||||
return -alpha*x + exp(-t)
|
||||
y0 = 1
|
||||
yout = rk4(derivs, y0, t)
|
||||
If you have access to scipy, you should probably be using the
|
||||
scipy.integrate tools rather than this function.
|
||||
"""
|
||||
|
||||
try:
|
||||
Ny = len(y0)
|
||||
except TypeError:
|
||||
yout = np.zeros((len(t),), np.float64)
|
||||
else:
|
||||
yout = np.zeros((len(t), Ny), np.float64)
|
||||
|
||||
yout[0] = y0
|
||||
i = 0
|
||||
|
||||
for i in np.arange(len(t) - 1):
|
||||
|
||||
thist = t[i]
|
||||
dt = t[i + 1] - thist
|
||||
dt2 = dt / 2.0
|
||||
y0 = yout[i]
|
||||
|
||||
k1 = np.asarray(derivs(y0, thist, *args, **kwargs))
|
||||
k2 = np.asarray(derivs(y0 + dt2 * k1, thist + dt2, *args, **kwargs))
|
||||
k3 = np.asarray(derivs(y0 + dt2 * k2, thist + dt2, *args, **kwargs))
|
||||
k4 = np.asarray(derivs(y0 + dt * k3, thist + dt, *args, **kwargs))
|
||||
yout[i + 1] = y0 + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
|
||||
return yout
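The `rk4` routine above applies the classical fourth-order Runge-Kutta update between consecutive sample times; the Acrobot `_step` calls it with `t = [0, self.dt]`, which amounts to a single step of 0.2 s. Below is a minimal standalone sketch of the same update rule, checked against an ODE with a known solution. The names `rk4_step`, `integrate`, and `decay` are hypothetical and chosen for this illustration only.

```python
import math
import numpy as np


def rk4_step(derivs, y, t, dt):
    """Advance y by one classical Runge-Kutta step of size dt."""
    half = dt / 2.0
    k1 = np.asarray(derivs(y, t), dtype=float)
    k2 = np.asarray(derivs(y + half * k1, t + half), dtype=float)
    k3 = np.asarray(derivs(y + half * k2, t + half), dtype=float)
    k4 = np.asarray(derivs(y + dt * k3, t + dt), dtype=float)
    return y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(derivs, y0, t0, t1, steps):
    """Integrate from t0 to t1 using a fixed number of equal RK4 steps."""
    y = np.asarray(y0, dtype=float)
    dt = (t1 - t0) / steps
    for n in range(steps):
        y = rk4_step(derivs, y, t0 + n * dt, dt)
    return y


def decay(y, t):
    """dy/dt = -y, whose exact solution is y(t) = y(0) * exp(-t)."""
    return -y


if __name__ == '__main__':
    y_end = integrate(decay, [1.0], 0.0, 1.0, steps=10)
    # Compare the numerical result with the exact value exp(-1).
    print(y_end[0], math.exp(-1.0))
```

Comparing against a closed-form solution like this is a cheap way to sanity-check an integrator before trusting it on the Acrobot dynamics, which have no such closed form.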
|
BIN
gym/envs/classic_control/assets/clockwise.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 6.8 KiB |
118
gym/envs/classic_control/cartpole.py
Normal file
@@ -0,0 +1,118 @@
|
||||
"""
|
||||
Classic cart-pole system implemented by Rich Sutton et al.
|
||||
Copied from https://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c
|
||||
"""
|
||||
|
||||
import math
|
||||
import gym
|
||||
from gym import spaces
|
||||
import numpy as np
|
||||
|
||||
class CartPoleEnv(gym.Env):
|
||||
metadata = {
|
||||
'render.modes': ['human', 'rgb_array'],
|
||||
'video.frames_per_second' : 50
|
||||
}
|
||||
|
||||
def __init__(self):
|
||||
self.gravity = 9.8
|
||||
self.masscart = 1.0
|
||||
self.masspole = 0.1
|
||||
self.total_mass = (self.masspole + self.masscart)
|
||||
self.length = 0.5 # actually half the pole's length
|
||||
self.polemass_length = (self.masspole * self.length)
|
||||
self.force_mag = 10.0
|
||||
self.tau = 0.02 # seconds between state updates
|
||||
|
||||
# Angle at which to fail the episode
|
||||
self.theta_threshold_radians = 12 * 2 * math.pi / 360
|
||||
self.x_threshold = 2.4
|
||||
self.reset()
|
||||
self.viewer = None
|
||||
|
||||
high = np.array([self.x_threshold, np.inf, self.theta_threshold_radians, np.inf])
|
||||
self.action_space = spaces.Discrete(2)
|
||||
self.observation_space = spaces.Box(-high, high)
|
||||
|
||||
def _step(self, action):
|
||||
|
||||
assert action==0 or action==1, "%r (%s) invalid"%(action, type(action))
|
||||
state = self.state
|
||||
x, x_dot, theta, theta_dot = state
|
||||
force = self.force_mag if action==1 else -self.force_mag
|
||||
costheta = math.cos(theta)
|
||||
sintheta = math.sin(theta)
|
||||
temp = (force + self.polemass_length * theta_dot * theta_dot * sintheta) / self.total_mass
|
||||
thetaacc = (self.gravity * sintheta - costheta* temp) / (self.length * (4.0/3.0 - self.masspole * costheta * costheta / self.total_mass))
|
||||
xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass
|
||||
x = x + self.tau * x_dot
|
||||
x_dot = x_dot + self.tau * xacc
|
||||
theta = theta + self.tau * theta_dot
|
||||
theta_dot = theta_dot + self.tau * thetaacc
|
||||
self.state = (x,x_dot,theta,theta_dot)
|
||||
done = x < -self.x_threshold \
|
||||
or x > self.x_threshold \
|
||||
or theta < -self.theta_threshold_radians \
|
||||
or theta > self.theta_threshold_radians
|
||||
done = bool(done)
|
||||
reward = 1.0
|
||||
return np.array(self.state), reward, done, {}
|
||||
|
||||
def _reset(self):
|
||||
self.state = np.random.uniform(low=-0.05, high=0.05, size=(4,))
|
||||
return np.array(self.state)
|
||||
|
||||
def _render(self, mode='human', close=False):
|
||||
if close:
|
||||
if self.viewer is not None:
|
||||
self.viewer.close()
|
||||
return
|
||||
|
||||
screen_width = 600
|
||||
screen_height = 400
|
||||
|
||||
world_width = self.x_threshold*2
|
||||
scale = screen_width/world_width
|
||||
carty = 100 # TOP OF CART
|
||||
polewidth = 10.0
|
||||
polelen = scale * 1.0
|
||||
cartwidth = 50.0
|
||||
cartheight = 30.0
|
||||
|
||||
if self.viewer is None:
|
||||
from gym.envs.classic_control import rendering
|
||||
self.viewer = rendering.Viewer(screen_width, screen_height)
|
||||
l,r,t,b = -cartwidth/2, cartwidth/2, cartheight/2, -cartheight/2
|
||||
axleoffset =cartheight/4.0
|
||||
cart = rendering.FilledPolygon([(l,b), (l,t), (r,t), (r,b)])
|
||||
self.carttrans = rendering.Transform()
|
||||
cart.add_attr(self.carttrans)
|
||||
self.viewer.add_geom(cart)
|
||||
l,r,t,b = -polewidth/2,polewidth/2,polelen-polewidth/2,-polewidth/2
|
||||
pole = rendering.FilledPolygon([(l,b), (l,t), (r,t), (r,b)])
|
||||
pole.set_color(.8,.6,.4)
|
||||
self.poletrans = rendering.Transform(translation=(0, axleoffset))
|
||||
pole.add_attr(self.poletrans)
|
||||
pole.add_attr(self.carttrans)
|
||||
self.viewer.add_geom(pole)
|
||||
self.axle = rendering.make_circle(polewidth/2)
|
||||
self.axle.add_attr(self.poletrans)
|
||||
self.axle.add_attr(self.carttrans)
|
||||
self.axle.set_color(.5,.5,.8)
|
||||
self.viewer.add_geom(self.axle)
|
||||
self.track = rendering.Line((0,carty), (screen_width,carty))
|
||||
self.track.set_color(0,0,0)
|
||||
self.viewer.add_geom(self.track)
|
||||
|
||||
x = self.state
|
||||
cartx = x[0]*scale+screen_width/2.0 # MIDDLE OF CART
|
||||
self.carttrans.set_translation(cartx, carty)
|
||||
self.poletrans.set_rotation(-x[2])
|
||||
|
||||
self.viewer.render()
|
||||
if mode == 'rgb_array':
|
||||
return self.viewer.get_array()
|
||||
elif mode == 'human':
|
||||
pass
|
||||
else:
|
||||
return super(CartPoleEnv, self).render(mode=mode)
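The physics update in `CartPoleEnv._step` above is a single explicit Euler step of length `tau`. The sketch below restates it as a pure function so it can be exercised without gym or a renderer. The constants are copied from the constructor above; the function name `cartpole_step` and the upper-case constant names are hypothetical.

```python
import math

# Constants copied from CartPoleEnv.__init__ above.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5  # half the pole's length
POLEMASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0
TAU = 0.02  # seconds between state updates


def cartpole_step(state, action):
    """Return the next (x, x_dot, theta, theta_dot) after one Euler step."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Positions use the old velocities, as in the source.
    return (x + TAU * x_dot,
            x_dot + TAU * x_acc,
            theta + TAU * theta_dot,
            theta_dot + TAU * theta_acc)


if __name__ == '__main__':
    print(cartpole_step((0.0, 0.0, 0.0, 0.0), 1))
```

Starting from rest, pushing right (action 1) accelerates the cart in the positive direction and gives the pole a negative angular velocity; positions only change on the following step because they are updated from the previous velocities.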
|
119
gym/envs/classic_control/mountain_car.py
Normal file
@@ -0,0 +1,119 @@
|
||||
"""
|
||||
https://webdocs.cs.ualberta.ca/~sutton/MountainCar/MountainCar1.cp
|
||||
"""
|
||||
|
||||
import math
|
||||
import gym
|
||||
from gym import spaces
|
||||
import numpy as np
|
||||
|
||||
class MountainCarEnv(gym.Env):
|
||||
metadata = {
|
||||
'render.modes': ['human', 'rgb_array'],
|
||||
'video.frames_per_second': 30
|
||||
}
|
||||
|
||||
def __init__(self):
|
||||
|
||||
self.viewer = None
|
||||
self.reset()
|
||||
|
||||
self.min_position = -1.2
|
||||
self.max_position = 0.6
|
||||
self.max_speed = 0.07
|
||||
self.goal_position = 0.5
|
||||
|
||||
self.low = np.array([self.min_position, -self.max_speed])
|
||||
self.high = np.array([self.max_position, self.max_speed])
|
||||
|
||||
self.action_space = spaces.Discrete(3)
|
||||
self.observation_space = spaces.Box(self.low, self.high)
|
||||
|
||||
def _step(self, action):
|
||||
# action = np.sign((self.state[0]+math.pi/2) * self.state[1])+1
|
||||
position, velocity = self.state
|
||||
velocity += (action-1)*0.001 + math.cos(3*position)*(-0.0025)
|
||||
if (velocity > self.max_speed): velocity = self.max_speed
|
||||
if (velocity < -self.max_speed): velocity = -self.max_speed
|
||||
position += velocity
|
||||
if (position > self.max_position): position = self.max_position
|
||||
if (position < self.min_position): position = self.min_position
|
||||
if (position==self.min_position and velocity<0): velocity = 0
|
||||
|
||||
done = bool(position >= self.goal_position)
|
||||
reward = -1.0
|
||||
|
||||
self.state = (position, velocity)
|
||||
return np.array(self.state), reward, done, {}
|
||||
|
||||
def _reset(self):
|
||||
self.state = np.array([np.random.uniform(low=-0.6, high=-0.4), 0])
|
||||
return np.array(self.state)
|
||||
|
||||
def _height(self, xs):
|
||||
return np.sin(3 * xs)*.45+.55
|
||||
|
||||
def _render(self, mode='human', close=False):
|
||||
if close:
|
||||
if self.viewer is not None:
|
||||
self.viewer.close()
|
||||
return
|
||||
|
||||
screen_width = 600
|
||||
screen_height = 400
|
||||
|
||||
world_width = self.max_position - self.min_position
|
||||
scale = screen_width/world_width
|
||||
carwidth=40
|
||||
carheight=20
|
||||
|
||||
|
||||
if self.viewer is None:
|
||||
from gym.envs.classic_control import rendering
|
||||
self.viewer = rendering.Viewer(screen_width, screen_height)
|
||||
xs = np.linspace(self.min_position, self.max_position, 100)
|
||||
ys = self._height(xs)
|
||||
xys = zip((xs-self.min_position)*scale, ys*scale)
|
||||
|
||||
self.track = rendering.make_polyline(xys)
|
||||
self.track.set_linewidth(4)
|
||||
self.viewer.add_geom(self.track)
|
||||
|
||||
clearance = 10
|
||||
|
||||
l,r,t,b = -carwidth/2, carwidth/2, carheight, 0
|
||||
car = rendering.FilledPolygon([(l,b), (l,t), (r,t), (r,b)])
|
||||
car.add_attr(rendering.Transform(translation=(0, clearance)))
|
||||
self.cartrans = rendering.Transform()
|
||||
car.add_attr(self.cartrans)
|
||||
self.viewer.add_geom(car)
|
||||
frontwheel = rendering.make_circle(carheight/2.5)
|
||||
frontwheel.set_color(.5, .5, .5)
|
||||
frontwheel.add_attr(rendering.Transform(translation=(carwidth/4,clearance)))
|
||||
frontwheel.add_attr(self.cartrans)
|
||||
self.viewer.add_geom(frontwheel)
|
||||
backwheel = rendering.make_circle(carheight/2.5)
|
||||
backwheel.add_attr(rendering.Transform(translation=(-carwidth/4,clearance)))
|
||||
backwheel.add_attr(self.cartrans)
|
||||
backwheel.set_color(.5, .5, .5)
|
||||
self.viewer.add_geom(backwheel)
|
||||
flagx = (self.goal_position-self.min_position)*scale
|
||||
flagy1 = self._height(self.goal_position)*scale
|
||||
flagy2 = flagy1 + 50
|
||||
flagpole = rendering.Line((flagx, flagy1), (flagx, flagy2))
|
||||
self.viewer.add_geom(flagpole)
|
||||
flag = rendering.FilledPolygon([(flagx, flagy2), (flagx, flagy2-10), (flagx+25, flagy2-5)])
|
||||
flag.set_color(.8,.8,0)
|
||||
self.viewer.add_geom(flag)
|
||||
|
||||
pos = self.state[0]
|
||||
self.cartrans.set_translation((pos-self.min_position)*scale, self._height(pos)*scale)
|
||||
self.cartrans.set_rotation(math.cos(3 * pos))
|
||||
|
||||
self.viewer.render()
|
||||
if mode == 'rgb_array':
|
||||
return self.viewer.get_array()
|
||||
elif mode == 'human':
|
||||
pass
|
||||
else:
|
||||
return super(MountainCarEnv, self).render(mode=mode)
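The update in `MountainCarEnv._step` above combines a small engine force, a position-dependent gravity term, and clipping at the speed limit and track ends. Here is a minimal sketch of that update as a pure function, using the constants from the constructor above. The name `mountain_car_step` and the upper-case constants are hypothetical.

```python
import math

# Constants copied from MountainCarEnv.__init__ above.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5


def mountain_car_step(position, velocity, action):
    """Apply one step; action is 0 (push left), 1 (no push) or 2 (push right)."""
    velocity += (action - 1) * 0.001 + math.cos(3 * position) * (-0.0025)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)
    # Hitting the left wall while moving left stops the car.
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    done = position >= GOAL_POSITION
    return position, velocity, done


if __name__ == '__main__':
    pos, vel = -0.5, 0.0
    for _ in range(5):
        pos, vel, done = mountain_car_step(pos, vel, 2)
        print(pos, vel, done)
```

Note that the engine term (at most 0.001 per step) is smaller than the peak gravity term (0.0025 per step), which is what makes the task a momentum-building problem rather than a direct climb.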
|
89
gym/envs/classic_control/pendulum.py
Normal file
@@ -0,0 +1,89 @@
|
||||
import gym
|
||||
from gym import spaces
|
||||
import numpy as np
|
||||
from os import path
|
||||
|
||||
class PendulumEnv(gym.Env):
|
||||
metadata = {
|
||||
'render.modes' : ['human', 'rgb_array'],
|
||||
'video.frames_per_second' : 30
|
||||
}
|
||||
|
||||
def __init__(self):
|
||||
self.max_speed=8
|
||||
self.max_torque=2.
|
||||
self.dt=.05
|
||||
self.viewer = None
|
||||
|
||||
high = np.array([1., 1., self.max_speed])
|
||||
self.action_space = spaces.Box(low=-self.max_torque, high=self.max_torque, shape=(1,))
|
||||
self.observation_space = spaces.Box(low=-high, high=high)
|
||||
|
||||
def _step(self,u):
|
||||
th, thdot = self.state # th := theta
|
||||
|
||||
g = 10.
|
||||
m = 1.
|
||||
l = 1.
|
||||
dt = self.dt
|
||||
|
||||
self.last_u = u # for rendering
|
||||
u = np.clip(u, -self.max_torque, self.max_torque)[0]
|
||||
costs = angle_normalize(th)**2 + .1*thdot**2 + .001*(u**2)
|
||||
|
||||
newthdot = thdot + (-3*g/(2*l) * np.sin(th + np.pi) + 3./(m*l**2)*u) * dt
|
||||
newth = th + newthdot*dt
|
||||
newthdot = np.clip(newthdot, -self.max_speed, self.max_speed) #pylint: disable=E1111
|
||||
|
||||
self.state = np.array([newth, newthdot])
|
||||
return self._get_obs(), -costs, False, {}
|
||||
|
||||
def _reset(self):
|
||||
high = np.array([np.pi, 1])
|
||||
self.state = np.random.uniform(low=-high, high=high)
|
||||
self.last_u = None
|
||||
return self._get_obs()
|
||||
|
||||
def _get_obs(self):
|
||||
theta, thetadot = self.state
|
||||
return np.array([np.cos(theta), np.sin(theta), thetadot])
|
||||
|
||||
def _render(self, mode='human', close=False):
|
||||
if close:
|
||||
if self.viewer is not None:
|
||||
self.viewer.close()
|
||||
return
|
||||
|
||||
if self.viewer is None:
|
||||
from gym.envs.classic_control import rendering
|
||||
self.viewer = rendering.Viewer(500,500)
|
||||
self.viewer.set_bounds(-2.2,2.2,-2.2,2.2)
|
||||
rod = rendering.make_capsule(1, .2)
|
||||
rod.set_color(.8, .3, .3)
|
||||
self.pole_transform = rendering.Transform()
|
||||
rod.add_attr(self.pole_transform)
|
||||
self.viewer.add_geom(rod)
|
||||
axle = rendering.make_circle(.05)
|
||||
axle.set_color(0,0,0)
|
||||
self.viewer.add_geom(axle)
|
||||
fname = path.join(path.dirname(__file__), "assets/clockwise.png")
|
||||
self.img = rendering.Image(fname, 1., 1.)
|
||||
self.imgtrans = rendering.Transform()
|
||||
self.img.add_attr(self.imgtrans)
|
||||
|
||||
self.viewer.add_onetime(self.img)
|
||||
self.pole_transform.set_rotation(self.state[0] + np.pi/2)
|
||||
if self.last_u:
|
||||
self.imgtrans.scale = (-self.last_u/2, np.abs(self.last_u)/2)
|
||||
|
||||
|
||||
self.viewer.render()
|
||||
if mode == 'rgb_array':
|
||||
return self.viewer.get_array()
|
||||
elif mode == 'human':
|
||||
pass
|
||||
else:
|
||||
return super(PendulumEnv, self).render(mode=mode)
|
||||
|
||||
def angle_normalize(x):
|
||||
return (((x+np.pi) % (2*np.pi)) - np.pi)
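`PendulumEnv` above keeps an unbounded angle in its internal state, but exposes only `cos(theta)` and `sin(theta)` in the observation and wraps the angle with `angle_normalize` when computing the cost. The sketch below isolates those pieces so the wrapping behaviour can be checked directly. `angle_normalize` is copied from the source; `observe` and `cost` are hypothetical names for logic that lives inside `_get_obs` and `_step`.

```python
import numpy as np


def angle_normalize(x):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return ((x + np.pi) % (2 * np.pi)) - np.pi


def observe(theta, theta_dot):
    """Observation as built in _get_obs: the angle is encoded on the unit circle."""
    return np.array([np.cos(theta), np.sin(theta), theta_dot])


def cost(theta, theta_dot, torque):
    """Quadratic cost from _step; the reward returned there is its negative."""
    return angle_normalize(theta) ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2


if __name__ == '__main__':
    for theta in (0.0, np.pi / 2, 3 * np.pi / 2, 2 * np.pi):
        print(theta, angle_normalize(theta), cost(theta, 0.0, 0.0))
```

Because of the wrapping, a full extra rotation does not change the cost, and the cos/sin encoding gives the agent the same observation for angles that differ by a multiple of 2*pi.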
|
292
gym/envs/classic_control/rendering.py
Normal file
@@ -0,0 +1,292 @@
|
||||
"""
|
||||
2D rendering framework
|
||||
"""
|
||||
from __future__ import division
|
||||
import os, sys
|
||||
if "Apple" in sys.version:
|
||||
if 'DYLD_FALLBACK_LIBRARY_PATH' in os.environ:
|
||||
os.environ['DYLD_FALLBACK_LIBRARY_PATH'] += ':/usr/lib'
|
||||
# (JDS 2016/04/15): avoid bug on Anaconda 2.3.0 / Yosemite
|
||||
|
||||
from gym import error
|
||||
|
||||
import pyglet
|
||||
try:
|
||||
from pyglet.gl import *
|
||||
except ImportError as e:
|
||||
raise error.DependencyNotInstalled("""{} (while running: from pyglet.gl import *).
|
||||
|
||||
(HINT: make sure you have OpenGL installed. On Ubuntu, you can run 'apt-get install python-opengl'. If you're running on a server, you may need a virtual frame buffer; something like this should work: 'xvfb-run -s "-screen 0 1400x900x24" <your script here>')""".format(e))
|
||||
|
||||
import math
|
||||
import numpy as np
|
||||
|
||||
RAD2DEG = 57.29577951308232
|
||||
|
||||
class Viewer(object):
|
||||
def __init__(self, width, height):
|
||||
self.width = width
|
||||
self.height = height
|
||||
self.window = pyglet.window.Window(width=width, height=height)
|
||||
self.geoms = []
|
||||
self.onetime_geoms = []
|
||||
self.transform = Transform()
|
||||
|
||||
glEnable(GL_BLEND)
|
||||
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)
|
||||
|
||||
def close(self):
|
||||
self.window.close()
|
||||
|
||||
def set_bounds(self, left, right, bottom, top):
|
||||
assert right > left and top > bottom
|
||||
scalex = self.width/(right-left)
|
||||
scaley = self.height/(top-bottom)
|
||||
self.transform = Transform(
|
||||
translation=(-left*scalex, -bottom*scaley),
|
||||
scale=(scalex, scaley))
|
||||
|
||||
def add_geom(self, geom):
|
||||
self.geoms.append(geom)
|
||||
|
||||
def add_onetime(self, geom):
|
||||
self.onetime_geoms.append(geom)
|
||||
|
||||
def render(self):
|
||||
glClearColor(1,1,1,1)
|
||||
self.window.clear()
|
||||
self.window.switch_to()
|
||||
self.window.dispatch_events()
|
||||
self.transform.enable()
|
||||
for geom in self.geoms:
|
||||
geom.render()
|
||||
for geom in self.onetime_geoms:
|
||||
geom.render()
|
||||
self.transform.disable()
|
||||
self.window.flip()
|
||||
self.onetime_geoms = []
|
||||
|
||||
# Convenience
|
||||
def draw_circle(self, radius=10, res=30, filled=True, **attrs):
|
||||
geom = make_circle(radius=radius, res=res, filled=filled)
|
||||
_add_attrs(geom, attrs)
|
||||
self.add_onetime(geom)
|
||||
return geom
|
||||
|
||||
def draw_polygon(self, v, filled=True, **attrs):
|
||||
geom = make_polygon(v=v, filled=filled)
|
||||
_add_attrs(geom, attrs)
|
||||
self.add_onetime(geom)
|
||||
return geom
|
||||
|
||||
def draw_polyline(self, v, **attrs):
|
||||
geom = make_polyline(v=v)
|
||||
_add_attrs(geom, attrs)
|
||||
self.add_onetime(geom)
|
||||
return geom
|
||||
|
||||
def draw_line(self, start, end, **attrs):
|
||||
geom = Line(start, end)
|
||||
_add_attrs(geom, attrs)
|
||||
self.add_onetime(geom)
|
||||
return geom
|
||||
|
||||
def get_array(self):
|
||||
self.window.flip()
|
||||
image_data = pyglet.image.get_buffer_manager().get_color_buffer().get_image_data()
|
||||
self.window.flip()
|
||||
arr = np.frombuffer(image_data.data, dtype=np.uint8)
|
||||
arr = arr.reshape(self.height, self.width, 4)
|
||||
return arr[::-1,:,0:3]
|
||||
|
||||
def _add_attrs(geom, attrs):
|
||||
if "color" in attrs:
|
||||
geom.set_color(*attrs["color"])
|
||||
if "linewidth" in attrs:
|
||||
geom.set_linewidth(attrs["linewidth"])
|
||||
|
||||
class Geom(object):
|
||||
def __init__(self):
|
||||
self._color=Color((0, 0, 0, 1.0))
|
||||
self.attrs = [self._color]
|
||||
def render(self):
|
||||
for attr in reversed(self.attrs):
|
||||
attr.enable()
|
||||
self.render1()
|
||||
for attr in self.attrs:
|
||||
attr.disable()
|
||||
def render1(self):
|
||||
raise NotImplementedError
|
||||
def add_attr(self, attr):
|
||||
self.attrs.append(attr)
|
||||
def set_color(self, r, g, b):
|
||||
self._color.vec4 = (r, g, b, 1)
|
||||
|
||||
class Attr(object):
|
||||
def enable(self):
|
||||
raise NotImplementedError
|
||||
def disable(self):
|
||||
pass
|
||||
|
||||
class Transform(Attr):
|
||||
def __init__(self, translation=(0.0, 0.0), rotation=0.0, scale=(1,1)):
|
||||
self.set_translation(*translation)
|
||||
self.set_rotation(rotation)
|
||||
self.set_scale(*scale)
|
||||
def enable(self):
|
||||
glPushMatrix()
|
||||
glTranslatef(self.translation[0], self.translation[1], 0) # translate to GL loc point
|
||||
glRotatef(RAD2DEG * self.rotation, 0, 0, 1.0)
|
||||
glScalef(self.scale[0], self.scale[1], 1)
|
||||
def disable(self):
|
||||
glPopMatrix()
|
||||
def set_translation(self, newx, newy):
|
||||
self.translation = (float(newx), float(newy))
|
||||
def set_rotation(self, new):
|
||||
self.rotation = float(new)
|
||||
def set_scale(self, newx, newy):
|
||||
self.scale = (float(newx), float(newy))
|
||||
|
||||
class Color(Attr):
|
||||
def __init__(self, vec4):
|
||||
self.vec4 = vec4
|
||||
def enable(self):
|
||||
glColor4f(*self.vec4)
|
||||
|
||||
class LineStyle(Attr):
|
||||
def __init__(self, style):
|
||||
self.style = style
|
||||
def enable(self):
|
||||
glEnable(GL_LINE_STIPPLE)
|
||||
glLineStipple(1, self.style)
|
||||
def disable(self):
|
||||
glDisable(GL_LINE_STIPPLE)
|
||||
|
||||
class LineWidth(Attr):
|
||||
def __init__(self, stroke):
|
||||
self.stroke = stroke
|
||||
def enable(self):
|
||||
glLineWidth(self.stroke)
|
||||
|
||||
class Point(Geom):
|
||||
def __init__(self):
|
||||
Geom.__init__(self)
|
||||
def render1(self):
|
||||
glBegin(GL_POINTS) # draw point
|
||||
glVertex3f(0.0, 0.0, 0.0)
|
||||
glEnd()
|
||||
|
||||
class FilledPolygon(Geom):
|
||||
def __init__(self, v):
|
||||
Geom.__init__(self)
|
||||
self.v = v
|
||||
def render1(self):
|
||||
if len(self.v) == 4 : glBegin(GL_QUADS)
|
||||
elif len(self.v) > 4 : glBegin(GL_POLYGON)
|
||||
else: glBegin(GL_TRIANGLES)
|
||||
for p in self.v:
|
||||
glVertex3f(p[0], p[1],0) # draw each vertex
|
||||
glEnd()
|
||||
|
||||
def make_circle(radius=10, res=30, filled=True):
|
||||
points = []
|
||||
for i in range(res):
|
||||
ang = 2*math.pi*i / res
|
||||
points.append((math.cos(ang)*radius, math.sin(ang)*radius))
|
||||
if filled:
|
||||
return FilledPolygon(points)
|
||||
else:
|
||||
return PolyLine(points, True)
|
||||
|
||||
def make_polygon(v, filled=True):
|
||||
if filled: return FilledPolygon(v)
|
||||
else: return PolyLine(v, True)
|
||||
|
||||
def make_polyline(v):
|
||||
return PolyLine(v, False)
|
||||
|
||||
def make_capsule(length, width):
|
||||
l, r, t, b = 0, length, width/2, -width/2
|
||||
box = make_polygon([(l,b), (l,t), (r,t), (r,b)])
|
||||
circ0 = make_circle(width/2)
|
||||
circ1 = make_circle(width/2)
|
||||
circ1.add_attr(Transform(translation=(length, 0)))
|
||||
geom = Compound([box, circ0, circ1])
|
||||
return geom
|
||||
|
||||
class Compound(Geom):
|
||||
def __init__(self, gs):
|
||||
Geom.__init__(self)
|
||||
self.gs = gs
|
||||
for g in self.gs:
|
||||
g.attrs = [a for a in g.attrs if not isinstance(a, Color)]
|
||||
def render1(self):
|
||||
for g in self.gs:
|
||||
g.render()
|
||||
|
||||
class PolyLine(Geom):
|
||||
def __init__(self, v, close):
|
||||
Geom.__init__(self)
|
||||
self.v = v
|
||||
self.close = close
|
||||
self.linewidth = LineWidth(1)
|
||||
self.add_attr(self.linewidth)
|
||||
def render1(self):
|
||||
glBegin(GL_LINE_LOOP if self.close else GL_LINE_STRIP)
|
||||
for p in self.v:
|
||||
glVertex3f(p[0], p[1],0) # draw each vertex
|
||||
glEnd()
|
||||
def set_linewidth(self, x):
|
||||
self.linewidth.stroke = x
|
||||
|
||||
class Line(Geom):
|
||||
def __init__(self, start=(0.0, 0.0), end=(0.0, 0.0)):
|
||||
Geom.__init__(self)
|
||||
self.start = start
|
||||
self.end = end
|
||||
self.linewidth = LineWidth(1)
|
||||
self.add_attr(self.linewidth)
|
||||
|
||||
def render1(self):
|
||||
glBegin(GL_LINES)
|
||||
glVertex2f(*self.start)
|
||||
glVertex2f(*self.end)
|
||||
glEnd()
|
||||
|
||||
class Image(Geom):
|
||||
def __init__(self, fname, width, height):
|
||||
Geom.__init__(self)
|
||||
self.width = width
|
||||
self.height = height
|
||||
img = pyglet.image.load(fname)
|
||||
self.img = img
|
||||
self.flip = False
|
||||
def render1(self):
|
||||
self.img.blit(-self.width/2, -self.height/2, width=self.width, height=self.height)
|
||||
|
||||
# ================================================================
|
||||
|
||||
class SimpleImageViewer(object):
|
||||
def __init__(self):
|
||||
self.window = None
|
||||
self.isopen = False
|
||||
def imshow(self, arr):
|
||||
if self.window is None:
|
||||
height, width, channels = arr.shape
|
||||
self.window = pyglet.window.Window(width=width, height=height)
|
||||
self.width = width
|
||||
self.height = height
|
||||
self.isopen = True
|
||||
assert arr.shape == (self.height, self.width, 3), "You passed in an image with the wrong shape"
|
||||
image = pyglet.image.ImageData(self.width, self.height, 'RGB', arr.tobytes(), pitch=self.width * -3)
|
||||
self.window.clear()
|
||||
self.window.switch_to()
|
||||
self.window.dispatch_events()
|
||||
image.blit(0,0)
|
||||
self.window.flip()
|
||||
def close(self):
|
||||
if self.isopen:
|
||||
self.window.close()
|
||||
self.isopen = False
|
||||
def __del__(self):
|
||||
self.close()
|
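Note on `Viewer.set_bounds` above: the following is a minimal sketch, not part of the commit, of the world-to-pixel mapping that the translation and scale are meant to produce. It assumes zero rotation and that the vertical offset uses the vertical scale; `bounds_to_transform` and `world_to_pixel` are hypothetical helper names, and no OpenGL is involved.

```python
def bounds_to_transform(width, height, left, right, bottom, top):
    """Return (translation, scale) that maps world coordinates to pixels."""
    assert right > left and top > bottom
    scalex = width / float(right - left)
    scaley = height / float(top - bottom)
    # Each axis is offset using its own scale factor.
    translation = (-left * scalex, -bottom * scaley)
    return translation, (scalex, scaley)


def world_to_pixel(point, translation, scale):
    """Scale first, then translate (hypothetical helper; rotation assumed 0)."""
    return (point[0] * scale[0] + translation[0],
            point[1] * scale[1] + translation[1])


# A 600x400 window showing x in [-2.4, 2.4] and y in [-1.0, 3.0].
translation, scale = bounds_to_transform(600, 400, -2.4, 2.4, -1.0, 3.0)
lower_left = world_to_pixel((-2.4, -1.0), translation, scale)
upper_right = world_to_pixel((2.4, 3.0), translation, scale)
```

With these bounds the lower-left world corner should land near pixel (0, 0) and the upper-right corner near (width, height), which is a quick way to check the transform when the two axes have different scales.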
1
gym/envs/mujoco/.gitignore
vendored
Normal file
@@ -0,0 +1 @@
|
||||
mujoco-bundle
|
12
gym/envs/mujoco/__init__.py
Normal file
@@ -0,0 +1,12 @@
|
||||
from gym.envs.mujoco.mujoco_env import MujocoEnv
|
||||
# ^^^^^ so that user gets the correct error
|
||||
# message if mujoco is not installed correctly
|
||||
from gym.envs.mujoco.ant import AntEnv
|
||||
from gym.envs.mujoco.half_cheetah import HalfCheetahEnv
|
||||
from gym.envs.mujoco.hopper import HopperEnv
|
||||
from gym.envs.mujoco.walker2d import Walker2dEnv
|
||||
from gym.envs.mujoco.humanoid import HumanoidEnv
|
||||
from gym.envs.mujoco.inverted_pendulum import InvertedPendulumEnv
|
||||
from gym.envs.mujoco.inverted_double_pendulum import InvertedDoublePendulumEnv
|
||||
from gym.envs.mujoco.reacher import ReacherEnv
|
||||
from gym.envs.mujoco.swimmer import SwimmerEnv
|
46
gym/envs/mujoco/ant.py
Normal file
@@ -0,0 +1,46 @@
|
||||
import numpy as np
|
||||
from gym import utils
|
||||
from gym.envs.mujoco import mujoco_env
|
||||
|
||||
class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
|
||||
def __init__(self):
|
||||
mujoco_env.MujocoEnv.__init__(self, 'ant.xml', 5)
|
||||
utils.EzPickle.__init__(self)
|
||||
self.finalize()
|
||||
|
||||
def _step(self, a):
|
||||
xposbefore = self.get_body_com("torso")[0]
|
||||
self.do_simulation(a, self.frame_skip)
|
||||
xposafter = self.get_body_com("torso")[0]
|
||||
forward_reward = (xposafter - xposbefore)/self.dt
|
||||
ctrl_cost = .5 * np.square(a).sum()
|
||||
contact_cost = 0.5 * 1e-3 * np.sum(
|
||||
np.square(np.clip(self.model.data.cfrc_ext, -1, 1)))
|
||||
survive_reward = 1.0
|
||||
reward = forward_reward - ctrl_cost - contact_cost + survive_reward
|
||||
state = self._state
|
||||
notdone = np.isfinite(state).all() \
|
||||
and state[2] >= 0.2 and state[2] <= 1.0
|
||||
done = not notdone
|
||||
ob = self._get_obs()
|
||||
return ob, reward, done, dict(
|
||||
reward_forward=forward_reward,
|
||||
reward_ctrl=-ctrl_cost,
|
||||
reward_contact=-contact_cost,
|
||||
reward_survive=survive_reward)
|
||||
|
||||
def _get_obs(self):
|
||||
return np.concatenate([
|
||||
self.model.data.qpos.flat[2:],
|
||||
self.model.data.qvel.flat,
|
||||
np.clip(self.model.data.cfrc_ext, -1, 1).flat,
|
||||
])
|
||||
|
||||
def _reset(self):
|
||||
self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
|
||||
self.model.data.qvel = self.init_qvel + np.random.randn(self.model.nv,1)*.1
|
||||
self.reset_viewer_if_necessary()
|
||||
return self._get_obs()
|
||||
|
||||
def viewer_setup(self):
|
||||
self.viewer.cam.distance = self.model.stat.extent * 0.5
|
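Note on `AntEnv._step` above: a standalone sketch of the reward decomposition and termination test, written against plain NumPy arrays so it can be checked without MuJoCo. The function names `ant_reward` and `ant_is_done` are hypothetical, and `dt` is passed in explicitly because its definition lives in `mujoco_env`, which is not shown here.

```python
import numpy as np


def ant_reward(xposbefore, xposafter, dt, action, cfrc_ext):
    """Recompute the four reward terms used in AntEnv._step (sketch)."""
    forward_reward = (xposafter - xposbefore) / dt
    ctrl_cost = 0.5 * np.square(action).sum()
    # Contact forces are clipped to [-1, 1] before being penalised.
    contact_cost = 0.5 * 1e-3 * np.sum(np.square(np.clip(cfrc_ext, -1, 1)))
    survive_reward = 1.0
    total = forward_reward - ctrl_cost - contact_cost + survive_reward
    info = dict(reward_forward=forward_reward,
                reward_ctrl=-ctrl_cost,
                reward_contact=-contact_cost,
                reward_survive=survive_reward)
    return total, info


def ant_is_done(state):
    """Episode ends if the state is non-finite or state[2] leaves [0.2, 1.0]."""
    healthy = np.isfinite(state).all() and 0.2 <= state[2] <= 1.0
    return not healthy


# Example values chosen for illustration only.
total, info = ant_reward(0.0, 0.05, 0.05, np.ones(8),
                         np.array([[5.0, -5.0, 0.5]]))
```

Returning the individual terms alongside the total mirrors the `info` dictionary in `_step` and makes it easier to see which term dominates for a given action.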
80
gym/envs/mujoco/assets/ant.xml
Normal file
@@ -0,0 +1,80 @@
|
||||
<mujoco model="ant">
|
||||
<compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
|
||||
<option integrator="RK4" timestep="0.01"/>
|
||||
<custom>
|
||||
<numeric data="0.0 0.0 0.55 1.0 0.0 0.0 0.0 0.0 1.0 0.0 -1.0 0.0 -1.0 0.0 1.0" name="init_qpos"/>
|
||||
</custom>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="0" condim="3" density="5.0" friction="1 0.5 0.5" margin="0.01" rgba="0.8 0.6 0.4 1"/>
|
||||
</default>
|
||||
<asset>
|
||||
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
|
||||
<body name="torso" pos="0 0 0.75">
|
||||
<geom name="torso_geom" pos="0 0 0" size="0.25" type="sphere"/>
|
||||
<joint armature="0" damping="0" limited="false" margin="0.01" name="root" pos="0 0 0" type="free"/>
|
||||
<body name="front_left_leg" pos="0 0 0">
|
||||
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="aux_1_geom" size="0.08" type="capsule"/>
|
||||
<body name="aux_1" pos="0.2 0.2 0">
|
||||
<joint axis="0 0 1" name="hip_1" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 0.2 0.2 0.0" name="left_leg_geom" size="0.08" type="capsule"/>
|
||||
<body pos="0.2 0.2 0">
|
||||
<joint axis="-1 1 0" name="ankle_1" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 0.4 0.4 0.0" name="left_ankle_geom" size="0.08" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="front_right_leg" pos="0 0 0">
|
||||
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="aux_2_geom" size="0.08" type="capsule"/>
|
||||
<body name="aux_2" pos="-0.2 0.2 0">
|
||||
<joint axis="0 0 1" name="hip_2" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 -0.2 0.2 0.0" name="right_leg_geom" size="0.08" type="capsule"/>
|
||||
<body pos="-0.2 0.2 0">
|
||||
<joint axis="1 1 0" name="ankle_2" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 -0.4 0.4 0.0" name="right_ankle_geom" size="0.08" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="back_leg" pos="0 0 0">
|
||||
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="aux_3_geom" size="0.08" type="capsule"/>
|
||||
<body name="aux_3" pos="-0.2 -0.2 0">
|
||||
<joint axis="0 0 1" name="hip_3" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 -0.2 -0.2 0.0" name="back_leg_geom" size="0.08" type="capsule"/>
|
||||
<body pos="-0.2 -0.2 0">
|
||||
<joint axis="-1 1 0" name="ankle_3" pos="0.0 0.0 0.0" range="-70 -30" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 -0.4 -0.4 0.0" name="third_ankle_geom" size="0.08" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="right_back_leg" pos="0 0 0">
|
||||
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="aux_4_geom" size="0.08" type="capsule"/>
|
||||
<body name="aux_4" pos="0.2 -0.2 0">
|
||||
<joint axis="0 0 1" name="hip_4" pos="0.0 0.0 0.0" range="-30 30" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 0.2 -0.2 0.0" name="rightback_leg_geom" size="0.08" type="capsule"/>
|
||||
<body pos="0.2 -0.2 0">
|
||||
<joint axis="1 1 0" name="ankle_4" pos="0.0 0.0 0.0" range="30 70" type="hinge"/>
|
||||
<geom fromto="0.0 0.0 0.0 0.4 -0.4 0.0" name="fourth_ankle_geom" size="0.08" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_4" gear="150"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_4" gear="150"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_1" gear="150"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_1" gear="150"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_2" gear="150"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_2" gear="150"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="hip_3" gear="150"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" joint="ankle_3" gear="150"/>
|
||||
</actuator>
|
||||
</mujoco>
|
95
gym/envs/mujoco/assets/half_cheetah.xml
Normal file
@@ -0,0 +1,95 @@
|
||||
<!-- Cheetah Model
|
||||
|
||||
The state space is populated with joints in the order that they are
|
||||
defined in this file. The actuators also operate on joints.
|
||||
|
||||
State-Space (name/joint/parameter):
|
||||
- rootx slider position (m)
|
||||
- rootz slider position (m)
|
||||
- rooty hinge angle (rad)
|
||||
- bthigh hinge angle (rad)
|
||||
- bshin hinge angle (rad)
|
||||
- bfoot hinge angle (rad)
|
||||
- fthigh hinge angle (rad)
|
||||
- fshin hinge angle (rad)
|
||||
- ffoot hinge angle (rad)
|
||||
- rootx slider velocity (m/s)
|
||||
- rootz slider velocity (m/s)
|
||||
- rooty hinge angular velocity (rad/s)
|
||||
- bthigh hinge angular velocity (rad/s)
|
||||
- bshin hinge angular velocity (rad/s)
|
||||
- bfoot hinge angular velocity (rad/s)
|
||||
- fthigh hinge angular velocity (rad/s)
|
||||
- fshin hinge angular velocity (rad/s)
|
||||
- ffoot hinge angular velocity (rad/s)
|
||||
|
||||
Actuators (name/actuator/parameter):
|
||||
- bthigh hinge torque (N m)
|
||||
- bshin hinge torque (N m)
|
||||
- bfoot hinge torque (N m)
|
||||
- fthigh hinge torque (N m)
|
||||
- fshin hinge torque (N m)
|
||||
- ffoot hinge torque (N m)
|
||||
|
||||
-->
|
||||
<mujoco model="cheetah">
|
||||
<compiler angle="radian" coordinate="local" inertiafromgeom="true" settotalmass="14"/>
|
||||
<default>
|
||||
<joint armature=".1" damping=".01" limited="true" solimplimit="0 .8 .03" solreflimit=".02 1" stiffness="8"/>
|
||||
<geom conaffinity="0" condim="3" contype="1" friction=".4 .1 .1" rgba="0.8 0.6 .4 1" solimp="0.0 0.8 0.01" solref="0.02 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1 1"/>
|
||||
</default>
|
||||
<size nstack="300000" nuser_geom="1"/>
|
||||
<option gravity="0 0 -9.81" timestep="0.01"/>
|
||||
<asset>
|
||||
<texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
|
||||
<body name="torso" pos="0 0 .7">
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 0" stiffness="0" type="hinge"/>
|
||||
<geom fromto="-.5 0 0 .5 0 0" name="torso" size="0.046" type="capsule"/>
|
||||
<geom axisangle="0 1 0 .87" name="head" pos=".6 0 .1" size="0.046 .15" type="capsule"/>
|
||||
<!-- <site name='tip' pos='.15 0 .11'/>-->
|
||||
<body name="bthigh" pos="-.5 0 0">
|
||||
<joint axis="0 1 0" damping="6" name="bthigh" pos="0 0 0" range="-.52 1.05" stiffness="240" type="hinge"/>
|
||||
<geom axisangle="0 1 0 -3.8" name="bthigh" pos=".1 0 -.13" size="0.046 .145" type="capsule"/>
|
||||
<body name="bshin" pos=".16 0 -.25">
|
||||
<joint axis="0 1 0" damping="4.5" name="bshin" pos="0 0 0" range="-.785 .785" stiffness="180" type="hinge"/>
|
||||
<geom axisangle="0 1 0 -2.03" name="bshin" pos="-.14 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .15" type="capsule"/>
|
||||
<body name="bfoot" pos="-.28 0 -.14">
|
||||
<joint axis="0 1 0" damping="3" name="bfoot" pos="0 0 0" range="-.4 .785" stiffness="120" type="hinge"/>
|
||||
<geom axisangle="0 1 0 -.27" name="bfoot" pos=".03 0 -.097" rgba="0.9 0.6 0.6 1" size="0.046 .094" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="fthigh" pos=".5 0 0">
|
||||
<joint axis="0 1 0" damping="4.5" name="fthigh" pos="0 0 0" range="-1 .7" stiffness="180" type="hinge"/>
|
||||
<geom axisangle="0 1 0 .52" name="fthigh" pos="-.07 0 -.12" size="0.046 .133" type="capsule"/>
|
||||
<body name="fshin" pos="-.14 0 -.24">
|
||||
<joint axis="0 1 0" damping="3" name="fshin" pos="0 0 0" range="-1.2 .87" stiffness="120" type="hinge"/>
|
||||
<geom axisangle="0 1 0 -.6" name="fshin" pos=".065 0 -.09" rgba="0.9 0.6 0.6 1" size="0.046 .106" type="capsule"/>
|
||||
<body name="ffoot" pos=".13 0 -.18">
|
||||
<joint axis="0 1 0" damping="1.5" name="ffoot" pos="0 0 0" range="-.5 .5" stiffness="60" type="hinge"/>
|
||||
<geom axisangle="0 1 0 -.6" name="ffoot" pos=".045 0 -.07" rgba="0.9 0.6 0.6 1" size="0.046 .07" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor gear="120" joint="bthigh" name="bthigh"/>
|
||||
<motor gear="90" joint="bshin" name="bshin"/>
|
||||
<motor gear="60" joint="bfoot" name="bfoot"/>
|
||||
<motor gear="120" joint="fthigh" name="fthigh"/>
|
||||
<motor gear="60" joint="fshin" name="fshin"/>
|
||||
<motor gear="30" joint="ffoot" name="ffoot"/>
|
||||
</actuator>
|
||||
</mujoco>
|
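Note on the header comment of `half_cheetah.xml` above: a small sketch that turns the documented state-space and actuator ordering into index lookups. The `_pos`/`_vel` suffixes and the helper `control_to_torque` are hypothetical names introduced here. The torque calculation assumes the usual reading of a MuJoCo `motor`, where the clamped control is multiplied by `gear`; treat that as an assumption, not something this commit states.

```python
# Joint order as listed in the XML comment: positions first, then velocities.
JOINT_NAMES = ["rootx", "rootz", "rooty",
               "bthigh", "bshin", "bfoot",
               "fthigh", "fshin", "ffoot"]
STATE_NAMES = ([name + "_pos" for name in JOINT_NAMES] +
               [name + "_vel" for name in JOINT_NAMES])

# Actuator order and gear values copied from the <actuator> block.
ACTUATORS = [("bthigh", 120), ("bshin", 90), ("bfoot", 60),
             ("fthigh", 120), ("fshin", 60), ("ffoot", 30)]


def control_to_torque(ctrl):
    """Clamp each control to [-1, 1] (the default ctrlrange) and scale by gear."""
    assert len(ctrl) == len(ACTUATORS)
    return [min(max(u, -1.0), 1.0) * gear
            for u, (_, gear) in zip(ctrl, ACTUATORS)]


torques = control_to_torque([2.0, -0.5, 0.0, 0.0, 0.0, 1.0])
bthigh_angle_index = STATE_NAMES.index("bthigh_pos")
rootx_velocity_index = STATE_NAMES.index("rootx_vel")
```

Keeping the names in one list makes it harder for the position and velocity halves of the state vector to drift out of step when a joint is added or reordered in the XML.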
44
gym/envs/mujoco/assets/hopper.xml
Normal file
@@ -0,0 +1,44 @@
|
||||
<mujoco model="hopper">
|
||||
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
|
||||
</default>
|
||||
<option integrator="RK4" timestep="0.002"/>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
|
||||
<body name="torso" pos="0 0 1.25">
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
|
||||
<body name="thigh" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
|
||||
<body name="leg" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
|
||||
<body name="foot" pos="0.13/2 0 0.1">
|
||||
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
|
||||
</actuator>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
|
||||
width="100" height="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
</mujoco>
|
120
gym/envs/mujoco/assets/humanoid.xml
Executable file
@@ -0,0 +1,120 @@
|
||||
<mujoco model="humanoid">
|
||||
<compiler angle="degree" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
|
||||
</default>
|
||||
<option integrator="RK4" iterations="50" solver="PGS" timestep="0.003">
|
||||
<!-- <flags solverstat="enable" energy="enable"/>-->
|
||||
</option>
|
||||
<size nkey="5" nuser_geom="1"/>
|
||||
<visual>
|
||||
<map fogend="5" fogstart="3"/>
|
||||
</visual>
|
||||
<asset>
|
||||
<texture builtin="gradient" height="100" rgb1=".4 .5 .6" rgb2="0 0 0" type="skybox" width="100"/>
|
||||
<!-- <texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>-->
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom condim="3" friction="1 .1 .1" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 0.125" type="plane"/>
|
||||
<!-- <geom condim="3" material="MatPlane" name="floor" pos="0 0 0" size="10 10 0.125" type="plane"/>-->
|
||||
<body name="torso" pos="0 0 1.4">
|
||||
<joint armature="0" damping="0" limited="false" name="root" pos="0 0 0" stiffness="0" type="free"/>
|
||||
<geom fromto="0 -.07 0 0 .07 0" name="torso1" size="0.07" type="capsule"/>
|
||||
<geom name="head" pos="0 0 .19" size=".09" type="sphere" user="258"/>
|
||||
<geom fromto="-.01 -.06 -.12 -.01 .06 -.12" name="uwaist" size="0.06" type="capsule"/>
|
||||
<body name="lwaist" pos="-.01 0 -0.260" quat="1.000 0 -0.002 0">
|
||||
<geom fromto="0 -.06 0 0 .06 0" name="lwaist" size="0.06" type="capsule"/>
|
||||
<joint armature="0.02" axis="0 0 1" damping="5" name="abdomen_z" pos="0 0 0.065" range="-45 45" stiffness="20" type="hinge"/>
|
||||
<joint armature="0.02" axis="0 1 0" damping="5" name="abdomen_y" pos="0 0 0.065" range="-75 30" stiffness="10" type="hinge"/>
|
||||
<body name="pelvis" pos="0 0 -0.165" quat="1.000 0 -0.002 0">
|
||||
<joint armature="0.02" axis="1 0 0" damping="5" name="abdomen_x" pos="0 0 0.1" range="-35 35" stiffness="10" type="hinge"/>
|
||||
<geom fromto="-.02 -.07 0 -.02 .07 0" name="butt" size="0.09" type="capsule"/>
|
||||
<body name="right_thigh" pos="0 -0.1 -0.04">
|
||||
<joint armature="0.01" axis="1 0 0" damping="5" name="right_hip_x" pos="0 0 0" range="-25 5" stiffness="10" type="hinge"/>
|
||||
<joint armature="0.01" axis="0 0 1" damping="5" name="right_hip_z" pos="0 0 0" range="-60 35" stiffness="10" type="hinge"/>
|
||||
<joint armature="0.0080" axis="0 1 0" damping="5" name="right_hip_y" pos="0 0 0" range="-110 20" stiffness="20" type="hinge"/>
|
||||
<geom fromto="0 0 0 0 0.01 -.34" name="right_thigh1" size="0.06" type="capsule"/>
|
||||
<body name="right_shin" pos="0 0.01 -0.403">
|
||||
<joint armature="0.0060" axis="0 -1 0" name="right_knee" pos="0 0 .02" range="-160 -2" type="hinge"/>
|
||||
<geom fromto="0 0 0 0 0 -.3" name="right_shin1" size="0.049" type="capsule"/>
|
||||
<body name="right_foot" pos="0 0 -0.45">
|
||||
<geom name="right_foot" pos="0 0 0.1" size="0.075" type="sphere" user="0"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="left_thigh" pos="0 0.1 -0.04">
|
||||
<joint armature="0.01" axis="-1 0 0" damping="5" name="left_hip_x" pos="0 0 0" range="-25 5" stiffness="10" type="hinge"/>
|
||||
<joint armature="0.01" axis="0 0 -1" damping="5" name="left_hip_z" pos="0 0 0" range="-60 35" stiffness="10" type="hinge"/>
|
||||
<joint armature="0.01" axis="0 1 0" damping="5" name="left_hip_y" pos="0 0 0" range="-120 20" stiffness="20" type="hinge"/>
|
||||
<geom fromto="0 0 0 0 -0.01 -.34" name="left_thigh1" size="0.06" type="capsule"/>
|
||||
<body name="left_shin" pos="0 -0.01 -0.403">
|
||||
<joint armature="0.0060" axis="0 -1 0" name="left_knee" pos="0 0 .02" range="-160 -2" stiffness="1" type="hinge"/>
|
||||
<geom fromto="0 0 0 0 0 -.3" name="left_shin1" size="0.049" type="capsule"/>
|
||||
<body name="left_foot" pos="0 0 -0.45">
|
||||
<geom name="left_foot" type="sphere" size="0.075" pos="0 0 0.1" user="0" />
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="right_upper_arm" pos="0 -0.17 0.06">
|
||||
<joint armature="0.0068" axis="2 1 1" name="right_shoulder1" pos="0 0 0" range="-85 60" stiffness="1" type="hinge"/>
|
||||
<joint armature="0.0051" axis="0 -1 1" name="right_shoulder2" pos="0 0 0" range="-85 60" stiffness="1" type="hinge"/>
|
||||
<geom fromto="0 0 0 .16 -.16 -.16" name="right_uarm1" size="0.04 0.16" type="capsule"/>
|
||||
<body name="right_lower_arm" pos=".18 -.18 -.18">
|
||||
<joint armature="0.0028" axis="0 -1 1" name="right_elbow" pos="0 0 0" range="-90 50" stiffness="0" type="hinge"/>
|
||||
<geom fromto="0.01 0.01 0.01 .17 .17 .17" name="right_larm" size="0.031" type="capsule"/>
|
||||
<geom name="right_hand" pos=".18 .18 .18" size="0.04" type="sphere"/>
|
||||
<camera pos="0 0 0"/>
|
||||
</body>
|
||||
</body>
|
||||
<body name="left_upper_arm" pos="0 0.17 0.06">
|
||||
<joint armature="0.0068" axis="2 -1 1" name="left_shoulder1" pos="0 0 0" range="-60 85" stiffness="1" type="hinge"/>
|
||||
<joint armature="0.0051" axis="0 1 1" name="left_shoulder2" pos="0 0 0" range="-60 85" stiffness="1" type="hinge"/>
|
||||
<geom fromto="0 0 0 .16 .16 -.16" name="left_uarm1" size="0.04 0.16" type="capsule"/>
|
||||
<body name="left_lower_arm" pos=".18 .18 -.18">
|
||||
<joint armature="0.0028" axis="0 -1 -1" name="left_elbow" pos="0 0 0" range="-90 50" stiffness="0" type="hinge"/>
|
||||
<geom fromto="0.01 -0.01 0.01 .17 -.17 .17" name="left_larm" size="0.031" type="capsule"/>
|
||||
<geom name="left_hand" pos=".18 -.18 .18" size="0.04" type="sphere"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</worldbody>
|
||||
<tendon>
|
||||
<fixed name="left_hipknee">
|
||||
<joint coef="-1" joint="left_hip_y"/>
|
||||
<joint coef="1" joint="left_knee"/>
|
||||
</fixed>
|
||||
<fixed name="right_hipknee">
|
||||
<joint coef="-1" joint="right_hip_y"/>
|
||||
<joint coef="1" joint="right_knee"/>
|
||||
</fixed>
|
||||
</tendon>
|
||||
|
||||
<actuator>
|
||||
<motor gear="100" joint="abdomen_y" name="abdomen_y"/>
|
||||
<motor gear="100" joint="abdomen_z" name="abdomen_z"/>
|
||||
<motor gear="100" joint="abdomen_x" name="abdomen_x"/>
|
||||
<motor gear="100" joint="right_hip_x" name="right_hip_x"/>
|
||||
<motor gear="100" joint="right_hip_z" name="right_hip_z"/>
|
||||
<motor gear="300" joint="right_hip_y" name="right_hip_y"/>
|
||||
<motor gear="200" joint="right_knee" name="right_knee"/>
|
||||
<motor gear="100" joint="left_hip_x" name="left_hip_x"/>
|
||||
<motor gear="100" joint="left_hip_z" name="left_hip_z"/>
|
||||
<motor gear="300" joint="left_hip_y" name="left_hip_y"/>
|
||||
<motor gear="200" joint="left_knee" name="left_knee"/>
|
||||
<motor gear="25" joint="right_shoulder1" name="right_shoulder1"/>
|
||||
<motor gear="25" joint="right_shoulder2" name="right_shoulder2"/>
|
||||
<motor gear="25" joint="right_elbow" name="right_elbow"/>
|
||||
<motor gear="25" joint="left_shoulder1" name="left_shoulder1"/>
|
||||
<motor gear="25" joint="left_shoulder2" name="left_shoulder2"/>
|
||||
<motor gear="25" joint="left_elbow" name="left_elbow"/>
|
||||
</actuator>
|
||||
</mujoco>
|
47
gym/envs/mujoco/assets/inverted_double_pendulum.xml
Normal file
@@ -0,0 +1,47 @@
<!-- Cartpole Model

    The state space is populated with joints in the order that they are
    defined in this file. The actuators also operate on joints.

    State-Space (name/joint/parameter):
        - cart slider position (m)
        - pole hinge angle (rad)
        - cart slider velocity (m/s)
        - pole hinge angular velocity (rad/s)

    Actuators (name/actuator/parameter):
        - cart motor force x (N)

-->
<mujoco model="cartpole">
    <compiler coordinate="local" inertiafromgeom="true"/>
    <custom>
        <numeric data="2" name="frame_skip"/>
    </custom>
    <default>
        <joint damping="0.05"/>
        <geom contype="0" friction="1 0.1 0.1" rgba="0.7 0.7 0 1"/>
    </default>
    <option gravity="1e-5 0 -9.81" integrator="RK4" timestep="0.01"/>
    <size nstack="3000"/>
    <worldbody>
        <geom name="floor" pos="0 0 -3.0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
        <geom name="rail" pos="0 0 0" quat="0.707 0 0.707 0" rgba="0.3 0.3 0.7 1" size="0.02 1" type="capsule"/>
        <body name="cart" pos="0 0 0">
            <joint axis="1 0 0" limited="true" margin="0.01" name="slider" pos="0 0 0" range="-1 1" type="slide"/>
            <geom name="cart" pos="0 0 0" quat="0.707 0 0.707 0" size="0.1 0.1" type="capsule"/>
            <body name="pole" pos="0 0 0">
                <joint axis="0 1 0" name="hinge" pos="0 0 0" type="hinge"/>
                <geom fromto="0 0 0 0 0 0.6" name="cpole" rgba="0 0.7 0.7 1" size="0.045 0.3" type="capsule"/>
                <body name="pole2" pos="0 0 0.6">
                    <joint axis="0 1 0" name="hinge2" pos="0 0 0" type="hinge"/>
                    <geom fromto="0 0 0 0 0 0.6" name="cpole2" rgba="0 0.7 0.7 1" size="0.045 0.3" type="capsule"/>
                    <site name="tip" pos="0 0 .6" size="0.01 0.01"/>
                </body>
            </body>
        </body>
    </worldbody>
    <actuator>
        <motor ctrllimited="true" ctrlrange="-1 1" gear="500" joint="slider" name="slide"/>
    </actuator>
</mujoco>
27
gym/envs/mujoco/assets/inverted_pendulum.xml
Normal file
@@ -0,0 +1,27 @@
<mujoco model="inverted pendulum">
    <compiler inertiafromgeom="true"/>
    <default>
        <joint armature="0" damping="1" limited="true"/>
        <geom contype="0" friction="1 0.1 0.1" rgba="0.7 0.7 0 1"/>
        <tendon/>
        <motor ctrlrange="-3 3"/>
    </default>
    <option gravity="0 0 -9.81" integrator="RK4" timestep="0.02"/>
    <size nstack="3000"/>
    <worldbody>
        <!--geom name="ground" type="plane" pos="0 0 0" /-->
        <geom name="rail" pos="0 0 0" quat="0.707 0 0.707 0" rgba="0.3 0.3 0.7 1" size="0.02 1" type="capsule"/>
        <body name="cart" pos="0 0 0">
            <joint axis="1 0 0" limited="true" name="slider" pos="0 0 0" range="-1 1" type="slide"/>
            <geom name="cart" pos="0 0 0" quat="0.707 0 0.707 0" size="0.1 0.1" type="capsule"/>
            <body name="pole" pos="0 0 0">
                <joint axis="0 1 0" name="hinge" pos="0 0 0" range="-90 90" type="hinge"/>
                <geom fromto="0 0 0 0.001 0 0.6" name="cpole" rgba="0 0.7 0.7 1" size="0.049 0.3" type="capsule"/>
                <!-- <body name="pole2" pos="0.001 0 0.6"><joint name="hinge2" type="hinge" pos="0 0 0" axis="0 1 0"/><geom name="cpole2" type="capsule" fromto="0 0 0 0 0 0.6" size="0.05 0.3" rgba="0.7 0 0.7 1"/><site name="tip2" pos="0 0 .6"/></body>-->
            </body>
        </body>
    </worldbody>
    <actuator>
        <motor gear="100" joint="slider" name="slide"/>
    </actuator>
</mujoco>
31
gym/envs/mujoco/assets/point.xml
Normal file
@@ -0,0 +1,31 @@
<mujoco>
    <compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
    <option integrator="RK4" timestep="0.02"/>
    <default>
        <joint armature="0" damping="0" limited="false"/>
        <geom conaffinity="0" condim="3" density="100" friction="1 0.5 0.5" margin="0" rgba="0.8 0.6 0.4 1"/>
    </default>
    <asset>
        <texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
        <texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
        <texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
        <material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="30 30" texture="texplane"/>
        <material name="geom" texture="texgeom" texuniform="true"/>
    </asset>
    <worldbody>
        <light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
        <geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane"/>
        <body name="torso" pos="0 0 0">
            <geom name="pointbody" pos="0 0 0.5" size="0.5" type="sphere"/>
            <geom name="pointarrow" pos="0.6 0 0.5" size="0.5 0.1 0.1" type="box"/>
            <joint axis="1 0 0" name="ballx" pos="0 0 0" type="slide"/>
            <joint axis="0 1 0" name="bally" pos="0 0 0" type="slide"/>
            <joint axis="0 0 1" limited="false" name="rot" pos="0 0 0" type="hinge"/>
        </body>
    </worldbody>
    <actuator>
        <!-- Those are just dummy actuators for providing ranges -->
        <motor ctrllimited="true" ctrlrange="-1 1" joint="ballx"/>
        <motor ctrllimited="true" ctrlrange="-0.25 0.25" joint="rot"/>
    </actuator>
</mujoco>
39
gym/envs/mujoco/assets/reacher.xml
Normal file
@@ -0,0 +1,39 @@
<mujoco model="reacher">
    <compiler angle="radian" inertiafromgeom="true"/>
    <default>
        <joint armature="1" damping="1" limited="true"/>
        <geom contype="0" friction="1 0.1 0.1" rgba="0.7 0.7 0 1"/>
    </default>
    <option gravity="0 0 -9.81" integrator="RK4" timestep="0.01"/>
    <worldbody>
        <!-- Arena -->
        <geom conaffinity="0" contype="0" name="ground" pos="0 0 0" rgba="0.9 0.9 0.9 1" size="1 1 10" type="plane"/>
        <geom conaffinity="0" fromto="-.3 -.3 .01 .3 -.3 .01" name="sideS" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
        <geom conaffinity="0" fromto=" .3 -.3 .01 .3 .3 .01" name="sideE" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
        <geom conaffinity="0" fromto="-.3 .3 .01 .3 .3 .01" name="sideN" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
        <geom conaffinity="0" fromto="-.3 -.3 .01 -.3 .3 .01" name="sideW" rgba="0.9 0.4 0.6 1" size=".02" type="capsule"/>
        <!-- Arm -->
        <geom conaffinity="0" contype="0" fromto="0 0 0 0 0 0.02" name="root" rgba="0.9 0.4 0.6 1" size=".011" type="cylinder"/>
        <body name="body0" pos="0 0 .01">
            <geom fromto="0 0 0 0.1 0 0" name="link0" rgba="0.0 0.4 0.6 1" size=".01" type="capsule"/>
            <joint axis="0 0 1" limited="false" name="joint0" pos="0 0 0" type="hinge"/>
            <body name="body1" pos="0.1 0 0">
                <joint axis="0 0 1" limited="true" name="joint1" pos="0 0 0" range="-3.0 3.0" type="hinge"/>
                <geom fromto="0 0 0 0.1 0 0" name="link1" rgba="0.0 0.4 0.6 1" size=".01" type="capsule"/>
                <body name="fingertip" pos="0.11 0 0">
                    <geom contype="0" name="fingertip" pos="0 0 0" rgba="0.0 0.8 0.6 1" size=".01" type="sphere"/>
                </body>
            </body>
        </body>
        <!-- Target -->
        <body name="target" pos=".1 -.1 .01">
            <joint armature="0" axis="1 0 0" damping="0" limited="true" name="target_x" pos="0 0 0" range="-.27 .27" ref=".1" stiffness="0" type="slide"/>
            <joint armature="0" axis="0 1 0" damping="0" limited="true" name="target_y" pos="0 0 0" range="-.27 .27" ref="-.1" stiffness="0" type="slide"/>
            <geom conaffinity="0" contype="0" name="target" pos="0 0 0" rgba="0.9 0.2 0.2 1" size=".009" type="sphere"/>
        </body>
    </worldbody>
    <actuator>
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="joint0"/>
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="joint1"/>
    </actuator>
</mujoco>
38
gym/envs/mujoco/assets/swimmer.xml
Normal file
@@ -0,0 +1,38 @@
<mujoco model="swimmer">
    <compiler angle="degree" coordinate="local" inertiafromgeom="true"/>
    <option collision="predefined" density="4000" integrator="RK4" timestep="0.01" viscosity="0.1"/>
    <default>
        <geom conaffinity="1" condim="1" contype="1" material="geom" rgba="0.8 0.6 .4 1"/>
        <joint armature='0.1' />
    </default>
    <asset>
        <texture builtin="gradient" height="100" rgb1="1 1 1" rgb2="0 0 0" type="skybox" width="100"/>
        <texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
        <texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
        <material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="30 30" texture="texplane"/>
        <material name="geom" texture="texgeom" texuniform="true"/>
    </asset>
    <worldbody>
        <light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
        <geom conaffinity="1" condim="3" material="MatPlane" name="floor" pos="0 0 -0.1" rgba="0.8 0.9 0.8 1" size="40 40 0.1" type="plane"/>
        <!-- ================= SWIMMER ================= /-->
        <body name="torso" pos="0 0 0">
            <geom density="1000" fromto="1.5 0 0 0.5 0 0" size="0.1" type="capsule"/>
            <joint axis="1 0 0" name="slider1" pos="0 0 0" type="slide"/>
            <joint axis="0 1 0" name="slider2" pos="0 0 0" type="slide"/>
            <joint axis="0 0 1" name="rot" pos="0 0 0" type="hinge"/>
            <body name="mid" pos="0.5 0 0">
                <geom density="1000" fromto="0 0 0 -1 0 0" size="0.1" type="capsule"/>
                <joint axis="0 0 1" limited="true" name="rot2" pos="0 0 0" range="-100 100" type="hinge"/>
                <body name="back" pos="-1 0 0">
                    <geom density="1000" fromto="0 0 0 -1 0 0" size="0.1" type="capsule"/>
                    <joint axis="0 0 1" limited="true" name="rot3" pos="0 0 0" range="-100 100" type="hinge"/>
                </body>
            </body>
        </body>
    </worldbody>
    <actuator>
        <motor ctrllimited="true" ctrlrange="-1 1" gear="150.0" joint="rot2"/>
        <motor ctrllimited="true" ctrlrange="-1 1" gear="150.0" joint="rot3"/>
    </actuator>
</mujoco>
61
gym/envs/mujoco/assets/walker2d.xml
Normal file
@@ -0,0 +1,61 @@
<mujoco model="walker2d">
    <compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
    <default>
        <joint armature="0.01" damping=".1" limited="true"/>
        <geom conaffinity="0" condim="3" contype="1" density="1000" friction=".7 .1 .1" rgba="0.8 0.6 .4 1"/>
    </default>
    <option integrator="RK4" timestep="0.002"/>
    <worldbody>
        <light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
        <geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane" material="MatPlane"/>
        <body name="torso" pos="0 0 1.25">
            <joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
            <joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
            <joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
            <geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
            <body name="thigh" pos="0 0 1.05">
                <joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
                <geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
                <body name="leg" pos="0 0 0.35">
                    <joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
                    <geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
                    <body name="foot" pos="0.2/2 0 0.1">
                        <joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
                        <geom friction="0.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
                    </body>
                </body>
            </body>
            <!-- copied and then replace thigh->thigh_left, leg->leg_left, foot->foot_left -->
            <body name="thigh_left" pos="0 0 1.05">
                <joint axis="0 -1 0" name="thigh_left_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
                <geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_left_geom" rgba=".7 .3 .6 1" size="0.05" type="capsule"/>
                <body name="leg_left" pos="0 0 0.35">
                    <joint axis="0 -1 0" name="leg_left_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
                    <geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_left_geom" rgba=".7 .3 .6 1" size="0.04" type="capsule"/>
                    <body name="foot_left" pos="0.2/2 0 0.1">
                        <joint axis="0 -1 0" name="foot_left_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
                        <geom friction="1.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_left_geom" rgba=".7 .3 .6 1" size="0.06" type="capsule"/>
                    </body>
                </body>
            </body>
        </body>
    </worldbody>
    <actuator>
        <!-- <motor joint="torso_joint" ctrlrange="-100.0 100.0" isctrllimited="true"/>-->
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_joint"/>
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_joint"/>
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_joint"/>
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_left_joint"/>
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_left_joint"/>
        <motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_left_joint"/>
        <!-- <motor joint="finger2_rot" ctrlrange="-20.0 20.0" isctrllimited="true"/>-->
    </actuator>
    <asset>
        <texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
            width="100" height="100"/>
        <texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
        <texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
        <material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
        <material name="geom" texture="texgeom" texuniform="true"/>
    </asset>
</mujoco>
35
gym/envs/mujoco/half_cheetah.py
Normal file
@@ -0,0 +1,35 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

class HalfCheetahEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self):
        mujoco_env.MujocoEnv.__init__(self, 'half_cheetah.xml', 5)
        utils.EzPickle.__init__(self)
        self.finalize()

    def _step(self, action):
        xposbefore = self.model.data.qpos[0,0]
        self.do_simulation(action, self.frame_skip)
        xposafter = self.model.data.qpos[0,0]
        ob = self._get_obs()
        reward_ctrl = - 0.1 * np.square(action).sum()
        reward_run = (xposafter - xposbefore)/self.dt
        reward = reward_ctrl + reward_run
        done = False
        return ob, reward, done, dict(reward_run=reward_run, reward_ctrl=reward_ctrl)

    def _get_obs(self):
        return np.concatenate([
            self.model.data.qpos.flat[1:],
            self.model.data.qvel.flat,
        ])

    def _reset(self):
        self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
        self.model.data.qvel = self.init_qvel + np.random.randn(self.model.nv,1)*.1
        self.reset_viewer_if_necessary()
        return self._get_obs()

    def viewer_setup(self):
        self.viewer.cam.distance = self.model.stat.extent * 0.5
41
gym/envs/mujoco/hopper.py
Normal file
@@ -0,0 +1,41 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self):
        mujoco_env.MujocoEnv.__init__(self, 'hopper.xml', 4)
        utils.EzPickle.__init__(self)
        self.finalize()

    def _step(self, a):
        posbefore = self.model.data.qpos[0,0]
        self.do_simulation(a, self.frame_skip)
        posafter,height,ang = self.model.data.qpos[0:3,0]
        alive_bonus = 1.0
        reward = (posafter - posbefore) / self.dt
        reward += alive_bonus
        reward -= 1e-3 * np.square(a).sum()
        s = self._state
        done = not (np.isfinite(s).all() and (np.abs(s[2:]) < 100).all() and
                    (height > .7) and (abs(ang) < .2))
        ob = self._get_obs()
        return ob, reward, done, {}

    def _get_obs(self):
        return np.concatenate([
            self.model.data.qpos.flat[1:],
            np.clip(self.model.data.qvel.flat,-10,10)
        ])

    def _reset(self):
        self.model.data.qpos = self.init_qpos + np.random.rand(self.model.nq,1)*.01-.005
        self.model.data.qvel = self.init_qvel + np.random.rand(self.model.nv,1)*.01-.005
        self.reset_viewer_if_necessary()
        return self._get_obs()

    def viewer_setup(self):
        self.viewer.cam.trackbodyid = 2
        self.viewer.cam.distance = self.model.stat.extent * 0.75
        self.viewer.cam.lookat[2] += .8
        self.viewer.cam.elevation = -20
53
gym/envs/mujoco/humanoid.py
Normal file
@@ -0,0 +1,53 @@
import numpy as np
from gym.envs.mujoco import mujoco_env
from gym import utils

def mass_center(model):
    mass = model.body_mass
    xpos = model.data.xipos
    return (np.sum(mass * xpos, 0) / np.sum(mass))[0]

class HumanoidEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self, initial_randomness=0.01):
        mujoco_env.MujocoEnv.__init__(self, 'humanoid.xml', 5)
        utils.EzPickle.__init__(self)
        self.initial_randomness = initial_randomness
        self.finalize()

    def _get_obs(self):
        data = self.model.data
        return np.concatenate([data.qpos.flat[2:],
                               data.qvel.flat,
                               data.cinert.flat,
                               data.cvel.flat,
                               data.qfrc_actuator.flat,
                               data.cfrc_ext.flat])

    def _step(self, a):
        pos_before = mass_center(self.model)
        self.do_simulation(a, self.frame_skip)
        pos_after = mass_center(self.model)
        alive_bonus = 5.0
        data = self.model.data
        lin_vel_cost = 0.25 * (pos_after - pos_before) / self.model.opt.timestep
        quad_ctrl_cost = 0.1 * np.square(data.ctrl).sum()
        quad_impact_cost = .5e-6 * np.square(data.cfrc_ext).sum()
        quad_impact_cost = min(quad_impact_cost, 10)
        reward = lin_vel_cost - quad_ctrl_cost - quad_impact_cost + alive_bonus
        qpos = self.model.data.qpos
        done = bool((qpos[2] < 1.0) or (qpos[2] > 2.0))
        return self._get_obs(), reward, done, dict(reward_linvel=lin_vel_cost, reward_quadctrl=-quad_ctrl_cost, reward_alive=alive_bonus, reward_impact=-quad_impact_cost)

    # TODO: requires more complicated reset.
    def _reset(self):
        self.model.data.qpos = self.init_qpos + (np.random.rand(self.model.nq,1)-0.5)*2*self.initial_randomness
        self.model.data.qvel = self.init_qvel + (np.random.rand(self.model.nv,1)-0.5)*2*self.initial_randomness
        self.model.forward()
        self.reset_viewer_if_necessary()
        return self._get_obs()

    def viewer_setup(self):
        self.viewer.cam.trackbodyid = 1
        self.viewer.cam.distance = self.model.stat.extent * 1.0
        self.viewer.cam.lookat[2] += .8
        self.viewer.cam.elevation = -20
43
gym/envs/mujoco/inverted_double_pendulum.py
Normal file
@@ -0,0 +1,43 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

class InvertedDoublePendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):

    def __init__(self):
        mujoco_env.MujocoEnv.__init__(self, 'inverted_double_pendulum.xml', 5)
        utils.EzPickle.__init__(self)
        self.finalize()

    def _step(self, action):
        self.do_simulation(action, self.frame_skip)
        ob = self._get_obs()
        x, _, y = self.model.data.site_xpos[0]
        dist_penalty = 0.01 * x ** 2 + (y - 2) ** 2
        v1, v2 = self.model.data.qvel[1:3]
        vel_penalty = 1e-3 * v1**2 + 5e-3 * v2**2
        alive_bonus = 10
        r = (alive_bonus - dist_penalty - vel_penalty)[0]
        done = bool(y <= 1)
        return ob, r, done, {}

    def _get_obs(self):
        return np.concatenate([
            self.model.data.qpos[:1],  # cart x pos
            np.sin(self.model.data.qpos[1:]),  # link angles
            np.cos(self.model.data.qpos[1:]),
            np.clip(self.model.data.qvel, -10, 10),
            np.clip(self.model.data.qfrc_constraint, -10, 10)
        ]).ravel()

    def _reset(self):
        self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
        self.model.data.qvel = self.init_qvel + np.random.randn(self.model.nv,1)*.1
        self.reset_viewer_if_necessary()
        return self._get_obs()

    def viewer_setup(self):
        v = self.viewer
        v.cam.trackbodyid=0
        v.cam.distance = v.model.stat.extent * 0.5
        v.cam.lookat[2] += 3  # v.model.stat.center[2]
31
gym/envs/mujoco/inverted_pendulum.py
Normal file
@@ -0,0 +1,31 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

class InvertedPendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self):
        utils.EzPickle.__init__(self)
        mujoco_env.MujocoEnv.__init__(self, 'inverted_pendulum.xml', 2)
        self.finalize()

    def _step(self, a):
        reward = 1.0
        self.do_simulation(a, self.frame_skip)
        ob = self._get_obs()
        notdone = np.isfinite(ob).all() and (np.abs(ob[1]) <= .2)
        done = not notdone
        return ob, reward, done, {}

    def _reset(self):
        self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1), low=-0.01, high=0.01)
        self.model.data.qvel = self.init_qvel + np.random.uniform(size=(self.model.nv,1), low=-0.01, high=0.01)
        self.reset_viewer_if_necessary()
        return self._get_obs()

    def _get_obs(self):
        return np.concatenate([self.model.data.qpos, self.model.data.qvel]).ravel()

    def viewer_setup(self):
        v = self.viewer
        v.cam.trackbodyid=0
        v.cam.distance = v.model.stat.extent
109
gym/envs/mujoco/mujoco_env.py
Normal file
@@ -0,0 +1,109 @@
import os.path

import numpy as np
import gym
from gym import error, spaces

try:
    import mujoco_py
except ImportError as e:
    raise error.DependencyNotInstalled("{}. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)".format(e))

BIG=10000

class MujocoEnv(gym.Env):
    def __init__(self, model_path, frame_skip):
        if model_path.startswith("/"):
            fullpath = model_path
        else:
            fullpath = os.path.join(os.path.dirname(__file__), "assets", model_path)
        if not os.path.exists(fullpath):
            raise IOError("File %s does not exist"%fullpath)
        self.frame_skip= frame_skip
        self.model = mujoco_py.MjModel(fullpath)
        self.data = self.model.data
        self.viewer = None

        self.metadata = {
            'render.modes': ['human', 'rgb_array'],
            'video.frames_per_second' : int(np.round(1.0 / self.dt))
        }

    @property
    def dt(self):
        return self.model.opt.timestep * self.frame_skip

    def do_simulation(self, ctrl, n_frames):
        self.model.data.ctrl = ctrl
        for _ in range(n_frames):
            self.model.step()

    def finalize(self):
        self.init_qpos = self.model.data.qpos.copy()
        self.init_qvel = self.model.data.qvel.copy()
        self.ctrl_dim = self.model.data.ctrl.size
        observation, _reward, done, _info = self.step(np.zeros(self.ctrl_dim))
        assert not done
        self.obs_dim = observation.size

        high = np.ones(self.ctrl_dim)
        low = -high
        self.action_space = spaces.Box(low, high)

        high = BIG*np.ones(self.obs_dim)
        low = -high
        self.observation_space = spaces.Box(low, high)

    def _render(self, mode='human', close=False):
        if close:
            self._get_viewer().finish()
            return

        if mode == 'rgb_array':
            self._get_viewer().render()
            data, width, height = self._get_viewer().get_image()
            return np.fromstring(data, dtype='uint8').reshape(height, width, 3)[::-1,:,:]
        elif mode == 'human':
            self._get_viewer().loop_once()

    def _get_viewer(self):
        if self.viewer is None:
            self.viewer = mujoco_py.MjViewer()
            self.viewer.start()
            self.viewer.set_model(self.model)
            self.viewer_setup()
        return self.viewer

    def viewer_setup(self):
        pass

    def reset_viewer_if_necessary(self):
        if self.viewer is not None:
            self.viewer.autoscale()
            self.viewer_setup()

    def get_body_com(self, body_name):
        idx = self.model.body_names.index(body_name)
        return self.model.data.com_subtree[idx]

    def get_body_comvel(self, body_name):
        idx = self.model.body_names.index(body_name)
        return self.model.body_comvels[idx]

    def get_body_xmat(self, body_name):
        idx = self.model.body_names.index(body_name)
        return self.model.data.xmat[idx].reshape((3, 3))

    @property
    def action_bounds(self):
        bounds = self.model.actuator_ctrlrange
        lb = bounds[:, 0]
        ub = bounds[:, 1]
        return lb, ub

    @property
    def _state(self):
        return np.concatenate([
            self.model.data.qpos.flat,
            self.model.data.qvel.flat
        ])
45
gym/envs/mujoco/reacher.py
Normal file
@@ -0,0 +1,45 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

class ReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self):
        utils.EzPickle.__init__(self)
        mujoco_env.MujocoEnv.__init__(self, 'reacher.xml', 2)
        self.finalize()

    def _step(self, a):
        vec = self.get_body_com("fingertip")-self.get_body_com("target")
        reward_dist = - np.linalg.norm(vec)
        reward_ctrl = - np.square(a).sum()
        reward = reward_dist + reward_ctrl
        self.do_simulation(a, self.frame_skip)
        ob = self._get_obs()
        done = False
        return ob, reward, done, dict(reward_dist=reward_dist, reward_ctrl=reward_ctrl)

    def viewer_setup(self):
        self.viewer.cam.trackbodyid=0

    def _reset(self):
        qpos = np.random.uniform(low=-0.1, high=0.1, size=(self.model.nq,1)) + self.init_qpos
        while True:
            self.goal = np.random.uniform(low=-.2, high=.2, size=(2,1))
            if np.linalg.norm(self.goal) < 2: break
        qpos[-2:] = self.goal
        self.model.data.qpos = qpos
        qvel = self.init_qvel + np.random.rand(self.model.nv,1)*.01-.005
        qvel[-2:] = 0
        self.model.data.qvel = qvel
        self.reset_viewer_if_necessary()
        return self._get_obs()

    def _get_obs(self):
        theta = self.model.data.qpos.flat[:2]
        return np.concatenate([
            np.cos(theta),
            np.sin(theta),
            self.model.data.qpos.flat[2:],
            self.model.data.qvel.flat[:2],
            self.get_body_com("fingertip") - self.get_body_com("target")
        ])
35
gym/envs/mujoco/swimmer.py
Normal file
@@ -0,0 +1,35 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

class SwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self):
        mujoco_env.MujocoEnv.__init__(self, 'swimmer.xml', 4)
        utils.EzPickle.__init__(self)
        self.ctrl_cost_coeff = 0.0001
        self.finalize()

    def _step(self, a):
        xposbefore = self.model.data.qpos[0,0]
        self.do_simulation(a, self.frame_skip)
        xposafter = self.model.data.qpos[0,0]
        reward_fwd = (xposafter - xposbefore) / self.dt
        reward_ctrl = - self.ctrl_cost_coeff * np.square(a).sum()
        reward = reward_fwd + reward_ctrl
        ob = self._get_obs()
        return ob, reward, False, dict(reward_fwd=reward_fwd, reward_ctrl=reward_ctrl)


    def _get_obs(self):
        qpos = self.model.data.qpos
        qvel = self.model.data.qvel
        return np.concatenate([
            qpos.flat[2:],
            qvel.flat
        ])

    def _reset(self):
        self.model.data.qpos = self.init_qpos + np.random.uniform(size=(self.model.nq,1),low=-.1,high=.1)
        self.model.data.qvel = self.init_qvel + np.random.uniform(size=(self.model.nv,1),low=-.1,high=.1)
        self.reset_viewer_if_necessary()
        return self._get_obs()
41
gym/envs/mujoco/walker2d.py
Normal file
@@ -0,0 +1,41 @@
import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env

# copied from hopper
class Walker2dEnv(mujoco_env.MujocoEnv, utils.EzPickle):

    def __init__(self):
        mujoco_env.MujocoEnv.__init__(self, "walker2d.xml", 4)
        utils.EzPickle.__init__(self)
        self.finalize()

    def _step(self, a):
        posbefore = self.model.data.qpos[0,0]
        self.do_simulation(a, self.frame_skip)
        posafter,height,ang = self.model.data.qpos[0:3,0]
        alive_bonus = 1.0
        reward = ((posafter - posbefore) / self.dt )
        reward += alive_bonus
        reward -= 1e-3 * np.square(a).sum()
        done = not (height > 0.8 and height < 2.0
                    and ang > -1.0 and ang < 1.0)
        ob = self._get_obs()
        return ob, reward, done, {}

    def _get_obs(self):
        qpos = self.model.data.qpos
        qvel = self.model.data.qvel
        return np.concatenate([qpos[1:], np.clip(qvel,-10,10)]).ravel()

    def _reset(self):
        self.model.data.qpos = self.init_qpos + np.random.rand(self.model.nq,1)*.01-.005
        self.model.data.qvel = self.init_qvel + np.random.rand(self.model.nv,1)*.01-.005
        self.reset_viewer_if_necessary()
        return self._get_obs()

    def viewer_setup(self):
        self.viewer.cam.trackbodyid = 2
        self.viewer.cam.distance = self.model.stat.extent * 0.5
        self.viewer.cam.lookat[2] += .8
        self.viewer.cam.elevation = -20
115
gym/envs/registration.py
Normal file
@@ -0,0 +1,115 @@
import logging
import pkg_resources
import re
import six
import sys
from gym import error

logger = logging.getLogger(__name__)
# This format is true today, but it's *not* an official spec.
env_id_re = re.compile(r'^([\w:-]+)-v(\d+)$')

def load(name):
    entry_point = pkg_resources.EntryPoint.parse('x={}'.format(name))
    try:
        result = entry_point.load(False)
    except ImportError as e:
        _, _, traceback = sys.exc_info()
        new_e = ImportError("{} (while loading {})".format(e, name))
        six.reraise(type(new_e), new_e, traceback)
    else:
        return result

class EnvSpec(object):
    """A specification for a particular instance of the environment. Used
    to register the parameters for official evaluations.

    Args:
        id (str): The official environment ID
        entry_point (str): The Python entrypoint of the environment class (e.g. module.name:Class)
        timestep_limit (int): The max number of timesteps per episode during training
        trials (int): The number of trials to average reward over
        reward_threshold (Optional[int]): The reward threshold before the task is considered solved
        kwargs (dict): The kwargs to pass to the environment class

    Attributes:
        id (str): The official environment ID
        timestep_limit (int): The max number of timesteps per episode in official evaluation
        trials (int): The number of trials run in official evaluation
    """

    def __init__(self, id, entry_point, timestep_limit=1000, trials=100, reward_threshold=None, kwargs=None):
        self.id = id
        # Evaluation parameters
        self.timestep_limit = timestep_limit
        self.trials = trials
        self.reward_threshold = reward_threshold

        # We may make some of these other parameters public if they're
        # useful.
        match = env_id_re.search(id)
        if not match:
            raise error.Error('Attempted to register malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id, env_id_re.pattern))
        self._entry_point = entry_point
        self._kwargs = {} if kwargs is None else kwargs

    def make(self):
        """Instantiates an instance of the environment with appropriate kwargs"""
        cls = load(self._entry_point)
        try:
            env = cls(**self._kwargs)
        except TypeError as e:
            type, value, traceback = sys.exc_info()

            # This likely indicates unsupported kwargs
            six.reraise(type, """Could not 'make' {} ({}): {}.

(For reference, the environment was instantiated with kwargs: {}).""".format(self.id, cls, e.message, self._kwargs), traceback)

        # Make the environment aware of which spec it came from.
        env.spec = self
        return env

    def __repr__(self):
        return "EnvSpec({})".format(self.id)


class EnvRegistry(object):
    """Register an env by ID. IDs remain stable over time and are
    guaranteed to resolve to the same environment dynamics (or be
    desupported). The goal is that results on a particular environment
    should always be comparable, and not depend on the version of the
    code that was running.
    """

    def __init__(self):
        self.env_specs = {}

    def make(self, id):
        logger.info('Making new env: %s', id)
        spec = self.spec(id)
        return spec.make()

    def all(self):
        return self.env_specs.values()

    def spec(self, id):
        match = env_id_re.search(id)
        if not match:
            raise error.Error('Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id.encode('utf-8'), env_id_re.pattern))

        try:
            return self.env_specs[id]
        except KeyError:
            raise error.UnregisteredEnv('No registered env with id: {}'.format(id))

    def register(self, id, entry_point, **kwargs):
        if id in self.env_specs:
            raise error.Error('Cannot re-register id: {}'.format(id))
        self.env_specs[id] = EnvSpec(id, entry_point, **kwargs)

# Have a global registry
registry = EnvRegistry()
register = registry.register
make = registry.make
spec = registry.spec
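Both `EnvSpec.__init__` and `EnvRegistry.spec` validate IDs against `env_id_re` before doing anything else. As a minimal sketch of what that pattern accepts, the helper below splits an ID into its name and integer version using the same regular expression. `parse_env_id` is a hypothetical name that does not appear in registration.py, and it raises a plain `ValueError` rather than `gym.error.Error` so that it runs without gym installed.

```python
import re

# Same pattern as env_id_re in registration.py.
ENV_ID_RE = re.compile(r'^([\w:-]+)-v(\d+)$')


def parse_env_id(env_id):
    """Return (name, version), or raise ValueError for a malformed ID."""
    match = ENV_ID_RE.search(env_id)
    if match is None:
        raise ValueError('malformed environment ID: {!r}'.format(env_id))
    return match.group(1), int(match.group(2))
```

An ID such as `CartPole-v0` parses to `('CartPole', 0)`, while an ID wrapped in curly quotes (the case covered by `test_malformed_lookup`) fails because the quotes are not in the allowed character class.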
35
gym/envs/tests/test_envs.py
Normal file
@@ -0,0 +1,35 @@
import numpy as np
from nose2 import tools
from gym import envs

# This runs a smoketest on each officially registered env. We may also
# want to run environments that are not officially registered.
specs = [spec for spec in envs.registry.all() if (not spec.id.startswith("atari")) or ("space_invaders" in spec.id)] # only test space invaders out of atari games
@tools.params(*specs)
def test_env(spec):
    env = spec.make()
    ob_space = env.observation_space
    act_space = env.action_space
    ob = env.reset()
    assert ob_space.contains(ob), 'Reset observation: {!r} not in space'.format(ob)
    a = act_space.sample()
    observation, reward, done, _info = env.step(a)
    assert ob_space.contains(observation), 'Step observation: {!r} not in space'.format(observation)
    assert np.isscalar(reward), "{} is not a scalar for {}".format(reward, env)
    assert isinstance(done, bool), "Expected {} to be a boolean".format(done)

    # Default to no modes so envs without 'render.modes' metadata are skipped
    for mode in env.metadata.get('render.modes', []):
        env.render(mode=mode)

# Run a longer rollout on some environments
def test_random_rollout():
    for env in [envs.make('CartPole-v0'), envs.make('FrozenLake-v0')]:
        agent = lambda ob: env.action_space.sample()
        ob = env.reset()
        for _ in xrange(10):
            assert env.observation_space.contains(ob)
            a = agent(ob)
            assert env.action_space.contains(a)
            (ob, _reward, done, _info) = env.step(a)
            if done: break
35
gym/envs/tests/test_registration.py
Normal file
@@ -0,0 +1,35 @@
# -*- coding: utf-8 -*-
from gym import error, envs
from gym.envs import registration
from gym.envs.classic_control import cartpole

def test_make():
    env = envs.make('CartPole-v0')
    assert env.spec.id == 'CartPole-v0'
    assert isinstance(env, cartpole.CartPoleEnv)

def test_spec():
    spec = envs.spec('CartPole-v0')
    assert spec.id == 'CartPole-v0'

def test_missing_lookup():
    registry = registration.EnvRegistry()
    registry.register(id='Test-v0', entry_point=None)
    registry.register(id='Test-v15', entry_point=None)
    registry.register(id='Test-v9', entry_point=None)
    registry.register(id='Other-v100', entry_point=None)
    try:
        registry.spec('Test-v1')
    except error.UnregisteredEnv:
        pass
    else:
        assert False

def test_malformed_lookup():
    registry = registration.EnvRegistry()
    try:
        registry.spec(u'“Breakout-v0”')
    except error.Error as e:
        assert 'malformed environment ID' in e.message, 'Unexpected message: {}'.format(e)
    else:
        assert False
2
gym/envs/toy_text/__init__.py
Normal file
@@ -0,0 +1,2 @@
from gym.envs.toy_text.roulette import RouletteEnv
from gym.envs.toy_text.frozen_lake import FrozenLakeEnv
40
gym/envs/toy_text/discrete.py
Normal file
@@ -0,0 +1,40 @@
from gym import Env
from gym import spaces
import numpy as np

def categorical_sample(prob_n):
    """
    Sample an index from a categorical distribution.
    prob_n is a 1-D sequence of class probabilities.
    """
    prob_n = np.asarray(prob_n)
    csprob_n = np.cumsum(prob_n)
    return (csprob_n > np.random.rand()).argmax()


class DiscreteEnv(Env):
    def __init__(self, nS, nA, P, isd):
        """
        Store the transition probabilities, of the form
        P[s][a] == [(probability, nextstate, reward, done), ...]

        and the initial state distribution isd
        """
        self.action_space = spaces.Discrete(nA)
        self.observation_space = spaces.Discrete(nS)
        self.nA = nA
        self.P = P
        self.isd = isd
        self.lastaction=None # for rendering

    def _reset(self):
        self.s = categorical_sample(self.isd)
        return self.s

    def _step(self, a):
        transitions = self.P[self.s][a]
        i = categorical_sample([t[0] for t in transitions])
        p, s, r, d= transitions[i]
        self.s = s
        self.lastaction=a
        return (s, r, d, {"prob" : p})
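`categorical_sample` picks the first index whose cumulative probability exceeds a uniform draw. The sketch below shows the same inverse-CDF idea in pure Python, without NumPy. The optional `draw` parameter is an addition for illustration only, so the result can be checked deterministically; it is not part of discrete.py.

```python
import random


def categorical_sample(probs, draw=None):
    """Return the first index whose cumulative probability exceeds the draw."""
    if draw is None:
        draw = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if cumulative > draw:
            return index
    # Mirrors argmax on an all-False array, which returns 0.
    return 0
```

Because the comparison is strict, an entry with probability 0 can never be selected: its cumulative value equals the previous one, which has already failed the test.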
127
gym/envs/toy_text/frozen_lake.py
Normal file
@@ -0,0 +1,127 @@
import numpy as np
import StringIO, sys

from gym import utils
from gym.envs.toy_text import discrete

# Action numbering, matching inc() and the labels used by _render()
LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3

MAPS = {
    "4x4": [
        "SFFF",
        "FHFH",
        "FFFH",
        "HFFG"
    ],
    "8x8": [
        "SFFFFFFF",
        "FFFFFFFF",
        "FFFHFFFF",
        "FFFFFHFF",
        "FFFHFFFF",
        "FHHFFFHF",
        "FHFFHFHF",
        "FFFHFFFG"
    ],
}

class FrozenLakeEnv(discrete.DiscreteEnv):
    """
    Winter is here. You and your friends were tossing around a frisbee at the park
    when you made a wild throw that left the frisbee out in the middle of the lake.
    The water is mostly frozen, but there are a few holes where the ice has melted.
    If you step into one of those holes, you'll fall into the freezing water.
    At this time, there's an international frisbee shortage, so it's absolutely imperative that
    you navigate across the lake and retrieve the disc.
    However, the ice is slippery, so you won't always move in the direction you intend.
    The surface is described using a grid like the following

        SFFF
        FHFH
        FFFH
        HFFG

    S : starting point, safe
    F : frozen surface, safe
    H : hole, fall to your doom
    G : goal, where the frisbee is located

    The episode ends when you reach the goal or fall in a hole.
    You receive a reward of 1 if you reach the goal, and zero otherwise.

    """

    metadata = {'render.modes': ['human', 'ansi']}

    def __init__(self, desc=None, map_name="4x4",is_slippery=True):
        if desc is None and map_name is None:
            raise ValueError('Must provide either desc or map_name')
        elif desc is None:
            desc = MAPS[map_name]
        self.desc = desc = np.asarray(desc,dtype='c')
        self.nrow, self.ncol = nrow, ncol = desc.shape

        nA = 4
        nS = nrow * ncol

        isd = (desc == 'S').ravel().astype('float64')
        isd /= isd.sum()

        P = {s : {a : [] for a in xrange(nA)} for s in xrange(nS)}

        def to_s(row, col):
            return row*ncol + col
        def inc(row, col, a):
            if a==0: # left
                col = max(col-1,0)
            elif a==1: # down
                row = min(row+1,nrow-1)
            elif a==2: # right
                col = min(col+1,ncol-1)
            elif a==3: # up
                row = max(row-1,0)
            return (row, col)

        for row in xrange(nrow):
            for col in xrange(ncol):
                s = to_s(row, col)
                for a in xrange(4):
                    li = P[s][a]
                    if is_slippery:
                        # The intended move and its two perpendicular
                        # neighbours are equally likely
                        for b in [(a-1)%4, a, (a+1)%4]:
                            newrow, newcol = inc(row, col, b)
                            newstate = to_s(newrow, newcol)
                            letter = desc[newrow, newcol]
                            done = letter in 'GH'
                            rew = float(letter == 'G')
                            li.append((1.0/3.0, newstate, rew, done))
                    else:
                        newrow, newcol = inc(row, col, a)
                        newstate = to_s(newrow, newcol)
                        letter = desc[newrow, newcol]
                        done = letter in 'GH'
                        rew = float(letter == 'G')
                        # Deterministic move: the single outcome has probability 1
                        li.append((1.0, newstate, rew, done))

        super(FrozenLakeEnv, self).__init__(nrow * ncol, 4, P, isd)

    def _render(self, mode='human', close=False):
        if close:
            return

        outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout

        row, col = self.s // self.ncol, self.s % self.ncol
        desc = self.desc.tolist()
        desc[row][col] = utils.colorize(desc[row][col], "red", highlight=True)

        outfile.write("\n".join("".join(row) for row in desc)+"\n")
        if self.lastaction is not None:
            outfile.write("  ({})\n".format(["Left","Down","Right","Up"][self.lastaction]))
        else:
            outfile.write("\n")

        return outfile
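The slippery branch of `FrozenLakeEnv.__init__` spreads each action over three moves: the intended one and its two neighbours in the action numbering used by `inc` and `_render` (0 = left, 1 = down, 2 = right, 3 = up). The following is a minimal sketch of that rule on a plain list-of-strings map. `move` and `slippery_transitions` are hypothetical helper names, not functions from frozen_lake.py.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def move(row, col, action, nrow, ncol):
    """Apply one action, clamping at the grid edges (same rule as inc)."""
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, nrow - 1)
    elif action == RIGHT:
        col = min(col + 1, ncol - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row, col


def slippery_transitions(desc, row, col, action):
    """Return [(probability, next_state, reward, done)] for one state-action pair."""
    nrow, ncol = len(desc), len(desc[0])
    transitions = []
    for actual in [(action - 1) % 4, action, (action + 1) % 4]:
        new_row, new_col = move(row, col, actual, nrow, ncol)
        letter = desc[new_row][new_col]
        transitions.append((1.0 / 3.0, new_row * ncol + new_col,
                            float(letter == 'G'), letter in 'GH'))
    return transitions
```

Because the neighbours are taken modulo 4, the move opposite to the intended one is never sampled: asking for RIGHT can slip to DOWN or UP, but not to LEFT.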
40
gym/envs/toy_text/roulette.py
Normal file
@@ -0,0 +1,40 @@
import numpy as np

import gym
from gym import spaces


class RouletteEnv(gym.Env):
    """Simple roulette environment

    The roulette wheel has 37 spots. If the bet is 0 and a 0 comes up,
    you win a reward of 35. If the parity of your bet matches the parity
    of the spin, you win 1. Otherwise you receive a reward of -1.

    The long run reward for playing 0 should be -1/37 for any state

    The last action (index 37 with the default 37 spots) stops the rollout for a return of 0 (walking away)
    """
    def __init__(self, spots=37):
        self.n = spots + 1
        self.action_space = spaces.Discrete(self.n)
        self.observation_space = spaces.Discrete(1)

    def _step(self, action):
        assert(action >= 0 and action < self.n)
        if action == self.n - 1:
            # observation, reward, done, info
            return 0, 0, True, {}

        # N.B. np.random.randint draws from [A, B) while random.randint draws from [A,B]
        val = np.random.randint(0, self.n - 1)
        if val == action == 0:
            reward = self.n - 2.0
        elif val != 0 and action != 0 and val % 2 == action % 2:
            reward = 1.0
        else:
            reward = -1.0
        return 0, reward, False, {}

    def _reset(self):
        return 0
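The docstring's claim about the long-run reward of betting on 0 can be checked with a short exact calculation. This sketch assumes every spot is equally likely and treats the payout as a parameter; `expected_reward_betting_zero` is a hypothetical helper, not part of roulette.py. Note that with the default `spots=37` the expression `self.n - 2.0` in `_step` evaluates to 36.0 rather than the 35 stated in the docstring, so both payouts are worth comparing.

```python
from fractions import Fraction


def expected_reward_betting_zero(spots=37, payout=35):
    """Expected reward of one bet on 0 when every spot is equally likely."""
    p_win = Fraction(1, spots)
    # Win `payout` with probability 1/spots, lose 1 otherwise.
    return p_win * payout + (1 - p_win) * (-1)
```

With a payout of 35 the expectation is -1/37, matching the docstring; with a payout of 36 the bet is exactly break-even.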
135
gym/envs/toy_text/taxi.py
Normal file
@@ -0,0 +1,135 @@
import numpy as np
import StringIO, sys

from gym import spaces, utils
from gym.envs.toy_text import discrete

MAP = [
    "+---------+",
    "|R: | : :G|",
    "| : : : : |",
    "| : : : : |",
    "| | :F| : |",
    "|Y| : |B: |",
    "+---------+",
]

class TaxiEnv(discrete.DiscreteEnv):
    """
    The Taxi Problem
    from "Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition"
    by Tom Dietterich

    rendering:
    - blue: passenger
    - magenta: destination
    - yellow: empty taxi
    - green: full taxi
    - other letters: locations

    """
    metadata = {'render.modes': ['human', 'ansi']}

    def __init__(self):
        self.desc = np.asarray(MAP,dtype='c')

        self.locs = locs = [(0,0), (0,4), (4,0), (3,2), (4,3)]

        nS = 500
        nR = 5
        nC = 5
        maxR = nR-1
        maxC = nC-1
        isd = np.zeros(nS)
        nA = 6
        P = {s : {a : [] for a in xrange(nA)} for s in xrange(nS)}
        for row in xrange(5):
            for col in xrange(5):
                for passidx in xrange(5):
                    for destidx in xrange(4):
                        for a in xrange(nA):
                            state = self.encode(row, col, passidx, destidx)
                            # defaults
                            newrow, newcol, newpassidx = row, col, passidx
                            reward = -1
                            done = False
                            taxiloc = (row, col)

                            if a==0: # row index grows downwards, so this is south
                                newrow = min(row+1, maxR)
                            elif a==1: # north
                                newrow = max(row-1, 0)
                            if a==2 and self.desc[1+row,2*col+2]==":":
                                newcol = min(col+1, maxC)
                            elif a==3 and self.desc[1+row,2*col]==":":
                                newcol = max(col-1, 0)
                            elif a==4: # pickup
                                if (taxiloc == locs[passidx]):
                                    newpassidx = 4
                                else:
                                    reward = -10
                            elif a==5: # dropoff
                                if (taxiloc == locs[destidx]) and passidx==4:
                                    done = True
                                elif (taxiloc in locs) and passidx==4:
                                    newpassidx = locs.index(taxiloc)
                                else:
                                    reward = -10
                            newstate = self.encode(newrow, newcol, newpassidx, destidx)
                            if passidx < 4: isd[state] += 1
                            P[state][a].append((1.0, newstate, reward, done))
        isd /= isd.sum()
        discrete.DiscreteEnv.__init__(self, nS, nA, P, isd)

        self.observation_space = spaces.Discrete(500)
        self.action_space = spaces.Discrete(6)

    def encode(self, taxirow, taxicol, passloc, destidx):
        # (5) 5, 5, 4
        i = taxirow
        i *= 5
        i += taxicol
        i *= 5
        i += passloc
        i *= 4
        i += destidx
        return i

    def decode(self, i):
        out = []
        out.append(i % 4)
        i = i // 4
        out.append(i % 5)
        i = i // 5
        out.append(i % 5)
        i = i // 5
        out.append(i)
        assert 0 <= i < 5
        return reversed(out)

    def _render(self, mode='human', close=False):
        if close:
            return

        outfile = StringIO.StringIO() if mode == 'ansi' else sys.stdout

        out = self.desc.copy().tolist()
        taxirow, taxicol, passidx, destidx = self.decode(self.s)
        def ul(x): return "_" if x == " " else x
        if passidx < 4:
            out[1+taxirow][2*taxicol+1] = utils.colorize(out[1+taxirow][2*taxicol+1], 'yellow', highlight=True)
            pi, pj = self.locs[passidx]
            out[1+pi][2*pj+1] = utils.colorize(out[1+pi][2*pj+1], 'blue', bold=True)
        else: # passenger in taxi
            out[1+taxirow][2*taxicol+1] = utils.colorize(ul(out[1+taxirow][2*taxicol+1]), 'green', highlight=True)

        di, dj = self.locs[destidx]
        out[1+di][2*dj+1] = utils.colorize(out[1+di][2*dj+1], 'magenta')
        outfile.write("\n".join(["".join(row) for row in out])+"\n")
        if self.lastaction is not None:
            # Action 0 increases the row index (moves down the map), so it is South
            outfile.write("  ({})\n".format(["South", "North", "East", "West", "Pickup", "Dropoff"][self.lastaction]))
        else: outfile.write("\n")

        # No need to return anything for human
        if mode != 'human':
            return outfile
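`TaxiEnv.encode` packs four small integers into one state index using mixed-radix arithmetic with bases (5, 5, 5, 4), and `decode` undoes it. As a rough standalone sketch of the same arithmetic, written as free functions rather than methods and using `divmod` instead of the step-by-step updates in taxi.py:

```python
def encode(taxirow, taxicol, passloc, destidx):
    """Pack (row, col, passenger, destination) into a single index in [0, 500)."""
    return ((taxirow * 5 + taxicol) * 5 + passloc) * 4 + destidx


def decode(i):
    """Invert encode by peeling off the digits from least to most significant."""
    i, destidx = divmod(i, 4)
    i, passloc = divmod(i, 5)
    taxirow, taxicol = divmod(i, 5)
    return taxirow, taxicol, passloc, destidx
```

The largest tuple, `(4, 4, 4, 3)`, maps to 499, which is why the environment declares `nS = 500`.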
97
gym/error.py
Normal file
@@ -0,0 +1,97 @@
import sys

class Error(Exception):
    pass

# Local errors

class UnregisteredEnv(Error):
    """Raised when the user requests an env from the registry that does
    not actually exist.
    """
    pass

class DependencyNotInstalled(Error):
    pass

class UnsupportedMode(Exception):
    """Raised when the user requests a rendering mode not supported by the
    environment.
    """
    pass

class ResetNeeded(Exception):
    """When the monitor is active, raised when the user tries to step an
    environment that's already done.
    """
    pass

class ResetNotAllowed(Exception):
    """When the monitor is active, raised when the user tries to reset an
    environment that's not yet done.
    """
    pass

# API errors

class APIError(Error):
    def __init__(self, message=None, http_body=None, http_status=None,
                 json_body=None, headers=None):
        super(APIError, self).__init__(message)

        if http_body and hasattr(http_body, 'decode'):
            try:
                http_body = http_body.decode('utf-8')
            except:
                http_body = ('<Could not decode body as utf-8. '
                             'Please report to gym@openai.com>')

        self._message = message
        self.http_body = http_body
        self.http_status = http_status
        self.json_body = json_body
        self.headers = headers or {}
        self.request_id = self.headers.get('request-id', None)

    def __unicode__(self):
        if self.request_id is not None:
            msg = self._message or "<empty message>"
            return u"Request {0}: {1}".format(self.request_id, msg)
        else:
            return self._message

    if sys.version_info > (3, 0):
        def __str__(self):
            return self.__unicode__()
    else:
        def __str__(self):
            return unicode(self).encode('utf-8')


class APIConnectionError(APIError):
    pass


class InvalidRequestError(APIError):

    def __init__(self, message, param, http_body=None,
                 http_status=None, json_body=None, headers=None):
        super(InvalidRequestError, self).__init__(
            message, http_body, http_status, json_body,
            headers)
        self.param = param


class AuthenticationError(APIError):
    pass

class RateLimitError(APIError):
    pass

# Video errors

class VideoRecorderError(Error):
    pass

class InvalidFrame(Error):
    pass
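`APIError` prefixes its message with the request ID when the response headers carry one. The sketch below isolates that formatting rule in a Python 3 class with no HTTP fields. `APIErrorSketch` is a hypothetical name, and unlike the original `__unicode__` it falls back to an empty string when no message was given, which is an assumption made so that `str()` never receives `None`.

```python
class APIErrorSketch(Exception):
    """Illustrative subset of APIError: message plus optional request ID."""

    def __init__(self, message=None, headers=None):
        super().__init__(message)
        self._message = message
        self.headers = headers or {}
        self.request_id = self.headers.get('request-id')

    def __str__(self):
        if self.request_id is not None:
            msg = self._message or '<empty message>'
            return 'Request {0}: {1}'.format(self.request_id, msg)
        # Assumption: return '' instead of None when there is no message.
        return self._message or ''
```

Keeping the request ID in the string form means it shows up in tracebacks and logs without callers having to inspect the headers themselves.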
3
gym/monitoring/__init__.py
Normal file
@@ -0,0 +1,3 @@
from gym.monitoring.monitor import Monitor, load_results, monitors as _monitors
from gym.monitoring.stats_recorder import StatsRecorder
from gym.monitoring.video_recorder import VideoRecorder
328
gym/monitoring/monitor.py
Normal file
@@ -0,0 +1,328 @@
import atexit
import logging
import json
import numpy as np
import os
import six
import sys
import threading
import weakref

from gym import error, version
from gym.monitoring import stats_recorder, video_recorder

logger = logging.getLogger(__name__)

FILE_PREFIX = 'openaigym'
MANIFEST_PREFIX = FILE_PREFIX + '.manifest'

i = -1
lock = threading.Lock()
def next_monitor_id():
    global i
    with lock:
        i += 1
        return i

def detect_training_manifests(training_dir):
    return [os.path.join(training_dir, f) for f in os.listdir(training_dir) if f.startswith(MANIFEST_PREFIX + '.')]

def detect_monitor_files(training_dir):
    return [os.path.join(training_dir, f) for f in os.listdir(training_dir) if f.startswith(FILE_PREFIX + '.')]

def clear_monitor_files(training_dir):
    files = detect_monitor_files(training_dir)
    if len(files) == 0:
        return

    logger.info('Clearing %d monitor files from previous run (because force=True was provided)', len(files))
    for file in files:
        os.unlink(file)

def capped_cubic_video_schedule(episode_id):
    if episode_id < 1000:
        return int(round(episode_id ** (1. / 3))) ** 3 == episode_id
    else:
        return episode_id % 1000 == 0

# Monitors will automatically close themselves when garbage collected
# (via __del__) or when the program exits (via close_all_monitors's
# atexit behavior).
monitors = weakref.WeakValueDictionary()
def ensure_close_at_exit(monitor):
    monitors[monitor.monitor_id] = monitor

@atexit.register
def close_all_monitors():
    for key, monitor in monitors.items():
        monitor.close()

class Monitor(object):
    """A configurable monitor for your training runs.

    Every env has an attached monitor, which you can access as
    'env.monitor'. Simple usage is just to call 'monitor.start(dir)'
    to begin monitoring and 'monitor.close()' when training is
    complete. This will record stats and will periodically record a video.

    For finer-grained control over how often videos are collected, use the
    video_callable argument, e.g.
    'monitor.start(dir, video_callable=lambda count: count % 100 == 0)'
    to record every 100 episodes. ('count' is how many episodes have completed)

    Depending on the environment, video can slow down execution. You
    can also use 'monitor.configure(video_callable=lambda count: False)' to disable
    video.

    Monitor supports multiple threads and multiple processes writing
    to the same directory of training data. The data will later be
    joined by scoreboard.upload_training_data and on the server.

    Args:
        env (gym.Env): The environment instance to monitor.

    Attributes:
        id (Optional[str]): The ID of the monitored environment

    """

    def __init__(self, env):
        self.env = env
        self.videos = []

        self.stats_recorder = None
        self.video_recorder = None
        self.enabled = False
        self.episode_id = 0

        self.monitor_id = next_monitor_id()

        ensure_close_at_exit(self)

    def start(self, directory, video_callable=None, force=False):
        """Start monitoring.

        Args:
            directory (str): A per-training-run directory in which to record stats.
            video_callable: function that takes in the index of the episode and outputs a boolean, indicating whether we should record a video on this episode. The default records episodes whose index is a perfect cube below 1000, then every 1000th episode.
            force (bool): Clear out existing training data from this directory (by deleting every file prefixed with "openaigym.").
        """
        if self.env.spec is None:
            logger.warn("Trying to monitor an environment which has no 'spec' set. This usually means you did not create it via 'gym.make', and is recommended only for advanced users.")

        if not os.path.exists(directory):
            logger.info('Creating monitor directory %s', directory)
            os.makedirs(directory)

        if video_callable is None:
            video_callable = capped_cubic_video_schedule

        # Check on whether we need to clear anything
        if force:
            clear_monitor_files(directory)
        else:
            training_manifests = detect_training_manifests(directory)
            if len(training_manifests) > 0:
                raise error.Error('''Trying to write to monitor directory {} with existing monitor files: {}.

You should use a unique directory for each training run, or use 'force=True' to automatically clear previous monitor files.'''.format(directory, ', '.join(training_manifests[:5])))


        self.enabled = True
        self.directory = os.path.abspath(directory)
        # We use the 'openaigym' prefix to determine if a file is
        # ours
        self.file_prefix = FILE_PREFIX
        self.file_infix = str(self.monitor_id)
        self.stats_recorder = stats_recorder.StatsRecorder(directory, '{}.episode_batch.{}'.format(self.file_prefix, self.file_infix))
        self.configure(video_callable=video_callable)
        if not os.path.exists(directory):
            os.mkdir(directory)

    def close(self):
        """Flush all monitor data to disk and close any open rendering windows."""
        if not self.enabled:
            return
        stats_file = None

        if self.stats_recorder:
            stats_file = self.stats_recorder.close()
        if self.video_recorder is not None:
            self._close_video_recorder()
        # Note we'll close the env's rendering window even if we did
        # not open it. There isn't a particularly great way to know if
        # we did, since some environments will have a window pop up
        # during video recording.
        try:
            self.env.render(close=True)
        except Exception:
            type, value, traceback = sys.exc_info()
            if self.env.spec:
                key = self.env.spec.id
            else:
                key = self.env
            # Re-raise with a note about which environment failed to close
            six.reraise(type, '{} (when closing {})'.format(value, key), traceback)

        # Give it a very distinguished name, since we need to pick it
        # up from the filesystem later.
        path = os.path.join(self.directory, '{}.manifest.{}.{}.manifest.json'.format(self.file_prefix, self.file_infix, os.getpid()))
        logger.debug('Writing training manifest file to %s', path)
        with open(path, 'w') as f:
            # We need to write relative paths here since people may
            # move the training_dir around. It would be cleaner to
            # already have the basenames rather than basename'ing
            # manually, but this works for now.
            json.dump({
                'stats': os.path.basename(stats_file),
                'videos': [(os.path.basename(v), os.path.basename(m))
                           for v, m in self.videos],
                'env_info': self._env_info(),
            }, f)
        self.enabled = False
        # Stop tracking this for autoclose
        del monitors[self.monitor_id]

        logger.info('''Finished writing results. You can upload them to the scoreboard via gym.upload(%r)''', self.directory)

    def configure(self, video_callable=None):
        """Reconfigure the monitor.

        video_callable (function): Takes the episode index and returns whether to record a video of that episode for upload to the scoreboard.
        """
        if video_callable is not None:
            self.video_callable = video_callable

    def _before_step(self, action):
        if not self.enabled: return
        self.stats_recorder.before_step(action)

    def _after_step(self, observation, reward, done, info):
        if not self.enabled: return done

        # Add 1 since about to take another step
        if self.env.spec and self.stats_recorder.steps+1 >= self.env.spec.timestep_limit:
            logger.info('Ending episode %i because it reached the timestep limit of %i.', self.episode_id, self.env.spec.timestep_limit)
            done = True

        # Record stats
        self.stats_recorder.after_step(observation, reward, done, info)
        # Record video
        self.video_recorder.capture_frame()

        return done


    def _before_reset(self):
        if not self.enabled: return
        self.stats_recorder.before_reset()

    def _after_reset(self, observation):
        if not self.enabled: return

        # Reset the stat count
        self.stats_recorder.after_reset(observation)

        # Close any existing video recorder
        if self.video_recorder:
            self._close_video_recorder()

        # Start recording the next video.
        self.video_recorder = video_recorder.VideoRecorder(
            env=self.env,
            base_path=os.path.join(self.directory, '{}.video.{}.{}.video{:06}'.format(self.file_prefix, self.file_infix, os.getpid(), self.episode_id)),
            metadata={'episode_id': self.episode_id},
            enabled=self._video_enabled(),
        )
        self.video_recorder.capture_frame()

        # Bump *after* all reset activity has finished
        self.episode_id += 1

    def _close_video_recorder(self):
        self.video_recorder.close()
        if self.video_recorder.functional:
            self.videos.append((self.video_recorder.path, self.video_recorder.metadata_path))

    def _video_enabled(self):
        return self.video_callable(self.episode_id)

    def _env_info(self):
        if self.env.spec:
            return {
                'env_id': self.env.spec.id,
                'gym_version': version.VERSION,
            }
        else:
            return {}

    def __del__(self):
        # Make sure we've closed up shop when garbage collecting
        self.close()

def load_results(training_dir):
|
||||
if not os.path.exists(training_dir):
|
||||
return
|
||||
|
||||
manifests = detect_training_manifests(training_dir)
|
||||
if not manifests:
|
||||
return
|
||||
|
||||
logger.debug('Uploading data from manifest %s', ', '.join(manifests))
|
||||
|
||||
# Load up stats + video files
|
||||
stats_files = []
|
||||
videos = []
|
||||
env_infos = []
|
||||
|
||||
for manifest in manifests:
|
||||
with open(manifest) as f:
|
||||
contents = json.load(f)
|
||||
# Make these paths absolute again
|
||||
stats_files.append(os.path.join(training_dir, contents['stats']))
|
||||
videos += [(os.path.join(training_dir, v), os.path.join(training_dir, m))
|
||||
for v, m in contents['videos']]
|
||||
env_infos.append(contents['env_info'])
|
||||
|
||||
env_info = collapse_env_infos(env_infos, training_dir)
|
||||
timestamps, episode_lengths, episode_rewards = merge_stats_files(stats_files)
|
||||
|
||||
return {
|
||||
'manifests': manifests,
|
||||
'env_info': env_info,
|
||||
'timestamps': timestamps,
|
||||
'episode_lengths': episode_lengths,
|
||||
'episode_rewards': episode_rewards,
|
||||
'videos': videos,
|
||||
}
|
||||
|
||||
def merge_stats_files(stats_files):
|
||||
timestamps = []
|
||||
episode_lengths = []
|
||||
episode_rewards = []
|
||||
|
||||
for path in stats_files:
|
||||
with open(path) as f:
|
||||
content = json.load(f)
|
||||
timestamps += content['timestamps']
|
||||
episode_lengths += content['episode_lengths']
|
||||
episode_rewards += content['episode_rewards']
|
||||
|
||||
idxs = np.argsort(timestamps)
|
||||
timestamps = np.array(timestamps)[idxs].tolist()
|
||||
episode_lengths = np.array(episode_lengths)[idxs].tolist()
|
||||
episode_rewards = np.array(episode_rewards)[idxs].tolist()
|
||||
return timestamps, episode_lengths, episode_rewards
|
||||
|
||||
def collapse_env_infos(env_infos, training_dir):
|
||||
assert len(env_infos) > 0
|
||||
|
||||
first = env_infos[0]
|
||||
for other in env_infos[1:]:
|
||||
if first != other:
|
||||
raise error.Error('Found two unequal env_infos: {} and {}. This usually indicates that your training directory {} has commingled results from multiple runs.'.format(first, other, training_dir))
|
||||
|
||||
for key in ['env_id', 'gym_version']:
|
||||
if key not in first:
|
||||
raise error.Error("env_info {} from training directory {} is missing expected key {}. This is unexpected and likely indicates a bug in gym.".format(first, training_dir, key))
|
||||
return first
|
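The ordering step in `merge_stats_files` above can be illustrated without NumPy. This is only a minimal sketch under stated assumptions: the name `merge_by_timestamp` is hypothetical, and it takes already-loaded dicts rather than file paths. It concatenates the per-process stats and reorders every series by timestamp, similar in spirit to the `np.argsort` call.

```python
def merge_by_timestamp(batches):
    """Merge per-process stats dicts and order all episodes by timestamp.

    `batches` is a list of dicts with the keys written by StatsRecorder.close():
    'timestamps', 'episode_lengths' and 'episode_rewards'.
    """
    timestamps, lengths, rewards = [], [], []
    for batch in batches:
        timestamps += batch['timestamps']
        lengths += batch['episode_lengths']
        rewards += batch['episode_rewards']

    # Indices that would sort the timestamps, analogous to np.argsort.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    return ([timestamps[i] for i in order],
            [lengths[i] for i in order],
            [rewards[i] for i in order])
```

Applying one index permutation to all three lists keeps each episode's length and reward aligned with its timestamp.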
56
gym/monitoring/stats_recorder.py
Normal file
@@ -0,0 +1,56 @@
|
||||
import json
|
||||
import os
|
||||
import time
|
||||
|
||||
from gym import error
|
||||
|
||||
class StatsRecorder(object):
|
||||
def __init__(self, directory, file_prefix):
|
||||
self.directory = directory
|
||||
self.file_prefix = file_prefix
|
||||
self.episode_lengths = []
|
||||
self.episode_rewards = []
|
||||
self.timestamps = []
|
||||
self.steps = None
|
||||
self.rewards = None
|
||||
|
||||
self.done = None
|
||||
|
||||
def before_step(self, action):
|
||||
if self.done:
|
||||
raise error.ResetNeeded("Trying to step environment which is currently done. While the monitor is active, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.")
|
||||
elif self.steps is None:
|
||||
raise error.ResetNeeded("Trying to step an environment before reset. While the monitor is active, you must call 'env.reset()' before taking an initial step.")
|
||||
|
||||
def after_step(self, observation, reward, done, info):
|
||||
self.steps += 1
|
||||
self.rewards += reward
|
||||
if done:
|
||||
self.done = True
|
||||
|
||||
def before_reset(self):
|
||||
self.done = False
|
||||
|
||||
def after_reset(self, observation):
|
||||
self.flush()
|
||||
|
||||
def flush(self):
|
||||
if self.steps is not None:
|
||||
self.episode_lengths.append(self.steps)
|
||||
self.episode_rewards.append(self.rewards)
|
||||
self.timestamps.append(time.time())
|
||||
self.steps = 0
|
||||
self.rewards = 0
|
||||
|
||||
def close(self):
|
||||
self.flush()
|
||||
|
||||
filename = '{}.{}.stats.json'.format(self.file_prefix, os.getpid())
|
||||
path = os.path.join(self.directory, filename)
|
||||
with open(path, 'w') as f:
|
||||
json.dump({
|
||||
'timestamps': self.timestamps,
|
||||
'episode_lengths': self.episode_lengths,
|
||||
'episode_rewards': self.episode_rewards,
|
||||
}, f)
|
||||
return path
|
67
gym/monitoring/tests/test_video_recorder.py
Normal file
@@ -0,0 +1,67 @@
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import tempfile
|
||||
|
||||
import numpy as np
|
||||
from nose2 import tools
|
||||
|
||||
import gym
|
||||
from gym.monitoring import VideoRecorder
|
||||
|
||||
class BrokenRecordableEnv(object):
|
||||
metadata = {'render.modes': [None, 'rgb_array']}
|
||||
|
||||
def render(self, mode=None):
|
||||
pass
|
||||
|
||||
class UnrecordableEnv(object):
|
||||
metadata = {'render.modes': [None]}
|
||||
|
||||
def render(self, mode=None):
|
||||
pass
|
||||
|
||||
# TODO(jonas): disabled until we have ffmpeg on travis
|
||||
# def test_record_simple():
|
||||
# rec = VideoRecorder()
|
||||
# env, id = gym.make("CartPole")
|
||||
# rec.capture_frame(env)
|
||||
# rec.close()
|
||||
# assert not rec.empty
|
||||
# assert not rec.broken
|
||||
# assert os.path.exists(rec.path)
|
||||
# f = open(rec.path)
|
||||
# assert os.fstat(f.fileno()).st_size > 100
|
||||
|
||||
def test_no_frames():
|
||||
env = BrokenRecordableEnv()
|
||||
rec = VideoRecorder(env)
|
||||
rec.close()
|
||||
assert rec.empty
|
||||
assert rec.functional
|
||||
assert not os.path.exists(rec.path)
|
||||
|
||||
def test_record_unrecordable_method():
|
||||
env = UnrecordableEnv()
|
||||
rec = VideoRecorder(env)
|
||||
assert not rec.enabled
|
||||
rec.close()
|
||||
|
||||
def test_record_breaking_render_method():
|
||||
env = BrokenRecordableEnv()
|
||||
rec = VideoRecorder(env)
|
||||
rec.capture_frame()
|
||||
rec.close()
|
||||
assert rec.empty
|
||||
assert rec.broken
|
||||
assert not os.path.exists(rec.path)
|
||||
|
||||
def test_text_envs():
|
||||
env = gym.make('FrozenLake-v0')
|
||||
video = VideoRecorder(env)
|
||||
try:
|
||||
env.reset()
|
||||
video.capture_frame()
|
||||
video.close()
|
||||
finally:
|
||||
os.remove(video.path)
|
290
gym/monitoring/video_recorder.py
Normal file
@@ -0,0 +1,290 @@
|
||||
import logging
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import tempfile
|
||||
import os.path
|
||||
import distutils.spawn
|
||||
import numpy as np
|
||||
import StringIO
|
||||
|
||||
from gym import error
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def touch(path):
|
||||
open(path, 'a').close()
|
||||
|
||||
class VideoRecorder(object):
|
||||
"""VideoRecorder renders a nice movie of a rollout, frame by frame. It
|
||||
comes with an `enabled` option so you can still use the same code
|
||||
on episodes where you don't want to record video.
|
||||
|
||||
Note:
|
||||
You are responsible for calling `close` on a created
|
||||
VideoRecorder, or else you may leak an encoder process.
|
||||
|
||||
Args:
|
||||
env (Env): Environment to take video of.
|
||||
path (Optional[str]): Path to the video file; will be randomly chosen if omitted.
|
||||
base_path (Optional[str]): Alternatively, path to the video file without extension, which will be added.
|
||||
metadata (Optional[dict]): Contents to save to the metadata file.
|
||||
enabled (bool): Whether to actually record video, or just no-op (for convenience)
|
||||
"""
|
||||
|
||||
def __init__(self, env, path=None, metadata=None, enabled=True, base_path=None):
|
||||
modes = env.metadata.get('render.modes', [])
|
||||
self.ansi_mode = False
|
||||
if 'rgb_array' not in modes:
|
||||
if 'ansi' in modes:
|
||||
self.ansi_mode = True
|
||||
else:
|
||||
logger.info('Disabling video recorder because {} neither supports video mode "rgb_array" nor "ansi".'.format(env))
|
||||
enabled = False
|
||||
|
||||
if path is not None and base_path is not None:
|
||||
raise error.Error("You can pass at most one of `path` or `base_path`.")
|
||||
|
||||
self.enabled = enabled
|
||||
self.last_frame = None
|
||||
if not self.enabled:
|
||||
return
|
||||
|
||||
self.env = env
|
||||
|
||||
required_ext = '.json' if self.ansi_mode else '.mp4'
|
||||
if path is None:
|
||||
if base_path is not None:
|
||||
# Base path given, append ext
|
||||
path = base_path + required_ext
|
||||
else:
|
||||
# Otherwise, just generate a unique filename
|
||||
with tempfile.NamedTemporaryFile(suffix=required_ext, delete=False) as f:
|
||||
path = f.name
|
||||
self.path = path
|
||||
|
||||
path_base, actual_ext = os.path.splitext(self.path)
|
||||
|
||||
if actual_ext != required_ext:
|
||||
hint = " HINT: The environment is text-only, therefore we're recording its text output in a structured JSON format." if self.ansi_mode else ''
|
||||
raise error.Error("Invalid path given: {} -- must have file extension {}.{}".format(self.path, required_ext, hint))
|
||||
# Touch the file in any case, so we know it's present. (This
|
||||
# corrects for platform differences. Using ffmpeg on
|
||||
# OS X, the file is precreated, but not on Linux.)
|
||||
touch(path)
|
||||
|
||||
self.frames_per_sec = env.metadata.get('video.frames_per_second', 30)
|
||||
self.encoder = None # lazily start the process
|
||||
self.broken = False
|
||||
|
||||
# Dump metadata
|
||||
self.metadata = metadata or {}
|
||||
self.metadata['content_type'] = 'video/vnd.openai.ansivid' if self.ansi_mode else 'video/mp4'
|
||||
self.metadata_path = '{}.meta.json'.format(path_base)
|
||||
self.write_metadata()
|
||||
|
||||
logger.info('Starting new video recorder writing to %s', self.path)
|
||||
self.empty = True
|
||||
|
||||
@property
|
||||
def functional(self):
|
||||
return self.enabled and not self.broken
|
||||
|
||||
def capture_frame(self):
|
||||
"""Render the given `env` and add the resulting frame to the video."""
|
||||
if not self.functional: return
|
||||
logger.debug('Capturing video frame: path=%s', self.path)
|
||||
|
||||
render_mode = 'ansi' if self.ansi_mode else 'rgb_array'
|
||||
frame = self.env.render(mode=render_mode)
|
||||
|
||||
if frame is None:
|
||||
# Indicates a bug in the environment: don't want to raise
|
||||
# an error here.
|
||||
logger.warn('Env returned None on render(). Disabling further rendering for video recorder by marking as broken: path=%s metadata_path=%s', self.path, self.metadata_path)
|
||||
self.broken = True
|
||||
else:
|
||||
self.last_frame = frame
|
||||
if self.ansi_mode:
|
||||
self._encode_ansi_frame(frame)
|
||||
else:
|
||||
self._encode_image_frame(frame)
|
||||
|
||||
def close(self):
|
||||
"""Make sure to manually close, or else you'll leak the encoder process"""
|
||||
if not self.enabled:
|
||||
return
|
||||
|
||||
if self.encoder:
|
||||
logger.debug('Closing video encoder: path=%s', self.path)
|
||||
self.encoder.close()
|
||||
self.encoder = None
|
||||
else:
|
||||
# No frames captured. Set metadata, and remove the empty output file.
|
||||
os.remove(self.path)
|
||||
|
||||
if self.metadata is None:
|
||||
self.metadata = {}
|
||||
self.metadata['empty'] = True
|
||||
|
||||
# If broken, get rid of the output file, otherwise we'd leak it.
|
||||
if self.broken:
|
||||
logger.info('Cleaning up paths for broken video recorder: path=%s metadata_path=%s', self.path, self.metadata_path)
|
||||
|
||||
# Might have crashed before even starting the output file, don't try to remove in that case.
|
||||
if os.path.exists(self.path):
|
||||
os.remove(self.path)
|
||||
|
||||
if self.metadata is None:
|
||||
self.metadata = {}
|
||||
self.metadata['broken'] = True
|
||||
|
||||
self.write_metadata()
|
||||
|
||||
def write_metadata(self):
|
||||
with open(self.metadata_path, 'w') as f:
|
||||
json.dump(self.metadata, f)
|
||||
|
||||
def _encode_ansi_frame(self, frame):
|
||||
if not self.encoder:
|
||||
self.encoder = TextEncoder(self.path, self.frames_per_sec)
|
||||
self.metadata['encoder_version'] = self.encoder.version_info
|
||||
self.encoder.capture_frame(frame)
|
||||
self.empty = False
|
||||
|
||||
def _encode_image_frame(self, frame):
|
||||
if not self.encoder:
|
||||
self.encoder = ImageEncoder(self.path, frame.shape, self.frames_per_sec)
|
||||
self.metadata['encoder_version'] = self.encoder.version_info
|
||||
|
||||
try:
|
||||
self.encoder.capture_frame(frame)
|
||||
except error.InvalidFrame as e:
|
||||
logger.warn('Tried to pass invalid video frame, marking as broken: %s', e)
|
||||
self.broken = True
|
||||
else:
|
||||
self.empty = False
|
||||
|
||||
|
||||
class TextEncoder(object):
|
||||
"""Store a moving picture made out of ANSI frames. Format adapted from
|
||||
https://github.com/asciinema/asciinema/blob/master/doc/asciicast-v1.md"""
|
||||
|
||||
def __init__(self, output_path, frames_per_sec):
|
||||
self.output_path = output_path
|
||||
self.frames_per_sec = frames_per_sec
|
||||
self.frames = []
|
||||
|
||||
def capture_frame(self, frame):
|
||||
string = None
|
||||
if isinstance(frame, str):
|
||||
string = frame
|
||||
elif isinstance(frame, StringIO.StringIO):
|
||||
string = frame.getvalue()
|
||||
else:
|
||||
raise error.InvalidFrame('Wrong type {} for {}: text frame must be a string or StringIO'.format(type(frame), frame))
|
||||
|
||||
if not string.endswith('\n'):
|
||||
raise error.InvalidFrame('Frame must end with a newline: """{}"""'.format(string))
|
||||
|
||||
if '\r\n' in string:
|
||||
raise error.InvalidFrame('Frame contains carriage returns (only newlines are allowed): """{}"""'.format(string))
|
||||
|
||||
self.frames.append(string)
|
||||
|
||||
def close(self):
|
||||
#frame_duration = float(1) / self.frames_per_sec
|
||||
frame_duration = .5
|
||||
|
||||
# Turn frames into events: clear screen beforehand
|
||||
# https://rosettacode.org/wiki/Terminal_control/Clear_the_screen#Python
|
||||
# https://rosettacode.org/wiki/Terminal_control/Cursor_positioning#Python
|
||||
clear_code = "%c[2J\033[1;1H" % (27)
|
||||
events = [ (frame_duration, clear_code+frame.replace('\n','\r\n')) for frame in self.frames ]
|
||||
|
||||
# Calculate frame size from the largest frames.
|
||||
# Add some padding since we'll get cut off otherwise.
|
||||
height = max([frame.count('\n') for frame in self.frames]) + 1
|
||||
width = max([max([len(line) for line in frame.split('\n')]) for frame in self.frames]) + 2
|
||||
|
||||
data = {
|
||||
"version": 1,
|
||||
"width": width,
|
||||
"height": height,
|
||||
"duration": len(self.frames)*frame_duration,
|
||||
"command": "-",
|
||||
"title": "gym VideoRecorder episode",
|
||||
"env": {}, # could add some env metadata here
|
||||
"stdout": events,
|
||||
}
|
||||
|
||||
with open(self.output_path, 'w') as f:
|
||||
json.dump(data, f)
|
||||
|
||||
@property
|
||||
def version_info(self):
|
||||
return {'backend':'TextEncoder','version':1}
|
||||
|
||||
class ImageEncoder(object):
|
||||
def __init__(self, output_path, frame_shape, frames_per_sec):
|
||||
self.proc = None
|
||||
self.output_path = output_path
|
||||
# Frame shape should be lines-first, so w and h are swapped
|
||||
h, w, pixfmt = frame_shape
|
||||
if pixfmt != 3 and pixfmt != 4:
|
||||
raise error.InvalidFrame("Your frame has shape {}, but we require (h,w,3) or (h,w,4), i.e. RGB values for a w-by-h image, with an optional alpha channel.".format(frame_shape))
|
||||
self.wh = (w,h)
|
||||
self.includes_alpha = (pixfmt == 4)
|
||||
self.frame_shape = frame_shape
|
||||
self.frames_per_sec = frames_per_sec
|
||||
|
||||
if distutils.spawn.find_executable('ffmpeg') is not None:
|
||||
self.backend = 'ffmpeg'
|
||||
elif distutils.spawn.find_executable('avconv') is not None:
|
||||
self.backend = 'avconv'
|
||||
else:
|
||||
raise error.DependencyNotInstalled("""Found neither the ffmpeg nor avconv executables. On OS X, you can install ffmpeg via `brew install ffmpeg`. On most Ubuntu variants, `sudo apt-get install ffmpeg` should do it. On Ubuntu 14.04, however, you'll need to install avconv with `sudo apt-get install libav-tools`.""")
|
||||
|
||||
self.start()
|
||||
|
||||
@property
|
||||
def version_info(self):
|
||||
return {'backend':self.backend,'version':subprocess.check_output([self.backend, '-version']),'cmdline':self.cmdline}
|
||||
|
||||
def start(self):
|
||||
self.cmdline = (self.backend,
|
||||
'-nostats',
|
||||
'-loglevel', 'error', # suppress warnings
|
||||
'-y',
|
||||
'-r', '%d' % self.frames_per_sec,
|
||||
|
||||
# input
|
||||
'-f', 'rawvideo',
|
||||
'-s:v', '{}x{}'.format(*self.wh),
|
||||
'-pix_fmt',('rgb32' if self.includes_alpha else 'rgb24'),
|
||||
'-i', '/dev/stdin',
|
||||
|
||||
# output
|
||||
'-vcodec', 'libx264',
|
||||
'-pix_fmt', 'yuv420p',
|
||||
self.output_path
|
||||
)
|
||||
|
||||
logger.debug('Starting ffmpeg with "%s"', ' '.join(self.cmdline))
|
||||
self.proc = subprocess.Popen(self.cmdline, stdin=subprocess.PIPE)
|
||||
|
||||
def capture_frame(self, frame):
|
||||
if not isinstance(frame, (np.ndarray, np.generic)):
|
||||
raise error.InvalidFrame('Wrong type {} for {} (must be np.ndarray or np.generic)'.format(type(frame), frame))
|
||||
if frame.shape != self.frame_shape:
|
||||
raise error.InvalidFrame("Your frame has shape {}, but the VideoRecorder is configured for shape {}.".format(frame.shape, self.frame_shape))
|
||||
if frame.dtype != np.uint8:
|
||||
raise error.InvalidFrame("Your frame has data type {}, but we require uint8 (i.e. RGB values from 0-255).".format(frame.dtype))
|
||||
|
||||
self.proc.stdin.write(frame.tobytes())
|
||||
|
||||
def close(self):
|
||||
self.proc.stdin.close()
|
||||
ret = self.proc.wait()
|
||||
if ret != 0:
|
||||
logger.error("VideoRecorder encoder exited with status {}".format(ret))
|
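`TextEncoder.close` sizes the recording from the largest captured frame. A minimal sketch of that calculation follows, assuming frames are newline-terminated strings as `capture_frame` requires. The helper name `frame_dimensions` is hypothetical and not part of gym. The width is taken over every line of every frame, not just one frame.

```python
def frame_dimensions(frames):
    """Return (width, height) large enough to display every text frame.

    Padding of +2 columns and +1 row mirrors the values used in
    TextEncoder.close so that content is not cut off at the edges.
    """
    # Tallest frame, measured in newline-terminated rows.
    height = max(frame.count('\n') for frame in frames) + 1
    # Longest single line across all frames.
    width = max(len(line) for frame in frames for line in frame.split('\n')) + 2
    return width, height
```

For example, the frames `"ab\ncd\n"` and `"abcdef\n"` give a width of 8 (longest line of 6 plus padding) and a height of 3.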
9
gym/scoreboard/__init__.py
Normal file
@@ -0,0 +1,9 @@
|
||||
import os
|
||||
|
||||
from gym.scoreboard.client.resource import FileUpload, Evaluation
|
||||
|
||||
# Discover API key from the environment. (You should never have to
|
||||
# change api_base / web_base.)
|
||||
api_key = os.environ.get('OPENAI_GYM_API_KEY')
|
||||
api_base = os.environ.get('OPENAI_GYM_API_BASE', 'https://gym-api.openai.com')
|
||||
web_base = os.environ.get('OPENAI_GYM_WEB_BASE', 'https://gym.openai.com')
|
181
gym/scoreboard/api.py
Normal file
@@ -0,0 +1,181 @@
|
||||
import logging
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import tarfile
|
||||
import tempfile
|
||||
from gym import error, monitoring
|
||||
from gym.scoreboard.client import resource, util
|
||||
|
||||
MAX_VIDEOS = 100
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
video_name_re = re.compile(r'^[\w.-]+\.(mp4|avi|json)$')
|
||||
metadata_name_re = re.compile(r'^[\w.-]+\.meta\.json$')
|
||||
|
||||
def upload(training_dir, algorithm_id=None, writeup=None, api_key=None):
|
||||
"""Upload the results of training (as automatically recorded by your
|
||||
env's monitor) to OpenAI Gym.
|
||||
|
||||
Args:
|
||||
training_dir (str): A directory containing the results of a training run.
|
||||
algorithm_id (Optional[str]): An arbitrary string indicating the particular version of the algorithm (including choices of parameters) you are running.
|
||||
writeup (Optional[str]): A Gist URL (of the form https://gist.github.com/<user>/<id>) containing your writeup for this evaluation.
|
||||
api_key (Optional[str]): Your OpenAI API key. Can also be provided as an environment variable (OPENAI_GYM_API_KEY).
|
||||
"""
|
||||
|
||||
open_monitors = monitoring._monitors.values()
|
||||
if open_monitors:
|
||||
envs = [m.env.spec.id if m.env.spec else '(unknown)' for m in open_monitors]
|
||||
raise error.Error("Still have an open monitor on {}. You must run 'env.monitor.close()' before uploading.".format(', '.join(envs)))
|
||||
|
||||
env_info, training_episode_batch, training_video = upload_training_data(training_dir, api_key=api_key)
|
||||
training_episode_batch_id = training_video_id = None
|
||||
if training_episode_batch:
|
||||
training_episode_batch_id = training_episode_batch.id
|
||||
if training_video:
|
||||
training_video_id = training_video.id
|
||||
|
||||
if logger.level <= logging.INFO:
|
||||
message = ['Creating evaluation object on the server']
|
||||
if training_episode_batch_id is not None and training_video_id is not None:
|
||||
logger.info('Creating evaluation object from %s with learning curve and training video', training_dir)
|
||||
elif training_episode_batch_id is not None:
|
||||
logger.info('Creating evaluation object from %s with learning curve', training_dir)
|
||||
elif training_video_id is not None:
|
||||
logger.info('Creating evaluation object from %s with training video', training_dir)
|
||||
else:
|
||||
raise error.Error("You didn't have any recorded training data in {}. Once you've used 'env.monitor.start(training_dir)' to start recording, you need to actually run some rollouts. Please join the community chat on https://gym.openai.com if you have any issues.".format(training_dir))
|
||||
|
||||
evaluation = resource.Evaluation.create(
|
||||
training_episode_batch=training_episode_batch_id,
|
||||
training_video=training_video_id,
|
||||
env=env_info['env_id'],
|
||||
algorithm={
|
||||
'id': algorithm_id,
|
||||
},
|
||||
writeup=writeup,
|
||||
gym_version=env_info['gym_version'],
|
||||
api_key=api_key,
|
||||
)
|
||||
|
||||
logger.info(
|
||||
|
||||
"""
|
||||
****************************************************
|
||||
You successfully uploaded your agent evaluation to
|
||||
OpenAI Gym! You can find it at:
|
||||
|
||||
%s
|
||||
|
||||
****************************************************
|
||||
""".rstrip(), evaluation.web_url())
|
||||
|
||||
return evaluation
|
||||
|
||||
def upload_training_data(training_dir, api_key=None):
|
||||
# Could have multiple manifests
|
||||
results = monitoring.load_results(training_dir)
|
||||
if not results:
|
||||
raise error.Error('''Could not find any manifest files in {}.
|
||||
|
||||
(HINT: this usually means you did not yet close() your env.monitor and have not yet exited the process. You should call 'env.monitor.start(training_dir)' at the start of training and 'env.monitor.close()' at the end, or exit the process.)'''.format(training_dir))
|
||||
|
||||
manifests = results['manifests']
|
||||
env_info = results['env_info']
|
||||
timestamps = results['timestamps']
|
||||
episode_lengths = results['episode_lengths']
|
||||
episode_rewards = results['episode_rewards']
|
||||
videos = results['videos']
|
||||
|
||||
logger.debug('Uploading data from manifest %s', ', '.join(manifests))
|
||||
|
||||
# Do the relevant uploads
|
||||
if len(episode_lengths) > 0:
|
||||
training_episode_batch = upload_training_episode_batch(episode_lengths, episode_rewards, timestamps, api_key)
|
||||
else:
|
||||
training_episode_batch = None
|
||||
|
||||
if len(videos) > MAX_VIDEOS:
|
||||
logger.warn('You recorded videos for {} episodes, but the scoreboard only supports up to {}. We will automatically subsample for you, but you also might wish to adjust your video recording rate.'.format(len(videos), MAX_VIDEOS))
|
||||
skip = (len(videos) + MAX_VIDEOS - 1) // MAX_VIDEOS  # ceiling division, so at most MAX_VIDEOS remain
|
||||
videos = videos[::skip]
|
||||
|
||||
if len(videos) > 0:
|
||||
training_video = upload_training_video(videos, api_key)
|
||||
else:
|
||||
training_video = None
|
||||
|
||||
return env_info, training_episode_batch, training_video
|
||||
|
||||
def upload_training_episode_batch(episode_lengths, episode_rewards, timestamps, api_key=None):
|
||||
logger.info('Uploading %d episodes of training data', len(episode_lengths))
|
||||
file_upload = resource.FileUpload.create(purpose='episode_batch', api_key=api_key)
|
||||
file_upload.put({
|
||||
'episode_lengths': episode_lengths,
|
||||
'episode_rewards': episode_rewards,
|
||||
'timestamps': timestamps,
|
||||
})
|
||||
return file_upload
|
||||
|
||||
def upload_training_video(videos, api_key=None):
|
||||
"""videos: should be list of (video_path, metadata_path) tuples"""
|
||||
with tempfile.TemporaryFile() as archive_file:
|
||||
write_archive(videos, archive_file)
|
||||
archive_file.seek(0)
|
||||
|
||||
logger.info('Uploading videos of %d training episodes (%d bytes)', len(videos), util.file_size(archive_file))
|
||||
file_upload = resource.FileUpload.create(purpose='video', content_type='application/vnd.openai.video+x-compressed', api_key=api_key)
|
||||
file_upload.put(archive_file, encode=None)
|
||||
|
||||
return file_upload
|
||||
|
||||
def write_archive(videos, archive_file):
|
||||
if len(videos) > MAX_VIDEOS:
|
||||
raise error.Error('Trying to upload {} videos, but there is a limit of {} currently. If you actually want to upload this many videos, please email gym@openai.com with your use-case.'.format(len(videos), MAX_VIDEOS))
|
||||
|
||||
logger.debug('Preparing an archive of %d videos: %s', len(videos), videos)
|
||||
|
||||
# Double check that there are no collisions
|
||||
basenames = set()
|
||||
manifest = {
|
||||
'version': 0,
|
||||
'videos': []
|
||||
}
|
||||
|
||||
with tarfile.open(fileobj=archive_file, mode='w:gz') as tar:
|
||||
for video_path, metadata_path in videos:
|
||||
video_name = os.path.basename(video_path)
|
||||
metadata_name = os.path.basename(metadata_path)
|
||||
|
||||
if not os.path.exists(video_path):
|
||||
raise error.Error('No such video file {}. (HINT: Your video recorder may have broken midway through the run. You can check this with `video_recorder.functional`.)'.format(video_path))
|
||||
elif not os.path.exists(metadata_path):
|
||||
raise error.Error('No such metadata file {}. (HINT: this should be automatically created when using a VideoRecorder instance.)'.format(metadata_path))
|
||||
|
||||
# Do some sanity checking
|
||||
if video_name in basenames:
|
||||
raise error.Error('Duplicated video name {} in video list: {}'.format(video_name, videos))
|
||||
elif metadata_name in basenames:
|
||||
raise error.Error('Duplicated metadata file name {} in video list: {}'.format(metadata_name, videos))
|
||||
elif not video_name_re.search(video_name):
|
||||
raise error.Error('Invalid video name {} (must match {})'.format(video_name, video_name_re.pattern))
|
||||
elif not metadata_name_re.search(metadata_name):
|
||||
raise error.Error('Invalid metadata file name {} (must match {})'.format(metadata_name, metadata_name_re.pattern))
|
||||
|
||||
# Record that we've seen these names; add to manifest
|
||||
basenames.add(video_name)
|
||||
basenames.add(metadata_name)
|
||||
manifest['videos'].append((video_name, metadata_name))
|
||||
|
||||
# Import the files into the archive
|
||||
tar.add(video_path, arcname=video_name, recursive=False)
|
||||
tar.add(metadata_path, arcname=metadata_name, recursive=False)
|
||||
|
||||
# Actually write the manifest file
|
||||
with tempfile.NamedTemporaryFile(mode='w') as f:
|
||||
json.dump(manifest, f)
|
||||
f.flush()
|
||||
|
||||
tar.add(f.name, arcname='manifest.json')
|
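`upload_training_data` subsamples the video list when it exceeds `MAX_VIDEOS`, and `write_archive` rejects anything still over the limit. The sketch below shows one way to pick a stride that always lands at or under the limit. The helper name `subsample` and its `limit` parameter are hypothetical and used only for illustration.

```python
def subsample(items, limit):
    """Return an evenly strided subset of `items` with at most `limit` entries."""
    if len(items) <= limit:
        return list(items)
    # Ceiling division: the smallest stride for which len(items[::stride]) <= limit.
    stride = (len(items) + limit - 1) // limit
    return list(items[::stride])
```

Integer (floor) division for the stride can leave the list over the limit, for example 101 items with a stride of 1. Rounding the stride up avoids that case.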
4
gym/scoreboard/client/README.md
Normal file
@@ -0,0 +1,4 @@
|
||||
# Client
|
||||
|
||||
This client was forked from the [Stripe
|
||||
Python](https://github.com/stripe/stripe-python) bindings.
|
6
gym/scoreboard/client/__init__.py
Normal file
@@ -0,0 +1,6 @@
|
||||
import logging
|
||||
import os
|
||||
|
||||
from gym import error
|
||||
|
||||
logger = logging.getLogger(__name__)
|
158
gym/scoreboard/client/api_requestor.py
Normal file
@@ -0,0 +1,158 @@
|
||||
import json
|
||||
import platform
|
||||
import urlparse
|
||||
|
||||
from gym import error, version
|
||||
import gym.scoreboard.client
|
||||
from gym.scoreboard.client import http_client
|
||||
|
||||
verify_ssl_certs = True # [SECURITY CRITICAL] only turn this off while debugging
|
||||
http_client = http_client.RequestsClient(verify_ssl_certs=verify_ssl_certs)
|
||||
|
||||
def _build_api_url(url, query):
|
||||
scheme, netloc, path, base_query, fragment = urlparse.urlsplit(url)
|
||||
|
||||
if base_query:
|
||||
query = '%s&%s' % (base_query, query)
|
||||
|
||||
return urlparse.urlunsplit((scheme, netloc, path, query, fragment))
|
||||
|
||||
def _strip_nulls(params):
|
||||
if isinstance(params, dict):
|
||||
stripped = {}
|
||||
for key, value in params.items():
|
||||
value = _strip_nulls(value)
|
||||
if value is not None:
|
||||
stripped[key] = value
|
||||
return stripped
|
||||
else:
|
||||
return params
|
||||
|
||||
class APIRequestor(object):
|
||||
def __init__(self, key=None, api_base=None):
|
||||
self.api_base = api_base or gym.scoreboard.api_base
|
||||
self.api_key = key
|
||||
self._client = http_client
|
||||
|
||||
def request(self, method, url, params=None, headers=None):
|
||||
rbody, rcode, rheaders, my_api_key = self.request_raw(
|
||||
method.lower(), url, params, headers)
|
||||
resp = self.interpret_response(rbody, rcode, rheaders)
|
||||
return resp, my_api_key
|
||||
|
||||
def handle_api_error(self, rbody, rcode, resp, rheaders):
|
||||
# Rate limits were previously coded as 400's with code 'rate_limit'
|
||||
if rcode == 429:
|
||||
raise error.RateLimitError(
|
||||
resp.get('detail'), rbody, rcode, resp, rheaders)
|
||||
elif rcode in [400, 404]:
|
||||
type = resp.get('type')
|
||||
if type == 'about:blank':
|
||||
type = None
|
||||
raise error.InvalidRequestError(
|
||||
resp.get('detail'), type,
|
||||
rbody, rcode, resp, rheaders)
|
||||
elif rcode == 401:
|
||||
raise error.AuthenticationError(
|
||||
resp.get('detail'), rbody, rcode, resp,
|
||||
rheaders)
|
||||
else:
|
||||
detail = resp.get('detail')
|
||||
|
||||
# This information will only be returned to developers of
|
||||
# the OpenAI Gym Scoreboard.
|
||||
dev_info = resp.get('dev_info')
|
||||
if dev_info:
|
||||
detail = "{}\n\n<dev_info>\n{}\n</dev_info>".format(detail, dev_info['traceback'])
|
||||
raise error.APIError(detail, rbody, rcode, resp,
|
||||
rheaders)
|
||||
|
||||
def request_raw(self, method, url, params=None, supplied_headers=None):
|
||||
"""
|
||||
Mechanism for issuing an API call
|
||||
"""
|
||||
if self.api_key:
|
||||
my_api_key = self.api_key
|
||||
else:
|
||||
my_api_key = gym.scoreboard.api_key
|
||||
|
||||
if my_api_key is None:
|
||||
raise error.AuthenticationError("""You must provide an OpenAI Gym API key.
|
||||
|
||||
(HINT: Set your API key using "gym.scoreboard.api_key = .." or "export OPENAI_GYM_API_KEY=..."). You can find your API key in the OpenAI Gym web interface: https://gym.openai.com/settings/profile.""")
|
||||
|
||||
abs_url = '%s%s' % (self.api_base, url)
|
||||
|
||||
if params:
|
||||
encoded_params = json.dumps(_strip_nulls(params))
|
||||
else:
|
||||
encoded_params = None
|
||||
|
||||
if method == 'get' or method == 'delete':
|
||||
if params:
|
||||
abs_url = _build_api_url(abs_url, encoded_params)
|
||||
post_data = None
|
||||
elif method == 'post':
|
||||
post_data = encoded_params
|
||||
else:
|
||||
raise error.APIConnectionError(
|
||||
'Unrecognized HTTP method %r. This may indicate a bug in the '
|
||||
'OpenAI Gym bindings. Please contact gym@openai.com for '
|
||||
'assistance.' % (method,))
|
||||
|
||||
ua = {
|
||||
'bindings_version': version.VERSION,
|
||||
'lang': 'python',
|
||||
'publisher': 'openai',
|
||||
'httplib': self._client.name,
|
||||
}
|
||||
for attr, func in [['lang_version', platform.python_version],
|
||||
['platform', platform.platform]]:
|
||||
try:
|
||||
val = func()
|
||||
except Exception as e:
|
||||
val = "!! %s" % (e,)
|
||||
ua[attr] = val
|
||||
|
||||
headers = {
|
||||
'Openai-Gym-User-Agent': json.dumps(ua),
|
||||
'User-Agent': 'Openai-Gym/v1 PythonBindings/%s' % (version.VERSION,),
|
||||
'Authorization': 'Bearer %s' % (my_api_key,)
|
||||
}
|
||||
|
||||
if method == 'post':
|
||||
headers['Content-Type'] = 'application/json'
|
||||
|
||||
if supplied_headers is not None:
|
||||
for key, value in supplied_headers.items():
|
||||
headers[key] = value
|
||||
|
||||
rbody, rcode, rheaders = self._client.request(
|
||||
method, abs_url, headers, post_data)
|
||||
|
||||
return rbody, rcode, rheaders, my_api_key
|
||||
|
||||
def interpret_response(self, rbody, rcode, rheaders):
|
||||
content_type = rheaders.get('Content-Type', '')
|
||||
if content_type.startswith('text/plain'):
|
||||
# Pass through plain text
|
||||
resp = rbody
|
||||
|
||||
if not (200 <= rcode < 300):
|
||||
self.handle_api_error(rbody, rcode, {}, rheaders)
|
||||
else:
|
||||
# TODO: Be strict about other Content-Types
|
||||
try:
|
||||
if hasattr(rbody, 'decode'):
|
||||
rbody = rbody.decode('utf-8')
|
||||
resp = json.loads(rbody)
|
||||
except Exception:
|
||||
raise error.APIError(
|
||||
"Invalid response body from API: %s "
|
||||
"(HTTP response code was %d)" % (rbody, rcode),
|
||||
rbody, rcode, rheaders)
|
||||
|
||||
if not (200 <= rcode < 300):
|
||||
self.handle_api_error(rbody, rcode, resp, rheaders)
|
||||
|
||||
return resp
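The decoding rule in interpret_response can be sketched in isolation. This is a minimal illustration, not the bindings' API: `decode_body` and `is_success` are hypothetical helper names, and the error handling of the original (raising `error.APIError`) is left out.

```python
import json


def decode_body(rbody, content_type):
    """Decode a raw response body following the rule used above.

    Hypothetical helper: plain-text bodies pass through untouched,
    everything else is treated as UTF-8 encoded JSON.
    """
    if content_type.startswith('text/plain'):
        return rbody
    # Byte strings are decoded before being parsed as JSON.
    if hasattr(rbody, 'decode'):
        rbody = rbody.decode('utf-8')
    return json.loads(rbody)


def is_success(rcode):
    # Any 2xx status code counts as success.
    return 200 <= rcode < 300


parsed = decode_body(b'{"id": "file_1"}', 'application/json')
plain = decode_body('ok', 'text/plain; charset=utf-8')
```

Checking the status code only after decoding, as the original does, lets the error handler see the parsed error payload.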
|
93
gym/scoreboard/client/http_client.py
Normal file
@@ -0,0 +1,93 @@
|
||||
# Forked from
|
||||
import logging
|
||||
import requests
|
||||
import textwrap
|
||||
|
||||
from gym import error
|
||||
from gym.scoreboard.client import util
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
warned = False
|
||||
|
||||
def render_post_data(post_data):
|
||||
if hasattr(post_data, 'fileno'): # todo: is this the right way of checking if it's a file?
|
||||
return '%r (%d bytes)' % (post_data, util.file_size(post_data))
|
||||
elif isinstance(post_data, basestring):
|
||||
return '%r (%d bytes)' % (post_data, len(post_data))
|
||||
else:
|
||||
return None
|
||||
|
||||
class RequestsClient(object):
|
||||
name = 'requests'
|
||||
|
||||
def __init__(self, verify_ssl_certs=True):
|
||||
self._verify_ssl_certs = verify_ssl_certs
|
||||
self.session = requests.Session()
|
||||
|
||||
def request(self, method, url, headers, post_data=None, files=None):
|
||||
global warned  # module-level flag; without this declaration the assignment below makes it a local
kwargs = {}
|
||||
|
||||
# Really, really only turn this off while debugging.
|
||||
if not self._verify_ssl_certs:
|
||||
if not warned:
|
||||
logger.warn('You have disabled SSL cert verification in OpenAI Gym, so we will not verify SSL certs. This means an attacker with control of your network could snoop on or modify your data in transit.')
|
||||
warned = True
|
||||
kwargs['verify'] = False
|
||||
|
||||
try:
|
||||
try:
|
||||
result = self.session.request(method,
|
||||
url,
|
||||
headers=headers,
|
||||
data=post_data,
|
||||
timeout=200,
|
||||
files=files,
|
||||
**kwargs)
|
||||
except TypeError as e:
|
||||
raise TypeError(
|
||||
'Warning: It looks like your installed version of the '
'"requests" library is not compatible with OpenAI Gym\'s '
|
||||
'usage thereof. (HINT: The most likely cause is that '
|
||||
'your "requests" library is out of date. You can fix '
|
||||
'that by running "pip install -U requests".) The '
|
||||
'underlying error was: %s' % (e,))
|
||||
|
||||
# This causes the content to actually be read, which could cause
|
||||
# e.g. a socket timeout. TODO: The other fetch methods probably
|
||||
# are susceptible to the same and should be updated.
|
||||
content = result.content
|
||||
status_code = result.status_code
|
||||
except Exception as e:
|
||||
# Would catch just requests.exceptions.RequestException, but can
|
||||
# also raise ValueError, RuntimeError, etc.
|
||||
self._handle_request_error(e, method, url)
|
||||
|
||||
if util.logger.level <= logging.DEBUG:
|
||||
util.logger.debug(
|
||||
"""API request to %s returned (response code, response body) of
|
||||
(%d, %r)
|
||||
|
||||
Request body was: %s""", url, status_code, content, render_post_data(post_data))
|
||||
elif util.logger.level <= logging.INFO:
|
||||
util.logger.info('HTTP request: %s %s %d', method.upper(), url, status_code)
|
||||
return content, status_code, result.headers
|
||||
|
||||
def _handle_request_error(self, e, method, url):
|
||||
if isinstance(e, requests.exceptions.RequestException):
|
||||
msg = ("Unexpected error communicating with OpenAI Gym "
|
||||
"(while calling {} {}). "
|
||||
"If this problem persists, let us know at "
|
||||
"gym@openai.com.".format(method, url))
|
||||
err = "%s: %s" % (type(e).__name__, str(e))
|
||||
else:
|
||||
msg = ("Unexpected error communicating with OpenAI Gym. "
|
||||
"It looks like there's probably a configuration "
|
||||
"issue locally. If this problem persists, let us "
|
||||
"know at gym@openai.com.")
|
||||
err = "A %s was raised" % (type(e).__name__,)
|
||||
if str(e):
|
||||
err += " with error message %s" % (str(e),)
|
||||
else:
|
||||
err += " with no error message"
|
||||
msg = textwrap.fill(msg, width=140) + "\n\n(Network error: %s)" % (err,)
|
||||
raise error.APIConnectionError(msg)
|
378
gym/scoreboard/client/resource.py
Normal file
@@ -0,0 +1,378 @@
|
||||
import json
|
||||
import urllib
|
||||
import warnings
|
||||
import sys
|
||||
|
||||
import gym
|
||||
from gym import error
|
||||
from gym.scoreboard.client import api_requestor, util
|
||||
|
||||
def convert_to_gym_object(resp, api_key):
|
||||
types = {
|
||||
'evaluation': Evaluation,
|
||||
'file': FileUpload,
|
||||
}
|
||||
|
||||
if isinstance(resp, list):
|
||||
return [convert_to_gym_object(i, api_key) for i in resp]
|
||||
elif isinstance(resp, dict) and not isinstance(resp, GymObject):
|
||||
resp = resp.copy()
|
||||
klass_name = resp.get('object')
|
||||
if isinstance(klass_name, basestring):
|
||||
klass = types.get(klass_name, GymObject)
|
||||
else:
|
||||
klass = GymObject
|
||||
return klass.construct_from(resp, api_key)
|
||||
else:
|
||||
return resp
|
||||
|
||||
def populate_headers(idempotency_key):
|
||||
if idempotency_key is not None:
|
||||
return {"Idempotency-Key": idempotency_key}
|
||||
return None
|
||||
|
||||
def _compute_diff(current, previous):
|
||||
if isinstance(current, dict):
|
||||
previous = previous or {}
|
||||
diff = current.copy()
|
||||
for key in set(previous.keys()) - set(diff.keys()):
|
||||
diff[key] = ""
|
||||
return diff
|
||||
return current if current is not None else ""
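As a worked illustration of the diffing rule above, here is a standalone copy of the function (renamed `compute_diff` for the example) with two sample calls. The sample keys are made up.

```python
def compute_diff(current, previous):
    """Standalone copy of _compute_diff, for illustration only."""
    if isinstance(current, dict):
        previous = previous or {}
        diff = current.copy()
        # Keys that existed before but are gone now are sent as "",
        # which the API is described as interpreting as None.
        for key in set(previous.keys()) - set(diff.keys()):
            diff[key] = ""
        return diff
    return current if current is not None else ""


# 'b' was present previously and has since been dropped.
dict_diff = compute_diff({'a': 1}, {'a': 0, 'b': 2})
# A scalar that became None is also sent as the empty string.
scalar_diff = compute_diff(None, 'old value')
```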
|
||||
|
||||
class GymObject(dict):
|
||||
def __init__(self, id=None, api_key=None, **params):
|
||||
super(GymObject, self).__init__()
|
||||
|
||||
self._unsaved_values = set()
|
||||
self._transient_values = set()
|
||||
|
||||
self._retrieve_params = params
|
||||
self._previous = None
|
||||
|
||||
object.__setattr__(self, 'api_key', api_key)
|
||||
|
||||
if id:
|
||||
self['id'] = id
|
||||
|
||||
def update(self, update_dict):
|
||||
for k in update_dict:
|
||||
self._unsaved_values.add(k)
|
||||
|
||||
return super(GymObject, self).update(update_dict)
|
||||
|
||||
def __setattr__(self, k, v):
|
||||
if k[0] == '_' or k in self.__dict__:
|
||||
return super(GymObject, self).__setattr__(k, v)
|
||||
else:
|
||||
self[k] = v
|
||||
|
||||
def __getattr__(self, k):
|
||||
if k[0] == '_':
|
||||
raise AttributeError(k)
|
||||
|
||||
try:
|
||||
return self[k]
|
||||
except KeyError as err:
|
||||
raise AttributeError(*err.args)
|
||||
|
||||
def __delattr__(self, k):
|
||||
if k[0] == '_' or k in self.__dict__:
|
||||
return super(GymObject, self).__delattr__(k)
|
||||
else:
|
||||
del self[k]
|
||||
|
||||
def __setitem__(self, k, v):
|
||||
if v == "":
|
||||
raise ValueError(
"You cannot set %s to an empty string. "
"We interpret empty strings as None in requests. "
"You may set %s.%s = None to delete the property" % (
k, str(self), k))
|
||||
|
||||
super(GymObject, self).__setitem__(k, v)
|
||||
|
||||
# Allows for unpickling in Python 3.x
|
||||
if not hasattr(self, '_unsaved_values'):
|
||||
self._unsaved_values = set()
|
||||
|
||||
self._unsaved_values.add(k)
|
||||
|
||||
def __getitem__(self, k):
|
||||
try:
|
||||
return super(GymObject, self).__getitem__(k)
|
||||
except KeyError as err:
|
||||
if k in self._transient_values:
|
||||
raise KeyError(
"%r. HINT: The %r attribute was set in the past. "
"It was then wiped when refreshing the object with "
"the result returned by OpenAI Gym's API, probably as a "
"result of a save(). The attributes currently "
"available on this object are: %s" %
(k, k, ', '.join(self.keys())))
|
||||
else:
|
||||
raise err
|
||||
|
||||
def __delitem__(self, k):
|
||||
super(GymObject, self).__delitem__(k)
|
||||
|
||||
# Allows for unpickling in Python 3.x
|
||||
if hasattr(self, '_unsaved_values'):
|
||||
self._unsaved_values.remove(k)
|
||||
|
||||
@classmethod
|
||||
def construct_from(cls, values, key):
|
||||
instance = cls(values.get('id'), api_key=key)
|
||||
instance.refresh_from(values, api_key=key)
|
||||
return instance
|
||||
|
||||
def refresh_from(self, values, api_key=None, partial=False):
|
||||
self.api_key = api_key or getattr(values, 'api_key', None)
|
||||
|
||||
# Wipe old state before setting new. This is useful for e.g.
|
||||
# updating a customer, where there is no persistent card
|
||||
# parameter. Mark those values which don't persist as transient
|
||||
if partial:
|
||||
self._unsaved_values = (self._unsaved_values - set(values))
|
||||
else:
|
||||
removed = set(self.keys()) - set(values)
|
||||
self._transient_values = self._transient_values | removed
|
||||
self._unsaved_values = set()
|
||||
self.clear()
|
||||
|
||||
self._transient_values = self._transient_values - set(values)
|
||||
|
||||
for k, v in values.iteritems():
|
||||
super(GymObject, self).__setitem__(
|
||||
k, convert_to_gym_object(v, api_key))
|
||||
|
||||
self._previous = values
|
||||
|
||||
@classmethod
|
||||
def api_base(cls):
|
||||
return None
|
||||
|
||||
def request(self, method, url, params=None, headers=None):
|
||||
if params is None:
|
||||
params = self._retrieve_params
|
||||
requestor = api_requestor.APIRequestor(
|
||||
key=self.api_key, api_base=self.api_base())
|
||||
response, api_key = requestor.request(method, url, params, headers)
|
||||
|
||||
return convert_to_gym_object(response, api_key)
|
||||
|
||||
def __repr__(self):
|
||||
ident_parts = [type(self).__name__]
|
||||
|
||||
if isinstance(self.get('object'), basestring):
|
||||
ident_parts.append(self.get('object'))
|
||||
|
||||
if isinstance(self.get('id'), basestring):
|
||||
ident_parts.append('id=%s' % (self.get('id'),))
|
||||
|
||||
unicode_repr = '<%s at %s> JSON: %s' % (
|
||||
' '.join(ident_parts), hex(id(self)), str(self))
|
||||
|
||||
if sys.version_info[0] < 3:
|
||||
return unicode_repr.encode('utf-8')
|
||||
else:
|
||||
return unicode_repr
|
||||
|
||||
def __str__(self):
|
||||
return json.dumps(self, sort_keys=True, indent=2)
|
||||
|
||||
def to_dict(self):
|
||||
warnings.warn(
|
||||
'The `to_dict` method is deprecated and will be removed in '
|
||||
'version 2.0 of the OpenAI Gym bindings. The GymObject is '
|
||||
'itself now a subclass of `dict`.',
|
||||
DeprecationWarning)
|
||||
|
||||
return dict(self)
|
||||
|
||||
@property
|
||||
def gym_id(self):
|
||||
return self.id
|
||||
|
||||
def serialize(self, previous):
|
||||
params = {}
|
||||
unsaved_keys = self._unsaved_values or set()
|
||||
previous = previous or self._previous or {}
|
||||
|
||||
for k, v in self.items():
|
||||
if k == 'id' or (isinstance(k, str) and k.startswith('_')):
|
||||
continue
|
||||
elif isinstance(v, APIResource):
|
||||
continue
|
||||
elif hasattr(v, 'serialize'):
|
||||
params[k] = v.serialize(previous.get(k, None))
|
||||
elif k in unsaved_keys:
|
||||
params[k] = _compute_diff(v, previous.get(k, None))
|
||||
|
||||
return params
|
||||
|
||||
class APIResource(GymObject):
|
||||
@classmethod
|
||||
def retrieve(cls, id, api_key=None, **params):
|
||||
instance = cls(id, api_key, **params)
|
||||
instance.refresh()
|
||||
return instance
|
||||
|
||||
def refresh(self):
|
||||
self.refresh_from(self.request('get', self.instance_path()))
|
||||
return self
|
||||
|
||||
@classmethod
|
||||
def class_name(cls):
|
||||
if cls == APIResource:
|
||||
raise NotImplementedError(
|
||||
'APIResource is an abstract class. You should perform '
|
||||
'actions on its subclasses (e.g. Evaluation, FileUpload)')
|
||||
return str(urllib.quote_plus(cls.__name__.lower()))
|
||||
|
||||
@classmethod
|
||||
def class_path(cls):
|
||||
cls_name = cls.class_name()
|
||||
return "/v1/%ss" % (cls_name,)
|
||||
|
||||
def instance_path(self):
|
||||
id = self.get('id')
|
||||
if not id:
|
||||
raise error.InvalidRequestError(
|
||||
'Could not determine which URL to request: %s instance '
|
||||
'has invalid ID: %r' % (type(self).__name__, id), 'id')
|
||||
id = util.utf8(id)
|
||||
base = self.class_path()
|
||||
extn = urllib.quote_plus(id)
|
||||
return "%s/%s" % (base, extn)
|
||||
|
||||
class ListObject(GymObject):
|
||||
def list(self, **params):
|
||||
return self.request('get', self['url'], params)
|
||||
|
||||
def all(self, **params):
|
||||
warnings.warn("The `all` method is deprecated and will "
|
||||
"be removed in future versions. Please use the "
|
||||
"`list` method instead",
|
||||
DeprecationWarning)
|
||||
return self.list(**params)
|
||||
|
||||
def auto_paging_iter(self):
|
||||
page = self
|
||||
params = dict(self._retrieve_params)
|
||||
|
||||
while True:
|
||||
item_id = None
|
||||
for item in page:
|
||||
item_id = item.get('id', None)
|
||||
yield item
|
||||
|
||||
if not getattr(page, 'has_more', False) or item_id is None:
|
||||
return
|
||||
|
||||
params['starting_after'] = item_id
|
||||
page = self.list(**params)
|
||||
|
||||
def create(self, idempotency_key=None, **params):
|
||||
headers = populate_headers(idempotency_key)
|
||||
return self.request('post', self['url'], params, headers)
|
||||
|
||||
def retrieve(self, id, **params):
|
||||
base = self.get('url')
|
||||
id = util.utf8(id)
|
||||
extn = urllib.quote_plus(id)
|
||||
url = "%s/%s" % (base, extn)
|
||||
|
||||
return self.request('get', url, params)
|
||||
|
||||
def __iter__(self):
|
||||
return getattr(self, 'data', []).__iter__()
|
||||
|
||||
# Classes of API operations
|
||||
|
||||
class ListableAPIResource(APIResource):
|
||||
@classmethod
|
||||
def all(cls, *args, **params):
|
||||
warnings.warn("The `all` class method is deprecated and will "
|
||||
"be removed in future versions. Please use the "
|
||||
"`list` class method instead",
|
||||
DeprecationWarning)
|
||||
return cls.list(*args, **params)
|
||||
|
||||
@classmethod
|
||||
def auto_paging_iter(self, *args, **params):
|
||||
return self.list(*args, **params).auto_paging_iter()
|
||||
|
||||
@classmethod
|
||||
def list(cls, api_key=None, idempotency_key=None, **params):
|
||||
requestor = api_requestor.APIRequestor(api_key)
|
||||
url = cls.class_path()
|
||||
response, api_key = requestor.request('get', url, params)
|
||||
return convert_to_gym_object(response, api_key)
|
||||
|
||||
|
||||
class CreateableAPIResource(APIResource):
|
||||
@classmethod
|
||||
def create(cls, api_key=None, idempotency_key=None, **params):
|
||||
requestor = api_requestor.APIRequestor(api_key)
|
||||
url = cls.class_path()
|
||||
headers = populate_headers(idempotency_key)
|
||||
response, api_key = requestor.request('post', url, params, headers)
|
||||
return convert_to_gym_object(response, api_key)
|
||||
|
||||
|
||||
class UpdateableAPIResource(APIResource):
|
||||
def save(self, idempotency_key=None):
|
||||
updated_params = self.serialize(None)
|
||||
headers = populate_headers(idempotency_key)
|
||||
|
||||
if updated_params:
|
||||
self.refresh_from(self.request('post', self.instance_path(),
|
||||
updated_params, headers))
|
||||
else:
|
||||
util.logger.debug("Trying to save already saved object %r", self)
|
||||
return self
|
||||
|
||||
|
||||
class DeletableAPIResource(APIResource):
|
||||
def delete(self, **params):
|
||||
self.refresh_from(self.request('delete', self.instance_path(), params))
|
||||
return self
|
||||
|
||||
## Our resources
|
||||
|
||||
class FileUpload(ListableAPIResource):
|
||||
@classmethod
|
||||
def class_name(cls):
|
||||
return 'file'
|
||||
|
||||
@classmethod
|
||||
def create(cls, api_key=None, **params):
|
||||
requestor = api_requestor.APIRequestor(
|
||||
api_key, api_base=cls.api_base())
|
||||
url = cls.class_path()
|
||||
response, api_key = requestor.request(
|
||||
'post', url, params=params)
|
||||
return convert_to_gym_object(response, api_key)
|
||||
|
||||
def put(self, contents, encode='json'):
|
||||
supplied_headers = {
|
||||
"Content-Type": self.content_type
|
||||
}
|
||||
if encode == 'json':
|
||||
contents = json.dumps(contents)
|
||||
elif encode is None:
|
||||
pass
|
||||
else:
|
||||
raise error.Error('Encode request for put must be "json" or None, not {}'.format(encode))
|
||||
|
||||
files = {'file': contents}
|
||||
|
||||
body, code, headers = api_requestor.http_client.request(
|
||||
'post', self.post_url, post_data=self.post_fields, files=files, headers={})
|
||||
if code != 204:
|
||||
raise error.Error("Upload to S3 failed. If error persists, please contact us at gym@openai.com with this message. S3 returned '{} -- {}'. Tried 'POST {}' with fields {}.".format(code, body, self.post_url, self.post_fields))
|
||||
|
||||
class Evaluation(CreateableAPIResource):
|
||||
def web_url(self):
|
||||
return "%s/evaluations/%s" % (gym.scoreboard.web_base, self.get('id'))
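The core idea behind GymObject in this file, a dict with attribute access that tracks locally modified keys, can be sketched as follows. `AttrDict` is a hypothetical, simplified stand-in: it omits the transient-value tracking, API keys, and request logic of the real class.

```python
class AttrDict(dict):
    """Hypothetical stand-in for GymObject: a dict with attribute
    access that remembers which keys were changed locally."""

    def __init__(self, **values):
        super(AttrDict, self).__init__()
        # Bypass our own __setattr__ for private bookkeeping.
        object.__setattr__(self, '_unsaved_values', set())
        for k, v in values.items():
            # Initial (server-side) values are not marked as unsaved.
            dict.__setitem__(self, k, v)

    def __setattr__(self, k, v):
        if k.startswith('_'):
            object.__setattr__(self, k, v)
        else:
            self[k] = v

    def __getattr__(self, k):
        # Only called when normal attribute lookup fails.
        if k.startswith('_'):
            raise AttributeError(k)
        try:
            return self[k]
        except KeyError as err:
            raise AttributeError(*err.args)

    def __setitem__(self, k, v):
        if v == "":
            raise ValueError("empty strings are rejected; use None instead")
        super(AttrDict, self).__setitem__(k, v)
        self._unsaved_values.add(k)


obj = AttrDict(id='eval_123')
obj.env_id = 'CartPole-v0'   # stored as a dict item and marked unsaved
```

Only the keys in `_unsaved_values` would need to be sent on a save, which is the role `serialize` plays in the original.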
|
0
gym/scoreboard/client/tests/__init__.py
Normal file
32
gym/scoreboard/client/tests/helper.py
Normal file
@@ -0,0 +1,32 @@
|
||||
import mock
|
||||
import unittest
|
||||
import uuid
|
||||
|
||||
def fake_id(prefix):
|
||||
entropy = ''.join([a for a in str(uuid.uuid4()) if a.isalnum()])
|
||||
return '{}_{}'.format(prefix, entropy)
|
||||
|
||||
class APITestCase(unittest.TestCase):
|
||||
def setUp(self):
|
||||
super(APITestCase, self).setUp()
|
||||
self.requestor_patcher = mock.patch('gym.scoreboard.client.api_requestor.APIRequestor')
|
||||
requestor_class_mock = self.requestor_patcher.start()
|
||||
self.requestor_mock = requestor_class_mock.return_value
|
||||
|
||||
def mock_response(self, res):
|
||||
self.requestor_mock.request = mock.Mock(return_value=(res, 'reskey'))
|
||||
|
||||
class TestData(object):
|
||||
@classmethod
|
||||
def file_upload_response(cls):
|
||||
return {
|
||||
'id': fake_id('file'),
|
||||
'object': 'file',
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def evaluation_response(cls):
|
||||
return {
|
||||
'id': fake_id('file'),
|
||||
'object': 'evaluation',
|
||||
}
|
16
gym/scoreboard/client/tests/test_evaluation.py
Normal file
@@ -0,0 +1,16 @@
|
||||
from gym.scoreboard.client.tests import helper
|
||||
from gym import scoreboard
|
||||
|
||||
class EvaluationTest(helper.APITestCase):
|
||||
def test_create_evaluation(self):
|
||||
self.mock_response(helper.TestData.evaluation_response())
|
||||
|
||||
evaluation = scoreboard.Evaluation.create()
|
||||
assert isinstance(evaluation, scoreboard.Evaluation)
|
||||
|
||||
self.requestor_mock.request.assert_called_with(
|
||||
'post',
|
||||
'/v1/evaluations',
|
||||
{},
|
||||
None
|
||||
)
|
15
gym/scoreboard/client/tests/test_file_upload.py
Normal file
@@ -0,0 +1,15 @@
|
||||
from gym.scoreboard.client.tests import helper
|
||||
from gym import scoreboard
|
||||
|
||||
class FileUploadTest(helper.APITestCase):
|
||||
def test_create_file_upload(self):
|
||||
self.mock_response(helper.TestData.file_upload_response())
|
||||
|
||||
file_upload = scoreboard.FileUpload.create()
|
||||
assert isinstance(file_upload, scoreboard.FileUpload), 'File upload is: {!r}'.format(file_upload)
|
||||
|
||||
self.requestor_mock.request.assert_called_with(
|
||||
'post',
|
||||
'/v1/files',
|
||||
params={},
|
||||
)
|
14
gym/scoreboard/client/util.py
Normal file
@@ -0,0 +1,14 @@
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def utf8(value):
|
||||
if sys.version_info < (3, 0) and isinstance(value, unicode):
|
||||
return value.encode('utf-8')
|
||||
else:
|
||||
return value
|
||||
|
||||
def file_size(f):
|
||||
return os.fstat(f.fileno()).st_size
|
123
gym/scoreboard/scoring.py
Normal file
@@ -0,0 +1,123 @@
|
||||
"""This is the actual code we use to score people's solutions
|
||||
server-side. The interfaces here are not yet stable, but we include
|
||||
them so that people can reproduce our scoring calculations
|
||||
independently.
|
||||
|
||||
Correspondingly, we do not currently import this module.
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import requests
|
||||
|
||||
import gym
|
||||
|
||||
def score_from_remote(url):
|
||||
result = requests.get(url)
|
||||
parsed = result.json()
|
||||
episode_lengths = parsed['episode_lengths']
|
||||
episode_rewards = parsed['episode_rewards']
|
||||
timestamps = parsed['timestamps']
|
||||
env_id = parsed['env_id']
|
||||
|
||||
spec = gym.spec(env_id)
|
||||
return score_from_merged(episode_lengths, episode_rewards, timestamps, spec.trials, spec.reward_threshold)
|
||||
|
||||
def score_from_merged(episode_lengths, episode_rewards, timestamps, trials, reward_threshold):
|
||||
"""Method to calculate the score from merged monitor files.
|
||||
"""
|
||||
# Make sure everything is a float -- no pesky ints.
|
||||
episode_rewards = np.array(episode_rewards, dtype='float64')
|
||||
episode_t_value = timestep_t_value = mean = error = time_in_seconds = None
|
||||
|
||||
if len(timestamps) > 2:
|
||||
# This is: time from the first *step* to the last *step*.
|
||||
time_in_seconds = timestamps[-1] - timestamps[0]
|
||||
if len(episode_rewards) >= trials:
|
||||
means = running_mean(episode_rewards, trials)
|
||||
if reward_threshold is not None:
|
||||
# Compute t-value by finding the first index above the
|
||||
# threshold. It comes out as a singleton tuple.
|
||||
(indexes_above_threshold, ) = np.where(means > reward_threshold)
|
||||
if len(indexes_above_threshold) > 0:
|
||||
# Grab the first episode index that is above the threshold value
|
||||
episode_t_value = indexes_above_threshold[0]
|
||||
|
||||
# Find timestep corresponding to this episode
|
||||
cumulative_timesteps = np.cumsum(np.insert(episode_lengths, 0, 0))
|
||||
# Convert that into timesteps
|
||||
timestep_t_value = cumulative_timesteps[episode_t_value]
|
||||
|
||||
# Find the window with the best mean
|
||||
best_idx = np.argmax(means)
|
||||
best_rewards = episode_rewards[best_idx:best_idx+trials]
|
||||
mean = np.mean(best_rewards)
|
||||
error = np.std(best_rewards) / (np.sqrt(trials) - 1)
|
||||
return {
|
||||
'episode_t_value': episode_t_value,
|
||||
'timestep_t_value': timestep_t_value,
|
||||
'mean': mean,
|
||||
'error': error,
|
||||
'number_episodes': len(episode_rewards),
|
||||
'number_timesteps': sum(episode_lengths),
|
||||
'time_in_seconds': time_in_seconds,
|
||||
}
|
||||
|
||||
def running_mean(x, N):
|
||||
x = np.array(x, dtype='float64')
|
||||
cumsum = np.cumsum(np.insert(x, 0, 0))
|
||||
return (cumsum[N:] - cumsum[:-N]) / N
|
||||
|
||||
def compute_graph_stats(episode_lengths, episode_rewards, timestamps, buckets):
|
||||
"""Method to compute the aggregates for the graphs."""
|
||||
# Not a dependency of OpenAI Gym generally.
|
||||
import scipy.stats
|
||||
|
||||
num_episodes = len(episode_lengths)
|
||||
|
||||
episode_rewards = np.array(episode_rewards)
|
||||
episode_lengths = np.array(episode_lengths)
|
||||
|
||||
# The index of the start of each episode
|
||||
x_timestep = np.cumsum(np.insert(episode_lengths, 0, 0))[:-1]
|
||||
assert len(x_timestep) == num_episodes
|
||||
|
||||
# Nothing to compute here
|
||||
x_timestamp = timestamps
|
||||
|
||||
# The index of each episode
|
||||
x_episode = range(num_episodes)
|
||||
|
||||
# Calculate the appropriate x/y statistics
|
||||
x_timestep_y_reward = scipy.stats.binned_statistic(x_timestep, episode_rewards, 'median', buckets)
|
||||
x_timestep_y_length = scipy.stats.binned_statistic(x_timestep, episode_lengths, 'median', buckets)
|
||||
|
||||
x_episode_y_reward = scipy.stats.binned_statistic(x_episode, episode_rewards, 'median', buckets)
|
||||
x_episode_y_length = scipy.stats.binned_statistic(x_episode, episode_lengths, 'median', buckets)
|
||||
|
||||
x_timestamp_y_reward = scipy.stats.binned_statistic(x_timestamp, episode_rewards, 'median', buckets)
|
||||
x_timestamp_y_length = scipy.stats.binned_statistic(x_timestamp, episode_lengths, 'median', buckets)
|
||||
|
||||
|
||||
return {
|
||||
'x_timestep_y_reward': graphable_binned_statistic(x_timestep_y_reward),
|
||||
'x_timestep_y_length': graphable_binned_statistic(x_timestep_y_length),
|
||||
'x_episode_y_reward': graphable_binned_statistic(x_episode_y_reward),
|
||||
'x_episode_y_length': graphable_binned_statistic(x_episode_y_length),
|
||||
'x_timestamp_y_reward': graphable_binned_statistic(x_timestamp_y_reward),
|
||||
'x_timestamp_y_length': graphable_binned_statistic(x_timestamp_y_length),
|
||||
}
|
||||
|
||||
def graphable_binned_statistic(binned):
|
||||
x = running_mean(binned.bin_edges, 2)
|
||||
y = binned.statistic
|
||||
assert len(x) == len(y)
|
||||
|
||||
# Get rid of nasty NaNs
|
||||
valid = np.logical_not(np.isnan(x)) & np.logical_not(np.isnan(y))
|
||||
x = x[valid]
|
||||
y = y[valid]
|
||||
|
||||
return {
|
||||
'x': x,
|
||||
'y': y,
|
||||
}
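The sliding-window logic used by score_from_merged can be checked by hand on a small input. The sketch below is a pure-Python re-implementation of `running_mean` (the original uses NumPy), and `first_window_above` is a hypothetical helper that mirrors the `np.where(means > reward_threshold)` step.

```python
from itertools import accumulate


def running_mean(x, n):
    """Mean of every length-n window of x, computed from a cumulative sum."""
    cumsum = [0.0] + list(accumulate(float(v) for v in x))
    return [(cumsum[i + n] - cumsum[i]) / n for i in range(len(x) - n + 1)]


def first_window_above(x, n, threshold):
    """Index of the first window whose mean exceeds threshold, else None."""
    for i, m in enumerate(running_mean(x, n)):
        if m > threshold:
            return i
    return None


rewards = [1, 2, 3, 4, 5]
means = running_mean(rewards, 2)                 # windows (1,2), (2,3), (3,4), (4,5)
t_value = first_window_above(rewards, 2, 3.0)    # first window with mean > 3.0
```

With a window of 2 the means are 1.5, 2.5, 3.5 and 4.5, so the first window above a threshold of 3.0 starts at episode index 2.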
|
5
gym/spaces/__init__.py
Normal file
@@ -0,0 +1,5 @@
|
||||
from .box import Box
|
||||
from .discrete import Discrete
|
||||
from .tuple_space import Tuple
|
||||
|
||||
__all__ = ["Box", "Discrete", "Tuple"]
|
39
gym/spaces/box.py
Normal file
@@ -0,0 +1,39 @@
|
||||
from gym import Space
|
||||
import numpy as np
|
||||
|
||||
class Box(Space):
|
||||
"""
|
||||
A box in R^n.
|
||||
I.e., each coordinate is bounded.
|
||||
"""
|
||||
def __init__(self, low, high, shape=None):
|
||||
"""
|
||||
Two kinds of valid input:
|
||||
Box(-1.0, 1.0, (3,4)) # low and high are scalars, and shape is provided
|
||||
Box(np.array([-1.0,-2.0]), np.array([2.0,4.0])) # low and high are arrays of the same shape
|
||||
"""
|
||||
if shape is None:
|
||||
assert low.shape == high.shape
|
||||
self.low = low
|
||||
self.high = high
|
||||
else:
|
||||
assert np.isscalar(low) and np.isscalar(high)
|
||||
self.low = low + np.zeros(shape)
|
||||
self.high = high + np.zeros(shape)
|
||||
def sample(self):
|
||||
return np.random.uniform(low=self.low, high=self.high, size=self.low.shape)
|
||||
def contains(self, x):
|
||||
return x.shape == self.shape and (x >= self.low).all() and (x <= self.high).all()
|
||||
|
||||
def to_jsonable(self, sample_n):
|
||||
return np.array(sample_n).tolist()
|
||||
def from_jsonable(self, sample_n):
|
||||
return [np.asarray(sample) for sample in sample_n]
|
||||
|
||||
@property
|
||||
def shape(self):
|
||||
return self.low.shape
|
||||
def __repr__(self):
|
||||
return "Box" + str(self.shape)
|
||||
def __eq__(self, other):
|
||||
return np.allclose(self.low, other.low) and np.allclose(self.high, other.high)
|
17
gym/spaces/discrete.py
Normal file
@@ -0,0 +1,17 @@
|
||||
import numpy as np
|
||||
from gym import Space
|
||||
|
||||
class Discrete(Space):
|
||||
"""
|
||||
{0,1,...,n-1}
|
||||
"""
|
||||
def __init__(self, n):
|
||||
self.n = n
|
||||
def sample(self):
|
||||
return np.random.randint(self.n)
|
||||
def contains(self, x):
|
||||
return isinstance(x, int) and x >= 0 and x < self.n
|
||||
def __repr__(self):
|
||||
return "Discrete(%d)" % self.n
|
||||
def __eq__(self, other):
|
||||
return self.n == other.n
|
30
gym/spaces/tests/test_spaces.py
Normal file
@@ -0,0 +1,30 @@
|
||||
import json # note: ujson fails this test due to float equality
|
||||
|
||||
import numpy as np
|
||||
from nose2 import tools
|
||||
|
||||
from gym.spaces import Tuple, Box, Discrete
|
||||
|
||||
@tools.params(Discrete(3),
|
||||
Tuple([Discrete(5), Discrete(10)]),
|
||||
Tuple([Discrete(5), Box(np.array([0,0]),np.array([1,5]))]),
|
||||
Tuple((Discrete(5), Discrete(2), Discrete(2)))
|
||||
)
|
||||
def test_roundtripping(space):
|
||||
sample_1 = space.sample()
|
||||
sample_2 = space.sample()
|
||||
assert space.contains(sample_1)
|
||||
assert space.contains(sample_2)
|
||||
json_rep = space.to_jsonable([sample_1, sample_2])
|
||||
|
||||
json_roundtripped = json.loads(json.dumps(json_rep))
|
||||
|
||||
samples_after_roundtrip = space.from_jsonable(json_roundtripped)
|
||||
sample_1_prime, sample_2_prime = samples_after_roundtrip
|
||||
|
||||
s1 = space.to_jsonable([sample_1])
|
||||
s1p = space.to_jsonable([sample_1_prime])
|
||||
s2 = space.to_jsonable([sample_2])
|
||||
s2p = space.to_jsonable([sample_2_prime])
|
||||
assert s1 == s1p, "Expected {} to equal {}".format(s1, s1p)
|
||||
assert s2 == s2p, "Expected {} to equal {}".format(s2, s2p)
|
26
gym/spaces/tuple_space.py
Normal file
@@ -0,0 +1,26 @@
|
||||
from gym import Space
|
||||
|
||||
class Tuple(Space):
|
||||
"""
|
||||
A tuple (i.e., product) of simpler spaces
|
||||
"""
|
||||
def __init__(self, spaces):
|
||||
self.spaces = spaces
|
||||
|
||||
def sample(self):
|
||||
return tuple([space.sample() for space in self.spaces])
|
||||
|
||||
def contains(self, x):
|
||||
return isinstance(x, tuple) and len(x) == len(self.spaces) and all(
|
||||
space.contains(part) for (space,part) in zip(self.spaces,x))
|
||||
|
||||
def __repr__(self):
|
||||
return "Tuple(" + ", ". join([str(s) for s in self.spaces]) + ")"
|
||||
|
||||
def to_jsonable(self, sample_n):
|
||||
# serialize as list-repr of tuple of vectors
|
||||
return [space.to_jsonable([sample[i] for sample in sample_n]) \
|
||||
for i, space in enumerate(self.spaces)]
|
||||
|
||||
def from_jsonable(self, sample_n):
|
||||
return zip(*[space.from_jsonable(sample_n[i]) for i, space in enumerate(self.spaces)])
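The JSON layout produced by Tuple.to_jsonable is transposed: one list per component space rather than one entry per sample. A minimal sketch of that round trip follows. `IntSpace` is a hypothetical stand-in for a component space whose samples are already JSON-friendly, and the two functions are free-standing copies of the methods above.

```python
class IntSpace(object):
    """Hypothetical component space; its samples need no conversion."""

    def to_jsonable(self, sample_n):
        return list(sample_n)

    def from_jsonable(self, sample_n):
        return list(sample_n)


def tuple_to_jsonable(spaces, sample_n):
    # One list per component space, holding that component of every sample.
    return [space.to_jsonable([sample[i] for sample in sample_n])
            for i, space in enumerate(spaces)]


def tuple_from_jsonable(spaces, columns):
    # Transpose back: zip the per-component lists into per-sample tuples.
    return list(zip(*[space.from_jsonable(columns[i])
                      for i, space in enumerate(spaces)]))


spaces = [IntSpace(), IntSpace()]
samples = [(0, 7), (3, 9), (4, 1)]
columns = tuple_to_jsonable(spaces, samples)
restored = tuple_from_jsonable(spaces, columns)
```

The sketch wraps `zip` in `list` so the result is a list on Python 3 as well. The original returns the bare `zip(...)` result, which is a list only on Python 2.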
|
55
gym/utils.py
Normal file
@@ -0,0 +1,55 @@
|
||||
"""A set of common utilities used within the environments. These are
|
||||
not intended as API functions, and will not remain stable over time.
|
||||
"""
|
||||
|
||||
color2num = dict(
|
||||
gray=30,
|
||||
red=31,
|
||||
green=32,
|
||||
yellow=33,
|
||||
blue=34,
|
||||
magenta=35,
|
||||
cyan=36,
|
||||
white=37,
|
||||
crimson=38
|
||||
)
|
||||
|
||||
def colorize(string, color, bold=False, highlight = False):
|
||||
"""Return string surrounded by appropriate terminal color codes to
|
||||
print colorized text. Valid colors: gray, red, green, yellow,
|
||||
blue, magenta, cyan, white, crimson
|
||||
"""
|
||||
attr = []
|
||||
num = color2num[color]
|
||||
if highlight: num += 10
|
||||
attr.append(unicode(num))
|
||||
if bold: attr.append('1')
|
||||
return '\x1b[%sm%s\x1b[0m' % (';'.join(attr), string)
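For reference, the escape sequences that colorize builds look like this. The sketch is a Python 3 compatible copy (it uses `str` where the original uses `unicode`) with a trimmed color table, so treat it as an illustration rather than the module's code.

```python
# Trimmed copy of the color table above; values are ANSI foreground codes.
color2num = dict(gray=30, red=31, green=32, yellow=33, blue=34)


def colorize(string, color, bold=False, highlight=False):
    attr = []
    num = color2num[color]
    if highlight:
        num += 10          # 40-47 select the matching background color
    attr.append(str(num))
    if bold:
        attr.append('1')
    return '\x1b[%sm%s\x1b[0m' % (';'.join(attr), string)


bold_green = colorize('ok', 'green', bold=True)
red_background = colorize('fail', 'red', highlight=True)
```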
|
||||
|
||||
class EzPickle(object):
|
||||
"""Objects that are pickled and unpickled via their constructor
|
||||
arguments.
|
||||
|
||||
Example usage:
|
||||
|
||||
class Dog(Animal, EzPickle):
|
||||
def __init__(self, furcolor, tailkind="bushy"):
|
||||
Animal.__init__(self)
|
||||
EzPickle.__init__(self, furcolor, tailkind)
|
||||
...
|
||||
|
||||
When this object is unpickled, a new Dog will be constructed by passing the provided
|
||||
furcolor and tailkind into the constructor. However, philosophers are still not sure
|
||||
whether it is still the same dog.
|
||||
|
||||
This is generally needed only for environments which wrap C/C++ code, such as MuJoCo
|
||||
and Atari.
|
||||
"""
|
||||
def __init__(self, *args, **kwargs):
|
||||
self._ezpickle_args = args
|
||||
self._ezpickle_kwargs = kwargs
|
||||
def __getstate__(self):
|
||||
return {"_ezpickle_args" : self._ezpickle_args, "_ezpickle_kwargs": self._ezpickle_kwargs}
|
||||
def __setstate__(self, d):
|
||||
out = type(self)(*d["_ezpickle_args"], **d["_ezpickle_kwargs"])
|
||||
self.__dict__.update(out.__dict__)
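A round trip through pickle shows what EzPickle does in practice. The sketch below repeats the class so that it runs on its own. `Dog` and its `handle` attribute are made up for the example, with `handle` standing in for native state that is rebuilt by the constructor rather than copied.

```python
import pickle


class EzPickle(object):
    def __init__(self, *args, **kwargs):
        self._ezpickle_args = args
        self._ezpickle_kwargs = kwargs

    def __getstate__(self):
        # Only the constructor arguments are stored in the pickle.
        return {"_ezpickle_args": self._ezpickle_args,
                "_ezpickle_kwargs": self._ezpickle_kwargs}

    def __setstate__(self, d):
        # Rebuild a fresh instance from the saved arguments and adopt its state.
        out = type(self)(*d["_ezpickle_args"], **d["_ezpickle_kwargs"])
        self.__dict__.update(out.__dict__)


class Dog(EzPickle):
    def __init__(self, furcolor, tailkind="bushy"):
        EzPickle.__init__(self, furcolor, tailkind)
        self.furcolor = furcolor
        self.tailkind = tailkind
        self.handle = object()  # stand-in for state recreated on construction


dog = Dog("brown", tailkind="curly")
clone = pickle.loads(pickle.dumps(dog))
```

The clone has the same constructor arguments but its own `handle`, because `__setstate__` runs the constructor again instead of copying attributes.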
|
1
gym/version.py
Normal file
@@ -0,0 +1 @@
|
||||
VERSION = '0.0.1'
|
5
requirements.txt
Normal file
@@ -0,0 +1,5 @@
|
||||
# Testing
|
||||
nose2
|
||||
mock
|
||||
|
||||
-e .[all]
|
34
setup.py
Normal file
@@ -0,0 +1,34 @@
|
||||
from setuptools import setup, find_packages
|
||||
import sys, os.path
|
||||
|
||||
# Don't import gym module here, since deps may not be installed
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'gym'))
|
||||
from version import VERSION
|
||||
|
||||
setup(name='gym',
|
||||
version=VERSION,
|
||||
description='The OpenAI Gym: A toolkit for developing and comparing your reinforcement learning agents.',
|
||||
url='https://github.com/openai/gym',
|
||||
author='OpenAI',
|
||||
author_email='gym@openai.com',
|
||||
license='',
|
||||
packages=[package for package in find_packages()
|
||||
if package.startswith('gym')],
|
||||
zip_safe=False,
|
||||
install_requires=[
|
||||
'numpy>=1.10.4', 'requests', 'six'
|
||||
],
|
||||
extras_require={
|
||||
'all': ['atari_py>=0.0.14', 'Pillow', 'pyglet',
|
||||
'pachi-py>=0.0.16',
|
||||
'mujoco_py>=0.4.0', 'imageio'],
|
||||
|
||||
# Environment-specific dependencies. Keep these in sync with
|
||||
# 'all'!
|
||||
'atari': ['atari_py>=0.0.14', 'Pillow', 'pyglet'],
|
||||
'board_game' : ['pachi-py>=0.0.16'],
|
||||
'classic_control': ['pyglet'],
|
||||
'mujoco': ['mujoco_py>=0.4.0', 'imageio'],
|
||||
},
|
||||
tests_require=['nose2', 'mock'],
|
||||
)
|