<h2>Description<aclass="headerlink"href="#description"title="Permalink to this heading">#</a></h2>
<p>“Pusher” is a multi-jointed robot arm that closely resembles a human arm.
The goal is to move a target cylinder (called <em>object</em>) to a goal position using the robot’s end effector (called <em>fingertip</em>).
The robot consists of shoulder, elbow, forearm, and wrist joints.</p>
</section>
<sectionid="action-space">
<h2>Action Space<aclass="headerlink"href="#action-space"title="Permalink to this heading">#</a></h2>
<p>The action space is a <codeclass="docutils literal notranslate"><spanclass="pre">Box(-2,</span><spanclass="pre">2,</span><spanclass="pre">(7,),</span><spanclass="pre">float32)</span></code>. An action is a vector of 7 values, each representing the torque applied at one of the hinge joints.</p>
<thclass="head"><p>Name (in corresponding XML file)</p></th>
<thclass="head"><p>Joint</p></th>
<thclass="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<trclass="row-even"><td><p>0</p></td>
<td><p>Rotation of the shoulder panning joint</p></td>
<td><p>-2</p></td>
<td><p>2</p></td>
<td><p>r_shoulder_pan_joint</p></td>
<td><p>hinge</p></td>
<td><p>torque (N m)</p></td>
</tr>
<trclass="row-odd"><td><p>1</p></td>
<td><p>Rotation of the shoulder lifting joint</p></td>
<td><p>-2</p></td>
<td><p>2</p></td>
<td><p>r_shoulder_lift_joint</p></td>
<td><p>hinge</p></td>
<td><p>torque (N m)</p></td>
</tr>
<trclass="row-even"><td><p>2</p></td>
<td><p>Rotation of the shoulder rolling joint</p></td>
<td><p>-2</p></td>
<td><p>2</p></td>
<td><p>r_upper_arm_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>torque (N m)</p></td>
</tr>
<trclass="row-odd"><td><p>3</p></td>
<td><p>Rotation of the hinge joint that flexes the elbow</p></td>
<td><p>-2</p></td>
<td><p>2</p></td>
<td><p>r_elbow_flex_joint</p></td>
<td><p>hinge</p></td>
<td><p>torque (N m)</p></td>
</tr>
<trclass="row-even"><td><p>4</p></td>
<td><p>Rotation of hinge that rolls the forearm</p></td>
<td><p>-2</p></td>
<td><p>2</p></td>
<td><p>r_forearm_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>torque (N m)</p></td>
</tr>
<trclass="row-odd"><td><p>5</p></td>
<td><p>Rotation of flexing the wrist</p></td>
<td><p>-2</p></td>
<td><p>2</p></td>
<td><p>r_wrist_flex_joint</p></td>
<td><p>hinge</p></td>
<td><p>torque (N m)</p></td>
</tr>
<trclass="row-even"><td><p>6</p></td>
<td><p>Rotation of rolling the wrist</p></td>
<td><p>-2</p></td>
<td><p>2</p></td>
<td><p>r_wrist_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>torque (N m)</p></td>
</tr>
</tbody>
</table>
</div>
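<p>As a rough illustration of the bounds in the table above, the following sketch clamps a candidate torque vector to the documented range before it would be passed to the environment. The helper name <code>clip_action</code> and the constants are hypothetical and not part of the environment's API.</p>

```python
# Bounds and size taken from the documented Box(-2, 2, (7,), float32).
ACTION_LOW, ACTION_HIGH, ACTION_DIM = -2.0, 2.0, 7


def clip_action(torques):
    """Clamp each of the 7 joint torques to the [-2, 2] range (hypothetical helper)."""
    if len(torques) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} torques, got {len(torques)}")
    return [min(max(float(t), ACTION_LOW), ACTION_HIGH) for t in torques]


# Entries outside the range are clamped; the others pass through unchanged.
action = clip_action([0.5, -3.0, 2.5, 0.0, 1.0, -1.0, 2.0])
```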
</section>
<sectionid="observation-space">
<h2>Observation Space<aclass="headerlink"href="#observation-space"title="Permalink to this heading">#</a></h2>
<p>Observations consist of</p>
<ulclass="simple">
<li><p>Angle of rotational joints on the pusher</p></li>
<li><p>Angular velocities of rotational joints on the pusher</p></li>
<li><p>The coordinates of the fingertip of the pusher</p></li>
<li><p>The coordinates of the object to be moved</p></li>
<li><p>The coordinates of the goal position</p></li>
</ul>
<p>The observation is a <codeclass="docutils literal notranslate"><spanclass="pre">ndarray</span></code> with shape <codeclass="docutils literal notranslate"><spanclass="pre">(23,)</span></code> where the elements correspond to the table below.
An analogy can be drawn to a human arm in order to help understand the state space, with the words flex and roll meaning the
<thclass="head"><p>Name (in corresponding XML file)</p></th>
<thclass="head"><p>Joint</p></th>
<thclass="head"><p>Unit</p></th>
</tr>
</thead>
<tbody>
<trclass="row-even"><td><p>0</p></td>
<td><p>Rotation of the shoulder panning joint</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_shoulder_pan_joint</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<trclass="row-odd"><td><p>1</p></td>
<td><p>Rotation of the shoulder lifting joint</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_shoulder_lift_joint</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<trclass="row-even"><td><p>2</p></td>
<td><p>Rotation of the shoulder rolling joint</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_upper_arm_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<trclass="row-odd"><td><p>3</p></td>
<td><p>Rotation of the hinge joint that flexes the elbow</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_elbow_flex_joint</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<trclass="row-even"><td><p>4</p></td>
<td><p>Rotation of hinge that rolls the forearm</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_forearm_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<trclass="row-odd"><td><p>5</p></td>
<td><p>Rotation of flexing the wrist</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_wrist_flex_joint</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<trclass="row-even"><td><p>6</p></td>
<td><p>Rotation of rolling the wrist</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_wrist_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>angle (rad)</p></td>
</tr>
<trclass="row-odd"><td><p>7</p></td>
<td><p>Rotational velocity of the shoulder panning joint</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_shoulder_pan_joint</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
<trclass="row-even"><td><p>8</p></td>
<td><p>Rotational velocity of the shoulder lifting joint</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_shoulder_lift_joint</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
<trclass="row-odd"><td><p>9</p></td>
<td><p>Rotational velocity of the shoulder rolling joint</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_upper_arm_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
<trclass="row-even"><td><p>10</p></td>
<td><p>Rotational velocity of the hinge joint that flexes the elbow</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_elbow_flex_joint</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
<trclass="row-odd"><td><p>11</p></td>
<td><p>Rotational velocity of hinge that rolls the forearm</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_forearm_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
<trclass="row-even"><td><p>12</p></td>
<td><p>Rotational velocity of flexing the wrist</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_wrist_flex_joint</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
<trclass="row-odd"><td><p>13</p></td>
<td><p>Rotational velocity of rolling the wrist</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>r_wrist_roll_joint</p></td>
<td><p>hinge</p></td>
<td><p>angular velocity (rad/s)</p></td>
</tr>
<trclass="row-even"><td><p>14</p></td>
<td><p>x-coordinate of the fingertip of the pusher</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>tips_arm</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-odd"><td><p>15</p></td>
<td><p>y-coordinate of the fingertip of the pusher</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>tips_arm</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-even"><td><p>16</p></td>
<td><p>z-coordinate of the fingertip of the pusher</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>tips_arm</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-odd"><td><p>17</p></td>
<td><p>x-coordinate of the object to be moved</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>object (obj_slidex)</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-even"><td><p>18</p></td>
<td><p>y-coordinate of the object to be moved</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>object (obj_slidey)</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-odd"><td><p>19</p></td>
<td><p>z-coordinate of the object to be moved</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>object</p></td>
<td><p>cylinder</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-even"><td><p>20</p></td>
<td><p>x-coordinate of the goal position of the object</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>goal (goal_slidex)</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-odd"><td><p>21</p></td>
<td><p>y-coordinate of the goal position of the object</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>goal (goal_slidey)</p></td>
<td><p>slide</p></td>
<td><p>position (m)</p></td>
</tr>
<trclass="row-even"><td><p>22</p></td>
<td><p>z-coordinate of the goal position of the object</p></td>
<td><p>-Inf</p></td>
<td><p>Inf</p></td>
<td><p>goal</p></td>
<td><p>sphere</p></td>
<td><p>position (m)</p></td>
</tr>
</tbody>
</table>
</div>
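<p>The index layout in the table above can be sketched as a small helper that splits a 23-element observation into its named groups. The function name <code>split_observation</code> and the dictionary keys are hypothetical labels chosen for this example; only the index ranges come from the table.</p>

```python
def split_observation(obs):
    """Split a 23-element Pusher observation into named groups (hypothetical helper)."""
    if len(obs) != 23:
        raise ValueError(f"expected 23 values, got {len(obs)}")
    return {
        "joint_angles": list(obs[0:7]),       # indices 0-6, rad
        "joint_velocities": list(obs[7:14]),  # indices 7-13, rad/s
        "fingertip_pos": list(obs[14:17]),    # indices 14-16, m
        "object_pos": list(obs[17:20]),       # indices 17-19, m
        "goal_pos": list(obs[20:23]),         # indices 20-22, m
    }


# A dummy observation whose values equal their own indices makes the layout visible.
parts = split_observation(list(range(23)))
```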
</section>
<sectionid="rewards">
<h2>Rewards<aclass="headerlink"href="#rewards"title="Permalink to this heading">#</a></h2>
<p>The reward consists of three parts:</p>
<ulclass="simple">
<li><p><em>reward_near</em>: A measure of how far the <em>fingertip</em>
of the pusher (the unattached end) is from the object, with a more negative
value assigned the farther the <em>fingertip</em> is from the
object. It is calculated as the negative vector norm of (position of
the fingertip - position of the object), or <em>-norm(“fingertip” - “object”)</em>.</p></li>
<li><p><em>reward_dist</em>: A measure of how far the object is from
the goal position, with a more negative value assigned the farther the object is
from the goal. It is calculated as the negative vector norm of
(position of the object - position of the goal), or <em>-norm(“object” - “goal”)</em>.</p></li>
<li><p><em>reward_ctrl</em>: A negative reward that penalises the pusher if
it takes actions that are too large. It is measured as the negative squared
Euclidean norm of the action, i.e. as <em>- sum(action<sup>2</sup>)</em>.</p></li>
</ul>
<p>The total reward returned is <em><strong>reward</strong></em><em>=</em><em>reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near</em></p>
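<p>The three terms and their weighted sum can be sketched in plain Python as follows. This is an illustration of the formulas above, not the environment's source code; the function name <code>pusher_reward</code> and its arguments are hypothetical, and positions are assumed to be 3-element sequences in metres.</p>

```python
import math


def pusher_reward(fingertip, obj, goal, action):
    """Compute the documented reward terms from positions and an action (sketch)."""
    reward_near = -math.dist(fingertip, obj)   # fingertip-to-object distance
    reward_dist = -math.dist(obj, goal)        # object-to-goal distance
    reward_ctrl = -sum(a * a for a in action)  # negative squared Euclidean norm
    total = reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near
    terms = {
        "reward_near": reward_near,
        "reward_dist": reward_dist,
        "reward_ctrl": reward_ctrl,
    }
    return total, terms


# Made-up positions chosen so the distances are easy to check by hand (5 and 12).
total, terms = pusher_reward(
    fingertip=(0.0, 0.0, 0.0),
    obj=(3.0, 4.0, 0.0),
    goal=(3.0, 4.0, 12.0),
    action=[1.0] * 7,
)
```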
<p>Unlike other environments, Pusher does not allow you to specify weights for the individual reward terms.
However, <codeclass="docutils literal notranslate"><spanclass="pre">info</span></code> does contain the keys <em>reward_dist</em> and <em>reward_ctrl</em>. Thus, if you’d like to weight the terms,
you should create a wrapper that computes the weighted reward from <codeclass="docutils literal notranslate"><spanclass="pre">info</span></code>.</p>
</section>
<sectionid="starting-state">
<h2>Starting State<aclass="headerlink"href="#starting-state"title="Permalink to this heading">#</a></h2>
<p>All pusher (not including object and goal) states start at
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0). Uniform noise in the range
[-0.005, 0.005] is added to the velocity attributes only. The velocities of
the object and goal are permanently set to 0. The object’s x-position is sampled uniformly
from [-0.3, 0] and its y-position uniformly from [-0.2, 0.2]; this
sampling is repeated until the vector norm between the object’s (x, y) position and the origin is greater
than 0.17. The goal always has the same position, (0.45, -0.05, -0.323).</p>
<p>The default framerate is 5, with each frame lasting 0.01 seconds, giving <em>dt = 5 * 0.01 = 0.05</em> seconds.</p>
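<p>The reset procedure and the time step can be sketched with the standard library as follows. This is not the environment's own code: the function name <code>sample_start_state</code> and the constant names are hypothetical, and the sketch assumes that object positions are resampled while they lie within 0.17 of the origin, so that the accepted position is farther away than that.</p>

```python
import math
import random

FRAME_SKIP = 5    # simulation frames per environment step
FRAME_DT = 0.01   # duration of one simulation frame
DT = FRAME_SKIP * FRAME_DT  # time covered by one environment step


def sample_start_state(rng):
    """Sample arm state and object (x, y) as described above (hypothetical sketch)."""
    arm_qpos = [0.0] * 7  # joint angles start at zero, with no noise
    # Noise is applied to the velocity attributes only.
    arm_qvel = [rng.uniform(-0.005, 0.005) for _ in range(7)]
    while True:
        obj_xy = (rng.uniform(-0.3, 0.0), rng.uniform(-0.2, 0.2))
        # Assumption: reject candidates that are within 0.17 of the origin.
        if math.hypot(*obj_xy) > 0.17:
            break
    return arm_qpos, arm_qvel, obj_xy


arm_qpos, arm_qvel, obj_xy = sample_start_state(random.Random(0))
```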
</section>
<sectionid="episode-end">
<h2>Episode End<aclass="headerlink"href="#episode-end"title="Permalink to this heading">#</a></h2>
<p>The episode ends when any of the following happens:</p>
<olclass="arabic simple">
<li><p>Truncation: The episode duration reaches 100 timesteps.</p></li>
<li><p>Termination: Any of the state space values is no longer finite.</p></li>
</ol>
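<p>The two conditions above can be expressed as a small check. The function name <code>episode_flags</code> and the constant <code>MAX_EPISODE_STEPS</code> are hypothetical; in practice the environment reports these flags itself, so this is only a sketch of the logic.</p>

```python
import math

MAX_EPISODE_STEPS = 100  # truncation limit stated above


def episode_flags(obs, elapsed_steps):
    """Return (terminated, truncated) for an observation and step count (sketch)."""
    terminated = not all(math.isfinite(v) for v in obs)  # any NaN or +/-Inf value
    truncated = elapsed_steps >= MAX_EPISODE_STEPS
    return terminated, truncated


# A finite observation at step 100 is truncated but not terminated.
flags_at_limit = episode_flags([0.0] * 23, 100)
# A non-finite value terminates the episode regardless of the step count.
flags_with_nan = episode_flags([0.0] * 22 + [float("nan")], 3)
```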
</section>
<sectionid="arguments">
<h2>Arguments<aclass="headerlink"href="#arguments"title="Permalink to this heading">#</a></h2>
<p>No additional arguments are currently supported (in v2 and lower),
but modifications can be made to the XML file in the assets folder
(or by changing the path to a modified XML file in another folder).</p>
<p>There is no v3 for Pusher, unlike the robot environments, where v3 and
beyond take gymnasium.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale, etc.</p>
</section>
<sectionid="version-history">
<h2>Version History<aclass="headerlink"href="#version-history"title="Permalink to this heading">#</a></h2>
<ulclass="simple">
<li><p>v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3</p></li>
<li><p>v2: All continuous control environments now use mujoco_py >= 1.50</p></li>
<li><p>v1: max_time_steps raised to 1000 for robot based tasks (not including reacher, which has a max_time_steps of 50). Added reward_threshold to environments.</p></li>