Software and Tasks for Continuous Control


Overview

A public Colab notebook with a tutorial for the dm_control software is available here.

Infrastructure

An autogenerated MuJoCo Python wrapper provides full access to the underlying engine.
PyMJCF is a Document Object Model, wherein a hierarchy of Python Entity objects corresponds to MuJoCo model elements.
Composer is the high-level “game engine” which streamlines composing Entities into scenes and defining observations, rewards, terminations and general game logic.
The Locomotion framework introduces several abstract Composer entities such as the Arena and Walker, facilitating locomotion-like tasks.

Environments

The Control Suite, including new quadruped and dog environments.
Several locomotion tasks, including soccer.
Single arm robotic manipulation tasks using snap-together bricks.

Highlights

Named Indexing

Exploiting MuJoCo’s support of names for all model elements, we allow strings to index and slice into arrays. So instead of writing:

    fingertip_height = physics.data.geom_xpos[7, 2]

…using obscure, fragile numerical indexing, you can write:

    fingertip_height = physics.named.data.geom_xpos['fingertip', 'z']

leading to a much more robust, readable codebase.
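To illustrate the idea behind named indexing, here is a minimal sketch in plain Python. It is not how dm_control implements it: the NamedTable class, the geom names and the positions are all hypothetical, chosen only to show how a name-to-index map lets strings and integers address the same entry.

```python
class NamedTable:
    """Toy 2-D table that accepts names as well as integer indices."""

    def __init__(self, rows, row_names, col_names):
        self._rows = [list(row) for row in rows]
        # Build the name -> position maps once, up front.
        self._row_index = {name: i for i, name in enumerate(row_names)}
        self._col_index = {name: j for j, name in enumerate(col_names)}

    @staticmethod
    def _resolve(key, index):
        # Strings are looked up by name; integers pass through unchanged.
        return index[key] if isinstance(key, str) else key

    def __getitem__(self, key):
        row, col = key
        i = self._resolve(row, self._row_index)
        j = self._resolve(col, self._col_index)
        return self._rows[i][j]


# Hypothetical data: world positions (x, y, z) of three geoms.
geom_xpos = NamedTable(
    rows=[[0.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.2, 0.0, 0.35]],
    row_names=["ground", "arm", "fingertip"],
    col_names=["x", "y", "z"],
)

# Both lines read the same entry; only the first survives a reordering
# of the geoms in the model.
fingertip_height = geom_xpos["fingertip", "z"]
same_entry = geom_xpos[2, 2]
```

The named lookup keeps working if another geom is inserted before the fingertip, whereas the numerical index would silently point at the wrong element.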

PyMJCF

The PyMJCF library creates a Python object hierarchy with 1:1 correspondence to a MuJoCo model. It introduces the attach() method, which allows models to be attached to one another. For example, in our tutorial we create procedural multi-legged creatures by attaching legs to bodies and creatures to the scene.
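The attachment pattern described above can be sketched with a toy object tree. This is plain Python, not the PyMJCF API: the Element class, make_creature and the naming scheme are assumptions made for illustration. It shows one reason attachment is useful: scoping names by their parent lets several copies of the same sub-model coexist in one scene.

```python
class Element:
    """Toy stand-in for a model element (hypothetical, not PyMJCF)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def attach(self, other):
        # Attaching makes `other` a child of this element's tree.
        other.parent = self
        self.children.append(other)
        return other

    @property
    def full_name(self):
        # Prefix the name with its ancestors so copies stay distinct.
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name}/{self.name}"

    def walk(self):
        # Depth-first traversal over this element and its descendants.
        yield self
        for child in self.children:
            yield from child.walk()


def make_creature(name, num_legs):
    """Builds a body and attaches `num_legs` legs to it."""
    body = Element(name)
    for i in range(num_legs):
        body.attach(Element(f"leg_{i}"))
    return body


# Attach legs to bodies, then creatures to the scene.
scene = Element("arena")
scene.attach(make_creature("creature_a", num_legs=3))
scene.attach(make_creature("creature_b", num_legs=4))
names = [element.full_name for element in scene.walk()]
```

Both creatures contain a leg_0, yet every entry in names is unique because each is qualified by the creature it belongs to.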

Composer

Composer is the “game engine” framework, which defines a particular order of runtime function calls, and abstracts the affordances of reward, termination and observation. These abstractions allowed us to create useful submodules:

composer.Observable: An abstract observation wrapper which can add noise, delays, buffering and filtering to any sensor.

composer.Variation: A set of tools for randomising simulation quantities, allowing for agent robustification and sim-to-real via model variation.
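As a rough illustration of what an observation wrapper with noise and delay does, here is a minimal sketch in plain Python. ToyObservable and its arguments are hypothetical and much simpler than composer.Observable; the sketch only shows a corruptor applied to each raw reading and a fixed-length buffer used to return stale values.

```python
import collections
import random


class ToyObservable:
    """Wraps a sensor callable, adding optional corruption and delay."""

    def __init__(self, sensor, delay=0, corruptor=None):
        self._sensor = sensor
        self._corruptor = corruptor
        # Keeps the newest reading plus `delay` older ones.
        self._buffer = collections.deque(maxlen=delay + 1)

    def update(self):
        # Sample the sensor, corrupt the value, and buffer it.
        value = self._sensor()
        if self._corruptor is not None:
            value = self._corruptor(value)
        self._buffer.append(value)

    def read(self):
        # The oldest buffered value is the delayed observation.
        # update() must have been called at least once.
        return self._buffer[0]


# A sensor that counts 0.0, 1.0, 2.0, ... observed with a 2-step delay.
ticks = iter(range(100))
delayed = ToyObservable(sensor=lambda: float(next(ticks)), delay=2)
readings = []
for _ in range(5):
    delayed.update()
    readings.append(delayed.read())

# The same wrapper with Gaussian noise instead of delay.
rng = random.Random(0)
noisy = ToyObservable(sensor=lambda: 1.0,
                      corruptor=lambda v: v + rng.gauss(0.0, 0.01))
noisy.update()
noisy_reading = noisy.read()
```

Until the buffer fills, the delayed observable keeps returning the first reading; after that it lags the sensor by exactly two updates.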

Diagram showing the life-cycle of Composer callbacks. Rounded rectangles represent callbacks that Tasks and Entities may implement. Blue rectangles represent built-in Composer operations.
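The ordering of callbacks can be sketched with a toy driver loop. The hook names below follow those used by Composer tasks and entities, but the reset and step functions are simplified assumptions: the real callbacks receive arguments such as the physics and a random state, which are omitted here, and the built-in operations are reduced to comments.

```python
import functools

HOOKS = (
    "initialize_episode_mjcf",
    "after_compile",
    "initialize_episode",
    "before_step",
    "before_substep",
    "after_substep",
    "after_step",
)


class RecordingTask:
    """Toy task whose callbacks only record the order they fire in."""

    def __init__(self):
        self.calls = []
        for hook in HOOKS:
            # Calling task.<hook>() appends the hook's name to `calls`.
            setattr(self, hook, functools.partial(self.calls.append, hook))


def reset(task):
    task.initialize_episode_mjcf()  # model may still be modified here
    # (built-in: compile the model into a physics instance)
    task.after_compile()
    task.initialize_episode()       # physics state may be set here


def step(task, n_substeps):
    task.before_step()
    for _ in range(n_substeps):
        task.before_substep()
        # (built-in: advance the physics by one timestep)
        task.after_substep()
    task.after_step()


task = RecordingTask()
reset(task)
step(task, n_substeps=2)
```

After running this, task.calls lists the three episode-initialisation hooks once, followed by one control step in which the substep hooks fire once per physics substep.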

Locomotion

The Locomotion framework introduces two abstractions:

Walker: A controllable entity with common locomotion-related methods, like projection of vectors into an egocentric frame.

Arena: A self-scaling randomised scene, in which the walker can be placed and given a task to perform.

For example, using just 4 function calls, we can instantiate a humanoid walker, a WallsCorridor arena and combine them in a RunThroughCorridor task.
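The egocentric projection mentioned for the Walker amounts to rotating a world-frame vector by the inverse of the walker's orientation. The sketch below is plain Python with hypothetical helper names (yaw_rotation, to_egocentric), and it assumes the orientation is a 3x3 body-to-world rotation matrix, so the inverse is simply the transpose.

```python
import math


def yaw_rotation(yaw):
    """Body-to-world rotation matrix for a rotation about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def to_egocentric(vec_world, body_to_world):
    """Expresses a world-frame vector in the body frame (R^T v)."""
    # Component k is the dot product of column k of R with the vector.
    return [
        sum(body_to_world[i][k] * vec_world[i] for i in range(3))
        for k in range(3)
    ]


# A walker turned 90 degrees, so that it faces along world +y.
rotation = yaw_rotation(math.pi / 2)

# A target displaced 2 units along world +y lies straight ahead,
# i.e. along the walker's own x axis.
target_ego = to_egocentric([0.0, 2.0, 0.0], rotation)
```

Expressing targets this way means an agent sees the same observation whenever the target is straight ahead, regardless of which direction it happens to face in the world.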

New Control Suite domains

Quadruped

A generic quadruped domain with a passively stable body.
Several pure locomotion tasks (e.g. walk, run).
An escape task requiring rough terrain navigation.
A fetch task requiring ball dribbling.

Dog

An elaborate model based on a skeleton commissioned from leo3Dmodels.
A challenging ball-fetching task that requires precision grasping with the mouth.