Actuator gainprm causing issues at scale

Discussion in 'Simulation' started by Felix Su, Feb 5, 2020.

  1. I am trying to modify the P-term of the actuator force gain for the ShadowHand environment (actuator XML here:

    I am trying to reproduce the domain randomization listed in Table 1 of the OpenAI Learning Dexterous In-Hand Manipulation paper (

    To do this I wrote the following code:

    import numpy as np

    for actuator_name in sim.model.actuator_names:
        # Sample a log-uniform scale factor in [0.75, 1.5]
        scale = np.exp(np.random.uniform(np.log(0.75), np.log(1.5)))
        actuator_id = sim.model.actuator_name2id(actuator_name)
        sim.model.actuator_gainprm[actuator_id][0] *= scale
    This causes no problems when I run a few rollouts locally, but when I run domain randomization at scale, training a model on thousands of rollouts, training stalls partway through and the CPU goes idle without throwing any errors.

    Is there something incorrect in how I am modifying the actuator gain that could cause instability in the environment? Also, is there any situation where the environment can fail silently, without raising errors?
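
    One thing I noticed while trying to isolate this: because the loop uses `*=`, calling it once per episode compounds the scale factors across episodes, so the P-terms can drift far outside [0.75, 1.5] over thousands of rollouts. Below is a minimal, NumPy-only sketch of the alternative I am considering, where each randomization rescales from a cached baseline instead. Note the plain array here is just a stand-in for `sim.model.actuator_gainprm` (one row per actuator, column 0 as the P-term is my assumption about the layout), so this is a sketch, not a drop-in fix:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for sim.model.actuator_gainprm: one row per actuator,
    # column 0 holding the P-term (assumed layout).
    baseline_gainprm = np.ones((20, 3))
    gainprm = baseline_gainprm.copy()

    def randomize_p_terms(gainprm, baseline, rng, low=0.75, high=1.5):
        # Log-uniform scale per actuator, applied to the cached
        # baseline so repeated calls never compound.
        scales = np.exp(rng.uniform(np.log(low), np.log(high),
                                    size=baseline.shape[0]))
        gainprm[:, 0] = baseline[:, 0] * scales
        return scales

    # Even after many "episodes", the P-terms stay within
    # [0.75, 1.5] of the baseline.
    for _ in range(1000):
        scales = randomize_p_terms(gainprm, baseline_gainprm, rng)
    ```

    With the cumulative `*=` version, the same 1000 calls would multiply 1000 independent scale factors together, which I suspect is what pushes the gains to extreme values at scale.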