controllers package

Submodules

controllers.Controller module

class controllers.Controller.Controller(*args, **kwargs)

Bases: object

Framework of a controller class

act(obs, t, get_pred_cost=False)

Performs an action.

dump_logs(primary_logdir, iter_logdir)

Dumps logs into the primary log directory and the per-training-iteration log directory.

reset()

Resets this controller.

train(obs_trajs, acs_trajs, rews_trajs)

Trains this controller using lists of trajectories.
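
A minimal sketch of how a subclass might fill in this interface. The Controller class below is a local stand-in that mirrors the four documented methods so the example runs without the package installed; RandomController, its constructor arguments, and its log file name are hypothetical.

```python
import os
import random
import tempfile


class Controller:
    """Local stand-in mirroring the documented interface (not the real class)."""

    def __init__(self, *args, **kwargs):
        pass

    def train(self, obs_trajs, acs_trajs, rews_trajs):
        raise NotImplementedError

    def reset(self):
        raise NotImplementedError

    def act(self, obs, t, get_pred_cost=False):
        raise NotImplementedError

    def dump_logs(self, primary_logdir, iter_logdir):
        raise NotImplementedError


class RandomController(Controller):
    """Hypothetical controller that samples uniform actions within bounds."""

    def __init__(self, ac_lb, ac_ub, seed=0):
        super().__init__()
        self.ac_lb, self.ac_ub = list(ac_lb), list(ac_ub)
        self.rng = random.Random(seed)
        self.num_trajs_seen = 0

    def train(self, obs_trajs, acs_trajs, rews_trajs):
        # Nothing to learn; just record how many trajectories were offered.
        self.num_trajs_seen += len(obs_trajs)

    def reset(self):
        # A random policy carries no state between episodes.
        pass

    def act(self, obs, t, get_pred_cost=False):
        action = [self.rng.uniform(lo, hi)
                  for lo, hi in zip(self.ac_lb, self.ac_ub)]
        # A random policy predicts no cost, so 0.0 is only a placeholder.
        return (action, 0.0) if get_pred_cost else action

    def dump_logs(self, primary_logdir, iter_logdir):
        with open(os.path.join(iter_logdir, "controller.log"), "w") as f:
            f.write("trajectories seen: %d\n" % self.num_trajs_seen)


ctrl = RandomController(ac_lb=[-1.0, 0.0], ac_ub=[1.0, 2.0])
ctrl.reset()
action = ctrl.act(obs=[0.0, 0.0], t=0)
with tempfile.TemporaryDirectory() as logdir:
    ctrl.dump_logs(logdir, logdir)
    log_written = os.path.exists(os.path.join(logdir, "controller.log"))
```
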

controllers.MPC module

class controllers.MPC.MPC(params)

Bases: controllers.Controller.Controller

Parameters: params (DotMap) – Configuration parameters.

.env (gym.env):
Environment for which this controller will be used.
.update_fns (list<func>):
A list of functions that will be invoked (possibly with a tensorflow session) every time this controller is reset.
.ac_ub (np.ndarray): (optional)
An array of action upper bounds. Defaults to environment action upper bounds.
.ac_lb (np.ndarray): (optional)
An array of action lower bounds. Defaults to environment action lower bounds.
.per (int): (optional)
Determines how often the action sequence will be optimized. Defaults to 1 (reoptimizes at every call to act()).
.prop_cfg
.model_init_cfg (DotMap):
A DotMap of initialization parameters for the model.
.model_constructor (func):
A function which constructs an instance of this model, given model_init_cfg.
.model_train_cfg (dict): (optional)
A DotMap of training parameters that will be passed into the model every time it is trained. Defaults to an empty dict.
.model_pretrained (bool): (optional)
If True, assumes that the model has been trained upon construction.
.mode (str):
Propagation method. Choose between [E, DS, TSinf, TS1, MM]. See https://arxiv.org/abs/1805.12114 for details.
.npart (int):
Number of particles used for DS, TSinf, TS1, and MM propagation methods.
.ign_var (bool): (optional)
Determines whether or not variance output of the model will be ignored. Defaults to False unless deterministic propagation is being used.
.obs_preproc (func): (optional)
A function which modifies observations (in a 2D matrix) before they are passed into the model. Defaults to lambda obs: obs. Note: Must be able to process both NumPy and Tensorflow arrays.
.obs_postproc (func): (optional)
A function which returns vectors calculated from the previous observations and model predictions, which will then be passed into the provided cost function on observations. Defaults to lambda obs, model_out: model_out. Note: Must be able to process both NumPy and Tensorflow arrays.
.obs_postproc2 (func): (optional)
A function which takes the vectors returned by obs_postproc and (possibly) modifies them into the predicted observations for the next time step. Defaults to lambda obs: obs. Note: Must be able to process both NumPy and Tensorflow arrays.
.targ_proc (func): (optional)
A function which takes current observations and next observations and returns the array of targets (so that the model learns the mapping obs -> targ_proc(obs, next_obs)). Defaults to lambda obs, next_obs: next_obs. Note: Only needs to process NumPy arrays.
.opt_cfg
.mode (str):
Internal optimizer that will be used. Choose between [CEM, Random].
.cfg (DotMap):
A map of optimizer initializer parameters.
.plan_hor (int):
The planning horizon that will be used in optimization.
.obs_cost_fn (func):
A function which computes the cost of every observation in a 2D matrix. Note: Must be able to process both NumPy and Tensorflow arrays.
.ac_cost_fn (func):
A function which computes the cost of every action in a 2D matrix.
.constrains (np.array):
An array of optimisation constraints, [[lb, ub], [lc1, uc1], [lc2, uc2]], such that for u = [v, q]: lb <= u <= ub, lc1 <= q/v <= uc1, and lc2 <= q/sqrt(v) <= uc2. Overrides ac_lb and ac_ub if constrains[0] is not None.
.log_cfg
.save_all_models (bool): (optional)
If True, saves models at every iteration. Defaults to False (only most recent model is saved). Warning: Can be very memory-intensive.
.log_traj_preds (bool): (optional)
If True, saves the mean and variance of predicted particle trajectories. Defaults to False.
.log_particles (bool): (optional)
If True, saves all predicted particle trajectories. Defaults to False. Note: Takes precedence over log_traj_preds. Warning: Can be very memory-intensive.
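
The processing hooks obs_preproc, obs_postproc, and targ_proc are easiest to read together. The sketch below assumes a model that predicts the change in state rather than the next state itself, which is one common setup and not something this page prescribes. The three-column observation layout and the decision to hide the first column from the model are illustrative assumptions. Only slicing and arithmetic are used, so the same functions would also accept tensor types that support those operations.

```python
import numpy as np


def obs_preproc(obs):
    # Drop the first column (assumed here to be an absolute position the
    # model should not depend on) before feeding observations to the model.
    return obs[:, 1:]


def obs_postproc(obs, model_out):
    # The model is assumed to output a state change; add it back to obs.
    return obs + model_out


def targ_proc(obs, next_obs):
    # Training target that matches obs_postproc: the observed state change.
    return next_obs - obs


obs = np.array([[0.0, 1.0, 2.0], [1.0, 1.0, 0.0]])
next_obs = np.array([[0.5, 1.0, 1.0], [1.0, 2.0, 0.0]])

model_in = obs_preproc(obs)             # model input, shape (2, 2)
targets = targ_proc(obs, next_obs)      # what the model is trained to output
recovered = obs_postproc(obs, targets)  # a perfect model reproduces next_obs
```

Keeping targ_proc and obs_postproc as inverses of each other is the point of the pairing: whatever transformation defines the training target has to be undone before the cost function sees the prediction.
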
act(obs, t, get_pred_cost=False)

Returns the action that this controller would take at time t given observation obs.

Parameters:
  • obs – The current observation
  • t – The current timestep
  • get_pred_cost – If True, returns the predicted cost for the action sequence found by the internal optimizer.

Returns: An action (and possibly the predicted cost)
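
As one way to picture the per setting, the sketch below re-plans only when a buffer of previously planned actions runs out. BufferedPlanner and plan_fn are hypothetical names, and buffering is an assumption about how a controller could honour per, not a description of this class's internals.

```python
class BufferedPlanner:
    """Hypothetical helper: serves `per` actions from each optimized plan."""

    def __init__(self, plan_fn, per=1):
        self.plan_fn = plan_fn  # maps an observation to a list of actions
        self.per = per          # number of act() calls served per plan
        self.buffer = []
        self.num_plans = 0

    def reset(self):
        # Discard any leftover actions from the previous episode.
        self.buffer = []

    def act(self, obs, t):
        if not self.buffer:
            # Re-optimize and keep only the first `per` planned actions.
            self.buffer = list(self.plan_fn(obs))[:self.per]
            self.num_plans += 1
        return self.buffer.pop(0)


def toy_plan(obs):
    # Stand-in for an optimizer: a three-step plan derived from obs.
    return [obs, obs + 1, obs + 2]


planner = BufferedPlanner(toy_plan, per=2)
actions = [planner.act(obs, t) for t, obs in enumerate([10, 20, 30, 40])]
```

With per=2, the second and fourth actions come from the plans made at steps 0 and 2, so the observations 20 and 40 never reach the planner.
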

changePlanHor(T, freq=None, change_over=False)

Dynamically changes the planning horizon of the MPC algorithm.

Parameters: T (int) – New planning horizon.
changeTargetCost(target)
dump_logs(primary_logdir, iter_logdir)

Saves logs to either a primary log directory or another iteration-specific directory. See the __init__ documentation for what is logged.

Parameters:
  • primary_logdir (str) – A directory path. This controller assumes that this directory does not change every iteration.
  • iter_logdir (str) – A directory path. This controller assumes that this directory changes every time dump_logs is called.

Returns: None

optimizers = {'CEM': <class 'dmbrl.misc.optimizers.cem.CEMOptimizer'>, 'Random': <class 'dmbrl.misc.optimizers.random.RandomOptimizer'>}
reset()

Resets this controller (clears previous solution, calls all update functions).

Returns: None

train(obs_trajs, obs_prime_trajs, acs_trajs)

Trains the internal model of this controller. Once trained, this controller switches from applying random actions to using MPC.

Parameters:
  • obs_trajs – (N, nS)
  • obs_prime_trajs – (N, nS) observations at next time step
  • acs_trajs – (N, nU)

Returns: None
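
The shapes above pair each observation with its successor and the action taken in between. A minimal sketch, assuming a single recorded rollout of T actions and T + 1 observations; the names rollout_obs and rollout_acs and the dimensions are illustrative.

```python
import numpy as np

T, nS, nU = 5, 3, 2

# Placeholder rollout data: T + 1 observations and T actions.
rollout_obs = np.arange((T + 1) * nS, dtype=float).reshape(T + 1, nS)
rollout_acs = np.zeros((T, nU))

obs_trajs = rollout_obs[:-1]       # (N, nS): observation at step k
obs_prime_trajs = rollout_obs[1:]  # (N, nS): observation at step k + 1
acs_trajs = rollout_acs            # (N, nU): action applied at step k
```

Here N equals T. Transitions from several rollouts could be stacked along the first axis with np.concatenate, provided each rollout is sliced separately so that no row pairs the end of one rollout with the start of the next.
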

Module contents