refine_plan.algorithms.explore

Functions for synthesising an exploration policy using the approach in:

Shyam, P., Jaśkowski, W. and Gomez, F., 2019, May. Model-based active exploration. In International conference on machine learning (pp. 5779-5788). PMLR.

Author: Charlie Street Owner: Charlie Street

Functions

solve_finite_horizon_mdp(mdp, state_idx_map, horizon)

Synthesise a policy for a finite horizon MDP.

synthesise_exploration_policy(connection_str, db_name, ...)

Synthesises an exploration policy for the current episode.

Module Contents

refine_plan.algorithms.explore.solve_finite_horizon_mdp(mdp, state_idx_map, horizon, mat_type=np.float32)

Synthesise a policy for a finite horizon MDP.

This is done with a single backward Bellman backup through time.

Parameters:
  • mdp – The MDP (with DBNOptionEnsemble options)

  • state_idx_map – The mapping from states to matrix indices

  • horizon – The planning horizon

  • mat_type – The dtype for the matrices

Returns:

A TimeDependentPolicy
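The backward Bellman backup used here can be sketched as plain finite-horizon value iteration. This is a minimal illustration, not this module's implementation: it assumes dense per-option transition matrices and reward vectors rather than DBNOptionEnsemble options, and the function and variable names are hypothetical.

```python
import numpy as np

def finite_horizon_backup(P, R, horizon, mat_type=np.float32):
    """One backward pass of Bellman backups over a finite horizon.

    P: dict mapping option name -> (S, S) transition matrix
    R: dict mapping option name -> (S,) immediate reward vector
    Returns (policy, V0) where policy[t][s] is the option chosen at
    time step t in state s (i.e. a time-dependent policy), and V0 is
    the value function at the first time step.
    """
    options = list(P.keys())
    num_states = next(iter(P.values())).shape[0]
    V = np.zeros(num_states, dtype=mat_type)  # terminal value is zero
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        # Q[a, s] = R[a][s] + sum_s' P[a][s, s'] * V[s']
        Q = np.stack([R[a] + P[a] @ V for a in options])
        best = Q.argmax(axis=0)
        policy[t] = [options[i] for i in best]
        V = Q.max(axis=0).astype(mat_type)
    return policy, V
```

Because the value function changes at each step, the maximising option can differ across time steps, which is why the result is a time-dependent rather than stationary policy.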

refine_plan.algorithms.explore.synthesise_exploration_policy(connection_str, db_name, collection_name, sf_list, option_names, ensemble_size, horizon, enabled_conds, initial_state=None, use_storm=False, motion_params=None)

Synthesises an exploration policy for the current episode.

Parameters:
  • connection_str – The mongodb connection string

  • db_name – The Mongo database name

  • collection_name – The collection within the database

  • sf_list – The list of state factors used for planning

  • option_names – The list of option (action) names

  • ensemble_size – The size of the ensemble model for each option

  • horizon – The length of the planning horizon

  • enabled_conds – A dictionary from option name to the Condition under which that option is enabled

  • initial_state – The initial state of the exploration MDP

  • use_storm – If True, use Storm instead of the local solver

  • motion_params – A dictionary from option name to a list of parameters for that option

Returns:

The exploration policy
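In the model-based active exploration approach cited above (Shyam et al., 2019), the exploration reward for a state-action pair is the disagreement among the ensemble's predicted next-state distributions, measured as a Jensen-Shannon divergence. The sketch below illustrates that quantity for categorical next-state distributions; the helper name is hypothetical and not part of this module.

```python
import numpy as np

def ensemble_disagreement(dists):
    """Jensen-Shannon divergence across an ensemble of next-state
    categorical distributions, usable as an intrinsic exploration
    reward: zero when all members agree, larger when they disagree.

    dists: (ensemble_size, num_states) array; each row sums to 1.
    """
    dists = np.asarray(dists, dtype=np.float64)
    mean = dists.mean(axis=0)

    def entropy(p):
        p = p[p > 0]  # 0 * log(0) is taken as 0
        return -(p * np.log(p)).sum()

    # JSD = H(mixture) - mean of the members' entropies
    return entropy(mean) - np.mean([entropy(p) for p in dists])
```

An agent maximising this reward over the planning horizon is driven towards state-action pairs where the ensemble members' predictions diverge, i.e. where the model is most uncertain.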