refine_plan.algorithms.explore

Functions for synthesising an exploration policy using the approach in:

Shyam, P., Jaśkowski, W. and Gomez, F., 2019, May. Model-based active exploration. In International conference on machine learning (pp. 5779-5788). PMLR.

Author: Charlie Street Owner: Charlie Street

Functions

solve_finite_horizon_mdp(mdp, state_idx_map, horizon)

Synthesise a policy for a finite horizon MDP.

synthesise_exploration_policy(connection_str, db_name, ...)

Synthesises an exploration policy for the current episode.

Module Contents

refine_plan.algorithms.explore.solve_finite_horizon_mdp(mdp, state_idx_map, horizon, mat_type=np.float32)

Synthesise a policy for a finite horizon MDP.

This is done with a single backward Bellman backup through time.

Parameters:
  • mdp – The MDP (with DBNOptionEnsemble options)

  • state_idx_map – The mapping from states to matrix indices

  • horizon – The planning horizon

  • mat_type – The dtype for the matrices

Returns:

A TimeDependentPolicy
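The backward Bellman backup used here can be sketched as plain finite-horizon value iteration. This is a minimal illustration, not this module's implementation: it assumes dense per-option transition matrices and reward vectors rather than DBNOptionEnsemble options, and the function and variable names are hypothetical.

```python
import numpy as np

def finite_horizon_backup(P, R, horizon, mat_type=np.float32):
    """One backward pass of Bellman backups over a finite horizon.

    P: dict mapping option name -> (S, S) transition matrix
    R: dict mapping option name -> (S,) immediate reward vector
    Returns (policy, V0) where policy[t][s] is the option chosen at
    time step t in state s (i.e. a time-dependent policy), and V0 is
    the value function at the first time step.
    """
    options = list(P.keys())
    num_states = next(iter(P.values())).shape[0]
    V = np.zeros(num_states, dtype=mat_type)  # terminal value is zero
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        # Q[a, s] = R[a][s] + sum_s' P[a][s, s'] * V[s']
        Q = np.stack([R[a] + P[a] @ V for a in options])
        best = Q.argmax(axis=0)
        policy[t] = [options[i] for i in best]
        V = Q.max(axis=0).astype(mat_type)
    return policy, V
```

Because the value function changes at each step, the maximising option can differ across time steps, which is why the result is a time-dependent rather than stationary policy.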

refine_plan.algorithms.explore.synthesise_exploration_policy(connection_str, db_name, collection_name, sf_list, option_names, ensemble_size, horizon, enabled_conds, initial_state=None, use_storm=False, motion_params=None)

Synthesises an exploration policy for the current episode.

Parameters:
  • connection_str – The mongodb connection string

  • db_name – The Mongo database name

  • collection_name – The collection within the database

  • sf_list – The list of state factors used for planning

  • option_names – The list of option (action) names

  • ensemble_size – The size of the ensemble model for each option

  • horizon – The length of the planning horizon

  • enabled_conds – A dictionary from option name to the Condition under which that option is enabled

  • initial_state – The initial state of the exploration MDP

  • use_storm – If True, use Storm instead of the local solver

  • motion_params – A dictionary from option name to a list of parameters for that option

Returns:

The exploration policy
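In the model-based active exploration approach cited above (Shyam et al., 2019), the exploration reward for a state-action pair is the disagreement among the ensemble's predicted next-state distributions, measured as a Jensen-Shannon divergence. The sketch below illustrates that quantity for categorical next-state distributions; the helper name is hypothetical and not part of this module.

```python
import numpy as np

def ensemble_disagreement(dists):
    """Jensen-Shannon divergence across an ensemble of next-state
    categorical distributions, usable as an intrinsic exploration
    reward: zero when all members agree, larger when they disagree.

    dists: (ensemble_size, num_states) array; each row sums to 1.
    """
    dists = np.asarray(dists, dtype=np.float64)
    mean = dists.mean(axis=0)

    def entropy(p):
        p = p[p > 0]  # 0 * log(0) is taken as 0
        return -(p * np.log(p)).sum()

    # JSD = H(mixture) - mean of the members' entropies
    return entropy(mean) - np.mean([entropy(p) for p in dists])
```

An agent maximising this reward over the planning horizon is driven towards state-action pairs where the ensemble members' predictions diverge, i.e. where the model is most uncertain.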