refine_plan.algorithms.explore
Functions for synthesising an exploration policy using the approach in:
Shyam, P., Jaśkowski, W. and Gomez, F., 2019, May. Model-based active exploration. In International conference on machine learning (pp. 5779-5788). PMLR.
Author: Charlie Street
Owner: Charlie Street
Functions
- solve_finite_horizon_mdp – Synthesise a policy for a finite horizon MDP.
- synthesise_exploration_policy – Synthesises an exploration policy for the current episode.
Module Contents
- refine_plan.algorithms.explore.solve_finite_horizon_mdp(mdp, state_idx_map, horizon, mat_type=np.float32)
Synthesise a policy for a finite horizon MDP.
The policy is computed with a single backward pass of Bellman backups through time (see the sketch after this entry).
- Parameters:
mdp – The MDP (with DBNOptionEnsemble options)
state_idx_map – The mapping from states to matrix indices
horizon – The planning horizon
mat_type – The dtype for the matrices
- Returns:
A TimeDependentPolicy
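For intuition, here is a minimal sketch of a backward finite-horizon Bellman backup over dense matrices. The `finite_horizon_backup` helper, its array shapes, and the zero terminal values are illustrative assumptions, not the library's implementation, which operates on an MDP with DBNOptionEnsemble options and returns a TimeDependentPolicy.

```python
import numpy as np

def finite_horizon_backup(P, R, horizon):
    """Backward Bellman backups for a finite-horizon MDP (illustrative only).

    P: (A, S, S) transition probabilities, R: (A, S) expected rewards.
    Returns greedy actions as a (horizon, S) array, one row per time step.
    """
    num_states = P.shape[1]
    V = np.zeros(num_states)          # terminal values at the horizon are zero
    policy = np.zeros((horizon, num_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = R + P @ V                 # Q[a, s] = R[a, s] + sum_s' P[a, s, s'] * V[s']
        policy[t] = Q.argmax(axis=0)  # greedy action for each state at time t
        V = Q.max(axis=0)             # value function propagated one step back
    return policy
```

In this sketch, the greedy action for state s at time step t is read off as `policy[t, s]`; only one sweep over the horizon is needed because there is no discount-driven fixed point to iterate to.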
- refine_plan.algorithms.explore.synthesise_exploration_policy(connection_str, db_name, collection_name, sf_list, option_names, ensemble_size, horizon, enabled_conds, initial_state=None, use_storm=False, motion_params=None)
Synthesises an exploration policy for the current episode.
- Parameters:
connection_str – The MongoDB connection string
db_name – The MongoDB database name
collection_name – The collection within the database
sf_list – The list of state factors used for planning
option_names – The list of option (action) names
ensemble_size – The size of the ensemble model for each option
horizon – The length of the planning horizon
enabled_conds – A dictionary from option name to enabled Condition
initial_state – The initial state of the exploration MDP
use_storm – If True, use Storm instead of the local solver
motion_params – A dictionary mapping each option name to a list of parameters for that option
- Returns:
The exploration policy
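A hypothetical usage sketch follows, based only on the documented signature above. The connection details, option names, and numeric values are placeholders, and sf_list / enabled_conds would need real StateFactor and Condition objects in practice, so treat this as an outline of the call rather than working configuration.

```python
# Hypothetical call sketch: every value below is a placeholder.
from refine_plan.algorithms.explore import synthesise_exploration_policy

sf_list = []        # placeholder: the state factors used for planning
enabled_conds = {}  # placeholder: option name -> enabled Condition

policy = synthesise_exploration_policy(
    connection_str="mongodb://localhost:27017",  # assumed local MongoDB
    db_name="refine_plan",                       # placeholder database name
    collection_name="episode_data",              # placeholder collection name
    sf_list=sf_list,
    option_names=["option_1", "option_2"],       # placeholder option names
    ensemble_size=5,                             # DBN ensemble size per option
    horizon=20,                                  # planning horizon length
    enabled_conds=enabled_conds,
)
# policy is the exploration policy for the current episode.
```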