refine_plan.algorithms.explore
==============================

.. py:module:: refine_plan.algorithms.explore

.. autoapi-nested-parse::

   Functions for synthesising an exploration policy using the approach in:

   Shyam, P., Jaśkowski, W. and Gomez, F., 2019, May. Model-based active exploration.
   In International conference on machine learning (pp. 5779-5788). PMLR.

   Author: Charlie Street

   Owner: Charlie Street


Functions
---------

.. autoapisummary::

   refine_plan.algorithms.explore.solve_finite_horizon_mdp
   refine_plan.algorithms.explore.synthesise_exploration_policy


Module Contents
---------------

.. py:function:: solve_finite_horizon_mdp(mdp, state_idx_map, horizon, mat_type=np.float32)

   Synthesise a policy for a finite horizon MDP.

   This is done with a single backward pass of Bellman backups through time,
   from the final time step to the first.

   :param mdp: The MDP (with DBNOptionEnsemble options)
   :param state_idx_map: The mapping from states to matrix indices
   :param horizon: The planning horizon
   :param mat_type: The dtype for the matrices

   :returns: A TimeDependentPolicy


.. py:function:: synthesise_exploration_policy(connection_str, db_name, collection_name, sf_list, option_names, ensemble_size, horizon, enabled_conds, initial_state=None, use_storm=False, motion_params=None)

   Synthesises an exploration policy for the current episode.

   :param connection_str: The MongoDB connection string
   :param db_name: The Mongo database name
   :param collection_name: The collection within the database
   :param sf_list: The list of state factors used for planning
   :param option_names: The list of option (action) names
   :param ensemble_size: The size of the ensemble model for each option
   :param horizon: The length of the planning horizon
   :param enabled_conds: A dictionary from option name to enabled Condition
   :param initial_state: The initial state of the exploration MDP
   :param use_storm: If True, use Storm instead of the local solver
   :param motion_params: A dictionary from option names to a list of params for that option

   :returns: The exploration policy
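
The backward Bellman pass mentioned for ``solve_finite_horizon_mdp`` can be illustrated with a minimal, generic sketch of finite-horizon backward induction. This is not the ``refine_plan`` implementation: the function name ``backward_induction``, the dense ``transitions`` and ``rewards`` arrays, and the returned array layout are all assumptions made for illustration, and the real function works on an MDP object with a state-to-index map and returns a ``TimeDependentPolicy``.

```python
import numpy as np


def backward_induction(transitions, rewards, horizon, mat_type=np.float32):
    """Hypothetical helper: finite-horizon backward induction on dense arrays.

    transitions: array of shape (A, S, S), transitions[a, s, s'] = P(s' | s, a)
    rewards:     array of shape (A, S), immediate reward for action a in state s
    Returns (values, policy) with shapes (horizon + 1, S) and (horizon, S).
    """
    num_states = transitions.shape[1]
    values = np.zeros((horizon + 1, num_states), dtype=mat_type)
    policy = np.zeros((horizon, num_states), dtype=np.int64)

    # Sweep once from the last decision step back to the first.
    for t in range(horizon - 1, -1, -1):
        # q[a, s] = r(s, a) + sum_{s'} P(s' | s, a) * V_{t+1}(s')
        q = rewards + transitions @ values[t + 1]
        policy[t] = np.argmax(q, axis=0)
        values[t] = np.max(q, axis=0)

    return values, policy


# Toy two-state, two-action MDP (made-up numbers, for illustration only).
transitions = np.array(
    [
        [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
        [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
    ],
    dtype=np.float32,
)
rewards = np.array(
    [
        [0.0, 0.25],  # action 0
        [1.0, 0.0],   # action 1
    ],
    dtype=np.float32,
)

values, policy = backward_induction(transitions, rewards, horizon=2)
```

Because the values at step ``t`` depend only on those at step ``t + 1``, one sweep over the horizon is enough, and the resulting policy is indexed by both time step and state.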