refine_plan.algorithms.explore
==============================

.. py:module:: refine_plan.algorithms.explore

.. autoapi-nested-parse::

   Functions for synthesising an exploration policy using the approach in:

   Shyam, P., Jaśkowski, W. and Gomez, F., 2019, May. Model-based active exploration.
   In International Conference on Machine Learning (pp. 5779-5788). PMLR.

   Author: Charlie Street
   Owner: Charlie Street


Functions
---------

.. autoapisummary::

   refine_plan.algorithms.explore.solve_finite_horizon_mdp
   refine_plan.algorithms.explore.synthesise_exploration_policy


Module Contents
---------------

.. py:function:: solve_finite_horizon_mdp(mdp, state_idx_map, horizon, mat_type=np.float32)

   Synthesise a policy for a finite horizon MDP.

   This is done with a single backward pass of Bellman backups, from the final time step to the first.

   :param mdp: The MDP (with DBNOptionEnsemble options)
   :param state_idx_map: The mapping from states to matrix indices
   :param horizon: The planning horizon
   :param mat_type: The dtype for the matrices

   :returns: A TimeDependentPolicy
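
The backward pass described for ``solve_finite_horizon_mdp`` can be illustrated with a minimal NumPy sketch. This is not the module's implementation: the function name ``backward_induction``, the dense ``transitions`` and ``rewards`` arrays, and the returned ``(values, policy)`` pair are all assumptions made for illustration, and the real function works on an MDP object and returns a ``TimeDependentPolicy``.

```python
import numpy as np


def backward_induction(transitions, rewards, horizon, mat_type=np.float32):
    """Hypothetical helper: finite-horizon backward induction.

    transitions: array of shape (A, S, S), transitions[a, s, s2] = P(s2 | s, a)
    rewards:     array of shape (S, A), rewards[s, a] = immediate reward
    Returns (values, policy) with shapes (horizon + 1, S) and (horizon, S).
    """
    num_states = transitions.shape[1]
    # values[horizon] stays at zero: nothing is earned after the final step.
    values = np.zeros((horizon + 1, num_states), dtype=mat_type)
    policy = np.zeros((horizon, num_states), dtype=np.int64)

    # Sweep once from the last decision step back to the first.
    for t in range(horizon - 1, -1, -1):
        # q[s, a] = rewards[s, a] + sum_s2 P(s2 | s, a) * values[t + 1][s2]
        q = rewards + (transitions @ values[t + 1]).T
        policy[t] = np.argmax(q, axis=1)
        values[t] = np.max(q, axis=1)

    return values, policy


# Two states, two actions: action 0 stays put, action 1 swaps the state.
transitions = np.array(
    [
        [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
        [[0.0, 1.0], [1.0, 0.0]],  # action 1: swap
    ]
)
# Only staying in state 1 is rewarded.
rewards = np.array([[0.0, 0.0], [1.0, 0.0]])

values, policy = backward_induction(transitions, rewards, horizon=2)
```

Because the policy is indexed by time step as well as state, the best action in a state can differ between steps. In this toy problem the first-step policy swaps out of state 0 to reach the rewarded state in time for the final step.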


.. py:function:: synthesise_exploration_policy(connection_str, db_name, collection_name, sf_list, option_names, ensemble_size, horizon, enabled_conds, initial_state=None, use_storm=False, motion_params=None)

   Synthesises an exploration policy for the current episode.

   :param connection_str: The MongoDB connection string
   :param db_name: The MongoDB database name
   :param collection_name: The collection within the database
   :param sf_list: The list of state factors used for planning
   :param option_names: The list of option (action) names
   :param ensemble_size: The size of the ensemble model for each option
   :param horizon: The length of the planning horizon
   :param enabled_conds: A dictionary mapping each option name to its enabled Condition
   :param initial_state: The initial state of the exploration MDP
   :param use_storm: If True, use Storm instead of the local solver
   :param motion_params: A dictionary mapping each option name to a list of parameters for that option

   :returns: The exploration policy
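
In the cited paper, exploration is driven by disagreement between the members of a model ensemble, and for discrete predictions that disagreement is measured with the Jensen-Shannon divergence. The sketch below shows that quantity for a single state-action pair. It is an illustration under stated assumptions, not this module's code: the names ``entropy`` and ``ensemble_disagreement`` are hypothetical, and whether ``synthesise_exploration_policy`` computes its reward in exactly this way is not confirmed by this page.

```python
import numpy as np


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    dist = np.asarray(dist, dtype=np.float64)
    nonzero = dist > 0  # treat 0 * log(0) as 0
    return float(-np.sum(dist[nonzero] * np.log(dist[nonzero])))


def ensemble_disagreement(predictions):
    """Hypothetical helper: Jensen-Shannon divergence of an ensemble.

    predictions: array of shape (N, K), one next-state distribution over
    K outcomes per ensemble member (N plays the role of ensemble_size).
    """
    predictions = np.asarray(predictions, dtype=np.float64)
    mean_prediction = predictions.mean(axis=0)
    mean_entropy = float(np.mean([entropy(row) for row in predictions]))
    # Entropy of the averaged prediction minus the average entropy.
    return entropy(mean_prediction) - mean_entropy


# Members that agree carry no exploration signal ...
agree = ensemble_disagreement([[0.5, 0.5], [0.5, 0.5]])
# ... while members that contradict each other carry the maximum for N = 2.
disagree = ensemble_disagreement([[1.0, 0.0], [0.0, 1.0]])
```

A reward of this kind is zero wherever the ensemble members already agree and largest where they make confident but conflicting predictions, which is what steers a finite-horizon planner towards poorly understood options.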