refine_plan.models.dbn_option_ensemble
======================================

.. py:module:: refine_plan.models.dbn_option_ensemble

.. autoapi-nested-parse::

   A class for an ensemble of DBNOption models.

   This is used for active exploration.

   Author: Charlie Street
   Owner: Charlie Street


Classes
-------

.. autoapisummary::

   refine_plan.models.dbn_option_ensemble.DBNOptionEnsemble


Module Contents
---------------

.. py:class:: DBNOptionEnsemble(name, data, ensemble_size, horizon, sf_list, enabled_cond, state_idx_map, compute_prism_str=False)

   Bases: :py:obj:`refine_plan.models.option.Option`


   A class containing an ensemble of DBNOptions for active exploration.

   Each DBNOption in the ensemble is trained on a different subset of the data.

   In ``_transition_dicts[i][state]`` or ``_sampled_transition_dict[state]``, a
   ``None`` value represents a uniform distribution over the state space.

   The attributes are the same as in the superclass, plus the following:

   .. attribute:: _ensemble_size

      The size of the ensemble

   .. attribute:: _horizon

      Number of steps in the planning horizon

   .. attribute:: _sf_list

      The list of state factors that make up the state space

   .. attribute:: _enabled_cond

      A Condition which is satisfied in states where the option is enabled

   .. attribute:: _enabled_states

      A list of states where the option is enabled

   .. attribute:: _dbns

      The ensemble (list) of DBNOptions

   .. attribute:: _transition_dicts

      The corresponding transition dicts for each DBNOption.

   .. attribute:: _sampled_transition_dict

      The sampled transitions

   .. attribute:: _reward_dict

      The reward dictionary containing information gain values

   .. attribute:: _transition_prism_str

      The transition PRISM string, cached

   .. attribute:: _reward_prism_str

      The reward PRISM string, cached

   .. attribute:: _state_idx_map

      A map from states to matrix indices

   .. attribute:: _sampled_transition_mat

      _sampled_transition_dict as a matrix

   .. attribute:: _reward_mat

      _reward_dict as a matrix
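
   As a rough illustration of how a dictionary such as
   ``_sampled_transition_dict`` could be laid out as a matrix using a map like
   ``_state_idx_map``, consider the sketch below. It is not the library's
   implementation: the function name ``dict_to_matrix``, the nested-list matrix
   and the string state keys are assumptions made for the example. A ``None``
   entry is expanded to a uniform row, following the convention described
   above.

   .. code-block:: python

      def dict_to_matrix(transition_dict, state_idx_map):
          """Build a row-stochastic matrix from a {state: distribution} dict.

          A distribution is either a {next_state: probability} dict or None,
          where None stands for a uniform distribution over all states.
          States missing from transition_dict keep an all-zero row.
          """
          n = len(state_idx_map)
          mat = [[0.0] * n for _ in range(n)]
          for state, dist in transition_dict.items():
              row = state_idx_map[state]
              if dist is None:
                  mat[row] = [1.0 / n] * n  # uniform over the state space
                  continue
              for next_state, prob in dist.items():
                  mat[row][state_idx_map[next_state]] = prob
          return mat


      # Hypothetical two-state example.
      state_idx_map = {"s0": 0, "s1": 1}
      transitions = {"s0": {"s1": 1.0}, "s1": None}
      matrix = dict_to_matrix(transitions, state_idx_map)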


   .. py:method:: get_transition_prob(state, next_state)

      Return the exploration probability for a (s, s') pair.

      The probability is taken from an ensemble model chosen uniformly at random.

      :param state: The first state
      :param next_state: The next state

      :returns: The transition probability
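
      The sampling described above can be sketched as follows. This is an
      illustration under stated assumptions, not the library's code: the helper
      names ``sample_transition_dict`` and ``transition_prob``, the string
      state keys and the choice of one ensemble member per state are all
      hypothetical. A ``None`` entry is read as a uniform distribution over the
      state space.

      .. code-block:: python

         import random


         def sample_transition_dict(transition_dicts, states, rng=None):
             """For each state, keep the distribution of one random ensemble member.

             transition_dicts holds one dict per ensemble member, mapping a state
             to a {next_state: probability} dict or to None (uniform).
             """
             rng = rng or random.Random()
             sampled = {}
             for state in states:
                 member = rng.randrange(len(transition_dicts))  # uniform choice
                 sampled[state] = transition_dicts[member][state]
             return sampled


         def transition_prob(sampled, states, state, next_state):
             """Look up P(next_state | state), expanding None to uniform."""
             dist = sampled[state]
             if dist is None:
                 return 1.0 / len(states)
             return dist.get(next_state, 0.0)


         # Hypothetical two-member ensemble over two states.
         states = ["s0", "s1"]
         ensemble = [
             {"s0": {"s1": 1.0}, "s1": None},
             {"s0": {"s0": 0.5, "s1": 0.5}, "s1": None},
         ]
         sampled = sample_transition_dict(ensemble, states, random.Random(0))
         p_uniform = transition_prob(sampled, states, "s1", "s0")
         p_sampled = transition_prob(sampled, states, "s0", "s1")

      Here ``p_uniform`` comes from a ``None`` entry, and ``p_sampled`` depends
      on which ensemble member was drawn for ``s0``.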


   .. py:method:: get_reward(state)

      Return the reward for executing this option in a state.

      The reward is the entropy of the ensemble's average next-state
      distribution minus the average entropy of the individual members'
      distributions.

      :param state: The state we want to check

      :returns: The reward for the state
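
      A minimal sketch of this quantity for a single state is given below. It
      assumes each ensemble member supplies a ``{next_state: probability}``
      dict (or ``None`` for uniform) and measures entropy in nats. The
      function names and the choice of logarithm base are assumptions made for
      the example and may differ from the library.

      .. code-block:: python

         import math


         def entropy(dist):
             """Shannon entropy (in nats) of a {outcome: probability} dict."""
             return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


         def information_gain(member_dists, states):
             """Entropy of the averaged distribution minus the mean member entropy.

             member_dists holds one next-state distribution per ensemble member;
             None stands for a uniform distribution over states.
             """
             uniform = {s: 1.0 / len(states) for s in states}
             dists = [uniform if d is None else d for d in member_dists]
             average = {
                 s: sum(d.get(s, 0.0) for d in dists) / len(dists) for s in states
             }
             mean_entropy = sum(entropy(d) for d in dists) / len(dists)
             return entropy(average) - mean_entropy


         states = ["s0", "s1"]
         # Members that agree yield no gain; members that disagree yield more.
         agree = information_gain([{"s0": 1.0}, {"s0": 1.0}], states)
         disagree = information_gain([{"s0": 1.0}, {"s1": 1.0}], states)

      The value is zero when every member predicts the same distribution and
      grows as the members disagree, which is what makes it useful as an
      exploration signal.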


   .. py:method:: get_scxml_transitions(sf_names, policy_name)

      Return a list of SCXML transition elements for this option.

      The time state factor is not included here. It is used only in PRISM,
      where it facilitates the finite-horizon planning objective.

      :param sf_names: The list of state factor names
      :param policy_name: The name of the policy in SCXML

      :returns: A list of SCXML transition elements


   .. py:method:: get_transition_prism_string()

      Write out the PRISM string with all (sampled) transitions.

      :returns: The transition PRISM string


   .. py:method:: get_reward_prism_string()

      Write out the PRISM string with all exploration rewards.

      The reward is the entropy of the ensemble's average next-state
      distribution minus the average entropy of the individual members'
      distributions.

      :returns: The reward PRISM string