refine_plan.models.dbn_option_ensemble
======================================

.. py:module:: refine_plan.models.dbn_option_ensemble

.. autoapi-nested-parse::

   A class for an ensemble of DBNOption models.

   This is used for active exploration.

   Author: Charlie Street

   Owner: Charlie Street


Classes
-------

.. autoapisummary::

   refine_plan.models.dbn_option_ensemble.DBNOptionEnsemble


Module Contents
---------------

.. py:class:: DBNOptionEnsemble(name, data, ensemble_size, horizon, sf_list, enabled_cond, state_idx_map, compute_prism_str=False)

   Bases: :py:obj:`refine_plan.models.option.Option`

   A class containing an ensemble of DBNOptions for active exploration.

   Each DBNOption in the ensemble is trained on a different subset of the data.

   In ``_transition_dicts[i][state]`` or ``_sampled_transition_dict[state]``, a ``None``
   value is used to represent a uniform distribution over the state space.

   The attributes are the same as in the superclass, plus the following:

   .. attribute:: _ensemble_size

      The size of the ensemble

   .. attribute:: _horizon

      Number of steps in the planning horizon

   .. attribute:: _sf_list

      The list of state factors that make up the state space

   .. attribute:: _enabled_cond

      A Condition which is satisfied in states where the option is enabled

   .. attribute:: _enabled_states

      A list of states where the option is enabled

   .. attribute:: _dbns

      The ensemble (list) of DBNOptions

   .. attribute:: _transition_dicts

      The corresponding transition dicts for each DBNOption

   .. attribute:: _sampled_transition_dict

      The sampled transitions

   .. attribute:: _reward_dict

      The reward dictionary containing information gain values

   .. attribute:: _transition_prism_str

      The transition PRISM string, cached

   .. attribute:: _reward_prism_str

      The reward PRISM string, cached

   .. attribute:: _state_idx_map

      A map from states to matrix indices

   .. attribute:: _sampled_transition_mat

      _sampled_transition_dict as a matrix

   .. attribute:: _reward_mat

      _reward_dict as a matrix


   .. py:method:: get_transition_prob(state, next_state)

      Return the exploration probability for a (s, s') pair.

      This is sampled uniformly from one of the ensemble models.

      :param state: The first state
      :param next_state: The next state

      :returns: The transition probability


   .. py:method:: get_reward(state)

      Return the reward for executing this option in a state.

      The reward is the entropy of the average minus the average entropy.

      :param state: The state we want to check

      :returns: The reward for the state


   .. py:method:: get_scxml_transitions(sf_names, policy_name)

      Return a list of SCXML transition elements for this option.

      The time state factor is not included here; it is used only in PRISM to
      support the finite-horizon planning objective.

      :param sf_names: The list of state factor names
      :param policy_name: The name of the policy in SCXML

      :returns: A list of SCXML transition elements


   .. py:method:: get_transition_prism_string()

      Write out the PRISM string with all (sampled) transitions.

      :returns: The transition PRISM string


   .. py:method:: get_reward_prism_string()

      Write out the PRISM string with all exploration rewards.

      The reward is the entropy of the average minus the average entropy.

      :returns: The reward PRISM string
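
The reward described for ``get_reward`` and ``get_reward_prism_string`` (the entropy of the average minus the average entropy) can be illustrated with a minimal, standalone sketch. This is not the implementation in ``refine_plan``: the helper names ``entropy``, ``uniform`` and ``info_gain_reward`` are hypothetical, the natural logarithm is an assumption (the documentation does not state the base), and distributions are assumed to be plain dicts from next states to probabilities. A ``None`` entry is treated as a uniform distribution over the state space, following the convention described for ``_transition_dicts``.

```python
import math


def entropy(dist):
    """Shannon entropy in nats of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def uniform(states):
    """Uniform distribution over the given states."""
    return {s: 1.0 / len(states) for s in states}


def info_gain_reward(ensemble_dists, states):
    """Entropy of the averaged prediction minus the average per-model entropy.

    ensemble_dists holds one next-state distribution per ensemble member;
    None stands for a uniform distribution over `states` (assumed convention).
    """
    dists = [uniform(states) if d is None else d for d in ensemble_dists]
    n = len(dists)
    # Average the ensemble members' predictions state by state.
    avg = {s: sum(d.get(s, 0.0) for d in dists) / n for s in states}
    avg_entropy = sum(entropy(d) for d in dists) / n
    return entropy(avg) - avg_entropy


states = ["s0", "s1"]

# Both members predict the same (uniform) distribution: nothing to learn.
reward_agree = info_gain_reward([{"s0": 0.5, "s1": 0.5}, None], states)

# Members are individually certain but contradict each other: high reward.
reward_disagree = info_gain_reward([{"s0": 1.0}, {"s1": 1.0}], states)
```

Under these assumptions the reward is zero when all ensemble members agree and grows with their disagreement, which is why it can serve as an information gain signal for exploration.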