refine_plan.models.dbn_option_ensemble

A class for an ensemble of DBNOption models.

This is used for active exploration.

Author: Charlie Street
Owner: Charlie Street

Classes

DBNOptionEnsemble

A class containing an ensemble of DBNOptions for active exploration.

Module Contents

class refine_plan.models.dbn_option_ensemble.DBNOptionEnsemble(name, data, ensemble_size, horizon, sf_list, enabled_cond, state_idx_map, compute_prism_str=False)

Bases: refine_plan.models.option.Option

A class containing an ensemble of DBNOptions for active exploration.

Each DBNOption in the ensemble is trained on a different subset of the data.

In _transition_dicts[i][state] or _sampled_transition_dict[state], a None value is used to represent a uniform distribution over the state space.
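As a minimal sketch of this convention (the helper name and dict layout are assumptions for illustration, not the library's API), a lookup that honours the None-means-uniform encoding might look like:

```python
def lookup_prob(transition_dict, state, next_state, state_space):
    """Read P(next_state | state) from a transition dict, where a None
    entry stands for a uniform distribution over the state space.
    (Hypothetical helper, for illustration only.)"""
    dist = transition_dict.get(state)
    if dist is None:
        # None encodes a uniform distribution over the whole state space
        return 1.0 / len(state_space)
    return dist.get(next_state, 0.0)
```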

The attributes are the same as the superclass, plus:
_ensemble_size

The size of the ensemble

_horizon

Number of steps in the planning horizon

_sf_list

The list of state factors that make up the state space

_enabled_cond

A Condition which is satisfied in states where the option is enabled

_enabled_states

A list of states where the option is enabled

_dbns

The ensemble (list) of DBNOptions

_transition_dicts

The corresponding transition dicts for each DBNOption.

_sampled_transition_dict

The sampled transitions

_reward_dict

The reward dictionary containing information gain values

_transition_prism_str

The transition PRISM string, cached

_reward_prism_str

The reward PRISM string, cached

_state_idx_map

A map from states to matrix indices

_sampled_transition_mat

_sampled_transition_dict as a matrix

_reward_mat

_reward_dict as a matrix
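A plausible sketch of how _state_idx_map could be used to turn a transition dict into its matrix form (the function name and exact layout are assumptions, not the library's implementation; None rows become uniform, per the convention above):

```python
def transition_dict_to_matrix(transition_dict, state_idx_map):
    """Convert a state -> distribution dict into a dense row-stochastic
    matrix, using state_idx_map to place each state at its matrix index.
    A None entry expands to a uniform row. (Illustrative sketch.)"""
    n = len(state_idx_map)
    mat = [[0.0] * n for _ in range(n)]
    uniform = 1.0 / n
    for state, i in state_idx_map.items():
        dist = transition_dict.get(state)
        if dist is None:
            mat[i] = [uniform] * n  # None means uniform over the state space
        else:
            for next_state, prob in dist.items():
                mat[i][state_idx_map[next_state]] = prob
    return mat
```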

get_transition_prob(state, next_state)

Return the exploration probability for a (s, s') pair.

The probability is read from one ensemble model chosen uniformly at random.

Parameters:
  • state – The first state

  • next_state – The next state

Returns:

The transition probability
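The uniform-sampling step can be sketched as follows (a simplified illustration, assuming the transition-dict representation described above; the function name is hypothetical):

```python
import random

def sample_transition_prob(transition_dicts, state, next_state, state_space):
    """Pick one ensemble member uniformly at random and read off its
    transition probability; a None entry means uniform over the state
    space. (Illustrative sketch, not the library's API.)"""
    member = random.choice(transition_dicts)  # uniform over ensemble members
    dist = member.get(state)
    if dist is None:
        return 1.0 / len(state_space)
    return dist.get(next_state, 0.0)
```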

get_reward(state)

Return the reward for executing this option in a state.

The reward is the entropy of the average transition distribution minus the average entropy of the individual distributions, i.e. an information gain measure of ensemble disagreement.

Parameters:

state – The state we want to check

Returns:

The reward for the state
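The "entropy of the average minus the average entropy" quantity can be computed as in the sketch below (assuming the dict-based distribution representation used above; helper names are hypothetical). It is zero when all ensemble members agree and grows with disagreement:

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a state -> probability dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def information_gain(dists, state_space):
    """Entropy of the averaged distribution minus the average entropy
    of the individual distributions. (Illustrative sketch.)"""
    avg = {
        s: sum(d.get(s, 0.0) for d in dists) / len(dists)
        for s in state_space
    }
    return entropy(avg) - sum(entropy(d) for d in dists) / len(dists)
```

For example, two deterministic members that predict different successors give an information gain of 1 bit, while identical members give 0.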

get_scxml_transitions(sf_names, policy_name)

Return a list of SCXML transition elements for this option.

The time state factor is not included here; it exists only in the PRISM model to facilitate the finite-horizon planning objective.

Parameters:
  • sf_names – The list of state factor names

  • policy_name – The name of the policy in SCXML

Returns:

A list of SCXML transition elements

get_transition_prism_string()

Write out the PRISM string with all (sampled) transitions.

Returns:

The transition PRISM string

get_reward_prism_string()

Write out the PRISM string with all exploration rewards.

The reward is the entropy of the average transition distribution minus the average entropy of the individual distributions.

Returns:

The reward PRISM string