refine_plan.models.dbn_option_ensemble

A class for an ensemble of DBNOption models.

This is used for active exploration.

Author: Charlie Street
Owner: Charlie Street

Classes

DBNOptionEnsemble

A class containing an ensemble of DBNOptions for active exploration.

Module Contents

class refine_plan.models.dbn_option_ensemble.DBNOptionEnsemble(name, data, ensemble_size, horizon, sf_list, enabled_cond, state_idx_map, compute_prism_str=False)

Bases: refine_plan.models.option.Option

A class containing an ensemble of DBNOptions for active exploration.

Each DBNOption in the ensemble is trained on a different subset of the data.

In _transition_dicts[i][state] or _sampled_transition_dict[state], a None value is used to represent a uniform distribution over the state space.
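As a minimal sketch of this convention (the helper name and dict layout are assumptions for illustration, not the library's API), a lookup that honours the None-means-uniform encoding might look like:

```python
def lookup_prob(transition_dict, state, next_state, state_space):
    """Read P(next_state | state) from a transition dict, where a None
    entry stands for a uniform distribution over the state space.
    (Hypothetical helper, for illustration only.)"""
    dist = transition_dict.get(state)
    if dist is None:
        # None encodes a uniform distribution over the whole state space
        return 1.0 / len(state_space)
    return dist.get(next_state, 0.0)
```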

The attributes are the same as the superclass, plus:
_ensemble_size

The size of the ensemble

_horizon

Number of steps in the planning horizon

_sf_list

The list of state factors that make up the state space

_enabled_cond

A Condition which is satisfied in states where the option is enabled

_enabled_states

A list of states where the option is enabled

_dbns

The ensemble (list) of DBNOptions

_transition_dicts

The corresponding transition dicts for each DBNOption.

_sampled_transition_dict

The sampled transitions

_reward_dict

The reward dictionary containing information gain values

_transition_prism_str

The transition PRISM string, cached

_reward_prism_str

The reward PRISM string, cached

_state_idx_map

A map from states to matrix indices

_sampled_transition_mat

_sampled_transition_dict as a matrix

_reward_mat

_reward_dict as a matrix
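A plausible sketch of how _state_idx_map could be used to turn a transition dict into its matrix form (the function name and exact layout are assumptions, not the library's implementation; None rows become uniform, per the convention above):

```python
def transition_dict_to_matrix(transition_dict, state_idx_map):
    """Convert a state -> distribution dict into a dense row-stochastic
    matrix, using state_idx_map to place each state at its matrix index.
    A None entry expands to a uniform row. (Illustrative sketch.)"""
    n = len(state_idx_map)
    mat = [[0.0] * n for _ in range(n)]
    uniform = 1.0 / n
    for state, i in state_idx_map.items():
        dist = transition_dict.get(state)
        if dist is None:
            mat[i] = [uniform] * n  # None means uniform over the state space
        else:
            for next_state, prob in dist.items():
                mat[i][state_idx_map[next_state]] = prob
    return mat
```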

get_transition_prob(state, next_state)

Return the exploration probability for a (s, s') pair.

The probability is read from one ensemble model chosen uniformly at random.

Parameters:
  • state – The first state

  • next_state – The next state

Returns:

The transition probability
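The uniform-sampling step can be sketched as follows (a simplified illustration, assuming the transition-dict representation described above; the function name is hypothetical):

```python
import random

def sample_transition_prob(transition_dicts, state, next_state, state_space):
    """Pick one ensemble member uniformly at random and read off its
    transition probability; a None entry means uniform over the state
    space. (Illustrative sketch, not the library's API.)"""
    member = random.choice(transition_dicts)  # uniform over ensemble members
    dist = member.get(state)
    if dist is None:
        return 1.0 / len(state_space)
    return dist.get(next_state, 0.0)
```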

get_reward(state)

Return the reward for executing this option in a state.

The reward is the entropy of the average transition distribution minus the average entropy of the individual distributions, i.e. an information gain measure of ensemble disagreement.

Parameters:

state – The state we want to check

Returns:

The reward for the state
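The "entropy of the average minus the average entropy" quantity can be computed as in the sketch below (assuming the dict-based distribution representation used above; helper names are hypothetical). It is zero when all ensemble members agree and grows with disagreement:

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a state -> probability dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def information_gain(dists, state_space):
    """Entropy of the averaged distribution minus the average entropy
    of the individual distributions. (Illustrative sketch.)"""
    avg = {
        s: sum(d.get(s, 0.0) for d in dists) / len(dists)
        for s in state_space
    }
    return entropy(avg) - sum(entropy(d) for d in dists) / len(dists)
```

For example, two deterministic members that predict different successors give an information gain of 1 bit, while identical members give 0.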

get_scxml_transitions(sf_names, policy_name)

Return a list of SCXML transition elements for this option.

The time state factor is not included here; it exists only in the PRISM model to facilitate the finite-horizon planning objective.

Parameters:
  • sf_names – The list of state factor names

  • policy_name – The name of the policy in SCXML

Returns:

A list of SCXML transition elements

get_transition_prism_string()

Write out the PRISM string with all (sampled) transitions.

Returns:

The transition PRISM string

get_reward_prism_string()

Write out the PRISM string with all exploration rewards.

The reward is the entropy of the average transition distribution minus the average entropy of the individual distributions.

Returns:

The reward PRISM string