refine_plan.models.dbn_option_ensemble
A class for an ensemble of DBNOption models.
This is used for active exploration.
Author: Charlie Street Owner: Charlie Street
Classes

- DBNOptionEnsemble – A class containing an ensemble of DBNOptions for active exploration.

Module Contents
- class refine_plan.models.dbn_option_ensemble.DBNOptionEnsemble(name, data, ensemble_size, horizon, sf_list, enabled_cond, state_idx_map, compute_prism_str=False)
Bases: refine_plan.models.option.Option

A class containing an ensemble of DBNOptions for active exploration.
Each DBNOption in the ensemble is trained on a different subset of the data.
In _transition_dicts[i][state] or _sampled_transition_dict[state], a None value is used to represent a uniform distribution over the state space.
- Attributes – same as the superclass, plus:
- _ensemble_size
The size of the ensemble
- _horizon
Number of steps in the planning horizon
- _sf_list
The list of state factors that make up the state space
- _enabled_cond
A Condition which is satisfied in states where the option is enabled
- _enabled_states
A list of states where the option is enabled
- _dbns
The ensemble (list) of DBNOptions
- _transition_dicts
The corresponding transition dicts for each DBNOption.
- _sampled_transition_dict
The transition dict sampled from the ensemble models
- _reward_dict
The reward dictionary containing information gain values
- _transition_prism_str
The transition PRISM string, cached
- _reward_prism_str
The reward PRISM string, cached
- _state_idx_map
A map from states to matrix indices
- _sampled_transition_mat
_sampled_transition_dict as a matrix
- _reward_mat
_reward_dict as a matrix
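The _sampled_transition_mat and _reward_mat attributes suggest the state-keyed dicts are flattened into matrices via _state_idx_map. Below is a minimal sketch of that conversion, assuming a {state: {next_state: prob}} dict layout and the None-means-uniform convention described above; dict_to_matrix is a hypothetical helper, not part of the class:

```python
import numpy as np

def dict_to_matrix(transition_dict, state_idx_map):
    """Flatten a {state: {next_state: prob}} dict into a dense matrix.

    A None entry is treated as a uniform distribution over the
    state space, matching the convention described above.
    """
    n = len(state_idx_map)
    mat = np.zeros((n, n))
    for state, dist in transition_dict.items():
        i = state_idx_map[state]
        if dist is None:  # None encodes a uniform distribution
            mat[i, :] = 1.0 / n
        else:
            for next_state, prob in dist.items():
                mat[i, state_idx_map[next_state]] = prob
    return mat
```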
- get_transition_prob(state, next_state)
Return the exploration probability for a (s, s') pair.
The probability is sampled uniformly at random from one of the ensemble models.
- Parameters:
state – The first state
next_state – The next state
- Returns:
The transition probability
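The sampling behaviour described above can be sketched as follows, assuming each ensemble member's transitions are stored as {state: {next_state: prob}} dicts with None encoding a uniform distribution; sample_transition_prob is a hypothetical stand-in, not the actual method:

```python
import random

def sample_transition_prob(transition_dicts, state, next_state, n_states):
    """Pick one ensemble member uniformly at random and read off
    its probability for the (state, next_state) pair."""
    dist = random.choice(transition_dicts)[state]
    if dist is None:  # None encodes a uniform distribution
        return 1.0 / n_states
    return dist.get(next_state, 0.0)
```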
- get_reward(state)
Return the reward for executing this option in a state.
The reward is the entropy of the averaged ensemble distribution minus the average entropy of the individual distributions, i.e. an information gain measure of ensemble disagreement.
- Parameters:
state – The state we want to check
- Returns:
The reward for the state
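The "entropy of the average minus the average entropy" quantity is a Jensen-Shannon-style information gain over the ensemble's predicted distributions: it is zero when all members agree and grows with disagreement. A self-contained sketch, with hypothetical helper names:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # 0 * log(0) is taken as 0
    return -np.sum(p[nz] * np.log(p[nz]))

def information_gain(dists):
    """Entropy of the average distribution minus the average
    entropy of the individual distributions."""
    dists = np.asarray(dists, dtype=float)
    avg = dists.mean(axis=0)
    return entropy(avg) - np.mean([entropy(d) for d in dists])
```

When the two members fully disagree (e.g. [1, 0] vs [0, 1]) the gain is log 2, the maximum for a two-state distribution.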
- get_scxml_transitions(sf_names, policy_name)
Return a list of SCXML transition elements for this option.
The time state factor is not included here; it is only used in PRISM to facilitate the finite-horizon planning objective.
- Parameters:
sf_names – The list of state factor names
policy_name – The name of the policy in SCXML
- Returns:
A list of SCXML transition elements
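This doc does not show the structure of the returned elements. As an illustration only, a minimal SCXML <transition> element can be built with the standard library; the attribute names below follow the SCXML specification, not necessarily this method's actual output:

```python
import xml.etree.ElementTree as ET

def make_scxml_transition(cond, target):
    """Build a minimal SCXML <transition> element with a guard
    condition and a target state (hypothetical helper)."""
    return ET.Element("transition", {"cond": cond, "target": target})
```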
- get_transition_prism_string()
Write out the PRISM string with all (sampled) transitions.
- Returns:
The transition PRISM string
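The exact output of this method is not shown here, but a PRISM guarded command generally has the form `[action] guard -> p1:update1 + p2:update2;`. A hypothetical formatting helper illustrating that shape (not the class's actual implementation):

```python
def prism_transition_line(action, guard, outcomes):
    """Format one PRISM guarded command.

    outcomes is a list of (probability, update) pairs,
    e.g. (0.9, "(loc'=1)").
    """
    updates = " + ".join("{}:{}".format(p, u) for p, u in outcomes)
    return "[{}] {} -> {};".format(action, guard, updates)
```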
- get_reward_prism_string()
Write out the PRISM string with all exploration rewards.
The reward is the entropy of the averaged ensemble distribution minus the average entropy of the individual distributions.
- Returns:
The reward PRISM string