Teaching Skills
Skills are the building blocks of intelligent autonomous agents. Teaching allows you to define these AI building blocks so that your agent can succeed at complex tasks in dynamic conditions.
Teacher Python Class for Teaching Agents
The Composabl SDK provides a Python class with a series of functions that you will use to teach your agent how to succeed at each task.
In the teacher class, you can store variables from previous decision cycles that you need for teaching. Here is an example teacher class:
# Note: the exact import path may vary with your Composabl SDK version.
from composabl import Teacher

class IncrementTeacher(Teacher):
    def __init__(self):
        self.past_obs = None
        self.counter = 0

    # This function transforms any sensor variable.
    def transform_obs(self, obs, action):
        return obs

    # This function transforms any action variable.
    def transform_action(self, transformed_obs, action):
        return action

    # The reward provides a step-by-step signal about how effective each agent action is.
    def compute_reward(self, transformed_obs, action, sim_reward):
        self.counter += 1
        if self.past_obs is None:
            self.past_obs = transformed_obs
            return 0
        if self.past_obs["state1"] < transformed_obs["state1"]:
            reward = 1
        else:
            reward = -1
        self.past_obs = transformed_obs  # store this cycle's observation for the next comparison
        return reward

    # Termination criteria determine when to stop an episode during training.
    def compute_termination(self, transformed_obs, action):
        return False

    # The action mask provides rules at each step about which actions the agent is allowed to take.
    def compute_action_mask(self, transformed_obs, action):
        return None

    # Success criteria influence when skills are well trained.
    def compute_success_criteria(self, transformed_obs, action):
        return self.counter > 100

    # This function chooses which sensor variables the skill sees as feedback.
    def filtered_observation_space(self):
        return ["state1"]
Goals
Goals are how your agent learns to improve its performance. Reward your agent for good decisions, penalize it for poor decisions, and stop the scenario it's practicing when it fails or succeeds.
Each skill in your agent succeeds as it approaches a specific goal. The goals of each skill should be clean and simple. If your agent is designed well, based on a good breakdown of the task into skills, each skill will have a clear goal.
Learn more about setting goals in Designing Autonomous AI.
There are three functions in the Teacher Python Class that relate to goals: the reward function, the termination function, and the success criteria function.
compute_reward Function
The compute_reward function provides the bulk of the feedback after each agent action about how much that action contributed to the success of the skill. This function returns a number that represents the reward signal the agent will receive for its last decision. Reward functions, as they are called in reinforcement learning, can be tricky to craft. If you'd like more information on how to write good reward functions, see: https://medium.com/@BonsaiAI/deep-reinforcement-learning-models-tips-tricks-for-writing-reward-functions-a84fe525e8e0
def compute_reward(self, transformed_obs, action, sim_reward):
    self.counter += 1
    if self.past_obs is None:
        self.past_obs = transformed_obs
        return 0
    if self.past_obs["state1"] < transformed_obs["state1"]:
        reward = 1
    else:
        reward = -1
    self.past_obs = transformed_obs  # store this cycle's observation for the next comparison
    return reward
compute_termination Function
The compute_termination function tells the Composabl platform when to terminate a practice episode and start over with a new practice scenario (episode). From a teaching perspective, it makes the most sense to terminate an episode when the agent succeeds, fails, or is pursuing a course of action that you judge unlikely to succeed. This function returns a Boolean flag (True or False) indicating whether to terminate the episode. You can calculate this criterion however works best for your use case.
def compute_termination(self, transformed_obs, action):
    return False
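Returning False, as above, never ends an episode early. As an illustration, the sketch below terminates when a sensor leaves an acceptable range or the episode runs too long; the sensor name state1, the bounds, and the step limit are assumptions made for this example, not SDK conventions.

def compute_termination(self, transformed_obs, action):
    # Illustrative: state1 is a sensor variable, and self.counter is
    # incremented once per decision in compute_reward (as shown above).
    out_of_bounds = transformed_obs["state1"] < 0 or transformed_obs["state1"] > 1000
    too_long = self.counter > 500
    return out_of_bounds or too_long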
compute_success_criteria Function
The compute_success_criteria function provides a definition of skill success and a proxy for how completely the agent has learned the skill. The platform uses the output of this function (True or False) to calculate when to stop training one skill and move on to training the next skill. In a fixed order sequence, the agent cannot move from one skill to the next until the success criteria for the current skill are reached.
def compute_success_criteria(self, transformed_obs, action):
    return self.counter > 100
Here are some examples of success criteria definitions:

- A simple but naive success criteria might return True if the average reward for an episode or scenario crosses a threshold, but False if it does not.
- A more complex success criteria might calculate root mean squared error (RMSE) for key variables across the episode and return True if the error is less than a customer-specified benchmark, but False otherwise (sketched below).
- A complex success criteria might compare the agent to a benchmark controller or another agent across many key variables and trials, returning True if the agent beats the benchmark on this criteria, but False otherwise.
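As an illustration of the RMSE pattern above, here is a minimal sketch. The sensor name state1, the setpoint of 50, and the benchmark of 2.0 are all assumptions made for the example; in practice the benchmark would come from the customer specification.

import math

class TrackingTeacher(Teacher):  # Teacher imported as in the example above
    def __init__(self):
        self.squared_errors = []  # squared error collected at each step

    def compute_reward(self, transformed_obs, action, sim_reward):
        # Illustrative setpoint of 50 for the sensor named state1.
        error = transformed_obs["state1"] - 50
        self.squared_errors.append(error ** 2)
        return -abs(error)

    def compute_success_criteria(self, transformed_obs, action):
        # Succeed once the episode-level RMSE beats an assumed benchmark of 2.0.
        if not self.squared_errors:
            return False
        rmse = math.sqrt(sum(self.squared_errors) / len(self.squared_errors))
        return rmse < 2.0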
Guiding Behavior with Rules
Just like rules guide training and behavior for humans, providing rules for the agent to follow can guide agent decision-making more quickly to success. Rules guide the behavior of an agent based on expertise and constraints. The compute_action_mask
teaching function expresses rules that trainable agents must follow.
# The action mask provides rules at each step about which actions the agent is allowed to take.
def compute_action_mask(self, transformed_obs, action):
    return [0, 1, 1]
WARNING
The compute_action_mask teaching function works only for discrete action spaces (where the actions are integers or categories), not for continuous action spaces (where decision actions are decimal numbers). If you specify a mask for a skill whose actions are continuous, the platform will ignore the action mask.
The function returns a list of 0 and 1 values. Zero means that the action is forbidden by the rule. One means that the action is allowed by the rule. The function may return a different value after each decision. This allows complex logic to express nuanced rules.
In the example above, the first action is forbidden for the next decision, but the second and third actions are allowed. The logic in the skill itself (whether learned or programmed) will choose between the allowed second and third actions.
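Because the mask is recomputed at every step, it can depend on the current observation. The sketch below assumes a discrete space of three actions ordered as [decrease, hold, increase] and an illustrative sensor named state1 with limits of 0 and 100; none of these names or thresholds are SDK conventions.

def compute_action_mask(self, transformed_obs, action):
    # Assumed action ordering: [decrease, hold, increase].
    mask = [1, 1, 1]
    if transformed_obs["state1"] >= 95:   # near an assumed upper limit of 100
        mask[2] = 0                       # forbid "increase"
    if transformed_obs["state1"] <= 5:    # near an assumed lower limit of 0
        mask[0] = 0                       # forbid "decrease"
    return mask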
TIP
All selectors have a discrete action space (they choose which child skill to activate), so you can always apply the compute_action_mask function to teach them.
Managing Information Inside Agents
As information passes through perceptors, skills, and selectors in the agent, sometimes it needs to change format along the way. You can use three teaching functions to transform sensor and action variables inside agents: transform_obs, transform_action, and filtered_observation_space.
Transforming Sensor Variables
To transform sensor variables, use the transform_obs function to calculate changes to specific sensors, then return the complete set of sensor variables (the observation space).
def transform_obs(self, obs, action):
    return obs
Two of the most common reasons for transforming sensor variables are conversion and normalization. For example, if a simulator reports temperature values in Fahrenheit, but the agent expects temperature values in Celsius, use the transform_obs function to convert between the two.
Normalization is when you transform variables into different ranges. For example, one sensor variable in your agent might have very large values (in the thousands), but another variable might have small values (in the tenths), so you might use the transform_obs function to scale these disparate sensor values to a range from 0 to 1 so that they can be compared and used more effectively in the agent.
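As a sketch combining both ideas, the function below converts an assumed Fahrenheit sensor named temperature_f to Celsius and scales an assumed pressure sensor from a 0-to-10,000 range down to 0-to-1. The sensor names and ranges are illustrative assumptions for this example.

def transform_obs(self, obs, action):
    # Conversion: Fahrenheit -> Celsius (sensor name is illustrative).
    obs["temperature_f"] = (obs["temperature_f"] - 32) * 5 / 9
    # Normalization: scale an assumed 0..10000 pressure reading to 0..1.
    obs["pressure"] = obs["pressure"] / 10000
    return obs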
Transforming Decisions within the Agent
You may want to transform action variables for the same reasons as sensor variables.
def transform_action(self, transformed_obs, action):
    return action
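For example, if a skill learns actions on a normalized scale but the simulator expects values in engineering units, you can rescale the action on its way out. This sketch assumes a learned action in the range -1 to 1 and a simulator that expects a setting from 0 to 100; both ranges are illustrative.

def transform_action(self, transformed_obs, action):
    # Rescale an assumed learned action in [-1, 1] to an assumed
    # simulator range of [0, 100] before it is passed on.
    return (action + 1) * 50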
Filtering the Sensor List
Use the filtered_observation_space function to pare down the list of sensor variables you need for a particular skill. Pass only the information that a skill or module needs in order to learn or perform well.
def filtered_observation_space(self):
    return ["state1"]
Return a list of all the sensor variables that you want passed to the skill by this teacher.
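For a skill that needs more context, list every sensor variable it should receive; the sensor names below are illustrative:

def filtered_observation_space(self):
    # Pass only the sensors this skill needs (names are illustrative).
    return ["state1", "temperature_f", "pressure"]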