📣 Release Notes - 0.6.2 (Latest)

0.6.2 might sound like a small release, but it's packed with exciting features and improvements. We've added alpha support for Coordinated Skills (also known as Multi-Agent Training).

⏰ Some statistics: this release contains 129 changed files, with 3,990 additions and 2,720 deletions.

Note: Coordinated Skills is currently in Alpha and we expect the spec to change as we gather feedback from the community.

💖 Improvements

Coordinated Skills

One of the most exciting additions in this release is support for Coordinated Skills (also known as Multi-Agent Reinforcement Learning, or MARL).

What are Coordinated Skills?

Multi-Agent Reinforcement Learning is a sub-field of the Reinforcement Learning community that focuses on the behavior of multiple agents in a shared environment, allowing those agents to learn to coordinate their actions with each other. This is useful for training agents that need to cooperate with each other to solve a task. This feature is currently in alpha, and we are working on improving it further.

Example Use Cases

  • Traffic Optimization: Enhancing traffic flow and safety by teaching individual vehicles to navigate optimally and cooperate with each other.
  • Collaborative Robotics: Enabling robots to work together on tasks such as assembly in manufacturing or coordination in logistics.
  • Smart Grids: Optimizing energy distribution by having agents represent power plants, storage, and consumers to improve efficiency and stability.
  • Multiplayer Games: Creating adaptive and intelligent NPCs that can offer dynamic challenges to players in competitive or cooperative game settings.
  • Communication Networks: Improving network performance by optimizing resource allocation and traffic routing through agents representing network components.
  • Environmental Management: Balancing economic, ecological, and social goals in land use and resource management by simulating stakeholders as agents.
  • Healthcare Logistics: Strategizing resource allocation and treatment plans in scenarios like pandemics by considering the actions of hospitals, pharmacies, and patients as agents.
  • Supply Chain Optimization: Minimizing costs and delivery times in supply chains by coordinating agents representing various stages of the supply chain process.

How to Use

We have expanded the API to integrate Coordinated Skills through the add_coordinated_skill method on your agent. This method accepts an instance of a new class named CoordinatedSkill; just as with the Teacher or Controller classes, we configure it by implementing a class that inherits from the Coach class.

The coordinated skill takes the incoming observation and action spaces and passes them to the sub-skills as a shared environment for observation and action-taking. The sub-skills then return their own observations and actions, which are passed back to the coordinated skill. The coordinated skill then returns the combined observations and actions to the agent.
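
As a rough sketch of the data shapes involved (the names and exact values here are illustrative assumptions, not the SDK's precise contract):

python
# Hypothetical shapes, for illustration only
transformed_obs = {"state1": 0.0}          # the shared environment observation
action = {"skill1": [1], "skill2": [0]}    # the actions dict: one action per sub-skill, keyed by name
reward = {"skill1": 1.0, "skill2": 1.0}    # compute_reward returns one reward per sub-skill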

python
# Imports assume the public composabl package; exact import paths may vary by SDK version
from composabl import Agent, Coach, CoordinatedSkill, Skill

# ####################################################################################################
# Define the Coordinated Coach
# ####################################################################################################
class CoordinatedCoach(Coach):
    def __init__(self):
        # step counter used by the success and termination criteria below
        self.counter = 0

    def compute_reward(self, transformed_obs, action, sim_reward):
        """
        Computes the reward for the given transformed observation and action
        :param transformed_obs: The transformed observation
        :param action: The actions dict
        :param sim_reward: The reward from the simulation
        :return: The reward, as a dictionary, with each key the sub-skill name and the value the reward
        """
        self.counter += 1
        # one reward per sub-skill, keyed by sub-skill name (matching the docstring above)
        return {"skill1": 1, "skill2": 1}

    def compute_success_criteria(self, transformed_obs, action):
        # keep the episodes short to make testing quicker
        return self.counter > 100

    def compute_termination(self, transformed_obs, action):
        # keep the episodes short to make testing quicker
        return self.counter > 150

    def transform_action(self, composabl_obs, action):
        return action


# ####################################################################################################
# Construct your Agent
# ####################################################################################################
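# IncrementTeacher is an example Teacher implementation (not shown here)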
s1 = Skill("skill1", IncrementTeacher)
s2 = Skill("skill2", IncrementTeacher)

a = Agent()
a.add_coordinated_skill(CoordinatedSkill(
    "my-coordinated-skill",
    CoordinatedCoach,
    [s1, s2]
))

When we now run this agent, we get the output below, illustrating the training of the two sub-skills in a coordinated manner.

bash
# coming soon
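
In the meantime, a minimal end-to-end invocation might look like the following sketch. The Runtime.train entry point and its train_iters parameter are assumptions here, so treat this as illustrative rather than the definitive training API:

python
# Sketch only: Runtime.train and train_iters are assumed names, not confirmed API
r = Runtime()              # the Runtime also accepts a config dict, as shown below
r.train(a, train_iters=2)  # train the agent, including its coordinated skill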

Configuring Custom Dependencies

You can now configure custom dependencies for your projects. This is useful for training on Kubernetes, where you do not own the environment and want to specify additional packages to be installed. Currently, we support packages from the PyPI repository only.

python
from composabl import Runtime  # import path per the SDK; may vary by version

r = Runtime({
    "runtime": {
        "dependencies": [{
            "url": "https://pypi.org/project/composabl-core-dev/",
            "version": "0.1.0",
            "type": "pip"
        }]
    }
})
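
Each dependency entry specifies the package's PyPI project URL, the version to install, and the dependency type (currently only pip).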

Opt-out of Telemetry

You can now opt out of telemetry by setting the COMPOSABL_TELEMETRY_OPT_OUT environment variable to one of the following values: y, 1, or True.

bash
export COMPOSABL_TELEMETRY_OPT_OUT=1

This disables the sending of telemetry, and thus the outgoing connection, which some enterprise security regulations require.

Misc

  • Send status updates and heartbeats to the Kubernetes controller

🐛 Bug Fixes

  • Action masking was previously applied even when the teacher did not support it; this has been fixed.
  • Fixed an issue in the serialization of the Scenario class.
  • Fixed an issue in serializing local modules.
  • Fixed an issue with reloading checkpoints.
  • Fixed discrete scenarios.
  • Fixed action masking for discrete scenarios.