Manual installation
Overview
Because there are many options for installing a Kubernetes cluster, this document does not go into the specifics of setting up the cluster. Instead, it provides guidance on the requirements your cluster must meet.
Nodes
You will need to provision two groups of nodes:
- For training - these nodes run the Composabl components as well as the actual training processes.
- For simulators - these nodes are used exclusively for simulators.
1. Sizing
Autoscaling
If your cluster supports autoscaling, you can continue on to 2. Labels. If you need to set your cluster's sizing manually, follow the guide below.
Manual scaling
Nodes in each group must be sized to run at least one workload.
For training nodes, this means you need to consider the resources you request in your training config, specifically the resources_cpus and resources_memory settings of your agent. The training node must cover these in addition to the requirements of the Composabl components (which also run on the training nodes) and a small overhead for cluster components.
Nodes that will run simulators are easier to size: they only need to cover a single simulator, whose requirements are set in the kubernetes configuration of your agent as sim_cpu and sim_memory.
Lastly, you must also consider the workers parameter, which indicates the number of workers that will be spawned. You need to provision enough resources to cover resources_cpus and resources_memory for each worker. Each worker also spawns its own simulator instance, so you must provide sufficient resources to cover these totals as well.
Examples
Let's consider the following minimal training config:
{
  "target": {
    "kubernetes": {
      "image": "my-sim/image:latest",
      "sim_cpu": "2",
      "sim_memory": "2Gi"
    }
  },
  "trainer": {
    "workers": 2,
    "resources_cpus": "2",
    "resources_memory": "4Gi"
  }
}
In this configuration, we have specified the requirements for both types of workloads. Here is a step-by-step guide to calculating the required node sizing for Kubernetes:
- Calculate Total Resources for Simulator Pods:
  - Number of required simulators = number of workers: 2
  - Each simulator pod requires:
    - 2 CPU
    - 2 Gi of Memory
  - The total is then:
    - CPU: 2 workers * 2 CPU = 4 CPU
    - Memory: 2 workers * 2 Gi = 4 Gi
- Calculate Total Resources for Worker Pods:
  - Number of required workers = number of defined workers: 2
  - Each worker pod requires:
    - 2 CPU
    - 4 Gi of Memory
  - The total is then:
    - CPU: 2 workers * 2 CPU = 4 CPU
    - Memory: 2 workers * 4 Gi = 8 Gi
- Add Kubernetes Overhead:
  - Kubernetes requires a static amount of overhead per node. For simplicity, let's assume:
    - Kubernetes CPU overhead per node: 0.5 cores, so 500m
    - Kubernetes Memory overhead per node: 1Gi
- For training nodes only: add Composabl system components overhead
  - To run Composabl on the cluster, you need some additional resources:
    - 2 CPU
    - 4 Gi of memory
  - This is in total, not per node
- Combine everything:
  - Simulators:
    - 4 CPU + 0.5 CPU overhead: 4.5 CPU
    - 4 Gi + 1 Gi overhead: 5 Gi
  - Training:
    - 4 CPU + 0.5 CPU (Kubernetes) + 2 CPU (Composabl): 6.5 CPU
    - 8 Gi + 1 Gi (Kubernetes) + 4 Gi (Composabl): 13 Gi
This example assumes you want to use a single simulator node and a single training node. In a real-world scenario, you will likely want to split the workload over multiple nodes.
In that case, you need to make sure that:
- Each node can accommodate at least one pod with its respective resources.
- Each node must also have additional resources to cover the Kubernetes overhead.
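Once your nodes are provisioned, you can check whether they actually expose enough allocatable CPU and memory to cover these totals. The commands below are standard kubectl and not specific to Composabl; the node name is a placeholder:
kubectl get nodes
kubectl describe node <my-training-node> | grep -A 6 Allocatable
Compare the Allocatable values against the per-node totals calculated above.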
2. Labels
Both groups of nodes must be labeled accordingly. Training nodes must be labeled as agentpool: train. Simulator nodes must be labeled as agentpool: sims. You may be able to define this during your cluster setup, but if not, use the following commands:
kubectl label node <my-training-node> agentpool=train --overwrite
kubectl label node <my-simulator-node> agentpool=sims --overwrite
Replace the values between <> with the names of the nodes you'd like to assign to each pool.
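To confirm the labels were applied, you can list your nodes with the agentpool label shown as a column (standard kubectl):
kubectl get nodes -L agentpool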
Storage
The components also need access to (semi)persistent, shared storage. This section will detail the types and amount of storage needed.
It needs the following PersistentVolumeClaims in the composabl-train namespace:
- pvc-controller-data with a size of ±1Gi and an accessMode of ReadWriteOnce (or better). When using Azure, you will need to set the nobrl mountOption for this PVC, as this is required for the Composabl controller to function.
- pvc-training-results with a suitable size - this is where your final agent data will be stored before it is uploaded to the No-code application. It needs an accessMode of ReadWriteMany (RWX). A good initial size is to match historian-tmp.
- historian-tmp is used as temporary storage for historian data. It needs an accessMode of ReadWriteOnce, and its size will depend on the length of your training sessions. We recommend starting with 5Gi.
The size of pvc-training-results and historian-tmp depends on the number and size of training jobs you want to run simultaneously on your cluster. If you plan on running long-lived training sessions with many cycles, you may want to increase the capacity of both.
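As an illustration, the historian-tmp claim could be created roughly as shown below. This is a sketch, not a definitive manifest: it assumes the composabl-train namespace already exists, and sizes and access modes should follow the requirements listed above.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: historian-tmp
  namespace: composabl-train
spec:
  # Omitting storageClassName uses the cluster's default storage class;
  # set it explicitly if your cluster requires a specific one.
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
pvc-controller-data and pvc-training-results can be created the same way, adjusting the name, size, and accessModes as described above (and, on Azure, the nobrl mountOption for pvc-controller-data).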
Private image registry
If you want to use a private registry for simulator images, you will need to set up this private registry yourself and make sure the cluster is able to pull images from it.
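How you grant pull access depends on your registry and cluster. One common approach, shown as a sketch below, is a Kubernetes image pull secret; the secret name, namespace, and credential values are placeholders, and how the secret is attached to the simulator pods depends on your cluster and Composabl configuration.
kubectl create secret docker-registry my-registry-credentials \
  --namespace composabl-train \
  --docker-server=<registry-url> \
  --docker-username=<username> \
  --docker-password=<password>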
Next steps
Once your cluster is running and you have verified your setup is working, you can continue to Installing Composabl.