Device Management

ScalingPolicy

class ScalingPolicy : public Object

Abstract base class for scaling policies.

ScalingPolicy decides the target performance state for a backend accelerator based on per-backend state.

Subclassed by ns3::ConservativeScalingPolicy, ns3::UtilizationScalingPolicy

Public Functions

virtual Ptr<ScalingDecision> Decide(uint32_t backendIdx, const ClusterState::BackendState &backend) = 0

Decide the target performance state for a backend.

Parameters:
  • backendIdx – The backend index in the cluster.

  • backend – The backend’s current state.

Returns:

A scaling decision, or nullptr if no change is needed.

virtual std::string GetName() const = 0

Get the policy name for logging.

Returns:

A string identifying this policy type.

Public Static Functions

static TypeId GetTypeId()

Get the type ID.

Returns:

The object TypeId.

UtilizationScalingPolicy

class UtilizationScalingPolicy : public ns3::ScalingPolicy

Utilization-based dynamic voltage and frequency scaling (DVFS) policy inspired by the Linux ondemand governor.

Simple binary policy using the accelerator’s performance state table:

  • Busy or queued tasks -> highest performance state (aggressive scale-up for latency)

  • Idle -> lowest performance state (energy savings)
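The binary rule above can be sketched in plain C++ without any ns-3 types. This is a minimal illustration, not the class's actual implementation: `BackendSnapshot`, its fields, and `DecideBinary` are hypothetical stand-ins for ClusterState::BackendState and Decide(), and the sketch assumes that index 0 is the lowest performance state and the last index is the highest.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ClusterState::BackendState; field names are assumed.
struct BackendSnapshot
{
    uint32_t activeTasks;  // tasks currently executing
    uint32_t queuedTasks;  // tasks waiting to run
    uint32_t currentState; // current performance state index
    uint32_t numStates;    // number of entries in the performance state table
};

// Returns true and sets `target` when the state should change.
// Returns false when no change is needed (the analogue of Decide() returning nullptr).
bool
DecideBinary(const BackendSnapshot& b, uint32_t& target)
{
    assert(b.numStates > 0);
    const bool busy = b.activeTasks > 0 || b.queuedTasks > 0;
    // Assumption: index 0 is the lowest state, numStates - 1 the highest.
    const uint32_t wanted = busy ? b.numStates - 1 : 0;
    if (wanted == b.currentState)
    {
        return false;
    }
    target = wanted;
    return true;
}
```

Under these assumptions, a backend with one queued task and a four-entry table is sent to index 3, and an idle backend already at index 0 produces no decision.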

Public Functions

virtual Ptr<ScalingDecision> Decide(uint32_t backendIdx, const ClusterState::BackendState &backend) override

Decide the target performance state for a backend.

Parameters:
  • backendIdx – The backend index in the cluster.

  • backend – The backend’s current state.

Returns:

A scaling decision, or nullptr if no change is needed.

virtual std::string GetName() const override

Get the policy name for logging.

Returns:

A string identifying this policy type.

Public Static Functions

static TypeId GetTypeId()

Get the type ID.

Returns:

The object TypeId.

ConservativeScalingPolicy

class ConservativeScalingPolicy : public ns3::ScalingPolicy

Work-proportional DVFS policy with conservative one-OPP stepping.

Computes a target operating performance point (OPP) proportional to the estimated drain time of pending compute work (remaining FLOPs / nominal compute rate), then steps one OPP at a time toward that target. The TargetDrainTime attribute sets the drain time that maps to the highest OPP. Heterogeneous workloads are handled naturally because heavy tasks contribute more FLOPs than lightweight ones.
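The two steps described above (map drain time to a target, then move one step toward it) can be sketched as follows. The function names and parameters are hypothetical, and the sketch assumes a linear mapping, rounding to the nearest index, and index 0 as the lowest OPP; the actual policy may round or clamp differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Map the estimated drain time of pending work to a target OPP index.
// Assumptions: linear mapping, nearest-index rounding, index 0 is the lowest OPP.
uint32_t
TargetOpp(double remainingFlops,
          double nominalFlopsPerSec,
          double targetDrainTimeSec,
          uint32_t numOpps)
{
    assert(numOpps > 0 && remainingFlops >= 0.0);
    assert(nominalFlopsPerSec > 0.0 && targetDrainTimeSec > 0.0);
    const double drainTimeSec = remainingFlops / nominalFlopsPerSec;
    // A drain time at or above the target drain time maps to the highest OPP.
    const double fraction = std::min(1.0, drainTimeSec / targetDrainTimeSec);
    return static_cast<uint32_t>(std::lround(fraction * (numOpps - 1)));
}

// Move at most one OPP from the current index toward the target.
uint32_t
StepToward(uint32_t current, uint32_t target)
{
    if (target > current)
    {
        return current + 1;
    }
    if (target < current)
    {
        return current - 1;
    }
    return current;
}
```

As a worked case under these assumptions: with five OPPs, a nominal rate of 1e12 FLOP/s, and a target drain time of 0.1 s, 5e10 pending FLOPs give a drain time of 0.05 s and a target index of 2. A backend currently at index 0 moves to index 1 on this evaluation and reaches index 2 only on a later one.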

Public Functions

virtual Ptr<ScalingDecision> Decide(uint32_t backendIdx, const ClusterState::BackendState &backend) override

Decide the target performance state for a backend.

Parameters:
  • backendIdx – The backend index in the cluster.

  • backend – The backend’s current state.

Returns:

A scaling decision, or nullptr if no change is needed.

virtual std::string GetName() const override

Get the policy name for logging.

Returns:

A string identifying this policy type.

Public Static Functions

static TypeId GetTypeId()

Get the type ID.

Returns:

The object TypeId.

DeviceProtocol

class DeviceProtocol : public Object

Abstract base class for device metrics protocols.

DeviceProtocol encapsulates how accelerator metrics are serialized into packets and parsed back into DeviceMetrics objects. Each accelerator type provides its own concrete protocol implementation.
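The serialize/parse round trip can be illustrated with a byte vector in place of an ns-3 Packet. Everything about the layout below is an assumption made for illustration: the `MetricsSketch` fields, the 9-byte size, and the big-endian encoding are hypothetical, and only the leading type value of 4 comes from this documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical metrics record; the real DeviceMetrics fields are not listed here.
struct MetricsSketch
{
    uint32_t perfStateIdx;
    uint32_t queuedTasks;
};

const uint8_t kMetricsType = 4;     // message type used for metrics packets
const std::size_t kMetricsSize = 9; // assumed layout: 1 type byte + two 32-bit fields

// Append a 32-bit value in big-endian byte order.
void
PutU32(std::vector<uint8_t>& out, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8)
    {
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xFF));
    }
}

// Read a big-endian 32-bit value starting at `offset`.
uint32_t
GetU32(const std::vector<uint8_t>& in, std::size_t offset)
{
    assert(offset + 4 <= in.size());
    uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        v = (v << 8) | in[offset + i];
    }
    return v;
}

// Analogue of CreateMetricsPacket(): type byte followed by the fields.
std::vector<uint8_t>
Serialize(const MetricsSketch& m)
{
    std::vector<uint8_t> bytes;
    bytes.push_back(kMetricsType);
    PutU32(bytes, m.perfStateIdx);
    PutU32(bytes, m.queuedTasks);
    return bytes;
}

// Analogue of ParseMetrics(): rejects short buffers and other message types.
bool
Parse(const std::vector<uint8_t>& bytes, MetricsSketch& out)
{
    if (bytes.size() < kMetricsSize || bytes[0] != kMetricsType)
    {
        return false;
    }
    out.perfStateIdx = GetU32(bytes, 1);
    out.queuedTasks = GetU32(bytes, 5);
    return true;
}
```

Keeping both directions in one protocol object, as DeviceProtocol does, means the sender and receiver cannot drift apart on the layout for a given accelerator type.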

Subclassed by ns3::GpuDeviceProtocol

Public Functions

virtual Ptr<Packet> CreateMetricsPacket(Ptr<const Accelerator> accel) = 0

Serialize accelerator state into a metrics packet.

Called by the server application on task lifecycle events.

Parameters:

accel – The accelerator whose state is read.

Returns:

A packet containing the serialized metrics header.

virtual Ptr<DeviceMetrics> ParseMetrics(Ptr<Packet> packet) = 0

Parse a metrics packet into a DeviceMetrics object.

Called by the DeviceManager when a type-4 packet arrives.

Parameters:

packet – The packet containing the metrics header.

Returns:

The parsed DeviceMetrics.

virtual std::string GetName() const = 0

Get the name of this device protocol.

Returns:

A string identifying the protocol.

Public Static Functions

static TypeId GetTypeId()

Get the type ID.

Returns:

The object TypeId.

GpuDeviceProtocol

class GpuDeviceProtocol : public ns3::DeviceProtocol

Concrete DeviceProtocol for GPU accelerators.

Serializes metrics using DeviceMetricsHeader (type 4) and parses received metrics packets into DeviceMetrics objects.

Public Functions

virtual Ptr<Packet> CreateMetricsPacket(Ptr<const Accelerator> accel) override

Serialize accelerator state into a metrics packet.

Called by the server application on task lifecycle events.

Parameters:

accel – The accelerator whose state is read.

Returns:

A packet containing the serialized metrics header.

virtual Ptr<DeviceMetrics> ParseMetrics(Ptr<Packet> packet) override

Parse a metrics packet into a DeviceMetrics object.

Called by the DeviceManager when a type-4 packet arrives.

Parameters:

packet – The packet containing the metrics header.

Returns:

The parsed DeviceMetrics.

virtual std::string GetName() const override

Get the name of this device protocol.

Returns:

A string identifying the protocol.

Public Static Functions

static TypeId GetTypeId()

Get the type ID.

Returns:

The object TypeId.

DeviceManager

class DeviceManager : public Object

Manages performance scaling for backend accelerators in the orchestrator.

DeviceManager is a concrete component of Orchestrator. It evaluates a pluggable ScalingPolicy using per-backend state from ClusterState, and sends ScalingCommandHeader packets to backends via the orchestrator’s worker ConnectionManager.

Public Types

typedef void (*StateChangedTracedCallback)(uint32_t backendIdx, uint32_t oldStateIdx, uint32_t newStateIdx)

TracedCallback signature for performance state change events.

Parameters:
  • backendIdx – The backend index.

  • oldStateIdx – The previous performance state index.

  • newStateIdx – The new performance state index.
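A trace sink for this callback needs only a free function with the same three-argument signature. The sketch below is self-contained and does not use ns-3: `StateChangedSink` mirrors the typedef above, while `Transition`, `TransitionLog`, `RecordStateChange`, and `Emit` are hypothetical helpers. `Emit` merely stands in for the trace source firing; connecting the sink to the real trace source is not shown because its name is not given here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same shape as DeviceManager::StateChangedTracedCallback.
typedef void (*StateChangedSink)(uint32_t backendIdx, uint32_t oldStateIdx, uint32_t newStateIdx);

// Hypothetical record of one observed transition.
struct Transition
{
    uint32_t backendIdx;
    uint32_t oldStateIdx;
    uint32_t newStateIdx;
};

// Hypothetical process-wide log of transitions seen by the sink.
std::vector<Transition>&
TransitionLog()
{
    static std::vector<Transition> log;
    return log;
}

// Sink matching the callback signature: appends each transition to the log.
void
RecordStateChange(uint32_t backendIdx, uint32_t oldStateIdx, uint32_t newStateIdx)
{
    TransitionLog().push_back({backendIdx, oldStateIdx, newStateIdx});
}

// Stand-in for the trace source firing; not part of the documented API.
void
Emit(StateChangedSink sink, uint32_t backendIdx, uint32_t oldStateIdx, uint32_t newStateIdx)
{
    assert(sink != nullptr);
    sink(backendIdx, oldStateIdx, newStateIdx);
}
```

Calling `Emit(&RecordStateChange, 2, 0, 3)` appends one entry describing backend 2 moving from state 0 to state 3.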

Public Functions

void Start(const Cluster &cluster, Ptr<ConnectionManager> backendCm, ClusterState &state)

Initialize the device manager with a cluster and connection manager.

Must be called before HandleMetrics() or EvaluateScaling().

Parameters:
  • cluster – The backend cluster.

  • backendCm – The backend connection manager for sending commands.

  • state – The cluster state in which performance states and backend compute capabilities are initialized.

void HandleMetrics(Ptr<Packet> packet, uint32_t backendIdx, ClusterState &state)

Store metrics received from a backend.

Called by Orchestrator when a type-4 packet arrives.

Parameters:
  • packet – The metrics packet (DeviceMetricsHeader).

  • backendIdx – The backend index in the cluster.

  • state – The cluster state to update with parsed metrics.

bool TryConsumeMetrics(Ptr<Packet> buffer, const Address &from, ClusterState &state)

Try to consume a device metrics message from a receive buffer.

Peeks at the first byte of the buffer. If it is a metrics message and enough data is available, the message is consumed (removed from the buffer), parsed, and stored in ClusterState.

Parameters:
  • buffer – The receive buffer (modified in-place if consumed).

  • from – The backend address (used to resolve backend index).

  • state – The cluster state to update.

Returns:

true if a metrics message was consumed, false otherwise.
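The peek-then-consume behavior can be sketched on a plain byte vector. This is an illustration under stated assumptions rather than the method's actual code: the fixed 9-byte message length and the `TryConsume` helper are hypothetical, and parsing and storing into ClusterState are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const uint8_t kMetricsType = 4;     // message type used for metrics packets
const std::size_t kMetricsSize = 9; // assumed fixed message length in bytes

// Peek at the first byte; consume one full metrics message if it is present.
// On success the message bytes are moved from `buffer` into `message`.
bool
TryConsume(std::vector<uint8_t>& buffer, std::vector<uint8_t>& message)
{
    if (buffer.empty() || buffer[0] != kMetricsType)
    {
        return false; // not a metrics message: leave the buffer for other handlers
    }
    if (buffer.size() < kMetricsSize)
    {
        return false; // partial message: wait for more data to arrive
    }
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(kMetricsSize);
    message.assign(buffer.begin(), buffer.begin() + n);
    buffer.erase(buffer.begin(), buffer.begin() + n);
    return true;
}
```

Both false cases leave the buffer untouched, so the caller can retry after more bytes arrive or hand the buffer to a different message handler.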

void EvaluateScaling(ClusterState &state)

Evaluate scaling decisions for all backends.

Called by Orchestrator on task events. For each backend, runs ScalingPolicy::Decide() and sends command packets if the performance state changed.

Parameters:

state – The cluster state with per-backend load and metrics.
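The per-backend loop can be sketched as below. `PolicyFn` and `EvaluateAll` are hypothetical names; the policy is modeled as a callable in place of ScalingPolicy::Decide(), and "sending a command" is modeled as appending a (backend index, new state) pair to a list rather than transmitting a ScalingCommandHeader.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical policy shape: returns true and sets `target` when a change is wanted.
using PolicyFn = std::function<bool(uint32_t backendIdx, uint32_t current, uint32_t& target)>;

// Run the policy once per backend and collect the commands that would be sent.
// Each entry is (backendIdx, newStateIdx).
std::vector<std::pair<uint32_t, uint32_t>>
EvaluateAll(const std::vector<uint32_t>& currentStates, const PolicyFn& policy)
{
    assert(static_cast<bool>(policy));
    std::vector<std::pair<uint32_t, uint32_t>> commands;
    for (uint32_t idx = 0; idx < currentStates.size(); ++idx)
    {
        uint32_t target = currentStates[idx];
        const bool wantsChange = policy(idx, currentStates[idx], target);
        // Only emit a command when the state would actually change.
        if (wantsChange && target != currentStates[idx])
        {
            commands.emplace_back(idx, target);
        }
    }
    return commands;
}
```

With current states {0, 3, 1} and a policy that always requests state 3, only backends 0 and 2 receive commands; backend 1 is already at the requested state.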

Public Static Functions

static TypeId GetTypeId()

Get the type ID.

Returns:

The object TypeId.