What is MARO?

MARO

The Multi-Agent Resource Optimization (MARO) platform is an instance of Reinforcement Learning as a Service (RaaS) for real-world resource optimization. It applies to industrial domains such as container inventory management in logistics, bike repositioning in transportation, virtual machine provisioning in data centers, and asset management in finance. Besides reinforcement learning (RL), it also supports other planning and decision mechanisms, such as operations research.

Key Components

[Figure: MARO overview (_images/maro_overview.svg)]

  • Simulation toolkit: provides predefined scenarios and reusable building blocks for creating new scenarios.

  • RL toolkit: provides a full-stack abstraction for RL, including the agent manager, agent, RL algorithms, learner, actor, and various shapers.

  • Distributed toolkit: provides distributed communication components, an interface for user-defined functions that handle messages automatically, cluster provisioning, and job orchestration.

Quick Start

from maro.simulator import Env
from maro.simulator.scenarios.cim.common import Action, ActionType, DecisionEvent

from random import randint

# Initialize an Env for the CIM (container inventory management) scenario
env = Env(scenario="cim", topology="toy.5p_ssddd_l0.0", start_tick=0, durations=100)

metrics: object = None
decision_event: DecisionEvent = None
is_done: bool = False
action: Action = None

# Start the env with a None Action
metrics, decision_event, is_done = env.step(None)

while not is_done:
    # Generate a random Action according to the action_scope in DecisionEvent
    action_scope = decision_event.action_scope
    to_discharge = action_scope.discharge > 0 and randint(0, 1) > 0

    action = Action(
        decision_event.vessel_idx,
        decision_event.port_idx,
        randint(0, action_scope.discharge if to_discharge else action_scope.load),
        ActionType.DISCHARGE if to_discharge else ActionType.LOAD
    )

    # Respond to the environment with the generated Action
    metrics, decision_event, is_done = env.step(action)
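
The sampling rule inside the loop above can be checked in isolation, without installing MARO. The sketch below is only an illustration: `ActionScope` and `ActionType` are hypothetical stand-ins that mirror the `load` and `discharge` fields and the two action types used in the quick start, and `sample_action` is a helper name introduced here, not part of the MARO API.

```python
from dataclasses import dataclass
from enum import Enum
from random import Random
from typing import Tuple


class ActionType(Enum):
    # Hypothetical stand-in for MARO's ActionType (only the two members used above).
    LOAD = "load"
    DISCHARGE = "discharge"


@dataclass(frozen=True)
class ActionScope:
    # Hypothetical stand-in for decision_event.action_scope.
    load: int
    discharge: int


def sample_action(scope: ActionScope, rng: Random) -> Tuple[int, ActionType]:
    """Pick a random quantity and action type that stay within the scope."""
    # Discharge only when there is something to discharge and a coin flip agrees.
    to_discharge = scope.discharge > 0 and rng.randint(0, 1) > 0
    upper = scope.discharge if to_discharge else scope.load
    action_type = ActionType.DISCHARGE if to_discharge else ActionType.LOAD
    # randint is inclusive on both ends, so the quantity never exceeds the scope.
    return rng.randint(0, upper), action_type


# A seeded generator keeps the run repeatable.
rng = Random(0)
samples = [sample_action(ActionScope(load=5, discharge=3), rng) for _ in range(1000)]
```

Passing the random generator in explicitly, rather than calling the module-level `randint`, makes the behaviour repeatable under a fixed seed, which is convenient when comparing a random baseline against a trained agent.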

Contents