Greedy Policy for Citi Bike ================================================ In this example we demonstrate using a simple greedy policy for `Citi Bike `_, which is a real-world bike repositioning scenario. Greedy Policy ------------- Our greedy policy is simple: if the event type is supply, the policy will make the current station send as many bikes as possible to one of k stations with the most empty docks. If the event type is demand, the policy will make the current station request as many bikes as possible from one of k stations with the most bikes. We use a heap data structure to find the top k supply/demand candidates from the action scope associated with each decision event. .. code-block:: python class GreedyPolicy: ... def choose_action(self, decision_event: DecisionEvent): if decision_event.type == DecisionType.Supply: """ Find k target stations with the most empty slots, randomly choose one of them and send as many bikes to it as allowed by the action scope """ top_k_demands = [] for demand_candidate, available_docks in decision_event.action_scope.items(): if demand_candidate == decision_event.station_idx: continue heapq.heappush(top_k_demands, (available_docks, demand_candidate)) if len(top_k_demands) > self._demand_top_k: heapq.heappop(top_k_demands) max_reposition, target_station_idx = random.choice(top_k_demands) action = Action(decision_event.station_idx, target_station_idx, max_reposition) else: """ Find k source stations with the most bikes, randomly choose one of them and request as many bikes from it as allowed by the action scope. """ top_k_supplies = [] for supply_candidate, available_bikes in decision_event.action_scope.items(): if supply_candidate == decision_event.station_idx: continue heapq.heappush(top_k_supplies, (available_bikes, supply_candidate)) if len(top_k_supplies) > self._supply_top_k: heapq.heappop(top_k_supplies) max_reposition, source_idx = random.choice(top_k_supplies) action = Action(source_idx, decision_event.station_idx, max_reposition) return action Interaction with the Greedy Policy ---------------------------------- This environment is driven by `real trip history data `_ from Citi Bike. .. code-block:: python env = Env(scenario=config.env.scenario, topology=config.env.topology, start_tick=config.env.start_tick, durations=config.env.durations, snapshot_resolution=config.env.resolution) if config.env.seed is not None: env.set_seed(config.env.seed) policy = GreedyPolicy(config.agent.supply_top_k, config.agent.demand_top_k) metrics, decision_event, done = env.step(None) while not done: metrics, decision_event, done = env.step(policy.choose_action(decision_event)) env.reset() .. note:: All related code snippets are supported in `maro playground `_.