Greedy Policy for Citi Bike¶
In this example we demonstrate using a simple greedy policy for Citi Bike, which is a real-world bike repositioning scenario.
Greedy Policy¶
Our greedy policy is simple: if the event type is supply, the policy will make the current station send as many bikes as possible to one of k stations with the most empty docks. If the event type is demand, the policy will make the current station request as many bikes as possible from one of k stations with the most bikes. We use a heap data structure to find the top k supply/demand candidates from the action scope associated with each decision event.
class GreedyPolicy:
...
def choose_action(self, decision_event: DecisionEvent):
if decision_event.type == DecisionType.Supply:
"""
Find k target stations with the most empty slots, randomly choose one of them and send as many bikes to
it as allowed by the action scope
"""
top_k_demands = []
for demand_candidate, available_docks in decision_event.action_scope.items():
if demand_candidate == decision_event.station_idx:
continue
heapq.heappush(top_k_demands, (available_docks, demand_candidate))
if len(top_k_demands) > self._demand_top_k:
heapq.heappop(top_k_demands)
max_reposition, target_station_idx = random.choice(top_k_demands)
action = Action(decision_event.station_idx, target_station_idx, max_reposition)
else:
"""
Find k source stations with the most bikes, randomly choose one of them and request as many bikes from
it as allowed by the action scope.
"""
top_k_supplies = []
for supply_candidate, available_bikes in decision_event.action_scope.items():
if supply_candidate == decision_event.station_idx:
continue
heapq.heappush(top_k_supplies, (available_bikes, supply_candidate))
if len(top_k_supplies) > self._supply_top_k:
heapq.heappop(top_k_supplies)
max_reposition, source_idx = random.choice(top_k_supplies)
action = Action(source_idx, decision_event.station_idx, max_reposition)
return action
Interaction with the Greedy Policy¶
This environment is driven by real trip history data from Citi Bike.
env = Env(scenario=config.env.scenario, topology=config.env.topology, start_tick=config.env.start_tick,
durations=config.env.durations, snapshot_resolution=config.env.resolution)
if config.env.seed is not None:
env.set_seed(config.env.seed)
policy = GreedyPolicy(config.agent.supply_top_k, config.agent.demand_top_k)
metrics, decision_event, done = env.step(None)
while not done:
metrics, decision_event, done = env.step(policy.choose_action(decision_event))
env.reset()
Note
All related code snippets are supported in maro playground.