
Architecture

OpenRA-RL connects three components through a gRPC bridge:

System Architecture

LLM Agent → MCP Server → Python Backend → C# Game Engine

[Diagram: each layer and its components]

  • LLM Agent (Claude / GPT / local model) — sends tool_use commands over the MCP protocol (port 8000)
  • MCP Server (OpenEnv Server) — dispatches tool calls to three tool groups:
    • Game tools: get_game_state, get_observation, step, reset, get_map_info
    • Combat tools: attack_move, force_attack, guard, patrol, retreat, scatter
    • Economy tools: harvest, build_structure, train_unit, set_rally_point, sell
  • Python Backend — OpenRAEnvironment (Gymnasium-style API), BridgeClient (gRPC client), ProcessManager (daemon lifecycle), Game Store (episode state); OpenRAEnvironment drives BridgeClient for gRPC calls and ProcessManager for spawn/kill
  • C# Game Engine (reached via gRPC over localhost) — ExternalBotBridge (IBot, ITick trait), RLBridgeService (gRPC service implementation, routing to ExternalBotBridge by session_id), ActionHandler (commands → Orders), ObservationSerializer (World → Protobuf)
  • OpenRA World (actors, traits, orders) — ticked by the Game Loop; ActionHandler issues Orders into it, ObservationSerializer reads state out of it

Three-Repo Design

| Repository | Language | Role |
| --- | --- | --- |
| OpenRA-RL | Python | Environment wrapper, gRPC client, agent examples |
| OpenRA (submodule) | C# | Modified game engine with embedded gRPC server |
| OpenEnv | Python | Framework providing standardized Gymnasium-style APIs |

Data Flow

Agent (Python)
↕ HTTP/WebSocket (OpenEnv protocol, port 8000)
Environment Wrapper (FastAPI)
↕ gRPC/Protobuf (bidirectional streaming, port 9999)
Game Engine (OpenRA + Kestrel)
↕ Native game logic
OpenRA World (C# actors, traits, orders)

gRPC Bridge

The bridge is embedded inside the OpenRA game engine using ASP.NET Core Kestrel. Key design decisions:

  • Static ActiveBridge pattern: The gRPC server starts once (in Activate()), while a static reference is updated each time the mod reloads. This avoids port conflicts from multiple server instances.
  • DropOldest channels: The game ticks at ~25 ticks/sec independently of the agent. Observation channels use a "drop oldest" policy so slow agents always receive the latest state.
  • Non-blocking ticks: The game never blocks waiting for the agent. If no action arrives, the game continues with a no-op.

Key C# Files

| File | Purpose |
| --- | --- |
| ExternalBotBridge.cs | Main trait (IBot, ITick), Kestrel gRPC server |
| RLBridgeService.cs | gRPC service implementation |
| ObservationSerializer.cs | World/Actor/Player state → Protobuf |
| ActionHandler.cs | Protobuf commands → OpenRA Orders |

Key Python Files

| File | Purpose |
| --- | --- |
| bridge_client.py | Async gRPC client with background observation reader |
| openra_environment.py | OpenEnv Environment (reset/step/state) |
| openra_process.py | Subprocess manager for game engine |
| models.py | Pydantic models for observations and actions |

Protobuf Schema

The canonical schema lives at proto/rl_bridge.proto and defines:

  • GameObservation — Tick-level state: economy, military, units, buildings, spatial map, episode signals
  • AgentAction — List of commands to execute
  • GameState — High-level game phase query

The proto is compiled to both Python (gRPC stubs) and C# (pre-generated for Docker/CI compatibility).
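As an illustration of the three messages, here is a minimal proto3 sketch. The field names and numbers are assumptions made for this example only; they are not the canonical contents of proto/rl_bridge.proto.

```protobuf
// Illustrative sketch only — field names and numbers are assumptions,
// not the canonical proto/rl_bridge.proto.
syntax = "proto3";

message GameObservation {
  int32 tick = 1;                 // tick-level state
  int32 cash = 2;                 // economy signal (assumed field)
  int32 army_value = 3;           // military signal (assumed field)
  repeated string units = 4;      // unit summaries (assumed encoding)
  repeated string buildings = 5;  // building summaries (assumed encoding)
  bytes spatial_map = 6;          // spatial map layers (assumed encoding)
  bool episode_done = 7;          // episode signal
}

message AgentAction {
  repeated string commands = 1;   // list of commands to execute (assumed encoding)
}

message GameState {
  string phase = 1;               // high-level game phase
}
```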

Async Event Queue

The game engine and the agent run at fundamentally different speeds — the game ticks at ~25 Hz while an LLM agent might take 2+ seconds per decision. The async event queue design solves this mismatch using .NET's System.Threading.Channels with bounded, non-blocking semantics.

Async Event Queue Design

[Diagram: non-blocking channels decouple game ticks from agent I/O · DropOldest ensures freshness]

Each tick, the game thread (World.Tick(), ~25 ticks/sec):

  1. Serializes an observation (ObservationSerializer: World state → Protobuf)
  2. Writes it to the observation channel — BoundedChannel<GameObservation>, capacity = 1, DropOldest, so obs(t) replaces obs(t-1)
  3. Drains all pending actions from the action channel — BoundedChannel<AgentAction>, capacity = 16, DropOldest
  4. Executes them as OpenRA Orders (ActionHandler: Protobuf → Orders)

The channels are bounded (fixed memory) and never block the game thread, which ticks independently. RLBridgeService bridges them to the bidirectional gRPC stream with two async tasks: ObsSender reads the observation channel and sends to the agent; ActionReceiver reads from the agent and writes to the action channel. From the agent's perspective (LLM / RL policy), every read returns the latest state, missed ticks are invisible, and no backpressure is applied to the agent.

Why DropOldest? The game ticks at ~25/sec independently. A slow agent (an LLM thinking for 2s) would miss ~50 ticks. With DropOldest, the agent always sees the latest state, not a queue of stale observations, and the game never blocks waiting for a slow reader.

Timing — fast agent vs slow agent:

  • Fast agent (~40ms/step): the agent reads each tick's observation immediately; nothing is dropped, because the agent keeps up with the game.
  • Slow agent (~2s/step, e.g. an LLM): tick 100's observation arrives while the agent is busy thinking; ticks 101-149 are dropped (DropOldest); the agent then reads tick 150. The agent skips straight to the current state with no stale queue buildup.

Channel Design

Each session maintains two bounded channels:

| Channel | Type | Capacity | Policy | Writer | Reader |
| --- | --- | --- | --- | --- | --- |
| Observation | BoundedChannel<GameObservation> | 1 | DropOldest | Game thread | gRPC stream |
| Action | BoundedChannel<AgentAction> | 16 | DropOldest | gRPC stream | Game thread |

```csharp
// Observation channel: capacity=1, latest overwrites stale
readonly Channel<GameObservation> observationChannel =
    Channel.CreateBounded<GameObservation>(
        new BoundedChannelOptions(1)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleWriter = true,
            SingleReader = true,
        });
```

Why This Works

  1. Game never blocks: observationChannel.Writer.TryWrite(obs) is non-blocking. If the channel is full (agent hasn't read yet), the old observation is silently replaced.
  2. Agent always sees latest state: With capacity=1 and DropOldest, the single slot always contains the most recent tick's observation. An agent waking up after 2 seconds of thinking reads tick 150, not a queue of ticks 100-150.
  3. Actions are batched: The game drains all pending actions each tick with TryRead() in a loop. Multiple commands sent between ticks are all executed together.
  4. No-op on empty: If no actions are pending, the game continues with default behavior — no stall, no error.
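The same latest-only semantics can be sketched in Python (this is an analogue for intuition, not part of the codebase) using a maxlen=1 deque: writes never block, and a full slot silently evicts the stale item.

```python
from collections import deque

# A maxlen=1 deque mimics BoundedChannel<GameObservation> with DropOldest:
# appending to a full deque evicts the oldest element instead of blocking.
obs_channel = deque(maxlen=1)

# Game thread: 51 ticks pass while the agent is busy thinking.
for tick in range(100, 151):
    obs_channel.append({"tick": tick})   # never blocks; stale obs discarded

# Agent wakes up and reads: it sees tick 150, not a backlog of 100-149.
latest = obs_channel.popleft()
print(latest["tick"])  # → 150
```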

gRPC Stream Tasks

The RLBridgeService.GameSession RPC spawns two concurrent async tasks that bridge the channels to the gRPC stream:

ObsSender:      channel.Reader.ReadAsync() → stream.WriteAsync()
ActionReceiver: stream.MoveNext()          → channel.Writer.WriteAsync()

Both tasks exit when either the game ends or the agent disconnects, triggering cleanup via CancellationToken.
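The two-task pattern can be sketched in Python asyncio (a simplified analogue, not the real service): queues stand in for both the channels and the gRPC stream, and an event plays the role of the CancellationToken.

```python
import asyncio

# Python analogue of ObsSender / ActionReceiver: one task pumps
# observation channel → stream, the other pumps stream → action channel.
async def obs_sender(channel, stream, stop):
    while not stop.is_set():
        obs = await channel.get()        # channel.Reader.ReadAsync()
        await stream.put(obs)            # stream.WriteAsync()
        if obs.get("done"):
            stop.set()                   # game ended → both tasks wind down

async def action_receiver(stream, channel, stop):
    while not stop.is_set():
        try:
            action = await asyncio.wait_for(stream.get(), timeout=0.1)
        except asyncio.TimeoutError:
            continue                     # periodically re-check the stop flag
        await channel.put(action)        # channel.Writer.WriteAsync()

async def main():
    obs_ch, act_ch = asyncio.Queue(), asyncio.Queue()
    obs_stream, act_stream = asyncio.Queue(), asyncio.Queue()
    stop = asyncio.Event()
    tasks = [asyncio.create_task(obs_sender(obs_ch, obs_stream, stop)),
             asyncio.create_task(action_receiver(act_stream, act_ch, stop))]
    await act_stream.put({"cmd": "attack_move"})   # agent sends an action
    await obs_ch.put({"tick": 1, "done": True})    # game produces final obs
    await asyncio.gather(*tasks)                   # both exit on game end
    return await obs_stream.get(), await act_ch.get()

obs, action = asyncio.run(main())
```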

Multi-Session Architecture

The training setup runs 64 game sessions inside a single .NET process, sharing JIT-compiled code and mod data. A gRPC server routes requests by session_id to a thread pool that ticks each game forward independently.

Multi-Session Worker Pool Architecture

[Diagram: 64 game sessions in a single .NET process · shared JIT & mod data]

Headline numbers: 256ms reset latency (was 5-15s) · ~15K aggregate ticks/sec · ~6 GB RSS for 64 sessions · 64/64 sessions passing.

The training loop (a PPO / GRPO agent) drives 64 environments over a single shared gRPC channel. The Kestrel gRPC server exposes four RPCs — CreateSession, DestroySession, FastAdvance, GetState — and routes each request by its session_id to a fixed-size worker pool backed by a BlockingCollection<WorkItem> (returning RESOURCE_EXHAUSTED if the queue is full). ModData and JIT-compiled code are loaded once and shared by all sessions; each session owns its own OrderManager, World, and BotBridge.

Request flow for a FastAdvance call:

  1. FastAdvance — the Python client issues the gRPC call with a session_id
  2. Route by session_id — Kestrel looks up the session in a ConcurrentDictionary
  3. Submit WorkItem — the handler enqueues work via BlockingCollection.TryAdd
  4. Tick game forward — a worker thread runs World.Tick() in a loop until the target tick
  5. Return observation — the TaskCompletionSource completes and the serialized observation goes back in the gRPC response

This design replaced the original one-process-per-environment approach, cutting reset time from 5-15 seconds down to 256ms and reducing memory from ~40 GB to ~5-7 GB for 64 concurrent sessions.

Game Lifecycle

Game State Machine

[Diagram: environment lifecycle · live game & replay playback paths]

Live game path (Python-side, C# engine, and gRPC bridge states):

IDLE (environment constructed, no game running)
  → reset() → LAUNCHING (dotnet OpenRA.dll subprocess spawned)
  → LOADING (map, rules, traits, gRPC server started)
  → CONNECTING (BridgeClient retries the GetState() RPC; if 120 retries are exhausted: TIMEOUT → abort() → CLEANUP)
  → STREAMING (GameSession RPC established, background observation reader running)
  → PLAYING (step() loop, recording a .orarep replay; if the stream breaks: CONN LOST → abort() → CLEANUP)
  → GAME OVER on detecting game end (done=True, result: win / lose / draw)
  → CLEANUP (close streams and bridge, kill process) → back to IDLE for the next episode

Replay path:

IDLE → load .orarep → LOADING REPLAY (parse .orarep, extract metadata) → REPLAYING (ReplayConnection reads packets frame by frame) → REPLAY ENDED (all packets consumed)
  1. Reset: Environment starts a new game map, waits for the game to initialize
  2. Planning Phase (optional): Agent studies the map and opponent before acting
  3. Game Loop: Agent receives observations, sends actions each tick
  4. Game Over: Episode ends with win/lose/draw signal
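The lifecycle above is a standard Gymnasium-style episode loop. The real environment lives in openra_environment.py; its exact constructor and method signatures are assumptions here, so a stub environment stands in to keep this sketch self-contained and runnable.

```python
# Hypothetical episode loop; StubOpenRAEnvironment mimics the lifecycle
# states (reset → step loop → game over → cleanup) without a real game.
class StubOpenRAEnvironment:
    def __init__(self):
        self.tick = 0

    def reset(self):
        self.tick = 0                   # LAUNCHING → LOADING → CONNECTING → STREAMING
        return {"tick": self.tick}

    def step(self, actions):
        self.tick += 1                  # one PLAYING-state transition
        done = self.tick >= 3           # pretend the game ends at tick 3
        result = "win" if done else None
        return {"tick": self.tick}, done, result

    def close(self):
        pass                            # CLEANUP: close bridge, kill process

env = StubOpenRAEnvironment()
obs = env.reset()                       # IDLE → ... → PLAYING
done, result = False, None
while not done:
    obs, done, result = env.step([])    # send (here empty) actions each step
env.close()                             # GAME OVER → CLEANUP → IDLE
print(result)  # → win
```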

Appendix: Architecture Evolution

This section traces the evolution from the original one-process-per-session design to the current multi-session worker pool. Understanding why each decision was made is as important as understanding the final design.

A.1 Legacy Architecture (v1)

The original design was simple: each RL environment spawned a separate .NET game process.

Legacy: One Process Per Session

Each environment spawns a separate .NET process with its own ModData, JIT, and gRPC server

Headline numbers: 5-15s reset latency (kill + respawn + JIT) · ~40 GB RSS across 64 processes · 64x JIT overhead (each process re-compiles the same code) · ~200 threads (3 per process, plus OS overhead).

The training loop manages a port pool (9901, 9902, ... 9964) and a launch semaphore that serializes JIT startup. Each of the 64 environments holds a 1:1 gRPC connection to its own process; every process carries its own copy of ModData + JIT (~100-200 MB each), an OrderManager, a World, and a gRPC server on its dedicated port.

Episode reset cycle (5-15 seconds total):

  1. Kill .NET process — ~100ms
  2. Spawn new process — ~500ms
  3. JIT compile all code — 2-5s
  4. Load mod data + map — 1-3s
  5. Start gRPC server — ~500ms

Why it worked at first: For 1-4 concurrent sessions, this is the simplest correct design. Process isolation guarantees zero shared state bugs. Each process has its own gRPC server on a unique port. Reset means kill + respawn.

Why it broke at 64 sessions:

| Problem | Impact |
| --- | --- |
| 64x JIT compilation | Each .NET process re-JITs the same code; 2-5s per process, serialized by a semaphore to avoid CPU saturation |
| ~40 GB RSS | Each process loads ~600 MB of ModData + JIT cache; 64 copies is massive memory waste |
| 5-15s reset latency | Kill process → spawn → JIT → load mod → load map → start gRPC; for RL training at 1000+ episodes, this dominates wall time |
| Port pool exhaustion | Each process needs a unique gRPC port; managing 64 ports adds complexity and fragility |
| File descriptor limits | 64 processes × ~30 fds each approaches OS limits on some Linux configs |

A.2 Key Architecture Decisions

Decision 1: Share ModData across sessions

Problem: ModData (game rules, sprites, traits) is ~600 MB per process and identical across all sessions.

Options considered:

  • (a) Shared memory / memory-mapped files — complex, C# interop overhead
  • (b) Single process, multiple threads sharing one ModData instance

Chosen: (b). ModData is read-only after initialization, so sharing is safe without locks. This alone saves ~35 GB for 64 sessions.

Risk: Any code that writes to ModData during gameplay would corrupt all sessions. Mitigated by the fact that OpenRA's ModData is immutable by design.

Decision 2: Worker pool instead of per-session threads

Problem: 64 dedicated threads (one per session) waste CPU when sessions are idle (waiting for agent input between FastAdvance calls).

Options considered:

  • (a) Dedicated thread per session that sleeps when idle
  • (b) .NET ThreadPool (Task.Run) for ticking
  • (c) Fixed-size worker pool with BlockingCollection

Chosen: (c).

Why not (a): 64 threads that mostly sleep still consume stack memory and OS scheduler overhead. When all 64 wake simultaneously (batch training), they thrash the CPU.

Why not (b): Task.Run puts work on the .NET ThreadPool. gRPC's Kestrel server also uses the ThreadPool. If 64 game ticks are running on pool threads, gRPC can't accept new requests — thread pool starvation. We hit this in testing: 0/16 sessions completed because gRPC handlers couldn't execute.

Why (c): Dedicated background threads (not ThreadPool) with a bounded queue. Workers only consume CPU when there's actual work. The bounded queue provides backpressure — if all workers are busy, FastAdvance returns RESOURCE_EXHAUSTED and the client retries. This prevents the system from accepting more work than it can handle.

Decision 3: Tick on worker threads, not gRPC threads

Problem: The initial worker pool attempt ticked game state inline on the gRPC handler thread (using a SemaphoreSlim to limit concurrency). This starved the gRPC thread pool.

Solution: Worker threads are completely separate from the .NET ThreadPool. The gRPC handler submits a WorkItem to the queue and awaits the TaskCompletionSource — this frees the gRPC thread to handle other requests while the worker ticks the game.

gRPC thread:   submit WorkItem → await tcs.Task → return observation
Worker thread: dequeue WorkItem → tick game → tcs.TrySetResult(true)
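The submit-and-await pattern can be sketched in Python (an analogue under assumed names, not the C# implementation): a concurrent.futures.Future stands in for the TaskCompletionSource, and a bounded queue.Queue provides the backpressure described in Decision 2.

```python
import queue
import threading
from concurrent.futures import Future

# Bounded work queue: a full queue rejects work instead of accepting
# more than the pool can handle (RESOURCE_EXHAUSTED analogue).
work_queue = queue.Queue(maxsize=16)

def worker():
    # Dedicated worker thread: dequeue WorkItem → "tick" → complete the Future.
    while True:
        item = work_queue.get()
        if item is None:                          # shutdown sentinel
            break
        future, target_tick = item
        future.set_result({"tick": target_tick})  # tcs.TrySetResult(...)

threading.Thread(target=worker, daemon=True).start()

def fast_advance(target_tick):
    # "gRPC handler": submit WorkItem, then wait — in the real async code
    # this await frees the handler thread for other requests.
    future = Future()
    try:
        work_queue.put_nowait((future, target_tick))   # TryAdd
    except queue.Full:
        raise RuntimeError("RESOURCE_EXHAUSTED")       # client retries
    return future.result(timeout=5)                    # await tcs.Task

obs = fast_advance(150)
print(obs["tick"])  # → 150
```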

Decision 4: Per-session tick lock

Problem: Two concurrent FastAdvance calls for the same session could tick the same World from two worker threads simultaneously. World.Tick() mutates actors, effects, and game state — none of this is thread-safe.

Solution: SemaphoreSlim(1,1) per SessionState. The worker acquires it before ticking. A second FastAdvance for the same session queues behind the first.
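The per-session lock can be sketched in Python (names are illustrative): a plain Lock per session plays the role of SemaphoreSlim(1,1), so two concurrent advances of the same session serialize instead of ticking the same state simultaneously.

```python
import threading

# Sketch of Decision 4: one lock per session serializes ticks, so two
# concurrent FastAdvance calls for the same World cannot overlap.
class SessionState:
    def __init__(self):
        self.tick = 0
        self.tick_lock = threading.Lock()   # SemaphoreSlim(1, 1) analogue

    def fast_advance(self, n_ticks):
        with self.tick_lock:                # second caller queues behind first
            for _ in range(n_ticks):
                self.tick += 1              # stand-in for World.Tick()

session = SessionState()
threads = [threading.Thread(target=session.fast_advance, args=(1000,))
           for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(session.tick)  # → 2000
```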

Decision 5: SetLocalPauseState vs SetPauseState

Problem: Games start paused (waiting for agent to connect). The original code called world.SetPauseState(false) to unpause, which queues a PauseGame Order. But in the multi-session tick loop, this Order gets processed during the same tick cycle and immediately re-pauses the game. The game never advances past tick 3.

Solution: world.SetLocalPauseState(false) sets the pause flag directly without queuing an order. In multi-session mode, there's no network peer to synchronize with, so the order-based approach is unnecessary.

Decision 6: [ThreadStatic] on Sync.unsyncCount

Problem: Sync.RunUnsynced() uses a static unsyncCount to track reentry. With multiple worker threads, thread A increments it, thread B reads a corrupted value, and the sync check throws false positives.

Solution: [ThreadStatic] gives each thread its own counter. Default value is 0 on new threads, which is the correct initial state.
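[ThreadStatic] has a direct Python analogue in threading.local(): each thread gets its own copy of the attribute, defaulting to the initial value on new threads. The names below are illustrative, not OpenRA's.

```python
import threading

# threading.local() gives each thread its own reentry counter,
# mirroring [ThreadStatic] static int unsyncCount (default 0 per thread).
_state = threading.local()

def run_unsynced(fn):
    count = getattr(_state, "unsync_count", 0)  # 0 on a fresh thread
    _state.unsync_count = count + 1             # per-thread reentry tracking
    try:
        return fn()
    finally:
        _state.unsync_count = count             # restore on exit

depths = []
def observe_depth():
    # Each thread sees only its own counter, never another thread's.
    run_unsynced(lambda: depths.append(_state.unsync_count))

threads = [threading.Thread(target=observe_depth) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(depths)  # → [1, 1, 1, 1]
```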

Decision 7: PerfHistory.Disabled to avoid lock contention

Problem: World.Tick() wraps actor ticking in new PerfSample("tick_actors"), which calls PerfHistory.Increment() — protected by lock (SyncRoot). With 64 sessions ticking concurrently, all workers contend on this single lock.

Solution: A volatile bool Disabled flag that short-circuits Increment() to a no-op. Set during RLSessionManager.Initialize(). PerfHistory is only useful for the UI performance overlay, which doesn't exist in headless mode.
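The short-circuit pattern is simple enough to show in a few lines of Python (an analogue with made-up names, not the C# PerfHistory code): a flag checked before the lock turns the hot-path increment into a no-op.

```python
import threading

# Sketch of Decision 7: checking a disabled flag before taking the lock
# eliminates contention on the shared counter lock in headless mode.
class PerfCounters:
    disabled = False                  # "volatile bool Disabled" analogue

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, name):
        if PerfCounters.disabled:     # fast path: no lock acquired
            return
        with self._lock:              # slow path: 64 sessions contend here
            self._counts[name] = self._counts.get(name, 0) + 1

perf = PerfCounters()
perf.increment("tick_actors")
PerfCounters.disabled = True          # headless mode: short-circuit
perf.increment("tick_actors")         # no-op, no lock contention
print(perf._counts)  # → {'tick_actors': 1}
```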

A.3 Comparison

| Metric | Legacy (v1) | Multi-Session (v2) | Improvement |
| --- | --- | --- | --- |
| Reset latency | 5-15s | 256ms | ~40x faster |
| RSS (64 sessions) | ~40 GB | ~6 GB | ~7x less |
| JIT compilation | 64x (once per process) | 1x (shared) | 64x less |
| Threads | ~200 (3 per process) | ~20 (N workers + gRPC) | ~10x fewer |
| Aggregate ticks/sec | ~8K (contention) | ~15K (worker pool) | ~2x faster |
| Port management | Pool of 64 ports | Single port, session_id routing | Simpler |

A.4 What We'd Do Differently

  1. Start with the multi-session design. The per-process design was "correct by isolation" but hit scaling walls quickly. If you know you'll need 16+ concurrent sessions, design for shared state from day one.

  2. Use [ThreadStatic] sparingly. It's a blunt instrument. We only needed it for Sync.unsyncCount because OpenRA's codebase assumes single-threaded access. A session-scoped context object would be cleaner.

  3. Don't trust the .NET ThreadPool for mixed workloads. gRPC and game ticking both want ThreadPool threads. Dedicated threads for the compute-heavy path (game ticking) and ThreadPool for the I/O-heavy path (gRPC) is the right separation.

  4. Instrument first. The PerfHistory lock contention was invisible until we benchmarked at 64 sessions. Adding timing to TickSession early would have surfaced it sooner.