Tenant Placement Policy
Companion to: Frappe SaaS Multitenant Docker Standard (ADR-002)
Related code: `prego-control-plane/src/placement-3server.ts`, `tenant-allocation.ts`
The Placement Engine decides which app server, DB shard, Redis pool, and pre-warmed Site a new tenant lands on, given region / plan / capacity / data-residency constraints.
This document is the contract for that engine. The current code in placement-3server.ts covers a subset of the rules below (region + plan + capacity); the missing items (db_shards.plan_tier_lock, site_pool reservation, infra_provider selection) are scoped for Phase 2 in ADR-002 §17.
One tenant, one server (production default)
prego-control-plane defaults to one new Hetzner server per new tenant (all plan tiers, including free trial and Enterprise paths that enqueue provisioning). Bin-packing onto existing shared app nodes is off unless the Worker sets `PREGO_SHARED_PLACEMENT_LEGACY=1` (or `true` / `yes`). D1 audit JSON uses `policy_json.algorithm: one_tenant_one_server_v1` when this policy applies.
Under this policy, MariaDB, Redis, and Frappe bench are intended to run on the same VM (single-node stack). The separated PlacementResult shape in §1.2 below remains the long-term contract for multi-shard designs; new rows may instead be interpreted as “all IDs refer to the same host” until the type definitions are refactored.
Canonical write-up: prego-control-plane ADR — One tenant, one server. Compose guidance: prego-docker — Single-node Frappe stack.
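A minimal sketch of how the legacy flag described above could be interpreted. The env var name comes from this policy; the helper itself is illustrative, not the shipped Worker code.

```ts
// Illustrative only: interprets PREGO_SHARED_PLACEMENT_LEGACY as described above.
// Accepts "1", "true", or "yes" (case-insensitive); anything else keeps the
// one-tenant-one-server default.
export function sharedPlacementEnabled(env: { PREGO_SHARED_PLACEMENT_LEGACY?: string }): boolean {
  const raw = (env.PREGO_SHARED_PLACEMENT_LEGACY ?? "").trim().toLowerCase();
  return raw === "1" || raw === "true" || raw === "yes";
}
```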
1. Inputs and outputs
1.1 PlacementRequest
```ts
export interface PlacementRequest {
  tenantId: string;
  region: "sg" | "eu" | "us";
  plan: "trial" | "starter" | "business" | "enterprise";
  expectedUsers?: number;   // default by plan tier
  storageQuotaGb?: number;  // default by plan tier
  dataResidency?: string;   // ISO country code; restricts infra_provider candidates
}
```

1.2 PlacementResult
```ts
export interface PlacementResult {
  appServerId: string;
  dbShardId: string;
  redisPoolId: string;
  siteName: string;      // promoted from site_pool
  databaseName: string;  // tenant_<slug>_db
  infraProvider: "hetzner" | "aws" | "gcp";
  decision: PlacementDecision;
}
```
```ts
export interface PlacementDecision {
  algorithm: "best_fit" | "first_fit" | "round_robin" | "new_server" | "ai_recommended";
  candidatesEvaluated: number;
  selectionReason: string;
  confidenceScore?: number; // 0..1
  rejectedCandidates: { id: string; reason: string }[];
}
```

The decision block is persisted to `allocation_snapshots` (already exists — see `migrations/0030_allocation_snapshots.sql`).
2. Decision pipeline
```mermaid
flowchart TB
  req[PlacementRequest]
  req --> regionFilter[1 Region filter]
  regionFilter --> providerFilter[2 InfraProvider data residency filter]
  providerFilter --> planFilter[3 Plan tier filter]
  planFilter --> capacityFilter[4 Capacity filter]
  capacityFilter --> rank[5 Rank candidates]
  rank --> sitePoolReserve[6 Reserve from site_pool]
  sitePoolReserve --> result[PlacementResult]
  capacityFilter -.->|"empty pool"| scaleOut[Trigger scale-out via WorkflowEngine]
  sitePoolReserve -.->|"empty pool"| poolWarm[Trigger site_pool refill]
```
2.1 Step 1 — Region filter
- Hard filter on `servers.region == request.region`.
- If `dataResidency` is set, the result region must belong to the residency zone (e.g. `KR` residency may only land on `sg` until a dedicated `kr` region exists).
2.2 Step 2 — InfraProvider / data residency filter
- Look up `infra_providers` rows where `enabled = true` and `region` matches.
- If `dataResidency` is set, intersect with allowed providers.
- Drop providers that have no servers passing later capacity checks.
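Steps 1-2 as a pure filter, sketched below. The `ServerRow` / `InfraProviderRow` shapes and the `residencyAllowedRegions` map are assumptions for illustration; the real columns live in the control-plane schema.

```ts
// Hypothetical row shapes; the real columns live in the D1 schema.
interface ServerRow { id: string; region: "sg" | "eu" | "us"; infra_provider: string; }
interface InfraProviderRow { id: string; region: string; enabled: boolean; }

// Example residency map (assumption): KR data residency may only land on sg today.
const residencyAllowedRegions: Record<string, string[]> = { KR: ["sg"] };

export function filterByRegionAndResidency(
  req: { region: string; dataResidency?: string },
  servers: ServerRow[],
  providers: InfraProviderRow[],
): ServerRow[] {
  // Step 1: hard region filter.
  let candidates = servers.filter((s) => s.region === req.region);

  // Step 2: enabled providers in the region, intersected with the residency zone.
  const allowedProviderIds = new Set(
    providers.filter((p) => p.enabled && p.region === req.region).map((p) => p.id),
  );
  candidates = candidates.filter((s) => allowedProviderIds.has(s.infra_provider));

  if (req.dataResidency) {
    const regions = residencyAllowedRegions[req.dataResidency] ?? [];
    candidates = candidates.filter((s) => regions.includes(s.region));
  }
  return candidates;
}
```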
2.3 Step 3 — Plan tier filter
| Plan | App constraint | DB shard constraint | Redis constraint |
|---|---|---|---|
| `trial` | Shared, must accept trial workloads | `db_shards.plan_tier_lock IN (NULL, 'trial')` | Shared |
| `starter` | Shared | `db_shards.plan_tier_lock IS NULL` | Shared |
| `business` | Shared | `db_shards.plan_tier_lock = 'business'` | Shared OR dedicated (per contract flag) |
| `enterprise` | Dedicated (`servers.tenant_count = 0`) | Dedicated (`db_shards.dedicated_for_tenant_id IS NULL` and assigned in this transaction) | Dedicated |
Implementation: `db_shards.plan_tier_lock` is the first filter applied to candidate shards. This avoids placing a Starter tenant on a Business-only shard reserved for performance guarantees.
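The plan-tier rule from the table above can be expressed as a small predicate. This is a sketch; the `DbShardRow` shape is hypothetical, only the column names come from this document.

```ts
// Hypothetical shard row; plan_tier_lock semantics follow the table above.
interface DbShardRow {
  id: string;
  plan_tier_lock: "trial" | "business" | null;
  dedicated_for_tenant_id: string | null;
}

type Plan = "trial" | "starter" | "business" | "enterprise";

export function shardAllowedForPlan(shard: DbShardRow, plan: Plan): boolean {
  switch (plan) {
    case "trial":
      return shard.plan_tier_lock === null || shard.plan_tier_lock === "trial";
    case "starter":
      return shard.plan_tier_lock === null;
    case "business":
      return shard.plan_tier_lock === "business";
    case "enterprise":
      // Dedicated shard: unassigned now, claimed inside the placement transaction.
      return shard.dedicated_for_tenant_id === null;
  }
}
```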
2.4 Step 4 — Capacity filter
A candidate (appServer, dbShard, redisPool) triple passes when all of:
- `appServer.memory_pct < PLACEMENT_MEMORY_PCT_MAX` (default `70`)
- `appServer.tenant_count < PLACEMENT_MAX_TENANTS_PER_NODE` (default per ADR-001: 15-20)
- `dbShard.current_tenant_count < dbShard.tenant_capacity`
- `redisPool.current_tenants < redisPool.tenant_capacity`
- `appServer.status = 'active'` (skip `draining`, `terminated`, `failed`)

Threshold env vars are already documented in `prego-control-plane/wrangler.toml` (`PLACEMENT_*`).
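The capacity check is a conjunction of the conditions above; a sketch of it as a single predicate follows. The row shapes are illustrative, and the limits would be read from the `PLACEMENT_*` env vars.

```ts
// Hypothetical row shapes; thresholds come from PLACEMENT_* env vars.
interface AppServerRow { memory_pct: number; tenant_count: number; status: string; }
interface ShardCapacityRow { current_tenant_count: number; tenant_capacity: number; }
interface RedisPoolRow { current_tenants: number; tenant_capacity: number; }

export function passesCapacity(
  appServer: AppServerRow,
  dbShard: ShardCapacityRow,
  redisPool: RedisPoolRow,
  limits: { memoryPctMax: number; maxTenantsPerNode: number }, // defaults: 70, 15-20
): boolean {
  return (
    appServer.status === "active" &&
    appServer.memory_pct < limits.memoryPctMax &&
    appServer.tenant_count < limits.maxTenantsPerNode &&
    dbShard.current_tenant_count < dbShard.tenant_capacity &&
    redisPool.current_tenants < redisPool.tenant_capacity
  );
}
```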
2.5 Step 5 — Rank candidates
Default algorithm: best-fit by composite score.
```
score(candidate) =
    w_app_load   * (1 - app_memory_pct)
  + w_app_count  * (1 - tenant_count / max_tenants_per_node)
  + w_db_load    * (1 - db_tenant_count / db_capacity)
  + w_redis_load * (1 - redis_current / redis_capacity)
  + w_affinity   * affinity_bonus(tenant.org_id, candidate.app_server_id)
```

Default weights (subject to ops tuning):
| Weight | Default | Purpose |
|---|---|---|
| `w_app_load` | 0.40 | Avoid memory hot spots |
| `w_app_count` | 0.20 | Prevent excessive site count per bench |
| `w_db_load` | 0.20 | Spread tenants across shards |
| `w_redis_load` | 0.10 | Avoid Redis eviction pressure |
| `w_affinity` | 0.10 | Co-locate tenants of the same parent organization for cache locality |
`affinity_bonus` is non-zero only when previous tenants of the same `parent_org_id` already live on the candidate app server (and the candidate is not already overloaded).
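A sketch of the composite score with the default weights from the table; the `Candidate` shape and the pre-computed `affinityBonus` field are illustrative, not the shipped types.

```ts
// Default weights from the table above; subject to ops tuning.
const WEIGHTS = { appLoad: 0.4, appCount: 0.2, dbLoad: 0.2, redisLoad: 0.1, affinity: 0.1 };

interface Candidate {
  appMemoryPct: number;   // 0..1 fraction of memory in use
  tenantCount: number;
  maxTenantsPerNode: number;
  dbTenantCount: number;
  dbCapacity: number;
  redisCurrent: number;
  redisCapacity: number;
  affinityBonus: number;  // 0..1, non-zero only when parent_org_id already present
}

export function score(c: Candidate): number {
  return (
    WEIGHTS.appLoad * (1 - c.appMemoryPct) +
    WEIGHTS.appCount * (1 - c.tenantCount / c.maxTenantsPerNode) +
    WEIGHTS.dbLoad * (1 - c.dbTenantCount / c.dbCapacity) +
    WEIGHTS.redisLoad * (1 - c.redisCurrent / c.redisCapacity) +
    WEIGHTS.affinity * c.affinityBonus
  );
}
```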
2.6 Step 6 — Reserve from site_pool
After ranking, the engine reserves an available site from the chosen app server’s pool (atomic UPDATE with SELECT ... FOR UPDATE semantics, or D1 conditional WHERE state = 'available').
If no site is available on the chosen app server:
- Re-rank candidates excluding empty-pool app servers.
- If all candidate app servers have empty pools: enqueue a `site_pool_refill` job and either (a) fall back to on-demand `bench new-site` (slower path, configurable) or (b) hold the placement and retry after warm-up.
Pool reservation rules and refill SLO live in Site Pool Strategy.
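A minimal sketch of the D1 conditional-update reservation described in step 6, assuming the Worker's D1 binding. The `site_pool` column names (`state`, `reserved_for_tenant_id`, `site_name`) are assumptions for illustration.

```ts
// Reserve one available pre-warmed site on the chosen app server.
// Column names are assumptions; the conditional WHERE clause is the guard
// against double-assignment.
export async function reserveSite(
  db: D1Database,
  appServerId: string,
  tenantId: string,
): Promise<string | null> {
  // Pick a candidate, then claim it with a conditional UPDATE; if another
  // placement claimed it first, meta.changes is 0 and we report no reservation.
  const candidate = await db
    .prepare(
      `SELECT id, site_name FROM site_pool
        WHERE app_server_id = ?1 AND state = 'available' LIMIT 1`,
    )
    .bind(appServerId)
    .first<{ id: string; site_name: string }>();
  if (!candidate) return null;

  const update = await db
    .prepare(
      `UPDATE site_pool
          SET state = 'reserved', reserved_for_tenant_id = ?1
        WHERE id = ?2 AND state = 'available'`,
    )
    .bind(tenantId, candidate.id)
    .run();

  return update.meta.changes === 1 ? candidate.site_name : null;
}
```

A `null` return maps onto the re-rank / refill path above.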
3. Scale-out triggers
When no capacity-passing candidate exists, Placement records a scale_out_required event and:
- Inserts a row into `scaling_events` (already exists — see `migrations/0044_hybrid_multisite.sql`).
- Enqueues a workflow to `InfraProvider.createServer({ role, region, spec })`.
- Holds the tenant in `provisioning` state with status text `awaiting_capacity`.
Scale-out per role (app / db-shard / redis-pool) can fire independently. The thresholds inherit from ADR-001 §“Scaling Policy: 70% Threshold”.
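The three actions above, sketched with injected dependencies. The helper shapes (`insertScalingEvent`, `enqueueCreateServer`, `setTenantStatus`) are hypothetical; only the table, workflow call, and status text come from this section.

```ts
// Hypothetical dependency shapes; the real workflow layer in the Worker differs.
interface ScaleOutDeps {
  insertScalingEvent(e: { role: string; region: string; reason: string }): Promise<void>;
  enqueueCreateServer(args: { role: string; region: string; spec: string }): Promise<void>;
  setTenantStatus(tenantId: string, state: string, statusText: string): Promise<void>;
}

export async function handleNoCapacity(
  deps: ScaleOutDeps,
  req: { tenantId: string; region: string },
  role: "app" | "db-shard" | "redis-pool",
): Promise<void> {
  // Record the scale_out_required event, ask the InfraProvider for a new server,
  // and park the tenant until capacity exists.
  await deps.insertScalingEvent({ role, region: req.region, reason: "scale_out_required" });
  await deps.enqueueCreateServer({ role, region: req.region, spec: "default" });
  await deps.setTenantStatus(req.tenantId, "provisioning", "awaiting_capacity");
}
```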
4. Bin-packing extension (Phase 3)
The default best_fit algorithm is sufficient for thousands of tenants. Beyond that, multi-dimensional bin-packing is required.
Dimensions:
- App memory, app CPU, app worker slots
- DB tenant count, DB disk
- Redis memory
Algorithm: First-Fit Decreasing on the dominant dimension per region, with affinity tie-breaks. Implementation reference for the simulation engine already exists in the `ResourcePoolSimulation*` types in `src/types.ts`; the production engine reuses that scoring code path.
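A compact sketch of First-Fit Decreasing on a single dominant dimension (memory), with affinity as a tie-break among feasible nodes. The shapes are illustrative; the real engine packs across all the dimensions listed above.

```ts
interface Node { id: string; freeMemGb: number; orgIds: Set<string>; }
interface TenantDemand { tenantId: string; orgId: string; memGb: number; }

// First-Fit Decreasing on the dominant dimension (memory here), with an
// affinity tie-break: among nodes that fit, prefer one already hosting the org.
export function packFFD(nodes: Node[], demands: TenantDemand[]): Map<string, string> {
  const placement = new Map<string, string>();
  const sorted = [...demands].sort((a, b) => b.memGb - a.memGb); // decreasing demand

  for (const d of sorted) {
    const fitting = nodes.filter((n) => n.freeMemGb >= d.memGb);
    if (fitting.length === 0) continue; // would trigger scale-out in production
    const target = fitting.find((n) => n.orgIds.has(d.orgId)) ?? fitting[0]; // first fit
    target.freeMemGb -= d.memGb;
    target.orgIds.add(d.orgId);
    placement.set(d.tenantId, target.id);
  }
  return placement;
}
```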
5. Affinity rules
| Affinity | Default | Rule |
|---|---|---|
| Parent org → same app server | on | Tenants sharing parent_org_id prefer same app server when score delta ≤ 0.05 |
| Parent org → same DB shard | off by default | Co-locating org tenants in one shard concentrates blast radius; opt-in only |
| Anti-affinity per Enterprise tenant | on | Enterprise tenants of different customers never share a server |
| Trial isolation from paying | on | Trial tenants only land on shards with plan_tier_lock IN (NULL, 'trial'); paying tenants never land on 'trial'-locked shards |
6. Forbidden placements (invariants)
The engine must reject any candidate combination that violates:
- App server `status != 'active'` → reject
- Plan tier mismatch with shard `plan_tier_lock` → reject
- Enterprise tenant landing on a server with `tenant_count > 0` → reject
- Trial tenant landing on a Business-only shard → reject
- App server / DB shard / Redis pool in different regions → reject
- `dataResidency` violation → reject
Invariants are checked after scoring (defense in depth) and any violation aborts the placement transaction.
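A sketch of that post-scoring check, returning the violated rule so the caller can abort the transaction with a reason. The field names are illustrative, not the shipped candidate shape.

```ts
// Post-scoring invariant check; returns null if allowed, else the violated rule.
export function violatedInvariant(c: {
  appServerStatus: string;
  appServerTenantCount: number;
  appServerRegion: string;
  dbShardRegion: string;
  redisPoolRegion: string;
  shardPlanTierLock: "trial" | "business" | null;
  plan: "trial" | "starter" | "business" | "enterprise";
  residencyOk: boolean;
}): string | null {
  if (c.appServerStatus !== "active") return "app_server_not_active";
  if (c.plan === "enterprise" && c.appServerTenantCount > 0) return "enterprise_not_dedicated";
  if (c.plan === "trial" && c.shardPlanTierLock === "business") return "trial_on_business_shard";
  if (c.plan === "starter" && c.shardPlanTierLock !== null) return "plan_tier_lock_mismatch";
  if (c.plan === "business" && c.shardPlanTierLock !== "business") return "plan_tier_lock_mismatch";
  if (c.appServerRegion !== c.dbShardRegion || c.dbShardRegion !== c.redisPoolRegion) {
    return "cross_region_placement";
  }
  if (!c.residencyOk) return "data_residency_violation";
  return null; // caller aborts the placement transaction on any non-null result
}
```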
7. Decision logging
Every placement writes:
- `tenant_allocations` row (existing — `migrations/0017_tenant_resource_allocation.sql`)
- `allocation_snapshots` row (existing — `migrations/0030_allocation_snapshots.sql`)
- `tenant_lifecycle_events` row with `from_state='trial'` or `'provisioning'`, `to_state='provisioning'`, `actor='placement_engine'`, `reason=PlacementDecision.selectionReason`
The decision block (algorithm, score, candidates evaluated, rejection reasons) is persisted to allocation_snapshots.decision_log for audit and retroactive analysis.
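A minimal sketch of writing the decision block into `allocation_snapshots.decision_log`. The import path and the other column names are assumptions; only the table and `decision_log` column come from this section.

```ts
import type { PlacementDecision } from "./placement-3server"; // path assumed

// Column names other than decision_log are illustrative.
export async function logDecision(
  db: D1Database,
  tenantId: string,
  decision: PlacementDecision,
): Promise<void> {
  await db
    .prepare(
      `INSERT INTO allocation_snapshots (tenant_id, decision_log, created_at)
       VALUES (?1, ?2, datetime('now'))`,
    )
    .bind(tenantId, JSON.stringify(decision))
    .run();
}
```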
8. Operator overrides
Operators can override placement via `POST /internal/placement-overrides` (existing pattern in `src/internal.ts`):
{ "tenant_id": "...", "force_app_server_id": "app-sgp-007", "force_db_shard_id": "db-shard-sgp-002", "reason": "Enterprise contract — colocate with parent tenant"}Overrides:
- Skip steps 4-5 (capacity / ranking)
- Still enforce invariants from §6
- Record `algorithm = 'manual'` in `allocation_snapshots`
9. Test surface
The placement engine has a dedicated test suite (`src/placement-3server.test.ts` and the simulation harness in the `ResourcePoolSimulation*` types in `src/types.ts`).
Mandatory test cases when extending the engine:
- P1: every order receives a placement (no `null` results)
- P2: no candidate exceeds memory / tenant-count thresholds
- P3: region + role invariants hold
- P4: enterprise tenants never share a server
- P5: monotonic invariant — adding a tenant never decreases a server's `tenant_count`
- P6 (new): `plan_tier_lock` violations never occur
- P7 (new): `dataResidency` violations never occur
- P8 (new): `site_pool` reservation is atomic (no double-assignment)
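As an illustration of the shape such a case might take, a sketch of P4 follows, assuming a vitest-style suite and a hypothetical in-memory `place()` / `seedFixtures()` helper pair; the shipped suite may structure this differently.

```ts
// Assumes vitest; place() and seedFixtures() are hypothetical test helpers.
import { describe, expect, it } from "vitest";
import { place, seedFixtures } from "./placement-test-helpers"; // hypothetical

describe("P4: enterprise isolation", () => {
  it("never co-locates enterprise tenants of different customers", async () => {
    const env = await seedFixtures({ servers: 3 });
    const a = await place(env, { tenantId: "ent-a", plan: "enterprise", region: "sg" });
    const b = await place(env, { tenantId: "ent-b", plan: "enterprise", region: "sg" });
    expect(a.appServerId).not.toBe(b.appServerId);
  });
});
```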
10. References
- Frappe SaaS Multitenant Docker Standard (ADR-002)
- Control Plane Direct Provisioning (ADR-003) — placement results are realised by the Worker calling Hetzner Cloud API + Agent inbound exec directly (no Pulumi/Ansible)
- Site Pool Strategy
- Tenant Lifecycle
- `prego-control-plane/src/placement-3server.ts`
- `prego-control-plane/src/tenant-allocation.ts`
- ADR-001 Scaling Policy (still in force)