Tenant Placement Policy

Companion to: Frappe SaaS Multitenant Docker Standard (ADR-002)
Related code: prego-control-plane/src/placement-3server.ts, tenant-allocation.ts

The Placement Engine decides which app server, DB shard, Redis pool, and pre-warmed Site a new tenant lands on, given region / plan / capacity / data-residency constraints.

This document is the contract for that engine. The current code in placement-3server.ts covers a subset of the rules below (region + plan + capacity); the missing items (db_shards.plan_tier_lock, site_pool reservation, infra_provider selection) are scoped for Phase 2 in ADR-002 §17.

One tenant, one server (production default)

prego-control-plane defaults to one new Hetzner server per new tenant (all plan tiers, including free trial and Enterprise paths that enqueue provisioning). Bin-packing onto existing shared app nodes is off unless the Worker sets PREGO_SHARED_PLACEMENT_LEGACY=1 (or true / yes). D1 audit JSON uses policy_json.algorithm: one_tenant_one_server_v1 when this policy applies.
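A minimal sketch of how the Worker could gate legacy shared placement on that variable and pick the audit algorithm; the env plumbing and helper names are illustrative assumptions, not the actual placement-3server.ts code:

// Sketch only: env shape and helper names are assumptions, not the real Worker implementation.
type PolicyAlgorithm = "one_tenant_one_server_v1" | "best_fit";

function isLegacySharedPlacementEnabled(env: Record<string, string | undefined>): boolean {
  const raw = (env.PREGO_SHARED_PLACEMENT_LEGACY ?? "").trim().toLowerCase();
  return raw === "1" || raw === "true" || raw === "yes";
}

function selectPolicyAlgorithm(env: Record<string, string | undefined>): PolicyAlgorithm {
  // Default: provision one fresh single-node Hetzner server per tenant.
  // Bin-packing onto existing shared app nodes only when the legacy flag is set.
  return isLegacySharedPlacementEnabled(env) ? "best_fit" : "one_tenant_one_server_v1";
}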

Under this policy, MariaDB, Redis, and Frappe bench are intended to run on the same VM (single-node stack). The separated PlacementResult shape in §1.2 below remains the long-term contract for multi-shard designs; new rows may instead be interpreted as “all IDs refer to the same host” until the type definitions are refactored.

Canonical write-up: prego-control-plane ADR — One tenant, one server. Compose guidance: prego-docker — Single-node Frappe stack.


1. Inputs and outputs

1.1 PlacementRequest

export interface PlacementRequest {
  tenantId: string;
  region: "sg" | "eu" | "us";
  plan: "trial" | "starter" | "business" | "enterprise";
  expectedUsers?: number;   // default by plan tier
  storageQuotaGb?: number;  // default by plan tier
  dataResidency?: string;   // ISO country code; restricts infra_provider candidates
}
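Because expectedUsers and storageQuotaGb default by plan tier, the engine can fill them in before the pipeline runs. A minimal sketch, assuming illustrative per-plan numbers (the real defaults live in the control plane, not in this document):

// PLAN_DEFAULTS values are placeholders, not the production plan-tier defaults.
const PLAN_DEFAULTS: Record<PlacementRequest["plan"], { expectedUsers: number; storageQuotaGb: number }> = {
  trial:      { expectedUsers: 5,   storageQuotaGb: 5 },
  starter:    { expectedUsers: 10,  storageQuotaGb: 20 },
  business:   { expectedUsers: 50,  storageQuotaGb: 100 },
  enterprise: { expectedUsers: 200, storageQuotaGb: 500 },
};

function applyPlanDefaults(req: PlacementRequest): PlacementRequest {
  const d = PLAN_DEFAULTS[req.plan];
  return {
    ...req,
    expectedUsers: req.expectedUsers ?? d.expectedUsers,
    storageQuotaGb: req.storageQuotaGb ?? d.storageQuotaGb,
  };
}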

1.2 PlacementResult

export interface PlacementResult {
  appServerId: string;
  dbShardId: string;
  redisPoolId: string;
  siteName: string;     // promoted from site_pool
  databaseName: string; // tenant_<slug>_db
  infraProvider: "hetzner" | "aws" | "gcp";
  decision: PlacementDecision;
}

export interface PlacementDecision {
  algorithm: "best_fit" | "first_fit" | "round_robin" | "new_server" | "ai_recommended";
  candidatesEvaluated: number;
  selectionReason: string;
  confidenceScore?: number; // 0..1
  rejectedCandidates: { id: string; reason: string }[];
}

The decision block is persisted to allocation_snapshots (already exists — see migrations/0030_allocation_snapshots.sql).
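A sketch of that write from a Worker with a D1 binding; decision_log is named in §7, but the other column names here are assumptions about the 0030_allocation_snapshots.sql schema:

// Column names other than decision_log are assumed; check migrations/0030_allocation_snapshots.sql.
async function persistDecision(db: D1Database, tenantId: string, result: PlacementResult): Promise<void> {
  await db
    .prepare(
      `INSERT INTO allocation_snapshots
         (tenant_id, app_server_id, db_shard_id, redis_pool_id, decision_log, created_at)
       VALUES (?1, ?2, ?3, ?4, ?5, datetime('now'))`
    )
    .bind(tenantId, result.appServerId, result.dbShardId, result.redisPoolId, JSON.stringify(result.decision))
    .run();
}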


2. Decision pipeline

flowchart TB
    req[PlacementRequest]
    req --> regionFilter[1 Region filter]
    regionFilter --> providerFilter[2 InfraProvider data residency filter]
    providerFilter --> planFilter[3 Plan tier filter]
    planFilter --> capacityFilter[4 Capacity filter]
    capacityFilter --> rank[5 Rank candidates]
    rank --> sitePoolReserve[6 Reserve from site_pool]
    sitePoolReserve --> result[PlacementResult]

    capacityFilter -.->|"empty pool"| scaleOut[Trigger scale-out via WorkflowEngine]
    sitePoolReserve -.->|"empty pool"| poolWarm[Trigger site_pool refill]

2.1 Step 1 — Region filter

  • Hard filter on servers.region == request.region.
  • If dataResidency is set, the result region must belong to the residency zone (e.g. KR residency may only land on sg until a dedicated kr region exists).

2.2 Step 2 — InfraProvider / data residency filter

  • Look up infra_providers rows where enabled = true and region matches.
  • If dataResidency is set, intersect with allowed providers.
  • Drop providers that have no servers passing later capacity checks.
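A minimal sketch of steps 1 and 2 combined; the residency-zone map (seeded with the KR → sg example above) and the provider row shape are illustrative assumptions, not the real infra_providers schema:

// Illustrative only: RESIDENCY_ZONES and ProviderRow are assumptions for the sketch.
const RESIDENCY_ZONES: Record<string, Array<PlacementRequest["region"]>> = {
  KR: ["sg"], // KR residency may only land on sg until a dedicated kr region exists
};

interface ProviderRow {
  id: string;
  provider: "hetzner" | "aws" | "gcp";
  region: PlacementRequest["region"];
  enabled: boolean;
}

function filterProviders(req: PlacementRequest, providers: ProviderRow[]): ProviderRow[] {
  const residencyRegions = req.dataResidency
    ? RESIDENCY_ZONES[req.dataResidency.toUpperCase()] ?? []
    : null; // no residency constraint
  return providers.filter(
    (p) =>
      p.enabled &&
      p.region === req.region &&                                           // step 1: hard region filter
      (residencyRegions === null || residencyRegions.includes(p.region))   // step 2: residency zone
  );
}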

2.3 Step 3 — Plan tier filter

| Plan       | App constraint                       | DB shard constraint                                                                     | Redis constraint                        |
| ---------- | ------------------------------------ | --------------------------------------------------------------------------------------- | ---------------------------------------- |
| trial      | Shared, must accept trial workloads  | db_shards.plan_tier_lock IN (NULL, 'trial')                                              | Shared                                   |
| starter    | Shared                               | db_shards.plan_tier_lock IS NULL                                                         | Shared                                   |
| business   | Shared                               | db_shards.plan_tier_lock = 'business'                                                    | Shared OR dedicated (per contract flag)  |
| enterprise | Dedicated (servers.tenant_count = 0) | Dedicated (db_shards.dedicated_for_tenant_id IS NULL AND assigned in this transaction)   | Dedicated                                |

Implementation: db_shards.plan_tier_lock is the first filter applied to candidate shards. This avoids placing a Starter tenant on a Business-only shard reserved for performance guarantees.
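A sketch of the shard-side check, assuming an illustrative ShardRow shape carrying the plan_tier_lock and dedicated_for_tenant_id columns named above:

// ShardRow is illustrative; the authoritative db_shards schema lives in the control-plane migrations.
interface ShardRow {
  id: string;
  plan_tier_lock: "trial" | "business" | null;
  dedicated_for_tenant_id: string | null;
}

function shardAllowsPlan(shard: ShardRow, plan: PlacementRequest["plan"]): boolean {
  if (plan === "trial")    return shard.plan_tier_lock === null || shard.plan_tier_lock === "trial";
  if (plan === "starter")  return shard.plan_tier_lock === null;
  if (plan === "business") return shard.plan_tier_lock === "business";
  // enterprise: dedicated shard, claimed inside the placement transaction
  return shard.dedicated_for_tenant_id === null;
}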

2.4 Step 4 — Capacity filter

A candidate (appServer, dbShard, redisPool) triple passes when all of:

  • appServer.memory_pct < PLACEMENT_MEMORY_PCT_MAX (default 70)
  • appServer.tenant_count < PLACEMENT_MAX_TENANTS_PER_NODE (default per ADR-001: 15-20)
  • dbShard.current_tenant_count < dbShard.tenant_capacity
  • redisPool.current_tenants < redisPool.tenant_capacity
  • appServer.status = 'active' (skip draining, terminated, failed)

Threshold env vars are already documented in prego-control-plane/wrangler.toml (PLACEMENT_*).
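A sketch of the capacity predicate over one (appServer, dbShard, redisPool) triple; the row shapes are assumptions, while the limits map onto the PLACEMENT_* variables above:

// Row shapes are illustrative; limits come from PLACEMENT_MEMORY_PCT_MAX and PLACEMENT_MAX_TENANTS_PER_NODE.
interface AppServerRow { id: string; memory_pct: number; tenant_count: number; status: string; }
interface DbShardCapacityRow { id: string; current_tenant_count: number; tenant_capacity: number; }
interface RedisPoolRow { id: string; current_tenants: number; tenant_capacity: number; }

function passesCapacity(
  app: AppServerRow,
  db: DbShardCapacityRow,
  redis: RedisPoolRow,
  limits: { memoryPctMax: number; maxTenantsPerNode: number }
): boolean {
  return (
    app.status === "active" &&
    app.memory_pct < limits.memoryPctMax &&
    app.tenant_count < limits.maxTenantsPerNode &&
    db.current_tenant_count < db.tenant_capacity &&
    redis.current_tenants < redis.tenant_capacity
  );
}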

2.5 Step 5 — Rank candidates

Default algorithm: best-fit by composite score.

score(candidate) =
    w_app_load   * (1 - app_memory_pct)
  + w_app_count  * (1 - tenant_count / max_tenants_per_node)
  + w_db_load    * (1 - db_tenant_count / db_capacity)
  + w_redis_load * (1 - redis_current / redis_capacity)
  + w_affinity   * affinity_bonus(tenant.org_id, candidate.app_server_id)

Default weights (subject to ops tuning):

| Weight       | Default | Purpose                                                               |
| ------------ | ------- | --------------------------------------------------------------------- |
| w_app_load   | 0.40    | Avoid memory hot spots                                                |
| w_app_count  | 0.20    | Prevent excessive site count per bench                                |
| w_db_load    | 0.20    | Spread tenants across shards                                          |
| w_redis_load | 0.10    | Avoid Redis eviction pressure                                         |
| w_affinity   | 0.10    | Co-locate tenants of the same parent organization for cache locality  |

affinity_bonus is non-zero only when previous tenants of the same parent_org_id already live on the candidate app server (and the candidate is not already overloaded).
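A sketch of the composite score with the default weights; the Candidate shape is an assumption layered on the formula above, with affinity_bonus already resolved to a number:

// Implements the scoring formula above; Candidate field names are assumptions for the sketch.
interface Candidate {
  appMemoryPct: number;       // 0..1
  tenantCount: number;
  maxTenantsPerNode: number;
  dbTenantCount: number;
  dbCapacity: number;
  redisCurrent: number;
  redisCapacity: number;
  affinityBonus: number;      // 0 unless parent-org tenants already live on this app server
}

const WEIGHTS = { appLoad: 0.4, appCount: 0.2, dbLoad: 0.2, redisLoad: 0.1, affinity: 0.1 };

function score(c: Candidate): number {
  return (
    WEIGHTS.appLoad   * (1 - c.appMemoryPct) +
    WEIGHTS.appCount  * (1 - c.tenantCount / c.maxTenantsPerNode) +
    WEIGHTS.dbLoad    * (1 - c.dbTenantCount / c.dbCapacity) +
    WEIGHTS.redisLoad * (1 - c.redisCurrent / c.redisCapacity) +
    WEIGHTS.affinity  * c.affinityBonus
  );
}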

2.6 Step 6 — Reserve from site_pool

After ranking, the engine reserves an available site from the chosen app server’s pool (atomic UPDATE with SELECT ... FOR UPDATE semantics, or D1 conditional WHERE state = 'available').
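A sketch of the D1 conditional-UPDATE variant; the site_pool column names are assumptions about that table's schema:

// Assumed site_pool columns; the authoritative schema lives in the control-plane migrations.
async function reserveSite(db: D1Database, appServerId: string, tenantId: string): Promise<string | null> {
  // The conditional UPDATE lets only one concurrent caller flip a row from 'available' to 'reserved'.
  const row = await db
    .prepare(
      `UPDATE site_pool
          SET state = 'reserved', reserved_for_tenant_id = ?1
        WHERE id = (SELECT id FROM site_pool
                     WHERE app_server_id = ?2 AND state = 'available'
                     LIMIT 1)
          AND state = 'available'
        RETURNING site_name`
    )
    .bind(tenantId, appServerId)
    .first<{ site_name: string }>();
  return row?.site_name ?? null; // null → fall through to the empty-pool handling below
}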

If no site is available on the chosen app server:

  1. Re-rank candidates excluding empty-pool app servers.
  2. If all candidate app servers have empty pools: enqueue a site_pool_refill job and either (a) fall back to on-demand bench new-site (slower path, configurable) or (b) hold the placement and retry after warm-up.

Pool reservation rules and refill SLO live in Site Pool Strategy.


3. Scale-out triggers

When no capacity-passing candidate exists, Placement records a scale_out_required event and:

  1. Inserts a row into scaling_events (already exists — see migrations/0044_hybrid_multisite.sql).
  2. Enqueues a workflow to InfraProvider.createServer({ role, region, spec }).
  3. Holds the tenant in provisioning state with status text awaiting_capacity.

Scale-out per role (app / db-shard / redis-pool) can fire independently. The thresholds inherit from ADR-001 §“Scaling Policy: 70% Threshold”.
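A sketch of that sequence under assumed names; the scaling_events columns, the workflow facade, and the tenant status update are illustrative, not the real control-plane code:

// Illustrative wiring of the three steps in §3; table columns and workflow names are assumptions.
type ServerRole = "app" | "db-shard" | "redis-pool";

interface WorkflowQueue {
  enqueue(name: string, payload: unknown): Promise<void>;
}

async function triggerScaleOut(
  db: D1Database,
  workflows: WorkflowQueue,
  tenantId: string,
  role: ServerRole,
  region: PlacementRequest["region"]
): Promise<void> {
  // 1. Record the scale_out_required event.
  await db
    .prepare(`INSERT INTO scaling_events (tenant_id, role, region, reason, created_at)
              VALUES (?1, ?2, ?3, 'scale_out_required', datetime('now'))`)
    .bind(tenantId, role, region)
    .run();
  // 2. Enqueue the workflow that calls InfraProvider.createServer({ role, region, spec }).
  await workflows.enqueue("infra.create_server", { role, region, spec: "default" });
  // 3. Hold the tenant in provisioning with status text awaiting_capacity.
  await db
    .prepare(`UPDATE tenants SET status = 'provisioning', status_text = 'awaiting_capacity' WHERE id = ?1`)
    .bind(tenantId)
    .run();
}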


4. Bin-packing extension (Phase 3)

The default best_fit algorithm is sufficient for thousands of tenants. Beyond that, multi-dimensional bin-packing is required.

Dimensions:

  • App memory, app CPU, app worker slots
  • DB tenant count, DB disk
  • Redis memory

Algorithm: First-Fit Decreasing on the dominant dimension per region, with affinity tie-breaks. Implementation reference for the simulation engine already exists in src/types.ts ResourcePoolSimulation* types; the production engine reuses that scoring code path.
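A minimal First-Fit Decreasing sketch over a single dominant dimension; the Item/Bin shapes are illustrative, and the production engine would reuse the ResourcePoolSimulation* scoring path rather than this standalone helper:

// FFD on the dominant dimension: sort tenants by demand (descending), place each in the first bin that fits.
interface Item { id: string; demand: number }                  // e.g. app memory for memory-dominant regions
interface Bin  { id: string; capacity: number; used: number }

function firstFitDecreasing(items: Item[], bins: Bin[]): Map<string, string> {
  const placement = new Map<string, string>(); // itemId -> binId
  const sorted = [...items].sort((a, b) => b.demand - a.demand);
  for (const item of sorted) {
    const bin = bins.find((b) => b.used + item.demand <= b.capacity);
    if (!bin) continue; // no capacity anywhere → a scale-out trigger (§3) would fire instead
    bin.used += item.demand;
    placement.set(item.id, bin.id);
  }
  return placement;
}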


5. Affinity rules

| Affinity                             | Default | Rule                                                                                                                          |
| ------------------------------------ | ------- | ----------------------------------------------------------------------------------------------------------------------------- |
| Parent org → same app server         | on      | Tenants sharing parent_org_id prefer same app server when score delta ≤ 0.05                                                   |
| Parent org → same DB shard           | off     | Co-locating org tenants in one shard concentrates blast radius; opt-in only                                                    |
| Anti-affinity per Enterprise tenant  | on      | Enterprise tenants of different customers never share a server                                                                 |
| Trial isolation from paying          | on      | Trial tenants only land on shards with plan_tier_lock IN (NULL, 'trial'); paying tenants never land on 'trial'-locked shards   |

6. Forbidden placements (invariants)

The engine must reject any candidate combination that violates:

  1. App server status != 'active' → reject
  2. Plan tier mismatch with shard plan_tier_lock → reject
  3. Enterprise tenant landing on a server with tenant_count > 0 → reject
  4. Trial tenant landing on a Business-only shard → reject
  5. App server / DB shard / Redis pool in different regions → reject
  6. dataResidency violation → reject

Invariants are checked after scoring (defense in depth) and any violation aborts the placement transaction.
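A sketch of that post-scoring check, reusing the illustrative AppServerRow/ShardRow types and the shardAllowsPlan helper from the §2.3-2.4 sketches (all of which are assumptions, not the real code):

// Defense in depth: re-check §6 after scoring; any throw aborts the placement transaction.
interface FinalCandidate {
  app: AppServerRow;
  shard: ShardRow;
  regions: { app: string; db: string; redis: string };
  residencyOk: boolean; // outcome of the step-2 residency filter, re-evaluated here
}

function assertInvariants(req: PlacementRequest, c: FinalCandidate): void {
  const fail = (reason: string) => { throw new Error(`placement invariant violated: ${reason}`); };
  if (c.app.status !== "active") fail("app server not active");
  if (!shardAllowsPlan(c.shard, req.plan)) fail("plan tier mismatch with shard plan_tier_lock");
  if (req.plan === "enterprise" && c.app.tenant_count > 0) fail("enterprise tenant on non-empty server");
  if (req.plan === "trial" && c.shard.plan_tier_lock === "business") fail("trial tenant on business-only shard");
  if (new Set([c.regions.app, c.regions.db, c.regions.redis]).size !== 1) fail("app server / DB shard / Redis pool in different regions");
  if (!c.residencyOk) fail("dataResidency violation");
}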


7. Decision logging

Every placement writes its decision block (algorithm, score, candidates evaluated, rejection reasons) to allocation_snapshots.decision_log for audit and retroactive analysis.


8. Operator overrides

Operators can override placement via POST /internal/placement-overrides (existing pattern in src/internal.ts):

{
  "tenant_id": "...",
  "force_app_server_id": "app-sgp-007",
  "force_db_shard_id": "db-shard-sgp-002",
  "reason": "Enterprise contract — colocate with parent tenant"
}

Overrides:

  • Skip steps 4-5 (capacity / ranking)
  • Still enforce invariants from §6
  • Record algorithm = 'manual' in allocation_snapshots

9. Test surface

The placement engine has a dedicated test suite (src/placement-3server.test.ts and the simulation harness in src/types.ts ResourcePoolSimulation*).

Mandatory test cases when extending the engine:

  1. P1: every order receives a placement (no null results)
  2. P2: no candidate exceeds memory / tenant-count thresholds
  3. P3: region + role invariants hold
  4. P4: enterprise tenants never share a server
  5. P5: monotonic invariant — adding a tenant never decreases a server’s tenant_count
  6. P6 (new): plan_tier_lock violations never occur
  7. P7 (new): dataResidency violations never occur
  8. P8 (new): site_pool reservation is atomic (no double-assignment)
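A sketch of what the P6 case could look like; the vitest framework and the reuse of the illustrative shardAllowsPlan helper from §2.3 are assumptions, not the shape of the existing placement-3server.test.ts suite:

// Hypothetical P6 case built on the §2.3 sketch; adapt to the real suite's fixtures and entry point.
import { describe, it, expect } from "vitest";

describe("P6: plan_tier_lock violations never occur", () => {
  it("never allows a starter tenant on a business-locked shard", () => {
    const businessShard: ShardRow = { id: "db-shard-1", plan_tier_lock: "business", dedicated_for_tenant_id: null };
    expect(shardAllowsPlan(businessShard, "starter")).toBe(false);
  });

  it("keeps paying tenants off 'trial'-locked shards", () => {
    const trialShard: ShardRow = { id: "db-shard-2", plan_tier_lock: "trial", dedicated_for_tenant_id: null };
    expect(shardAllowsPlan(trialShard, "trial")).toBe(true);
    expect(shardAllowsPlan(trialShard, "business")).toBe(false);
  });
});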

10. References

  • ADR-001: Scaling Policy: 70% Threshold (capacity thresholds used in §2.4 and §3)
  • ADR-002: Frappe SaaS Multitenant Docker Standard (§17, Phase 2 scope)
  • prego-control-plane ADR: One tenant, one server
  • prego-docker: Single-node Frappe stack
  • Site Pool Strategy (pool reservation rules and refill SLO)
