
Scope

  • Monitoring: Hetzner VMs exist, and their usage is visible in the Control Plane D1 database and the operator UI.
  • Provisioning: the “base set” is the pipeline that creates or selects App, DB, and Redis capacity for each placement, driven by the provision-tenant workflow.

Definition of done — monitoring

  1. Every production shared node that accepts tenants has a stable nodes.node_id. Under normal conditions, at least one POST /internal/server-metrics for that node succeeds in every 5-minute window.
  2. cpu_pct, memory_pct, disk_pct, and tenant_count are populated as specified in the metrics push runbook, which must define an explicit policy for how tenant_count is computed.
  3. An operator can confirm node health via GET /internal/nodes or /cp/infra, authenticating with a Bearer token or the operator shell as documented for the deployment.
  4. Hetzner Cloud API tokens are not stored on the Control Plane Worker; collectors run on the node or in approved automation.
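A node-side collector covering items 1 and 2 could look roughly like the following. This is a minimal sketch for Linux hosts, and several parts of it are assumptions:

- /internal/server-metrics accepts a flat JSON body containing node_id, cpu_pct, memory_pct, disk_pct, and tenant_count.
- The push authenticates with a Bearer token.
- The helper names (pct, read_cpu_pct, read_memory_pct, build_payload, collect, push) and the base_url parameter are hypothetical.

tenant_count is passed in by the caller because its policy belongs to the metrics push runbook. In line with item 4, no Hetzner token is involved.

```python
# Minimal node-side metrics push sketch (Linux). Hypothetical: the payload
# shape, Bearer auth for the push, and all helper names below.
import json
import shutil
import time
import urllib.request


def pct(used: float, total: float) -> float:
    """Return used/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * used / total, 1)


def read_cpu_pct(interval: float = 1.0) -> float:
    """Busy share of CPU time between two samples of /proc/stat."""
    def sample() -> tuple[int, int]:
        with open("/proc/stat") as f:
            # First line: cpu user nice system idle iowait irq softirq steal ...
            fields = [int(x) for x in f.readline().split()[1:9]]
        return fields[3] + fields[4], sum(fields)  # (idle + iowait, total)

    idle1, total1 = sample()
    time.sleep(interval)
    idle2, total2 = sample()
    busy = (total2 - total1) - (idle2 - idle1)
    return pct(busy, total2 - total1)


def read_memory_pct(path: str = "/proc/meminfo") -> float:
    """Memory in use, derived from MemTotal and MemAvailable (kB)."""
    values = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])
    return pct(values["MemTotal"] - values["MemAvailable"], values["MemTotal"])


def build_payload(node_id: str, cpu_pct: float, memory_pct: float,
                  disk_pct: float, tenant_count: int) -> dict:
    """Validate ranges and return the JSON body for /internal/server-metrics."""
    for name, value in (("cpu_pct", cpu_pct), ("memory_pct", memory_pct),
                        ("disk_pct", disk_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} out of range: {value}")
    if tenant_count < 0:
        raise ValueError("tenant_count must be non-negative")
    return {"node_id": node_id, "cpu_pct": cpu_pct, "memory_pct": memory_pct,
            "disk_pct": disk_pct, "tenant_count": tenant_count}


def collect(node_id: str, tenant_count: int) -> dict:
    """Sample this host; the caller supplies tenant_count per the runbook policy."""
    disk = shutil.disk_usage("/")
    return build_payload(node_id, read_cpu_pct(), read_memory_pct(),
                         pct(disk.used, disk.total), tenant_count)


def push(payload: dict, base_url: str, token: str, timeout: float = 10.0) -> int:
    """POST the payload; urlopen raises HTTPError on 4xx/5xx responses."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/internal/server-metrics",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling collect() and then push() once a minute, for example from cron or a systemd timer, leaves several retries' worth of headroom inside the 5-minute window if an individual push fails.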

Definition of done — provisioning (base set)

  1. Shared / use existing: For a tenant where create_new_server is false and target_server_id is set, provision-tenant resolves the host from GET /internal/nodes/:node_id and completes without running Pulumi. Downstream jobs (DNS, and Ansible where configured) either succeed or are intentionally skipped under documented exceptions.
  2. Shared / new server: When create_new_server is true and dedicated is false, Pulumi runs against the {region} stack. The new node is then registered via POST /internal/nodes, and the job reaches Completed, with trace stages visible when trace_id is set.
  3. Dedicated: When dedicated is true, Pulumi uses the stack dedicated-{sanitized_tenant_id} (as defined in the workflow) and produces the three-server resource model described in the product docs. Whether the job completes or fails, its status is recorded in provision_jobs along with audit rows.
  4. Idempotency: Re-running with the same job_id under the existing idempotency rules does not create duplicate infrastructure. Follow the provision pipeline field mapping and the existing job guards.
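The branching in items 1 to 4 can be sketched as a pure function from placement flags to a plan, plus a guard keyed by job_id. This is an illustration under assumptions, not the workflow's actual code:

- Placement, Plan, sanitize, plan_for, and JobGuard are hypothetical names.
- The sanitization rule is a guess at what sanitized_tenant_id means.
- target_server_id is assumed to be the node_id passed to GET /internal/nodes/:node_id.
- The in-memory dict stands in for the real provision_jobs guards.

Dedicated is checked first, since the other two paths are the shared ones.

```python
# Sketch of provision-tenant path selection plus a job_id guard.
# Hypothetical: every name below, the sanitize() rule, and the in-memory
# dict standing in for provision_jobs.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    tenant_id: str
    region: str
    create_new_server: bool
    dedicated: bool
    target_server_id: Optional[str] = None


@dataclass(frozen=True)
class Plan:
    path: str               # "shared-existing", "shared-new" or "dedicated"
    run_pulumi: bool
    stack: Optional[str]    # Pulumi stack name when run_pulumi is True
    node_id: Optional[str]  # assumed equal to target_server_id for existing nodes


def sanitize(tenant_id: str) -> str:
    """Assumed rule: lowercase, replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9-]+", "-", tenant_id.lower()).strip("-")


def plan_for(p: Placement) -> Plan:
    """Map placement flags to one of the three provisioning paths."""
    if p.dedicated:
        return Plan("dedicated", True, f"dedicated-{sanitize(p.tenant_id)}", None)
    if p.create_new_server:
        return Plan("shared-new", True, p.region, None)
    if p.target_server_id:
        return Plan("shared-existing", False, None, p.target_server_id)
    raise ValueError("shared placement needs create_new_server or target_server_id")


class JobGuard:
    """Remembers the plan for each job_id so re-runs do not re-provision."""

    def __init__(self) -> None:
        self._plans: dict[str, Plan] = {}

    def start(self, job_id: str, placement: Placement) -> tuple[Plan, bool]:
        """Return (plan, is_new); a repeated job_id gets the stored plan back."""
        if job_id in self._plans:
            return self._plans[job_id], False
        plan = plan_for(placement)
        self._plans[job_id] = plan
        return plan, True
```

Keeping plan selection free of side effects makes each path easy to unit-test. The Pulumi run, node registration, and DNS or Ansible jobs would then act on the returned plan.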

E2E scenarios (acceptance)

| ID | Scenario | Pass criteria |
|----|----------|---------------|
| E1 | Existing node, shared tenant | target_server_id resolves; no Pulumi run; tenant DNS step outcome recorded; CP callbacks fire as designed. |
| E2 | New shared stack node | Pulumi succeeds; new nodes row; metrics agent can use the new node_id; provision job Completed. |
| E3 | Dedicated tenant | Dedicated stack selected; three-server outputs match prego-pulumi; the E2 completion criteria apply per org policy. |
| E4 | Metrics only | After E1–E3, server_metrics shows a fresh updated_at for each live node. |
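The E4 check, like item 1 of the monitoring definition of done, comes down to comparing each live node's newest updated_at against a 5-minute window. This is a minimal sketch under these assumptions:

- server_metrics rows expose node_id and an ISO-8601 updated_at.
- Naive timestamps are UTC.
- parse_ts, stale_nodes, and MAX_AGE are hypothetical names.

```python
# Sketch of the E4 freshness check. Hypothetical: row shape, names, and
# treating naive timestamps as UTC.
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

MAX_AGE = timedelta(minutes=5)  # window from the monitoring definition of done


def parse_ts(value: str) -> datetime:
    """Parse ISO-8601, accepting a trailing 'Z' on older Python versions."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def stale_nodes(live_node_ids: Iterable[str], rows: Iterable[dict],
                now: Optional[datetime] = None) -> list[str]:
    """Live node_ids with no server_metrics row, or whose newest row is too old."""
    now = now or datetime.now(timezone.utc)
    latest: dict[str, datetime] = {}
    for row in rows:
        ts = parse_ts(row["updated_at"])
        node = row["node_id"]
        if node not in latest or ts > latest[node]:
            latest[node] = ts
    return sorted(n for n in live_node_ids
                  if n not in latest or now - latest[n] > MAX_AGE)
```

An empty result means every live node reported within the window. Any node_id returned is a candidate for checking its collector.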


Use scenarios E1–E4 in the table above for release validation.

Related documents: metrics push runbook, node_id mapping, field mapping
