English
Scope
- Monitoring: Hetzner VMs exist and usage is visible in Control Plane D1 and operator UI.
- Provisioning: “Base set” means the pipeline that creates or selects App/DB/Redis capacity per placement and provision-tenant workflow.
Definition of done — monitoring
- Every production shared node that accepts tenants has a stable `nodes.node_id` and receives at least one successful `POST /internal/server-metrics` per 5 minutes under normal conditions.
- `cpu_pct`, `memory_pct`, `disk_pct`, and `tenant_count` are populated according to the metrics push runbook (explicit policy for `tenant_count`).
- Operator can confirm health via `GET /internal/nodes` or `/cp/infra` (Bearer / operator shell as documented for the deployment).
- Hetzner Cloud API tokens are not stored on the Control Plane Worker; collectors run on the node or in approved automation.
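
The "populated" criterion above can be checked mechanically before a push is counted as successful. The following is a minimal sketch, not the Control Plane's actual validation: it assumes a flat JSON payload carrying `node_id` alongside the four metric fields, and it assumes percentages are expressed on a 0-100 scale. The function name `validate_metrics_payload` is hypothetical; the authoritative field rules live in the metrics push runbook.

```python
from typing import Any

# Percentage fields named in the definition of done.
REQUIRED_PCT_FIELDS = ("cpu_pct", "memory_pct", "disk_pct")


def validate_metrics_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks complete.

    Assumption: payload shape and the 0-100 range are illustrative only.
    """
    problems: list[str] = []

    node_id = payload.get("node_id")
    if not isinstance(node_id, str) or not node_id:
        problems.append("node_id missing or empty")

    for field in REQUIRED_PCT_FIELDS:
        value = payload.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field} missing or not numeric")
        elif not 0 <= value <= 100:
            problems.append(f"{field} out of range 0-100")

    tenant_count = payload.get("tenant_count")
    if (
        isinstance(tenant_count, bool)
        or not isinstance(tenant_count, int)
        or tenant_count < 0
    ):
        problems.append("tenant_count missing or not a non-negative integer")

    return problems
```

A collector could run this check locally before posting, so that a partially populated payload fails fast on the node rather than landing as a misleading row.
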
Definition of done — provisioning (base set)
- Shared / use existing: For a tenant where `create_new_server` is false and `target_server_id` is set, `provision-tenant` completes without running Pulumi, resolves host from `GET /internal/nodes/:node_id`, and downstream jobs (DNS, Ansible as configured) succeed or are intentionally skipped with documented exceptions.
- Shared / new server: For `create_new_server` true and `dedicated` false, Pulumi runs against stack `{region}`, registers a node via `POST /internal/nodes`, and the job reaches Completed with trace stages visible when `trace_id` is set.
- Dedicated: For `dedicated` true, Pulumi uses stack `dedicated-{sanitized_tenant_id}` (per workflow), producing the three-server resource model described in product docs; job completes or fails with recorded `provision_jobs` status and audit rows.
- Idempotency: Re-running with the same `job_id` / idempotency rules does not create duplicate infrastructure (follow provision pipeline field mapping and existing job guards).
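
The three provisioning paths above amount to a small decision over the request flags. The sketch below illustrates that branching under stated assumptions and is not the `provision-tenant` workflow itself: the function names are hypothetical, the sanitization rule (lowercase, non-alphanumeric runs collapsed to a hyphen) is a guess at what `sanitized_tenant_id` means, and giving `dedicated` precedence over `create_new_server` is an assumption the source does not spell out. The region value used in the usage note is only an example.

```python
import re
from typing import Optional


def sanitize_tenant_id(tenant_id: str) -> str:
    """Assumed rule: lowercase, collapse non-alphanumeric runs to '-', trim hyphens."""
    cleaned = re.sub(r"[^a-z0-9]+", "-", tenant_id.lower()).strip("-")
    if not cleaned:
        raise ValueError("tenant_id has no usable characters")
    return cleaned


def select_pulumi_stack(
    *,
    create_new_server: bool,
    dedicated: bool,
    region: str,
    tenant_id: str,
    target_server_id: Optional[str] = None,
) -> Optional[str]:
    """Return the Pulumi stack name to run, or None when Pulumi must be skipped."""
    if dedicated:
        # Dedicated path: one stack per tenant.
        return f"dedicated-{sanitize_tenant_id(tenant_id)}"
    if create_new_server:
        # Shared / new server path: the stack is named after the region.
        return region
    if not target_server_id:
        # Shared / use existing requires a node to resolve the host from.
        raise ValueError("target_server_id is required when create_new_server is false")
    # Shared / use existing: no Pulumi run at all.
    return None
```

For example, `select_pulumi_stack(create_new_server=True, dedicated=False, region="fsn1", tenant_id="acme")` returns `"fsn1"`, while the same call with `create_new_server=False` and a `target_server_id` returns `None`, signalling that the workflow should go straight to host resolution.
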
E2E scenarios (acceptance)
| ID | Scenario | Pass criteria |
|---|---|---|
| E1 | Existing node, shared tenant | target_server_id resolves; no Pulumi; tenant DNS step outcome recorded; CP callbacks fire as designed. |
| E2 | New shared stack node | Pulumi success; new nodes row; metrics agent can use new node_id; provision job Completed. |
| E3 | Dedicated tenant | Dedicated stack selected; three-server outputs match prego-pulumi; completion criteria in E2 apply per org policy. |
| E4 | Metrics only | After E1–E3, server_metrics shows fresh updated_at for each live node. |
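
Scenario E4 does not define "fresh" numerically. The sketch below assumes it means the same 5-minute window used in the monitoring definition of done, and shows how a release check might flag live nodes whose `updated_at` is missing or too old. The function name `stale_nodes` and the in-memory inputs are hypothetical; a real check would read `server_metrics` from the Control Plane.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping

# Assumption: "fresh" means updated within the 5-minute push window.
MAX_AGE = timedelta(minutes=5)


def stale_nodes(
    live_node_ids: Iterable[str],
    last_updated: Mapping[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> list[str]:
    """Return sorted node IDs with no metrics row or an updated_at older than max_age."""
    stale = []
    for node_id in live_node_ids:
        updated_at = last_updated.get(node_id)
        if updated_at is None or now - updated_at > max_age:
            stale.append(node_id)
    return sorted(stale)


# Usage with fixed, timezone-aware timestamps so the result is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = {
    "node-a": now - timedelta(minutes=2),   # fresh
    "node-b": now - timedelta(minutes=9),   # too old
}
result = stale_nodes(["node-a", "node-b", "node-c"], rows, now)
```

Here `result` lists `node-b` (stale) and `node-c` (no row at all); E4 would pass only when the list is empty.
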
Related
Metrics push runbook, node_id mapping, field mapping
Korean
Definition of done: monitoring
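- Every production shared node has a fixed `nodes.node_id`, and under normal conditions at least one `POST /internal/server-metrics` succeeds within every 5 minutes.
- The policy for computing CPU, memory, disk, and `tenant_count` is documented per the metrics push runbook.
- Operators can check status via `GET /internal/nodes` or `/cp/infra`.
- Hetzner API tokens are not placed on the Control Plane Worker.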
Definition of done: provisioning (base set)
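- Use existing node: host lookup and downstream steps work as designed without running Pulumi.
- New shared stack: after Pulumi succeeds against the region stack, the node is registered via `POST /internal/nodes` and the job reaches Completed.
- Dedicated: with the `dedicated-*` stack, the three-server model matches org policy, and job status and audit records are kept.
- Idempotency: follow the existing guards and documentation so that the same job does not create duplicate infrastructure.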
E2E scenarios
Use the E1–E4 table in the English section above for release verification.
Related documents: metrics push runbook, `node_id` mapping, field mapping