Onboarding launch stuck at “Check resource” / ~10%
What the UI means
- The label “Check resource” is the first row of the admin checklist. Control Plane shows it while there are no prego_ui_step_* rows yet for the job’s trace_id (or when it falls back to legacy progress because UI-step traces are absent).
- ~10% is often the quantized fallback when progress_from_ui is undefined (Pending + few trace_events).
So “stuck on step 1” usually means one of two things: the GitHub / Ansible pipeline has not written prego_ui_step_01 yet, or POST /internal/trace-events from Actions failed silently (possible in runs that predate this repo’s workflow hardening).
Operator checklist (ordered)
- D1 timeline for trace_id
  From repo prego-control-plane:
      cd prego-control-plane
      ./scripts/d1-provision-trace-timeline.sh "<trace_id>"
  Or, for local D1:
      D1_TARGET=local ./scripts/d1-provision-trace-timeline.sh "<trace_id>"
  Inspect:
  - provision_jobs: status, updated_at, region, plan_tier
  - workflow_dispatch_log: whether a dispatch row exists for that job_id
  - trace_events: presence of pipeline_started, prego_ui_step_01, resolve_server_ok, prego_ui_step_02, …
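Once the trace_events names for a trace_id are in hand, finding where the pipeline stopped is a matter of looking for the first expected event that is missing. The helper below is a minimal sketch of that check, not part of the repo: first_missing_event and EXPECTED_EVENTS are hypothetical names, the input is assumed to be one event name per line (the output format of d1-provision-trace-timeline.sh is not specified here), and the expected order is assumed to be the order in which the events are listed above.

```shell
# Expected events, in the order listed in the checklist (assumed order).
# The real pipeline emits more events after prego_ui_step_02; extend as needed.
EXPECTED_EVENTS="pipeline_started prego_ui_step_01 resolve_server_ok prego_ui_step_02"

# Hypothetical helper: reads event names (one per line) on stdin and prints
# the first expected event that is absent, or "none" if all are present.
# The body runs in a subshell, so its variables do not leak to the caller.
first_missing_event() (
  seen=$(cat)
  for event in $EXPECTED_EVENTS; do
    # -F fixed string, -x whole-line match, -q quiet
    if ! printf '%s\n' "$seen" | grep -Fqx -- "$event"; then
      printf '%s\n' "$event"
      exit 0
    fi
  done
  printf 'none\n'
)
```

For example, `printf 'pipeline_started\n' | first_missing_event` prints prego_ui_step_01, which matches the “stuck on step 1” case: the pipeline started but never wrote its first UI-step trace.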
- GitHub Actions
  Workflow: Provision Tenant (provision-tenant.yml). Find the run for the same job_id / trace_id inputs.
  - resolve-server failed before trace posts: misconfigured CONTROL_PLANE_API_KEY, wrong CONTROL_PLANE_URL, or no server target (target_server_id empty and create_new_server false). The job fails fast; trace steps now use curl -f so HTTP failures fail the step.
  - Empty trace_id input: early UI-step trace steps are skipped entirely (if: inputs.trace_id != '').
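The fail-fast conditions above can be checked by hand before re-running the workflow. The following is a rough sketch, assuming the secrets and workflow inputs are available as shell variables with the same names; check_resolve_server_inputs is a hypothetical helper, not a script in the repo. It can only detect empty values, so a key or URL that is set but wrong still has to be verified against the Actions run log.

```shell
# Hypothetical pre-flight check mirroring the conditions listed above.
# Assumes secrets/inputs are exposed as shell variables of the same name.
# The body runs in a subshell, so "exit" only ends the function.
check_resolve_server_inputs() (
  status=0
  if [ -z "${CONTROL_PLANE_API_KEY:-}" ]; then
    echo "CONTROL_PLANE_API_KEY is empty" >&2
    status=1
  fi
  if [ -z "${CONTROL_PLANE_URL:-}" ]; then
    echo "CONTROL_PLANE_URL is empty" >&2
    status=1
  fi
  # No server target: nothing to resolve and nothing to create.
  if [ -z "${target_server_id:-}" ] && [ "${create_new_server:-false}" != "true" ]; then
    echo "no server target: target_server_id is empty and create_new_server is not true" >&2
    status=1
  fi
  # Not fatal, but it explains missing prego_ui_step_* rows.
  if [ -z "${trace_id:-}" ]; then
    echo "warning: trace_id is empty; early UI-step trace steps will be skipped" >&2
  fi
  exit "$status"
)
```

An empty trace_id is reported as a warning rather than an error because, per the note above, the workflow skips those steps instead of failing.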
- Control Plane logs (Workers)
  Search JSON logs for:
  - provision_pipeline_trigger_skipped: missing queue + missing GITHUB_TOKEN / GITHUB_WORKFLOW_DISPATCH_URL
  - provision_workflow_dispatch_failed / _http_error / _fetch_failed
  - provision_job_accepted_but_pipeline_not_triggered: job row accepted but trigger returned false
  - provision_jobs_possibly_stuck (cron): stale Pending / Running
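The log events above can be collected into one filter so a single pass over the logs catches all of them. This is a sketch under two assumptions: the three dispatch variants expand to provision_workflow_dispatch_failed, provision_workflow_dispatch_http_error and provision_workflow_dispatch_fetch_failed, and the logs arrive as one JSON object per line on stdin. PROVISION_LOG_EVENTS and filter_provision_logs are hypothetical names.

```shell
# Event names from the list above, as one extended regular expression.
# The dispatch suffixes are assumed to share the prefix
# "provision_workflow_dispatch_".
PROVISION_LOG_EVENTS='provision_pipeline_trigger_skipped|provision_workflow_dispatch_(failed|http_error|fetch_failed)|provision_job_accepted_but_pipeline_not_triggered|provision_jobs_possibly_stuck'

# Hypothetical helper: reads log lines on stdin and keeps only those that
# mention one of the events. It matches on the raw line, so it does not
# depend on which JSON field carries the event name.
filter_provision_logs() {
  grep -E "$PROVISION_LOG_EVENTS"
}
```

If the Worker logs are streamed with Wrangler, one possible invocation is `wrangler tail --format json | filter_provision_logs`; any other source of line-delimited logs works the same way.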
- Browser (user session)
  - Network: job poll returns status, progress, provision_ui_current_label
  - Session: prego_provision_job_id present; if prego_provision_pipeline_never_triggered is 1, CP returned pipeline_triggered: false when the job was created (admin-web shows an alert banner).
Product / platform follow-ups
- Keep GitHub trace POST steps failing loudly (curl -f, no continue-on-error) so D1 matches Actions failure state.
- Surface pipeline_triggered: false on Launch (implemented in admin-web when CP exposes the field).
- Optional: persist last pipeline stage on provision_jobs for UI when traces are missing.
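To illustrate the first follow-up, here is a minimal sketch of a trace POST that fails loudly. Only the endpoint path, the curl -f flag and the CONTROL_PLANE_URL / CONTROL_PLANE_API_KEY names come from this page; the Authorization header, the JSON field names and the post_trace_event helper are assumptions for illustration and may not match the real workflow or the Control Plane contract.

```shell
# Sketch only: the header name and JSON field names are assumptions, not the
# documented contract of POST /internal/trace-events.
post_trace_event() (
  trace_id=$1
  event=$2
  # -f  : exit non-zero on HTTP status >= 400 instead of printing the body
  # -sS : hide the progress meter but still print errors to stderr
  curl -fsS -X POST "${CONTROL_PLANE_URL}/internal/trace-events" \
    -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"trace_id\":\"${trace_id}\",\"event\":\"${event}\"}"
)
```

The function returns curl's exit status unchanged, so a call such as `post_trace_event "$trace_id" prego_ui_step_01` in a workflow step fails that step on an HTTP error, provided the step does not set continue-on-error.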