ci/wiki/Architecture.md
ozan 33f967e42e docs: convert ci docs to the in-repo wiki/ standard + fix stale ECS facts
Adopt the team wiki convention (in-repo wiki/ folder, plain markdown) used in
tinqs/studio. Convert DEVOPS.md + PLAN.md and the heavy parts of README.md
into cross-linked wiki pages: Home, Architecture, DevOps-Reference,
Operations, Roadmap. Root README slimmed to a repo intro pointing at wiki/.

Corrects stale topology while converting:
- ECS cluster tinqs-git / EFS tinqs-git-repos retired 2026-06-05; platform now
  the standalone EC2 box tinqs-prod-gitea (ALB tinqs-git, ECR image, RDS).
- Records this session's fixes: deploy-label dry-run route, runner-name
  collisions, arikigame IAM bucket, and template deploy repointed ECS→EC2/SSM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:43:05 +01:00

Architecture

← Home · DevOps Reference · Operations · Roadmap

Push → Gitea webhook → Lambda (tinqs-ci-dispatch) → EC2 Spot → act_runner → job → self-terminate

Runners are ephemeral: one Spot instance per job, self-terminating on completion. Private-repo clones are authenticated via git config url.insteadOf injected in the runner user-data.

Key design decisions

  • Ephemeral Spot instances (not Fargate, not persistent runners) — cheapest, cleanest, no state to manage.
  • --ephemeral on act_runner register — the runner exits after one job, triggering shutdown -h now → the instance terminates. Without this, runners pile up as zombies (see the 25 May 2026 incident in Operations).
  • No local action cache: act_runner uses go-git internally, which ignores ~/.gitconfig. The url.insteadOf trick only works for the git binary (used by the checkout action), so action repos are cloned fresh each run. This is why tinqs/ci must stay public.
  • tinqs.com — Gitea's ROOT_URL is tinqs.com. The old git.tinqs.com subdomain is retired.

Composite actions

All actions are Bash-only composite actions (no Node.js runtime). Reference them as @v1, which resolves to the main branch.

| Action | What it does |
| --- | --- |
| tinqs/ci/checkout@v1 | Clone a repo from tinqs.com (sparse checkout, depth control, token auth) |
| tinqs/ci/setup-go@v1 | Install Go (skips if pre-baked in the AMI) |
| tinqs/ci/setup-node@v1 | Install Node.js + pnpm (skips if pre-baked) |
| tinqs/ci/setup-aws@v1 | Install AWS CLI + optional ECR login |

```yaml
steps:
  - uses: tinqs/ci/checkout@v1
    with:
      sparse: 'cmd/tstudio'
  - uses: tinqs/ci/setup-go@v1
  - uses: tinqs/ci/setup-aws@v1
    with:
      ecr-login: 'true'
```

Dispatcher (Lambda)

orchestrator/dispatch/main.go receives Gitea push webhooks, fetches .gitea/workflows/*.yml via the Gitea API, evaluates triggers (branch + path filters), reads each matched workflow's runs-on label, and launches a Spot instance with that label. Run state is tracked in DynamoDB.
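The trigger evaluation described above (branch + path filters) could look roughly like the sketch below. It is not the code in main.go: the `Trigger` type, the function names, and the handling of `dir/**` patterns are all assumptions made for illustration, and real workflow filter syntax is richer than this.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Trigger is a hypothetical, simplified view of a workflow's push filters.
type Trigger struct {
	Branches []string // glob patterns such as "main" or "release/*"; empty means any branch
	Paths    []string // patterns such as "cmd/tstudio/**"; empty means any path
}

// matchBranch reports whether branch matches at least one pattern.
func matchBranch(patterns []string, branch string) bool {
	if len(patterns) == 0 {
		return true
	}
	for _, p := range patterns {
		if ok, err := path.Match(p, branch); err == nil && ok {
			return true
		}
	}
	return false
}

// matchPath reports whether any changed file matches at least one pattern.
// A trailing "/**" is treated as "everything under this directory".
func matchPath(patterns, changed []string) bool {
	if len(patterns) == 0 {
		return true
	}
	for _, p := range patterns {
		for _, f := range changed {
			if strings.HasSuffix(p, "/**") {
				if strings.HasPrefix(f, strings.TrimSuffix(p, "**")) {
					return true
				}
				continue
			}
			if ok, err := path.Match(p, f); err == nil && ok {
				return true
			}
		}
	}
	return false
}

// shouldRun combines both filters for a push to the given ref.
// Refs that are not branches (for example tags) never match in this sketch.
func shouldRun(t Trigger, ref string, changed []string) bool {
	if !strings.HasPrefix(ref, "refs/heads/") {
		return false
	}
	branch := strings.TrimPrefix(ref, "refs/heads/")
	return matchBranch(t.Branches, branch) && matchPath(t.Paths, changed)
}

func main() {
	trig := Trigger{Branches: []string{"main"}, Paths: []string{"cmd/tstudio/**"}}
	fmt.Println(shouldRun(trig, "refs/heads/main", []string{"cmd/tstudio/main.go"}))
	fmt.Println(shouldRun(trig, "refs/heads/main", []string{"README.md"}))
}
```

Only workflows for which a check like `shouldRun` passes would go on to the label lookup and Spot launch.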

Routing by label (labelToSpot map in main.go):

| Label | Instance | Use |
| --- | --- | --- |
| go | t3.small | Go builds (tstudio, proxy, docgen) |
| docker | t3.medium | Docker image builds (platform, bot) |
| deploy | t3.micro | S3 sync, CloudFront invalidation, SSM template deploy |
| node | t3.medium | Frontend builds |
| godot | t3.medium | Game exports (future) |

runs-on: host is skipped by the dispatcher (it's for a standing registered runner, not Spot).
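As a rough illustration of the routing table and the `host` exception, here is a minimal sketch. The map name `labelToSpot` comes from the text above, but its value type and the `instanceTypeFor` helper are assumptions; the real map in main.go may carry more per-label settings.

```go
package main

import "fmt"

// labelToSpot mirrors the routing table: runs-on label -> EC2 instance type.
// Assumption: the real map may hold richer values than a plain string.
var labelToSpot = map[string]string{
	"go":     "t3.small",
	"docker": "t3.medium",
	"deploy": "t3.micro",
	"node":   "t3.medium",
	"godot":  "t3.medium",
}

// instanceTypeFor (hypothetical helper) returns the instance type for a label.
// ok is false for "host", which belongs to a standing registered runner,
// and for any label that is not in the table.
func instanceTypeFor(label string) (instanceType string, ok bool) {
	if label == "host" {
		return "", false
	}
	instanceType, ok = labelToSpot[label]
	return instanceType, ok
}

func main() {
	for _, label := range []string{"go", "deploy", "host", "unknown"} {
		if it, ok := instanceTypeFor(label); ok {
			fmt.Printf("%s -> launch Spot %s\n", label, it)
		} else {
			fmt.Printf("%s -> skipped by dispatcher\n", label)
		}
	}
}
```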

Fixed 2026-06-07: deploy-labelled jobs used to route to a separate executor Lambda (tinqs-ci-exec) that was deleted 26 May, so they silently hit a [DRY RUN] Would invoke executor no-op and never ran. They now fall through to the normal Spot path like every other label. A second bug — runner names derived from runID[:12] collided across same-commit deploys — was also fixed (names now use the full sanitised runID).
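To make the runner-name collision concrete, the sketch below contrasts a name built from `runID[:12]` with one built from the full sanitised runID. The run ID format shown (`<commit prefix>/<workflow file>`), the `runner-` prefix, and the sanitising rule are assumptions for illustration only; the actual format used by the dispatcher is not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitise keeps ASCII letters, digits and '-', and replaces everything else with '-'.
func sanitise(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '-':
			b.WriteRune(r)
		default:
			b.WriteRune('-')
		}
	}
	return b.String()
}

// truncatedName reproduces the old behaviour: only the first 12 characters are used.
func truncatedName(runID string) string {
	if len(runID) > 12 {
		runID = runID[:12]
	}
	return "runner-" + sanitise(runID)
}

// fullName reflects the fix: the whole sanitised runID is used.
func fullName(runID string) string {
	return "runner-" + sanitise(runID)
}

func main() {
	// Two deploy workflows triggered by the same (made-up) commit.
	a := "abc123def456/deploy-web.yml"
	b := "abc123def456/deploy-api.yml"

	fmt.Println(truncatedName(a) == truncatedName(b)) // true: the names collide
	fmt.Println(fullName(a) == fullName(b))           // false: the names stay distinct
}
```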

Runner lifecycle (user-data)

boot → git auth config (url.insteadOf with GITEA_TOKEN)
     → act_runner register --ephemeral --labels <label>:host
     → act_runner daemon (blocks until job completes)
     → EXIT trap → shutdown -h now → instance terminates
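Since the dispatcher launches each instance with this user-data, one way to picture the lifecycle is as a script rendered from Go. The sketch below is an assumption-heavy illustration, not the real user-data: the `RunnerParams` type, the template text, the token-in-URL form of `url.insteadOf`, and the extra `--no-interactive`, `--instance`, `--token` and `--name` flags on `act_runner register` are not taken from this page. It also assumes the instance is configured so that an OS shutdown terminates it.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// RunnerParams holds the per-job values substituted into the script (hypothetical type).
type RunnerParams struct {
	Host     string // Gitea host, e.g. "tinqs.com"
	GitToken string // token used for private-repo clones
	RegToken string // runner registration token
	Name     string // unique runner name
	Label    string // runs-on label, e.g. "go"
}

// userDataTmpl follows the four lifecycle steps: git auth, register, daemon, shutdown on exit.
const userDataTmpl = `#!/bin/bash
set -euo pipefail
# Power off on every exit path so the instance goes away even if a step fails.
trap 'shutdown -h now' EXIT
git config --global url."https://{{.GitToken}}@{{.Host}}/".insteadOf "https://{{.Host}}/"
act_runner register --no-interactive --ephemeral \
  --instance "https://{{.Host}}" --token "{{.RegToken}}" \
  --name "{{.Name}}" --labels "{{.Label}}:host"
# Blocks until the single job completes, then exits because of --ephemeral.
act_runner daemon
`

var userData = template.Must(template.New("user-data").Parse(userDataTmpl))

// renderUserData fills the template and returns the script text.
func renderUserData(p RunnerParams) (string, error) {
	var b strings.Builder
	if err := userData.Execute(&b, p); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	script, err := renderUserData(RunnerParams{
		Host: "tinqs.com", GitToken: "GIT_TOKEN", RegToken: "REG_TOKEN",
		Name: "runner-example", Label: "go",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(script)
}
```

Putting the `trap` before any other command matters in this sketch: if registration fails, the script still exits through the trap, so a broken runner shuts down rather than lingering.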

Runner images

Dockerfiles in images/ — lean, purpose-built. Push to ECR with images/build-all.sh v1.

| Image | Contents |
| --- | --- |
| base | Alpine + git + AWS CLI + SSH |
| go | base + Go |
| node | base + Node + pnpm |
| docker | docker:dind + Go + AWS CLI |
| deploy | base only (lightest) |
| godot | base + headless Godot |

Note: the live Spot runners boot from a pre-baked AMI (RUNNER_AMI, with Go/Node/Docker/act_runner installed), not these container images. The images exist for purpose-built runner variants; the AMI is the fast path.