ci/wiki/DevOps-Reference.md
ozan 33f967e42e docs: convert ci docs to the in-repo wiki/ standard + fix stale ECS facts
Adopt the team wiki convention (in-repo wiki/ folder, plain markdown) used in
tinqs/studio. Convert DEVOPS.md + PLAN.md and the heavy parts of README.md
into cross-linked wiki pages: Home, Architecture, DevOps-Reference,
Operations, Roadmap. Root README slimmed to a repo intro pointing at wiki/.

Corrects stale topology while converting:
- ECS cluster tinqs-git / EFS tinqs-git-repos retired 2026-06-05; platform now
  the standalone EC2 box tinqs-prod-gitea (ALB tinqs-git, ECR image, RDS).
- Records this session's fixes: deploy-label dry-run route, runner-name
  collisions, arikigame IAM bucket, and template deploy repointed ECS→EC2/SSM.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:43:05 +01:00


DevOps Reference


AWS resources (eu-west-1)

| Resource | Name/ID | Purpose |
| --- | --- | --- |
| Lambda | tinqs-ci-dispatch | Webhook handler + Spot launcher |
| DynamoDB | tinqs-ci-runs | Run tracking (repo, run_id, instance_id, status) |
| AMI | tinqs-ci-runner-v2 (ami-00a129385002e4de9) | Pre-baked runner (Go, Node, Docker, act_runner) |
| Security Group | sg-030bf74b43d3faac7 | Runner SG (outbound HTTPS) |
| Subnet | subnet-04b5aeec9bfc4ec2c | Default VPC subnet |
| Instance Profile | tinqs-ci-runner → role tinqs-git-task | Runner IAM role (S3, ECR, SSM) |
| CloudWatch | /aws/lambda/tinqs-ci-dispatch | Dispatcher logs |
| API Gateway | q4ohxovfr8…/webhook | Receives the per-repo Gitea push webhook |

Platform host (NOT CI — context)

| Resource | Name/ID | Purpose |
| --- | --- | --- |
| EC2 | tinqs-prod-gitea (i-0d085288f467083e0, t3.medium) | Runs tinqs.com as a single Docker Gitea container |
| ALB | tinqs-git | Fronts the platform |
| ECR | tinqs-git:latest | Platform image (built by build.yml → CodeBuild) |
| RDS | tinqs-prod (PostgreSQL) | Platform DB |

The platform mounts host /data; GITEA_CUSTOM=/data/gitea, so custom templates live at /data/gitea/templates/. Template-only changes deploy here via SSM — see Operations.

Retired resources

| Resource | When / why |
| --- | --- |
| ECS Cluster tinqs-git | Deleted 2026-06-05; platform moved to the tinqs-prod-gitea EC2 box |
| EFS tinqs-git-repos | Retired in the 2026-06-05 EC2 migration (repos now on instance /data) |
| Lambda tinqs-ci-exec | Deleted 2026-05-26; never ran a build, and deploy jobs go through Spot now |
| CloudWatch /aws/lambda/tinqs-ci-exec, /ecs/tinqs-runner | Log groups for the tinqs-ci-exec Lambda above and for the Fargate era |
| Fargate runner service | Scaled to 0, then removed |

Webhook flow

Gitea (tinqs.com)
  └─ per-repo webhook on push
       └─ POST https://<api-gw>/webhook
            └─ Lambda tinqs-ci-dispatch
                 ├─ Fetch .gitea/workflows/*.yml via Gitea API
                 ├─ Evaluate triggers (branch + path filters)
                 ├─ For each matched workflow:
                 │    ├─ Read runs-on label
                 │    └─ RunInstances (Spot, ephemeral)   [host → skipped]
                 └─ Track in DynamoDB
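
The "Evaluate triggers (branch + path filters)" step can be sketched roughly as follows. This is a minimal illustration, not the dispatcher's actual code: the function names are hypothetical, the input is assumed to be the already-parsed `on.push` mapping of a workflow file, and `fnmatchcase` only approximates Actions-style globbing (its `*` also crosses `/`).

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def _matches_any(value: str, patterns: List[str]) -> bool:
    """Return True if value matches at least one glob pattern."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def workflow_matches(
    push_filters: Dict[str, List[str]],
    branch: str,
    changed_paths: List[str],
) -> bool:
    """Decide whether a push should trigger one workflow.

    push_filters is the parsed `on.push` mapping, for example
    {"branches": ["main"], "paths": ["ci/**"]}. A missing key is treated
    as "no restriction" (an assumption of this sketch).
    """
    branches = push_filters.get("branches")
    if branches and not _matches_any(branch, branches):
        return False

    paths = push_filters.get("paths")
    if paths and not any(_matches_any(path, paths) for path in changed_paths):
        return False

    return True
```

A push to main that touches ci/wiki/Home.md would match a workflow filtered on branch main and path ci/**, while the same push touching only README.md would not.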

Spot instance lifecycle

1. Lambda calls RunInstances (Spot, InstanceInitiatedShutdownBehavior=terminate)
2. User-data runs:
   a. Configure git auth (url.insteadOf with GITEA_TOKEN)
   b. act_runner register --ephemeral --labels <label>:host
   c. act_runner daemon (blocks until job completes)
   d. EXIT trap fires → shutdown -h now → instance terminates
3. DynamoDB record: running → completed (or timeout, set by the cleanup pass once the run is older than 30 min)
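
The lifecycle above could be expressed as launch parameters plus a user-data script along these lines. It is a sketch under stated assumptions: the helper names, the token-in-URL form for git auth, the `Name=tinqs-ci` tag key, and the `ubuntu-latest` label in the usage note are all hypothetical, and the real dispatcher may differ. The resulting dict is shaped for boto3's `ec2.run_instances(**params)`.

```python
USER_DATA_TEMPLATE = """#!/bin/bash
# d. Shut down whenever this script exits, for any reason; with
#    InstanceInitiatedShutdownBehavior=terminate this ends the instance.
trap 'shutdown -h now' EXIT

# cloud-init may run without HOME set, which git config --global needs.
export HOME=/root

# a. Rewrite Gitea URLs so git authenticates with the API token.
git config --global url."https://{gitea_token}@{gitea_host}/".insteadOf "https://{gitea_host}/"

# b. Register a one-shot runner with a <label>:host label.
act_runner register --no-interactive --ephemeral \\
  --instance "https://{gitea_host}" --token "{runner_token}" \\
  --labels "{label}:host"

# c. Blocks until the job completes, then the EXIT trap fires.
act_runner daemon
"""


def render_user_data(gitea_host: str, gitea_token: str, runner_token: str, label: str) -> str:
    """Fill the user-data template for one ephemeral runner."""
    return USER_DATA_TEMPLATE.format(
        gitea_host=gitea_host,
        gitea_token=gitea_token,
        runner_token=runner_token,
        label=label,
    )


def build_run_instances_params(
    ami: str,
    subnet: str,
    security_group: str,
    instance_profile: str,
    user_data: str,
    instance_type: str = "t3.small",
) -> dict:
    """Build keyword arguments for a one-time Spot launch."""
    return {
        "ImageId": ami,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "SubnetId": subnet,
        "SecurityGroupIds": [security_group],
        "IamInstanceProfile": {"Name": instance_profile},
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
        # An OS-level shutdown terminates the instance instead of stopping it.
        "InstanceInitiatedShutdownBehavior": "terminate",
        "UserData": user_data,
        # Tag key and value are assumptions; the cleanup pass looks for tinqs-ci.
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": "tinqs-ci"}]}
        ],
    }
```

For example, `build_run_instances_params(ami, subnet, sg, "tinqs-ci-runner", render_user_data("tinqs.com", token, runner_token, "ubuntu-latest"))` yields a dict that can be passed to `run_instances`. Putting the trap first means a failed registration still shuts the instance down.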

Offline runners listed in Gitea admin → Actions → Runners are normal — they're spent ephemeral registrations, not a fault.

Cleanup cron

The dispatcher Lambda also handles cleanup when invoked with an empty body or {"action":"cleanup"}. EventBridge triggers this every 5 minutes. Each cleanup pass:

  • Scans DynamoDB for runs older than 30 min with status=running
  • Terminates matching EC2 instances
  • Sweeps for orphan instances (tagged tinqs-ci, running > 30 min)
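
The selection logic behind the cleanup pass might look like the sketch below. The function names and the `started_at` field are hypothetical (the run-tracking table is only documented as holding repo, run_id, instance_id, and status), and the code works on plain dicts rather than calling DynamoDB or EC2.

```python
import json
from datetime import datetime, timedelta
from typing import List, Optional

MAX_RUN_AGE = timedelta(minutes=30)


def is_cleanup_event(body: Optional[str]) -> bool:
    """Treat an empty body or {"action": "cleanup"} as a cleanup invocation."""
    if body is None or not body.strip():
        return True
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("action") == "cleanup"


def stale_instance_ids(runs: List[dict], now: datetime) -> List[str]:
    """Return instance IDs of runs still marked running after MAX_RUN_AGE.

    Each run is assumed to carry a timezone-aware `started_at` datetime.
    """
    cutoff = now - MAX_RUN_AGE
    return [
        run["instance_id"]
        for run in runs
        if run["status"] == "running" and run["started_at"] < cutoff
    ]
```

The returned IDs would then be passed to an EC2 terminate call, and the orphan sweep would apply the same age cutoff to instances found by tag rather than by table record.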

Lambda env vars

Configured in the AWS console, not in code:

| Var | Purpose |
| --- | --- |
| GITEA_URL | https://tinqs.com |
| GITEA_TOKEN | API token; fetches workflows AND provides runner git auth |
| RUNNER_TOKEN | act_runner registration token (Gitea admin → Runners) |
| RUNNER_AMI | Pre-baked AMI ID |
| SUBNET | VPC subnet for Spot instances |
| SECURITY_GROUP | SG allowing outbound HTTPS |
| DDB_TABLE | DynamoDB run-tracking table (tinqs-ci-runs) |
| INSTANCE_PROFILE | IAM instance profile for runners |
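
Because these values live only in the console, a handler could fail fast when one is missing. A minimal sketch, assuming a hypothetical `load_config` helper that is not part of the documented dispatcher:

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = (
    "GITEA_URL",
    "GITEA_TOKEN",
    "RUNNER_TOKEN",
    "RUNNER_AMI",
    "SUBNET",
    "SECURITY_GROUP",
    "DDB_TABLE",
    "INSTANCE_PROFILE",
)


def load_config(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read every required variable, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing Lambda env vars: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_config()` once at cold start surfaces a misconfigured console value in the dispatcher logs before any Spot instance is launched.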

Runner IAM role (tinqs-git-task)

Inline policies of note:

  • tinqs-ci-s3: R/W on tinqs-cli-releases, arikigame-com-website, docs.tinqs.com (bucket name corrected 2026-06-07; the policy previously named the non-existent arikigame.com, which broke the arikigame deploy)
  • tinqs-git-s3: R/W on tinqs-git-lfs, tinqs-git-preview
  • tinqs-ci-deploy: ECR push, CloudFront CreateInvalidation, ECS update (legacy)
  • tinqs-ci-ssm-deploy: ec2:DescribeInstances + ssm:SendCommand scoped to the tinqs-prod-gitea instance (added 2026-06-07 for template deploys)
  • ssm-exec: Session Manager channels
  • ec2-self-terminate: terminate own tinqs-ci-tagged instance
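
For orientation, the SSM deploy policy described above could take roughly the following shape. This is an illustrative sketch, not the policy as deployed: the account ID is a placeholder, the use of the AWS-RunShellScript document is an assumption, and `ec2:DescribeInstances` is shown with `Resource: "*"` because that action does not support resource-level scoping.

```python
import json

ACCOUNT_ID = "123456789012"  # hypothetical placeholder, not the real account
REGION = "eu-west-1"
INSTANCE_ID = "i-0d085288f467083e0"  # tinqs-prod-gitea


def ssm_deploy_policy() -> dict:
    """Build an IAM policy document limiting SendCommand to one instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DescribeInstances",
                "Effect": "Allow",
                "Action": "ec2:DescribeInstances",
                "Resource": "*",
            },
            {
                "Sid": "SendCommandToPlatformHost",
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": [
                    f"arn:aws:ec2:{REGION}:{ACCOUNT_ID}:instance/{INSTANCE_ID}",
                    # Assumed command document; the real policy may name another.
                    f"arn:aws:ssm:{REGION}::document/AWS-RunShellScript",
                ],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ssm_deploy_policy(), indent=2))
```

Listing both the instance ARN and the document ARN keeps a runner from sending commands to other hosts or running other SSM documents.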

Cost

| Component | Estimated monthly cost |
| --- | --- |
| Spot instances (t3.small, ~10 min/build, ~5 builds/day) | ~$12 |
| Lambda (< 1000 invocations/month) | ~$0 (free tier) |
| DynamoDB (< 1 GB, low RCU/WCU) | ~$0 (free tier) |
| CloudWatch logs | ~$0.50 |
| Total CI | ~$12.50/month |