Adopt the team wiki convention (in-repo wiki/ folder, plain markdown) used in tinqs/studio. Convert DEVOPS.md + PLAN.md and the heavy parts of README.md into cross-linked wiki pages: Home, Architecture, DevOps-Reference, Operations, Roadmap. Root README slimmed to a repo intro pointing at wiki/. Corrects stale topology while converting: - ECS cluster tinqs-git / EFS tinqs-git-repos retired 2026-06-05; platform now the standalone EC2 box tinqs-prod-gitea (ALB tinqs-git, ECR image, RDS). - Records this session's fixes: deploy-label dry-run route, runner-name collisions, arikigame IAM bucket, and template deploy repointed ECS→EC2/SSM. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Operations
← Home · Architecture · DevOps Reference · Roadmap
Deploy the dispatcher
The dispatcher Lambda can't CI itself — deploy manually:
cd orchestrator/dispatch
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o bootstrap -ldflags "-s -w" .
# Windows: powershell -Command "Compress-Archive -Path bootstrap -DestinationPath function.zip -Force"
# Mac/Linux: zip -j function.zip bootstrap
aws lambda update-function-code --region eu-west-1 \
--function-name tinqs-ci-dispatch --zip-file fileb://function.zip
# Verify: push a change to cmd/tstudio/ in tinqs/studio and watch the pipeline
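Lambda applies a code update asynchronously, so it can help to confirm the update has settled before pushing the verification change. A minimal sketch, assuming the AWS CLI is configured with the same credentials as above; `confirm_dispatch_update` is a hypothetical helper name, not something in the repo:

```shell
# Sketch only: wait for the code update to finish, then print what is live.
# confirm_dispatch_update is a hypothetical helper, not part of the repo.
confirm_dispatch_update() {
  local fn="${1:-tinqs-ci-dispatch}" region="${2:-eu-west-1}"
  # The waiter polls the function until the update succeeds; it exits
  # non-zero if the update fails or the waiter times out.
  aws lambda wait function-updated --region "$region" --function-name "$fn" || return 1
  # Print update status, code hash, and last-modified time for a quick eyeball check.
  aws lambda get-function-configuration --region "$region" --function-name "$fn" \
    --query '[LastUpdateStatus,CodeSha256,LastModified]' --output text
}
```

Run `confirm_dispatch_update` right after `update-function-code`; a changed code hash is a quick sign that the new zip is the one in use.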
Deploy templates to prod (no rebuild)
Template-only changes don't need a platform rebuild. The workflow tinqs/studio/.gitea/workflows/deploy-templates.yml (label deploy) tars templates/ to s3://tinqs-git-lfs/custom-templates.tar.gz, then uses SSM to have tinqs-prod-gitea pull the archive, extract it into /data/gitea/templates/, and run docker restart gitea.
Repointed to SSM/EC2 on 2026-06-07. It previously ran
aws ecs update-service --cluster tinqs-git, which failed with ClusterNotFoundException after the cluster was deleted on 2026-06-05; that's why the repo Wiki tab and theme CSS never went live. The runner role gained a scoped ssm:SendCommand (prod-gitea only).
Manual one-off (admin creds):
tar czf /tmp/custom-templates.tar.gz -C templates .
aws s3 cp /tmp/custom-templates.tar.gz s3://tinqs-git-lfs/custom-templates.tar.gz --region eu-west-1
IID=$(aws ec2 describe-instances --region eu-west-1 \
--filters "Name=tag:Name,Values=tinqs-prod-gitea" "Name=instance-state-name,Values=running" \
--query "Reservations[0].Instances[0].InstanceId" --output text)
aws ssm send-command --region eu-west-1 --instance-ids "$IID" \
--document-name AWS-RunShellScript \
--parameters 'commands=["aws s3 cp s3://tinqs-git-lfs/custom-templates.tar.gz /tmp/ct.tar.gz --region eu-west-1","tar xzf /tmp/ct.tar.gz -C /data/gitea/templates","docker restart gitea"]'
Note: a template change does not bump the platform version string in the footer (that tracks the Go binary build). Unchanged footer ≠ failed deploy.
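The manual one-off above sends the SSM command and returns immediately, and `describe-instances` prints `None` when no running instance matches. A hedged sketch of two guards around it; both helper names are hypothetical, and the `params` argument is expected to be the same `commands=[...]` string used above:

```shell
# Sketch only: helper names are hypothetical, not part of the repo.

# Fail fast when the instance lookup matched nothing.
require_instance_id() {
  # With --output text, a query that matches nothing prints "None".
  case "$1" in
    i-*) return 0 ;;
    *) echo "no running tinqs-prod-gitea instance found (got '${1}')" >&2; return 1 ;;
  esac
}

# Send the command, wait for it to finish, and print its final status.
run_and_wait() {
  local iid="$1" params="$2" region="${3:-eu-west-1}" cid
  cid=$(aws ssm send-command --region "$region" --instance-ids "$iid" \
    --document-name AWS-RunShellScript --parameters "$params" \
    --query 'Command.CommandId' --output text) || return 1
  # The waiter exits non-zero if the command fails or times out; the status
  # is still printed afterwards so the failure reason is visible.
  aws ssm wait command-executed --region "$region" --command-id "$cid" --instance-id "$iid"
  aws ssm get-command-invocation --region "$region" --command-id "$cid" --instance-id "$iid" \
    --query 'Status' --output text
}
```

A typical use would be `require_instance_id "$IID" && run_and_wait "$IID" "$PARAMS"`, where `PARAMS` holds the `commands=[...]` string.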
Rotate GITEA_TOKEN
- Generate a new token in Gitea: Settings → Applications → Generate Token
- aws lambda update-function-configuration --function-name tinqs-ci-dispatch --environment ...
- Old token is burned into running instances; they die within 30 min
Rotate RUNNER_TOKEN
- Gitea admin → Actions → Runners → Create new registration token
- Update the Lambda env var
- Running instances keep their existing registration until they die
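Both rotations rewrite the dispatcher's environment, and `update-function-configuration --environment` replaces the whole variable set rather than patching one key. A minimal sketch of a read-merge-write helper, assuming `jq` is installed and that the Lambda variable names match the headings (`GITEA_TOKEN`, `RUNNER_TOKEN`); both function names are hypothetical:

```shell
# Sketch only: assumes jq is available; helper names are hypothetical.

# stdin: JSON object of the current variables.
# stdout: {"Variables":{...}} with the given key set to the new value.
merge_env_var() {
  jq -c --arg k "$1" --arg v "$2" '{Variables: ((. // {}) + {($k): $v})}'
}

# Read the current variables, swap one value, and write the full set back.
rotate_lambda_var() {
  local key="$1" value="$2" fn="${3:-tinqs-ci-dispatch}" region="${4:-eu-west-1}" merged
  merged=$(aws lambda get-function-configuration --region "$region" --function-name "$fn" \
    --query 'Environment.Variables' --output json | merge_env_var "$key" "$value") || return 1
  # Refuse to write an empty environment if the read step produced nothing.
  [ -n "$merged" ] || return 1
  aws lambda update-function-configuration --region "$region" --function-name "$fn" \
    --environment "$merged" --query 'LastUpdateStatus' --output text
}
```

For example, `rotate_lambda_var GITEA_TOKEN "<new-token>"` keeps every other variable untouched.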
Build a new AMI
aws ec2 run-instances --image-id ami-00a129385002e4de9 \
--instance-type t3.small --key-name <your-key> --region eu-west-1 \
--query 'Instances[0].InstanceId'
# SSH in, update tools (Go, Node, Docker, act_runner), then:
aws ec2 create-image --instance-id <id> --name tinqs-ci-runner-v3
aws lambda update-function-configuration --function-name tinqs-ci-dispatch \
--environment "Variables={...,RUNNER_AMI=ami-NEW,...}"
aws ec2 terminate-instances --instance-ids <id>
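`create-image` returns before the AMI is usable, so pointing the Lambda at it or terminating the builder too early can leave the dispatcher with an image that is still pending. A hedged sketch that waits for the image first; `bake_ami` is a hypothetical helper name:

```shell
# Sketch only: bake_ami is a hypothetical helper, not part of the repo.
# Creates the AMI, waits until it is available, and prints its ID.
bake_ami() {
  local instance_id="$1" name="$2" region="${3:-eu-west-1}" ami
  ami=$(aws ec2 create-image --region "$region" --instance-id "$instance_id" \
    --name "$name" --query 'ImageId' --output text) || return 1
  # Blocks until the image state is "available"; non-zero on failure or timeout.
  aws ec2 wait image-available --region "$region" --image-ids "$ami" || return 1
  echo "$ami"
}
```

Usage could look like `AMI=$(bake_ami <id> tinqs-ci-runner-v3)`, followed by the Lambda update and the terminate step only once that succeeds.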
Add CI to a new repo
- Create .gitea/workflows/<name>.yml in the repo
- Add a per-repo webhook in Gitea: Settings → Webhooks → Add Webhook
  - URL: the dispatcher API Gateway URL · Events: Push · Content type: application/json
- Push a change matching the workflow trigger
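The webhook step can also be scripted through the Gitea API instead of the UI. A minimal sketch, assuming a token that is allowed to manage hooks on the repo; the dispatcher URL and token are caller-supplied placeholders, and both function names are hypothetical. The URL is interpolated as-is, so it must not contain double quotes:

```shell
# Sketch only: helper names are hypothetical; URL and token are placeholders.

# Build the JSON body for a push-only webhook with a JSON content type.
# $1 = dispatcher API Gateway URL
webhook_payload() {
  printf '{"type":"gitea","active":true,"events":["push"],"config":{"url":"%s","content_type":"json"}}\n' "$1"
}

# Create the webhook on <owner>/<repo>.
add_ci_webhook() {
  local owner="$1" repo="$2" dispatch_url="$3" token="$4"
  curl -fsS -X POST "https://tinqs.com/api/v1/repos/${owner}/${repo}/hooks" \
    -H "Authorization: token ${token}" -H "Content-Type: application/json" \
    -d "$(webhook_payload "$dispatch_url")"
}
```

For example: `add_ci_webhook tinqs <repo> "<dispatcher-url>" "<gitea-token>"`.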
Monitoring
# Zombie check (should be 0 except during active builds)
aws ec2 describe-instances --region eu-west-1 \
--filters "Name=tag:tinqs-ci,Values=true" "Name=instance-state-name,Values=running" \
--query 'Reservations[].Instances[].InstanceId'
# Dispatcher logs (MSYS_NO_PATHCONV=1 on Windows/Git Bash; or use PowerShell)
MSYS_NO_PATHCONV=1 aws logs tail '/aws/lambda/tinqs-ci-dispatch' --region eu-west-1
# Build/job logs
curl -s "https://tinqs.com/api/v1/repos/tinqs/studio/actions/jobs/<JOB_ID>/logs" \
-H "Authorization: token <gitea-token>"
# Stale DynamoDB runs
aws dynamodb scan --region eu-west-1 --table-name tinqs-ci-runs \
--filter-expression "#s = :r" \
--expression-attribute-names '{"#s":"status"}' \
--expression-attribute-values '{":r":{"S":"running"}}' --query Count
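The zombie check above can be turned into a pass/fail command, for instance to run on a schedule. A hedged sketch built on the same filters; the function names and the threshold argument are assumptions, not existing tooling:

```shell
# Sketch only: helper names are hypothetical, not part of the repo.

# Print the number of running instances tagged tinqs-ci=true.
count_ci_runners() {
  local region="${1:-eu-west-1}"
  # --output text prints the IDs whitespace-separated, so wc -w counts them.
  aws ec2 describe-instances --region "$region" \
    --filters "Name=tag:tinqs-ci,Values=true" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text | wc -w | tr -d ' '
}

# Succeed only when the count is at or below the allowed maximum (default 0).
check_zombies() {
  local max="${1:-0}" n
  n=$(count_ci_runners)
  echo "running tinqs-ci instances: $n"
  [ "$n" -le "$max" ]
}
```

Outside active builds, `check_zombies` should succeed with the default threshold of 0; during builds a higher threshold such as `check_zombies 3` avoids false alarms.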
Contributing
New composite action: create <name>/action.yml (using: composite, shell: bash), keep it bash-only, add a <name>/README.md, list it in Architecture, push to main (resolves via @v1).
Modify the dispatcher: edit orchestrator/dispatch/main.go, go build . to catch errors, deploy manually (above), verify with a push to tinqs/studio.
New runner label: add to labelToSpot in main.go, create images/<label>/Dockerfile if needed, build/push (cd images && ./build-all.sh v1), deploy the Lambda, add runs-on: <label> to the consuming workflow.
Incidents
- 25 May 2026: 18 zombie runners DDoS-ing Gitea. Root cause: no --ephemeral on registration + no git auth after repos went private. Fix: --ephemeral + url.insteadOf git auth in user-data.
- 07 Jun 2026: all runs-on: deploy jobs silently dry-running (dead tinqs-ci-exec route) + arikigame IAM bucket mismatch + template deploy pointing at the deleted ECS cluster. All fixed; see Architecture and the template-deploy note above.