K8s production-readiness workplan – walt.id slice
Goal: take the `waltid` scenario from `./deploy.sh up waltid && ./deploy.sh run waltid` to a single-click Kubernetes deployment with security hardening, monitoring, and autoscaling. Cloud-agnostic: the same Terraform + Helm artifacts deploy to a local kind cluster, an on-prem k3s cluster, or self-managed nodes on AWS, with zero use of cloud-managed services.
Scope (from `deploy.sh:87-92` and the `waltid` case at `deploy.sh:155-161`):
- walt.id services: `postgres`, `caddy`, `issuer-api`, `verifier-api`, `wallet-api`
- IdP: `keycloak`, `wso2is`
- Translation: `libretranslate`
- Application: `verifiably-go` (the Go app from `cmd/`)

Single-click target: `./k8s-deploy.sh up waltid [--target=local|onprem|aws]` and `./k8s-deploy.sh run waltid`.
Architectural ground rules (apply to every prompt below)
- Cloud-agnostic = workloads only depend on in-cluster primitives:
  - PVCs (any CSI driver)
  - `Service` types `ClusterIP`/`LoadBalancer` (provided by MetalLB on-prem, AWS LB controller on EKS)
  - `Ingress` via ingress-nginx
  - Secrets via Vault + External Secrets Operator
  - Postgres via the CloudNativePG operator (no RDS)
  - Object storage via the MinIO operator
  - Certs via cert-manager (self-signed CA on-prem, ACME elsewhere)
- Terraform's job = drive Kubernetes (`kubernetes`, `helm`, `kubectl` providers) against any kubeconfig. Cluster creation lives in thin, optional per-target modules (`bootstrap/local-kind`, `bootstrap/onprem-k3s`, `bootstrap/aws-eks-self-managed`) selected by a single var. The core platform module is identical for all three.
- Helm's job = walt.id + companion workloads, one chart per service plus an umbrella, values-driven.
- Single-click = `./k8s-deploy.sh up waltid` and `./k8s-deploy.sh run waltid`, mirroring the existing CLI surface.
Phase 0 – Audit (no code yet, two prompts)
Prompt 0.1 – Walt.id config surface inventory
In `verifiably-go/deploy/compose/stack/docker-compose.yml`, audit only the `waltid` scenario services: `postgres`, `caddy`, `issuer-api`, `verifier-api`, `wallet-api`, `keycloak`, `wso2is`, `libretranslate`. For each, produce a table of: every env var consumed (noting which are secret), every bind-mounted file/dir and what it contains, every port, every dependency on another service, and every piece of persistent state. Also list which env vars are referenced in `.env.example`. Write the result to `docs/k8s/inventory.md`. Do not write any chart code. Under 400 lines.
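Part of the env-var column can be seeded mechanically. A minimal sketch, with an embedded stand-in for the real compose file (assumes list-style `- KEY=value` environment entries; the snippet's values are illustrative, not taken from the repo):

```shell
#!/bin/sh
# Sketch: print "service env-var" pairs to seed docs/k8s/inventory.md.
# The heredoc stands in for the real docker-compose.yml; point COMPOSE at it.
COMPOSE=$(mktemp)
cat > "$COMPOSE" <<'EOF'
services:
  issuer-api:
    environment:
      - DB_DSN=postgres://walt:secret@postgres:5432/issuer
      - ISSUER_BASE_URL=http://issuer-api:7002
EOF
# Track the current service name, then emit the key of each env entry.
OUT=$(awk '
  /^  [a-z-]+:$/      { svc = $1; sub(":", "", svc) }
  /^      - [A-Z_]+=/ { split($2, kv, "="); print svc, kv[1] }
' "$COMPOSE")
echo "$OUT"
rm -f "$COMPOSE"
```

The real audit still has to read `env_file`, map-style `environment:` blocks, and secrets by hand; this only bootstraps the table.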
Prompt 0.2 – Walt.id config-file deep-read
Read every config file mounted into `issuer-api`, `verifier-api`, and `wallet-api` (look under `deploy/compose/stack/`). For each file, summarize: schema, which fields are environment-specific (URLs, DB DSN, signing keys, OIDC client secrets), and which are static. The output is the basis for the `values.yaml` schema of each Helm chart. Write to `docs/k8s/values-schema.md`.
Phase 1 – Pre-K8s code refactors (three prompts)
Prompt 1.1 – Pin all base images
In `verifiably-go/deploy/compose/stack/docker-compose.yml`, replace every `:latest` tag with a pinned minor version (`postgres:16.4`, `caddy:2.8`, `quay.io/keycloak/keycloak:25.0`, `libretranslate/libretranslate:1.6`, etc.). Do not touch walt.id services – they're already at `0.18.2`. Verify `./deploy.sh up waltid` still works. Single commit.
Prompt 1.2 – Externalize walt.id config from bind mounts to templated files
Walt.id services currently consume YAML configs via bind mounts. Move those config files into `verifiably-go/deploy/k8s/config/{issuer,verifier,wallet}/` as standalone files with placeholder env-var references in envsubst syntax (e.g. `${DB_DSN}` – envsubst cannot render Helm's `{{ .Values.* }}` syntax, and the same names can map 1:1 onto Helm values later). The compose stack should keep working by rendering them with envsubst during `./deploy.sh up waltid` (mirror the existing `deploy/compose/injiweb/render-config.sh` pattern). This makes the same files reusable as Helm ConfigMap sources.
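The render step can be sketched like this (file names and the `${DB_DSN}` placeholder are illustrative; `envsubst '${DB_DSN}'` from gettext does the substitution in one call, sed just keeps the sketch dependency-free):

```shell
#!/bin/sh
# Sketch: render a templated config the way ./deploy.sh up waltid would.
export DB_DSN="postgres://walt:walt@postgres:5432/issuer"
cat > issuer.conf.tmpl <<'EOF'
db:
  dsn: "${DB_DSN}"
EOF
# Substitute the placeholder; '|' delimiter avoids clashing with the DSN's slashes.
sed "s|\${DB_DSN}|${DB_DSN}|" issuer.conf.tmpl > issuer.conf
cat issuer.conf
```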
Prompt 1.3 – Build & publish the verifiably-go image
Add a multi-stage `Dockerfile` build target for the Go app suitable for K8s (non-root user, distroless or alpine, `/health` endpoint, structured JSON logs to stdout). Add a `make image` target that builds and tags `verifiably-go:<git-sha>`. Add a CI workflow (`.github/workflows/image.yml`) that, on push to `main`, builds, runs Trivy, and pushes to a configurable registry (`REGISTRY` env var, default `ghcr.io/${GITHUB_REPOSITORY_OWNER}`).
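One plausible shape for that multi-stage build, as a sketch (the `./cmd/verifiably-go` main-package path is an assumption – verify against `cmd/` before using):

```dockerfile
# --- build stage ---
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary so it runs on distroless/static.
RUN CGO_ENABLED=0 go build -o /out/verifiably-go ./cmd/verifiably-go

# --- runtime stage: minimal, non-root ---
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/verifiably-go /verifiably-go
USER nonroot:nonroot
ENTRYPOINT ["/verifiably-go"]
```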
Phase 2 – Repo scaffolding (one prompt)
Prompt 2.1 – Create K8s deployment skeleton
Create `verifiably-go/deploy/k8s/` with this layout, all empty stubs with TODO comments:

```
deploy/k8s/
  terraform/
    bootstrap/local-kind/      # optional cluster creation
    bootstrap/onprem-k3s/
    bootstrap/aws-eks/
    platform/                  # operators + cluster services (always applied)
    workloads/                 # thin wrapper that helm-installs the umbrella chart
    environments/dev.tfvars
    environments/prod.tfvars
  helm/
    charts/
      walt-issuer/
      walt-verifier/
      walt-wallet/
      verifiably-go/
      keycloak/                # wrapper around bitnami/keycloak with our values
      wso2is/
      libretranslate/
    umbrella/waltid/           # depends on the above subcharts
  scripts/
    k8s-deploy.sh              # the single-click entry point
  README.md
```

Also add a top-level `Makefile` with `make k8s-up`, `make k8s-down`, `make k8s-status` shelling into `scripts/k8s-deploy.sh`. Do not write any real chart/TF content yet.
Phase 3 – Cloud-agnostic platform (Terraform, four prompts)
Prompt 3.1 – Local-kind bootstrap module
Implement `deploy/k8s/terraform/bootstrap/local-kind/`. It should: create a kind cluster with 3 nodes via the `tehcyx/kind` provider (or a `null_resource` calling `kind`), expose ports 80/443 via extraPortMappings, install MetalLB with a Docker-network IP pool, and output a kubeconfig path. This is the dev-machine target – it must come up with `terraform apply` and nothing else.
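For reference, the extraPortMappings requirement corresponds to a kind config like this (a sketch the module would render; 1 control-plane + 2 workers gives the 3 nodes):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 80    # forwarded to ingress-nginx
        hostPort: 80
      - containerPort: 443
        hostPort: 443
  - role: worker
  - role: worker
```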
Prompt 3.2 – Platform module (cloud-agnostic, the heart of it)
Implement `deploy/k8s/terraform/platform/`. Inputs: `kubeconfig_path`, `cluster_issuer_email`, `domain`. Using the `helm` and `kubernetes` providers, install in this order with explicit `depends_on`:
- ingress-nginx
- cert-manager (with a `ClusterIssuer` for a self-signed CA + one for Let's Encrypt, user picks via var)
- MetalLB CRDs (skip if running on EKS – gate on var `lb_mode = "metallb" | "cloud"`)
- CloudNativePG operator
- MinIO operator
- External Secrets Operator
- HashiCorp Vault in HA mode (3 replicas, Raft storage on PVCs)
- kube-prometheus-stack
- Loki + Promtail
- Argo CD
Every chart version pinned. No cloud-specific resources anywhere in this module. Outputs: ingress LB hostname, Vault address, Argo CD admin password secret name.
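A single release in that sequence might look like this sketch (the version shown is a placeholder to be pinned deliberately, and `helm_release.ingress_nginx` is assumed to be defined earlier in the same module):

```hcl
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  version          = "v1.15.3" # placeholder; pin and bump deliberately
  namespace        = "cert-manager"
  create_namespace = true

  set {
    name  = "installCRDs"
    value = "true"
  }

  # Explicit ordering, per the install sequence above.
  depends_on = [helm_release.ingress_nginx]
}
```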
Prompt 3.3 – On-prem k3s and AWS EKS bootstrap modules
Implement `bootstrap/onprem-k3s/` (assumes the user provides node IPs via var; shells out to `k3sup` to install) and `bootstrap/aws-eks/` (uses the official `terraform-aws-modules/eks/aws` module but with only self-managed node groups, no Fargate, no managed addons – keep it portable). Both must produce the same output shape as `bootstrap/local-kind/`: `kubeconfig_path`, `lb_mode`. The platform module from 3.2 must apply unmodified to the output of all three.
Prompt 3.4 – Workloads module wrapping the umbrella chart
Implement `deploy/k8s/terraform/workloads/`. It takes `kubeconfig_path` + a `values.yaml` path, and installs the `umbrella/waltid` chart from `deploy/k8s/helm/umbrella/waltid` into namespace `waltid`, with `wait = true` and a 15-min timeout. Also creates the `ExternalSecret` resources that pull walt.id DB creds + signing keys from Vault. No business logic here – just glue.
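The ExternalSecret glue could take roughly this shape (store name, Vault KV path, and key names are assumptions until Phases 0 and 7 settle them):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: waltid-db
  namespace: waltid
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend      # ClusterSecretStore pointing at the platform Vault
    kind: ClusterSecretStore
  target:
    name: waltid-db          # consumed via existingSecret in chart values
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: waltid/db       # hypothetical Vault KV path
        property: password
```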
Phase 4 – Walt.id Helm charts (six prompts, can parallelize)
Prompt 4.1 – walt-issuer chart
Create `deploy/k8s/helm/charts/walt-issuer/`. Image `waltid/issuer-api:0.18.2`. Mounts a ConfigMap built from `deploy/k8s/config/issuer/` (per Prompt 1.2). DB DSN and signing keys come from a `Secret` referenced via `existingSecret` in values. Liveness `/livez`, readiness `/readyz` (verify the actual endpoints in the Phase 0 audit). Resources, HPA (default off), PodDisruptionBudget, NetworkPolicy (deny-all + allow ingress from the ingress-nginx and verifiably-go namespaces), securityContext (non-root, readOnlyRootFilesystem, drop ALL caps), ServiceAccount with no token automount. The `values.yaml` schema must match `docs/k8s/values-schema.md` from Prompt 0.2. Add `helm lint` + `helm template` smoke tests in CI.
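A sketch of the values surface implied above (field names are placeholders pending `docs/k8s/values-schema.md` from Prompt 0.2):

```yaml
image:
  repository: waltid/issuer-api
  tag: "0.18.2"
db:
  existingSecret: waltid-db      # Secret created by the workloads module
hpa:
  enabled: false
podDisruptionBudget:
  minAvailable: 1
networkPolicy:
  enabled: true
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
serviceAccount:
  automountServiceAccountToken: false
```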
Prompt 4.2 – walt-verifier chart
Same shape as 4.1 but for `waltid/verifier-api:0.18.2`. The verifier is stateless – HPA defaults to `minReplicas: 2`.
Prompt 4.3 – walt-wallet chart
Same shape as 4.1 but for `waltid/wallet-api:0.18.2`. This is the stateful one: it talks to Postgres and holds key material. Default `replicas: 1` and HPA disabled until Phase 6 verifies horizontal-scale behaviour. Wire the wallet DB connection to a CloudNativePG `Cluster` resource referenced by name from values.
Prompt 4.4 – verifiably-go chart
Chart for the Go app. Image from Prompt 1.3. The `backends.json` that `deploy.sh run waltid` generates becomes a templated ConfigMap. The chart must accept a `backends` map in values and produce the same JSON shape (read `deploy.sh:820-870` for the format).
Prompt 4.5 – Wrapper charts for keycloak, wso2is, libretranslate
For each, write a thin chart that either wraps the upstream community chart (Bitnami Keycloak 25.x) or implements it directly if no upstream chart exists (WSO2IS – write from scratch, mirroring the compose env vars and volumes). Pin all versions. Walt.id integration depends on the Keycloak realm import – handle it via a pre-install `Job` that imports `keycloak-realm.json` from the existing compose dir.
Prompt 4.6 – umbrella/waltid chart
Umbrella chart with subchart dependencies on all six charts from 4.1–4.5 plus a `cnpg-cluster` template for the shared walt.id Postgres. Also includes `Ingress` resources for the public hostnames (`issuer.<domain>`, `verifier.<domain>`, `wallet.<domain>`, `app.<domain>`). One `values.yaml` with sane defaults; one `values-prod.yaml` example with HPA + PDB + multi-replica. `helm install waltid ./umbrella/waltid -n waltid --create-namespace` must work end-to-end.
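The umbrella's `Chart.yaml` dependency block, sketched (versions are illustrative; `file://` paths follow the Phase 2 layout):

```yaml
apiVersion: v2
name: waltid
version: 0.1.0
dependencies:
  - name: walt-issuer
    version: 0.1.0
    repository: file://../../charts/walt-issuer
  - name: walt-verifier
    version: 0.1.0
    repository: file://../../charts/walt-verifier
  - name: walt-wallet
    version: 0.1.0
    repository: file://../../charts/walt-wallet
  - name: verifiably-go
    version: 0.1.0
    repository: file://../../charts/verifiably-go
  - name: keycloak
    version: 0.1.0
    repository: file://../../charts/keycloak
  - name: wso2is
    version: 0.1.0
    repository: file://../../charts/wso2is
  - name: libretranslate
    version: 0.1.0
    repository: file://../../charts/libretranslate
    condition: libretranslate.enabled   # illustrative toggle
```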
Phase 5 – Single-click orchestrator (one prompt)
Prompt 5.1 – Implement k8s-deploy.sh mirroring deploy.sh
Implement `deploy/k8s/scripts/k8s-deploy.sh` with the same UX as the existing `verifiably-go/deploy.sh`:
- `./k8s-deploy.sh up waltid [--target=local|onprem|aws]` runs `terraform apply` on the chosen `bootstrap/*` module, then on `platform/`, then on `workloads/`. Waits for all pods Ready. Prints URLs + credentials.
- `./k8s-deploy.sh run waltid` re-builds + pushes the verifiably-go image (calls `make image` from Prompt 1.3) and rolls the deployment via `kubectl rollout restart`.
- `./k8s-deploy.sh down [waltid]` reverses workloads (keeps platform + cluster).
- `./k8s-deploy.sh reset` destroys everything including the cluster.
- `./k8s-deploy.sh status` prints pod / ingress / cert state.
- `./k8s-deploy.sh logs <service>` tails logs.
Re-use the colour helpers and option-parsing style from `deploy.sh` so the two feel like siblings. Default target: `local`. Document in `deploy/k8s/README.md`.
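The up/down dispatch can be sketched as a dry-run skeleton (subcommand, flag, and directory names follow the plan; `run()` echoes the terraform calls instead of executing them):

```shell
#!/bin/sh
# Dry-run dispatch sketch for k8s-deploy.sh. Swap run()'s body for "$@"
# to execute for real.
TF_DIR="deploy/k8s/terraform"
run() { echo "+ $*"; }

dispatch() {
  cmd=$1; target="local"
  for arg in "$@"; do
    case "$arg" in --target=*) target=${arg#--target=} ;; esac
  done
  # Map the flag value onto the bootstrap module directory.
  case "$target" in
    local)  dir="local-kind" ;;
    onprem) dir="onprem-k3s" ;;
    aws)    dir="aws-eks" ;;
    *)      echo "unknown target: $target" >&2; return 1 ;;
  esac
  case "$cmd" in
    up)
      run terraform -chdir="$TF_DIR/bootstrap/$dir" apply -auto-approve
      run terraform -chdir="$TF_DIR/platform" apply -auto-approve
      run terraform -chdir="$TF_DIR/workloads" apply -auto-approve ;;
    down)
      run terraform -chdir="$TF_DIR/workloads" destroy -auto-approve ;;
    *)
      echo "usage: k8s-deploy.sh {up|down} <scenario> [--target=local|onprem|aws]" >&2
      return 1 ;;
  esac
}

dispatch up waltid --target=local
```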
Phase 6 – Observability + autoscaling validation (two prompts)
Prompt 6.1 – Wire walt.id metrics
Verify which Prometheus endpoints walt.id exposes (test against a running container – likely `/actuator/prometheus` or Micrometer on a sidecar port). Add `ServiceMonitor` resources to each chart from Phase 4. Provide a Grafana dashboard JSON in `deploy/k8s/helm/charts/walt-*/dashboards/` and load it via the `kube-prometheus-stack` sidecar pattern.
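Per chart, the ServiceMonitor would look roughly like this (the port name and scrape path are assumptions until the endpoint check is done):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: walt-issuer
  labels:
    release: kube-prometheus-stack   # must match the stack's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: walt-issuer
  endpoints:
    - port: metrics                  # hypothetical named Service port
      path: /actuator/prometheus     # verify against a running container
      interval: 30s
```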
Prompt 6.2 – Load-test wallet-api horizontal scale
Write a k6 script under `deploy/k8s/test/load/wallet-scale.js` that exercises wallet creation + credential storage. Run it against a 1-replica wallet-api, then a 3-replica deployment, and report whether sessions/keys break under round-robin load balancing. If they do: document the needed sticky-session config or a session-store dependency. Report only – do not change chart defaults until the results are in.
Phase 7 – Security hardening (two prompts)
Prompt 7.1 – Vault Transit for signing keys + secret rotation
Replace the file-mounted signing keys in walt-issuer + walt-wallet with Vault Transit references. Issuer + wallet talk to Vault via the agent-injector sidecar pattern. Add a Vault policy file under `deploy/k8s/terraform/platform/vault-policies/` and a Job that bootstraps the Transit keys on first install. Document the rotation runbook.
Prompt 7.2 – Pod Security + NetworkPolicy + image signing
Apply the `pod-security.kubernetes.io/enforce: restricted` label to the `waltid` namespace. Audit every chart from Phase 4: any pod that fails `restricted` gets fixed (WSO2IS will likely need work). Add a default-deny `NetworkPolicy` to the namespace and per-service allow rules. Sign all custom images (verifiably-go) with cosign in CI; add a Kyverno policy that rejects unsigned images in the `waltid` namespace.
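The default-deny baseline is a standard `networking.k8s.io/v1` policy, for reference:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: waltid
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
# Per-service allow rules (e.g. ingress-nginx -> issuer-api, pods -> DNS)
# are layered on top in each chart.
```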
Phase 8 – End-to-end verification (one prompt)
Prompt 8.1 – Port the existing e2e suite to K8s
The repo has an `e2e/` suite that runs against the compose stack. Port it (or wrap it) so it runs against a fresh `./k8s-deploy.sh up waltid --target=local` cluster. Add a CI job that does the full cycle – bring up kind + platform + workloads, run e2e, tear down – on every PR touching `deploy/k8s/`. Wall-clock budget: 20 min.
Execution notes
- Run phases in order. Within a phase, prompts marked "can parallelize" are independent.
- After each prompt, verify the deliverable exists and the smoke test in that prompt passes before moving on.
- Each prompt is self-contained – copy it into a fresh agent session with no context.
- Track progress by checking off the boxes below (or via the conversation's task list).
Checklist
- [ ] 0.1 Walt.id config surface inventory → `docs/k8s/inventory.md`
- [ ] 0.2 Walt.id config-file deep-read → `docs/k8s/values-schema.md`
- [ ] 1.1 Pin all base images
- [ ] 1.2 Externalize walt.id config to templated files (+ 4 security fixes)
- [ ] 1.3 Build & publish verifiably-go image (`/healthz`, `/readyz`, JSON logs, CI)
- [ ] 2.1 Create K8s deployment skeleton
- [ ] 3.1 Local-kind bootstrap module
- [ ] 3.2 Platform module
- [ ] 3.3 On-prem k3s + AWS EKS bootstrap modules
- [ ] 3.4 Workloads module
- [ ] 4.1 walt-issuer chart
- [ ] 4.2 walt-verifier chart
- [ ] 4.3 walt-wallet chart
- [ ] 4.4 verifiably-go chart
- [ ] 4.5 keycloak / wso2is / libretranslate wrapper charts
- [ ] 4.6 umbrella/waltid chart
- [ ] 5.1 k8s-deploy.sh single-click orchestrator
- [ ] 6.1 Wire walt.id metrics (ServiceMonitor + dashboards + endpoint discovery doc)
- [ ] 6.2 Load-test wallet-api horizontal scale (k6 script – run pending cluster)
- [ ] 7.1 Vault Transit + secret rotation (policy, bootstrap stub, runbook)
- [ ] 7.2 Pod Security restricted + NetworkPolicy default-deny + Kyverno cosign
- [ ] 8.1 Port e2e suite to K8s (`run-against-cluster.sh` + CI workflow)