Vault

Vault is the analytics pipeline: it ingests consent events and session data, stores them in ClickHouse, and serves aggregated metrics through the vault-api HTTP API.

Architecture

flowchart LR
    Widget[Widget JS] -->|events| API[vault-api]
    API --> CH[(ClickHouse)]
    ETL[vault-etl] --> CH
    Grafana[Grafana] --> CH
    CoreAPI[core-api] -->|HTTP| API

| Component | Purpose |
|---|---|
| vault-api | TypeScript HTTP API. Ingests events, serves analytics queries. |
| vault-etl | ECS task that runs data pipeline jobs into ClickHouse. |
| ClickHouse | Column-oriented analytics database. Stores all consent and session data. |
| Grafana | Monitoring dashboards, queries ClickHouse directly. |
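
To make the ClickHouse connection concrete, here is a minimal sketch of how a client could describe a query sent to ClickHouse's HTTP interface. It is not taken from the vault-api codebase: the function name, the `HttpRequestSpec` shape, and the example host are assumptions for illustration. The `X-ClickHouse-User`, `X-ClickHouse-Key`, and `X-ClickHouse-Format` headers are part of ClickHouse's documented HTTP interface.

```typescript
// Connection details; in this setup they would come from the
// clickhouse_url / clickhouse_username / clickhouse_password secrets.
interface ClickHouseCredentials {
  url: string;
  username: string;
  password: string;
}

// Plain description of an HTTP request (hypothetical helper type),
// kept free of runtime dependencies so it can be passed to any HTTP client.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a POST request that sends `sql` in the body and asks ClickHouse
// to return one JSON object per row.
function clickHouseRequest(creds: ClickHouseCredentials, sql: string): HttpRequestSpec {
  const query = sql.trim();
  if (query === "") {
    throw new Error("query must not be empty");
  }
  return {
    url: creds.url,
    method: "POST",
    headers: {
      "X-ClickHouse-User": creds.username,
      "X-ClickHouse-Key": creds.password,
      "X-ClickHouse-Format": "JSONEachRow",
    },
    body: query,
  };
}

// Example with placeholder values (the host below is hypothetical).
const exampleQuery = clickHouseRequest(
  { url: "http://clickhouse.example.internal:8123", username: "reader", password: "secret" },
  "SELECT 1",
);
```

The resulting object can be handed to an HTTP client, for example `fetch(exampleQuery.url, { method: exampleQuery.method, headers: exampleQuery.headers, body: exampleQuery.body })`. Sending credentials in headers rather than in the URL keeps them out of access logs.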

Previous generations deprecated

Earlier architectures used Docker Compose + Portainer, and later Docker Swarm + Trino + Iceberg. These have been fully replaced by the current ClickHouse + ECS setup.

Infrastructure

Production

| Component | Details |
|---|---|
| ECS cluster | prod-euc1-vault-ecs-cluster (c7i.xlarge) |
| vault-api | ECS service on internal ALB (vault-api.internal.cookiehub.net) |
| vault-etl | ECS task (scheduled/triggered) |
| ClickHouse | EC2 m7i.xlarge with 1 TB EBS data volume |
| Grafana | ECS task, port 3000, accessible via port forwarding |
| Database | PostgreSQL RDS (prod-euc1-core-postgres) |
| ALB | Internal only, not publicly accessible |

Stage

| Component | Details |
|---|---|
| Database | PostgreSQL RDS (stage-euc1-core-postgres) |

Access

Production Vault services sit behind the internal ALB and are reachable only over VPN or SSM port forwarding.
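
As a rough sketch of the SSM route, the helper below assembles the arguments for an `aws ssm start-session` call that forwards a local port to a remote host, using the AWS-managed `AWS-StartPortForwardingSessionToRemoteHost` document. The helper name, the instance ID, and the port numbers are assumptions for illustration; this page does not say which instance acts as the jump host or which port the ALB listens on.

```typescript
interface PortForwardOptions {
  target: string;     // SSM-managed instance used as the jump host (hypothetical ID below)
  remoteHost: string; // host to reach from inside the VPC
  remotePort: number; // port on the remote host (assumed, not stated on this page)
  localPort: number;  // port opened on the local machine
}

// Reject anything that is not a whole number in the valid TCP port range.
function checkPort(label: string, port: number): void {
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(label + " must be an integer between 1 and 65535");
  }
}

// Build the argument list for the `aws` CLI. The session document expects
// every parameter value as a list of strings, encoded as JSON.
function ssmPortForwardArgs(opts: PortForwardOptions): string[] {
  checkPort("remotePort", opts.remotePort);
  checkPort("localPort", opts.localPort);
  const parameters = JSON.stringify({
    host: [opts.remoteHost],
    portNumber: [String(opts.remotePort)],
    localPortNumber: [String(opts.localPort)],
  });
  return [
    "ssm", "start-session",
    "--target", opts.target,
    "--document-name", "AWS-StartPortForwardingSessionToRemoteHost",
    "--parameters", parameters,
  ];
}

// Example: forward local port 8443 to vault-api behind the internal ALB.
// The instance ID and both port numbers are placeholders.
const forwardArgs = ssmPortForwardArgs({
  target: "i-0123456789abcdef0",
  remoteHost: "vault-api.internal.cookiehub.net",
  remotePort: 443,
  localPort: 8443,
});
```

Returning an argument array rather than a single command string avoids shell-quoting problems with the JSON parameter when the list is passed to a process spawner.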

Secrets

Secrets are stored in SSM Parameter Store under /{group}/{env}/vault-api/ and /{group}/{env}/etl/:

| Parameter | Description |
|---|---|
| clickhouse_url | ClickHouse HTTP endpoint |
| clickhouse_username | ClickHouse auth |
| clickhouse_password | ClickHouse auth |
| vault_api_key | Static API key for service-to-service auth |
| vault_ingest_config | JSON config for ETL pipeline |
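
The path layout above can be sketched as a small helper that composes a full parameter name. The function and type names are hypothetical, and the `group` and `env` values in the example are placeholders; this page does not state which parameters live under which of the two prefixes, so the helper accepts any combination.

```typescript
// Parameter names listed in the table above.
type VaultParameter =
  | "clickhouse_url"
  | "clickhouse_username"
  | "clickhouse_password"
  | "vault_api_key"
  | "vault_ingest_config";

// The two path segments named in this section.
type VaultService = "vault-api" | "etl";

// A path segment must be non-empty and must not contain a slash,
// otherwise the resulting hierarchy would be malformed.
function checkSegment(label: string, value: string): void {
  if (value === "" || value.indexOf("/") !== -1) {
    throw new Error(label + " must be a non-empty string without '/'");
  }
}

// Compose /{group}/{env}/{service}/{parameter}.
function ssmParameterName(
  group: string,
  env: string,
  service: VaultService,
  parameter: VaultParameter,
): string {
  checkSegment("group", group);
  checkSegment("env", env);
  return `/${group}/${env}/${service}/${parameter}`;
}

// Example with placeholder group and env values.
const apiKeyName = ssmParameterName("examplegroup", "prod", "vault-api", "vault_api_key");
```

Typing `service` and `parameter` as string unions lets the compiler catch misspelled names before a lookup is ever attempted.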

Dependencies

  • ClickHouse — primary data store
  • PostgreSQL RDS — metadata storage
  • Core API — calls vault-api for consent analytics and session timeline data