Airbyte OSS Metrics in Prometheus

Airbyte is a fast-growing ELT tool that helps acquire data from multiple sources, and it is particularly useful for building data lakes. Airbyte offers pre-built connectors to over 300 sources and dozens of destinations, and also lets you build custom connectors quickly using its language SDKs.

Airbyte recently released OpenTelemetry-based metrics; however, the documentation has been spotty and incomplete. You can check it out here. In this blog, I will document what I learned while integrating Airbyte Open Source, running in GKE, with Grafana, using GCP’s managed Prometheus service. The available metrics can be seen here.

Airbyte to Grafana – via OpenTelemetry & Prometheus

The design looks as follows: the Airbyte worker and metrics reporter emit metrics over OTLP to an OpenTelemetry Collector, which remote-writes them to Prometheus (here, GCP's managed Prometheus behind a proxy), which Grafana then queries.

It is quite a long journey for a metric to reach Grafana at the moment. Airbyte also integrates natively with Datadog, which takes away the bulk of this complexity if you wish to pursue that route; of course, you will have to pay for it.

Implementation

Step 1: Deploy Airbyte

Install Airbyte on Kubernetes. This is pretty straightforward; follow the instructions on this page.

git clone https://github.com/airbytehq/airbyte.git
cd airbyte
kubectl apply -k kube/overlays/stable
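Once applied, check that the pods come up (assuming the airbyte-dev namespace used throughout this post):

kubectl -n airbyte-dev get pods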

You can customize the namespace to fit your needs. Add the following two variables to the worker's .env (which will translate into the ConfigMap and, eventually, the environment variables of the pod):

METRIC_CLIENT=otel
OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4317

We will build the OpenTelemetry Collector pod in a subsequent step. Note that if you add PUBLISH_METRICS=true, the worker will currently look for Datadog configuration.
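You can sanity-check that the variables actually reached the worker (a quick sketch; airbyte-env and airbyte-worker are the default object names and may differ in your install):

kubectl -n airbyte-dev get configmap airbyte-env -o yaml | grep -E 'METRIC_CLIENT|OTEL_COLLECTOR_ENDPOINT'
kubectl -n airbyte-dev exec deploy/airbyte-worker -- printenv | grep -E 'METRIC_CLIENT|OTEL_COLLECTOR_ENDPOINT'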

Step 2: Deploy the Metrics Reporter

The metrics reporter queries metrics from the Airbyte database in batches and pushes them to the OpenTelemetry Collector. Use the following YAML as an example.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-metrics
  namespace: airbyte-dev
  labels:
    app: airbyte-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airbyte-metrics
  template:
    metadata:
      labels:
        app: airbyte-metrics
    spec:
      serviceAccountName: airbyte-admin
      automountServiceAccountToken: true
      containers:
      - name: metrics
        image: airbyte/metrics-reporter:0.39.31-alpha
        env:
        - name: METRIC_CLIENT
          value: "otel"
        - name: OTEL_COLLECTOR_ENDPOINT
          value: "otel-collector:4317"
        - name: PUBLISH_METRICS
          value: "true"

Note: the metrics reporter needs access to the Airbyte database. Copy all of the Airbyte worker's configuration (its env: section) in addition to the key-value pairs above.
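For example, the database settings can be pulled from the same ConfigMap and Secret the worker references. A sketch, assuming the airbyte-env ConfigMap and airbyte-secrets Secret names from a default install (verify against your worker's manifest):

        - name: DATABASE_URL
          valueFrom:
            configMapKeyRef:
              name: airbyte-env
              key: DATABASE_URL
        - name: DATABASE_USER
          valueFrom:
            secretKeyRef:
              name: airbyte-secrets
              key: DATABASE_USER
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: airbyte-secrets
              key: DATABASE_PASSWORD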

Step 3: Create the OpenTelemetry Collector

The OpenTelemetry Collector receives metrics from the metrics reporter and the worker, and writes them to Prometheus. It is a fairly standard, well-documented setup.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-conf
  namespace: airbyte-dev
  labels:
    app: opentelemetry
    component: otel-collector-conf
data:
  otel-collector-config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
      memory_limiter:
        limit_mib: 1500
        spike_limit_mib: 512
        check_interval: 5s
    extensions:
      zpages: {}
      memory_ballast:
        size_mib: 683
    exporters:
      logging:
        loglevel: debug
      prometheusremotewrite:
        endpoint: "http://prometheus-test.airbyte-dev.svc:9090/api/v1/write"
    service:
      extensions: [zpages, memory_ballast]
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging, prometheusremotewrite]
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: airbyte-dev
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  ports:
  - name: otlp-grpc # Default endpoint for OpenTelemetry gRPC receiver.
    port: 4317
    protocol: TCP
    targetPort: 4317
  - name: otlp-http # Default endpoint for OpenTelemetry HTTP receiver.
    port: 4318
    protocol: TCP
    targetPort: 4318
  - name: metrics # Default endpoint for querying metrics.
    port: 8888
  selector:
    component: otel-collector
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: airbyte-dev
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  minReadySeconds: 5
  progressDeadlineSeconds: 120
  replicas: 1 #TODO - adjust this to your own requirements
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      containers:
      - command:
          - "/otelcol"
          - "--config=/conf/otel-collector-config.yaml"
        image: otel/opentelemetry-collector:0.54.0
        name: otel-collector
        resources:
          limits:
            cpu: 1
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 400Mi
        ports:
        - containerPort: 55679 # Default endpoint for ZPages.
        - containerPort: 4317 # Default endpoint for OpenTelemetry receiver.
        - containerPort: 14250 # Default endpoint for Jaeger gRPC receiver.
        - containerPort: 14268 # Default endpoint for Jaeger HTTP receiver.
        - containerPort: 9411 # Default endpoint for Zipkin receiver.
        - name: metrics
          protocol: TCP
          containerPort: 8888
        volumeMounts:
        - name: otel-collector-config-vol
          mountPath: /conf
      volumes:
        - configMap:
            name: otel-collector-conf
            items:
              - key: otel-collector-config
                path: otel-collector-config.yaml
          name: otel-collector-config-vol
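Before wiring up Prometheus, it is worth confirming that metrics are flowing into the collector at all. Since the logging exporter is set to debug, incoming data points show up in the collector's logs:

kubectl -n airbyte-dev logs deploy/otel-collector --tail=100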

Step 4: Deploy Prometheus Proxy

Though we could deploy a full-fledged Prometheus, I chose to use the Google-provided managed Prometheus (GMP) service. However, managed Prometheus requires a proxy to provide the endpoint for the OpenTelemetry Collector. The documentation is here. Here is my YAML for it.

---
apiVersion: v1
kind: Service
metadata:
  namespace: airbyte-dev
  name: prometheus-test
  labels:
    prometheus: test
spec:
  type: ClusterIP
  selector:
    app: prometheus
    prometheus: test
  ports:
  - name: web
    port: 9090
    targetPort: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: airbyte-dev
  name: prometheus-test
  labels:
    prometheus: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      prometheus: test
  serviceName: prometheus-test
  template:
    metadata:
      labels:
        app: prometheus
        prometheus: test
    spec:
      automountServiceAccountToken: true
      nodeSelector:
        kubernetes.io/arch: amd64
        kubernetes.io/os: linux
      containers:
      - name: prometheus
        image: gke.gcr.io/prometheus-engine/prometheus:v2.28.1-gmp.7-gke.0
        args:
        - --config.file=/prometheus/config_out/config.yaml
        - --storage.tsdb.path=/prometheus/data
        - --storage.tsdb.retention.time=24h
        - --web.enable-lifecycle
        - --enable-feature=remote-write-receiver
        - --storage.tsdb.no-lockfile
        - --web.route-prefix=/
        ports:
        - name: web
          containerPort: 9090
        readinessProbe:
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
        resources:
          requests:
            memory: 400Mi
        volumeMounts:
        - name: config-out
          mountPath: /prometheus/config_out
          readOnly: true
        - name: prometheus-db
          mountPath: /prometheus/data
      - name: config-reloader
        image: gke.gcr.io/prometheus-engine/config-reloader:v0.4.1-gke.0
        args:
        - --config-file=/prometheus/config/config.yaml
        - --config-file-output=/prometheus/config_out/config.yaml
        - --reload-url=http://localhost:9090/-/reload
        - --listen-address=:19091
        ports:
        - name: reloader-web
          containerPort: 8080
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 50Mi
        volumeMounts:
        - name: config
          mountPath: /prometheus/config
        - name: config-out
          mountPath: /prometheus/config_out
      terminationGracePeriodSeconds: 600
      volumes:
      - name: prometheus-db
        emptyDir: {}
      - name: config
        configMap:
          name: prometheus-test
          defaultMode: 420
      - name: config-out
        emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: airbyte-dev
  name: prometheus-test
  labels:
    prometheus: test
data:
  config.yaml: |
    global:
      scrape_interval: 30s

    scrape_configs:
    - job_name: otel-collector
      static_configs:
        - targets: ['otel-collector.airbyte-dev.svc:8888']

There are two key points. First, this setup has OpenTelemetry write metrics to Prometheus rather than Prometheus pulling them, hence the --enable-feature=remote-write-receiver argument. Second, for pull, Prometheus would need to be configured for scraping; that is not implemented here, even though a scrape config for the collector is included above.
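To verify that remote write is actually landing in the proxy, you can port-forward the service and query the Prometheus HTTP API directly (a quick sketch; num_pending_jobs is taken from the Airbyte metrics reference, so treat it as an example rather than a guarantee):

kubectl -n airbyte-dev port-forward svc/prometheus-test 9090:9090 &
curl -s 'http://localhost:9090/api/v1/label/__name__/values'
curl -s 'http://localhost:9090/api/v1/query?query=num_pending_jobs'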

Step 5: Install Grafana

This is fairly straightforward. Deploy the pods as below and point the data source to the GMP proxy at http://prometheus-test.airbyte-dev.svc:9090 (or provision the data source declaratively; see the sketch after the manifests).

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: airbyte-dev
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: airbyte-dev
  labels:
    app: grafana
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        fsGroup: 472
        supplementalGroups:
          - 0
      containers:
        - name: grafana
          image: grafana/grafana:8.4.4
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              name: http-grafana
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /robots.txt
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 2
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 3000
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 250m
              memory: 750Mi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-pv
      volumes:
        - name: grafana-pv
          persistentVolumeClaim:
            claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: airbyte-dev
spec:
  ports:
    - port: 3000
      protocol: TCP
      targetPort: http-grafana
  selector:
    app: grafana
  sessionAffinity: None
  type: LoadBalancer
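
Instead of adding the data source by hand in the UI, Grafana can also provision it at startup. A minimal sketch, assuming you mount the snippet below (e.g. from a ConfigMap) at /etc/grafana/provisioning/datasources/gmp.yaml:

apiVersion: 1
datasources:
  - name: GMP-Proxy            # hypothetical name, pick your own
    type: prometheus
    access: proxy
    url: http://prometheus-test.airbyte-dev.svc:9090
    isDefault: true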


The steps need not be executed in this exact order; once everything is deployed, each component should discover its endpoints and function.

The metrics can now be queried either in GMP or Grafana.
