
Practical Uses for Google Cloud’s Workflow — Simple Data Pipelines

Published at 03:57 PM

The Challenge

I recently spoke with a customer who was looking to capture and track DevOps Research and Assessment (DORA) metrics. The DORA metrics are four key measures of a software development team’s performance:

  1. Deployment Frequency: How often an organization successfully releases to production
  2. Lead Time for Changes: The amount of time it takes a commit to get into production
  3. Change Failure Rate: The percentage of deployments causing a failure in production
  4. Time to Restore Service: How long it takes an organization to recover from a failure in production

This customer could already report on Deployment Frequency, Lead Time for Changes, and Change Failure Rate. However, they didn’t have an easy way to measure Time to Restore Service. They use GCP’s Operations Suite alerting tools to detect failures and resolve incidents. While the events are generated by Operations Suite, we ultimately want the data in BigQuery so we can build Looker Studio dashboards and analyze the data later.
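Once those incident records land in BigQuery (the reporting.alerts table we populate below), Time to Restore Service is just the gap between an incident’s start and end timestamps. As a rough sketch, assuming started_at and ended_at are stored as Unix epoch seconds, a median time-to-restore query could look like this:

-- Median Time to Restore Service, in minutes, over the last 90 days.
-- Assumes started_at / ended_at are Unix epoch seconds, as inserted by the workflow below.
SELECT
  APPROX_QUANTILES(ended_at - started_at, 100)[OFFSET(50)] / 60 AS median_minutes_to_restore
FROM `reporting.alerts`
WHERE TIMESTAMP_SECONDS(started_at) >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY);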

The Solution

To capture these events, we can take advantage of Pub/Sub notification channels. After your alerting policy is defined, alerts are sent to the notification channel of your choice (SMS, email, Pub/Sub, etc.). Pub/Sub notification channels are a great, general-purpose option when your preferred destination isn’t currently supported in Operations Suite: they let us take the generated events and do whatever we want with them.

We could use a BigQuery subscription to write the Pub/Sub messages directly to BigQuery; however, we’d encounter a few issues. We can’t guarantee the alert payload’s schema ahead of time, and because subscription filters can’t inspect the message data, we’d also land rows for the incomplete notifications sent when an incident opens.

But what if we used Workflows? We can unmarshal the JSON object, extract the specific fields we want, and then write them into BigQuery.

main:
  params: [event]
  steps:
  # Decode the Pub/Sub message: base64 -> text -> JSON alert payload.
  - decode_pubsub:
      assign:
        - base64: ${base64.decode(event.data.message.data)}
        - message: ${text.decode(base64)}
        - alert: ${json.decode(message)}
  # Only continue once the incident has ended; notifications sent when an
  # incident opens carry a null ended_at, so execution stops at end.
  - if_event_over:
      switch:
        - condition: ${alert.incident.ended_at != null}
          next: insert_record
      next: end
  # Insert one complete row per closed incident into BigQuery.
  - insert_record:
      call: googleapis.bigquery.v2.jobs.query
      args:
        projectId: taylor-lab
        body:
          useLegacySql: false
          query: ${"INSERT INTO `reporting.alerts` (incident_id, scoping_project_id, policy_name, started_at, ended_at) VALUES ( \"" + alert.incident.incident_id + "\", \"" + alert.incident.scoping_project_id + "\", \"" + alert.incident.policy_name + "\", " + string(alert.incident.started_at) + ", " + string(alert.incident.ended_at) + ")"}
      next: end

One design quirk of this workflow is the if_event_over step. In Pub/Sub, subscriptions can be filtered by message attributes but not by the data inside the message. If the incident is over, the event contains both the start and end timestamps. However, the Pub/Sub topic also receives a message when the incident starts, which is incomplete for our reporting purposes. It’s simpler to insert one complete record than to update a partial record later, so we perform a single insert once the incident is done.
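For reference, the decoded alert payload looks roughly like this. It’s trimmed to the fields the workflow reads, the values are illustrative, and the exact shape can vary between notification payload versions; while an incident is still open, ended_at is null, which is exactly what if_event_over checks.

{
  "incident": {
    "incident_id": "0.abcdef123456",
    "scoping_project_id": "taylor-lab",
    "policy_name": "High error rate",
    "state": "closed",
    "started_at": 1700000000,
    "ended_at": 1700001800
  }
}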

This approach has some advantages and unique characteristics: the workflow can filter on the contents of the message rather than just its attributes, and it turns a payload whose schema we don’t control into a predictable set of columns in BigQuery. However, you should still watch for schema changes in the alert payload to minimize the chance of disruptions.

Next Steps

Optionally, if you expect this workflow to have high throughput, you could have it publish the parsed fields to a second Pub/Sub topic with a BigQuery subscription instead of calling the BigQuery API directly. While we couldn’t guarantee the schema of the incoming alert beforehand, the workflow lets us convert that dynamic schema into a predictable output, so we can take advantage of BigQuery subscriptions.
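As a minimal sketch, the insert_record step could be swapped for a publish step along these lines. The dora-alerts-parsed topic name is a placeholder, and it would need a BigQuery subscription whose destination table matches the published fields:

  - build_row:
      assign:
        # Build a flat record whose keys match the destination table's columns.
        - row:
            incident_id: ${alert.incident.incident_id}
            scoping_project_id: ${alert.incident.scoping_project_id}
            policy_name: ${alert.incident.policy_name}
            started_at: ${alert.incident.started_at}
            ended_at: ${alert.incident.ended_at}
  - publish_parsed_alert:
      call: googleapis.pubsub.v1.projects.topics.publish
      args:
        # Placeholder topic; its BigQuery subscription handles the write.
        topic: projects/taylor-lab/topics/dora-alerts-parsed
        body:
          messages:
            - data: ${base64.encode(json.encode(row))}
      next: end

With the subscription configured to use the table schema, the JSON fields published here map onto the table’s columns.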