Cloud Run Worker Pools are worth a look

Published at 12:00 AM

In June 2025, Google released Worker Pools, a new operating mode for Cloud Run alongside Services and Jobs. Worker Pools solve a specific problem that’s plagued Cloud Run users trying to build reliable Pub/Sub consumers: how do you process messages that take longer than ten minutes without adopting the operational complexity of Kubernetes?

The Problem with Cloud Run Services and Jobs

Pub/Sub has two types of subscriptions: push and pull.

More often than not, I prefer pull subscriptions when I can use them. Detailed comparisons exist between the two, but the short version is that pull gives you higher throughput and more flexibility in how long you can spend processing each message.

So what’s the issue? The problem comes down to how Cloud Run Services and Jobs were designed to work.

Why Cloud Run Jobs Fall Short

Cloud Run Jobs seem like they’d be perfect for this at first glance. But they’re not really built to autoscale. When you start a Job execution, you have to decide upfront how many resources you need. Plus, jobs are expected to finish at some point. The runtime for a job is capped at 24 hours, which means you’re stuck restarting the job at least once a day.

Now let’s say you wanted to automatically scale your job to match demand. How would you even do that?

You’d probably need to cap the execution time to something shorter than 24 hours. Let’s say thirty minutes for the sake of argument. After those thirty minutes are up, you’d need to check your monitoring system, figure out how many compute resources you need, start a new job for the next thirty minutes with the right number of instances, and then keep repeating this whole dance.

This approach has all kinds of problems. The biggest one: what happens to messages you’re still processing after thirty minutes? Do you just stop, skip the ACK back to Pub/Sub, and pick them up again next time? That’s incredibly wasteful and you end up processing the same messages multiple times.

On top of that, you’re dealing with cold starts every time you spin up a new job, you have to manage state between runs somehow, and you’ve essentially built an entire orchestration system just to scale your message processing. Not great.

Why Cloud Run Services Don’t Cut It

Okay, so what if we flip things around and use Services with push subscriptions instead? This actually works pretty well! The annoying part is you’re forced to wrap everything in an HTTP server, and now you’re stuck with that ten-minute upper limit to process your messages. If your messages can be processed in under ten minutes, you’re golden. If not, you’re out of luck.
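To make the HTTP-wrapping concrete: with a push subscription, Pub/Sub POSTs you a JSON envelope whose payload is base64-encoded, and your service has to unwrap it inside the request handler. A minimal sketch of the decoding step (the envelope shape is Pub/Sub’s documented push format; the helper name is mine):

```python
import base64
import json

def extract_payload(envelope: dict) -> bytes:
    """Decode the base64-encoded message body from a Pub/Sub push envelope."""
    return base64.b64decode(envelope["message"]["data"])

# A push request body looks roughly like this on the wire:
raw = '{"message": {"data": "aGVsbG8=", "messageId": "1"}, "subscription": "projects/p/subscriptions/s"}'
print(extract_payload(json.loads(raw)))  # b'hello'
```

Everything after the decode has to fit inside that one HTTP request, which is exactly where the ten-minute ceiling bites.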

For anything that needs longer processing time—video transcoding, large-scale data transformations, ML model training, complex ETL workflows—Cloud Run Services just can’t handle it.

The Kubernetes Escape Hatch

At this point, if you know your way around GKE, you’re probably thinking “Why not just use Horizontal Pod Autoscalers that scale based on Pub/Sub metrics?”

Sure, that works, and the merits of Kubernetes have been debated at length over the years. But it’s generally more work than Cloud Run if you don’t have a dedicated platform team. You need to manage cluster infrastructure, understand pod scheduling, and usually install additional components like KEDA or set up custom metrics adapters just to scale based on your Pub/Sub queue depth.

Plenty of teams work perfectly fine within Cloud Run’s serverless model. It’s a bit much to tell them they need to completely rethink their entire operational setup just because they need to process some longer-running messages.

Enter Worker Pools

This is where Worker Pools really shine. You get long-running containers that scale horizontally and process Pub/Sub messages without needing HTTP endpoints or dealing with artificial time limits.

How Worker Pools Work

Worker Pools run continuously like Services but, like Jobs, don’t require HTTP endpoints. You deploy a container that runs your message processing code using the standard Pub/Sub client libraries with pull subscriptions. Your code subscribes to a Pub/Sub subscription, processes messages as they come in, and acknowledges them when you’re done.
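That loop is short. A sketch using the `google-cloud-pubsub` Python client, with hypothetical project and subscription names and a placeholder `do_work`:

```python
def do_work(data: bytes) -> None:
    """Placeholder for your actual processing; free to run well past ten minutes."""
    pass

def handle(message) -> None:
    """Process one message, then ack it so Pub/Sub won't redeliver."""
    do_work(message.data)
    message.ack()

def main() -> None:
    # Pub/Sub wiring lives here; requires the google-cloud-pubsub package
    # and credentials, which a Worker Pool instance gets from its service account.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription = subscriber.subscription_path("my-project", "my-subscription")
    # subscribe() starts a background streaming pull; result() blocks so the
    # container keeps running until the instance is shut down.
    with subscriber:
        subscriber.subscribe(subscription, callback=handle).result()

if __name__ == "__main__":
    main()
```

There’s no HTTP server anywhere in that container: the entrypoint just runs `main()` and stays up.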

Unlike Services, Worker Pools don’t include automatic scaling out of the box. You need to handle scaling yourself based on whatever metrics make sense for your workload—typically things like CPU utilization, memory usage, or Pub/Sub queue depth.

To help with this, I’ve released an example autoscaler on GitHub that shows how to scale Worker Pools based on Pub/Sub metrics. It monitors your subscription’s unacknowledged message count and scales the pool up when the queue gets backed up, and back down when things calm down.
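The scaling decision itself can be simple. Here’s a sketch of the kind of policy such an autoscaler might apply, where names, thresholds, and the drain-rate assumption are all mine rather than taken from the example repo:

```python
def desired_instances(backlog: int, messages_per_instance: int,
                      min_instances: int = 0, max_instances: int = 10) -> int:
    """Target instance count: enough workers to drain the backlog,
    clamped to the pool's configured bounds."""
    if backlog <= 0:
        return min_instances
    # Ceiling division without floats: one instance per batch of backlog.
    needed = -(-backlog // messages_per_instance)
    return max(min_instances, min(max_instances, needed))

print(desired_instances(0, 100))     # 0  (idle: scale to the floor)
print(desired_instances(250, 100))   # 3
print(desired_instances(5000, 100))  # 10 (capped at the ceiling)
```

The autoscaler then just reconciles the pool’s current instance count toward this target on a timer, which is a far cry from the job-restarting dance described earlier.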

Conclusion

It’s early and it’s still in preview, but I’m excited about it. It’s not revolutionary by any means, but it absolutely solves a consistent paper cut I’ve experienced when trying to adopt Cloud Run. I’m looking forward to rolling this out once it hits GA.