AWS Auto Scaling Policies


Posted another video this week. It’s a double-continuation; it dives deeper into the previous video I posted and continues the FinOps thread I’ve been focused on this month.

Figured I’d post the transcript for those of you on United flights without headphones so you don’t get banned. Video link at the bottom.


This is AWS Auto Scaling Policies, in FOUR minutes.
We pick up where we left off and dive deeper into how you can match your infrastructure to your demands, saving costs when traffic is low and saving face when traffic is high.

There are five ways to scale to meet your traffic.
Six, if you count needing to build an Auto Scaling Group in the first place.
You can run one policy, or many, whatever suits your workload best.
However you configure it, by using Auto Scaling you’ve already improved over static footprints.

Manual scaling means setting the desired instance count directly, with no automation.
It sounds basic, but many orchestration tools work this way.
They make the capacity decisions and Auto Scaling handles the infrastructure.
If you’re running ECS, or EKS without Karpenter, you’re probably already using this.
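As a concrete sketch: manual scaling is just setting the desired count directly, which is the same call an orchestrator makes under the hood. The group name and count below are made-up examples; with boto3 the parameters would look like this:

```python
# Manual scaling: no policy at all -- you (or your orchestrator) set the
# desired count directly. "web-asg" is a hypothetical group name.
set_capacity_request = {
    "AutoScalingGroupName": "web-asg",
    "DesiredCapacity": 4,
    "HonorCooldown": False,  # apply immediately, ignoring any cooldown
}

# With credentials configured, you would pass this to:
#   boto3.client("autoscaling").set_desired_capacity(**set_capacity_request)
```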

Simple scaling fires one alarm and scales by a fixed amount.
Then it enters a five-minute cooldown before it can act again.
During that cooldown, demand could spike to any level and the policy wouldn’t respond.
It exists, but don’t use it.

Where Simple Scaling fires once and waits, Step Scaling adds proportional responses.
A small alarm breach might add one instance.
Or a severe breach might add five. You define the steps.
It also doesn’t freeze in cooldown the way Simple Scaling does.
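Here’s what those steps look like in practice. This is a sketch of the parameters you’d hand to boto3’s `put_scaling_policy`; the policy name, alarm threshold, and step bounds are all illustrative:

```python
# Step Scaling: one alarm, several step adjustments sized to how far the
# metric breached. Bounds are relative to the alarm threshold (say, 70% CPU).
step_policy = {
    "AutoScalingGroupName": "web-asg",  # hypothetical group name
    "PolicyName": "cpu-step-out",
    "PolicyType": "StepScaling",
    "AdjustmentType": "ChangeInCapacity",
    "StepAdjustments": [
        # small breach (0-15 points over threshold) -> add 1 instance
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 15,
         "ScalingAdjustment": 1},
        # severe breach (15+ points over threshold) -> add 5 instances
        {"MetricIntervalLowerBound": 15, "ScalingAdjustment": 5},
    ],
}
```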

Fortunately you don’t have to do either of those.
AWS likes to say Target Tracking works like a thermostat.
They’re not wrong.
Set a target metric value, say 50% CPU utilization, and the policy adjusts instance count to hold that target.
Unlike Step, it does the math for you.
Instead of guessing what you need, Target Tracking will calculate what you need to return to that optimal metric, whether you’re scaling in or out.
If you use ONE scaling policy, this should be it.
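The thermostat in code form. A minimal sketch of the `put_scaling_policy` parameters for holding average CPU at 50%; the names are made-up examples, and `ASGAverageCPUUtilization` is one of the predefined metrics AWS provides:

```python
# Target Tracking: set the target, let AWS do the math in both directions.
tt_policy = {
    "AutoScalingGroupName": "web-asg",  # hypothetical group name
    "PolicyName": "cpu-target-50",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,  # hold the fleet at ~50% average CPU
    },
}
```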

Have an important event next week and expect to take the internet by storm?
Scheduled Scaling allows you to change the desired capacity on a given day at a given time.
That viral event on Friday will need more capacity.
Maybe your engineering team will need to reduce capacity to do some database work the day after.

What many people don’t realize is that Scheduled Scaling can also adjust your floor and ceiling at the same time as your desired capacity.
This moves the guardrails to prevent another policy from over- or under-scaling.
You knew you could run multiple policies, right?
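Here’s a sketch of one such scheduled action via boto3’s `put_scheduled_update_group_action`, moving the guardrails along with the desired count. The schedule, sizes, and names are illustrative:

```python
# Scheduled Scaling: raise desired capacity AND the min/max guardrails
# ahead of Friday's event. A second action (not shown) would restore them.
scale_up_action = {
    "AutoScalingGroupName": "web-asg",   # hypothetical group name
    "ScheduledActionName": "friday-event-up",
    "Recurrence": "0 8 * * 5",           # cron: 08:00 UTC every Friday
    "MinSize": 6,                        # floor moves up too
    "MaxSize": 30,                       # ceiling moves up too
    "DesiredCapacity": 10,
}
```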

Predictive Scaling is like automating Scheduled Scaling.
AWS analyzes two weeks of historical data and forecasts two days ahead.
This allows your fleet to scale out before expected traffic arrives.
But won’t that cost more money?
It does, but can you put a price on lost customers?
For workloads with recognizable daily or weekly patterns, this eliminates the gap between spike and scale-out.
Because no one can really predict the future…
And let’s be honest, you wouldn’t be watching this if you could.
You’d be in Monaco.
…never run Predictive Scaling alone.
At a minimum, add a Target Tracking Policy to respond to what Predictive Scaling didn’t expect.
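For reference, here’s a sketch of the predictive half of that pairing, again as `put_scaling_policy` parameters with made-up names. `ForecastOnly` mode lets you evaluate the forecast before trusting it with real capacity:

```python
# Predictive Scaling: AWS forecasts from history and pre-warms the fleet.
# Pair it with a Target Tracking policy to catch what the forecast misses.
predictive_policy = {
    "AutoScalingGroupName": "web-asg",  # hypothetical group name
    "PolicyName": "cpu-predictive",
    "PolicyType": "PredictiveScaling",
    "PredictiveScalingConfiguration": {
        "MetricSpecifications": [
            {
                "TargetValue": 50.0,
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization"
                },
            }
        ],
        "Mode": "ForecastAndScale",  # or "ForecastOnly" to evaluate first
    },
}
```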

We mentioned before that if you only run one scaling policy, use Target Tracking for metric-driven scaling.
Have more than one metric you think is important?
Use CPU and network utilization.
Or request count.
Whatever matters most to your system.
Auto Scaling likes big footprints.
It will automatically choose the higher recommendation of your policies.
Another common pattern is to use Step Scaling to burst quickly, as it reacts to sharp spikes faster.
Then use Target Tracking for less aggressive changes.
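Multiple policies in practice might look like two Target Tracking configurations on the same group, one on CPU and one on request count; Auto Scaling follows whichever recommends the larger fleet. The ALB `ResourceLabel` below is a placeholder you’d replace with your own load balancer and target group identifiers:

```python
# Two Target Tracking policies on one Auto Scaling group. Auto Scaling
# picks the higher capacity recommendation of the two.
cpu_config = {
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0,
}

requests_config = {
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ALBRequestCountPerTarget",
        # Placeholder label -- substitute your ALB/target-group identifiers.
        "ResourceLabel": "app/my-alb/1234567890abcdef/targetgroup/my-tg/fedcba0987654321",
    },
    "TargetValue": 1000.0,  # target requests per instance
}
```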

And that is Auto Scaling policies in four minutes.
AWS Auto Scaling Groups should absolutely be your first choice for any workload that’s not containerized or serverless.
It’s still the unsung hero that does the optimization work for you so you can focus elsewhere.
Like your product and the compute it runs on. Hmm, maybe we should talk about that next!
