AWS Says 30% Better Performance. That Number Needs a Denominator
AWS says Graviton4 delivers 30% better performance than comparable instances from its previous generation. That number is real under the benchmark conditions AWS used to generate it. The more honest question is what “30% better performance” means for your workload under your traffic conditions. No spec sheet can answer that.
The architectural decisions that drive cloud spend get made weeks before any bill arrives. And the people making those decisions are usually the furthest from the bill. Giving engineers a metric that shows up at decision time means giving them something more specific than a dollar total: a ratio between work produced and money spent.
Infrastructure cost alone tells you almost nothing. Cost per unit of work tells you whether an architecture decision actually improved anything.
The Denominator Changes Everything
"How much are we spending?" is the wrong question. "How much work are we getting per dollar?" is the one that actually optimizes for the right outcome.
Take your total spend and divide it by whatever your infrastructure is actually producing. For an API service that might be request count. For a data pipeline it could be gigabytes processed or queries completed. What you choose determines what you’re optimizing for, and it’s the part most teams stop paying attention to.
Spend and unit cost can move in opposite directions, and that distinction matters. A team that doubles its infrastructure budget but triples throughput improved its efficiency even though the bill went up. The inverse is the real failure mode: spend drops while the cost of each unit of work rises. At the quarterly review the cost chart looks good, and nobody notices the system is doing less for every dollar. Total spend can’t distinguish between those outcomes. Cost per operation can.
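The two outcomes above can be sketched in a few lines. The dollar amounts and request counts are hypothetical, chosen only to make the comparison concrete:

```python
def cost_per_unit(spend_dollars, units_of_work):
    """Unit cost: total spend divided by the work the infrastructure produced."""
    return spend_dollars / units_of_work

# Hypothetical quarters. Q2 doubles spend but triples throughput.
q1 = cost_per_unit(10_000, 50_000_000)   # $0.0002 per request
q2 = cost_per_unit(20_000, 150_000_000)  # ~$0.000133 per request
assert q2 < q1  # the bill went up, yet each unit of work got cheaper

# The failure mode: spend drops while unit cost rises.
q3 = cost_per_unit(8_000, 20_000_000)    # $0.0004 per request
assert q3 > q1  # the cost chart looks good; the system does less per dollar
```

A total-spend chart ranks Q3 as the best quarter of the three; the unit-cost ratio ranks it worst.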
Divide the EC2 spend for that service by its request count over the same period. If you’re on AWS, load balancer metrics often provide a reasonable proxy for request volume, but ideally you should track the unit that actually represents work for that system. When you change instance types, watch whether that ratio falls. If it doesn’t, the architectural change didn’t help.
Better yet, measure this before production. Benchmark the amount of work an instance can deliver under realistic load, then divide its hourly cost by the work completed, across candidate families, generations, and sizes. Sometimes a c7a will outperform a c7i or a c8g for a given workload. The instance with the lowest cost per unit of work wins.
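A pre-production comparison can be as simple as the sketch below. The prices and throughput figures are illustrative stand-ins, not current AWS rates or real benchmark results; plug in your own measured numbers:

```python
# Hypothetical benchmark results: sustained requests/hour under realistic
# load, plus illustrative on-demand hourly prices for each candidate.
benchmarks = {
    "c7i.xlarge": {"price_hr": 0.1785, "req_hr": 1_300_000},
    "c7a.xlarge": {"price_hr": 0.2053, "req_hr": 1_700_000},
    "c8g.xlarge": {"price_hr": 0.1596, "req_hr": 1_250_000},
}

def cost_per_million_requests(price_hr, req_hr):
    # Cost per unit of work, scaled to a readable magnitude.
    return price_hr / req_hr * 1_000_000

ranked = sorted(
    benchmarks.items(),
    key=lambda kv: cost_per_million_requests(**kv[1]),
)
winner, stats = ranked[0]  # lowest cost per unit of work wins
```

With these made-up numbers the pricier c7a wins on cost per request, which is exactly the kind of result the hourly rate alone would have hidden.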
For teams running AI inference workloads, the same logic surfaces as cost per token. The spread between efficient and inefficient model selection compounds quickly at scale, and $/hr tells you just as little about cost efficiency there as it does on EC2.
Three Architectures, One Right Answer for Your Workload
Instance architecture is one of the earliest decisions that affects cost per operation, and it’s one many teams overlook. Intel has been the default for years. AMD closed the gap and sometimes surpasses Intel, and Arm processors like Graviton introduced a third architecture that often shifts the economics entirely.
Current-generation on-demand pricing puts Graviton roughly 20% below Intel, with AMD running slightly above Intel. Think of hourly pricing as the cost of gas per gallon. Benchmarking is miles per gallon: how far each gallon will actually take you.
Suppose Graviton is 20% cheaper per hour than Intel. That looks like a win before you measure anything. But if Intel handles 30% more requests per hour on your particular workload, Intel's cost per request comes in lower than Graviton's, despite the higher hourly rate. You're paying more per hour and getting a better deal on the work that actually matters. Conversely, if AMD's floating-point throughput is 30% higher on a compute-intensive job, AMD could win the cost-per-operation comparison even though it costs more per hour than Intel.
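The arithmetic behind that example, with a hypothetical $1.00/hr Intel baseline to keep the numbers readable:

```python
intel_price = 1.00        # hypothetical Intel $/hr baseline
graviton_price = 0.80     # Graviton, 20% cheaper per hour

graviton_req_hr = 1_000_000
intel_req_hr = 1_300_000  # Intel handles 30% more requests on this workload

intel_cpr = intel_price / intel_req_hr          # ~$0.00000077 per request
graviton_cpr = graviton_price / graviton_req_hr  # $0.00000080 per request
assert intel_cpr < graviton_cpr  # higher hourly rate, lower cost per request
```

Flip the throughput numbers and Graviton wins again, which is the point: the hourly price fixes the numerator, but only the benchmark supplies the denominator.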
AWS benchmarks each instance generation against its own predecessor. The cross-architecture comparisons that appeared when Graviton2 launched, back when a new instruction set needed to earn its place, don't exist in current vendor publications. AMD doesn't appear in Graviton marketing comparisons at all. The spec sheet won't answer the question you're actually asking, which is why you have to run the test.
The hourly rate is a starting point for deciding what to test. It is not the answer the test would give you.
How to Actually Measure
Run your workload on each architecture you’re evaluating under realistic load: traffic that resembles production and reflects the utilization levels where the system spends most of its time, not just peak. Record the throughput metric that represents work for that service and divide the hourly instance cost for the run by that throughput. That gives you cost per operation for each architecture. Lowest number wins. I already mentioned gas and MPG, so for the electric crowd: think watt-hours per mile. 400 Wh/mi blasting down the highway, 200 cruising surface streets.
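Weighting the measurement by where the system actually spends its time can be sketched like this. The utilization profile and throughput figures are hypothetical; substitute your own measured tiers:

```python
# Hypothetical profile: measured throughput (req/s) at each utilization tier,
# weighted by the fraction of the day the system spends there. Benchmarking
# only at peak overweights the hours that matter least.
profile = [  # (share_of_time, req_per_sec)
    (0.70, 400),   # steady-state cruise
    (0.25, 900),   # busy hours
    (0.05, 1500),  # peak
]

def weighted_req_per_hour(profile):
    return sum(share * rps for share, rps in profile) * 3600

def cost_per_operation(price_hr, profile):
    # Hourly instance cost divided by utilization-weighted throughput.
    return price_hr / weighted_req_per_hour(profile)
```

A box that shines at peak but idles expensively the other 95% of the day loses this comparison, and that is the honest result.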
Run it across all three instance families. For most general-purpose Linux services, Graviton tends to come out ahead as a finding, not a premise. For compute-intensive workloads with floating-point-heavy patterns, AMD is worth testing rather than dismissing because the sticker price is higher. The benchmark is the only thing that tells you which assumption was wrong.
The Stacking Math
Once measurement identifies the right architecture, the commitment layer is where savings compound. What follows assumes Graviton won your benchmark; if Intel or AMD won, the same arithmetic applies with those instances as the base.
On-demand Graviton instances run roughly 20% below comparable Intel instances. Apply a one-year Compute Savings Plan to that baseline and the effective rate drops further, landing around 40% below Intel on-demand. The discounts stack because they operate at different layers: Graviton pricing reflects Arm economics, while the Savings Plan discount reflects your commitment term. One doesn’t cancel out the other.
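The stacking works multiplicatively, not additively. A sketch with an assumed 25% Savings Plan discount (illustrative; actual rates vary by term, payment option, and region):

```python
intel_od = 1.00               # hypothetical Intel on-demand $/hr baseline
graviton_discount = 0.20      # Graviton runs ~20% below comparable Intel
savings_plan_discount = 0.25  # assumed 1-yr Compute Savings Plan discount

# Discounts multiply because they apply at different layers.
effective = intel_od * (1 - graviton_discount) * (1 - savings_plan_discount)
# 1.00 * 0.80 * 0.75 = 0.60, i.e. roughly 40% below Intel on-demand
total_discount = 1 - effective / intel_od
```

Note that 20% + 25% stacks to 40%, not 45%: the second discount applies to the already-reduced rate.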
If AMD won your benchmark, the same Savings Plan structure applies. The starting rate is higher, but if the performance delta was large enough to win the cost-per-operation comparison, the committed rate can still come out ahead of Graviton's. Measure before you commit; that sequence is what makes the math mean anything.
Compute Savings Plans are the right commitment vehicle for most teams, not Reserved Instances. Reserved Instances lock to a specific instance type and region. Compute Savings Plans cover your full EC2 footprint regardless of family or region, and extend to Lambda and Fargate. For teams likely to change instance families or regions within the year, that flexibility justifies the slightly lower maximum discount. Right-size the instance before committing to anything; a Savings Plan on an instance running at 20% utilization is waste at a discount, locked in for a year.
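The "waste at a discount" point has a denominator too. Dividing the hourly rate by utilization gives the price of the compute you actually use; the rates below are hypothetical:

```python
def effective_cost_per_useful_hour(price_hr, utilization):
    """What you pay for the compute you actually use, not the compute you rent."""
    return price_hr / utilization

committed = 0.12  # hypothetical committed Savings Plan rate, $/hr
on_demand = 0.16  # hypothetical on-demand rate for a right-sized instance

# A discounted instance at 20% utilization costs more per useful hour
# than a right-sized on-demand instance at 70%.
locked_in = effective_cost_per_useful_hour(committed, 0.20)   # $0.60
right_sized = effective_cost_per_useful_hour(on_demand, 0.70) # ~$0.23
assert locked_in > right_sized
```

Which is why the sequence matters: right-size first, then commit to the smaller footprint.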
How to Optimize Your Way Into a Different Problem
Right-sizing past the performance cliff is the most common self-inflicted wound. Cutting an API tier to a smaller instance because average CPU looks low misses the p99 story. The system spends most of its time cruising, until the moment it doesn’t. When that happens the instance runs out of headroom, latency spikes, and the savings show up neatly in the infrastructure report while the damage shows up somewhere harder to diagnose.
Don’t cut observability to trim the bill. Whatever your stack relies on for visibility is the feedback loop that tells you whether any of this is working. Without it, you won’t know if an architecture migration introduced a latency regression or whether last quarter’s changes are still holding. Treating observability as overhead is how the next cost surprise arrives with less information to diagnose it.
You have the measurement framework and the arithmetic to use it. Soon we'll talk about ASG configuration and instance diversification. Specifically, how to structure a fleet so a sound architecture decision doesn't get undercut by a default deployment, and how pricing models can be paired with the instances your benchmarks identify to compound the savings further.