What is FinOps?
The FinOps Foundation defines FinOps as an operational framework and cultural practice that enables organizations to get maximum business value from their cloud spend by bringing together engineering, finance, and business teams. The three maturity stages form the Crawl / Walk / Run model:
Crawl — visibility and accountability
Get basic cost visibility in place. This means tagging resources, establishing a cost allocation model, and getting the right people looking at the right data.
- Enable AWS Cost Explorer and configure cost allocation tags
- Identify the top 5 cost drivers in your account
- Establish a regular (at minimum monthly) cloud cost review meeting
- Assign cost ownership to teams/projects
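As a rough illustration of the "top 5 cost drivers" step, the sketch below totals cost per service from a Cost Explorer `GetCostAndUsage` response. The helper names (`top_cost_drivers`, `fetch_monthly_costs_by_service`) are hypothetical, and pagination (`NextPageToken`) is ignored for brevity.

```python
from collections import defaultdict
from decimal import Decimal


def top_cost_drivers(response: dict, n: int = 5) -> list:
    """Sum cost per group key across all time periods and return the n largest."""
    totals = defaultdict(Decimal)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            # Cost Explorer returns amounts as strings, so parse them as Decimal.
            totals[key] += Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def fetch_monthly_costs_by_service(start: str, end: str) -> dict:
    """Query Cost Explorer grouped by service (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the parsing helper works without AWS access

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},  # dates as YYYY-MM-DD
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
```

Calling `top_cost_drivers(fetch_monthly_costs_by_service("2024-01-01", "2024-04-01"))` would then give a ranked list to bring to the monthly review.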
Walk — optimization and process
Move from reactive to proactive. You have visibility; now act on it systematically.
- Implement rightsizing recommendations from Trusted Advisor / Compute Optimizer
- Purchase Reserved Instances or Savings Plans for stable baseline workloads
- Automate shutdown of non-production environments outside business hours
- Set budget alerts with meaningful thresholds
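To get a feel for what off-hours shutdown is worth, the arithmetic can be sketched as below. The 12-hours-a-day, 5-days-a-week schedule is an assumed example, and the figure covers compute hours only (attached EBS storage is still billed while instances are stopped).

```python
HOURS_PER_WEEK = 24 * 7  # 168


def schedule_savings_fraction(hours_per_day: float = 12, days_per_week: int = 5) -> float:
    """Fraction of instance-hours saved versus running 24/7."""
    running = hours_per_day * days_per_week
    if not 0 <= running <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit within one week")
    return 1 - running / HOURS_PER_WEEK
```

With the assumed schedule, instances run 60 of 168 hours, so roughly 64% of the compute hours go away.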
Run — continuous optimization and culture
FinOps is embedded in engineering workflows, not bolted on afterward.
- Unit economics: track cost per customer, per transaction, per feature
- Cloud costs are part of sprint planning and architecture reviews
- Automated anomaly detection triggers Slack/PagerDuty alerts before bills land
- Engineers can self-serve cost data without asking finance
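The anomaly-detection item can be approximated with a very simple check: compare today's spend with the trailing daily history. This is a minimal z-score sketch under assumed inputs, not how AWS Cost Anomaly Detection or any vendor tool works; `is_cost_anomaly` and the threshold of 3 are illustrative choices.

```python
from statistics import mean, stdev


def is_cost_anomaly(history: list, today: float, z_threshold: float = 3.0) -> bool:
    """Flag today's spend if it sits far above the trailing daily average."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu  # perfectly flat history: any increase stands out
    return (today - mu) / sigma > z_threshold
```

A scheduled job could run this per team or per service and post to Slack or PagerDuty when it returns `True`.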
Most organizations spend 18–24 months in “Walk” before reaching “Run.” That’s fine — consistent incremental improvement compounds significantly over time. Focus on the highest-impact items first rather than trying to do everything at once.
Key Cost Optimization Strategies
- Rightsizing
- Reserved Instances & Savings Plans
- Spot & Preemptible Instances
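Rightsizing means matching compute resources to actual workload requirements. Oversized instances are the most common and most immediately actionable source of waste.
How to find rightsizing opportunities: start from the Trusted Advisor and Compute Optimizer recommendations, then check them against actual CloudWatch utilization before resizing.
Common rightsizing patterns: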
| Scenario | Current | Recommended | Typical Savings |
|---|---|---|---|
| Dev/test workload | m5.xlarge | t3.large | 40–60% |
| Low-CPU web server | c5.2xlarge | c5.large | ~75% |
| Over-provisioned DB | r5.4xlarge | r5.2xlarge | ~50% |
| Idle NAT instance | c5.large | Managed NAT GW | Varies |
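One way to shortlist candidates like those in the table is to flag instances whose 95th-percentile CPU stays low. The sketch below assumes CPU samples have already been exported from CloudWatch; the `InstanceUsage` type, the instance IDs, and the 40% threshold are hypothetical. CPU alone is not sufficient, so memory and network usage should be checked before resizing.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceUsage:
    instance_id: str
    instance_type: str
    cpu_samples: list = field(default_factory=list)  # CPU utilization in percent


def p95(samples: list) -> float:
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def rightsizing_candidates(fleet: list, p95_cpu_threshold: float = 40.0) -> list:
    """Return IDs of instances whose p95 CPU is below the threshold."""
    return [
        inst.instance_id
        for inst in fleet
        if inst.cpu_samples and p95(inst.cpu_samples) < p95_cpu_threshold
    ]
```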
AWS Cost Management Tools
Cost Explorer
The primary AWS cost visualization and analysis tool. Slice costs by service, linked account, region, usage type, or any cost allocation tag. Use it to identify trends, anomalies, and the biggest cost drivers before acting on them.
AWS Budgets
Set cost, usage, or coverage thresholds with SNS/email alerts. Supports forecasted spend alerts — you get notified before you overspend, not after. Supports RI/Savings Plans coverage and utilization budgets.
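The difference between actual and forecasted alerts can be illustrated with a naive linear run-rate projection. This is only a sketch of the idea; AWS Budgets uses its own forecasting, and the function names and the 50/80/100% thresholds are assumptions.

```python
import calendar
from datetime import date


def forecast_month_end(spend_to_date: float, today: date) -> float:
    """Project month-end spend by extending the average daily spend so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def breached_thresholds(spend_to_date: float, budget: float, today: date,
                        thresholds=(0.5, 0.8, 1.0)) -> dict:
    """List which budget fractions are crossed by actual and by forecasted spend."""
    forecast = forecast_month_end(spend_to_date, today)
    return {
        "actual": [t for t in thresholds if spend_to_date >= t * budget],
        "forecasted": [t for t in thresholds if forecast >= t * budget],
    }
```

For example, $600 spent by June 10 against a $1,000 budget has only crossed the 50% actual threshold, but the projected $1,800 already crosses every forecasted threshold.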
Trusted Advisor
Checks your account against AWS best practices across cost optimization, security, fault tolerance, performance, and service limits. Cost checks include idle EC2, unattached EBS, underutilized RDS, and low-utilization load balancers.
Compute Optimizer
Uses ML to analyze CloudWatch metrics (the most recent 14 days by default) and recommend better-fitting configurations for EC2, Auto Scaling groups, EBS volumes, Lambda, ECS on Fargate, and RDS. More accurate than Trusted Advisor for compute recommendations.
FinOps Tooling: CloudHealth & Cloudability
Both tools sit above the native AWS Cost Explorer and are designed for multi-account, multi-cloud estates where native tooling becomes unwieldy.
CloudHealth by VMware
CloudHealth excels at policy-based governance and multi-cloud cost management across AWS, GCP, and Azure in a single pane.
Core use cases I use it for:
- Perspectives: Custom groupings (by business unit, product, team) that map onto your org structure rather than just AWS account hierarchy
- Policies & Actions: Automated alerts and (optionally) enforcement actions — e.g., notify owner when an untagged resource has been running for 7 days
- Rightsizing: The recommendation engine pulls CloudWatch data and blends it with RI/SP coverage
- Reports: Scheduled cost reports delivered to Slack or email for team leads
Key concepts:
- Asset: Any tracked cloud resource (EC2, S3, RDS, etc.)
- Perspective: A hierarchical grouping of assets by tag, account, or custom rule
- Policy: A rule that triggers an alert or action based on asset state, cost, or tag compliance
Cloudability by Apptio
Cloudability focuses on financial reporting accuracy, unit economics, and chargeback/showback workflows. Particularly strong in enterprises with mature IT financial management (ITFM) practices.
Core use cases:
- True Cost: Amortizes RI/SP fees and distributes shared costs (support plans, data transfer) across teams fairly
- Allocations: Map costs to cost centers, products, or teams using tag-based and rule-based allocation logic
- Budgeting and Forecasting: Forward-looking spend models with scenario planning
- Rightsizing: Recommendations with one-click report generation for executive-ready output
Choose Cloudability when:
- You need detailed showback/chargeback reports that satisfy finance teams
- Unit economics (cost per customer, cost per transaction) are a priority
- You integrate cloud costs into existing ITFM or Apptio tooling
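The idea of distributing shared costs across teams can be sketched as a proportional split. This is one common allocation method under assumed inputs, not a description of Cloudability's actual algorithm; `allocate_shared_cost` and the team names are hypothetical.

```python
def allocate_shared_cost(direct_costs: dict, shared_cost: float) -> dict:
    """Spread a shared cost across teams in proportion to their direct spend."""
    total = sum(direct_costs.values())
    if total <= 0:
        raise ValueError("direct costs must sum to a positive amount")
    return {
        team: cost + shared_cost * cost / total
        for team, cost in direct_costs.items()
    }
```

With direct spend of 600 for one team and 400 for another, a shared 100 (for example a support plan) is split 60/40, giving fully loaded costs of 660 and 440.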
Tagging Strategy for Cost Allocation
A tagging strategy is the foundation of everything in FinOps. Without consistent tags you cannot allocate costs, enforce ownership, or build accurate reports.
Define mandatory tags at the account level
Establish a core set of tags that every billable resource must carry. Enforce these via SCP (AWS Organizations) or Tag Policies.
| Tag Key | Example Values | Purpose |
|---|---|---|
| Environment | prod, staging, dev | Filter by lifecycle |
| Project | payments, platform, data-pipeline | Allocate to project |
| Team | backend, platform, data | Ownership and alerting |
| CostCenter | 1001, 2034 | Finance chargeback |
| Owner | alice@company.com | Point of contact |
| ManagedBy | terraform, cloudformation, manual | Audit and compliance |
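A tag compliance check against the mandatory set above might look like the sketch below. Treating the example values for Environment and ManagedBy as the complete allowed set is an assumption made for illustration, and `tag_violations` is a hypothetical helper rather than part of any AWS API.

```python
REQUIRED_TAGS = ("Environment", "Project", "Team", "CostCenter", "Owner", "ManagedBy")

# Assumed allowed values, taken from the example column of the table.
ALLOWED_VALUES = {
    "Environment": {"prod", "staging", "dev"},
    "ManagedBy": {"terraform", "cloudformation", "manual"},
}


def tag_violations(tags: dict) -> list:
    """Return a human-readable list of missing or invalid mandatory tags."""
    problems = []
    for key in REQUIRED_TAGS:
        value = tags.get(key, "").strip()
        if not value:
            problems.append(f"missing tag: {key}")
        elif key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
            problems.append(f"invalid value for {key}: {value}")
    return problems
```

A check like this can run in CI against planned resources or in a scheduled audit; SCPs and Tag Policies remain the enforcement mechanism.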
Common Cloud Waste Patterns
Idle and underutilized EC2 instances
Pattern: Instances running 24/7 with < 5% average CPU, typically forgotten dev/test environments or decommissioned services that were never terminated.
Fix: Tag dev instances with AutoShutdown=true and build a Lambda that stops/starts based on this tag on a schedule.
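A minimal sketch of the stop half of such a Lambda is shown below, assuming it is triggered by an evening EventBridge schedule and has permission to describe and stop instances. The function names are illustrative; a matching start handler would call `start_instances` on stopped instances in the morning.

```python
def instances_to_stop(reservations: list) -> list:
    """Pick running instances tagged AutoShutdown=true from describe_instances data."""
    ids = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            is_running = instance["State"]["Name"] == "running"
            if is_running and tags.get("AutoShutdown", "").lower() == "true":
                ids.append(instance["InstanceId"])
    return ids


def lambda_handler(event, context):
    """Stop every running instance that carries the AutoShutdown=true tag."""
    import boto3  # available in the Lambda runtime; imported lazily for local testing

    ec2 = boto3.client("ec2")
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:AutoShutdown", "Values": ["true"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ids = [i for page in pages for i in instances_to_stop(page["Reservations"])]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}
```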
Unattached EBS volumes
Pattern: EBS volumes left behind when EC2 instances are terminated (especially when Delete on Termination was not set). These can accumulate silently.
Fix: Find volumes in the available (unattached) state, snapshot any whose data may still be needed, and delete the rest.
Prevention: Always set delete_on_termination = true in EC2 launch configurations and Terraform root_block_device blocks.
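Unattached volumes show up with the state `available` in the EC2 API, so an audit can be sketched as follows. The helper names are hypothetical, and the sketch only reports candidates; deletion is left as a deliberate manual step.

```python
def summarize_unattached(volumes: list) -> tuple:
    """Return (volume IDs, total size in GiB) for volumes not attached to anything."""
    orphaned = [v for v in volumes if v["State"] == "available"]
    return [v["VolumeId"] for v in orphaned], sum(v["Size"] for v in orphaned)


def list_volumes() -> list:
    """Fetch all EBS volumes in the current region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the summary helper works offline

    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_volumes")
    return [v for page in paginator.paginate() for v in page["Volumes"]]
```

`summarize_unattached(list_volumes())` gives a quick per-region figure for how much storage is sitting idle.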
Idle load balancers
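Pattern: ALBs and NLBs that receive no traffic (e.g., old environments, migrated services) but continue to incur hourly charges plus LCU costs.
Fix: Check each load balancer's request and connection metrics in CloudWatch over the past few weeks, then delete the ones that serve no traffic once you have confirmed nothing depends on them.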
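One way to spot idle ALBs is to sum the `RequestCount` metric over a couple of weeks. The sketch below assumes a 14-day window and covers ALBs only (for NLBs a metric such as `ActiveFlowCount` in the `AWS/NetworkELB` namespace would be the analogue); the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_idle(datapoints: list) -> bool:
    """True when the window has no datapoints or every daily Sum is zero."""
    return all(dp.get("Sum", 0) == 0 for dp in datapoints)


def alb_request_datapoints(load_balancer_dimension: str, days: int = 14) -> list:
    """Fetch daily RequestCount sums for one ALB (needs boto3 and credentials)."""
    import boto3  # imported lazily so is_idle can be used without AWS access

    end = datetime.now(timezone.utc)
    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApplicationELB",
        MetricName="RequestCount",
        # The dimension value is the "app/<name>/<id>" suffix of the ALB ARN.
        Dimensions=[{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    return response["Datapoints"]
```

A load balancer created within the window also reports few or no datapoints, so the result is a prompt for review rather than a delete list.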
Old EBS snapshots and AMIs
Pattern: Snapshots created for backups or AMI builds accumulate over months/years. Each GB stored incurs ongoing cost. As an upper bound, 1,000 snapshots of a 50 GB volume could hold 50 TB; because EBS snapshots are incremental, the real figure depends on how much data changes between snapshots.
Fix: Implement a Data Lifecycle Manager (DLM) policy to automate snapshot creation and retention, rather than keeping manual snapshots indefinitely.
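Before a DLM policy takes over, existing manual snapshots still need an audit. The sketch below lists snapshots older than a retention window from `describe_snapshots`-shaped data; the 90-day default and the function name are assumptions. Snapshots backing a registered AMI cannot be deleted until the AMI is deregistered.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expired_snapshots(snapshots: list, retention_days: int = 90,
                      now: Optional[datetime] = None) -> list:
    """Return IDs of snapshots whose StartTime is older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # boto3 returns StartTime as a timezone-aware datetime.
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]
```

The input would typically come from `describe_snapshots(OwnerIds=["self"])` so that only snapshots owned by the account are considered.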
Overprovisioned RDS instances
Pattern: RDS instances sized for a peak load that rarely happens, or Multi-AZ enabled on dev/staging databases that can tolerate downtime during a failure.
Fix:
- Use Compute Optimizer RDS recommendations (available for MySQL and PostgreSQL on RDS)
- Disable Multi-AZ on non-production environments (saves ~50% of RDS cost)
- Use Aurora Serverless v2 for workloads with highly variable or unpredictable load
- Consider Aurora for production PostgreSQL/MySQL — better price/performance at scale
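The Multi-AZ item can be audited with a short filter over `describe_db_instances` output. The sketch assumes databases carry the Environment tag from the tagging strategy and that dev and staging are the non-production values; the function name is hypothetical.

```python
def multi_az_non_prod(db_instances: list, non_prod_envs=("dev", "staging")) -> list:
    """Return identifiers of Multi-AZ databases tagged as non-production."""
    flagged = []
    for db in db_instances:
        tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
        if db.get("MultiAZ") and tags.get("Environment") in non_prod_envs:
            flagged.append(db["DBInstanceIdentifier"])
    return flagged
```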
Data transfer and NAT Gateway costs
Pattern: NAT Gateway charges $0.045/GB of data processed (us-east-1 pricing) in addition to hourly fees. Lambda functions and containers making frequent calls to public endpoints (S3, DynamoDB, external APIs) through a NAT Gateway can generate surprisingly large bills.
Fix:
- Use VPC Endpoints (Gateway or Interface) for S3 and DynamoDB — no data transfer cost through NAT
- Use PrivateLink for other AWS services to avoid traversing the public internet
- Review the NAT Gateway CloudWatch metrics (for example BytesOutToDestination and BytesInFromSource in the AWS/NATGateway namespace) to identify high-volume gateways
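To see how quickly processing charges add up, the cost can be worked through as below. The per-GB rate comes from the text; the hourly rate of $0.045 and the 730-hour month are assumptions based on us-east-1 list pricing, so check current pricing for your region.

```python
NAT_PRICE_PER_GB = 0.045    # data processing rate quoted in the text
NAT_PRICE_PER_HOUR = 0.045  # assumed hourly rate (us-east-1 list price)
HOURS_PER_MONTH = 730       # common billing approximation


def nat_monthly_cost(gb_processed: float, gateways: int = 1) -> float:
    """Estimate monthly NAT Gateway cost: hourly fees plus data processing."""
    hourly = gateways * HOURS_PER_MONTH * NAT_PRICE_PER_HOUR
    processing = gb_processed * NAT_PRICE_PER_GB
    return round(hourly + processing, 2)
```

Under these assumptions, 10 TB (10,240 GB) through a single gateway costs about $493.65 a month, of which $460.80 is data processing that a gateway endpoint for S3 or DynamoDB traffic would avoid.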
Quick Reference: FinOps Checklist
Related Notes
AWS Reference
AWS CLI commands for Cost Explorer, CloudWatch, and resource management that feed directly into FinOps workflows.
Terraform
Enforce tagging standards and cost-conscious defaults (right instance types, GP3 volumes) through infrastructure code.
GCP & Azure
Cost optimization patterns for GCP committed use discounts and Azure reservations.
DevOps Overview
How FinOps practices integrate with engineering workflows, sprint planning, and CI/CD pipelines.