Valtik Studios
Cloud Security · Critical · Updated 2026-04-17 · 31 min read

Cloud Security Incident Response: The Complete Playbook for AWS, Azure, and GCP

It's 2:14 AM. Your phone rings. Unusual outbound data transfer from a production EC2 instance. The cloud IR playbook you didn't write is now mission-critical. This is the complete cloud IR guide: pre-incident preparation (break-glass accounts, IR roles, logging architecture), the six-phase IR process, AWS / Azure / GCP-specific playbooks, common attack patterns in 2026, and when to bring in external forensics.

Tre Trebucchi · Founder, Valtik Studios. Penetration tester based in Connecticut, serving the US mid-market.

The 2 AM cloud IR call

Your phone rings at 2:14 AM. Slack blew up ten minutes earlier. The CloudWatch alarm fired on unusual outbound data transfer from a production EC2 instance. Your on-call engineer is trying to figure out if it's a false positive or something real. They've already lost 40 minutes checking logs because nobody set up the right IAM policies ahead of time, so they're waiting for the security team to grant them read-access to CloudTrail.

By the time the CTO is on the call at 3:30 AM, the attacker has already finished exfiltrating. The question now is scope. What did they get? Where else did they land? Are they still active? Your team doesn't have the answers because your cloud incident response playbook assumes you can log into AWS and look around. You can't. The attacker has been editing your IAM. The account roles you need to investigate no longer work.

This is cloud incident response in 2026. It's not on-prem IR in a new venue. It's a different discipline. The speed of the environment, the API-based attack surface, the blast radius of a compromised identity, the way logs work: all of it changes the playbook.

This post is the complete cloud IR playbook we walk through on AWS, Azure, and GCP incidents.

What makes cloud IR different

Five structural differences change how you respond.

Speed. API-driven infrastructure moves at machine speed. On-prem attackers might take days to escalate privileges. Cloud attackers can enumerate, pivot, and exfiltrate in minutes because every action is an API call.

Identity is everything. On-prem IR focuses on hosts. Cloud IR focuses on identity. An attacker with a compromised IAM credential is already everywhere the credential permits. There's no lateral movement to prevent. The damage is done.

The attacker can see + modify your tooling. CloudTrail, CloudWatch, Azure Activity Log, GCP Audit Logs. All API-accessible. A sophisticated attacker with IAM write access can turn off logging, delete audit trails, and hide their tracks. Your first containment step has to be preserving the evidence before the attacker destroys it.

Logs ≠ evidence. CloudTrail shows what API calls happened. It doesn't show what the attacker read from an S3 bucket unless you've enabled data events specifically. VPC Flow Logs are sampled. Many forensic questions you'd want to answer aren't in the default log set.

The shared responsibility boundary shifts constantly. Your responsibilities on EC2 differ from your responsibilities on Lambda, which differ again from your responsibilities on a managed database. An IR plan has to be layer-aware.

Pre-incident preparation

Most of the IR work happens before the incident. If you show up at 2 AM trying to figure out how to read CloudTrail for the first time, you've already lost.

1. The break-glass account

Every cloud account needs a break-glass identity. Not used for routine operations. Locked down, hardware MFA, stored physically. Used only when the primary admin access is compromised or locked out.

  • AWS. IAM user in the management account with AdministratorAccess, hardware FIDO2 key for MFA, credentials in a physical safe.
  • Azure. Emergency access account in the tenant, excluded from Conditional Access policies, hardware MFA.
  • GCP. Super Admin account with hardware MFA, separate from operational admins.

Test the break-glass quarterly. Rotate after every use.

2. IR-specific IAM roles

Your normal IAM policies don't give your IR team what they need in an incident. Pre-create IR roles:

  • Read-only cloud forensic role. Can read every log source, every IAM policy, every resource configuration. Cannot write anything. Assumable by the security team during an incident.
  • Quarantine role. Can stop instances, isolate security groups, disable IAM access keys, revoke sessions. Narrow but powerful. Assumable only during declared incidents.
  • Snapshot role. Can create snapshots of EBS volumes, memory dumps via SSM, and archive copies of S3 buckets for preservation.

Store the role ARNs in your IR runbook. When the incident fires, your team assumes the roles immediately.
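As a concrete sketch of what "assumes the roles immediately" looks like, the helper below turns runbook role ARNs into ready-to-paste assume-role commands. The ARNs, account ID, and incident-ID format are hypothetical placeholders, not real role names.

```python
# Sketch only: hypothetical role ARNs from the IR runbook. Replace with
# the real ARNs your team pre-created.
IR_ROLES = {
    "forensic-readonly": "arn:aws:iam::111111111111:role/ir-forensic-readonly",
    "quarantine": "arn:aws:iam::111111111111:role/ir-quarantine",
    "snapshot": "arn:aws:iam::111111111111:role/ir-snapshot",
}

def assume_role_command(role_key: str, incident_id: str) -> str:
    """Build an AWS CLI assume-role command for a pre-created IR role.

    The session name embeds the incident ID so every action taken under
    the role is attributable in CloudTrail.
    """
    arn = IR_ROLES[role_key]
    return (
        f"aws sts assume-role --role-arn {arn} "
        f"--role-session-name ir-{incident_id}-{role_key}"
    )

print(assume_role_command("forensic-readonly", "2026-001"))
```

Printing the command rather than executing it keeps the helper usable from any machine that has the CLI configured.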

3. Comprehensive logging

Before the incident you need:

  • CloudTrail / Azure Activity / GCP Audit Logs. All regions. All accounts. Centralized in a logging account the attacker can't touch.
  • CloudTrail Data Events for S3. Expensive but necessary. Without these, you cannot tell what the attacker read from a bucket.
  • CloudTrail Data Events for Lambda. If you run Lambda, these are the only way to see what functions got invoked.
  • VPC Flow Logs. All VPCs. Centralized.
  • GuardDuty / Defender for Cloud / Security Command Center. All three cloud providers have native threat detection. Enable it.
  • Application logs. Whatever your apps produce, centralized somewhere queryable.
  • DNS logs. Route 53 query logs, Azure DNS query logs, Cloud DNS logs.

4. Centralized logging account

All logs ship to a dedicated account / subscription / project with:

  • Write-only access from other accounts (no delete, no modify)
  • Lifecycle policies for cost management (hot tier 90 days, cold tier 1 year)
  • IAM tightly restricted
  • Cross-account access for forensic investigation

This is the single most important pre-incident control. Without it, attackers can destroy your audit trail.

5. IR runbook

Documented, tested, updated. Covers:

  • Role assignments (Incident Commander, Forensic Lead, Communications Lead, Recovery Lead, Legal Liaison)
  • Initial validation procedure
  • Break-glass activation procedure
  • IR role assumption steps
  • Containment options with specific IAM/resource commands
  • Preservation steps for common resource types
  • Notification decision tree
  • External resources (forensic firm, legal, insurance)

6. Forensic retainer

Relationship established with a cloud-specialist forensic firm before you need them. Mandiant, CrowdStrike Services, Kroll, Stroz Friedberg, Unit 42. Retainer isn't free but the discount on response-time-at-need is material.

7. Tabletop exercises

Semi-annually minimum. Run scenarios specific to cloud:

  • Compromised IAM access key
  • Compromised IAM role via instance metadata
  • Malicious Lambda function deployed
  • S3 bucket made public accidentally + now exposed
  • Ransomware on RDS
  • Azure tenant takeover via stolen admin credentials
  • GCP organization-level compromise

The six-phase cloud IR process

Phase 1. Detection and validation (0-15 minutes)

Alert fires. Multiple sources possible:

  • GuardDuty / Defender for Cloud / SCC finding
  • CloudWatch/Azure Monitor/Cloud Monitoring anomaly
  • User report (someone saw something weird)
  • Third-party notification (AWS Abuse, cloud provider security team, partner)
  • Unusual billing spike

First question. Is it real?

  • Validate the finding against context. Is this expected behavior? Planned testing? Known legitimate activity?
  • Check related logs for corroboration.
  • If ambiguous, treat as incident pending disposition.

Second question. How urgent?

  • Rank 1. Active ongoing attack, active data exfiltration, active ransomware encryption.
  • Rank 2. Confirmed compromise, attacker activity paused or unclear if still active.
  • Rank 3. Suspicious activity that might be compromise.
  • Rank 4. Finding that's concerning but not confirmed as incident.

Rank 1 and 2 kick off the full IR process immediately. Rank 3 gets 30 min of investigation before escalation. Rank 4 goes to the queue.
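The four-rank triage above can be encoded as a simple decision function. This is a minimal sketch; the field names (`active`, `compromise_confirmed`, `suspicious`) are invented for illustration and do not match any real GuardDuty / Defender / SCC schema.

```python
# Minimal triage sketch over an invented finding shape.
def triage_rank(finding: dict) -> int:
    """Map a validated finding to the four-rank scheme described above."""
    if finding.get("active"):
        return 1  # active attack: exfiltration or encryption in progress
    if finding.get("compromise_confirmed"):
        return 2  # confirmed compromise, activity paused or unclear
    if finding.get("suspicious"):
        return 3  # gets 30 minutes of investigation before escalation
    return 4      # concerning but unconfirmed; goes to the queue

assert triage_rank({"active": True}) == 1
assert triage_rank({}) == 4
```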

Phase 2. Activation (15-30 minutes)

Incident declared. Team assembled.

  • Incident Commander assigned. Single decision-maker.
  • Forensic Lead begins evidence preservation.
  • Communications Lead starts internal + external comms planning.
  • Recovery Lead starts recovery planning.
  • Legal Liaison activated, outside counsel considered.

Break-glass account used to assume IR roles.

Forensic Lead's first actions:

  • Lock down CloudTrail logging (turn off any ability to disable it)
  • Confirm logs are flowing to centralized account
  • Create snapshots of any resources that might be compromised (DON'T delete yet)
  • Enable additional data event logging (S3, Lambda) if not already on

External notifications:

  • Insurance carrier: open claim.
  • Forensic firm if on retainer: engage.
  • Law enforcement if legally required or strategically useful.
  • Cloud provider's security team for serious incidents.

Phase 3. Containment (30 minutes to 4 hours)

Stop the bleeding without destroying evidence.

Identity containment. If IAM was compromised:

  • Rotate compromised credentials immediately
  • Revoke active sessions (AWS: attach a deny policy conditioned on aws:TokenIssueTime, which is what the console's "Revoke active sessions" action does; Azure: sign-in session revocation via Entra; GCP: revoke OAuth tokens and sessions)
  • Disable compromised accounts
  • Check for created backdoor accounts (IAM users, roles, service principals created recently)
  • Check for modified IAM policies

Network containment. If a compute instance is compromised:

  • Change security group to allow only forensic access
  • Do NOT stop the instance (loses memory artifacts, may break forensic continuity)
  • Snapshot the EBS volume
  • Memory dump via SSM if possible
  • Tag as "quarantine"

Data containment. If S3 is the attack surface:

  • Block public access on the specific bucket
  • Enable S3 Object Lock if not already
  • Revoke cross-account access if that's the vector

Lambda containment. If a Lambda function is malicious or compromised:

  • Set function reserved concurrency to 0 (effectively disables without deleting)
  • Preserve the function code + configuration for forensics
  • Check for recent code changes + who made them

IAM containment. Most important for cloud incidents:

  • List IAM actions taken by suspicious identities in last 24-72 hours
  • Look for privilege escalation patterns
  • Look for newly assumed roles
  • Look for IAM changes that create persistence (new users, new access keys, modified trust policies)
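One way to work through the checklist above is to scan the last 24-72 hours of CloudTrail for event names that create persistence. The `eventName` and `userIdentity` keys match CloudTrail's actual record format; the event-name list is a starting point, not exhaustive, and the sample records are invented.

```python
# Sketch: scan CloudTrail events for IAM actions that commonly create
# persistence (new users, new keys, modified trust policies).
PERSISTENCE_EVENTS = {
    "CreateUser", "CreateAccessKey", "CreateLoginProfile", "CreateRole",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy",
    "UpdateAssumeRolePolicy",
}

def find_persistence_events(events: list[dict]) -> list[tuple[str, str]]:
    """Return (actor ARN, action) pairs for persistence-creating IAM changes."""
    return [
        (e.get("userIdentity", {}).get("arn", "unknown"), e["eventName"])
        for e in events
        if e.get("eventName") in PERSISTENCE_EVENTS
    ]

sample = [
    {"eventName": "CreateAccessKey",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/suspect"}},
    {"eventName": "DescribeInstances",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/ops"}},
]
assert find_persistence_events(sample) == [
    ("arn:aws:iam::111111111111:user/suspect", "CreateAccessKey"),
]
```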

Phase 4. Assessment (4 hours to 72 hours)

Forensic understanding of what happened.

Questions to answer:

  1. How did the attacker get in?
  2. How long were they in?
  3. What did they access?
  4. What did they exfiltrate?
  5. What did they modify?
  6. Where are their persistence mechanisms?
  7. Are they still active?

Initial access investigation

Common cloud initial access patterns:

  • Leaked credentials (GitHub, PasteBin, stealer logs)
  • Phishing of cloud administrator
  • Web application vulnerability leading to SSRF and metadata access
  • Compromised CI/CD pipeline with cloud credentials
  • Compromised SaaS-to-cloud integration
  • Public-facing service with RCE

Investigate each via:

  • CloudTrail / Activity Log: first suspicious API calls, source IP, user agent
  • VPC Flow Logs: first network indicators
  • Authentication logs: first login from unexpected location

Lateral movement investigation

Common cloud lateral movement:

  • IAM role assumption chains (attacker uses compromised role to assume other roles)
  • Cross-account role assumption if trust relationships exist
  • Service-to-service access via instance metadata
  • OIDC / SSO abuse
  • Service account key theft

Investigate via:

  • CloudTrail: chains of AssumeRole calls
  • IAM Access Advisor: services accessed by each compromised identity
  • Network traffic: unexpected cross-VPC or cross-account traffic
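The AssumeRole-chain investigation can be sketched as a graph walk over CloudTrail events. This is simplified: in real CloudTrail records an assumed role later appears as an sts `assumed-role` session ARN rather than the role ARN itself, so production code would normalize identities before walking the graph.

```python
# Sketch: rebuild role-assumption chains from AssumeRole events.
from collections import defaultdict

def assumption_chains(events: list[dict], start: str) -> list[list[str]]:
    """Depth-first walk of AssumeRole edges from a compromised identity."""
    edges = defaultdict(set)
    for e in events:
        if e.get("eventName") == "AssumeRole":
            edges[e["userIdentity"]["arn"]].add(e["requestParameters"]["roleArn"])
    chains, stack = [], [[start]]
    while stack:
        path = stack.pop()
        nxt = edges.get(path[-1], set()) - set(path)  # skip cycles
        if not nxt:
            chains.append(path)
        stack.extend(path + [role] for role in nxt)
    return chains

events = [
    {"eventName": "AssumeRole", "userIdentity": {"arn": "user/dev"},
     "requestParameters": {"roleArn": "role/ci"}},
    {"eventName": "AssumeRole", "userIdentity": {"arn": "role/ci"},
     "requestParameters": {"roleArn": "role/prod-admin"}},
]
assert assumption_chains(events, "user/dev") == [
    ["user/dev", "role/ci", "role/prod-admin"]
]
```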

Exfiltration investigation

Common exfil patterns:

  • Bulk S3 object reads (S3 Data Events)
  • Large outbound data to attacker-controlled cloud storage
  • DNS tunneling
  • Encryption key exposure allowing offline data decryption
  • Backup theft (stealing from backup location)
  • Replication abuse (attacker sets up replication to their account)

Investigate via:

  • S3 access logs + CloudTrail Data Events
  • VPC Flow Logs for outbound data
  • DNS query logs for tunneling patterns
  • KMS key usage logs
  • Snapshot + image exports from your account to unknown accounts
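As a sketch of the bulk-read check from the first bullet, the function below counts `GetObject` data events per identity and flags anything above a threshold. The threshold is an illustrative tuning knob and the sample data is invented.

```python
# Sketch: flag identities with bulk S3 reads in CloudTrail data events.
from collections import Counter

def bulk_readers(data_events: list[dict], threshold: int = 1000) -> dict:
    """Count GetObject events per identity; return those at/above threshold."""
    reads = Counter(
        e["userIdentity"]["arn"]
        for e in data_events
        if e.get("eventName") == "GetObject"
    )
    return {arn: n for arn, n in reads.items() if n >= threshold}

sample = [
    {"eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/exfil"}}
] * 1500
assert bulk_readers(sample) == {"arn:aws:iam::111111111111:user/exfil": 1500}
```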

Persistence investigation

Common cloud persistence:

  • New IAM users with access keys
  • IAM policy attached to existing legitimate user
  • Cross-account trust relationships created
  • OIDC identity provider added
  • Lambda function set to auto-trigger
  • Scheduled events creating backdoors
  • Modified account-level settings
  • Secondary MFA methods added to admin accounts

Investigate via:

  • CloudTrail: IAM-create and IAM-modify events in last 30 days
  • Current state snapshot compared to known-good configuration
  • Configuration drift analysis via AWS Config / Azure Policy / GCP SCC
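The known-good comparison reduces to a set diff. A minimal sketch, assuming you captured a baseline list of IAM users before the incident:

```python
# Sketch: diff the current IAM user set against a pre-incident baseline.
def iam_drift(baseline_users: set, current_users: set) -> dict:
    return {
        "added": sorted(current_users - baseline_users),    # candidate backdoors
        "removed": sorted(baseline_users - current_users),  # possible cover-up
    }

drift = iam_drift({"alice", "bob"}, {"alice", "bob", "backdoor-svc"})
assert drift["added"] == ["backdoor-svc"]
assert drift["removed"] == []
```

The same diff applies to roles, access keys, and trust policies; the hard part is having the baseline, which is why Config / Policy / SCC history matters.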

Phase 5. Eradication (timeline varies)

Remove attacker presence.

  • Rotate every credential the attacker might have touched
  • Delete attacker-created IAM entities
  • Revert unauthorized policy changes
  • Remove attacker-installed Lambda functions, EC2 instances, Azure resources, GCP workloads
  • Close any exposed attack surfaces (public buckets, open security groups)
  • Reset OIDC / SSO configurations if modified
  • Re-enroll MFA for all admins if MFA config was touched

Phase 6. Recovery + post-incident (timeline varies)

Restore clean operations.

  • Verify eradication via independent review
  • Rebuild compromised workloads from clean images
  • Test recovery before declaring all clear
  • Communicate to affected parties (customers, regulators, employees)
  • Document lessons learned
  • Update IR runbook based on what broke
  • Update detection rules based on indicators
  • Reassess controls

AWS-specific playbook

Compromised IAM user

  1. Deactivate the key: aws iam update-access-key --access-key-id X --status Inactive (deactivation preserves the key for forensics; delete it once the investigation allows)
  2. List activity: aws cloudtrail lookup-events --lookup-attributes AttributeKey=Username,AttributeValue=X --start-time <timestamp> (the flag takes a timestamp; pass one covering at least the last 7 days)
  3. Check for session creation: events matching GetFederationToken, GetSessionToken, AssumeRole
  4. Revoke sessions: detach policies, attach a deny-all policy, and wait for outstanding STS tokens to expire (session duration is configurable from 15 minutes to 12 hours)
  5. Snapshot affected resources
  6. Investigate scope

Compromised IAM role

  1. Attach explicit deny policy: aws iam put-role-policy --role-name X --policy-name EMERGENCY-DENY --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*"}]}'
  2. Cannot rotate credentials for assumed roles (they're temporary). Effective containment is the deny policy + identifying any credentials the attacker already assumed.
  3. List role assumptions: CloudTrail AssumeRole events
  4. Investigate chains
  5. Consider disabling the role entirely if no legitimate use
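For step 1, it's worth generating and sanity-checking the deny document in code rather than hand-editing JSON under pressure. A small sketch:

```python
# Sketch: generate the EMERGENCY-DENY document from step 1 and verify
# its shape before handing it to `aws iam put-role-policy`.
import json

def emergency_deny_policy() -> str:
    policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}],
    }
    return json.dumps(policy)

doc = json.loads(emergency_deny_policy())
assert doc["Statement"][0]["Effect"] == "Deny"
assert doc["Statement"][0]["Action"] == "*"
```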

Compromised EC2 instance

  1. Do NOT stop. Do NOT reboot.
  2. Snapshot EBS: aws ec2 create-snapshot --volume-id vol-X
  3. Tag snapshot for forensic chain of custody
  4. Collect memory via SSM:
```
aws ssm send-command \
  --document-name AWS-RunShellScript \
  --parameters 'commands=["dd if=/dev/mem of=/tmp/mem.raw"]' \
  --instance-ids i-X
```

Copy out for analysis. Note that modern kernels often block raw /dev/mem reads; a dedicated acquisition tool such as AVML or LiME is more reliable.

  5. Change security group to quarantine (only forensic access allowed)
  6. Investigate via CloudTrail + VPC Flow Logs + system logs

S3 bucket incident

  1. Block public access: aws s3api put-public-access-block --bucket X --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
  2. Enable Object Lock if not already (for future protection)
  3. Copy bucket to forensic preservation location
  4. Query CloudTrail Data Events for all GET/LIST activity in last 90 days
  5. Investigate every IP + identity that accessed bucket

Azure-specific playbook

Compromised user account

  1. Revoke sessions: Entra ID -> User -> Sign out from all sessions
  2. Disable account: Entra ID -> User -> Block sign-in
  3. Check for created applications/service principals in last 30 days
  4. Check for consent grants to new apps
  5. Review sign-in logs: Entra ID -> Sign-in logs, filter by user

Compromised application / service principal

  1. Disable app: Entra ID -> Enterprise applications -> Properties -> Enabled for users to sign-in = No
  2. Remove API permissions granted
  3. Rotate client secrets, remove client certificates
  4. Investigate what the app did: Azure Activity Log + diagnostics

Azure tenant root compromise

  1. Activate emergency access account
  2. Review global admin list - any unfamiliar accounts?
  3. Review Conditional Access policies - any suspicious exceptions?
  4. Review Entra ID audit log for recent admin actions
  5. If compromised: Microsoft 365 incident response engagement (internal team or Microsoft Incident Response)

GCP-specific playbook

Compromised service account

  1. Delete the compromised key: gcloud iam service-accounts keys delete KEY_ID --iam-account=X
  2. Disable the service account: gcloud iam service-accounts disable X
  3. Check audit logs: gcloud logging read 'protoPayload.authenticationInfo.principalEmail="X"'
  4. Look for created resources, new service accounts, modified IAM

Compromised user

  1. Revoke sessions via Google Workspace or Cloud Identity admin
  2. Force password change
  3. Review IAM bindings that user has
  4. Check project-level audit logs for actions

GCP organization root compromise

  1. Use break-glass super admin
  2. Review Organization Admin role assignments
  3. Audit log for organization-level changes
  4. Resource Manager for newly created projects
  5. Possibly engage Google Incident Response

Common attack patterns in 2026

IMDSv1 abuse

Legacy EC2 metadata service allows retrieving instance credentials without token. Any SSRF bug in a web app running on the instance leaks IAM credentials. Upgrade to IMDSv2, disable IMDSv1.
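Auditing for IMDSv1 exposure is a filter over `DescribeInstances` output. The `MetadataOptions` / `HttpTokens` field names match the EC2 API; the fleet data below is invented.

```python
# Sketch: list instances that still accept IMDSv1 (HttpTokens is not
# "required", so credential requests need no session token).
def imdsv1_exposed(instances: list[dict]) -> list[str]:
    return [
        i["InstanceId"]
        for i in instances
        if i.get("MetadataOptions", {}).get("HttpTokens") != "required"
    ]

fleet = [
    {"InstanceId": "i-aaa", "MetadataOptions": {"HttpTokens": "required"}},
    {"InstanceId": "i-bbb", "MetadataOptions": {"HttpTokens": "optional"}},
]
assert imdsv1_exposed(fleet) == ["i-bbb"]
```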

Stolen CI/CD credentials

Build pipelines pulling cloud credentials from environment variables. Compromised pipeline leaks those credentials. Fix: OIDC-based federated access from CI to cloud, no long-lived keys.

OAuth consent phishing

Attacker creates an app in their own tenant that requests broad permissions in the target tenant. A user clicks a consent link in a phishing email and grants the app access; the attacker-controlled app now has API access. Defend via Conditional Access + consent governance.

KMS key abuse

Attacker with IAM permissions uses your KMS keys to decrypt data they've exfiltrated. Or encrypts data you own and holds the decryption hostage. Monitor KMS key usage, enforce deny-by-default on decrypt actions.
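Monitoring KMS usage can start as a simple baseline comparison: flag any principal whose Decrypt volume in a window far exceeds its history. The event shape and multiplier below are illustrative assumptions, not a real KMS log schema.

```python
# Sketch: flag principals whose Decrypt count in the current window
# exceeds `factor` times their historical per-window baseline.
from collections import Counter

def anomalous_decrypts(events: list[dict], baseline: dict,
                       factor: int = 10) -> list[str]:
    counts = Counter(e["principal"] for e in events if e["action"] == "Decrypt")
    return [p for p, n in counts.items() if n > factor * baseline.get(p, 1)]

window = [{"principal": "svc-batch", "action": "Decrypt"}] * 50
assert anomalous_decrypts(window, {"svc-batch": 2}) == ["svc-batch"]
```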

Cross-account role trust abuse

Attacker finds an account in your organization with over-permissive cross-account trust. Assumes the role from their account. Audit trust relationships, enforce external ID conditions, principle of least trust.

Resource tagging abuse

Attacker creates resources in your account with tags matching your legitimate operations to avoid detection. Monitor creation events by source IP + identity, not just by tag.
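A sketch of the identity-plus-source-IP check: group creation events by who made them and from where, ignoring tags entirely. `eventName` and `sourceIPAddress` are real CloudTrail fields; the known-IP allowlist is a hypothetical stand-in for your corporate egress ranges.

```python
# Sketch: flag resource-creation events from unexpected source IPs,
# regardless of how the resources are tagged.
def suspicious_creations(events: list[dict],
                         known_ips: set) -> list[tuple[str, str]]:
    return [
        (e["userIdentity"]["arn"], e["sourceIPAddress"])
        for e in events
        if e["eventName"].startswith(("Create", "RunInstances"))
        and e["sourceIPAddress"] not in known_ips
    ]

events = [
    {"eventName": "RunInstances", "userIdentity": {"arn": "user/ops"},
     "sourceIPAddress": "203.0.113.9"},
    {"eventName": "CreateUser", "userIdentity": {"arn": "user/ci"},
     "sourceIPAddress": "10.0.0.5"},
]
assert suspicious_creations(events, {"10.0.0.5"}) == [("user/ops", "203.0.113.9")]
```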

The external forensic firm decision

When to bring in external:

  • Attacker sophistication indicators (tooling matches known APT)
  • Regulated data potentially exfiltrated
  • Litigation risk (breach of contract with customers, PII regulations)
  • Insurance claim requiring third-party validation
  • Skill gap (your team isn't cloud-forensic-capable)

When you can handle internally:

  • Low sophistication attack (automated attacker, known botnet)
  • Clear scope
  • Strong internal team with cloud + forensic skills
  • No regulated data exposure

External firms charge $450-$1200/hour for cloud IR work. A significant incident runs $100K-$500K in forensic fees. Budget for it.

The insurance component

Cyber insurance covers most cloud IR costs. Typically covered:

  • Forensic investigation
  • Legal counsel
  • Notification costs
  • Credit monitoring for affected customers
  • PR / crisis communications
  • Restoration costs

Typically not covered:

  • Your own employee time
  • Business interruption in certain policies
  • Regulatory fines in some jurisdictions
  • Pre-incident hardening

Engage the carrier within 24 hours of incident declaration. Their approved forensic vendors may differ from your retainer.

Working with us

We don't staff 24/7 SOC for active incident response. We do:

  • Pre-incident readiness engagements (IR plan, tabletop, IAM review, logging audit)
  • Post-incident retrospectives + remediation
  • Control implementation to prevent recurrence
  • Cloud security architecture reviews that harden against the common attacks

For active cloud incidents, we maintain partner relationships with cloud-specialist IR firms who handle first-responder work. We'll coordinate the handoff.

Valtik Studios. valtikstudios.com.

Tags: cloud security, incident response, AWS, Azure, GCP, cloud IR, cloud forensics, IAM, CloudTrail, complete guide
