Valtik Studios
Cloud Security · Critical · Updated 2026-04-17 · 31 min read

Cloud Security Incident Response: The Complete Playbook for AWS, Azure, and GCP

It's 2:14 AM. Your phone rings. Unusual outbound data transfer from a production EC2 instance. The cloud IR playbook you didn't write is now mission-critical. This is the complete cloud IR guide: pre-incident preparation (break-glass accounts, IR roles, logging architecture), the six-phase IR process, AWS / Azure / GCP-specific playbooks, common attack patterns in 2026, and when to bring in external forensics.

Tre Trebucchi · Founder, Valtik Studios. Penetration tester based in Connecticut, serving the US mid-market.

The 2 AM cloud IR call

Your phone rings at 2:14 AM. Slack blew up ten minutes earlier. The CloudWatch alarm fired on unusual outbound data transfer from a production EC2 instance. Your on-call engineer is trying to figure out if it's a false positive or something real. They've already lost 40 minutes checking logs because nobody set up the right IAM policies ahead of time, so they're waiting for the security team to grant them read-access to CloudTrail.

By the time the CTO is on the call at 3:30 AM, the attacker has already finished exfiltrating. The question now is scope. What did they get? Where else did they land? Are they still active? Your team doesn't have the answers because your cloud incident response playbook assumes you can log into AWS and look around. You can't. The attacker has been editing your IAM. The account roles you need to investigate no longer work.

This is cloud incident response in 2026. It's not on-prem IR in a new venue. It's a different discipline. The speed of the environment, the API-based attack surface, the blast radius of a compromised identity, the way logs work: all of it changes the playbook.

This post is the complete cloud IR playbook we walk through on AWS, Azure, and GCP incidents.

What makes cloud IR different

Five structural differences change how you respond.

Speed. API-driven infrastructure moves at machine speed. On-prem attackers might take days to escalate privileges. Cloud attackers can enumerate, pivot, and exfiltrate in minutes because every action is an API call.

Identity is everything. On-prem IR focuses on hosts. Cloud IR focuses on identity. An attacker with a compromised IAM credential is already everywhere the credential permits. There's no lateral movement to prevent. The damage is done.

The attacker can see + modify your tooling. CloudTrail, CloudWatch, Azure Activity Log, GCP Audit Logs. All API-accessible. A sophisticated attacker with IAM write access can turn off logging, delete audit trails, and hide their tracks. Your first containment step has to be preserving the evidence before the attacker destroys it.

Logs ≠ evidence. CloudTrail shows what API calls happened. It doesn't show what the attacker read from an S3 bucket unless you've enabled data events specifically. VPC Flow Logs are sampled. Many forensic questions you'd want to answer aren't in the default log set.

The shared responsibility boundary shifts constantly. Your responsibilities on EC2 differ from your responsibilities on Lambda, which differ again from your responsibilities on a managed database. An IR plan has to be layer-aware.

Pre-incident preparation

Most of the IR work happens before the incident. If you show up at 2 AM trying to figure out how to read CloudTrail for the first time, you've already lost.

1. The break-glass account

Every cloud account needs a break-glass identity. Not used for routine operations. Locked down, hardware MFA, stored physically. Used only when the primary admin access is compromised or locked out.

  • AWS. IAM user in the management account with AdministratorAccess, hardware FIDO2 key for MFA, credentials in a physical safe.
  • Azure. Emergency access account in the tenant, excluded from Conditional Access policies, hardware MFA.
  • GCP. Super Admin account with hardware MFA, separate from operational admins.

Test the break-glass quarterly. Rotate after every use.

2. IR-specific IAM roles

Your normal IAM policies don't give your IR team what they need in an incident. Pre-create IR roles:

  • Read-only cloud forensic role. Can read every log source, every IAM policy, every resource configuration. Cannot write anything. Assumable by the security team during an incident.
  • Quarantine role. Can stop instances, isolate security groups, disable IAM access keys, revoke sessions. Narrow but powerful. Assumable only during declared incidents.
  • Snapshot role. Can create snapshots of EBS volumes, memory dumps via SSM, and archive copies of S3 buckets for preservation.

Store the role ARNs in your IR runbook. When the incident fires, your team assumes the roles immediately.
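As a concrete sketch of what "assumes the roles immediately" looks like, the helper below turns runbook role ARNs into ready-to-paste assume-role commands. The ARNs, account ID, and incident-ID format are hypothetical placeholders, not real role names.

```python
# Sketch only: hypothetical role ARNs from the IR runbook. Replace with
# the real ARNs your team pre-created.
IR_ROLES = {
    "forensic-readonly": "arn:aws:iam::111111111111:role/ir-forensic-readonly",
    "quarantine": "arn:aws:iam::111111111111:role/ir-quarantine",
    "snapshot": "arn:aws:iam::111111111111:role/ir-snapshot",
}

def assume_role_command(role_key: str, incident_id: str) -> str:
    """Build an AWS CLI assume-role command for a pre-created IR role.

    The session name embeds the incident ID so every action taken under
    the role is attributable in CloudTrail.
    """
    arn = IR_ROLES[role_key]
    return (
        f"aws sts assume-role --role-arn {arn} "
        f"--role-session-name ir-{incident_id}-{role_key}"
    )

print(assume_role_command("forensic-readonly", "2026-001"))
```

Printing the command rather than executing it keeps the helper usable from any machine that has the CLI configured.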

3. Comprehensive logging

Before the incident you need:

  • CloudTrail / Azure Activity / GCP Audit Logs. All regions. All accounts. Centralized in a logging account the attacker can't touch.
  • CloudTrail Data Events for S3. Expensive but necessary. Without these, you cannot tell what the attacker read from a bucket.
  • CloudTrail Data Events for Lambda. If you run Lambda, these are the only way to see what functions got invoked.
  • VPC Flow Logs. All VPCs. Centralized.
  • GuardDuty / Defender for Cloud / Security Command Center. All three cloud providers have native threat detection. Enable it.
  • Application logs. Whatever your apps produce, centralized somewhere queryable.
  • DNS logs. Route 53 query logs, Azure DNS query logs, Cloud DNS logs.

4. Centralized logging account

All logs ship to a dedicated account / subscription / project with:

  • Write-only access from other accounts (no delete, no modify)
  • Lifecycle policies for cost management (hot tier 90 days, cold tier 1 year)
  • IAM tightly restricted
  • Cross-account access for forensic investigation

This is the single most important pre-incident control. Without it, attackers can destroy your audit trail.

5. IR runbook

Documented, tested, updated. Covers:

  • Role assignments (Incident Commander, Forensic Lead, Communications Lead, Recovery Lead, Legal Liaison)
  • Initial validation procedure
  • Break-glass activation procedure
  • IR role assumption steps
  • Containment options with specific IAM/resource commands
  • Preservation steps for common resource types
  • Notification decision tree
  • External resources (forensic firm, legal, insurance)

6. Forensic retainer

Relationship established with a cloud-specialist forensic firm before you need them. Mandiant, CrowdStrike Services, Kroll, Stroz Friedberg, Unit 42. Retainer isn't free but the discount on response-time-at-need is material.

7. Tabletop exercises

Semi-annually minimum. Run scenarios specific to cloud:

  • Compromised IAM access key
  • Compromised IAM role via instance metadata
  • Malicious Lambda function deployed
  • S3 bucket made public accidentally + now exposed
  • Ransomware on RDS
  • Azure tenant takeover via stolen admin credentials
  • GCP organization-level compromise

The six-phase cloud IR process

Phase 1. Detection and validation (0-15 minutes)

Alert fires. Multiple sources possible:

  • GuardDuty / Defender for Cloud / SCC finding
  • CloudWatch/Azure Monitor/Cloud Monitoring anomaly
  • User report (someone saw something weird)
  • Third-party notification (AWS Abuse, cloud provider security team, partner)
  • Unusual billing spike

First question. Is it real?

  • Validate the finding against context. Is this expected behavior? Planned testing? Known legitimate activity?
  • Check related logs for corroboration.
  • If ambiguous, treat as incident pending disposition.

Second question. How urgent?

  • Rank 1. Active ongoing attack, active data exfiltration, active ransomware encryption.
  • Rank 2. Confirmed compromise, attacker activity paused or unclear if still active.
  • Rank 3. Suspicious activity that might be compromise.
  • Rank 4. Finding that's concerning but not confirmed as incident.

Rank 1 and 2 kick off the full IR process immediately. Rank 3 gets 30 min of investigation before escalation. Rank 4 goes to the queue.
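The four-rank triage above can be encoded as a simple decision function. This is a minimal sketch; the field names (`active`, `compromise_confirmed`, `suspicious`) are invented for illustration and do not match any real GuardDuty / Defender / SCC schema.

```python
# Minimal triage sketch over an invented finding shape.
def triage_rank(finding: dict) -> int:
    """Map a validated finding to the four-rank scheme described above."""
    if finding.get("active"):
        return 1  # active attack: exfiltration or encryption in progress
    if finding.get("compromise_confirmed"):
        return 2  # confirmed compromise, activity paused or unclear
    if finding.get("suspicious"):
        return 3  # gets 30 minutes of investigation before escalation
    return 4      # concerning but unconfirmed; goes to the queue

assert triage_rank({"active": True}) == 1
assert triage_rank({}) == 4
```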

Phase 2. Activation (15-30 minutes)

Incident declared. Team assembled.

  • Incident Commander assigned. Single decision-maker.
  • Forensic Lead begins evidence preservation.
  • Communications Lead starts internal + external comms planning.
  • Recovery Lead starts recovery planning.
  • Legal Liaison activated, outside counsel considered.

Break-glass account used to assume IR roles.

Forensic Lead's first actions:

  • Lock down CloudTrail logging (turn off any ability to disable it)
  • Confirm logs are flowing to centralized account
  • Create snapshots of any resources that might be compromised (DON'T delete yet)
  • Enable additional data event logging (S3, Lambda) if not already on

External notifications:

  • Insurance carrier: open claim.
  • Forensic firm if on retainer: engage.
  • Law enforcement if legally required or strategically useful.
  • Cloud provider's security team for serious incidents.

Phase 3. Containment (30 minutes to 4 hours)

Stop the bleeding without destroying evidence.

Identity containment. If IAM was compromised:

  • Rotate compromised credentials immediately
  • Revoke active sessions (AWS: attach a deny policy conditioned on aws:TokenIssueTime, which is what the console's "Revoke active sessions" action does; Azure: sign-in session revocation via Entra; GCP: revoke OAuth tokens and sessions)
  • Disable compromised accounts
  • Check for created backdoor accounts (IAM users, roles, service principals created recently)
  • Check for modified IAM policies

Network containment. If a compute instance is compromised:

  • Change security group to allow only forensic access
  • Do NOT stop the instance (loses memory artifacts, may break forensic continuity)
  • Snapshot the EBS volume
  • Memory dump via SSM if possible
  • Tag as "quarantine"

Data containment. If S3 is the attack surface:

  • Block public access on the specific bucket
  • Enable S3 Object Lock if not already
  • Revoke cross-account access if that's the vector

Lambda containment. If a Lambda function is malicious or compromised:

  • Set function reserved concurrency to 0 (effectively disables without deleting)
  • Preserve the function code + configuration for forensics
  • Check for recent code changes + who made them

IAM containment. Most important for cloud incidents:

  • List IAM actions taken by suspicious identities in last 24-72 hours
  • Look for privilege escalation patterns
  • Look for newly assumed roles
  • Look for IAM changes that create persistence (new users, new access keys, modified trust policies)
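One way to work through the checklist above is to scan the last 24-72 hours of CloudTrail for event names that create persistence. The `eventName` and `userIdentity` keys match CloudTrail's actual record format; the event-name list is a starting point, not exhaustive, and the sample records are invented.

```python
# Sketch: scan CloudTrail events for IAM actions that commonly create
# persistence (new users, new keys, modified trust policies).
PERSISTENCE_EVENTS = {
    "CreateUser", "CreateAccessKey", "CreateLoginProfile", "CreateRole",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy",
    "UpdateAssumeRolePolicy",
}

def find_persistence_events(events: list[dict]) -> list[tuple[str, str]]:
    """Return (actor ARN, action) pairs for persistence-creating IAM changes."""
    return [
        (e.get("userIdentity", {}).get("arn", "unknown"), e["eventName"])
        for e in events
        if e.get("eventName") in PERSISTENCE_EVENTS
    ]

sample = [
    {"eventName": "CreateAccessKey",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/suspect"}},
    {"eventName": "DescribeInstances",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/ops"}},
]
assert find_persistence_events(sample) == [
    ("arn:aws:iam::111111111111:user/suspect", "CreateAccessKey"),
]
```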

Phase 4. Assessment (4 hours to 72 hours)

Forensic understanding of what happened.

Questions to answer:

  1. How did the attacker get in?
  2. How long were they in?
  3. What did they access?
  4. What did they exfiltrate?
  5. What did they modify?
  6. Where are their persistence mechanisms?
  7. Are they still active?

Initial access investigation

Common cloud initial access patterns:

  • Leaked credentials (GitHub, PasteBin, stealer logs)
  • Phishing of cloud administrator
  • Web application vulnerability leading to SSRF and metadata access
  • Compromised CI/CD pipeline with cloud credentials
  • Compromised SaaS-to-cloud integration
  • Public-facing service with RCE

Investigate each via:

  • CloudTrail / Activity Log: first suspicious API calls, source IP, user agent
  • VPC Flow Logs: first network indicators
  • Authentication logs: first login from unexpected location

Lateral movement investigation

Common cloud lateral movement:

  • IAM role assumption chains (attacker uses compromised role to assume other roles)
  • Cross-account role assumption if trust relationships exist
  • Service-to-service access via instance metadata
  • OIDC / SSO abuse
  • Service account key theft

Investigate via:

  • CloudTrail: chains of AssumeRole calls
  • IAM Access Advisor: services accessed by each compromised identity
  • Network traffic: unexpected cross-VPC or cross-account traffic
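The AssumeRole-chain investigation can be sketched as a graph walk over CloudTrail events. This is simplified: in real CloudTrail records an assumed role later appears as an sts `assumed-role` session ARN rather than the role ARN itself, so production code would normalize identities before walking the graph.

```python
# Sketch: rebuild role-assumption chains from AssumeRole events.
from collections import defaultdict

def assumption_chains(events: list[dict], start: str) -> list[list[str]]:
    """Depth-first walk of AssumeRole edges from a compromised identity."""
    edges = defaultdict(set)
    for e in events:
        if e.get("eventName") == "AssumeRole":
            edges[e["userIdentity"]["arn"]].add(e["requestParameters"]["roleArn"])
    chains, stack = [], [[start]]
    while stack:
        path = stack.pop()
        nxt = edges.get(path[-1], set()) - set(path)  # skip cycles
        if not nxt:
            chains.append(path)
        stack.extend(path + [role] for role in nxt)
    return chains

events = [
    {"eventName": "AssumeRole", "userIdentity": {"arn": "user/dev"},
     "requestParameters": {"roleArn": "role/ci"}},
    {"eventName": "AssumeRole", "userIdentity": {"arn": "role/ci"},
     "requestParameters": {"roleArn": "role/prod-admin"}},
]
assert assumption_chains(events, "user/dev") == [
    ["user/dev", "role/ci", "role/prod-admin"]
]
```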

Exfiltration investigation

Common exfil patterns:

  • Bulk S3 object reads (S3 Data Events)
  • Large outbound data to attacker-controlled cloud storage
  • DNS tunneling
  • Encryption key exposure allowing offline data decryption
  • Backup theft (stealing from backup location)
  • Replication abuse (attacker sets up replication to their account)

Investigate via:

  • S3 access logs + CloudTrail Data Events
  • VPC Flow Logs for outbound data
  • DNS query logs for tunneling patterns
  • KMS key usage logs
  • Snapshot + image exports from your account to unknown accounts
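As a sketch of the bulk-read check from the first bullet, the function below counts `GetObject` data events per identity and flags anything above a threshold. The threshold is an illustrative tuning knob and the sample data is invented.

```python
# Sketch: flag identities with bulk S3 reads in CloudTrail data events.
from collections import Counter

def bulk_readers(data_events: list[dict], threshold: int = 1000) -> dict:
    """Count GetObject events per identity; return those at/above threshold."""
    reads = Counter(
        e["userIdentity"]["arn"]
        for e in data_events
        if e.get("eventName") == "GetObject"
    )
    return {arn: n for arn, n in reads.items() if n >= threshold}

sample = [
    {"eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/exfil"}}
] * 1500
assert bulk_readers(sample) == {"arn:aws:iam::111111111111:user/exfil": 1500}
```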

Persistence investigation

Common cloud persistence:

  • New IAM users with access keys
  • IAM policy attached to existing legitimate user
  • Cross-account trust relationships created
  • OIDC identity provider added
  • Lambda function set to auto-trigger
  • Scheduled events creating backdoors
  • Modified account-level settings
  • Secondary MFA methods added to admin accounts

Investigate via:

  • CloudTrail: IAM-create and IAM-modify events in last 30 days
  • Current state snapshot compared to known-good configuration
  • Configuration drift analysis via AWS Config / Azure Policy / GCP SCC
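The known-good comparison reduces to a set diff. A minimal sketch, assuming you captured a baseline list of IAM users before the incident:

```python
# Sketch: diff the current IAM user set against a pre-incident baseline.
def iam_drift(baseline_users: set, current_users: set) -> dict:
    return {
        "added": sorted(current_users - baseline_users),    # candidate backdoors
        "removed": sorted(baseline_users - current_users),  # possible cover-up
    }

drift = iam_drift({"alice", "bob"}, {"alice", "bob", "backdoor-svc"})
assert drift["added"] == ["backdoor-svc"]
assert drift["removed"] == []
```

The same diff applies to roles, access keys, and trust policies; the hard part is having the baseline, which is why Config / Policy / SCC history matters.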

Phase 5. Eradication (timeline varies)

Remove attacker presence.

  • Rotate every credential the attacker might have touched
  • Delete attacker-created IAM entities
  • Revert unauthorized policy changes
  • Remove attacker-installed Lambda functions, EC2 instances, Azure resources, GCP workloads
  • Close any exposed attack surfaces (public buckets, open security groups)
  • Reset OIDC / SSO configurations if modified
  • Re-enroll MFA for all admins if MFA config was touched

Phase 6. Recovery + post-incident (timeline varies)

Restore clean operations.

  • Verify eradication via independent review
  • Rebuild compromised workloads from clean images
  • Test recovery before declaring all clear
  • Communicate to affected parties (customers, regulators, employees)
  • Document lessons learned
  • Update IR runbook based on what broke
  • Update detection rules based on indicators
  • Reassess controls

AWS-specific playbook

Compromised IAM user

  1. Deactivate the key: aws iam update-access-key --access-key-id X --status Inactive (deactivation preserves the key for forensics; delete it once the investigation allows)
  2. List activity: aws cloudtrail lookup-events --lookup-attributes AttributeKey=Username,AttributeValue=X --start-time <timestamp> (the flag takes a timestamp; pass one covering at least the last 7 days)
  3. Check for session creation: events matching GetFederationToken, GetSessionToken, AssumeRole
  4. Revoke sessions: detach policies, attach a deny-all policy, and wait for outstanding STS tokens to expire (session duration is configurable from 15 minutes to 12 hours)
  5. Snapshot affected resources
  6. Investigate scope

Compromised IAM role

  1. Attach explicit deny policy: aws iam put-role-policy --role-name X --policy-name EMERGENCY-DENY --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*"}]}'
  2. Cannot rotate credentials for assumed roles (they're temporary). Effective containment is the deny policy + identifying any credentials the attacker already assumed.
  3. List role assumptions: CloudTrail AssumeRole events
  4. Investigate chains
  5. Consider disabling the role entirely if no legitimate use
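For step 1, it's worth generating and sanity-checking the deny document in code rather than hand-editing JSON under pressure. A small sketch:

```python
# Sketch: generate the EMERGENCY-DENY document from step 1 and verify
# its shape before handing it to `aws iam put-role-policy`.
import json

def emergency_deny_policy() -> str:
    policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}],
    }
    return json.dumps(policy)

doc = json.loads(emergency_deny_policy())
assert doc["Statement"][0]["Effect"] == "Deny"
assert doc["Statement"][0]["Action"] == "*"
```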

Compromised EC2 instance

  1. Do NOT stop. Do NOT reboot.
  2. Snapshot EBS: aws ec2 create-snapshot --volume-id vol-X
  3. Tag snapshot for forensic chain of custody
  4. Collect memory via SSM:
```
aws ssm send-command \
  --document-name AWS-RunShellScript \
  --parameters 'commands=["dd if=/dev/mem of=/tmp/mem.raw"]' \
  --instance-ids i-X
```

Copy out for analysis. Note that modern kernels often block raw /dev/mem reads; a dedicated acquisition tool such as AVML or LiME is more reliable.

  5. Change security group to quarantine (only forensic access allowed)
  6. Investigate via CloudTrail + VPC Flow Logs + system logs

S3 bucket incident

  1. Block public access: aws s3api put-public-access-block --bucket X --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
  2. Enable Object Lock if not already (for future protection)
  3. Copy bucket to forensic preservation location
  4. Query CloudTrail Data Events for all GET/LIST activity in last 90 days
  5. Investigate every IP + identity that accessed bucket

Azure-specific playbook

Compromised user account

  1. Revoke sessions: Entra ID -> User -> Sign out from all sessions
  2. Disable account: Entra ID -> User -> Block sign-in
  3. Check for created applications/service principals in last 30 days
  4. Check for consent grants to new apps
  5. Review sign-in logs: Entra ID -> Sign-in logs, filter by user

Compromised application / service principal

  1. Disable app: Entra ID -> Enterprise applications -> Properties -> Enabled for users to sign-in = No
  2. Remove API permissions granted
  3. Rotate client secrets, remove client certificates
  4. Investigate what the app did: Azure Activity Log + diagnostics

Azure tenant root compromise

  1. Activate emergency access account
  2. Review global admin list - any unfamiliar accounts?
  3. Review Conditional Access policies - any suspicious exceptions?
  4. Review Entra ID audit log for recent admin actions
  5. If compromised: Microsoft 365 incident response engagement (internal team or Microsoft Incident Response)

GCP-specific playbook

Compromised service account

  1. Delete the compromised key: gcloud iam service-accounts keys delete KEY_ID --iam-account=X
  2. Disable the service account: gcloud iam service-accounts disable X
  3. Check audit logs: gcloud logging read 'protoPayload.authenticationInfo.principalEmail="X"'
  4. Look for created resources, new service accounts, modified IAM

Compromised user

  1. Revoke sessions via Google Workspace or Cloud Identity admin
  2. Force password change
  3. Review IAM bindings that user has
  4. Check project-level audit logs for actions

GCP organization root compromise

  1. Use break-glass super admin
  2. Review Organization Admin role assignments
  3. Audit log for organization-level changes
  4. Resource Manager for newly created projects
  5. Possibly engage Google Incident Response

Common attack patterns in 2026

IMDSv1 abuse

Legacy EC2 metadata service allows retrieving instance credentials without token. Any SSRF bug in a web app running on the instance leaks IAM credentials. Upgrade to IMDSv2, disable IMDSv1.
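Auditing for IMDSv1 exposure is a filter over `DescribeInstances` output. The `MetadataOptions` / `HttpTokens` field names match the EC2 API; the fleet data below is invented.

```python
# Sketch: list instances that still accept IMDSv1 (HttpTokens is not
# "required", so credential requests need no session token).
def imdsv1_exposed(instances: list[dict]) -> list[str]:
    return [
        i["InstanceId"]
        for i in instances
        if i.get("MetadataOptions", {}).get("HttpTokens") != "required"
    ]

fleet = [
    {"InstanceId": "i-aaa", "MetadataOptions": {"HttpTokens": "required"}},
    {"InstanceId": "i-bbb", "MetadataOptions": {"HttpTokens": "optional"}},
]
assert imdsv1_exposed(fleet) == ["i-bbb"]
```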

Stolen CI/CD credentials

Build pipelines pulling cloud credentials from environment variables. Compromised pipeline leaks those credentials. Fix: OIDC-based federated access from CI to cloud, no long-lived keys.

OAuth consent phishing

Attacker creates an app in their own tenant that requests broad permissions in the target tenant. A user clicks a consent link in a phishing email and grants the app access; the attacker-controlled app now has API access. Defend via Conditional Access + consent governance.

KMS key abuse

Attacker with IAM permissions uses your KMS keys to decrypt data they've exfiltrated. Or encrypts data you own and holds the decryption hostage. Monitor KMS key usage, enforce deny-by-default on decrypt actions.
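Monitoring KMS usage can start as a simple baseline comparison: flag any principal whose Decrypt volume in a window far exceeds its history. The event shape and multiplier below are illustrative assumptions, not a real KMS log schema.

```python
# Sketch: flag principals whose Decrypt count in the current window
# exceeds `factor` times their historical per-window baseline.
from collections import Counter

def anomalous_decrypts(events: list[dict], baseline: dict,
                       factor: int = 10) -> list[str]:
    counts = Counter(e["principal"] for e in events if e["action"] == "Decrypt")
    return [p for p, n in counts.items() if n > factor * baseline.get(p, 1)]

window = [{"principal": "svc-batch", "action": "Decrypt"}] * 50
assert anomalous_decrypts(window, {"svc-batch": 2}) == ["svc-batch"]
```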

Cross-account role trust abuse

Attacker finds an account in your organization with over-permissive cross-account trust. Assumes the role from their account. Audit trust relationships, enforce external ID conditions, principle of least trust.

Resource tagging abuse

Attacker creates resources in your account with tags matching your legitimate operations to avoid detection. Monitor creation events by source IP + identity, not just by tag.
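A sketch of the identity-plus-source-IP check: group creation events by who made them and from where, ignoring tags entirely. `eventName` and `sourceIPAddress` are real CloudTrail fields; the known-IP allowlist is a hypothetical stand-in for your corporate egress ranges.

```python
# Sketch: flag resource-creation events from unexpected source IPs,
# regardless of how the resources are tagged.
def suspicious_creations(events: list[dict],
                         known_ips: set) -> list[tuple[str, str]]:
    return [
        (e["userIdentity"]["arn"], e["sourceIPAddress"])
        for e in events
        if e["eventName"].startswith(("Create", "RunInstances"))
        and e["sourceIPAddress"] not in known_ips
    ]

events = [
    {"eventName": "RunInstances", "userIdentity": {"arn": "user/ops"},
     "sourceIPAddress": "203.0.113.9"},
    {"eventName": "CreateUser", "userIdentity": {"arn": "user/ci"},
     "sourceIPAddress": "10.0.0.5"},
]
assert suspicious_creations(events, {"10.0.0.5"}) == [("user/ops", "203.0.113.9")]
```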

The external forensic firm decision

When to bring in external:

  • Attacker sophistication indicators (tooling matches known APT)
  • Regulated data potentially exfiltrated
  • Litigation risk (breach of contract with customers, PII regulations)
  • Insurance claim requiring third-party validation
  • Skill gap (your team isn't cloud-forensic-capable)

When you can handle internally:

  • Low sophistication attack (automated attacker, known botnet)
  • Clear scope
  • Strong internal team with cloud + forensic skills
  • No regulated data exposure

External firms charge $450-$1200/hour for cloud IR work. A significant incident runs $100K-$500K in forensic fees. Budget for it.

The insurance component

Cyber insurance covers most cloud IR costs. Typically covered:

  • Forensic investigation
  • Legal counsel
  • Notification costs
  • Credit monitoring for affected customers
  • PR / crisis communications
  • Restoration costs

Typically not covered:

  • Your own employee time
  • Business interruption in certain policies
  • Regulatory fines in some jurisdictions
  • Pre-incident hardening

Engage the carrier within 24 hours of incident declaration. Their approved forensic vendors may differ from your retainer.

Working with us

We don't staff 24/7 SOC for active incident response. We do:

  • Pre-incident readiness engagements (IR plan, tabletop, IAM review, logging audit)
  • Post-incident retrospectives + remediation
  • Control implementation to prevent recurrence
  • Cloud security architecture reviews that harden against the common attacks

For active cloud incidents, we maintain partner relationships with cloud-specialist IR firms who handle first-responder work. We'll coordinate the handoff.

Valtik Studios. valtikstudios.com.

Tags: cloud security, incident response, AWS, Azure, GCP, cloud IR, cloud forensics, IAM, CloudTrail, complete guide
