Prerequisites

This note assumes familiarity with core GCP concepts. WikiLinks like [[Cloud IAM]] and [[VPC Networks]] reference other notes in this digital garden—if those pages don't exist yet, treat them as topics to explore in the official documentation.


At a Glance

| Topic | What You’ll Learn | ACE Exam Weight |
| --- | --- | --- |
| Machine Families | Choosing N4/E2/C4/M3 for scenarios | High (Section 2.1) |
| Storage | PD vs Hyperdisk decision matrix | High (Section 2.1) |
| MIGs & Autoscaling | Designing for 99.99% availability | High (Sections 2.1, 3.1) |
| Security | Shielded/Confidential VMs, OS Login | Medium (Section 4.1–4.2) |
| Cost Optimization | Spot vs CUD vs On-Demand | Medium (Section 1.2) |

🚀 If You Only Have 30 Minutes Before Practice Exam

Must read (15 min):

  1. The Five Machine Families table + its Exam Strategy callout
  2. Managed Instance Groups (MIGs) table + Autoscaling Reactive vs Predictive
  3. The Cost Spectrum diagram

Quick skim (10 min):

  4. Hyperdisk Variants table + Storage Exam Strategy callout
  5. Security Threat Model comparison table

If time permits (5 min):

  6. Exam Traps Cheat Sheet at the bottom
  7. Practice Scenarios — try answering before checking


Why GCE Still Matters in a Serverless World

Here’s a dirty little secret that the “serverless everything” crowd doesn’t want you to think about: every GKE node is just a Compute Engine VM under the hood. Every Cloud Run container ultimately executes on Google’s virtualized infrastructure. When you peel back the abstraction layers of any “serverless” platform, you find VMs—and understanding those VMs gives you superpowers when things go sideways.

For those of us coming from a malware analysis background, GCE is particularly fascinating. It’s where you can spin up isolated sandbox environments, create honeypots with precise network controls, and build forensic workstations with exactly the hardware profile you need. But more broadly, GCE is the Infrastructure as a Service (IaaS) foundation that everything else in GCP sits on.

Think of it this way: if Google Kubernetes Engine is a luxury apartment complex with a concierge (Google) handling maintenance, GCE is buying raw land and building exactly the house you want. More work? Yes. More control? Absolutely.

Where GCE Fits in the Cloud Spectrum

GCE is “hiring a contractor to build a custom house.” You tell them the specs (vCPUs, memory, storage), they provide the land and foundation (hardware, virtualization), and you’re responsible for everything from the OS up—runtime, scaling, application code, and data.


The Hardware Deep Dive: Understanding Machine Families

Goal: Pick the right machine family in under 30 seconds for any exam scenario.

Maps to ACE 2.1: Planning and implementing compute resources — “Selecting appropriate compute choices for a given workload.”

The biggest conceptual shift in modern GCE (2024-2026) is understanding that not all vCPUs are created equal. Google now offers highly specialized machine families optimized for radically different workloads. Picking the wrong family is like bringing a scalpel to a demolition job—or a sledgehammer to surgery.

The Five Machine Families

graph TD
    subgraph "Machine Type Families"
        GP[🖥️ General Purpose<br/>N4, E2, Tau T2A]
        CO[⚡ Compute Optimized<br/>C4, C3]
        MO[🧠 Memory Optimized<br/>X4, M3]
        AO[🎮 Accelerator Optimized<br/>A3, A2 - GPUs]
        SO[💾 Storage Optimized<br/>Z3]
    end
    
    GP --> |"Best for"| GPuse[Web servers, Dev/Test<br/>Microservices, Databases]
    CO --> |"Best for"| COuse[HPC, Gaming servers<br/>Single-threaded apps, CI/CD]
    MO --> |"Best for"| MOuse[SAP HANA, In-memory DBs<br/>Large analytics workloads]
    AO --> |"Best for"| AOuse[ML Training, Rendering<br/>Scientific simulation]
    SO --> |"Best for"| SOuse[High-IOPS databases<br/>Real-time analytics]
    
    style GP fill:#4CAF50,color:white
    style CO fill:#2196F3,color:white
    style MO fill:#9C27B0,color:white
    style AO fill:#FF9800,color:white
    style SO fill:#607D8B,color:white

General Purpose (N4, E2, Tau T2A/T2D)

The workhorses. These are your go-to for 80% of workloads.

| Series | Architecture | Sweet Spot | Notes |
| --- | --- | --- | --- |
| N4 | 4th-gen Intel Xeon + Titanium offloads | Production web apps, balanced workloads | Newest generation, best price/performance |
| E2 | Shared-core or dedicated, with CPU bursting | Dev/test, cost-sensitive workloads | Up to 32 vCPUs, typically the lowest-cost GP option |
| Tau T2A | Arm-based (Ampere Altra) | Scale-out microservices, containerized apps | ~20% better price/performance for Arm-compatible workloads |
| Tau T2D | AMD EPYC (x86) | Scale-out web serving, containerized apps | x86 option for the Tau family |

Previous-Generation GP Options

You'll still encounter N2, N2D, and N1 in production environments and exam questions. These remain fully supported but are considered previous-generation for new deployments. N4 offers better performance per dollar for most new workloads.

Exam Strategy: General Purpose

The ACE exam loves to test whether you understand when to use E2 vs. N4. Key differentiator: E2 uses shared cores for smaller instances (cost savings through CPU bursting), while N4 provides dedicated cores. For consistent, production workloads, N4. For bursty dev/test, E2.
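
If you want to see which E2 and N4 shapes actually exist in a given zone before choosing, you can query the machine types directly. A small sketch; the zone and name filter below are just examples, adjust as needed:

# Compare available E2 and N4 standard shapes in one zone
gcloud compute machine-types list \
  --zones=us-central1-a \
  --filter="name~^e2-standard OR name~^n4-standard"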

Compute Optimized (C4, C3)

When raw CPU clock speed and single-threaded performance matter more than anything else.

  • C4: Built on Titanium with Intel Emerald Rapids. Highest per-core performance available.
  • C3: Previous generation, still excellent for HPC and gaming servers.

Use cases: Game servers (think Minecraft or Valheim hosting), High-Performance Computing (HPC) batch jobs, Electronic Design Automation (EDA), single-threaded legacy applications that can’t be parallelized.

Memory Optimized (X4, M3)

These are the beasts. We’re talking machines with multiple terabytes of RAM (up to tens of TB) and hundreds of vCPUs.

  • X4: Latest generation, Intel Sapphire Rapids, massive memory capacity.
  • M3: Excellent for SAP HANA certified workloads.

If you’re running in-memory databases like Redis clusters at scale, SAP HANA, or doing genomics analysis where the entire dataset needs to fit in RAM, this is your family.

Storage Optimized (Z3)

The newest family (2024+), designed for workloads that need extreme local SSD IOPS—think distributed databases like Cassandra, CockroachDB, or time-series databases.

Accelerator Optimized (A3, A2)

GPU-attached instances for ML/AI training, inference, rendering, and scientific simulation. The A3 series with NVIDIA H100 GPUs is the current flagship for large language model training.
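
For context (the ACE exam doesn't drill on GPU syntax), attaching a GPU to a general-purpose VM looks like the sketch below; A2/A3 shapes skip the `--accelerator` flag because their GPUs are part of the machine type. The names, zone, and GPU model here are placeholders:

# N1 VM with a single NVIDIA T4 attached; GPU VMs generally can't live-migrate,
# so the maintenance policy is set to TERMINATE
gcloud compute instances create my-gpu-vm \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --maintenance-policy=TERMINATE \
  --image-family=debian-12 \
  --image-project=debian-cloud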

The Titanium Evolution

Here’s where it gets interesting for systems engineers. Google’s Titanium is a custom silicon offload chip that handles I/O operations—network virtualization, storage I/O, security functions—off the host CPU.

Why Titanium matters:

  • More usable CPU cycles — Your vCPUs aren’t wasting time on packet processing
  • Consistent performance — I/O-heavy workloads don’t starve compute
  • Enhanced security — Hardware-isolated operations reduce attack surface

When Titanium matters for your choice:

  • ✅ Choose N4/C4 (Titanium-enabled) when: Network-intensive apps, storage-heavy databases, security-sensitive workloads
  • ⚠️ Titanium doesn’t matter when: Simple dev/test VMs, cost is primary concern (E2 is fine)

Think of it like having a dedicated mail room staff (Titanium) so your employees (vCPUs) don’t have to stop working every time a package arrives.

Exam Traps: Machine Families

  • Trap 1: Picking C4 (compute-optimized) when E2 would suffice. The exam often emphasizes cost efficiency over maximum performance.
  • Trap 2: Forgetting that Tau T2A is Arm-based. If the scenario mentions “existing x86 binaries,” T2A won’t work without recompilation.
  • Trap 3: Choosing memory-optimized (M3/X4) for a “large database” without checking if it’s actually memory-bound. Most databases are I/O-bound first.

Legacy Machine Types on the Exam

You'll still see N1, N2, and N2D in exam scenarios. These aren't deprecated, but for new deployments, the N4/C4/M3/X4 families offer better performance per dollar. The exam may present scenarios where you need to identify the appropriate generation for a migration or cost optimization question.


The Storage Layer: Why Hyperdisk is the 2026 Standard

Goal: Instantly distinguish when to use PD-Balanced, Regional PD, or Hyperdisk based on scenario keywords.

Maps to ACE 2.1: “Choosing the appropriate storage for Compute Engine (e.g., zonal Persistent Disk, regional Persistent Disk, Google Cloud Hyperdisk).”

The Critical Decision: Regional PD vs Hyperdisk

Before diving into details, memorize this decision rule:

| Scenario Keyword | Best Choice | Why |
| --- | --- | --- |
| “Cross-zone redundancy” / “survive zone failure” | Regional PD | Synchronously replicated across 2 zones |
| “Tune IOPS independently” / “SAN-like flexibility” | Hyperdisk | Decoupled capacity and performance |
| “Cost-effective general workload” | PD-Balanced | Good default, scales with size |
| “Maximum database IOPS” | Hyperdisk Extreme | Up to ~350k IOPS, independently tunable |

The Old World: Persistent Disk

Traditional Persistent Disk options still exist and you’ll encounter them:

| Type | IOPS | Throughput | Use Case |
| --- | --- | --- | --- |
| PD-Standard | Low (0.75 per GB) | Low | Boot disks, cold storage |
| PD-Balanced | Medium (6 per GB) | Medium | General workloads |
| PD-SSD | High (30 per GB) | High | Databases, high-performance apps |

The critical limitation: performance scales linearly with capacity. Want more IOPS? You need a bigger disk, even if you don’t need the space.
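
To make the coupling concrete: a 100GB PD-SSD tops out around 30 × 100 = 3,000 IOPS, so reaching ~9,000 IOPS means growing that same disk to roughly 300GB whether or not you need the space. The resize itself is online; disk name and zone below are placeholders:

# Grow the disk to raise its IOPS ceiling; extend the partition/filesystem
# inside the guest afterwards (e.g., growpart + resize2fs)
gcloud compute disks resize my-disk \
  --size=300GB \
  --zone=us-central1-a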

The New World: Hyperdisk

Hyperdisk is Google’s answer to enterprise SAN (Storage Area Network) flexibility. The revolutionary concept: decoupled performance and capacity.

graph LR
    subgraph "Traditional PD"
        PD[100GB PD-SSD] --> IOPS1[3,000 IOPS<br/>Fixed to size]
    end
    
    subgraph "Hyperdisk"
        HD[100GB Hyperdisk] --> IOPS2[3,000 to 350,000 IOPS<br/>You choose]
        HD --> TP[Throughput: You choose]
        HD --> CAP[Capacity: You choose]
    end
    
    style PD fill:#FFA726
    style HD fill:#66BB6A
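
A minimal sketch of that decoupling in practice: capacity, IOPS, and throughput each chosen independently. The names and numbers are illustrative, and the allowed ranges depend on disk type and size:

# 100GB of capacity with IOPS and throughput (MB/s) provisioned separately
gcloud compute disks create my-hyperdisk \
  --zone=us-central1-a \
  --type=hyperdisk-balanced \
  --size=100GB \
  --provisioned-iops=10000 \
  --provisioned-throughput=600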

Hyperdisk Variants

| Type | Max IOPS | Max Throughput | Sweet Spot |
| --- | --- | --- | --- |
| Hyperdisk Balanced | Up to ~160,000 | Up to ~2.4 GB/s | General workloads needing flexibility |
| Hyperdisk Extreme | Up to ~350,000 | Up to ~5 GB/s | Databases, SAP HANA |
| Hyperdisk Throughput | Scales with provisioned throughput | Up to ~2.4 GB/s | Analytics, large sequential reads/writes |
| Hyperdisk ML | Optimized for parallel reads | Massive parallelism | ML training data, multi-node access |

Hyperdisk Throughput I/O Profile

Hyperdisk Throughput is tuned for large sequential workloads rather than random I/O. IOPS scales with provisioned throughput (roughly 4 IOPS per MiB/s), making it cost-effective for analytics pipelines and data lakes.

Exam Strategy: Hyperdisk Variants

For the ACE exam, 90% of storage questions boil down to three choices: PD-Standard (cheap/cold data), PD-Balanced or Hyperdisk Balanced (default for web apps), and PD-SSD or Hyperdisk Extreme (databases needing high IOPS). Hyperdisk ML and Throughput are real products but rarely appear on the associate-level exam—save that knowledge for the Professional Cloud Architect track.

Storage Pools: The Cost Optimization Secret

Here’s where it gets financially interesting. Storage Pools allow you to provision a pool of Hyperdisk capacity and performance, then carve out individual disks from that pool.

Why does this matter? Thin provisioning.

Imagine you have 10 VMs that each might need 100GB and 10,000 IOPS during peak load, but typically use 20GB and 2,000 IOPS. Without pools:

  • 10 × 100GB × 10,000 IOPS = 1TB provisioned, 100,000 IOPS provisioned
  • Cost: $$

With Storage Pools:

  • Pool: 400GB capacity, 30,000 IOPS (sized for realistic aggregate demand)
  • Individual disks: 100GB each, but drawing from shared pool
  • Cost: $ (you pay for the pool, not theoretical maximums)
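
A hedged sketch of the same idea in gcloud. The pool and disk names are made up, the sizing mirrors the scenario above purely for illustration (real pools have minimum capacity requirements), and the exact flag names are worth verifying with `gcloud compute storage-pools create --help` before relying on them:

# Provision the shared pool once (capacity + performance)
gcloud compute storage-pools create my-pool \
  --zone=us-central1-a \
  --storage-pool-type=hyperdisk-balanced \
  --provisioned-capacity=400GB \
  --provisioned-iops=30000 \
  --provisioned-throughput=1024

# Carve an individual disk out of the pool; its 100GB draws from the shared capacity
gcloud compute disks create vm1-data \
  --zone=us-central1-a \
  --type=hyperdisk-balanced \
  --size=100GB \
  --storage-pool=my-pool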

Exam Traps: Storage

  • Trap 1: Ignoring Regional PD when “cross-zone durability” or “zone failure” is mentioned. Hyperdisk is flexible but doesn’t inherently replicate across zones.
  • Trap 2: Choosing Hyperdisk for a simple web server boot disk. PD-Balanced is usually the right default—Hyperdisk shines when you need to tune IOPS/throughput independently.
  • Trap 3: Forgetting that PD performance scales with size. If someone says “need more IOPS on PD-SSD,” the answer might be “increase disk size,” not “switch to Hyperdisk.”

Exam Strategy: Storage

The exam loves to trick you with scenarios asking about Balanced PD vs. Hyperdisk Balanced. Key insight: if the question mentions needing to scale IOPS independently of capacity, or talks about "enterprise SAN-like flexibility," the answer is Hyperdisk. If it's a simple "what's the default for a web server?" question, PD-Balanced is usually the cost-effective answer.


Life of a VM: From Boot to Production

Goal: Understand what happens at each boot stage and how Shielded VMs protect that process.

Maps to ACE 2.1: “Launching a compute instance (e.g., availability policy, SSH keys)” and ACE 3.1: “Working with snapshots and images.”

Understanding the VM lifecycle is crucial for both the exam and real-world troubleshooting.

The Boot Process

sequenceDiagram
    participant User
    participant GCE API
    participant Scheduler
    participant Host
    participant VM
    
    User->>GCE API: gcloud compute instances create
    GCE API->>Scheduler: Find suitable host
    Scheduler->>Host: Allocate resources
    Host->>VM: Create VM from image
    VM->>VM: UEFI/Shielded VM verification
    VM->>VM: Boot OS
    VM->>VM: Execute startup script
    VM->>VM: Guest Agent registers
    VM-->>User: RUNNING status

  1. Image Selection: You specify a boot disk image (public like debian-12 or custom).
  2. Host Placement: Google’s scheduler finds a host with available resources in your specified zone.
  3. Shielded VM Verification (if enabled, which is default): The boot process is verified against known-good measurements.
  4. OS Boot: The operating system loads.
  5. Startup Script Execution: Any metadata-specified startup scripts run.
  6. Guest Agent: The GCE guest agent registers, enabling features like OS Login and metadata queries.
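
When a VM stalls somewhere in that sequence, the serial console log is usually the fastest way to see which stage it died at (UEFI, OS boot, or a startup script). Instance name and zone below are placeholders:

# Dump the boot log for a stuck or misbehaving instance
gcloud compute instances get-serial-port-output my-vm \
  --zone=us-central1-a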

Shielded VM: Security by Default

Since ~2020, new VMs across all machine families are Shielded VMs by default (when created from a supported image). This is huge for security-conscious deployments.

What Shielded VMs provide:

| Feature | What It Does | Why It Matters |
| --- | --- | --- |
| Secure Boot | Verifies bootloader and kernel signatures | Prevents boot-level rootkits |
| vTPM | Virtual Trusted Platform Module | Enables measured boot, secure key storage |
| Integrity Monitoring | Compares boot measurements to baseline | Detects tampering in real time |

For Malware Analysts

The vTPM and integrity monitoring features are goldmines for incident response. If you suspect a VM has been compromised at the boot level, you can compare its current boot measurements against the baseline. A mismatch is a strong indicator of boot-level persistence mechanisms.
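
These features are on by default for supported images, but you can state them explicitly in automation (or re-enable one that was switched off). A minimal sketch:

# Explicitly request all three Shielded VM features at creation time
gcloud compute instances create my-hardened-vm \
  --zone=us-central1-a \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --shielded-secure-boot \
  --shielded-vtpm \
  --shielded-integrity-monitoring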

Startup Scripts: Automation at Boot

Startup scripts run every time a VM boots (not just first boot). They’re specified via instance metadata:

gcloud compute instances create my-vm \
  --metadata startup-script='#!/bin/bash
apt-get update
apt-get install -y nginx
systemctl enable nginx
systemctl start nginx'

Or reference a script in Cloud Storage:

gcloud compute instances create my-vm \
  --metadata startup-script-url=gs://my-bucket/startup.sh

Startup Script Pitfalls

Startup scripts run as root, and nothing downstream should assume the application is ready until they finish: the instance shows RUNNING as soon as the OS boots, but long-running scripts delay your app (and its health checks) from coming up. For complex provisioning, consider using startup scripts only to bootstrap a configuration management tool like Ansible or Puppet.


Resiliency: Designing for 99.99% SLA

Goal: Design a multi-zone architecture that survives zone failures using MIGs, health checks, and autoscaling.

Maps to ACE 2.1: “Creating an autoscaled managed instance group by using an instance template” and ACE 3.1: “Working with GKE node pools… autoscaling node pool.”

Exam prep materials emphasize Managed Instance Groups (MIGs) as the core of high availability, and they’re absolutely right.

The Hierarchy of Resilience

graph TD
    subgraph "GCP Project"
        subgraph "Region: us-central1"
            subgraph "Zone A"
                MIG1[Regional MIG<br/>Instance 1]
                HD1[Hyperdisk 1]
                MIG1 --> HD1
            end
            subgraph "Zone B"
                MIG2[Regional MIG<br/>Instance 2]
                HD2[Hyperdisk 2]
                MIG2 --> HD2
            end
            subgraph "Zone C"
                MIG3[Regional MIG<br/>Instance 3]
                HD3[Hyperdisk 3]
                MIG3 --> HD3
            end
        end
        LB[Global Load Balancer] --> MIG1
        LB --> MIG2
        LB --> MIG3
    end
    
    style LB fill:#4285F4,color:white
    style MIG1 fill:#34A853,color:white
    style MIG2 fill:#34A853,color:white
    style MIG3 fill:#34A853,color:white

Managed Instance Groups (MIGs)

A MIG is a collection of identical VMs created from the same Instance Template. Google manages them as a unit.

Key features:

| Feature | Zonal MIG | Regional MIG |
| --- | --- | --- |
| Distribution | Single zone | Across 3 zones |
| Failure domain | Zone failure = total outage | Zone failure = ~66% capacity remains |
| Use case | Stateful apps, cost-sensitive | Production, HA required |
| Availability | Lower (single-zone dependency) | Higher (can reach 99.99% with proper LB and health checks) |

Autohealing: Self-Repairing Infrastructure

MIGs can automatically detect and replace unhealthy instances using health checks:

gcloud compute health-checks create http my-health-check \
  --port 80 \
  --request-path /health \
  --check-interval 10s \
  --timeout 5s \
  --unhealthy-threshold 3

When an instance fails 3 consecutive health checks, the MIG automatically:

  1. Deletes the unhealthy instance
  2. Creates a new instance from the template
  3. Waits for it to pass health checks
  4. Adds it to the load balancer
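
Creating the health check alone doesn't enable autohealing; it has to be attached to the MIG, usually with an initial delay so freshly booted instances aren't replaced before they finish starting up:

# Attach the health check and give new instances 5 minutes before health-based replacement
gcloud compute instance-groups managed update my-mig \
  --region=us-central1 \
  --health-check=my-health-check \
  --initial-delay=300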

Autoscaling: Reactive vs. Predictive

Reactive autoscaling (traditional) responds to current metrics:

  • CPU utilization exceeds 70%? Add instances.
  • CPU drops below 40%? Remove instances.

Predictive autoscaling (the 2024+ game-changer) uses ML to forecast load based on historical patterns:

Predictive Autoscaling

If your workload has predictable patterns (daily traffic spikes at 9 AM, weekly batch jobs on Sundays), predictive autoscaling learns these patterns and pre-provisions instances before the load arrives. This eliminates the “cold start” lag where users experience slowness while new instances spin up.

# Predictive mode works with CPU-utilization-based autoscaling policies
gcloud compute instance-groups managed update-autoscaling my-mig \
  --region=us-central1 \
  --cpu-utilization-predictive-method=optimize-availability

Live Migration: Google’s Magic Trick

Here’s something that still amazes systems engineers: Google can move your running VM to a different physical host without any downtime.

When live migration happens:

  • Hardware maintenance scheduled
  • Software updates on the host
  • Resource rebalancing

What you experience: Maybe a brief network blip (sub-second). Your VM keeps running, processes stay alive, network connections persist.

Exam Strategy: Availability

The exam tests whether you understand the availability policy options:

  • MIGRATE (default): Live migration during maintenance
  • TERMINATE: VM is stopped, then restarted on new host

Spot VMs (preemptible) only support TERMINATE because they can be reclaimed at any time.
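
You can inspect or change the policy per instance; a small sketch (instance name and zone are placeholders):

# Check the current on-host-maintenance behavior
gcloud compute instances describe my-vm \
  --zone=us-central1-a \
  --format="value(scheduling.onHostMaintenance)"

# Switch to TERMINATE for workloads that can't tolerate live migration
gcloud compute instances set-scheduling my-vm \
  --zone=us-central1-a \
  --maintenance-policy=TERMINATE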


Security Deep Dive

Goal: Distinguish which security feature addresses which threat: boot tampering, memory snooping, or supply-chain risks.

Maps to ACE 4.1: “Managing Identity and Access Management (IAM)” and ACE 4.2: “Managing service accounts.”

For those of us from security backgrounds, this section is critical.

Security Threat Model

Before diving into features, understand what each one protects against:

| Feature | Threat Addressed | Attack Example |
| --- | --- | --- |
| Shielded VMs | Boot-level tampering | Bootkit/rootkit persistence in MBR/UEFI |
| Confidential VMs | Memory snooping | Hypervisor compromise, cold boot attacks, physical access |
| Trusted Images Policy | Supply-chain / image hygiene | Developer spins up VM from untrusted ISO with backdoor |
| OS Login | Key management sprawl | Leaked SSH keys, orphaned access after employee departure |

The Security Layer Cake

graph TB
    subgraph "Defense in Depth"
        L1[Trusted Images Policy<br/>What can be deployed?]
        L2[Shielded VMs<br/>Is the boot process clean?]
        L3[Confidential VMs<br/>Is memory encrypted?]
        L4[IAM & Firewall<br/>Who can access?]
        L5[OS Login<br/>How do users authenticate?]
    end
    
    L1 --> L2 --> L3 --> L4 --> L5
    
    style L1 fill:#FFCDD2
    style L2 fill:#F8BBD9
    style L3 fill:#E1BEE7
    style L4 fill:#C5CAE9
    style L5 fill:#BBDEFB

Confidential VMs: Memory Encryption in Use

Standard encryption protects data at rest (disk) and in transit (network). Confidential VMs add encryption for data in use—the data in RAM while your application processes it.

Technologies:

  • AMD SEV-SNP: Secure Encrypted Virtualization with Secure Nested Paging
  • Intel TDX: Trust Domain Extensions (newer option)

Why this matters: Even if an attacker compromises the hypervisor or has physical access to the host, they cannot read your VM’s memory contents.

gcloud compute instances create my-confidential-vm \
  --confidential-compute \
  --machine-type n2d-standard-4 \
  --zone us-central1-a

Confidential VM Limitations

Not all machine types support Confidential VMs. As of 2026, you’re primarily looking at N2D (AMD) and C3 (Intel TDX) families. The exam may ask which machine types support confidential computing—remember the AMD/Intel split.
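
As a hedged illustration of requesting a specific confidential computing technology: the `--confidential-compute-type` flag values and the supported machine-series and image pairings have shifted across gcloud releases, so verify against current docs before relying on this. Names, zone, and image below are placeholders:

# Intel TDX on a C3 shape; requires a Confidential VM-compatible image
gcloud compute instances create my-tdx-vm \
  --zone=us-central1-a \
  --machine-type=c3-standard-4 \
  --confidential-compute-type=TDX \
  --maintenance-policy=TERMINATE \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud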

Trusted Images Policy: Organizational Control

Prevent developers from spinning up VMs with arbitrary images (like that sketchy ISO someone found on the internet):

# Organization Policy
constraint: compute.trustedImageProjects
listPolicy:
  allowedValues:
    - projects/debian-cloud
    - projects/ubuntu-os-cloud
    - projects/my-golden-images

This is enforced at the Organization or Folder level, ensuring that even project owners can only use approved images.
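
Applying it would look roughly like this, assuming the YAML above is saved locally as trusted-images-policy.yaml and you hold an Organization Policy admin role; the organization ID is a placeholder:

# Set the trusted image projects constraint at the organization level
gcloud resource-manager org-policies set-policy trusted-images-policy.yaml \
  --organization=123456789012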

OS Login: Centralized Access Control

Instead of managing SSH keys manually on each VM, OS Login ties VM access to Cloud Identity / Google Workspace:

  1. User authenticates with their Google account
  2. Their permissions are checked against IAM
  3. A POSIX account is automatically created/managed on the VM
  4. SSH session is established

gcloud compute instances create my-vm \
  --metadata enable-oslogin=TRUE
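
Turning on the metadata flag is only half the setup: the user also needs an OS Login IAM role on the project (or instance). The account and project below are placeholders:

# Grant standard (non-root) SSH access through OS Login
gcloud projects add-iam-policy-binding my-project \
  --member="user:alice@example.com" \
  --role="roles/compute.osLogin"

# Use roles/compute.osAdminLogin instead if the user needs sudo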

Exam Strategy: Security

The exam frequently tests the difference between:

  • Project-wide SSH keys: Stored in project metadata, apply to all VMs
  • Instance-specific SSH keys: Stored in instance metadata
  • OS Login: IAM-based, no key management required

For enterprise/secure environments, OS Login is almost always the right answer.


Cost Optimization: A Decision Framework

Goal: Instantly classify a workload into Spot / CUD / On-Demand based on two questions: “Can it be interrupted?” and “Is usage predictable?”

Maps to ACE 1.2: “Managing billing configuration” and touches on ACE 2.1 (Spot VM instances).

The Cost Spectrum

Quick decision rules:

  • Use Spot VMs when: Batch jobs, CI/CD, fault-tolerant distributed systems, anything with checkpointing
  • Avoid Spot when: User-facing apps, stateful workloads without external state, SLA requirements
  • Use CUDs when: Always-on production, predictable baseline (even if traffic varies), 1+ year commitment is acceptable
  • Avoid CUDs when: Uncertain capacity needs, short-term projects, rapidly evolving architecture
  • Use On-Demand when: Unpredictable workloads, short experiments, need maximum flexibility
  • Avoid On-Demand when: Sustained usage over 25% of month (you’re leaving money on the table)

graph TD
    Start[New Workload] --> Q1{Can it tolerate<br/>interruption?}
    
    Q1 -->|Yes| Q2{Batch or<br/>continuous?}
    Q1 -->|No| Q3{Predictable<br/>usage?}
    
    Q2 -->|Batch| SPOT[🎰 Spot VMs<br/>Up to 80-91% off]
    Q2 -->|Continuous| Q3
    
    Q3 -->|Yes, 1+ year| CUD[📝 Committed Use<br/>Up to ~57% off]
    Q3 -->|Variable| SUD[🔄 Sustained Use<br/>Automatic up to ~30% off]
    Q3 -->|Unpredictable| OD[💰 On-Demand<br/>Full price, max flexibility]
    
    style SPOT fill:#4CAF50,color:white
    style CUD fill:#2196F3,color:white
    style SUD fill:#9C27B0,color:white
    style OD fill:#FF5722,color:white

Discount Mechanisms Explained

Sustained Use Discounts (Automatic)

If you run an instance for more than 25% of the month, Google automatically applies discounts:

| Usage | Discount |
| --- | --- |
| 25-50% of month | ~10% off |
| 50-75% of month | ~20% off |
| 75-100% of month | ~30% off |

No commitment required. This is pure “thank you for being a good customer.” One caveat: not every machine series is eligible. E2, for example, doesn’t receive sustained use discounts; its lower list price stands in for them.

Committed Use Discounts (CUDs)

Commit to 1 or 3 years of specific resource usage:

| Term | Discount |
| --- | --- |
| 1 year | Up to ~37% off |
| 3 years | Up to ~57% off |

Key insight: resource-based CUDs apply to resource types (vCPUs, memory) in a chosen region and machine series, not to specific VMs. You can resize or replace instances and move them between zones in that region and still consume the commitment; spend-based (“flexible”) CUDs go further, spanning multiple machine families and regions.
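
For reference, purchasing a resource-based commitment is itself a gcloud operation; the numbers and name below are invented, and the exact `--resources` syntax is worth confirming with `gcloud compute commitments create --help`:

# Commit to 8 vCPUs and 32GB of memory in one region for 12 months
gcloud compute commitments create my-commitment \
  --region=us-central1 \
  --plan=12-month \
  --resources=vcpu=8,memory=32GB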

Spot VMs (Formerly Preemptible)

Up to 80–91% off (depending on machine type and region), but Google can reclaim them with 30 seconds notice.

Ideal for:

  • Batch processing (render farms, ETL jobs)
  • CI/CD pipelines
  • Fault-tolerant distributed systems
  • Anything with checkpointing

Not suitable for:

  • User-facing applications
  • Stateful workloads without external state
  • Anything that can’t handle sudden termination

gcloud compute instances create my-spot-vm \
  --provisioning-model=SPOT \
  --instance-termination-action=DELETE

Exam Strategy: Cost

The exam loves scenarios like: “A company runs nightly data processing jobs that take 4 hours and can restart if interrupted. What’s the most cost-effective option?”

Answer: Spot VMs (fault-tolerant batch = Spot).

Contrast with: “A company runs a critical e-commerce site 24/7 with predictable traffic.”

Answer: 3-year CUD (always-on, predictable = committed use).

Right-Sizing: Don’t Overprovision

Google’s Active Assist analyzes your VM utilization and recommends right-sizing:

gcloud recommender recommendations list \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --project=my-project \
  --location=us-central1-a

Common findings:

  • “This n2-standard-8 averages 12% CPU. Consider n2-standard-2.”
  • “This VM has 32GB RAM but uses 4GB. Consider custom machine type.”

Quick Reference: gcloud Commands

Instance Management

# Create a basic instance
gcloud compute instances create my-vm \
  --zone=us-central1-a \
  --machine-type=n4-standard-4 \
  --image-family=debian-12 \
  --image-project=debian-cloud
 
# Create with Hyperdisk
gcloud compute instances create my-vm \
  --zone=us-central1-a \
  --machine-type=n4-standard-8 \
  --create-disk=name=my-data,size=200GB,type=hyperdisk-balanced,provisioned-iops=10000
 
# SSH into instance
gcloud compute ssh my-vm --zone=us-central1-a
 
# List running instances
gcloud compute instances list
 
# Stop/Start
gcloud compute instances stop my-vm --zone=us-central1-a
gcloud compute instances start my-vm --zone=us-central1-a
 
# Delete
gcloud compute instances delete my-vm --zone=us-central1-a

🧪 Micro-Lab: Your First GCE VM

Try this hands-on sequence (15 minutes):

  • Create an E2-micro VM with PD-Balanced boot disk
  • SSH in and run lsblk to see disk layout
  • In Console: Compute Engine → Disks → Resize the disk to +10GB
  • Back in SSH: run sudo growpart /dev/sda 1 and sudo resize2fs /dev/sda1
  • Note: the IOPS ceiling barely moved; PD performance scales with size, but +10GB only buys a small bump (check the disk metrics to confirm)
  • Delete the VM when done to avoid charges

Snapshots and Images

# Create snapshot
gcloud compute disks snapshot my-disk \
  --zone=us-central1-a \
  --snapshot-names=my-snapshot
 
# Create image from snapshot
gcloud compute images create my-image \
  --source-snapshot=my-snapshot
 
# Create image from disk
gcloud compute images create my-image \
  --source-disk=my-disk \
  --source-disk-zone=us-central1-a

Managed Instance Groups

# Create instance template
gcloud compute instance-templates create my-template \
  --machine-type=n4-standard-2 \
  --image-family=debian-12 \
  --image-project=debian-cloud
 
# Create regional MIG
gcloud compute instance-groups managed create my-mig \
  --template=my-template \
  --size=3 \
  --region=us-central1
 
# Configure autoscaling
gcloud compute instance-groups managed set-autoscaling my-mig \
  --region=us-central1 \
  --min-num-replicas=2 \
  --max-num-replicas=10 \
  --target-cpu-utilization=0.7

🧪 Micro-Lab: Watch Autohealing in Action

Try this to see MIG self-healing (20 minutes):

  • Create a regional MIG with 2 instances using the commands above
  • Add an HTTP health check: gcloud compute health-checks create http my-hc --port 80
  • Attach it: gcloud compute instance-groups managed update my-mig --region=us-central1 --health-check=my-hc
  • SSH into one instance and run sudo systemctl stop nginx (or kill whatever the health check expects)
  • Watch in Console: the instance goes UNHEALTHY → gets deleted → new one created
  • Clean up: delete the MIG, template, and health check

🧪 Micro-Lab: See Right-Sizing Recommendations

Quick cost optimization exercise (5 minutes):

  • Go to Console → Billing → Reports to see spend by SKU
  • Go to Console → Recommender → VM Right-sizing
  • Look for any VMs with <20% average CPU—these are candidates for downsizing
  • Note: recommendations take ~24 hours of usage data to appear


Summary: The ACE Exam Mental Model

When approaching GCE questions on the Associate Cloud Engineer exam, think in layers:

  1. What family? Match workload characteristics to machine family
  2. What storage? Default to PD-Balanced unless Hyperdisk flexibility is needed
  3. What availability? Single VM < Zonal MIG < Regional MIG (cost vs. SLA tradeoff)
  4. What cost model? Spot (interruptible) → CUD (predictable) → On-Demand (flexible)
  5. What security? Shielded (default) → Confidential (memory encryption) → Trusted Images (org control)

Master these decision points and the exam scenarios become straightforward pattern matching.


Exam Traps Cheat Sheet

Quick reference of common ACE exam pitfalls for GCE:

| Topic | Trap | Correct Thinking |
| --- | --- | --- |
| Machine families | Picking C4 for “needs good performance” | Check if cost matters—E2/N4 often sufficient |
| Machine families | Choosing M3 for “large database” | Most DBs are I/O-bound, not memory-bound—check the bottleneck |
| Machine families | Forgetting Tau T2A is Arm | x86 binaries won’t run without recompilation |
| Storage | Hyperdisk for simple boot disk | PD-Balanced is the right default |
| Storage | Hyperdisk when “zone failure” mentioned | Regional PD provides cross-zone replication |
| Storage | Ignoring disk size → IOPS relationship | On PD, bigger disk = more IOPS |
| MIGs | Zonal MIG for “high availability” | Regional MIG survives zone failures |
| MIGs | Forgetting health checks for autohealing | No health check = no autohealing |
| Autoscaling | Predictive for spiky/unpredictable loads | Predictive needs historical patterns; use reactive for chaos |
| Cost | CUD for uncertain future capacity | CUD is a commitment—use on-demand if unsure |
| Cost | On-demand for 24/7 production | You’re leaving 30-57% savings on the table |
| Security | Confidential VM for “secure boot” | Shielded VM handles boot integrity; Confidential is for memory |
| Security | SSH keys for enterprise access | OS Login integrates with IAM—prefer it |

Practice Scenarios

Test your mental model with these one-liner scenarios. Try to answer before checking.

  1. Scenario: A startup runs nightly ML training jobs that process 500GB of data. Jobs can restart from checkpoints. They want to minimize costs.

  2. Scenario: A bank needs VMs for a trading application where even the hypervisor shouldn’t be able to read memory contents.

  3. Scenario: An e-commerce company has predictable traffic with 3x spikes during sales events. They need to survive a zone failure.

  4. Scenario: A team needs to run legacy Windows x86 applications with consistent CPU performance for EDA workloads.

  5. Scenario: A company wants to prevent developers from launching VMs using random public images from the internet.


Practice Answers

  1. ML training with checkpoints, cost-sensitive: Spot VMs (fault-tolerant batch) + possibly Storage Optimized Z3 or Hyperdisk Throughput for the 500GB data reads.

  2. Memory contents hidden from hypervisor: Confidential VMs (AMD SEV-SNP or Intel TDX for memory encryption in use).

  3. Predictable traffic, survives zone failure: Regional MIG with predictive autoscaling + load balancer. CUD for the baseline capacity.

  4. Legacy Windows x86, consistent CPU, EDA: Compute Optimized C4 (not Tau T2A—that’s Arm). Consider sole-tenant nodes if licensing requires it.

  5. Prevent random public images: Trusted Images Policy via Organization Policy constraint compute.trustedImageProjects.



Last updated: 2026-01-09