Deploying Next.js to AWS with SST v4: the walkthrough we wish we had

A real step-by-step guide to shipping a Next.js app to AWS via SST v4 and OpenNext — from first deploy to CI/CD, including the domain-and-Cloudflare step everyone gets wrong.

12 min read · By IGN Solutions
  • aws
  • sst
  • nextjs
  • deployment
  • opennext

Vercel is fine until it isn't. Around the time your monthly bill stops being a rounding error, or your client needs data residency, or your middleware invocations start showing up as their own line item, the math flips. This is the guide we use internally to move a Next.js app to AWS on SST v4 in about a day — written so a junior engineer can follow it end to end without getting stuck on the three things that usually bite.

When SST v4 is actually the right call

We don't move every client off Vercel. We move them when one of these is true:

  • The bill is a real line item. Middleware executions, image optimization, and egress add up fast above the hobby plans.
  • The client owns their AWS account already. Compliance, SOC 2, or "our data has to live in our cloud" makes this non-negotiable.
  • Traffic is spiky in a way Vercel's pricing hates. Background jobs, batch workloads, or occasional traffic storms are cheaper on raw AWS primitives.

For a weekend project, stay on Vercel. For a business whose infrastructure is part of the product, SST v4 + OpenNext is the grown-up answer.

The mental model (read this first)

Skip the model and every error message will feel like a mystery. Spend fifteen minutes here and the rest is mostly typing.

SST v4 is three layers wearing one command:

  1. sst.config.ts describes the infrastructure you want. SST uses Pulumi under the hood to turn that TypeScript into real AWS resources.
  2. OpenNext compiles next build output into Lambda-shaped artifacts. The sst.aws.Nextjs component calls OpenNext for you. You don't touch it directly.
  3. AWS provisions the actual resources. Lambda runs your server code, S3 stores static assets and the ISR cache, CloudFront is the public URL and global CDN.

Three things to internalize before you type a single command:

  • sst.config.ts is the source of truth. Not the AWS console. If you click around in the console, SST will eventually overwrite what you did — or refuse to touch it. Treat the console as read-only.
  • State lives in S3. SST writes a state file to a bucket in your account (sst-state-*). Do not delete it. Losing state means SST forgets it owns your resources and you get to import or recreate them by hand.
  • Stages are just namespaces. sst deploy --stage production and sst deploy --stage joe create two totally separate copies of the whole stack. This is the killer feature — PR previews cost nothing to spin up and nothing to tear down.

Pre-flight: AWS account hygiene

First-time SST deploys fail in the same three ways every time: wrong region, wrong IAM permissions, no default profile. Knock all three out before writing any config.

  • Pick a primary region and commit to it. We default to us-west-2. Pick whatever you want; just be consistent across sst.config.ts and your GitHub Actions workflows later. The ACM cert for CloudFront still has to live in us-east-1 because CloudFront requires it there, but SST handles that for you.
  • Confirm your AWS profile works:
aws sts get-caller-identity

You should get back an account ID and an ARN. If you get an error, fix your profile before going further.

  • Give the deploy user AdministratorAccess for now. Least-privilege IAM is a worthy goal, but on day one it'll cost you four hours of debugging S3 bucket policies. Tighten later.
  • If you have multiple AWS profiles, set AWS_PROFILE in your shell before running any sst command. SST inherits whatever the AWS SDK would inherit, and sourcing the wrong credentials into a deploy is a classic foot-gun.

Make the app deploy-safe before adding SST

Fixing Next.js issues locally with next build is cheap. Fixing them inside a Lambda cold-start log is not. Harden the app first, then wrap it in infra.

Lambda is not your dev machine. The serverless server has:

  • No persistent memory between invocations. Your in-memory rate limit is already dead; it just doesn't know it yet.
  • No writable filesystem except /tmp, which is ephemeral.
  • A cold start cost on the first request after idleness.
  • A strict timeout (10s default on OpenNext, configurable).

Anything in your code that assumes "the server keeps running" is a bug waiting to happen. Before going further:

  • Run a real production build locally and read every warning:
npm run build
  • If you have an in-memory rate limiter in middleware.ts, mark it with a // TODO(prod): comment. Don't fix it yet — we'll move it to Cloudflare's edge in a later step.
  • Don't refactor anything else. No "while we're in here" changes mid-deploy. Resist the urge.
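The pattern to flag is the classic Map-keyed-by-IP counter. A minimal sketch of what you're looking for — the function and constant names here are illustrative, not from any real codebase:

```typescript
// TODO(prod): in-memory state resets on every Lambda cold start and is not
// shared across concurrent instances — this limiter only "works" while a
// single warm instance happens to handle all traffic.
const hits = new Map<string, { count: number; windowStart: number }>()

const WINDOW_MS = 60_000 // one-minute window
const LIMIT = 5          // max requests per IP per window

export function allowRequest(ip: string, now = Date.now()): boolean {
  const entry = hits.get(ip)
  // New IP, or the previous window expired: start a fresh count.
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now })
    return true
  }
  entry.count += 1
  return entry.count <= LIMIT
}
```

It passes every local test and silently stops limiting the moment Lambda scales out or cold-starts — which is why it gets a TODO now and a real fix at the edge later.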

Install SST and scaffold the config

This is where AWS becomes a thing your repo describes.

Think of sst.config.ts as package.json for infrastructure. You declare what you want; sst deploy reconciles reality to match. Delete a line, the resource gets destroyed on the next deploy. Add one, it gets created. There's no "save" button — the file is the save.

From the project root:

npx sst@latest init

Choose AWS when prompted. It adds sst to devDependencies, creates sst.config.ts, and adds .sst/ to .gitignore. Verify that last part actually happened.

The generated sst.config.ts should look roughly like this:

export default $config({
  app(input) {
    return {
      name: 'my-app',
      removal: input?.stage === 'production' ? 'retain' : 'remove',
      home: 'aws',
      providers: { aws: { region: 'us-west-2' } },
    }
  },
  async run() {
    new sst.aws.Nextjs('Web')
  },
})

Two lines matter right now:

  • removal: 'retain' on production means sst remove --stage production will not nuke your S3 buckets. Keep it. This is the line that saves your bacon when you accidentally tear down the wrong stage six months from now.
  • home: 'aws' means SST stores its state in your AWS account. You own it.

Don't add a custom domain yet. Don't deploy yet. First, lint the config:

npx sst diff --stage dev

Read the plan. You should see a Lambda function, an S3 bucket, a CloudFront distribution, and some IAM glue. If you see anything unexpected, figure out why before deploying.

First deploy to a dev stage

One variable at a time. You want to see the app working on AWS before adding the domain, secrets, CI, and Cloudflare.

npx sst deploy --stage dev

The first deploy takes 5-10 minutes because CloudFront is provisioning globally. Subsequent deploys take 1-3 minutes because only changed pieces update. Don't Ctrl-C it. An interrupted first deploy leaves a half-built stack that's annoying to clean up.

When it finishes, open the CloudFront URL it prints — something like https://d1234abcd.cloudfront.net. Your home page should render, fonts should load, and static images should work. The contact form will fail because the secret isn't wired yet. That's expected.

Take five minutes to explore the AWS console (read-only) and see what SST built for you:

  • CloudFront → find your new distribution. It has two origins: an S3 bucket and a Lambda Function URL. CloudFront routes static asset paths to S3 and dynamic paths to Lambda. That split is the whole magic of OpenNext.
  • Lambda → you have two or three functions: Web-server, Web-image-optimization, maybe Web-revalidation.
  • S3 → one bucket holds your _next/static/* assets, ISR cache, and any uploads.

If the deploy fails partway with an IAM error, it's almost always a missing permission on the deploy user. Re-check the pre-flight step.

Secrets the SST way

.env.local doesn't exist on Lambda. We need a way to ship API keys into the runtime safely, without putting them in the repo, in env files, or in CI.

An SST Secret is an AWS SSM parameter with an automatic IAM grant wrapped around it. You declare a secret, set its value once with sst secret set, then link it to a component. SST generates a typed Resource.X.value you import in your code.

Edit sst.config.ts:

async run() {
  const resendKey = new sst.Secret('ResendApiKey')
  const fromEmail = new sst.Secret('FromEmail')

  new sst.aws.Nextjs('Web', {
    link: [resendKey, fromEmail],
  })
}

Set the values for the dev stage:

npx sst secret set ResendApiKey "re_yourNewKey..." --stage dev
npx sst secret set FromEmail "no-reply@example.com" --stage dev

Then in your API route, read from Resource instead of process.env:

import { Resource } from 'sst'

const apiKey = Resource.ResendApiKey.value
const fromEmail = Resource.FromEmail.value

Redeploy with npx sst deploy --stage dev and submit the contact form. Email should arrive.

Secrets are per-stage. You'll set them again for production later. That's a feature, not a bug — it means dev keys can't leak into prod and vice versa.

The custom domain step everyone gets wrong

This is the part every guide glosses over and every engineer loses an afternoon to. The trap is that there are two valid architectures and they conflict.

Option A — Cloudflare proxy (orange cloud) in front of CloudFront. Cloudflare terminates TLS at the edge, re-encrypts, and forwards to CloudFront, which has its own ACM cert. You get Cloudflare WAF, rate limiting, analytics, and Turnstile. Two TLS hops. Slightly more latency. Effectively free.

Option B — Cloudflare DNS-only (gray cloud) pointing straight at CloudFront. Cloudflare just resolves the name; CloudFront terminates TLS. You lose Cloudflare's WAF and rate-limit features on this domain. One TLS hop. Simpler.

For most SST + Next.js projects, Option A is what you want. The Cloudflare WAF pays for itself the first time a bot discovers your contact form.

Add the domain to your Nextjs component:

const isProd = $app.stage === 'production'

new sst.aws.Nextjs('Web', {
  link: [resendKey, fromEmail],
  domain: {
    name: isProd ? 'example.com' : `${$app.stage}.example.com`,
    aliases: isProd ? ['www.example.com'] : [],
    dns: false,
  },
})

dns: false tells SST you're managing DNS in Cloudflare, not Route 53. Run npx sst deploy --stage dev. The deploy will pause and print DNS records for you to add to Cloudflare. Then:

  1. Add the ACM validation CNAME(s) from the deploy output to Cloudflare. Set these to DNS-only (gray cloud). This is the one everyone gets wrong. ACM cannot validate a certificate through Cloudflare's proxy. If the cloud is orange, validation hangs forever and you'll think SST is broken. It's not.
  2. Add a CNAME record for dev → d1234abcd.cloudfront.net. Set it to proxied (orange cloud) if you want Option A, DNS-only (gray cloud) if you want Option B.
  3. In Cloudflare SSL/TLS settings, set encryption mode to Full (strict).
  4. Wait 2-10 minutes. SST resumes automatically once ACM confirms the certificate.

Verify with curl -I https://dev.example.com — if you see a cf-ray header, Cloudflare is in front of the request. If the page loads over HTTPS with a valid cert, you're done.

Cloudflare as your security layer

Now honor that // TODO(prod): we left in middleware.ts. Do it at the edge, not in your Lambda.

Cloudflare's free tier gives you a managed WAF ruleset, rate-limiting rules, and bot detection. Doing this work in your origin means every abusive request still costs you Lambda invocations and doesn't get blocked until it's already inside your perimeter.

In the Cloudflare dashboard → Security → WAF → Rate Limiting Rules, add a rule:

  • Match: URI Path contains "/api/"
  • Threshold: 5 requests per 1 minute per IP
  • Action: Block (or Managed Challenge) for 1 minute

While you're there, enable Cloudflare's Managed Ruleset (free tier subset) and consider adding Turnstile to your contact form. Both are close to zero effort and close to 100% of the value of a "real" WAF.
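Turnstile's server side is one POST to Cloudflare's siteverify endpoint. A sketch of the verification helper — the function names are ours, and wiring the secret through an SST Secret as in the earlier step is assumed:

```typescript
// Cloudflare's documented verification endpoint for Turnstile tokens.
const SITEVERIFY_URL = 'https://challenges.cloudflare.com/turnstile/v0/siteverify'

// Build the form body siteverify expects: secret, response, optional remoteip.
export function buildSiteverifyBody(secret: string, token: string, ip?: string): URLSearchParams {
  const body = new URLSearchParams({ secret, response: token })
  if (ip) body.set('remoteip', ip)
  return body
}

// Call this in the API route before sending the email; reject the request on false.
export async function verifyTurnstile(secret: string, token: string, ip?: string): Promise<boolean> {
  const res = await fetch(SITEVERIFY_URL, {
    method: 'POST',
    body: buildSiteverifyBody(secret, token, ip),
  })
  const data = (await res.json()) as { success: boolean }
  return data.success
}
```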

Then strip the in-memory rate limiter out of middleware.ts. Keep any origin allow-list check you have — it's still useful as a last-line defense — but the Map, the rate-limit window, and the "count requests per IP" branch all need to go. They don't do anything on Lambda anyway; globals reset on every cold start.
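What's left of middleware.ts after the cleanup is just the origin check. A minimal sketch, with a hypothetical helper name and allow-list:

```typescript
// Hypothetical allow-list; middleware.ts would call this and return a 403
// when it fails. No Map, no counters — Cloudflare handles rate limiting now.
const ALLOWED_ORIGINS = ['https://example.com', 'https://www.example.com']

export function isAllowedOrigin(origin: string | null): boolean {
  // Same-origin browser requests may omit the Origin header entirely;
  // treat missing as allowed and let Cloudflare's WAF judge non-browser traffic.
  if (!origin) return true
  return ALLOWED_ORIGINS.includes(origin)
}
```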

Verify the new rule by hammering the API from a shell:

for i in {1..10}; do curl -X POST https://dev.example.com/api/send-email; done

Requests 6 and beyond should come back with a 429-ish response from Cloudflare.

Promote to production

You've done the entire flow once on a throwaway stage. Production is just the same flow with a different stage name.

Production is not special. It's a stage you treat carefully. The removal: 'retain' line from earlier is what makes it "special" — its data buckets survive even if you accidentally sst remove.

npx sst secret set ResendApiKey "re_yourProdKey..." --stage production
npx sst secret set FromEmail "no-reply@example.com" --stage production
npx sst deploy --stage production

Configure your production DNS in Cloudflare the same way you did for dev: ACM validation CNAMEs gray-cloud, the CloudFront CNAME orange-cloud, SSL/TLS set to Full (strict). If you're using both apex and www, pick one as canonical and use a Cloudflare redirect rule to 301 the other to it. Redirect rules must be Dynamic type — Static silently strips the path on deep URLs, which breaks /sitemap.xml fetches from Search Console.

Verify with real requests, not eyeballs:

curl -sSI https://example.com/ | head -1
# → HTTP/2 200

curl -sSI "https://www.example.com/services?utm=test" | grep -iE "^HTTP|^location"
# → HTTP/2 301
# → location: https://example.com/services?utm=test

After prod is live, submit your sitemap to Google Search Console and request indexing on your main routes. That's what forces Google to re-crawl and replace any stale SERP snippets from whatever was at that domain before.

CI/CD with GitHub Actions

Manual deploys are fine until you forget to set a secret, deploy from the wrong branch, or merge a PR that doesn't compile. CI removes that entire class of mistake.

The pipeline we run on every SST project has four workflows and two trigger points:

  • PR opens → a verify workflow runs typecheck, lint, and tests in parallel, and a separate sst-diff workflow posts npx sst diff --stage production as a sticky PR comment.
  • Push to main → a deploy workflow re-runs the same gates, then runs npx sst deploy --stage production.

CI itself is just a Linux box with short-lived OIDC credentials running the same sst commands you ran locally. The only new ideas are where the AWS credentials come from and when each workflow runs.

One-time AWS setup

Create an IAM OIDC identity provider for GitHub Actions (once per AWS account, ever):

aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

Create a role gh-actions-sst-deployer whose trust policy only lets this repo assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
        "StringLike": { "token.actions.githubusercontent.com:sub": "repo:<OWNER>/<REPO>:*" }
      }
    }
  ]
}

Attach AdministratorAccess to the role (tighten later), then store the role ARN as a GitHub repo secret called AWS_ROLE_ARN.
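Assuming you've saved the trust policy above as trust-policy.json, creating the role and attaching the policy is two CLI calls:

```shell
# Create the deploy role with the OIDC trust policy from above.
aws iam create-role \
  --role-name gh-actions-sst-deployer \
  --assume-role-policy-document file://trust-policy.json

# Attach admin for now; tighten to least-privilege once deploys are stable.
aws iam attach-role-policy \
  --role-name gh-actions-sst-deployer \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
```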

Make the config CI-safe

Local deploys use a named AWS profile. CI doesn't have one, and hardcoding a profile would override the OIDC env vars and break CI. Guard it:

providers: {
  aws: {
    profile: process.env.GITHUB_ACTIONS ? undefined : 'my-profile',
    region: 'us-west-2',
  },
},

GITHUB_ACTIONS=true is set automatically on every Actions runner.

The deploy workflow

The core of deploy.yml:

name: Deploy
on:
  push:
    branches: [main]
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
concurrency:
  group: deploy-production
  cancel-in-progress: false
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22', cache: 'npm' }
      - run: npm ci
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-west-2
      - run: npx sst deploy --stage production

cancel-in-progress: false is load-bearing. Never cancel a deploy mid-flight — it can corrupt SST state and turn a 5-minute deploy into a 90-minute recovery.

Then turn on branch protection for main and require the verify workflow to pass before merging. Without branch protection, your PR checks are decorative.
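For reference, a minimal verify.yml that matches the gates described above — the typecheck, lint, and test script names are assumptions about your package.json:

```yaml
name: Verify
on:
  pull_request:
permissions:
  contents: read
jobs:
  verify:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    strategy:
      # One parallel job per gate; any failure blocks the merge.
      matrix:
        task: [typecheck, lint, test]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22', cache: 'npm' }
      - run: npm ci
      - run: npm run ${{ matrix.task }}
```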

The gotchas that will cost you time

After shipping a few dozen of these migrations, here are the ones that still trip up experienced engineers:

  • ACM validation CNAMEs must be gray-cloud in Cloudflare. ACM cannot validate through a proxy. If validation hangs for more than 15 minutes, this is almost always why.
  • Never edit infrastructure in the AWS console. If you must, immediately reflect the change in sst.config.ts and redeploy. Drift between the config and reality is how you end up with a 2 AM incident over an "innocent" console tweak from six months ago.
  • In-memory rate limiting on Lambda is a no-op. Globals reset on every cold start. Move the logic to Cloudflare's edge or a shared Redis instance (Upstash works well).
  • sst diff before sst deploy on production. Every time. It's the equivalent of git status before git push --force.
  • Check the AWS bill weekly for the first month. A marketing site on SST should cost a few dollars a month. If you see $50, something's wrong — usually a runaway log group or an accidental NAT gateway.

Need a team to do this with you?

We do this migration as a fixed-scope engagement: Vercel or managed-host Next.js app → SST v4 on your AWS account → CI/CD gated on typecheck, lint, tests, and a posted sst diff. Usually 1-2 weeks, depending on how much of your config is hiding in a Vercel dashboard.

Send us the repo and we'll tell you what we'd change first — and whether SST is actually the right move for your setup, because sometimes it isn't.

Stop reading.
Start building.

If this post nudged something loose, let's talk. Tell us what you're trying to ship and we'll tell you how we'd approach it.