Boost Docker Image Builds on AWS CodeBuild with ECR Remote Cache

As a DevOps or platform engineer, you live in the CI/CD pipeline. And one of the most frustrating bottlenecks in that pipeline is slow Docker image builds. Every time AWS CodeBuild spins up a fresh environment, it starts from zero, pulling base layers and re-building every intermediate step. This wastes valuable compute minutes and slows down your feedback loop from commit to deployment.

The standard CodeBuild local cache (the LOCAL cache type with Docker layer caching) is often insufficient: it is bound to a single build host and frequently misses. The real solution is a shared, persistent, remote cache. This guide will show you exactly how to implement a high-performance remote cache using Docker’s BuildKit engine and Amazon ECR.

Why Are Your Docker Image Builds in CI So Slow?

In a typical CI environment like AWS CodeBuild, each build runs in an ephemeral, containerized environment. This isolation is great for security and reproducibility but terrible for caching. When you run docker build, it has no access to the layers from the previous build run. This means:

  • Base layers (like ubuntu:22.04 or node:18-alpine) are downloaded every single time.
  • Application dependencies (like apt-get install or npm install) are re-run and re-downloaded, even if package.json hasn’t changed.
  • Every RUN, COPY, and ADD command executes from scratch.

This results in builds that can take 10, 15, or even 20 minutes, when the same build on your local machine (with its persistent cache) takes 30 seconds. This is not just an annoyance; it’s a direct cost in developer productivity and AWS compute billing.

The Solution: BuildKit’s Registry-Based Remote Cache

The modern Docker build engine, BuildKit, introduces a powerful caching mechanism that solves this problem perfectly. Instead of relying on a fragile local-disk cache, BuildKit can use a remote OCI-compliant registry (like Amazon ECR) as its cache backend.

This is achieved using two key flags in the docker buildx build command:

  • --cache-from: Tells BuildKit where to *pull* existing cache layers from.
  • --cache-to: Tells BuildKit where to *push* new or updated cache layers to after a successful build.

The build process becomes:

  1. Start build.
  2. Pull cache metadata from the ECR cache repository (defined by --cache-from).
  3. Build the Dockerfile, skipping any steps that have a matching layer in the cache.
  4. Push the final application image to its ECR repository.
  5. Push the new/updated cache layers to the ECR cache repository (defined by --cache-to).

# This is a conceptual example. The buildspec implementation is below.
docker buildx build \
    --platform linux/amd64 \
    --tag ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/my-app:latest \
    --push \
    --cache-from type=registry,ref=ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/my-cache-repo:latest \
    --cache-to type=registry,ref=ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/my-cache-repo:latest,mode=max \
    .

Step-by-Step: Implementing ECR Remote Cache in AWS CodeBuild

Let’s configure this production-ready solution from the ground up. We’ll assume you already have a CodeBuild project and an ECR repository for your application image.

Prerequisite: Enable BuildKit in CodeBuild

First, make sure your build environment can run BuildKit. Enable privileged mode on the CodeBuild project (required for any Docker builds), and use a recent build image such as aws/codebuild/amazonlinux2-x86_64-standard:5.0 or newer, which ships with a Docker release that includes the buildx plugin. Setting the DOCKER_BUILDKIT=1 environment variable in your buildspec.yml additionally enables BuildKit for plain docker build commands.

Add this to the top of your buildspec.yml:

version: 0.2

env:
  variables:
    DOCKER_BUILDKIT: "1"
phases:
  # ... rest of the buildspec ...

Keep in mind that DOCKER_BUILDKIT=1 only switches the classic docker build command over to BuildKit. The docker buildx build commands in this guide go through the buildx plugin, and exporting a cache to a registry (--cache-to type=registry) is typically not supported by the default docker driver, so you also need a builder that uses the docker-container driver. The sketch below shows the commands; the buildspec in Step 3 runs the same setup in its pre_build phase.
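
A minimal sketch of creating such a builder (the builder name ci-builder is arbitrary; both commands are standard buildx subcommands):

# Create a BuildKit builder backed by the docker-container driver and make it the default.
docker buildx create --name ci-builder --driver docker-container --use

# Start the builder container and print its details, including the BuildKit version it runs.
docker buildx inspect --bootstrap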

Step 1: Configure IAM Permissions

Your CodeBuild project’s Service Role needs permission to read from and write to **both** your application ECR repository and your new cache ECR repository. Ensure its IAM policy includes the following actions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload"
            ],
            "Resource": [
                "arn:aws:ecr:YOUR_REGION:YOUR_ACCOUNT_ID:repository/your-app-repo",
                "arn:aws:ecr:YOUR_REGION:YOUR_ACCOUNT_ID:repository/your-build-cache-repo"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*"
        }
    ]
}
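
If you manage the role from the CLI, one hedged way to attach this as an inline policy is shown below (the role name and policy name are examples, and the JSON above is assumed to be saved as ecr-cache-policy.json):

# Attach the policy above as an inline policy on the CodeBuild service role.
aws iam put-role-policy \
  --role-name codebuild-my-project-service-role \
  --policy-name ecr-remote-cache-access \
  --policy-document file://ecr-cache-policy.json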

Step 2: Define Your Cache Repository

It is a strong best practice to create a **separate ECR repository** just for your build cache. Do *not* push your cache to the same repository as your application images.

  1. Go to the Amazon ECR console.
  2. Create a new **private** repository. Name it something descriptive, like my-project-build-cache.
  3. Set up a Lifecycle Policy on this cache repository to automatically expire old images (e.g., “expire images older than 14 days”). This is critical for cost management, as the cache can grow quickly.
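
If you prefer the AWS CLI to the console, a minimal sketch of both steps (the repository name matches the example above; adjust the retention window to taste):

# Create a private ECR repository dedicated to the build cache.
aws ecr create-repository --repository-name my-project-build-cache

# Expire cache images 14 days after they were pushed so the repository does not grow unbounded.
aws ecr put-lifecycle-policy \
  --repository-name my-project-build-cache \
  --lifecycle-policy-text '{
    "rules": [
      {
        "rulePriority": 1,
        "description": "Expire cache images older than 14 days",
        "selection": {
          "tagStatus": "any",
          "countType": "sinceImagePushed",
          "countUnit": "days",
          "countNumber": 14
        },
        "action": { "type": "expire" }
      }
    ]
  }'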

Step 3: Update Your buildspec.yml for Caching

Now, let’s tie it all together in the buildspec.yml. We’ll pre-define our repository URIs and use the buildx command with our cache flags.

version: 0.2

env:
  variables:
    DOCKER_BUILDKIT: "1"
    # Define your repositories
    APP_IMAGE_REPO_URI: "YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/your-app-repo"
    CACHE_REPO_URI: "YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/your-build-cache-repo"
    IMAGE_TAG: "latest" # Or use $CODEBUILD_RESOLVED_SOURCE_VERSION

phases:
  pre_build:
    commands:
      - echo "Logging in to Amazon ECR..."
      - AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
      - echo "Creating a BuildKit builder (docker-container driver) for registry cache export..."
      - docker buildx create --name ci-builder --driver docker-container --use

  build:
    commands:
      - echo "Starting Docker image build with remote cache..."
      - |
        docker buildx build \
          --platform linux/amd64 \
          --tag $APP_IMAGE_REPO_URI:$IMAGE_TAG \
          --cache-from type=registry,ref=$CACHE_REPO_URI:$IMAGE_TAG \
          --cache-to type=registry,ref=$CACHE_REPO_URI:$IMAGE_TAG,mode=max,image-manifest=true,oci-mediatypes=true \
          --push \
          .
      - echo "Build complete."

  post_build:
    commands:
      - echo "Writing image definitions file..."
      # (Optional) For CodePipeline deployments
      - printf '[{"name":"app-container","imageUri":"%s"}]' "$APP_IMAGE_REPO_URI:$IMAGE_TAG" > imagedefinitions.json

artifacts:
  files:
    - imagedefinitions.json

Breaking Down the buildx Command

  • --platform linux/amd64: Explicitly defines the target platform. This is a good practice for CI environments.
  • --tag ...: Tags the final image for your application repository.
  • --cache-from type=registry,ref=$CACHE_REPO_URI:$IMAGE_TAG: This tells BuildKit to look in your cache repository for a manifest tagged with latest (or your specific branch/commit tag) and use its layers as a cache source.
  • --cache-to type=registry,ref=$CACHE_REPO_URI:$IMAGE_TAG,mode=max,image-manifest=true,oci-mediatypes=true: This is the magic. It tells BuildKit to push the resulting cache layers back to the cache repository. mode=max ensures all intermediate layers are cached, not just the final stage, while image-manifest=true and oci-mediatypes=true export the cache as an OCI image manifest, the format Amazon ECR accepts (this requires a reasonably recent BuildKit, such as the one pulled by the docker-container builder created in pre_build).
  • --push: This single flag tells buildx to *both* build the image and push it to the repository specified in the --tag flag. It’s more efficient than a separate docker push command. (If you want to test the image locally before pushing, see the --load sketch below.)
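
If you want to smoke-test the image inside the same build before pushing, a hedged variant is to swap --push for --load, which imports the (single-platform) result into the local Docker daemon; the cache flags work the same way:

# Same build, but load the image into the local daemon instead of pushing it.
docker buildx build \
  --platform linux/amd64 \
  --tag $APP_IMAGE_REPO_URI:$IMAGE_TAG \
  --cache-from type=registry,ref=$CACHE_REPO_URI:$IMAGE_TAG \
  --load \
  .

# Quick sanity check (assumes a Node.js base image like the example later in this article).
docker run --rm $APP_IMAGE_REPO_URI:$IMAGE_TAG node --version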

Architectural Note: Handling the First Build

On the very first run, the cache tag referenced by --cache-from won’t exist yet, and the build log will show a “not found” message for it. This is expected and harmless. The build will proceed without a cache and then populate it using --cache-to. Subsequent builds will find and use this cache.
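
After that first successful build, you can sanity-check that the cache landed in ECR (repository name and tag match the earlier examples):

# An entry tagged "latest" in the cache repository confirms --cache-to worked.
aws ecr describe-images \
  --repository-name my-project-build-cache \
  --image-ids imageTag=latest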

Analyzing the Performance Boost

You will see the difference immediately in your CodeBuild logs.

**Before (Uncached):**


#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 32B done
#1 ...
#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 ...
#3 [internal] load metadata for docker.io/library/node:18-alpine
#3 ...
#4 [1/5] FROM docker.io/library/node:18-alpine
#4 resolve docker.io/library/node:18-alpine...
#4 sha256:.... 6.32s done
#4 ...
#5 [2/5] WORKDIR /app
#5 0.5s done
#6 [3/5] COPY package*.json ./
#6 0.1s done
#7 [4/5] RUN npm install --production
#7 28.5s done
#8 [5/5] COPY . .
#8 0.2s done
    

**After (Remote Cache Hit):**


#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 32B done
#1 ...
#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 ...
#3 [internal] load metadata for docker.io/library/node:18-alpine
#3 ...
#4 [internal] load build context
#4 transferring context: 450kB done
#4 ...
#5 [1/5] FROM docker.io/library/node:18-alpine
#5 CACHED
#6 [2/5] WORKDIR /app
#6 CACHED
#7 [3/5] COPY package*.json ./
#7 CACHED
#8 [4/5] RUN npm install --production
#8 CACHED
#9 [5/5] COPY . .
#9 0.2s done
#10 exporting to image
    

Notice the CACHED status for almost every step. The build time can drop from 10 minutes to under 1 minute, as CodeBuild is only executing the steps that actually changed (in this case, the final COPY . .) and downloading the pre-built layers from ECR.

Advanced Strategy: Multi-Stage Builds and Cache Granularity

This remote caching strategy truly shines with multi-stage Dockerfiles. BuildKit is intelligent enough to cache each stage independently.

Consider this common pattern:

# --- Build Stage ---
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# --- Production Stage ---
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/package.json ./package.json
COPY --from=builder /app/dist ./dist
# Copy node_modules from the builder (run npm prune --omit=dev in the builder stage first
# if you only want production dependencies here)
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/main.js"]

With the --cache-to mode=max setting, BuildKit will store the layers for *both* the builder stage and the final production stage in the ECR cache. If you change only application source code (and package*.json is untouched), BuildKit will:

  1. Pull the cache.
  2. Reuse the cached layers up to and including RUN npm install (CACHED), because the copied package*.json files are unchanged.
  3. Re-run only COPY . ., RUN npm run build, and the final-stage steps that depend on the changed build output.

This provides maximum granularity and speed, ensuring you only ever rebuild the absolute minimum necessary.

Frequently Asked Questions (FAQ)

Is ECR remote caching free?

No, but it is extremely cheap. You pay standard Amazon ECR storage costs for the cache images and data transfer costs. This is why setting a Lifecycle Policy on your cache repository to delete images older than 7-14 days is essential. The cost savings in CodeBuild compute-minutes will almost always vastly outweigh the minor ECR storage cost.

How is this different from CodeBuild’s local Docker layer cache?

CodeBuild’s local Docker layer cache (the LOCAL cache type with LOCAL_DOCKER_LAYER_CACHE mode) keeps Docker layers on the build host and attempts to reuse them for the next build. This is unreliable because:

  1. You aren’t guaranteed to get the same build host.
  2. The cache is not shared across concurrent builds (e.g., for two different branches).

The ECR remote cache is a centralized, shared, persistent cache. All builds (concurrent or sequential) pull from and push to the same ECR repository, leading to much higher cache-hit rates.

Can I use this with other registries (e.g., Docker Hub, GHCR)?

Yes. The type=registry cache backend is part of the BuildKit standard. As long as your CodeBuild role has credentials to docker login and push/pull from that registry, you can point your --cache-from and --cache-to flags at any OCI-compliant registry.
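
For example, a hedged sketch for Docker Hub (DOCKERHUB_USER and DOCKERHUB_TOKEN are placeholder credentials you would inject yourself, e.g., from Secrets Manager):

# Log in to Docker Hub, then point the cache flags at a Docker Hub repository.
echo "$DOCKERHUB_TOKEN" | docker login --username "$DOCKERHUB_USER" --password-stdin
docker buildx build \
  --tag docker.io/your-org/my-app:latest \
  --cache-from type=registry,ref=docker.io/your-org/my-app-cache:latest \
  --cache-to type=registry,ref=docker.io/your-org/my-app-cache:latest,mode=max \
  --push \
  .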

How should I tag my cache?

Using :latest (as in the example) provides a good general-purpose cache. However, for more granular control, you can tag your cache per branch (e.g., with a tag derived from $CODEBUILD_WEBHOOK_HEAD_REF; note that the raw value looks like refs/heads/my-branch, which is not a valid image tag, so sanitize it first, as shown in the sketch after the example). A common “best of both worlds” approach is to cache-to a branch-specific tag but cache-from both the branch and the default branch (e.g., main):


docker buildx build \
  ...
  --cache-from type=registry,ref=$CACHE_REPO_URI:main \
  --cache-from type=registry,ref=$CACHE_REPO_URI:$MY_BRANCH_TAG \
  --cache-to type=registry,ref=$CACHE_REPO_URI:$MY_BRANCH_TAG,mode=max \
  ...
    

This allows feature branches to benefit from the cache built by main, while also building their own specific cache.
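
A hedged sketch of deriving a registry-safe value for the $MY_BRANCH_TAG placeholder used above (CODEBUILD_WEBHOOK_HEAD_REF is the standard CodeBuild variable and looks like refs/heads/feature/foo):

# Strip the refs/heads/ prefix and replace anything that is not valid in an image tag.
MY_BRANCH_TAG=$(printf '%s' "${CODEBUILD_WEBHOOK_HEAD_REF#refs/heads/}" | sed 's#[^A-Za-z0-9_.-]#-#g')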


Conclusion

Stop waiting for slow Docker image builds in CI. By moving away from fragile local caches and embracing a centralized remote cache, you can drastically improve the performance and reliability of your entire CI/CD pipeline.

Leveraging AWS CodeBuild’s support for BuildKit and Amazon ECR as a cache backend is a modern, robust, and cost-effective solution. The configuration is minimal: a few lines in your buildspec.yml and an IAM policy update. The impact on your developer feedback loop is enormous. Thank you for reading the DevopsRoles page!
