Tag Archives: DevOps

AWS

Master Amazon EKS Metrics: Automated Collection with AWS Prometheus

01/07/2026 HuuPV Leave a comment

Observability at scale is the silent killer of Kubernetes operations. For expert platform engineers, the challenge isn’t just generating Amazon EKS metrics; it is ingesting, storing, and querying them without managing a fragile, self-hosted Prometheus stateful set that collapses under high cardinality.

In this guide, we bypass the basics. We will architect a production-grade observability pipeline using Amazon Managed Service for Prometheus (AMP) and the AWS Distro for OpenTelemetry (ADOT). We will cover Infrastructure as Code (Terraform) implementation, IAM Roles for Service Accounts (IRSA) security patterns, and advanced filtering techniques to keep your metric ingestion costs manageable.

The Scaling Problem: Why Self-Hosted Prometheus Fails EKS

Standard Prometheus deployments on EKS work flawlessly for development clusters. However, as you scale to hundreds of nodes and thousands of pods, the “pull-based” model combined with local TSDB storage hits a ceiling.

Vertical Scaling Limits: A single Prometheus server eventually runs out of memory (OOM) attempting to ingest millions of active series.
Data Persistence: Managing EBS volumes for long-term metric retention is operational toil.
High Availability: Running HA Prometheus pairs doubles your cost and introduces “gap” complexities during failovers.

Pro-Tip: The solution is to decouple collection from storage. By using stateless collectors (ADOT) to scrape Amazon EKS metrics and remote-writing them to a managed backend (AMP), you offload the heavy lifting of storage, availability, and backups to AWS.

Architecture: EKS, ADOT, and AMP

The modern AWS-native observability stack consists of three distinct layers:

Generation: Your application pods and Kubernetes node exporters.
Collection (The Agent): The AWS Distro for OpenTelemetry (ADOT) collector running as a DaemonSet or Deployment. It scrapes Prometheus endpoints and remote-writes data.
Storage (The Backend): Amazon Managed Service for Prometheus (AMP), which is Cortex-based, scalable, and fully compatible with PromQL.

Step-by-Step Implementation

We will use Terraform for the infrastructure foundation and Helm for the Kubernetes components.

1. Provisioning the AMP Workspace

First, we create the AMP workspace. This is the distinct logical space where your metrics will reside.

resource "aws_prometheus_workspace" "eks_observability" {
  alias = "production-eks-metrics"

  tags = {
    Environment = "Production"
    ManagedBy   = "Terraform"
  }
}

output "amp_workspace_id" {
  value = aws_prometheus_workspace.eks_observability.id
}

output "amp_remote_write_url" {
  value = "${aws_prometheus_workspace.eks_observability.prometheus_endpoint}api/v1/remote_write"
}

2. Security: IRSA for Metric Ingestion

The ADOT collector needs permission to write to AMP. We utilize IAM Roles for Service Accounts (IRSA) to grant least-privilege access, avoiding static access keys.

Create an IAM policy AWSManagedPrometheusWriteAccess (or a scoped inline policy) and attach it to a role trusted by your EKS OIDC provider.

data "aws_iam_policy_document" "amp_ingest_policy" {
  statement {
    actions = [
      "aps:RemoteWrite",
      "aps:GetSeries",
      "aps:GetLabels",
      "aps:GetMetricMetadata"
    ]
    resources = [aws_prometheus_workspace.eks_observability.arn]
  }
}

resource "aws_iam_role" "adot_collector" {
  name = "eks-adot-collector-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRoleWithWebIdentity"
      Effect = "Allow"
      Principal = {
        Federated = "arn:aws:iam::${var.account_id}:oidc-provider/${var.oidc_provider}"
      }
      Condition = {
        StringEquals = {
          "${var.oidc_provider}:sub" = "system:serviceaccount:adot-system:adot-collector"
        }
      }
    }]
  })
}

3. Deploying the ADOT Collector

We deploy the ADOT collector using the EKS add-on or Helm. For granular control over the scraping configuration, the Helm chart is often preferred by power users.

Below is a snippet of the values.yaml configuration required to enable the Prometheus receiver and configure the remote write exporter to send Amazon EKS metrics to your workspace.

# ADOT Helm values.yaml
mode: deployment
serviceAccount:
  create: true
  name: adot-collector
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/eks-adot-collector-role"

config:
  receivers:
    prometheus:
      config:
        global:
          scrape_interval: 15s
        scrape_configs:
          - job_name: 'kubernetes-pods'
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                action: keep
                regex: true

  exporters:
    prometheusremotewrite:
      endpoint: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-xxxx/api/v1/remote_write"
      auth:
        authenticator: sigv4auth

  extensions:
    sigv4auth:
      region: "us-east-1"
      service: "aps"

  service:
    extensions: [sigv4auth]
    pipelines:
      metrics:
        receivers: [prometheus]
        exporters: [prometheusremotewrite]

Optimizing Costs: Managing High Cardinality

Amazon EKS metrics can generate massive bills if you ingest every label from every ephemeral pod. AMP charges based on ingestion (samples) and storage.

Filtering at the Collector Level

Use the processors block in your ADOT configuration to drop unnecessary metrics or labels before they leave the cluster.

processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - kubelet_volume_stats_available_bytes
          - kubelet_volume_stats_capacity_bytes
          - container_fs_usage_bytes # Often high noise, low value
  resource:
    attributes:
      - key: jenkins_build_id
        action: delete  # Remove high-cardinality labels

Advanced Concept: Avoid including high-cardinality labels such as client_ip, user_id, or unique request_id in your metric dimensions. These explode the series count and degrade query performance in PromQL.

Visualizing with Amazon Managed Grafana

Once data is flowing into AMP, visualization is standard.

Deploy Amazon Managed Grafana (AMG).
Add the “Prometheus” data source.
Toggle “SigV4 SDK” authentication in the data source settings (this seamlessly uses the AMG workspace IAM role to query AMP).
Select your AMP region and workspace.

Because AMP is 100% PromQL compatible, you can import standard community dashboards (like the Kubernetes Cluster Monitoring dashboard) and they will work immediately.

Frequently Asked Questions (FAQ)

Does AMP support Prometheus Alert Manager?

Yes. AMP supports a serverless Alert Manager. You upload your alerting rules (YAML) and routing configuration directly to the AMP workspace via the AWS CLI or Terraform. You do not need to run a separate Alert Manager pod in your cluster.

What is the difference between ADOT and the standard Prometheus Server?

The standard Prometheus server is a monolithic binary that scrapes, stores, and serves data. ADOT (based on the OpenTelemetry Collector) is a pipeline that receives data, processes it, and exports it. ADOT is stateless and easier to scale horizontally, making it ideal for shipping Amazon EKS metrics to a managed backend.

How do I monitor the control plane (API Server, etcd)?

EKS Control Plane metrics are not exposed via standard scraping endpoints inside your VPC because the control plane is managed by AWS. However, you can enable “Control Plane Logging” in EKS to send metrics to CloudWatch, or use specific PromQL exporters if AWS exposes the metrics endpoint (varies by EKS version and configuration).

Conclusion

Migrating to Amazon Managed Service for Prometheus allows expert teams to treat observability as a service rather than a server. By leveraging ADOT for collection and IRSA for security, you build a robust, scalable pipeline for your Amazon EKS metrics.

Your next step is to audit your current metric cardinality using the ADOT processor configuration to ensure you aren’t paying for noise. Focus on the golden signals—Latency, Traffic, Errors, and Saturation—and let AWS manage the infrastructure. Thank you for reading the DevopsRoles page!

Linux

Linux Kernel Security: Mastering Essential Workflows & Best Practices

01/06/2026 HuuPV Leave a comment

In the realm of high-performance infrastructure, the kernel is not just the engine; it is the ultimate arbiter of access. For expert Systems Engineers and SREs, Linux Kernel Security moves beyond simple package updates and firewall rules. It requires a comprehensive strategy involving surface reduction, advanced access controls, and runtime observability.

As containerization and microservices expose the kernel to new attack vectors—specifically container escapes and privilege escalation—relying solely on perimeter defense is insufficient. This guide dissects the architectural layers of kernel hardening, providing production-ready workflows for LSMs, Seccomp, and eBPF-based security to help you establish a robust defense-in-depth posture.

1. The Defense-in-Depth Model: Beyond Discretionary Access

Standard Linux permissions (Discretionary Access Control, or DAC) are the first line of defense but are notoriously prone to user error and privilege escalation. To secure a production kernel, we must enforce Mandatory Access Control (MAC).

Leveraging Linux Security Modules (LSMs)

Whether you utilize SELinux (Red Hat ecosystem) or AppArmor (Debian/Ubuntu ecosystem), the goal is identical: confine processes to the minimum necessary privileges.

Pro-Tip: SELinux in CI/CD
Experts often disable SELinux (`setenforce 0`) when facing friction. Instead, use audit2allow during your staging pipeline to generate permissive modules automatically, ensuring production remains in `Enforcing` mode without breaking applications.

To analyze a denial and generate a custom policy module:

# 1. Search for denials in the audit log
grep "denied" /var/log/audit/audit.log

# 2. Pipe the denial into audit2allow to see why it failed
grep "httpd" /var/log/audit/audit.log | audit2allow -w

# 3. Generate a loadable kernel module (.pp)
grep "httpd" /var/log/audit/audit.log | audit2allow -M my_httpd_policy

# 4. Load the module
semodule -i my_httpd_policy.pp

2. Reducing the Attack Surface via Sysctl Hardening

The default upstream kernel configuration prioritizes compatibility over security. For a hardened environment, specific sysctl parameters must be tuned to restrict memory access and network stack behavior.

Below is a production-grade /etc/sysctl.d/99-security.conf snippet targeting memory protection and network hardening.

# --- Kernel Self-Protection ---

# Restrict access to kernel pointers in /proc/kallsyms
# 0=disabled, 1=hide from unprivileged, 2=hide from all
kernel.kptr_restrict = 2

# Restrict access to the kernel log buffer (dmesg)
# Prevents attackers from reading kernel addresses from logs
kernel.dmesg_restrict = 1

# Restrict use of the eBPF subsystem to privileged users (CAP_BPF/CAP_SYS_ADMIN)
# Essential for preventing unprivileged eBPF exploits
kernel.unprivileged_bpf_disabled = 1

# Turn on BPF JIT hardening (blinding constants)
net.core.bpf_jit_harden = 2

# --- Network Stack Hardening ---

# Enable IP spoofing protection (Reverse Path Filtering)
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# Disable ICMP Redirect Acceptance (prevents Man-in-the-Middle routing attacks)
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

Apply these changes dynamically with sysctl -p /etc/sysctl.d/99-security.conf. Refer to the official kernel sysctl documentation for granular details on specific parameters.

3. Syscall Filtering with Seccomp BPF

Secure Computing Mode (Seccomp) is critical for reducing the kernel’s exposure to userspace. By default, a process can make any system call. Seccomp acts as a firewall for syscalls.

In modern container orchestrators like Kubernetes, Seccomp profiles are defined in JSON. However, understanding how to profile an application is key.

Profiling Applications

You can use tools like strace to identify exactly which syscalls an application needs, then blacklist everything else.

# Trace the application and count syscalls
strace -c -f ./my-application

A basic whitelist profile (JSON) for a container runtime might look like this:

{
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": [
        "SCMP_ARCH_X86_64"
    ],
    "syscalls": [
        {
            "names": [
                "read", "write", "exit", "exit_group", "futex", "mmap", "nanosleep"
            ],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}

Advanced Concept: Seccomp allows filtering based on syscall arguments, not just the syscall ID. This allows for extremely granular control, such as allowing `socket` calls but only for specific families (e.g., AF_UNIX).

4. Kernel Module Signing and Lockdown

Rootkits often persist by loading malicious kernel modules. To prevent this, enforce Module Signing. This ensures the kernel only loads modules signed by a trusted key (usually the distribution vendor or your own secure boot key).

Enforcing Lockdown Mode

The Linux Kernel Lockdown feature (available in 5.4+) draws a line between the root user and the kernel itself. Even if an attacker gains root, Lockdown prevents them from modifying kernel memory or injecting code.

Enable it via boot parameters or securityfs:

# Check current status
cat /sys/kernel/security/lockdown

# Enable integrity mode (prevents modifying running kernel)
# Usually set via GRUB: lockdown=integrity or lockdown=confidentiality

5. Runtime Observability & Security with eBPF

Traditional security tools rely on parsing logs or checking file integrity. Modern Linux Kernel Security leverages eBPF (Extended Berkeley Packet Filter) to observe kernel events in real-time with minimal overhead.

Tools like Tetragon or Falco attach eBPF probes to syscalls (e.g., `execve`, `connect`, `open`) to detect anomalous behavior.

Example: Detecting Shell Execution in Containers

Instead of scanning for signatures, eBPF can trigger an alert the moment a sensitive binary is executed inside a specific namespace.

# A conceptual Falco rule for detecting shell access
- rule: Terminal Shell in Container
  desc: A shell was used as the entrypoint for the container executable
  condition: >
    spawned_process and container
    and shell_procs
  output: >
    Shell executed in container (user=%user.name container_id=%container.id image=%container.image.repository)
  priority: WARNING

Frequently Asked Questions (FAQ)

Does enabling Seccomp cause performance degradation?

Generally, the overhead is negligible for most workloads. The BPF filters used by Seccomp are JIT-compiled and extremely fast. However, for syscall-heavy applications (like high-frequency trading platforms), benchmarking is recommended.

What is the difference between Kernel Lockdown “Integrity” and “Confidentiality”?

Integrity prevents userland from modifying the running kernel (e.g., writing to `/dev/mem` or loading unsigned modules). Confidentiality goes a step further by preventing userland from reading sensitive kernel information that could reveal cryptographic keys or layout randomization.

How do I handle kernel vulnerabilities (CVEs) without rebooting?

For mission-critical systems where downtime is unacceptable, use Kernel Live Patching technologies like kpatch (Red Hat) or Livepatch (Canonical). These tools inject functional replacements for vulnerable code paths into the running kernel memory.

Conclusion

Mastering Linux Kernel Security is not a checklist item; it is a continuous process of reducing trust and increasing observability. By implementing a layered defense—starting with strict LSM policies, minimizing the attack surface via sysctl, enforcing Seccomp filters, and utilizing modern eBPF observability—you transform the kernel from a passive target into an active guardian of your infrastructure.

Start by auditing your current sysctl configurations and moving your container workloads to a default-deny Seccomp profile. The security of the entire stack rests on the integrity of the kernel. Thank you for reading the DevopsRoles page!

Kubernetes

PureVPN Review: Why I Don’t Trust “Free Trials” (And Why This One Is Different)

01/05/2026 HuuPV Leave a comment

PureVPN Review 2026

Transparency Note: This review is based on real data. Some links may be affiliate links, meaning I earn a commission at no extra cost to you if you purchase through them.

Let’s get real for a second. The VPN industry is full of snakes.

Every provider screams they are the “fastest,” “most secure,” and “best for Netflix.” 90% of them are lying. As an industry insider with 20 years of analyzing traffic and affiliate backends, I’ve seen it all.

I’m not here to sell you a dream. I’m here to tear apart the data. I logged into my own partner dashboard to verify if PureVPN is legitimate or just another marketing machine.

Here is the ugly, unfiltered truth about their $0.99 Trial, their Streaming capabilities, and whether they deserve your money in 2026.

The $0.99 “Backdoor” Offer (Why Do They Hide It?)

Most premium VPNs have killed their free trials. Why? Because their service sucks, and they know you’ll cancel before paying. Instead, they force you to pay $50 upfront and pray their “30-day money-back guarantee” isn’t a nightmare to claim.

PureVPN is one of the rare exceptions, but they don’t exactly shout about it on their homepage.

I dug through the backend marketing assets, and I found this:

[Trial page with $0.99…] Caption: Proof from my dashboard: The hidden $0.99 trial landing page actually exists.

Here is the deal:

The Cost: $0.99. Less than a pack of gum.
The Catch: It’s 7 days.
The Reality: This is the only smart way to buy a VPN.

Don’t be a fool and buy a 2-year plan blindly. [Use this specific link] to grab the $0.99 trial. Stress-test it for 7 days. Download huge files. Stream 4K content. If it fails? You lost one dollar. If it works? You just saved yourself a headache.

“Streaming Optimized” – Marketing Fluff or Real Tech?

I get asked this every day: “Does it actually work with Netflix US?”

Usually, my answer is “Maybe.” But looking at PureVPN’s internal structure, I see something interesting. They don’t just dump everyone onto the same servers.

Caption: PureVPN segments traffic at the source. This is why their unblocking actually works.

Look at the screenshot above. They have dedicated gateways (landing pages and server routes) specifically for:

Netflix & Kodi: High bandwidth, obfuscated IPs.
Crypto: High security, static IPs.
Sports: Low latency.

This isn’t just a UI button; it’s infrastructure segregation. When I tested their Netflix US server, I didn’t get the dreaded “Proxy Detected” error. Why? Because they are actively fighting Netflix’s ban list with these specific gateways.

Transparency: Show Me The Data

I don’t trust words; I trust numbers.

One of the biggest red flags with VPN companies is “shady operations.” If they can’t track a click, they can’t protect your data.

I monitor my PureVPN partnership panel daily. Look at this granular tracking:

[Clicks] Caption: Real-time tracking of unique vs. repeated clicks. If they are this precise with my stats, they are precise with your privacy.

The system distinguishes between Unique and Repeated traffic instantly. This level of technical competency in their backend suggests a mature infrastructure. They aren’t running this out of a basement. They have the resources to maintain a strict No-Log policy and have been audited to prove it.

Who Should AVOID PureVPN?

I promised to be brutal, so here it is.

If you want a simplistic, 1-button app: PureVPN might annoy you. Their app is packed with modes and features. It’s for power users, not your grandma.
If you want a permanently free VPN: Go use a free proxy and let them sell your data to advertisers. PureVPN is a paid tool for serious privacy.

The Verdict

Is PureVPN the “Best VPN in the Universe”? stop it. There is no such thing.

But is it the smartest purchase you can make right now? Yes.

Because of the $0.99 Trial.

It removes all the risk. You don’t have to trust my review. You don’t have to trust their ads. You just pay $0.99 and judge for yourself.

Here is the link I verified in the screenshots. Use it before they pull the offer:

👉 [Get The 7-Day Trial for $0.99 (Verified Link)]

Trusted by 3 million+ satisfied users

Easy to use VPN app for all your devices

Thank you for reading the DevopsRoles page!

Docker

Docker Alternatives: Secure & Scalable Container Solutions

01/05/2026 HuuPV Leave a comment

For over a decade, Docker has been synonymous with containerization. It revolutionized how we build, ship, and run applications. However, the container landscape has matured significantly. Between the changes to Docker Desktop’s licensing model, the deprecation of Dockershim in Kubernetes, and the inherent security risks of a root-privileged daemon, many organizations are actively evaluating Docker alternatives.

As experienced practitioners, we know that “replacing Docker” isn’t just about swapping a CLI; it’s about understanding the OCI (Open Container Initiative) standards, optimizing the CI/CD supply chain, and reducing the attack surface. This guide navigates the best production-ready tools for runtimes, building, and orchestration.

Why Look Beyond Docker?

Before diving into the tools, let’s articulate the architectural drivers for migration. The Docker daemon (dockerd) is a monolithic complexity that runs as root. This architecture presents three primary challenges:

Security (Root Daemon): By default, the Docker daemon runs with root privileges. If the daemon is compromised, the attacker gains root access to the host.
Kubernetes Compatibility: Kubernetes deprecated the Dockershim in v1.24. While Docker images are OCI-compliant, the Docker runtime itself is no longer the native interface for K8s, usually replaced by containerd or CRI-O via the CRI (Container Runtime Interface).
Licensing: The updated subscription terms for Docker Desktop have forced many large enterprises to seek open-source equivalents for local development.

Pro-Tip: The term “Docker” is often conflated to mean the image format, the runtime, and the orchestration. Most modern tools comply with the OCI Image Specification and OCI Runtime Specification. This means an image built with Buildah can be run by Podman or Kubernetes without issue.

1. Podman: The Direct CLI Replacement

Podman (Pod Manager) is arguably the most robust of the Docker alternatives for Linux users. Developed by Red Hat, it is a daemonless container engine for developing, managing, and running OCI containers on your Linux system.

Architecture: Daemonless & Rootless

Unlike Docker, Podman interacts directly with the image registry, container, and image storage implementation within the Linux kernel. It uses a fork-exec model for running containers.

Rootless by Default: Containers run under the user’s UID/GID namespace, drastically reducing the security blast radius.
Daemonless: No background process means less overhead and no single point of failure managing all containers.
Systemd Integration: Podman allows you to generate systemd unit files for your containers, treating them as first-class citizens of the OS.

Migration Strategy

Podman’s CLI is designed to be identical to Docker’s. In many cases, migration is as simple as aliasing the command.

# Add this to your .bashrc or .zshrc
alias docker=podman

# Verify installation
podman version

Podman also introduces the concept of “Pods” (groups of containers sharing namespaces) to the CLI, bridging the gap between local dev and K8s.

# Run a pod with a shared network namespace
podman pod create --name web-pod -p 8080:80

# Run a container inside that pod
podman run -d --pod web-pod nginx:alpine

2. containerd & nerdctl: The Kubernetes Native

containerd is the industry-standard container runtime. It was actually spun out of Docker originally and donated to the CNCF. It focuses on being simple, robust, and portable.

While containerd is primarily a daemon used by Kubernetes, it can be used directly for debugging or local execution. However, the raw ctr CLI is not user-friendly. Enter nerdctl.

nerdctl (contaiNERD ctl)

nerdctl is a Docker-compatible CLI for containerd. It supports modern features that Docker is sometimes slow to adopt, such as:

Lazy-pulling (stargz)
Encrypted images (OCICrypt)
IPFS-based image distribution

# Installing nerdctl (example)
brew install nerdctl

# Run a container (identical syntax to Docker)
nerdctl run -d -p 80:80 nginx

3. Advanced Build Tools: Buildah & Kaniko

In a CI/CD pipeline, running a Docker daemon inside a Jenkins or GitLab runner (Docker-in-Docker) is a known security anti-pattern. We need tools that build OCI images without a daemon.

Buildah

Buildah specializes in building OCI images. It allows you to build images from scratch (an empty directory) or using a Dockerfile. It excels in scripting builds via Bash rather than relying solely on Dockerfile instruction sets.

# Example: Building an image without a Dockerfile using Buildah
container=$(buildah from scratch)
mnt=$(buildah mount $container)

# Install packages into the mounted directory
dnf install --installroot $mnt --releasever 8 --setopt=install_weak_deps=false --nodocs -y httpd

# Config
buildah config --cmd "/usr/sbin/httpd -D FOREGROUND" $container
buildah commit $container my-httpd-image

Kaniko

Kaniko is Google’s solution for building container images inside a container or Kubernetes cluster. It does not depend on a Docker daemon and executes each command within a Dockerfile completely in userspace. This makes it ideal for securing Kubernetes-based CI pipelines like Tekton or Jenkins X.

4. Desktop Replacements (GUI)

For developers on macOS and Windows who rely on the Docker Desktop GUI and ease of use, straight Linux CLI tools aren’t enough.

Rancher Desktop

Rancher Desktop is an open-source app for Mac, Windows, and Linux. It provides Kubernetes and container management. Under the hood, it uses a Lima VM on macOS and WSL2 on Windows. It allows you to switch the runtime engine between dockerd (Moby) and containerd.

OrbStack (macOS)

For macOS power users, OrbStack has gained massive traction. It is a drop-in replacement for Docker Desktop that is significantly faster, lighter on RAM, and offers seamless bi-directional networking and file sharing. It is highly recommended for performance-critical local development.

Frequently Asked Questions (FAQ)

Can I use Docker Compose with Podman?

Yes. You can use the podman-compose tool, which is a community-driven implementation. Alternatively, modern versions of Podman run a unix socket that mimics the Docker socket, allowing the standard docker-compose binary to communicate directly with the Podman backend.

Is Podman truly safer than Docker?

Architecturally, yes. Because Podman uses a fork/exec model and supports rootless containers by default, the attack surface is significantly smaller. There is no central daemon running as root waiting to receive commands.

What is the difference between CRI-O and containerd?

Both are CRI (Container Runtime Interface) implementations for Kubernetes. containerd is a general-purpose runtime (used by Docker and K8s). CRI-O is purpose-built strictly for Kubernetes; it aims to be lightweight and defaults to OCI standards, but it is rarely used as a standalone CLI tool for developers.

Conclusion

The ecosystem of Docker alternatives has evolved from experimental projects to robust, enterprise-grade standards. For local development on Linux, Podman offers a superior security model with a familiar UX. For Kubernetes-native workflows, containerd with nerdctl prepares you for the production environment.

Switching tools requires effort, but aligning your local development environment closer to your production Kubernetes clusters using OCI-compliant tools pays dividends in security, stability, and understanding of the cloud-native stack.

Ready to make the switch? Start by auditing your current CI pipelines for “Docker-in-Docker” usage and test a migration to Buildah or Kaniko today. Thank you for reading the DevopsRoles page!

Kubernetes

Kubernetes Validate GPU Accelerator Access Isolation in OKE

01/03/2026 HuuPV Leave a comment

In multi-tenant high-performance computing (HPC) environments, ensuring strict resource boundaries is not just a performance concern—it is a critical security requirement. For Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE), verifying GPU Accelerator Access Isolation is paramount when running untrusted workloads alongside critical AI/ML inference tasks. This guide targets expert Platform Engineers and SREs, focusing on the mechanisms, configuration, and practical validation of GPU isolation within OKE clusters.

The Mechanics of GPU Isolation in Kubernetes

Before diving into validation, it is essential to understand how OKE and the underlying container runtime mediate access to hardware accelerators. Unlike CPU and RAM, which are compressible resources managed via cgroups, GPUs are treated as extended resources.

Pro-Tip: The default behavior of the NVIDIA Container Runtime is often permissive. Without the NVIDIA Device Plugin explicitly setting environment variables like NVIDIA_VISIBLE_DEVICES, a container might gain access to all GPU devices on the node. Isolation relies heavily on the correct interaction between the Kubelet, the Device Plugin, and the Container Runtime Interface (CRI).

Isolation Layers

Physical Isolation (Passthrough): Giving a Pod exclusive access to a specific PCle device.
Logical Isolation (MIG): Using Multi-Instance GPU (MIG) on Ampere architectures (e.g., A100) to partition a single physical GPU into multiple isolated instances with dedicated compute, memory, and cache.
Time-Slicing: Sharing a single GPU context across multiple processes (weakest isolation, mostly for efficiency, not security).

Prerequisites for OKE

To follow this validation procedure, ensure your environment meets the following criteria:

An active OKE Cluster (version 1.25+ recommended).
Node pools using GPU-enabled shapes (e.g., VM.GPU.A10.1, BM.GPU.A100-vCP.8).
The NVIDIA Device Plugin installed (standard in OKE GPU images, but verify the daemonset).
kubectl context configured for administrative access.

Step 1: Establishing the Baseline (The “Rogue” Pod)

To validate GPU Accelerator Access Isolation, we must first attempt to access resources from a Pod that has not requested them. This simulates a “rogue” workload attempting to bypass resource quotas or scrape data from GPU memory.

Deploying a Non-GPU Workload

Deploy a standard pod that includes the NVIDIA utilities but requests 0 GPU resources.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-rogue-validation
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu20.04
    command: ["sleep", "3600"]
    # CRITICAL: No resources.limits.nvidia.com/gpu defined here
    resources:
      limits:
        cpu: "500m"
        memory: "512Mi"

Verification Command

Exec into the pod and attempt to query the GPU status. If isolation is working correctly, the NVIDIA driver should report no devices found or the command should fail.

kubectl exec -it gpu-rogue-validation -- nvidia-smi

Expected Outcome:

Failed to initialize NVML: Unknown Error
Or, a clear output stating No devices were found.

If this pod returns a full list of GPUs, isolation has failed. This usually indicates that the default runtime is exposing all devices because the Device Plugin did not inject the masking environment variables.

Step 2: Validating Authorized Access

Now, deploy a valid workload that requests a specific number of GPUs to ensure the scheduler and device plugin are correctly allocating resources.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-authorized
spec:
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.7.1-ubuntu20.04
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: 1 # Requesting 1 GPU

Inspection

Run nvidia-smi inside this pod. You should see exactly one GPU device.

Furthermore, inspect the environment variables injected by the plugin:

kubectl exec gpu-authorized -- env | grep NVIDIA_VISIBLE_DEVICES

This should return a UUID (e.g., GPU-xxxxxxxx-xxxx-xxxx...) rather than all.

Step 3: Advanced Validation with MIG (Multi-Instance GPU)

For workloads requiring strict hardware-level isolation on OKE using A100 instances, you must validate MIG partitioning. GPU Accelerator Access Isolation in a MIG context means a Pod on “Instance A” cannot impact the memory bandwidth or compute units of “Instance B”.

If you have configured MIG strategies (e.g., mixed or single) in your OKE node pool:

Deploy two separate pods, each requesting nvidia.com/mig-1g.5gb (or your specific profile).

Run a stress test on Pod A:

kubectl exec -it pod-a -- /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery

Verify UUIDs: Ensure the UUID visible in Pod A is distinct from Pod B.
Crosstalk Check: Attempt to target the GPU index of Pod B from Pod A using CUDA code. It should fail with an invalid device error.

Troubleshooting Isolation Leaks

If your validation tests fail (i.e., the “rogue” pod can see GPUs), check the following configurations in your OKE cluster.

1. Privileged Security Context

A common misconfiguration is running containers as privileged. This bypasses the container runtime’s device cgroup restrictions.

# AVOID THIS IN MULTI-TENANT CLUSTERS
securityContext:
  privileged: true

Fix: Enforce Pod Security Standards (PSS) to disallow privileged containers in non-system namespaces.

2. HostPath Volume Mounts

Ensure users are not mounting /dev or /var/run/nvidia-container-devices directly. Use OPA Gatekeeper or Kyverno to block HostPath mounts that expose device nodes.

Frequently Asked Questions (FAQ)

Does OKE enable GPU isolation by default?

Yes, OKE uses the standard Kubernetes Device Plugin model. However, “default” relies on the assumption that you are not running privileged containers. You must actively validate that your RBAC and Pod Security Policies prevent privilege escalation.

Can I share a single GPU across two Pods safely?

Yes, via Time-Slicing or MIG. However, Time-Slicing does not provide memory isolation (OOM in one pod can crash the GPU context for others). For true isolation, you must use MIG (available on A100 shapes in OKE).

How do I monitor GPU violations?

Standard monitoring (Prometheus/Grafana) tracks utilization, not access violations. To detect access violations, you need runtime security tools like Falco, configured to alert on unauthorized open() syscalls on /dev/nvidia* devices by pods that haven’t requested them.

Conclusion

Validating GPU Accelerator Access Isolation in OKE is a non-negotiable step for securing high-value AI infrastructure. By systematically deploying rogue and authorized pods, inspecting environment variable injection, and enforcing strict Pod Security Standards, you verify that your multi-tenant boundaries are intact. Whether you are using simple passthrough or complex MIG partitions, trust nothing until you have seen the nvidia-smi output deny access. Thank you for reading the DevopsRoles page!

Kubernetes

Optimize Kubernetes Request Right Sizing with Kubecost for Cost Savings

01/01/2026 HuuPV Leave a comment

In the era of cloud-native infrastructure, the scheduler is king. However, the efficiency of that scheduler depends entirely on the accuracy of the data you feed it. For expert Platform Engineers and SREs, Kubernetes request right sizing is not merely a housekeeping task—it is a critical financial and operational lever. Over-provisioning leads to “slack” (billed but unused capacity), while under-provisioning invites CPU throttling and OOMKilled events.

This guide moves beyond the basics of resources.yaml. We will explore the mechanics of resource contention, the algorithmic approach Kubecost takes to optimization, and how to implement a data-driven right-sizing strategy that balances cost reduction with production stability.

The Technical Economics of Resource Allocation

To master Kubernetes request right sizing, one must first understand how the Kubernetes scheduler and the underlying Linux kernel interpret these values.

The Scheduler vs. The Kernel

Requests are primarily for the Kubernetes Scheduler. They ensure a node has enough allocatable capacity to host a Pod. Limits, conversely, are enforced by the Linux kernel via cgroups.

CPU Requests: Determine the cpu.shares in cgroups. This is a relative weight, ensuring that under contention, the container gets its guaranteed slice of time.
CPU Limits: Determine cpu.cfs_quota_us. Hard throttling occurs immediately if this quota is exceeded within a period (typically 100ms), regardless of node idleness.
Memory Requests: Primarily used for scheduling.
Memory Limits: Enforce the OOM Killer threshold.

Pro-Tip (Expert): Be cautious with CPU limits. While they prevent a runaway process from starving neighbors, they can introduce tail latency due to CFS throttling bugs or micro-bursts. Many high-performance shops (e.g., at the scale of Twitter or Zalando) choose to set CPU Requests but omit CPU Limits for Burstable workloads, relying on cpu.shares for fairness.

Why “Guesstimation” Fails at Scale

Manual right-sizing is impossible in dynamic environments. Developers often default to “safe” (bloated) numbers, or copy-paste manifests from StackOverflow. This results in the “Kubernetes Resource Gap”: the delta between Allocated resources (what you pay for) and Utilized resources (what you actually use).

Without tooling like Kubecost, you are likely relying on static Prometheus queries that look like this to find usage peaks:

max_over_time(container_memory_working_set_bytes{namespace="production"}[24h])

While useful, raw PromQL queries lack context regarding billing models, spot instance savings, and historical seasonality. This is where Kubernetes request right sizing via Kubecost becomes essential.

Implementing Kubecost for Granular Visibility

Kubecost models your cluster’s costs by correlating real-time resource usage with your cloud provider’s billing API (AWS Cost Explorer, GCP Billing, Azure Cost Management).

1. Installation & Prometheus Integration

For production clusters, installing via Helm is standard. Ensure you are scraping metrics at a resolution high enough to catch micro-bursts, but low enough to manage TSDB cardinality.

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm upgrade --install kubecost kubecost/cost-analyzer \
    --namespace kubecost --create-namespace \
    --set kubecostToken="YOUR_TOKEN_HERE" \
    --set prometheus.server.persistentVolume.enabled=true \
    --set prometheus.server.retention=15d

2. The Right-Sizing Algorithm

Kubecost’s recommendation engine doesn’t just look at “now.” It analyzes a configurable window (e.g., 2 days, 7 days, 30 days) to recommend Kubernetes request right sizing targets.

The core logic typically follows a usage profile:

Peak Aware: It identifies max(usage) over the window to prevent OOMs.
Headroom Buffer: It adds a configurable overhead (e.g., 15-20%) to the recommendation to account for future growth or sudden spikes.

Executing the Optimization Loop

Once Kubecost is ingesting data, navigate to the Savings > Request Right Sizing dashboard. Here is the workflow for an SRE applying these changes.

Step 1: Filter by Namespace and Owner

Do not try to resize the entire cluster at once. Filter by namespace: backend or label: team=data-science.

Step 2: Analyze the “Efficiency” Score

Kubecost assigns an efficiency score based on the ratio of idle to used resources.

Target: A healthy range is typically 60-80% utilization. Approaching 100% is dangerous; staying below 30% is wasteful.

Step 3: Apply the Recommendation (GitOps)

As an expert, you should never manually patch a deployment via `kubectl edit`. Take the recommended YAML values from Kubecost and update your Helm Charts or Kustomize bases.

# Before Optimization
resources:
  requests:
    memory: "4Gi" # 90% idle based on Kubecost data
    cpu: "2000m"

# After Optimization (Kubecost Recommendation)
resources:
  requests:
    memory: "600Mi" # calculated max usage + 20% buffer
    cpu: "350m"

Advanced Strategy: Automating with VPA

Static right-sizing has a shelf life. As traffic patterns change, your static values become obsolete. The ultimate maturity level in Kubernetes request right sizing is coupling Kubecost’s insights with the Vertical Pod Autoscaler (VPA).

Kubecost can integrate with VPA to automatically apply recommendations. However, in production, “Auto” mode is risky because it restarts Pods to change resource specifications.

Warning: For critical stateful workloads (like Databases or Kafka), use VPA in Off or Initial mode. This allows VPA to calculate the recommendation object, which you can then monitor via metrics or export to your GitOps repo, without forcing restarts.

VPA Configuration for Recommendations Only

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-service-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: backend-service
  updatePolicy:
    updateMode: "Off" # Kubecost reads the recommendation; VPA does not restart pods.

Frequently Asked Questions (FAQ)

1. How does right-sizing affect Quality of Service (QoS) classes?

Right-sizing directly dictates QoS.

Guaranteed: Requests == Limits. Safest, but most expensive.

Burstable: Requests < Limits. Ideal for most HTTP web services.

BestEffort: No requests/limits. High risk of eviction.

When you lower requests to save money, ensure you don’t accidentally drop a critical service from Guaranteed to Burstable if strict isolation is required.

2. Can I use Kubecost to resize specific sidecars (like Istio/Envoy)?

Yes. Sidecars often suffer from massive over-provisioning because they are injected with generic defaults. Kubecost breaks down usage by container, allowing you to tune the istio-proxy container independently of the main application container.

3. What if my workload has very “spiky” traffic?

Standard averaging algorithms fail with spiky workloads. In Kubecost, adjust the profiling window to a shorter duration (e.g., 2 days) to capture recent spikes, or ensure your “Target Utilization” threshold is set lower (e.g., 50% instead of 80%) to leave a larger safety buffer for bursts.

Conclusion

Kubernetes request right sizing is not a one-time project; it is a continuous loop of observability and adjustment. By leveraging Kubecost, you move from intuition-based guessing to data-driven precision.

The goal is not just to lower the cloud bill. The goal is to maximize the utility of every CPU cycle you pay for while guaranteeing the stability your users expect. Start by identifying your top 10 most wasteful deployments, apply the “Requests + Buffer” logic, and integrate these checks into your CI/CD pipelines to prevent resource drift before it hits production. Thank you for reading the DevopsRoles page!

AWS

AWS SDK for Rust: Your Essential Guide to Quick Setup

12/30/2025 HuuPV Leave a comment

In the evolving landscape of cloud-native development, the AWS SDK for Rust represents a paradigm shift toward memory safety, high performance, and predictable resource consumption. While languages like Python and Node.js have long dominated the AWS ecosystem, Rust provides an unparalleled advantage for high-throughput services and cost-optimized Lambda functions. This guide moves beyond the basics, offering a technical deep-dive into setting up a production-ready environment using the SDK.

Pro-Tip: The AWS SDK for Rust is built on top of smithy-rs, a code generator capable of generating SDKs from Smithy models. This architecture ensures that the Rust SDK stays in sync with AWS service updates almost instantly.

1. Project Initialization and Dependency Management

To begin working with the AWS SDK for Rust, you must configure your Cargo.toml carefully. Unlike monolithic SDKs, the Rust SDK is modular. You only include the crates for the services you actually use, which significantly reduces compile times and binary sizes.

Every project requires the aws-config crate for authentication and the specific service crates (e.g., aws-sdk-s3). Since the SDK is inherently asynchronous, a runtime like Tokio is mandatory.

[dependencies]
# Core configuration and credential provider
aws-config = { version = "1.1", features = ["behavior-version-latest"] }

# Service specific crates
aws-sdk-s3 = "1.17"
aws-sdk-dynamodb = "1.16"

# Async runtime
tokio = { version = "1", features = ["full"] }

# Error handling
anyhow = "1.0"

2. Deep Dive: Configuring the AWS SDK for Rust

The entry point for almost any application is the aws_config::load_from_env() function. For expert developers, understanding how the SdkConfig object manages the credential provider chain and region resolution is critical for debugging cross-account or cross-region deployments.

Asynchronous Initialization

The SDK uses async/await throughout. Here is the standard boilerplate for a robust initialization:

use aws_config::meta::region::RegionProviderChain;
use aws_config::BehaviorVersion;

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Determine region, falling back to us-east-1 if not set
    let region_provider = RegionProviderChain::default_provider().or_else("us-east-1");
    
    // Load configuration with the latest behavior version for future-proofing
    let config = aws_config::defaults(BehaviorVersion::latest())
        .region(region_provider)
        .load()
        .await;

    // Initialize service clients
    let s3_client = aws_sdk_s3::Client::new(&config);
    
    println!("AWS SDK for Rust initialized for region: {:?}", config.region().unwrap());
    Ok(())
}

Advanced Concept: The BehaviorVersion parameter is crucial. It allows the AWS team to introduce breaking changes to default behaviors (like retry logic) without breaking existing binaries. Always use latest() for new projects or a specific version for legacy stability.

3. Production Patterns: Interacting with Services

Once the AWS SDK for Rust is configured, interacting with services follows a consistent “Builder” pattern. This pattern ensures type safety and prevents the construction of invalid requests at compile time.

Example: High-Performance S3 Object Retrieval

When fetching large objects, leveraging Rust’s stream handling is significantly more efficient than buffering the entire payload into memory.

use aws_sdk_s3::Client;

async fn download_object(client: &Client, bucket: &str, key: &str) -> Result<(), anyhow::Error> {
    let resp = client
        .get_object()
        .bucket(bucket)
        .key(key)
        .send()
        .await?;

    let data = resp.body.collect().await?;
    println!("Downloaded {} bytes", data.into_bytes().len());

    Ok(())
}

4. Error Handling and Troubleshooting

Error handling in the AWS SDK for Rust is exhaustive. Each operation returns a specialized error type that distinguishes between service-specific errors (e.g., NoSuchKey) and transient network failures.

Service Errors: Errors returned by the AWS API (4xx or 5xx).
SdkErrors: Errors related to the local environment, such as construction failures or timeouts.

For more details on error structures, refer to the Official Smithy Error Documentation.

Feature	Rust Advantage	Impact on DevOps
Memory Safety	Zero-cost abstractions/Ownership	Lower crash rates in production.
Binary Size	Modular crates	Faster Lambda cold starts.
Concurrency	Fearless concurrency with Tokio	High throughput on minimal hardware.

Frequently Asked Questions (FAQ)

Is the AWS SDK for Rust production-ready?

Yes. As of late 2023, the AWS SDK for Rust is General Availability (GA). It is used internally by AWS and by numerous high-scale organizations for production workloads.

How do I handle authentication for local development?

The SDK follows the standard AWS credential provider chain. It will automatically check for environment variables (AWS_ACCESS_KEY_ID), the ~/.aws/credentials file, and IAM roles if running on EC2 or EKS.

Can I use the SDK without Tokio?

While the SDK is built to be executor-agnostic in theory, currently, aws-config and the default HTTP clients are heavily integrated with Tokio and Hyper. Using a different runtime requires implementing custom HTTP connectors.

Conclusion

Setting up the AWS SDK for Rust is a strategic move for developers who prioritize performance and reliability. By utilizing the modular crate system, embracing the async-first architecture of Tokio, and understanding the SdkConfig lifecycle, you can build cloud applications that are both cost-effective and remarkably fast. Whether you are building microservices on EKS or high-performance Lambda functions, Rust offers the tooling necessary to master the AWS ecosystem.

Would you like me to generate a specialized guide on optimizing AWS Lambda cold starts using the Rust SDK and Cargo Lambda? Thank you for reading the DevopsRoles page!

AWS, Terraform

Mastering AWS Account Deployment: Terraform & AWS Control Tower

12/29/2025 HuuPV Leave a comment

For modern enterprises, AWS account deployment is no longer a manual task of clicking through the AWS Organizations console. As infrastructure scales, the need for consistent, compliant, and automated “vending machines” for AWS accounts becomes paramount. By combining the governance power of AWS Control Tower with the Infrastructure as Code (IaC) flexibility of Terraform, SREs and Cloud Architects can build a robust deployment pipeline that satisfies both developer velocity and security requirements.

The Foundations: Why Control Tower & Terraform?

In a decentralized cloud environment, AWS account deployment must address three critical pillars: Governance, Security, and Scalability. While AWS Control Tower provides the managed “Landing Zone” environment, Terraform provides the declarative state management required to manage thousands of resources across multiple accounts without configuration drift.

Advanced Concept: Control Tower uses “Guardrails” (Service Control Policies and Config Rules). When deploying accounts via Terraform, you aren’t just creating a container; you are attaching a policy-driven ecosystem that inherits the root organization’s security posture by default.

By leveraging the Terraform AWS Provider alongside Control Tower, you enable a “GitOps” workflow where an account request is simply a .tf file in a repository. This approach ensures that every account is born with the correct IAM roles, VPC configurations, and logging buckets pre-provisioned.

Deep Dive: Account Factory for Terraform (AFT)

The AWS Control Tower Account Factory for Terraform (AFT) is the official bridge between these two worlds. AFT sets up a separate orchestration engine that listens for Terraform changes and triggers the Control Tower account creation API.

The AFT Component Stack

AFT Management Account: A dedicated account within your Organization to host the AFT pipeline.
Request Metadata: A DynamoDB table or Git repo that stores account parameters (Email, OU, SSO user).
Customization Pipeline: A series of Step Functions and Lambda functions that apply “Global” and “Account-level” Terraform modules after the account is provisioned.

Step-by-Step: Deploying Your First Managed Account

To master AWS account deployment via AFT, you must understand the structure of an account request. Below is a production-grade example of a Terraform module call to request a new “Production” account.


module "sandbox_account" {
  source = "github.com/aws-ia/terraform-aws-control_tower_account_factory"

  control_tower_parameters = {
    AccountEmail              = "cloud-ops+prod-app-01@example.com"
    AccountName               = "production-app-01"
    ManagedOrganizationalUnit = "Production"
    SSOUserEmail              = "admin@example.com"
    SSOUserFirstName          = "Platform"
    SSOUserLastName           = "Team"
  }

  account_tags = {
    "Project"     = "Apollo"
    "Environment" = "Production"
    "CostCenter"  = "12345"
  }

  change_management_parameters = {
    change_requested_by = "DevOps Team"
    change_reason       = "New microservice deployment for Q4"
  }

  custom_fields = {
    vpc_cidr = "10.0.0.0/20"
  }
}

After applying this Terraform code, AFT triggers a workflow in the background. It calls the Control Tower ProvisionProduct API, waits for the account to be “Ready,” and then executes your post-provisioning Terraform modules to set up VPCs, IAM roles, and CloudWatch alarms.

Production-Ready Best Practices

Expert SREs know that AWS account deployment is only 20% of the battle; the other 80% is maintaining those accounts. Follow these standards:

Idempotency is King: Ensure your post-provisioning scripts can run multiple times without failure. Use Terraform’s lifecycle { prevent_destroy = true } on critical resources like S3 logging buckets.
Service Quota Management: Newly deployed accounts start with default limits. Use the aws_servicequotas_service_quota resource to automatically request increases for EC2 instances or VPCs during the deployment phase.
Region Deny Policies: Use Control Tower guardrails to restrict deployments to approved regions. This reduces your attack surface and prevents “shadow IT” in unmonitored regions like me-south-1.
Centralized Logging: Always ensure the aws_s3_bucket_policy in your log-archive account allows the newly created account’s CloudTrail service principal to write logs immediately.

Troubleshooting Common Deployment Failures

Even with automation, AWS account deployment can encounter hurdles. Here are the most common failure modes observed in enterprise environments:

Issue	Root Cause	Resolution
Email Already in Use	AWS account emails must be globally unique across all of AWS.	Use email sub-addressing (e.g., `ops+acc1@company.com`) if supported by your provider.
STS Timeout	AFT cannot assume the `AWSControlTowerExecution` role in the new account.	Check if a Service Control Policy (SCP) is blocking `sts:AssumeRole` in the target OU.
Customization Loop	Terraform state mismatch in the AFT pipeline.	Manually clear the DynamoDB lock table for the specific account ID in the AFT Management account.

Frequently Asked Questions

Can I use Terraform to deploy accounts without Control Tower?

Yes, using the aws_organizations_account resource. However, you lose the managed guardrails and automated dashboarding provided by Control Tower. For expert-level setups, Control Tower + AFT is the industry standard for compliance.

How does AFT handle Terraform state?

AFT manages state files in an S3 bucket within the AFT Management account. It creates a unique state key for each account it provisions to ensure isolation and prevent blast-radius issues during updates.

How long does a typical AWS account deployment take via AFT?

Usually between 20 to 45 minutes. This includes the time AWS takes to provision the physical account container, apply Control Tower guardrails, and run your custom Terraform modules.

Conclusion

Mastering AWS account deployment requires a shift from manual administration to a software engineering mindset. By treating your accounts as immutable infrastructure and managing them through Terraform and AWS Control Tower, you gain the ability to scale your cloud footprint with confidence. Whether you are managing five accounts or five thousand, the combination of AFT and IaC provides the consistency and auditability required by modern regulatory frameworks. For further technical details, refer to the Official AFT Documentation. Thank you for reading the DevopsRoles page!

Terraform

Terraform Secrets: Deploy Your Terraform Workers Like a Pro

12/26/2025 HuuPV Leave a comment

If you are reading this, you’ve likely moved past the “Hello World” stage of Infrastructure as Code. You aren’t just spinning up a single EC2 instance; you are orchestrating fleets. Whether you are managing high-throughput Celery nodes, Kubernetes worker pools, or self-hosted Terraform Workers (Terraform Cloud Agents), the game changes at scale.

In this guide, we dive deep into the architecture of deploying resilient, immutable worker nodes. We will move beyond basic resource blocks and explore lifecycle management, drift detection strategies, and the “cattle not pets” philosophy that distinguishes a Junior SysAdmin from a Staff Engineer.

The Philosophy of Immutable Terraform Workers

When we talk about Terraform Workers in an expert context, we are usually discussing compute resources that perform background processing. The biggest mistake I see in production environments is treating these workers as mutable infrastructure—servers that are patched, updated, and nursed back to health.

To deploy workers like a pro, you must embrace Immutability. Your Terraform configuration should not describe changes to a worker; it should describe the replacement of a worker.

GigaCode Pro-Tip: Stop using remote-exec provisioners to configure your workers. It introduces brittleness and makes your terraform apply dependent on SSH connectivity and runtime package repositories. Instead, shift left. Use HashiCorp Packer to bake your dependencies into a Golden Image, and use Terraform solely for orchestration.

Architecting Resilient Worker Fleets

Let’s look at the actual HCL required to deploy a robust fleet of workers. We aren’t just using aws_instance; we are using Launch Templates and Auto Scaling Groups (ASGs) to ensure self-healing capabilities.

1. The Golden Image Strategy

Your Terraform Workers should boot instantly. If your user_data script takes 15 minutes to install Python dependencies, your autoscaling events will be too slow to handle traffic spikes.

data "aws_ami" "worker_golden_image" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["my-worker-image-v*"]
  }

  filter {
    name   = "tag:Status"
    values = ["production"]
  }
}

2. Zero-Downtime Rotation with Lifecycle Blocks

One of the most powerful yet underutilized features for managing workers is the lifecycle meta-argument. When you update a Launch Template, Terraform’s default behavior might be aggressive.

To ensure you don’t kill active jobs, use create_before_destroy within your resource definitions. This ensures new workers are healthy before the old ones are terminated.

resource "aws_autoscaling_group" "worker_fleet" {
  name                = "worker-asg-${aws_launch_template.worker.latest_version}"
  min_size            = 3
  max_size            = 10
  vpc_zone_identifier = module.vpc.private_subnets

  launch_template {
    id      = aws_launch_template.worker.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [load_balancers, target_group_arns]
  }
}

Specific Use Case: Terraform Cloud Agents (Self-Hosted Workers)

Sometimes, “Terraform Workers” refers specifically to Terraform Cloud Agents. These are specialized workers you deploy in your own private network to execute Terraform runs on behalf of Terraform Cloud (TFC) or Terraform Enterprise (TFE). This allows TFC to manage resources behind your corporate firewall without whitelisting public IPs.

Security & Isolation

When deploying TFC Agents, security is paramount. These workers hold the “Keys to the Kingdom”—they need broad IAM permissions to provision infrastructure.

Network Isolation: Deploy these workers in private subnets with no ingress access, only egress (443) to app.terraform.io.
Ephemeral Tokens: Do not hardcode the TFC Agent Token. Inject it via a secrets manager (like AWS Secrets Manager or HashiCorp Vault) at runtime.
Single-Use Agents: For maximum security, configure your agents to terminate after a single job (if your architecture supports high churn) to prevent credential caching attacks.

# Example: Passing a TFC Agent Token securely via User Data
resource "aws_launch_template" "tfc_agent" {
  name_prefix   = "tfc-agent-"
  image_id      = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  user_data = base64encode(<<-EOF
              #!/bin/bash
              # Fetch token from Secrets Manager (requires IAM role)
              export TFC_AGENT_TOKEN=$(aws secretsmanager get-secret-value --secret-id tfc-agent-token --query SecretString --output text)
              
              # Start the agent container
              docker run -d --restart always \
                --name tfc-agent \
                -e TFC_AGENT_TOKEN=$TFC_AGENT_TOKEN \
                -e TFC_AGENT_NAME="worker-$(hostname)" \
                hashicorp/tfc-agent:latest
              EOF
  )
}

Advanced Troubleshooting & Drift Detection

Even the best-architected Terraform Workers can experience drift. This happens when a process on the worker changes a configuration file, or a manual intervention occurs.

Detecting “Zombie” Workers

A common failure mode is a worker that passes the EC2 status check but fails the application health check. Terraform generally looks at the cloud provider API status.

The Solution: decouple your health checks. Use Terraform to provision the infrastructure, but rely on the Autoscaling Group’s health_check_type = "ELB" (if using Load Balancers) or custom CloudWatch alarms to terminate unhealthy instances. Terraform’s job is to define the state of the fleet, not monitor the health of the application process inside it.

Frequently Asked Questions (FAQ)

1. Should I use Terraform `count` or `for_each` for worker nodes?

For identical worker nodes (like an ASG), you generally shouldn’t use either—you should use an Autoscaling Group resource which handles the count dynamically. However, if you are deploying distinct workers (e.g., “Worker-High-CPU” vs “Worker-High-Mem”), use for_each. It allows you to add or remove specific workers without shifting the index of all other resources, which happens with count.

2. How do I handle secrets on my Terraform Workers?

Never commit secrets to your Terraform state or code. Use IAM Roles (Instance Profiles) attached to the workers. The code running on the worker should use the AWS SDK (or equivalent) to fetch secrets from a managed service like AWS Secrets Manager or Vault at runtime.

3. What is the difference between Terraform Workers and Cloudflare Workers?

This is a common confusion. Terraform Workers (in this context) are compute instances managed by Terraform. Cloudflare Workers are a serverless execution environment provided by Cloudflare. Interestingly, you can use the cloudflare Terraform provider to manage Cloudflare Workers, treating the serverless code itself as an infrastructure resource!

Conclusion

Deploying Terraform Workers effectively requires a shift in mindset from “managing servers” to “managing fleets.” By leveraging Golden Images, utilizing ASG lifecycle hooks, and securing your TFC Agents, you elevate your infrastructure from fragile to anti-fragile.

Remember, the goal of an expert DevOps engineer isn’t just to write code that works; it’s to write code that scales, heals, and protects itself. Thank you for reading the DevopsRoles page!

Kubernetes

Kyverno OPA Gatekeeper: Simplify Kubernetes Security Now!

12/18/2025 HuuPV Leave a comment

Securing a Kubernetes cluster at scale is no longer optional; it is a fundamental requirement for production-grade environments. As clusters grow, manual configuration audits become impossible, leading to the rise of Policy-as-Code (PaC). In the cloud-native ecosystem, the debate usually centers around two heavyweights: Kyverno OPA Gatekeeper. While both aim to enforce guardrails, their architectural philosophies and day-two operational impacts differ significantly.

Understanding Policy-as-Code in K8s

In a typical Admission Control workflow, a request to the API server is intercepted after authentication and authorization. Policy engines act as Validating or Mutating admission webhooks. They ensure that incoming manifests (like Pods or Deployments) comply with organizational standards—such as disallowing root containers or requiring specific labels.

Pro-Tip: High-maturity SRE teams don’t just use policy engines for security; they use them for governance. For example, automatically injecting sidecars or default resource quotas to prevent “noisy neighbor” scenarios.

OPA Gatekeeper: The General Purpose Powerhouse

The Open Policy Agent (OPA) is a CNCF graduated project. Gatekeeper is the Kubernetes-specific implementation of OPA. It uses a declarative language called Rego.

The Rego Learning Curve

Rego is a query language inspired by Datalog. It is incredibly powerful but has a steep learning curve for engineers used to standard YAML manifests. To enforce a policy in OPA Gatekeeper, you must define a ConstraintTemplate (the logic) and a Constraint (the application of that logic).

# Example: OPA Gatekeeper ConstraintTemplate logic (Rego)
package k8srequiredlabels

violation[{"msg": msg, "details": {"missing_labels": missing}}] {
  provided := {label | input.review.object.metadata.labels[label]}
  required := {label | label := input.parameters.labels[_]}
  missing := required - provided
  count(missing) > 0
  msg := sprintf("you must provide labels: %v", [missing])
}

Kyverno: Kubernetes-Native Simplicity

Kyverno (Greek for “govern”) was designed specifically for Kubernetes. Unlike OPA, it does not require a new programming language. If you can write a Kubernetes manifest, you can write a Kyverno policy. This makes Kyverno OPA Gatekeeper comparisons often lean toward Kyverno for teams wanting faster adoption.

Key Kyverno Capabilities

Mutation: Modify resources (e.g., adding imagePullSecrets).
Generation: Create new resources (e.g., creating a default NetworkPolicy when a Namespace is created).
Validation: Deny non-compliant resources.
Cleanup: Remove stale resources based on time-to-live (TTL) policies.

# Example: Kyverno Policy to require 'team' label
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-team-label
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "The label 'team' is required."
      pattern:
        metadata:
          labels:
            team: "?*"

Kyverno vs. OPA Gatekeeper: Head-to-Head

Feature	Kyverno	OPA Gatekeeper
Language	Kubernetes YAML	Rego (DSL)
Mutation Support	Excellent (Native)	Supported (via Mutation CRDs)
Resource Generation	Native (Generate rule)	Not natively supported
External Data	Supported (API calls/ConfigMaps)	Highly Advanced (Context-aware)
Ecosystem	K8s focused	Cross-stack (Terraform, HTTP, etc.)

Production Best Practices & Troubleshooting

1. Audit Before Enforcing

Never deploy a policy in Enforce mode initially. Both tools support an Audit or Warn mode. Check your logs or PolicyReports to see how many existing resources would be “broken” by the new rule.

2. Latency Considerations

Every admission request adds latency. Complex Rego queries or Kyverno policies involving external API calls can slow down kubectl apply commands. Monitor the admission_webhook_admission_duration_seconds metric in Prometheus.

3. High Availability

If your policy engine goes down and the webhook is set to FailurePolicy: Fail, you cannot update your cluster. Always run at least 3 replicas of your policy controller and use pod anti-affinity to spread them across nodes.

Advanced Concept: Use Conftest (for OPA) or kyverno jp (for Kyverno) in your CI/CD pipeline to catch policy violations at the Pull Request stage, long before they hit the cluster.

Frequently Asked Questions

Is Kyverno better than OPA?

“Better” depends on use case. Kyverno is easier for Kubernetes-only teams. OPA is better if you need a unified policy language for your entire infrastructure (Cloud, Terraform, App-level auth).

Can I run Kyverno and OPA Gatekeeper together?

Yes, you can run both simultaneously. However, it increases complexity and makes troubleshooting “Why was my pod denied?” significantly harder for developers.

How does Kyverno handle existing resources?

Kyverno periodically scans the cluster and generates PolicyReports. It can also be configured to retroactively mutate or validate existing resources when policies are updated.

Conclusion

Choosing between Kyverno OPA Gatekeeper comes down to the trade-off between power and simplicity. If your team is deeply embedded in the Kubernetes ecosystem and values YAML-native workflows, Kyverno is the clear winner for simplifying security. If you require complex, context-aware logic that extends beyond Kubernetes into your broader platform, OPA Gatekeeper remains the industry standard.

Regardless of your choice, the goal is the same: shifting security left and automating the boring parts of compliance. Start small, audit your policies, and gradually harden your cluster security posture.

Next Step: Review the Kyverno Policy Library to find pre-built templates for the CIS Kubernetes Benchmark. Thank you for reading the DevopsRoles page!