Why Containers Are Not Proper Isolation for Code Sandboxes
Containers were built for packaging and deployment, not for running untrusted code. Here's why shared-kernel isolation falls short for code sandboxes and what actually works.
Steven Passynkov
People assume containers are secure
Containers get their own filesystem, network, and process tree through namespaces and cgroups. For packaging and deployment, that looks a lot like proper isolation. It isn’t.
The problem starts when people use them to run code they didn’t write, don’t control, and don’t trust. With AI agents now executing code on your behalf in both local and production environments, this is becoming a real issue. Every container on the host still shares the same kernel. If there’s an exploitable CVE in that kernel version, every container on the box is exposed. A VM like Firecracker runs its own kernel entirely, and gVisor intercepts syscalls in user space so containers never talk to the host kernel directly. Both drastically reduce that attack surface.
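One way to see the shared kernel for yourself: ask for the kernel release from the host and from inside any container on that host. Namespaces virtualize the filesystem, network, and process tree, but both prints come from the same running kernel. A minimal sketch:

```python
import platform

# Run this on the host, then again inside a container on the same host
# (e.g. via `docker run`). The release string is identical in both cases,
# because a container does not get its own kernel.
print(platform.release())
```

If a kernel CVE affects that release, it affects every container on the box at once.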
How container escapes usually happen
Most container escapes fall into a few categories. Some exploit bugs in the shared kernel, others target the container runtime itself, and some just take advantage of careless configuration.
Memory corruption in the kernel
Because every container shares the host kernel, a single use-after-free or overflow in a kernel subsystem can give an attacker full host access from inside any container on the box.
CVE-2022-0847, known as Dirty Pipe, let any unprivileged process overwrite data in read-only files by abusing how the kernel handled pipe buffers. Trivial to exploit and immediately dangerous in shared-kernel environments.
CVE-2016-5195, known as Dirty COW, exploited a race condition in the kernel’s copy-on-write mechanism, letting an unprivileged process write to read-only file mappings and escalate privileges.
CVE-2024-1086 is a use-after-free in nf_tables, the Linux netfilter subsystem. It became widely known for how reliably it could be exploited, making it a near-guaranteed privilege escalation on affected kernels.
CVE-2022-0185 hit the filesystem context handling path and was severe enough that Google paid a bounty through their kCTF program for a working escape against hardened Kubernetes.
CVE-2021-22555 is a heap out-of-bounds write in Netfilter’s setsockopt handling, used by a Google researcher to escape a hardened kCTF Kubernetes container.
Bugs in the container runtime
Not every escape goes through the kernel. The runtime components that set up and manage containers (runc, containerd, CRI-O, etc.) have their own attack surface.
CVE-2019-5736 allowed a malicious container to overwrite the host runc binary through /proc/self/exe, meaning the next time anyone ran a container on that host, they’d execute attacker-controlled code.
CVE-2024-21626, part of the Leaky Vessels series, exploited a file descriptor leak in runc that gave containers access to the host filesystem.
CVE-2022-0811 targeted CRI-O’s kernel parameter handling, allowing a container to set arbitrary sysctl values and escape.
CVE-2020-15257 exposed the containerd shim API over an abstract Unix socket, letting containers on the same host talk directly to the runtime.
In November 2025, three related runc vulnerabilities were disclosed together, all fixed in runc 1.2.8 and 1.3.3.
CVE-2025-31133 exploited insufficient verification of /dev/null bind-mount sources, allowing an attacker to use symlink races to create arbitrary mount gadgets that lead to host information disclosure, denial of service, or full container escape.

CVE-2025-52565 targeted the same class of flaw in the /dev/console bind-mount path, letting an attacker gain writable access to normally masked paths like /proc/sysrq-trigger or /proc/sys/kernel/core_pattern.

CVE-2025-52881 used racing containers with shared mounts to misdirect runc’s writes to /proc through symlinks in tmpfs, and was confirmed exploitable through a standard docker buildx build.
Misconfigurations
Sometimes there is no bug at all. The escape is just a consequence of how the container was configured.
Running a container with --privileged disables nearly every isolation mechanism Linux provides (capabilities, seccomp, device cgroup restrictions) and effectively gives the container root-equivalent access to the host.
Mounting the Docker or containerd socket into a container lets anything inside it control the runtime, which means it can launch new privileged containers or manipulate existing ones.
Bind-mounting sensitive host paths like /proc, /sys, or /var/log can leak host information or provide write access to kernel tunables.
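Both of these misconfigurations are detectable from inside the container itself. The sketch below (the `audit_container` helper is hypothetical, and the paths checked are just common examples) flags a mounted runtime socket and writable kernel tunables:

```python
import os

# Hypothetical self-audit: look for misconfigurations visible from
# inside the container. Empty output is not proof of safety; findings
# are proof of trouble.
def audit_container():
    findings = []
    # A mounted runtime socket means code in here can drive the host
    # runtime: launch privileged containers, manipulate existing ones.
    for sock in ("/var/run/docker.sock", "/run/containerd/containerd.sock"):
        if os.path.exists(sock):
            findings.append(f"runtime socket mounted: {sock}")
    # Writable kernel tunables suggest /proc or /sys was bind-mounted
    # read-write, or the usual masking was disabled.
    for path in ("/proc/sys/kernel/core_pattern", "/proc/sysrq-trigger"):
        if os.access(path, os.W_OK):
            findings.append(f"writable kernel tunable: {path}")
    return findings

for finding in audit_container():
    print(finding)
```

Note that the November 2025 runc CVEs above targeted exactly these masked /proc paths, so a check like this catches the misconfigured case but not the exploited one.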
There will be more CVEs in the future
Every CVE listed above has been patched. If you’re running up-to-date kernels and runtimes, none of these specific bugs will affect you. That’s not the point.
The point is that the Linux kernel has millions of lines of code, and container runtimes sit directly on top of it. New escape paths are found every year. Three dropped for runc in a single month in 2025. The attack surface isn’t shrinking, and patching yesterday’s bugs does nothing about tomorrow’s.
Containers were designed to isolate applications from each other, not to defend against code that is actively trying to break out. If you’re running AI-generated code, you need a boundary that doesn’t depend on a shared kernel being bug-free.
How Leap0 approaches this
At Leap0, we treat containers as a packaging primitive, not a trust boundary.
For code sandboxes, Leap0 uses Firecracker microVMs with Jailer. Firecracker is a lightweight virtual machine monitor built by AWS that boots a full guest kernel in under 125ms. Each microVM gets its own kernel, so a bug in the guest kernel doesn’t compromise the host. Jailer wraps around Firecracker and locks down the host-side process with chroot, cgroups, seccomp filters, and dropped capabilities so that even if the VMM itself is compromised, the blast radius stays contained.
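To give a feel for how small that boundary is to stand up, here is a minimal sketch of a Firecracker VM definition in its JSON config-file format. The kernel and rootfs paths are placeholders, and this is illustrative rather than Leap0’s actual configuration:

```json
{
  "boot-source": {
    "kernel_image_path": "vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 128
  }
}
```

A config like this can be passed to the Firecracker binary with --config-file, and in production it would be launched under Jailer rather than directly, so the VMM process itself starts chrooted, cgrouped, and capability-stripped.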
Bottom line
Containers are great for trusted app deployment. They are not a sufficient boundary for AI-agent code execution.
If your product needs to run untrusted code, get started with Leap0.