What Is a Linux Namespace, Really? All 7, Explained.

How the kernel rewrites reality for every process.

Updated • 14 min read
Software and infrastructure engineer. Open source contributor. The person behind root.cause.dev. I build things, break them on purpose, and write about what I find underneath. Two-time GSoC alum. Speaker at OpenSearch Ahmedabad. BazelCon & OpenSearchCon Korea & Japan attendee.

TL;DR A Linux namespace wraps a global kernel resource and makes a process believe it has its own private copy. This post covers 7 types (the kernel also defines a cgroup namespace, which is left out here). Together, they are the entire reason containers exist: not Docker, not any runtime, just these kernel primitives. This post walks through all 7, with the filesystem evidence to prove each one.

Run ps aux inside a Docker container. You'll see almost nothing: just your process and maybe a handful of others. Run it on the host, and that same container process shows up with a PID somewhere in the thousands.

Same kernel. Same process. Two completely different numbers.

That's not a trick. That's not Docker doing something clever. That's the Linux kernel maintaining two separate views of the same reality, one for you, one for the container, and translating between them on every syscall. The mechanism that makes this possible is called a namespace, and it's the entire foundation of what we call a container.

Docker didn't invent this. The kernel did. Docker just learned how to ask for it.

The Mechanism First

Before we get into the 7 types, you need to understand one thing: a namespace is just a number.

Every process on your system belongs to a set of namespaces. The kernel tracks this in /proc.

Go look:

ls -la /proc/$$/ns/

Each entry is a symbolic link whose target looks like net:[<inode>]. Take /proc/<PID>/ns/net: it references a network namespace, and the number in brackets is its inode, which uniquely identifies that namespace instance. Processes with the same inode share the same network namespace (and thus the same network stack). Different inodes mean different network namespaces.

Three syscalls manipulate namespaces. clone() creates a new process inside a new namespace. unshare() moves the current process into a new one. setns() joins an existing namespace. When you run docker run, Docker calls clone() with namespace flags. When you run docker exec, it calls setns() to step into the container's existing namespaces. That's the whole mechanism.
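setns() has a userspace wrapper too: nsenter(1), from the same util-linux package that ships unshare. The sketch below only prints the command it would run, because actually joining another process's namespaces needs root and a real target PID. The helper name enter_cmd and the PID 12345 are placeholders of mine, not anything from Docker.

```shell
# enter_cmd PID [COMMAND]: print an nsenter invocation that would join the
# UTS, network, PID, mount, and IPC namespaces of PID via setns(2).
# (enter_cmd is an illustrative name; it prints rather than executes.)
enter_cmd() {
    printf 'sudo nsenter --target %s --uts --net --pid --mount --ipc -- %s\n' \
        "$1" "${2:-sh}"
}

# 12345 is a placeholder PID.
enter_cmd 12345 hostname
```

Pointed at a container's host PID, the printed command is roughly what docker exec does on your behalf.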

The inode number is the namespace. If two processes share the same inode in /proc/<PID>/ns/net, they share a network stack. Not metaphorically: that's exactly how the kernel tracks it.
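That comparison is easy to script. A minimal sketch, assuming a Linux /proc; ns_id and same_ns are helper names made up for this example:

```shell
# ns_id PID TYPE: print the namespace link target, e.g. "net:[<inode>]".
# (ns_id and same_ns are illustrative helper names, not standard tools.)
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# same_ns PID_A PID_B TYPE: exit 0 if both PIDs share that namespace,
# 1 if they differ, 2 if either link cannot be read.
same_ns() {
    ns_a=$(ns_id "$1" "$3" 2>/dev/null) || return 2
    ns_b=$(ns_id "$2" "$3" 2>/dev/null) || return 2
    [ "$ns_a" = "$ns_b" ]
}

# The current shell and its parent normally share a network namespace.
if same_ns "$$" "$PPID" net; then
    echo "same network namespace"
else
    echo "different (or unreadable) network namespaces"
fi
```

Swap net for uts, pid, mnt, ipc, user, or time to compare the other types. Reading another user's process usually needs sudo.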

Now let's go through all 7.

1. UTS - The Simplest One

UTS stands for UNIX Time-sharing System, which is a legacy name that tells you nothing useful. What it actually isolates is simpler: the hostname and the NIS domain name.

When a container has a different hostname than the host, this is why. Not configuration. Not a Docker setting. An isolated UTS struct in the kernel, one per namespace, each with its own hostname.

Let's prove it. The unshare command is a thin userspace wrapper around the unshare(2) syscall. It creates a new namespace of whatever type you specify and drops you into a shell inside it:

sudo unshare --uts bash

You're now in a shell with its own isolated hostname. It looks identical to the host shell (same prompt, same terminal), but change the hostname here and watch what happens:

hostname container-01
hostname

Now open a second terminal on the host and run the same command:

hostname
yourhostname

The host didn't move. You changed the hostname inside a new UTS namespace, and the kernel kept it entirely separate. Two hostname values, one kernel, zero conflict.

🟩 Try It Yourself

# Terminal 1
sudo unshare --uts bash
hostname inside-ns
hostname
# Expected: inside-ns

# Terminal 2 (host)
hostname
# Expected: your original hostname, untouched

2. PID - The Namespace That Makes a Process Feel Like PID 1

This one is where things get philosophically interesting.

Inside a PID namespace, the process tree starts at 1. The first process is PID 1. Its children are PID 2, 3, 4. The process has no visibility into any PIDs that exist outside its namespace; as far as it's concerned, nothing outside its tree exists.

But from the host? That same process has a completely different PID. The kernel is maintaining two separate numbering schemes simultaneously, translating between them on every relevant syscall.

sudo unshare --pid --fork bash

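The --fork flag is required here. unshare(2) with CLONE_NEWPID does not move the calling process into the new PID namespace; only its children land there, and the first child becomes PID 1. --fork makes unshare fork bash as that first child instead of exec'ing it directly. Drop into this shell and check: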

echo $$

PID 1. Now open a terminal on the host and find the PID this process has there:

ps aux | grep bash
root     48921  0.0  0.0  ... bash

The same process. PID 1 to itself, PID 48921 to the host (the number will differ on your system). The kernel maintains a translation table per namespace and runs every PID-related syscall through it.

There's a footgun here worth knowing. ps reads its data from /proc. If you run ps inside a PID namespace without also mounting a new /proc, it reads the host's /proc and shows you host PIDs, not namespace PIDs. Docker handles this automatically by combining a PID namespace with a mount namespace and a fresh /proc. We'll see the mount namespace next.

A container process is PID 1 to itself and PID 48921 to the host. There's no deception happening; just a translation table. The kernel converts PIDs at the namespace boundary, transparently, on every syscall.
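On kernels 4.1 and later you can see both numbers from a single terminal. The NSpid line in /proc/<PID>/status lists the process's PID in each nested PID namespace, outermost first. A small sketch; nspid is an illustrative helper name:

```shell
# nspid PID: print PID's process ID in every PID namespace it belongs to,
# outermost first. (nspid is an illustrative name, not a standard tool.)
nspid() {
    awk '$1 == "NSpid:" { $1 = ""; sub(/^ +/, ""); print }' "/proc/$1/status"
}

# For a process in the same PID namespace as this /proc, one number is
# printed. Run on the host against the host PID of the shell started by
# "unshare --pid --fork bash", it prints two: the host PID, then 1.
nspid "$$"
```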

🟩 Try It Yourself

sudo unshare --pid --fork --mount-proc bash
echo $$
# Expected: 1

ps aux
# Expected: only bash and ps; the entire host process tree is invisible

3. NET - Why Two Containers Can Both Use Port 8080

Here's a question that trips up a lot of people: if containers are just processes, and two processes try to bind port 8080 on the same machine, how does the second bind not fail immediately with "address already in use"?

The answer is that they're not on the same machine, as far as the kernel is concerned. Each container lives in its own network namespace, its own private network stack, with its own interfaces, its own routing table, its own iptables rules, its own port table. Port 8080 in one namespace has no relationship to port 8080 in another.

The ip netns command lets you work with network namespaces directly:

sudo ip netns add testns
sudo ip netns exec testns ip link

That's it. One interface, loopback, and it's DOWN. No eth0, no routes, no nothing. This namespace was just created and it has an empty network stack. Compare that to the host:

ip link

The host sees its full network, the bridge Docker creates, the real interface, everything. The namespace sees only what's been explicitly given to it. The port table inside the namespace is empty and independent. Start a server on port 8080 inside one namespace, start another on port 8080 in a second namespace; neither one knows the other exists.

Port conflicts between containers aren't just prevented; they're structurally impossible. Each network namespace has its own port table. There is no shared port space to conflict in.

# Clean up
sudo ip netns delete testns
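If unprivileged user namespaces are enabled on your system, you can make the same observation without sudo or ip netns. This sketch enters a throwaway network namespace and lists the interfaces it sees through /proc/net/dev. The guard and the skip message are my own additions, and the variable name list_ifaces is illustrative.

```shell
# List interface names visible from inside a brand-new network namespace.
# /proc/net/dev has two header lines, then one "name: counters..." line
# per interface in the reading process's network namespace.
list_ifaces='tail -n +3 /proc/net/dev | cut -d: -f1 | tr -d " "'

if unshare --user --map-root-user --net true 2>/dev/null; then
    unshare --user --map-root-user --net sh -c "$list_ifaces"
else
    echo "skipped: unprivileged user namespaces are not available here"
fi
```

In a fresh namespace the list should contain lo and little else. Run the same pipeline directly on the host to see the contrast.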

🟩 Try It Yourself

sudo ip netns add mytest
sudo ip netns exec mytest ip route
# Expected: (empty); no routes exist in a fresh namespace

ip route
# Expected: your actual host routing table

sudo ip netns delete mytest

4. MNT - How a Container Gets a Different Root Than the Host

The mount namespace is where containers get their own filesystem. Their own /. Their own /usr, /bin, /lib - completely different content from the host's, with the host's root filesystem entirely invisible.

This sounds complicated. The mechanism is not. A mount namespace gives a process its own private copy of the mount table. Mounts made inside it don't propagate to the host. The host never sees them.

sudo unshare --mount bash

You're now in a shell with a private mount table. Verify the namespace changed:

readlink /proc/$$/ns/mnt
mnt:[4026532895]

Different inode than the host. Different mount table. Now make a mount that the host will never see:

mkdir /tmp/isolated-mount
mount -t tmpfs tmpfs /tmp/isolated-mount
df -h | grep isolated

Switch to a host terminal:

df -h | grep isolated

No output.

A filesystem was mounted. The host has no idea. The mount namespace gave this shell a private copy of the mount table, and the change was local.

When Docker pulls an image and starts a container, this is the core of what happens. A new mount namespace. The image layers assembled into a root filesystem. That root mounted as the container's /. The host's real / untouched: not shadowed, not modified, not even read.

When Docker runs a container, it doesn't modify the host's filesystem. It creates a new mount namespace, assembles the image layers into a root, and mounts it privately. The host never sees it because mount namespaces give each process a personal copy of the mount table.
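One caveat sits behind "the host never sees it": whether a mount stays local depends on its propagation type. unshare(1) marks everything in the new namespace private by default, which is why the tmpfs demo above behaves the way it does. The kernel records the type in the optional fields of /proc/self/mountinfo (shared:N, master:N; none means private). Here is a small sketch that reads it back. The name propagation is illustrative, and the second argument exists only so the function can be pointed at a sample file.

```shell
# propagation MOUNTPOINT [MOUNTINFO_FILE]
# Print the propagation tags of the mount at MOUNTPOINT, or "private" if
# it has none. In mountinfo, field 5 is the mount point and the optional
# fields run from field 7 up to a lone "-" separator.
propagation() {
    awk -v mp="$1" '
        $5 == mp {
            tags = ""
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : " ") $i
            print (tags == "" ? "private" : tags)
        }' "${2:-/proc/self/mountinfo}"
}

# Inspect the root mount of the current namespace, if mountinfo is readable.
[ -r /proc/self/mountinfo ] && propagation /
```

Run it on the host and again inside sudo unshare --mount bash to compare; on many distributions the host's / is shared while the namespace's copy is private.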

🟩 Try It Yourself

sudo unshare --mount bash
mkdir /tmp/ns-test
mount -t tmpfs tmpfs /tmp/ns-test
findmnt /tmp/ns-test
# Expected: tmpfs mounted at /tmp/ns-test

# New terminal on host:
findmnt /tmp/ns-test
# Expected: no output; completely invisible

5. IPC - The Quiet One

IPC (Inter-Process Communication) isolates System V IPC objects (shared memory segments, semaphore arrays, message queues) and POSIX message queues. It's less dramatic to demo than network or PID, but the security implication is real: without IPC isolation, a containerized process could attach to shared memory segments owned by processes outside it.

Create a shared memory segment on the host:

ipcmk -M 1024
ipcs -m

Now step into a new IPC namespace:

sudo unshare --ipc bash
ipcs -m

Empty! Gone. The shared memory segment exists on the host, but this namespace has its own IPC resource table, and it's empty. No way to accidentally attach, no way to read across the boundary.

IPC namespaces mean a container can't accidentally exhaust the host's shared memory, and can't attach to IPC resources owned by processes outside it. Each namespace has its own table. They don't overlap.
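The same check works from a single terminal if unprivileged user namespaces are enabled. This sketch counts System V shared memory segments through /proc/sysvipc/shm, which reflects the IPC namespace of whoever reads it. The skip branch is my own guard, and count_shm is an illustrative name.

```shell
# Count System V shared memory segments visible to the calling process.
# /proc/sysvipc/shm has one header line, then one line per segment.
count_shm='tail -n +2 /proc/sysvipc/shm 2>/dev/null | wc -l'

echo "current namespace: $(sh -c "$count_shm") segment(s)"

if unshare --user --map-root-user --ipc true 2>/dev/null; then
    echo "fresh namespace:   $(unshare --user --map-root-user --ipc sh -c "$count_shm") segment(s)"
else
    echo "fresh namespace:   skipped (unprivileged user namespaces unavailable)"
fi
```

Create a segment first with ipcmk -M 1024 and the two counts should diverge: one or more on the host side, zero in the fresh namespace.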


6. USER - The One That Makes Rootless Containers Work

The user namespace is the most powerful of the seven, and the one that most completely changes the security picture. It lets the kernel maintain two separate UID/GID spaces, one inside the namespace, one outside, with a mapping table between them.

The practical result: a process can be UID 0 (root) inside a user namespace while being UID 1000 (an unprivileged user) on the host. Root inside, nobody outside.

This is how rootless containers work. Not a Docker feature. Not a containerd feature. A kernel feature.

And critically, creating a user namespace doesn't require root, as long as your kernel allows unprivileged user namespaces. That's the point:

unshare --user --map-root-user bash

No sudo. You just created a new user namespace as a regular user. Check what the kernel thinks you are now:

id

Root. But read the mapping file:

cat /proc/$$/uid_map

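That line (for example, 0 1000 1) says: namespace UID 0 maps to host UID 1000, for a range of 1. The middle number is your real UID, so it depends on your account. You are root in this namespace's world. You are UID 1000 in the host's world. The kernel runs every permission check through this table. Inside the namespace, id reports root and you hold root's capabilities over resources the namespace itself owns. Anything owned by the host's real root, /root included, stays off limits, because on the host that same process is unprivileged and always was.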

Rootless containers aren't about bypassing security. They use user namespaces to give a process root-like capabilities inside an isolated scope, while the process remains entirely unprivileged from the host's perspective. The mapping table is the entire mechanism.
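Each line of uid_map has three fields: the first UID inside the namespace, the first UID outside, and the length of the range. Translating is then plain arithmetic: outside = outside_start + (inside - inside_start), for any inside UID that falls within a range. A sketch, with map_uid as an illustrative helper and the sample map lines as assumed values rather than output from a real system:

```shell
# map_uid INSIDE_UID [MAP_FILE]
# Translate a namespace UID to the UID seen outside, using uid_map lines
# of the form "<inside-start> <outside-start> <length>".
# Prints "unmapped" if no range covers INSIDE_UID.
map_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) print "unmapped" }
    ' "${2:-/proc/self/uid_map}"
}

# Sample map (values chosen for illustration): root inside is UID 1000
# outside; UIDs 1..65536 inside are 100000..165535 outside.
sample=$(mktemp)
printf '%s\n' '0 1000 1' '1 100000 65536' > "$sample"

map_uid 0 "$sample"      # prints 1000
map_uid 33 "$sample"     # prints 100032
map_uid 70000 "$sample"  # prints unmapped
rm -f "$sample"
```

Called with one argument, map_uid reads the current process's own /proc/self/uid_map, so map_uid 0 inside the unshare shell above should print your real UID.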

🟩 Try It Yourself

# As a regular user, no sudo
unshare --user --map-root-user bash
whoami
# Expected: root

# In another terminal, check the real UID
ps aux | grep bash
cat /proc/<PID>/status | grep Uid
# Expected: your real UID (e.g. 1000), not 0

7. TIME - The Newest Arrival

The time namespace arrived in Linux 5.6, released March 2020. It's the youngest of the seven, and the most narrowly targeted. It lets the processes in a namespace see CLOCK_MONOTONIC and CLOCK_BOOTTIME values that differ from the host's by a fixed per-namespace offset.

The use case isn't exotic. CRIU (Checkpoint/Restore in Userspace) needs this. When you checkpoint a process, snapshot its entire memory state, all open file descriptors, current execution point, and restore it on a different machine, the monotonic clock will have a completely different value. Without a time namespace, every timer and timeout in the restored process is wrong. With one, you can set an offset that makes the process believe it was never interrupted.

uname -r

If you're on 5.6+, the time namespace is available. Verify:

ls /proc/$$/ns/time

Create one with a clock offset and compare:

sudo unshare --time --boottime 100 bash
cat /proc/uptime
1523.42 4012.88

On the host:

cat /proc/uptime

The process inside the time namespace believes the system booted 100 seconds before it actually did. The host's real clock didn't move at all. Just this process's view of it.

The time namespace doesn't touch the system clock. It changes what one process's syscalls return when they ask the kernel what time it is. The wall clock is untouched. The process's world shifts.
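The offsets themselves are visible in /proc/<PID>/timens_offsets, one line per clock: the clock name, then seconds and nanoseconds. The sketch below pulls out the boot-time offset and applies it to a host uptime to show the arithmetic. boot_offset is an illustrative helper, and both the sample file contents and the host uptime of 1423 seconds are assumed values.

```shell
# boot_offset [OFFSETS_FILE]
# Print the whole-second CLOCK_BOOTTIME offset recorded in a
# timens_offsets file ("<clock> <seconds> <nanoseconds>" per line).
boot_offset() {
    awk '$1 == "boottime" { print $2 }' "${1:-/proc/self/timens_offsets}"
}

# Sample contents for a namespace created with "--boottime 100".
sample=$(mktemp)
printf '%s\n' 'monotonic 0 0' 'boottime 100 0' > "$sample"

host_uptime=1423   # assumed host uptime in seconds
offset=$(boot_offset "$sample")
echo "offset:           ${offset} s"
echo "uptime inside ns: $((host_uptime + offset)) s"
rm -f "$sample"
```

With no argument, boot_offset reads the calling process's own file, which should report 0 outside any time namespace.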

All 7, Together

When you run docker run ubuntu bash, the Docker daemon calls clone() with flags for every namespace that should be isolated.

The container is not a special object. It's a process, created with clone(), namespace flags and all, that then calls execve() to replace itself with your command. The isolation is entirely the kernel's work. Docker handles image layers, networking setup, and lifecycle. The actual isolation? Seven numbers in /proc/<PID>/ns/.

You can verify any running container's namespaces directly:

docker run -d --name ns-demo ubuntu sleep 1000
docker inspect ns-demo --format '{{.State.Pid}}'
# 424753

sudo ls -la /proc/424753/ns/

Every inode in that output that differs from your shell's /proc/$$/ns/ is an isolated resource. Count them. That's your container.

A container is not a thing. It's a process with namespace flags. The word "container" is a human abstraction over a set of kernel inodes. The kernel doesn't know what a container is. It only knows namespaces.

🟩 Try It Yourself: The Full Picture

docker run -d --name ns-demo ubuntu sleep 1000

# Get its host PID
PID=$(docker inspect ns-demo --format '{{.State.Pid}}')

# Compare its namespaces to your shell
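# readlink prints only the type:[inode] targets, so timestamps don't clutter the diff
diff <(sudo sh -c "readlink /proc/$PID/ns/*") <(readlink /proc/$$/ns/*)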

# Every differing inode = one axis of isolation

docker rm -f ns-demo

What You Can Now Explain
  • Why a container has its own hostname without touching the host: UTS namespace keeps separate hostname structs per namespace

  • How a process is PID 1 inside a container and PID 48921 (or whatever your system assigns) on the host simultaneously: the PID namespace translation table in the kernel

  • Why two containers can both listen on port 8080 without conflict: each lives in a separate network namespace with its own port table

  • How a container gets a completely different root filesystem: MNT namespace with a private mount table, image layers assembled as a private root

  • Why a container can't attach to the host's shared memory: IPC namespace gives each container its own IPC resource table

  • How rootless containers work: USER namespace UID mapping, root inside, unprivileged outside

  • What the time namespace does and who actually needs it: per-namespace clock offsets for checkpoint/restore workflows

  • What docker run is actually doing at the syscall level: clone() with a flag for each namespace to isolate, then execve()

  • How to read a container's namespace membership directly from /proc/<PID>/ns/

Next Post

→ Why cgroups v2 replaced cgroups v1 - namespaces control what a process sees. cgroups control what it gets. Memory limits, CPU shares, I/O throttling - all of that lives in cgroups. The next post covers what was broken in v1 (and it was genuinely broken), why the kernel team rewrote it, and what that means for every memory limit you've ever set on a container.

root/cause - Not tutorials. Just the real picture.