Docker: insecure opening of file-descriptor allows privilege escalation

  • I discovered the vulnerability, and I'm not entirely sure that Trevor Jay fully understands the issue (though to be fair, the easiest way of exploiting it is using ptrace(2), which is blocked by most default security policies). You don't need to use ptrace(2) or CAP_SYS_PTRACE to exploit the vulnerability.

    You just need proc_fd_access_allowed() to succeed for the target process (it is the kernel check applied when following the /proc/[pid]/fd/* links). I've not checked whether ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS) calls into SELinux hooks (it probably does, and if it doesn't, then resolving further files probably does), but neither seccomp profiles (unless you're blocking open(2)) nor dropping CAP_SYS_PTRACE can help you here.
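    The sketch below illustrates the point above. It assumes Linux with procfs mounted at /proc. A process running with the same credentials as another can reopen that process's descriptors with a plain open(2) on /proc/<pid>/fd/<n>. No ptrace(2) call and no CAP_SYS_PTRACE are needed. The function names are hypothetical, and the sketch shows only the kernel behaviour described here, not the runC issue itself.

```python
import os
import tempfile


def open_via_procfs(pid: int, fd: int, flags: int = os.O_RDONLY) -> int:
    """Reopen the file behind descriptor `fd` of process `pid`.

    Following the /proc/<pid>/fd/<n> magic link is gated by a ptrace
    *access-mode* check in the kernel, not by the ptrace(2) syscall, so a
    seccomp filter that only blocks ptrace(2) does not stop this open(2).
    """
    return os.open(f"/proc/{pid}/fd/{fd}", flags)


def demo() -> bytes:
    ready_r, ready_w = os.pipe()  # child -> parent: which fd number to grab
    done_r, done_w = os.pipe()    # parent -> child: closing it lets the child exit
    pid = os.fork()
    if pid == 0:  # child: hold a file that no path refers to any more
        try:
            os.close(ready_r)
            os.close(done_w)
            fd, path = tempfile.mkstemp()
            os.write(fd, b"held only by the child")
            os.unlink(path)  # after this, only the child's fd reaches the file
            os.write(ready_w, str(fd).encode())
            os.close(ready_w)
            os.read(done_r, 1)  # blocks until the parent closes done_w (EOF)
        finally:
            os._exit(0)
    os.close(ready_w)
    os.close(done_r)
    try:
        child_fd = int(os.read(ready_r, 32))
        fd = open_via_procfs(pid, child_fd)  # a new, independent open file description
        try:
            return os.read(fd, 4096)  # starts at offset 0, unlike the child's fd
        finally:
            os.close(fd)
    finally:
        os.close(ready_r)
        os.close(done_w)
        os.waitpid(pid, 0)


if __name__ == "__main__":
    print(demo())  # b'held only by the child'
```

    The file is unlinked before the parent opens it, so the only route to its contents is the child's descriptor. That is why reading it back through /proc shows the descriptor itself was reopened.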

    Now, the LXC exploit used ptrace in order to stop the process from closing its file descriptors. I'm not sure how you would reliably hit the race in this issue (something with SIGSTOP presumably?).
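    The thread leaves open how the race would be hit. The following hypothetical sketch only illustrates the SIGSTOP guess above. It assumes nothing beyond permission to signal the target (same uid, or CAP_KILL). A stopped process keeps its descriptors, and they can be listed under /proc/<pid>/fd while it is frozen, again without ptrace(2). The sketch does not show whether the timing window in runC can actually be won this way.

```python
import os
import signal
import time


def process_state(pid: int) -> str:
    """Return the one-letter state field from /proc/<pid>/stat ("T" = stopped)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is parenthesised and may contain spaces, so index from the last ')'.
    return stat[stat.rindex(")") + 2]


def freeze_and_list_fds(pid: int, timeout: float = 5.0) -> list[int]:
    """Stop `pid` with SIGSTOP, list its open descriptors, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)  # needs only permission to signal pid, not ptrace(2)
    try:
        deadline = time.monotonic() + timeout
        while process_state(pid) != "T":  # signal delivery is asynchronous
            if time.monotonic() > deadline:
                raise TimeoutError(f"pid {pid} did not stop")
            time.sleep(0.001)
        return sorted(int(name) for name in os.listdir(f"/proc/{pid}/fd"))
    finally:
        os.kill(pid, signal.SIGCONT)


def demo() -> tuple[int, list[int]]:
    done_r, done_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: sit in read() until the parent closes done_w
        try:
            os.close(done_w)
            os.read(done_r, 1)
        finally:
            os._exit(0)
    try:
        fds = freeze_and_list_fds(pid)
    finally:
        os.close(done_w)
        os.waitpid(pid, 0)
        os.close(done_r)
    return done_r, fds


if __name__ == "__main__":
    fd, fds = demo()
    print(f"child's copy of fd {fd} was visible while it was stopped: {fd in fds}")
```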

    In any case, SUSE's update includes additional fixes that close the hole even when you give a container CAP_SYS_PTRACE (the released patch does _not_ protect containers that have CAP_SYS_PTRACE enabled). The patches will be merged upstream ASAP, but Docker didn't want them in the patchset sent to its customers, preferring instead to update their vendored runC once they are merged upstream.
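    The thread doesn't spell out what the released runC patch or the extra SUSE fixes change. What follows is background for the CAP_SYS_PTRACE caveat only. Linux lets a process mark itself non-dumpable with prctl(PR_SET_DUMPABLE, 0). After that, the kernel's ptrace access check refuses callers that lack CAP_SYS_PTRACE over the target, so a caller that holds the capability still gets through. That fits the caveat, but it is an assumption that these patches use this mechanism. The wrapper and demo names below are hypothetical.

```python
import ctypes
import os
import tempfile

# Option numbers from <linux/prctl.h>.
PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4

_libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process, incl. libc


def prctl(option: int, arg2: int = 0) -> int:
    """Thin ctypes wrapper around prctl(2); raises OSError on failure."""
    res = _libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                      ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if res < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return res


def demo() -> tuple[int, bool]:
    """Return (child's dumpable flag, whether reopening its fd via /proc was refused)."""
    ready_r, ready_w = os.pipe()
    done_r, done_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: mark itself non-dumpable, then hold a private fd
        try:
            os.close(ready_r)
            os.close(done_w)
            prctl(PR_SET_DUMPABLE, 0)
            fd, path = tempfile.mkstemp()
            os.unlink(path)
            os.write(ready_w, f"{prctl(PR_GET_DUMPABLE)} {fd}".encode())
            os.close(ready_w)
            os.read(done_r, 1)  # wait for EOF from the parent
        finally:
            os._exit(0)
    os.close(ready_w)
    os.close(done_r)
    try:
        dumpable, child_fd = map(int, os.read(ready_r, 64).split())
        try:
            os.close(os.open(f"/proc/{pid}/fd/{child_fd}", os.O_RDONLY))
            blocked = False
        except PermissionError:  # EACCES or EPERM
            blocked = True
        return dumpable, blocked
    finally:
        os.close(ready_r)
        os.close(done_w)
        os.waitpid(pid, 0)


if __name__ == "__main__":
    dumpable, blocked = demo()
    # An ordinary unprivileged user should see blocked=True; a caller holding
    # CAP_SYS_PTRACE is still let through, which mirrors the caveat above.
    print(f"dumpable={dumpable} blocked={blocked}")
```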

  • For those that won't open the link :)

    Update by Trevor Jay:

    "This is an extremely difficult-to-exploit flaw on standard RHEL and Fedora systems.

    I checked the 1.10.3 and 1.12.5 builds on Brew. Both drop the `CAP_SYS_PTRACE` capability by default. 1.10.3 blacklists `ptrace` calls under the default seccomp profile. Thus, this flaw only comes into play for containers that already have elevated privileges.

    Even if `ptrace` is available, the proposed exploit scenario of quickly attaching to a process joining the container space and using its file descriptors is not possible under the default SELinux configuration. The containerized PID 1 will have the `container_t` or a similar SELinux type, and thus will be blocked by standard type enforcement from accessing any resources that haven't already been made available to containerized processes."
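    To check whether a particular container matches the defaults described in the quote, a process can read its own effective capability mask and seccomp mode from /proc/self/status. The sketch below is Linux-only and the helper names are hypothetical. It can tell that a seccomp filter is installed, but not whether that filter actually blocks ptrace(2). It also says nothing about SELinux type enforcement.

```python
from typing import Mapping, Optional

CAP_SYS_PTRACE = 19  # bit number from <linux/capability.h>


def read_status(path: str = "/proc/self/status") -> dict[str, str]:
    """Parse a /proc/<pid>/status file into {field: value}."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def has_effective_cap(cap: int, status: Optional[Mapping[str, str]] = None) -> bool:
    """True if bit `cap` is set in CapEff (a hex bitmask)."""
    if status is None:
        status = read_status()
    return ((int(status["CapEff"], 16) >> cap) & 1) == 1


def seccomp_mode(status: Optional[Mapping[str, str]] = None) -> int:
    """0 = no seccomp, 1 = strict mode, 2 = filter mode (e.g. a seccomp profile).

    The field is absent on kernels built without seccomp support; treat that as 0.
    """
    if status is None:
        status = read_status()
    return int(status.get("Seccomp", "0"))


if __name__ == "__main__":
    print("CAP_SYS_PTRACE effective:", has_effective_cap(CAP_SYS_PTRACE))
    print("seccomp mode:", seccomp_mode())
```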

  • Oh good, yet another vulnerability from the model of retroactively changing the execution environment of a process after it's been created. We had a thread about setuid binaries a week ago, which is the most common case of this design: https://news.ycombinator.com/item?id=13312722

    We would all be better off if we designed systems such that some helper process, already running with the right environment / config / privileges, spawns the process for you and proxies input/output to your terminal.

    And (as I mentioned in the other thread) this helper process could literally be sshd. Instead of having sudo, ssh root@localhost. No weird process trees with confusing things like effective UIDs. Instead of having runc exec, ssh root@container. No file descriptors get passed that aren't explicitly forwarded over the SSH connection.

    Patching sshd to run over UNIX sockets without encryption and to use getpeername() for authentication is left as an exercise to the reader.
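    For the UNIX-socket variant, the piece that would stand in for SSH authentication on Linux is the kernel-reported peer credentials. AF_UNIX sockets expose these through the SO_PEERCRED option (getpeername() returns only the peer's socket address, not its identity). The sketch below shows that check only, not a patch to sshd; `authorize` and its uid allow-list are hypothetical.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; } on Linux
_UCRED = struct.Struct("iII")


def peer_credentials(sock: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the process on the other end of an AF_UNIX socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


def authorize(sock: socket.socket, allowed_uids: set[int]) -> bool:
    """Hypothetical policy check: accept the peer only if its uid is allow-listed."""
    _pid, uid, _gid = peer_credentials(sock)
    return uid in allowed_uids


if __name__ == "__main__":
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        # socketpair() records the creating process as the peer of both ends.
        print(peer_credentials(b))
        print(authorize(b, {os.geteuid()}))
```

    The kernel fills these credentials in when the connection is made, so the client cannot simply claim a different uid. That is the property the comment relies on to drop encryption and keys for a purely local connection.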

  • CoreOS engineers started deploying patches across all channels for this CVE minutes after it was made public. More info here: https://coreos.com/blog/cve-2016-9962.html

  • A fix for CVE-2016-9962 was released in Docker upstream 1.12.6 a couple of days ago.

    https://github.com/docker/docker/releases/tag/v1.12.6

  • Is there a PoC? Would love to see a walkthrough.