Linux Kernel (ARM/ARM64) - 'perf_event_open()' Arbitrary Memory Read

EDB-ID:

40182

CVE:

N/A


Platform:

ARM

Published:

2016-07-29

perf_event_open() offers to collect various pieces of information when an event occurs, including a user stack backtrace (PERF_SAMPLE_CALLCHAIN). To collect a user stack backtrace, the kernel grabs the userland register state (if the event occured in kernelspace: the userland register state that was recorded on syscall entry), then walks the stackframes by following framepointers.

On ARM, the step from one stackframe to the next one is implemented in arch/arm/kernel/perf_callchain.c as follows:

/*
 * Get the return address for a single stackframe and return a pointer to the
 * next frame tail.
 */
static struct frame_tail __user *
user_backtrace(struct frame_tail __user *tail,
               struct perf_callchain_entry *entry)
{
        struct frame_tail buftail;
        unsigned long err;

        if (!access_ok(VERIFY_READ, tail, sizeof(buftail)))
                return NULL;

        pagefault_disable();
        err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
        pagefault_enable();

        if (err)
                return NULL;

        perf_callchain_store(entry, buftail.lr);

        /*
         * Frame pointers should strictly progress back up the stack
         * (towards higher addresses).
         */
        if (tail + 1 >= buftail.fp)
                return NULL;

        return buftail.fp - 1;
}

The access_ok() check is intended to prevent a malicious userland process from abusing the perf_event_open() API to leak kernelspace data. However, access_ok() does not actually check anything in set_fs(KERNEL_DS) sections, and performance events can occur in pretty much any context. Therefore, by causing a performance event to fire while e.g. the splice() syscall is running under KERNEL_DS, an attacker can circumvent this protection.

(The "tail + 1 >= buftail.fp" check has no relevance for an attacker; kernelspace addresses are higher than userspace addresses.)

After circumventing the protection, the attacker can set up a stackframe whose frame pointer points to an arbitrary kernelspace address. The kernel will follow that frame pointer, read the "saved link register" through it and make the result accessible to userspace. Therefore, this vulnerability can be used to read arbitrary kernelspace data.

The attached exploit can be used to leak 4 bytes at an arbitrary address, like this (tested on a Nexus 6, which runs a kernel based on upstream version 3.10, with a userdebug build that allows the shell user to get a root shell using "su"):

shell@shamu:/ $ su                                                             
root@shamu:/ # echo 0 > /proc/sys/kernel/kptr_restrict
root@shamu:/ # grep max_lock_depth /proc/kallsyms
c1042dc0 D max_lock_depth
root@shamu:/ # exit
shell@shamu:/ $ cat /proc/sys/kernel/max_lock_depth                            
1025
shell@shamu:/ $ /data/local/tmp/poc 0xc1042dc0
attempting to leak 0xc1042dc0
fake stackframe: fp=0xbeafd920
data_head is at e8
SUCCESS: 0x00000401
SUCCESS: 0x00000401
shell@shamu:/ $ su 
root@shamu:/ # echo 4100 > /proc/sys/kernel/max_lock_depth
root@shamu:/ # exit
shell@shamu:/ $ /data/local/tmp/poc 0xc1042dc0
attempting to leak 0xc1042dc0
fake stackframe: fp=0xbecbd920
data_head is at e8
SUCCESS: 0x00001004
SUCCESS: 0x00001004

(The number behind the "SUCCESS: " message is the leaked value.)

On recent kernels, the issue could be attacked more reliably using software events or tracepoints - however, before commit b3eac026 (first contained in Linux 4.2), there is no implementation of perf_arch_fetch_caller_regs() on ARM, making it impossible to exploit the issue that way.

The arm64 implementation seems to have the same issues as the arm implementation. The x86 code also looks dodgy and has an access_ok() check, but can't be exploited this way because of the valid_user_frame() check that occurs directly after the values have been read through the potentially-kernelspace pointer.

Regarding other architectures (which I haven't looked into in much detail because they seem less important): Interestingly, sparc already has a safe implementation that explicitly uses set_fs(USER_DS) to make access_ok() safe. tile doesn't seem to even make an effort to differentiate between kernelspace and userspace stacks at a first glance. xtensa has some code, but it looks dodgy. metag also has the bad access_ok() check, but does some sanity checking afterwards that makes it harder to attack. The powerpc code looks secure.

I have attached a completely untested patch that should fix the x86, arm and arm64 code.


Proof of Concept:
https://github.com/offensive-security/exploitdb-bin-sploits/raw/master/bin-sploits/40182.zip