Linux Kernel (Ubuntu 14.04.3) - 'perf_event_open()' Can Race with execve() (Access /etc/shadow)










A race condition in perf_event_open() allows local attackers to leak sensitive data from setuid programs.

perf_event_open() associates with a task as follows:

		struct perf_event_attr __user *, attr_uptr,
		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
	struct task_struct *task = NULL;
	if (pid != -1 && !(flags & PERF_FLAG_PID_CGROUP)) {
		task = find_lively_task_by_vpid(pid);
		if (IS_ERR(task)) {
			err = PTR_ERR(task);
			goto err_group_fd;
	event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
				 NULL, NULL, cgroup_fd);

In find_lively_task_by_vpid():

static struct task_struct *
find_lively_task_by_vpid(pid_t vpid)
	struct task_struct *task;
	int err;

	if (!vpid)
		task = current;
		task = find_task_by_vpid(vpid);
	if (task)

	if (!task)
		return ERR_PTR(-ESRCH);

	/* Reuse ptrace permission checks for now. */
	err = -EACCES;
	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
		goto errout;

	return task;

Because no relevant locks (in particular the cred_guard_mutex) are held during the ptrace_may_access() call, it is possible for the specified target task to perform an execve() syscall with setuid execution before perf_event_alloc() actually attaches to it, allowing an attacker to bypass the ptrace_may_access() check and the perf_event_exit_task(current) call that is performed in install_exec_creds() during privileged execve() calls.

The ability to observe the execution of setuid executables using performance event monitoring can be used to leak interesting data by setting up sampling breakpoint events (PERF_TYPE_BREAKPOINT) that report userspace register contents (PERF_SAMPLE_REGS_USER) to the tracer. For example, __memcpy_sse2() in Ubuntu's eglibc-2.19 will copy small amounts of data (below 1024 bytes) by moving them through the registers RAX, R8, R9 and R10, whose contents are exposed by PERF_SAMPLE_REGS_USER. An attacker who can bypass userland ASLR (e.g. by bruteforcing the ASLR base address of the heap, which seems to only have ~16 bits of randomness on x86-64) can e.g. use this to dump the contents of /etc/shadow through /bin/su.

(The setting of the kernel.perf_event_paranoid sysctl has no impact on the ability of an attacker to leak secrets from userland processes using this issue.)

simple_poc.tar contains a simple PoC for 64bit that only demonstrates the basic issue by leaking the result of a getpid() call from a setuid executable:

$ ./test
too early
$ ./test
data_head is at 18
RAX: 9559

(If this seems to not be working, try running "while true; do ./test; done | grep -v --line-buffered 'too early'" loops in multiple terminal windows.)

shadow_poc.tar contains a poc which leaks 32 bytes of the user's entry in /etc/shadow on a Ubuntu 14.04.3 desktop VM if ASLR has been disabled (by writing a zero to /proc/sys/kernel/randomize_va_space as root)

$ ./test
data_head is at 1080
got data: hi-autoipd:*:16848:0:99999:7:::

got data: -dispatcher:!:16848:0:99999:7:::
got data: $6$78m54P0T$WY0A/Qob/Ith0q2MzmdS
$ sudo grep user /etc/shadow

(If it doesn't immediately work, it might need to be re-run a few times.)

The current PoC code isn't very good at hitting the race condition, and with ASLR enabled, dumping hashes from shadow would likely take days. With a more optimized attack, it might be possible to dump password hashes in significantly less time.

Fixed in

Proof of Concept: