Exploitation With WriteProcessMemory() - Yet Another DEP Trick

  !-----------=|     Exploitation With WriteProcessMemory()     |=-----------!
  !-----------=|              Yet Another DEP Trick             |=-----------!
  !-----------=|                       ----                     |=-----------! 
  !-----------=|            Written By Spencer Pratt            |=-----------!
  !-----------=|            spencer.w.pratt () gmail com           |=-----------!
  !--=[ dd6c2309cab71bdb3aabce69cacb6f5f6c0e2d60bd51d6f629904553a8dc0a7c ]=--!
  !--=[ 5db5cef0f8e0a630d986b91815336bc9a81ebe5dbd03f1edaddde77e58eb2dba ]=--!

                ----=!#[     Table of Contents     ]#!=----
                ---                                     ---
                --  I.       Introduction                -- 
                --  II.      Background Information      -- 
                --  III.     WriteProcessMemory()        -- 
                --  IV.      Slightly Clever             -- 
                --  V.       Return Chaining             -- 
                --  VI.      Conclusions                 -- 
                --  VII.     Special Thanks, Greets      -- 
                ---                                     --- 

  ----------------------=![      Introduction      ]!=------------------------

  This paper introduces yet another function to defeat Windows DEP. It is
  assumed that the reader is already familiar with buffer overflows on x86,
  and has a basic understanding of the DEP protection mechanism. The technique
  discussed in this paper is aimed at Windows XP; however, it should also work
  on other Windows versions, provided that the attacker has some way to find
  the address of the DLL, such as through a memory disclosure. This paper
  does not address the issue of ASLR; rather, it recognizes ASLR as a
  completely separate problem. The method described here is not conceptually
  groundbreaking, and is ultimately only as impressive as any other ret-2-lib
  technique.

  -----------------=! [      Background Information      ] !=-----------------

  The introduction of DEP and other mechanisms has slightly raised the bar for
  exploitation on Windows. Variations on the ret-2-lib technique have been 
  used in order to circumvent DEP. Some popular functions are:

    - WinExec() to execute a command: still useful but not as desirable as 
      having arbitrary shellcode execution.

    - VirtualProtect() to make memory executable: still useful, but often 
      requires ROP.

    - VirtualAlloc() to allocate new executable memory: still useful but 
      often requires ROP.

    - SetProcessDEPPolicy() to disable DEP: doesn't work if DEP is AlwaysOn, 
      and may only be called once per process.

    - NtSetInformationProcess() to disable DEP: this function fails if DEP
      is AlwaysOn or the MEM_EXECUTE_OPTION_PERMANENT flag is set.

  ------------------=! [      WriteProcessMemory()      ] !=------------------

  If you can't go to the mountain, bring the mountain to you.

  The function WriteProcessMemory() is typically used for debugging, and as 
  defined by MSDN it:

     "Writes data to an area of memory in a specified process. The entire 
     area to be written must be accessible or the operation fails."

  The function takes the following arguments:

  WriteProcessMemory(HANDLE hProcess, LPVOID lpBaseAddress, LPCVOID lpBuffer, 
           SIZE_T nSize, SIZE_T *lpNumberBytesWritten);

  The idea here is simple: if it is not possible to execute the writable
  memory, write to the executable memory instead. By returning to
  WriteProcessMemory() it is possible to write arbitrary code into the
  running process, effectively hot-patching it with shellcode. This works
  because WriteProcessMemory() performs the required page protection changes
  using NtProtectVirtualMemory() to allow the memory to be written to, even
  when it is marked as executable and not writable.

  --------------------=! [      Slightly Clever      ] !=---------------------

  The caveats of WriteProcessMemory() introduce a couple of problems to solve
  before the function is eligible for exploitation. First, finding a suitable
  location to patch can be a difficult task. Second, the final argument needs
  to be NULL or a pointer to writable memory where the number of bytes
  written is stored.
  As luck would have it, there is an easy solution to handle both issues: use
  the WriteProcessMemory() function to write to itself. 

  Using WriteProcessMemory() to patch itself removes the requirement of 
  finding a location in a thread to patch, as the destination address is now
  offset from the known address of WriteProcessMemory(). It also removes the 
  need for a second jmp/call to the patched location, as the natural flow of
  execution will walk directly into the patched code. Finally, by carefully 
  picking the offset into WriteProcessMemory() to patch, it eliminates the 
  need for the last pointer argument (or NULL), by overwriting the code that
  performs the pointer check and then stores the lpNumberBytesWritten.

  Finding a suitable location to write code to inside WriteProcessMemory() is
  easy. Observe the code snippet below:

    WindowsXP kernel32.dll, WriteProcessMemory 0x7C802213+...
      7C8022BD:  lea     eax, [ebp + hProcess]
      7C8022C0:  push    eax
      7C8022C1:  push    ebx
      7C8022C2:  push    [ebp + lpBuffer]
      7C8022C5:  push    [ebp + lpBaseAddress]
      7C8022C8:  push    edi
      7C8022C9:  call    NtWriteVirtualMemory
      7C8022CF:  mov     [ebp + lpBuffer], eax
      7C8022D2:  mov     eax, [ebp + lpNumberBytesWritten]
      7C8022D5:  test    eax, eax
      7C8022D7:  jz      short 7C8022DE

  The last operation that needs to complete in order to successfully patch
  the process is the call to NtWriteVirtualMemory() at 0x7C8022C9. The setup 
  for storing lpNumberBytesWritten starts afterwards, and so 0x7C8022CF is 
  the ideal destination address to begin overwriting. Immediately after the 
  write is completed the function flows directly into the freshly written
  code. This allows the bypass of permanent DEP in one call.
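
  As a quick sanity check of the addresses in the listing above, the
  arithmetic can be worked through in a few lines of Python. This is only a
  sketch: the constant names are made up for illustration, and the values are
  the Windows XP kernel32.dll addresses quoted in this paper, which will
  differ on other builds.

```python
# Addresses are the Windows XP kernel32.dll values quoted in the listing above.
# The names are illustrative and do not come from any library.
WPM_BASE = 0x7C802213      # start of WriteProcessMemory()
CALL_ADDR = 0x7C8022C9     # call NtWriteVirtualMemory
PATCH_ADDR = 0x7C8022CF    # first instruction after the call

# Offset of the patch point from the start of the function.
patch_offset = PATCH_ADDR - WPM_BASE

# Distance between the call and the next instruction, i.e. the call's size.
call_length = PATCH_ADDR - CALL_ADDR

print(hex(patch_offset))
print(call_length)
```

  The second value confirms that the destination address lines up with the
  instruction boundary directly after the call, which is why execution falls
  through into the newly written bytes.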

  The arguments to do this look like this:

  WriteProcessMemory(-1, 0x7C8022CF, ShellcodeAddr, ShellcodeLen, ..Arbitrary)

  The first argument, -1 for hProcess HANDLE, specifies the current process.
  The second argument is the offset into WriteProcessMemory() where shellcode
  will be written. The third argument, ShellcodeAddr, needs to be the address
  of shellcode stored somewhere in memory; this could be code that has been 
  sprayed onto the heap, or at a location disclosed by the application. The 
  fourth argument is the length of shellcode to copy. The last argument is no
  longer relevant as the code that deals with it is being overwritten by the 
  copy itself. 

  For a textbook stack overflow, the payload layout looks like this:

  [0x7C802213] [AAAA] [0xffffffff] [0x7C8022CF] [&shellcode] [length]
  ^            ^      ^            ^            ^            ^
  '            '      '            '            '            '
  '            '      '            '            '            shellcode length
  '            '      '            '            '
  '            '      '            '            shellcode address
  '            '      '            '                  
  '            '      '            dest address in WriteProcessMemory()
  '            '      '
  '            '      hProcess HANDLE (-1)
  '            '
  '            next return address (irrelevant)
  WriteProcessMemory() address, overwritten EIP

  --------------------=! [      Return Chaining      ] !=---------------------

  The technique as described is still imperfect: it requires knowing where
  shellcode is in memory. Ideally, the location of the WriteProcessMemory() 
  function (kernel32.dll) should be all that is required to successfully land
  arbitrary code execution. Consider a scenario where control of the stack is
  gained, but the location of the stack or other data (other than the address
  of WriteProcessMemory) is unknown. By chaining multiple calls together to 
  copy from offsets of known data, WriteProcessMemory() can be used to build
  shellcode dynamically from already existing code.

  In order to perform this, the following steps need to be taken:

    1. Locate offsets for the opcodes and data to compose the shellcode with.

    2. Identify a location with enough space to patch, which does not conflict
       with any of the locations being copied from.

    3. Perform multiple returns to WriteProcessMemory(), patching the location
       with shellcode chunks from offsets.

    4. Return to newly patched shellcode.

  Step 1 of this process allows for some space optimization. Searching for 
  and finding multibyte sequences of the desired shellcode (rather than just 
  single bytes) allows for fewer returns to WriteProcessMemory(), and thus 
  less required space for the chained stack arguments.
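
  The space optimization in step 1 amounts to a longest-match split of one
  byte string against another. The sketch below illustrates the idea on toy
  data only. The function name greedy_chunks and the sample bytes are
  assumptions made for this example; it is not the script referred to later
  in this paper and it does no PE parsing.

```python
def greedy_chunks(needle: bytes, haystack: bytes):
    """Split needle into (start, end, haystack_offset) pieces.

    At each position the longest slice that occurs anywhere in haystack
    is taken, so fewer pieces are needed than with single-byte copies.
    """
    pieces = []
    pos = 0
    while pos < len(needle):
        # Try the longest remaining slice first, shrinking until it is found.
        for end in range(len(needle), pos, -1):
            off = haystack.find(needle[pos:end])
            if off != -1:
                pieces.append((pos, end, off))
                pos = end
                break
        else:
            # Not even the single byte at this position exists in haystack.
            raise ValueError(f"byte {needle[pos]:#04x} not present in haystack")
    return pieces


# Toy data standing in for the byte sequence to build and the image to search.
haystack = b"\x10\x20\x30\x40\x99\x50\x60"
needle = b"\x20\x30\x50\x60\x10"

pieces = greedy_chunks(needle, haystack)
print(pieces)  # [(0, 2, 1), (2, 4, 5), (4, 5, 0)]
```

  Here five bytes are covered by three pieces instead of five, which is the
  same kind of saving the multibyte sequences provide.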

  Consider generic win32 calc.exe shellcode from Metasploit as an example:


  By breaking this shellcode down into every possible unique chunk of 2 bytes
  or more, and then searching for it in kernel32.dll, it is easy to find the 
  pieces to dynamically construct this code. Of course, not all of this code 
  will be available in multibyte sequences. In turn some of the pieces will 
  need to be copied in as single bytes. Here is the output from an automated
  scan for these sequences; code to build this table is provided later on:

    |----    Bytes    ----------   PE/DLL   ---    WPM()  ---|
    |  shellcode[000-001]        0x7c8016d9 -->  0x7c861967  |
    |  shellcode[001-006]        0x7c81b11c -->  0x7c861968  |
    |  shellcode[006-010]        0x7c8285e3 -->  0x7c86196d  |
    |  shellcode[010-012]        0x7c801e3c -->  0x7c861971  |
    |  shellcode[012-014]        0x7c804714 -->  0x7c861973  |
    |  shellcode[014-015]        0x7c801aa6 -->  0x7c861975  |
    |  shellcode[015-018]        0x7c87acf4 -->  0x7c861976  |
    |  shellcode[018-020]        0x7c80a2b1 -->  0x7c861979  |
    |  shellcode[020-022]        0x7c804664 -->  0x7c86197b  |
    |  shellcode[022-025]        0x7c84266b -->  0x7c86197d  |
    |  shellcode[025-026]        0x7c801737 -->  0x7c861980  |
    |  shellcode[026-028]        0x7c80473a -->  0x7c861981  |
    |  shellcode[028-030]        0x7c81315c -->  0x7c861983  |
    |  shellcode[030-032]        0x7c802b44 -->  0x7c861985  |
    |  shellcode[032-034]        0x7c81a061 -->  0x7c861987  |
    |  shellcode[034-037]        0x7c812ae7 -->  0x7c861989  |
    |  shellcode[037-038]        0x7c801639 -->  0x7c86198c  |
    |  shellcode[038-040]        0x7c841d31 -->  0x7c86198d  |
    |  shellcode[040-042]        0x7c8047a7 -->  0x7c86198f  |
    |  shellcode[042-044]        0x7c8121da -->  0x7c861991  |
    |  shellcode[044-047]        0x7c80988f -->  0x7c861993  |
    |  shellcode[047-048]        0x7c8016dc -->  0x7c861996  |
    |  shellcode[048-051]        0x7c84a0d0 -->  0x7c861997  |
    |  shellcode[051-052]        0x7c801a8a -->  0x7c86199a  |
    |  shellcode[052-054]        0x7c802e41 -->  0x7c86199b  |
    |  shellcode[054-055]        0x7c8016fb -->  0x7c86199d  |
    |  shellcode[055-059]        0x7c84bb29 -->  0x7c86199e  |
    |  shellcode[059-062]        0x7c80a2b1 -->  0x7c8619a2  |
    |  shellcode[062-063]        0x7c801677 -->  0x7c8619a5  |
    |  shellcode[063-065]        0x7c8210f4 -->  0x7c8619a6  |
    |  shellcode[065-067]        0x7c801e9a -->  0x7c8619a8  |
    |  shellcode[067-068]        0x7c801677 -->  0x7c8619aa  |
    |  shellcode[068-070]        0x7c821d86 -->  0x7c8619ab  |
    |  shellcode[070-071]        0x7c8019ba -->  0x7c8619ad  |
    |  shellcode[071-072]        0x7c801649 -->  0x7c8619ae  |
    |  shellcode[072-073]        0x7c8016dc -->  0x7c8619af  |
    |  shellcode[073-075]        0x7c832d0b -->  0x7c8619b0  |
    |  shellcode[075-076]        0x7c8023e4 -->  0x7c8619b2  |
    |  shellcode[076-078]        0x7c86a706 -->  0x7c8619b3  |
    |  shellcode[078-080]        0x7c80e11b -->  0x7c8619b5  |
    |  shellcode[080-083]        0x7c8325a2 -->  0x7c8619b7  |
    |  shellcode[083-087]        0x7c840db2 -->  0x7c8619ba  |
    |  shellcode[087-089]        0x7c812ff8 -->  0x7c8619be  |
    |  shellcode[089-091]        0x7c82be3c -->  0x7c8619c0  |
    |  shellcode[091-093]        0x7c802552 -->  0x7c8619c2  |
    |  shellcode[093-094]        0x7c80168e -->  0x7c8619c4  |
    |  shellcode[094-097]        0x7c81cd28 -->  0x7c8619c5  |
    |  shellcode[097-100]        0x7c812cc3 -->  0x7c8619c8  |
    |  shellcode[100-101]        0x7c80270d -->  0x7c8619cb  |
    |  shellcode[101-102]        0x7c80166b -->  0x7c8619cc  |
    |  shellcode[102-103]        0x7c801b17 -->  0x7c8619cd  |
    |  shellcode[103-105]        0x7c804d40 -->  0x7c8619ce  |
    |  shellcode[105-106]        0x7c802638 -->  0x7c8619d0  |
    |  shellcode[106-108]        0x7c82c4af -->  0x7c8619d1  |
    |  shellcode[108-111]        0x7c85f0b6 -->  0x7c8619d3  |
    |  shellcode[111-112]        0x7c80178f -->  0x7c8619d6  |
    |  shellcode[112-115]        0x7c804bed -->  0x7c8619d7  |
    |  shellcode[115-116]        0x7c80232d -->  0x7c8619da  |
    |  shellcode[116-121]        0x7c84eac0 -->  0x7c8619db  |

  As the scan shows, the shellcode is 121 bytes long, but using multibyte
  sequences allows this code to be built by chaining just 59 calls to
  WriteProcessMemory().
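
  The relationship between the columns of the table can be checked with a
  short Python sketch: each address in the WPM() column should equal the
  first destination address plus the starting byte offset of that row. Only
  a handful of rows are copied here, and the names BASE and
  row_is_consistent are illustrative assumptions.

```python
BASE = 0x7C861967  # first destination address in the WPM() column

# (start, end, destination) copied from a few rows of the table above.
rows = [
    (0, 1, 0x7C861967),
    (1, 6, 0x7C861968),
    (55, 59, 0x7C86199E),
    (100, 101, 0x7C8619CB),
    (116, 121, 0x7C8619DB),
]


def row_is_consistent(start, end, dest, base=BASE):
    """True when the range is non-empty and dest equals base + start."""
    return end > start and dest == base + start


all_ok = all(row_is_consistent(*row) for row in rows)
print(all_ok)
```

  The same check extends to the full table by adding the remaining rows.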

  Step 2 differentiates this from the previous technique of patching
  WriteProcessMemory() itself. In order to avoid accidentally overwriting some
  useful area, and for overall simplicity, it is best to pick the address of a
  disposable function or code area from kernel32.dll to overwrite. The example
  used in this paper is the GetTempFileNameA() function at 0x7c861967, as this
  code area has no overlap with WriteProcessMemory(), nor does it overlap with
  any of the shellcode offsets.

  Provided below is a base64 encoded zip of a python script to perform all of
  these steps. It scans for and maps locations of shellcode pieces to the 
  function at 0x7c861967. It prints the table displayed above, showing all of
  the mapped locations, and writes an output file containing the stack
  frames actually used to perform the return chaining. This can be decoded
  and extracted with:

  $ base64 --decode pe_seance-base64.txt > pe_seance.py.zip
  $ unzip pe_seance.py.zip


  ----------------------=! [      Conclusions      ] !=-----------------------

  WriteProcessMemory() offers DEP-bypassing functionality for multiple 
  exploitation scenarios. In the scenario where a decent guess can be made for
  the location of shellcode, this function proves to be a convenient single 
  hop solution. Even when the location of shellcode is undetermined, so long  
  as stack space is available to chain multiple returns, WriteProcessMemory()
  is very helpful. 

  -----------------=! [      Special Thanks, Greets      ] !=-----------------


  ---------------------=! [      End Of Message      ] !=---------------------