Flash - PCRE Regex Compilation Zero-Length Assertion Arbitrary Bytecode Execution







Source: https://code.google.com/p/google-security-research/issues/detail?id=224&can=1&q=label%3AProduct-Flash%20modified-after%3A2015%2F8%2F17&sort=id

There’s an error in the PCRE engine version used in Flash that allows the execution of arbitrary PCRE bytecode, with potential for memory corruption and RCE.

This issue is a duplicate of http://bugs.exim.org/show_bug.cgi?id=1546 originally reported to PCRE upstream by mikispag; I rediscovered the issue fuzzing Flash so have filed this bug report to track disclosure deadline for Adobe.

The issue occurs in the handling of zero-length assertions; ie assertions where the object of the assertion is prepended with the OP_BRAZERO operator.

Simplest testcase that will crash in an ASAN build is the following:


This is pretty much a nonsense expression, and I'm not sure why it compiles successfully; but it corresponds to the statement that 'assert that named group 'a' optionally matches'; which is tautologically true regardless of 'a'.

Regardless, we emit the following bytecode:

0000 5d0012      93 BRA              [18]
0003 5f000c      95 COND             [12]
0006 66         102 BRAZERO          
0007 5e00050001  94 CBRA             [5, 1]
000c 540005      84 KET              [5]
000f 54000c      84 KET              [12]
0012 540012      84 KET              [18]
0015 00           0 END        

When this is executed, we reach the following code:

/* The condition is an assertion. Call match() to evaluate it - setting
the final argument match_condassert causes it to stop at the end of an
assertion. */

  RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, ims, NULL,
      match_condassert, RM3);
  if (rrc == MATCH_MATCH)
    condition = TRUE;
    ecode += 1 + LINK_SIZE + GET(ecode, LINK_SIZE + 2);
    while (*ecode == OP_ALT) ecode += GET(ecode, 1);      <---- ecode is out of bounds at this point.

If we look at the execution trace for this expression, we can see where this code goes wrong:

exec 0x600e0000dfe4 93 [0x60040000dfd0 41]
exec 0x600e0000dfe7 95 [0x60040000dfd0 41]
exec 0x600e0000dfea 102 [0x60040000dfd0 41] <--- RMATCH recursive match
exec 0x600e0000dfeb 94 [0x60040000dfd0 41]
exec 0x600e0000dff0 84 [0x60040000dfd0 41]
exec 0x600e0000dff3 84 [0x60040000dfd0 41]
exec 0x600e0000dff6 84 [0x60040000dfd0 41]
exec 0x600e0000dff9 0 [0x60040000dfd0 41]   <--- recursive match returns
before 0x600e0000dfe7 24067                 <--- ecode == 0x...dfe7
after 0x600e00013dea

If we look at the start base for our regex, it was based at dfe4; so dfe7 is the OP_COND, as expected. Looking at the next block of code, we're clearly expecting the assertion to be followed by a group; likely OP_CBRA or another opcode that has a 16-bit length field following the opcode byte.

ecode += 1 + LINK_SIZE + GET(ecode, LINK_SIZE + 2);

In this case, the insertion of the OP_BRAZERO has resulted in the expected OP_CBRA being shifted forward by a byte to 0x...dfeb; and this GET results in the value of 0x5e00 + 1 + LINK_SIZE being added to the ecode pointer, instead of the correct 0x0005 + 1 + LINK_SIZE, resulting in bytecode execution hopping outside of the allocated heap buffer.

See attached for a crash PoC for the latest Chrome/Flash on x64 linux.