The LOD/H Technical Journal, Issue #3: File 06 of 11 ||||||||||||||||||||||||||||||||||||||||||||||||||| +-+-+-+-+-+-+/ X+-+-+-+-+-+-+ X L X Secure Data Encryption with Cellular Automatons / L / X O X / O / X D X by / D / +-+-+-+ +-+-+-+ X L X The Mentor / L / X O X / O / X H X A Legion of Doom Presentation! / H / +-+-+-+ +-+-+-+ X_X_X_X_________________________________/_/_/_/ One of the key issues that concerns anyone who has sensitive or illegal information on their computer system is preventing unauthorized access to this information. Even if you hit a key that deletes everything on the hard disk when you see that four-door sedan pull up in the driveway, any idiot with Norton's Utilities (IBM) or Copy II+ (Apple) can recover anything that's on your drive with minimal effort. A delete command only changes a flag in the VTOC (volume table of contents), it doesn't actually *remove* the file from your system. There are two methods to ensure that your data can't be read by a sector editor or recovered by NU. The first is to overwrite everything with a NULL (FF) or anything else for that matter. I've seen one batch file that does a global delete, creates a file that says 'EAT HOT DEATH', and then begins copying it until disk space is full. Unfortunately, you can't always guarantee that you will be able to get to your computer before someone else does. The second method is to encrypt your data. Most people have visions of data encryption being some sort of arcane process akin to summoning demons or talking with Dead Cow Cult members (two closely related process- es.) In practice, it isn't that difficult. This file is intended to show some very short programs that will encrypt data beyond the ability of any- thing short of a dedicated mainframe to crack. How to use: The code examples I provide will be in MicroSoft's AmigaBASIC. It is fairly generic and you should be able to convert it over to IBM, //e,c,gs, Mac, ST, C64, or any flavor of mainframe you like. For those of you setting up systems on Packet-Switched Networks (such as the LOD/H system one of our members is implementing) data encryption should be considered absolutely necessary to maintain security. The terseness of the routines make them easy to insert in a bulletin board also, although conversion into C or Assembly would be necessary for decent speed. Intro to Cryptography: Long before computers were around, there was a need for data security. Everyone used lemon juice as 'invisible ink' when they were a kid, heating it over a candle to bring it out. And everyone has seen the substitution code where "A" = 1, B = "2", "Z" = 26, etc... The easiest form of encryption involves a variation of the previous. First of all, don't think of A = 1 as a substitution, think of it as a remapping. Let's say we have a language made up of the five vowels, and we wanted to remap them to the numbers 1-5. Our map would look like this: "AEIOU12345" and our mapping function would be f(c) = POSITION(c) + x where c = the letter to map and x is the key (in this case 5.) So every time we needed to encrypt a letter, we would take its position in the map, add 5 to it, and come up with the character to substitute. For the entire alphabet, the mapping function would be f(c) = POS(c) + 26 for the map "A..Z,1..26". Your map should be composed of all the characters that will be used for encryption. In a text only encrypter, this will consist of all the printable characters your machine can use. The same method can be used to encrypt binary files, but it's not as clear as text only for a teaching example. The problem with this simple form of encryption is that your average C64 could crack it in a matter of minutes. Enter into the next level of cryptography, random numbers. During World War II the Allied Forces created a system to generate random electric noise, recorded this noise onto a wax cylinder, and scram- bled radio transmissions by mixing the seemingly random noise with the voice transmission. The soldiers in the field needed an imprint of the same cylinder to be able to understand the message. Think of the wax cy- linder as a 'filter' for the crypted message. A random number generator can be easily used to encrypt data providing you realize the following- a random number generator on a computer is not really random. If you initialize the generator with the same seed value on two seperate occasions, it will return the same sequence of psuedo- random numbers. Most BASIC's use the RANDOMIZE command to start the generator off. If you leave off the seed, they get a seed from the system clock or some other fairly random source, providing a much truer random selection. But by declaring the seed yourself, you can be sure that you will be able to reference this same string of numbers, a string that is very hard to figure out without the key (seed.) Program Listing 1 is an example of a BASIC encrypt/decrypt system that uses the built-in random number generator include on the machine (or language implementation.) Program Listing 1 ----------------- REM ************************************************************************ REM REM Ok, this is an example of very basic encryption. It takes the input REM string and the input key and processes them using the machine's built REM in random number generator. This version is written in AmigaBASIC 1.2. REM It can be compacted quite a bit by writting it in C, but it's an easy REM algorithm to crack. REM REM ************************************************************************ INPUT "String to be encoded"; C$ INPUT "Key Please! ";K REM ************************************************************************ REM REM When you use the RANDOMIZE command, it seeds the random number gener- REM ator with the key K. *EVERY* time you seed the generator with the same REM value, you will get the same sequence of psuedo-random numbers. Since REM the built in random-number generator uses a linear algorithm to gener- REM ate the sequence, it's easy (relatively) to crack. REM REM ************************************************************************ RANDOMIZE K REM ************************************************************************ REM REM The only difference between encoding and decoding is which way you REM move in your Q$ array space. Encoding takes the original and shifts REM to the right, decoding takes the codes value and shifts to the left. REM REM ************************************************************************ REREAD: INPUT "Encode or Decode ? ";A$ A$=LEFT$(A$,1) IF A$="E" OR A$="e" THEN A=1 GOTO HEAD END IF IF A$="D" OR A$="d" THEN A=-1 ELSE GOTO REREAD END IF REM ************************************************************************ REM REM Q$ contains all the characters that can be encoded. Every encoded REM character will be mapped to a character in this array. I haven't REM included any non-standard characters, so you will have to customize REM it to your particular keyboard/system. I've included an error check REM that will abort the encryption if it encounters a character that isn't REM in Q$. I have to use the STRING$ command to insert the spacebar and REM the quote into the string. It could also be done with a ASC(##) in REM many basics. You could expand this to include any non-printable REM characters you'd like so you could do non-text files. REM REM ************************************************************************ HEAD: SPACE = 32 QUOTE = 34 Q$="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" Q$=Q$+"1234567890!@#$%^&*()-=_+[]$;:'.,<>/?X| " Q$=Q$+STRING$(1,SPACE)+STRING$(1,QUOTE) QSIZ = LEN(Q$) REM ************************************************************************ REM REM This is the main loop. L = length of the string to encrypt. In this REM example, I am only encrypting a single string. Most people who will REM actually use this will change the FOR loop to run until an EOF is REM encountered in the input file. Since the syntax for that will vary REM widely from system to system, I'll leave it out. REM REM ************************************************************************ L=LEN(C$) FOR I = 1 TO L REM /* Finds the character I in the input string */ X$ = MID$(C$,I,1) REM /* Finds the integer location of the character in Q$ REM returns variable POZ */ GOSUB LOKPOZ REM /* RND returns a random # between 0 and 1. Multiply it by the REM size of array Q$ and you get the number of positions to move REM when encoding or decoding. */ POZMV = (RND * QSIZ) REM /* If you are encoding, you will shift to the right using addition. REM you take the modula base QSIZ to keep the new character within REM the bounds of Q$. */ IF A = 1 THEN NUPOZ = (POZ + POZMV) MOD QSIZ ELSE REM /* Otherwise, you subtract, which takes a bit more math to keep REM up with. Once you have the distance to shift, you must REM convert it to a positive integer and then subtract two to REM account for the head & tail of the array. */ NUPOZ = (POZ - POZMV) MOD QSIZ NUPOZ = NUPOZ -2 IF NUPOZ < 1 THEN NUPOZ = QSIZ - ABS(NUPOZ) END IF END IF REM /* Now you assign the new character in array Q$ to Y$, and append REM it to your converted string */ IF NUPOZ < 1 THEN NUPOZ = QSIZ - ABS(NUPOZ) END IF Y$ = MID$(Q$,NUPOZ,1) D$ = D$ + Y$ NEXT I PRINT "Original = ";C$ PRINT "Modified = ";D$ END REM /* This finds character X$ in array Q$ and returns an integer REM value of the location. Called from the main loop. */ LOKPOZ: FOUND = 0 POZ = 1 TOP: IF FOUND = 1 THEN RETURN ELSE TMP$ = MID$(Q$,POZ,1) IF X$ = TMP$ THEN FOUND = 1 END IF POZ = POZ + 1 IF POZ > QSIZ THEN PRINT "Error: Character '";X$"' not in array Q." END END IF END IF GOTO TOP REM ********************************************************************** End of Program Listing 1 This method, while extremely simple, tight, and fast, is not fool- proof. Most computers use the following algorithm for generating pseudo- random number sequences: x(t+1) = ax(t) + b x(t+1) = next random number x(t) = previous random number a & b are constants that will cause a fairly even distribution For example, if you were using a three-bit system (8 possible postive integers) you might make a = 3 & b = 7 (there's a reason behind using prime numbers that is beyond the scope of this file.) If you seed the argument with RANDOMIZE 5 you would get the following: First x: x = 3*5 + 7 | Since we're restricting ourselves to three bits, and 22 won't fit in three bits, we'd need to perform a modula 8 on the number. (Modulo divides x by eight and keeps the remainder as the new value of x.) So MOD(22,8) is equal to 6 (16 + 6 = 22). Ok, let's do some simple mapping using our vowel set and the above three-bit random number generator. Let's say that the message reads "AAEOU" Our first random number was 6. Our map looks like "AEIOU12345". POS(A) + 6 gives us 2 as the character. Second x: x = 3*6 + 7 | MOD (25,8) = 1 | POS(A) + 1 gives us E. Third x: x = 3*1 + 7 | MOD (10,8) = 2 | POS(E) + 2 gives us O. Fourth x: x = 3*2 + 7 | MOD (13,8) = 5 | POS(O) + 5 gives us 4. Fifth x: x = 3*5 + 7 | MOD (22,8) = 6 | POS(U) + 6 wraps around the map to A. So our original "AAEOU" is crytped into "2E04A". This may at first seem difficult to crack since 'A' mapped into a '2' on one pass and an 'E' on the other, thus preventing a freuquency analysis from breaking the code. Unfortunately, if someone knows the random number algorithm, they can easily hack out the key. Since most of the people using this will be using it on a pc, it would be trivial to get another pc to hack it out. And even if you protect your random number algorithm, it is still a straight linear algebra problem that an AT could work on over a weekend and probably figure it out, especially if there is a fairly small map to work with. Solution: What we need to do is combine the random mapping with a random number generator that is tougher to figure out. Enter cellular automatons. CA's were first invented in the 1940's when John von Neumann (he of the famous bottleneck) started to explore the mathmatic implications of very simple machines. CA's are made of geometric patterns of cells that change their state at each tick of a clock according to a fixed rule. Early work provided automatons that could imitate a basic computer. Since the CA's are inherently parallel (the entire geometry is updated each clock tick) and easy to put on a chip, there is speculation that the next generation of parallel processing computers will use CA's as a base rather than the Turing machine model. You have probably seen a CA at work and not realized it if you've ever seen the computer graphic simulation 'LIFE' developed by John Conway at MIT to model real organisms. The rule for automaton reproduction was incr- edibly simple: If a cell has two or three neighbors, no change in the cell. Fewer or more neighbors, it starves or is overcrowded to death, and repro- duction occurs when a blank space has exactly three neighbors. Using these simple rules, incredibly complex patterns can be produced. Anything that can produce complex and varied results from a small algorithm is a good target for a random number application. Enter Steven Wolfram from the Institute of Advanced Studies in Princeton, NJ. Wolfram has been doing research on one-dimensional cellular machines, which have the advantage of being able to work with both todays machines and future parallel machines. Wolfram has developed an automaton that is a one dimensional circular array modified by the rule: a(x,t) = a(x-1,t-1) XOR (a(x,t-1) OR a(x+1,t-1)) MOD k Where x is the position in the array and t is the time, k is the number of available characters (k = LEN(Q$)), and a is the one-dimensional array. This rule has several interesting properties. The problem we had with linear algorithms was that simple algebra could be used to analyze the evolution of the algorithm (the patterns produced.) All that you have to do is figure out how *one* cell evolves, then apply that pattern across the entire array. In the above case, there is no way of analyzing the array at time t without loading the initial conditions and running the whole thing. The second thing to note is that there are k to the power of w (where w is the width (number of cells) in array a) possible states the machine can be in, and not all of these states have a predecessor that generates it. These states are called 'Garden of Eden' states, and can only occur if they are set as an ititial condition. As a result, this rule is neither a one-to-one mathmatical mapping, nor is it and onto mapping of the set of arrays into itself. In laymans' terms, this means that for any given array state it is impossible to tell what (if any) previous state generated it by mere pattern analysis. While this isn't a file on breaking codes- about the only way to crack this one that's been discovered is to load *every* k**w state into memory and page through them searching for a pattern. This method can be defeated easily by setting w to more than 30 cells (assuming k=256, all the ASCII characters.) If you are using my array Q$, you might want to set w to 40 or more. Since 256 to the 30th power is about a zillion bits, roughly equal to the largest memory bank in existence, there isn't much chance of anyone breaking it. For the truly paranoid, set w to 50 and sleep easy at night. Anyway, back to the algorithm... Each of the cells is filled on one of the k integers from 0 to k-1, giving each cell k possible states. Wolfram found that the string of bits occupying the 0 position (a(0,1), a(0,2), a(0,3)...) forms a random sequence that passes all statistical tests, sometimes with better results than standard DES algorithms. Let's break this down and see what it's doing. First of all, we will need two arrays. Each array is set up thus: Array for Time One |------| |------| |------| |------| |---->|a(0,1)|------>|a(1,1)|------>|a(2,1)|----->|a(3,1)|-----| | |------| |------| |------| |------| | |--------------------------------------------------------------| Array for Time Two |------| |------| |------| |------| |---->|a(0,2)|------>|a(1,2)|------>|a(2,2)|----->|a(3,2)|-----| | |------| |------| |------| |------| | |--------------------------------------------------------------| The reason we need two arrays is so you can update the array without destroying anything in it. In other words, you start out with array 1 active, then you update the array into array 2 and change the active array to 2. On the next clock tick you will update the active array (now 2) into the inactive one (now 1) and set the active array back to 1. You keep swapping like this. Logically, you only have one array- the active one. To initialize the array, the ASCII values of each character in the key are plugged into the first LEN(KEY$) spaces in the array. If you want to use a short key, modify the code to fill the *entire* array with values of the key (keep repeating a loop from 1 to W pulling characters out of K). Since the key can be anything printable, use something 10-20 characters long that you can remember- "HACK TO LIVE, LIVE TO HACK" is one of my favorites. Anyway, if you use a short (less than 10) key in this program, the distri- bution will be skewed for the first W MOD LEN(KEY$) itereations of the automaton, but will smooth out nicely after that. After the array is filled, it operates exactly like the first program *except* when it need a random number of positions to move. Then it drops down, updates each cell in the automaton, and then reads the value in A(0,time) as the random number to shift by. Let's look at the modified encryption code. REM ************************************************************************ REM REM This is an modification of Program 1 that doesn't use a machine REM specific random number generator, but instead uses a cellular automaton REM algorithm. W is the width of the actual automaton. A is dimensioned REM at 32 to avoid having to reference element 0 of the array, which is REM legal on some systems and illegal on the others. This way it can REM be implemented on anything. Y is set for time 1, Y1 for time 2. REM These correspond to the second dimension in array A. REM REM ************************************************************************ W = 30 DIM A(32,2) Y = 1 Y1 = 2 REM ************************************************************************ REM REM Once again, you can set this up to use files instead of strings. And REM note that, unlike the first example, the key doesn't have to be REM numeric. REM REM ************************************************************************ INPUT "String to be encoded"; C$ INPUT "Key Please! (Can be alpha-numeric) ";K$ REM ************************************************************************ REM REM This is where K$ is broken down into a series of characters and their REM ASCII value shoved sequentially into array A. REM REM ************************************************************************ FOR I = 1 TO LEN(K$) T$ = MID$(K$,I,1) A(I,Y1) = ASC(T$) NEXT I REM ************************************************************************ REM REM The only difference between encoding and decoding is which way you REM move in your Q$ array space. Encoding takes the original and shifts REM to the right, decoding takes the codes value and shifts to the left. REM REM ************************************************************************ REREAD: INPUT "Encode or Decode ? ";A$ A$=LEFT$(A$,1) IF A$="E" OR A$="e" THEN A=1 GOTO HEAD END IF IF A$="D" OR A$="d" THEN A=-1 ELSE GOTO REREAD END IF REM ************************************************************************ REM REM Q$ contains all the characters that can be encoded. Every encoded REM character will be mapped to a character in this array. I haven't REM included any non-standard characters, so you will have to customize REM it to your particular keyboard/system. I've included an error check REM that will abort the encryption if it encounters a character that isn't REM in Q$. I have to use the STRING$ command to insert the spacebar and REM the quote into the string. It could also be done with a ASC(##) in REM many basics. You could expand this to include any non-printable REM characters you'd like so you could do non-text files. REM REM ************************************************************************ HEAD: SPACE = 32 QUOTE = 34 Q$="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" Q$=Q$+"1234567890!@#$%^&*()-=_+[]$;:'.><,/?X|" Q$=Q$+STRING$(1,SPACE)+STRING$(1,QUOTE) QSIZ = LEN(Q$) REM ************************************************************************ REM REM This is the main loop. L = length of the string to encrypt. In this REM example, I am only encrypting a single string. Most people who will REM actually use this will change the FOR loop to run until an EOF is REM encountered in the input file. Since the syntax for that will vary REM widely from system to system, I'll leave it out. REM REM ************************************************************************ L=LEN(C$) FOR H = 1 TO L REM /* Finds the character I in the input string */ X$ = MID$(C$,H,1) REM /* Finds the integer location of the character in Q$ REM returns variable POZ */ GOSUB LOKPOZ REM /* CELLULAR updates the cells in the automaton, switches the active REM time value, and returns X as the number of positions to shift. */ GOSUB CELLULAR REM /* If you are encoding, you will shift to the right using addition. REM you take the modula base QSIZ to keep the new character within REM the bounds of Q$. */ IF A = 1 THEN NUPOZ = (POZ + X) MOD QSIZ ELSE REM /* Otherwise, you subtract, which takes a bit more math to keep REM up with. Once you have the distance to shift, you must REM convert it to a positive integer and then subtract two to REM account for the head & tail of the array. */ NUPOZ = (POZ - X) MOD QSIZ NUPOZ = NUPOZ - 2 IF NUPOZ < 1 THEN NUPOZ = QSIZ - ABS(NUPOZ) END IF END IF REM /* Now you assign the new character in array Q$ to Y$, and append REM it to your converted string */ IF NUPOZ < 1 THEN NUPOZ = QSIZ - ABS(NUPOZ) END IF Y$ = MID$(Q$,NUPOZ,1) D$ = D$ + Y$ NEXT H PRINT "Original = ";C$ PRINT "Modified = ";D$ END REM /* This finds character X$ in array Q$ and returns an integer REM value of the location. Called from the main loop. */ LOKPOZ: FOUND = 0 POZ = 1 TOP: IF FOUND = 1 THEN RETURN ELSE TMP$ = MID$(Q$,POZ,1) IF X$ = TMP$ THEN FOUND = 1 END IF POZ = POZ + 1 IF POZ > QSIZ THEN PRINT "Error: Character '";X$"' not in array Q." END END IF END IF GOTO TOP REM *********************************************************************** REM REM This is the cellular automaton REM REM *********************************************************************** CELLULAR: REM /* Goes through the loop updating into the inactive time (1 or 2 dep- REM ending on how Y and Y1 are assigned) */ FOR I = 1 TO W A(I,Y) = A(I-1,Y1) XOR (A(I,Y1) OR A(I+1,Y1)) NEXT I REM /* Updates the ends of the array (logical positions 0 and 31) that REM are used in calculating other fields. */ A(0,Y) = A(W+1,Y1) XOR (A(0,Y1) OR A(1,Y1)) A(W+1,Y) = A(W,Y1) XOR (A(W+1,Y1) OR A(0,Y1)) REM /* Assigns the first cell to X as a random number */ X = A(1,Y) REM /* Flips the active time */ TMP = Y Y = Y1 Y1 = TMP RETURN Ok, let's trace through a *small* example. Assume our earlier map of "AEIOU12345" and an automaton of width 5. For a key, we'll use "A15". 1) Assign ASC(A) to a(1,1), ASC(1) to a(2,1), ASC(5) to a(3,1). ("0" will represent an empty cell in this example.) A(time 1) = 0 65 49 53 0 0 0 (Remember that an array of width 5 is going to have 7 actual elements) 2) Now then, we want to encrypt the string "EE3" First, we locate where E is in our map. LOKPOZ("E") = 2 3) Now then, we update the automaton. a(1,2) = 0 XOR (65 OR 49) a(2,2) = 65 XOR (49 OR 53) a(3,2) = 49 XOR (53 OR 0) a(4,2) = 53 XOR (0 OR 0) a(5,2) = 0 XOR (0 OR 0) Since this isn't a tutorial on binary numbers and boolean algebra, you'll have to trust me on this one... a(1,2) = 113 a(2,2) = 116 a(3,2) = 4 a(4,2) = 53 a(5,2) = 0 4) Now we update the ends. a(0,2) = 0 XOR (0 OR 65) a(6,2) = 0 XOR (0 OR 0) Again... a(0,2) = 65 a(6,2) = 0 5) Now we switch the active time from 1 to 2, and our new automaton is a(time 2) = 65 113 116 4 53 0 0 6) We then pull off a(1,2) as the number to shift by. 7) Postion 2 + 113 (we're encoding, so we add) is 5 (modulo arithmatic.) 8) "E" is encoded into "U". 9) We repeat this two more times (you don't really want me to step through it all, do you?) and end up with the encrypted version. Well, that's going to pretty much wrap this file up. If you are interested in more files of this nature, let me know. If you find this totally confusing, but want to learn more, call The Phoenix Project at 512/441-3088 (300/1200/2400, 24 hours a day). Our friendly and helpful LOD/H staff will be glad to assist you. Other people who you might want to talk to about encryption include Dr. Cypher, Tuc, and Prime Suspect. Also, if you are interested in seeing the above algorithm applied in other languages let me know. If there's enough of a demand I'll release C, Modula-2, and ADA versions. This has been a Legion of Doom/Legion of Hackers presentation! The Mentor LOD/H ***************************************************************************** References and Acknowledgments: "How to Generate Cryptographically Strong Sequences of Pseudo-Random Bits"; M. Blum & S. Micali; SIAM Journal of Computing, vol. 13, p. 850 (1984) "Functions of Random Variables"; John Freund & Ronald Walpole; Mathmatical Statistics, 4th Edition; Prentice-Hall Inc., NJ; pp. 240-71 "Building an Encryption System"; Peter Wayner Computer Language, Vol. 4, Num. 12, p. 67 (Dec. 1987 Issue) "Random Sequence Generation by Cellular Automata"; Institute for Advanced Study; Advances in Applied Mathmatics; "Breaking Pseudo-Random Number Based Cryptographic Algorithms"; M. Vahle & L. Tolendino; CRYPTOLOGIA, Oct 1982, p. 319 Also my thanks to: TUC, The Leftist, Prime Suspect, and Dr. Cypher, who all contributed to this one way or another. ***************************************************************************