Tuesday, March 6, 2012

Breaking the Caesar Cipher Through Cryptanalysis

My last post was about the Caesar Cipher which was invented around 50bce and is still in use today.  Here is an example of it failing in modern times.

There are two very simple ways to handle this.  You can even do it without anything more than a piece of paper.  I made a simple string using the function from the previous post.  We are going to take the following string and attack it a few different ways.

Ymnx%nx%f%ytu%xjhwjy%rjxxflj3

There is no way there is something in there!  Someone who knows how frequency analysis works wouldn't agree.  We will take a look at that, but first we are going to try the paper and pencil method.  Get your a pencil and a piece of paper.  Your average piece of paper has 34 lines, but our code can generate through 256 characters.  That means it could take up 8 pieces of paper to solve this.  Using logic, we know that a message would probably contain more letters than oddball characters, which we will look for in the frequency analysis section.

Pencil and Paper:
Take your standard piece of paper and put a negative number on each line, -1  We start at -1, because if we start at 0 and can read it, then it's not encrypted!  This is going to be the key you are guessing with.  Traditional Caesar Ciphers are limited to how many characters are in the alphabet, not 256.  We are going to try and crack this code with one piece of paper!  We are actually cracking it by reencrypting it by the inverted key.  To the right of the numbers, we will figure out each code.  It takes time, but it works.

Example of how a page would look:
-1: Xlmw$mw$e$xst$wigvix$qiwweki2
-2: Wklv#lv#d#wrs#vhfuhw#phvvdjh1
-3: Vjku"ku"c"vqr"ugetgv"oguucig0
-4: Uijt!jt!b!upq!tfdsfu!nfttbhf/
-5: This is a top secret message.
-6: Sghr hr ` sno rdbqds ldrr`fd-

Wow! That was pretty easy.  Of course it wasn't fast and knowing that it might take up to 256 attempts to figure out, you already know this isn't the best method.  The next version of this really just improves on the pencil and paper method.

Frequency Analysis:
Frequency analysis is the study of the frequency of letters or group of letters, usually it is only applied to breaking classical ciphers such as the Caesar.  The idea is based on the fact that in any given stretch of written language, certain letters and combinations of letters occur in varying frequencies.  In English, E, T, A , O and spaces are the most common letters. Z, Q and X are the rarest.  There are sequences known as bigrams or digraphs that are two letters in sequence that are common in many languages.  In English, it is common to see SS, EE, TT & FF are the most common.  What does this mean?  To speed up the pen and pencil method, you can look for characters that are not so common and count the difference between the two characters.  Those would be the keys you want try.  The bigrams are not very useful in short messages, but are very useful in longer messages.

Looking at the encoded message, we see that there are 5 x's and 4 j's.  That's a great spot to start.  There is only one series of letters where two letters repeat.  Finding it that way is a little too simple, so we are doing it a harder way.  We are going to use the 4 j's.  Knowing the most common letters in English are e, t, a, o and spaces, we figure out the character differences between each of these and j.

e(5) and j(10) = 5
t(19) and j(10) = -9
a(1) and j(10) = 9
o(13) and j(10) = -3
space(32) and j(10) = -22

Pretty good chance that it is one of these keys, let's decrypt using these keys.
-5: This is a top secret message.
9: bvw.w.o.‚}~.sq€s‚.{sous<
-9: Pdeo eo ] pkl oa_nap iaoo]ca*
3: \pq{(q{(i(|wx({mkzm|(um{{iom6
22: oƒ„Ž;„Ž;|;Š‹;Ž€~€;ˆ€ŽŽ|‚€I

We hit it on the head right away.  That's why frequency analysis is such an important tool in decrypting messages.

PHP Version

Say you want to solve this with code and you already have the function from the previous article.  The easiest method would be to loop through the possible keys.  Then, we would rely on the user  to see which one is really text.  Another option would be to follow the same method, then hit each string with a spellchecker.  The one with the least errors will normally be your hidden message.

Below is an example of our previous PHP function being used to crack the code.  It isn't up to HTML standards.

// Usage: 
// autoCrack('Ymnx%nx%f%ytu%xjhwjy%rjxxflj3');
function autoCrack($sInput = null) {
// Just for output
$sReturn = "<table border='1'>\r\n<tr><td>Key</td><td>Decrypted</td></tr>\r\n";
// Decryption loop.
// We start at 1, if we start at 0, we are just returning the same string.
// We stop at 256 because it is the same as printing with a key of 0
// Statistically, this normally shows the result within the first 26 rows.
for ($i = 1; $i < 256; $i++) {
// Decrypt and store it in a table for ease on reading.
$sReturn .= "<tr><td>$i</td><td>".caesar($sInput, $i, false)."</td></tr>\r\n";
}
// Finish our output
$sReturn .= "\r\n</table>";
return $sReturn;
}


// Usage:
// Encryption - caesar('Input String', 1, true);
// Decryption - caesar('Joqvu!Tusjoh', 1, true);
function caesar($sInput = '', $iShift = 0, $bEncrypt = true) {
// Inline if statement to set our shift value if encrypting or decrypting
$iShift = ($bEncrypt) ? $iShift : 0 - $iShift;
// We declare the length here instead of in the for statement, otherwise it reevaluates this every time, which is bad for speed.
$iLength = strlen($sInput);
// Declare our empty string
$sReturn = '';
// We are extracting individual characters and getting the ascii code with ord, then adding the shift value and
// converting it back to a char with chr and adding it to our return string. 
for ($i = 0; $i < $iLength; $i++) {
$sReturn .= chr((ord($sInput[$i])+$iShift) % 256);
}
// Now click your heels together three times and say "There’s no place like home"
return $sReturn;
}






No comments:

Post a Comment