Tuesday, March 6, 2012

Vigenere Encryption / Giovan's Cipher


The Vigenère cipher was originally described by Giovan Battista Bellaso in his 1553 book La cifra del. Sig. Giovan Battista Bellaso.  Somehow it gained popularity and was misattributed and named after Blaise de Vigenère.  In appearance, it almost appears unbreakable.  It earned the title le chiffre inéchiffrable, French for 'the indecipherable cipher.'  Being such a well known and common cipher, it is anything but indecipherable.  It has a basis of a Caesar cipher meaning that it is a shift cipher, but it switches for each letter of the key, earning it the category of a polyalphabetic substitution cipher.  Much stronger than it's predecessor, but it's still not bulletproof.


Well, let's build a function to generate them.  The math behind encryption and decryption is really the same as a caesar cipher, except that your key depends on which character you are at.  In the function, we are storing each key value into our array named $aKey.  We do this for good programming practice and speed.


Let's review our formula for encryption.  It is almost the same formula.  We set it to handle up to 256 characters.  The only other change is that n is dynamic. n will be the ascii value of the character of the key we are on.  As we increase our loop, it goes back to the first character in the key. 





For decryption, we simply are inverting n.  We are going to do it programmatically, but this is how it works.





Not much different than what we did on the Caesar, just n is "rolling."   Let's take a look at the function I wrote.


// Usage:
// Encrypt: vigenere('Attack at dawn.', 'classified', true);
// Decrypt: vigenere('¤àÕÔÖÔ†ÊÙ„ÇÍØá¡', 'classified', false)
function vigenere($sInput = null, $sKey = null, $bEncrypt = true) {
$aKey = array();
$iLength = strlen($sKey)-1;
// We put our key values into an array to prevent having to calculate
// it every character.  It will increase speed, especially on longer messages
while ($iLength >= 0) {
$aKey[$iLength] = ($bEncrypt) ? ord($sKey[$iLength]) : 0 - ord($sKey[$iLength]);
$iLength--;
}
$iLength = strlen($sInput);
$iS = sizeof($aKey);
$sReturn = '';
// Pasre the message.  
for ($i = 0; $i < $iLength; $i++) {
// Our shift code is stored in $aKey[($i % $iS)]
// We add % 256 to limit it to the ASCII chart.
$sReturn .= chr((ord($sInput[$i])+$aKey[($i % $iS)]) % 256);
}
return $sReturn;
}


Look at that, you passed a hidden message saying "Attack at dawn."  I'd hate to be your enemy come morning.  Of course, I might not mind if I knew a bit about cryptanalysis.

Look for the next post in the series to contain hints on cracking the code.  We will talk about the Friedman test & Kasiski examination to determine the length of the key before we show how you crack the code without using brute force.

I look forward to questions and comments.

Breaking the Caesar Cipher Through Cryptanalysis

My last post was about the Caesar Cipher which was invented around 50bce and is still in use today.  Here is an example of it failing in modern times.

There are two very simple ways to handle this.  You can even do it without anything more than a piece of paper.  I made a simple string using the function from the previous post.  We are going to take the following string and attack it a few different ways.

Ymnx%nx%f%ytu%xjhwjy%rjxxflj3

There is no way there is something in there!  Someone who knows how frequency analysis works wouldn't agree.  We will take a look at that, but first we are going to try the paper and pencil method.  Get your a pencil and a piece of paper.  Your average piece of paper has 34 lines, but our code can generate through 256 characters.  That means it could take up 8 pieces of paper to solve this.  Using logic, we know that a message would probably contain more letters than oddball characters, which we will look for in the frequency analysis section.

Pencil and Paper:
Take your standard piece of paper and put a negative number on each line, -1  We start at -1, because if we start at 0 and can read it, then it's not encrypted!  This is going to be the key you are guessing with.  Traditional Caesar Ciphers are limited to how many characters are in the alphabet, not 256.  We are going to try and crack this code with one piece of paper!  We are actually cracking it by reencrypting it by the inverted key.  To the right of the numbers, we will figure out each code.  It takes time, but it works.

Example of how a page would look:
-1: Xlmw$mw$e$xst$wigvix$qiwweki2
-2: Wklv#lv#d#wrs#vhfuhw#phvvdjh1
-3: Vjku"ku"c"vqr"ugetgv"oguucig0
-4: Uijt!jt!b!upq!tfdsfu!nfttbhf/
-5: This is a top secret message.
-6: Sghr hr ` sno rdbqds ldrr`fd-

Wow! That was pretty easy.  Of course it wasn't fast and knowing that it might take up to 256 attempts to figure out, you already know this isn't the best method.  The next version of this really just improves on the pencil and paper method.

Frequency Analysis:
Frequency analysis is the study of the frequency of letters or group of letters, usually it is only applied to breaking classical ciphers such as the Caesar.  The idea is based on the fact that in any given stretch of written language, certain letters and combinations of letters occur in varying frequencies.  In English, E, T, A , O and spaces are the most common letters. Z, Q and X are the rarest.  There are sequences known as bigrams or digraphs that are two letters in sequence that are common in many languages.  In English, it is common to see SS, EE, TT & FF are the most common.  What does this mean?  To speed up the pen and pencil method, you can look for characters that are not so common and count the difference between the two characters.  Those would be the keys you want try.  The bigrams are not very useful in short messages, but are very useful in longer messages.

Looking at the encoded message, we see that there are 5 x's and 4 j's.  That's a great spot to start.  There is only one series of letters where two letters repeat.  Finding it that way is a little too simple, so we are doing it a harder way.  We are going to use the 4 j's.  Knowing the most common letters in English are e, t, a, o and spaces, we figure out the character differences between each of these and j.

e(5) and j(10) = 5
t(19) and j(10) = -9
a(1) and j(10) = 9
o(13) and j(10) = -3
space(32) and j(10) = -22

Pretty good chance that it is one of these keys, let's decrypt using these keys.
-5: This is a top secret message.
9: bvw.w.o.‚}~.sq€s‚.{sous<
-9: Pdeo eo ] pkl oa_nap iaoo]ca*
3: \pq{(q{(i(|wx({mkzm|(um{{iom6
22: oƒ„Ž;„Ž;|;Š‹;Ž€~€;ˆ€ŽŽ|‚€I

We hit it on the head right away.  That's why frequency analysis is such an important tool in decrypting messages.

PHP Version

Say you want to solve this with code and you already have the function from the previous article.  The easiest method would be to loop through the possible keys.  Then, we would rely on the user  to see which one is really text.  Another option would be to follow the same method, then hit each string with a spellchecker.  The one with the least errors will normally be your hidden message.

Below is an example of our previous PHP function being used to crack the code.  It isn't up to HTML standards.

// Usage: 
// autoCrack('Ymnx%nx%f%ytu%xjhwjy%rjxxflj3');
function autoCrack($sInput = null) {
// Just for output
$sReturn = "<table border='1'>\r\n<tr><td>Key</td><td>Decrypted</td></tr>\r\n";
// Decryption loop.
// We start at 1, if we start at 0, we are just returning the same string.
// We stop at 256 because it is the same as printing with a key of 0
// Statistically, this normally shows the result within the first 26 rows.
for ($i = 1; $i < 256; $i++) {
// Decrypt and store it in a table for ease on reading.
$sReturn .= "<tr><td>$i</td><td>".caesar($sInput, $i, false)."</td></tr>\r\n";
}
// Finish our output
$sReturn .= "\r\n</table>";
return $sReturn;
}


// Usage:
// Encryption - caesar('Input String', 1, true);
// Decryption - caesar('Joqvu!Tusjoh', 1, true);
function caesar($sInput = '', $iShift = 0, $bEncrypt = true) {
// Inline if statement to set our shift value if encrypting or decrypting
$iShift = ($bEncrypt) ? $iShift : 0 - $iShift;
// We declare the length here instead of in the for statement, otherwise it reevaluates this every time, which is bad for speed.
$iLength = strlen($sInput);
// Declare our empty string
$sReturn = '';
// We are extracting individual characters and getting the ascii code with ord, then adding the shift value and
// converting it back to a char with chr and adding it to our return string. 
for ($i = 0; $i < $iLength; $i++) {
$sReturn .= chr((ord($sInput[$i])+$iShift) % 256);
}
// Now click your heels together three times and say "There’s no place like home"
return $sReturn;
}






Caesar Cipher in PHP


As more of your life is spent online, the more data comes online and the need for data encryption grows. Cryptography is the art of "secret writing," as defined by the word's roots.  One of the earliest known forms is the Caesar Cipher from around 50bce.  This is intended to be the first installment of the series in DIY cryptography.  

The Caesar Cipher is one of the earliest and simplest ciphers to exist.  It is easy to understand and easy to break, which we will get into soon.  It belongs to a cipher class known as shift ciphers and is the basis for many modern applications such as the Vigenere cipher and ROT13.

But how does it work?

Well, it takes any letters and shifts them by a defined integer.  If you decide that your integer/key is 5, you increase the letter value of every character by 5.  For example, A would become F, B would become G, etc.  In a traditional version, you would wrap around at Z back to A and the casing would all be upper case.  But, with PHP and the power of the processor, we are able to make it support any character in any language.

I am going to do the worst American thing possible and presume that you speak English in this demonstration.  Since the American English language has 26 characters in it, these are using a modulus to limit it at 26.  That way, your cipher code can be some really large number and still work.  Later, we will be switching this to a much higher number, which will allow us to include all 256 characters in the ASCII chart. 

If you assign all numbers an integer (A=1, B=2, etc) then these two formulas will do all of our math.

Encryption would be as follows.  With x being the integer value of our character in the string and add it to n, the shift value.  We then use modulus (% in PHP) to loop those characters that go past Z towards A.  This ends up leaving us with our new characters integer.  If we started with the letter C(3) and had a shift of +3, we would end up with F(6).  
Decryption is exactly the same thing, except we are reversing n.  So to get back to our original C(3) from F(6), our key is still 3, but we subtract it.


Got it? Great!  But you came here to do this in PHP, right?  We are going to write this as one simple function, with 3 inputs.  They will be your string, the shift value and if it is encryption of decryption.  We are expanding our original idea here and not limiting it to the 26 characters in the English language.  Since it is an example, we are leaving all error checking out.  In a production environment, you should always verify all input. 

Next post will be about adapting this to a rotating cipher based off of a key.  I always look forward to questions, comments and suggestions.

// Usage:
// Encryption - caesar('Input String', 1, true);
// Decryption - caesar('Joqvu!Tusjoh', 1, false);
function caesar($sInput = '', $iShift = 0, $bEncrypt = true) {
// Inline if statement to set our shift value if encrypting or decrypting
$iShift = ($bEncrypt) ? $iShift : 0 - $iShift;
// We declare the length here instead of in the for statement, otherwise it reevaluates this every time, which is bad for speed.
$iLength = strlen($sInput);
// Declare our empty string
$sReturn = '';
// We are extracting individual characters and getting the ascii code with ord, then adding the shift value and
// converting it back to a char with chr and adding it to our return string. 
for ($i = 0; $i < $iLength; $i++) {
$sReturn .= chr((ord($sInput[$i])+$iShift) % 256);
}
// Now click your heels together three times and say "There’s no place like home"
return $sReturn;
}