Wednesday, 29 May 2013

Cracking Substitution Ciphers

Substitution ciphers are a frequent part of many online challenges and CTF competitions, and are always fun to have a look at.

Most of these types of ciphers are fairly easy to crack with just a pencil and paper method, but there are other, quicker ways to get the job done as well.

The most frequently seen letter substitution ciphers are;

> Caesar shift ciphers
Shifting the letters of the alphabet up a fixed number of letters to encode / decode a given text.
> Substitution ciphers
Replacing the letters of the alphabet with randomly chosen letters to encode a given text.
> Position dependent shift ciphers
Replacing letters at a certain position with a shifted value and repeating that position shift cycle on each word or sentence.

Caesar shift ciphers are the easiest to identify and decode.
Decoding can be done by simply writing out the alphabet and identifying the shift by trial and error and testing the (expected) correct outcome.

The most well known Caesar shift is the so-called ROT13, which can be used to both encode and decode a message by shifting the letters up 13 positions.

ROT13 messages are easy to encode / decode with a short one-liner using 'tr' ;
echo "This is a test, one plus one is two" | tr a-zA-Z n-za-mN-ZA-M

This can be expanded to include digits as well (sometimes referred to as ROT18), by replacing 0-4 with 5-9 and 5-9 with 0-4 ;
echo "this is a test, 1 + 1 = 2" | tr a-zA-Z0-45-9 n-za-mN-ZA-M5-90-4

The same principle of shifting letters can be used with any number of shifts.
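
As a minimal sketch of that idea (assuming bash and a standard tr; the function name caesar is made up for this example and is not how cshift itself is written), the shift can be reduced modulo 26 so that negative values and values above 26 work as well ;

```shell
# caesar SHIFT TEXT
# Shift every letter SHIFT positions up the alphabet, keeping its case.
# SHIFT may be negative or larger than 26; it is reduced modulo 26.
caesar() {
    local n=$(( ($1 % 26 + 26) % 26 ))
    local lower=abcdefghijklmnopqrstuvwxyz
    local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    # Build the rotated alphabets and let tr do the mapping.
    printf '%s\n' "$2" |
        tr "${lower}${upper}" "${lower:n}${lower:0:n}${upper:n}${upper:0:n}"
}

# A shift of 13 gives the same result as the ROT13 one-liner.
caesar 13 'This is a test, one plus one is two'
```

Building the two rotated alphabets as plain strings avoids having to work out tr ranges for every possible shift value.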

The easiest way to check for a Caesar shift cipher is to check all the possible shifts of a word, or sequence of words, and verify at what shift the text becomes readable.

There are quite a few scripts / websites which can check and do this for you, but as a lot of them have some kind of limitation, and as I am one who enjoys re-inventing the wheel ;) I decided to have a crack at making a bash script that does the same.
Introducing cshift

Unlike many other solutions found on the interwebz, cshift allows upper and/or lower case, negative values, as well as values higher than 26 (so for instance a shift of -5 characters or of +49 characters)

Running cshift on direct input (quotes!);
./ -i 'Jhlzhy Zhshk' -s 19

Running cshift on an example text file 'test.txt' ;
cat test.txt
./cshift -i test.txt -s 7

'Bruteforce' checking of all possibilities.

Let's have a look at the following text (and assume it is part of an encoded text file)
(different shift value used, not the same as the above example) ;
hgdlwjywaklk afvmuw sfykl
Using cshift's -b switch for the 'bruteforce' function, we can check all the possible shifts and see which shift gives a readable outcome (this is best done on a short sequence of words, to be able to correctly ascertain shift values).

./ -i 'hgdlwjywaklk afvmuw sfykl' -b
(don't like colours ? add the -c switch; ./ -i 'hgdlwjywaklk afvmuw sfykl' -bc)
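
The 'bruteforce' idea itself fits in a few lines. The below is only a rough sketch under my own assumptions (bash, no colours; the names caesar and bruteforce are invented for the example and this is not the code behind the -b switch) ;

```shell
# caesar SHIFT TEXT: shift letters SHIFT positions up the alphabet.
caesar() {
    local n=$(( ($1 % 26 + 26) % 26 ))
    local lower=abcdefghijklmnopqrstuvwxyz
    local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    printf '%s\n' "$2" |
        tr "${lower}${upper}" "${lower:n}${lower:0:n}${upper:n}${upper:0:n}"
}

# bruteforce TEXT: print TEXT under all 26 shifts, one per line,
# prefixed with the shift value so the readable line can be picked out.
bruteforce() {
    local s
    for ((s = 0; s < 26; s++)); do
        printf '%2d  %s\n' "$s" "$(caesar "$s" "$1")"
    done
}

bruteforce 'hgdlwjywaklk afvmuw sfykl'
```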

For possibly less well known words (or if the above colours have half blinded you, preventing recognition of readable text..), this can be further simplified by using cshift's -w switch, which allows the bruteforce output to be checked against a given dictionary or wordlist (use a small wordlist! it's slow..).
In this case I have chosen to check against the UKACD list, which is a small wordlist for crossword puzzles etc.

For correct results this test should be done on a single long word (this also helps avoid false positives).
./ -i 'hgdlwjywaklk' -b -w ukacd.txt
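
A wordlist check along these lines could be sketched as follows. This is an illustration only, assuming bash, grep and a wordlist with one word per line; dict_check and caesar are hypothetical names, not part of cshift ;

```shell
# caesar SHIFT TEXT: shift letters SHIFT positions up the alphabet.
caesar() {
    local n=$(( ($1 % 26 + 26) % 26 ))
    local lower=abcdefghijklmnopqrstuvwxyz
    local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    printf '%s\n' "$2" |
        tr "${lower}${upper}" "${lower:n}${lower:0:n}${upper:n}${upper:0:n}"
}

# dict_check WORD WORDLIST
# Try all 26 shifts of a single WORD and print only the shifts whose
# result appears as a whole line (ignoring case) in WORDLIST.
dict_check() {
    local s candidate
    for ((s = 0; s < 26; s++)); do
        candidate=$(caesar "$s" "$1")
        if grep -qixF -- "$candidate" "$2"; then
            printf '%2d  %s\n' "$s" "$candidate"
        fi
    done
}

# Example call (wordlist path is whatever list you have at hand):
#   dict_check 'hgdlwjywaklk' ukacd.txt
```

grep is started once per shift here, which is part of why a small wordlist is the better choice.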

So with either just the 'bruteforce' -b switch or together with the -w switch we can see that a shift of 8 letters gives readable text and can use that value to decode the full text / text file.

Substitution ciphers are slightly harder to crack, depending on the amount of text given to work with.
The less text you have to work with, the harder it is.

Given a substantial amount of text, you can run a letter frequency analysis on the text and check the most frequent letters to create a starting point.
From there it is a matter of a decent vocabulary combined with some trial and error.

When looking at a text encoded with a substitution cipher, it is handy to take note of a few things (based on the text being in English);
  • The letter 'E' is the most frequent letter in English, so it stands to reason that the most frequent letter in the encoded text could stand for an 'E'.
  • The letter 'T' is the 2nd most frequent letter in English.
  • Look for single character words; in an English text single letter words will be either 'A' or 'I'.
  • The word 'the' is the most frequent 3-character word used in English, it is also the most frequently used word in general in English.
Using the above, you can usually create a solid starting point and work forward from there.
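
The counting in the points above can be roughly sketched with standard tools. This assumes bash and plain ASCII text; letter_freq and short_words are names made up for this example and are not cshift functions ;

```shell
# letter_freq FILE
# Count how often each letter occurs (case ignored), most frequent first.
letter_freq() {
    tr -cd '[:alpha:]' < "$1" | tr '[:upper:]' '[:lower:]' |
        fold -w1 | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# short_words FILE LEN
# Count words of exactly LEN characters, most frequent first.
# Apostrophes are kept inside words so that "i'm" is not split up.
short_words() {
    tr '[:upper:]' '[:lower:]' < "$1" | tr -cs "[:alpha:]'" '\n' |
        awk -v len="$2" 'length($0) == len' |
        sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

# Example calls on an encoded text file:
#   letter_freq manifesto.txt | head -n 5
#   short_words manifesto.txt 1
#   short_words manifesto.txt 3 | head -n 5
```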

Some helpful information on letter and word frequencies in English can be found here;

Let's have a look at a test file 'manifesto.txt'

Using the -f switch in cshift we can do a rudimentary letter frequency analysis on the above text file;
./ -i manifesto.txt -f

Now we have the letter analysis on the whole file, let's cut out the first few lines and work on those for a bit ;
head -n 25 manifesto.txt > new.txt
cat new.txt

From the previous letter frequency analysis it looks most likely that ;
K = E
A = T

We see that there are single letter words in use: u & w
So, either ;
U = A & W = I    or    U = I & W = A
From the apostrophes that follow U in the text (as in I'm or I've), it seems that U = I & W = A

I use a simple replacement/substitution script using 'sed' with lines written out line by line to make it easier to check and alter as needed.
To make it 'easier' to read I lower case all letters in the text and put substitutions in upper case, then keep on running it on the text file with expected substitutions until words start appearing.
tr '[:upper:]' '[:lower:]' < "$1" | sed \
-e 's/a/a/g' \
-e 's/b/b/g' \
-e 's/c/c/g' \
-e 's/d/d/g' \
-e 's/e/e/g' \
-e 's/f/f/g' \
-e 's/g/g/g' \
-e 's/h/h/g' \
-e 's/i/i/g' \
-e 's/j/j/g' \
-e 's/k/k/g' \
-e 's/l/l/g' \
-e 's/m/m/g' \
-e 's/n/n/g' \
-e 's/o/o/g' \
-e 's/p/p/g' \
-e 's/q/q/g' \
-e 's/r/r/g' \
-e 's/s/s/g' \
-e 's/t/t/g' \
-e 's/u/u/g' \
-e 's/v/v/g' \
-e 's/w/w/g' \
-e 's/x/x/g' \
-e 's/y/y/g' \
-e 's/z/z/g'
exit 0
(This is actually also included in cshift with the -r switch [./cshift -i input.file -r], but it is not practical, as using it means continuously editing the script in nano and possibly risking fubarring the whole script ;) use with care !)
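
As an alternative sketch (my own assumption of how this could be shortened, not a replacement for the sed script), the same lower case / upper case trick can be done with two tr calls, passing only the letters guessed so far. The function name apply_key is hypothetical ;

```shell
# apply_key CIPHER_LETTERS PLAIN_LETTERS FILE
# Lower case the whole text, then replace each cipher letter with the
# matching plain letter in upper case, so that solved letters stand out.
# CIPHER_LETTERS must be lower case, PLAIN_LETTERS upper case, and both
# strings must have the same length.
apply_key() {
    tr '[:upper:]' '[:lower:]' < "$3" | tr "$1" "$2"
}

# Example with guesses K=E, A=T, U=I, W=A :
#   apply_key 'kauw' 'ETIA' new.txt
```

Because the replacements are upper case and the untouched text is lower case, a solved letter is never substituted a second time.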

Let's enter the aforementioned probable substitutions in the script and check the outcome.
./ new.txt

From that outcome it becomes clear that ;
X = H
And also following the use of apostrophes we can deduce that ;
N = S

After entering the above and re-running the script, from the part of text that is (semi-)readable we can further deduce that ;
And with a calculated guess (based on thinking of the word TEACH) try T = C

after entering the above substitutions and running script again ;

Now you are already well on your way in just a couple of steps.

Going through the text carefully, you will find that ;
R = K
S = R
L = D
Y = Y
O = L

Entering those values in the script and running it ;

Now it's easy to identify the other letters and solve the text ;
q = F
v = B
f = W
p = O
c = G
i = U
j = V
e = P
g = J

Now to show you how to make it even easier ;)

lightningmanic shared a great frequency_analysis JavaScript script on the THS forums, including excellent explanations on substitution ciphers and the decoding of same.
This script does a much better job than my attempts with the above bash scripts and should definitely be in your toolbox if you enjoy this kind of thing.
Download FreqA ;

Check the file contents and then unzip ;
unzip -l

Then open the index.html file in your web browser.
As it is written in JavaScript, it should work in most modern browsers on most operating systems.

The script is awesome, it shows the letter frequency analysis, most common two and three letter sequences, and a very quick and easy way to check substitutions.

For quick checks on letter substitution encoded text, this script is definitely what I will be using first.
Thanks for the share lightningmanic !


Position dependent shift ciphers are more complicated to find; sometimes they come with a hint, sometimes it is left to the user to figure out.

There are too many variations to go into in much depth, but the idea is basically to have the letters shifted a number of letters depending on their position in the word or sentence.

So for instance the word 'computer' with the shift '2, 4, 6.. ' could be encoded into ;
c=(c + 2) == E
o=(o + 4) == S
m=(m + 6) == S
p=(p + 8) == X
u=(u + 10) == E
t=(t + 12) == F
e=(e + 14) == S
r=(r + 16) == H
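
The 'computer' example can be reproduced with a small function. This is a sketch under my own assumptions: bash, output in upper case, and a shift that simply keeps growing by a fixed step for every letter of the input (it does not restart per word). The name pos_shift is invented for the example ;

```shell
# pos_shift TEXT START STEP
# Shift the 1st letter by START, the 2nd by START+STEP, the 3rd by
# START+2*STEP and so on. Non-letters are copied unchanged and do not
# advance the shift. Negative START/STEP values reverse the encoding.
pos_shift() {
    local text amount=$2 step=$3
    local alphabet=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    local out='' i ch before
    text=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    for ((i = 0; i < ${#text}; i++)); do
        ch=${text:i:1}
        before=${alphabet%%"$ch"*}      # part of the alphabet before ch
        if [ "${#before}" -lt 26 ]; then
            out+=${alphabet:$(( ((${#before} + amount) % 26 + 26) % 26 )):1}
            amount=$(( amount + step ))
        else
            out+=$ch                    # not a letter, keep as is
        fi
    done
    printf '%s\n' "$out"
}

pos_shift 'computer' 2 2        # encode with the shift 2, 4, 6..
pos_shift 'ESSXEFSH' -2 -2      # decode again
```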

It comes down to a lot of trial and error, in the past I have used a 'template' like the below to stare at thinking of possibilities.

There are so many possible variations that deciphering such an encoded message can be quite a daunting task; however, with sufficient text and quite a bit of trial and error, success can be achieved !

Should you feel inclined to give the scripts mentioned in this post a whirl, please let me know if any unexpected errors or weird output is encountered.

Edit 30-07-2013
I'll admit I do like the fact that people take an interest and take the time to download scripts that I put up here :)
So far, over 100 people have taken the interest to do so, and I would be very interested to hear their thoughts on the cshift script !
(be gentle... ;) )
I truly do appreciate feedback, and although I am only a hobbyist in this field and the code will make many eyes bleed, your thoughts on the script and possible improvements are always appreciated !

Thanks for trying it out !

Saturday, 5 January 2013

Data Obfuscation

Security through Obscurity

Hiding information in such a way that it does not appear that there is any information at all is an interesting topic, and I recently got thinking about it following a few image challenges which were posted on various security sites a while ago.
I failed miserably at the challenges, but at least picked a few things up on the way to my epic fail..

Although security through obscurity is not truly secure, it is an interesting method of getting information to someone whilst being hidden to the un-informed.

This post is about the simple methods possible to use to hide info from the un-informed, the methods described are not supposed to be terribly secure, but rather, interesting.

The below was done on a VMware image of BackTrack 5 R3 and on a Windows 7 PC.

The first stage is to have a look at the file information and see what information is revealed.


Exif Data
Image files often contain Exif data which can be read in the hex of a file, but using a tool such as Exiftool greatly simplifies this.

exiftool will also give you information on the file type.

exiftool can be run from the command line, and there is also a Windows GUI for exiftool available.
Run 'exiftool file.jpg' from the command line and you will be presented with the information available in the file, which can include things like GPS positions, camera make/model, software, comments, etc.

General usage on command line ;
exiftool matrix.jpg

There are a huge number of options possible with exiftool, and it is a fantastic tool to manipulate information in image files.
Check out the links at the bottom of this post for further information.

Hex Data
Nearly all files have a so-called 'header' and 'trailer', some file types only have a 'header'.
The header and trailer of files are sections of the file which identify the file type so Operating Systems understand what fileformat they are dealing with.
These headers and trailers are typically unique for the file type and a good resource for checking the file signatures is ;

So when we have a file to examine, for instance a JPG file, open it with a Hex Editor (I am using Windows based HxD Hex Editor) and have a look at the file headers and trailers.

Image file matrix.jpg;

You will see that the file starts with 'FF D8 FF' and ends with 'FF D9'.

Image file header;

Image file trailer;

If there is extra data after the trailer FF D9, then it is possible that there is some sort of extra data to be found.
The information after the FF D9 trailer can give you an idea of what the extra information could be. In the below example the information after file trailer FF D9 starts with a known file header '50 4B 03 04' (PK.. in ASCII format), so it would appear that there is a zip file appended to the JPG.

So by checking the information in a Hex editor you can quickly see whether the file appears to be what it is supposed to be, or whether something looks out of the ordinary.
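
Without a Hex editor at hand, the same quick check can be sketched on the command line. The below assumes bash with od and tail available; first_bytes, last_bytes and check_jpg are names I made up for the example, and a JPG that does not end with FF D9 is only a hint that something was appended, not proof ;

```shell
# first_bytes FILE N: first N bytes of FILE as lower case hex, no spaces
first_bytes() { od -An -v -tx1 -N "$2" "$1" | tr -d ' \n'; }

# last_bytes FILE N: last N bytes of FILE as lower case hex, no spaces
last_bytes() { tail -c "$2" "$1" | od -An -v -tx1 | tr -d ' \n'; }

# check_jpg FILE: report whether FILE starts and ends like a plain JPG
check_jpg() {
    if [ "$(first_bytes "$1" 3)" != "ffd8ff" ]; then
        echo "no JPG header (FF D8 FF) at the start"
        return 1
    fi
    if [ "$(last_bytes "$1" 2)" = "ffd9" ]; then
        echo "starts with FF D8 FF and ends with FF D9"
    else
        echo "JPG header found, but the file does not end with FF D9"
    fi
}

# Example call:
#   check_jpg matrix.jpg
```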

With many file formats it is possible to attach file information which can later be retrieved based on the above principle of files having headers and trailers.



For instance in Windows with 'Command Prompt';
copy /b kitty-hack1.jpg + kitty-hack2.jpg kitty.jpg

In Linux with 'cat' (lolz, no pun intended ;) ) ;
cat kitty-hack1.jpg kitty-hack2.jpg > kitty.jpg

The above commands will append the data from kitty-hack2.jpg to the data from kitty-hack1.jpg and write
the result to the output file kitty.jpg.

When checking the kitty.jpg in a Hex Editor you will see that there is a JPG trailer 'FF D9' followed by a JPG header 'FF D8 FF' of the second file.
So although it looks like 1 file, there are in fact 2 files which can be confirmed by this check in a Hex editor.
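
The search for a second header can also be sketched without a Hex editor. This is a slow but simple illustration assuming bash, od and awk; find_sig is a hypothetical helper name, and the signature bytes must be given in lower case hex ;

```shell
# find_sig FILE BYTE...
# Print the byte offset (starting at 0) of every place in FILE where the
# given signature occurs, e.g.  find_sig kitty.jpg ff d8 ff
find_sig() {
    local file=$1
    shift
    # od prints the file as hex bytes, tr puts every byte on its own line.
    od -An -v -tx1 "$file" | tr -s ' ' '\n' | awk -v sig="$*" '
        BEGIN { n = split(sig, want, " ") }
        NF == 0 { next }                  # skip empty lines
        {
            byte[pos % n] = $1            # ring buffer of the last n bytes
            pos++
            if (pos >= n) {
                ok = 1
                for (i = 0; i < n; i++)
                    if (byte[(pos - n + i) % n] != want[i + 1]) { ok = 0; break }
                if (ok) print pos - n     # offset where the signature starts
            }
        }'
}
```

More than one offset for 'ff d8 ff' is a hint that a second JPG may be appended, although embedded thumbnails can give the same picture.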

I can't post the kitty.jpg here as the photo sharing site I use (Photobucket) removes any extraneous information after the first found trailer on jpeg files.
This 'limitation' could be bypassed though by converting the 1st image to .bmp format (which has no trailer) and copying the 2nd image to the bmp file ;
Result after the conversion of the 1st image to .bmp format and then appending kitty-hack2.jpg with the above mentioned copy /b method ;
copy /b kitty-hack1.bmp + kitty-hack2.jpg kitty.jpg

Just by looking at the hex you would see something is up, and with a search for the JPG header 'FF D8 FF' in the above image you will find that the 2nd file is appended.

This method of hiding files/data by copying/appending to files can be done with many filetypes.
You can also for instance place files in a zip archive and copy this archive to an image in the
same way (zip secret files, append to image file, image file can then also be opened with Archive tool);
copy /b image.jpg + hidden.jpg

Of course it should be noted that this is not at all a secure way of hiding information, but for those not in the know, there is no indication that extra information is even there.

For instance I found this one online a while ago, can't remember where, but in any case if anyone objects to it being posted here, just say the word and I'll take it down. (image/content not made by me)
Below is a .png file and contains a .rar file with content.

Steganography is the method of hiding information in a file in a way that only the recipient
of the file should know about or be able to extract.
There are quite a few programs out there that can do this, but none are really maintained.
Steghide is a popular one which can hide information in various filetypes (JPG/BMP/WAV/AU)
and is installed on most PenTest distros.

If for instance you have a file 'passwords.txt' and want to hide it in an image 'forest.jpg'
you would run steghide as follows ;
steghide embed -cf forest.jpg -ef passwords.txt
You can also specify a different filename for the output using -sf ;
steghide embed -cf forest.jpg -ef passwords.txt -sf forest1.jpg
You will be prompted to enter a password which you can do or else leave blank for no password.

To later retrieve this information you would run steghide as follows ;
steghide extract -sf forest1.jpg
You will be prompted for a password, if there is none, simply hit enter to leave blank
and steghide will attempt to extract the hidden data.



Another interesting one is 'stepic', which is not installed on stock BT5R3 but can easily be installed with ;
apt-get install python-stepic

stepic uses LSB (least significant bit) methods to hide any data in an existing png file. It does not include a password / encryption option and so is not a secure method, but works fine to hide data.

stepic -h

File Carving is the process of extracting files from data based on headers and trailers.
This is usually done on whole disc images, mainly for data recovery purposes on for instance damaged or unmounted drives requiring data extraction.
Some programs used for such operations on Linux and available on most PenTest distros are for instance 'foremost' & 'scalpel'.

The same principle however can be used on a single file if it appears that extra information is available within the file.

So if you find a JPG file (file1.jpg) with extra information after the expected trailer,
you could cut the first part of the file from the header 'FF D8 FF' (start of JPG file)
until the end of the JPG file (denoted by 'FF D9').

In your Hex editor select and cut the data away from 'FF D8 FF' (the first one if there is more than one)
up to and including 'FF D9' and save the file.
(you could then paste the cut section to a new file in HxD and save as file2.jpg to see
whether it matches what you saw in initial JPG file for verification)

You should then have a stripped file which you can then check again for file properties.
This file may have different properties, and so again you may have to look for headers and trailers.
(is it a different filetype ? check headers and trailers with the aforementioned file signatures link)

The above sequence is just a simple example. The data you have may well require different methods,
but this example simply shows how you can 'carve' one file away from another
when dealing with simple appended files.
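
The same cut can be sketched on the command line once you know the offset where the appended data starts (for instance from the Hex editor). This assumes head and tail with the -c option; carve_head and carve_tail are made-up names for the example ;

```shell
# carve_head FILE OFFSET OUT: copy the first OFFSET bytes of FILE to OUT
carve_head() { head -c "$2" "$1" > "$3"; }

# carve_tail FILE OFFSET OUT: copy everything from byte OFFSET onwards
# (counting from 0) to OUT. tail counts from 1, hence the + 1.
carve_tail() { tail -c +"$(( $2 + 1 ))" "$1" > "$3"; }

# Example, with 12345 standing in for the offset just after 'FF D9' :
#   carve_head file1.jpg 12345 file2.jpg
#   carve_tail file1.jpg 12345 stripped.bin
```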

Some basic encoding and/or encryption can also be used to further obfuscate the hidden data.
The below examples are very weak methods of doing such, however it is simply to show how
data can further be made difficult to retrieve if you are not aware of the methods used to hide it.

base64 is a method to convert binary data to ASCII characters.
This could be used to for instance append data to an image file in ASCII form even further obfuscating the data.

base64 is installed on most linux distros, to use simply ;
base64 inputfile > outputfile
to decode ;
base64 -d filein > fileout
On Windows you could download base64.exe from
base64.exe -e inputfile outputfile
to decode
base64.exe -d filein fileout
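
Appending base64 data to an image could look roughly like the below. This is only a sketch assuming GNU base64 (for the -w 0 option); hide_b64 and reveal_b64 are hypothetical names, and the data is of course in no way protected ;

```shell
# hide_b64 COVER SECRET OUT
# Copy COVER to OUT and append SECRET as one single base64 line.
hide_b64() {
    cat "$1" > "$3"
    printf '\n' >> "$3"
    base64 -w 0 "$2" >> "$3"    # -w 0: no line wraps in the output
    printf '\n' >> "$3"
}

# reveal_b64 FILE: decode the last line of FILE again
reveal_b64() { tail -n 1 "$1" | base64 -d; }

# Example calls:
#   hide_b64 image.jpg passwords.txt image2.jpg
#   reveal_b64 image2.jpg
```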

ROT13 is an 'encryption' that basically moves all letters of the alphabet up 13 letters, a variant of the Caesar shift cipher.
ROT5 is the same method based on moving numeric values up 5 numbers.
Using them together is sometimes referred to as ROT18.

It is easy to identify and to encode or decode, even more so if you have a reference of some kind, so it is
NOT a secure encryption method, but fun to play with.

If you were to see a line of text like ; uggc://jjj.paa.pbz
You can see that there are similarities with a normal web address, but the wording/letters don't appear to match up to what you would expect.
Run a ROT13 script over the line uggc://jjj.paa.pbz ;
echo uggc://jjj.paa.pbz | tr a-zA-Z n-za-mN-ZA-M
and you will find the outcome ;
http://www.cnn.com

If there are digits there you could also include a ROT5 script and make it a ROT18.

So a quick and dirty ROT18 one-liner could look like the below ;
encoding with ROT18
echo "My birthday is 01-01-1900" | tr a-zA-Z0-45-9 n-za-mN-ZA-M5-90-4
Zl oveguqnl vf 56-56-6455

Decoding with ROT18
echo "Zl oveguqnl vf 56-56-6455" | tr a-zA-Z0-45-9 n-za-mN-ZA-M5-90-4
My birthday is 01-01-1900

I made a quick and dirty rot18 encoding/decoding script for shits and giggles should it be of interest
which can be run on either input or on a file.

A similar, but more elaborate variation is the rot47 encryption.

Same as the above, a simple script on rot47 encoding / decoding ;

Very basic, but does the job.
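
For reference, rot47 can also be sketched as a tr one-liner in the same way as the ROT13 examples (an illustration assuming a standard tr, with rot47 as a made-up function name; this is not the script mentioned above). It rotates the 94 printable ASCII characters from '!' to '~' by 47 positions, so running it twice gives back the original ;

```shell
# rot47: read text on stdin and rotate all printable ASCII characters
# ('!' to '~') by 47 positions. Spaces and newlines are left alone.
rot47() { LC_ALL=C tr '!-~' 'P-~!-O'; }

echo 'Hello World' | rot47
echo 'Hello World' | rot47 | rot47
```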

Consider this scenario of keeping/sharing your passwords ;
-> Create your passwords.txt
-> Zip passwords.txt and password protect it to
zip -e passwords.txt 
-> Encode with base64 to pass.base
base64 -w 0 > pass.base
(the '-w 0' prevents line wraps, which makes it easier to add to an image comment.)

You could even ROT18 the file to further obfuscate the data ;
ROT18 encode the base64 file
cat pass.base | tr a-zA-Z0-45-9 n-za-mN-ZA-M5-90-4 > pass.rot
-> Find or create a nice image that would not arouse suspicion.
As the amount of data is so small, it can be included in the image comment, which is probably safer as if you are using image hosting websites they may strip off superfluous info from the jpg.
-> Add the data to the image comment using exiftool.
info=$(cat pass.rot) ; exiftool performance.jpg -comment="$info"

-> Upload image file to photo or file sharing site and send yourself the link or whatever is appropriate.

Performance.jpg ;

Retrieval ;
exiftool performance.jpg -b -comment > out.put
ROT18 decode the file
cat out.put | tr a-zA-Z0-45-9 n-za-mN-ZA-M5-90-4 > rot.out
Decode the base64
base64 -d rot.out >
Unzip the created .zip file.

(password hint; worst 500 passwords)
fcrackzip -Dp worst_500_passwords.txt -uv

A highly cumbersome and not terribly secure method of doing something simple, but still food for thought on what is possible on hiding information in 'plain sight'.
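
The base64 + ROT18 part of the above can be wrapped up in two small functions. This is a sketch assuming GNU base64; rot18, wrap and unwrap are names invented for the example, and the zip and exiftool steps are left out ;

```shell
# rot18: ROT13 on letters plus ROT5 on digits; it is its own inverse.
rot18() { tr 'a-zA-Z0-45-9' 'n-za-mN-ZA-M5-90-4'; }

# wrap FILE: base64 encode FILE on a single line, then ROT18 the result.
wrap() { base64 -w 0 "$1" | rot18; }

# unwrap: read wrapped text on stdin, undo the ROT18, decode the base64.
unwrap() { rot18 | base64 -d; }

# Example calls (file names are placeholders):
#   wrap archive.bin > pass.rot
#   unwrap < pass.rot > archive.out
```

ROT18 only swaps letters and digits around, which all stay within the base64 alphabet, so the wrapped text still looks like ordinary base64 at first glance.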

Team THS Challenge 
I made the below file for the Top Hat Security team for an article on this same subject in our members magazine, based on the above possibilities.
See what you can discover and post the outcome here or on the THS forums !

Download the challenge (challenge.jpg) here ;

Edit dd 04-02-2013
No takers / results on the above THS challenge.jpg file  ?!

Some hints then..
The file challenge.jpg contains a password protected zip file, contents of which can be
extracted with a password which can be found in the challenge.jpg image data..

The challenge.jpg actually has 4 images (including challenge.jpg) and the final outcome
of the challenge should be a text file starting with ;

Well done ! 

Challenge complete, hope it was enjoyable !

All required processes are described in the above post, but if you're stuck, leave a comment
with what you have done / tried and I will see if it merits a response ;)

Top Hat Security
Exiftool GUI
Exiftool forums
HxD Hex Editor
Stegdetect - Outguess
File Signatures

Some incredibly annoying challenges can be found here ;
