Sunday 24 July 2011

Wordlist manipulation revisited

Work In Progress !

Word List Manipulator
==================
A script to facilitate the commonly used options to make an existing 
wordlist more to your liking.. 
Downloads below and based on using the script in Backtrack although it should work
in most Linux environments.

Google code + WIKI ;  http://code.google.com/p/wordlist-manipulator/

Edit 21-10-2012
Release of WLM v0.7 ;
http://www.mediafire.com/file/p1tn76qw95hobi4/wlm
Video using WLM in BackBox ;
http://www.youtube.com/watch?v=FpflByHLp1I




--------------------------------------------

INTRO
After my posts from just over 2 years ago (wow... thought I would have learned more by now .. )
I thought it would be a good idea to have another, more detailed post on wordlist manipulations based on 'simple' one-liners or simple scripts (sometimes 1 line just doesn't cut it) which can be run over the wordlist.

For some reason I always manage to forget the best way to do the simplest of things using sed and the like, so this is as much a reference for me, as it is hopefully some help to those looking for quick answers !
My intention is that queries on wordlist manipulation posted in the comments are looked at and tested
and then, I will try to post the best solution in doing same.

There will be quite a bit duplication from the previous post on wordlist manipulation, but no harm in that,
I find myself returing to 'old' info all the time..



MANIPULATING WORDLISTS

When you have a wordlist, it often needs fine-tuning or alteration of some kind in order to get the
most out of it, sometimes heavy-duty alteration, other times minor adjustments such as splitting the wordlist into manageable sizes or capitalizing the first letter for instance.

The below examples are based on wordlists that have already been created and need some sort of tweaking or fine tuning.
Of course you can create wordlists from scratch how you like with for instance crunch, however this post is meant solely for altering existing wordlists.

Note that the below examples all done on BackTrack5 and not tested on any other OS.
(although most commands should work on most linux based OS')

SPLITTING WORDLISTS

One of the main issues with wordlists is that they can get hellish big.. and you may need to split them for;
>  for easy storage on portable drives,
> some programs only accept a certain maximum wordlist size,
> distributing segments of the wordlists to have tested by others,
etc.
etc.

First thing to do is to check the size of the file and how many lines(passphrases) are in it so you can estimate
how you can best split it.
In this case using a 6 digit wordlist with lowecase alpha values only.
Check the size of the wordlist ;
For info on size in bytes ;
du -b wordlist1.txt
or
Simple view of size in 'human readable' format (eg. 100K, 100M, 100G);
du -h wordlist1.txt

Get the linecount of the wordlist ;
wc -l wordlist1.txt

So in the above example the size is around 112MB and there are 16777216
lines (so 16777216 passphrases).
When using split to split wordlists, it is best to use split by line count, so that you don't accidentally split the actual words as can happen when you split by size.

Lets say we want to split that file into 3 wordlists, then the above file would need to be split into files containing +-5.500.000 words each.
If you are too lazy to work the little grey cells, let 'bc' do the work for you so you can make an educated guess on how many lines you want to have per split wordlist ;

echo "16777216 / 3" | bc

split -d -l 5600000 wordlist1.txt split-list
-d == giving a numeric suffix to the created split-list prefixes
-l ==  giving the number of lines you want each file to have as a maximum
wordlist1.txt is the input wordlist
splitlist is the prefix for the newly created split files.





















JOINING/COMBINING WORDLISTS

To actually combine seperate wordlists to one list, you can use the 'cat' command as follows ;
cat wordlist1.txt wordlist2.txt > combined-wordlist.txt

Depending on the size of your wordlists this can take a wee while..

You can also combine all .txt files in a directory to one larger file ;
cat *.txt > combinedlists.txt



CHANGING THE 'CASE' OF LETTERS IN A WORDLIST

Changing characters in a wordlist at a given position to either lower case or upper case is a frequent necessity.
Of course wordllists can easily be created with the required case in the required position (see my post on using the awesome crunch) however if you have an existing wordlist (which this post is all about) and need
to adjust the cases as required, this is (one of the ways) how to go about it.

CAPITALIZING FIRST AND/OR LAST LETTERS

First letter;
sed 's/^./\u&/' wordlist.txt


Last letter;
sed 's/.$/\u&/' wordlist.txt


CHANGING LETTERS TO LOWER / UPPER CASE

Changing the first letters of all entries to upper case ; 
sed 's/^./\u&/' wordlist.txt

Changing the last letter of all entries to upper case ;
sed 's/.$/\u&/' wordlist.txt

Changing the first letter of all entries to lower case ;
sed 's/^./\l&/' wordlist.txt

Changing the last letter of all entries to lower case ;
sed 's/.$/\l&/' wordlist.txt

Changing all upper case to lower case letters;
tr '[:upper:] ' '[:lower:]' < wordlist.txt

Changing all lower case to upper case letters;
tr '[:lower:]' '[:upper:]' < wordlist.txt


Inverting the case in the words ;
tr 'a-z A-Z' 'A-Z a-z' < wordlist.txt
or
sed 'y/abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/' wordlist.txt


PREFIXING CHARACTER(S)/WORDS TO WORDLISTS

To prefix the word "test" to all lines in the wordlist ;
sed 's/^./test/' wordlist.txt
or
awk '{print "test" $0 }' wordlist.txt


PREFIXING NUMERIC VALUES TO WORDLISTS

To prefix 1 digit in sequence from 0 - 9 ;
for i in $(cat wordlist.txt) ; do seq -f %01.0f$i 0 9 ; done > numbers_wordlist.txt

To prefix 2 digits in sequence from 00 - 99 ;
for i in $(cat wordlist.txt) ; do seq -f %02.0f$i 0 99 ; done > numbers_wordlist.txt

To prefix upto 2 digits in sequence from 0 - 99 ;
for i in $(cat wordlist.txt) ; do seq -f %01.0f$i 0 99 ; done > numbers_wordlist.txt

To prefix 3 digits in sequence from 000 - 999
for i in $(cat wordlist.txt) ; do seq -f %03.0f$i 0 999 ; done > numbers_wordlist.txt

To prefix upto 3 digits in sequence from 0 - 999 ;
for i in $(cat wordlist.txt) ; do seq -f %01.0f$i 0 999 ; done > numbers_wordlist.txt



SUFFIXING CHARACTER(S)/WORDS TO WORDLISTS

To suffix the word "test" to each line in the wordlist ;
sed 's/.$/test/' wordlist.txt
or
awk '{print $0 "test"}' wordlist.txt


SUFFIXING NUMERIC VALUES TO WORDLISTS

To suffix 1 digit in sequence from 0 - 9
for i in $(cat wordlist.txt) ; do seq -f $i%01.0f 0 9 ; done > wordlist_numbers.txt

To suffix 2 digits in sequence from 00 - 99
for i in $(cat wordlist.txt) ; do seq -f $i%02.0f 0 99 ; done > wordlist_numbers.txt

To suffix upto 2 digits in sequence from 0 - 99 ;
for i in $(cat wordlist.txt) ; do seq -f $i%01.0f  0 99 ; done > wordlist_numbers.txt

To suffix 3 digits in sequence from 000 - 999 ;
for i in $(cat wordlist.txt) ; do seq -f $i%03.0f 0 999 ; done > wordlist_numbers.txt

To suffix upto 3 digits in sequence from 0 - 999 ;
for i in $(cat wordlist.txt) ; do seq -f $i%01.0f  0 999 ; done > wordlist_numbers.txt



INCLUDE CHARACTERS AT SPECIFIC POSITION

To include the word "test" after the first 2 characters ;
sed 's/^../&test/' wordlist.txt
or
sed 's/^.\{2\}/&test/' wordlist.txt


To include the word "test" before the last 2 characters ;
sed 's/..$/test&/' wordlist.txt
or
sed 's/.\{2\}$/test&/' wordlist.txt


REPLACE X NUMBER OF CHARACTERS FROM START OF WORDLIST

To replace the first character of each word with "test" ;
sed 's/^./test/' wordlist.txt

To replace the first 2 characters of each word with "test" ;
sed 's/^../test/' wordlist.txt

To replace the first 3 characters of each word with "test" ;
sed 's/^.../test/' wordlist.txt 
or
sed 's/^.\{3\}/test' wordlist.txt


REPLACE/SUBSTITUTE X NUMBER OF CHARACTERS FROM END OF WORDLIST

To replace the last character of each word with "test" ;
sed 's/.$/test/' wordlist.txt

To replace the last 2 characters of each word with "test" ;
sed 's/..$/test/' wordlist.txt

To replace the last 3 characters of each word with "test" ;
sed 's/...$/test/' wordlist.txt
or
sed 's/.\{3\}$/test/' wordlist.txt



REPLACE/SUBSTITUTE CHARACTER(S) AT A CERTAIN POSITION

To subsitute the third character of each word in the wordlist ;

sed -r "s/^(.{2})(.{1})/\1test/" wordlist.txt
or
sed 's/^\(.\{2\}\)\(.\{1\}\)/\1test/' wordlist.txt

To subsitute the third and fourth character of each word in the wordlist with "test" ;
sed -r "s/^(.{2})(.{2})/\1test/" wordlist.txt

To subsitute the fourth character of each word in the wordlist with "test" ;
sed -r "s/^(.{3})(.{1})/\1test/" wordlist.txt

To subsitute the fourth and fifth character of each word in the wordlist with "test" ;
sed -r "s/^(.{3})(.{2})/\1test/" wordlist.txt


NOTE! 
If the number of characters that are to be replaced are actually more than there
are characters in the word, the word will remain unaltered.
So if doing 
sed -r "s/^(.{3})(.{2})/\1test/" wordlist.txt
4 character letters such as the word 'beta' would not be altered as there is no fifth character. 


REVERSE THE DIRECTION OF THE WORDS IN WORDLIST

rev wordlist.txt




REMOVING WORDS WHICH DON'T HAVE 'X' NUMBER OF NUMERIC VALUES

To remove words from wordlist.txt that do not have 3 numeric values

nawk 'gsub("[0-9]","&",$0)==3' wordlist.txt


REMOVE WORDS WITH X NUMBER OF REOCURRING CHARACTERS

Under construction ;)






REMOVING WORDS WHICH HAVE MORE THAN 2 IDENTICAL ADJACENT CHARACTERS

sed '/\([^A-Za-z0-9_]\|[A-Za-z0-9]\)\1\{2,\}/d' wordlist.txt

sed "/\(.\)\1\1/d" wordlist.txt

To delete words with more than 3 identical adjacent characters ;

sed "/\(.\)\1\1\1/d" wordlist.txt


Some great bit of work from Gitsnik on manipulating wordlists to ignore words with


APPENDING WORDS FROM 1 WORDLIST TO ALL THE WORDS IN ANOTHER WORDLIST

See Wordlist Manipulator script at top of page


PERMUTE WORD / WORDLIST
To give all possible variations of a word / wordlist, fantastic bit of perl by Gitsnik ;
Copy / Paste the below and save as permute.pl
chmod 755 permute.pl to make executable.

to test on a single word (for instance "firewall") do ;
cat firewall | ./permute.pl
To test on a wordlist do ;
./permute.pl wordlist.txt


#!/usr/bin/perl

use strict;
use warnings;

my %permution = (
"a" => [ "a", "4", "@", "&", "A" ],
"b" => "bB",
"c" => "cC",
"d" => "dD",
"e" => "3Ee",
"f" => "fF",
"g" => "gG9",
"h" => "hH",
"i" => "iI!|1",
"j" => "jJ",
"k" => "kK",
"l" => "lL!71|",
"m" => "mM",
"n" => "nN",
"o" => "oO0",
"p" => "pP",
"q" => "qQ",
"r" => "rR",
"s" => "sS5$",
"t" => "tT71+",
"u" => "uU",
"v" => "vV",
"w" => ["w", "W", "\/\/"],
"x" => "xX",
"y" => "yY",
"z" => "zZ2",
);

# End config

while(my $word = <>) {
chomp $word;
my @string = split //, lc($word);
&permute(0, @string);
}

sub permute {
my $num = shift;
my @str = @_;
my $len = @str;

if($num >= $len) {
foreach my $char (@str) {
print $char;
}
print "n";
return;
}

my $per = $permution{$str[$num]};

if($per) {
my @letters = ();
if(ref($per) eq 'ARRAY') {
@letters = @$per;
} else {
@letters = split //, $per;
}
$per = "";

foreach $per (@letters) {
my $s = "";
for(my $i = 0; $i < $len; $i++) {
if($i eq 0) {
if($i eq $num) {
$s = $per;
} else {
$s = $str[0];
}
} else {
if($i eq $num) {
$s .= $per;
} else {
$s .= $str[$i];
}
}
}
my @st = split //, $s;
&permute(($num + 1), @st);
}
} else {
&permute(($num + 1), @str);
}
}


..
..
..
Please leave your comments, suggestions, mocking words of wisdom..etc.. so that the post can benefit from
the vast amount of knowledge out there.
 
Google Analytics Alternative