HMAC verification allowing faster decipher iterations

Discussion related to AES Crypt, the file encryption software for Windows, Linux, Mac, and Java.
Post Reply
damiank
Posts: 8
Joined: Wed Aug 01, 2012 4:58 pm

HMAC verification allowing faster decipher iterations

Post by damiank »

One thing I noticed going through the source code. AESCrypt has the ability to quickly verify whether or not the provided password is correct. Now, while being a nice feature, it also presents a significant problem of making the encrypted file far more vulnerable to brute force attack.

Here's why, if the password was NOT hashed with an HMAC inside the encrypted file, decryption software would have to guess the 32byte key, and decrypt the entire file, or a portion of the file, and then "magically determine" the file type to see if it indeed succeed at cracking the key. The software would have to look for dictionary words, etc, to see if it indeed had cracked the file. Given the numerous types of files encrypted with AES, this would take an extraordinary amount of time to do, because, there are potentially thousands of possible file types being encrypted and the number of cycles necessary to determine if the correct password (key via the hash/8192 iterations) has been found would be incredibly high because there would be no way for the decryption software to know whether or not it had guessed the correct password without performing some intelligent analysis on the output.

By storing the password in the file, encrypted/hashed, with an HMAC, yes, you have a fast way of checking whether or not the password is valid, BUT, the MASSIVE DOWNSIDE to this convenience is that it also makes it several thousand times more convenient for the decryption software to guess passwords.

Since the decryption software simply needs to perform the exact same password check that AESCrypt does during a dictionary/brute force/etc attack, there is no need to even look at the ciphertext at all. The decryption software will know immediately within a couple cycles whether or not it has guessed the correct password.

Personally, I would rather enter an incorrect password, and decrypt my aes file into a pile of jibberish, than get an immediate response saying "Bad Password".

Essentially, what you are doing here is enabling a "Quick Validation" of the password for decryption software.

Without this "feature" the decryption software would have to validate the jibberish produced for each possible key tried to validate whether or not the correct password was guessed. The evaluation of this "jibberish" would take an estimated millions of even billions of extra cycles to analyze the output from the guessed decryption key.

I can see the need for the convenience for knowing whether your password is correct or not for website logins, etc, and basic public and non-secure data. But, for personal cryptography files where security is paramount, giving decryption software a major leg up by letting it know right away whether a password is good or bad is a bad idea IMHO.

Personally, I use the Rijndael 256bit 128block to handle encryption/decryption in all of my software for persistent data.

Since my algorithms do not give any sort of easy spot check password/key validation, any decryption software would not only have to guess my key, but, it would then have to figure out what the heck it was looking at after decryption. In fact, most of the data which I encrypt is in essence a result of a binary serialization using a custom serializer, so, even if the attacker was successful at decrypting one of my "Binary Blobs", it would have no clue that it had actually guessed the correct key, because even after decryption, the file still looks like random jibberish.

I truly believe that due to the number of fileformats being encrypted using this software, removing the password check would make it thousands, if not millions of times more secure since the decrypt process would have to somehow identify the resultant decrypted file as a "known file type", or as "something not encrypted anymore".

Being that performing a simple hash/hmac check requires only a few cycles compared to intelligent analysis of decrypted content, you can see my argument here.

It would be extremely easy for me to right a brute force/dictionary attack against the .aes file format since all I would have to do is validate the password, not validate the resultant decrypted file(s). That attack would be very simple and be able to run trillions of iterations compared to software which not only had to guess the password, but, decrypt part of the cipher text and make an informative decision as to whether or not the decrypted ciphertext was in fact really decrypted.

I know this to be very true because just in my personal experience with my own software, since I don't have a quick and easy password check built into my ciphertext, there have been many times, where I decrypt something with the incorrect password/salt/IV and then the decrypted cipher text simpy doesn't work at all, When I look at the decrypted ciphertext in a hex editor, there is no way for me to tell if it is the original plaintext or not because it's just a binary blob of jibberish. So, with my own AES implementation, a hacker would have an impossible time decrypting one of my files since the binary jibberish before and after the decryption process would look the same.

Also, even if a hacker wrote software to specifically locate certain identifiers within by decrypted serialized blob so that it could "know" whether or not it was successful in decrypting my ciphertext, that would only work on that particular filetype. This, given that I use my AES 256 implementation on a wide variety of filetypes, even with a somewhat simple password, it would be nearly impossible for a hacker to decrypt my information, because of the numerous types of files encrypted.

Personally, if I were in your shoes, instead of offering up a simple way to verify the key to the decryption software, I would use a begin-end obfuscation technique which would be applied to the ciphertext FULLY before encryption/after FULL decryption and would require the entire file to be decrypted in order for the obfuscation software to serialize/deserialize the input/output. This way, after each key is tried, the entire ciphertext would have to be decrypted, then and only then could the obfuscation software be run to restore the original ciphertext.

That way, decryption software would not only have to guess the key,to validate it was working, it would have to completely decrypt the file, then run the obfuscation algorithm to be able to validate whether or not the key was correct. A simple streamed obfuscation software would be enough to make this happen if it required bytes from throughout the file to reproduce the original output. Just an XOR + some magic would be necessary, not much at all.

This way, instead of taking a billionth of a second to validate a key, it would take a second or two to validate the key. So, that would make decryption software work a billion times harder, or even more, to guess a single key.

So, with descent obfuscation on the input (inserting bytes all throughout the file, perhaps randomly, for de-obfuscation) and removal of the super easy password validations, I think it would be absolutely impossible to write a decryption mechanism for a file encrypted in such a way. If you think about it, writing some simple obfuscation mechanism wouldn't be that hard at all. The best part of all, is that by adding a simple obfuscation mechanism, the hacker/cracker would literally need to decrypt the entire file to even begin checking if the key was valid. This would take an eternity per guess compared to the super easy password check mechanism currently built into the software.

I love the software, but, I just think the password check convenience would make any attempts at decryption actually viable in the eyes of the cryptoanalyst. Personally, if someone gave me the format I described above, and told me to write a brute force attack against, I would take a pass. Any government agency, unless completely ordered to, would take a pass at something so futile. But, with the way it is now, I could write a brute force attack in minutes. Cracking the key is made so much easier by the ability to instantly know whether or not I've got the correct key.

Crypto software isn't just about keylength and complexity. Once could have a very simple key, yet with a full file obfuscation technique, it could potentially take an attacker years to guess it because of all the work necessary to check each key.

My personal contribution to cryptography was when I wrote something called WepBrute during the times when AirSnort and Kismet were popular. My approach was much different. Instead of looking for "weak 4's" in a lot of packets, I simply wrote an extremely intelligent dictionary attacker which used combinations of names, dictionary words, words with capitalized first and last letters, Number-Letter substitution, i.e. I looked for 3 instead of E or e in a word, I also would do all of the above with a single non-alpha character at the beginning of the end of a password. So many people do that it's almost comical.

Anyway, using AisSnort I tried cracking several networks with no success because they did not send enough traffic to capture and take advantage of the "weak 4's",

After writing WebBrute, I was able to crack about 80% of these exact same networks in literally seconds. All I needed was a single packet, and I was off and running. It was amazing software. Literally, I would go in for security audits and start reading off the WEP keys within seconds 80% of the time.

Granted, that was also because of the structure of the WEP key system, and how easy it was for a quick password verification against the cipher text. Since AES is not structured in that manner at all, there is no quick way to verify passcodes, except in your implementation.

The software works great, etc, but, I just think you've made it way too easy for crypto people to crack the passcodes, it's like giving them a slice of pie before they even start.

I say get rid of the password validation check and let the crackers suffer and decrypt the entire file with each key and then check the jibberish to see if it recreates a file, then after that, they have to identify if it's a valid file type! Basically, it would be impossible to crack. It would simple take too long. Your talking about a zillion lines of code, not to mention the massive amount of File IO necessary to fully decrypt everytime to check a key. What if it was a RAR file that was a few gigs, it would take forever to check even a single key.

You should at least make it an option for the user. Build it directly into AESCrypt, it would be very very simple to do, you would simply take the password, iterate and hash it 8192 times, then just encrypt with the IV and salt. On the flip side, you would do the same, Then if you wanted to really create havoc for any cryptographers trying to crack your AES files, you would simply put in a randomized byte obfuscator so that the file would look to be complete jibberish because of the random encoding blocks stuck throughout the file, from beginning to end. This way the cryptographer couldn't simply divide up the bytes and run some kind of sequence check at the first, last, 10th, 20th, byte, etc. Rather, make the obfuscation randomized whereas the byte sequence would simple identify the XOR'ish hash and the location of the next byte sequence. And since it would be random, the sequence could potentially exist on the 999th out of 1500 blocks and on the 1500th block, or it could be on the 2nd, 55th, and 99th out of a 100 blocks. The bottom line is that it would force the cracker to take a pass on the decryption. Personally, I wouldn't even attempt to crack something like that. Well, unless it was a certain supermodel's <blank> ... Then, maybe.

Thanks for great software,

Damian Kohlfeld
damiank
Posts: 8
Joined: Wed Aug 01, 2012 4:58 pm

Re: HMAC verification allowing faster decipher iterations

Post by damiank »

Just wanted to add that the block multiple from the Obfuscator portion would of course be 128bytes.

Depending on the implementation of course, if this was done, the decrypt procedure would pass with flying colors as I've just tested and implemented it myself. No matter what password I enter, I get a successful decrypt as long as the multiple is on target.

I just wanted to double check my statement earlier, and yes, no matter what password I put in, it decrypts fully and without error. Thus, if AESCRYPT made it an option to not embed the password check or any kind of passcode check, and the implementation used blocksize multiples it should work. On the other hand, I'm thinking that if using CBC mode, then there is a chance that there would be a failure on decrypting the last block. Perhaps GCM would be the way to go then. Or, perhaps not, perhaps parallel encoding would enable last block testing from the getgo. Just a thought.

Even if a failure occured on the last block, it still would make an incredible amount of difference when cracking the source file. The idea stands. Without the password, file, checking, attacking the .aes format would prove to be impossible, simply because of the resources involved in testing each possible key.

Bottom line, if just checking the digest of a password, the AESCrypt implementation makes it far to cheap resource wise to brute force, or do an intelligent dictionary attack.

This should definitely be a user configurable option as to whether or not make it simple to brute force the password. I personally think that while length and randomness are important things, making it almost impossible to validate each key is a much more worth advantage. If the CBC mode fails on the last block, which it presently is not doing in my implementation, but, I can see how it could, It would still make a tremendous difference in level of difficulty for the attacker.

Just noticed this was a Voip site, oh boy, let the flaming begin... of all the luck. geez.

Damian Kohlfeld
User avatar
paulej
Posts: 593
Joined: Sun Aug 23, 2009 7:32 pm
Location: Research Triangle Park, NC, USA
Contact:

Re: HMAC verification allowing faster decipher iterations

Post by paulej »

If a user uses a trivial password like "secret", then the password can be easily discovered with a dictionary attack. This would be true no matter whether we store the session IV and key in the file or not. What you're arguing is that we are able to increase security through obscurity. Specifically, you're arguing that by making it easy to determine that a password works weakens the security.

That's not true. The strength of the encrypted file is dependent on the strength of the password. If one uses a 384-bit password, for example, the key used to encrypt the session IV and key would be virtually impossible to crack. Basically, one would have to perform a brute-force attack on the 256-bit key used to encrypt the session IV and key.

So, let's assume we did get rid of the session IV and key and only used the password directly to encrypt the file, as we originally did with file file format version 0. In that case, the attacker could get the plain text readily if the password was "secret" or something so trivial. Now, the attacker would have to figure out that the decryted file is, as you said, actually the plain text. However, that's not a complex exercise for the vast majority of encrypted files. Most file types have a clearly identifiable fingerprint of some sort. See the UNIX "file" command as an example. Others are usually not random data, so just inspecting the first 128 octets or so can usually hint strongly as to whether the file is or is not garbage. If the file looks random (and this can be determined by automated analysis) then it would indicate the wrong password. Only if the decrypted data happened to be random (e.g., some other encryted data) would this be challenging.

As I said, avoiding use of HMAC and the session IV and key is just trying to provide security through obscurity. It will not work. If you wish to have strong security, the key (password in the case of AES Crypt) needs to be strong.
User avatar
paulej
Posts: 593
Joined: Sun Aug 23, 2009 7:32 pm
Location: Research Triangle Park, NC, USA
Contact:

Re: HMAC verification allowing faster decipher iterations

Post by paulej »

damiank wrote:Just noticed this was a Voip site, oh boy, let the flaming begin... of all the luck. geez.
And we use encryption for voice and video communications, too. And?
damiank
Posts: 8
Joined: Wed Aug 01, 2012 4:58 pm

AESCrypt basic brute crack 1000x faster, proof here.

Post by damiank »

Proof of Concept AESCrypt Cracks 1000x faster if it's gonna crack.

True, but, 1000x faster, is still 1000x faster. My argument is very sound. If you have a distributed brute force cluster with FCPGA, or ASIC cards handling the SHA, and a lot of engineering is put into batching, brute force cracking prefaced with an intelligent dictionary attack can be very effective. Along with extensive personal bio on the target.

Without any special code, or any special modifications, I clearly prove here that when using brute force, it will crack AESCrypt at least 1000x faster than a non-password-check embedded format.

1000x is a lot. 1000 years becomes 1 year, a 1000 days becomes a single day.

Point being, if I need to crack some AES files, and they're in different formats, I'm definitely going to start with an AESCrypt file format > 0. I'd be crazy not to, if there are multiple crypto mechanisms on the target's machine. Usually the target uses the same password, or password style across different media's. Most people have no clue what a key manager is, or a keyring. So, they just use variations of the same thing over and over. From my experience. So, if a target has an AESCrypt file and a LUKS partition, OpenSSL AES File, TrueCrypt file, I'm definitely without a doubt hitting the AESCrypt file first.

If I'm doing brute, and doing it well, meaning distributed, ASIC Algo's, AES NI, etc, with heavily parallel tasking, which I've done many times before, with much success DUE MAINLY TO WEAK PASSWORDS OR PASSWORDS RESEMBLING THE ENGLISH LANGUAGE OR MAINLY WITH ELEMENTS OF PERSONAL INFO!!!

Even the slightest resemblance to anything personal especially. Birthdays, DL numbers, SSN pieces, ATM PIN Number pieces, TX Number pieces, Zip Code pieces, etc. I've spent 20 years consulting, and on many many occasions I was hired for forensics. Specifically to decrypt something or to un-scrub/restore files. Mainly from hostile employees, and those jobs, by far paid the most on success. About 20x more than my normal rate. I not only used my very extensive gigabyte dictionary base, but, I had hundreds of gigs of mixed data matching pieces or entire portions of randomized personal information in conjunction with dictionary words which I would build at runtime from the extensive BC's of the targets provided to me, that was always my first request.

In my experience, 80-90% of people use some form of dictionary word, or some form of personal information in their passwords. Now, I don't know about the rest of the world, but, in my personal experiences, I was only unsuccessful about 5-10% of the time, when I had a full bio of the individual. Without a full BC, I would say my average was about 70%. I'm under so many freakin NDA's, thats about all I can share. I can say this though, these people were very smart, educated on average with a Masters or better. Smart people, but, oh so dumb when it came to picking passwords. I would love to be able to list a few, some of them are hilarious. What is it with people constantly inserting !'s and other shift-key's, or spelling backwards, or adding digit replacement, or using ZIP codes, ATM pins, TX numbers, yeah, phone numbers are ever so common for some reason? People must be drawn into using personal bio stuff for passwords. I have no clue why, but I've seen it so many times, especially initials. What is it with initials? Oh, and 123, 1234, 12345, saw that a lot too,

In this basic test I demonstrate without any efficiency measures taken that cracking an AESCrypt encoded file is ~1000 times faster than using the same format by OpenSSL (OpenSSL doesn't include built in password check, which makes it 1000x stronger against brute than AESCrypt),

To demonstrate how much more successful a cracking attempt at AESCrypt will be than an attempt at an AES256 encrypted file without a builtin password check, I created 2 encrypted files using the same source file of random data just over a gigabyte in length.

I used AESCrypt to encrypt the file to Random5.aes

I used OpenSSL to encrypt the file to Random5.enc

I created 2 scripts to attempt 50 failed iterations on the file. Since openssl has error detection built into checking the last block, I specifically chose their C implementation to test against AESCrypt.

I did not modify the source code in any way with either program. OpenSSL has a very heavy error library, but, I left that in anyway to keep the test portable and simple.

I used linux on an old AMD Server to test 50 failed password guesses on both OpenSSL and AESCrypt to show the difference in cracking times for AESCrypt and OpenSSL.

For 50 iterations of AESCrypt on the file it took:

(damian@hsp1:~) time ./script1.sh &>/dev/null

real 0m0.693s
user 0m0.664s
sys 0m0.029s

For 50 iterations of OpenSSL on the file it took:

(damian@hsp1:~) time ./script2.sh &>/dev/null

real 11m0.564s
user 8m14.255s
sys 2m26.734s

Script1 executed 953.19 x faster than Script2

This means that without creating specialized code, just using AESCrypts built in password checking code, I can build a distributed cracking system which will be able to crack AESCrypt encoded files 1000x faster than a traditional AES256 encrypted file which does not have any soft of built in password check. Whether it's a 1000 years or a 1000 days, doesn't matter, a 1000x faster is a 1000x faster. Thats a lot.

So, if it takes 1000 days to crack an AES256 file, that same file, encrypted with AESCrypt will only take a single day.

Yeah, this is a very big deal.

----Script1.sh ------
#!/bin/bash
i=0;
while [[ $i -lt 50 ]]; do
i=$i+1;
aescrypt -d -p ddfsa -o random5.dec random5.aes;
done

----Script2.sh ------
#!/bin/bash
i=0;
while [[ $i -lt 50 ]]; do
i=$i+1;
openssl enc -d -pass pass:ddfsa -out random5.deco -aes-256-cbc -in random5.enc;
done

To create the file, I created a 1 gigabyte file from /dev/urandom with dd, then encrypted it with AESCrypt with:

aescrypt -e -p a -o random5.aes random5

For openssl:

openssl enc -e -pass pass:a -in random5 -out random5.enc -aes-256-cbc


Test it yourself, this is completely non-optimized, and perhaps the slowest of possible cracking techniques. But, if just the base programs show a 1000x difference in speed, I would say some serious work needs to go into the AESCrypt format.

1000x faster is a big deal. I know, if your password is Godlike, then no, it's not such a big deal, but how many of you really have Godlike passwords.

I remember doing a security audit on a Unix network, and they wanted me to recover all the shadow passwords I could for the entire firm. ~4500 people.

I nailed about 75% of them just with my wicked dictionary, substitution, evolving word jumbling generator.

Some of the passwords used were very embarrassing for the user when read aloud, especially the ones with the F#$# word. Yeah, those went over well for corporate. ;)

I bet with some serious optimization on code in hardware (ASIC's a lot of them) for the 8192 RSA iterations, etc, I could bring that number to 2000 or even 3000 times faster. Or, even by expanded formulae, through circuitry. Depends on how the RSA algorithm breaks down iteration wide, I've never tried to make an FCPGA, or an ASIC from that. You'd be very surprised at how much America spends on private consulting to recover encrypted data. I'm just talking about corporate America. I've only done Gov work a couple times.

A lot of the jobs were just part of penetration testing. It's so easy to land a job when you start reading off passwords 5 minutes after you hook up a sniffer, or drop a CD into a drive. Clients see that, and they hire you right away,

Damian Kohlfeld
damiank
Posts: 8
Joined: Wed Aug 01, 2012 4:58 pm

Re: HMAC verification allowing faster decipher iterations

Post by damiank »

The voip comment was something completely different. Someone will figure it out eventually. Then the flaming will begin. Lets see how long it takes.

Damian Kohlfeld.
damiank
Posts: 8
Joined: Wed Aug 01, 2012 4:58 pm

Re: HMAC verification allowing faster decipher iterations

Post by damiank »

Regarding the strength of the passwords being important, I'm not disputing that at all. But I will bet my life that the majority of the users of your software have passwords I could crack within a month tops. Thats how sure I am. I would also bet that anyone using scrub software has a cached copy of whatever they tried scrubbing on their machine they have no clue about. Especially if they run Windows. Good God, so many copies, of anything touched...

Keep in mind, I am the predator in this case, I use personal crypto because I got hacked once, but, i don't have anything important to hide, I get hired to crack crypto and recover files. The reason I'm even here today is because I'm getting paid to find weaknesses in a list of software. I just so happens though that I use this particular software myself too. My stuff is locked up with > 30 character passwords, maybe ;) , and I always use opensource, and believe it or not, a lot of crypto software uses static entropy embedded within the code. Crazy huh? True though, mostly it's closed source, stuff, or older stuff. You can see it plain as day, usually as some sort of static constant. Some byte array usually.

Now, I'm also a jack of many trades, the whole reason I started doing pen-testing and forensics as a primary source of income is because I once had a server at a clients location that got hacked. On this server I had CVS running. Now, a new consultant came in to work at this company and I of course had direct access to this server of mine, as I pretty much was in control of everything at this company. Anyway, the developer (consultant) wanted to push his repo to a box and go home to work on it there. So, I setup a quick account, "cvs", "cvs123". God's truth. Anyway, he used it, and I completely forgot about it. Now, up until this point I had done plenty of pen-testing/recovery, but not a whole lot. This happened after I authored "WEP BRUTE" the 90 second network cracker..

Well, about 3 months later NASA shows up with the Secret Service. Oh yeah, what fun that was. A root kit got installed on the box, luckily, I had firewalled the box internally, so it was unable to access any internal resources. But, not only did we get a call from perhaps the largest WWW hosting provider in the world, but, we gotta call from NASA. Yep, the rootkit has successfully hacked a machine at NASA and they told me to leave the machine on and that they would be there in a day or two. (NEVER USE PORT 22 FOR SSH!!!!!)

So, these two guys come walking in in suits, and one of them was carrying a suitcase-computer called the "Forensics 2000". You can probably look that up, I dunno. Anyway, I had to walk him through hooking it up to my personal server under my desk. I had setup a static NAT from the External firewall all the way to this server I ran under my desk. Anyway, because I had the NIC in VLAN Trunking mode with about 6 different VLAN's, etc, he was unable to use his network interface to image the machine. So, after "interviewing" me for about an hour, they went ahead and turned the machine off and imaged the hard drive. Now, this part is where it got un-nerving. The one guy took off his jacket and had the biggest handgun I'd ever seen in my life, and was sitting at MY DESK at a huge company. So, you tell me, how did that make me look, especially since I ran the entire network, all the administrative stuff, everything? In a twisted Miracle, I was not fired. All because of the fact that the company was safe because I firewalled the machine, and because of the next paragraph.

Now, when I first heard about the attack, I did something which I cannot talk about. But, lets just say the people got caught 3 months later and were in the newspaper. Anyway, that was the day I started studying everything I could about crypto, rev. engineering, pen-testing, security audits, etc.

I did a ton of security audits after that, and still do. I was just taking a look at AESCrypt today for a deeper look at the source because of a job, not a pen testing/forensics job, a job which requires me to evaluate a list of different encryption software in use at a company. I personally have been using the software for a long time, perhaps, 5 years now, so never expected to find any holes. But, this one is insane. I mean common, no special mods and I can eval passwords 1000x faster, using STOCK CODE!!!

While I primarily use truecrypt, I had a couple AESCrypt files, so, I today did a code audit, and found a massively exploitable hole in my opinion.

The reason I say it's so exploitable is because the average user, in fact, most users, will not set proper passwords, It's a fact. I've done too many jobs, for too long, to tell you otherwise. No matter how many websites, advertisements, tv commercials, whatever, you do to spread the password Gospel, people will still screw up, Sometime, somewhere, on their computer, they will screw up.

So, I believe that for an encryption software to truly be safe, it has to expect people to use passwords based on all the criteria I listed earlier. They just do, I don't know why, it's habit, but, I guarantee the majority of your users will never read this list, and they will never have a clue about how dangerous the format is. If their using awesome passwords no problem, but, I know they're not. No matter how many times you tell them they won't listen.

The first 20 or so asset recovery jobs I did, I NEVER expected any success, boy oh boy was I wrong!! Now, I expect success.

In all fairness, all of my jobs have been on American targets. So, due to cultural differences, I won't even speculate on users in Europe, China, etc. I just know corporate America, and its love for pathetic passwords. Like I said before, my favorites are the ones I read out like " User: John Doe, password F#$# YOU" I love those, the corporate guys really get a rise from that sort of thing. At any big company, I always find a few like that. They figure it''s a password, nobody will ever see it... Yeah right. Think again.

If I can't get a password in a couple days, I try another, then another. I just look for any weakness I can find. So, in this case, I found a definite weakness.

If I have a list of files encrypted with AES and a couple of them are using AESCrypt, which ones would you brute/dictionary first??? Common sense man, you'd go for the ones you can brute/dictionary 1000 x faster than the rest.

Again, when I say Brute/Dictionary, I'm talking about a password generation system which is far beyond anything you'll ever be able to purchase, far beyond human concept. I've spent almost a full 20 years writing this software, and I change it as people's password trends change. I told most of it's secrets above, and those are 100% true. I DEFINITELY SEE PASSWORD TRENDING!! Anyway, I really hope you consider this from my point of view, a pen-tester, someone who is attempting to crack your software and look for any weekness.

You have a weekness here, a big one. Users don't set good passwords. 1000x faster really helps me a great deal if I get hired to do a recovery. When I do shadow dumps, or LSA dumps, I always look for trends, and build additional generation software around those trends, You would be amazed at the trends people use when making passwords.

Damian Kohlfeld
User avatar
paulej
Posts: 593
Joined: Sun Aug 23, 2009 7:32 pm
Location: Research Triangle Park, NC, USA
Contact:

Re: HMAC verification allowing faster decipher iterations

Post by paulej »

I know very well what you're referring to. As I said, the first version of AES Crypt more-or-less did what you describe. One would have to decrypt the whole file before learning that the password was wrong. We changed it so that we use the password only to decrypt the session IV and key. That made determining that the password was wrong significantly faster. We want that. It's not a bad thing.

AES Crypt does not provide security in terms of 1,000 years of processing. Rather, it's provides security that can take billions of years of compute time. Divide that by 1,000 and you still have far more years than I'll ever be alive.

If the password is weak, the file can be cracked in no time at all. It would be trivial. One cannot use weak passwords. How strong do they need to be? Here's the answer:
http://www.packetizer.com/security/pwgen/

A good password, especially one that is 16 characters or more, will provide an incredible level of security. If it is a password not comprised of words, but truly random, it would take a very long time to try all of the possible permutations.

Let's assume for a moment that we did go the route of providing no clues as to whether or not the file could be decrypted properly or not. You argue providing an explicit clue is damaging. I argue it makes no difference. Look at the attached documents. These are histograms of various files on my system, including /etc/passwd, a small Perl program, the Linux cat command, and an encrypted version of the cat command with AES Crypt. You can immediately see from visual inspection what is random data and what is not. There is an extremely high probability that whatever a user encrypts I can use this approach to "guess" whether the password I tried worked or not. I do not need to process the whole file. I only need to process enough to build a histogram. The wrong password will result in random output and will look like the encrypted file. The right password would have very different characteristics. One should be looking for something relatively close to an even distribution and, seeing that, conclude the password is wrong and move to the next try.

Bottom line is that strength comes through strong passwords and a good block cipher like AES.
Attachments
File Histograms.docx
(24.04 KiB) Downloaded 369 times
File Histograms.pdf
(127.4 KiB) Downloaded 546 times
damiank
Posts: 8
Joined: Wed Aug 01, 2012 4:58 pm

Re: HMAC verification allowing faster decipher iterations

Post by damiank »

When I started the thread, I was focused primarily on the issue if speed in cracking via brute/dictionary/my generator, against specifically your implementation of AES.

In this same thread, earlier I also pointed out a solution to make the password check take infinitely longer for any brute/dictionary/hybrid generator software. The part about "Obfuscation".

So, essentially there are 2 parts to my post, the built in password check requiring too few CPU cycles to validate the generated password, and secondly, a solution to the problem.

With regards to my initial post on the timing, I know, that even with built in histogram checking, or just an in memory plaintext header check, or even both, that any software written for key validation will specifically be able to crack the AESCrypt format faster than a traditional format. Now, as you pointed out, IT ISN'T NECESSARY TO CHECK THE ENTIRE FILE, JUST THE FIRST FEW BYTES. I agree fully.

So, my assumption of it taking a thousand times longer based on simply using the applications themselves as a password check WOULD NOT BE TRUE IN PRACTICE, however it would be many many times FASTER. To what degree, I don't know.. YET. I already have something blueprinted in my head which would actually tell EXACTLY HOW MUCH LONGER a traditional format would take vs. the AESCrypt format.

I still believe that a custom designed crack tool will crack the AESCrypt format SEVERAL TIMES faster because of the built in password check, but, I also have to take into account the 8192 RSA iterations, which while negligible, they are still there nonetheless. On the flip side, writing a plaintext identification tool is going to require a significant amount of work, especially if the tool is also able to guage filetype by "chr() mean/peak ratios" which is how I would interpret your histograms in code. In the end though, I would still venture that the validation time is much longer for a traditional AES format.

The password check raised concern because having done asset recovery so many times, and knowing just how difficult it can be, specifically because the software I use takes a long time (several nano-seconds, potentially milliseconds depending on type/length) to check an encrypted key. While I wrote the dictionaries, generators, infrastructure code, I did not write the AES Key checking software I use for a variety of AES file formats. But, I will definitely be looking under the hood after this to see if it does file validation withing the first few hundred bytes, etc. I'm sure it probably does, I just have not ever needed to analyze how it's implemented. All I know is that it's pretty fast, and yes, I do notice much longer key validation times on some formats than others. I will be checking the source after your point, because it would be extremely simple to do a "histogram" analysis.

To do it simply, I would simply iterate through those bytes counting character occurrences, calculating the mean, then take the mean difference percentage from the max, or perhaps max(n-1) for short files.

In the GUI I wrote, for my generators, nodes, I have a small window which displays the "keys / sec" value for the entire Grid. 99% of the AES Files I recover use a format which uses a password gen system to generate the 16 or 32 byte key (128/256bit). I have an extremely fast multi-threaded generator which is able to send batches of keys to each node, and then those nodes simply attempt a check for each key/password in the batch and return the execution time/batch size. These batch times and sizes are simply added to a revolving average to give me a somewhat weighted (averaged against prior 5 minutes) number so I know if there is a problem somewhere in the grid. If there is a problem the number drops gradually, and then I know some process is hung, or something is happening which shouldn't be. I am notified via various communications mediums when this number drops below a certain preset percentile. I usually run for about a month or two at the most, depending on the recovery bonus.

However, since I was validating the effectiveness of various software for a job, AND, I PERSONALLY USE three of the items, one being AESCrypt, in my home office, I diverged earlier in the thread and went tangential for bit when I posted an idea about making the AESCrypt fileformat virtually impossible to validate without checking the entire file at least once, then possibly many times, depending on cracking implementation. It was the couple paragraphs regarding "Obfuscation", for which I will show you an example implementation.

On that other note about "Obfuscation":

What I was talking about was a very quick randomization "Obfuscation" type crypto which could be ran against the file pre-encryption, and post-decryption. If an obfuscation technique was used which would require the full file contents to be read once or many times in order to build the simple key + IV used to un-obfuscate (decrypt) the file, it would make key checking an extremely difficult process. While creating an optimal solution for this would require some thought, the most basic and simple mechanism would be dual encryption. So, you would increase your CPU time to decrypt, but, to make deciphering impossible, and far exponentially more difficult with filesize, I would definitely do it, because the keycheck would simply take too long, and this would give a great deal of EXTRA protection for normal users.

It would give the regular user some very real protection against clusters of CPU/RAM only machines crunching against their regular passwords. Oh, how so many are "regular". To date, I've never once gone in for a pen-testing pre-sales job and NOT been able to extract some if not most passwords due to weaknesses. (America only again..)

The reason I say this would be a good thing is because this is exactly what I am presently doing with most if not all software I write which persists data to disk, or across a network, through serialization. My resultant plaintext is ciphertext with no identifiers of any kind which would make it possible to detect as a real file. Because the brute method takes makes no difference at all im my case. First off, I never use the same mechanism twice, so, if i was the target. The analyst would first have to figure out a way to ID my files once cracked. Good luck... None of my files are padded, its part of the serialization process I have been using for awhile. It's been approved for use by perhaps one of the biggest, if not the biggest, well, just a big company. My specific serialization technique is similar somewhat to what I propose for AESCrypt below. In my personal serialization, the resultant plaintext is always a multiple of the blocksize before encryption with no discernible padded blocks.

The reason for so much interest in adding this feature to AESCrypt, is like I said before, I use it. What I happen to use it on is my rsync'd gzipped tar file backups of all of my servers (at my home office). I have AESCrypt encrypt these tar.gz files everyday, using a ticket derived passcode. I don't really use passwords at home, I've actually got some descent biometrics (hi-res CMOS, multi-ordered-print), and some additional things, not the typical finger scanner on laptops. Many of which are fairly easy to crack IMHO, if the world only knew... In most cases, a fingerprint key is at the very very most 64 bit, and even then you don't need to crack the key, you simply grab storage key off the USB device itself, or get it from the vendor in some cases, and decrypt the key and you are done. So, if I'm not logged in somewhere, the backup's don't happen. However, I do always stay logged in to at least a single machine (locked of course). All of my drives are LUK's partitions of course. except for my RAID backup partition. This is for redundancy in case I ever had a problem recovering a LUKS LVM Partition. Yes, LVM Partition, not Logical Volume. Way faster IMHO, not to mention far simpler. No security is perfect and my most sensitive data I keep in a TrueCrypt hidden partition, but overall it's setup fairly well. I generally don't recommend most biometrics because a simple subpeona will get you whatever you need in case you cannot retrieve the symmetric storage key or are unable to grab the actual encryption key off storage (if all cracking attempts fail).

A simple mechanism for doing Obfuscation in AESCrypt would be the following (for filesizes of appropriate size of course):

First, you pad the plain text with X number of random bytes so that plaintext bytex %16 == 0. X gets stored in memory.

Encryption:
1) Gather file Key and IV and X value for padding
2) Generate the SHA512 hash of the Original key.
3) Generate random 32 byte key from 16 (2 byte) different sections of the SHA hash to encrypt the plaintext and use the XOR of the present IV, creating a 2nd IV for the IV to use here, and store relative locations of 32 byte key pieces in memory, along with the X value.
4) divide up the SHA512 hash into 8 byte sections (8 pieces). Go ahead and encrypt the plaintext with the key/iv2 form step 3.
5) randomly insert the 8 byte portions of the SHA hash into the ciphertext. Also, when inserting a SHA Chunk, create an extra 4 bytes of data which is appended to the SHA chunk.
6) Since we know where the chunks are located, we put the 16 relative locations of the new SHA based encryption key' chunks and the 2nd IV and the X value, and divide up this information into 8 pieces of 4 bytes (serialize the 32 bytes), store everything into each respective inserted SHA Chunk in the appended 4 bytes (32 total), and XOR each 4 byte appended section with the last 4 byte piece of the Original key, or even better, something derived from the key or IV. This finishes the pseudo-ciphertext which is actual ciphertext with an additional 96 bytes of data. Still a multiple of 16 bytes.
7) We then encrypt as normal.


Decryption:

1) Decrypt as normal
2) Get the SHA512 hash of the encryption key.
3) Search the resultant cipher text for the 1st 8 bytes of the SHA hash, and remove those 8 + 4 = 12 byte pieces from the cipher text. Continue iteration until the matching SHAHash is assembled as well as the second IV and key chunk location data and X value (deserialize the 32bytes) via performing an XOR on each of the appended 4 byte chunks(8 pieces of 4 bytes) with the last 4 bytes of the Original key (or derived data), This guarantee's that the ciphertext has been re-assembled properly. Also, check extra data (16 bytes) for Second key locations (16 locations) in SHA hash.
5) Decrypt the ciphertext with the re-assembled 32byte key and 2nd IV.
6) Trim off X bytes of plaintext

So, that gives us 32 bytes of storage for the 2nd IV (16 bytes) and 16 bytes (64 bits) to pack the locations of the 2 byte sections of the 64 byte SHA512 hash. That should work out perfectly actually. Not sure about packing in the X value as well, so since there are 8 chunks, you could simply add 2 bytes for the X value to each appended chunk, and formulate something storing the X values in those additional 16 bytes. Or just pack the key location data as offsets from the initial point, that should leave an extra 16 bits or so to describe the number of padded bytes.

Anyway, you can see above that this very simple operation, which would make the 1st run plaintext completely indiscernible would make a simple key check exponentially more difficult.

So, while adding some extra CPU processing time during decryption and being barely noticeable during encryption, none of which would matter anyway in a short amount of time as the prevalence of the AES-NI instruction set's in the i3/i5/i7 chipsets become more popular. Literally, with AES-NI I experience about a 500% increase in speed. It makes a huge difference. The only issue of course being that it's an intel AES implementation, thus, can it be trusted? But, all in the same, if you put this in as an AESCrypt filetype, it would protect the normal user.

Even a simple password would be nearly impossible to detect on a fairly large file. In most cases, people will only wait so long. And like I said, in the recoveries I've done, they usually have to happen within 2 months max. So, just a simple mechanism like the above would give the regular user high grade protection against an attacker.

I believe there are multiple ways in which effective encryption can be done. Using the AES Algorithm is one of the mechanisms at out disposal. It's a strong block cipher and currently there are only theoretical weaknesses in the algorithm. Also, the use of CBC/XTS/GCM "modes" further enhance security. This is all I'm proposing here, a simple method to make the ciphertext exponentially more difficult to crack by brute force.

If obfuscation was performed, and the resultant plaintext was actually ciphertext in the above described format, it would be impossible to determine if the key was correct without iterating through the file once, or perhaps many times. So, this would greatly enhance the security of the cryptography used.

When I look at encryption, I don't use the side-blinders approach of doing what the rest of the world does, thats why I have had such success with recovering plaintext. My point on this second issue, is that if we can perform a relatively simple operation which would make the cracking operation take 1000-2000 days instead of 1 day, I would say that the small change would be advantageous. It should at least be an option for the user IMHO. I look at extra protection just as I would look at CTS, or AES256, or CBC, GCM, etc. It's simply a mechanism to further strengthen the encryption of the file, just like the above mentioned methods which happen to be better methods to protect plaintext. The Obfuscation technique would be exactly that, a very significant enhancement because I know for a fact that I would fail on the majority of my attempts to recover plaintext, even with simple, or normal passwords.

In summary, I believe the decryption key checking thwarting mechanism described above would cripple recovery attempts so that instead of me being able to test 20,000 keys per second, I would only be able to test 1000. Thats dramatically slower, so, for files that would normally get recovered in 2 months, my typical max, it would take perhaps 4000 months. If you look at it in those terms, and the fact that many federal agencies will only brute for up to 4 months, then I would say that this technique would be a major win for AESCrypt, since the everyday user would be exponentially safer against attack. The only thing lost here would be the instant build in password check, while being a nice "feature", it makes it extremely easy to do a recovery for normal users, most of the people who use AESCrypt. Again, that's just based on personal experience in the USA.

With the new RSA vulnerabilities, with the Google CA Theft last year, BEAST, etc, I think it's important to implement as much protection as possible, even for normal users. Granted, this solution wouldn't make a difference for users who remember 30 character passwords, but, for the normal userbase, they would definitely stand a much bigger chance of having their data kept safe. Not to mention, as an analyst, if this format was embedded into AESCrypt, and I was doing a recovery, I wouldn't even bother attempting to decrypt the AESCrypt file if there were other formats on the system. I'm always going to pick the easiest and quickest recovery option against a target, and that simply because literally for forensics/pen-testing, 60% of the payout is in successful recovery. I always structure the contracts that way because of my high success rate.

Just food for thought, but, this really would offer the normal user some very real protection. Anyway, have a great weekend.

Damian Kohlfeld
User avatar
paulej
Posts: 593
Joined: Sun Aug 23, 2009 7:32 pm
Location: Research Triangle Park, NC, USA
Contact:

Re: HMAC verification allowing faster decipher iterations

Post by paulej »

XORing with a random string of bytes would be interesting if we did have a need to hide the contents of the file that is attacked brute force.

However, our goal is to avoid that in the first place. Nonetheless, I might add that capability to the next version, just as one more level of protection... just in case the encryption key happened to be matched in the first 20 or 30 million tries.

However, I'm not at all concerned that one can determine if the password is secure 1000 times more quickly. In real terms, that is nothing. For those with major compute time, that's one key test. For others, it might be a few hours. Our goal is to ensure that we have a random enough string of bits that would take millions or billions of years to crack. We're not talking weeks... billions of years.

Per Wikipedia, the fastest commercial password cracking machine could crack 2,800,000,000 passwords per second. That's a lot... nearly 3B passwords per second. If one had 1000 such machines, that would be nearly 3 trillion passwords per second.

A good 16-octet password (comprised of one of 65 different characters) (as explained here: http://www.packetizer.com/security/pwgen/) provides an effective strength of 2^95.27 bits of strength. That would mean that the bank that really fast password cracking machine would need 539 billion years. Divide that by 1000 and you need 539 million years. If there were 1,000,000 such machines running, one would need 539,000 years. I believe were safe.

Want something better? Use a 20 character password. Then, 1,000,000 of those super-fast password cracking machines would need about 8 trillion years. There's significant strength in longer, random passwords.
Post Reply