This will be my first post in the interests of Cutting the Crap. I’m a longtime online activist (dating back to days on Usenet), and am honored to be joining Milt in his quest to bring a little straight talk to political discussions.
I’ll be starting out with some recent posts I wrote for Daily Kos on computer security. It’s far from my only area of interest, but considering the FISA follies which are much in the news these days, the topic is rather timely.
This first post is an introduction to data encryption.
(Posted on Daily Kos on Sunday, August 5th)
I’ve long been a proponent of the use of encryption when it comes to electronic communication. This pre-dates Bush’s massive violation of the FISA law; most electronic communication is vulnerable should some bad actor decide to listen in, and the situation has become 100 times worse in the age of email.
In short: everyone should be encrypting their communications. It is simplicity itself for a criminal to read your online communications. You don’t have to be concerned about violations of the 4th Amendment to be concerned about protecting your privacy.
This is the first of a series of several diaries (not sure how long it will be) discussing encryption, the protection of private communications and the protection of privacy in general. There’s no "insider knowledge" presented here, since I have none to present. Rather, the information contained herein is gleaned from years of hobby interest in the field. Any comments or corrections are welcome.
Secret codes have been around nearly as long as the written and spoken word. With the advent of the Information Age, advanced mathematics has been leveraged to produce extremely powerful codes, commonly referred to as strong encryption. These cryptosystems are sets of mathematical transformations that convert readable data (called plaintext) into random-looking sets of data (called ciphertext). If properly done, ciphertext should, in fact, be indistinguishable from random numbers, e.g. the data you’d get counting the number and size of leaves on a tree, or digitizing the noise on a TV channel without a station broadcasting.
Modern encryption comes in two basic flavors: symmetric and public-key. To make matters more complex, public-key encryption is generally not a terribly efficient system in terms of speed of encryption/decryption of data, so it’s usually used in conjunction with symmetric encryption.
Symmetric encryption is a system which combines a mathematical algorithm with a single digital key used to both encrypt and decrypt the same data. Obviously the key used can be different from message to message, but the main point is the same key must be used to retrieve the original message as was used to encrypt it in the first place.
There are a number of available symmetric encryption algorithms that have been published and survived scrutiny. Perhaps the most commonly-used these days are:
- AES (Advanced Encryption Standard), formerly known as Rijndael
- DES (Digital Encryption Standard) and triple-DES
AES is a standard used by the US government, and therein lies a tale…
Overview of AES
In 1997, the National Institute of Standards and Technology (NIST) decided the old National Security Agency (NSA)- and IBM-designed encryption standard called DES could no longer reasonably protect sensitive data, due to its relatively short key length (56 bits). It was becoming feasible for modern computers to simply try all possible keys and decrypt messages. So, NIST proposed a public competition to replace DES with a more robust encryption system, to be known as "AES".
The competition resulted in several submissions from various renowned encryption experts, and in the course of the competition, researchers, including those who submitted algorithms, tried every way they could to break the ciphers. In the end, several algorithms were left standing, and the winner of the remaining algorithms was decided primarily on the basis of speed of encryption/decryption and code size of the implemented cryptosystem.
Further vetting of AES
Upon the selection of Rijndael as the new AES standard, NIST placed its seal of approval for use of the algorithm for the protection of sensitive but non-classified data. But then a surprising thing happened. In 2003, NSA announced they’d examined the algorithm upon request, and pronounced AES suitable for the protection of electronic data up to the classified TOP SECRET level. This approval means the general public, for the first time, has access to strong encryption that the NSA itself has rated sufficiently strong to protect the US government’s most important electronic data. And lest you think NSA was "pulling a fast one", AES is now used by defense and security contractors who provide encryption in products used by the government. It’s the real deal.
This description of AES is not intended to say it is necessarily "better" than the other algorithms that are listed above. There is good reason to believe all are sufficient to protect sensitive data. However, of the listed algorithms only AES is approved for use by the government itself to protect its most vital secrets.
The problem of key exchange
Modern symmetric encryption algorithms are very powerful, and many are considered unbreakable with current technology. However, they all suffer from one serious problem: how does one make sure the recipient of a message also has the key?
There are several ways to implement key exchange, including meeting in person. However, the problem of key exchange has always been a limiting factor when it comes to the widespread use of encryption. After all, if you have a secure means to exchange keys, then presumably you could just exchange messages securely in the first place. Encryption, then, becomes more a matter of convenience.
The advent of public-key encryption, however, has meant a whole new ballgame. This will be described below.
The ultimate in security: the one-time pad
Most modern symmetric ciphers make use of keys that are significantly shorter than the data being exchanged, and at least in theory, may be broken. The one exception to this rule is the "one-time pad". A one-time pad is a string of truly random data known to both entities who wish to communicate, and no one else. The pad is as long or longer than the plaintext data to be sent, and is never re-used to encrypt another message. The encryption algorithm itself may be extremely simple: a simple bitwise xor logic function will suffice. As long as the "no-reuse", "known only to the communicating entities" and "truly random" rules are followed, the one-time pad is provably unbreakable.
Obviously, the one-time pad system suffers from several problems, the most serious of which is how to exchange the pad between those who want to communicate, without anyone else intercepting it. The need to not reuse the same random data on a second message, combined with key exchange difficulties, makes the one-time pad difficult to use for practical protection of communications.
Public-key encryption makes use of so-called "one-way functions". These mathematical functions are easy to calculate one way, but very, very difficult to reverse. They are based upon "hard problems" that mathematicians have examined for centuries and been unable to solve. There are two main types. The first (and first generally available) is based upon the difficulty of factoring extremely large numbers that are the product of two prime numbers. This is the basis of the so-called "RSA" encryption algorithm. The second type is based upon the "discrete logarithm" problem (don’t ask me the details). "El Gamal"-type algorithms make use of this system.
In public-key encryption, there are two keys: one used to encrypt data, and the other used to decrypt it. Most of the time, this system is used by publishing one of the keys, and keeping the other secret. In this way, anyone can use your public key to encrypt a message to you, and only you, the holder of the private key, can decrypt it. If two people exchange public keys, they can send messages back and forth to each other, without ever having to meet and without risking compromising security.
The development and availability of public-key encryption is a huge step forward. One of the most difficult problems in encryption is how to exchange the key necessary to decrypt ciphertext. This problem is especially acute when the two individuals who wish to communicate cannot meet in person or . Public-key encryption solves this problem; indeed, it is what has made the widespread use of encryption possible.
The one remaining problem with public-key encryption is the issue of trust: making sure that the person with whom you exchange keys is who he or she claims to be. I’ll discuss that more in the next installment.
A subset of public-key encryption is the "digital signature". With digital signatures, the key usage is reversed: the person signing the data uses his or her secret key to "sign" the data, and others use that person’s public key to check the signature.
When data is "digitally signed", a string of data with a known length (the "hash" or "message digest") which represents the data being signed is calculated using a specialized encryption algorithm. This data is then encrypted using the signer’s private key. Anyone else looking at the same data can also calculate the correct hash for the data that was signed, and using the original signer’s public key, decrypt the hash the signer had calculated (which is provided with the signed data). If the two match, it may be assumed the data has not been altered and the original signer did, in fact, sign the data.
Good digital signatures depend upon good hash algorithms, which must:
- Produce one, and only one, hash string for a given input,
- Be immune from the ability to find two messages that produce the same hash value within a reasonable amount of time.
Because hash algorithms produce hashes that are usually much shorter than the data being hashed, it must be true that many messages of that size can produce the same hash. However, due to the hash size and the algorithm, it should be basically impossible to find other messages that produce the same hash. This is the basis for the utility of digital signatures.
In practice, designing hash algorithms that can do this has proven very difficult, and even NSA-designed and published algorithms have been found to have serious flaws.
A word on key length
You will hear quite a bit about "key length" when encryption is discussed. This can be a confusing topic, primarily because 1) the efficacy of a given key length for protecting sensitive data depends, in part, upon the algorithm used and 2) key length means wholly different things when discussing symmetric and public-key encryption.
Key lengths are usually specified in terms of binary bits. Typical key lengths in common use are 128 bits, 192 bits and 256 bits for symmetric ciphers and 1024-4096 bits for public-key ciphers. Longer keys are invariably more secure (assuming the encryption algorithm used is a good one); however, there is an issue of diminishing returns for larger key lengths, and good encryption algorithms may take longer to calculate longer keys or even to use them to encrypt and decrypt data.
The basic idea when using strong encryption is to make it impractical, using current and projected computational power, to simply try all of the possible keys until the right one is found (a procedure called "brute forcing"). With strong encryption algorithms and long keys, brute forcing using currently-available computer systems may take longer than the projected lifetime of the solar system.
It’s also a good idea to keep on top of developments in encryption cracking: public-key encryption using 1024 bits is generally not considered very safe at this point, as brute-forcing keys of this length within a reasonable amount of time is rapidly becoming feasible. For AES, NSA has specified key lengths of 192 and 256 bits for the protection of TOP SECRET data.
What makes an encryption algorithm "strong"? Well, there are several things:
- There must be no faster way to decrypt the ciphertext without the key than by trying an impossibly-large number of keys by brute force
- The encryption system should not depend upon secrecy of the algorithm for its security, or depend upon lack of attacker knowledge of some of the plaintext
- Retrieval of the key should not be feasible even when the attacker knows the original data, the algorithm and the ciphertext.
The breaking of an encryption system is generally accomplished using a combination of two methodologies:
- Reducing the key space (the effective number of possible keys that need to be tried) by finding mathematical weaknesses in the encryption algorithm, and
- Brute-forcing the possible keys.
Thus far, none of the above-listed encryption algorithms has been truly "broken" by anyone publicly, meaning a mathematical shortcut has been found allowing the plaintext to be derived from the ciphertext without the key. Several weaknesses have been found in most, which reduce the keyspace of each. However, all have been subjected to substantial and ongoing review by a sizable community of extremely knowledgable mathematicians and cryptologists (professional encryption experts) and all are deemed secure using sufficiently long keys.
So how do I know what (insert three-letter agency here) might be able to break?
The simple answer is: you don’t. There is no way to know what mathematical breakthroughs someone deep in the bowels of our government–or anywhere else, for that matter–has made and can leverage to break commonly-used encryption algorithms.
A more qualitative answer involves guessing the answers to the following questions…
- Would the US government use encryption to protect its most important data that it knows has a weakness? Can it take the chance that someone else hasn’t discovered the same weakness?
- How likely is it that top government mathematicians, working with nearly unlimited resources over the span of about five decades, have been able to solve hard problems that have stood the test of time for millenia?
Modern strong encryption has been implemented in many products available to the end user, and quite a few of them are available for free. Encryption products are available to keep your email, your files, even your voice and video data private. In the next installment of this series, I will detail some of these products and their use.