The magazine of the Melbourne PC User Group

Making a Hash of It
Major Keary

Most BBS users are familiar with cyclic redundancy checks (CRC) used to discover transmission or copying errors in files. Message digests - also known as hash, hash codes, and hash functions - are more secure and more useful. A hash is a fixed-length value generated from a file of arbitrary size using a one-way function algorithm, which means the process can't be reversed to discover the original data. Message digests create a fingerprint of files of any type, text or binary.

What Does a Hash Code Look Like?

The following 128-bit hash function was generated in hex by MD5SUM.EXE from a 17 kb text file (some 2800 words): 
          FE 6B 9C 9D F4 F0 1B 6C 99 72 EA 47 10 EF 8D C1
in actual usage there are no spaces and no upper case, just a single string,
          fe6b9c9df4f01b6c9972ea4710ef8d.

A slight change was made to the file; one word, on, was changed to no, and another hash generated:
         1B FE DO E8 C3 B3 BA 7D B6 B0 1E CB 4A 84 70 EE

As can be seen, there is no resemblance with the original.

What Can I Do With Hash?

Hash functions can do away with the need to store user passwords on a system. It works like this; a user's password is registered in the usual way but, instead being held on a file, the system performs a message digest and stores the hash function instead. There is no way of discovering the password from the stored hash function, and the hash function cannot be used as a substitute for the password.

When a user logs on and enters her or his password it is hashed and the result checked against the list, a method that will be familiar to Unix system administrators.

It does not, however, prevent capture of the password while in transmission, password guessing, or use of a stolen password. That weakness can be strengthened by adding some extra, variable data that is a secret known to only the user and the system. For example, user passwords could have attached to them, say, the value of pi to five decimal places followed by the date; the resulting string is hashed and the function used for authentication. The method is easy to remember, so there is no need to write it down; most cheap calculators have a pi function. The variable element is the date (or a time stamp). Even if the raw password is guessed or stolen, the added data makes it useless. Capture is also made more difficult because the transmitted hash function changes with each use.

Why use an extra input (in the example, pi)? Because an attacker who has acquired a password is likely to test for date input when the password alone doesn't work.

Another variation is to use the hash of a large number that is reduced by one digit at each login. A large number per se does not have to be difficult to remember. For example, 9999 where the second logon uses 9998, the third logon uses 9997, and so on. Another, more secure, method is to use a smaller, easily remembered number and raise it to, say, the power of two (any cheap calculator can do that) and reduce the result by one digit for each logon. That approach is useful where a user has to make regular logins from remote locations using untrusted links.

There is, incidentally, another system that caters for the travelling user; Bellcore's S/KEY uses hash functions to generate up to ninety-nine one-time passwords. The user calls up the next-to-be-used password by a simple computing process. It is described in Actually Useful Internet Security Techniques and the source code can be found at ftp.bellcore.com (look in /pub/nmh/). When I first wrote this item in 1997 S/KEY could be found by pointing a browser at ftp://ftp.bellcore.com/Pub/nmh, but on a recent attempt it wasn't connecting, and searching for S/KEY on the Bellcore home page led to a dead end. Using ftp in the Unix shell found the directory.

Authentication

Passwords can be stolen, guessed, or otherwise compromised. The purpose of authentication is to enable communicating parties to be sure of each other's identity. Suppose Herbert of Hobart wants to exchange messages with Helga of Helsinki. If they agree on a common secret (let's say it is Rumplestiltskin) it is possible for each to verify the other's identity. For example, Herbert sends the first message, "Are you there, Helga?"; Helga wants to be sure it is Herbert, so she sends a challenge in the form of a large number. On receiving Helga's challenge Herbert takes the number, appends the shared secret (Rumplestiltskin), and generates a hash function that he puts into a reply.

Herbert can authenticate Helga's identity in the same way, sending a large number by way of challenge.

Mutual authentication is achieved because both are in possession of the shared secret, which remains secure because it is never transmitted in clear text.

Of course, the contents of any messages are in clear text and can be intercepted. The issue here is authentication, not encryption; if it is important for the parties to be able to verify the identity of each other, this system is simple and secure. The shared secret, of course, has to be exchanged by secure means and kept secure.

The method also lends itself to fax. When a fax message is received there is no guarantee that it originates from the purported sender. Using message digests in combination with a shared secret is a solution that doesn't require any special programming.

Digital Signature Standard

The Secure Hash Algorithm (SHA), which generates a 160-bit hash, was established by the U.S. National Institute of Standards and Technology (NIST) and the National Security Authority (NSA) as part of the Digital Signature Standard (DSS). Where the communicating entities are not known to each other and are unable to exchange a shared secret, then one of the more complex public key systems is necessary. Hash Resources

MD5 is not the only program that generates message digests, but it runs on various platforms, is freely available, and easy to use. Using a Web search engine to find MD5SUM should turn up a site; a search for Schneier (author of Applied Cryptography) should find the source code for a number of cryptosystems, including two versions of MD5. A version to run under Windows may have been written, but I have not seen it. The DOS version is simple to use; for example, the command:
       MD5SUM myfile > hash

will generate a message digest of myfile and write the hash code to a file, hash. The target file can have any name, and , will be created if it does not already exist. That is useful because it enables programmers to automate authentication systems.

The subject is covered to varying degrees in the following literature. A group of eight experts has put together a very useful resource:

Cooper et al.: Implementing Internet Security 
ISBN 1 56205 471 6
Published by New Riders

Published in 1995 it is now out of print, but a copy is in the library. A good resource for those who want to survey their own requirements and carry out a risk assessment (there is no point in putting a security system into place before making a risk assessment). Message digests are described in some detail.

If you want an in-depth mathematical treatise (some familiarity with linear algebra is assumed), then

Douglas Stinson: Cryptography, Theory & Practice
ISBN 0-8493-8521-0
Published by CRC Press, 434 pp.

is probably the most professional resource. It has an excellent introduction to the early systems, including a description of Claude Shannon's work (which led to Huffman data compression). DES, RSA, signature schemes, key distribution systems, authentication codes, and zero-knowledge proofs are amongst the topic covered. The material on hash functions is very detailed.

This is an expensive title. I noticed one retailer had it listed at $160.00, but if anyone does want a copy I suggest a visit to www.dadirect.com.au, the Web site of DA Information (the last time I looked this title was quoted at $123.00). They are the biggest provider of professional and academic texts in Australia and their prices are adjusted according to exchange rates on a weekly basis. The site is well worth browsing.

Also on DAs lists is

Bruce Schneier: Applied Cryptography 2nd edn.
ISBN 0-471-11709-9
Published by Wiley, 758 pp.

It is written for programmers and communications professionals and contains some of the best explanations of crypto and compression algorithms that non-mathematicians will find. It is also has the most extensive list of resources to be found. The second edition was published in 1996; I asked the author if a new edition is planned, to which he answered: No, the task is too daunting.

Reprinted from the February 2000 issue of PC Update, the magazine of Melbourne PC User Group, Australia

 

[About Melbourne PC User Group]