The magazine of the Melbourne PC User Group
E-mail In Code
Gordon Woolf
Gordon Woolf looks at some more oddities with e-mail
While it's being transmitted or stored by a computer, text is sometimes in a
format that is readable only by programs on the computer. Have you ever received
an e-mail with something that looks like this:
VGhpcyBpcyBzb21lIHRleHQgd2hpY2ggaGFzIGJlZW4gY29udmVydGVkIHRvIEJhc2U2NCBlbmNvZGluZy4=
If so, you are seeing the e-mail in a format which is supposed to be reserved
for transmission across the Internet, and should have been converted back into
human readable form. It is text (or possibly a graphic, if it's a lot longer
than the example above) which is encoded in Base64 and there are lots of places
on the Internet where you can just copy and paste it to decode it. Some of those
are listed in the box below.
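For readers who would rather decode such a block themselves than paste it into a
web site, here is a minimal sketch in Python, assuming only the standard
library's base64 module. The sample sentence and the variable names are chosen
purely for illustration.

```python
import base64

# An illustrative sentence; any text would do.
original = "This is some text which has been converted to Base64 encoding."

# Encode: text -> bytes -> Base64 characters (what a stray e-mail might show).
encoded = base64.b64encode(original.encode("ascii")).decode("ascii")

# Decode: Base64 characters -> bytes -> readable text again.
decoded = base64.b64decode(encoded).decode("ascii")

print(encoded)
print(decoded)
```

The second line printed should match the sentence you started with, which is all
an e-mail program is doing behind the scenes when things go right.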
Base64 means it is a numbering system that uses a base of 64 instead of the base
of 10 we use for our arithmetic... ideal for people with 64 appendages to count
on. More usefully, it is the largest power of two base that can be represented
using only printable ASCII characters.
This has led to its use as a transfer encoding for e-mail among other things.
All well-known variants use the characters A–Z, a–z, and 0–9 in
that order for the first 62 digits but the symbols chosen for the last two
digits vary considerably between different systems. Several other encoding
methods such as uuencode and later versions of binhex use a different set of 64
characters to represent six binary digits but these are never called by the name
Base64.
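To see how the last two digits can differ, the sketch below compares two variants
that Python's standard base64 module happens to offer: the ordinary one and a
"URL-safe" one. The two sample bytes are an assumption picked only because they
force digits 62 and 63 to appear.

```python
import base64
import string

# The 62 digits every common variant agrees on, in order.
shared = string.ascii_uppercase + string.ascii_lowercase + string.digits

# Two bytes chosen so that the last two digits (values 62 and 63) show up.
sample = bytes([0xFB, 0xFF])

standard = base64.b64encode(sample).decode("ascii")          # uses '+' and '/'
url_safe = base64.urlsafe_b64encode(sample).decode("ascii")  # uses '-' and '_'

print(len(shared), standard, url_safe)
```

The same two bytes come out as different characters in the first two positions,
while everything built from the shared 62 digits stays identical.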
Look at http://email.about.com/cs/standards/a/base64_encoding.htm
and you'll find more about it than most of us need to know. If you do need more, try
http://my.pclink.com/~dthomas.ideas/mimeattach.html.
I liked one definition I came across, that sending a graphic by e-mail is like
the question of how to take a very large truck through a narrow alleyway. The
answer is to take it apart and reassemble it at the other end.
Well, you hope you can reassemble it.
Mostly you can, but just occasionally it gets stuck and you end up seeing what
you should never see — the bits, all neatly packaged and lined up ready for
assembly.
The efficient way of getting our truck through the alley is to do it in the
fewest possible pieces. Base64 achieves this by taking the binary information
six bits at a time, so every three bytes (24 bits) become four chunks, each sent
as one printable character. The result is a smaller file of data than would be
possible with the other system we have probably come across, named Hexadecimal
(Hex is the popular short form of this word). Using Hex would make every file
sent by e-mail exactly double the length it was before it was encoded. Using
Base64 makes it only a third as big again as the original.
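The splitting itself can be sketched in a few lines of Python. This is an
illustration only, not how any particular e-mail program does it: the function
name encode_three_bytes and the 768-byte test payload are made up for the
example, and the alphabet is assumed to be the common A–Z, a–z, 0–9, "+", "/".

```python
import base64
import string

# Assumed alphabet: the common 62 digits plus '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_three_bytes(chunk: bytes) -> str:
    """Split exactly three bytes (24 bits) into four 6-bit digits."""
    if len(chunk) != 3:
        raise ValueError("this sketch handles exactly three bytes")
    value = int.from_bytes(chunk, "big")
    digits = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[d] for d in digits)


print(encode_three_bytes(b"Thi"))

# Size comparison on an arbitrary 768-byte payload.
payload = bytes(range(256)) * 3
hex_len = len(payload.hex())               # two characters per byte
b64_len = len(base64.b64encode(payload))   # four characters per three bytes
print(len(payload), hex_len, b64_len)
```

Three bytes in and four characters out is a growth of one third, against a
doubling for Hex, which is why Base64 is the one used for attachments.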
And if you don't end up with exactly the right number of characters? Well, throw
in an equal sign or two at the end, which tells the system that you have padded
the text to the required length and that it should ignore the equal signs in the
reconversion. Of course, a real equal sign in the original won't appear as one in
the conversion; what it becomes depends upon precisely where it occurs in the
byte splitting.
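A short experiment shows both effects. The helper name b64 and the sample
strings are invented for this sketch; the encoding is done by Python's standard
base64 module.

```python
import base64


def b64(text: str) -> str:
    """Return the Base64 form of a short piece of ASCII text."""
    return base64.b64encode(text.encode("ascii")).decode("ascii")


# One, two and three bytes: the padding shrinks as the group fills up.
for sample in ("a", "ab", "abc"):
    print(repr(sample), "->", b64(sample))

# A real equal sign, in different positions within the three-byte group.
for sample in ("=", "a=b", "=ab"):
    print(repr(sample), "->", b64(sample))
```

A single byte ends in two equal signs, two bytes end in one, and a full group of
three needs none. The genuine equal sign never survives as "=" and turns into
different characters depending on where it sits.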
Base64 was a way for spammers to hide code that might be stopped by spam
filters, knowing that most e-mail programs will turn the encoding back into
readable text, but only after the spam filter has let it through. You will be
pleased to know that spam filters are now generally more intelligent and most
can interpret encoded subject lines at the very least.
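As an illustration of what interpreting an encoded subject line involves, here is
a sketch using Python's standard email.header module. The subject line is made
up; it uses the "encoded-word" form in which a B between question marks marks
the payload as Base64. Real filters will of course do a good deal more than this.

```python
from email.header import decode_header, make_header

# A made-up subject line in the =?charset?B?...?= form (B means Base64).
raw_subject = "=?utf-8?B?SGVsbG8=?="

# decode_header splits the line into (bytes, charset) parts;
# make_header joins them back into readable text.
readable = str(make_header(decode_header(raw_subject)))
print(readable)
```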
There are other encoding systems, among them uuencode which is commonly used for
e-mail attachments, and BinHex which is a similar system most often used on
Macintosh computers. Programs such as WinZip can decode almost all such files,
including Base64, but sometimes you may need to copy the encoded text from an
e-mail and save it as a separate file. WinZip will then work out what encoding
has been used... just drop the file on to the open WinZip window. There are hints
on the WinZip web site at http://www.winzip.com/1.11100007.htm. For Mac and
Windows users the free Stuffit Expander program works in similar ways. Stuffit
Expander for Windows at http://www.stuffit.com is a big download because you
receive a trial version of the complete Stuffit program but, after the trial
period ends, you continue to have free access to the Expander part.
Other useful programs include the $20 shareware Etresoft Decoder at
http://www.etresoft.com/.
Another peculiarity which turns up in e-mail is "quoted printable" coding. With
this any 8-bit value may be represented by a "=" followed by two hexadecimal
digits. For example, if the character set in use were ISO-8859-1, the "="
character would thus be encoded as "=3D", and a SPACE by "=20". The receipt of
such messages usually indicates that the sender and/or you as recipient have
different settings for the character encoding... or there is an intermediate
step such as a mailing list server.
It becomes particularly annoying when the text includes many words from other
languages as each accented character will come through in this encoded form.
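Quoted printable text can be turned back by hand, but Python's standard quopri
module will also do it. The sketch below assumes the ISO-8859-1 character set
mentioned above; the sample strings are invented for the example.

```python
import quopri

# Made-up text as it might arrive: "=3D" is "=", "=20" is a space, "=E9" is e-acute.
received = b"1+1=3D2,=20caf=E9"
decoded = quopri.decodestring(received).decode("iso-8859-1")
print(decoded)

# Going the other way: the accented letter and the equal sign get encoded.
outgoing = "café = coffee".encode("iso-8859-1")
encoded = quopri.encodestring(outgoing)
print(encoded)
```

Plain letters and digits pass through untouched, which is why a message in
English usually stays readable even when the decoding step has been skipped.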
So, for minor problems with odd coding, check how your e-mail or word processing
program allows you to use other encoding systems. And for chunks of coding, save
it as a file and use a program such as WinZip or any of the online decoding
services mentioned here.
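If the chunk you have saved is Base64, a few lines of Python can stand in for
those programs. This is only a sketch: the function name decode_saved_block and
the pasted sample are hypothetical, and it assumes the block contains nothing but
Base64 characters and white space.

```python
import base64


def decode_saved_block(text: str) -> bytes:
    """Decode Base64 text copied from an e-mail, ignoring line breaks and spaces."""
    compact = "".join(text.split())
    # validate=True makes stray non-Base64 characters raise an error
    # instead of being silently skipped.
    return base64.b64decode(compact, validate=True)


# A made-up block, wrapped over two lines as e-mail programs often do.
pasted = """VGhpcyBpcyBzb21lIHRleHQgd2hpY2gg
aGFzIGJlZW4gY29udmVydGVk"""

print(decode_saved_block(pasted).decode("ascii"))
```

For a graphic rather than text, the bytes returned would be written to a file
instead of printed.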
About the Author
Gordon Woolf is a long time Melb PC member. He has vague memories of learning
about bases in advanced mathematics before he switched from that subject to art
at school many years ago.
gwoolf@melbpc.org.au
Reprinted from the May 2006 issue of PC Update, the magazine of Melbourne PC
User Group, Australia