Compression

Ever since people started using personal computers, storage space has been a problem.

For many years, the problem was hard drive space. Like the closets in your home, hard drives fill up too quickly. People soon found it necessary to be creative about how they stored data: they could back it up on floppy disks or CDs, delete files they no longer needed, or somehow make the data smaller.

Today, storage capacity has grown greatly and is less of a problem. Although we now keep much larger files (high-resolution digital photos and movies eat up most of the space), storing data is not as difficult as it once was.

However, with the Internet, we now have problems sending data. Many Internet users still have slow connections (although some countries, such as Japan, are very advanced in terms of connection speeds), so if we want to make data available, we have to try to make it smaller.

A common solution for both past and present is to make files smaller. The question is, how?

Real-World Compression

Consider putting a letter-sized sheet of paper in your pocket. A sheet that size is too big to fit. So, what do we do? Naturally, we fold the paper, usually three or four times, and then it's small enough to fit in our pocket. There is a problem, however: when we unfold the paper again, it has fold lines. In other words, it has lost quality.

In some cases, however, you can make things smaller without losing quality. Think about a long piece of string that has been unrolled and lies in a tangled pile. If you try to put it in your pocket, it won't fit. However, most of that pile is just empty air; if you wind the string around a few fingers and then bind it with a rubber band, it is compact enough to fit in your pocket. And because string bends naturally, there is no loss in quality.

Computer Compression

This works almost exactly the same way on computers. Instead of folding, however, we use something called compression. (To "compress" means to make something smaller. Compression is commonly used in many file types; most of your music, images, and movies are already compressed.)

Compression works by finding ways to make the data take up less space. One method, called "run-length encoding," finds long runs of repeated data and replaces each run with a single copy plus a note of how many times it repeats. For example, 111110000000 could be expressed as 1[5]0[7]. GIF images use a related method (called LZW) that also takes advantage of repeated data, which is why GIFs work best when there are many pixels of exactly the same color.
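To make the run-length idea concrete, here is a minimal Python sketch that uses the same 1[5]0[7] notation as the example above. The function names rle_encode and rle_decode are hypothetical, chosen for illustration; real file formats store the counts as compact binary numbers rather than as text.

```python
import re
from itertools import groupby


def rle_encode(data: str) -> str:
    """Replace each run of identical characters with symbol[count]."""
    # groupby yields (symbol, iterator over that run) for each consecutive run
    return "".join(f"{symbol}[{len(list(run))}]" for symbol, run in groupby(data))


def rle_decode(encoded: str) -> str:
    """Expand each symbol[count] pair back into the original run."""
    # re.DOTALL lets the symbol be any character, including a newline
    pairs = re.findall(r"(.)\[(\d+)\]", encoded, re.DOTALL)
    return "".join(symbol * int(count) for symbol, count in pairs)


print(rle_encode("111110000000"))  # 1[5]0[7]
print(rle_decode("1[5]0[7]"))      # 111110000000
```

Note that run-length encoding only helps when the runs are long: a string with no repeats, such as "abc", becomes the longer "a[1]b[1]c[1]".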

However, most compression is more complex, using mathematical formulas and special tricks to reduce a large amount of data to a smaller amount. Compression methods fall into two basic types: lossy (some quality is lost) and lossless (no quality is lost). Folding a piece of paper is like a "lossy" method; binding up the string is like "lossless" compression.
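The difference between lossless and lossy methods can be sketched in a few lines of Python. Here zlib (the DEFLATE method also commonly used inside ZIP files) stands in for lossless compression, and a hypothetical lossy_quantize function, which rounds values to a coarser step, stands in for lossy compression. Real lossy formats such as MP3 and JPEG are far more sophisticated; this only shows the basic trade-off.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with zlib (lossless DEFLATE), then decompress again."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples: list[int], step: int) -> list[int]:
    """Toy lossy step: round each value to the nearest multiple of `step`.
    Fewer distinct values compress better, but the exact originals are gone."""
    return [round(s / step) * step for s in samples]


# Lossless: every byte comes back exactly, like unrolling the string.
text = b"the string is rolled up, not cut " * 20
assert lossless_roundtrip(text) == text
print(len(text), "bytes ->", len(zlib.compress(text)), "bytes")

# Lossy: the values are simplified, like fold lines in the paper.
samples = [100, 103, 98, 101, 250, 247, 252]
print(lossy_quantize(samples, 10))  # [100, 100, 100, 100, 250, 250, 250]
```

The rounded values are easier to compress (notice the long runs of 100 and 250), but there is no way to recover 103 or 247 from them.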

Where Compression Is Used

Most files that take up a large amount of space (photos, audio, and video) are compressed. This is done by saving the data in special file formats that have compression built in.

Take music files, for example. On a music CD, the songs are stored as uncompressed audio (a Mac, for instance, shows the tracks as AIFF files, a format that is normally uncompressed). A single song in this form might be about 50 MB, which allows about a dozen songs to fit on a CD. That is fine for a CD, but if you want to store music on your computer, or more importantly on a small DAP (digital audio player), 50 MB per song is much too big. So when you copy ("rip") the songs from the CD to your computer, they are converted into MP3 files, a format with strong compression. The 50 MB AIFF file could be compressed into a 5 MB MP3, and now the music is small enough to fit on the DAP.
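The 50 MB and 5 MB figures can be checked with a little arithmetic. This sketch assumes standard CD-quality audio (44,100 samples per second, 16 bits per sample, stereo), a 5-minute song, and an MP3 encoded at a constant 128 kbps; the helper names are hypothetical, and file headers are ignored.

```python
# CD-quality audio: 44,100 samples per second,
# 16 bits (2 bytes) per sample, 2 channels (stereo).
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2


def uncompressed_bytes(seconds: int) -> int:
    """Size of uncompressed CD-quality audio, ignoring file headers."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * seconds


def mp3_bytes(seconds: int, kbps: int) -> int:
    """Approximate size of a constant-bitrate MP3: bits per second x time / 8."""
    return kbps * 1000 * seconds // 8


song = 5 * 60                   # assumed 5-minute song
raw = uncompressed_bytes(song)  # 52,920,000 bytes
mp3 = mp3_bytes(song, 128)      # 4,800,000 bytes (assumed 128 kbps)
print(f"{raw / 1e6:.1f} MB -> {mp3 / 1e6:.1f} MB, ratio {raw / mp3:.1f}:1")
# 52.9 MB -> 4.8 MB, ratio 11.0:1
```

At these settings the MP3 is roughly one-eleventh the size of the uncompressed audio, which matches the 50 MB to 5 MB example.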

MP3 is a "lossy" format: some quality is lost, and the more you compress the file, the more quality you lose. Part of the savings comes from simply removing some of the data. Fortunately, most audio contains more information than people can actually hear, so removing some of it does not noticeably change how most people hear the music. Other tricks are also used to make the file smaller, and together they achieve a great reduction in size.

The problem comes when you compress something too much and the quality becomes terrible as a result. Take the MP3 audio file below, a few seconds from a popular song. The first half (about 5 seconds) uses normal compression; then the same clip repeats at extreme compression.

The original song file is 4.8 MB; the extreme-compression file is 410 KB. However, the quality is really bad, so most people would find it unusable.

The same is true with images. Take the picture below, a JPG image of me holding a Shiba Inu puppy. At left, the image uses normal compression; at right, extreme compression. The close-up below shows how the quality is affected at the pixel level.

Notice how the extremely compressed version groups the pixels into larger squares and then makes the pixels in each block more similar. This lets the run-length encoding method be used more. You might also recognize those blocks from your TV: in severe storm weather, for example, the digital TV signal sometimes gets interrupted, and you see the picture break up into small blocks. That is the same thing.

These distortions are called artifacts (often "compression artifacts"). An "artifact" is something made artificially; a natural image would not have these distortions, so they are named for their artificial nature.
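The blocky effect described above, where the pixels in each block are made more similar, can be shown with a toy example. This sketch is a deliberate simplification: it works on a single row of grayscale values and simply averages each block of 4 pixels, whereas real JPEG compression works on 8x8 blocks using a mathematical transform (the DCT) and quantization. The function names are hypothetical.

```python
from itertools import groupby


def average_blocks(row: list[int], block: int) -> list[int]:
    """Toy lossy step: replace every pixel in each block with the block's average.
    (Real JPEG uses 8x8 blocks and a DCT; this is a crude 1-D stand-in.)"""
    out = []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        avg = round(sum(chunk) / len(chunk))
        out.extend([avg] * len(chunk))
    return out


def count_runs(row: list[int]) -> int:
    """Number of runs a run-length encoder would need to store."""
    return sum(1 for _ in groupby(row))


row = [200, 201, 199, 200, 120, 118, 122, 120]  # made-up pixel values
blocky = average_blocks(row, 4)
print(blocky)                               # [200, 200, 200, 200, 120, 120, 120, 120]
print(count_runs(row), count_runs(blocky))  # 8 2
```

The original row needs 8 runs, while the blocky version needs only 2, so it compresses much better; the small differences between neighboring pixels, however, are lost, and that loss is what shows up as visible blocks.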

Most image, audio, and video file types use various methods and strengths of compression, and compression keeps improving. For example, DVD videos use an older type of compression known as MPEG-2. When DVDs first appeared in the mid-1990s, MPEG-2 was state-of-the-art video compression, and it was also used for digital TV broadcasting. Today, video compression has improved; one popular method is H.264 (also called MPEG-4 AVC), which can compress video to about half the size of MPEG-2 at about the same quality.

Other File Compression

Many other file types, such as plain text documents and the files used by a variety of programs, do not have compression built in. Since they tend to be small (a few hundred kilobytes to maybe a few megabytes), compressing them is not considered as important today, when storage media can hold gigabytes or even terabytes of data. However, if you did need to compress such files, you would use a common compression format like ZIP or SIT, or a TAR archive compressed with gzip (.tar.gz); TAR by itself only bundles files together without compressing them. There are many, many formats; most can be used with free software, but some are proprietary and require paid software.
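If you want to try ZIP compression yourself, Python's standard zipfile module can create archives. This minimal sketch builds a ZIP archive in memory from some deliberately repetitive, made-up text and then reads it back; zip_text and unzip_text are hypothetical helper names, and the file name "notes.txt" is only an example.

```python
import io
import zipfile


def zip_text(name: str, text: str) -> bytes:
    """Put one text file into an in-memory ZIP archive using DEFLATE compression."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, text)  # a str is stored encoded as UTF-8
    return buffer.getvalue()


def unzip_text(archive_bytes: bytes, name: str) -> str:
    """Read the file back out; ZIP is lossless, so the text matches exactly."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name).decode("utf-8")


notes = "Meeting notes: bring the report.\n" * 200  # made-up, very repetitive text
packed = zip_text("notes.txt", notes)
print(len(notes.encode("utf-8")), "bytes ->", len(packed), "bytes")
assert unzip_text(packed, "notes.txt") == notes
```

For .tar.gz files, Python's tarfile module offers a similar interface (for example, opening an archive with mode "w:gz").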