How to Use File Compression

Rick Weerts

Fred, a novice modem user, heads toward the download section of his favorite Bulletin Board System. He pulls off about four files and logs off an hour later. Fred then attempts to run the programs, expecting marvelous things from his microcomputer. But no matter what he does or whom he calls, the files he received refuse to work. Fred is the victim of file compression.

Numerous utilities exist today that allow the modem user to compress a file before sending it to a board or to another user. Compression saves users time and phone bills, and it gives the BBS more room for download files. Nevertheless, unless the files are converted properly after they are downloaded, they are useless, because the computer cannot read them in compressed form. This article attempts to cut through some of the fog surrounding file compression and its effects on the bulletin board community...

The Concept
-----------

Most files on a bulletin board (and most other files on microcomputers in general) contain a lot of text (alphabetic) information. This information, when stored in the standard format known as ASCII, can be read by virtually any computer system operating today. However, now that most telecommunication uses 8 bit protocols, this storage format is wasteful, because ordinary text uses only half of the codes an 8 bit protocol can carry. It is also expensive when telephone time is paid for by the hour.

Before I go on, it is important to explain bit protocols and the information they represent. As most people know, each piece of information in a computer is coded as a series of 1's and 0's, which the computer interprets and acts on. A combination of seven 1's and 0's is called a 7 bit protocol (one bit for each digit in the combination). This protocol uses the ASCII character standard. For you math wizards out there, seven binary digits allow 128 combinations (2 to the 7th power).
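The arithmetic behind the 128 figure (and the 256 that an eighth bit brings) is easy to check. Here is a short Python sketch; Python is only a convenient modern stand-in, and `code_count` is a hypothetical helper named for this example. It counts the codes for 7 and 8 bits and prints a few characters as their numeric codes and 7-bit patterns.

```python
# Each added bit doubles the number of distinct codes: n bits give 2 ** n.
def code_count(bits: int) -> int:
    """Hypothetical helper: how many different codes a given number of bits can hold."""
    return 2 ** bits


print(code_count(7))  # 128 -> codes 0..127, the standard ASCII table
print(code_count(8))  # 256 -> codes 0..255, ASCII plus the high-bit codes

# A character is stored as its number; show a few as 7-bit patterns of 1's and 0's.
for ch in "Fido":
    print(ch, ord(ch), format(ord(ch), "07b"))
# First line printed: F 70 1000110
```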
Each of these 128 numbers stands for a specific character. Together these codes make up the ASCII table, which includes all the alphabetic characters and some special characters the computer generates. The computer deals in the numeric codes, not in the characters themselves.

However, my IBM can generate 256 CHR$(x) codes! And I just said there were only 128? IBM and most other modern computer makers have added an 8th (high) bit to the character codes for their systems. This additional bit doubles the number of possible codes from 128 to 256 (2 to the 8th power), so codes 128 through 255 are added to the original ASCII table's 0 through 127. These additional 128 codes can represent extra symbols or even block graphics characters. I am not going to get into the specifics of bit protocols.

The idea behind file compression (squeezing, compressing, crunching, whatever you want to call it) is to make use of these extra 128 codes, which ordinary text rarely uses. One extra code can stand for twelve spaces, another for sixteen hyphens, or for whatever else the compression software allows. This coding packs more information into the same amount of space. Since a compressed file uses all 256 codes, you must transmit it using an 8 bit modem protocol (8-N-1: eight data bits, no parity, and one stop bit).

Putting it into Action
----------------------

When a file is compressed, it passes through a utility program that squeezes the "white space" and other repetition out of the source document, resulting in a compact file. In some cases, the savings can be more than 50% of the original file size. However, the file is now in a format that the computer cannot use on its own; it takes special software to unsqueeze the file back into its standard form.

In addition, the file may be stored in a library or archive before or after it is compressed. The file is then stored together with other files under one single filename on the disk.
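The idea of letting one spare high code stand for a run of spaces or hyphens can be sketched as a toy run-length encoder. This is a minimal Python illustration under stated assumptions, not the format SQ, SQPC, or ARC actually writes: the marker value 0x90, the minimum run of four, and the names `rle_encode` and `rle_decode` are all choices made for this example.

```python
# Toy run-length encoder: one spare high code (MARKER) flags a run.
# ASSUMPTIONS (hypothetical format, not SQ's or ARC's exact layout):
#   - a run of 4..255 identical bytes becomes MARKER, count, byte
#   - a literal MARKER byte in the data is escaped as MARKER, 0
MARKER = 0x90
MIN_RUN = 4


def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if run >= MIN_RUN:
            out += bytes([MARKER, run, b])   # whole run in three bytes
        elif b == MARKER:
            out += bytes([MARKER, 0]) * run  # escape each literal marker
        else:
            out += bytes([b]) * run          # short run: copy as-is
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != MARKER:
            out.append(data[i])              # ordinary character
            i += 1
        elif data[i + 1] == 0:
            out.append(MARKER)               # escaped literal marker
            i += 2
        else:
            out += bytes([data[i + 2]]) * data[i + 1]  # expand the run
            i += 3
    return bytes(out)


line = b"NAME" + b" " * 12 + b"SIZE" + b"-" * 16  # twelve spaces, sixteen hyphens
packed = rle_encode(line)
print(len(line), len(packed))  # 36 14
assert rle_decode(packed) == line
```

Because a real file may already contain the marker code, the sketch has to escape it at the cost of an extra byte each time; that is one reason the savings vary from file to file.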
Later, the end user can unpack the library or archive and get back each of the original files. The time savings of this clever process are easy to see: compare downloading one 200K file with downloading twenty 10K files, each with its own distinct filename.

File Extensions
---------------

Most disk files have a name of up to eight characters and an extension of up to three characters, separated by a period or slash. Often, the extension indicates which compression or archiving method was used. Below are some common file extensions that BBSs and archiving software use to denote packed files.

File Name     Contents                          Packing Method
==============================================================
FIREFLY.EXE   Original file                     None
FIREFLY.EQE                                     Squeeze (SQ)
FIREFLY.LBR                                     Library (LU)
FIREFLY.LQR                                     Library (LU), then Squeeze (SQ)
FIREFLY.ARC                                     Archive (ARC)

FIDONWS.DOC   Original text file                None
FIDONWS.DQC                                     Squeeze (SQ)
FIDONWS.LBR                                     Library (LU)
FIDONWS.LQR                                     Library (LU), then Squeeze (SQ)
FIDONWS.ARC                                     Archive (ARC)

BOOMERS.BAS   Original BASIC program (binary)   None
BOOMERS.BQS                                     Squeeze (SQ)
BOOMERS.LBR                                     Library (LU)
BOOMERS.LQR                                     Library (LU), then Squeeze (SQ)
BOOMERS.ARC                                     Archive (ARC)

Squeeze and Unsqueeze
---------------------

SQ and SQPC were among the first software packages designed to compress files into the smallest possible form. AUSQ, UNSQ, and NUSQ are their counterparts: they restore squeezed files to their expanded form on request. Squeezed files usually have a Q as the second letter of their three-letter file name extension.

Simply typing AUSQ or SQPC alone at the DOS command line brings up a short help screen that shows how to operate the program. Since there are many such programs on the market, my object is to explain the concept behind them, not how a specific package operates. The documentation that comes with these packages is usually enough to use them successfully.

The Data LIBRARY
----------------

LU.EXE is the original library (LBR) utility. It packs files into (and unpacks them from) one large file. LUed files usually have an LBR file extension.
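The naming conventions in the table above are regular enough to automate. The Python sketch below uses two hypothetical helpers, `squeezed_name` and `unpack_steps`. It assumes names with a three-letter extension and relies only on the conventions described in this article, so a file whose extension merely happens to have a Q in the middle would be misread. For an LQR file it lists the unsqueeze step first, then the library step.

```python
from __future__ import annotations


def squeezed_name(filename: str) -> str:
    """Hypothetical helper: apply the usual squeeze convention (Q as the middle letter)."""
    base, dot, ext = filename.upper().rpartition(".")
    if not dot or len(ext) != 3:
        raise ValueError("this sketch only handles names with a three-letter extension")
    return f"{base}.{ext[0]}Q{ext[2]}"


def unpack_steps(filename: str) -> list[str]:
    """Hypothetical helper: list the steps needed to get usable files back."""
    _, dot, ext = filename.upper().rpartition(".")
    if not dot:
        return []
    if ext == "ARC":
        return ["ARC E (extract every file from the archive)"]
    if ext == "LBR":
        return ["LU (unpack the library)"]
    if ext == "LQR":
        # Squeezed library: unsqueeze first, then unpack.
        return ["AUSQ or NUSQ (unsqueeze)", "LU (unpack the library)"]
    if len(ext) == 3 and ext[1] == "Q":
        return ["AUSQ or NUSQ (unsqueeze)"]
    return []  # looks like an ordinary, unpacked file


for name in ["FIREFLY.EXE", "FIDONWS.DQC", "BOOMERS.LQR", "FIREFLY.ARC"]:
    print(name, unpack_steps(name))

print(squeezed_name("FIREFLY.EXE"))  # FIREFLY.EQE
```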
Likewise, an LQR extension indicates that the file must be unsqueezed (using AUSQ or NUSQ) BEFORE it is unpacked with the library utility. Libraries may also contain other libraries, some of them squeezed beforehand. As you can see, following the file extension standards is important when dealing with such a variety of systems.

Typing LU by itself usually gives a brief summary of its commands. The library utility normally provides a command that extracts all the files from the library into stand-alone files. The LBR file then serves as a packed backup of the information you have unpacked.

It is handy to unpack a library in its own subdirectory (DOS MKDIR command). That way, it is clear which files came out of the library, and you will not confuse them with other files that have similar or identical names. You can then move the files (DOS COPY command) out of the subdirectory or onto the disk of your choosing.

Archiving Systems
-----------------

Finally, we come to ARC.EXE, short for Archive. This handy little utility takes the guesswork out of squeezing files and packing them into an archive (or library). ARC automatically decides the best way to compress each file and then adds it to the archive. When it unpacks a file, ARC also reverses the compression, eliminating the separate unsqueeze step. The archive utility is compact, and it makes the other packing schemes obsolete. Of course, if you have only one file to pack, you may only want to squeeze it; in that case SQ or SQPC comes into play, and the recipient uses AUSQ or NUSQ to unsqueeze it.

Typing ARC at the DOS command line prompt brings up an informative help screen. To unpack all the files from an archive, you type ARC E archive_name. Adding files to an archive is just as simple. Once a file has been archived, there is no need to squeeze it further.
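A quick experiment shows why a second squeeze does not help. ARC's own methods are not in Python's standard library, so this sketch uses the `zlib` module (Deflate, a later method than anything ARC uses) as a stand-in. The word list and random seed are arbitrary choices that simply produce a few kilobytes of repetitive, text-like data.

```python
import random
import zlib

# Build a few kilobytes of word-based text as a stand-in for a typical
# download. The word list and seed are arbitrary; the seed just makes
# the run repeatable.
words = ["modem", "file", "archive", "squeeze", "library", "board",
         "download", "sysop", "the", "a", "of", "to", "and", "is"]
rng = random.Random(1985)
text = " ".join(rng.choice(words) for _ in range(3000)).encode("ascii")

once = zlib.compress(text, 9)    # first pass: repetitive text shrinks a lot
twice = zlib.compress(once, 9)   # second pass: little repetition is left

print("original:", len(text))
print("compressed once:", len(once))
print("compressed twice:", len(twice))

# Both passes are lossless, so two decompressions restore the text.
assert zlib.decompress(zlib.decompress(twice)) == text
```

The first pass shrinks the text a great deal; the second pass, working on data with almost no repetition left, comes out slightly larger than its input.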
The file has already been squeezed about as tightly as it can be, and further attempts at compression usually gain nothing and may even add to the file's size. Files with the extension .ARC are archive files. The advice about unpacking into separate subdirectories still applies. This technique makes packing and unpacking much easier.

What All This Means to Me
-------------------------

Archiving and squeezing are not (in most cases) required before a file is transmitted to a bulletin board. However, most BBS system operators will ARC or squeeze the files they receive from users. Compressing the files ahead of time saves the sysop time and also leaves more room on the BBS for additional download files, which is good for everyone involved. Also, the sysop of a Commodore board, for example, will probably not be able to squeeze files intended for IBM systems.

Compressing files also follows the KISS (Keep It Simple, Stupid!) principle of file transfer. All the files a software system needs to run should be placed under one archive name. The users of the BBS are much more likely to get a working system than if they have to sift through 500 files to find the correct ones for that particular system. It also makes updates easy when you improve the software: archiving ensures there is only one filename to delete and one to add.

Finally, compressing files SAVES MONEY! Compressed files are often less than half their original size, cutting AT&T's share of a long-distance call in half. They are also a boon on pay services such as CompuServe or The Source, where every second of connect time costs money. And the savings run in both directions, as both the sender and the receiver benefit from lower bills. So the next time you send a file to your favorite BBS, do everyone a favor and do a squeeze play on Ma Bell.
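The money argument comes down to transfer time, which is easy to estimate. This sketch assumes an 8-N-1 connection, where each byte takes 10 bits on the line (a start bit, eight data bits, and one stop bit), ignores protocol overhead such as XMODEM block headers, and uses a hypothetical helper named `transfer_minutes`; the 300 and 1200 bps speeds are simply common examples.

```python
# ASSUMPTIONS: an 8-N-1 link, where each byte costs 10 bits on the line
# (1 start bit + 8 data bits + 1 stop bit), and no protocol overhead
# (XMODEM block headers, acknowledgements, retries), so real transfers
# run somewhat longer than these figures.
BITS_PER_BYTE_8N1 = 10


def transfer_minutes(size_bytes: int, bits_per_second: int) -> float:
    """Hypothetical helper: minutes needed to send size_bytes at a given line speed."""
    return size_bytes * BITS_PER_BYTE_8N1 / bits_per_second / 60


original = 200 * 1024       # the 200K file from the example earlier
compressed = original // 2  # assume compression to exactly half size

for bps in (300, 1200):
    before = transfer_minutes(original, bps)
    after = transfer_minutes(compressed, bps)
    print(f"{bps:>5} bps: {before:6.1f} min uncompressed, {after:6.1f} min compressed")
```

At 1200 bps the 200K file takes roughly 28 minutes and the half-size version roughly 14; since long-distance charges are billed by time, the bill shrinks in the same proportion.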
Other Confusing Items
---------------------

After writing this piece, I noticed that I fluttered from one term to another with uncanny regularity. So below is a list of terms and their (my) definitions.

Archive    File(s) squeezed and lumped under a single name by the ARC.EXE package.
ASCII      American Standard Code for Information Interchange; the seven bit character code standard supported by the major microcomputer manufacturers.
Compress   To make a file smaller by shrinking the space it occupies.
Crunch     Same as compress.
Library    File(s) lumped under a single name by LU.EXE or a similar package.
Pack       To place numerous files under a single name with either the ARC or LU utility.
Protocol   Code of 1's and 0's representing characters stored in a computer's memory.
Squeeze    Same as compress.
Unpack     To remove one or more files from a library or archive as stand-alone files.
Unsqueeze  To return a file to its original form.

I hope you will find the above article helpful. If you have any questions, comments, additions, corrections, gripes, etc., please send them to me. I will try to respond as soon as possible. Try to leave a Fido, GEnie, or CompuServe address.

December 4, 1985
Rick Weerts