Thursday, 23 June 2011

Archiving & Compressing

tar, short for Tape Archive, is both a file format and the program used to handle such files. An archive in this format is commonly called a tarball. The format was developed in the early days of Unix and was standardized in POSIX.1-1988 and later in POSIX.1-2001.

It's easy to compress a whole directory on Unix, Linux and BSD systems. This is useful when you are backing up or transferring files over the Internet.

To make a compressed tarball, use this syntax:

tar -zcvf archive_name directory_name


  • z compress the archive with the gzip program
  • c create an archive
  • v verbose, displays progress
  • f use the next argument as the archive file name
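A minimal sketch of the command in action. It builds a throwaway directory (the names project and project.tar.gz are made up for the example), archives it, and then uses the t flag, which lists an archive's contents without extracting, to check the result. Ending the archive name in .tar.gz (or .tgz) is a common convention, not a requirement.

```shell
#!/bin/sh
# Create a gzip-compressed tarball of a directory, then list what went in.
# The names "project" and "project.tar.gz" are hypothetical examples.
tmp=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir -p project/src
echo 'int main(void) { return 0; }' > project/src/main.c
echo 'Read me' > project/README

tar -zcvf project.tar.gz project   # z: gzip, c: create, v: verbose, f: archive name
tar -ztvf project.tar.gz           # t: list the contents without extracting
```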
To decompress and extract an archive, use this syntax:

tar -zxvf archive_name directory_name

where

  • x extract files; the optional directory_name extracts only that directory from the archive
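A short sketch of extraction, assuming the hypothetical names project, project.tar.gz and restore. The first command unpacks everything into another directory using -C (change to that directory before extracting); the second extracts a single named member into the current directory, which is what a trailing name after the archive does.

```shell
#!/bin/sh
# Extract a gzip-compressed tarball: all of it into another directory, or one member.
# The names "project", "project.tar.gz" and "restore" are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir project && echo 'Read me' > project/README
tar -zcf project.tar.gz project && rm -r project   # an archive to work with

mkdir restore
tar -zxvf project.tar.gz -C restore       # x: extract; -C: extract into "restore"
tar -zxvf project.tar.gz project/README   # extract only this member, here
```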
 -----------------------------------
Read on


The basics of file compression in Unix
Why small files can take up more disk space than their size suggests, why one file may be better than two, and how you can free up space on your disk by combining and compressing your files with tar (Tape Archive).

If you list files on your Unix system with ls -l, you get each file's size, but not the amount of space it uses on the disk. To see the space used on disk, add the -s flag: ls -ls. This displays an initial number giving the blocks used by the file. On many systems a block here is a unit of 512 bytes, so 4 blocks would be 2048 bytes (GNU ls on Linux reports in 1024-byte units by default).
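A small sketch comparing the two views of a file. It writes a 3-byte file (the name tiny.txt is hypothetical), shows ls -ls, and then asks du -k, which reports allocated space in 1024-byte units on POSIX systems, how much disk space the file really takes. The allocation you see depends on your filesystem; the last line only does the block arithmetic from the paragraph above.

```shell
#!/bin/sh
# Logical size versus allocated disk space for a tiny file.
# The file name tiny.txt is a hypothetical example.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
f="$tmp/tiny.txt"
printf 'abc' > "$f"                       # exactly 3 bytes, no newline

ls -ls "$f"                               # first column: blocks allocated
size=$(wc -c < "$f" | tr -d ' ')          # logical size in bytes
alloc_kb=$(du -k "$f" | cut -f1)          # allocated space in 1024-byte units
echo "logical size:    $size bytes"
echo "allocated space: $((alloc_kb * 1024)) bytes"

blocks=4                                  # a count in 512-byte blocks
echo "$blocks blocks = $((blocks * 512)) bytes"   # 4 blocks = 2048 bytes
```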

A filesystem allocates space in blocks because tracking the exact size of every file would create an overhead that puts a load on the system. Keep in mind that the filesystem also has to store information about each file, such as its name and other metadata. Without fixed units, files would also become heavily fragmented (parts of a file spread across the filesystem), leading to slow performance.

To handle the problem, a compromise was reached in disk organization: a convenient number of bytes was chosen as the minimum amount that can be allocated to a file, the allocation unit. If a file doesn't fill its allocation unit, the rest is reserved for future growth of that file.

Early Unix systems used 512-byte allocation units. As systems grew, this was increased to 1024 bytes on most Unix systems (and more on some), but many utilities, such as ls, continue to report sizes in 512-byte blocks even though the actual allocation unit is larger.

Black holes
With this in mind, it's easy to see that on a disk with 512-byte blocks a 3-byte file will occupy 512 bytes, one block. If the file grows beyond 512 bytes, it will occupy another block and now use 1024 bytes of your disk.
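The rounding described here is a ceiling division: the file's size is rounded up to a whole number of allocation units. A minimal sketch, assuming the 512-byte unit used in the example; the function name allocated is hypothetical.

```shell
#!/bin/sh
# Disk space taken by a file when space is handed out in whole blocks.
allocated() {
  # $1 = file size in bytes, $2 = allocation unit in bytes.
  # Round up to a whole number of units (ceiling division); 0 bytes needs 0 units.
  echo $(( ( $1 + $2 - 1 ) / $2 * $2 ))
}

for size in 3 512 513; do
  echo "$size-byte file -> $(allocated "$size" 512) bytes on disk"
done
# 3-byte file -> 512 bytes on disk
# 512-byte file -> 512 bytes on disk
# 513-byte file -> 1024 bytes on disk
```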

This adds up to a lot of space on your system. On a system that uses 1024-byte blocks, a 3-byte file leaves 1021 of its 1024 allocated bytes, 99.7 percent, unused. A 1201-byte file occupies two blocks (2048 bytes), so 847 bytes, or 41.4 percent, is wasted. Multiply this by the number of files on your system and you can imagine the vast black holes of disk space.
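The same rounding, with the waste expressed as a percentage of the allocated space, gives the 99.7 and 41.4 percent figures for a 3-byte and a 1201-byte file, assuming a 1024-byte allocation unit. awk is used only because the shell's arithmetic is integer-only and the percentages need a decimal; the function name waste is hypothetical.

```shell
#!/bin/sh
# Percentage of allocated space left unused for a given file size.
waste() {
  # $1 = file size in bytes, $2 = allocation unit in bytes.
  awk -v size="$1" -v unit="$2" 'BEGIN {
    blocks = int((size + unit - 1) / unit)   # whole blocks needed (round up)
    alloc  = blocks * unit
    printf "%d bytes -> %d allocated, %d unused (%.1f%%)\n",
           size, alloc, alloc - size, 100 * (alloc - size) / alloc
  }'
}

waste 3 1024      # 3 bytes -> 1024 allocated, 1021 unused (99.7%)
waste 1201 1024   # 1201 bytes -> 2048 allocated, 847 unused (41.4%)
```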

Remember that this kind of waste matters mainly for small files: a file wastes at most part of one block, so larger files use their allocated space more efficiently. If your system is working well, its allocation unit is probably a good compromise between wasted space and the speed of disk access.

To establish the allocation unit of your system, read the man page for the ls command and look up the block size used by the -s flag.
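Besides reading the man page, you can ask the filesystem for its block size directly. This sketch assumes GNU coreutils stat on Linux, where stat -f -c %S prints the filesystem's fundamental block size, and falls back to the BSD/macOS form stat -f %k, which prints the preferred I/O block size instead; the two are often, but not always, the same.

```shell
#!/bin/sh
# Print the block size of the filesystem holding a directory (default: current).
dir=${1:-.}
if bs=$(stat -f -c %S "$dir" 2>/dev/null); then
  :                                   # GNU coreutils: fundamental block size
else
  bs=$(stat -f %k "$dir")             # BSD/macOS: preferred I/O block size
fi
echo "block size: $bs bytes"
```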

Compression
The pack, compress and gzip compression utilities work well on large files, but perform poorly on small files. When compress is applied to a small file, it recognizes that it can't make the file any smaller and leaves it alone. When it does compress a file, it appends .Z to the file name. To reverse the process, use the uncompress command.
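compress is missing on many modern systems, so this sketch uses gzip instead (an assumption). Unlike compress, gzip does not refuse to write a result that is larger than its input, which makes the effect easy to see: a 3-byte file grows once gzip adds its header and trailer, while a large, repetitive file shrinks dramatically. The function name gz_size is hypothetical.

```shell
#!/bin/sh
# gzip shrinks a large, repetitive file but makes a tiny file bigger.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

gz_size() {
  # Size in bytes of the gzip output; -c writes to stdout and keeps the original.
  gzip -c "$1" | wc -c | tr -d ' '
}

printf 'abc' > "$tmp/small.txt"                                               # 3 bytes
awk 'BEGIN { for (i = 0; i < 10000; i++) print "hello" }' > "$tmp/large.txt"  # 60000 bytes

for f in small large; do
  echo "$f.txt: $(wc -c < "$tmp/$f.txt" | tr -d ' ') bytes -> $(gz_size "$tmp/$f.txt") bytes gzipped"
done
```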

In ls -s output, the smallest size a non-empty file can show is typically two 512-byte blocks, one 1024-byte allocation unit (or whatever the default allocation unit is on your system).

Compressing files with tar
Suppose you have a directory of small files that are rarely used but need to remain on the system. One way to save space is to combine the files and then compress them. The obvious candidate for combining files is the tar (tape archive) utility.

tar reports each file it appends and how many tape blocks it uses, counting in 512-byte blocks.

Looking at the files produced, you might get a shock: the archive may take up even more blocks than the original files did. Fortunately, tar fills the empty space in its blocks with padding in the form of hex zeros (NULs), which compresses very well.
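A quick way to see the padding for yourself: archive a single 3-byte file (the name tiny.txt is hypothetical) and compare sizes. Each member gets a 512-byte header plus its data rounded up to 512 bytes, and the archive ends with zero-filled blocks, so the result is always a multiple of 512 bytes and its tail is all zeros. The exact total depends on your tar's record size.

```shell
#!/bin/sh
# Archive one 3-byte file and compare the archive size with the file size.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc' > "$tmp/tiny.txt"

tar -cf "$tmp/tiny.tar" -C "$tmp" tiny.txt   # -C: archive tiny.txt relative to $tmp
tar_size=$(wc -c < "$tmp/tiny.tar" | tr -d ' ')
echo "tiny.txt: 3 bytes; tiny.tar: $tar_size bytes ($((tar_size / 512)) blocks of 512 bytes)"

# Dump the last bytes of the archive: the padding is all zeros.
tail -c 32 "$tmp/tiny.tar" | od -An -tx1
```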

The next logical step is to compress the tarball. This results in a .tar.Z file that is typically smaller than the files you started with.
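A sketch of the two steps on a directory of ten small files. gzip stands in for compress here (an assumption, since compress is not installed everywhere), so the result is notes.tar.gz rather than notes.tar.Z; the directory name notes and its files are hypothetical.

```shell
#!/bin/sh
# Combine a directory of small files with tar, then compress the archive.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir notes
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "note number $i" > "notes/note$i.txt"   # ten tiny files
done

tar -cvf notes.tar notes   # step 1: combine
gzip notes.tar             # step 2: compress; replaces notes.tar with notes.tar.gz

echo "files on disk: $(du -sk notes | cut -f1) KB"
echo "notes.tar.gz:  $(wc -c < notes.tar.gz | tr -d ' ') bytes"
```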

To restore the files, reverse the process: first run uncompress on the .tar.Z file, then extract the archive with tar -xvf.
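A round-trip sketch of the reverse steps, with gunzip standing in for uncompress (an assumption, as above). It first builds a small archive of a hypothetical notes directory, removes the originals, and then restores them in two steps.

```shell
#!/bin/sh
# Round trip: build notes.tar.gz, then reverse it with gunzip and tar -xvf.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir notes && echo "hello" > notes/a.txt
tar -cf notes.tar notes && gzip notes.tar   # the forward steps
rm -r notes                                 # pretend the originals are gone

gunzip notes.tar.gz   # step 1: decompress, leaving notes.tar
tar -xvf notes.tar    # step 2: extract, recreating notes/a.txt
cat notes/a.txt       # prints: hello
```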


It might be easier to just use the tar command described at the beginning of this guide. However, doing the archiving and compressing as separate steps lets you choose the tools yourself, which makes the process more powerful. And isn't that why we prefer the shell to its desktop alternatives? You can tailor each tarball for the best result with a particular set of files.
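Running the two steps as a pipeline lets you pick the compressor and its options for each archive. This sketch uses gzip -9 (maximum compression) as the example choice; any compressor that reads standard input and writes standard output could be substituted, depending on what you have installed. The names data and data.tar.gz are hypothetical.

```shell
#!/bin/sh
# Pick the compressor yourself by piping tar's output into it.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir data
awk 'BEGIN { for (i = 0; i < 1000; i++) print "line", i }' > data/log.txt

tar -cf - data | gzip -9 > data.tar.gz   # "-": tar writes the archive to stdout
gzip -dc data.tar.gz | tar -tvf -        # decompress to stdout, list the archive from stdin
```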

