File format

From ArticleWorld


A file format is a way of encoding information so that it can be stored in a computer file. The encoding is needed because a file holds only a sequence of bits, which is not directly human-readable.

Most file formats are built around a single type of data. For example, the PNG format is designed to store still images. Others, like QuickTime, are containers that can hold several types of information. Text files are a similar case: they store only plain text, but the purpose of that text can vary.

At the lowest level, however, every file is just bits and bytes, and the format only determines how they are interpreted. For example, one can send a JPEG image to the sound card to be played, and some sound will indeed be heard.
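The point can be illustrated with a minimal sketch that wraps arbitrary bytes in a WAV container using Python's standard wave module. The function name bytes_as_wav and the 8000 Hz sample rate are assumptions chosen for illustration; the input bytes are copied unchanged and simply declared to be 8-bit mono samples.

```python
import io
import wave


def bytes_as_wav(data, sample_rate=8000):
    """Wrap arbitrary bytes in a WAV container as 8-bit mono samples.

    Hypothetical helper: nothing is converted, the bytes are only relabelled.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(1)            # one byte per sample
        wav.setframerate(sample_rate)  # assumed playback rate
        wav.writeframes(data)          # the bytes are written as-is
    return buffer.getvalue()
```

Passing the contents of an image file to this function and saving the result with a .wav name would give a file that audio players accept, although what is heard is noise rather than anything meaningful.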

Specifications

File formats are defined by specifications, which keep the applications that implement a format consistent with one another. Some of these specifications are open to anyone, but others are not, which leads to patent disputes and to incompatibility between programs.

Identifying a file type

Defining file formats is not enough. Programs must also be able to recognize the files they open, especially when they support several formats. There are several ways of identifying a file's type:

  • By extension. This was the usual way of recognizing file types on older systems such as CP/M and DOS, and newer systems such as Windows still use it. It is the simplest method: the file name ends with a period followed by a few letters (traditionally three) that identify the type. For example, text files may end with .txt and GIF files with .gif, and a format's specification usually names its standard extension. The method is very unreliable, however, because a file can be renamed freely, and it is prone to security problems for the same reason.
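  • By magic number. This method is usually more reliable. It works by reading the first few bytes of a file (the length depends on the format, typically between two and eight bytes) and comparing them against a list of known signatures. Because the signature is part of the file's contents, the file is usually recognized correctly even if it has been renamed.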
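The two methods above can be compared with a short sketch. The lookup tables and function names here are hypothetical and deliberately tiny; the byte signatures themselves are the well-known ones for PNG, GIF, JPEG and PDF.

```python
from pathlib import Path

# Hypothetical lookup tables for illustration; real tools use far larger lists.
EXTENSIONS = {
    ".txt": "text",
    ".gif": "gif",
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".pdf": "pdf",
}

# (signature, type) pairs; each signature is matched at the start of the file.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF-", "pdf"),
]


def type_from_extension(name):
    """Guess the type from the file name alone (cheap, but easy to fool)."""
    return EXTENSIONS.get(Path(name).suffix.lower())


def type_from_magic(header):
    """Guess the type from the leading bytes of the file's contents."""
    for signature, kind in MAGIC_NUMBERS:
        if header.startswith(signature):
            return kind
    return None


def identify(path):
    """Return (extension_guess, magic_guess) for a file on disk."""
    with open(path, "rb") as f:
        header = f.read(16)  # longer than any signature in the table
    return type_from_extension(path), type_from_magic(header)
```

When the two guesses disagree, for instance a file named report.txt whose contents begin with the PNG signature, the extension is the one that should not be trusted.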

Other, less popular methods rely on explicit metadata, which is kept separately from both the file's contents and its name. This is more reliable but less portable. The metadata can be stored by the filesystem itself or in extended file attributes, as found on POSIX-like systems.
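As a rough sketch of the extended-attribute approach, the following uses Python's os.setxattr and os.getxattr, which are available only on Linux. The attribute name user.mime_type and the helper names are assumptions made for illustration, and both helpers fall back quietly when the platform or filesystem does not support extended attributes.

```python
import os

# Assumed attribute name for illustration; "user." is the Linux namespace
# in which ordinary users may create attributes.
TYPE_ATTRIBUTE = "user.mime_type"


def tag_file_type(path, mime_type):
    """Try to record the type as an extended attribute; return True on success."""
    if not hasattr(os, "setxattr"):  # not available on this platform
        return False
    try:
        os.setxattr(path, TYPE_ATTRIBUTE, mime_type.encode("utf-8"))
        return True
    except OSError:  # e.g. the filesystem does not support attributes
        return False


def read_file_type(path):
    """Return the recorded type, or None if it is missing or unsupported."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, TYPE_ATTRIBUTE).decode("utf-8")
    except OSError:
        return None
```

The portability cost mentioned above shows up directly: the tag lives outside the file's contents, so it can be lost when the file is copied to a filesystem or sent over a channel that does not carry extended attributes.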