Digital data can represent most media types, whether text, sound, image, moving image, or newer forms such as hypertext or relational databases, in a unified way. In the end, everything is just a bit stream.
In a broad sense, digital data can be defined as anything recorded using a symbol-based code on a medium. Such a code uses a finite set, S, of symbols, for example the Latin alphabet or Egyptian hieroglyphics:
S = {S1, S2, ..., Sn}, n ≥ 2
If n = 2, and thus the code uses only two symbols, it would be called a binary code. Binary codes are the simplest codes; they can be implemented easily by computing machinery (e.g., S = {0, 1}, S = {true, false}, or S = {+5V, -5V}).
The meaning of a symbol often depends on its position within the sequence of symbols. Frequently symbols are combined in groups to form new symbols (commonly known as “words”) which themselves are combined into higher units (“sentences”).
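As a small illustration of these ideas, the following Python sketch writes a short text with the binary alphabet S = {0, 1}, groups the symbols into words, and reads each word positionally. The function names, the 8-symbol word size, and the use of ASCII are assumptions chosen for the example; they are not part of the definition above.

```python
def to_symbols(data: bytes) -> str:
    """Write each byte as eight symbols from the alphabet {0, 1}."""
    return "".join(format(byte, "08b") for byte in data)


def group(symbols: str, size: int = 8) -> list[str]:
    """Combine consecutive symbols into fixed-size groups ("words")."""
    return [symbols[i:i + size] for i in range(0, len(symbols), size)]


def word_value(word: str) -> int:
    """Positional reading: the symbol at position i from the right counts 2**i."""
    return sum(int(symbol) << i for i, symbol in enumerate(reversed(word)))


symbols = to_symbols("Hi".encode("ascii"))
words = group(symbols)
print(symbols)  # 0100100001101001
print(words)    # ['01001000', '01101001']
print(bytes(word_value(w) for w in words).decode("ascii"))  # Hi
```

The same symbol "1" contributes 1 at the rightmost position of a word and 128 at the leftmost, which is what is meant by the meaning of a symbol depending on its position.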
Reading and understanding a digital code requires two distinct steps:
- The file format has to be identified.
- The appropriate syntactic and semantic rules of the file format have to be applied to interpret the digital code.
If a data file is identified as being a TIFF image file, but the specification of the TIFF format is not known, the image represented by the data cannot be extracted. Therefore, if the semantic system of a digital code cannot be identified or is not known, the information contained in the digital data cannot be extracted.
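To make the TIFF example concrete, here is a minimal Python sketch that interprets only the 8-byte TIFF file header: a two-byte byte-order mark, the version number 42, and the offset of the first image directory. The function name `read_tiff_header` and the sample byte strings are assumptions made for illustration; a real reader has to apply many more rules from the specification.

```python
import struct


def read_tiff_header(header: bytes) -> tuple[str, int, int]:
    """Interpret the first 8 bytes of a TIFF file.

    Returns (byte order, version number, offset of the first image directory).
    """
    if len(header) < 8:
        raise ValueError("a TIFF header is 8 bytes long")
    order_mark = header[:2]
    if order_mark == b"II":
        fmt, order = "<HI", "little endian"
    elif order_mark == b"MM":
        fmt, order = ">HI", "big endian"
    else:
        raise ValueError("not a TIFF byte-order mark")
    version, ifd_offset = struct.unpack(fmt, header[2:8])
    if version != 42:
        raise ValueError("unexpected TIFF version number")
    return order, version, ifd_offset


# Hypothetical sample headers, one per byte order.
little = bytes.fromhex("49492a0008000000")
big = bytes.fromhex("4d4d002a00000008")
print(read_tiff_header(little))  # ('little endian', 42, 8)
print(read_tiff_header(big))     # ('big endian', 42, 8)

# Applying the wrong rule to the same bytes yields a different "meaning":
print(struct.unpack(">HI", little[2:8]))  # (10752, 134217728)
```

The last line shows why the rules matter: the bytes are unchanged, but read with the wrong byte order they no longer say "version 42, directory at offset 8".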
This leads to the following prerequisites for retrieving digital data:
- The physical property used to create the marks has to be known. For current media types the physical property used to create the marks is usually known. We know that a floppy disk has magnetic marks and that a compact disc has optically detectable marks. However, future digital archeologists might have problems determining which physical property has been used to create the marks, especially for new, emerging recording technologies.
- The physical marks on the medium must be detectable and convertible into symbols. If, as a result of damage and aging, this is no longer possible, the medium has to be considered “destroyed” and unreadable.
- The syntactic and semantic system (file format) has to be identified and known.
If any of these tasks cannot be accomplished, the digital data will no longer be readable and the recorded information is lost.
Digital data is independent of the medium it is recorded on as long as the symbols can be deciphered. For example, a binary computer file representing an image in the JPEG format could be engraved into stone; this would not be very handy to work with, but it is nevertheless feasible. Thus, digital data can be copied from one medium to any other medium without loss.
Digital data can be copied without any loss by reproducing the same sequence of symbols from the “original” sequence. The two copies will be indistinguishable from each other and therefore it is not possible to determine which one is the “original.” However, since the physical representation of a digital code always has an analog nature that may result in errors, the digital copy process is only completed if the two copies have been verified to be identical either by a symbol-wise (or, in case of binary data, bit-wise) comparison or by using checksums. Therefore, digital data can be copied without limits and there will be no generational loss.
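The verification step described above can be sketched in Python in both variants: a bit-wise comparison of the two files and a comparison of checksums. The function names, the choice of SHA-256 as the checksum, and the 1 MiB chunk size are assumptions made for this example.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_bytes(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> bool:
    """Compare two regular files byte by byte (and therefore bit by bit)."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a, chunk_b = a.read(chunk_size), b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True


def verified_copy(src: str, dst: str) -> None:
    """Copy src to dst and treat the copy as complete only after verification."""
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise OSError("copy does not match the original")


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    copy = os.path.join(tmp, "copy.bin")
    with open(original, "wb") as f:
        f.write(bytes(range(256)) * 16)
    verified_copy(original, copy)
    print(identical_bytes(original, copy))  # True
```

A checksum comparison is convenient when the two copies are on different machines, since only the short checksum has to be exchanged; the bit-wise comparison needs access to both copies at once.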
Digital data can be transported through space at the speed of light without moving atoms or matter, for example as electromagnetic signals. This property allows digital data to be tele-copied without loss.
Table of common image file formats with “magic numbers”
File Type | Typical Extension | Hex Digits (xx = variable) | ASCII Digits |
---|---|---|---|
GIF | .gif | 47 49 46 38 | GIF8 |
FITS | .fits | 53 49 4d 50 4c 45 | SIMPLE |
Bitmap | .bmp | 42 4d | BM |
Graphics Kernel System | .gks | 47 4b 53 4d | GKSM |
IRIS rgb | .rgb | 01 da | . . |
ITC (CMU WM) | .itc | f1 00 40 bb | . . . . |
JPEG File Interchange | .jpg | ff d8 ff e0 | . . . . |
NIFF (Navy TIFF) | .nif | 49 49 4e 31 | IIN1 |
PM | .pm | 56 49 45 57 | VIEW |
PNG | .png | 89 50 4e 47 | .PNG |
Postscript | .[e]ps | 25 21 | %! |
Sun Rasterfile | .ras | 59 a6 6a 95 | Y.j. |
Targa | .tga | xx xx xx | . . . |
TIFF (Motorola—big endian) | .tif | 4d 4d 00 2a | MM.* |
TIFF (Intel—little endian) | .tif | 49 49 2a 00 | II*. |
X11 Bitmap | .xbm | xx xx | |
XCF Gimp file structure | .xcf | 67 69 6d 70 20 78 63 66 20 76 | gimp xcf v |
Xfig | .fig | 23 46 49 47 | #FIG |
XPM | .xpm | 2f 2a 20 58 50 4d 20 2a 2f | /* XPM */ |
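
The table can be used for a first guess at the file format by comparing the leading bytes of a file with the known magic numbers. The following Python sketch does this for a subset of the rows; the names `MAGIC_NUMBERS` and `identify` are assumptions made for the example, and formats whose leading bytes are variable (Targa, X11 Bitmap) cannot be recognized this way. A match is only a hint: the rest of the file still has to follow the rules of the format.

```python
from typing import Optional

# (magic number, file type) pairs taken from a subset of the table rows.
MAGIC_NUMBERS = [
    (bytes.fromhex("47494638"), "GIF"),
    (bytes.fromhex("53494d504c45"), "FITS"),
    (bytes.fromhex("424d"), "Bitmap"),
    (bytes.fromhex("ffd8ffe0"), "JPEG File Interchange"),
    (bytes.fromhex("89504e47"), "PNG"),
    (bytes.fromhex("2521"), "Postscript"),
    (bytes.fromhex("4d4d002a"), "TIFF (Motorola, big endian)"),
    (bytes.fromhex("49492a00"), "TIFF (Intel, little endian)"),
    (bytes.fromhex("23464947"), "Xfig"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the file type whose magic number starts the data, else None."""
    # Try longer signatures first so a short one cannot shadow a longer one.
    for magic, file_type in sorted(MAGIC_NUMBERS, key=lambda m: len(m[0]), reverse=True):
        if data.startswith(magic):
            return file_type
    return None


print(identify(bytes.fromhex("89504e470d0a1a0a")))  # PNG
print(identify(b"GIF89a"))                          # GIF
print(identify(b"\x00\x00\x00\x00"))                # None
```

In practice only the first few bytes of a file need to be read, for example with `open(path, "rb").read(16)`, before calling `identify`.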