30th November 2001 |
Introduction
This document describes a proprosed extension to the TIFF file format which allows
random access to images within an arbitrarily large file.
A TIFF file contains an arbitrary number of entities called Image File Directories
(IFDs). Each IFD contains a single image and its attributes. The IFDs comprising
a TIFF file are arranged and stored as a singly-linked list which provides forward-sequential
access only. In a file containing 7000 images, to find and extract the 6900th, it
is necessary to traverse the chain of IFDs from the beginning. The IFDs are variable-length
structures and the link from one IFD to the next is at the end of each structure.
Stepping through 6900 images to find the one you want can be time-consuming. The
need to retrieve small groups of images from TIFF files containing as many as 40,000
images prompted the scheme described herein.
Overview
An IFD index comprising an extensible list of offsets is written to the TIFF
file as it is created. The offsets point directly at the IFDs in one-to-one correspondence.
The list is extended and updated whenever images are appended to the TIFF file.
Technical details
The IFD index is a singly-linked list of variable-length index blocks. Each block
contains a 4-byte header, an array of IFD offsets and a 4-byte link to the next
index block. The table below describes a single index block.
Address in block |
Data type |
Description |
Origin+0 |
TIFF_SHORT |
c = Capacity of current block |
Origin+2 |
TIFF_SHORT |
u = Number of index block entries used |
Origin+4 |
TIFF_LONG |
p[0] = IFD position of first img/doc |
... |
... |
... |
Origin+4u |
TIFF_LONG |
p[u-1] = IFD position of last img/doc |
Origin+4u+4 |
TIFF_LONG |
First free index slot in this block |
... |
... |
... |
Origin+4c |
TIFF_LONG |
Last free index slot in this block |
Origin+4c+4 |
TIFF_LONG |
Link to next index block |
In general the link to the next index block will be zero when u < c but code
should not rely upon this to determine the end of an index block chain. Instead,
index blocks should be read until a zero link is encountered.
The reason for having a linked list of index blocks rather than a single, contiguous
block is that an existing TIFF can be extended arbitrarily. If the index were a
single, contiguous block then file space would have to be allocated at maximum size
when the TIFF file is created. Allowance could be made for growth; for example,
when creating a two-image TIFF, space for 20 or 200 indices could be set aside,
but what if 800 or 2000 images were added? When creating a TIFF file there is no
way to know its ultimate size and hence the amount of space to reserve for an index.
Chaining index blocks together solves the extensibility problem, allowing for unconstrained
growth. Of course the flexibility has a price in terms of considerable additional
complexity in the library code but we pay that cost just once rather than every
time a TIFF file is written so it is highly amortised.
To avoid over-allocation of index table space while limiting the number of index
blocks a TIFF writer can allocate blocks of increasing size so that small TIFF files
have a small overhead. For example, the first index block might contain space for
8 entries. The first expansion might be to 20 entries with subsequent expansions
each 50% larger than the previous one up to a maximum of 2000 entries per block.
File signature
Support for index blocks is signalled via a header extension:
| ||||||||||||
|
With this structure, an index-aware TIFF reader can load the index table with
just a few reads from the file, thereafter images can be accessed in constant time.
The impact on the size of the TIFF file is small, just four bytes per image and
an eight-byte overhead for each index block. The coding required to load an index
is fairly straightforward. Writing index tables is rather more complex, particularly
when extending an existing TIFF.
Extra benefit
The normal method of appending to a TIFF is to write new images at the end of
the file but link them to the front of the IFD chain so that the new images logically
precede the older ones. Wheras this produces correctly-formatted TIFF files, the
disturbance of the logical ordering could cause difficulty for some systems. A side-effect
of index blocks is to allow for sequential extensions to a TIFF, preserving the
logical order of IFDs.
Compatibility
A TIFF reader which is unaware of index blocks will never notice their presence.
The IFD link is completely intact. This means that third-party and commercial software
will continue to work with TIFF files containing index blocks.
A TIFF writer which is unaware of the index blocks and which appends to an indexed
TIFF is guaranteed to stuff it up. There will be IFDs in the file which are not
randomly accessible.
Implementations
This code has not been added to libtiff. The only implementation to date is by
the author and is in C++. It is built on a TIFF library also in C++ which is completely
independent of libtiff, albeit more limited in scope.