Random access TIFF files

Random-access TIFF files

Jon Saxton
30th November 2001

Introduction

This document describes a proprosed extension to the TIFF file format which allows random access to images within an arbitrarily large file.

A TIFF file contains an arbitrary number of entities called Image File Directories (IFDs). Each IFD contains a single image and its attributes. The IFDs comprising a TIFF file are arranged and stored as a singly-linked list which provides forward-sequential access only. In a file containing 7000 images, to find and extract the 6900th, it is necessary to traverse the chain of IFDs from the beginning. The IFDs are variable-length structures and the link from one IFD to the next is at the end of each structure. Stepping through 6900 images to find the one you want can be time-consuming. The need to retrieve small groups of images from TIFF files containing as many as 40,000 images prompted the scheme described herein.

Overview

An IFD index comprising an extensible list of offsets is written to the TIFF file as it is created. The offsets point directly at the IFDs in one-to-one correspondence. The list is extended and updated whenever images are appended to the TIFF file.

Technical details

The IFD index is a singly-linked list of variable-length index blocks. Each block contains a 4-byte header, an array of IFD offsets and a 4-byte link to the next index block. The table below describes a single index block.

Index block layout

Address in block
Data type
Description

Origin+0
TIFF_SHORT
c = Capacity of current block

Origin+2
TIFF_SHORT
u = Number of index block entries used

Origin+4
TIFF_LONG
p[0] = IFD position of first img/doc

...
...
...

Origin+4u
TIFF_LONG
p[u-1] = IFD position of last img/doc

Origin+4u+4
TIFF_LONG
First free index slot in this block

...
...
...

Origin+4c
TIFF_LONG
Last free index slot in this block

Origin+4c+4
TIFF_LONG
Link to next index block

**Index block layout**
Address in block	Data type	Description
Origin+0	TIFF_SHORT	c = Capacity of current block
Origin+2	TIFF_SHORT	u = Number of index block entries used
Origin+4	TIFF_LONG	p[0] = IFD position of first img/doc
...	...	...
Origin+4u	TIFF_LONG	p[u-1] = IFD position of last img/doc
Origin+4u+4	TIFF_LONG	First free index slot in this block
...	...	...
Origin+4c	TIFF_LONG	Last free index slot in this block
Origin+4c+4	TIFF_LONG	Link to next index block

In general the link to the next index block will be zero when u < c but code should not rely upon this to determine the end of an index block chain. Instead, index blocks should be read until a zero link is encountered.

The reason for having a linked list of index blocks rather than a single, contiguous block is that an existing TIFF can be extended arbitrarily. If the index were a single, contiguous block then file space would have to be allocated at maximum size when the TIFF file is created. Allowance could be made for growth; for example, when creating a two-image TIFF, space for 20 or 200 indices could be set aside, but what if 800 or 2000 images were added? When creating a TIFF file there is no way to know its ultimate size and hence the amount of space to reserve for an index. Chaining index blocks together solves the extensibility problem, allowing for unconstrained growth. Of course the flexibility has a price in terms of considerable additional complexity in the library code but we pay that cost just once rather than every time a TIFF file is written so it is highly amortised.

To avoid over-allocation of index table space while limiting the number of index blocks a TIFF writer can allocate blocks of increasing size so that small TIFF files have a small overhead. For example, the first index block might contain space for 8 entries. The first expansion might be to 20 entries with subsequent expansions each 50% larger than the previous one up to a maximum of 2000 entries per block.

File signature

Support for index blocks is signalled via a header extension:

Standard TIFF header

Bytes
Data type
Description

0-1
TIFF_SHORT
II for little-endian, MM for big-endian

2-3
TIFF_SHORT
42

4-7
TIFF_LONG
Pointer to first IFD

TIFF header extension

Bytes
Data type
Description

8-11
TIFF_LONG
3735929054, indexed TIFF signature

12-15
TIFF_LONG
Pointer to first index block

With this structure, an index-aware TIFF reader can load the index table with just a few reads from the file, thereafter images can be accessed in constant time. The impact on the size of the TIFF file is small, just four bytes per image and an eight-byte overhead for each index block. The coding required to load an index is fairly straightforward. Writing index tables is rather more complex, particularly when extending an existing TIFF.

Extra benefit

The normal method of appending to a TIFF is to write new images at the end of the file but link them to the front of the IFD chain so that the new images logically precede the older ones. Wheras this produces correctly-formatted TIFF files, the disturbance of the logical ordering could cause difficulty for some systems. A side-effect of index blocks is to allow for sequential extensions to a TIFF, preserving the logical order of IFDs.

Compatibility

A TIFF reader which is unaware of index blocks will never notice their presence. The IFD link is completely intact. This means that third-party and commercial software will continue to work with TIFF files containing index blocks.

A TIFF writer which is unaware of the index blocks and which appends to an indexed TIFF is guaranteed to stuff it up. There will be IFDs in the file which are not randomly accessible.

Implementations

This code has not been added to libtiff. The only implementation to date is by the author and is in C++. It is built on a TIFF library also in C++ which is completely independent of libtiff, albeit more limited in scope.

[ Home Page ]

Visitors: