Image Montage Zoomer


A image montage is an image made up of a number of smaller photographs. Together, the smaller photographs look like the one single bigger photograph. An example of a image montage is this movie poster from "The Truman Show:"

The image montage zoomer will be a screensaver which will display an image montage on the screen. The screensaver will then zoom into that image montage "forever." More specifically, as the zoomer magnifies one part of the image montage, one of the smaller pictures will grow until it becomes visible that the smaller picture is another image montage, and so on.

The idea behind the image montage zoomer came from a discussion with Charless Fowlkes where he proposed a video version of the zoomer involving video clips from the point of view of someone running through a forest. The simpler image montage zoomer idea came from a later discussion. We will be working on this project jointly, and this page serves as a place for me to write down my thoughts in order to help me clarify them.


  • One obvious plentiful and possibly interesting source of data for the image montage zoomer would be a library of personal digital photos. JPEG decompression takes about 1 second per 2 megapixel photo on my P-III 500 MHz computer. Assuming 1000 library photos used to create the montages, it would take more than 16 minutes to load all of the images from their original compressed state.
  • Consider a representation of each image in our collection as a 16-bit 256 × 256 bitmap. Each such image would consume 128KB of memory. A collection of 1000 such images would consume 125MB. It is not reasonable to store all of the images in physical memory.
  • As the window "zooms" forever into the first image, the corners of the rectangle represented by the edges of the screen will trace out a path through the original image. That path should be at least C1 continuous (continuous first derivative) to avoid the perception of abrupt changes in motion. It is possible that C2 (continuous second derivative) continuity will be needed. We will need to pick zoom targets in advance in order to calculate such C1 and/or C2 paths.

Image Montage Architecture

In this section, we describe the workings of the system.

General Algorithm and Definitions

(not written yet)

OpenGL Rendering [proposed]

OpenGL can handle transparency and scaling of images in a platform independent, hardware accelerated manner, which makes it an attractive option for rendering the montages. This section describes the use of the OpenGL library to implement the Image Montage Zoomer.

We create a scene containing the following coplanar elements.

  1. Rectangle #1, the largest image currently visible (black rectangle)
  2. Montage #1, NUMTILESX × NUMTILESY rectangles for the montage of Rectangle #1 (blue rectangles)
  3. Rectangle #2, the image in Montage #1 towards which we are zooming (green rectangle)
  4. Montage #2, NUMTILESX × NUMTILESY rectangles for the montage of Rectangle #2 (red rectangles)

Our scene. NUMTILESX = 4, NUMTILESY = 4

We start initially with Rectangle #1 filling the screen. After some time, we have smoothly shrunk and moved the displayed area such that Rectangle #2 is filling the screen.

At this point, we do the following steps:

  1. Reset the projection back to its initial settings, such that Rectangle #1 is again filling the screen.
  2. Take the texture that was mapped onto Rectangle #2 and map it onto Rectangle #1. Do the same for each of the rectangles in Montage #2 and Montage #1.
  3. Pick a new image in Montage #1 to zoom at.
  4. Move Rectangle #2 and Montage #2 to juxtapose the new zoom target.
  5. Download

Issue: This last set of steps seems will be significantly more expensive than simply re-calculating the projection window. This may cause a momentary pause in the animation each time a new image reaches the point of filling the screen. It may be possible to smooth out these operations by adding in a few new textures and completely transparent rectangles in their new locations each animation frame, instead of waiting until we have zoomed all the way into Rectangle #2. Alternatively, the motion path of the window could be a function of wall clock time such that the lengthy calculation simply causes the motion of the screen to jump ahead rather than momentarily pause.

Image Cache

One obvious plentiful and possibly interesting source of data for the image montage zoomer would be a library of personal digital photos. Consider a library of 1000 2 megapixel photos, stored with "high quality" JPEG compression. On a P-III 500 MHz computer, decompressing one such library would take more than 15 minutes at 1 second per photo. A 15 minute startup time for a screensaver is clearly unacceptable, so it will not be possible to load all of the images at startup.

Furthermore, consider a representation of each of the images as a 16-bit 256 × 256 bitmap. Each such image would consume 128KB of memory. 1000 such images would consume 125MB. Storing all of the images in memory, even at this low resolution, would consume too much physical memory.

To rectify these issues, we introduce a three-tier image information cache. The tiers of the cache are as follows.

  1. Mosaic Summary
  2. Small Texture
  3. Full Texture
Each tier of the cache can be generated from the subsequent tier. For example, the Mosaic Summary can be generated from the Small Texture, and the Small Texture can be generated from the Full Texture. The Full Texture is generated from the original image, the storage of which is not managed by the cache. Each tier is described in its own subsection below.

Other parts of the application can request either the Mosaic Summary, the Small Texture, or the Full Texture from the Image Cache. If said information is already available, the Image Cache simply returns it. However, if the information has not yet been calculated, the Image Cache will cacluate the information on demand. The calculated version is then stored internally (as descrbied below) to service future requests.

Tier 1: Mosaic Summary

This is a summary, stored in memory, of the color information in the image, used in the Montage Algorithm.

The Mosaic Summary is represented as a single triple {R, G, B}. R is calculated by averaging all of the red components in each of the pixels in the Small Texture. Similarly, G is the average green component, and B is the average blue component.

Tier 2: Small Texture

The Small Texture is the Full Texture image, scaled down to a width of TEXTURE_WIDTH / NUM_TILES_X and a height of TEXTURE_HEIGHT / NUM_TILES_Y. It is used for the images in the montage that will never be displayed at high resolution.

The Small Texture tier is stored in memory for quick access. If the maximum collection size that we wish to support makes this infeasible, it may be possible to move some or all of the Small Texture tier to the disk, in uncompressed format.

Tier 3: Full Texture

This is the original image scaled to the width of TEXTURE_WIDTH and the height of TEXTURE_HEIGHT, ignoring the original aspect ratio. This image is stored in compressed format on the disk in a special cache directory.

Motion Path Calculation

The motion path problem can be viewed as follows:

The screen saver is generating an infinite sequence of points in 2-D space corresponding to the corners of each of the images that it is zooming toward. A motion path will need to be calculated based on this sequence of points which:

  1. must pass through every point, in order
  2. will ideally have a continuous second derivative such that the motion does not change direction suddenly
  3. must be computable without the entire sequence of points. obviously the motion path will never be calculated if we must produce the entire infinite sequence first
  4. can be plotted for points whose distances apart along the motion path are a geometric series.
"natural splines" satisfy (#1, #2, #4), "B-splines" satisfy (#2, #3, #4), and "bezier curves" can be trivially constrained to satisfy (#1, #3, and #4). It appears that "A-splines" may satisfy (#1, #2, #3). More investigation is needed to see if they also satisfy #4.

One option is to drop requirement 2 and use bezier curves constrained to be C1 continuous. They look simple and quick to calculate and C1 continuity might be good enough for our purposes.

Software Rendering

Not all computers will be equipped with an OpenGL compatible card. Thus we implement a all-software rendering routine in case it is not possible to run the program at full speed using OpenGL. This will allow anyone to get at least an idea of what the software does.

We visualize our scene as containing the following elements:

  1. Rectangle #1, the largest image for which a piece is currently visible (black rectangle)
  2. Montage #1, NUMTILESX × NUMTILESY rectangles for the montage of Rectangle #1 (blue rectangles)
  3. Rectangle #2, the image in Montage #1 towards which we are zooming (green rectangle)
  4. Montage #2, NUMTILESX × NUMTILESY rectangles for the montage of Rectangle #2 (red rectangles)

Our scene. NUMTILESX = 4, NUMTILESY = 4

Currently, the software renderer currently only renders Rectangle #1 and Montage #1. We calculate, for each frame, a "virtual" rectangle, within the coordinate space of Rectangle #1, which will represent the edges of the physical screen.

We then scan convert the entire screen into Rectangle #1 and Montage #1 coordinates. if the destination coordinate is changing by at most 1 for every change in screen coordinates, we use Bresenham's line drawing algorithm. However, if the destination coordinate changes by more than 1 for every change in screen coordinates, we use a modified version of Bresenham's Algorithm derived as follows.

Let Δx = x2 - x1
Let Δy = y2 - y1
Let m = Δy / Δx

   e = 0
   y = y1
   for x = x1 to x2 do
      plot (x, y)
      if (e + frac(m) < 0.5)
         e = e + frac(m)
         y = y + int(m)
         e = e + frac(m) - 1
         y = y + int(m) + 1

If Δy and Δx are integers, we know that:

= frac(Δy / Δx)
= (Δy % Δx)/(Δx)


= int(Δy / Δx)
=Δy / Δx

We convert the if guard to integer math by multiplying by 2 × Δx and making the replacements: e' = eΔx, r = (Δy % Δx)

e + frac(m) < 0.5
e + (Δy % Δx)/(Δx) < 0.5
eΔx + (Δy % Δx) < 0.5 Δx
2 (eΔx + (Δy % Δx)) < Δx
2 (e' + (Δy % Δx)) < Δx
2 (e' + f) < Δx

We make the same replacements in the update calculations for e.

e = e + frac(m)
e Δx = e Δx + frac(m) Δx
e' = e' + frac(m) Δx
e' = e' + (Δy % Δx)/(Δx) Δx
e' = e' + (Δy % Δx)
e' = e' + f

e = e + frac(m) - 1
e Δx = e Δx + frac(m) Δx - Δx
e' = e' + frac(m) Δx - Δ x
e' = e' + (Δy % Δx)/(Δx) Δx - Δx
e' = e' + (Δy % Δx) - Δx
e' = e' + f - Δx

Which gives us the modified Bresenham's algorithm as follows:

   e' = 0
   y = y1
   for x = x1 to x2 do
      plot (x, y)
      e' = e' + f
      y = y + dy/dx
      if ((e' << 1) >= dx)
         e = e - dx
         y = y + 1

Montage Algorithm

Start with a set of images S. Each image in the set has:
  • A name, which is unique to that image
  • A width, a positive integer number
  • A height, a positive integer number
  • width × height pixel color values

We also define a "mosaic summary" and a "summary closeness" function. We may wish to explore different possible "mosaic summary" and "summary closeness" functions at a later time. For now, for the purposes of describing this algorithm, we pick mosaic summary and summary closeness as follows:

The "mosaic summary" is the average color of an image, calculated by taking the arithmetic mean of all of the color values in the image.

The "summary closeness" function is the perceptual distance between two "mosaic summary" colors.

The algorithm has three steps:

  1. Calculation of "mosaic summaries" for all of the images in S
  2. Calculation of sub-"mosaic summaries" for different sub-images in S
  3. Matching of sub-"mosaic summaries" to full image "mosaic summaries" using the "summary closeness" function

Calculation of "mosaic summaries"

Calculation of sub-"mosaic summaries"

Matching using "summary closeness"


  1. Enlightenment Project. Imlib2.
  2. Flanagan, Colin. The Bresenham Line-Drawing Algorithm.
  3. Gnome Project. GdkPixbuf.
  4. Google search on Natural Splines.
  5. ImageMagick.
  6. Microsoft Corporation. Handling Screen Savers.
  7. Perry, Matt. Fireflies Screensaver.
  8. Riemersma, Thiadmer. Colour metric.
  9. Silvers, Robert.
  10. Troebs, Michael. Algorithms to create Image Montages. May 30, 2000.
  11. Troebs, Michael. Juggle.
  12. Tugrul Galatali. Really Slick Screensavers Port to GLX.
Subscribe to All Posts - Wesley Tanaka