Cris’ Image Analysis Blog

theory, methods, algorithms, applications

ITK's architecture

This morning I was reading a chapter about the architecture of ITK in a book called The Architecture of Open Source Applications (Volume 2). ITK is the Insight Toolkit, a large library for image analysis in C++, specifically aimed at medical applications. Most of its functions work on images of any dimensionality, just like in DIPlib. But just about every other choice made in this library is the polar opposite of the choices made in DIPlib. That is why I was curious to learn why those choices were made.

The book chapter was written by Luis Ibáñez and Brad King. Luis is one of the main architects of ITK, and Brad is one of the original developers. ITK’s development started in 1999 with lots of funding from the US government. The chapter was written maybe 10 or 15 years ago, when ITK was already well established.

Reading this chapter, I saw some statements that I disagreed with; I guess that is the main reason I’m writing this blog post now. I am going to address these statements, and their justification for some of the design choices, out of order from how they appear in the chapter, so that my comments make more sense. Obviously, my comments will make lots of comparisons to DIPlib.

I once tried using ITK. I spent a week writing a simple function to align two images, then gave up and implemented a rigid alignment from scratch in a day. That is to say, ITK has a very steep learning curve. Keep that in mind when you read my comments below.

Object-oriented programming

Now, some of my criticism of ITK comes from my dislike of the modern interpretation of object-oriented programming (OOP): “everything is a class”. The Java programming language, for example, is designed around this mentality; in Java you cannot even write a function outside a class. ITK’s architecture revolves around the “everything is a class” mentality. This makes my hair stand on end when I see ITK code. But I will try to separate my feelings about this programming paradigm from all the other comments I make in this blog post.

OOP

OOP is supposed to simplify the construction of large, complex programs by dividing them up into smaller, minimally dependent portions. Each portion is an object that communicates with other objects through a shared API. This shared API needs to be minimal, since it is the only thing that creates dependencies between the components. This division reduces the complexity of the application: pieces of code can be read and modified independently of other pieces of code. Nowhere in this definition does it say or imply that the individual components must be made using classes, that classes must inherit behavior or interfaces from other classes, or that the OOP paradigm is even suitable for writing a library.

Functionality

The article starts by explaining things about the field of image analysis, and what an image analysis library needs to deal with.

In this context, a certain level of “redundancy”—for example, offering three different implementations of the Gaussian filter—is not seen as a problem but as a valuable feature, because different implementations can be used interchangeably to satisfy constraints and exploit efficiencies with respect to image size, number of processors, and Gaussian kernel size that might be specific to a given imaging application.

This is absolutely right. Different algorithms that compute the same thing in different ways are not necessarily redundant. The example of the Gaussian filter (DIPlib also has 3 different algorithms) is the most obvious one: The IIR algorithm is most efficient for a very large sigma, but is an approximation. The FIR algorithm is most efficient for the more common sigma values. The FT algorithm is the only way you can compute the filter for a sub-pixel sigma with any semblance of precision.
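To make the trade-off concrete, here is a rough sketch of the kind of selection logic this enables. The thresholds are illustrative only, they are not the ones DIPlib actually uses:

#include <string>

// Pick a Gaussian filter implementation based on sigma. The thresholds below
// are made up for illustration; a real library tunes them by benchmarking.
std::string PickGaussMethod(double sigma) {
    if (sigma < 0.8) {
        return "FT";   // sub-pixel sigma: only the Fourier-domain implementation is accurate
    }
    if (sigma > 10.0) {
        return "IIR";  // very large sigma: recursive approximation, cost independent of sigma
    }
    return "FIR";      // common case: direct convolution with a truncated kernel
}

DIPlib does this choice inside dip::Gauss, which, if I remember right, defaults to picking the method based on the sigma values. The caller asks for “a Gaussian filter with this sigma”, and the dispatcher worries about which implementation to run.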

[…] ITK has a collection of about 700 filters. Given that ITK is implemented in C++, this is a natural level at which every one of those filters is implemented by a C++ Class following object-oriented design patterns.

I’ve already said that I don’t consider “everything is a class” to be what OOP is about, but OK, let’s set that aside. I don’t think that making each filter a class is a good idea, because it makes the user’s code so much more complex. Here’s an example program from the ITK documentation:

const auto input = itk::ReadImage<ImageType>(inputFileName);

using FilterType = itk::SmoothingRecursiveGaussianImageFilter<ImageType, ImageType>;
auto smoothFilter = FilterType::New();
smoothFilter->SetSigma(sigmaValue);
smoothFilter->SetInput(input);

itk::WriteImage(smoothFilter->GetOutput(), outputFileName);

I’ve left out all the include statements, input parsing, exception handling, and the definition of ImageType (which we’ll talk about later). Also, I’m honestly surprised that itk::ReadImage and itk::WriteImage are not classes. Apparently they are convenience functions introduced in release 5.2.0 (May 2021). They just encapsulate the creation and use of the image reading and writing objects.

Anyway, the above is a very simple program: it reads an image, applies a Gaussian filter, and writes the result to file. The same program in DIPlib:

auto input = dip::ImageRead(inputFileName);
auto output = dip::Gauss(input, {sigmaValue});
dip::ImageWrite(output, outputFileName);

As a library user, which would you prefer to write?

Sure, it is more explicit to do smoothFilter->SetSigma(sigmaValue) than having sigmaValue be the second input argument to a function. But when being explicit causes you to have to write four lines of code instead of one… sigh. For functions with lots of parameters, the C++ call syntax can be quite opaque; we’re reliant on our IDE to show the names of the parameters for such function calls. I really like the Python syntax, where you can explicitly name the parameters as you fill in their values in the function call. Will C++ ever adopt that syntax?
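The closest thing C++ offers today is C++20 designated initializers on an aggregate options struct. A minimal sketch, assuming a hypothetical Gauss function and GaussOptions struct (this is not DIPlib’s API):

#include <iostream>

// Hypothetical options aggregate; the field names are illustrative.
struct GaussOptions {
    double sigma = 1.0;        // smoothing scale
    int derivativeOrder = 0;   // 0 = plain smoothing
    double truncation = 3.0;   // kernel cutoff, in multiples of sigma
};

// Hypothetical filter entry point taking the options aggregate.
void Gauss(GaussOptions const& opt = {}) {
    std::cout << "sigma = " << opt.sigma << ", truncation = " << opt.truncation << '\n';
}

int main() {
    // The call site reads almost like Python keyword arguments:
    Gauss({ .sigma = 2.5, .truncation = 4.0 });
}

It’s not quite the same: the initializers must appear in declaration order, and every optional parameter has to live in the aggregate. But it gets most of the readability without the four-lines-of-setters ceremony.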

Pipeline

ITK is designed around the so-called “Data Pipeline architecture”:

The staged nature of most image analysis tasks led naturally to the selection of a Data Pipeline architecture as the backbone infrastructure for data processing. The Data Pipeline enables:

  • Filter Concatenation: A set of image filters can be concatenated one after another, composing a processing chain in which a sequence of operations are applied to the input images.
  • Parameter Exploration: Once a processing chain is put together, it is easy to change the parameters of any filter in the chain, and to explore the effects that such change will have on the final output image.
  • Memory Streaming: Large images can be managed by processing only sub-blocks of the image at a time. In this way, it becomes possible to process large images that otherwise would not have fit into main memory.

This core principle informed every other decision in the library. Regarding the second point, they say:

Image filters typically have numeric parameters that are used to regulate the behavior of the filter. Every time one of the numeric parameters is modified, the data pipeline marks its output as “dirty” and knows that this particular filter, and all the downstream ones that use its output, should be executed again. This feature of the pipeline facilitates the exploration of parameter space while using a minimum amount of processing power for each instance of an experiment.

This is a neat idea, but to change the parameters, I’d have to recompile my program (meaning I don’t get to reuse any of its working data), unless I write code to let the user modify parameters. If I write code to let the user modify parameters, I can also write code to re-process the relevant steps of the pipeline. I don’t understand why this use case needs to be embedded in the library. But it’s cool.
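For what it’s worth, the mechanism itself is not complicated. Here is a minimal sketch of the dirty-flag idea in plain C++ (my own illustration of the concept, not ITK code; ITK uses modification time stamps rather than the value comparison I do here):

#include <iostream>

// Each filter caches its output and re-runs only when its own parameter
// changed or when its input produced a new value.
class Filter {
public:
    explicit Filter(Filter* input = nullptr) : input_(input) {}

    void SetParameter(double p) {
        parameter_ = p;
        dirty_ = true;  // invalidate this filter's cached output
    }

    double GetOutput() {
        double upstream = input_ ? input_->GetOutput() : 0.0;
        if (dirty_ || upstream != cachedInput_) {
            std::cout << "executing filter\n";
            cachedInput_ = upstream;
            cachedOutput_ = upstream + parameter_;  // stand-in for the real work
            dirty_ = false;
        }
        return cachedOutput_;
    }

private:
    Filter* input_;
    double parameter_ = 0.0;
    double cachedInput_ = 0.0;
    double cachedOutput_ = 0.0;
    bool dirty_ = true;
};

int main() {
    Filter a;
    Filter b(&a);
    b.GetOutput();        // runs a, then b
    b.GetOutput();        // nothing changed, runs nothing
    a.SetParameter(1.0);  // only a is marked dirty...
    b.GetOutput();        // ...but b re-runs too, because its input changed
}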

The main problem with this pipeline idea then is that all intermediate images are stored in memory. If I have a pipeline with 100 individual processing steps (this is not out of the ordinary at all!), then I’ll have 100 intermediate results being stored. I cannot instruct a step to overwrite its input to save memory, and I cannot discard intermediate images when I don’t need them anymore.

So this one use case, of allowing a program to do some interactive parameter exploration, makes the (IMO) more common use case, of a program with fixed parameters needing to be as fast and memory-efficient as possible, harder.

The third point, memory streaming, is also very interesting. Indeed, sometimes images are so large they don’t fit in memory. Back in 1999 when ITK was designed, this might have been more common than it is today, but it certainly still happens, for example in digital pathology.

But this point is contradicted by the design decision of keeping all intermediate images in memory. Or maybe that choice makes it more important to be able to process the image in smaller tiles, since it wastes so much memory…

But it is true that being able to write a program as if you’re processing the image as a whole, while having it automatically split the image up into tiles for you and process those tiles independently (and possibly in parallel), is great. Each of the steps in the pipeline knows how much margin it needs (each filter will read data from outside its output window, so its input needs to be a bit larger), and so the margin gets computed automatically. This is one of the more difficult things to get right when doing tile-wise processing.

This margin, also called overlap, is processed multiple times, and is necessary for the result on two neighboring tiles to match up at the edge between them. The smaller the margin, the more efficient the algorithm is, but if the margin is too small, then the two tiles don’t agree at the edge that joins them, and the final result will show seams. Hence, finding the minimal and sufficient overlap is important.
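To make this concrete, here is a minimal 1D sketch of tile-wise filtering with a margin (my own illustration, not ITK or DIPlib code). Each tile is read with the filter radius as extra pixels on either side, and only the central part of the filtered tile is kept, so the seams disappear and the result is identical to filtering the whole image at once:

#include <algorithm>
#include <vector>

// A simple box filter with clamped boundaries, standing in for any
// neighborhood filter with a known radius.
std::vector<double> BoxFilter(std::vector<double> const& in, int radius) {
    int n = static_cast<int>(in.size());
    std::vector<double> out(in.size(), 0.0);
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        int count = 0;
        for (int j = std::max(0, i - radius); j <= std::min(n - 1, i + radius); ++j) {
            sum += in[j];
            ++count;
        }
        out[i] = sum / count;
    }
    return out;
}

// Process the image in tiles of tileSize pixels, reading a margin of
// `radius` extra pixels on each side so neighboring tiles agree at the seam.
std::vector<double> FilterTiled(std::vector<double> const& image, int tileSize, int radius) {
    int n = static_cast<int>(image.size());
    int margin = radius;  // for a filter with a known radius, the margin equals that radius
    std::vector<double> result(image.size());
    for (int start = 0; start < n; start += tileSize) {
        int end = std::min(start + tileSize, n);
        int readStart = std::max(0, start - margin);  // tile plus margin, clipped
        int readEnd = std::min(n, end + margin);      //   to the image borders
        std::vector<double> tile(image.begin() + readStart, image.begin() + readEnd);
        std::vector<double> filtered = BoxFilter(tile, radius);
        for (int i = start; i < end; ++i) {
            result[i] = filtered[i - readStart];  // keep only the central part of the tile
        }
    }
    return result;
}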

However,

Streaming, unfortunately, can not be applied to all types of algorithms. Specific cases that are not suitable for streaming are: Iterative algorithms. […] Algorithms that require the full set of input pixel values […]. Region propagation or front propagation algorithms […]. Image registration algorithms […].

This is a large fraction of the image analysis toolbox. Most algorithms are not trivial to apply in tiles. For example, you need to understand the content of the image to decide how large the margin must be for the watershed algorithm to produce correct results when applied tile-wise: how far information gets propagated across the image by such an algorithm depends on the image content. It is only for simple neighborhood algorithms that you can determine the margin needed without understanding the application.

And then I ask myself: is it really worth it, creating this complex “Data Pipeline architecture” so that some programs can automatically be applied tile-wise to large images?

Obviously, in DIPlib we took the approach that an algorithm works on a complete image in memory. The programmer can use these functions to process image tiles, by manually determining how large these margins must be, writing the code to read the input image tile-wise, and writing code to merge the results at the end. ITK writes the result tile-wise to file, but most image analysis programs do not produce an output image; they produce measurement results. So combining the results of tile-wise processing is application-dependent. Maybe you need to decide which of the cells that intersect the tile “belong” to the tile, maybe you need to merge region outlines produced in tiles, maybe you need to sum the individual tile measurements, …

I have often thought about adding a tile-wise processing function to DIPlib. This function would take as input a user-written function that processes a single tile. But it is the final merging step that I can’t seem to define generically enough.
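For what it’s worth, the shape I keep coming back to is something like the sketch below. It is purely hypothetical, nothing like it exists in DIPlib, and it really just shifts the problem: the caller has to supply the merging step, because that is the part that cannot be made generic.

#include <utility>
#include <vector>

// Hypothetical interface: the caller provides the per-tile work and the
// merging step; the function itself is just a fold over the tiles.
template <typename Tile, typename Result, typename ProcessFn, typename MergeFn>
Result ProcessTiled(std::vector<Tile> const& tiles, ProcessFn processTile, MergeFn merge, Result init) {
    Result accumulated = std::move(init);
    for (auto const& tile : tiles) {
        accumulated = merge(std::move(accumulated), processTile(tile));
    }
    return accumulated;
}

// Usage sketch (CountObjects and TileType are placeholders):
//    int total = ProcessTiled(tiles,
//                             [](TileType const& t) { return CountObjects(t); },
//                             [](int a, int b) { return a + b; },
//                             0);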

And there’s another issue with the pipeline approach.

In a typical image analysis problem, a researcher or an engineer will take an input image, improve some characteristics of the image by, let’s say, reducing noise or increasing contrast, and then proceed to identify some features in the image, such as corners and strong edges. This type of processing is naturally well-suited for a data pipeline architecture, as shown in Figure 9.1.

Example image processing pipeline, Fig 9.1

So, I think this is rather naive. A typical problem has many more steps, and they do not form a simple linear chain like this. The arbitrary graph below is a more realistic pipeline:

More realistic example image processing pipeline

In this pipeline, the blocks marked B, C and D read the output of block A, each likely with a different margin in the tile-wise processing. Does block A process the same tile multiple times? How complex does the internal logic have to be for A to see that these different output requests are very similar, and have it process the field of view only once, with the largest of the requested margins? Does ITK implement such logic?

Types

Despite the assistance that the file reader and writer facades provide, it is still up to the application developer to be aware of the pixel type that the application needs to process. In the context of medical imaging, it is reasonable to expect that the application developer will know whether the input image will contain a MRI, a mammogram or a CT scan, and therefore be mindful of selecting the appropriate pixel type and image dimensionality for each one of these different image modalities. This specificity of image type might not be convenient for application settings where users wants to read any image type, which are most commonly found in the scenarios of rapid prototyping and teaching. In the context of deploying a medical image application for production in a clinical setting, however, it is expected that the pixel type and dimension of the images will be clearly defined and specified based on the image modality to be processed.

Sure, this is true. When developing a program for a specific application, you know what file type you’ll be loading, what the pixel type is, and how many dimensions the image has. But when you develop a more generic program, then allowing for run-time type and dimensionality determination is very useful. SimpleITK was developed specifically for this purpose (I actually see it as an admission that ITK is unusable!). DIPlib and OpenCV both allow for run-time type determination. Why? Because it makes writing code easier. And because it makes creating bindings for other languages easier. And because everything can be pre-compiled instead of being offered as templated code in header files that must be compiled over and over again every time you build your application. And because it makes generic programs much, much simpler.
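The difference is easiest to see side by side. With ITK, the pixel type and dimensionality are template parameters, fixed at compile time; in DIPlib they are properties of the image object, discovered at run time. A sketch, leaving out the includes as before, with the DIPlib member names written from memory:

// ITK: the type is baked into the program at compile time.
using ImageType = itk::Image<float, 3>;
auto image = itk::ReadImage<ImageType>(fileName);

// DIPlib: one image class; the data type and dimensionality are whatever
// the file happened to contain, and are queried at run time.
dip::Image img = dip::ImageRead(fileName);
dip::DataType dt = img.DataType();
dip::uint nDims = img.Dimensionality();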

Vigra is another C++ library that has the pixel type and image dimensionality as template parameters. Vigra attempts to create something more akin to the Standard Template Library for image processing. The idea is that algorithms are generic, and the compiler builds an efficient implementation of the algorithm for your particular use case. Maybe you can even apply one of the Vigra image processing algorithms to a graph data structure. This is a neat idea in principle, but you end up with the same issues I criticized above.

IO Factories

The Factory pattern in ITK uses class names as keys to a registry of class constructors. The registration of factories happens at run time, and can be done by simply placing dynamic libraries in specific directories that ITK applications search at start-up time. This last feature provides a natural mechanism for implementing a plugin architecture in a clean and transparent way. The outcome is to facilitate the development of extensible image analysis applications, satisfying the need to provide an ever-growing set of image analysis capabilities.

So, instead of allowing the user of the library to write a new image reading function (or class, I guess) for a new file format and call it directly, or register it with the library before using the library’s image reading functionality, they want this new functionality to be compiled into a separate dynamic library and installed in a specific directory on the system where the application will run. The itk::ImageFileReader class will then find it and be able to read the new file type.

This seems to be designed specifically so that applications written with ITK can be extended without recompiling those applications. I think this adds a lot of complexity to the library, and to its use. Still, plugins are quite neat.
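To show what the simpler alternative I’m alluding to would look like, here is an illustrative sketch of an explicit reader registry, filled in by the application at start-up. None of these names exist in ITK or DIPlib:

#include <functional>
#include <map>
#include <stdexcept>
#include <string>

struct Image {};  // placeholder for a real image class

using ReaderFn = std::function<Image(std::string const&)>;

// One global registry mapping a file extension to a reader function.
std::map<std::string, ReaderFn>& ReaderRegistry() {
    static std::map<std::string, ReaderFn> registry;
    return registry;
}

void RegisterReader(std::string const& extension, ReaderFn reader) {
    ReaderRegistry()[extension] = std::move(reader);
}

Image ReadImage(std::string const& fileName) {
    std::string ext = fileName.substr(fileName.rfind('.') + 1);
    auto it = ReaderRegistry().find(ext);
    if (it == ReaderRegistry().end()) {
        throw std::runtime_error("no reader registered for ." + ext);
    }
    return it->second(fileName);
}

// The application registers its custom format once, up front; no dynamic
// library, no special directory to install it in:
//    RegisterReader("myfmt", [](std::string const& f) { return ReadMyFormatFile(f); });

Whether this scales to ITK’s situation, with dozens of formats and plugins shipped separately from the application, is a different question, but the basic need of reading a new file format does not by itself require factory-and-discovery machinery.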

The loadable IO factories has been one of the most successful features in the architectural design of ITK. It has made it possible to easily manage a challenging situation without placing a burden on the code or obscuring its implementation.

This is surprising to me. Plugins are neat, yes, but are they that important? How many people have built such a plugin? Here is an ITK ImageIO plugin for Bio-Formats. I haven’t found any other plugins so far. Do you know of any others?

Some comments not related to ITK’s design

There were a couple of other comments in this chapter that I’d like to highlight and comment on.

maintainers […] account for more than 75% of the cost of software development over the lifetime of a project.

(Later they have a more elaborate description of this finding, with citations.) I did not know this. An interesting point to keep in mind. For every hour we spend writing an algorithm, we need to spend 3 hours maintaining that code!

In a last section called “Reproducibility”, the authors say:

One of the early lessons learned in ITK was that the many papers published in the field were not as easy to implement as we were led to believe. The computational field tends to over-celebrate algorithms and to dismiss the practical work of writing software as “just an implementation detail”.

Well, the interesting things to discuss about an algorithm are its properties, not its implementation. Then it says:

The outcome is that most published papers are simply not reproducible, and when researchers and students attempt to use such techniques they end up spending a lot of time in the process and deliver variations of the original work. It is actually quite difficult, in practice, to verify if an implementation matches what was described in a paper.

That is completely true. Unless the authors publish their code, reproducing their work is nearly impossible. Often paper authors are ashamed of the quality of their code, and do not want to share it without a complete rewrite. But of course there never is time for that. And so the code will, unfortunately, remain unpublished.

Next, they claim:

ITK disrupted, for the good, that environment and restored a culture of DIY to a field that had grown accustomed to theoretical reasoning, and that had learned to dismiss experimental work. The new culture brought by ITK is a practical and pragmatic one in which the virtues of the software are judged by its practical results and not by the appearance of complexity that is celebrated in some scientific publications.

I have not seen any evidence of this. I imagine they would have liked this outcome. I certainly would have welcomed it. But, 25 years on, papers are, more often than not, still not reproducible. We still don’t get to see the code used to make them.

It turns out that in practice the most effective processing methods are those that would appear to be too simple to be accepted for a scientific paper.

This is also true. I have seen it hundreds of times. It is unfortunate that most journals will only accept a paper if the method described beats the “state of the art”. I have written about this before. This leads to awkward experimental sections with bad comparisons, in an attempt to appear to beat the state of the art. But journals should be publishing papers about new ways of thinking, new approaches, not tiny incremental, arbitrary changes that improve results in one particular test from 91.4% to 91.6% (and probably make results worse in other tests that didn’t get included in the paper).

All the quotes in this blog post are from “ITK”, by Luis Ibáñez and Brad King, the 9th chapter of the book The Architecture of Open Source Applications (Volume 2), edited by Amy Brown and Greg Wilson, and published under the Creative Commons Attribution 3.0 Unported license.

In closing

And now I’d like to hear from you: Have you used ITK? How usable do you find it? Are any of the design decisions discussed here better than in DIPlib?

Questions or comments on this topic?
Join the discussion on LinkedIn or Mastodon