Cris’ Image Analysis Blog

theory, methods, algorithms, applications

DIPlib 3.5.0 released

This week we released DIPlib version 3.5.0. In my previous blog post I talked about new syntax for selecting labeled objects. There are quite a few more quality-of-life improvements like that in this release; changes that, rather than adding algorithms, make existing ones easier to use:

  • dip::​Measurement::​Iterator​Feature::​AsImage() converts a measurement column into an image, making it easy to apply a lot of statistics functions to measurement features.

  • There’s a new overload of dip::​Gaussian​Mixture​Model() that takes a 1D histogram as input.

  • dip::​GaussFT() has a new parameter boundaryCondition, so you don’t need to manually pad and crop the image when doing Gaussian convolution through the FFT.

  • dip::Quartiles() computes the minimum, maximum and three percentiles much more efficiently than it could be computed when calling dip::Percentile() three times.

Of course there are many more changes and bug fixes, see the change log.

There is only one new algorithm (though it’s fairly trivial): dip::​Histogram now can use the Freedman–Diaconis rule to choose the bin size. Use the new function dip::​Histogram::​Optimal​Configuration() to create the configuration object that enables this rule.

This change has the biggest impact of all the changes in this release, because we’re now using it in all the functions that use a histogram to determine a threshold. In some situations, these algorithms weren’t correct because the histogram that they relied on did not have enough resolution.

For example, consider the following example image:

import diplib as dip

img = dip.Image((1024, 1024))
img.Fill(20)
dip.GaussianNoise(img, out=img, variance=16)
dip.DrawBandlimitedPoint(img, (500, 400), value=1000000, sigmas=10)
dip.DrawBandlimitedPoint(img, (800, 200), value=50000, sigmas=3)
dip.DrawBandlimitedPoint(img, (700, 800), value=120000, sigmas=2)
dip.DrawBandlimitedPoint(img, (400, 600), value=400000, sigmas=4)
dip.DrawBandlimitedPoint(img, (200, 300), value=80000, sigmas=3)

An example image, it has a blacl background, and a few very bright Gaussian blobs

The background has a value of 20.0 with normally distributed noise with a variance of 16.0 (standard deviation of 4.0). The foreground is a series of relatively small dots, the brightest one has a peak value of around 4800. This is a challenging image to generate a histogram for, but it’s not out of the ordinary, there are microscope systems producing 12 or 14-bit images like this.

The default histogram generated for this image has 256 bins, stretching from the minimum to the maximum value. This makes the bin size equal to 18.74. All of the background pixels will fall in the same bottom bin:

dip.Histogram(img).Show()

The default histogram

Using such a histogram to estimate the background distribution is simply not possible, the estimate will be wrong by definition.

Using the new histogram configuration, the bin size is computed to be 0.1069. The Freedman–Diaconis rule takes the interquartile range (IRQ, the distance from the 25th to the 75th percentile), and the number of pixels in the image, \(n\) , and determines the bin size as

\[ w_\text{bin} = \frac{2\ \text{IQR}}{\sqrt[3]{n}} \quad . \]

Given that such a small bin size could potentially lead to a humongous histogram, pixels 50 IQR above the upper quartile or below the lower quartile are ignored. Thus, in our case, the histogram will be cropped on the top side, and has 2751 bins:

dip.Histogram(img, configuration=dip.Histogram.OptimalConfiguration()).Show()

A much better histogram

The pixels that are ignored are not expected to be meaningful, but of course they could be. The alternative configuration dip.​Histogram.​Optimal​Configuration​With​Full​Range() does not crop the histogram, leading to 146284 bins.

All the threshold functions that use the histogram to determine a threshold, now use the new Optimal​Configuration(). The functions impacted are:

These functions will no longer always produce the same output, because the threshold value previously was not always correct. Of course, it is possible that an image has a very strange histogram and the algorithm still won’t compute the correct threshold, but it is now much more likely for it to be correct.

For example, applying dip::​Background​Threshold() to the image above previously yielded a threshold of 58.51, now it yields a threshold of 29.08. We expected a threshold of 29.42, computed as the mean plus two times the half-width at half maximum (HWHM) of the background Gaussian peak. The HWHM is approximately 1.1775 times the stdandard deviation for a Gaussian. So we compute 20 + 2 · 1.1775 · 4.0 = 29.42. The difference in this example is not as extreme as the case that prompted this change, see issue #161 on GitHub.

To reproduce the old behavior, compute a default histogram, then call the equivalent threshold function that takes a histogram as input. This function will compute the threshold, which can then be applied to the image using a comparison operator:

// Existing C++ code:
dip::Image bin = dip::OtsuThreshold( img );
// Code that reproduces the old behavior of the existing code:
double threshold = dip::OtsuThreshold( dip::Histogram( img ));
dip::Image bin = img >= threshold;

Or, in Python:

# Existing Python code:
bin = dip.OtsuThreshold( img )
# Code that reproduces the old behavior of the existing code:
threshold = dip.OtsuThreshold( dip.Histogram( img ))
bin = img >= threshold

Let us know what you think of this new functionality!

Questions or comments on this topic?
Join the discussion on LinkedIn or Mastodon