Segmentation of a "tangle cube" that was encased in an outer, cube-shaped shell.
Image segmentation with region growing is simple and can be used as an initialization step for more sophisticated segmentation methods. In this note, I’ll describe how to implement a region growing method for 3D image volume segmentation (note: the code here can be applied, without modification, to 2D images by adding an extra axis to the image) that uses a single seed point and uses a neighborhood average inclusion criteria. First, we’ll implement the algorithm in Python, and next, I’ll show how to implement the algorithm in C, wrapping the C code using f2py in order to make it callable from Python. Implementing the code in C will give us big performance boosts, while wrapping the code in Python will gives all the convenience of scripting in Python.
First, let’s describe the algorithm in some pseudocode:
We’ve glossed over a few details that will appear in the actual implementation, but in a nutshell, the pseudocode says that whenever the image value at a point is above (or equal to) the average image value in the neighborhood of that point, we set the corresponding point in segmentation to “True” and add all the neighbors of that point to a stack to be checked later. When there’s no points left to check, the boolean segmentation volume will (hopefully, roughly) indicate where the object is in the image volume.
It’s worth mentioning at this point that using the neighborhood average as the inclusion criteria will have a rather large effect in the final segmentation outcome. Indeed, we are assuming the object is “bright” relative to its surroundings. Generally speaking, this inclusion criteria should be guided by the application domain at hand and by the interpretation of actual image values. For example, in CT scans the image values correspond to Hounsfield units (which correspond to tissue density), and thus our region growing routine with the inclusion criteria as is would find connected regions of high density for CT data.
Generating an image volume for testing
We’ll generate a synthetic volume for testing, rather than depending on some external data source. Specifically, we’ll use implicit surfaces of the form,
\[x^4 - 5x^2 + y^4 - 5y^2 + z^4 - 5z^2 = k\]
and we’ll use ranges of $k$ to define regions of higher and lower image values. Wolfram MathWorld calls this surface a tangle cube for $k=-11.8$.
The code below creates an image volume that has value 1 in regions where $5 \leq k \leq 20$ and where $k \leq -11$, and has image volume -1 otherwise. We also view the two regions using mayavi.
The figure on the right illustrates the two regions in the image volume. The light blue region corresponds to the outer shell $(5 \leq k \leq 20)$ that encloses the inner region $(k \leq -11)$. So, we can see that by planting the seed for the region growing in the inner structure, we should (hopefully) obtain only the inner structure since the inner region is not connected to the outer shell. Note that in the code above, we’ve also used different numbers of points to sample along the x, y, and z axes to reveal any potential flaws in this aspect of the implementation.
As another means of looking at this test volume, we could look at cuts through the volume, using the following code, for example:
Python implementation
Now then, let’s translate the pseudocode to Python (always a relatively easy task!):
This python code works just like the pseudocode, but we have taken care of a couple of extra details:
A point might be placed in the stack multiple times before its checked. So, if the point has already been checked, we add a check to just continue to the next iteration of the while loop.
We handle the special cases at the borders with an ugly six lines of code (imin, imax, etc.).
The code uses a get_nbhd function which is not yet written. This function adds neighboring voxels when necessary. It accepts the current voxel (pt), the indicator array of checked points (checked), and the dimensions of the volume (img.shape). The checked indicator is of course necessary so we don’t add points that have already been checked, and the volume dimensions argument is necessary so we don’t add points beyond the image borders.
Next, there is a connectivity choice for adding neighbors of a voxel, i.e., the choice of which voxels are considered the neighbors of a given voxel. This is determined by the get_nbhd function and should not be confused with the neighborhood that is used for computing the average image value about a voxel (the latter usually being a comparatively larger region). Rather, this neighborhood defines the coordinates (i.e., voxels) that are adjacent to a given coordinate. There are different ways to define the neighborhood on a 3D lattice, and I use a 6-voxel neighborhood in the code below:
Testing the Python implementation
So, we have the test data, and now let’s finally test the python implementation above. Supposing the grow function is defined in file called region_growing_python:
Upon running the above code, you should see something like:
and a corresponding image of the inner structure that we hoped to pick out:
Alternatively, we could set seed = (45,38,35) and pick out the outer shell:
So, it appears to be working!
C Implementation
We’ll start by writing the implementation in C, and then we’ll wrap the C implementation in Python by using f2py.
A simple stack structure
First, we need to write a minimal stack data structure using a singly linked list.
So, we’ll create a file stack.h containing the following:
… and a quick test program to make sure the stack is working correctly (call it stack_check.c) …
which gives (after gcc stack_check.c stack.h && ./a.out)
Or maybe more importantly, running valgrind ./a.out gives:
:+1: moving on …
The region growing function in C
Let’s now create a file called region_growing.c. At the top, we’ll include the necessary libraries
This function is a perfect candidate for inlining, so we’ve done so.
Next, we write the analog of get_nbhd from the Python implementation. This function was already pretty verbose in the Python version and becomes even more so (unsurprisingly) in the C version:
Moving along, in the C version, we’ll separate out the inclusion criteria check (which we did not do for Python version). This function could be overwritten to use all kinds of inclusion checks, but the main reason for separating out here is because we have to compute the neighborhood average of image values ourselves, rather than relying on built-in NumPy functions.
Almost there. Lastly, we write the grow method, which actually implements the region growing procedure:
Writing the f2py signature file
Next, we should write a “signature” file so f2py knows the names of things and what the types of the arguments are. Create a file called region_growing.pyf:
We made the dimensions of the img array implicit by making the variables m,n,p hidden, and by telling f2py that they should be extracted from the provided img argument. Why do we use type integer for seg instead of boolean? In my experience, f2py can be a little funny when passing boolean arrays to Fortran or C, so it’s easier to just let seg be integer type. This is why in the grow C routine, we let checked be a boolean-valued array, but seg was integer type. The reason here was that checked was local to the C program and not passed to Python.
Makefile
Finally, let’s write a make file that calls f2py:
Running make from the terminal creates a region_growing.so module that can by imported from Python.
Testing the Python-wrapped C implementation
Assuming vol and other variables defined as before, let’s test it out:
If all goes correctly, you should observe something like:
$1.189 / 0.074 \approx 16$. That’s a speed-up of nearly a factor of 16! Of course, this factor will change if we use a image neighbor value larger than 5 as was tested here. Something to note is that in the above, the array returned by the C-wrapped function will actually be of type int8. For reasons that are unknown to me, f2py doesn’t care for accepting boolean types from NumPy and requires int8 instead. Generally, you have to be pretty careful when passing different types using f2py because you can observe strange results that might not appear as blatant errors.
In the figure below, I’ve plotted the run times for the Python and C codes (and their ratio) averaged over 10 runs for image neighborhood radius size of $1,5,10,15,$ and $25$ (I’m using the term radius, but we really mean cube-half-length). This means at each coordinate where we check for inclusion, we must compute the average over a volume of size $(2 \cdot 1+1)^3, (2\cdot 5 + 1)^2$, etc. Notable speed increases happen within the 1 to 15 pixel radius range. For larger neighborhood sizes, the averaging operation dominates the code, and since the NumPy average is implemented in C, it is not too surprising that the relative speed increase obtained by the pure C code begins to vanish.
Also, note that for very large image averaging neighborhood sizes, region growing is essentially equivalent to first performing a global threshold (because every “neighborhood” includes the entire image), followed by a connected component analysis from the chosen seed point.