improver.utilities.neighbourhood_tools module

Provides tools for neighbourhood generation

boxsum(data, boxsize, cumsum=True, **pad_options)[source]

Fast vectorised approach to calculating neighbourhood totals.

This function makes use of the summed-area table method. An input array is accumulated top to bottom and left to right. This accumulated array can then be used to efficiently calculate the total within a neighbourhood about any point. An example input data array:

| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |

is accumulated to become:

| 1 | 2  | 3  | 4  | 5  |
| 2 | 4  | 6  | 8  | 10 |
| 3 | 6  | 9  | 12 | 15 |
| 4 | 8  | 12 | 16 | 20 |
| 5 | 10 | 15 | 20 | 25 |

If we wish to calculate the total in a 3x3 neighbourhood about some point (*) of our array we use the following points:

| 1 (C) | 2  | 3     | 4 (D)  | 5  |
| 2     | 4  | 6     | 8      | 10 |
| 3     | 6  | 9 (*) | 12     | 15 |
| 4 (A) | 8  | 12    | 16 (B) | 20 |
| 5     | 10 | 15    | 20     | 25 |

And the calculation is:

Neighbourhood sum = C - A - D + B
= 1 - 4 - 4 + 16
= 9

This is the value we would expect for a 3x3 neighbourhood in an array filled with ones.
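The worked example above can be reproduced with plain NumPy. This is only a sketch of the arithmetic, not the boxsum implementation itself; the variable names are chosen to match the labels in the diagram.

```python
import numpy as np

data = np.ones((5, 5), dtype=int)

# Accumulate top to bottom, then left to right, to build the summed-area table.
table = data.cumsum(axis=-2).cumsum(axis=-1)

# Corner points for the 3x3 neighbourhood centred on row 2, column 2 (zero-based).
C = table[0, 0]
D = table[0, 3]
A = table[3, 0]
B = table[3, 3]

neighbourhood_sum = C - A - D + B
print(neighbourhood_sum)  # 9

# Cross-check against a direct sum over the same 3x3 block.
print(data[1:4, 1:4].sum())  # 9
```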

Parameters:

data (ndarray) – The input data array.
boxsize (Union[int, Tuple[int, int]]) – The size of the neighbourhood. Must be an odd number.
cumsum (bool) – If False, assume the input data is already cumulative. If True (default), calculate cumsum along the last two dimensions of the input array.
pad_options (Any) – Additional keyword arguments passed to numpy.pad function. If given, the returned result will have the same shape as the input array.

Return type:

ndarray

Returns:

Array containing the calculated neighbourhood total.

Raises:

ValueError – If boxsize has non-integer type.
ValueError – If any member of boxsize is not an odd number.
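To illustrate why the method is described as vectorised, the sketch below evaluates C - A - D + B for every box position at once using array slices. The function `boxsum_sketch` is a hypothetical stand-in written for this example, not the IMPROVER implementation: it assumes a 2D input and a single integer boxsize, and it returns only the positions where the box fits entirely inside the data.

```python
import numpy as np


def boxsum_sketch(data, boxsize):
    """Hypothetical illustration of a vectorised summed-area lookup."""
    if boxsize % 2 != 1:
        raise ValueError("boxsize must be an odd number")
    # Prepend a zero row and column so every box has a corner above-left of it.
    padded = np.pad(data, ((1, 0), (1, 0)), mode="constant")
    table = padded.cumsum(axis=0).cumsum(axis=1)
    n = boxsize
    # C - A - D + B, evaluated for all box positions with four shifted slices.
    return table[:-n, :-n] - table[n:, :-n] - table[:-n, n:] + table[n:, n:]


data = np.arange(20).reshape(4, 5)
totals = boxsum_sketch(data, 3)
print(totals.shape)  # (2, 3)
```

Each entry `totals[i, j]` equals `data[i:i + 3, j:j + 3].sum()`, so the output is smaller than the input unless the data is padded first.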

pad_and_roll(input_array, shape, **kwargs)[source]

Pads the last len(shape) axes of the input array, then applies rolling_window to create ‘neighbourhood’ views of the data, with the given shape as the last axes of the returned array. Collapsing over the last len(shape) axes yields an array with the same shape as the original input array.

Parameters:

input_array (ndarray) – The dataset of points to pad and create rolling windows for.
shape (Tuple[int, int]) –
Desired shape of the neighbourhood. E.g. if a neighbourhood width of 1 around the point is desired, this shape should be (3, 3):
```
X X X
X O X
X X X
```
Where O is our central point and X marks the neighbouring points.
kwargs (Any) – Additional keyword arguments passed to the numpy.pad function.

Return type:

ndarray

Returns:

Array containing views of input_array. The trailing dimensions match the shape given in the input arguments; the leading dimensions depend on the shape of the input array.
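The pad-then-window idea can be sketched with NumPy alone. This example uses `numpy.lib.stride_tricks.sliding_window_view` as a stand-in for rolling_window, and it assumes symmetric padding of `s // 2` on each side of each axis; both are assumptions made for illustration rather than a description of the library's internals. NaN padding is likewise just one possible choice of numpy.pad options.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(12.0).reshape(3, 4)
shape = (3, 3)

# Assumed padding: half the neighbourhood width on each side of each axis.
pad_width = [(s // 2, s // 2) for s in shape]
padded = np.pad(data, pad_width, mode="constant", constant_values=np.nan)

# One 3x3 view per point of the original array.
windows = sliding_window_view(padded, shape)
print(windows.shape)  # (3, 4, 3, 3)

# Collapsing over the last len(shape) axes recovers the input shape.
neighbourhood_max = np.nanmax(windows, axis=(-2, -1))
print(neighbourhood_max.shape)  # (3, 4)
```

The centre element of each window, `windows[i, j, 1, 1]`, is the original value `data[i, j]`.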

pad_boxsum(data, boxsize, **pad_options)[source]

Pad an array to shape suitable for boxsum.

Note that padding is not symmetric: there is an extra row/column at the top/left (as required for calculating the boxsum).

Parameters:

data (ndarray) – The input data array.
boxsize (Union[int, Tuple[int, int]]) – The size of the neighbourhood.
pad_options (Any) – Additional keyword arguments passed to numpy.pad function.

Return type:

ndarray

Returns:

Array padded to shape suitable for boxsum.
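The asymmetric padding can be illustrated as follows. The pad widths used here (`boxsize // 2 + 1` before and `boxsize // 2` after) are an assumption inferred from the note above, not taken from the source code. The extra leading row and column supply the above-left corner values that the summed-area lookup needs, so the result has the same shape as the input.

```python
import numpy as np

data = np.ones((4, 5))
boxsize = 3
half = boxsize // 2

# Assumed widths: one more row/column at the top/left than at the bottom/right.
pad_width = [(half + 1, half), (half + 1, half)]
padded = np.pad(data, pad_width, mode="constant")
print(padded.shape)  # (7, 8)

# Summed-area lookup on the padded array.
table = padded.cumsum(axis=0).cumsum(axis=1)
n = boxsize
totals = table[:-n, :-n] - table[n:, :-n] - table[:-n, n:] + table[n:, n:]
print(totals.shape)  # (4, 5), same as the input
print(totals[0, 0])  # 4.0: corner box overlaps only a 2x2 block of real data
print(totals[1, 1])  # 9.0: interior box is fully inside the data
```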

rolling_window(input_array, shape, writeable=False)[source]

Creates a rolling window neighbourhood of the given shape from the last len(shape) axes of the input array. Avoids creating a large output array by constructing a non-contiguous view mapped onto the input array.

Parameters:

input_array (ndarray) – An array from which rolling window neighbourhoods will be created.
shape (Tuple[int, int]) – The neighbourhood shape. E.g. if the neighbourhood size is 3, the shape would be (3, 3) to create a 3x3 array around each point in the input_array.
writeable (bool) – If True, the returned view will be writeable. Writing to the view modifies the input array, so use with caution.

Return type:

ndarray

Returns:

“Views” into the data; each view represents a neighbourhood of points.

Raises:

ValueError – If input_array has fewer dimensions than shape.
RuntimeError – If any dimension of shape is larger than the corresponding dimension of input_array.
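The view-based behaviour described above can be demonstrated with NumPy's own `numpy.lib.stride_tricks.sliding_window_view`, used here as a stand-in on the assumption that it behaves similarly to rolling_window. It is not the IMPROVER function, and its error types may differ from those listed above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(20).reshape(4, 5)

# No padding, so only positions where the 3x3 window fits are returned.
windows = sliding_window_view(data, (3, 3))
print(windows.shape)  # (2, 3, 3, 3)

# The result is a view onto the input, not a copy.
print(np.shares_memory(windows, data))  # True

# The view is non-contiguous and read-only by default.
print(windows.flags.c_contiguous)  # False
print(windows.flags.writeable)  # False
```

Because the windows overlap in memory, a writeable view would let a single assignment change several neighbourhoods and the input array at once, which is why the read-only default is the safer choice.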