DetecTree: tree detection from aerial imagery in Python

Martí Bosch

AI & Cities, AMLD 2020, January 28, Lausanne

Types of urban tree canopy datasets

Urban tree catalogs

Format: geo-referenced list of trees

Example: Cantonal inventory of isolated trees (Geneva)

Urban tree catalogs

Pros

Often comes with valuable attributes (e.g., dimensions, species, age…)

Urban tree catalogs

Cons

Costly: manual surveys
Often restricted to public space

Canopy height models

Format: raster of tree height values

Example: Canopy height model (Montreal)

Canopy height models

Pros

Can be automatically derived from raw LIDAR data, i.e., a classified point cloud

Canopy height models

Cons

LIDAR data is expensive
Building canopy height models requires raw LIDAR data (surface and terrain models are not enough)
Few open raw LIDAR datasets

What does DetecTree do?

Idea in a nutshell

Input: high resolution aerial imagery
Output: binary raster of tree/non-tree pixels

Idea in a nutshell

Example: DetecTree output (right) for Zurich’s 2014/15 Orthophoto (left)

Idea in a nutshell

Supervised learning

The training set S = {(x_i, y_i), i = 1, …, M} is a sample of M pixels, each represented by:

a 27-component feature vector x_i ∈ ℝ²⁷, encoding color, texture and entropy information
a binary response y_i ∈ {0, 1}, with y_i = 1 if pixel i actually corresponds to a tree and y_i = 0 otherwise
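
To give an idea of what an entropy feature can look like, here is a minimal sketch that computes the Shannon entropy of the gray values in a small window around a pixel. The helper name `local_entropy`, the window radius and the toy patches are assumptions made for illustration; this is not DetecTree's actual feature computation.

```python
import math
from collections import Counter


def local_entropy(gray, row, col, radius=1):
    """Shannon entropy (bits) of the gray values in a square window.

    `gray` is a list of rows of integer gray levels; the window is clipped
    at the image border. Hypothetical helper, for illustration only.
    """
    rows, cols = len(gray), len(gray[0])
    values = [
        gray[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]
    counts = Counter(values)
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


# A flat patch (e.g., a roof) versus a textured patch (e.g., a canopy).
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
textured = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_entropy = local_entropy(flat, 1, 1)
textured_entropy = local_entropy(textured, 1, 1)
```

The flat patch has zero entropy, while the textured patch reaches the maximum for nine distinct values, which hints at why entropy can help separate tree canopies from smooth surfaces.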

The idea is NOT mine

Approach proposed by Yang et al. [1] in 2009
Others have implemented it as well: Mapping All of the Trees with Machine Learning

However

To my knowledge, DetecTree is the first open source tool to perform such a task

Overview of the computational workflow

Step 0: Split image into tiles

Input: aerial imagery raster for the area of interest

The dataset might already come as a mosaic of tiles
Otherwise, you might use DetecTree’s split_into_tiles function
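
As a rough sketch of what splitting a raster into tiles involves, the snippet below enumerates the pixel windows of each tile. The function `tile_windows` and the raster and tile sizes are hypothetical; this illustrates the idea only and is not the implementation of `split_into_tiles`.

```python
def tile_windows(height, width, tile_height, tile_width):
    """Yield (row_off, col_off, n_rows, n_cols) windows covering a raster.

    Tiles on the bottom and right edges are truncated so that they do not
    extend past the raster.
    """
    for row_off in range(0, height, tile_height):
        for col_off in range(0, width, tile_width):
            yield (
                row_off,
                col_off,
                min(tile_height, height - row_off),
                min(tile_width, width - col_off),
            )


# A 1000 x 1200 pixel raster cut into 500 x 500 tiles.
windows = list(tile_windows(1000, 1200, 500, 500))
```

Each window could then be read from the source raster and written to its own file.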

Step 1: Train/test split

We might use random sampling to select, e.g., 1% of the tiles as training data, however …

… the training tiles should be as representative as possible of the overall dataset

Step 1: Train/test split

How do we optimize the representativity of the training set?

For each tile, compute a GIST descriptor [2], i.e., a vector describing key semantics of the tile’s scene
Apply k-means to the GIST descriptors to get k clusters of tiles, where k is the number of training tiles
For each cluster, select the tile that is closest to the cluster’s centroid for training
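
The selection procedure above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the GIST descriptors are replaced by toy 2-D vectors, k-means is a plain pure-Python Lloyd's algorithm with farthest-point seeding, and the function names are hypothetical. DetecTree's own implementation may differ.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    def assign():
        # Label every point with the index of its nearest centroid.
        return [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

    for _ in range(n_iter):
        labels = assign()
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign(), centroids


def select_training_tiles(descriptors, k):
    """Return, per cluster, the index of the tile closest to the centroid."""
    labels, centroids = kmeans(descriptors, k)
    selected = []
    for j, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            selected.append(
                min(members, key=lambda i: math.dist(descriptors[i], centroid))
            )
    return sorted(selected)


# Toy 2-D "descriptors" for nine tiles forming three obvious groups.
descriptors = [
    (0.0, 0.0), (0.2, 0.0), (0.1, 0.0),
    (5.0, 5.0), (5.2, 5.0), (5.1, 5.0),
    (9.0, 0.0), (9.2, 0.0), (9.1, 0.0),
]
training_idx = select_training_tiles(descriptors, k=3)
```

In this toy case the middle tile of each group is selected, i.e., one representative tile per kind of scene.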

Step 1: Train/test split

import detectree as dtr

# select the training tiles by clustering the tiles' GIST descriptors
split_df = dtr.TrainingSelector(
    img_dir='path/to/tiles').train_test_split(
        method='cluster-I')
split_df.head()

	img_filepath	train
0	data/interim/tiles/1091-322_00.tif	False
1	data/interim/tiles/1091-142_20.tif	False
2	data/interim/tiles/1091-124_11.tif	False
3	data/interim/tiles/1091-144_21.tif	False
4	data/interim/tiles/1091-213_05.tif	False

Step 1: Train/test split

split_df[split_df['train']]

	img_filepath	train
28	data/interim/tiles/1091-231_12.tif	True
98	data/interim/tiles/1091-144_06.tif	True
172	data/interim/tiles/1091-231_07.tif	True

In this example we have 225 tiles. A training sample of 1% corresponds to 2.25 tiles, so DetecTree automatically rounds this number up to the next integer, i.e., k = 3
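
The arithmetic can be checked directly. The variable names below are chosen for illustration only.

```python
import math

n_tiles = 225
train_prop = 0.01  # 1% of the tiles

# 225 * 0.01 = 2.25 tiles, rounded up to a whole number of tiles.
k = math.ceil(n_tiles * train_prop)
```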

Step 3: Get the training ground-truth tree/non-tree mask

For each tile of the training set, we need to provide the ground-truth tree/non-tree masks to get the pixel-level responses y_i, ∀i ∈ {1, …, M}
We might use GIMP, Adobe Photoshop or LIDAR data (see the detectree-example repository)
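
When LIDAR data is used, one simple way to obtain a mask is to threshold a canopy height raster. The sketch below assumes a toy raster in meters and a hypothetical 2 m threshold; neither the helper `height_to_mask` nor the threshold comes from DetecTree or the detectree-example repository.

```python
def height_to_mask(canopy_height, min_tree_height=2.0):
    """Label a pixel as tree (1) when its canopy height reaches the threshold."""
    return [
        [1 if h >= min_tree_height else 0 for h in row]
        for row in canopy_height
    ]


# Toy 3 x 4 canopy height raster, in meters.
canopy_height = [
    [0.0, 0.3, 6.5, 7.1],
    [0.1, 2.4, 8.0, 0.0],
    [0.0, 0.0, 1.9, 0.2],
]
mask = height_to_mask(canopy_height)
```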

Step 4: Train a binary pixel-level classifier

Given the pixel-level responses, DetecTree will compute the pixel-level features x_i, ∀i ∈ {1, …, M} and train a binary classifier of the form:

f : ℝ²⁷ → {0, 1} i.e., ŷ_i = f(x_i)

where ŷ_i is the tree (ŷ_i = 1)/non-tree (ŷ_i = 0) prediction for pixel i

Step 4: Train a binary pixel-level classifier

clf = dtr.ClassifierTrainer().train_classifier(
    split_df=split_df, response_img_dir=response_dir)

response_dir is where the response tiles are located
clf is the trained classifier, i.e., an instance of scikit-learn’s AdaBoostClassifier

Step 5: Pixel-level classification

Given the trained classifier clf, we might use the classify_img method as follows:

y = dtr.Classifier().classify_img(
    'path/to/some/tile.tif', clf)

Step 5: Pixel-level classification

Which will give us something like:

Step 5: Pixel-level classification

Note: the pixel-level classification predicts each pixel independently, which might yield noisy results, e.g., sparse points on grass fields labeled as trees
Following the approach of Yang et al. [1], DetecTree refines the pixel-level classification to ensure consistency between adjacent pixels using the graph cuts algorithm of Boykov and Kolmogorov [3].
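
To illustrate why enforcing consistency between adjacent pixels removes this kind of noise, here is a deliberately simplified stand-in: a 3 x 3 majority filter. It is not the graph cuts algorithm and not what DetecTree does; the function name and the toy mask are assumptions for illustration.

```python
def majority_filter(mask):
    """Replace each pixel by the majority label of its 3 x 3 neighborhood."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                mask[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            # Ties (possible in clipped border windows) resolve to non-tree.
            out[r][c] = 1 if 2 * sum(window) > len(window) else 0
    return out


noisy = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated "tree" pixel on a grass field
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
refined = majority_filter(noisy)
```

The isolated pixel is relabeled as non-tree while the compact block of tree pixels is kept. Graph cuts pursue the same goal by minimizing a global energy rather than filtering each neighborhood separately.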

Step 5: Pixel-level classification

Original tile (left), pixel-level classification (middle), refined classification (right)

Step 5: Pixel-level classification

We can use the classify_imgs method directly on the train/test split dataframe split_df to classify all the tiles at scale with Dask:

dtr.Classifier().classify_imgs(
    split_df, 'path/to/output/dir', clf=clf)

Pros of DetecTree

Many available datasets of high-resolution orthoimagery (HRO), e.g., the NAIP open dataset: continental USA at 0.6 to 1 m resolution
Modest memory requirements compared to LIDAR, e.g., Geneva:

LIDAR (25 pt/m²): 310 GB
SWISSIMAGE (1m): 1 GB

Cons of DetecTree

Only provides binary pixel-level classification
If tree species/dimensions are important, DetecTree is not the best choice

Scope of DetecTree

When we are only interested in 2D aspects of trees, e.g., proportion of land cover/spatial distribution
LIDAR is not available
LIDAR is available but it is too expensive
LIDAR is available but you don’t want to process 300GB of data

Example application: Urban heat islands in Lausanne

Objective

Given a land cover map (10m), predict the spatial distribution of air temperature

Approach: InVEST Urban cooling model

For each pixel:

T_air ∼ f(ET, shade, albedo), where ET is the evapotranspiration
However, pixels of the same land cover class, e.g., road, can have very different levels of tree cover (which influences the ET, shade, albedo…)

Enter DetecTree

Refine each land cover class into subclasses depending on the level of tree cover
For example, pixels of “road” land cover are further divided into

“road with high tree cover”
“road with intermediate tree cover”
“road with low tree cover”
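
A minimal sketch of such a refinement, assuming that the tree/non-tree mask is finer than the land cover map so that each land cover pixel contains a block of mask pixels. The helper names and the 1/3 and 2/3 thresholds are hypothetical, not the values used in the study.

```python
def tree_fraction(mask_block):
    """Share of tree pixels (1s) in a block of the binary tree mask."""
    n_pixels = sum(len(row) for row in mask_block)
    return sum(map(sum, mask_block)) / n_pixels


def refine_class(land_cover, fraction, low=1 / 3, high=2 / 3):
    """Append a tree cover level to a land cover class name."""
    if fraction < low:
        level = "low"
    elif fraction < high:
        level = "intermediate"
    else:
        level = "high"
    return f"{land_cover} with {level} tree cover"


# A 2 x 2 block of tree/non-tree pixels inside one "road" pixel
# (a real land cover pixel would usually hold many more of them).
block = [[1, 1], [1, 0]]
label = refine_class("road", tree_fraction(block))
```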

Enter DetecTree

We go from 25 to 100 land use classes …

… which will likely increase the precision of the model predictions

Results of the study coming soon

Thank you

References

[1] Yang, L., Wu, X., Praun, E., & Ma, X. (2009). Tree detection from aerial imagery. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 131-137). ACM.
[2] Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145-175.
[3] Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis & Machine Intelligence, 26(9), 1124-1137.