DetecTree: tree detection from aerial imagery in Python

Martí Bosch

AI & Cities, AMLD 2020, January 28, Lausanne

Types of urban tree canopy datasets

Urban tree catalogs

Format: geo-referenced list of trees

Example: Cantonal inventory of isolated trees (Geneva)

Urban tree catalogs

Pros

  • Often comes with valuable attributes (e.g., dimensions, species, age…)

Urban tree catalogs

Cons

  • Costly: manual surveys
  • Often restricted to public space

Canopy height models

Format: raster of tree height values

Example: Canopy height model (Montreal)

Canopy height models

Pros

  • Can be automatically derived from raw LIDAR data, i.e., a classified point cloud

Canopy height models

Cons

  • LIDAR data is expensive
  • Building canopy height models requires raw LIDAR data (surface and terrain models are not enough)
  • Few open raw LIDAR datasets

What does DetecTree do?

Idea in a nutshell

  • Input: high-resolution aerial imagery
  • Output: binary raster of tree/non-tree pixels

Idea in a nutshell

Example: DetecTree output (right) for Zurich’s 2014/15 Orthophoto (left)

Idea in a nutshell

Supervised learning

The training set S = {(xᵢ, yᵢ) : i ∈ {1, …, M}} is a sample of M pixels, each represented by a:

  • 27-component feature vector xᵢ ∈ ℝ²⁷, with information on color, texture and entropy (illustrated below)
  • binary response yᵢ ∈ {0, 1}, with yᵢ = 1 if pixel i actually corresponds to a tree, yᵢ = 0 otherwise
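
A hedged illustration of this kind of per-pixel feature using scikit-image: CIE Lab color channels plus a local-entropy texture cue. This only sketches the flavor of such features; it is not detectree's actual 27-feature pipeline.

    import numpy as np
    from skimage.color import rgb2lab
    from skimage.filters.rank import entropy
    from skimage.morphology import disk
    from skimage.util import img_as_ubyte

    def example_pixel_features(rgb_img):
        # 3 color features per pixel: the CIE Lab channels
        lab = rgb2lab(rgb_img)
        # 1 texture/entropy feature: local entropy of the grayscale image
        gray = img_as_ubyte(rgb_img.mean(axis=-1) / 255)
        texture = entropy(gray, disk(5))
        # stack into a (height, width, 4) per-pixel feature array
        return np.dstack([lab, texture])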

The idea is NOT mine

However

To my knowledge, DetecTree is the first open-source tool that performs this task

Overview of the computational workflow

Step 0: Split image into tiles

Input: aerial imagery raster for the area of interest

  • The dataset might already come as a mosaic of tiles
  • Otherwise, you might use DetecTree’s split_into_tiles function (see the sketch below)
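
A minimal sketch of the tiling step. The input path is hypothetical, and the keyword arguments of split_into_tiles (e.g., tile dimensions) may vary between detectree versions, so check the documentation.

    import detectree as dtr

    # split a large orthophoto mosaic into tiles, dumped to `output_dir`
    tile_filepaths = dtr.split_into_tiles(
        'data/raw/orthophoto.tif', output_dir='data/interim/tiles')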

Step 1: Train/test split

We might use random sampling to select, e.g., 1% of the tiles as training data, however

… the training tiles should be as representative as possible of the overall dataset

Step 1: Train/test split

How do we optimize the representativity of the training set?

  1. For each tile, compute a GIST descriptor [2], i.e., a vector describing key semantics of the tile’s scene
  2. Apply k-means to the GIST descriptors to get k clusters of tiles, with k = the size of the training set
  3. For each cluster, select the tile that is closest to the cluster’s centroid for training (see the sketch below)
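
A conceptual sketch of steps 2–3 with scikit-learn, assuming the GIST descriptors are already computed as an (n_tiles, d) array. This illustrates the selection logic only; it is not detectree's internal code.

    from sklearn.cluster import KMeans
    from sklearn.metrics import pairwise_distances_argmin_min

    def select_training_tiles(gist_descriptors, k):
        # cluster the tiles by scene semantics
        kmeans = KMeans(n_clusters=k).fit(gist_descriptors)
        # for each cluster, the tile whose descriptor is closest to the centroid
        closest, _ = pairwise_distances_argmin_min(
            kmeans.cluster_centers_, gist_descriptors)
        return closest  # indices of the k tiles selected for training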

Step 1: Train/test split

   img_filepath                         train
0  data/interim/tiles/1091-322_00.tif   False
1  data/interim/tiles/1091-142_20.tif   False
2  data/interim/tiles/1091-124_11.tif   False
3  data/interim/tiles/1091-144_21.tif   False
4  data/interim/tiles/1091-213_05.tif   False

Step 1: Train/test split

     img_filepath                         train
28   data/interim/tiles/1091-231_12.tif   True
98   data/interim/tiles/1091-144_06.tif   True
172  data/interim/tiles/1091-231_07.tif   True

In this example we have 225 tiles. A training sample of 1% amounts to 2.25 tiles, so DetecTree automatically sets k to the ceiling of that number, i.e., k = ⌈2.25⌉ = 3. In detectree, this whole selection is performed by the TrainingSelector class, as sketched below.
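
A minimal sketch following detectree's README (the tile directory path is hypothetical):

    import detectree as dtr

    # compute a GIST descriptor per tile, cluster them, and flag the
    # selected training tiles in the resulting dataframe
    ts = dtr.TrainingSelector(img_dir='data/interim/tiles')
    split_df = ts.train_test_split(method='cluster-I')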

Step 3: Get the training ground-truth tree/non-tree mask

  • For each tile of the training set, we need to provide the ground-truth tree/non-tree masks to get the pixel-level responses yᵢ, ∀ i ∈ {1, …, M}
  • We might use GIMP, Adobe Photoshop or LIDAR data (see the detectree-example repository)

Step 4: Train a binary pixel-level classifier

Given the pixel-level responses, DetecTree will compute the pixel-level features xᵢ, ∀ i ∈ {1, …, M} and train a binary classifier of the form:

f : ℝ²⁷ → {0, 1}, i.e., ŷᵢ = f(xᵢ)

where ŷᵢ is the tree (ŷᵢ = 1) / non-tree (ŷᵢ = 0) prediction for pixel i

Step 4: Train a binary pixel-level classifier
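
A minimal training sketch following detectree's README; the response_img_dir path is hypothetical and is assumed to hold the ground-truth masks of Step 3.

    import detectree as dtr

    # compute the 27 pixel-level features for the training tiles and fit
    # the binary tree/non-tree classifier against the ground-truth masks
    clf = dtr.ClassifierTrainer().train_classifier(
        split_df=split_df, response_img_dir='data/interim/response_tiles')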

Step 5: Pixel-level classification

Given the trained classifier clf, we might use the classify_img method as follows:
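
A sketch in the spirit of detectree's README. The test-tile choice is arbitrary, and note that in later detectree versions the trained classifier is passed to the Classifier constructor rather than to classify_img.

    import detectree as dtr

    # pick an arbitrary test tile and predict its tree/non-tree mask
    test_img_filepath = split_df[~split_df['train']]['img_filepath'].iloc[0]
    y_pred = dtr.Classifier().classify_img(test_img_filepath, clf)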

Step 5: Pixel-level classification

This will give us something like:

Step 5: Pixel-level classification

  • Note: the pixel-level classification predicts each pixel independently, which might yield noisy results, e.g., sparse points on grass fields labeled as trees
  • Following the approach of Yang et al. [1], DetecTree refines the pixel-level classification to ensure consistency between adjacent pixels, using the graph cuts algorithm of Boykov and Kolmogorov [3] (see the conceptual sketch below).
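
A conceptual sketch of such a graph-cut refinement with the PyMaxflow library, which implements the Boykov–Kolmogorov algorithm. This illustrates the technique under assumed conventions, not detectree's actual code: unary costs come from the classifier's per-pixel tree probabilities, and a constant smoothness cost beta couples adjacent pixels.

    import maxflow
    import numpy as np

    def refine_tree_mask(p_tree, beta=50.0, eps=1e-8):
        # one graph node per pixel, 4-connected grid
        g = maxflow.Graph[float]()
        node_ids = g.add_grid_nodes(p_tree.shape)
        # pairwise term: penalize label changes between adjacent pixels
        g.add_grid_edges(node_ids, beta)
        # unary term: negative log-likelihood of each label
        g.add_grid_tedges(node_ids,
                          -np.log(p_tree + eps),      # cut cost if labeled "tree"
                          -np.log(1 - p_tree + eps))  # cut cost if labeled "non-tree"
        g.maxflow()
        # nodes in the sink segment (True) are the "tree" pixels under the
        # source/sink convention chosen above
        return g.get_grid_segments(node_ids)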

Step 5: Pixel-level classification

Original tile (left), pixel-level classification (middle), refined classification (right)

Step 5: Pixel-level classification

We can use the classify_imgs method directly on the train/test split dataframe split_df to classify all the tiles at scale with Dask:
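
A sketch of the batch classification; the output directory is hypothetical and the exact signature of classify_imgs may vary across detectree versions.

    import detectree as dtr

    # predict a tree/non-tree mask for every tile listed in `split_df`,
    # writing one output raster per tile; the per-tile tasks are run in
    # parallel with Dask
    pred_imgs = dtr.Classifier().classify_imgs(
        split_df, output_dir='data/processed/pred_tiles', clf=clf)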

Pros of DetecTree

Cons of DetecTree

  • Only provides binary pixel-level classification
  • If tree species or dimensions matter, DetecTree is not the right tool

Scope of DetecTree

  • When we are only interested in 2D aspects of trees, e.g., proportion of land cover/spatial distribution
  • When LIDAR is not available
  • When LIDAR is available but too expensive
  • When LIDAR is available but you don’t want to process 300 GB of data

Example application: Urban heat islands in Lausanne

Objective

Given a land cover map (10 m resolution), predict the spatial distribution of air temperature

Approach: InVEST Urban cooling model

  • For each pixel:

    Tair ∼ f(ET, shade, albedo)

  • However: pixels of the same land cover class, e.g., road, can have very different levels of tree cover (which influences ET, shade, albedo…)

Enter DetecTree

  • Refine each land cover class into subclasses depending on the level of tree cover (see the sketch after this list)
  • For example, pixels of “road” land cover are further divided into
    • “road with high tree cover”
    • “road with intermediate tree cover”
    • “road with low tree cover”
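
A hypothetical sketch of such a refinement, assuming a raster of integer land cover codes and a per-pixel tree-cover fraction aggregated from a DetecTree mask to the land cover map's resolution; the actual binning used in the study may differ.

    import numpy as np

    def refine_land_cover(lulc, tree_cover, bins=(1 / 3, 2 / 3)):
        # 0 = low, 1 = intermediate, 2 = high tree cover
        level = np.digitize(tree_cover, bins)
        n_levels = len(bins) + 1
        # a distinct code per (land cover class, tree-cover level) pair
        return lulc * n_levels + level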

Enter DetecTree

We go from 25 to 100 land cover classes …

… which will likely increase the precision of the model predictions

Results of the study coming soon

Thank you

References

  1. Yang, L., Wu, X., Praun, E., & Ma, X. (2009). Tree detection from aerial imagery. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 131-137). ACM.
  2. Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145-175.
  3. Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9), 1124-1137.