Martí Bosch
AI & Cities, AMLD 2020, January 28, Lausanne
Format: geo-referenced list of trees
Format: raster of tree height values
The training set S = {(xᵢ, yᵢ), i ∈ {1, …, M}} is a sample of M pixels, each represented by a:
To my knowledge, DetecTree is the first open source tool to perform such a task
Input: aerial imagery raster for the area of interest
`split_into_tiles` function
We might use random sampling to select, e.g., 1% of the tiles as training data; however …
… the training tiles should be as representative as possible of the overall dataset
How do we optimize the representativity of the training set?
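One way to approach this, and to my understanding the idea behind DetecTree's cluster-based selection, is to describe each tile with a feature vector, group the descriptors with k-means, and keep the tile closest to each cluster centroid as a training tile. The sketch below assumes scikit-learn and NumPy are available and uses random 12-dimensional vectors as hypothetical stand-ins for real image descriptors; the names `descriptors`, `train_idx` and `train` are illustrative and are not DetecTree's API.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

num_tiles = 225  # hypothetical number of tiles
k = 3            # number of training tiles to select (assumed here)

# Hypothetical tile descriptors: one 12-dimensional vector per tile.
# In practice these would be computed from the tile images.
descriptors = rng.normal(size=(num_tiles, 12))

# Cluster the tiles into k groups of similar descriptors.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)

# Distance from every tile to every centroid, shape (num_tiles, k).
dists = km.transform(descriptors)

# For each cluster, keep the member tile closest to its centroid.
train_idx = []
for j in range(k):
    members = np.flatnonzero(km.labels_ == j)
    train_idx.append(members[np.argmin(dists[members, j])])

# Boolean train/test flag per tile.
train = np.zeros(num_tiles, dtype=bool)
train[train_idx] = True
```

Each selected tile stands in for one cluster of similar tiles, so the training set covers the variety of the dataset better than a purely random draw of the same size would be expected to.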
| | img_filepath | train |
|---|---|---|
| 0 | data/interim/tiles/1091-322_00.tif | False |
| 1 | data/interim/tiles/1091-142_20.tif | False |
| 2 | data/interim/tiles/1091-124_11.tif | False |
| 3 | data/interim/tiles/1091-144_21.tif | False |
| 4 | data/interim/tiles/1091-213_05.tif | False |
| | img_filepath | train |
|---|---|---|
| 28 | data/interim/tiles/1091-231_12.tif | True |
| 98 | data/interim/tiles/1091-144_06.tif | True |
| 172 | data/interim/tiles/1091-231_07.tif | True |
In this example we have 225 tiles. A 1% training sample corresponds to 2.25 tiles, so DetecTree automatically rounds up and sets k to the ceiling of that number, i.e., k = ⌈2.25⌉ = 3
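The arithmetic can be checked in a couple of lines. This is only a sketch of the rounding rule described above; `num_tiles` and `train_prop` are illustrative variable names, not DetecTree parameters.

```python
import math

num_tiles = 225    # tiles in the example
train_prop = 0.01  # 1% of the tiles used for training

# 1% of 225 is 2.25 tiles; round up to a whole number of tiles.
k = math.ceil(train_prop * num_tiles)
print(k)  # 3
```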
Given the pixel-level responses, DetecTree will compute the pixel-level features xᵢ, ∀i ∈ {1, …, M} and train a binary classifier of the form:
f : ℝ²⁷ → {0, 1}, i.e., ŷᵢ = f(xᵢ)
where ŷᵢ is the tree (yᵢ = 1)/non-tree (yᵢ = 0) prediction for pixel i
`response_dir` is where the response tiles are located
`clf` is the trained classifier, i.e., an instance of scikit-learn’s AdaBoostClassifier
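To make the form of the classifier concrete, here is a minimal sketch that fits scikit-learn's AdaBoostClassifier on synthetic data. It does not use DetecTree's own training code: the 27-dimensional feature matrix `X` and the labels `y` are randomly generated placeholders (an assumption for illustration), standing in for real pixel features and tree/non-tree responses.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

M = 1000  # number of sampled pixels (arbitrary for this sketch)

# Synthetic stand-ins: one 27-dimensional feature vector per pixel,
# and a binary label (1 = tree, 0 = non-tree) derived from two features.
X = rng.normal(size=(M, 27))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train the binary classifier f: R^27 -> {0, 1}.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Predict a tree/non-tree label for each pixel.
y_pred = clf.predict(X)
```

Every entry of `y_pred` is either 0 or 1, one prediction per pixel, which is the ŷ = f(x) mapping described above.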
Given the trained classifier `clf`, we might use the `classify_img` method as follows:
Which will give us something like:
Original tile (left), pixel-level classification (middle), refined classification (right)
We can use the `classify_imgs` method directly on the train/test split dataframe `split_df` to classify all the tiles at scale with Dask:
Given a land cover map (10 m resolution), predict the spatial distribution of air temperature
For each pixel:
Tₐᵢᵣ ∼ f(ET, shade, albedo)
We go from 25 to 100 land use classes …
… which will likely increase the precision of the model predictions