Martí Bosch (CEAT); Gionata Ghiggi (LTE); Son Pham-Ba and Charlotte Weil (ENAC-IT4R)
May 28, 2024.
Funded by the ETH Domain Open
Research Data (ORD) Program
Example spatial time series (TS) data:
| station | 1 | 1 | 1 | … | 33 | 33 | 33 |
|---|---|---|---|---|---|---|---|
| variable | temperature | water_vapour | precipitation | … | temperature | water_vapour | precipitation |
| time | | | | | | | |
| 2021-01-01 00:00:00 | 2.2 | 99.0 | 0.2 | … | 3.4 | 92.0 | 0.1 |
| 2021-01-01 00:10:00 | 2.3 | 99.0 | 0.2 | … | 3.2 | 92.0 | 0.1 |
| 2021-01-01 00:20:00 | 2.4 | 99.0 | 0.1 | … | 3.2 | 92.0 | 0.2 |
| … | … | … | … | … | … | … | … |
| 2021-01-31 23:30:00 | 6.1 | 99.0 | 0.2 | … | 6.9 | 80.0 | 0.0 |
| 2021-01-31 23:40:00 | 6.1 | 98.0 | 0.3 | … | 6.9 | 81.0 | 0.0 |
| 2021-01-31 23:50:00 | 6.1 | 99.0 | 0.3 | … | 6.8 | 82.0 | 0.2 |

4464 rows × 99 columns
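A minimal sketch of how a wide data frame like the one above could be built and resampled with plain pandas. The values are random and only three stations are used; the names `wide_df` and `hourly_df` are illustrative and not taken from the talk.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the table above: 3 stations instead of 33, random values.
time = pd.date_range("2021-01-01", "2021-01-31 23:50", freq="10min", name="time")
columns = pd.MultiIndex.from_product(
    [[1, 2, 3], ["temperature", "water_vapour", "precipitation"]],
    names=["station", "variable"],
)
rng = np.random.default_rng(0)
wide_df = pd.DataFrame(
    rng.random((len(time), len(columns))), index=time, columns=columns
)

# All columns share one DatetimeIndex, so a single resample call
# aggregates every (station, variable) column at once.
hourly_df = wide_df.resample("1h").mean()
```

The wide layout makes time-based operations simple, but it requires every station to share the same time index.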
df.resample

| station | time | temperature | water_vapour | precipitation |
|---|---|---|---|---|
| 1 | 2021-01-01 00:00:00 | 2.2 | 99.0 | 0.2 |
| | 2021-01-01 00:10:00 | 2.3 | 99.0 | 0.2 |
| | 2021-01-01 00:20:00 | 2.4 | 99.0 | 0.1 |
| | 2021-01-01 00:30:00 | 2.4 | 99.0 | 0.2 |
| | 2021-01-01 00:40:00 | 2.5 | 99.0 | 0.2 |
| … | … | … | … | … |
| 33 | 2021-01-31 23:10:00 | 5.6 | 100.0 | 0.1 |
| | 2021-01-31 23:20:00 | 5.6 | 100.0 | 0.0 |
| | 2021-01-31 23:30:00 | 5.7 | 100.0 | 0.2 |
| | 2021-01-31 23:40:00 | 5.4 | 100.0 | 0.1 |
| | 2021-01-31 23:50:00 | 5.3 | 100.0 | 0.4 |

147312 rows × 3 columns
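For comparison, a hedged sketch of the same hourly aggregation on a long data frame indexed by (station, time). The data are again synthetic, with three stations, and `long_df` and `hourly_long_df` are illustrative names. The row count of the table above follows the same arithmetic: 4464 time steps × 33 stations = 147312 rows.

```python
import numpy as np
import pandas as pd

# Synthetic long data frame: one row per (station, time) pair.
time = pd.date_range("2021-01-01", "2021-01-31 23:50", freq="10min", name="time")
index = pd.MultiIndex.from_product([[1, 2, 3], time], names=["station", "time"])
rng = np.random.default_rng(0)
long_df = pd.DataFrame(
    rng.random((len(index), 3)),
    index=index,
    columns=["temperature", "water_vapour", "precipitation"],
)

# Time is no longer the only index level, so the resampling has to be
# expressed per station: group by station and by hourly time bins.
hourly_long_df = long_df.groupby(
    [pd.Grouper(level="station"), pd.Grouper(level="time", freq="1h")]
).mean()
```

The long layout tolerates stations with different time steps, at the cost of a more verbose grouping for each time-based operation.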
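```python
import geopandas as gpd
import xvec  # noqa: F401 (registers the `.xvec` accessor on xarray objects)

# e.g., stations within 10 km of Lausanne's center
# (ds: xarray Dataset with a geometry-indexed "station" coordinate)
query_geom = gpd.tools.geocode("Lausanne").to_crs(ds.station.crs).buffer(10e3)
ds.xvec.query("station", query_geom)
```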
ds.resample

We could not find a tool to deal with:
TStore is a Python library for flexible storage and processing of (spatial) TS data. Two key features:

- TS, TSDF, TSLong and TSWide objects to organize heterogeneous (spatial) time series data into Python data frames
- TStore is a hierarchically-structured specification to reliably and efficiently store (spatial) TS data based on Parquet (and GeoParquet)

Consider a TS object representing a time series. Then the long data frame becomes:
| station | data |
|---|---|
| 1 | TS[shape=(4464, 3),start=2021-01-01 00:00:00,e… |
| 2 | TS[shape=(4464, 3),start=2021-01-01 00:00:00,e… |
| 3 | TS[shape=(4464, 3),start=2021-01-01 00:00:00,e… |
| … | … |
| 31 | TS[shape=(4464, 3),start=2021-01-01 00:00:00,e… |
| 32 | TS[shape=(4464, 3),start=2021-01-01 00:00:00,e… |
| 33 | TS[shape=(4464, 3),start=2021-01-01 00:00:00,e… |
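The idea of one time series object per station can be approximated in plain pandas. The sketch below is an analogy only and does not use the TStore API: a dictionary maps each station to its own data frame, and `make_station_frame`, `per_station` and `summary` are hypothetical names. The two stations deliberately use different temporal resolutions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
variables = ["temperature", "water_vapour", "precipitation"]


def make_station_frame(freq):
    """Build a synthetic one-month multivariate series at the given resolution."""
    time = pd.date_range("2021-01-01", "2021-01-31 23:50", freq=freq, name="time")
    return pd.DataFrame(
        rng.random((len(time), len(variables))), index=time, columns=variables
    )


# Plain-pandas analogy (not the TStore API): each station keeps its own
# time index, so resolutions and coverage periods may differ between stations.
per_station = {1: make_station_frame("10min"), 2: make_station_frame("30min")}

# One summary row per station, similar in spirit to the TS[shape=..., start=...] cells.
summary = pd.DataFrame.from_dict(
    {
        station: {"shape": ts.shape, "start": ts.index[0], "end": ts.index[-1]}
        for station, ts in per_station.items()
    },
    orient="index",
)
summary.index.name = "station"
```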
- Each station has its own TS, e.g., useful with different temporal resolutions, periods of maintenance (no data)…
- A TS object may be univariate or multivariate

| station | temperature | water_vapour | precipitation |
|---|---|---|---|
| 1 | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… |
| 2 | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… |
| 3 | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… |
| … | … | … | … |
| 31 | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… |
| 32 | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… |
| 33 | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… | TS[shape=(4464,),start=2021-01-01 00:00:00,end… |
TSDF object

- TS are pandas ExtensionDtype
- TSArray are pandas ExtensionArray
- GeoPandas compatible:
Consider k years of temperature and precipitation data from n stations. Then, the TStore looks like:
We can …

TS objects are loaded into the Apache Arrow memory format

5 years of 10 min observations from the 33 Agrometeo stations in the Canton of Vaud, Switzerland:
| station | time | temperature | water_vapour | precipitation |
|---|---|---|---|---|
| 1 | 2019-06-01 00:00:00 | 17.0 | 57.0 | 0.0 |
| | 2019-06-01 00:10:00 | 16.5 | 60.0 | 0.0 |
| | 2019-06-01 00:20:00 | 16.3 | 59.0 | 0.0 |
| … | … | … | … | … |
| 305 | 2024-04-30 23:30:00 | 14.9 | 74.0 | 0.0 |
| | 2024-04-30 23:40:00 | 15.3 | 69.0 | 0.0 |
| | 2024-04-30 23:50:00 | 15.3 | 67.0 | 0.0 |

8534361 rows × 3 columns
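As a quick sanity check on the row count, the sketch below computes how many rows a complete 10-minute grid would contain. It assumes that the first and last timestamps shown in the table bound the whole data set and that all 33 stations span that full period; neither is stated explicitly in the source.

```python
from datetime import datetime, timedelta

# Assumed bounds: first and last timestamps visible in the table above.
start = datetime(2019, 6, 1, 0, 0)
end = datetime(2024, 4, 30, 23, 50)
step = timedelta(minutes=10)
n_stations = 33

# Number of 10-minute time steps per station, both endpoints included.
steps_per_station = (end - start) // step + 1

# Rows of a gap-free long data frame versus the reported row count.
full_grid_rows = steps_per_station * n_stations
reported_rows = 8_534_361
missing_rows = full_grid_rows - reported_rows
```

Under these assumptions a gap-free grid would hold 8534592 rows, so the reported table is 231 rows short of it, which would be consistent with a small number of missing observations.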
Resulting TStore directory structure:
Slides (and notebook): martibosch/tsdf-geopython-2024
Repository: ltelab/tstore