Vector Data Cubes in Geospatial Analysis

Prajwal Amaravati
4 min readAug 18, 2024

--

In the rapidly evolving world of geospatial data, vector data cubes have emerged as a powerful tool for spatial analysis, combining the strengths of raster and vector data. This will guide you through the process of working with vector cubes using Python’s xarray, osmnx, and xvec libraries. We’ll explore how to extract zonal statistics over vector geometries, using real-world data and a practical example.

  1. What Are Vector Data Cubes?

Vector cubes are an innovative data structure that allows the integration of vector data (e.g., shapefiles, GeoJSON) with raster data (e.g., NetCDF, GeoTIFF). This fusion enables advanced spatial analysis, where raster values can be aggregated or summarized within vector geometries, making it a potent for various applications like climate analysis, urban planning, and environmental monitoring.

2. Setting Up the Environment

Before diving into the code, make sure you have the necessary libraries installed. You can do this by running:

pip install xarray osmnx xvec

3. How to Create a simple Data cube of a Vector ?

from shapely.geometry import shape
import xarray as xr
import xvec


# Define multiple geometries
geometries = [
{
"type": "MultiPolygon",
"coordinates": [
[
[
[74.72885849814467, 14.137107067917682],
[74.7640565417165, 14.129564630009437],
[74.76279946873179, 14.098137805391742],
[74.72005898725172, 14.098137805391742],
[74.72885849814467, 14.137107067917682],
]
]
],
},
{
"type": "MultiPolygon",
"coordinates": [
[
[
[75.72885849814467, 15.137107067917682],
[75.7640565417165, 15.129564630009437],
[75.76279946873179, 15.098137805391742],
[75.72005898725172, 15.098137805391742],
[75.72885849814467, 15.137107067917682],
]
]
],
},
]

# Define corresponding data values
data_values = ["small_aoi_1", "small_aoi_2"]

# Create the DataArray with the geometries as coordinates
small_aoi_da = xr.DataArray(
data=data_values, # Your data
dims=("small_aoi"), # Dimension name
coords={
"small_aoi": [shape(geometry) for geometry in geometries]
}, # Assign the Shapely geometries directly as coordinates
).xvec.set_geom_indexes("small_aoi", crs="EPSG:4326")
output of small_aoi_da

4. Advantages of Vector Data Cubes

Vector data cubes offer a seamless integration of vector and raster data within a single dataset, enabling advanced spatial operations like buffering, intersecting with bounding boxes, and performing spatial joins directly in xarray. This approach simplifies complex geospatial analyses, making data access more efficient by allowing users to query and manipulate both vector and raster data without switching between different tools. Additionally, vector data cubes provide scalability with Dask, enabling efficient computation on large datasets, and maintain interoperability with other geospatial libraries, making them a versatile choice for comprehensive spatial data analysis.

5. Working with ERA5 Data in xarray

For this example, we’ll use ERA5 reanalysis data — a widely used climate dataset. We’ll focus on a specific subset of this data and perform spatial analysis using vector geometries.

Step 1: Load the Dataset

We’ll start by loading a subset of the ERA5 data using xarray. This dataset contains climate variables like temperature, wind speed, and precipitation. Here’s how you can load the data:

import xarray as xr

uri = "gs://gcp-public-data-arco-era5/ar/1959–2022-full_37–6h-0p25deg-chunk-1.zarr-v2"
era5_ds = (
xr.open_zarr(uri, chunks={"time": 43}, consolidated=True)
.isel(level=0, drop=True)
.sel(time=slice("2020–01", "2021–01"))
[["2m_temperature"]] # Reduce to one arrays
)
era5_ds output

In this snippet, we load the ERA5 dataset stored in Zarr format from a Google Cloud bucket. We select the near-surface level, subset the time range to 2020, and focus on the 2-meter temperature variable.

Step 2: Define the Area of Interest

Next, we define our area of interest using osmnx, a powerful library for retrieving geospatial data, especially for urban areas.

import osmnx as ox

ox.config(use_cache=True, log_console=True)
query = {'city': 'Bengaluru'}
gdf = ox.geocode_to_gdf(query)

Here, we define a simple query to get the geometries for the city of Bengaluru. osmnx retrieves the vector data for the specified location and returns it as a GeoDataFrame.

Step 3: Perform Zonal Statistics with xvec

Now, we move on to the exciting part: computing zonal statistics. We’ll use xvec, a library that bridges xarray and vector data to compute statistics over specified geometries.

import xvec

aggregated_mean = era5_ds.xvec.zonal_stats(
gdf.geometry,
x_coords="longitude",
y_coords="latitude",
stats="mean",
)
aggregated_mean dataset

This code snippet performs zonal statistics on the ERA5 dataset using the geometries of Bengaluru. The `stats` method ensures accurate aggregation of raster data within the vector polygons.

Step 4: Convert the Results to GeoDataFrame

Finally, we’ll convert the aggregated data back to a GeoDataFrame, allowing for easy visualization and further spatial analysis.

aggregated_mean.xvec.to_geodataframe()

This simple line of code converts the results into a GeoDataFrame, which can be easily saved, shared, or visualized.

6. Conclusion

In this post, we’ve demonstrated how to harness the power of vector cubes by combining the capabilities of xarray, osmnx, and xvec. This approach allows for sophisticated spatial analyses that leverage both raster and vector data. Whether you’re working on climate research, urban planning, or environmental monitoring, vector cubes offer a versatile and powerful framework for your geospatial analyses.

Stay tuned for more posts where we’ll explore further applications and advanced techniques with vector cubes. Happy coding!

7. Acknowledgement

  1. WeatherBench2 for analysis-ready ERA5 data
  2. osmnx for geometries

8. References

--

--

No responses yet