dbfriend is a Python command-line tool designed to simplify the loading and synchronization of spatial data into PostGIS databases. It focuses on data integrity and safety, ensuring that your database operations are reliable and efficient. By handling complex tasks intelligently, dbfriend helps GIS professionals and database administrators streamline their workflows. Github repo
All database operations are executed within transactions, ensuring data integrity and automatic rollback on failure.
dbfriend automatically creates backups before modifying any existing tables, keeping up to three historical versions per table for easy restoration and added data safety.
Loads data from a variety of spatial file formats, including GeoJSON, Shapefile, GeoPackage, KML, and GML, providing flexibility in handling different data sources.
Prevents duplicates and keeps data consistent by comparing geometry hashes to detect new, updated, and identical features efficiently (see the sketch below).
Updates existing geometries when their attributes change, so your database always reflects the most current data.
Automatically detects and renames geometry columns to a standard name, simplifying data processing and integration.
Verifies CRS compatibility and reprojects data as needed, ensuring spatial data aligns correctly within your database.
Automatically creates spatial indexes on imported data, improving query performance and retrieval speed.
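To illustrate the geometry-hash comparison mentioned above, here is a minimal sketch using GeoPandas and hashlib. The table, column, and connection details are illustrative, and this is not dbfriend's actual code:

import hashlib
import geopandas as gpd
from sqlalchemy import create_engine

def geometry_hash(geom):
    # Hash the well-known binary (WKB) representation of a geometry
    return hashlib.sha256(geom.wkb).hexdigest()

# Hypothetical comparison between an incoming file and an existing PostGIS table
engine = create_engine("postgresql://user:password@localhost:5432/gis")
incoming = gpd.read_file("new_roads.geojson")
existing = gpd.read_postgis("SELECT * FROM roads", con=engine, geom_col="geom")

incoming_hashes = set(incoming.geometry.apply(geometry_hash))
existing_hashes = set(existing.geometry.apply(geometry_hash))

new_geometries = incoming_hashes - existing_hashes   # features to insert
identical = incoming_hashes & existing_hashes        # features to skip
print(f"{len(new_geometries)} new, {len(identical)} identical")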
dbfriend in action: processing spatial files and managing PostGIS database operations.
Sosilogikk is a Python module that streamlines the use of Python libraries like Shapely and Fiona for GIS analyses of the Norwegian vector data format SOSI (Samordnet Opplegg for Stedfestet Informasjon). Sosilogikk lets the user load a .SOS file into a GeoPandas GeoDataFrame with only a few lines of code. Github repo
Example structure of a vector object in a SOSI file. The dot-and-coordinate format makes it difficult to use with Python libraries.
Sosilogikk applied to a large SOSI file, resulting in an Excel-like table.
Using the .to_file() method, you can easily export the GeoDataFrame to any OGR-supported vector format, so the data can be opened in software like ArcGIS or QGIS.
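A minimal sketch of this round trip, assuming sosilogikk exposes a reader function that returns a GeoDataFrame (the function name read_sosi below is illustrative; check the repo for the actual API):

import geopandas as gpd
from sosilogikk import read_sosi  # illustrative import; the actual function name may differ

# Load a SOSI file into a GeoDataFrame (hypothetical API)
gdf = read_sosi("kartdata.sos")

# Export to an OGR-supported format, e.g. FlatGeobuf, for use in QGIS or ArcGIS
gdf.to_file("kartdata.fgb", driver="FlatGeobuf")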
Drainage lines in Flatgeobuf format visualized in QGIS.
Modern cloud-native GIS applications often need to efficiently store and transmit large volumes of geographic data between services. While GeoJSON is the standard format for geographic data exchange, its text-based nature makes it suboptimal for cloud storage and transmission. This solution combines MongoDB's BSON format with delta encoding to create a highly efficient geographic data pipeline. Github repo
BSON (Binary JSON) is MongoDB's binary format, specifically designed for cloud-scale data operations. Unlike traditional JSON, BSON provides native support for different numeric types and binary data, making it ideal for geographic coordinate storage. This becomes particularly important in microservice architectures where data needs to be efficiently serialized, transmitted, and stored across different cloud services. In cloud environments where MongoDB Atlas is increasingly common, this native format compatibility translates to significant performance benefits and reduced processing costs.
Delta encoding is a compression technique that stores the differences (deltas) between consecutive values rather than the values themselves. For geographic coordinates, this is particularly effective because consecutive points in a geometry are typically close to each other, resulting in small delta values that require fewer bits to store.
Visualization of delta encoding: Starting with a sequence of numbers (top row), we compute the differences between consecutive values (second row). Negative differences are then shifted to positive values (third row) for efficient binary representation (bottom row). This process significantly reduces storage requirements while maintaining perfect reversibility. (Adapted from Xia et al., The VLDB Journal, 2024)
The implementation in BSON_encoder.py follows a few key steps: coordinates are scaled to integers to preserve precision, each point is stored as the difference from the previous one, negative deltas are shifted to non-negative values, and the result is serialized to BSON.
Here's a simplified example showing the transformation:
Original coordinates: [(100.123456, 50.123456), (100.123476, 50.123476)]
After scaling by 1e6: [(100123456, 50123456), (100123476, 50123476)]
Delta encoded: [(100123456, 50123456), (20, 20)] # Second point stored as difference
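A minimal sketch of this encoding in Python, assuming the bson module that ships with pymongo is available. The scale factor, the zigzag-style shift, and the document layout are illustrative and not necessarily exactly what BSON_encoder.py does:

import bson  # the bson module that ships with pymongo

SCALE = 1_000_000  # scaling by 1e6 preserves six decimal places as integers

def zigzag(n):
    # Shift signed deltas to non-negative integers for compact binary storage
    return 2 * n if n >= 0 else -2 * n - 1

def delta_encode(coords, scale=SCALE):
    # Scale floats to integers, then store each point as the difference from the previous one
    scaled = [(round(x * scale), round(y * scale)) for x, y in coords]
    deltas = [scaled[0]]
    for (x0, y0), (x1, y1) in zip(scaled, scaled[1:]):
        deltas.append((x1 - x0, y1 - y0))
    return deltas

coords = [(100.123456, 50.123456), (100.123476, 50.123476)]
deltas = delta_encode(coords)                      # [(100123456, 50123456), (20, 20)]
shifted = [[zigzag(dx), zigzag(dy)] for dx, dy in deltas]

# Serialize to BSON for storage or transmission
payload = bson.encode({"type": "LineString", "deltas": shifted})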
In modern cloud architectures, geographic data flows between various services - from storage to processing to web APIs. This combined approach of delta encoding and BSON serialization dramatically reduces the bandwidth required for these operations. Testing with real-world infrastructure data:
Comparison of data sizes: Original GeoJSON format vs BSON-encoded format with delta compression. The combined approach reduces the file size by almost 90% while maintaining full coordinate precision.
While the size reduction is impressive, the real value lies in the format's cloud-native nature. The compressed data remains fully compatible with MongoDB's geospatial queries and indexes, allowing for efficient spatial operations directly on the compressed data. The compression is completely reversible, and the flattened GeoJSON structure results in smaller file sizes even after decompression.
Working in GIS production environments has highlighted an interesting challenge: the gap between what can be automated and what typically is automated. While tools like ArcGIS and QGIS excel at interactive analysis, many workflows would benefit from programmatic automation - yet often remain manual processes.
GIS workflows frequently involve repetitive tasks that are perfect candidates for automation:
The challenge isn't identifying what to automate - it's making automation accessible to GIS professionals who may not have programming experience. This is where Docker has proven particularly valuable.
Docker's containerization approach solves several fundamental challenges in GIS automation:
In practice, implementing Docker in a GIS environment means creating a layer of abstraction between the technical complexity and the end user. The implementation typically involves wrapping Docker commands in a user-friendly interface: the user doesn't need to understand the underlying system; they simply interact with familiar buttons and inputs while Docker handles the complex environment management behind the scenes. To accomplish this, we can create a launcher script, for example a batch file, that presents the user with inputs through a simple graphical user interface. Since I work in an environment where Python comes pre-installed, I chose to write the launcher as a Python script.
Example of a Docker-based GIS tool launcher in Python that abstracts away the complexity of container management and environment setup.
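As an illustration, here is a minimal sketch of such a launcher, assuming Docker is installed and an image named gis-tools exists. The image name, paths, and GUI fields are illustrative, not the actual production tool:

import subprocess
import tkinter as tk
from tkinter import filedialog, messagebox

IMAGE = "gis-tools:latest"  # illustrative image name

def run_analysis():
    input_path = filedialog.askopenfilename(title="Select input dataset")
    if not input_path:
        return
    # Mount the chosen file into the container and run the analysis script inside it
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{input_path}:/data/input.gpkg",
        IMAGE, "python", "/app/analysis.py", "/data/input.gpkg",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        messagebox.showinfo("Done", "Analysis finished successfully.")
    else:
        messagebox.showerror("Error", result.stderr)

root = tk.Tk()
root.title("GIS Tool Launcher")
tk.Button(root, text="Run analysis", command=run_analysis).pack(padx=40, pady=20)
root.mainloop()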
Using Docker in production has revealed several interesting insights:
The integration of Docker in GIS workflows opens interesting possibilities for the future of spatial data processing. As cloud infrastructure becomes more prevalent in GIS, containerized workflows could become the standard way of handling automated spatial analysis. The key will be maintaining the balance between powerful automation capabilities and user-friendly interfaces.
Python is a powerful language for rapid development, especially in the GIS domain, thanks to libraries like GeoPandas and Shapely. However, when processing large datasets or performing complex calculations, Python's speed can become a limitation. This is where Rust comes in - offering the speed we need while letting us keep Python's ease of use.
Bindings are essentially a way to connect two different programming languages, allowing them to work together. In this case, we use Rust bindings to integrate Rust's high-performance capabilities into Python workflows. This means we can write the most performance-critical parts of our GIS analysis in Rust, while still using Python for the overall workflow.
Many traditional GIS tools are written in C++, and for good reason - C++ offers excellent performance and has been the go-to language for computationally intensive tasks for decades. However, Rust brings some unique advantages to the table. While matching C++'s performance, Rust's compiler enforces memory safety and thread safety at compile time, preventing many common programming errors before they can become runtime bugs. This is particularly valuable when working with large spatial datasets where data integrity is crucial.
Additionally, Rust's modern tooling and package management system makes it easier to create and maintain bindings compared to C++. The language's focus on safe concurrency also makes it particularly well-suited for parallel processing of spatial data, an increasingly important consideration as datasets continue to grow in size and complexity.
To demonstrate the performance difference between Python and Rust, I performed a simple GIS task: creating buffers around 1 million point geometries, and checking how many of the buffers overlapped with each other. The results were striking - the Rust implementation completed in just 2 seconds, while the Python version took 84 seconds to finish the same task.
Results from the performance comparison.
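For reference, a GeoPandas version of the task might look roughly like this. It is a sketch, not the exact benchmark code; the point generation and buffer distance are illustrative:

import numpy as np
import geopandas as gpd

# Generate one million random points
rng = np.random.default_rng(42)
xs = rng.uniform(0, 10_000, 1_000_000)
ys = rng.uniform(0, 10_000, 1_000_000)
points = gpd.GeoDataFrame(geometry=gpd.points_from_xy(xs, ys))

# Buffer each point, then count pairs of overlapping buffers with a spatial self-join
buffers = gpd.GeoDataFrame(geometry=points.buffer(5))
pairs = gpd.sjoin(buffers, buffers, predicate="intersects")

# Each overlapping pair appears twice (A-B and B-A); self-matches are excluded
left_idx = pairs.index.to_numpy()
right_idx = pairs["index_right"].to_numpy()
overlap_count = int((left_idx != right_idx).sum()) // 2
print(f"Overlapping buffer pairs: {overlap_count}")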
The key features that make this implementation fast:
How we expose the Rust function to Python:
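The binding itself is written in Rust with PyO3 and compiled into an importable extension module. From the Python side, that module behaves like any other import; a sketch of the usage, where the module name geo_rust and the function count_buffer_overlaps are hypothetical names for illustration:

# Hypothetical usage of the compiled Rust extension from Python
import geo_rust  # compiled extension module (assumed name)

# Coordinates as plain (x, y) tuples, passed across the language boundary
points = [(x * 0.001, y * 0.001) for x in range(1000) for y in range(1000)]  # 1 million points

overlap_count = geo_rust.count_buffer_overlaps(points, buffer_distance=5.0)
print(f"Overlapping buffer pairs: {overlap_count}")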
By incorporating Rust into a Python-based workflow, we can leverage the strengths of both languages. Python remains the glue that holds the workflow together, providing ease of use and flexibility, while Rust handles the heavy lifting where performance is critical. This combination allows us to build robust GIS applications that are both user-friendly and highly efficient.
An exploration of why and how to build custom topology validation tools in an era of increasingly complex geospatial data relationships. While traditional GIS tools offer built-in topology checks, modern spatial data often requires more nuanced, domain-specific validation rules. Github repo
As geospatial data becomes more complex, the relationships between features often extend beyond simple geometric rules. For example, a road intersection might be valid or invalid based on multiple factors:
While tools like ArcGIS, QGIS, and PostGIS provide robust basic topology checks, they may not capture these nuanced relationships without significant customization.
This project demonstrates how to build a flexible topology testing framework that separates validation rules from the validation logic. Using external configuration files, domain experts can define what constitutes a valid topological relationship:
{
  "global_settings": {
    "id_attribute": "id",
    "output_folder_name": "TopologyTest_Output",
    "tolerances": {
      "gap": 0.001,
      "overlap": 0.001
    },
    "enabled_checks": {
      "intersections": true,
      "self_intersections": true,
      "gaps": true,
      "dangles": true,
      "overlaps": true,
      "containment": true
    }
  },
  "dataset_rules": {
    "roads": {
      "allow_intersection_if": [
        {
          "attribute": "terrain",
          "values": ["bridge", "tunnel", "air"]
        }
      ],
      "allow_overlap_if": [
        {
          "attribute": "type",
          "values": ["service_road", "emergency_lane"]
        }
      ],
      "check_dangles": true,
      "check_self_intersections": true
    },
    "buildings": {
      "allow_intersection_if": [],
      "allow_overlap_if": [],
      "check_gaps": true,
      "gap_tolerance": 0.5,
      "check_containment": true
    }
  }
}
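A minimal sketch of how such a configuration could drive a check, assuming the datasets are loaded with GeoPandas. The file names, function names, and the use of a "crosses" join are illustrative and not necessarily how the repo implements it:

import json
import geopandas as gpd

# Load the user-defined rules (file name is illustrative)
with open("topology_rules.json") as f:
    config = json.load(f)

roads = gpd.read_file("roads.gpkg")
rules = config["dataset_rules"]["roads"]

def intersection_allowed(feature, rules):
    # A feature may cross others if one of its attributes matches an exception rule,
    # e.g. terrain in ("bridge", "tunnel", "air")
    for rule in rules.get("allow_intersection_if", []):
        if feature.get(rule["attribute"]) in rule["values"]:
            return True
    return False

# Find road segments that cross each other (shared endpoints are not flagged)
joined = gpd.sjoin(roads, roads, predicate="crosses")

violations = []
for idx, other in joined["index_right"].items():
    feature = roads.loc[idx]
    if not intersection_allowed(feature, rules):
        violations.append((idx, other))

print(f"Found {len(violations)} disallowed intersections")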
Terminal output of the topology validation results when two files are tested against user-defined rules.
The framework demonstrates several key principles for custom topology testing:
This approach shows how organizations can build tools that validate not just geometric correctness, but also domain-specific spatial relationships that matter to their business processes.