Jesper Fjellin

GIS Developer & Engineer


dbfriend - PostGIS Database Management

dbfriend is a Python command-line tool designed to simplify the loading and synchronization of spatial data into PostGIS databases. It focuses on data integrity and safety, ensuring that your database operations are reliable and efficient. By handling complex tasks intelligently, dbfriend helps GIS professionals and database administrators streamline their workflows. GitHub repo

Key Features

Transactional Operations

All database operations are executed within transactions, ensuring data integrity and automatic rollback on failure.
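dbfriend itself runs against PostgreSQL/PostGIS, but the commit-or-rollback pattern can be illustrated with the standard library's sqlite3, whose connection object offers the same transactional context manager (table and column names here are illustrative, not dbfriend's actual schema):

```python
import sqlite3

def load_features(conn, rows):
    """Insert a batch inside one transaction; any failure rolls the whole batch back."""
    with conn:  # connection as context manager: commit on success, rollback on error
        conn.executemany("INSERT INTO features (id, wkt) VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, wkt TEXT)")
load_features(conn, [(1, "POINT(0 0)"), (2, "POINT(1 1)")])

try:
    # the second row violates the primary key, so the first row is rolled back too
    load_features(conn, [(3, "POINT(2 2)"), (1, "POINT(0 0)")])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]  # still 2
```

The failed batch leaves no trace: either every feature in a load lands, or none do.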

Automated Table Backups

dbfriend automatically creates backups before modifying any existing tables, keeping up to three historical versions per table for easy restoration and added data safety.
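A rotation that keeps up to three historical versions can be sketched as pure bookkeeping over table names (the `<table>_backup_<n>` naming scheme is illustrative, not necessarily what dbfriend uses internally):

```python
def plan_backup_rotation(table, existing, keep=3):
    """Plan a backup rotation where <table>_backup_1 is the newest copy.
    Returns (drops, renames, new_name); naming scheme is illustrative."""
    names = [f"{table}_backup_{i}" for i in range(1, keep + 1)]
    # the oldest backup falls off the end
    drops = [names[-1]] if names[-1] in existing else []
    # shift the remaining backups one slot older, oldest first
    renames = [
        (names[i], names[i + 1])
        for i in reversed(range(keep - 1))
        if names[i] in existing
    ]
    return drops, renames, names[0]

drops, renames, new = plan_backup_rotation(
    "roads", {"roads_backup_1", "roads_backup_2", "roads_backup_3"}
)
```

With all three slots full, the plan drops `roads_backup_3`, shifts the other two down, and frees `roads_backup_1` for the fresh copy.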

Supports Multiple Vector Formats

Load data from various spatial file formats, including GeoJSON, Shapefile, GeoPackage, KML, and GML, providing flexibility in handling different data sources.

Intelligent Geometry Comparison

Prevent duplicates and ensure data consistency by comparing geometries using hashes to detect new, updated, and identical features efficiently.
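The hash-based comparison can be sketched like this (hashing a rounded coordinate string with hashlib; dbfriend hashes a real geometry encoding such as WKB, so this is only the shape of the idea):

```python
import hashlib

def geom_hash(coords, precision=9):
    """Stable hash of a geometry's coordinates, rounded to avoid float noise.
    Illustrative; a production tool would hash WKB or similar."""
    canonical = ";".join(f"{x:.{precision}f},{y:.{precision}f}" for x, y in coords)
    return hashlib.sha256(canonical.encode()).hexdigest()

# hashes of geometries already in the database
existing = {geom_hash([(10.0, 59.0), (10.5, 59.5)])}

incoming = [
    [(10.0, 59.0), (10.5, 59.5)],   # identical geometry -> skip
    [(11.0, 60.0), (11.5, 60.5)],   # new geometry -> insert
]
new_features = [g for g in incoming if geom_hash(g) not in existing]
```

Comparing fixed-size hashes instead of full coordinate arrays keeps the duplicate check cheap even for large geometries.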

Attribute-Aware Updates

Update existing geometries based on attribute changes, so your database always reflects the most current data.

Automatic Geometry Handling

Automatically detects and renames geometry columns to a standard format, simplifying data processing and integration.

CRS Compatibility Checks and Automatic Reprojection

Verifies CRS compatibility and automatically reprojects data as needed, ensuring spatial data aligns correctly within your database.

Spatial Index Creation for Optimized Queries

Automatically creates spatial indexes on imported data, improving query performance and data retrieval speeds.
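The index creation corresponds to standard PostGIS DDL along these lines (table, column, and index names are illustrative):

```sql
-- GiST index on the geometry column speeds up spatial predicates
CREATE INDEX IF NOT EXISTS roads_geom_idx ON roads USING GIST (geom);
-- refresh planner statistics so the new index is actually used
ANALYZE roads;
```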

Demonstration

Demonstration of dbfriend in action

dbfriend in action: processing spatial files and managing PostGIS database operations.

Sosilogikk

Sosilogikk is a Python module that streamlines the use of Python libraries like Shapely and Fiona for GIS analyses of the Norwegian vector data format SOSI (Samordnet Opplegg for Stedfestet Informasjon). Sosilogikk lets the user load a .SOS file into a GeoPandas GeoDataFrame with only a few lines of code. GitHub repo

Example SOSI-file

Example structure of a vector object in a SOSI file. Its dot-and-coordinate format is difficult to use directly with Python libraries.

Example SOSI-file loaded into GeoDataFrame

Sosilogikk applied to a large SOSI file, resulting in an Excel-like table.

Using the .to_file method, you can export the GeoDataFrame to any OGR-supported vector format, so the data can be opened in software like ArcGIS or QGIS.

Drainage lines in Flatgeobuf format visualized in QGIS

Drainage lines in Flatgeobuf format visualized in QGIS.

Delta Encoding and MongoDB - Optimizing for Cloud Computing

Modern cloud-native GIS applications often need to efficiently store and transmit large volumes of geographic data between services. While GeoJSON is the standard format for geographic data exchange, its text-based nature makes it suboptimal for cloud storage and transmission. This solution combines MongoDB's BSON format with delta encoding to create a highly efficient geographic data pipeline. GitHub repo

BSON and MongoDB in Cloud Computing

BSON (Binary JSON) is MongoDB's binary format, specifically designed for cloud-scale data operations. Unlike traditional JSON, BSON provides native support for different numeric types and binary data, making it ideal for geographic coordinate storage. This becomes particularly important in microservice architectures where data needs to be efficiently serialized, transmitted, and stored across different cloud services. In cloud environments where MongoDB Atlas is increasingly common, this native format compatibility translates to significant performance benefits and reduced processing costs.

Understanding Delta Encoding

Delta encoding is a compression technique that stores the differences (deltas) between consecutive values rather than the values themselves. For geographic coordinates, this is particularly effective because consecutive points in a geometry are typically close to each other, resulting in small delta values that require fewer bits to store.

Delta encoding visualization

Visualization of delta encoding: Starting with a sequence of numbers (top row), we compute the differences between consecutive values (second row). Negative differences are then shifted to positive values (third row) for efficient binary representation (bottom row). This process significantly reduces storage requirements while maintaining perfect reversibility. (Adapted from Xia et al., The VLDB Journal, 2024)

Implementation Approach

The implementation in BSON_encoder.py follows these key steps:

  1. Scale and Convert to Integers: First, we scale the floating-point coordinates (typically by 1e6) and convert them to integers to preserve precision while enabling efficient delta calculations.
  2. Calculate Deltas: For each point after the first, we store the difference from the previous point rather than the absolute coordinates.
  3. BSON Serialization: The delta-encoded coordinates are then serialized to BSON format, which provides efficient storage of integer arrays.
  4. GZIP Compression: Finally, we apply GZIP compression to further reduce the size of the encoded data.

Here's a simplified example showing the transformation:


    Original coordinates: [(100.123456, 50.123456), (100.123476, 50.123476)]
    After scaling by 1e6: [(100123456, 50123456), (100123476, 50123476)]
    Delta encoded:        [(100123456, 50123456), (20, 20)]  # second point stored as difference
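The four steps can be run end to end with the standard library alone. In this minimal sketch, zigzag shifting stands in for the figure's "shift negatives to positive" step, and struct packing stands in for BSON serialization (the actual BSON_encoder.py uses pymongo's bson module, omitted here to keep the example self-contained):

```python
import gzip
import struct

SCALE = 1_000_000  # step 1: scale by 1e6 before converting to integers

def zigzag(n):
    """Map signed deltas to non-negative ints (the 'shift to positive' step)."""
    return (n << 1) ^ (n >> 63)

def unzigzag(u):
    return (u >> 1) ^ -(u & 1)

def encode(coords):
    ints = [(round(x * SCALE), round(y * SCALE)) for x, y in coords]      # step 1
    deltas = [ints[0]] + [(x - px, y - py)
                          for (x, y), (px, py) in zip(ints[1:], ints)]    # step 2
    flat = [zigzag(v) for xy in deltas for v in xy]
    packed = struct.pack(f"<{len(flat)}q", *flat)  # step 3: BSON stand-in
    return gzip.compress(packed)                   # step 4

def decode(blob):
    raw = gzip.decompress(blob)
    flat = struct.unpack(f"<{len(raw) // 8}q", raw)
    deltas = [(unzigzag(flat[i]), unzigzag(flat[i + 1]))
              for i in range(0, len(flat), 2)]
    coords, x, y = [], 0, 0
    for dx, dy in deltas:  # cumulative sum reverses the delta encoding
        x, y = x + dx, y + dy
        coords.append((x / SCALE, y / SCALE))
    return coords

pts = [(100.123456, 50.123456), (100.123476, 50.123476)]
restored = decode(encode(pts))  # round-trips within the 1e-6 scaling precision
```

Because the integer-level round trip is exact, decoding recovers the coordinates to the full precision the 1e6 scaling preserves.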

Compression Results

In modern cloud architectures, geographic data flows between various services - from storage to processing to web APIs. This combined approach of delta encoding and BSON serialization dramatically reduces the bandwidth required for these operations. Testing with real-world infrastructure data:

Data size comparison between GeoJSON and BSON formats

Comparison of data sizes: Original GeoJSON format vs BSON-encoded format with delta compression. The combined approach reduces the file size by almost 90% while maintaining full coordinate precision.

Cloud Integration Benefits

While the size reduction is impressive, the real value lies in the format's cloud-native nature. The compressed data remains fully compatible with MongoDB's geospatial queries and indexes, allowing for efficient spatial operations directly on the compressed data. The compression is completely reversible, and the flattened GeoJSON structure results in smaller file sizes even after decompression.

Docker in Production Environments - Bridging Technical Gaps

Working in GIS production environments has highlighted an interesting challenge: the gap between what can be automated and what typically is automated. While tools like ArcGIS and QGIS excel at interactive analysis, many workflows would benefit from programmatic automation - yet often remain manual processes.

The Automation Challenge in GIS

GIS workflows frequently involve repetitive tasks that are perfect candidates for automation:

  • Database-wide topology validation and error checking
  • Scheduled quality control processes
  • Statistical aggregation of incoming project data
  • Automated spatial sampling and analysis

The challenge isn't identifying what to automate - it's making automation accessible to GIS professionals who may not have programming experience. This is where Docker has proven particularly valuable.

Docker as a Bridge

Docker's containerization approach solves several fundamental challenges in GIS automation:

  • It eliminates the complexity of Python environment management
  • It ensures consistent spatial libraries across different machines
  • It packages all dependencies in a single, shareable unit
  • Most importantly, it makes advanced automation accessible to non-programmers

From Theory to Practice

In practice, implementing Docker in a GIS environment means creating a layer of abstraction between the technical complexity and the end user. This typically involves wrapping Docker commands in a user-friendly interface: the user doesn't need to understand the underlying system; they simply interact with familiar buttons and inputs while Docker handles the environment management behind the scenes. To accomplish this, we can create a launcher script - for example a batch file - that presents the user with inputs through a simple graphical user interface. As I work in an environment where Python comes pre-installed, I chose to use a Python script for this task.

Docker-based GIS tool launcher interface

Example of a Docker-based GIS tool launcher in Python that abstracts away the complexity of container management and environment setup.
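The core of such a launcher can be sketched as a command builder (the image name, mount point, and tool arguments below are hypothetical; a real launcher would collect them through e.g. a tkinter form and hand the result to subprocess.run):

```python
import shlex

def build_docker_command(image, data_dir, tool_args):
    """Compose the docker run invocation the GUI would execute.
    Image name and mount point are illustrative."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/data",   # expose the user's data folder to the container
        image,
    ] + tool_args
    return cmd

cmd = build_docker_command(
    "myorg/topology-checker:latest",   # hypothetical image
    "C:/gis/project",
    ["--input", "/data/roads.gpkg"],
)
print(shlex.join(cmd))
# a real launcher would then call subprocess.run(cmd, check=True)
```

Keeping the command construction in one function means the GUI layer only gathers inputs, while all Docker specifics stay in one auditable place.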

Reflections on Production Use

Using Docker in production has revealed several interesting insights:

  • Environment Consistency: The "it works on my machine" problem essentially disappears
  • Version Management: Docker images provide a reliable way to track and roll back changes
  • Distribution: Updates to spatial analysis tools can be pushed through Docker Hub without requiring end-user intervention
  • Isolation: Each process runs in its own container, preventing system-wide conflicts

Looking Forward

The integration of Docker in GIS workflows opens interesting possibilities for the future of spatial data processing. As cloud infrastructure becomes more prevalent in GIS, containerized workflows could become the standard way of handling automated spatial analysis. The key will be maintaining the balance between powerful automation capabilities and user-friendly interfaces.

Rust Bindings in Python - When Fast Data Processing Matters

Python is a powerful language for rapid development, especially in the GIS domain, thanks to libraries like GeoPandas and Shapely. However, when processing large datasets or performing complex calculations, Python's speed can become a limitation. This is where Rust comes in - offering the speed we need while letting us keep Python's ease of use.

Understanding Rust Bindings

Bindings are essentially a way to connect two different programming languages, allowing them to work together. In this case, we use Rust bindings to integrate Rust's high-performance capabilities into Python workflows. This means we can write the most performance-critical parts of our GIS analysis in Rust, while still using Python for the overall workflow.

Why Use Rust?

Many traditional GIS tools are written in C++, and for good reason - C++ offers excellent performance and has been the go-to language for computationally intensive tasks for decades. However, Rust brings some unique advantages to the table. While matching C++'s performance, Rust's compiler enforces memory safety and thread safety at compile time, preventing many common programming errors before they can become runtime bugs. This is particularly valuable when working with large spatial datasets where data integrity is crucial.

Additionally, Rust's modern tooling and package management system makes it easier to create and maintain bindings compared to C++. The language's focus on safe concurrency also makes it particularly well-suited for parallel processing of spatial data, an increasingly important consideration as datasets continue to grow in size and complexity.

Performance Comparison

To demonstrate the performance difference between Python and Rust, I performed a simple GIS task: creating buffers around 1 million point geometries, and checking how many of the buffers overlapped with each other. The results were striking - the Rust implementation completed in just 2 seconds, while the Python version took 84 seconds to finish the same task.

Buffer analysis performance comparison

Results from the performance comparison.
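For point geometries, "do two buffers of radius r overlap" reduces to a pairwise distance test (distance < 2r), which makes the task easy to sketch in pure Python. The actual benchmark used real geometry libraries on a million points; this stdlib version only illustrates what is being computed:

```python
from itertools import combinations
from math import hypot

def count_overlapping_buffers(points, radius):
    """Count pairs of points whose circular buffers overlap.
    O(n^2) for clarity; the benchmarked implementations used spatial indexing."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if hypot(x2 - x1, y2 - y1) < 2 * radius
    )

pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
count_overlapping_buffers(pts, 1.0)  # 1: only the first two buffers overlap
```

At a million points, the quadratic pair check is exactly the kind of hot loop where moving from interpreted Python to compiled Rust pays off.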

Rust Implementation Highlights

The key features that make this implementation fast:

Python Integration

How we expose the Rust function to Python:

Integrating Rust with Python

By incorporating Rust into a Python-based workflow, we can leverage the strengths of both languages. Python remains the glue that holds the workflow together, providing ease of use and flexibility, while Rust handles the heavy lifting where performance is critical. This combination allows us to build robust GIS applications that are both user-friendly and highly efficient.

Building Custom Topology Testing Solutions

An exploration of why and how to build custom topology validation tools in an era of increasingly complex geospatial data relationships. While traditional GIS tools offer built-in topology checks, modern spatial data often requires more nuanced, domain-specific validation rules. GitHub repo

Why Custom Topology Testing?

As geospatial data becomes more complex, the relationships between features often extend beyond simple geometric rules. For example, a road intersection might be valid or invalid based on multiple factors:

  • Physical infrastructure (bridges, tunnels)
  • Administrative classifications
  • Temporal constraints
  • Domain-specific business rules

While tools like ArcGIS, QGIS, and PostGIS provide robust basic topology checks, they may not capture these nuanced relationships without significant customization.

A Rule-Based Approach

This project demonstrates how to build a flexible topology testing framework that separates validation rules from the validation logic. Using external configuration files, domain experts can define what constitutes a valid topological relationship:


    {
      "global_settings": {
        "id_attribute": "id",
        "output_folder_name": "TopologyTest_Output",
        "tolerances": {
          "gap": 0.001,
          "overlap": 0.001
        },
        "enabled_checks": {
          "intersections": true,
          "self_intersections": true,
          "gaps": true,
          "dangles": true,
          "overlaps": true,
          "containment": true
        }
      },
      "dataset_rules": {
        "roads": {
          "allow_intersection_if": [
            {
              "attribute": "terrain",
              "values": ["bridge", "tunnel", "air"]
            }
          ],
          "allow_overlap_if": [
            {
              "attribute": "type",
              "values": ["service_road", "emergency_lane"]
            }
          ],
          "check_dangles": true,
          "check_self_intersections": true
        },
        "buildings": {
          "allow_intersection_if": [],
          "allow_overlap_if": [],
          "check_gaps": true,
          "gap_tolerance": 0.5,
          "check_containment": true
        }
      }
    }
Example of topology validation results

Terminal print of the topology validation results when two files are tested against rules set by the user.
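A sketch of how such a rule file can drive the validation logic (the feature attributes are illustrative, and the framework in the repo does considerably more than this single check):

```python
def intersection_allowed(feature_attrs, dataset_rules):
    """Return True if an intersection is exempted by an allow_intersection_if rule."""
    for rule in dataset_rules.get("allow_intersection_if", []):
        if feature_attrs.get(rule["attribute"]) in rule["values"]:
            return True
    return False

# the "roads" entry from the configuration above
roads_rules = {
    "allow_intersection_if": [
        {"attribute": "terrain", "values": ["bridge", "tunnel", "air"]}
    ]
}

intersection_allowed({"terrain": "bridge"}, roads_rules)   # exempt: no error reported
intersection_allowed({"terrain": "surface"}, roads_rules)  # not exempt: flag as error
```

Because the rules live in data rather than code, a domain expert can add a new exemption by editing the JSON, without touching the validation engine.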

Implementation Strategy

The framework demonstrates several key principles for custom topology testing:

  1. Separation of Concerns: Keeping validation rules separate from the validation engine allows for easy updates as business rules evolve
  2. Attribute-Aware Validation: Moving beyond pure geometry to consider feature attributes and relationships
  3. Extensibility: A modular design that allows for adding new types of topology checks as needs arise
  4. Clear Reporting: Generating results that help users understand and fix topology issues in their specific context

This approach shows how organizations can build tools that validate not just geometric correctness, but also domain-specific spatial relationships that matter to their business processes.