QGIS scripting — Checking point membership within vector layer features

Hit another QGIS snag. This one took a day or so to sort through, and I actually had to write code. So I figured I’d write it up.

I struggled to solve the following problem using QGIS GUI tools:

  • I have a bunch of points (as a vector layer)
  • I have a bunch of vector layers of polygons
  • I want to know, for each point, which layers have at least one feature that contains the point

Speaking more concretely: I have cities (yellow), and I have areas (pink). I want to find which cities are in the areas, and which are not:

I assumed this would be a simple exercise using the GUI tools. It might be. But I could not figure it out. The internet suggests doing a vector layer join, but for whatever reason, joining a point layer to a vector layer crashed QGIS (plus, this is slow overkill for what I need — simple overlap, not full attribute joins).
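
Before diving into QGIS, here's a toy sketch of the result I'm after, with axis-aligned rectangles standing in for real polygon features (these helpers are invented for illustration; the actual script uses QGIS geometries):

```python
# Toy sketch of the end goal: one row per point, with a 0/1 flag per polygon
# layer. Axis-aligned rectangles stand in for real polygon features here;
# the actual script uses QGIS geometries and contains() instead.

def rect_contains(rect, point):
    (xmin, ymin, xmax, ymax), (x, y) = rect, point
    return xmin <= x <= xmax and ymin <= y <= ymax

def membership(points, layers):
    # layers maps a layer name to its list of (toy) polygon features
    return [
        {name: int(any(rect_contains(r, p) for r in rects))
         for name, rects in layers.items()}
        for p in points
    ]

cities = [(1, 1), (5, 5)]
areas = {'flood_zone': [(0, 0, 2, 2)],
         'burn_zone': [(4, 4, 6, 6), (0, 0, 1, 1)]}
print(membership(cities, areas))
# [{'flood_zone': 1, 'burn_zone': 1}, {'flood_zone': 0, 'burn_zone': 1}]
```

Each city gets one flag per area layer, which is exactly the attribute table the script below builds.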

Luckily, QGIS has rich support for scripting tools. There’s a pretty good tutorial for one example here. The full API is documented using Doxygen here. So I wrote a script to do this. I put the full script on GitHub — you can find it here.

One note before I walk through the code — this is not a clever script. It’s actually really, really dumb, and really, really slow. But I only need it to work once, so I’m not going to implement any of the potential optimizations (which I’ll describe at the end).

First, the basic-basics: navigate to Processing → Toolbox and click “Create New Script from Template”.

This creates — as you might expect — a new script from a template. I’ll go over the interesting bits here, since I had to piece together how to use the API as I went. Glossing over the boilerplate about naming, we only want two parameters: the vector layer with the XY points, and the output layer:

    def initAlgorithm(self, config=None):

        self.addParameter(
            QgsProcessingParameterFeatureSource(
                self.POINT_INPUT,
                self.tr('Input point layer'),
                [QgsProcessing.TypeVectorPoint]
            )
        )

        self.addParameter(
            QgsProcessingParameterFeatureSink(
                self.OUTPUT,
                self.tr('Output layer')
            )
        )

Getting down into the processAlgorithm block, we want to turn this input parameter into a source. We can do that with the built-in parameter methods:

        point_source = self.parameterAsSource(
            parameters,
            self.POINT_INPUT,
            context
        )

        if point_source is None:
            raise QgsProcessingException(self.invalidSourceError(parameters, self.POINT_INPUT))

A more production-ized version of this script would take a list of source layers to check. I could not be bothered to implement that, so I’m just looking at all of them (except the point layer). If it’s a vector layer, we’re checking it:

        vector_layers = []

        for layer in QgsProject.instance().mapLayers().values():
            if layer.__class__.__name__ == 'QgsVectorLayer':
                if layer.name() != point_source.sourceName():
                    vector_layers.append(layer)
                else:
                    feedback.pushInfo('Skipping identity point layer: %s' % point_source.sourceName())

We want our output layer to have two types of attributes:

  • The original attributes from the point layer
  • One column for each other layer, in which we can mark presence with a simple 0/1 value.

        output_fields = QgsFields(point_source.fields())
        
        for layer in vector_layers:
            feedback.pushInfo('layer name: %s' % layer.name())

            field = QgsField(layer.name())
            output_fields.append(field)

Similar to the input, we want to turn the parameter into a sink layer:

        (sink, dest_id) = self.parameterAsSink(
            parameters, 
            self.OUTPUT,
            context,
            output_fields,
            point_source.wkbType(),
            point_source.sourceCrs()
        )

        if sink is None:
            raise QgsProcessingException(self.invalidSinkError(parameters, self.OUTPUT))

Although it seems like a “nice to have,” tracking progress as we iterate through the points is pretty important; this script ran for 24 hours on my data. If I had hit the two-hour mark with no idea of progress, I’d certainly have given up.

Likewise, unless you explicitly check for cancellation and stop your script, QGIS has no way to interrupt it. Having to force-kill QGIS to stop a hanging processing algorithm is super, duper annoying:

        points = point_source.getFeatures()        
        total = 100.0 / point_source.featureCount() if point_source.featureCount() else 0

        for current, point in enumerate(points):

            if feedback.isCanceled():
                break

            feedback.setProgress(int(current * total))

From here on, we iterate over the target layers and append a value to the output attributes indicating whether the point is present in any feature of each target layer:

            attr_copy = point.attributes().copy()

            for layer in vector_layers:

                features = layer.getFeatures()
                feature_match = False
                geometry = point.geometry()

                for feature in features:

                    if feature.geometry().contains(geometry):
                        feature_match = True
                        break

                if feature_match:
                    attr_copy.append(1)
                else:
                    attr_copy.append(0)

Last but not least, we just output the feature we’ve put together into the output sink:

            output_feature = QgsFeature(point)
            output_feature.setAttributes(attr_copy)
            feedback.pushInfo('Point attributes: %s' % output_feature.attributes())
            sink.addFeature(output_feature, QgsFeatureSink.FastInsert)

And that’s about it (minus some boilerplate). Click the nifty “Run” button on your script:

Because we wrote this as a QGIS script, we get a nice UI out of it:

When we run this, it creates a new temporary output layer. When we open up the output layer attribute table, we get exactly what we wanted: for each record, a column with a 0/1 for the presence or absence within a given vector layer:

Perfect.

Now, this script is super slow, but we could fix that. Say we have n input points and m total vector features. The obvious fix is to run in better than n*m time — we’re currently checking every point against every feature in every layer. We could optimize this by geo-bucketing the vector layer features:

  • Break the map into a 10×10 (or whatever) grid
  • For each vector layer feature, insert the feature into the grid elements it overlaps.
  • When we check each point for layer membership, only check the features in the grid element it belongs to.

If we’re using k buckets (100, for a 10×10 grid), this takes the cost down to, roughly, k*m + n*m/k, assuming very few features end up in multiple buckets. We spend k*m to assign each feature to the relevant bucket, and then each point only compares against 1/k of the vector features we did before.

I’m not implementing this right now, because I don’t need to, but given the APIs available here, I actually don’t think it would be more than an hour or two of work. I’ll leave it as an exercise to the reader.
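
As a starting point for that exercise, the bucketing idea can be sketched in plain Python, with toy (xmin, ymin, xmax, ymax) bounding boxes on a unit-square map standing in for features (the grid size, helper names, and box format here are all invented for illustration):

```python
# Sketch of the geo-bucketing idea: a K x K grid over a unit-square map,
# with toy (xmin, ymin, xmax, ymax) bounding boxes standing in for features.
from collections import defaultdict

K = 10  # 10 x 10 grid, i.e. k = 100 buckets

def cells_for_box(xmin, ymin, xmax, ymax):
    # Every grid cell the box overlaps (coordinates assumed within 0..1).
    x0, x1 = int(xmin * K), min(int(xmax * K), K - 1)
    y0, y1 = int(ymin * K), min(int(ymax * K), K - 1)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]

def build_index(boxes):
    index = defaultdict(list)
    for box in boxes:
        for cell in cells_for_box(*box):
            index[cell].append(box)
    return index

def candidates(index, x, y):
    # Only features sharing the point's cell need the full contains() check.
    return index.get((min(int(x * K), K - 1), min(int(y * K), K - 1)), [])

index = build_index([(0.0, 0.0, 0.15, 0.15), (0.8, 0.8, 0.9, 0.9)])
print(len(candidates(index, 0.05, 0.05)))  # 1 candidate instead of 2 features
```

Each point then runs the expensive contains() check only against its cell’s candidates, which is where the n*m/k term comes from.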

Anyway, I’d been doing my best to avoid QGIS scripting, because it seemed a bit hardcore for a casual like me. Turned out to be pretty straightforward, so I’ll be less of a wimp in the future. I’ll follow up soon with what I actually used this script for.

Using the QGIS Gaussian Filter on Wildfire Risk Data

I thought I was done learning new QGIS tools for a while.  Turns out I needed to learn one more trick with QGIS — the Gaussian filter tool.  The Gaussian filter is sparsely documented (basically undocumented), so I figured I’d write up a post on how I used it to turn a raster image into vector layers of gradient bands.

Motivation:  In my spare time I’m adding more layers to the site I’ve been building which maps out disaster risks.  California was mostly on fire last year, so I figured wildfires are a pretty hot topic right now.

The most useful data source I found for wildfire risk was this USDA-sourced raster data of overall 2018 wildfire risk, at a pretty fine gradient level.  I pulled it into QGIS:

(I’m using the continuous WHP from the site I linked).  Just to get a sense of what the data looked like, I did some basic styling to make near-0 values transparent, and map the rest of the values to a familiar color scheme:

This looks pretty good as a high-level view, but the data is actually super grainy when you zoom in (which makes sense — the data was collected to produce national maps):

This is a bit grainy to display as-is at high zoom levels.  Also, raster data, although very precise, is (1) slow to load for large maps and (2) difficult to work with in the browser — in Mapbox I’m not able to remap raster values or easily get the value at a point (e.g., on mouse click).  I wanted this data available as a vector layer, and I was willing to sacrifice a bit of granularity to get there.

The rest of this post will be me getting there.  The basic steps will be:

  • Filtering out low values from the source dataset
  • Using a very slow, wide, Gaussian filter to “smooth” the input raster
  • Using the raster calculator to extract discrete bands from the data
  • Converting the raster to polygons (“polygonalize”)
  • Putting it together and styling it

The first thing I did was use the raster calculator to filter out values below a certain threshold from the original raster image.  The only justification I have for this is “the polygonalization never finished if I didn’t.”  Presumably that step is only feasible for reasonably-sized raster maps:

(I iterated on this, so the screenshot is wrong: I used a threshold of 1,000 in the final version).  The result looks like this:
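
In plain-Python terms, this raster-calculator step just zeroes out cells below a cutoff (a toy grid here, obviously not the actual QGIS raster calculator):

```python
# The raster-calculator step boils down to zeroing out cells below a cutoff.
# Toy grid of risk-like values; the real step runs on the WHP raster in QGIS.
def threshold(grid, cutoff):
    return [[v if v >= cutoff else 0 for v in row] for row in grid]

grid = [[0, 500, 1200],
        [900, 1500, 2000]]
print(threshold(grid, 1000))  # [[0, 0, 1200], [0, 1500, 2000]]
```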

Next step is the fancy new tool — the Gaussian filter.  A Gaussian filter (or Gaussian blur, as I’ve seen it called elsewhere) is kind of a fancy “smudge” tool.  It’s available via Processing → Toolbox → SAGA → Raster filter.
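
To make “smudge” concrete: a Gaussian filter replaces each cell with a Gaussian-weighted average of its neighbors, typically as two 1-D passes. Here's a toy pure-Python version on a small grid (just the idea, not SAGA's actual implementation):

```python
# Each cell becomes a Gaussian-weighted average of its neighbors,
# done as a horizontal pass followed by a vertical pass.
import math

def gaussian_kernel(radius, sigma):
    ks = [math.exp(-(i * i) / (2 * sigma * sigma))
          for i in range(-radius, radius + 1)]
    total = sum(ks)
    return [k / total for k in ks]  # normalized so weights sum to 1

def blur_row(row, kernel, radius):
    out = []
    for i in range(len(row)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), len(row) - 1)  # clamp at edges
            acc += w * row[idx]
        out.append(acc)
    return out

def blur(grid, radius=2, sigma=1.0):
    kernel = gaussian_kernel(radius, sigma)
    rows = [blur_row(r, kernel, radius) for r in grid]          # horizontal
    cols = [blur_row(c, kernel, radius) for c in zip(*rows)]    # vertical
    return [list(r) for r in zip(*cols)]

# A lone spike spreads into its neighbors instead of staying a sharp dot.
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = blur(spike)
```

The standard deviation controls how heavily neighbors are weighted, and the radius controls how far the smudge reaches. The real SAGA tool does this over millions of cells, which explains the runtime.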

This took forever to run.  Naturally, the larger the values I used for the radius, the longer it took.  I iterated on the numbers here for quite a while, with no real scientific basis; I settled on a standard deviation of 20 and a search radius of 20 pixels because they worked.  The result looks like this:

Now we can go back to what I did a few weeks ago — turning a raster into vectors with the raster calculator and polygonalization.  I ran the raster calculator on this layer (with a threshold of 0.1 this time, not shown):

These bands are now continuous enough that we can vectorize them without my laptop setting any polar bears on fire.  I ran the normal Raster → Conversion → Polygonize tool to create a new vector layer:

This looks like what we’d expect:

Fast-forwarding a bit: after filtering the 0-value shape out of the vector layer, rinse-and-repeating with 3 more thresholds, and adding some colors, it looks pretty good:

I want this on Mapbox, so I uploaded it there (again, see my older post for how I uploaded this data as an mbtiles file).  I applied the same color scheme in a Style there, and it looks nice:

Just as a summary of the before and after, here is Los Angeles with my best attempt at styling the raw raster data: 

You get the general idea, but it’s not really fun when you zoom in.  Here it is after the Gaussian filter and banding:

I found these layers a lot easier to work with, and a lot more informative to the end user.  It’s now visible as a layer on bunker.land.

I thought this tool was nifty, so hopefully this helps someone else who needs to smooth out some input rasters.

More QGIS – Hurricane maps (lines to points and vector overlaps)

I posted a couple days ago about how I used QGIS to generate heatmaps of tornado activity based on raw point data.  Since I had invested time (kind of) learning the tool, I figured I should put together a few similar layers.

The most obvious choice was hurricane risk.  I ended up using a pretty similar procedure to when I generated the tornado heatmap, but massaging the data took a few extra steps:

  • The input data came as vectors instead of points
  • My data covered the whole globe, but I wanted the final vectors to only cover land areas

Again, I was happy with the result, so I figured I’d write it up.  

Similar to what I ran into with the tornado risk data, I couldn’t find any hurricane hazard GIS shapefiles.  I did again find a raw dataset of all hurricanes the NOAA has records on, which was enough to get started.

Importing all the vectors (I think there were about 700,000) from this MapServer took a while, and the result was, as expected, a bit cluttered:

There’s probably a better way to filter the data down, but I ended up exporting the records to shapefiles so I could filter on attributes.  The dataset had a lot of tropical storm data, and I filtered out everything except proper hurricanes (H1-H5).

Here things got a bit different.  The heatmap function I used for tornadoes only works on points, and these vectors were all lines.  Luckily, there was a brute force but straightforward solution: turn the line into a bunch of points.  QChainage is a simple plugin that does exactly that.  Once it’s installed, it’s available from the Vector  →  QChainage menu.
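
Conceptually, the plugin just walks each line and drops a point every fixed distance. A rough planar sketch of that idea (toy coordinates and a made-up helper, not what QChainage actually does internally):

```python
# Conceptual QChainage: walk each polyline and emit a point every fixed
# distance. Planar toy coordinates; the plugin works on real layer
# geometries and map units.
import math

def chainage(line, spacing):
    points = [line[0]]
    walked = 0.0  # distance walked since the last emitted point
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = spacing - walked
        while d <= seg:
            t = d / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        walked = (walked + seg) % spacing
    return points

# A 10-unit line at 2.5 spacing: the start point plus four more.
print(chainage([(0, 0), (10, 0)], 2.5))
```

Smaller spacing means more points, which is why the 20km spacing below inflates the heatmap processing time.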

The QChainage screenshot above is a bit deceptive — I ended up using a point spacing of 20km in the final version.  The main downside of a higher frequency is longer processing time when generating the heatmap.  The result looks like kind of a mess from above:

But looks a lot better once I zoom in:

From here, I’ll fast forward through the same stuff I did last time; I used the points to generate a heatmap, this time using 250km point radii, and pulled a vector out of it.  I iterated on thresholds until the most expansive layer more-or-less lined up with other reputable sources. My layer:

Compared to what I found online (source): 

Except for a few lumps in Virginia and Maine, it looks pretty comparable. 

Jumping forward a bit more, I again went with four gradients to get a map that looked like this:

I was a bit torn.  While this looks cool, the highlights on the ocean are distracting when the goal is to highlight risk areas on land; I needed to filter the shapes down to only land areas.

It turns out, intersecting vectors in QGIS is pretty easy.  I found a simple shapefile of all land areas on earth here (it wasn’t even a big download — less than 10MB).  Once this data was imported, I could use the Vector → Geoprocessing tools → Intersect tool to generate an intersection layer:

This did exactly what I wanted.  I repeated it for all four layers and ended up with a gradient covering only land areas.  I didn’t bother styling the layers, since I’ll just handle that in Mapbox later.

Just as a sanity check, I swapped back in the openmaptiles background to make sure the coastlines lined up correctly (they did, except a few hundred meters here and there on the coastline).

A nice bonus from this NOAA data: this time the data covered the whole globe.  All the other datasets I’ve found for natural disaster risks are US-specific (and usually cover only the continental US):

I won’t go through the details on loading this into Mapbox; everything from here mirrored what I did last time.  You can see the result as a layer on bunker.land:

Once again, I was pleasantly surprised at how easy it was to get (relatively) nice-looking graphics out of QGIS with minimal experience.

At this point I’ve added data for most of the layers I was interested in displaying (although I’m open to suggestions).  I’ll likely get back to the actual web-dev side of this project and clean up a few loose ends over the next couple of weekends.