Ineffective Altruism

Texas Lt. Gov. Dan Patrick hung up a morality piñata when he had the audacity to state, on record, that he’d be willing to take risks with his life to prevent an economic meltdown:

No one reached out to me and said, ‘As a senior citizen, are you willing to take a chance on your survival in exchange for keeping the America that all America loves for your children and grandchildren?’ And if that is the exchange, I’m all in

Like many other moral-minded folk, Cuomo took a free swing at the piñata a few hours later, and snapped back with this zinger:

While this tweet is genetically engineered to be re-twatted, it is a ridiculous statement.  People put prices on human life all the time. Insurance companies price human lives. Legislators do it whenever they enact regulations meant to protect human life, at a cost. 

On the flip side, it’s common to price out the cost of saving a life as well.  Effective Altruism is a niche but deeply principled movement which goes to great lengths to, with somewhat autistic rigour, price out the most effective way to save lives (and then, generally, donate 80% of their salary, time, and organs to those causes).

GiveWell is one such organization.  They annually put together a version of this spreadsheet which calculates which charities are able to do the most work with the fewest dollars.  It’s worth checking out.

The math is more complicated than a naive observer would expect, because it turns out that a lot of the shittiest things nature can do to a person don’t actually kill them.  Hookworms reduce educational attainment, blind children (!), and reduce lifetime earnings, but rarely, if ever… actually kill anyone. But because being blind and poor really sucks, many of the “most effective” charities focus on eliminating hookworms and similar parasites.  

The way to massage this math onto a linear scale is to compute dollars per QALY saved, where QALY stands for Quality Adjusted Life Year — the equivalent of one year in perfect health.  By this measure, saving the life of an infant who would otherwise die in childbirth may save 75 QALYs, while saving the life of a senile, bedbound 80 year old may save 15 Quality Adjusted Life Hours.
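
To make the unit concrete, here’s a toy version of the arithmetic in Python (the $10,000 intervention cost is invented purely for illustration; the QALY figures are the ones from the example above):

def cost_per_qaly(dollars_spent, qalys_saved):
    """Dollars per Quality Adjusted Life Year saved."""
    return dollars_spent / qalys_saved

# Illustrative only: pretend each intervention costs $10,000.
infant = cost_per_qaly(10_000, 75)               # ~$133 per QALY
elderly = cost_per_qaly(10_000, 15 / 24 / 365)   # 15 quality-adjusted life *hours*
print(f"${infant:,.0f} vs ${elderly:,.0f} per QALY")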

This is a reasonable and principled method of making financial investments.  If you put stock in the Efficient Politics Hypothesis of government, stop here and feel good about the choices we’ve made.

2020

We have decided, as a nation, to spend an indefinite amount of time-money at home eating ice cream and masturbating to Netflix, in a crusade to stop* the spread of COVID-19.  

*by some difficult to define metric

How much time-money?  $2.2 trillion is the down-payment stimulus package.  It’s a very conservative lower-bound estimate of the cost of recovery (it’s not expected that this will fix the economy by any means), so we can run with it.

How many lives are we saving?  The high upper bound is 2.2 million deaths averted, assuming a 100% infection rate (not generally considered realistic) and a fatality rate of 0.66% (the best current estimate).

This works out to a conveniently round, very generous lower bound of $1,000,000 per life saved.  What about those QALYs? I’m not going to try to sum it out, but we can look at a few suggestive stats:

  • The average age at death (due to COVID-19) in Italy is 79.5.  Italy’s average life expectancy is, for reference, 82.5*
  • 99% of Italian COVID-19 victims had pre-existing conditions (same source).

*I understand that the average life expectancy of a living 79 year old is higher than 82, which is why I’m not doing the math, so please shut up so we can move on.

We can speculate that no, we will not be saving very many QALYs.  But we can use the crude ‘lives saved’ metric instead to generously lower-bound our math, and run with 2.2 million raw.

Effectiver Altruism

My question: how does this calculus compare to effective altruism?  I was genuinely curious, because $1,000,000 per life saved is somewhat disproportionate to the charity pitches you see on television:

COMMERCIAL CUT. Enter UNICEF USA. 

“Save a Child for Only $39 per day*”

Exit right.

*assuming 70 year life expectancy @ $1,000,000 per life

I tried to find a toplist of “problems we can solve for X dollars to save Y lives per year”.  I did not find one. GiveWell (entirely reasonably) calculates the payout of donating to a specific charity, not of speculatively eliminating entire facets of human suffering.

So I put together a list.  These numbers aren’t precise.  They are very speculative.  My goal was to understand the orders of magnitude involved.

My focus was on problems we could solve that don’t involve serious tradeoffs, and don’t require hard political choices.  Trying to solve “war”, “suicide”, or “alcoholism” doesn’t cost money per se; it requires societal commitment we can’t put a price tag on.  For the most part, this leaves diseases.

I started with the highest-preventable-death diseases in the developing world, and ended up with 7 “campaigns” where we could non-controversially plug in money on one end, and pull extant wretched masses out of the other.   When considering the payout in lives saved from eradicating a disease, I used 30 years, because using “forever” is unfair (I’m sure there’s a time-decay value on life an actuary would prefer, but this was simple, and it doesn’t really change the conclusion).
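
Concretely, the arithmetic behind each table below is just a price tag divided by deaths averted over the payout window.  A minimal sketch (the function and variable names are mine, not any official methodology; the numbers are the ones quoted in this post):

def cost_per_life(price_tag, annual_deaths, years=30):
    """Total price tag divided by deaths averted over the payout window."""
    return price_tag / (annual_deaths * years)

# COVID-19 baseline from above: a one-off $2.2T stimulus, 2.2M lives (upper bound).
covid = 2.2e12 / 2.2e6                    # $1,000,000 per life
# Example campaign: malaria -- $120B to eradicate, 435,000 deaths/year, 30-year payout.
malaria = cost_per_life(120e9, 435_000)   # ~$9,195 per life
print(f"COVID-19: ${covid:,.0f}/life, malaria: ${malaria:,.0f}/life")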

Global Hunger

Hunger is the stereotypical “big bad problem”, and it wasn’t hard to find data about deaths:

Around 9 million people die of hunger and hunger-related diseases every year, more than the lives taken by AIDS, malaria and tuberculosis combined.

(for the record, this gave me some good leads on other problems).  How much would it cost to actually fix hunger?

Estimates of how much money it would take to end world hunger range from $7 billion to $265 billion per year.  

Pulling the high estimate, we get… 

Price Tag: $265 billion
Lives Saved: 9,000,000
Cost Per Life: $29,444 / life

Malaria

Malaria sucks, and a lot of smart people want to spend money to get rid of it.  How many people does Malaria kill?

In 2017, it was estimated that 435,000 deaths due to malaria had occurred globally

What would it take to actually eliminate Malaria?

Eradicating malaria by 2040 would cost between $90 billion and $120 billion, according to the Gates Foundation

We can highball this estimate to get …. 

Price Tag: $120 billion
Lives Saved: 13,050,000
Cost Per Life: $9,195 / life

Tuberculosis

Tuberculosis is still a huge killer in the developing world, but it’s a killer we can put rough numbers on:

The Lancet study said reducing tuberculosis deaths to less than 200,000 a year would cost around $10 billion annually… a chronic lung disease which is preventable and largely treatable if caught in time, tuberculosis is the top infectious killer of our time, causing over 1.6 million deaths each year.

Price Tag: $300 billion ($10 billion / year over 30 years)
Lives Saved: 42,000,000
Cost Per Life: $7,143 / life

The math here is fuzzier than I’m comfortable with, but works out in the same ballpark as Malaria, so I feel OK about the result.

AIDS

Again, this wasn’t the cleanest math puzzle, but this report pegs the cost of ending AIDS at $26.2 billion/year for 16 years.  At 770,000 deaths per year from AIDS, we can (again, with fuzzier math than I’d like) ballpark the bill and lives saved over the life of the campaign:

Price Tag: $366.8 billion
Lives Saved: 10,780,000
Cost Per Life: $33,963 / life

Maternal mortality

Like tuberculosis, this one ends up in the same ballpark as malaria, so I’m inclined to believe the math is not more than half-asinine.

Dying in childbirth is bad, and kills people.  How much public health spending would it take to eliminate it?  Again, it was really hard to find good estimates, but we find that 

Researchers at UNFPA and Johns Hopkins University calculated that the annual cost of direct services, such as paying for medical staff, drugs and supplies when a woman is giving birth, will reach $7.8bn (£6.2bn) by 2030, up from an estimated $1.4bn last year.

To save how many lives?  

About 295 000 women died during and following pregnancy and childbirth in 2017

I’m honestly not sure how to compare these numbers, but if we ballpark that the $7.8 billion saves at least that many lives each year, we work out to 

Price Tag: $7.8 billion
Lives Saved: 295,000
Cost Per Life: $26,440 / life

If you don’t like the fuzziness of the math, feel free to ignore it, or multiply it by 10.  Or whatever.

Measles

Measles is bad.  To convince you to vaccinate your children, I will attach a picture of a child with measles.

In the US, measles doesn’t flat-out kill many people, but in the developing world, it does:

Worldwide more than 140,000 people died from measles in 2018

What would it cost to actually eradicate measles?

Eradicating measles by 2020 is projected to cost an additional discounted $7.8 billion and avert a discounted 346 million DALYs between 2010 and 2050

Using the 30 year window I’ve been using, we end up with:

Price Tag: $7.8 billion
Lives Saved: 4,200,000 (140,000 / year over 30 years)
Cost Per Life: $1,857 / life

This is a shockingly low number, and I can only conclude either that (1) I messed something up, or (2) we are a terrible, horrible species for not having eradicated this decades ago.

Global Warming

Stepping outside of diseases, what about something big?  Global warming is big.

How many deaths might be attributed to climate change in the next century?  Obviously this is a make-believe number, but the number is definitely at least ‘several’:  

A report on the global human impact of climate change published by the Global Humanitarian Forum in 2009, estimated more than 300,000 deaths… each year

This is the lowest bound I can find short of flat-out AGW denialism.  It’s easy to find genocide-level projections several orders of magnitude higher, assuming crop failures in the developing world.  I won’t use them.

What’s the cost of fixing global warming?  Long-term, there’s no good answer yet, because the technology doesn’t exist.  But there are (speculative) ways we can slow it down for reasonable sums via carbon sequestration: 

Returning that land to pasture, food crops or trees would convert enough carbon into biomass to stabilize emissions of CO2, the biggest greenhouse gas, for 15-20 years… With political will and investment of about $300 billion, it is doable

We can use these numbers to price tag the cost/payoff of delaying global warming:

Price Tag: $300 billion
Lives Saved: 6,000,000
Cost Per Life: $50,000 / life

This is the most speculative guesstimate of all, so if you want to ignore it too, feel free.

Compare & Contrast

My original goal was to build a snarky visualization game which invited users to bin-pack global problem-solving that worked out to less than $2T.  I was foiled, because you could do literally everything on this list for less — by my (fuzzy) calculations, you could solve global hunger*, malaria, tuberculosis, delay global warming 20 years, cure AIDS, eliminate maternal mortality, and eliminate measles, for “only” $1.4T.

*to be fair, this one is annual, not a permanent elimination.
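
For the skeptical, here’s the sum I’m using (a quick sketch; tuberculosis is counted as 30 years at $10 billion per year, matching its table above):

# Price tags from the campaigns above, in billions of dollars.
campaigns = {
    "global hunger (one year)": 265,
    "malaria": 120,
    "tuberculosis (30 yr @ $10B/yr)": 300,
    "AIDS": 366.8,
    "maternal mortality": 7.8,
    "measles": 7.8,
    "delay global warming 20 years": 300,
}
total = sum(campaigns.values())
print(f"${total:,.1f} billion")  # ~$1,367B, i.e. roughly $1.4 trillion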

But I had already invested the time learning how to use Data Studio, so I made the chart anyway:

(you can play with it yourself here)

Conclusion

What I feel confident saying — even using wildly generous numbers, since I am:

  • using the absolute upper bound for US COVID-19 deaths,
  • using crude deaths for a disease which primarily affects the elderly, instead of QALYs when comparing to diseases which affect primarily the young,
  • using just one ($2.2T) of many recovery packages we’re going to pay for, and
  • generously upper/lower bounding all the numbers

is that 

The COVID-19 economic shutdown is 20x as expensive per life as any other public health intervention we could fund.  The most expensive intervention on this list — “delaying global warming” — costs $50,000/head.  We’re paying $1,000,000/head for COVID-19.

Now, there is a range of valid value statements, depending on your priors and beliefs in how creative fiscal policy can be:

  • “We should do both”
  • “We don’t have money to do either”
  • “Maybe civilization was a bad idea”

I’m not claiming to be a superhero here.  I’m not an Effective Altruist, and probably don’t register as an altruist at all.  But cheap platitudes annoy me, especially when used to shut down arguments.

In the end, the most meaningful and easiest way Cuomo could have qualified his tweet would have been 

We will not put a dollar value on American life

It’s not a great look, or a great tweet.  But as far as I can tell, it’s the only way to make the numbers — maybe — add up.

You Should be Angry

If you are under the age of 30, in the west, your future has turned grim.  In a fair world, you would be in the streets, screaming to get back to your job and building your life.

But instead, you are inside, while the police sweep empty streets.

As of late March 2020, the economies of the US, and most of western Europe, have been shut down.  This action does not have precedent, and it will cripple a generation in poverty and debt. Short term, this will likely mean 20% unemployment, vast GDP contraction, and trillions in debt.

This price will be paid by those under 30, to save — some of — those over 80.

It is not necessary, and is not worth the price.  It was an instinctive reaction, and I hope history will not be kind to the politicians who caved to it.  The best time to stop this mistake was before it was made. 

The second best time is right now.

You are being lied to

We have been asked to shut down the US for two weeks — and for similar timeframes in Italy, France and elsewhere.  Two weeks (15 days, per the Feds) is a palatable number. Two weeks is a long Christmas break.  The technorati elite on Twitter think the shutdown is a vacation, and for them it is, because their checking accounts are overflowing from the fat years of the 2010s.

Two weeks is not the goal, and it never was the goal.

The Imperial College report is the study which inspired the shutdowns — first of the Bay Area, then all of California, then New York.   This report measured the impact of various mitigation strategies. For those not “in the know” (aka, normal humans) there are two approaches to treating this pandemic:

  • Mitigation, where we “flatten the curve” enough to keep ICUs full, but not overflowing.  Eventually, we build up herd immunity, and the disease persists at a low level.
  • Suppression, where we eliminate the disease ruthlessly and completely.  

You don’t have to read the paper.  This graph tells you everything you need to know:

The orange line is the optimal “mitigation” strategy.  We try to keep ICUs full, try to keep businesses and schools running, and power through it.  But people will die.

The green line is suppression.  We shut down businesses, schools, universities, and all civic life.  Transmission stops, because there is no interaction with the outside world.  The economy does not depress — it stops.

We aren’t following the orange line, because: people will die.

That is the IC report’s conclusion: no amount of curve flattening gets us through this pandemic in a palatable timeframe.  Thus, we must suppress — for 18 months or longer — until we have a vaccine.  I’m not paraphrasing. This is the quote:

This leaves suppression as the preferred policy option…. this type of intensive intervention package … will need to be maintained until a vaccine becomes available (potentially 18 months or more)

Italy, France, California, New York, Illinois, and more in the days to come, have nearly shuttered their economies.  All schools, universities, and social gatherings are cancelled, at risk of ostracization or police enforcement. This is the green line.

By enacting the green line — closing schools, universities, and businesses — the US is choosing to give up on mitigation, and choose suppression.  This doesn’t mean 2 weeks of suffering. It means 2 years to start, and years of recession to follow.

We are eating the young to save the unhealthy old

COVID-19 does not kill, except in the rarest of exceptions, the young.   Old politicians will lie to you. The WHO and CDC will lie to you — as they lied about masks being ineffective — to nudge you to act “the right way”.  Do not trust them.

Here are the real, latest, numbers:

In South Korea, for example, which had an early surge of cases, the death rate in Covid-19 patients ages 80 and over was 10.4%, compared to 5.35% in 70-somethings, 1.51% in patients 60 to 69, 0.37% in 50-somethings. Even lower rates were seen in younger people, dropping to zero in those 29 and younger.

No youth in South Korea has died from COVID-19.  Fleetingly few of the middle-aged have. Even healthy seniors rarely have trouble.  The only deaths were seniors with existing co-morbidities.  In Italy, over 99% of the dead had existing illnesses:

With the same age breakdown for deaths as South Korea:

As expected, the numbers for the US so far are the same:

More than raw numbers, the percent of total cases gives a sense of the risk to different age groups. For instance, just 1.6% to 2.5% of 123 infected people 19 and under were admitted to hospitals; none needed intensive care and none has died… In contrast, no ICU admissions or deaths were reported among people younger than 20.

These numbers are, if anything, over-estimates of the risk — the vast majority of cases were never even tested or reported, because many of the infected show no symptoms at all.   The vast majority of the young would not even notice a global pandemic, and none (generously, “fleetingly few”) would die.

The young — the ones who will pay for, and live through, the recession we have wrought by fiat — do not even benefit from the harsh medicine we are swallowing.  But they will taste it for decades. 

This is not even necessary

To stop COVID-19, the west shut itself down.  East Asia did not. East Asia has beaten COVID-19 anyway.

China is where the disease started (conspiracy theories aside).  Through aggressive containment and public policy, the disease has been stopped.  Not even mitigated — stopped:

There are two common (and opposite) reactions to these numbers:

  1. China is lying.  This pandemic started on Chinese lies, and they continue today.
  2. China has proven that the only effective strategy is containment

Neither is true.  But we also know that China can, and has, used measures we will never choose to implement in the west.  China can lock down cities with the military. China can force every citizen to install a smartphone app to track their movement, and alert those with whom they interacted.

So we can look at the countries we can emulate:  South Korea, Japan, Taiwan, and Singapore.  None (well, at most one) of them are authoritarian.  None of them have shut down their economies. Every one of them is winning against COVID-19.

South Korea

South Korea is the example to emulate.  The growth looked exponential — until it wasn’t:

South Korea has not shut down.  Their economy is running, and civic life continues, if not the same as normal, within the realm of normal.  So how did they win, if not via self-imposed economic catastrophe?  Testing.

The backbone of Korea’s success has been mass, indiscriminate testing, followed by rigorous contact tracing and the quarantine of anyone the carrier has come into contact with

Their economy will suffer not because of a self-imposed shutdown, but because the rest of the world is shutting itself down. 

Singapore

Singapore won the same way: by keeping calm, testing, and not shutting down their economy.

Singapore is often the “sure… but” exception in policy discussions.   It’s a hyper-educated city-state. Lessons don’t always apply to the rest of the world. But a pandemic is different.  Pandemics kill cities, and Singapore is one of the world’s densest cities. If Singapore can fix this without national suicide, anyone can.  So what did they do? 

  • Follow contacts of potential victims, and test
  • Keep positives in the hospital
  • Communicate
  • Do not panic 
  • Lead clearly and decisively

I could talk about Japan and Taiwan, but I won’t, because the story is the same: Practice hygiene.  Isolate the sick. Social distance. Test aggressively.   

And do not destroy your economy.

“The economy” means lives

The shutdown has become a game — fodder for memes, fodder for mocking Tweets, and inspirational Facebook posts, because it still feels like Christmas in March.  

It is not.  If your response to the threat of a recession is:

  • “The economy will just regrow”
  • “We can just print more money”
  • “We just have to live on savings for a while”

The answer is simple: you either live a life of extraordinary privilege, are an idiot, or both.  I can try to convince you, but first, ask yourself:  how many people have the savings you’ve built up — the freedom to live out of a savings account in a rough year? I’ll give you a hint:  almost none.

Likewise, working from home is the correct choice, for anyone who can manage it. Flattening the curve is a meaningful and important improvement over the unmitigated spread of this disease. But the ability to work from home is a privilege afforded to not even a third of Americans:

According to the Bureau of Labor Statistics, only 29 percent of Americans can work from home, including one in 20 service workers and more than half of information workers

If you are able to weather this storm by working from home, congratulations — you are profoundly privileged. I am one of you. But we are not average, and we are not the ones who risk unemployment and poverty. We are not the ones public policy should revolve around helping. The other 71% of Americans — who cannot — are the ones who matter right now.

To keep the now-unemployed from dying in the streets, we will bail them out.  And that bailout, in the US alone, just to start, will cost a trillion dollars.  That number will almost certainly double, at a minimum, over the next two years.  

What else could we spend two trillion dollars on?   To start, we could void all student debt:  $1.56 trillion, as of 2020.  We could vastly expand Medicare or Medicaid.  You can fill in the policy of your choice, and we could do it.  But we won’t.

We are borrowing money to pay for this self-inflicted crisis.  We should be spending that money investing in the future — perhaps freeing students from a life of crippling debt — but instead, we are throwing it at the past.

The rest of the world

The US is not the world.  The US will muddle through, no matter how poor our decisions, usually (but not always) at the cost of our futures, not our lives.  The rest of the world does not have this luxury.

GDP saves lives.  GDP is inextricably linked to life expectancy, child mortality, deaths in childbirth, and any other measure of life you want to choose.  This is so well proven that it shouldn’t require citations, but I’ll put up a chart anyway:

A depression will set back world GDP by years.  The US and Europe buy goods from the developing world.  The 2008 recession — driven primarily by housing and speculation in western markets — crushed the economies not just of developed nations, but the entire world:

we investigate the 29 percent drop in world trade in manufactures during the period 2008-2009. A shift in final spending away from tradable sectors, largely caused by declines in durables investment efficiency, accounts for most of the collapse in trade relative to GDP

If you are unswayed by the arguments that a self-inflicted depression will hurt the working poor in the US, be swayed by this — that our short-sighted choices will kill millions in the developing world.

How did we get here?

Doctors are not responsible for policy.  They are responsible for curing diseases.  It is not fair to ask them to do more, or to factor long-term economic policy into their goal of saving lives right now.   We elect people to balance the long-term cost of these decisions.  We call them politicians, and ours have failed us.

The solution to this crisis is simple — we do our best to emulate East Asia.  We isolate the sick. We improve sanitization.  We mobilize industry to build tests, ventilators, and respirators, as fast as we can — using whatever emergency powers are needed to make it happen.   And we do this all without shutting down the economy, the engine which pays for our future.

We do the best we can.  And accept that if we fail, many of the sickest elderly will die.  

Next time will be different.  We will learn our lessons, be prepared, and organize our government response teams  the way that Taiwan and South Korea have. We will have a government and a response which can protect every American.

But now, today, we need to turn the country back on, and send the rest (the 71% who can’t work from home) back to work. We owe them a future worth living in. 

The best gift the Coronavirus can give us? The normalization of remote work

tl;dr: Covid-19, if (or at this point, “when”) it becomes a full pandemic, is going to rapidly accelerate the shift to remote-first tech hiring.  This is an amazing thing. It’s too bad it’ll take a pandemic to get us there.

The focus over the past couple weeks has been on how a pandemic will affect the economy.  This is a reasonable question to ask. But a brief economic downturn is just noise compared to the big structural changes we’ll see in turn — that long-term quarantine and travel restrictions will normalize remote work. It’s already happening, especially in tech companies.  

Remote work is a revolutionary improvement for workers of all stripes and careers, but I’m a tech worker, so I’ll speak from the perspective of one.

Software development is second only to sales in its ability to execute at full capacity while remote*, if given the opportunity.  But willingness to perform remote-first tech hiring has ground forward glacially, especially at the largest and hippest tech companies — and at times, it has even moved backwards.

*I’ve known top sales reps to make their calls on ski lifts between runs.  Software devs can’t quite match that quality-of-life boost.

A prolonged quarantine period (even without a formal government quarantine, tech companies will push hard for employees to work from home) will force new remote-first defaults on companies previously unwilling to make remote work a first-class option:

  • Conversations will move to Slack, MS teams, or the equivalent
  • Videoconferenced meetings will become the default, even if a few workers make it into the office
  • IT departments without stable VPNs (all too common) will quickly fix that problem, as the C-suite starts working from home offices

The shift to normalized remote work will massively benefit tech employees, who have been working for decades with a Hobson’s choice of employment — move to a tech hub, with tech-hub salaries, and spend it all on cost-of-living, or eat scraps:

  • Cost-of-living freedom: SF and New York are inhumanly expensive. Make it practical to pull SF-salaries while living in the midwest? Most SF employees have no concept of how well you can live on $250k in a civilized part of the country.
  • Family balance freedom: operating as an on-site employee with children is hard, and working from home makes it wildly easier to balance childcare responsibilities (school runs, daycare pickups, etc).
  • Commutes suck. Good commutes suck. Terrible commutes are a portal into hell. What would you pay to gain two hours and live 26-hour days? Well, working from home is the closest you can get.

I don’t mean to be callous — deaths are bad, and I wish them upon nobody. But the future is remote-first, and when we get there, history will judge our commute-first culture the same way we judge people in the middle ages for dumping shit on public streets.

I wish it didn’t take a global pandemic to get us here, but if we’re looking for silver linings in the coffins — it’s hard to imagine a shinier one than the remote-normalized culture we’re going to build, by fire, blood, and mucous, over the next six months.  

I’ll take it.

It turns out that Ithaca, Traverse City, and Roswell are good places to hang out while the world burns

Alright, inspired by recent events, I’ve spent a bit (well, a lot) of time over the past couple months scratching an itch called “figuring out where in the US is safe from Armageddon.”

I had a lot of fun learning how to use a medley of QGIS features, learning where to find USGS GIS data, researching what the Russians/Chinese/French are likely to target in a nuclear war, and learning how to render all this using Mapbox GL JS.

I’ve continued adding all this data as layers on my pet website bunker.land, and now if you want, you can map any of the risk layers I’ve added:

  • Tornadoes
  • Earthquakes
  • Sea level rise
  • Hurricanes
  • Wildfires
  • Possible targets in a nuclear war

But it’s time to wrap it up and ask an actual actionable question:

“Given these potential hazards —both natural and man-made — which US cities are the least prone to unexpected natural or man-made disaster?”

Rules

As a ground rule, I limited the list of towns/cities I’d evaluate to those with populations of 10,000 or more, for a couple reasons:

  1. 10,000 is a reasonable cutoff for “towns which have all the basic infrastructure for urban life” – grocery stores, restaurants, etc (your threshold may be wildly different, and you should feel free to filter the data differently.)
  2. Even mapping out cities of 10,000+ was computationally challenging — it took a full 24 hours for me to do the QGIS join I needed on this data.  Mapping out towns of 1,000+ would have required a more sophisticated process.  

Data

Before I get too deep, I want to be very clear about all the relative metrics used here: they are based only on my judgement.  The raw data is all pretty well-sourced — I’m not fabricating any data points — but the weights of the relative risk bands are judgement-based. Interpret as you will.

First, I had four natural-disaster risk maps to parse: hurricanes, tornadoes, earthquakes, and wildfires.  I broke each of these risk zones into 4 hazard bands, informally “low, medium, high, and very high.”

Earthquakes: I was able to fairly directly translate earthquake data into hazard bands based on the USGS input data.  The units here take a bit of work to wrap your head around (“peak acceleration as a % of gravity”), but it was easy enough to break this data into four bands:  10-20, 20-50, 50-70, and 70+

Wildfires: see this post for how I translated wildfire hazard data into discrete hazard bands. Lots of judgement involved.

Tornados: see this post for how I found tornado hazard zones.

Hurricanes:  see this post for how I generated a hurricane risk map.

I assigned each of these zones a risk score.  These scores are entirely judgement-based (although as I’ll discuss later, these scores don’t actually matter much for the purposes of this post): 

  • Low: 1
  • Medium: 4
  • High: 6
  • Very high: 8

Second, there’s the list of plausible infrastructure targets in a nuclear war.  For these purposes that means: military-capable airports, ports, military bases, state capitals, power plants (1+ GW), railyards, and nuclear missile silos. 

I’ve used Alex Wellerstein’s NUKEMAP to judge “how close is too close” to a nuclear target.  I went with a 5MT nuclear warhead (a standard Chinese ICBM loadout), which gives four hazard bands:

  • Fireball: within 2km
  • 5 PSI airblast: within 12km
  • 3rd-degree burns: within 25km
  • 1 PSI airblast: within 34km

Like with the natural disasters above, I assigned each of these zones a risk score:

  • Fireball: 10
  • 5 PSI airblast: 5
  • 3rd-degree burns: 2
  • 1 PSI airblast: 1

You can read a bit more about the methodology I used here.

If you want to do your own calculations, with your own weights, here are the raw layers I used, and the script I used to calculate weights (I know it’s a mess.  It’s a personal project. Don’t judge me). You can reference this post for the script that turns the layers into the combined columnar cities.csv file.
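
If you just want the gist of the weighting calculation without untangling my script, here’s a minimal sketch (the column names are hypothetical stand-ins, not the actual cities.csv schema, and only a few of the hazard bands are shown):

import csv

# Judgement-based weights from above, keyed by hypothetical 0/1 hazard columns.
WEIGHTS = {
    "tornado_low": 1, "tornado_medium": 4, "tornado_high": 6, "tornado_very_high": 8,
    "nuke_fireball": 10, "nuke_5psi": 5, "nuke_burns": 2, "nuke_1psi": 1,
}

def city_risk(row):
    """Sum the weights of every hazard band the city falls inside."""
    return sum(w for col, w in WEIGHTS.items() if row.get(col) == "1")

with open("cities.csv") as f:
    cities = sorted(csv.DictReader(f), key=city_risk)

for row in cities[:10]:   # the ten lowest-risk cities
    print(row["city"], city_risk(row))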

Results

So, I crunched all of the above data, and found… 72 cities of 10,000+ people with no measurable risk, by those metrics.  Here’s the map:

I’ve put a map of these cities on MapBox: you can explore the map here.  I’ve also included the worst 25 cities for reference, but I’ll save that discussion for a later post.

Observations

In no particular order:

Most of the Midwest is entirely ruled out because of the risk of tornadoes.  This may or may not be a reasonable bar, by your own judgement.

I had technical difficulties factoring flooding from sea level rise into these rankings.  “Luckily”, coastal cities ended up being death-traps for unrelated reasons, so I didn’t need to do any manual fiddling to exclude them.

There were fewer low-risk cities in Idaho, Nevada, Utah, and Montana than I expected.  Turns out, this is because:

  • The area around Yellowstone has a meaningful earthquake risk.  Given that Yellowstone is still an active supervolcano, this seems fair.
  • A lot of areas in Idaho and Nevada are totally safe from a risk perspective, but simply don’t have any cities of 10,000+ people which register.

If you end up working through the data from scratch, note that I did remove three cities which only made the list because of bad data:

  • Juneau and Anchorage.  Turns out, these cities have a huge nominal footprint, so the “city center” is actually in the middle of nowhere.  The real city centers are next to all sorts of important infrastructure (including a state capital).  I removed these from the “safe” list.
  • Newport OR is actually in a high-earthquake risk zone, but my map data puts the city in the middle of a river, which doesn’t register an overlap.  Instead of fiddling with the data, I just removed it.

There are likely others — I’m not going to sort through the remaining 72 by hand, but be aware that there are probably flukes.

Largest cities

This is actually a longer list of cities than I anticipated: I thought I’d get 1-2 strangely isolated cities free of hazards, not 72.  So we have a bit of leeway to interpret this data. The most straightforward question an urbanite would ask is, 

“So what’s the largest city with no measurable hazards?”

We can answer that pretty easily.  Here are the top 9:

  • Prescott Valley, AZ (97,066)
  • Casa Grande, AZ (58,632)
  • Ithaca, NY (55,439)
  • Lake Havasu City, AZ (55,341)
  • Traverse City, MI (49,806)
  • Bullhead City, AZ (49,531)
  • Roswell, NM (49,119)
  • Maricopa, AZ (46,741)
  • Prescott, AZ (42,731)

Now, here’s where I’m going to roleplay Solomon:  I don’t care what this data says, nowhere — absolutely nowhere — in Arizona is a good place to ride out the apocalypse:

  • Arizona is, at best, barely inhabitable without air conditioning.  Global warming will only make this worse.  There is absolutely no point in surviving a nuclear war only to burst into flames the second your HVAC loses grid power.
  • Central Arizona is only hydrated by a gargantuan public works project.  An entire river is pumped over the Buckskin mountains to keep the geriatric heart of Phoenix feebly beating.  The minute a disaster strikes, Phoenix is going to be full of sandworms and Fremen raiding nursing homes to desiccate the elderly and steal their water.
  • The few — very few — places in Arizona with natural water are along the Colorado river.  If there’s a breakdown of law and order, Las Vegas is either going to (1) close the Hoover Dam and take all the water, or (2) get nuked and wash everything downstream of the Glen Canyon dam into Baja California.

So I am striking everything in Arizona from this list: Prescott Valley, Casa Grande (honestly, it’s a suburb of Phoenix; it’s just that the suburbs of Phoenix threaten to circle the earth), Lake Havasu City, Bullhead City, Maricopa, and Prescott (why is this even distinct from Prescott Valley?)

Which leaves 3 cities.

The winners

This leaves us three cities which (1) are fairly large, (2) are sheltered from natural disasters and (3) have absolutely nothing worth destroying: 

  1. Ithaca, NY
  2. Traverse City, MI
  3. (I promise I did not tamper with the data to get this) — Roswell, NM 

Ithaca

Ithaca was a bit surprising, but it’s reasonable in retrospect:

  • As a college town, Ithaca is reasonably large, driving it to the top of this list
  • As far as I can tell, it has no industry whatsoever
  • Although New York City is a Big Deal, upstate New York is pretty empty overall.  There’s really not much in the area that shows up in the target maps I generated:

So… not what I expected, but seems reasonable overall.

Traverse City

I had never heard of Traverse City, MI before.  After reading the Wikipedia page, I have learned that “the Traverse City area is the largest producer of tart cherries in the United States”.  Apparently that is about it.

There are some military bases in the general area, but nothing that registers in the 34km buffer: 

I have very little else to say about Traverse City, except that it seems safe from disaster.

Roswell

I will be honest: I’ve always thought of Roswell in the context of UFO jokes, and never really considered that Roswell is a real city, full of real people, living real lives.

It turns out that it is a real city, but the largest industry is “Leprino Foods, one of the world’s largest mozzarella factories”, which is likely not a first-strike military target. It also turns out that the infamous Roswell Air Force Base closed in the late 60s, so there are no longer any military targets in the vicinity.  

In fact, the closest risk of any significance, by these metrics, is a wildfire hazard zone to the east:

So Roswell, alien jokes aside, actually registers as the third-largest city utterly* safe from natural or man-made disaster.

*well, as best as I can figure.

Conclusions

I tried pretty hard to not pre-register expectations so I wouldn’t unconsciously bias my results.  So I don’t have anything interesting to say, like “that’s exactly what I expected” or “wow, I thought city XYZ would make the list!” 

I feel pretty good about these results because:

  • They are geographically diverse.  It’s not all in some weird cluster because of bad data.
  • I didn’t end up having to draw an arbitrary cutoff.  72 is a good number of cities to greenlight.
  • Roswell is #3, which I still find hilarious.

I’ll do one last followup post with the worst-25 cities by these metrics.  Spoiler alert: it’s mostly the gulf coast and LA. But I’ll hopefully have that up in a week or two.

QGIS scripting — Checking point membership within vector layer features

Hit another QGIS snag. This one took a day or so to sort through, and I actually had to write code. So I figured I’d write it up.

I struggled to solve the following problem using QGIS GUI tools:

  • I have a bunch of points (as a vector layer)
  • I have a bunch of vector layers of polygons
  • I want to know, for each point, which layers have at least one feature which contains this point

Speaking more concretely: I have cities (yellow), and I have areas (pink). I want to find which cities are in the areas, and which are not:

I assumed this would be a simple exercise using the GUI tools. It might be. But I could not figure it out. The internet suggests doing a vector layer join, but for whatever reason, joining a point layer to a vector layer crashed QGIS (plus, this is slow overkill for what I need — simple overlap, not full attribute joins).

Luckily, QGIS has rich support for scripting tools. There’s a pretty good tutorial for one example here. The full API is documented using Doxygen here. So I wrote a script to do this. I put the full script on GitHub —you can find it here.

I will preface this before I walk through the code — this is not a clever script. It’s actually really, really dumb, and really, really slow. But I only need this to work once, so I’m not going to implement any potential optimizations (which I’ll describe at the end).

First, the basic-basics: navigate Processing  → Toolbox. Click “Create New Script from Template”

This creates — as you might expect — a new script from a template. I’ll go over the interesting bits here, since I had to piece together how to use the API as I went. Glossing over the boilerplate about naming, we only want two parameters: the vector layer with the XY points, and the output layer:

    def initAlgorithm(self, config=None):

        self.addParameter(
            QgsProcessingParameterFeatureSource(
                self.POINT_INPUT,
                self.tr('Input point layer'),
                [QgsProcessing.TypeVectorPoint]
            )
        )

        self.addParameter(
            QgsProcessingParameterFeatureSink(
                self.OUTPUT,
                self.tr('Output layer')
            )
        )

Getting down into the processAlgorithm block, we want to turn this input parameter into a source. We can do that with the built-in parameter methods:

        point_source = self.parameterAsSource(
            parameters,
            self.POINT_INPUT,
            context
        )

        if point_source is None:
            raise QgsProcessingException(self.invalidSourceError(parameters, self.POINT_INPUT))

A more production-ized version of this script would take a list of source layers to check. I could not be bothered to implement that, so I’m just looking at all of them (except the point layer). If it’s a vector layer, we’re checking it:

        vector_layers = []
        
        for key,layer in QgsProject.instance().mapLayers().items():
            if(layer.__class__.__name__ == 'QgsVectorLayer'):
                if(layer.name() != point_source.sourceName()):
                    vector_layers.append(layer)
                else:
                    feedback.pushInfo('Skipping identity point layer: %s:' %point_source.sourceName())

We want our output layer to have two types of attributes:

  • The original attributes from the point layer
  • One column for each other layer, for which we can mark presence with a simple 0/1 value.

        output_fields = QgsFields(point_source.fields())
        
        for layer in vector_layers:
            feedback.pushInfo('layer name: %s:' %layer.name())
            
            field = QgsField(layer.name())
            output_fields.append(field)

Similar to the input, we want to turn the parameter into a sink layer:

        (sink, dest_id) = self.parameterAsSink(
            parameters, 
            self.OUTPUT,
            context,
            output_fields,
            point_source.wkbType(),
            point_source.sourceCrs()
        )

        if sink is None:
            raise QgsProcessingException(self.invalidSinkError(parameters, self.OUTPUT))

Although it seems like a “nice to have”, tracking progress as we iterate through our points is pretty important; this script ran for 24 hours on the data I ran through it. If I had hit the 2 hour mark with no idea of progress — I’d certainly have given up.

Likewise, unless you explicitly interrupt your script when the operation is cancelled, QGIS has no way to stop progress. Having to force-kill QGIS to stop a hanging processing algorithm is super, duper, annoying:

        points = point_source.getFeatures()        
        total = 100.0 / point_source.featureCount() if point_source.featureCount() else 0

        for current, point in enumerate(points):

            if feedback.isCanceled():
                break

            feedback.setProgress(int(current * total))

From here on, we iterate over the target layers, and add to the target attributes if the point is present in any feature in the target layer:

            # Copy the point's original attributes; we'll append one 0/1 flag per layer.
            attr_copy = point.attributes().copy()

            for layer in vector_layers: 
            
                features = layer.getFeatures()
                feature_match = False
                geometry = point.geometry()

                # Brute force: check this point against every feature in the layer.
                for feature in features:
                    
                    if (feature.geometry().contains(geometry)):
                        feature_match = True
                        break
                    
                if(feature_match):
                    attr_copy.append(1)
                else:
                    attr_copy.append(0)

Last but not least, we just output the feature we’ve put together into the output sink:

            output_feature = QgsFeature(point)
            output_feature.setAttributes(attr_copy)
            feedback.pushInfo('Point attributes: %s' % output_feature.attributes())
            sink.addFeature(output_feature, QgsFeatureSink.FastInsert)

And that’s about it (minus some boilerplate). Click the nifty “Run” button on your script:

Because we wrote this as a QGIS script, we get a nice UI out of it:

When we run this, it creates a new temporary output layer. When we open up the output layer attribute table, we get exactly what we wanted: for each record, a column with a 0/1 for the presence or absence within a given vector layer:

Perfect.

Now, this script is super slow, but we could fix that. Say we have n input points and m total vector features. The obvious fix is to run in better than n*m time — we’re currently checking every point against every feature in every layer. We could optimize this by geo-bucketing the vector layer features:

  • Break the map into a 10×10 (or whatever) grid
  • For each vector layer feature, insert the feature into the grid elements it overlaps.
  • When we check each point for layer membership, only check the features in the grid element it belongs to.

If we’re using k buckets (100, for a 10×10 grid), this takes the cost down to, roughly, k*m + n*m/k, assuming very few features end up in multiple buckets. We spend k*m to assign each feature to the relevant bucket, and then each point only compares against 1/k of the vector features we did before.

I’m not implementing this right now, because I don’t need to, but given the APIs available here, I actually don’t think it would be more than an hour or two of work. I’ll leave it as an exercise to the reader.
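
That said, here’s a rough pure-Python sketch of the bucketing idea.  The bounding_box() and contains() accessors are hypothetical stand-ins for whatever the geometry API actually exposes:

from collections import defaultdict

GRID = 10  # a 10x10 grid over the map extent

def cell(x, y, extent):
    """Map a coordinate to its grid cell index."""
    xmin, ymin, xmax, ymax = extent
    i = min(GRID - 1, int((x - xmin) / (xmax - xmin) * GRID))
    j = min(GRID - 1, int((y - ymin) / (ymax - ymin) * GRID))
    return i, j

def build_index(features, extent):
    """Assign each feature to every grid cell its bounding box touches."""
    index = defaultdict(list)
    for feature in features:
        (x0, y0), (x1, y1) = feature.bounding_box()   # hypothetical accessor
        i0, j0 = cell(x0, y0, extent)
        i1, j1 = cell(x1, y1, extent)
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                index[(i, j)].append(feature)
    return index

def layer_contains(index, extent, x, y):
    """Check only the features bucketed into the point's cell."""
    return any(f.contains(x, y) for f in index[cell(x, y, extent)])  # hypothetical contains()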

Anyway, I’d been doing my best to avoid QGIS scripting, because it seemed a bit hardcore for a casual like me. Turned out to be pretty straightforward, so I’ll be less of a wimp in the future. I’ll follow up soon with what I actually used this script for.

‘Education Facts’ Labeling — A Modest Proposal to Fix the Student Debt Crisis

College debt is a hot topic right now.  Elizabeth Warren wants to cancel most of it.  Bernie Sanders wants to cancel all of it. Donald Trump loves the idea of bankruptcy (not from college debt — just as a general principle).  

But since forgiving student debt, like any meaningful reform in America, is a silly pipe dream, let’s instead fix it by eliminating information asymmetry.  Because if there’s anything American college students are not, it’s informed consumers.

Colleges are expensive, and costs have grown wildly faster than overall wage growth.  We all know that some majors, and some for-profit colleges, provide almost no value.  But since undergraduate education is a cash cow for universities, self-regulation of tuition growth — growth used to boost spending on new dorms, rec centers, and bureaucrats — by the universities themselves is utterly unrealistic. 

The crowning achievement of the Food and Drug Administration — right alongside keeping children from dying of Salmonella — is accurate, mandatory, Nutrition Facts.  Nutrition Facts are a universal constant. Without them, the American consumer would be forced to perform difficult calculus like “quantify how much lard is in a medium-sized lardburger.”

So, building on the wild success of Nutrition Facts, here’s my modest proposal: Federal Department of Education mandated Education Facts labeling:

This summary statistics table will give students the ability to identify which colleges can actually improve their futures, and which exist mainly as a parasitic drain on society.  Advertising will be totally legal — but must come coupled with vital statistics.  These will focus on:

  • Debt.  The big kahuna.  How far underwater is the average Philosophy major when they swim off the graduation stage?
  • Salary and employment.  5 years post-graduation, where is your career?  Can you dig yourself out of your debt before your children die?
  • Grad school acceptance.  If you’re going to die in debt, at least do it in style.  Can your undergraduate education help shelter you from the real-world with another 8-15 years of graduate school?

These statistics won’t just be available online.  McDonald’s publishes nutrition facts online, but the mobility scooter market is as hot as ever.  These Education Facts will be attached to every form of advertisement produced by an institution of higher learning.  

To help build a vision of this fully-informed world, I have illustrated a few examples: 

College brochures — The paper deluge that every high-school student wades through during Junior through Senior years.  Education Facts would help triage this garbage pile, by filtering the wheat from the for-profit scams:

[source]

Billboards: Colleges are huge on billboards nowadays.  It is only appropriate that claims like “career launching” be substantiated, in similarly giant font:

[source]

Sports:  College sports are, without a doubt, the most ironic yet effective form of higher-education advertising on the planet.  The only ethical use of this time and attention is to put numbers and figures in front of the eyeballs of impressionable high-school students:

[source]

This will not be an easy transition for America.  While calorie-labelling Frappuccinos at Starbucks inspired consternation, guilt, and shame across America, it did in fact cut calorie consumption markedly.  

Education Facts will hurt lousy colleges.  It will hurt schools which peddle useless majors to naive students.  But the students of America will come out of it stronger, more informed, and more solvent than ever before. 

Using the QGIS Gaussian Filter on Wildfire Risk Data

I thought I was done learning new QGIS tools for a while.  Turns out I needed to learn one more trick with QGIS — the Gaussian filter tool.  The Gaussian filter is sparsely documented (basically undocumented), so I figured I’d write up a post on how I used it to turn a raster image into vector layers of gradient bands.

Motivation:  In my spare time I’m adding more layers to the site I’ve been building which maps out disaster risks.  California was mostly on fire last year, so I figured wildfires were a pretty hot topic right now.

The most useful data-source I found for wildfire risk was this USDA-sourced raster data of overall 2018 wildfire risk, at a pretty fine gradient level.  I pulled this into QGIS:

(I’m using the continuous WHP from the site I linked).  Just to get a sense of what the data looked like, I did some basic styling to make near-0 values transparent, and map the rest of the values to a familiar color scheme:

This actually looks pretty good as a high-level view, but the data is actually super grainy when you zoom in (which makes sense — the data was collected to show national maps):

This is a bit grainy to display as-is at high zoom levels.  Also, raster data, although very precise, is (1) slow to load for large maps and (2) difficult to work with in the browser — in Mapbox I’m not able to remap raster values or easily get the value at a point (eg, on mouse click).  I wanted this data available as a vector layer, and I was willing to sacrifice a bit of granularity to get there.

The rest of this post will be me getting there.  The basic steps will be:

  • Filtering out low values from the source dataset
  • Using a very slow, wide, Gaussian filter to “smooth” the input raster
  • Using the raster calculator to extract discrete bands from the data
  • Converting the raster to polygons (“polygonalize”)
  • Putting it together and styling it
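
Before the QGIS clicks, here’s a toy numpy/scipy sketch of what the first three steps in that list (filter, smooth, band) do to a grainy raster.  This is not the QGIS/SAGA tooling itself, and the numbers are made up:

import numpy as np
from scipy.ndimage import gaussian_filter

# A fake, deliberately grainy "wildfire risk" raster.
rng = np.random.default_rng(0)
risk = rng.integers(0, 5000, size=(200, 200)).astype(float)

risk[risk < 1000] = 0                        # drop low values (the raster-calculator step)
smoothed = gaussian_filter(risk, sigma=20)   # the wide "smudge"

# Slice the smoothed surface into discrete hazard bands.
bands = np.digitize(smoothed, bins=[500, 1500, 2500, 3500])
print(np.unique(bands, return_counts=True))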

The first thing I did was filter values out of the original raster image below a certain threshold using the raster calculator.  The only justification I have for this is “the polygonalization never finished if I didn’t”. Presumably this calculation is only feasible for reasonably-sized raster maps:  

(I iterated on this, so the screenshot is wrong: I used a threshold of 1,000 in the final version).  The result looks like this:

Next step is the fancy new tool — the Gaussian filter.  A Gaussian filter, or blur, as it’s called elsewhere, is kind of a fancy “smudge” tool.  It’s available via Processing → Toolbox → SAGA → Raster filter.  

This took forever to run.  Naturally, the larger the values I used for the radius, the longer it took.  I iterated on the numbers here for quite a while, with no real scientific basis; I settled on a standard deviation of 20 and a search radius of 20 pixels, because they worked.  There is no numerical justification for those numbers. The result looks like this: 

Now, we can go back to what I did a few weeks ago — turning a raster into vectors with the raster calculator and polygonalization.  I did a raster calculator on this layer (a threshold of .1 here, not shown):

These bands are actually continuous enough that we can vectorize them without my laptop setting any polar bears on fire.  I ran through the normal Raster → Conversion → Polygonalize tool to create a new vector layer:

This looks like what we’d expect:

Fast forward a bit, filtering out the 0-value shape from the vector layer, rinse-and-repeating with 3 more thresholds, and adding some colors, it looks pretty good:

I want this on Mapbox, so I uploaded it there (again, see my older post for how I uploaded this data as an mbtiles file).  Applied the same color scheme in a Style there, and it looks nice: 

Just as a summary of the before and after, here is Los Angeles with my best attempt at styling the raw raster data: 

You get the general idea, but it’s not really fun when you zoom in.  Here it is after the Gaussian filter and banding:

I found these layers a lot easier to work with, and a lot more informative to the end user.  It’s now visible as a layer on bunker.land.

I thought this tool was nifty, so hopefully this helps someone else who needs to smooth out some input rasters.

Firebase is cool — Easy IP geolocation on Mapbox GL JS page-load

(warning: this all is probably obvious to people who know Firebase, but I didn’t see any direct references to this feature, so I figured I’d write it up)

Over the past few weeks I’ve been learning my way around Firebase; I use it to host my current side-project, a webapp (bunker.land), which uses Mapbox GL JS to render an interactive map of natural disasters, nuclear targets, and the like.

Today I took a shot at adding a convenience feature; during the initial page load, I wanted to zoom to the user’s actual location, instead of just defaulting to the center of the US.  Mapbox and Firebase have made this project stupidly easy so far, so I was optimistic this would also be easy.

Precise geolocation is certainly possible through Mapbox GL JS, but I’d have to use the actual browser location APIs; those require permissions, which is a noisy user-experience if it happens during initial page-load: 

(and frankly, people have no reason to give my random webapp their location.  I’m not that important.)

A lighter-weight version of geolocation would be to just geo-locate based on the user’s IP address.  IP geolocation isn’t very accurate — IP addresses move around, so I’m not going to get more precision than a city.  For my purposes, that’s fine. And unlike real location, I don’t have to get permission to see a user’s IP address.*

Mapping IP address to a location still takes a dataset and a bit of work though.  A number of sites offer IP to location services, but I wasn’t really thrilled about creating an account with a location service, managing an API key, and giving my credit card to internet randos just for this convenience.

Luckily, I discovered an easier way: it turns out that even though I’m using Firebase and not AppEngine, all the AppEngine request headers are attached to my Firebase function requests.  Among those is x-appengine-citylatlong, which is (more or less) exactly what I want. 

So, I built a tiny Firebase function which does nothing except listen for requests and pipe the location back into the response so I can use it in Mapbox:

'use strict';

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// allow cross-origin requests, since this gets called from the browser
const cors = require('cors')({
  origin: true
});

exports.getCoordinates = functions.https.onRequest((req, res) => {
  cors(req, res, () => {
    // echo back the "lat,long" city coordinates (and country code) that
    // App Engine attaches to every request
    res.status(200).send({
      "data": {
        "coords": req.headers['x-appengine-citylatlong'],
        "country": req.headers['x-appengine-country']
      }
    });
    res.end();
  })
});
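Deploying it is the usual Firebase CLI one-liner (getCoordinates here is just whatever name you exported the function under):

$ firebase deploy --only functions:getCoordinates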

(this function ended up being pretty trivial, but I struggled for a bit because it wasn’t obvious (to me) how to directly return JSON from a Firebase function.  Firebase functions are (rightfully) built around the idea of returning Promises, because most Firebase functions proxy async services: storing data in a database, putting files on GCS, etc.  It’s pretty unusual for a function to do what I do here and respond immediately, based only on the request headers.)

Anyway, this function does exactly what I want it to do; it returns the coordinates of the request:

$ curl https://us-central1-notzillow.cloudfunctions.net/getCoordinates

{"data":{"coords":"47.606209,-122.332071","country":"US"}}%

On the Mapbox side, we can use this to flyTo the coordinates as soon as the map is loaded:

//  wait until the map is loaded
map.on('load', function () {

  // fetch the user coordinates from the firebase function
  var getCoordinates = firebase.functions().httpsCallable('getCoordinates');
  getCoordinates({}).then(function (result) {

    if (result.data.coords) {
      let latLong = result.data.coords.split(",");

      // note that App Engine returns "lat,long", while Mapbox wants [lng, lat]
      map.flyTo({
        center: [
          parseFloat(latLong[1]),
          parseFloat(latLong[0])
        ],
        zoom: 11
      });
    }
  });
})

Really, that’s it.  I’ve plugged a slightly more complicated version of this code into bunker.land, and now it zooms to (roughly) the user’s location after the map loads.  With this trick, the geolocation is easy, cheap, and simple, which is my favorite kind of trick : ) 

* do not try to teach me about GDPR.  I do not care.

More QGIS – Hurricane maps (lines to points and vector overlaps)

I posted a couple days ago about how I used QGIS to generate heatmaps of tornado activity based on raw point data.  Since I had invested time (kind of) learning the tool, I figured I should put together a few similar layers.

The most obvious choice was hurricane risk.  I ended up using a pretty similar procedure to when I generated the tornado heatmap, but massaging the data took a few extra steps:

  • The input data came as line vectors instead of points
  • My data covered the whole globe, but I wanted the final vectors to only cover land areas

Again, I was happy with the result, so I figured I’d write it up.  

Similar to what I ran into with the tornado risk data, I couldn’t find any hurricane hazard GIS shapefiles.  I did again find a raw dataset of every hurricane NOAA has on record, which was enough to get started.

Importing all the vectors (I think there were about 700,000) from this MapServer took a while, and the result was, as expected, a bit cluttered:

There’s probably a better way to filter the data down, but I ended up exporting the records to shapefiles so I could filter on attributes.  The dataset had a lot of tropical storm data, and I filtered out everything except proper hurricanes (H1-H5).
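(This kind of attribute filter is also easy from the command line with ogr2ogr.  A sketch, where the filenames and the CATEGORY field name are placeholders for whatever the exported shapefile actually contains:)

$ ogr2ogr -where "CATEGORY IN ('H1','H2','H3','H4','H5')" hurricanes_h1_h5.shp hurricanes_all.shp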

Here things got a bit different.  The heatmap function I used for tornadoes only works on points, and these vectors were all lines.  Luckily, there was a brute-force but straightforward solution: turn each line into a bunch of points.  QChainage is a simple plugin that does exactly that.  Once it’s installed, it’s available from the Vector → QChainage menu.

The above screenshot is a bit deceptive: I ended up using a point spacing of 20km in the final version.  The main downside of a tighter spacing is a longer processing time when generating the heatmap.  The result looks like a bit of a mess from above:

But looks a lot better once I zoom in:

From here, I’ll fast-forward through the same stuff I did last time; I used the points to generate a heatmap, this time with a 250km point radius, and pulled vectors out of it.  I iterated on thresholds until the most expansive layer more or less lined up with other reputable sources.  My layer:

Compared to what I found online (source): 

Except for a few lumps in Virginia and Maine, it looks pretty comparable. 

Jumping forward a bit more, I again went with four gradients to get a map that looked like this:

I was a bit torn.  While this looks cool, the shading over the ocean is distracting when the goal is to highlight risk areas on land; I needed to filter the shapes down to land areas only.

It turns out that intersecting vectors in QGIS is pretty easy.  I found a simple shapefile of all land areas on earth here (it wasn’t even a big download; less than 10MB).  Once this data was imported, I could use the Vector → Geoprocessing Tools → Intersect tool to generate an intersection layer:

This did exactly what I wanted.  I repeated it for all four layers and ended up with gradients only over land areas.  I didn’t bother styling the layers, since I’ll handle that in Mapbox later.
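(If you have a lot of layers, the same clip is scriptable: ogr2ogr can clip one datasource against another, which for this purpose does the same job as the Intersect tool.  A sketch, with placeholder filenames:)

$ ogr2ogr -clipsrc land_polygons.shp hurricane_band_land.shp hurricane_band.shp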

Just as a sanity check, I swapped the OpenMapTiles background back in to make sure the coastlines lined up correctly (they did, give or take a few hundred meters here and there along the coast).

A nice bonus from this NOAA data: it covers the whole globe.  All the other datasets I’ve found for natural disaster risks are US-specific (and usually cover only the continental US):

I won’t go through the details on loading this into Mapbox; everything from here mirrored what I did last time.  You can see the result as a layer on bunker.land:

Once again I was pleasantly surprised at how easy it was to get (relatively) nice-looking graphics out of QGIS with minimal experience.  

At this point I’ve added data for most of the layers I was interested in displaying (although I’m open to suggestions).  I’ll likely get back to the actual web-dev side of this project and clean up a few loose ends over the next couple weekends.

Using QGIS and Mapbox to Map Tornado Hotspots on bunker.land

I’ve spent the last couple weekends putting together bunker.land, which I posted about a couple weeks ago (the tl;dr is “mapping out the riskiest places in the US to live”). 

My focus has been on targets during a nuclear war, but I thought it would be fun to expand the project to include natural disaster likelihoods.  I didn’t have too much trouble finding datasets for elevation (to model sea level rise), earthquakes, and wildfires. Tornado risk seemed like a good next step.

I wanted a map that looked something like this (source):

(aside:  I grew up in the middle of Missouri, which, according to this map, is literally the armpit of Tornado Alley.  And yet in the 15 years I lived there, I never saw even one tornado, and I am pretty salty about this.  Now I know why. More on this later.) 

However, I needed the tornado hazard data formatted as GIS shapefiles or rasters so I could render it via Mapbox GL JS, the library I use to display webmaps.  Sadly, I had a lot of trouble finding a GIS-formatted risk map for tornadoes.  The closest thing I found was a set of tornado starting points from 1950-2017.  This is a comprehensive dataset, but when I pulled it into QGIS and mapped it out, the raw data was a bit… noisy:

Since I couldn’t find a map that worked out of the box, I had no choice but to learn something new.  Luckily, I found a guide for making heatmaps in QGIS, which gave me a really good starting point. Less fortunately, the guide is for an old version of QGIS, and as a result I hit a number of obstacles I had to Google around.  

I’m pretty happy with the result, and spent a fair amount of time learning how to use QGIS, so I figured I’d write up how I filtered this data into vector datasets and got it into Mapbox, where I display it as a layer on bunker.land.

Making a heatmap

Starting from the very beginning, we’ll want a new QGIS project.  So that we have some visual context when playing around with these shapes, I added an OpenStreetMap base layer.  The tornado dataset we want to work with is available as shapefiles, and we can add it to QGIS via Layer → Add Layer → Add Vector Layer:

Our lives will be easier later on if this data is all reprojected into EPSG:3857 (the projection QGIS labels WGS 84 / Pseudo-Mercator) before we do any editing.  We can just do that first.  Right-click on the layer → Export → Save Features As:

Set the CRS, save, and we can work with the new reprojected layer from now on.
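(If you prefer the command line, ogr2ogr does the same reprojection in one step; a sketch, with made-up filenames:)

$ ogr2ogr -t_srs EPSG:3857 tornadoes_3857.shp tornadoes_raw.shp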

So our first problem is that this dataset is huge and noisy.  While I don’t recommend ignoring any tornadoes, I would not personally get off my couch for anything less than an F2 tornado, so that’s what I’m going to filter for.  Since this data is a shapefile, I can filter on the fields of the objects; right click on the new layer → Filter.   

We’ll just filter on the “mag” column, looking for F2+ tornadoes:
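The filter expression is just "mag" >= 2.  (If you’d rather pre-filter from the command line, ogr2ogr can apply the same condition; the filenames here are placeholders:)

$ ogr2ogr -where "mag >= 2" tornadoes_f2plus.shp tornadoes_3857.shp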

This is a bit less cluttered, but still not super actionable.  From here, our goals are:

  • turn these points into a heatmap
  • extract discrete layers from the heatmap
  • save the extracted layers as shapefiles

Luckily for us, QGIS has a nifty heatmap tool which lets us turn our points into a heatmap raster.  Click on Processing → Toolbox → Interpolation → Heatmap (Kernel Density Estimation):

Iterating on the settings here took a while; I had to experiment a bit before things looked good.  I went with a 150km radius on the heatmap points, 4000 rows, and 10009 columns (once you select the number of rows, the column count auto-populates).  I played around with the colors on the resulting heatmap for a bit and ended up with this:

Intensity bands

While this is an improvement, it’s still kind of a hot mess (pun intended).  Heatmaps based on spotty data like this probably exaggerate the hotspots (there’s likely reporting bias, and we don’t want to overweight individual data points), so I’d prefer discrete intensity bands.  To get those, we can use the raster calculator: Raster → Raster Calculator:

Since our heatmap values aren’t really tied to any physical unit, choosing thresholds was a bit of guesswork.  Frankly, I just chose numbers that lined up with the risk areas I saw on other maps; the lowest threshold, 10, gives us this:

This is the kind of gradient band I’m interested in.  Unfortunately, it’s still a raster image.  We really want shapefiles: we can do more interesting things with them on Mapbox and in the browser, and the data is dramatically smaller.  Luckily, QGIS has a tool to turn raster images into shapefiles: “Polygonize”.  We can go to Raster → Conversion → Polygonize (Raster to Vector):

We can select whatever we’ve named our filtered raster.  This gives us the previous image broken into two chunks:

We want to filter out the part that falls below our heatmap threshold.  Right click the layer → Properties → Filter:

Filter for where the feature value is equal to 1.  Now we’re down to the shapes we care about:

Of course we can play around with the layer styling to get it to look like whatever we want:

To get the gradient we care about, we can repeat this process at a few thresholds to capture distinct bands.  These don’t correspond to any particular physical intensity; they’re just meant to demarcate more and less intense risk areas.  

Fast-forwarding through the repetitive bits, I repeated these steps with four raster calculator thresholds (with this dataset, I ended up using 10, 25, 40, and 65).  By setting a different color on each layer and decreasing the opacity to 50%, I got this:

This captures what I want: distinct gradient bands without over-weighting hotspots.  If your goal is just to generate a static raster image, you can stop here and export this image directly. 

Mapbox

My goal, however, is to import these layers into Mapbox so I can attach them to an existing interactive web map.  Mapbox is a platform for hosting customized maps and embedding them in apps or webapps; I use Mapbox, plus the corresponding Mapbox GL JS library, to host maps for bunker.land.  To get this data into Mapbox, we want to upload it as a Tileset and reference it from a Style as feature layers.

I learned the hard way that there is a good way to do this and a bad way.  The obvious way is to export each of the four bands as its own GeoJSON file, upload each to Mapbox, and add each as a layer.  This is a mistake: Mapbox has a limit of 15 data “Sources” per Style, so uploading each layer as a separate GeoJSON quickly caps out how many layers we can have per Style.

Luckily, Mapbox has released a nice tool called tippecanoe which lets us combine GeoJSON files into a single mbtiles file (it can do a ton of other things too; this is just what I’ve used it for).  An mbtiles file can have as many layers as we want, as long as it is under 25 GB.

First we want to extract each layer as a GeoJSON file; right click the layer → Export → Save Features As.

Choose GeoJSON and repeat for each layer.  This gives us four GeoJSON files:

$ ls -lh *.geojson
-rw-r--r--  1 bpodgursky  640K Aug  1 22:46 tornado10.geojson
-rw-r--r--  1 bpodgursky  590K Aug  1 22:45 tornado25.geojson
-rw-r--r--  1 bpodgursky  579K Aug  1 22:45 tornado40.geojson
-rw-r--r--  1 bpodgursky  367K Aug  1 22:44 tornado65.geojson

We can use tippecanoe to combine these into a single, small, mbtiles file:

$ tippecanoe -zg -o tornado.mbtiles --extend-zooms-if-still-dropping *.geojson
$ ls -lh tornado.mbtiles
-rw-r--r--  1 bpodgursky  128K Aug  1 22:54 tornado.mbtiles

This gives us a single tornado.mbtiles file.  

In practice I added these layers to an existing map for bunker.land; for simplicity, here I’m going to set up a new empty Style.  After setting up a Mapbox account, navigate to Studio → Styles → New Style.  I use a blank background, but you can also choose an OpenStreetMap background.

We can add these layers directly to the Style.  Navigate through Add layer → Select data → Upload to upload the mbtiles file we just generated.  These features are small and should upload pretty quickly.  Once that’s available (you may need to refresh), we see that there are four layers in the new source:

We’ll create four new layers from this source.  We’ll just use Mapbox Studio to recreate the styling we want, and set the opacity so the overlay is visible but doesn’t obscure anything:

All I needed to do now was get this into a website.

Embedding on bunker.land

Mapbox GL JS has great examples of how to load a Style into a map, so I won’t dig into the code too much; the important part is just loading a map from this Style:

mapboxgl.accessToken = YOUR_TOKEN;

var map = new mapboxgl.Map({
  container: 'map', // the div we want to attach the map to
  style: 'mapbox://styles/bpodgursky/cjxw0v4fr7hd81cp6s0230lcw', // the ID of our style
  center: [-98, 40], // starting position [lng, lat] -- this is about the middle of the US
  zoom: 4 // starting zoom level
});

We can see the final result here, overlaid against OpenMapTiles on bunker.land:

Since our layer is just a simple vector tile layer, it’s easy to detect these features on-click for a particular point, along with any other enabled layers:
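(The on-click lookup is just Mapbox GL JS’s queryRenderedFeatures.  A sketch of the click handler, where the layer ids are placeholders for whatever the layers were named in Studio:)

map.on('click', function (e) {
  // returns any tornado-band features rendered under the clicked point
  var features = map.queryRenderedFeatures(e.point, {
    layers: ['tornado10', 'tornado25', 'tornado40', 'tornado65']
  });

  if (features.length) {
    console.log('tornado bands at this point:', features.map(function (f) {
      return f.layer.id;
    }));
  }
});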

Wrapping up

It’s now pretty clear why I missed all the tornadoes as a kid — Tornado Alley (kind of) skips right over central Missouri, where I grew up!  My only explanation for this is, “weather is complicated”.

On the technical side, I was surprised by how easy it was to generate a decent-looking map; Mapbox and QGIS made it stupidly simple to turn raw data into a clean visualization (and I’ve only been using QGIS for a couple weeks, so I’m sure I missed a few nice shortcuts). 

Now that I know how to turn ugly data into nice heatmaps or gradient data, I’ll probably work on adding hurricanes and flooding over the next couple weeks.  Stay tuned.