bunker.land — Mapping the Best Places to Wait out a Nuclear War

The news is quite clear: tensions with China are high, Russia is flaunting hypersonic missiles, and even newcomers Iran and North Korea will likely have sophisticated ICBM capabilities within a couple years.  While the general sentiment has been “yeah, nuclear war would definitely suck”, there’s been very limited conversation about how a nuclear war would actually play out, and what it would mean for the average American.

One artifact of the Cold War I find fascinating is the nuclear target map, which identified the likely first- and second-strike targets in a nuclear war.  For those who felt the risk of a nuclear confrontation was high, these maps helped identify better and worse places to live.

Unfortunately, I’ve never seen a good resource that exposes this data using modern mapping tools.  I’ve wanted an opportunity to learn about GIS and front-end mapping libraries, so I decided to build a similar map using modern browser-based map libraries.

bunker.land

I’ll likely follow up with a post about what this involved technically, but tl;dr, it involved:

  • (light) research on which areas of the US are potential high-priority targets
  • (light) research on the impact radius of a nuclear bomb (primarily via NUKEMAP)
  • Finding public-domain maps of US infrastructure by type; these were pretty easy to grab from data.gov and the DOT
  • Calculating the blast radii around potential targets (simple buffers produced with QGIS)
  • Loading all these layers into Mapbox and exposing them on a simple site via Mapbox GL JS 
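
I’ll save the details for the follow-up post, but the last step is the easiest to sketch: each hazard layer ends up as a polygon source plus a translucent fill layer in Mapbox GL JS.  Something like this (not the site’s actual code; the data path and layer names are made up):

    mapboxgl.accessToken = 'YOUR_ACCESS_TOKEN';

    const map = new mapboxgl.Map({
      container: 'map',
      style: 'mapbox://styles/mapbox/dark-v10',
      center: [-98.5, 39.8], // continental US
      zoom: 4
    });

    map.on('load', () => {
      // blast-radius buffers exported from QGIS as GeoJSON
      map.addSource('blast-radii', {
        type: 'geojson',
        data: '/data/military-base-buffers.geojson' // hypothetical path
      });

      map.addLayer({
        id: 'blast-radii-fill',
        type: 'fill',
        source: 'blast-radii',
        paint: {
          'fill-color': '#ff0000',
          'fill-opacity': 0.25
        }
      });
    });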

You can see what I put together at bunker.land, a very simple attempt at mapping out what places in the US would and would not be great places to live during a nuclear exchange.

Although most of the work here went into modeling nuclear targets, there were a few other un/natural disasters I thought would be interesting overlays:

  • Earthquake risk
  • Sea level rise (from global warming)

Normal disclaimer: I am not an expert on much of anything, and especially not on nuclear war.  The maps here should be taken exactly for what they are — aggregated publicly available datasets with minimal filtering or analysis.  Feedback is welcome from actual experts.

Nuclear War

Obviously nuclear war is always bad news, but even in a major war, not everyone is going to be instantly vaporized.  There are especially terrible places to live during a nuclear war — namely, next to any important targets. These maps try to identify, for any location in the US, the potential bomb targets nearby in a nuclear strike scenario, and the potential damage range from those strikes:

This map plots potential nuclear targets, sourced from public datasets. Right now I include:

  • Military bases
  • Ports
  • Major cities
  • Rail yards
  • ICBM missile silos
  • State capitals
  • Power plants

This post explains the data sources and filtering further down.

All nuclear blast radii on this map are modeled on a 5 Mt nuclear airburst (a standard Chinese warhead).  Damage radii use estimates from Alex Wellerstein’s NUKEMAP; for more info, check out his site. bunker.land renders nuclear blast impacts at 4 levels:

  • 2km: Fireball radius
  • 12km: Air blast radius (5 psi)
  • 25km: Thermal radiation radius
  • 34km: Air blast radius (1 psi)
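
I produced the actual buffers in QGIS, but the same rings can be generated in code with something like Turf.js.  A rough sketch, with an illustrative target coordinate:

    const turf = require('@turf/turf');

    // an example target; coordinates are illustrative
    const target = turf.point([-104.8, 41.1]);

    // fireball, 5 psi air blast, thermal radiation, 1 psi air blast
    const radiiKm = [2, 12, 25, 34];

    // each buffer is a GeoJSON polygon, ready to load as a map source
    const rings = radiiKm.map(r => turf.buffer(target, r, { units: 'kilometers' }));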

On the map, the zones look something like this:

Modeling nuclear fallout accurately is a lot harder, and I have not attempted it at all.  The fallout zones depend on airburst height and wind conditions, which are both dynamic and complex.

Targets

This is a quick description of each of the target layers available on bunker.land.  Since I don’t know what I’m doing, unless the selection criteria were very obvious, I erred on the side of presenting raw, unfiltered data.  So, many minor military bases, rail yards, etc. are included even if they have no real significance.

Likewise, several categories of likely targets are not included yet, including but not limited to airports, refineries, shipyards, factories, and communication facilities.

Military bases

Strategic military bases are obvious targets in a nuclear war. This map displays all military installations on US soil, with data sourced from the Department of Transportation.

This map makes no effort to distinguish between major and minor strategic targets; all installations are rendered as potential targets.

Ports

Major US ports are often cited as potential targets in either terrorist attacks or nuclear war, due to their important economic role and their proximity to economic centers.

This map sources a Department of Transportation list of major US ports. No effort was made to filter ports by importance or risk; all ports in this dataset are rendered as potential targets.

Major cities

Even in a major nuclear war, most cities are not valuable targets; only cities with important military targets or infrastructure are likely to be targeted.

This map displays all cities with a population over 100,000 (sourced here) only as a proxy for infrastructure that other layers do not capture.

No effort is made to filter cities by risk or strategic importance.

Rail yards

Cold War nuclear target lists frequently included transportation hubs such as rail yards. This map includes all US rail yards, as sourced from data.gov.

This is a very inclusive map, and most of these rail yards have little to no strategic value. Without a better metric for inclusion, though, all US rail yards are modeled as potential targets.

ICBM missile silos

The first priority in a nuclear war is eliminating the enemy’s ability to respond with nuclear weapons. Ground-based nuclear missile silos are very high-value targets.

The United States maintains a ground-based ICBM force of three missile wings spread across Montana, North Dakota, Wyoming, Nebraska, and Colorado.

These silo locations have been sourced from Wikipedia, and no further effort was made to verify operational status.

State capitals

It is generally agreed that US state capitals will be considered high-value targets in a full nuclear war. This map includes all 50 US state capitals as targets.

Power plants

In a nuclear war, power production facilities will be targeted for their military and industrial value. This map pulls all facilities with over 1 GW of capacity, across all production types (coal, hydroelectric, nuclear, etc.), from Energy Information Administration datasets.

Uncontrolled Sea Level Rise

Unlike nuclear war, sea level rise won’t sneak up and vaporize you while you sleep. But it will make a house a really poor investment.

Most realistic global warming worst-case scenarios model a 5-10 foot sea level rise by 2100, which is, to be clear, Very Bad News, but is unlikely to sink you unless you live in the Everglades.  This map goes further and asks “How far from the ocean would you want to be if all the ice melted?” — around 220 feet of sea level rise.

Elevation data was sourced here, at 100m resolution.

There are a lot of ways global warming could make a place uninhabitable — for example, making it really hot. But this map currently only captures raw sea level rise.

Earthquakes

Earthquakes are usually bad news. Earthquake prediction is challenging, but it’s generally understood which areas of the country are most prone to earthquakes. This map attempts to display areas with especially high earthquake risks.

Earthquake risks are pulled from the 2014 USGS seismic-hazard maps found here. ‘Intensity’ represents the peak horizontal acceleration with a 10% probability of exceedance in 50 years, measured as a percentage of gravity (%g).

Only areas with over 10% g are rendered as location markers; 10% was chosen only because it is a round number.

Doom Score

I found that the buffers presented on the map were cool, but they made it challenging to do a head-to-head numeric comparison between locations.  To make this more straightforward, I added a “Doom Score” which aggregates the enabled hazards for a given point:

It’s not a sophisticated score:  for each enabled target layer, points are assigned by distance:

  • 0-2km: 10
  • 2-12km: 5
  • 12-25km: 2
  • 25-34km: 1

Earthquake risk is assigned as the %g value described above, divided by 10; e.g., 20% g = 2 points. Summed together, these numbers may not mean a ton, but they are fun to compare.
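
In code, the scoring boils down to something like this (an illustrative sketch, not the site’s actual implementation):

    // distance-based points for a single target, per the table above
    function targetPoints(distanceKm) {
      if (distanceKm < 2)  return 10;
      if (distanceKm < 12) return 5;
      if (distanceKm < 25) return 2;
      if (distanceKm < 34) return 1;
      return 0;
    }

    // targetDistancesKm: distances to targets in all enabled layers
    // quakePctG: %g exceedance at this point (0 if the layer is disabled)
    function doomScore(targetDistancesKm, quakePctG) {
      const targetScore = targetDistancesKm
          .reduce((sum, d) => sum + targetPoints(d), 0);
      return targetScore + quakePctG / 10; // 20% g => 2 points
    }

    doomScore([1.5, 20, 40], 20); // 10 + 2 + 0 + 2 = 14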

So while Zillow (and similar services) provide useful info about neighborhoods like “Walk Score” and “Transit Score”, bunker.land is the only place you can get a Doom Score.

Follow-ups / Help

I’m not an expert in anything presented on this map. There’s certainly a lot that could be improved:

  • This is by no means an exhaustive list of the things that can kill you. More hazards will (probably) be added to this map over time; reach out if you have any specific interests (hurricanes, etc.).
  • Expanded target lists from reliable datasets (airports, etc.)
  • Ways to judge which targets are actually important (contributions from actual experts appreciated)

I’ll update as I add content to this site (which may or may not happen frequently).  Feature requests and bug reports are welcome; the best way to leave feedback is to email me directly at bpodgursky@gmail.com.

Migrating LiveRamp’s Hadoop infrastructure to GCP

At Google Next 2019, my co-workers Sasha Kipervarg, Patrick Raymond, and I presented on how we migrated our company’s on-premise big-data Hadoop environment to GCP, from both a technical and cultural perspective.  You can view the presentation and slides on YouTube:

Our Google Next presentation did not give us enough time to go into deep technical details; to give a more in-depth technical view of our migration, I’ve put together a series of blog posts on the LiveRamp engineering blog, with an emphasis on how we migrated our Hadoop environment, the infrastructure my team is responsible for maintaining:

Part 1 where I discuss our Hadoop environment, why we decided to migrate LiveRamp to the cloud, and how we chose GCP.

Part 2 where I discuss the design we chose for our Hadoop environment on GCP.

Part 3 where I discuss the migration process and system architecture which let teams incrementally migrate applications to the cloud.

Part 4 written by coworker Porter Westling, where he discusses how we worked around our data center egress bandwidth restrictions.

Part 5 where I discuss LiveRamp’s use of the cloud going forward, and the cultural changes enabled by migrating to the cloud from an on-premise environment.

LiveRamp’s migration into GCP has been my (and my team’s) primary objective for over a year, and we’ve learned a ton (sometimes painfully) along the way.  Hopefully these articles help others who are planning big-data cloud migrations skip a few painful lessons.

Happy Earth Day: drive more and (maybe) save the environment

Yesterday was Earth Day, so Facebook was naturally full of people bragging about how they walked to the store instead of driving, in order to save the Earth.  I feel obligated to point out that this very plausibly isn’t true.  I’m not the first person to run these numbers, but I was curious and wanted to investigate for myself.  My (rather rough) calculations are all here.

As a baseline, we want to calculate the kWh cost of driving a car 1 mile. I’m using a baseline of 33.41 kWh / gallon of gasoline:

Car     MPG    kWh / mile
Prius   58     0.58
F-150   19     1.76

If you’re bragging on Facebook about your environmental impact, you’re probably driving a Prius, so we’ll roll with that.  Feel free to substitute your own car.

To get the calories burned per mile walking, I used numbers I found here.  The numbers vary pretty widely with body weight and walking speed, but I’ll use 180 pounds at 3.0 mph, for 95 calories per mile.

To get the energy costs per pound of food produced, I used the numbers I found here.  Click through for their sources.  kWh / 1 mile is calculated as

kWh / 1 mile = 95 / (calories/lb) * (kWh/lb)

where 95 is the calories burned walking one mile.

Just to be clear: this isn’t the calories in food.  This is the energy usage required to produce and transport the food to your mouth, which is essentially all fossil fuels.  Numbers vary widely per food source, as expected.

Food      Calories / lb    kWh / lb    kWh / mile
Corn      390              0.43        0.10
Milk      291              0.75        0.24
Apples    216              1.67        0.73
Eggs      650              4           0.58
Chicken   573              4.4         0.73
Cheese    1824             6.75        0.35
Pork      480              12.6        2.49
Beef      1176             31.5        2.54
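
For anyone who wants to check the arithmetic, the whole model fits in a few lines of JavaScript (a sketch, using the numbers above):

    const KWH_PER_GALLON = 33.41;
    const CALORIES_PER_MILE = 95; // ~180 lb walker at 3.0 mph

    function kwhPerMileDriving(mpg) {
      return KWH_PER_GALLON / mpg;
    }

    function kwhPerMileWalking(caloriesPerLb, kwhPerLb) {
      return (CALORIES_PER_MILE / caloriesPerLb) * kwhPerLb;
    }

    kwhPerMileDriving(58);        // Prius: ~0.58 kWh
    kwhPerMileWalking(573, 4.4);  // chicken-fueled walking: ~0.73 kWh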

So what’s the conclusion?  It’s mixed.

  • If you drive a Prius, you’re OK walking, as long as you replace the burned calories with Doritos (cheese + corn) and (corn-syrup’d) Coca-Cola.
  • If you drive a Prius, and you replace the burned calories with a chicken and apple salad (I couldn’t find numbers for lettuce, but they are undoubtedly even worse), you are destroying the planet.
  • If you drive an F-150, you’re probably going to replace your burned calories with a steak, so you’re actually saving the environment by driving.

These numbers are of course rough, and do not include:

  • The energy cost of producing a car.  This becomes very complicated very quickly, because you likely would have done less damage by just buying a used car instead of a new Prius.
  • This assumes you actually eat all the food you bought, and didn’t leave carrots rotting in the back of your fridge (your fridge, by the way, uses energy).  Americans are notoriously terrible at this.
  • This calculates only energy usage — it does not attempt to quantify the environmental impact of turning Brazilian rainforests into organic kale farms to grow your fourth-meal salad.
  • This assumes a single rider per car-mile.  If you are carpooling on your drive to KFC, you can cut all the car energy usage numbers in half (or more, for families).

Anyway, I’m sure there are many other reasons these numbers are rough; I just wanted to point out that the conventional wisdom on environmental topics is pretty awful.

Procedural star rendering with three.js and WebGL shaders

Over the past few months I’ve been working on a WebGL visualization of earth’s solar neighborhood — that is, a 3D map of all stars within 75 light years of Earth, rendering stars and (exo)planets as accurately as possible.  In the process I’ve had to learn a lot about WebGL (specifically three.js, the WebGL library I’ve used).  This post goes into more detail about how I ended up doing procedural star rendering using three.js.  

The first iteration of this project rendered stars as large balls, with colors roughly mapped to star temperature.  The balls did technically tell you where each star was, but they didn’t make for a particularly compelling visual:

[image: uncharted screenshot]

Pretty much any interesting WebGL or OpenGL animation uses vertex and fragment shaders to render complex details on surfaces.  In some cases this just means mapping a fixed image onto a shape, but shader output can also be generated procedurally, to represent flames, explosions, waves, etc.  three.js makes it easy to attach custom vertex and fragment shaders to your meshes, so I decided to take a shot at semi-realistic (or at least, cool-looking) star rendering with my own shaders.

Some googling brought me to a very helpful guide on the Seeds of Andromeda dev blog which outlined how to procedurally render stars using OpenGL.  This post describes how I translated a portion of that guide to three.js, along with a few tweaks.

The full code for the fragment and vertex shaders is on GitHub.  I have images here, but the visuals are most interesting on the actual tool (http://uncharted.bpodgursky.com/), since they are larger and animated.

Usual disclaimer — I don’t know anything about astronomy, and I’m new to WebGL, so don’t assume that anything here is “correct” or implemented “cleanly”.  Feedback and suggestions welcome.

My goal was to render something along the lines of this false-color image of the sun:

[image: false-color image of the sun]

In the final shader I implemented:

  • the star’s temperature is mapped to an RGB color
  • noise functions try to emulate the real texture
    • a base noise function to generate granules
    • a targeted negative noise function to generate sunspots
    • a broader noise function to generate hotter areas
  • a separate corona is added to show the star at long distances

Temperature mapping
The color of a star is determined by its temperature, following the black-body radiation color spectrum:

[image: color temperature of a black body (sourced from Wikipedia)]

Since we want to render stars at the correct temperature, it makes sense to access this gradient in the shader where we are choosing colors for pixels.  Unfortunately, WebGL limits uniform arrays to a couple hundred elements on most hardware, making it tough to pack this data into the shader.

In theory WebGL supports texture lookups from shaders, which would let the shader fetch the RGB values from a loaded texture, but I wasn’t sure how to do this in WebGL.  So instead I broke the black-body color gradient into a large, horrifying, stepwise function:

bool rbucket1 = i < 60.0;                // 0, 255 in 60
bool rbucket2 = i >= 60.0 && i < 236.0;  // 255, 255
…
float r =
    float(rbucket1) * (0.0 + i * 4.25) +
    float(rbucket2) * (255.0) +
    float(rbucket3) * (255.0 + (i - 236.0) * -2.442) +
    float(rbucket4) * (128.0 + (i - 288.0) * -0.764) +
    float(rbucket5) * (60.0 + (i - 377.0) * -0.4477) +
    float(rbucket6) * 0.0;

Pretty disgusting.  But it works!  The full function is in the shader here.
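
(For reference, the texture-lookup approach mentioned above would look roughly like the sketch below in three.js: pack the gradient into a 1xN DataTexture and sample it by normalized temperature.  I haven’t tested this; it’s just to show the shape of the alternative.)

    // pack the black-body gradient into a 1D RGBA texture
    const width = 512;
    const data = new Uint8Array(width * 4);
    // ... fill `data` with RGBA values along the gradient ...
    const gradient = new THREE.DataTexture(data, width, 1, THREE.RGBAFormat);
    gradient.needsUpdate = true;

    // pass it to the shader as a uniform: { blackbody: { value: gradient } },
    // then in GLSL: vec3 color = texture2D(blackbody, vec2(t, 0.5)).rgb;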

Plugging in the Sun’s temperature (5,778 K) gives us an exciting shade of off-white:

[image: sun with no noise]

Beautiful, but we can do better.

Base noise function (granules)

Going forward I diverge a bit from the SoA guide.  While the SoA guide chooses a temperature and then varies the intensity of the texture based on a noise function, I instead fix high and low surface temperatures for the star, and use the noise function to vary between them.  The high and low temperatures are passed into the shader as uniforms:

 var material = new THREE.ShaderMaterial({
   uniforms: {
     time: uniforms.time,
     scale: uniforms.scale,
     highTemp: {type: "f", value: starData.temperatureEstimate.value.quantity},
     lowTemp: {type: "f", value: starData.temperatureEstimate.value.quantity / 4}
   },
   vertexShader: shaders.dynamicVertexShader,
   fragmentShader: shaders.starFragmentShader,
   transparent: false,
   polygonOffset: -.1,
   usePolygonOffset: true
 });

All the noise functions below shift the pixel temperature, which is then mapped to an RGB color.

Convection currents on the surface of the sun generate noisy “granules” of hotter and cooler areas.  To represent these granules I used an available WebGL implementation of 3D simplex noise.  The base noise for a pixel is just the simplex noise at the vertex coordinates, plus some magic numbers (tuned to whatever looked “realistic”):

void main( void ) {
  float noiseBase = (noise(vTexCoord3D, .40, 0.7) + 1.0) / 2.0;

The number of octaves in the simplex noise determines the “depth” of the noise as zoom increases.  The tradeoff, of course, is that each octave increases the work the GPU does each frame, so more octaves == fewer frames per second.  Here is the sun rendered at 2 octaves:

[image: sun at 2 octaves]

4 octaves (which I ended up using):

[image: sun at 4 octaves]

and 8 octaves (too intense to render real-time with acceptable performance):

[image: sun at 8 octaves]
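
For readers unfamiliar with octave noise, the accumulation loop looks roughly like this: each octave doubles the frequency and halves the amplitude, adding finer detail at extra per-pixel cost.  This is JavaScript rather than GLSL, purely to illustrate:

    // noise3: a 3D simplex noise function returning values in [-1, 1]
    function fractalNoise(noise3, x, y, z, octaves) {
      let total = 0, frequency = 1, amplitude = 1, max = 0;
      for (let i = 0; i < octaves; i++) {
        total += noise3(x * frequency, y * frequency, z * frequency) * amplitude;
        max += amplitude;
        frequency *= 2; // finer detail each octave
        amplitude /= 2; // contributing less each octave
      }
      return total / max; // normalize back to [-1, 1]
    }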

Sunspots

Sunspots are areas on the surface of a star with a reduced surface temperature due to magnetic field flux.  My implementation of sunspots is pretty simple; I take the same noise function used for the granules, but with a decreased frequency, higher amplitude, and an initial offset.  By taking only the positive values (the max function), the sunspots show up as discrete features rather than continuous noise.  The final value (“ss”) is then subtracted from the base noise.

float frequency = 0.04;
float t1 = snoise(vTexCoord3D * frequency) * 2.7 - 1.9;
float ss = max(0.0, t1);

This adds only a single snoise call per pixel, and looks reasonably good:

[image: sunspot rendering]

Additional temperature variation

To add a bit more variation, the noise function is used one last time, this time to raise the temperature over broader areas:

float brightNoise = snoise(vTexCoord3D * .02) * 1.4 - .9;
float brightSpot = max(0.0, brightNoise);

float total = noiseBase - ss + brightSpot;

All together, this is what the final shader looks like:

[image: final shader]

Corona

Stars are very small on an interstellar scale.  The main goal of this project is to be able to visually hop around Earth’s solar neighborhood, so we need to be able to see stars at long distances (like we can in real life).

The easiest solution is to just have a very large fixed sprite attached at the star’s location.  This solution has some issues though:

  • being inside a large semi-opaque sprite (e.g., when zoomed in towards a star) occludes vision of everything else
  • scaled sprites in three.js do not play well with raycasting (the raycaster misses the sprite, making it impossible to select stars by mousing over them)
  • a fixed sprite will not vary its color by star temperature

I ended up implementing a corona shader with:

  • RGB color based on the star’s temperature (same implementation as above)
  • color near the focus trending towards pure white
  • size proportional to camera distance, up to a max distance (see the sketch below)
  • a bit of lens flare (this didn’t work very well)
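
In my case the distance-proportional sizing lives in the shader (full code linked below), but the same effect can be sketched on the JavaScript side; the scale factor and cap here are made-up aesthetic constants:

    const MAX_SCALE = 500;

    corona.onBeforeRender = (renderer, scene, camera) => {
      const dist = camera.position.distanceTo(corona.position);
      const s = Math.min(dist * 0.05, MAX_SCALE); // grow with distance, up to a cap
      corona.scale.set(s, s, s);
    };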

Full code here.  Lots of magic constants for aesthetics, like before.

Close to the target star, the corona is mostly occluded by the detail mesh:

[image: corona up close]

At a distance the corona remains visible:

[image: corona at a distance]

On a cooler (temperature) star:

[image: corona on a cooler star]

The corona mesh serves two purposes:

  • calculating intersections during raycasting (to enable targeting stars via mouseover and clicking)
  • star visibility

Using a custom shader to implement both of these use-cases let me cut the number of rendered three.js meshes in half; this is great, because rendering half as many objects means each frame renders roughly twice as quickly.
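
The raycasting half is the standard three.js pattern; a minimal sketch, assuming a coronaMeshes array with one corona mesh per star:

    const raycaster = new THREE.Raycaster();
    const mouse = new THREE.Vector2();

    window.addEventListener('mousemove', event => {
      // normalize mouse coordinates to [-1, 1]
      mouse.x = (event.clientX / window.innerWidth) * 2 - 1;
      mouse.y = -(event.clientY / window.innerHeight) * 2 + 1;

      // `camera` and `coronaMeshes` come from the scene setup
      raycaster.setFromCamera(mouse, camera);
      const hits = raycaster.intersectObjects(coronaMeshes);
      if (hits.length > 0) {
        // hits[0].object is the closest star's corona
      }
    });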

Conclusions

This shader is a pretty good first step, but I’d like to make a few improvements and additions when I have a chance:

  • Solar flares (and other 3D surface activity)
  • More accurate sunspot rendering (the size and frequency aren’t based on any real science)
  • Fix coronas to more accurately represent a star’s real visual magnitude — the most obvious ones here are the largest ones, not necessarily the brightest ones

My goal is to follow up this post with a couple of others about parts of this project I think turned out well, starting with the orbit controls (the logic for panning the camera around a fixed point while orbiting).

3D map of Solar Neighborhood using three.js (again!)

A few years ago I posted about a WebGL visualization of the neighborhood around our sun.  It was never as polished as I wanted, so on-and-off over the past few months I’ve been working on making it more interesting.  The project is still located here:

http://uncharted.bpodgursky.com/

The code is still hosted on GitHub:

https://github.com/bpodgursky/uncharted

There are two improvements I’m especially excited about.  First, the star rendering now uses GLSL shaders driven by the star’s temperature, giving cool (and animated!) visuals:

[image: Alpha Centauri]

Second, all known exoplanets (planets orbiting stars besides our Sun) are rendered around their parent stars.  The textures here are of course fake, but the orbits are accurate where the data is known:

[image: rocky exoplanet]

I’ve also included all the planets in our solar system, with full textures and (hopefully accurate) orbits:

[image: Mercury]

I’ve updated the README on the GitHub project with all the changes (I’ve also totally reworked the controls).

I’m going to try to write some more granular posts about the three.js and GLSL work that went into this, since I learned a ton in the process.


Catalog of Life Taxonomic Tree

A small visualization I’ve wanted to do for a while is a tree of life graph — visualizing all known species and their relationships.

Recently I found that the Catalog of Life project has a very accessible database of all known species / taxonomic groups and their relationships, available for download here.  This let me put together a simple site backed by their database, available here:

http://taxontree.bpodgursky.com/

[image: site screenshot]

All the source code is available on GitHub.

Design

I’ve used dagre + d3 on a number of other graph visualization projects, so dagre-d3 was the natural choice for the  visualization component.  The actual code required to do the graph building and rendering is pretty trivial.

The data fetching was a bit trickier.  Since pre-loading tens of millions of records was obviously unrealistic, I had to implement a graph class (BackedBiGraph) which lazily expands and collapses, using user-provided callbacks to fetch new data.  In this case, the callbacks were just ajax calls back to the server.
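
The shape of that class is roughly the following (a simplified sketch; the real BackedBiGraph also tracks edges in both directions and supports collapsing):

    // fetchChildren: function(id) returning a Promise of child node records
    function LazyGraph(fetchChildren) {
      this.fetchChildren = fetchChildren;
      this.nodes = {};
    }

    LazyGraph.prototype.expand = function (id) {
      var self = this;
      return this.fetchChildren(id).then(function (children) {
        children.forEach(function (c) { self.nodes[c.id] = c; });
        return children;
      });
    };

    // the callbacks are just ajax calls back to the server (hypothetical endpoint)
    var graph = new LazyGraph(function (id) {
      return fetch('/api/children/' + id).then(function (r) { return r.json(); });
    });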

The Catalog of Life database did not come with a Java client, so I thought this would be a good opportunity to use jOOQ to generate Java models and query builders corresponding to the COL database, since I am allergic to writing actual SQL queries.  This ended up working very well — configuring the jOOQ Maven plugin was simple, and the generated code made writing the queries trivial:

    private Collection<TaxonNodeInfo> taxonInfo(Condition condition) {
      return context.select()
          .from(TAXON)
          .leftOuterJoin(TAXON_NAME_ELEMENT)
          .on(TAXON.ID.equal(TAXON_NAME_ELEMENT.TAXON_ID))
          .leftOuterJoin(SCIENTIFIC_NAME_ELEMENT)
          .on(SCIENTIFIC_NAME_ELEMENT.ID.equal(TAXON_NAME_ELEMENT.SCIENTIFIC_NAME_ELEMENT_ID))
          .where(condition)
          .fetch()
          .stream()
          .map(record -> new TaxonNodeInfo(
              record.into(TAXON),
              record.into(TAXON_NAME_ELEMENT),
              record.into(SCIENTIFIC_NAME_ELEMENT)))
          .collect(Collectors.toList());
    }

All in all, there are still a lot of rough edges, but dagre, d3, and jOOQ made this a much easier project than expected.  The code is on GitHub, so suggestions, improvements, or bugfixes are always welcome.


D3 NLP Visualizer Update

A couple of years ago I put together a simple NLP parse tree visualizer demo which used d3 and the dagre layout library.  At the time, integrating dagre with d3 required a bit of manual work (or copy-paste); since then, however, the dagre-d3 library was split out of dagre, adding an easy API for adding and removing nodes.  Even though the library isn’t under active development, I think it’s still the most powerful pure-JS directed graph layout library out there.

An example from the wiki shows how simple the dagre-d3 API is for creating directed graphs, abbreviated here:

    // Create the input graph
    var g = new dagreD3.graphlib.Graph()
      .setGraph({})
      .setDefaultEdgeLabel(function() { return {}; });

    // Here we're setting nodeclass, which is used by our custom drawNodes function
    // below.
    g.setNode(0,  { label: "TOP", class: "type-TOP" });

    // ... snip

    // Set up edges, no special attributes.
    g.setEdge(3, 4);
    
    // ... snip
    // Create the renderer
    var render = new dagreD3.render();

    var svg = d3.select("svg"),
        svgGroup = svg.append("g");

    // Run the renderer. This is what draws the final graph.
    render(d3.select("svg g"), g);

Since that original demo site still gets a fair amount of traffic, I thought it would be good to update it to use dagre-d3 instead of the original hand-rolled bindings (along with other cleanup).  You can see the change set required here.

The other goal was to re-familiarize myself with d3 and dagre, since I have a couple projects in mind which would make heavy use of both.  Hopefully I’ll have something to post here in the next couple months.

3d Map of our Solar Neighborhood using three.js

A few months ago I stumbled on three.js, a library which exposes a simple WebGL interface.  I was really impressed at both the performance of WebGL and how easy three.js made building high-performance animations in the browser.

I thought this would be a good opportunity to put together a visualization I’ve been looking for for a while: a map of our solar neighborhood, showing all the closest stars to our own.  The dataset I used was the HYG database of nearby stars, which is a compilation of all stars within 50 parsecs.  I’ve cross-referenced stars against Wikipedia where available.  The site is available here:

http://uncharted.bpodgursky.com/

Screenshot:

[image: uncharted screenshot]

The whole project is open-source, hosted here.

The project isn’t nearly as polished as I would have liked (there’s a long to-do list on the GitHub page), but I’m trying to commit to releasing projects rather than letting them die silently.  I’m hoping to iterate on the remaining issues over the next few months.

Thoughts, suggestions, or contributions always welcome here or on the GitHub page.

Simple Boolean Expression Manipulation in Java

I’ve worked on a couple projects recently where I needed to be able to do some lightweight propositional expression manipulation in Java.  Specifically, I wanted to be able to:

  • Let a user input simple logical expressions, and parse them into Java data structures
  • Evaluate the truth of the statement given values for each variable
  • Incrementally update the expression as values are assigned to the variables
  • If the statement given some variable assignments is not definitively true or false, show which terms remain.
  • Perform basic simplification of redundant terms (full satisfiability is of course NP-hard, so this would only include basic simplification)

I couldn’t find a Java library which made this particularly easy, and a couple of Stack Overflow questions I found didn’t turn up any good solutions either, so I decided to take a shot at implementing a basic library.  The result is on GitHub as the jbool_expressions library.

(most of the rest of this is copied from the README, so feel free to read it there.)

Using the library, a basic propositional expression is built out of the types And, Or, Not, Variable and Literal. All of these extend the base type Expression.  An Expression can be built programmatically:

    Expression expr = And.of(
        Variable.of("A"),
        Variable.of("B"),
        Or.of(Variable.of("C"), Not.of(Variable.of("C"))));
    System.out.println(expr);

or by parsing a string:

    Expression expr =
        ExprParser.parse("( ( (! C) | C) & A & B)");
    System.out.println(expr);

The expression is the same either way:

    ((!C | C) & A & B)

We can do some basic simplification to eliminate the redundant terms:

    Expression simplified = RuleSet.simplify(expr);
    System.out.println(simplified);

to see that the redundant term (!C | C) simplifies to “true” and drops out:

    (A & B)

We can assign a value to one of the variables, and see the expression simplify further; here we assign “A” a value:

    Expression halfAssigned = RuleSet.assign(
        simplified,
        Collections.singletonMap("A", true)
    );
    System.out.println(halfAssigned);

We can see the remaining expression:

    B

If we assign a value to the remaining variable, we can see the expression evaluate to a literal:

    Expression resolved = RuleSet.assign(
        halfAssigned,
         Collections.singletonMap("B", true)
     );
    System.out.println(resolved);
    true

All expressions are immutable (we got a new expression back each time we performed an operation), so we can see that the original expression is unmodified:

    System.out.println(expr);
    ((!C | C) & A & B)

Expressions can also be converted to sum-of-products form:

    Expression nonStandard = PrefixParser.parse(
        "(* (+ A B) (+ C D))"
    );

    System.out.println(nonStandard);

    Expression sopForm = RuleSet.toSop(nonStandard);
    System.out.println(sopForm);
    ((A | B) & (C | D))
    ((A & C) | (A & D) | (B & C) | (B & D))

You can build the library yourself or grab it via Maven:


<dependency>
    <groupId>com.bpodgursky</groupId>
    <artifactId>jbool_expressions</artifactId>
    <version>1.3</version>
</dependency>

Happy to hear any feedback / bugs / improvements etc. I’d also be interested in hearing how other people have dealt with this problem, and if there are any better libraries out there.

Taxi Loading at SFO

I usually avoid catching taxis whenever possible, but when I arrived at SFO last week the trains were no longer running and I hadn’t arranged for a shuttle, so I ended up waiting in line for a taxi.  The line was structured something like this:

[image: taxi line diagram]

  • There was a loading area about four cars long where taxis were loading passengers
  • Would-be passengers waited in line along the curb to the left, waiting for a taxi
  • Likewise, taxis waited in line for passengers on the other side of the curb
  • As people loaded into taxis and departed, each line advanced to the right, matching the front of the taxi line with the front of the passenger line
  • An airport employee stood near the front of the line, shepherding people and cabs around to enforce this flow

Of course, this felt like an extremely inefficient system.  I was waiting next to a cab that was itself waiting for a passenger; had we been allowed, I would have just jumped into the cab next to me and we both would have been happier.  However, since the line of people was denser than the taxi line, I would have been cutting in front of other people in line.

In college I took a couple of classes where we learned about queuing algorithms and the standard trade-offs involved.  On the ride back I thought about how they applied to the taxi-loading situation here:

  • Throughput: how many passengers per hour could the system match to taxis?  This was not being optimized for, or I could have gotten into the taxi beside me.
  • Fairness: this was pretty clearly what was being optimized for: both the taxi line and the passenger line were processed in first-in-first-out (FIFO) order.
  • Average wait time: wait time was not being taken into account; if it were, passengers with less luggage (and therefore faster loading) would have been given priority over passengers with many bags.

A couple other issues were specific to this situation:

  • The matching process should not involve an inordinate amount of walking by prospective passengers (a passenger should never have to walk the entire length of the taxi queue to find a cab)
  • If cabs frequently have to pass other cabs to advance to the head of the queue, it increases the odds of an accident (or of someone getting run over while loading bags into the trunk)

I’d like to think that a better system exists (“there has to be a better way!”), even if it sacrifices some amount of fairness, since this system would clearly scale poorly if the airport were busier.

If anyone knows of airports/malls/etc. that do a better job, I’d be interested in knowing how they manage it.  I didn’t waste an enormous amount of time in line (~10 minutes), but if the line is on average 50 people long, that’s a huge amount of time being squandered over the course of a year.