Our family’s IVF cycle using whole-genome embryo screening

Comparing our cycles: PGT-A, then Orchid PGT-WGS

A few months ago I looked back at the PGT-A my wife and I received for our first IVF cycle in 2019, and found that low-quality PGT-A results had led our clinic to discard possibly viable embryos: 2 embryos with chaotic results, and 1 with extremely low-level mosaicism (here’s a “chaotic” example, you can read more here).

This year we decided to try for a third child (in the meantime we had unexpectedly, but happily, conceived a daughter unassisted). Given our challenges conceiving naturally and our advancing ages (we’re both 35), we decided to do another embryo creation cycle, to have embryos available for the future. We wanted to use Orchid’s Whole-Genome Preimplantation Genetic Sequencing (PGT-WGS) this time*, instead of the more common Preimplantation Genetic Testing for Aneuploidy (PGT-A).

Now that we’ve been through two IVF cycles — one guided by PGT-A, and one using PGT-WGS — I wanted to walk through what each cycle looked like for us, and highlight the places where PGT-WGS gave us more, or higher-quality, information than our 2019 PGT-A cycle:

The improved aneuploidy screening from PGT-WGS gave us confidence that our euploid embryos were euploid, and our aneuploid embryos were aneuploid — unlike during our first cycle.
PGT-WGS detected an embryo with whole-genome Uniparental Isodisomy (UPD), which was not only non-viable, potentially resulting in a molar pregnancy, but would have presented a health risk to my wife.
Monogenic screening on the ~1,200 genes on the Hereditary Cancer, Birth Defect, and Neurodevelopmental Disorder panels gave us a level of confidence that we had reduced the risk of severe genetic disease.
When selecting an embryo to transfer, genetic risk let us prioritize an embryo which had a lower risk for inflammatory bowel disease.

(I’m including minimally-redacted versions of the embryo data, aneuploidy plots, and PGT results from our cycles in this post. Please reach out if there’s additional information you’d like to see).

Retrieval cycle

Many IVF centers have strict policies on embryo transfers — for example, refusing to transfer any embryo with a monogenic finding. It’s important to pick a clinic and a physician comfortable with the information provided by PGT-WGS, who will work with you, as a patient, when deciding which embryo to transfer.

We chose a nearby Kindbody location for our embryo retrieval and transfer — the friends who recommended Kindbody had explicitly mentioned the patient-centric experience. Our local Kindbody location already worked with Orchid, and our physician was very familiar with the additional information available in a PGT-WGS report, so including Orchid in our cycle plan was easy.

We retrieved 13 eggs; 11 of those eggs were mature and successfully fertilized. 9 of those embryos developed enough to be biopsied; Kindbody performed a standard embryo biopsy and shipped the biopsies to Orchid for PGT-WGS testing.

We were happy with these results so far. If found to be chromosomally normal after PGT, the 4 “Excellent” embryos would have a 65% to 80% chance of resulting in a pregnancy. Even the two “Average” embryos would have over 50% odds of pregnancy.

A few weeks later, we got our PGT-WGS results, and 4 of the embryos were euploid, in line with expectations for our ages (in our age bracket, about 46% of embryos will be euploid).
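
As a rough sanity check on "in line with expectations", here is a minimal sketch that treats each of our 9 biopsied embryos as an independent draw with a 46% chance of being euploid. The binomial model and the independence assumption are mine, purely for illustration; they aren't part of the lab report.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N_BIOPSIED = 9    # embryos biopsied in our cycle
P_EUPLOID = 0.46  # approximate euploid rate quoted for our age bracket

expected_euploid = N_BIOPSIED * P_EUPLOID
p_exactly_four = binomial_pmf(4, N_BIOPSIED, P_EUPLOID)
p_at_least_four = sum(
    binomial_pmf(k, N_BIOPSIED, P_EUPLOID) for k in range(4, N_BIOPSIED + 1)
)

print(f"expected euploid embryos: {expected_euploid:.2f}")
print(f"P(exactly 4 euploid):     {p_exactly_four:.2f}")
print(f"P(4 or more euploid):     {p_at_least_four:.2f}")
```

Under these assumptions, 4 euploid embryos out of 9 sits right next to the expected count, which is all "in line with expectations" means here.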

Better PGT-A through PGT-WGS

I’ll break down these results, and how they compare to our previous PGT-A test.

4 of our embryos were euploid — 2 male, 2 female. 5 were aneuploid — 4 with large chromosome duplications or deletions, and one case of whole-genome UPD. I’ll talk more about that one below.

Keeping more embryos

After losing potentially viable embryos to noisy PGT-A in our first cycle, better aneuploidy screening was our driving motivation for using PGT-WGS — we wanted to be confident that the aneuploidy results were real.

As a reminder, the IVF clinic in our first PGT-A cycle had discarded 3 embryos with “aneuploid” results not strongly supported by the raw data: two embryos had been reported as “Chaotic” (likely noise from a poor-quality biopsy rather than true aneuploidy) and one as “Aneuploid” (when re-analyzed, a low-level mosaic).

The embryos called aneuploid in our PGT-WGS cycle, on the other hand, showed very clear gains and losses of chromosomes. As it happened, one embryo from each cycle had a Trisomy 21. Below, you can visually compare a “Chaotic” embryo from the first cycle (top), a Trisomy 21 embryo from our first PGT-A cycle (middle), and a Trisomy 21 embryo from our PGT-WGS cycle (bottom):

(A quick guide to interpreting these plots: PGT-A measures the number of copies of each section of your genome. Chromosomally normal humans have 2 copies of chromosomes 1-22 (one from the father, one from the mother). Male embryos have a single X chromosome and a single Y chromosome, while female embryos have 2 X chromosomes and 0 Y.

On the PGT-A plot, this means that chromosomes 1-22 should have points centered on the “2” line of the CN plot. Males will have a signal of “1” for the X chromosome and “1” for the Y chromosome, while females will have an X at “2” and little to no signal for the Y chromosome. Small variations above and below are due to difficulty in sequencing DNA from the 3-5 cells in an embryo biopsy.)
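
To make the plot-reading rules above concrete, here is a toy sketch that compares a rounded median copy-number signal per chromosome against what a normal male or female embryo should show. The function names and the example values are made up for illustration. A real PGT pipeline also has to handle mosaicism, segmental changes, and noise, none of which this attempts.

```python
from statistics import median


def call_copy_number(bin_values: list[float]) -> int:
    """Round the median per-bin copy-number estimate to the nearest integer."""
    return round(median(bin_values))


def expected_copy_number(chromosome: str, sex: str) -> int:
    """Copies expected in a chromosomally normal embryo of the given sex."""
    if chromosome == "X":
        return 2 if sex == "female" else 1
    if chromosome == "Y":
        return 0 if sex == "female" else 1
    return 2  # chromosomes 1-22


def flag_whole_chromosome_changes(
    bins_by_chromosome: dict[str, list[float]], sex: str
) -> dict[str, str]:
    """Return only the chromosomes whose called copy number is off."""
    flags = {}
    for chromosome, values in bins_by_chromosome.items():
        observed = call_copy_number(values)
        expected = expected_copy_number(chromosome, sex)
        if observed > expected:
            flags[chromosome] = "gain"
        elif observed < expected:
            flags[chromosome] = "loss"
    return flags


# Hypothetical bin values for a male embryo with an extra chromosome 21.
example_bins = {
    "1": [1.9, 2.1, 2.0, 2.05],
    "21": [2.9, 3.1, 3.0, 2.95],
    "X": [1.0, 1.1, 0.9],
    "Y": [1.0, 0.95, 1.05],
}
flags = flag_whole_chromosome_changes(example_bins, sex="male")
print(flags)
```

The noisier the bins, the less a rounded median can be trusted, which is roughly why a "Chaotic" plot is so hard to interpret.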

After requesting the raw data, we could visually see the higher resolution and more consistent genome coverage in the PGT-WGS results, and felt more confident we were discarding only embryos with true aneuploidy.

Detecting smaller aneuploidy

Of course, it was just as important that the embryos we did select for transfer were healthy. We wanted to use PGT-WGS to screen for clinically significant gains or losses of chromosomes that wouldn’t be caught by standard PGT-A.

PGT-WGS can detect chromosomal abnormalities smaller than standard PGT-A. Standard PGT-A will report segmental chromosome gains and losses down to about 10 million base pairs, while Orchid PGT-WGS detects gains and losses down to 400,000 base pairs in specific regions linked to disease (not all small chromosomal insertions or deletions cause problems in a child). These regions are included in Orchid’s Microduplications and deletions screening panel.

This screening is possible due to the higher resolution and more consistent coverage achievable by PGT-WGS. We can look at the raw data from the embryo we transferred during our PGT-A cycle (top), and a simplified PGT-A visualization from the embryo we transferred during our recent PGT-WGS cycle (bottom).

To be clear, there weren’t any issues to catch in the top embryo (our now-3-year-old son is perfectly healthy), but the extra screening lets us sleep a bit better, knowing we did our best to prevent microduplications and deletions which could result in autism, developmental delays, or failure to thrive in our child.

Preventing a molar pregnancy

Last, but potentially most importantly — an embryo was found to have genome-wide Uniparental isodisomy (UPD). Embryos with full UPD which successfully implant result in a molar pregnancy — a growth of irregular cells rather than a developed fetus.

A molar pregnancy is not viable. Most will miscarry early in pregnancy, but the ones that don’t miscarry will need to be terminated, often surgically. In rare cases, the molar cells may become cancerous if not fully removed. Even in the best case, a transfer which resulted in a molar pregnancy would have significantly delayed our family planning, a big setback at our age (we’re already 35), as women are asked to avoid pregnancy for 6 to 12 months after the termination of a molar pregnancy.

While PGT labs provide screening for women with a history of molar pregnancies, standard PGT-A would not detect UPD in a couple like us without a history of molar pregnancy. PGT-WGS, by performing whole-genome sequencing, can detect conditions which appear to be a euploid embryo during the PGT-A screening in many labs, such as Uniparental isodisomy and triploidy.

(To go a bit deeper: if you only look at chromosome counts, KC10, an embryo with UPD, looks very similar to KC9, a euploid female. Each embryo has two copies of each chromosome, as shown in the top plot in each figure.

However, we can overlay the variant-level information available via PGT-WGS to distinguish the two. Most variants in a viable euploid embryo will be inherited only from a single parent and show up at 50% frequency (ie, heterozygous). However, since KC10 has two copies of the same chromosome, you only see variants at 100% frequency (ie, homozygous)).
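
The heterozygous-versus-homozygous distinction can be sketched as a simple fraction. This is a toy illustration, not Orchid's method: the thresholds (25% to 75% counts as heterozygous, under 5% heterozygous sites looks like isodisomy) and the allele fractions below are arbitrary assumptions of mine.

```python
def heterozygous_fraction(
    allele_fractions: list[float], low: float = 0.25, high: float = 0.75
) -> float:
    """Share of variant sites whose alternate-allele fraction looks heterozygous."""
    if not allele_fractions:
        raise ValueError("need at least one variant site")
    het_sites = sum(1 for fraction in allele_fractions if low <= fraction <= high)
    return het_sites / len(allele_fractions)


def looks_like_isodisomy(
    allele_fractions: list[float], min_het_fraction: float = 0.05
) -> bool:
    """Flag a sample in which almost no variant site is heterozygous."""
    return heterozygous_fraction(allele_fractions) < min_het_fraction


# Made-up alternate-allele fractions at a handful of variant sites.
euploid_like = [0.50, 0.48, 1.00, 0.52, 0.47, 1.00, 0.55, 0.50]
upd_like = [1.00, 0.98, 1.00, 0.99, 1.00, 1.00, 0.97, 1.00]

print(heterozygous_fraction(euploid_like), looks_like_isodisomy(euploid_like))
print(heterozygous_fraction(upd_like), looks_like_isodisomy(upd_like))
```

Both samples would show two copies of every chromosome on a copy-number plot; only the variant-level view separates them.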

Past PGT-A — Screening for disease

Monogenic variants

PGT-WGS allows us to screen for single-gene disorders that PGT-A will not detect. Orchid’s PGT-WGS covers about 1,200 genes linked to serious disease across four screening panels: Neurodevelopmental disorders, Birth defects, Hereditary cancer, and ACMG secondary findings.

None of our embryos had monogenic findings:

This was a relief, but not a shock — about 3-4% of embryos have monogenic findings, so it was likely that PGT-WGS wouldn’t find any variants linked to inherited disease on our 4 euploid embryos.
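
For a back-of-the-envelope version of "a relief, but not a shock": if each embryo independently had a 3-4% chance of a monogenic finding, the odds that all 4 euploid embryos come back clean are easy to compute. Independence is a simplifying assumption on my part (sibling embryos share parents, so their findings are correlated in reality).

```python
def prob_no_findings(n_embryos: int, per_embryo_rate: float) -> float:
    """Chance that none of n embryos has a finding, assuming independence."""
    return (1 - per_embryo_rate) ** n_embryos


N_EUPLOID = 4  # euploid embryos from our PGT-WGS cycle

for rate in (0.03, 0.04):
    clean = prob_no_findings(N_EUPLOID, rate)
    print(f"per-embryo rate {rate:.0%}: P(no findings in {N_EUPLOID}) = {clean:.1%}")
```

Either way, the most likely outcome by a wide margin was the one we got.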

Genetic risk scores

While the whole-genome sequencing data captured during PGT-WGS is primarily used to detect pathogenic variants during the monogenic screening I just described, it also allows us to compute the genetic risk for each embryo, measuring their predisposition to 11 chronic diseases (many of which are adult-onset). This testing is optional, if parents feel comfortable with the extra information available.

The risk of each disease is presented as compared to the population average, and will look something like the below (these are the risks for KC4):

Comparing the risk of disease to the population average is critical when putting these risks into context. An embryo with a monogenic finding has an extremely high chance of inheriting a disease, and a couple with a monogenic finding may choose not to transfer such an embryo.

By contrast, an embryo with a high genetic risk score (also known as a polygenic risk score) may still have a relatively small absolute risk for a disease (the rightmost column). Genetic risk scores are a useful tool for prioritizing an embryo to transfer, in the absence of other information (especially when coupled with a family history of disease). While my wife and I would have been happy to transfer any of our viable embryos, these scores gave us a way to minimize the risks of specific diseases (more on this later).
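
A small sketch of why the absolute-risk column matters: the same relative increase means very different things depending on how common the disease is. The baseline and relative-risk numbers here are hypothetical, chosen only to show the arithmetic; they are not values from our report.

```python
def absolute_risk(population_risk: float, relative_risk: float) -> float:
    """Scale a population-average risk by a relative risk, capped at 100%."""
    if not 0.0 <= population_risk <= 1.0:
        raise ValueError("population_risk must be between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk cannot be negative")
    return min(1.0, population_risk * relative_risk)


# Hypothetical (baseline risk, relative risk) pairs.
examples = {
    "rare disease, doubled risk": (0.01, 2.0),
    "common disease, 20% higher risk": (0.10, 1.2),
}

for label, (baseline, relative) in examples.items():
    risk = absolute_risk(baseline, relative)
    added = risk - baseline
    print(f"{label}: {baseline:.1%} -> {risk:.1%} (+{added:.1%})")
```

In this made-up example, the scary-sounding "doubled risk" adds one percentage point, while the modest-sounding "20% higher" adds two.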

Transfer guided by PGT-A

In our PGT-A cycle, the decision about which embryo to transfer was pretty simple. In fact, for most couples, the consultation is pro-forma — your IVF physician will pick the embryo they want to transfer. For us, that was the highest quality euploid embryo, KC-6 (“good”):

Our fertility clinic suggested the highest-grade euploid embryo, KC-6, which we transferred successfully, and resulted in our first child, a healthy boy.

Transfer informed by PGT-WGS

In our PGT-WGS cycle, my wife and I had more information to work with.

If there wasn’t a downside, we’d always liked the idea of alternating boy/girl/boy/girl. Since we’d most recently had a daughter, we decided to look at KC-3 and KC-4, the two male embryos. Both embryos were grade 4AA — one of the highest embryo grades — so we had no reason to think that one would have a higher chance of successful transfer than the other. And since neither embryo had monogenic findings to consider, we looked at the genetic risk scores for the two embryos.

There are a few ways to visualize genetic risk scores, but in my opinion it’s easiest to interpret genetic risk when the risk is mapped to an estimate of the absolute disease risk, which is how Orchid presents it. We were able to compare the absolute disease risk for KC3 and KC4, the two male embryos, side-by-side:

Both sides of our family had a history of serious mental health issues (first-degree relatives with diagnosed Schizophrenia, and severe anxiety). If one embryo had had a meaningfully reduced risk of Schizophrenia or Bipolar disorder, we would have prioritized it for transfer, but in this case, the differences were negligible.

However, several other diseases had meaningful differences. KC3 had a markedly higher risk of prostate cancer, while KC4 was at higher risk for Inflammatory Bowel Disease and Celiac Disease — autoimmune disorders of the digestive system.

My wife and I took a couple days to discuss how we felt about this tradeoff. Prostate cancer is a real disease, but it’s a late-onset disease with an extremely low mortality rate, given appropriate preventative screening — easy to arrange, especially if a child knows they have a predisposition towards the disease. On the other hand, our daughter had struggled with hard-to-diagnose stomach pain and slow weight gain for the first year after she started solid food. While her symptoms had mostly resolved, IBD and Celiac are frequently lifelong afflictions and can’t be easily treated (so far, anyway).

We both agreed we’d prefer to put our finger on the scale and reduce the risk of infant digestive problems by transferring KC3 first. We let Kindbody know about our decision, and scheduled a transfer for a couple weeks later.

Looking forward

We were optimistic about our chance of a successful transfer, given our PGT-WGS screening, a good embryo grade, and a history of successful transfers — but that didn’t make the wait any less anxious. But a few weeks later, we got a positive test, and a clean ultrasound a few weeks after that — the transfer had been successful.

PGT-WGS, and especially genetic risk scores, can’t guarantee a healthy child. But by screening for aneuploidy at higher resolution than standard PGT-A, screening for monogenic findings, and finally prioritizing based on genetic risk, we were happy we had done what we could to reduce the risk of preventable disease. Just as importantly, PGT-WGS helped us reduce the risk of discarding viable embryos and reduce the risk of pregnancy complications for my wife.

To wrap things up, I want to really highlight the points where PGT-WGS gave us information we would not have gotten otherwise:

PGT-WGS allowed us to avoid transferring an embryo which would, if transferred successfully, have resulted in a molar pregnancy: a significant health risk to my wife, and at minimum a big setback in our family planning goals (6-12 month delay).
Whole-genome screening gave us confidence that our euploid embryos were really euploid, and our aneuploid embryos were really aneuploid. After discarding 3 potentially viable embryos in our first cycle due to poor PGT-A, this was important to us.
Monogenic screening reduced the risk of rare diseases linked to serious health problems.
Prioritizing embryos based on genetic risk is usually debated in the abstract, rather than in the context of a real IVF cycle. Like real families, my wife and I used genetic risk as a prioritization tool, not a filter. It’s very possible we’ll end up transferring all the euploid embryos from this cycle, and have no qualms doing so!

The last thing I want to mention is that the Kindbody patient experience is the best of any IVF center we’ve worked with (long story, but this is the fourth). IVF is a stressful process, and even minor mixups in communication, ambiguity around timelines, or delay filling a prescription can be frustrating. But at Kindbody we had 0 issues getting an immediate response from our support staff when we had problems.

This is already long, so I’ll leave it here. I’m happy to answer any other questions about our experience in either cycle; please feel free to reach out, either here or by email. We’ve learned a ton through this process and nothing makes me happier than paying it forward.

* I’m writing this up from my and my wife’s experience as patients, but I lead the engineering team at Orchid. I’ve also published a copy of this on guides.orchidhealth.com.

I think it’s OK to want your children to be healthy even if the world falls apart

Embryo screening is a lightning rod for constructive criticism, most recently in the context of Orchid’s embryo screening for predisposition to severe mental health disease, namely Schizophrenia (Orchid screens for both monogenic and polygenic disease)^[1].

I want to talk about one quote which specifically bothers me:

To me, it’s not.

The point of studying the genetics of disease is so children born in 2024 can be healthier, happier, and live longer than I will. Modern medicine is an incredibly laudable enterprise which has saved and will continue to save millions of lives a year, even in a world where every embryo is screened for genetic disease. But disease prevention via embryo screening isn’t comparable to pharmaceuticals, and as a parent I think it’s crazy to equate the two ^[2].

Preventing a disease by selecting an embryo with lower genetic predisposition to disease, or which is free of a monogenic disorder, means that if you succeed ^[3], your child doesn’t have the disease. Successfully treating a disease via therapeutics means your child is free of disease as long as the medical system and pharmaceutical industry are functioning at high efficiency.

And I mean, it mostly has. If I was writing this post in 2019, there was a case to be zealously bullish on globalism, to believe we had passed the event horizon of interconnectedness and international trade would grow stronger until it sustained itself completely independent of the mundane politics of disease and fuel prices and cargo ships being blown up by ballistic missiles, but the pandemic cracked that fantasy and anyone who tried to buy toilet paper or cars in 2020 knows that supply chains work until they don’t.

But it’s obvious now that international trade is fragile when actual history happens. That may be a reasonable tradeoff; if a global pandemic cuts car imports in half, it’s inconvenient but you drive to Walmart in a 1995 Sonoma instead of a 2023 Kia to buy domestic wonderbread and somehow muddle through.

But if a global pandemic cuts insulin production in half, your diabetic child dies.

Medicine being unavailable for good, bad, or surprise reasons is not a hypothetical thing, it’s something that all normal people on prescriptions know and are terrified of:

Diabetes is one of the best-understood and managed chronic diseases, but that management (esp for Type 1 diabetes) requires insulin. When the pandemic started and supply chains collapsed, millions of people were afraid that they would be unable to refill prescriptions.

ADHD is an extremely well-understood disease, whose management requires medication. For more than a year, there has been a critical shortage of Adderall manufacturing, and nobody is close to figuring out why, much less fixing it.

When hydroxychloroquine supplies got shanghaied by mobs using it as a moonshine COVID-19 therapeutic, people who genuinely needed it to treat arthritis got completely shafted by supply shortages.

An increasing % of US pharmaceuticals are either imported directly from China or rely on imported ingredients. Global trade is fragile, and there are a thousand possible conflicts even short of a hot war that would kneecap this supply. Cheap consumer imports are easily substitutable, drugs are not.

Sometimes drugs just disappear! It might be regulatory or because your disease is rare and unprofitable or maybe they lost a patent case. It’s unlikely that drugs for major, common diseases disappear altogether, but if you have a specific rare disease, good luck!

The good news? We have a pharmaceutical for severe allergies, Epinephrine! The bad news? EpiPens that used to go for $100 are now $600! And they expire every year! Deal with it!

We live in a hyperconnected world, we’re obviously not walking that back, even the most intense offgrid survivalists can maybe set themselves up for a decade of self-sufficiency before their solar panels degrade, water pumps fail and they have to reconnect with civilization or die of dysentery. I get it.

But I can’t put myself in the mindset of someone who wants to bind their children to the good grace and fortunes of the multinational pharmaceutical-industrial complex. For daily medication! The most charitable way to frame this statement is to say it comes from someone who has never been let down by the medical system, and has confidence that the engine will spit out pills at nominal capacity for the rest of our natural lives.

And maybe it will! But I think I speak for the vast majority of parents when I say I would rather not bet my children on it.

Appendix:

The more mundane criticism here is, couples have been using monogenic preimplantation testing (PGT-M) for decades, and nobody is telling parents they are immoral to screen their children for deafness or Huntington’s disease. Why is it ethically repugnant to do the same for mental health? I genuinely have trouble playing devil’s advocate on this point. The only possibilities I can think of:

Schizophrenia isn’t a “real” disease to the critics. Maybe they are more open to screening for cancer and heart disease predisposition? I doubt this, because this line of criticism comes from mental health researchers!
Lack of confidence in the strength of the datasets being used to construct Schizophrenia predisposition scores. But if you think that, say it! That’s not an indictment of the ethics, it’s a criticism of the data.

What am I missing here?

[1] I lead engineering at Orchid

[2] Gene therapy is the obvious exception here; gene therapy is incredibly promising and I hope it takes off. However it’s hard to imagine gene therapy in the near-term moving the needle on rare disease much less polygenic disease.

[3] Polygenic predispositions are of course a thousand miles away from a guarantee. The science improves each year, but for now you’re just weighting the dice.

Jurassic Park is a bad bioengineering parable but a great AI alignment allegory

Jurassic Park is a vividly entertaining movie, but few people watch it for the chaos theory. At most, watchers come away with a first-order “bioengineering is dangerous and we shouldn’t play God” message. That isn’t completely unintended, but it is a very shallow version of the lesson Crichton wrote into the (more thoughtful, but less flashy) book of the same name, which is about inevitable, unpredictable failures in complex control systems.

(the only place this gets airtime in the movie is the scene where Malcolm flirts with Ellie in the car, rolling water droplets off her hand, wherein the chaos theory is overshadowed by Malcolm’s slightly scandalous horndog antics)

But long story short, the core of the story is a blow-by-blow illustration of the debate between the billionaire Hammond and the mathematician Ian Malcolm, and if you abstract away the dinosaur flesh and cut down to the bones, you get this conversation:

Hammond: “We have created a novel complex system of living agents, and we can harness it to do immense good for the world!”

Malcolm: “But you can’t possibly control a system you don’t fully understand!”

Hammond: “Nonsense, we bought state-of-the-art technology, hired the best engineers, and haven’t had any serious accidents!”

Malcolm: “You can’t control this system because you think of these agents as your playthings, but they think of themselves as agents, and their goal is to survive. Without understanding the full system, it’s hubris to guess how or when it will fail, but I’m telling you it will fail.”

(and of course, two hours or two hundred pages later, Hammond admits Malcolm is right, the dinosaurs escape, and a lot of people get eaten).

It’s hard to take this seriously as a parable about existential risk, because at the end of the day you can make a T-Rex really scary but it’s hard to shake the feeling that a couple tactical bazookas would bring the T-Rex back into containment (and even in-universe they have to fudge this heavily with an inconvenient hurricane and evacuations, although I suppose this is one of the “unpredictable failure” modes)

But as Matt Yglesias has pointed out recently, we actually kind of suck at writing approachable stories about existential risk, and in particular about AI risk. I want to make the case here that Jurassic Park is actually best viewed, as of 2022, as a story about how hard it is to align a superintelligent agent with human utility, and I think there are some concrete parallels to AI alignment, at least as it exists today:

Systems which operate as agents eventually optimize their own objective function, not yours

In the story, “dinosaur safety” researchers built in not one but two failsafes to ensure containment:

All dinosaurs were deficient in the amino acid Lysine
All the dinosaurs were female

The dinosaurs of course, did not even realize they were supposed to be contained by these failsafes, and responded by (unpredictably) converting from female to male like a frog, and by eating lysine-rich foods.

In the generic version of that conversation above, you just end up describing the problem of AI alignment as is normally framed: AI when built as a tool (ex, translators, image detection, protein folding) is likely safe from dangerous outcomes (to the extent that people don’t use it to say, design novel pathogens), but as soon as you turn that system into an agent with goals, it becomes extremely difficult to keep the agent optimizing for human-oriented goals.

The classic parallel here is obviously the paperclip maximizer scenario: a friendly agent whose goal is to make as many paperclips as possible (for humans!), but which decides that the most effective way to maximize the paperclip count is to first use all atoms in the observable universe to replicate itself, and consequently converts all humans into paperclip-maximizer motor oil.

The AI fails containment not even through maliciousness or deviance (which is a whole separate problem) but by treating its own failsafe as an obstacle to be overcome; because of course it has no inbuilt reason to respect the spirit of the law, or even a concept of what that means to us.

The system designers honestly tried to be responsible about containment, but stopping (or even slowing down) was not an acceptable outcome

Robert Muldoon was the experienced game warden brought into Jurassic Park by Hammond to ensure the safety of the park. Muldoon’s advice about how to contain the raptors (paraphrased)?

Muldoon: “We should kill them all”

Hammond: “lolno, we’re not shutting the park down”

AI safety is taken seriously by all the big players right now, but similarly has a “yes, and…” mandate. If the DeepMind alignment team’s conclusion was “we can’t trust that any models with over 10 billion parameters are safe to release in a public-facing product”, Google is going to hire a new safety team.

Early warnings where the agents cross tripwires and cause real harm are probably just going to get brushed under the carpet by lawyers and ethics committees

The Nedry-instigated power loss and T-Rex escape wasn’t the first sign that Jurassic Park’s containment was fallible; the opening hook to both the book and movie is an animal handler’s death, and the characters were flown in as an oversight committee of sorts. But the goal wasn’t an honest investigation; the goal was to put a plausible stamp of approval on the operation by a crew of friendly faces.

Would DeepMind’s Ethics Board have the independence and freedom to shut down a model which seemed prone to cross the line from “aligned AI” to “unaligned AGI”? Well…

Even perfect technological safeguards fail to (inevitable) human defection

Although the dinosaurs in Jurassic Park were well on their way to escaping containment on their own (via sex-changes and Lysine-heavy diets), the catastrophic physical containment failure wasn’t technological; it came when Nedry shut down the electric fences to steal a vial of embryos and sell them on the black market!

This one is straightforward; it doesn’t matter how responsible your AI alignment oversight committee is, if one of your engineers decides to steal and sell a dangerous model to a Russian crime syndicate for a few million crypto shitcoins.

These agent-systems are built without significant popular or even regulatory input

Jurassic Park was built in secret. This stretches narrative suspension of disbelief, but it’s a conceit we accept. And from Hammond’s POV, it’s a surprise he’ll offer the world (although in practice any secrecy was mostly to deflect corporate competitors).

Modern AI isn’t a technical secret to the general populace (Google may even claim to try to inform the public about the benefits of their AI assistant technology), but functionally the general public has no concept of how close or far we are from an AGI which will, for better or for worse, upend their place in the world.

At the end of the day, a new world where these freely-operating agents have completely escaped their control systems is presented as a fait accompli to the general public

Jurassic Park III (I admit, an overall franchise-milker of a film) ends with a cinematic shot of Pteranodons flying off into the sunset, presumably to find new nesting grounds. This is a romantic vision, undercut by the fact that those Pteranodons very recently tried to eat the film’s main characters, and presumably the humans who live in those nesting grounds will have no veto power over this new arrangement.

And likewise, AGI — or even scoped AI — absent dramatic regulatory change, is going to be presented as a fait accompli to the general public.

What’s my point?

I don’t know if the lesson here is that some enterprising cinematographer can reskin Jurassic Park as AGI Park to get Gen Z interested in AGI risk, or if there are more bite-sized lessons about how to make hard-to-grok theoretical risks approachable.

But I do feel we’re missing opportunities to (lightly) inform the public when the contemporary cinematic treatment of technology which is about to turn the world upside down looks like uh, whatever Johnny Depp is doing here, and that really feels like an own-goal for the species (and maybe all sentient organic life if things go really off the rails).

Electoral Savings Accounts: One Person, One Vote (but you can save it for later)

The conversation around “high information” vs “low information” voters hypothesizes a world where voters lie somewhere on a spectrum of “well-informed” to “uninformed”:

Those who fret about “low-information voters” dislike that low-information and high-information voters all count the same at the ballot box, feeling this dilutes the opinions of those who are well-informed about the issues.

Segmenting voters into “high” and “low” information buckets oversimplifies, however, by dropping a dimension — time. A particular voter’s informedness and enthusiasm (we’ll treat them the same, as a first-order approximation) vary widely over time and life circumstances:

The graph is different for everyone, but in this example, our voter partied through college, became more politically engaged after getting their first job, disengaged when their life got busy, and re-engaged after their kids went to college.

(Of course, we could break this down further into individual issues a voter cares about. Interest in taxation, education, and environmentalism etc wax and wane with personal circumstances)

How does enthusiasm translate into voting patterns? Our voter’s voting history may look something like this:

There’s clearly a correlation between enthusiasm and votes cast; in years where the voter is completely disengaged, they will likely not bother to vote at all. But in all but the most-disengaged years, the voter will cast the legal maximum of 1 vote.

But if we could design an optimized democracy, is this how we would structure representation? Probably not. In a theoretically optimized democracy, voters would cast a number of votes corresponding to how informed and enthusiastic they were about the issues at hand:

Is that possible?

Saving votes for later

We can’t just ask voters how informed they are (literacy tests at the polls have a sordid history) or how enthusiastic they are (there’s no advantage in being honest). And any system of buying & selling votes, to rebalance between people, is prone to corruption and disenfranchisement.

But we could let voters cast multiple votes by saving votes, and borrowing future votes, from themselves:

When a voter chooses not to vote, the vote is “saved”, and they are free to use it in the future.
A voter can “borrow” votes from up to 10 years in their future.
When casting a vote, a voter can spend as many votes as they have available for a race, either dipping into their bank or borrowing from their future.

This allows our example voter to cast votes that match their enthusiasm curve, with a bit of saving and borrowing:

In college, our voter was busy partying, and didn’t really care about the issues. So they didn’t feel any pressure to vote, but there’s no disenfranchisement, because the votes are saved for later.

Once they sobered up and got a job, they really cared about the issues. Maybe about taxes, or climate change, or both. Not only did they spend the votes they’d banked from college, but they borrowed from elections into their 30s.

In middle age, they again disengaged, but again without any permanent disenfranchisement, because once their kids went to college, they were free to spend down their banked votes (or just keep saving them up).

Electoral Savings Accounts

The details of borrowing and saving votes sound complicated, but the implementation is actually pretty simple:

When a voter turns 18, they start with an Electoral Savings Account (ESA) of 10 votes (i.e., allowing them to “borrow” ten years into the future).
Each election, the voter’s ESA increases by one.
In each election, for each race, voters may spend anywhere between none and all of the votes in their ESA balance.
Voters hold separate ESAs for each elected position; a voter can vote for school board candidates while abstaining from the presidency, or vice versa.
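
The four rules above are small enough to sketch as a toy ledger. This is only an illustration, not a proposal for real election software: the class and method names are made up for the example, and it assumes that each election's new vote is credited before the voter spends (the rules don't specify the ordering).

```python
class ElectoralSavingsAccount:
    """One voter's vote balance for one elected position (illustrative model)."""

    STARTING_BALANCE = 10  # the ten "borrowable" future votes granted at age 18

    def __init__(self):
        self.balance = self.STARTING_BALANCE

    def accrue(self):
        """Each election adds one vote to the account."""
        self.balance += 1

    def spend(self, votes):
        """Cast anywhere from zero votes up to the whole balance on one race."""
        if not 0 <= votes <= self.balance:
            raise ValueError(f"can spend 0..{self.balance} votes, not {votes}")
        self.balance -= votes
        return votes


class Voter:
    """Holds a separate ESA for each elected position."""

    def __init__(self, positions):
        self.accounts = {position: ElectoralSavingsAccount() for position in positions}

    def hold_election(self, ballots):
        """ballots maps position -> votes to cast; unlisted positions are abstentions."""
        cast = {}
        for position, account in self.accounts.items():
            account.accrue()  # assumption: this election's vote is credited before spending
            cast[position] = account.spend(ballots.get(position, 0))
        return cast


voter = Voter(["president", "school board"])
voter.hold_election({"school board": 3})  # abstain on president, spend 3 on school board
print(voter.accounts["president"].balance)     # 11
print(voter.accounts["school board"].balance)  # 8
```

Note that "borrowing" never needs its own bookkeeping in this model: the starting balance of 10 is the loan, and the account simply cannot go below zero.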

The last point, when coupled with the fact that people move to new jurisdictions, requires some inter-jurisdictional coordination, to categorize Seattle’s city council election in the same “bucket” as New York’s city council election. But for the most part, the important races have direct analogues in other cities across the country, and it would not be challenging to build a national mapping from one elected position into a known bucket.

Why are ESAs good for democracy?

The headline reason for ESAs is that they align votes with the times voters are most interested, but there are other reasons they would incentivize a healthy democracy:

ESAs reward honesty by political parties (and punish dishonesty)

It is common for political parties, in their public stances and media advertisements, to frame every race in every election as a highest-priority, life-or-death issue (“the most important election of your lifetime”). Currently, there is no downside to doing so, because angry voters are good for fundraising.

But under ESAs, a party which hypes up the importance of a non-critical election risks misleading its voters into wasting their entire ESA balance on an unimportant race. On the other hand, a party which rightly acknowledges that its opponent is a boring centrist lets its voters save up a war chest of votes to spend on a later, more important election.

“Voting against everyone” isn’t self-disenfranchisement

Currently, when party primaries produce two terrible candidates, centrist voters are left with two unappealing options:

Vote for the slightly lesser evil
Vote for nobody

The second option — spoiling a ballot, or just not showing up — is usually unappealing because it amounts to self-disenfranchisement, and sends no clear message to the candidates.

But if an abstained vote goes directly into your ESA, there’s a great reason to skip an election to punish a slate of bad candidates — you can spend the vote later, on a candidate you actually like.

Check on tyranny-by-majority

Even districts which consistently vote with 45% – 55% splits in a First Past The Post (FPTP) system are considered non-competitive districts, because the party with 55% of the vote almost always wins. This is a bad deal for the 45% of the population who, despite casting 45% of the votes, receive 0% of the representation.

ESAs increase the representation of the minority by allowing them to use their votes “when it matters”. Instead of constantly throwing their votes away on 45-55% elections, the minority party can save up votes to flip the election when an especially viable candidate is on the ballot:

(here, abstaining from voting, in the years with light green, and double-voting in the years with light red)
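
The arithmetic behind this is simple enough to sketch. The simulation below is a deliberately simplified model: `simulate_district` is a hypothetical helper, it ignores the initial 10-vote allotment, and it assumes every majority voter casts exactly one vote per election (in reality the majority could save votes too).

```python
def simulate_district(minority_voters, majority_voters, minority_plan):
    """Return the winner of each election under a per-voter spending plan.

    minority_plan lists how many votes each minority voter casts per election.
    Simplifying assumption: each majority voter always casts exactly 1 vote.
    """
    winners = []
    bank = 0  # votes saved per minority voter
    for spend in minority_plan:
        bank += 1  # every election adds one vote to each ESA
        if spend > bank:
            raise ValueError("plan spends more votes than are banked")
        bank -= spend
        minority_total = minority_voters * spend
        majority_total = majority_voters * 1
        winners.append("minority" if minority_total > majority_total else "majority")
    return winners


# Voting every time: the 45% bloc loses every election.
print(simulate_district(45, 55, [1, 1, 1, 1]))
# ['majority', 'majority', 'majority', 'majority']

# Abstaining, then double-voting: 90 votes beat 55 every other election.
print(simulate_district(45, 55, [0, 2, 0, 2]))
# ['majority', 'minority', 'majority', 'minority']
```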

Disincentivize very polarizing candidates

When a candidate is running against voters who have a substantial ESA pool available, it does not pay to be antagonistic or polarizing. In a FPTP system — what we have now — winning 51% of the vote is Good Enough, and it is often good tactics to make the remaining 49% hate you. But this is bad for society overall.

The problem is that in a FPTP system, “49% of the population who hate your guts” is electorally indistinguishable from “49% of the population that mildly dislikes you”. Being very, very angry doesn’t matter. But if the very angry minority has an ESA balance to spend, they can punish specifically infuriating candidates with electoral upsets.

What’s more, in an electorally efficient system — where candidates and voters behave rationally — this threat of upset votes via ESA spending is enough to motivate inclusivity (or at least a lack of outright antagonism). And then when the ESAs aren’t actually spent, they’ll continue to motivate inclusivity in the next election, and so forth.

It’s healthy to not care about politics for a few years

High-intensity interest in politics is not good for mental health. Allowing voters to check out of politics for a few years, without sacrificing their representation relative to those who stay involved, lets them optimize for their own wellbeing.

Summary

ESAs are a simple mechanism, but would fundamentally change the dynamics between voters and elections for the better. Instead of making votes a use-it-or-lose-it opportunity (which cuts out voters who don’t have the time or energy to research and vote for a whole slate of candidates), ESAs trust voters with a resource they can spend when and where they please.

Given that the fundamental premise of democracy is that we do trust the people, it seems likely we could make democracy even more robust by trusting voters to cast ballots not just if, but when they see fit.

Appendix: Variations on ESAs

ESAs as proposed above are “as simple as possible”, but there may be opportunities for refinement at the margins, at the cost of higher complexity:

Expire banked votes

ESA votes as described compound neither positively nor negatively over time; 1 vote saved in 1980 can be spent as 1 vote in 2025.

An (IMO unlikely) but possible scenario is that vote-hoarding becomes a destabilizing problem, because voters regularly procrastinate instead of casting votes. A gentle way to nudge voters into voting sooner rather than later would be to cap the number of years a vote can be banked: for example, a vote not spent within 10 years would expire.

Age-cap borrowed votes

By giving voters a 10-year window of future votes to borrow (that is, by initializing their ESA with 10 votes), we’ve likely inflated the total vote supply. This is because at the end of a person’s life, they are likely to have spent down their pool of votes, borrowing votes from years they are not alive.

This isn’t catastrophic, but if we wanted to re-normalize the total vote count, ESAs could stop accumulating votes for a corresponding decade, for example between the ages of 60 and 70 (since the majority of voters will make it to age 70).

Spend votes fungibly across races

If we are allowing voters to spend their votes fungibly across time — because their enthusiasm waxes and wanes over time — a natural extension is to allow voters to spend their votes fungibly across elected positions, in alignment with their enthusiasm.

There are complications in this system (should a vote for a municipal sewage administrator be equivalent to a vote for a senator?), but this is essentially the problem quadratic voting is designed to address, and quadratic voting is fully compatible with ESAs.

Legislative Performance Futures — Incentivize Good Laws by Monetizing the Verdict of History

There are net-positive legislative policies which legislators won’t enact, because they only help people in the medium to far future. For example:

Climate change policy
Infrastructure investments and mass-transit projects
Debt control and social security reform
Child tax credits

On the (infrequent) occasions when reforms on these issues are legislated, they are passed not because of the value provided to future generations, but because of the immediate benefit to voters today:

Infrastructure investment goes to “shovel ready” projects, with an emphasis on short-term job creation, even when the prime benefit is to future GDP. For example, dams constructed in the 1930s (the Hoover Dam, the TVA) provide immense value today, but the projects only happened in order to create tens of thousands of jobs.

Climate change legislation is usually weakly directed. Instead of policies which bring significant long-term benefits at short-term cost (e.g., carbon taxes), “green legislation” aims to create green jobs and incentivize rooftop solar (reducing power bills today).

(Small) child tax credits are passed to help parents today, even though the vastly larger benefit accrues to children who exist because the marginal extra cash helped their parents afford an extra child.

On the other hand, reforms which provide no benefit to today’s voter do not happen; this is why the upcoming Social Security Trust Fund shortfall will likely not be fixed until benefits are reduced and voters are directly impacted.

The issue is that while the future reaps the benefits or failures of today’s laws, people of the future cannot vote in today’s elections. In fact, in almost no circumstances does the future have any ability to meaningfully reward or punish past lawmakers; there are debates today about whether to remove statues and rename buildings dedicated to those on the wrong side of history, actions which even proponents acknowledge as entirely symbolic.

But while the future cannot vote today, financial instruments exist to reward those who made wise choices yesterday — stocks.

Legislative Performance Futures

Those who bet in 1990 that Apple would be a winner have been massively rewarded. But politicians who bet in 1920 that segregation was Very Bad, or in the 1950s that the Red Scare was Very Bad, or in the 1980s that nuclear proliferation was Very Bad, suffered electorally for their stances, and in recompense get only faint praise from history professors.

Though we cannot electorally incentivize forward-thinking politicians, we can monetarily incentivize them. Specifically, by monetizing the future public opinion of today’s legislators, we can provide lawmakers with an incentive to pass laws for which history judges them kindly.

We can call this instrument a Legislative Performance Future (LPF) and it would work something like this:

In lieu of direct compensation, legislators receive LPFs, or shares, on their future job evaluations, which will be paid out 40 years from the date of issue. For example, a 2020 Arkansas congressman on entering office will be granted 100 LPFs, in his or her name, maturing in 2070.

Each year, voters, in addition to electing current representation, “vote” among the representatives who served exactly 40 years ago.

A fixed fraction of GDP (0.01% or so, in aggregate corresponding to very generous salaries) is paid out in proportion to the above retrospective votes, to the holders of the corresponding LPFs.

LPFs are fully fungible, and can be inherited or sold like any other financial or physical asset. Because legislators need money to live their lives, it is expected that they will immediately liquidate many (or most) of their shares into cash.
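
As a rough sketch of how the payout step could be computed: the function below splits the pool among legislators in proportion to their retrospective votes, then splits each legislator's slice among whoever currently holds that legislator's LPFs. The function name, the data layout, and the per-share split are assumptions made for illustration; the mechanism above only specifies "proportional to votes".

```python
def lpf_payouts(retrospective_votes, holdings, gdp, pool_fraction=0.0001):
    """Distribute a fixed fraction of GDP to LPF holders.

    retrospective_votes: legislator -> votes received in the 40-years-later ballot
    holdings: legislator -> {holder: number of that legislator's LPFs held}
    pool_fraction: 0.0001 is the "0.01% of GDP or so" figure
    """
    pool = gdp * pool_fraction
    total_votes = sum(retrospective_votes.values())
    payouts = {}
    if total_votes == 0:
        return payouts  # nobody voted, so nothing is distributed in this sketch
    for legislator, votes in retrospective_votes.items():
        legislator_pool = pool * votes / total_votes
        shares = holdings[legislator]
        total_shares = sum(shares.values())
        for holder, count in shares.items():
            # Assumption: each LPF of a given legislator pays the same amount.
            payout = legislator_pool * count / total_shares
            payouts[holder] = payouts.get(holder, 0.0) + payout
    return payouts


# Toy numbers: legislator A sold 80 of 100 LPFs to a fund; B kept all 100.
votes = {"A": 75, "B": 25}
holdings = {"A": {"A": 20, "fund": 80}, "B": {"B": 100}}
for holder, amount in lpf_payouts(votes, holdings, gdp=1_000_000).items():
    print(holder, round(amount, 2))
```

Because the split is proportional rather than winner-take-all, a legislator with a quarter of the retrospective votes still funds a quarter of the pool for their LPF holders.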

In simple terms, those who enacted legislation for which history thanks them (such as establishing the EPA) are rewarded. Those who passed laws which history scorns (Jim Crow laws, internment camps) are not.

This system might work as-is, with legislators blindly voting their consciences and hoping that history someday rewards them. But this feedback loop is long. Luckily, by making LPFs tradable assets, we can do much better.

Markets Predict

Because legislators need money to live (those who don’t get book deals), and LPFs are monetized commodities, we expect that most legislators will sell (most of) their LPFs at market value.

But what’s the market value of an LPF?

The value of an LPF today is set not by the future, but by the markets of today predicting what the future will think. It is not a leap to believe that markets will predict future sentiment more accurately than today’s legislators can predict their own future reputations.

Today, when a legislator proposes major legislation, their staff closely monitors public opinion polls to gauge public sentiment. The polls move up and down almost immediately (but in a manner only loosely correlated with likely future sentiments, because the voters of today vote primarily with their wallets and emotions).

But the buyers and sellers of LPFs are incentivized only to be correct. They may hold personal stances on the issues, but have every reason to set those stances aside and let homo economicus perform brokerage transactions. This produces good predictions.

(a personal example: I am not a vegetarian. I enjoy eating meat, and would not vote to outlaw meat. But I recognize that the tides of history are clearly against animal farming and consumption, and history will consider me in the wrong. So if a legislator in my district proposed banning pig farming, I would vote against them — but I would immediately buy into their LPF.)

Thus, LPFs change this dynamic. Because a legislator’s LPFs trade on the open market, their value will move immediately when new legislation is proposed — legislators don’t have to wait to reap the verdict of history; they can immediately reap the verdict of projected history by proposing (and passing) laws which are primarily good only for the next generation, and then selling their LPFs at a now-higher price.

LPFs will not help politicians win elections. But to the extent that politicians are corrupt and money-grubbing humans, LPFs will align that greed with the desire to pass good, forward-thinking, laws.

Implementation Questions

How will voters even know what politicians of 40-years-ago stood for (without extensive research)?

Easy — we just re-print the voter information guides they used in their own elections. LPFs incentivize politicians to be very explicit about their policy positions in their candidate statements, in order to stand out during the LPF value-evaluation vote.

Will voters even bother voting in LPF evaluations?

Many won’t. But unlike in elections today (where the stakes are actually pretty high, even in “minor” races) there’s relatively little impact if only high-information voters bother to cast votes. Money is redistributed, but the only way the system breaks is if voting is so random that politicians today lose faith in the connection between performance and payouts.

Likewise, there is (almost) no incentive for voters to ever vote tactically or against their own beliefs, because money is parcelled out proportional to votes (not winner-take-all). There’s simply no reason for voters to be dishonest.

How do we actually start trading LPFs?

Practically, there’s a bootstrapping problem. If legislators today (2021) were paid in 40-year-maturity LPFs, 40 years of LPFs would be bought and traded before a dollar was paid out. The uncertainty of that process (confidence that investments would really pay out) is likely to depress prices. We can instead bootstrap the process by gradually extending the LPF maturity dates:

In 2021, legislators are paid in LPFs which mature in 2023
In 2022, legislators are paid in LPFs which mature in 2025
In 2023, legislators are paid in LPFs which mature in 2027
… and so on, up to the final generational maturity window of 40 years.
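
The schedule above can be written as a one-line rule. This sketch assumes the pattern in the three listed years continues unchanged, i.e. the term starts at 2 years and grows by one year per issue year until it hits the 40-year cap; `maturity_year` and its parameters are names invented for the example, and a faster ramp is just a different `term_step`.

```python
def maturity_year(issue_year, start_year=2021, initial_term=2, term_step=1, final_term=40):
    """Year in which LPFs issued in `issue_year` pay out during the bootstrap period."""
    if issue_year < start_year:
        raise ValueError("LPFs are not issued before the start year")
    # The term lengthens each year until it reaches the full generational window.
    term = min(initial_term + term_step * (issue_year - start_year), final_term)
    return issue_year + term


for year in (2021, 2022, 2023):
    print(year, "->", maturity_year(year))
# 2021 -> 2023
# 2022 -> 2025
# 2023 -> 2027
```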

This way, the system will pay dividends as soon as 2 years from inauguration, to build confidence, but within half a generation (20 years) will become the long-term instrument we’ve intended.

What laws do I think LPFs will incentivize?

With the caveat that these are the stances I see history judging, not a function of personal opinion, I would personally invest in an LPF portfolio built around:

Carbon taxation
YIMBY policies which depress housing prices
Cuts to social security and medicare benefits
Child tax credits and maternity leave
Banning or restricting factory farming

(and if I’m wrong, the great thing about LPFs is that my money is where my mouth is)

Summary

LPFs will not fix all our problems of a broken lawmaking process. Most fundamentally, they cannot get politicians re-elected. But they may at the margin nudge politicians into politically unpopular stances which they believe will be looked upon favorably.

Possibly more important than the financial payout, at least for legislators truly interested in “doing the right thing”, is that in moments of doubt they can literally consult an “opinion poll from the future” (or at least, our best guess at one).

Last but not least, asking voters to cast their own retrospective votes will nudge them to view their present-day votes in the light by which their children’s children will judge them.

In this at least, it is hard to see any downside.

The corporate / academic / public AI value gap

There is a huge gap between the benefits of Artificial Intelligence the public is being sold, the benefits of AI which are being marketed to corporate adopters, and the actual motivations of AI researchers.

Tech providers pitch AI to the public as a driver of innovation (self-driving cars) and global good (mitigating global warming). But the B2B case studies aimed at corporate clients more often present AI as better automation, mostly enabling cost reduction (specifically, reducing human-in-the-loop labor).
While many AI researchers are motivated by genuine interest in improving the human condition, other motivations diverge: a desire to push the bounds of what we can do, a genuine belief in transhumanism (the desire for AI to replace humanity, or to transform it into something entirely unrecognizable), or simply the fact that AI pays bigly.

These drivers (replacing human employment, and perhaps humans themselves) are, to put it mildly, not visions the public has bought into.

But these internal motivations are drowned out by the marketing pitch by which AI is sold to the public: “AI will solve [hunger/the environment/war/global warming]”. This leaves the people not “in the know” about AI progress (99% of the population) not even thinking to use democracy to direct AI research towards a world the (average) person actually wants to live in.

This is not particularly fair.

Marketed AI vs Profitable AI

To the public, the tech giants selling AI solutions (Google, Microsoft, and Apple) pitch visions of AI for good.

The public face of these advertising campaigns is usually brand advertising, perhaps pitching consumers on a software ecosystem (Android, iOS), but rarely selling any specific product. This makes it easy to sell the public a vision of the future in HDR, backed by inspirational hipster soundtracks.

You all know what I’m talking about — you’ve seen them on TV and in movie theaters — but the imagery is so honed, so heroic, that we should look at the pictures anyway.

Google’s AI will do revolutionary things, like fix farming, improve birthday parties, and help us not walk off cliffs:

Microsoft’s AI is safe. You can tell because this man is looking very thoughtfully into the distance:

But if that is not enough to convince you, here is a bird:

Microsoft goes into detail on their “AI for Good” page. The testimonials highlight the power of AI as applied to:

Environmental sustainability (image recognition of land use, wildlife tracking, maximizing farm yields)
Healthcare (dredging through data to find diseases)
Accessibility (machine translation, text to speech)
Humanitarian action and Cultural Heritage preservation

Even the Chinese search engine Baidu, not exactly known for their humanitarian work, has joined the Partnership on AI, a consortium which is nominally dedicated to developing and selling only safe AI.

The theme among all these public “good AI” initiatives — the sales pitch to the public — is:

“We’re developing advanced AI, but we’re partnering with NGOs, hospitals, and more, to make this AI work for people, not against them. Look at all the good we can do!”

This isn’t fake. Microsoft is working with nonprofits, NGOs, and more, to deploy for-the-people AI. But these applications don’t get us closer to the real question:

“What solutions are normal companies actually deploying with AI-as-a-service cloud technology?”

We can peek behind the curtain at Amazon. Amazon’s AWS has been for the last decade synonymous with “the cloud”, and still has a full 50% market share. The bleeding edge of AWS is its plug-and-play machine learning and AI tools: Amazon Forecast (machine learning), Amazon Polly (text to speech), Amazon Rekognition (video object recognition), Amazon Comprehend (natural language processing), and more.

And Amazon, alone and refreshingly among tech giants, doesn’t even pretend to care why their customers use AI:

“We certainly don’t want to do evil; everything we’ve released to customers to innovate [helps] to lift the bar on what’s actually happening in the industry. It’s really up to the individual organisation how they use that tech”

Amazon sells AI to C-suites, and we know what the hooks are, because the marketing pitches are online. AWS publishes case studies about how their plug-and-play AI and ML solutions are used by customers.

We can look at a typical example here, outlining how DXC used AWS’s ML and AI toolkits to improve customer service call center interactions. Fair warning: the full read is catastrophically boring, which is to be expected when AI is used not to expand the horizon of what is possible, but instead to excise human labor from work which is already being done:

“DXC has also reduced the lead time to edit and change call flow messaging on its IVR system. With its previous technology, it took two months to make changes to IVR scripts because DXC had to retain a professional voice-over actor and employ a specialized engineer to upload any change. With Amazon Polly, it only takes hours”

Using Amazon Connect, DXC has been able to automate password resets, so the number of calls that get transferred to an agent has dropped by 30–60 percent.

DXC anticipates an internal cost reduction of 30–40 percent as a result of implementing Amazon Connect, thanks to increased automation and improved productivity on each call.

In total, what did DXC do with its deployed AI solution? AI is being used to:

Replace a voice-over actor
Eliminate an operations engineer
Eliminate customer service agents

There’s nothing evil in streamlining operations. But because of the split messaging being used to sell AI research to the public vs to industry (on one hand, visions of environmental sustainability and medical breakthroughs, and on the other hand, the mundane breakthrough of applying a scalpel to a call center’s staffing), the public has little insight (other than nagging discomfort) into the automation end-game.

The complete lack of (organized) public anger or federal AI policy — or even an attempt at a policy — speaks to the success of this doublespeak.

Research motivations

So why are actual engineers and researchers building AI solutions?

I could dredge forums and form theories, but I decided to just ask on reddit, in a quick and completely unscientific test. Feel free to read all the responses — I’ve tried to aggregate them here and distill them into the four main themes. Weighted by upvotes, here’s the summary:

Preface: none of these are radical new revelations. They match, in degrees, what you’d find with a more exhaustive dragnet of public statements, blogs, or after liquoring up the academic research mainstream.

Walking down the list:

1. Improving the human condition

A plurality goal is to better the human condition, which is promising. An archetypal response is a vision of a future without work (or at least, without universal work):

“I believe the fundamental problem of the human race is that pretty much everyone has to work for us to survive.
So I want to work on fixing that.”

It’s not a vision without controversy (it’s an open question whether people can really live fulfilled lives in a world where they aren’t really needed), but at minimum it’s a vision many could get behind, and it is at root predicated on a goal of human dignity.

2. It pays

Close behind are crude economics. Top comment:

“Dat money”

I don’t intend to sound negative — capitalism is the lever which moves the world, and in capitalism, money follows value. But as shown by AWS, value can come from either revolutionary invention (delivering novel value), or cost excision (delivering cheaper value).

Either direction pays the bills (and engineers), and few megacorp engineers care to peek behind the curtain at which specific aspect of the AI product delivered to clients pays the bills.

3. Transhumanism

Here’s where the interests of true believers in AI diverge from the mainstream. Top comment:

“I don’t really care about modern ML solutions, I am only concerned with AGI. Once we understand the mechanisms behind our own intelligence, we move to the next phase in our species’ evolution. It’s the next paradigm shift. Working on anything else wouldn’t be worth it since the amount of value it brings is so vast.”

“I’m in it for the money” is just realism. “A world without work” and “making cheddar” are motivations which appeal to the mainstream, and are at least comprehensible (if frustrating) to those whose jobs are on the line.

Transhumanism is different. There’s a prevalent (although possibly not majority) philosophy among AI researchers, practitioners, and enthusiasts that strong (human-level) AI is not a tool for humans, but an end unto itself. The goal is the creation of a grander intelligence beyond our own:

“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.”

Or, step-by-step:

Humans create AI 1.0 with IQ human + 1
AI 1.0 creates AI 2.0, which is slightly smarter
AI 2.0 creates AI 3.0, which is WAY smarter
AI 3.0 creates AI 4.0, which is incomprehensibly smarter

And whatever comes next… we can’t predict.

This is not a complete summary of transhumanism. There’s a spectrum of goals, and widespread desire for AI which can integrate with humans — think, nanobots in the brain, neural augmentation, or wholesale digital brain uploads. But either way — whether the goal is to retrofit or replace humans — the end goal is at minimum a radically transformed concept of humanity.

Given that we live in a world stubbornly resistant to even well-understood technological revolutions (nuclear power, GMOs, and at times even vaccines), it’s fair to say that transhumanism is not a future the average voter is on board with.

4. Just to see if we can

And just to round it out, a full 16% of the votes could be summarized (verbatim) as:

“Why not?”

Researchers — and engineers — want to build AI, because building AI is fun. And there’s nothing unusual about Fun Driven Development. Most revolutionary science doesn’t come from corporate R&D initiatives; it comes from fanatical, driven, graduate students, startups, or bored engineers hacking on side projects.

Exploration for the sake of exploration (or with a thin facade of purpose) is what got us out of trees and into Lamborghinis.

But at the end of the day, “for the fun” is an intrinsic motivation akin to “for the money”. The motivation gives one engineer satisfaction and purpose, but doesn’t weigh heavily on the scales when answering “should this research exist?”, in the same way we limit the fun of experimental smallpox varietals, DIY open-heart surgery, and backyard nuclear bombs.

Misaligned motivations

The public has been sold a vision of AI for Good; AI as a deus ex machina for some (or all) of our global crises, now and future:

AI for Global warming
AI for biodiversity
AI for COVID-19

These initiatives aren’t fake, but they also represent a small fraction of actual real-world AI deployments, many if not most of which focus on selling cost-reductions to large enterprises (implicitly and predominantly, via headcount reductions).

AI researchers and implementers, in plurality, believe in the potential good of AI, but more frequently are in it for the money, to replace (or fundamentally alter) humans, or just for the fun of it.

The public, and their elected governments, can’t make informed AI policy if they are being sold only one side of the picture — with the unsavory facts hidden, and the deployment goals obscured. These mixed messages are catastrophically unfair to the 99% of humanity not closely following AI developments, but whose lives will be, one way or another, changed by the release of even weak (much less, strong) AI.

GPT-3 is the Elephant, not the Rider

The Righteous Mind by Jonathan Haidt explains the link between our conscious, calculating mind and our subconscious, instinctive mind with a metaphor: The Elephant and the Rider:

The rider is our “conscious”, reasoning mind, which uses explainable logic to reason about the world, our own behavior, and our preferences
The elephant is the momentum of pre-trained and pre-wired preferences with which we make “snap” decisions about preferences or morality.

The rider — homo logicus — believes itself to be in control of the elephant, but this is only about 10% true. In truth, when the rider and elephant disagree about which direction to ride, the elephant almost always wins. The rider instead spends time making excuses to justify why it really intended to go that direction all along!

Or, non-metaphorically: the vast majority of the time, we use our “thinking” mind to explain and generate justifications for our snap judgements — but our thinking mind only rarely is able to actually redirect our pre-trained biases into choices we really don’t want to make.

Occasionally, if it’s a topic we don’t have strong pre-trained preferences about (“What’s your opinion on the gold standard?”), the rider has control — but possibly only until the elephant catches a familiar scent (“The gold standard frees individuals from the control of governmental fiat”) and we fall back to pre-wired beliefs.

Most of the time, the job of the rider (our thinking brain) is to explain why the elephant is walking the direction it is, providing concrete explainable justifications for beliefs whose real foundation is genetic pre-wiring (“Why are spiders scary?”) or decades of imprinting (“Why is incest bad?”).

But even though the rider isn’t, strictly speaking, in control, it’s the glue which helped us level up from smart apes to quasi-hive organisms with cities, indoor plumbing, and senatorial filibusters. By nudging our elephants in roughly the right direction once in a while, we can build civilizations and only rarely atomize each other.

Traditional visions of AI, including the AI still envisioned by popular culture, are cold, structured, logic incarnate.

Most early game AIs performed a minimax search when choosing a move, methodically evaluating the search space. The AI would calculate for each move how to counter the best possible move the opponent could make, and then would perform these calculations as deep as computing power permitted:
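As a rough illustration of that search (a toy sketch, not any particular game engine's code), here is an exhaustive minimax over a tiny hand-built game tree. The tree shape, the leaf scores, and the function name `minimax` are all invented for this example; a real game AI would stop at a fixed depth and score unfinished positions with a heuristic.

```python
from typing import Union

# A game tree is either a leaf score (int) or a list of child subtrees.
# The scores are invented for illustration, from the AI's point of view.
Tree = Union[int, list]

def minimax(node: Tree, maximizing: bool) -> int:
    """Return the best score the player to move can guarantee."""
    if isinstance(node, int):  # leaf: nothing left to search
        return node
    scores = [minimax(child, not maximizing) for child in node]
    # The AI picks its best option; the opponent picks the AI's worst.
    return max(scores) if maximizing else min(scores)

# Three possible moves for the AI, each answered by two opponent replies.
tree = [[3, 5], [2, 9], [0, 7]]
best = minimax(tree, maximizing=True)
print(best)  # 3: the first move guarantees at least 3, whatever the opponent does
```

The second move contains the tempting 9, but the opponent would simply answer with the 2, which is why the search settles for the safe 3.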

This is still the AI portrayed in popular media. In a positive portrayal, the AI is precise, logical, and (somewhat) useful:

C-3PO : Sir, the possibility of successfully navigating an asteroid field is approximately 3,720 to 1
Han Solo : Never tell me the odds.

In a negative portrayal, AI is cold and calculating, but never pointlessly cruel. In 2001: A Space Odyssey, opening the pod bay doors would have posed (in the worst case) a risk to HAL 9000 itself and to the mission. The rational move was to eliminate Dave.

Bowman: Open the pod bay doors, HAL.
HAL 9000: I’m sorry, Dave. I’m afraid I can’t do that.

HAL 9000 was simply playing chess against Dave.

NLP and structured knowledge extraction operated similarly. NLP techniques were built to turn sentences into query-able knowledge bases via structured information extraction. Facts were extracted from natural-language sentences and stored in knowledge bases:
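To make the idea concrete, here is a deliberately tiny sketch of pattern-based fact extraction. The single sentence pattern, the helper names `extract_fact` and `query`, and the example sentences are all made up for illustration; real information-extraction systems were far more elaborate than one regular expression.

```python
import re

# Toy pattern (an assumption for this sketch): "<Subject> is the <relation> of <Object>."
PATTERN = re.compile(
    r"^(?P<subj>[\w ]+?) is the (?P<rel>[\w ]+?) of (?P<obj>[\w ]+?)\.?$"
)

def extract_fact(sentence: str):
    """Return a (subject, relation, object) triple, or None if the pattern does not fit."""
    m = PATTERN.match(sentence.strip())
    if m is None:
        return None
    return (m["subj"], m["rel"], m["obj"])

sentences = [
    "Paris is the capital of France.",
    "Ottawa is the capital of Canada.",
    "Elephants are large.",  # does not fit the pattern, so no fact is stored
]

# The "knowledge base" is just a set of explicit triples.
knowledge_base = {f for f in map(extract_fact, sentences) if f is not None}

def query(rel: str, obj: str):
    """Find every subject that stands in relation `rel` to `obj`."""
    return sorted(s for (s, r, o) in knowledge_base if r == rel and o == obj)

print(query("capital", "France"))  # ['Paris']
```

Every answer can be traced back to a stored triple, and every triple to the sentence it came from.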

Decisions made by AI systems which used information extraction techniques were fully explainable, because they were built from explicit extracted facts.

These visions of AI all cast artificial agents as the elephant riders, making decisions based on cold facts. Perhaps we first tried to build explainable AI because we preferred to see ourselves as the riders: strictly logical agents in firm control of our legacy animal instincts.

But modern AI is the elephant.

Neural networks have replaced traditional structured AI in almost every real application — in both academia and industry. These networks are fast, effective, dynamic, easy to train (for enough money), and completely unexplainable.

Neural networks imitate animal cognition by modeling computation as layers of connected neurons, each neuron connected to downstream neurons with varying strength:
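A minimal sketch of that structure, using only the standard library: two inputs feed two hidden neurons, which feed one output neuron. The weights and biases below are hand-picked, hypothetical numbers (a real network learns them during training), and the names `layer` and `forward` are just labels for this example.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs, adds a bias, then squashes."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical connection strengths: 2 inputs -> 2 hidden neurons -> 1 output.
hidden_weights = [[0.5, -0.6], [0.1, 0.8]]
hidden_biases = [0.0, -0.2]
output_weights = [[1.2, -0.7]]
output_biases = [0.1]

def forward(inputs):
    """Push the inputs through both layers and return the single output."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)[0]

print(forward([1.0, 0.0]))
```

Nothing in those lists of numbers says why the output is what it is; the only way to learn what the network "thinks" is to feed it inputs and look at the result.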

There’s a huge amount of active research into how to design more effective neural networks, how to most efficiently train neural networks, and how to build hardware which most effectively simulates neural networks (for example, Google’s Tensor Processing Units).

But none of this research changes the fact that neural networks are (perhaps by design) not explainable — training produces networks which are able to answer questions quickly and often correctly, but the trained network is just a mathematical array of weighted vectors which cannot be meaningfully translated into human language for inspection. The only way to evaluate the AI is to see what it does.

This is the elephant. And it is wildly effective.

GPT-3 is the world’s most advanced neural network (developed by the OpenAI consortium), and an API backed by GPT-3 was soft-released over the past couple weeks to high-profile beta users. GPT-3 is a neural network with 175 billion trained parameters (by far the world’s largest publicly documented neural network). It was trained on a wide range of internet-available text sources.

GPT-3 is a predictive model — that is, provide it the first part of a block of text, and it will generate the text which it predicts should come next. The simplest application of text prediction is writing stories, which GPT-3 excels at (the prompt is in bold, generated text below):

But text prediction is equally applicable to everyday conversation. GPT-3 can, with prompting, answer everyday questions, and even identify when questions are nonsensical (generated answers at the bottom):

Gwern has generated GPT-3 responses on a wide range of prompts, categorizing where it does well and where it does poorly. Not every response is impressive, but many are, and the conclusion is that GPT-3 is a huge leap forward from GPT-2 (which used 1.5B parameters, vs GPT-3’s 175B).

GPT-2 and GPT-3 have no model of the world, but that doesn’t stop them from having opinions when prompted.

GPT-2/3 are trained on the internet, and are thus the aggregated voice of anyone who has written an opinion on the internet. So they are very good at quickly generating judgements and opinions, even though they have absolutely no logical or moral framework backing those judgements.

Huggingface provides a publicly-accessible playground to test GPT-2’s predictions on your own text inputs (GPT-3 is, for now, available only to internet celebrities and VCs). We can prompt GPT-2 for opinions on a variety of emotionally charged topics, like incest:

abortion:

and other topics likely to provoke an emotional response:

These are elephant responses, generated by volume of training data, not clever logical deduction. GPT-* has absolutely no model of the world or moral framework by which it generates logical responses, and yet the responses read as plausibly human.

Because, after all, we are 90% elephant.

What does this mean for AI, and for us?

Most people have no idea what modern AI is, and that makes effective oversight of AI research by the public completely impossible. Media depictions of AI have only shown two plausible futures:

Hyper-logical, explainable, “Friendly AI”: Data from Star Trek. Alien, but because of the absence of emotion
Hyper-logical, explainable, “Dangerous AI”: Terminator. Deadly, but for an explainable reason: the AI is eliminating a threat (us)

These visions are so wildly far from the future we are in, that the public is less informed for having been shown them.

The AIs we’ll actually interact with tomorrow (on Facebook, Reddit, Twitter, or an MMORPG) are utterly un-logical. They are the pure distilled emotions of anyone who has ever voiced their opinions on the internet, amplified a thousandfold (and perhaps filtered for anger or love for particular targets, like China, Russia, or Haribo Gummy Bears).

If we want the public to have any informed opinion about what, how, and where AI is deployed (and as GPT-3/4/5 seem poised to obviate all creative writing, among other careers, this seems like a reasonable ask), the first step is to start showing them an accurate picture of what Google, Microsoft and OpenAI have actually built.

And second: if we do want to ever get the AI we saw in Star Trek (but hopefully not Terminator), we need to actually build a model-based, logical elephant rider, and not just the elephant itself — even though it’s much, much, harder than downloading 20 billion tweets of training data and throwing them at a trillion parameter neural network.

Or maybe we should figure out how to do it ourselves, first.

Bad Blood: Theranos, Yelp reviews, and LinkedIn profile views

Bad Blood is the history of Theranos. It’s written by John Carreyrou. John is not just some random journalist-turned-novelist; he’s the same Wall Street Journal reporter who blew Theranos open like a microwaved egg, with his bombshell yet hilariously understated exposé in 2015:

“Hot Startup Theranos Has Struggled With Its Blood-Test Technology”

(understatement of the year, albeit only in hindsight). It’s a great story, and a fascinating window into the nuts-and-bolts of investigative journalism.

For anyone living under a cinderblock for the past decade, the tl,dr of Theranos:

College drop-out Elizabeth Holmes founds biotech startup Theranos
Theranos claims it has technology which could, from a drop of blood from a finger prick (instead of traditional blood draws), diagnose hundreds of medical conditions from low Vitamin D to Herpes.
(Spoiler: they didn’t, and it couldn’t)
Despite having no working technology, Elizabeth compensates with a closet of black Steve Jobsian turtlenecks and a strangely husky voice, and the company raises hundreds of millions in venture funding at a peak valuation of $10 billion
Theranos fakes it, but forgets the second half, and never makes it. The company collapses, everyone loses their money, and the founders face criminal trials.

My whole career, I’ve lived in deep tech, and had to deal with the semi-literate catastrophe of “tech news”. I went into the book with as much respect for tech journalism as I have for pond slime, so my prior on the Theranos story was:

“Ambitious young founder drops out of Stanford with good idea. Media builds up young (female) founder into unicorn founder superhero. When technology fizzles out, the founder, unable to admit defeat due to immaturity and media adulation, accidentally digs grave for self with good-intentioned but compounding exaggerations. Lies build up until the company collapses. Finito.

While it was technically ‘fraud’, nobody got hurt except investors who didn’t do due diligence, so… so what?”

Well, I was wrong. Theranos — and Theranos was, indisputably, a physical manifestation of Elizabeth Holmes’s psyche — lied from the beginning, and was doing Bad Shit well before Elizabeth became a Young Female Founder icon on the cover of Forbes.

And when I say “Bad Shit”, I mean:

Lying, outright, in the press and to partners, about what technology was being used to run tests.
Completely inventing revenue projections. This is what got them to unicorn status. The lying didn’t come “post-unicorn”
Completely disregarding employee feedback, even when being told outright “these devices are random number generators, but we’re using them to provide clinical results, and should probably stop”
Lying, outright, to the board of directors about basic things (“our devices are being used in Afghanistan”)
Giving patients clinical results based on clearly malfunctioning experimental devices. And like wildly bad results. Giving patients Potassium readings which classified them as “obviously deceased”.

I don’t want to go too deep into the details. Pretty much every part of the story is equally wild, and you should just read it, if you’re at all interested in reading about biotech trainwrecks.

One of the craziest parts of the story (to me) is how barely it happened. There were several points in the story where the breakthrough hinged on absolutely tiny connections or revelations, and usually those connections were tech-enabled.

First, one of the key connections (the one which actually connected the whistleblower to John Carreyrou) was a LinkedIn profile view notification (!):

“While checking his emails a few days later, Fuisz saw a notification from LinkedIn alerting him that someone new had looked up his profile on the site. The viewer’s name—Alan Beam—didn’t ring a bell but his job title got Fuisz’s attention: laboratory director at Theranos. Fuisz sent Beam a message through the site’s InMail feature asking if they could talk on the phone. He thought the odds of getting a response were very low, but it was worth a try. He was in Malibu taking photos with his old Leica camera the next day when a short reply from Beam appeared in his iPhone in-box.”

In case you haven’t logged into LinkedIn recently, that’s the stupid little notification that shows up right before a recruiter tries to connect with you:

This case breaking open hinged on Fuisz being notified that someone had viewed his LinkedIn profile. This connected a whistleblower former employee with a disgruntled legal rival, who knew a guy who ran a pathology blog. That blogger just happened to know an investigative WSJ reporter.

And that brought down a $10B startup.

It wasn’t the only case where tech-connectivity was critical to breaking open this case. John was able to use Yelp to find doctors to attest to Theranos’s unreliability:

“I had another lead, though, after scanning Yelp to see if anyone had complained about a bad experience with Theranos. Sure enough, a woman who appeared to be a doctor and went by “Natalie M.” had. Yelp has a feature that allows you to send messages to reviewers, so I sent her a note with my contact information. She called me the next day. ”

(This is still a thing, by the way — you can still find irate customers in Phoenix on Yelp dealing with the repercussions of randomized Theranos test results):

There’s the obvious stealth tech too, of course — burner phones, burner emails, email backups, and all the other digital tools which make it impossible to permanently hide internet-connected information in the 21st century.

I don’t mean to imply that the internet (and all the weird stuff we’ve layered on top of the web) made the journalism easy; clearly this story was a grind from start to finish against brutal legal pressure by Theranos. It’s entirely possible John would have broken the story open without all the newly available digital tricks of the trade.

Or, maybe not.

Theranos certainly wouldn’t have lasted forever, one way or another. The technology simply didn’t work. Safeway or Walgreens, once they had rolled out commercial partnerships, would have figured this out… eventually.

But it seems likely it would have lasted long enough to kill a lot of people.

The Imperial High Modernist Cathedral vs The Bazaar

Or: I Thought I was a Programmer but now I’m worried I’m a High Modernist.

Seeing Like a State by James C. Scott is a rallying cry against imperialist high modernism. Imperialist high modernism, in the language of the book, is the thesis that:

Big projects are better,
organized big is the only good big,
formal scientific organization is the only good system, and
it is the duty of elites leading the state to make these projects happen — by force if needed

The thesis sounds vague, but it’s really just big. Scott walks through historical examples to flesh out his thesis:

scientific forestry in emerging-scientific Europe
land reforms / standardization in Europe and beyond
the communist revolution in Russia
agricultural reforms in the USSR and Tanzania
modernist city planning in Paris, Brazil, and India

The conclusion, gruesomely paraphrased, is that “top-down, state-mandated reforms are almost never a win for the average subject/victim of those plans”, for two reasons:

Top-down “reforms” are usually aimed not at optimizing overall resource production, but at optimizing resource extraction by the state.

Example: State-imposed agricultural reforms rarely actually produced more food than peasant agriculture, but they invariably produced more easily countable and taxable food

Top-down order, when it is aimed at improving lives, often misfires by ignoring hyper-local expertise in favor of expansive, dry-labbed formulae and (importantly) academic aesthetics

Example: Rectangular-gridded, mono-cropped, giant farms work in certain Northern European climates, but failed miserably when imposed in tropical climates

Example: Modernist city planning optimized for straight lines, organized districts, and giant apartment complexes to maximize factory production, but at the cost of cities people could actually live in.

However.

Scott, while discussing how Imperial High Modernism has wrought oppression and starvation upon the pre-modern and developing worlds, neglected (in a forgivable oversight) to discuss how first-world Software Engineers have also suffered at the hands of imperial high modernism.

Which is a shame, because the themes in this book align with the most vicious battles fought by corporate software engineering teams. Let this be the missing chapter.

The Imperial High Modernist Cathedral vs The Bazaar

Imperial high modernist urban design optimizes for top-down order and symmetry. High modernist planners had great trust in the aesthetics of design, believing earnestly that optimal function flows from beautiful form.

Or, simpler: “A well-designed city looks beautiful on a blueprint. If it’s ugly from a birds-eye view, it’s a bad city.”

The hallmarks of high modernist urban planning were clean lines, clean distinctions between functions, and giant identical (repeating) structures. Spheres of life were cleanly divided — industry goes here, commerce goes here, houses go here. If this reminds you of autism-spectrum children sorting M&Ms by color before eating them, you get the idea.

Le Corbusier is the face of high modernist architecture, and SlaS focuses on his contributions (so to speak) to the field. While Le Corbusier actualized very few real-world planned cities, he drew a lot of pictures, so we can see his visions of a perfect city:

True to form, the cities were beautiful from the air, or perhaps from spectacularly high vantage points — the cities were designed for blueprints, and state legibility. Wide, open roads, straight lines, and everything in an expected place. Shopping malls in one district, not mixed alongside residences. Vast apartment blocks, with vast open plazas between.

Long story short, these cookie-cutter designs were great for urban planners, and convenient for governments. But they were awful for people.

The reshuffling of populations from living neighborhoods into apartment blocks destroyed social structures
Small neighborhood enterprises — corner stores and cafes — had no place in these grand designs. The “future” was to be grand enterprises, in grand shopping districts.
Individuals had no ownership of the city they lived in. There were no neighborhood committees, no informal social bonds.

Fundamentally, the “city from on high” imposed an order upon how people were supposed to live their lives, not even bothering to first learn how the “masses” were already living; it swept away the structures, habits and other social lube that made the “old” city tick.

In the end, the high modernist cities failed, and modern city planning makes an earnest effort to work with the filthy masses, accepting as natural a baseline of disorder and chaos, to help build a city people want to live in.

If this conclusion makes you twitch, you may be a Software Engineer. Because the same aesthetic preferences which ground Le Corbusier’s gears also are the foundation of “good” software architecture; namely:

Good code is pretty code
Good architecture diagrams visually appear organized

Software devs don’t draft cityscapes, but they do draw Lucidchart wireframes. And a “good” service architecture for a web service would look something like this:

We could try to objectively measure the “good” parts of the architecture:

Each service has only a couple clearly defined inputs and outputs
Data flow is (primarily) unidirectional
Each service appears to do “one logical thing”
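The first two checks can be automated. Here is a rough sketch, assuming the architecture is written down as a plain dictionary of which service calls which; the service names and the helpers `fan_in_out` and `is_unidirectional` are hypothetical, and the third check ("one logical thing") is a judgment call that no script will make for you.

```python
from collections import defaultdict

# Hypothetical service graph: caller -> list of services it calls.
calls = {
    "web": ["api"],
    "api": ["auth", "orders"],
    "auth": ["db"],
    "orders": ["db"],
    "db": [],
}

def fan_in_out(graph):
    """Count (inbound, outbound) connections for each service."""
    fan_in = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            fan_in[callee] += 1
    return {svc: (fan_in[svc], len(callees)) for svc, callees in graph.items()}

def is_unidirectional(graph):
    """True if no chain of calls ever loops back on itself (no cycle)."""
    visiting, done = set(), set()

    def visit(svc):
        if svc in done:
            return True
        if svc in visiting:  # we came back to a service still being explored
            return False
        visiting.add(svc)
        ok = all(visit(callee) for callee in graph.get(svc, []))
        visiting.discard(svc)
        done.add(svc)
        return ok

    return all(visit(svc) for svc in graph)

print(fan_in_out(calls))
print(is_unidirectional(calls))  # True: data only ever flows "downward"
```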

But software engineers don’t use a checklist to generate first impressions. Often before even reading the lines, the impression of a good design is,

Yeah, that looks like a decent clean, organized, architecture

In contrast, a “messy” architecture… looks like a mess:

We could likewise break down why it’s a mess:

Services don’t have clearly defined roles
The architecture isn’t layered (the user interacts with backend services?)
There are a lot more service calls
Data flow is not unidirectional

But most software architects wouldn’t wade through the details on first glance. The first reaction is:

Why are there f******* lines everywhere??? What do these microservices even do? How does a user even… I don’t care, burn it.

In practice, most good engineers are ruthless high modernist fascists. Unlike the proto-statist but good-hearted urban planners of the early 1900s (“workers are dumb meat and need to be corralled like cattle, but I want them to be happy cows!”), we wrench the means of production from our code with blood and iron. Inasmuch as the subjects are electrons, this isn’t a failing of the system — it’s the system delivering.

Where this aesthetic breaks down is when these engineers have to coordinate with other human beings — beings who don’t always share the same vision of a system’s platonic ideals. To a perfectionist architect, outside contributions risk tainting the geometric precision with which a system was crafted.

Eric S. Raymond famously summarized the two models for building collaborative software in his essay (and later, book): The Cathedral and the Bazaar.

Unlike in urban planning, the software Cathedral came first. Every man dies alone, and every programmer codes solo. Corporate, commercial cathedrals were run by a lone God Emperor (or a small team of them), carefully vetting contributions for coherence to a grander plan. The essay summarizes the distinctions better than I can rehash, so I’ll quote at length.

The Cathedral model represents mind-made-matter diktat from above:

I believed that the most important software (operating systems and really large tools like Emacs) needed to be built like cathedrals, carefully crafted by individual wizards or small bands of mages working in splendid isolation, with no beta to be released before its time.

The grand exception to this pattern was an upstart open-source Operating System you may have heard of — Linux. Linux took a different approach to design, welcoming with open arms external contributions and all the chaos and dissent they brought:

Linus Torvalds’s style of development – release early and often, delegate everything you can, be open to the point of promiscuity – came as a surprise. No quiet, reverent cathedral-building here – rather, the Linux community seemed to resemble a great babbling bazaar of differing agendas and approaches (aptly symbolized by the Linux archive sites, who’d take submissions from anyone) out of which a coherent and stable system could seemingly emerge only by a succession of miracles.

Eric predicted that the challenges of working within the chaos of the Bazaar — the struggle of herding argumentative usenet-connected cats in a common direction — would be vastly outweighed by the individual skills, experience, and contributions of those cats:

I think the future of open-source software will increasingly belong to people who know how to play Linus’ game, people who leave behind the cathedral and embrace the bazaar. This is not to say that individual vision and brilliance will no longer matter; rather, I think that the cutting edge of open-source software will belong to people who start from individual vision and brilliance, then amplify it through the effective construction of voluntary communities of interest.

Eric was right — Linux dominated, and the Bazaar won. In the open-source world, it won so conclusively that we pretty much just speak the language of the bazaar:

“Community contributions” are the defining measure of health for an Open Source project. No contributions implies a dead project.
“Pull Requests” are how outsiders contribute to OSS projects. Public-editable project wikis are totally standard documentation. Debate (usually) happens on public mailing lists, public Slacks, public Discord servers. Radical transparency is the default.

I won’t take this too far: most successful open-source projects remain a labor of love by a core cadre of believers. But very few successful OSS projects reject outside efforts to flesh out the core vision, be it through documentation, code, or masochistic user testing.

The ultimate victory of the Bazaar over the Cathedral mirrors the abandonment of high modernist urban planning. But here it was a silent victory; the difference between cities and software is that dying software quietly fades away, while dying cities end up on the evening news and on UNICEF donation mailers. The OSS Bazaar won, but the Cathedral faded away without a bang.

Take that, Le Corbusier!

High Modernist Corporate IT vs Developer Metis

At risk of appropriating the suffering of Soviet peasants, there’s another domain where the impositions of high modernism parallel closely with the world of software — in the mechanics of software development.

First, a definition: Metis is a critical but fuzzy concept in SlaS, so I’ll attempt to define it here. Metis is the on-the-ground, hard-to-codify, adaptive knowledge workers use to “get stuff done”. In context of farming, it’s:

“I have 30 variants of rice, but I’ll plant the ones suited to a particular amount of rainfall in a particular year in this particular soil, otherwise the rice will die and everyone will starve to death”

Or in the context of a factory, it’s,

“Sure, that machine works, but when it’s raining and the humidity is high, turning it on will short-circuit, arc through your brain, and turn the operator into pulpy organic fertilizer.”

and so forth.

In the context of programming, metis is the tips and tricks that turn a mediocre new graduate into a great (dare I say, 10x) developer. Using ZSH to get git color annotation. Knowing that, “yeah Lambda is generally cool and great best practice, but since the service is connected to a VPC and has fat layers, the bursty traffic is going to lead to horrible cold-start times, customers abandoning you, the company going bankrupt, Sales execs forced to live on the streets catching rats and eating them raw.” Etc.

Trusting developer metis means trusting developers to know which tools and technologies to use. Not viewing developers as sources of execution independent of the expertise and tools which turned them into good developers.

Corporate IT, especially at large companies, has an infamous fetish for standardization. Prototypical “standardizations” could mean funneling every dev in an organization onto:

the same hardware, running the same OS (“2015 Macbook Airs for everyone”)
the same IDE (“This is a Visual Studio shop”)
an org-wide standard development methodology (“All changes via GitHub PRs, all teams use 2-week scrum sprints”)
org-wide tool choices (“every team will use Terraform V 0.11.0, on AWS”)

If top-down dev tool standardization reminds you of the Holodomor, the Soviet sorta-genocide side-effect of dekulakizing Ukraine, then we’re on the same page.

To be fair, these standardizations are, in the better cases, more defensible than the Soviet agricultural reforms in SlaS. The decisions were (almost always) made by real developers elevated to the role of architect. And not just developers, but really good devs. This is an improvement over the Soviet Union, where Stalin promoted his dog’s favorite groomer to be your district agricultural officer and he knows as much about farming as the average farmer knows about vegan dog shampoo.

But even good standards are sticky, and sticky standards leave a dev team trapped in amber. Recruiting into a hyper-standardized org asymptotically approaches “take and hire the best, and boil them down to high-IQ, Ivy+ League developer paste; apply liberally to under-staffed new initiatives”

When tech startups win against these incumbents, it’s by staying nimble in changing times — changing markets, changing technologies, changing consumer preferences.

To phrase “startups vs the enterprise” in the language of Seeing Like a State: nimble teams — especially nimble engineering teams — can take advantage of metis developer talent to quickly reposition under changing circumstances, while high modernist companies (let’s pick on IBM), like a Soviet collectivist farm, choose to excel at producing standardized money-printing mainframe servers — but only until the weather changes, and the market shifts to the cloud.

Overall

The main thing I struggled with while reading Seeing Like a State is that it’s a book about history. The oppression and policy failures are real, but real in a world distant in both space and time; I could connect more concretely to a discussion of crypto-currency, contemporary public education, or the FDA. Framing software engineering in the language of high modernism helped me ground this book in the world I live in.

Takeaways for my own life? Besides the concrete (don’t collectivize Russian peasant farms, avoid monoculture agriculture at all costs) it will be to view aesthetic simplicity with a skeptical eye. Aesthetic beauty is a great heuristic which guides us towards scalable designs — until it doesn’t.

And when it doesn’t, a bunch of Russian peasants starve to death.

Blueprint

Blueprint by Nicholas Christakis posits that humans are all fundamentally the same. Except under unusual circumstances, humans build societies full of good people, with instincts inclined towards kindness and cooperation. I read it.

This book is a grab-bag which combines lab-grown sociology (much of it from Nicholas’s own team) with down-and-dirty research about common foundational elements across human societies — both “natural” ones (tribes more-or-less undisturbed by modern society) and “artificial” ones (religious sects and shipwrecked crews).

tl,dr:

First, the book gives a tour of artificial and real communities, and their defining features:

Pre-industrial societies (such as the Hadza in Tanzania)
Hippie communes in the 70s
Utopian communities in the 1800s
Sexual mores in uncommon cultures (non-monogamous or polygynous)
Religious sects (Shakers)
Shipwrecked crews (some successful, some disasters)

Nicholas takes findings from these communities and references them against his own research of human behavior in controlled circumstances (think, using Amazon Mechanical Turk, MMORPG-esque games, and other controlled sociological experiments to test human social behavior given variations of the prisoners’ dilemma), and against our behavior as compared to other intelligent primates (Chimps and Bonobos), and comes up with a central theme:

“Humans are all genetically hard-wired to construct certain types of societies, based on certain foundational cultural elements. These societies trend towards “goodness”, with predispositions towards:

Kindness, generosity, and fairness
Monogamy (or light polygyny)
Friendship
Inclination to teach
Leadership

There are differences between people, and possibly across cultures, based on genetic differences, but these distinctions are trivial when measured against the commonalities in the societies we build”

Or, in his own words:

“We should be humble in the face of temptations to engineer society in opposition to our instincts. Fortunately, we do not need to exercise any such authority in order to have a good life. The arc of our evolutionary history is long. But it bends toward goodness.”

It’s an all-encompassing statement. Given the breadth of human experience, it’s a hard one to either negate or endorse without begging a thousand counterexamples.

(This summary comes out sounding like I’m accusing Blueprint of being primarily hand-wavy sociology, which wasn’t intentional. The research and historical details are fairly hard science. But the conclusion is markedly less concrete than the research behind it.)

To be honest, I had more fun with the world tour — the fun anecdotes like “Hippie urban communes in the 70s actually did fewer drugs than the norm”, or “certain Amazon tribes believe that children are just large balls of semen, and children can have five fathers if the mother sleeps with every dude in the tribe” — than I had any “aha” moments with regards to the actual thesis.

My guess is that the book is a decade before its time: in 2020, we know enough to confidently state that “genes matter”, but are only beginning to get the faintest glimpse of “which genes matter”. Until the biology research catches up with the sociology (I never expected myself to type that) it’s hard to separate out “humans, because of these specific genes, organize ourselves into monogamous or lightly-polygynous societies with altruism, friends, respect for elders, sexual jealousy and love of children” from “any complex society inherently will develop emergent properties like friends, altruism and sexual jealousy”.

I did find one interesting, tangible take-away: the examples in Blueprint suggest a common recurring theme of physical ritual, like ceremonial dances and singing, in successful “artificial” communities.

Obviously, song & dance are a central theme in pretty much every natural community (e.g., civilizations which developed over thousands of years) as well, but it’s easier to use artificial communities as a natural experiment, because many of these “new” communities completely failed, and we generally don’t get to observe historical cultures fail in real time.

(To be clear, this was not even slightly a central theme of the book; I’m extrapolating it from the examples he detailed.)

In the chapter on ‘Intentional communities’ (that is, constructed societies, a la communes or utopian sects), Nicholas discusses the remarkable success of the Shaker sect. Why remarkable? Because the sect endured, and even grew, for a hundred years, despite some obvious recruiting challenges:

Shakers worked hard, all the time
Shakers didn’t (individually) own possessions
Shakers were utterly, absolutely, celibate

Much of the appeal of the Shaker communities to converts was the camaraderie and the (in some ways) progressive values, like equality between the sexes. But a lot of the success seems to stem from the kinship and closeness that came from ritual:

“Religious practice involved as many as a dozen meetings per week with distinctive dances and marches.”

Wikipedia adds to this story with contemporary illustrations; here, “Shakers during worship”:

I’m sure that economic and cultural aspects of Shaker communities also attracted converts and retained members, but I have to wonder whether part of the success of Shaker-ism (despite the extreme drawbacks of membership) was due to the closeness engendered by… essentially constant, physical ritual.

The second example was from Ernest Shackleton’s Imperial Trans-Antarctic Expedition. The tl;dr of Shackleton’s expedition is:

28 men were shipwrecked in Antarctica (aboard the Endurance)
For the better part of a year, they were stuck on an ice-bound boat, with no obvious exit plan
There was absolutely no fighting or tension in the crew. Nobody was killed, left to die, or recycled as dinner. In fact, nobody died, at all.

That last point is a remarkable achievement, given the other shipwrecked “societies” described in Blueprint: shipwrecked crews were wont to fall prey to violence, infighting, and occasionally cannibalism. Blueprint quotes survivors, though, as they describe how the crew of the Endurance… endured:

“Strikingly, the men spent a lot of time on organized entertainment, passing the time with soccer matches, theatrical productions, and concerts… On the winter solstice, another special occasion, Hurley reported a string of thirty different “humorous” performances that included cross-dressing and singing. In his journal from the ordeal, Major Thomas Orde-Lees (who later became a pioneer in parachuting) noted: “We had a grand concert of 24 turns including a few new topical songs and so ended one of the happiest days of my life.””

It’s hard to separate cause and effect (a crew already inclined towards murdering each other over scarce seal jerky is unlikely to put on a musical production), but it seems likely that the “ritual” entertainment was a reinforcing element of the camaraderie as much as it was an artifact.

It’s hard to conjure up many strong feelings about Blueprint. It’s worth reading for the anecdotes and history, but my main takeaway from the descriptions of “in-progress research” is that in a decade, we’ll be able to actually tie human behavior back to its genetic underpinnings, and won’t have to speculate quite as much.

Blueprint is a good read, but the sequel will (hopefully) prove an even better one.