How we think about hiring

Gradient is growing, and we need to hire. We are kicking off our process this week. It’s funny: along with sales, hiring well is one of those crucial, practical things they don’t teach you in business school. We have been flying by the seat of our pants each time we’ve run this process.

In our typical fashion, we decided we would try thinking things through and doing them a bit differently. For our last process, we decided very early on that we would not think about defining a “role” and finding someone who could be slotted into it. I basically don’t believe in defined roles for knowledge work. The whole idea is that it’s something that requires creative thought, so how can you predict in advance what will be needed?

Anyway, even if defined roles make sense elsewhere, work at Gradient is far too fluid for them to work for us. We all need to pitch in on all areas of the business: building statistical models, hooking up our CRM system to our email system, evaluating different AWS servers, and so on.

So the way we did it last time was to define a whole bunch of

  • Experiences (projects or models they had built in the past that were analogous to what we do)
  • Skills (things they know how to do, from statistical modeling to server management and so on)
  • Traits (most importantly: do they want to work remotely?)

We gave each of these attributes a weight. Candidates with a bunch of highly weighted attributes were more attractive to us than candidates without them, simple as that. We even automated a large part of this process, having interested candidates take a survey that gave us a score. Take it here, if you’re interested — although consider this one obsolete.
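To make that concrete, here’s a minimal sketch of that kind of weighted scoring in Python. The attribute names and weights are made up for illustration; they’re not our actual rubric:

```python
# Hypothetical attribute weights -- our real rubric and weights differed.
WEIGHTS = {
    "built_statistical_models": 5,   # an experience
    "server_management": 2,          # a skill
    "wants_remote_work": 4,          # a trait
}

def score(candidate_attrs):
    """Sum the weights of the attributes a candidate has."""
    return sum(WEIGHTS.get(attr, 0) for attr in candidate_attrs)

print(score({"built_statistical_models", "wants_remote_work"}))  # 9
```

The survey just asks about each attribute and adds up the weights, so candidates come out pre-ranked.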

I think that for the next hire, we’ll follow a similar process. It maps well to my mental model of how our company works, and it served us very well last time.


Thinking in Prolog

I’ve been having a lot of fun working on a side project written in Prolog, which is a logic programming language. It has been incredibly difficult to wrap my mind around how it works, but I’m getting there, and I can see that “there” is a pretty powerful place. It makes me wonder why this paradigm isn’t used for more applications.

It turns out (you may already know this if you write code) that programming languages work very differently depending on which paradigm they use.

From Wikipedia:

Common programming paradigms include:

  • imperative in which the programmer instructs the machine how to change its state,
    • procedural which groups instructions into procedures,
    • object-oriented which groups instructions together with the part of the state they operate on,
  • declarative in which the programmer merely declares properties of the desired result, but not how to compute it
    • functional in which the desired result is declared as the value of a series of function applications,
    • logic in which the desired result is declared as the answer to a question about a system of facts and rules,
    • mathematical in which the desired result is declared as the solution of an optimization problem

I’ve found functional programming to be so intuitive and object-oriented (probably the most common paradigm today) to be so counterintuitive that I almost always write my programs in a functional style.

In a logic program, you write down a set of facts and rules. That’s it. That’s your program. And then you run the program by asking it if it can find a solution to a new query.

Prolog in particular is built around a type of logical statement called a Horn clause that basically reads “X is true if Y and Z are true”. In Prolog that would look like:

X :- Y, Z.

A more complete (but still tiny) program would look like:

food(sandwich). % declaring a fact that sandwich is a food
drink(water). % declaring that water is a drink

meal(X, Y) :- food(X), drink(Y).
% X and Y (variables) make a meal if X is a food & Y is a drink

When you run this program, you’ll get:

?- meal(X, Y).
X = sandwich,
Y = water.

That’s a toy example, but you can see how quickly the complexity could build up with ever-more relationships and so on. I’m working on writing a program to solve the common wedding-seating problem where you have a set number of tables, people you want to sit next to each other, and people that you want to sit at different tables. It’s fun.
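For contrast, here’s a brute-force sketch of that seating problem in Python, with made-up names and constraints. In Prolog you would just state the constraints as rules and let its search do the enumeration; here we have to spell the search out ourselves:

```python
from itertools import product

# Hypothetical guests and constraints, for illustration only.
guests = ["ann", "bob", "cal", "dee"]
n_tables = 2
together = [("ann", "bob")]   # must share a table
apart = [("ann", "cal")]      # must sit at different tables

def valid(assign):
    """Check every together/apart constraint against a table assignment."""
    return (all(assign[a] == assign[b] for a, b in together)
            and all(assign[a] != assign[b] for a, b in apart))

# Try every assignment of guests to tables and keep the valid ones.
solutions = []
for tables in product(range(n_tables), repeat=len(guests)):
    assign = dict(zip(guests, tables))
    if valid(assign):
        solutions.append(assign)

print(len(solutions))  # 4 valid seatings for this toy instance
```

With 4 guests and 2 tables there are only 16 assignments to check, but the search space grows exponentially with the guest list, which is exactly where a logic program’s declarative constraints start to shine.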

The space of things you can write

This morning I was reminded of a few tweets by Francois Chollet, a machine-learning researcher and practitioner who is the lead on Keras, the most popular deep learning library.

These thoughts have stuck with me.

This morning, I read Fred Wilson’s post celebrating fifteen years of writing on his blog, AVC. Reading it, I wondered to what degree Fred has benefited from the discipline of expressing his thoughts in a precise, unlimited, and permanent form.

I’ve been missing that. Twitter has replaced writing for me — I agree with Brian Norgard’s take that Twitter’s change to 280 characters was the final death blow to blogging (if Google killing Reader was the pummel of body blows that knocked it down).

But if writing is self-directed, Twitter is a to-do list of things to think about or react to, created by other people (though, done well, by a very select group of people you’ve chosen and refined over time).

Putting it all together, I miss writing and spend too much time on Twitter. So it’s time to restart this thing. Should be fun.


He and I had been close when we were young. I remember him as a toddler; I even think I remember holding him as a baby. Four years my junior, we would play together as little children at my house along with some of my friends. I remember playing “buffalo”, where we would essentially run at each other on all fours. He would make up for his disadvantage in size and years with a fierce tenacity. But you would have been misled if that’s what you took of him, because he would grow into the sweetest, most kind-hearted person you could find.

Over time, school and summers filled with extracurricular vocations filled the spaces where we would have seen each other, and we drifted apart. It wouldn’t be honest to say that he was a major fixture in my life over many years, but that did not stop me from feeling the tensile strength of the residual bond between us, reaching forward from back in time, when it snapped.

If life is a braid through time, our lives had once been entwined, and I found myself grasping too late for threads that had long ago come unraveled.

The pictures at the funeral showed him rock climbing — why didn’t we ever go together? That was something we both loved. Why didn’t I at least ring him from time to time to keep up on his story?

My mom asked me to reach out to him after he completed his last semester. He had been in and out of graduate school, dealing with depression, and he had gone back and done well. Wouldn’t I just send him a quick note of congratulations? Of course I would. And within a few days, he wrote back, but I never opened his message. While he was alive, I never read him tell me he had had some ups and downs, but that he would persevere. And by the way, congratulations to me on becoming an uncle.

How could I, who had lived with someone whose affliction with Bipolar II manifested itself, who, after that chapter, committed to being vigilant to these warning signs, who had believed that familiarity with mental health symptoms should be something they teach you in school so as to become common knowledge, let this pass without pouncing on it? Demand a phone call; take a flight to Chicago; whatever it took. At least read the message he sent back.

Maybe it’s just a fact of life that these pathways sometimes only reveal themselves in the counterfactual. That only when it’s impossible to make them, do some decisions make themselves apparent. My grandmother used to say that the three worst words in the English language are “coulda, woulda, shoulda”, but our imaginations can torment us with alternate histories. And maybe it was fate, but when reading a book on this very topic, The Book of Why, I came across the following quote (with genders reversed):

I am convinced that, whether right or not, he was sure some sinister change was going on in his brain, from which he could never recover. So in tenderest love to us all he chose to spare us the grief of sharing with him the spectacle of such a tragic decline.

I can only hope that it was done with tender love, and not only some excruciating psychic pain. And I’m not sure whether to hope that I could have made a difference, or could not have. Either way, it’s a moot point now, and a voice nags at me that this gratuitous self-examination wildly misses the point. A good person is dead, he had been in pain, and his family is now exploring the depths of grief.

So. Many. Fucking. Bots. (And they’re good)

Ugh. This can be considered a follow-up to my post yesterday on the same issue.

On Facebook today, I saw the following link in my “trending” section.

[screenshot]

The story is essentially just this one picture that’s been circulating around Twitter:

Well, isn’t this some great liberal content! Red meat for the base. But knowing what I know about deception on Twitter, my first instinct these days is to be skeptical about the integrity of the information I’m getting. And, sure enough, if you go to @LazyyMillennial‘s Twitter timeline, the account was made very recently and is devoid of any real personal content.

Not surprisingly, if you run this account through the same botrnot model I used yesterday, you get a very high likelihood (~92%) that this account is a bot.

[screenshot]

“The Hill” is a pretty well-known outlet, and as of the time I checked, the page had 2,937 shares. And, of course, they are not the only ones that picked up on it.

HuffPo is carrying the story, as well as the Houston Chronicle. All of these outlets are credulously referring to this Twitter account as “Rebecca” even though they have not gotten any kind of quote from her (they only quote what’s posted on her profile). Some are even featuring a screenshot of her photo in a kind of Ken Burns-style moving-image-with-narration video.

Now, perhaps the picture is real (Huffpo’s story had the image tweeted out by someone else), but “Rebecca” is certainly helping to amplify the message:

[screenshot]

Is this a bot in the sense of a purely automated account? I don’t think so. It seems to me like a human writing these quotes. But is this person the person they claim to be? Seems very, very unlikely.

Once you start looking for bots — or impersonated accounts, which may be a better term — they are everywhere. That last piece of content you chuckled at while wondering why {conservatives, liberals} were so dumb? (“Yeaaahh we gottem good!”) Or that made you really angry, and made you wonder how “the other side” could think that way? That may well have been manufactured. That may not be real. There seem to be a lot of “Rebeccas” out there.


Our discourse is still being poisoned

Today on Twitter I came across a tweet that seemed so heartless, so inhumane, that I had to wonder whether the perpetrator was, in fact… human.

For those who don’t know, Patrick Petty is the father of one of the Parkland victims. Unlike most of the people who have emerged from Parkland and become quite visible, Patrick is conservative and pro-Second Amendment.

So, the tweet above is beyond inappropriate. But if you look further, something seems a bit off. If you check out @simonsaysboohoo‘s profile, it’s a mélange of emotionally-charged liberal content. It’s… exclusively a mélange of emotionally-charged liberal content. There’s nothing else.

Using the botrnot model developed by Mike Kearney, this account has a very high probability of being a bot (~92%):

[screenshot]

The big problem of course is that a lot of people are going to think that @simonsaysboohoo is a real person, reflective of an “other side” where people actually do think these objectionable and immoral things. And why would they think otherwise? A person is not privy to the summary statistics of user behavior that the model above is using to calculate the probability of bot-hood. We are enormously easy to fool.

So, someone is trying to inject poison into our public debate. But who? And why?

And probably most importantly, why isn’t Twitter (and the same applies to Facebook and Google) doing more to prevent this kind of deception?

Gradient’s Principles

When I started Gradient I knew that eventually, I wanted it to be a principle-driven company. What does that mean, and why is it important? Well — here’s what I wrote our team (on July 29th):

Guys: one more thing. One thing that I have started to think about, and would like both of you to think about, is what principles we should develop to drive our company. I would like Gradient to be a principle-driven company. Why not a customer-driven, or technology-driven, or profit-driven company? Well, of course we should always serve our customers, use the latest technologies, and make money. But customers change (and can sometimes turn on you), and technology and profit are means, not ends. Principles don’t change – they won’t renege on a contract, they won’t turn on you, etc.

Having a core set of principles will help guide our behavior, especially in difficult circumstances. What happens when a customer doesn’t like our analysis? What happens when a customer reneges on a contract? How should we behave?

When we hire new people, what should we be looking for, other than technical competency? When we have a difficult decision, what should we be considering, other than our respective opinions?

This may seem fluffy, but at the best companies it’s anything but. Specifically, I’m thinking about Amazon, whose twelve leadership principles are used all the time in the regular course of business there.

So I’d like everyone to start thinking about what they’d like to see in our set of principles, and when it’s right, get them down on paper. There’s no timeline to this (yet), as this is not a case where done is better than perfect.

At the end of the day, they have to do at least these four things:

  • reflect our personal values
  • be effective guides to decision making and action
  • serve as an effective rubric for character inside our company
  • be useful – they can’t just live on a shelf somewhere

I’m happy to say that just over a month later — after a process where the three of us came together, pooled our thoughts, and collectively drafted a single document — we’ve landed on the following eleven principles. I’m immensely proud of them, and can’t wait to use them all the time in day-to-day discussions. While some of them are especially important given our remote set-up, they all feel mostly timeless to me. Have a read and let me know what you think.

Gradient leaders… (and we’re all leaders)

  • Are honest and integral to a fault: if we say something will be done, then it will be done. We conform our words to reality (honesty) and reality to our words (integrity).
  • Do more with less: We are frugal and look for ways to avoid spending time and money when it is not needed.
  • Think win-win: Success is not zero-sum. Our clients, partners, and vendors’ success is our success. We build credible, reliable, and honest relationships with every client, partner, and ally.
  • Prove themselves wrong: We seek diverse perspectives, look for alternative hypotheses, investigate the details, and stress-test our analyses to ensure that we are right. We never assume we are right.
  • Are obsessed with constant improvement: Individually, we are always looking to learn new techniques and develop valuable skill sets. We proactively seek feedback to improve our collective performance.
  • Collaborate and communicate extremely well: We value team contribution over individual contribution. We are excellent team players that go the extra mile to make it easy to work with others.
  • Deliver results, not work: We don’t value work, we value results. We are always moving toward delivering value to our clients.
  • Take care of each other: We care for each other’s well-being and celebrate alternative perspectives.
  • Self-Manage: We take ownership of our work by prioritizing and organizing effectively with our colleagues while acting on behalf of the entire company.
  • Investigate deeply: We are never satisfied with the first layer of understanding, or fixing symptoms instead of underlying causes. We fix problems so they stay fixed.
  • Are ambitious risk takers: We push the definition of normal by moving fast and pursuing new, unconventional solutions.

Unpacking the election results using Bayesian inference

As anyone who’s read this blog recently can surmise, I’m pretty interested in how this election turned out, and have been doing some exploratory research into the makeup of our electorate. Over the past few weeks I’ve taken the analysis a step further and built a sophisticated regression that goes as far as anything I’ve seen to unpack what happened.

Background on probability distributions

(Skip this section if you’re familiar with the beta and binomial distributions.)

Before I get started explaining how the model works, we need to discuss some important probability distributions.

The first one is easy: the coin flip. In math, we call a coin flip a Bernoulli trial, but they’re the same thing. A flip of a fair coin is what a mathematician would call a “Bernoulli trial with p = 0.5”. The “p = 0.5” part simply means that the coin has a 50% chance of landing heads (and 50% chance of landing tails). But in principle you can weight coins however you want, and you can have Bernoulli trials with p = 0.1, p = 0.75, p = 0.9999999, or whatever.

Now let’s imagine we flip one of these coins 100 times. What is the probability that it comes up heads 50 times? Even if the coin is fair (p = 0.5), just by random chance it may come up heads only 40 times, or may come up heads more than you’d expect – like 60 times. It is even possible for it to come up heads all 100 times, although the odds of that are vanishingly small.

The distribution of possible times the coin comes up heads is called a binomial distribution. A probability distribution is a set of numbers that assigns a value to every possible outcome. In the case of 100 coin flips, the binomial distribution will assign a value to every number between 0 and 100 (which are all the possible numbers of times the coin could come up heads), and all of these values will sum to 1.
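If you want to check that intuition, the binomial probabilities are easy to compute directly. A quick sketch using only the standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k heads in n flips of a coin with heads-probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The single most likely outcome of 100 fair flips is still only ~8% likely.
print(round(binom_pmf(50, 100, 0.5), 4))

# The probabilities over all outcomes 0..100 sum to 1, as a distribution must.
print(sum(binom_pmf(k, 100, 0.5) for k in range(101)))
```

So even the “expected” outcome of exactly 50 heads is individually improbable; it’s just far more probable than any other single count.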

Now let’s go one step further. Let’s imagine you have a big bag of different coins, all with different weights. Let’s imagine we grab a bunch of coins out of the bag and then flip them. How can we model the distribution of the number of times those coins will come up heads?

First, we need to think about the distribution of possible weights the coins have. Let’s imagine we line up the coins from the lowest weight to the highest weight, and stack coins with the same weight on top of each other. The relative “heights” of each stack tell us how likely it is that we grab a coin with that weight.

Now we basically have something called the beta distribution, which is a family of distributions that tell us how likely it is we’ll get a number between 0 and 1. Beta distributions are very flexible, and they can look like any of these shapes and almost everything in between:

Taken from Bruce Hardie:


So if you had a bag like the upper left, most of the coins would be weighted to come up tails, and if you had a bag like the lower right, most of the coins would be weighted to come up heads; if you had a bag like the lower left, the coins would either be weighted very strongly to come up tails or very strongly to come up heads.

This distribution is called the beta-binomial.
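Here’s a small simulation of that bag-of-coins story (the shape parameters are arbitrary): draw a coin’s weight from a beta distribution, then flip it; across many such draws, the heads counts follow a beta-binomial.

```python
import random

random.seed(0)

def beta_binomial_draw(n_flips, alpha, beta):
    """Draw one coin weight from Beta(alpha, beta), then flip it n_flips times."""
    p = random.betavariate(alpha, beta)
    return sum(random.random() < p for _ in range(n_flips))

# With alpha = beta = 2 the weights cluster near 0.5, so heads counts
# cluster near n_flips / 2, but with extra spread from the varying weights.
draws = [beta_binomial_draw(10, 2, 2) for _ in range(10000)]
print(sum(draws) / len(draws))  # close to 5, the expected value
```

The extra step of drawing the weight is what makes the beta-binomial more spread out than a plain binomial with a single fixed p.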

Model set up

You might now be seeing where this is going. While we can’t observe individuals’ voting behavior (other than whether or not they voted), we can look at the tallies at local levels, like counties. And let’s say, some time before the election, you lined up every voter in a county and stacked them the same way you did with the coins before, but instead of the probability of “coming up heads”, you’d be looking at a voter’s probability of voting for one of the two major candidates. That would look like a beta distribution. You could then model the number of votes for a particular candidate in a particular county as a beta-binomial distribution.

So in our model we can say the number of votes V[i] in county i is distributed beta-binomial with N[i] voters and voters with p[i] propensity to vote for that candidate:

V[i] ~ binomial(p[i], N[i])

But we’re keeping in mind that p[i] is not a single number but a beta distribution with parameters alpha[i] and beta[i]:

p[i] ~ beta(alpha[i], beta[i])

So now we need to talk about alpha and beta. A beta distribution needs two parameters to tell you what kind of shape it has. Commonly, these are called alpha and beta (I know, it’s confusing to have the name of the distribution and one of its parameters be the same), and the way you can think about it is that alpha “pushes” the distribution to the right (i.e. in the lower right above) and that beta “pushes” the distribution to the left (i.e. in the upper left above). Both alpha and beta have to be greater than zero.

Unfortunately, while this helps us understand what’s going on with the shape of the distribution, it’s not a useful way to encapsulate the information if we were to talk about voting behavior. If something (say unemployment) were to “push” the distribution one way (say having an effect on alpha), it would also likely have an effect on beta (because they push in opposite directions). Ideally, we’d separate alpha and beta into two unrelated pieces of information. Let’s see how we can do that.

It’s a property of the beta distribution that its average is:

alpha / (alpha + beta)

So let’s just define a new term called mu that’s equal to this average:

mu = alpha / (alpha + beta)

And then we can define a new term phi like so:

phi = alpha + beta

With a few lines of arithmetic, we can solve for everything else:

alpha = mu * phi
beta = (1 - mu) * phi

And if alpha is the amount of “pushing” to the right and beta is the amount of “pushing” to the left in the distribution, then phi is all of the pushing (either left or right) in the distribution. This is a sort of “uniformity” parameter. Large values of phi mean that almost all of the distribution is near the average (think the upper right beta distribution above) – the alpha and beta are pushing up against each other – and small values of phi mean that almost all the values are away from the average (think the beta distribution on the lower left above).

In this parameterization, we can model propensity and polarization independently.
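The change of variables is just two lines in each direction; a quick sketch:

```python
def to_alpha_beta(mu, phi):
    """Convert mean/concentration (mu, phi) to the usual beta parameters."""
    return mu * phi, (1 - mu) * phi

def to_mu_phi(alpha, beta):
    """Convert back: mu is the mean, phi is the total amount of 'pushing'."""
    return alpha / (alpha + beta), alpha + beta

alpha, beta = to_alpha_beta(0.3, 10)
print(alpha, beta)             # alpha = 3.0, beta = 7.0 (up to float rounding)
print(to_mu_phi(alpha, beta))  # recovers (0.3, 10.0)
```

Same distribution either way; the (mu, phi) coordinates just let a predictor move the average without also dragging the spread around.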

So now we can use county-level information to set up regressions on mu and phi – and therefore on the county’s distribution of voters, and how they ended up voting. Since mu has to be between 0 and 1, we use the logit link function, and since phi has to be greater than zero, we use the log link function:

logit(mu[i]) = linear function of predictors in county i
log(phi[i]) = linear function of predictors in county i

The “linear functions of predictors” have the format:

coef[uninsured] * uninsured[i] + coef[unemployment] * unemployment[i] + ...

Where uninsured[i] is the uninsurance rate in that county and coef[uninsured] is the effect that uninsurance has on the average propensity of voters in that county (in the first equation) or the polarity/centrality of the voting distribution (in the second equation).
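Here’s a sketch of how those link functions map an unconstrained linear predictor into each parameter’s legal range. The coefficients and county values below are invented for illustration; they are not the fitted model:

```python
from math import exp

def inv_logit(x):
    """Map any real number into (0, 1) -- the inverse of the logit link."""
    return 1 / (1 + exp(-x))

# Hypothetical coefficients and county predictors, for illustration only.
coef_mu = {"intercept": -0.5, "uninsured": -2.0}
coef_phi = {"intercept": 2.0, "uninsured": 0.5}
county = {"intercept": 1.0, "uninsured": 0.15}

eta_mu = sum(coef_mu[k] * county[k] for k in coef_mu)    # linear predictor
eta_phi = sum(coef_phi[k] * county[k] for k in coef_phi)

mu = inv_logit(eta_mu)   # always lands in (0, 1)
phi = exp(eta_phi)       # always lands in (0, inf)
print(mu, phi)
```

However large or negative the linear predictors get, the inverse links guarantee mu stays a valid average propensity and phi a valid concentration.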

For each county, I extracted nine pieces of information:

  • The proportion of residents that do not have insurance
  • The rate of unemployment
  • The rate of diabetes (a proxy for overall health levels)
  • The median income
  • The violent crime rate
  • The median age
  • The gini coefficient (an index of income heterogeneity)
  • The rate of high-school graduation
  • The proportion of residents that are white

Since each of the above pieces of information had two coefficients (one each for the equations for mu and phi) the model I used had twenty parameters against 3111 observations.

The source for the data is the same as in this post, and is available and described here.

The BUGS model code is in the file county_binom_model.bugs.R (all of the code is available here).

Model results / validation

The model performs very well on first inspection, especially when we take the log of the actual votes and the prediction (upper right plot), and even more so when we do that and restrict it only to counties with greater than 20,000 votes (lower left plot):


This is actually cheating a bit, since the number of votes for HRC (which the model is fitting) in any county is constrained by the number of votes overall. Here’s a plot showing the estimated proportion vs. the actual proportion of votes for HRC, weighted by the number of votes overall:


Here is the plot of coefficients for mu (the average propensity within a county):


All else being equal, coefficients to the left of the vertical bar helped Trump, and to the right helped Clinton. As we can see, since more Democratic support is concentrated in dense urban areas, there are many more counties that supported Trump, so the intercept is far to the left. Unsurprisingly (but perhaps sadly) whiteness was the strongest predictor overall and was very strong for Trump.

In addition, the rate of uninsurance was a relatively strong predictor for Trump support, and diabetes (a proxy for overall health) was a smaller but significant factor.

Economic factors (income, gini / income inequality, and unemployment) were either not a factor or predicted support for Clinton.

The effects on polarity can be seen here:


What we can see here (as the intercept is far to the right) is that most individual counties have a fairly uniform voter base. High rates of diabetes and whiteness predict high uniformity, and basically nothing except for income inequality predicts diversity in voting patterns (and this is unsurprising).

What is also striking is that we can map mu and phi against each other. This is a plot of “uniformity” – how similar voting preferences are within a county vs. “propensity” – the average direction a vote will go within a county. In this graph, mu is on the y axis, and log(phi) is on the x axis, and the size of a county is represented by the size of a circle:


What we see is a positive relationship between support for Trump and uniformity within a county and vice versa.

And if you’re interested in Bayesian inference using Gibbs sampling, here are the trace plots for the parameters to show they converged nicely: mu trace / phi trace.

Conclusion and potential next steps

This modeling approach has the advantage of closely approximating the underlying dynamics of voting, and the plots showing the actual outcome vs. predicted outcome show the model has pretty good fit.

It also shows that whiteness was a major driver of Trump support, and that economic factors on their own were decidedly not a factor in supporting Trump. If anything, they predicted support for Clinton. It also provides an interesting way of directly modeling unit-level (in this case, county-level) uniformity / polarity among the electorate. This approach could perhaps be of use in better identifying “swing counties” (or at least a different approach in identifying them).

This modeling approach can be extended in a number of interesting ways. For example, instead of using a beta-binomial distribution to model two-way voting patterns, we could use a Dirichlet-multinomial distribution (basically, the extension of the beta-binomial to more than two possible outcomes) to model voting patterns across all candidates (including Libertarian and Green), and even flexibly model turnout by including not voting as an outcome in the distribution.
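Python’s standard library has no Dirichlet sampler, but one can be built from gamma draws; here’s a sketch of how a multi-candidate county vote might be simulated under that extension (the concentration parameters are made up):

```python
import random

random.seed(0)

def dirichlet(alphas):
    """Sample from Dirichlet(alphas) by normalizing independent gamma draws."""
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def county_votes(n_voters, alphas):
    """Draw candidate-share probabilities, then allocate each vote."""
    probs = dirichlet(alphas)
    counts = [0] * len(alphas)
    for _ in range(n_voters):
        r, cum = random.random(), 0.0
        for i, p in enumerate(probs):
            cum += p
            if r < cum:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against float rounding at the boundary
    return counts

# Four outcomes: Democrat, Republican, third party, stayed home.
print(county_votes(1000, [4.0, 4.0, 0.5, 1.5]))
```

The beta-binomial is exactly the two-outcome special case of this, so the regression structure on mu and phi carries over with a vector of shares in place of a single propensity.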

We could build similar regressions for past elections and see how coefficients have changed over time.

We could even match voting records across the ’12 and ’16 elections to make inferences about the components of the county-level vote swing: voters flipping their vote, voting in ’12 and not voting in ’16, or not voting in ’12 and then voting in ’16 – and which candidate they came to support.

The importance of model thinking

The other day, I got into a weird argument with my cousin (who is among the smartest people I know). We were discussing Sam Wang, the Princeton professor who runs the Princeton Election Consortium (PEC). Of the well-known prognosticators about the election, Dr. Wang was the most wrong, with his site estimating the probability of an HRC win at over 99%.

My cousin was arguing that the results eliminated Dr. Wang’s credibility in this field and that we basically shouldn’t be listening to him any longer. Because he had been so spectacularly wrong, why should he be trusted again?

But this is wrong, and why it’s wrong is important in the discourse of ideas. First, Dr. Wang wasn’t reporting that he himself was estimating these odds for HRC, he was reporting that his model was outputting these estimates. This is an important distinction. He may have been convinced by the statistical model he was referring to, and he may have also believed its reported estimates, but what’s important for these purposes is that he was reporting the results of an independent model, not simply saying that’s what he believed.

Certainly, the model that the PEC had been using has lost its credibility. We now know that it didn’t properly incorporate correlated error in the outcomes at the state level (e.g. a miss in PA making a miss in MI and WI more likely), and it underestimated the distribution of overall polling bias. We shouldn’t use it again.

But what if Dr. Wang creates a new model that corrects for these mistakes? How now should we take my cousin’s advice to disregard Dr. Wang? Do we not even bother with the new model since the source is tainted by the previous election results? Do we inspect the model independently?

We can see here that my cousin’s advice doesn’t make much sense if you treat the model and its author separately. Clearly the new model must be treated on its own merits.

But this gets at a deeper question. What is a recommendation, a forecast, an estimate, an analysis, etc., without a model? The answer is that there is always a model, because there is always some kind of computation that leads to the end result, even if that computation is taking place entirely within the neural circuitry of the analyst. In these cases, when people simply come to their own conclusions, the author and the model are one and the same. There are no equations, parameters, logical relations, etc., that observers can evaluate to see if the specification does or does not make sense.

If Dr. Wang had not made his model explicit and had simply been reporting his own estimates, then my cousin perhaps would have been right. In this world, the logic would go something like this: his (Dr. Wang’s) model turned out to be bad, and his model was him, so disregarding his model and disregarding him are one and the same.

But this of course was not the case, and this is why it is so important to think in terms of explicit models. If you don’t have a model in mind when facing something in the real world, it’s not even clear to me how you update your knowledge, aside from adding an additional memory to your bank of heuristics. When you understand how a model functions – the relationships between its several parts – you can adapt and improve it in the face of real-world experience.


Three reasons to be a raving lunatic about Trump

The other night, my mind was literally 💥 when two of my very smart friends challenged me on the idea that Trump being elected president was the worst thing to happen in 2016. To me it’s an absolute no-brainer, and a conclusion that even a little imagination, combined with some historical knowledge, would usher you to.

I guess in part it depends on how bad you think things could get; I think the worst cases are so bad that they’ve driven me to become a raving lunatic. Here’s why I think you should become one too:

International Conflict

This one is especially salient given the murder of the Russian ambassador to Turkey.

Matthew White calls the beginning of the 20th Century the “Hemoclysm” – literally, blood flood – because of the staggeringly large conflicts and loss of life. WWI, WWII, the Holocaust, the Great Purge, and two nuclear bombs all happened during this time. How did this era begin? As Steven Pinker puts it in The Better Angels of our Nature:

The war was a perfect storm of destructive currents, brought suddenly together by the iron dice of Mars: an ideological background of militarism and nationalism, a sudden contest of honor that threatened the credibility of each of the great powers, a Hobbesian trap that frightened leaders into attacking before they were attacked first, an overconfidence that deluded each of them into thinking that victory would come swiftly, military machines that could deliver massive quantities of men to a front that could mow them down as quickly as they arrived, and a game of attrition that locked the two sides into sinking exponentially greater costs into a ruinous situation – all set off by a Serbian nationalist who had a lucky day.

We live in a fragile world. The complexities of international diplomacy are enormous, and the consequences can be more severe than we can imagine. And Trump has shown himself to be quite willing to fly by the seat of his pants on things like the One China policy, our support of NATO, and just bombing the shit out of wherever.

The President’s #1 job is to keep America safe. This is not as simple as battening down the hatches and putting “America First”; it involves real, nuanced thinking about how to carefully deploy threats, when to back them up, and when to offer the olive branch. Are we going to get anything close to that with Trump? Don’t tell me he has good advisers – all the easy decisions get made before they reach the Oval Office.

Norms of American Democracy

Although we have a Constitution that is the ultimate source of law in this country, most of what makes our system of government actually so great is that our leaders respect norms regarding the use and transfer of power. When incumbents lose, they leave office! When candidates lose, they concede! Until, maybe, now.

In addition, candidates for higher office have been careful to make clear that they have no conflicts of interest (say, by releasing their tax returns) and that they are acting in the interest of the country (at least according to their worldview).

Trump has done the exact opposite of all this. He refused to say he’d concede in the event of a loss. He hasn’t released his tax returns. His businesses present a bewildering array of conflicts of interest. He’s installing his family (who are managing his businesses) as his closest confidantes.

Now, putting aside whether any of this is illegal, it certainly screws up the incentives in our country. The way to succeed becomes to curry favor with the government. This is how countries become Russia – a sham democracy and a corrupt kleptocracy.

And then of course there’s the lying. So much lying. Like, for instance, that he had one of the largest electoral college wins (his was one of the smallest); or that murder rates are the highest they’ve been in decades (they’re among the lowest). Politicians lie, have always done so, and will always do so. But – at least in the USA – their lies have at least maintained respect for the truth, couched as they are in euphemism or misdirection. Trump has no respect for the truth. His lies are so easy to fact-check that it’s hard to escape the impression that, for him, lies like these are as much a demonstration of domination (“look how easily and bigly I can lie and you can’t do anything about it”) as anything else.

Degrading the norms of our republic – as Trump has done and seems to be intent on continuing to do – is one way to start ending the great American experiment in democracy.

Climate Change

While certainly not as sexy a way to go out as nuclear war, climate change poses a similar kind of existential threat. We need a president who can at least recognize the science and the tradeoffs that come with different forms of climate policy. Instead we have someone who has dismissed it as a “Chinese hoax”, who puts a climate denier in charge of the EPA, and who hands the Department of Energy – whose original mandate is to safeguard our nuclear stockpile – to a man who once forgot, in a nationally televised debate, that it was the third federal department he wanted to eliminate. What kind of thinking puts a man famous for saying “oops” in charge of that department?

Putting it all together

Maybe “raving lunatic” wasn’t the right term – you have to be taken seriously at the end of the day. But we should be uncomfortable about all this. We can’t let our vigilance slide and allow all of it (the tax returns, the family posse rolling up to the White House, the shattering of stable alliances, the abuses on Twitter, the lying, etc.) to feel normal. Like the proverbial frog in the pot of increasingly hot water, we can’t just let our government crumble around us and say “this is fine”. It’s exhausting, but we can’t get tired of calling out Trump and his enablers. As Voltaire said:

Those who can make you believe absurdities, can make you commit atrocities.