Tales of the Marvellous and news of the Strange
There are some tasks that computers are good at, but not everything. For instance, tasks that require imagination, or rare events: Artificial Intelligence doesn’t seem to be good at dealing with the new or strange. Yet we have been telling ourselves “Tales of the Marvellous and News of the Strange” for a thousand years or more. I think the key to a good story is that it is surprising and not predictable, but at the same time believable, perhaps even inevitable. A good story has to “make sense”.
I’ve been playing around with text analysis, because some people seem to be making progress analysing storylines and plots. Matthew Jockers and Jodie Archer have written a book suggesting that their computer model can predict bestselling stories with 80% accuracy.
Using 2,799 features that appear in novels, they think they can predict which books will become bestsellers. It’s a bit more complicated than this, because they actually use three methods: K Nearest Neighbour (KNN), which works 90% of the time; Support Vector Machines (SVM), 70% of the time; and Nearest Shrunken Centroids (NSC), 79% of the time. Averaged across the three methods, that comes to roughly 80% across every manuscript in their corpus.
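As a quick check on the headline figure, here is the simple mean of the three reported accuracies. It assumes the 80% is an unweighted average of the three methods, which is how the figure is described here rather than something taken from the book’s own code.

```python
# Reported accuracies for the three classifiers, as quoted above
accuracies = {"KNN": 0.90, "SVM": 0.70, "NSC": 0.79}

# Unweighted mean across methods (assumption: the headline 80% is a plain average)
mean_accuracy = sum(accuracies.values()) / len(accuracies)
print(f"Mean accuracy: {mean_accuracy:.1%}")
```

The mean is just under 80%, and the 70% to 90% spread between methods is hidden by the single headline number.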
From bestsellers to Annual Reports
If you can do this for bestselling books, it might be possible to do the same for the Management Discussion sections of Annual Reports. One of my friends is working on this for bond prospectuses, and (by coincidence) he says his model also flags issues with 80% accuracy. Certainly, Natural Language Processing seems to be one area where computers are making good progress.
But before we get too excited, I’ve noticed a problem with these models. It goes back to a Nonconformist clergyman born in 1701: the Reverend Thomas Bayes. Between 1746 and 1749, Bayes was trying to work out a systematic way of dealing with uncertainty. He suggested that, when dealing with uncertainty, it was reasonable to make a guess and then refine that guess later. So he imagined himself with his back to a table, throwing an object over his shoulder so that it had the same chance of landing on any spot of the table. As he repeated the process, throwing objects over his shoulder again and again, he could modify his initial belief (his prior) with new data (where his most recent throw landed), giving a new, improved belief.
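A minimal sketch of that guess-then-refine process. It assumes a common telling of the thought experiment in which the table is the interval 0 to 1, a first ball lands at an unknown position, and each later throw only reports whether it landed to the left of that first ball. With a uniform prior this is the standard Beta-Binomial update; the function name, seed and simulated throws are hypothetical.

```python
import random


def update_position_belief(left_count: int, right_count: int) -> tuple[float, float]:
    """Posterior mean and variance for the first ball's position p (table width 1).

    Assumes a uniform prior on p, i.e. Beta(1, 1), and that each later throw lands
    uniformly at random, so P(throw lands left of the first ball) = p.
    The posterior is then Beta(1 + left_count, 1 + right_count).
    """
    a = 1 + left_count
    b = 1 + right_count
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, variance


random.seed(1701)                 # hypothetical seed, only so the run is repeatable
true_position = random.random()   # where the first ball landed (unknown to Bayes)

left = right = 0
for throw in range(1, 101):
    # Each new throw tells us only "left" or "right" of the first ball
    if random.random() < true_position:
        left += 1
    else:
        right += 1
    if throw in (1, 10, 100):
        mean, var = update_position_belief(left, right)
        print(f"after {throw:3d} throws: estimate {mean:.3f} (sd {var ** 0.5:.3f})")

print(f"true position: {true_position:.3f}")
```

Each throw nudges the estimate, and the standard deviation shrinks as the data accumulate: the prior is gradually replaced by evidence.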
Bayes’ thought experiment assumed equal probabilities of the object landing anywhere. But you don’t have to assume equal probabilities if you have a good reason for a different prior. For instance, if an event is very unlikely (say 1 in 1,000), then unless your model is close to perfect, that event is still going to be unlikely even after the model flags it.
A university friend who is a Hollywood producer told me that she hadn’t heard of anyone in Hollywood using computers to analyse screenplays. When I knew her at university she was interested in this: she was studying psychology and tried to analyse systematically what makes a good story, but gave up because it was too hard. The reason (I think) is that however good your model, a blockbuster hit is still unlikely.
80% effective is still unlikely
“80% effective” is ambiguous; it could mean two things. Does it mean that the computer identifies 8 of the 10 bestselling books each year and avoids flagging any books that are not on the bestseller list? Or (more likely) does it identify 8 of the 10, but also flag many other books that score well on its system but don’t go on to be bestsellers?
Or, put another way, 80% of bestsellers are good stories, but very few good stories go on to be bestsellers. This is the problem Bayes was dealing with. The probability of a four-legged animal being a horse (rather than a dog, cat, iguana or field mouse) is lower than the probability of a horse being a four-legged animal.
Bayes’ Theorem
Suppose that I am clever enough to write a computer program that can identify good stories. When I test my software, it flags 80% of the titles on the bestseller list. But this is not the whole story. There are only 10 bestsellers a year, but every year publishers receive 10,000 stories from hopeful authors. No one sets out to spend years writing a bad story, but only a very few go on to be bestsellers. Many non-bestsellers are good stories too, perhaps half of them. My prior is 10 bestselling books out of 10,000: it’s like looking for a needle in a haystack. How much does my software help me answer the question “what is the probability that a book the model flags as a good story turns out to be a bestseller?”
The answer is given by Bayes’ Theorem.
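Writing B for “is a bestseller” and F for “is flagged as a good story”, and using the figures from the paragraph above (a prior of 10 in 10,000, the model flagging 80% of bestsellers, and treating the “perhaps half” of non-bestsellers that are good stories as the share the model also flags), the calculation is:

```latex
P(B \mid F) = \frac{P(F \mid B)\,P(B)}{P(F \mid B)\,P(B) + P(F \mid \neg B)\,P(\neg B)}
            = \frac{0.8 \times 0.001}{0.8 \times 0.001 + 0.5 \times 0.999}
            = \frac{0.0008}{0.5003} \approx 0.0016
```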
My 10 in 10,000 has increased, but only slightly, despite the model being 80% effective at identifying the good stories on the bestseller list. Given the very low prior, the probability only rises from 0.1% to 0.16%. The vast majority of good stories do not go on to be bestsellers.
Perhaps the assumption that half of the non-bestsellers are good stories is harsh. Maybe it’s fewer, but I find it hard to believe that the technology is going to identify just the bestsellers, without flagging many more potential bestsellers that never make it.
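To test how much that harsh assumption matters, here is a small Python sketch of the same calculation. It keeps the 10-in-10,000 prior and the 80% hit rate from the text and varies the share of non-bestsellers that the model also flags; the lower shares are hypothetical values, not figures from Jockers and Archer.

```python
def posterior_bestseller(prior: float, hit_rate: float, false_flag_rate: float) -> float:
    """P(bestseller | flagged) by Bayes' Theorem.

    prior           -- P(bestseller), e.g. 10 / 10_000
    hit_rate        -- P(flagged | bestseller), the "80% effective" figure
    false_flag_rate -- P(flagged | not a bestseller), an assumption
    """
    flagged_and_best = hit_rate * prior
    flagged_and_not = false_flag_rate * (1 - prior)
    return flagged_and_best / (flagged_and_best + flagged_and_not)


PRIOR = 10 / 10_000
HIT_RATE = 0.8

# 0.5 is the text's "perhaps half"; the smaller values are hypothetical.
for false_flag_rate in (0.5, 0.1, 0.01, 0.001):
    p = posterior_bestseller(PRIOR, HIT_RATE, false_flag_rate)
    print(f"false flag rate {false_flag_rate:>6}: P(bestseller | flagged) = {p:.2%}")
```

With a false flag rate of 0.5 this reproduces the 0.16%. Even if the model wrongly flagged only 1 in 1,000 non-bestsellers, fewer than half of its picks would be bestsellers: a low prior is hard to overcome.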
I’m interested in this because the language around probability is really rather hard. Psychologists argue about the meaning of the word “plausible”. Probability is sometimes about frequency (an unbiased coin will land heads up roughly half the time). But sometimes it isn’t; instead, a probability describes a state of knowledge. Good stories are about unusual or infrequent events that nonetheless “ring true”.
Stories are our way of dealing with nature’s lack of uniformity
One storytelling technique is to drop suggestive hints throughout the plot that lead the reader to a wrong conclusion. In detective stories the murderer is never obvious. In Game of Thrones, George R. R. Martin reverses this convention: the character who gets murdered is usually one of the least likely. The whole framework of interpretation shifts, and in hindsight the outcome seems incredibly obvious.
Stories are our way of dealing with the fact that the future does not always resemble the past, and that nature is not as uniform as we might come to expect. As Bertrand Russell explained:
“Domestic animals expect food when they see the person who usually feeds them. We know that all these rather crude expectations of uniformity are liable to be misleading. The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken.”
Nature is marvellous and strange…and often somewhat cruel.
Or you could put this numerically:
1 + 4 = 5
2 + 5 = 12
3 + 6 = 21
8 + 11 = ?
Most of my mathematician friends on Facebook think that the answer is 96, but I think that it is 40. Either way, you have a reasonable basis for believing either number. But as the series progresses, new information may totally undermine your prior, and then the new solution looks obvious. (See the explanation at the bottom if you don’t think that either number is obvious.)
Great investment stories are like this. Unilever, which has been a 100 bagger over the years, seems really obvious in retrospect. Another example is Teledyne, run by Henry Singleton with Claude Shannon on its board, which was a 180 bagger between 1963 and 1990. It’s rare to find increases in value like this, but when you look at the companies that actually do perform like this, they don’t seem to be particularly risky, just following a credible strategy. Their success often looks inevitable in retrospect, but was hard to spot at the time. That isn’t supposed to be possible according to economists who believe in Efficient Market Theory.
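To put a “180 bagger” in annual terms, here is a minimal sketch of the compound annual growth rate implied by a given multiple over a given number of years. It uses only the figures quoted above (180x from 1963 to 1990) and ignores dividends and exact start and end dates, so treat the result as approximate.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Compound annual growth rate that turns 1 into `multiple` over `years`."""
    return multiple ** (1 / years) - 1


# Teledyne: roughly a 180 bagger from 1963 to 1990 (27 years), per the text
teledyne = implied_annual_growth(180, 1990 - 1963)
print(f"Teledyne: about {teledyne:.1%} a year")
```

A growth rate a little above 20% a year, held for almost three decades, is enough to produce a 180x multiple.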
This has never happened before
Shannon and Singleton were both mathematicians who appeared to follow simple rules of thumb (buy fast-growing companies that generate cash flow and have high market share). Rather than being used for High Frequency Trading, perhaps computers can learn to assess priors and make buy-and-hold decisions that outperform over decades, as Teledyne and Unilever did.
We do know that Teledyne used Bayes’ Theorem in its consulting work. It was asked to estimate the probability of the Space Shuttle crashing. Before the Challenger disaster in 1986, NASA estimated the risk of an accident at 1 in 100,000. But even before the accident there were doubts about the shuttle’s safety, and sceptics outside NASA suspected that the agency was not being honest with itself about the risks of space flight. Three years earlier, the US Air Force had paid Teledyne to estimate the probability of failure.
Teledyne was estimating the probability of an event that had never happened before. Starting from prior experience of 32 confirmed failures in 1,902 rocket motor launches, its analysts applied this to the space shuttle. Teledyne estimated the probability of a rocket booster failure as 1 in 35, far higher than NASA’s figure of 1 in 100,000. Sadly, in 1986, on the shuttle programme’s 25th launch, Challenger exploded, killing its 7 crew. The point is that when things go wrong, management often like to absolve themselves of responsibility with the claim that “this had never happened before…no one could have foreseen this.” Remember the Goldman Sachs CFO claiming in August 2007 that they were seeing 25 sigma events, several days in a row.
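A rough sketch of why the gap between the estimates matters. It computes the naive frequency from 32 failures in 1,902 launches, then the chance of at least one failure across 25 launches under each per-launch estimate. It assumes launches are independent with a constant failure probability; Teledyne’s actual method, which produced 1 in 35 rather than the raw frequency, is not reproduced here.

```python
# Naive failure frequency from the prior rocket-motor record quoted in the text
raw_rate = 32 / 1902


def p_at_least_one_failure(p_per_launch: float, launches: int) -> float:
    """Chance of one or more failures, assuming independent launches."""
    return 1 - (1 - p_per_launch) ** launches


estimates = {
    "raw frequency (32 in 1,902)": raw_rate,
    "Teledyne (1 in 35)": 1 / 35,
    "NASA (1 in 100,000)": 1 / 100_000,
}
for label, p in estimates.items():
    print(f"{label:28s} per launch 1 in {1 / p:,.0f}; "
          f"P(failure within 25 launches) = {p_at_least_one_failure(p, 25):.2%}")
```

Under the 1 in 35 figure, a failure somewhere in the first 25 flights was roughly a coin toss; under NASA’s figure it was about 1 in 4,000.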
You can use Bayes’ Theorem to think about risky events that have not yet happened (a space shuttle crash, a High Frequency Trading blow-up). But the other side of that coin is that we should also be able to use Bayes’ Theorem to think about rare events that are extremely positive: for instance 100 baggers, which do exponentially better than “normal” companies. It’s hard to have a high level of confidence about infrequent events. Hard, but not impossible. To me it’s far more interesting (and profitable) to focus on the outliers rather than “average” returns. Financial distributions are not normal; they follow power laws. So there are ways of analysing rare and surprising events that are also coherent. I think that there are heuristics for understanding how improbably successful companies and stories “make sense”.
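To see why a “25 sigma” day says more about the choice of distribution than about the market, here is a sketch comparing the one-sided tail probability under a normal distribution with a hypothetical power-law (Pareto) tail. The exponent of 3 and the tail starting at 1 sigma are illustrative assumptions, not fitted to any data.

```python
import math


def normal_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def power_law_tail(k: float, alpha: float = 3.0, x_min: float = 1.0) -> float:
    """P(X > k) for a Pareto tail P(X > x) = (x / x_min) ** -alpha, x >= x_min.

    alpha=3 and x_min=1 are hypothetical, illustrative choices.
    """
    return (k / x_min) ** -alpha


for k in (3, 5, 10, 25):
    print(f"{k:>2} sigma: normal {normal_tail(k):.1e}, power law {power_law_tail(k):.1e}")
```

Under the normal curve a 25 sigma move should essentially never happen; under a fat, power-law tail it is rare but far from impossible.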
Sources:
The Bestseller Code: Anatomy of the Blockbuster Novel – Matthew Jockers and Jodie Archer.
The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines and Emerged Triumphant – Sharon Bertsch McGrayne
The Outsiders: Eight Unconventional CEOs and Their Radically Rational Blueprint for Success – William Thorndike
Fortune’s Formula: The Untold Story of the Scientific Betting System That Beat the Casinos and Wall Street – William Poundstone
8 + 11 = ?
? could be 8 + (8 × 11) = 96, reading each line as a + (a × b), but it could also be 21 + 8 + 11 = 40, adding each new pair to the previous answer.
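Both readings can be checked mechanically. This sketch encodes the two rules (the function names are just illustrative labels), confirms that each reproduces the three given lines, and shows where they diverge.

```python
pairs = [(1, 4), (2, 5), (3, 6), (8, 11)]
given = [5, 12, 21]  # answers shown for the first three lines


def rule_product(a: int, b: int) -> int:
    """a + (a x b): 1+4 -> 5, 2+5 -> 12, 3+6 -> 21."""
    return a + a * b


def rule_running_total(pairs: list[tuple[int, int]]) -> list[int]:
    """Add each new pair to the previous answer: 0+1+4, 5+2+5, 12+3+6, ..."""
    total, answers = 0, []
    for a, b in pairs:
        total += a + b
        answers.append(total)
    return answers


product_answers = [rule_product(a, b) for a, b in pairs]
running_answers = rule_running_total(pairs)

# Both rules agree with the evidence so far...
assert product_answers[:3] == given and running_answers[:3] == given
# ...and disagree on the next line.
print(product_answers[-1], running_answers[-1])  # 96 40
```

Three data points cannot distinguish the two rules; only the fourth line does.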