💡A short history of knowledge technologies
How does OpenAI's latest language model, GPT-3, compare to previous innovations?
Azeem Azhar | Jul 21
This is a subscriber-only post which I am making open access until July 25th. Please consider subscribing.
How should we think about OpenAI’s new language model, GPT-3?
The model, released with a very simple API that allows even people who write code at my level to play with it*, is getting people breathless, probably with good reason. (I also recommend that you read a typically excellent overview of GPT-3 and community reaction to it from Gideon Lichfield’s Technology Review. This also touches on a number of key criticisms around bias, and draws attention to various technical limitations.)
Technology helps us manipulate matter, energy or information. Today, courtesy of computation, we tend to start our experiments with manipulating information.
Written records allow us to stand on the shoulders of giants—and, crucially, bring tools of criticism and scepticism to established beliefs.
Software has eaten the world and all that. Experiment in silico before you build in vivo. Information matters more than matter matters. During the Exponential Age we will do remarkable things with matter and energy, but manipulating information will be critical to doing any of those things.
Five thousand years ago, cuneiform script helped us coalesce into larger and larger cities. Thousands of years earlier, tally sticks had let us transmit our experiences even when we weren’t there.
But Gutenberg’s printing press, likely dating from 1449, transformed the nature and scale of knowledge, and our relationship to it. His innovation in Mainz is well-studied: it dramatically reduced the cost of producing books.
The chart below shows the decline in the price of books in the Netherlands. The first printing press arrived there in 1473.
More copies of individual books were printed. Topics moved beyond the Bible, religious encyclicals and the occasional Latin grammar. Dramatic variegation in subjects followed, and the number of people reading books increased. In Germany alone, book consumption increased about 100-fold in the three centuries after that first poem was printed by Gutenberg.
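As a back-of-the-envelope check on that 100-fold figure, the sketch below converts it into an implied yearly growth rate. It assumes smooth compound growth over exactly 300 years, which is a simplification (real growth was certainly uneven), and the function names are illustrative only.

```python
import math


def implied_annual_growth(fold_increase: float, years: float) -> float:
    """Constant yearly growth rate that compounds to `fold_increase` over `years`."""
    return fold_increase ** (1.0 / years) - 1.0


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound `annual_rate`."""
    return math.log(2) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Assumption: 100-fold growth spread evenly over three centuries.
    rate = implied_annual_growth(100, 300)
    print(f"Implied growth: {rate:.2%} per year")
    print(f"Doubling time: {doubling_time(rate):.0f} years")
```

Under those assumptions the growth works out to roughly 1.5% a year, or a doubling about every 45 years: slow by today's standards, but sustained for three centuries.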
Growth in book output drove literacy rates and increased the number of people writing books (and other things). But the costs of accessing the knowledge in printed books remained high. There were many gatekeepers to knowledge production (the publishing process, peer review). Accessing books was expensive (to buy) or time-consuming (to find in libraries). Much as I fondly remember the reading rooms of the Bodleian, finding what you wanted involved much faffing about. Today, more than 1.5m new books are published annually.
The Internet represented another step forward in developing our access to knowledge, by reducing the cost of distributing it. LISTSERVs, Usenet groups and FTP archives full of LaTeX documents were some of the major milestones in the latter years of the Internet’s academic development. This increased the dispersion in the utility of information. Because anyone could publish, anyone did.
Accessing this was also easier than using printed libraries. Running an FTP session to http://xxx.lanl.gov, the site of the first preprint server, was easier than slogging through a library.
However, discovery became a bigger problem. Search engines began to tackle this. They indexed the information on the Web to make it ‘universally accessible to all’. Rather than trying to remember fifty different websites and their particular archives, I only needed to remember Google.com. Over time, Google got better at figuring out what I needed and at letting me put in more and more sophisticated searches, in progressively more normal language.**
But the answers we got were incomplete. Queries didn't give us answers. They gave us documents. We then needed to do the work. To assemble an answer or get a job done, we’d need to choose one or more results, read them, extract what we wanted and then go and act on this reconstituted knowledge.
Search engines tried to go one better. They introduced “oneboxes”, separate panels which give specific answers to queries. While increasingly automated, the logic behind oneboxes is hand-coded. (See the example below.)
Microsoft acquired Powerset, a natural language processing startup I had invested in, which had the technology to power oneboxes.
Creating a generalisable tool for actually answering the question you wanted answering was hard. People believed it required lots of knowledge about the world to be explicitly encoded in specific ways. The Cyc project (now Cycorp) tackled this area for 30 years before launching commercially.
I was briefly an advisor at TrueKnowledge which built similar technologies. One of our demos from 2008 is below (Amazon’s Alexa team subsequently acquired TrueKnowledge).
But the problem remains. Even though search engines are much better than they ever have been, the hard work of turning that knowledge into what we want to get done is left to us. Whether it is figuring out the padding syntax for CSS or summarising the transcript of a YouTube podcast, or cribbing from Wikipedia***… it is left to us.
One key advantage of symbolic AI approaches, like those of Cyc and TrueKnowledge, is that they have some ability to reason and explain their reasoning. (See the explanation laid out by TrueKnowledge’s system below.) This gives them the capability to generate new statements based on the relationships coded in their semantic networks, and more importantly, to show their chain of reasoning.
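To make the idea of an explainable chain of reasoning concrete, here is a toy sketch of a hand-coded semantic network. It is not how Cyc or TrueKnowledge actually work. The facts, the `is_a` relation and the function name are all invented for illustration. The system derives a statement nobody typed in, and it can list the explicit facts it used to get there.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical hand-coded knowledge: each key "is a" member of each listed category.
IS_A: Dict[str, List[str]] = {
    "Gutenberg": ["printer"],
    "printer": ["craftsperson"],
    "craftsperson": ["person"],
}


def explain_is_a(subject: str, target: str) -> Optional[List[Fact]]:
    """Return the chain of stored facts linking subject to target, or None.

    A breadth-first search over the is_a links; the path it finds doubles
    as the explanation for the derived statement "subject is_a target".
    """
    queue = deque([(subject, [])])
    seen = {subject}
    while queue:
        node, chain = queue.popleft()
        if node == target:
            return chain
        for parent in IS_A.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, chain + [(node, "is_a", parent)]))
    return None  # no chain of stored facts supports the statement


if __name__ == "__main__":
    # "Gutenberg is_a person" is never stated directly; it is derived.
    for step in explain_is_a("Gutenberg", "person") or []:
        print(" ".join(step))
```

The catch, as the Cyc story suggests, is that every one of those facts has to be written down by someone first.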
Statistical approaches to natural language processing, such as those used to index the Web by Google and its ilk, have typically been much less good at figuring out relationships between concepts, and much less good at explaining why they give the answer they give.
Enter the d̶r̶a̶g̶o̶n̶ GPT-3
GPT-3 is the latest in a long procession of natural language models called transformers, which have been around since 2017. They represented an improvement in statistical approaches to dealing with language. (I’m not going to drill into the details here because I have written about it before.) GPT-3 is a large improvement on its predecessor, GPT-2. At 100 times the scale, it’s very flexible.
What GPT-3 has achieved is to encapsulate knowledge, billions of words of it, in a sufficiently parameterised model that it can give granular answers to very different types of queries, across multiple domains. It can also follow quite complex instructions across those domains.
Turn plain English into algorithmic expressions.
Write acceptable summaries of complex research papers.
Recommend how to run effective board meetings.
In a sense, it does something the search engine doesn’t do, but which we want it to do. It is capable of synthesizing the information and presenting it in a near usable form. (This demo of an answer engine that goes one step beyond Google makes the point.)
Transformers take the cuneiform tablet -> printed book -> Internet -> search engine evolution one stage further.
The power of such systems, built using transformers today, is that they enable a new class of knowledge manipulation. They make the process of synthesizing knowledge cheaper, which will have knock-on effects, both substituting for that activity and spurring a range of new innovations around it.
Even as Sam Altman, CEO of OpenAI, cautions against excessive hype, I think there could be something interesting going on with this type of technology (and this particular instantiation of it). The point is that it may represent a step-change in our access to and manipulation of knowledge.
I’ll be following up with more thoughts about this in the coming weeks.
P.S. Gwern Branwen has a fantastic post on the ins, outs and practical potential of GPT-3. If you have made it this far, take a look.
* OpenAI hasn’t yet given me access to GPT-3, so I’m only able to look at it second hand.
** This abbreviated history has obviously skipped the bits where Google soiled its search results with PPC adverts. But Del rigor en la ciencia and all that.
*** I’ve obviously never done this. But I hear that some people do.
Our World in Data https://ourworldindata.org/books
Barbier and Birrell, Gutenberg’s Europe.