🔮 Weekly Commentary: Working with AI Tools
The temptation to churn out what “works” rather than what challenges us might be too great
Text-to-image AI systems are all the rage. A month ago, OpenAI announced the beta of Dall-E, its image-generating system. When you sign up for the beta, you’ll get 50 credits initially and then 15 credits each month. Each credit allows you to pop in a prompt and receive four candidate images generated from that prompt. Additionally, you can buy 115 credits for $15.
More recently, systems built on the Stable Diffusion model have been released. Stable Diffusion is a big deal because it is compact enough to run on a laptop. And I’ve been playing with DreamStudio, a free-for-now service that uses Stable Diffusion. Like Dall-E, Stable Diffusion systems are driven by a text-based prompt.
Prompts are more-or-less structured recipes which can be fed into AI models like Dall-E or Stable Diffusion. These models, trained on oodles of data, can then spit out candidate images (or other output).
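To make the “recipe” idea concrete, here is a minimal sketch, not from the article, of what feeding a prompt to Stable Diffusion can look like in code. It assumes the open-source Hugging Face diffusers library and one of the publicly released model checkpoints; DreamStudio and Dall-E wrap this same prompt-in, images-out step behind a web interface.

    # A minimal sketch (my assumption, not the article's setup): prompt in, image out,
    # using the Hugging Face "diffusers" library and a released Stable Diffusion checkpoint.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # or "cpu" on a laptop, just much slower

    image = pipe("Utopian city").images[0]  # the text prompt is the whole interface
    image.save("utopian_city.png")

The point of the sketch is how little there is to it beyond the prompt itself: everything else is picking a model and waiting for the output.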
Here is the UK’s Cyberforce painted in the style of Turner using Dall-E.
Here is what Stable Diffusion generates from the prompt “Utopian city”.
Pascal Finette, a member of Exponential Do, shared some of his Mini Dall-E experiments with the community; this one is titled “A human going exponential”.
In a recent essay, Kevin Kelly helped me think about how we make use of these AI tools. The interface is now a text-based prompt, and Kelly points out that the new human skill will be “how you construct the prompt you give the AI”.
He drew my attention to the Prompt book, “a free PDF e-book on how to get the most out of Dall-E (or any AI image generator)”. Kelly predicts “many prompt books in the future”, because different systems respond to different prompts, much as a Leica camera responds differently from a Fuji. And the underlying AI systems may perform different tasks: DreamStudio generates images, while other systems may output text, sound or video.
A prompt book is, in a sense, an instruction manual for these AI tools. To help me make sense of them, I put them into some historical context: every successive tool operates at a higher level of abstraction than the one before, and these text-to-image services are the latest step on that journey.
I thought back to the first graphics tools I used, mostly bitmap-based editors in the mid-1980s. You really had to build your image one pixel at a time (likely from an 8-bit or even 4-bit palette).