🐙 Promptpack: Getting started with Code Interpreter
How to power up your analytical skills with generative AI
Last week, OpenAI released Code Interpreter, which allows ChatGPT to write, run and iterate on Python¹ code without leaving the chat. Users can also upload files (up to 100MB), so ChatGPT can work with code or data far longer than what fits in its 8k-token context window. I had already shared one of my first experiments with Code Interpreter in the latest Sunday newsletter, but I decided to go deeper and bring you a whole edition dedicated to this new feature.
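Roughly speaking, Code Interpreter doesn't paste an uploaded file into the prompt. The model writes code that reads the file, and only the code's printed output comes back into the conversation. Here is a minimal sketch of that idea, assuming a CSV upload (an illustration, not OpenAI's implementation): stream the file row by row and return a small summary, so the limit is the file size rather than the context window.

```python
import csv
import io
from typing import TextIO


def summarise_csv(handle: TextIO) -> dict:
    """Stream a CSV row by row and return a compact summary.

    Only this small dict, not the raw rows, would need to reach the model.
    """
    reader = csv.reader(handle)
    header = next(reader)
    non_empty = [0] * len(header)
    rows = 0
    for row in reader:
        rows += 1
        # Count non-blank cells per column, ignoring any extra trailing fields.
        for i, value in enumerate(row[: len(header)]):
            if value.strip():
                non_empty[i] += 1
    return {"rows": rows, "columns": header, "non_empty": dict(zip(header, non_empty))}


if __name__ == "__main__":
    # Tiny in-memory stand-in for an uploaded file (hypothetical contents).
    sample = io.StringIO("country,year,score\nA,2000,0.5\nB,2000,\nA,2001,0.6\n")
    print(summarise_csv(sample))
```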
Code Interpreter may seem like a small addition, but it will have big effects. ChatGPT is now your personal junior analyst and developer. You can get answers to quantitative questions without having to know or write Python code, lowering the technical barriers to data analytics.
In this Promptpack, we will run through a list of prompts that will take you through an end-to-end analysis using Code Interpreter. This whole process took me 20-30 minutes. I reckon that without Code Interpreter it might have taken the better part of a day's work, maybe more. It would also have called on data-cleaning and analysis skills that not every analyst has.
This tool will inevitably enter your organisation (it’s that effective), so I highly recommend walking through your own analysis using these prompts to understand the process.
After each prompt, I encourage you to continue conversing with ChatGPT before moving on to the next prompt, asking refining questions — just like you would with a human analyst. ChatGPT is omniscient but scatter-brained. You should constantly clarify and challenge it.
This Promptpack — and all our in-depth analyses and guides to AI — are made available to paying members of Exponential View who make our work possible.
Present your own experiments with generative AI at the upcoming members' event on 27 July. Email me at email@example.com if you have a use case you’d like to share with us.
Let’s get started…
First, enable Code Interpreter! You can switch it on in ChatGPT's settings. You’ll need to be a ChatGPT Plus member to access the service.
We’ll then go through four key steps with our newly empowered chatbot. In each case, ChatGPT (or one of its cousins) will augment our own capabilities. The steps are:
Formulating the question
Identifying relevant data
Cleaning the data
Conducting the analysis
Step 1: Formulating the question
In this example, I wanted to understand recent trends in democracy worldwide. I began with a prompt that opens a discussion in which you can refine your research question using the SMART² framework.
🔮 PROMPT 1: Could you help me make and refine my research question using the SMART framework: [ENTER YOUR QUESTION e.g. How has global liberal democracy changed since 2000, using some sort of quantitative measure]
CHATGPT: Your research question might look like this: "How has the state of liberal democracy globally, as measured by a recognized democracy index, changed since 2000?" This question is:
- Specific: We're focusing on liberal democracy on a global scale.
- Measurable: We're using a recognized democracy index as a measure of the state of democracy. There are several of these available, including the Democracy Index from the Economist Intelligence Unit and the Freedom in the World report from Freedom House. You might also consider looking at related factors such as indices of press freedom, corruption, or human rights.
- Achievable: The achievability will depend on your ability to access and analyze these indices, but they are generally publicly available and used in academic research.
- Relevant: This question is relevant to your stated purpose of assessing the state of global democracy.
- Time-bound: We're looking at changes since 2000.
Great! From ChatGPT’s feedback, I refined my question to “How has the state of liberal democracy globally, as measured by a recognized democracy index, changed since 2000?”
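One way to pin down the "Measurable" and "Time-bound" parts of this question is to reduce it to a single number. Assuming an unweighted average across countries (population weighting would be an equally defensible choice), with $L_{i,t}$ the index score of country $i$ in year $t$, $N_t$ the number of countries scored in year $t$, and $T$ the latest year available:

```latex
\bar{L}_t = \frac{1}{N_t} \sum_{i=1}^{N_t} L_{i,t},
\qquad
\Delta = \bar{L}_{T} - \bar{L}_{2000}
```

If the set of countries changes between 2000 and $T$, $\Delta$ mixes genuine change with changes in coverage; restricting both averages to countries observed in both years avoids that.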
Step 2: Identifying relevant data
I now needed data for my analysis. To find it, I asked ChatGPT. If you need more up-to-date sources, Google Bard may be a better choice for this step, since it has access to the internet.
🔮 PROMPT 2: Where can I find data to answer this research question?
CHATGPT: There are several reputable sources where you can find data to answer your research question on the changes in liberal democracy since 2000. Here are a few of them:
- The Economist Intelligence Unit's Democracy Index: This is an annual publication that ranks countries by 60 indicators across five broad categories: electoral process and pluralism, functioning of government, political participation, political culture, and civil liberties. The full report can usually be purchased, and a free overview is often available on the Economist Intelligence Unit's website.
- Varieties of Democracy (V-Dem): V-Dem provides a multidimensional and disaggregated dataset that reflects the complexity of the concept of democracy. It is one of the largest ever social science research-oriented data collection efforts, providing a detailed analysis of various forms of democracy and its components.
- World Bank's World Development Indicators: If your research involves examining the impact of changes in democracy on factors like economic development, health, or education, the World Bank's World Development Indicators is a valuable resource. It provides a comprehensive set of data about development in countries around the globe.
...
ChatGPT provides several high-quality data sources relevant to the research question. From this list, I downloaded data from Varieties of Democracy (V-Dem)³ and the World Bank’s World Development Indicators.
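Once downloaded, the V-Dem data is a large country-year table. The sketch below shows one way to keep only the rows this question needs: years from 2000 onwards with a non-missing score. It uses only Python's standard library and assumes the column names `country_name`, `year` and `v2x_libdem` (taken here to be V-Dem's code for the Liberal Democracy Index); treat these as assumptions to check against the codebook of the release you download. The sample rows are invented.

```python
import csv
import io
from typing import Iterator, TextIO

# Assumed V-Dem column names: verify against the codebook for your release.
COUNTRY_COL = "country_name"
YEAR_COL = "year"
INDEX_COL = "v2x_libdem"  # Liberal Democracy Index (assumed code)


def load_scores(handle: TextIO, start_year: int = 2000) -> Iterator[tuple[str, int, float]]:
    """Yield (country, year, score) rows from start_year on, skipping missing scores."""
    for row in csv.DictReader(handle):
        year = int(row[YEAR_COL])
        raw = (row.get(INDEX_COL) or "").strip()
        if year < start_year or not raw:
            continue  # before the period of interest, or no score recorded
        yield row[COUNTRY_COL], year, float(raw)


if __name__ == "__main__":
    # Invented sample rows standing in for the downloaded file.
    sample = io.StringIO(
        "country_name,year,v2x_libdem\n"
        "Examplestan,1999,0.40\n"
        "Examplestan,2000,0.42\n"
        "Testland,2000,\n"
    )
    print(list(load_scores(sample)))
```

Streaming rows through `csv.DictReader` keeps memory use flat, which matters because the full V-Dem file has thousands of columns.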
Next, I looked for the indicators within these datasets that are relevant to my research question. For this, I copied the list of variables and their definitions into Claude 2: its 100k-token context window can hold a much longer list of variable definitions than GPT-4, which is limited to an 8k-token context window. Claude then returned the variables in the dataset it judged most relevant to my research question.
🔮 PROMPT 3: These are the X variables in my dataset: [ADD YOUR LIST OF VARIABLES AND THEIR DEFINITIONS]. Which are the most relevant to my research question [ADD YOUR RESEARCH QUESTION]?
CLAUDE 2: Here are the most relevant variables from the V-Dem dataset for analyzing how the state of liberal democracy globally has changed since 2000, as measured by a recognized democracy index:
1. Liberal Democracy Index: This index encapsulates the core features of liberal democracy, including electoral democracy, judicial constraints on the executive, and legislative constraints on the executive. It would provide a good overall measure of the state of liberal democracy globally.
2. Equality before the Law and Individual Liberty Index: This index focuses on equality before the law and individual liberties. It captures another key aspect of liberal democracy.
...
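The choice of Claude 2 came down to context length. A quick way to check whether a pasted codebook extract will fit is the common rule of thumb that English text averages about four characters per token. The sketch below uses that heuristic; the 1,000-token reserve for the model's reply is an arbitrary assumption, and for exact counts you should use the target model's own tokenizer.

```python
import math

# Rule of thumb for English text: roughly 4 characters per token.
# Actual counts depend on each model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits(text: str, context_window: int, reserve_for_reply: int = 1000) -> bool:
    """True if the text plus a budget for the reply fits in the context window."""
    return estimate_tokens(text) + reserve_for_reply <= context_window


if __name__ == "__main__":
    # Hypothetical codebook extract: the same definition line repeated.
    codebook = "v2x_libdem: Liberal democracy index ...\n" * 1000
    for name, window in [("8k model", 8_000), ("100k model", 100_000)]:
        print(name, estimate_tokens(codebook), fits(codebook, window))
```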
Before Code Interpreter, this is where the functionality of ChatGPT (and other LLMs) would have stopped: it could help you brainstorm and give feedback on your code, but it couldn't run it. Now, with Code Interpreter, it can carry out the whole analysis for you.
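As an illustration of the kind of analysis Code Interpreter can write and run for this question, here is a hedged, standard-library-only sketch. It takes (country, year, score) tuples, averages scores per year without weighting (an assumption; population weighting is a defensible alternative), and reports the change from 2000 to the latest year. The `balanced` option is an illustrative choice, not part of the original workflow: it keeps only countries observed in both years, so countries entering or leaving the data don't masquerade as democratic change. The toy numbers are invented.

```python
from collections import defaultdict
from statistics import mean


def yearly_means(scores):
    """Map year -> unweighted mean score across countries with data that year."""
    by_year = defaultdict(list)
    for _country, year, score in scores:
        by_year[year].append(score)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def change_since(scores, base_year=2000, balanced=True):
    """Return (latest_year, change in mean score from base_year to latest_year).

    With balanced=True, only countries observed in both years are averaged.
    Raises statistics.StatisticsError if no countries remain to average.
    """
    scores = list(scores)
    latest = max(year for _, year, _ in scores)
    base = {c: s for c, y, s in scores if y == base_year}
    last = {c: s for c, y, s in scores if y == latest}
    if balanced:
        common = base.keys() & last.keys()
        base = {c: base[c] for c in common}
        last = {c: last[c] for c in common}
    return latest, mean(last.values()) - mean(base.values())


if __name__ == "__main__":
    # Invented toy data: country C appears only in the later year.
    toy = [("A", 2000, 0.50), ("B", 2000, 0.30),
           ("A", 2022, 0.45), ("B", 2022, 0.20), ("C", 2022, 0.90)]
    print(yearly_means(toy))
    print(change_since(toy))
    print(change_since(toy, balanced=False))
```

On the toy data, the balanced change is negative while the unbalanced change is positive, because country C only appears in the later year. That is exactly the kind of result worth challenging ChatGPT on.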
We’ll now turn to the final steps of the process: the usually hideous process of cleaning data and the sublime acts of conducting our analysis and getting early conclusions. You with me? 😉