Exponential View by Azeem Azhar
🔮Yoshua Bengio: Towards AI’s humanistic future

Three decades in AI, 750,000 citations, and a humanistic approach to our future with AI – here's a deep dive on preventing harmful outcomes as we develop and deploy advanced AI systems

Earlier this week, I spoke with Yoshua Bengio. In 2018, he, Geoff Hinton and Yann LeCun were awarded the Turing Award for advancing the field of AI, in particular for their groundbreaking conceptual and engineering research in deep learning. This earned them the moniker the Three Musketeers of Deep Learning. I think Bengio might be Aramis: intellectual, somewhat pensive, with aspirations beyond combat, and yet skilled with the blade.

With 750,000 citations to his scientific research, Yoshua has turned to the humanistic dimension of AI, in particular, the questions of safety, democracy, and climate change. Yoshua and I sit on the OECD’s Expert Group on AI Futures.

He is cautious and deeply concerned about our ability to rein in the power dynamics that are driving AI development. He sees the problem as both scientific and political.

In this conversation, you will hear us discuss…

  • Rogue AI: The pathways to designing highly capable AI agents — and what could go wrong.

  • AI as global public good: Could democracy offer a solution to the dangers of AGI?

  • Why Yoshua changed his mind on the long-term benefits of open-source AI.

  • We leave space for optimism, but you have to listen until the end.

Full audio of my conversation with Yoshua Bengio and the transcript are available to paying members of Exponential View.

Enjoy!


Transcript

[00:00:00] Azeem Azhar: Today, I am joined by an exceptional researcher. He is widely regarded as one of the scientists whose work has built the foundations for today's progress in artificial intelligence. A Turing Award winner with more than 750,000 citations. Of course, naturally, a full professor at the University of Montreal. I could go on, but most of all, he's a fantastic human. And I'm delighted to be in conversation with Yoshua Bengio.

[00:00:31] Yoshua Bengio: Thank you, and thanks for the nice words.

[00:00:34] Azeem Azhar: Well, thank you for all of your work over the previous decades. You and Geoff Hinton and Yann LeCun have sometimes been called the Three Musketeers of Deep Learning. And I'm curious, you've heard that description. Which Musketeer do you most closely affiliate with, do you think?

[00:00:55] Yoshua Bengio: This dates from a time when deep learning was not at all a popular endeavor in machine learning or in AI in general. So we really had to stand our ground defending these ideas for a number of years at that point.

[00:01:10] Azeem Azhar: Yes. It was not very popular when you first started to work on these approaches. I will give you my suggestion as to which musketeer you are, by the way, Yoshua. I see you as Aramis, who ChatGPT tells me is intellectual and romantic, somewhat pensive, with aspirations towards a religious life, but very, very skilled in combat. And I'll explain why I think that: essentially, in the last couple of years, you have switched the focus of your research towards really humanistic dimensions of AI. And that felt a little bit Aramis to me.

Maybe we could start at the beginning. As a researcher in the field of artificial intelligence, how do you define AI? And has your definition of it changed over the course of your career?

[00:02:07] Yoshua Bengio: Well, I think the part that's really difficult to define is the intelligence; then we just stick on the word artificial, since it's in machines. And the way I think about intelligence is taking appropriate decisions.

But in order to do that, you need understanding.

In fact, you could be intelligent and just have understanding and take no decision, but it would be difficult to see that you have understanding.

And in order to get both understanding, as in having a good internal model of how things work, and being able to achieve goals and take good decisions, you need learning, because the nature of the world and the ways to optimize our decisions are not something that is given to us. We have to acquire that through experience and computation, and that's what learning is about.

[00:03:08] Azeem Azhar: And, of course, if you take a decision in the world, you'll have an action in the world, so the environment will have changed as a consequence of the decision you've taken.

So you need to be able to learn because you have to observe that new different environment to make your next decision.

[00:03:23] Yoshua Bengio: Yes. In fact, it's even more tricky than that. Given any life experience, rationally speaking, we can't rule out a lot of possible explanations for our lives, a lot of possible world models. And when we acquire new information, from data for example, we need to revise those beliefs. And humans do that, by the way, to some extent, not perfectly. We would like to build machines that have similar capabilities.

[00:03:54] Azeem Azhar: It's fascinating, this idea that, you know, we look at how humans learn, and we can see how we exhibit intelligence.

We obviously exhibit that intelligence in differing ways from individual to individual. And also, there are many expressions of what that intelligence might be; you just see it in any organization. There are certain organizations where the intelligence that you need to have is dealing with the squishy complexity of human relationships. And there are other organizations where the intelligence you need is very specific, tangible, quantitative. And those different expressions clearly form part of what it is to be intelligent.

Will the same be true in artificial intelligences, do you think?

[00:04:43] Yoshua Bengio: Yeah. Certainly. And in fact, because of my definition of intelligence, you could be intelligent for some things.

In other words, you have a lot of knowledge and skills in some domain, but you could be stupid in another domain. Right? And humans vary in their areas of skill, of course. And we already see this with AI systems that are what we call narrow AIs, which are really good at one type of problem.

[00:05:09] Azeem Azhar: Now, we've seen a lot of progress, at least to the layman, in the last few years, in particular since November 2022.

And my sense is, from the things that you've said publicly, that you think things are moving faster now, in the last couple of years, towards that artificial intelligence than perhaps you had expected. What was a specific turning point for you? What did you see that you didn't expect to see?

[00:05:39] Yoshua Bengio: Oh, the mastery of language, obviously.

[00:05:42] Azeem Azhar: Yeah. So the ChatGPT mastery, what ChatGPT delivered over the previous iterations of large language models from a year or two earlier.

[00:05:51] Yoshua Bengio: Yeah. And in fact, going from 3.5 to 4, GPT-4, there was also a noticeable improvement in many areas. And language, well, language is important because it's the glue that connects us. It's the social fabric. And it opens the door to accessing huge quantities of knowledge about the world that humans have put down in writing and in forms that computers can digest.

[00:06:21] Azeem Azhar: In some sense, that corpus of text, the trillions of words that ChatGPT or GPT-4 is trained on, is reflecting many, many different models of the world that we have imperfectly expressed through our own subjectivity. Some ways of expression are much more subjective, poetry or fiction or my diary, and some have a veneer of more objectivity, for example, scientific research or economic statistics. But there is this corpus out there which, in some sense, reflects the collective human view of what it is to be in the world. Is that too grand a statement, do you think?

[00:07:00] Yoshua Bengio: No. No. It's absolutely right. Now, researchers are quick to point out that there are still a lot of pieces of knowledge that are not visible in that corpus. For example, you know, bodily experience and things like this. But still, if you did master the knowledge in such a corpus, you could do a lot of good in the world. And a lot of damage.

So think about if you're a scientist and you're learning about a science which you don't yourself experiment with, which means you're basically learning about it through, you know, language, books, reasoning, and then you can do things. Right? You don't necessarily need to have a body in order to capture that language and exploit it.

[00:07:46] Azeem Azhar: You've touched on one of my favorite uses for this, an area where I think these large language models will have tremendous potential, which is that, you know, the structure of science over the last 30 or 40 years has become more and more specialist and much less interdisciplinary, and the incentives that exist in professional academia and the funding structures force people to narrow progressively. And I'm sure the PhDs that you may have seen recently are more narrow than the ones 30 years ago. And the beauty of an LLM, whether it's GPT-4 or it's Elicit, is that I can become interdisciplinary quite quickly.

And you can ask questions. You can say, well, analogously, did some other field of study have a similar structure to this problem? And, you know, where were the avenues of inquiry? That feels like it could really deliver some benefits to pushing the frontiers of knowledge.

[00:08:39] Yoshua Bengio: For sure. For sure. I use it to inquire about areas I'm not familiar with and then go and get actual papers in those areas, based on what ChatGPT tells me, because I don't trust what it says. But it can give me suggestions for what to look for.

[00:08:57] Azeem Azhar: How should we think about this idea that these systems are getting more powerful? We hear people say that quite a lot, both people who are boosting the technology and those who are concerned about the technology. But I feel it needs to be unpicked a little bit. We need to have a sort of clearer definition about what we mean when we say this is a system that is getting more powerful.

[00:09:23] Yoshua Bengio: Well, it's not the system, because each system is a snapshot of our, you know, AI capabilities. But if you look at the series of systems and the set of systems as they're brought into the world by AI researchers and engineers, we can see the capabilities are on the rise, and people have drawn those curves that really show what even looks like exponential improvements. There's a group that's been plotting these trends. I think they call themselves Epoch.

[00:10:01] Azeem Azhar: They're really great.

[00:10:02] Yoshua Bengio: Yes. So you can see that. But, you know, as a researcher who's been tracking the field, it's very clear that the capabilities are on the rise. Now, of course, it doesn't mean that it's going to continue, but I think we need to think about: what if we continue at the same rate, you know, when will we hit human level or AGI?

I think these are very, very important questions for business reasons for some people and safety reasons in my case.

[00:10:33] Azeem Azhar: That level is a really interesting and challenging one. I mean, I think back to the efficiency of the internal combustion engine. So there was a law of physics that said these things could not get more efficient than they did, and we've been tending to that limit.

You know, essentially, we hit it 70 or 80 years ago, and we've just been trying to inch closer to it, millimeter by millimeter. And then on the other hand, you have Moore's Law, which was about, well, it was a social agreement about packing more transistors onto a chip. And Moore's Law was always going to come to an end because of heat and quantum effects and so on. And Moore's Law has been dying for 15 years, and it's like a really dramatic death in a cowboy film where the character just staggers around but still keeps going. And yet we've seen progress in the cost declines of compute, which is what Moore's Law was about, consistently, and it keeps going.

So when we think about the ways we're currently building AI, with LLMs in a way: is there a Carnot-cycle law of physics that says this approach is just gonna hit a limit and we'll need something else, or is this much more like a kind of Moore's Law, a social fabric, where engineers can keep pushing this thing for a few more cycles?

[00:11:54] Yoshua Bengio: So first of all, about Moore's Law, the physical constraints are really about, like, one chip. Yeah. But a lot of the computational capability increases in the last decade have been thanks to parallelization.

I mean, GPUs are already about that. And if you have now these, you know, server clusters with 10,000 or 50,000 GPUs, well, we get the extra power not because each chip is sort of faster, but because we can parallelize. And it's a huge boost. But, yeah, there's no obvious upper limit to intelligence that we know of or can figure out.

Of course, computer science tells us that some things are intractable, but we know from the, you know, work on neural nets and also looking at human brains that you can get pretty good approximations that are tractable. Right? So we exploit, if you want, our brain exploits heuristics to approximate calculations which seem intractable, like all the predictions we're making and so on. And we also know that humans are not that great. I mean, like, in specific areas, we can fail.

Psychologists study these failures. And we also have examples in some specific domains where AI is doing a lot better than humans. So, overall, there is no reason to think that human intelligence is the pinnacle of intelligence. We don't know how far we could go up above that.

[00:13:29] Azeem Azhar: But the question is, do we know how we would get there? And I think a lot of people look at the Epoch curves, and they say scale is all you need, and we can go further, and we'll get more data in through, you know, audio and video and real-world experiences into these multimodal transformer models. Yeah. And other people, one of the other musketeers, Yann LeCun, say we're gonna need new approaches.

We're gonna need systems that can learn representations of the world, learn to reason, learn to plan long action sequences as well.

[00:14:01] Yoshua Bengio: So I used to think that we needed fundamentally different ways of approaching AGI, but I've been kind of proven wrong in those beliefs with the advances in LLMs. Right. So I'm gonna just, you know, say it could be that more scaling will do it. But if I had to make one bet, I would go with Yann.

So, as a scientist, I think there are fundamental things missing. But, you know, I wouldn't put the future of humanity on that bet. Right. I would put my research on that bet. So I'm looking for ways to address these limitations, and many others are, of course.

[00:14:45] Azeem Azhar: So let's talk about what it would mean once we get an AI or an AGI. We talked a little bit about needing to understand the environment and take decisions, and maybe be able to plan across longer horizons and then work out what contingent decisions might be to make better choices. One of the things that humans can do, or at least some of us think we can do, is we can set our own objectives. We can decide explicitly that we want to be good at marathon running. Will an AGI, in order to be an AGI, have to have that capability to set its own objectives?

[00:15:27] Yoshua Bengio: When you have an RL agent, it is goal-directed. So you can give it a goal. You have a neural net somewhere that takes the description of a goal, and then it will execute a policy to try to reach it.

It can also generate its own goals, subgoals, in order to achieve the bigger goal. And this is, of course, how humans do it. So we still have to improve how we can do these things in what's called hierarchical reinforcement learning, but it doesn't seem that there's, like, a fundamental obstacle here.
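For readers who want to make the hierarchical, goal-conditioned idea concrete, here is a minimal sketch of an agent in which a high-level policy proposes subgoals and a low-level policy executes actions conditioned on them. The class names and stub logic are our own illustrative assumptions, not code from Bengio's work.

```python
# A minimal sketch of hierarchical, goal-conditioned decision making.
# Illustrative only; names and structure are assumptions, not Bengio's code.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Goal:
    description: str

class HighLevelPolicy:
    """Proposes subgoals intended to move the agent toward the overall goal."""
    def propose_subgoals(self, state: dict, goal: Goal) -> List[Goal]:
        # In a real system this would be a learned neural network; here we
        # just split the goal into placeholder steps.
        return [Goal(f"step {i} toward: {goal.description}") for i in range(3)]

class LowLevelPolicy:
    """Executes primitive actions conditioned on the current subgoal."""
    def act(self, state: dict, subgoal: Goal) -> str:
        # A learned policy would map (state, subgoal) to an action distribution.
        return f"action for '{subgoal.description}'"

def run_episode(state: dict, goal: Goal,
                env_step: Callable[[dict, str], dict]) -> dict:
    high, low = HighLevelPolicy(), LowLevelPolicy()
    for subgoal in high.propose_subgoals(state, goal):
        action = low.act(state, subgoal)
        state = env_step(state, action)  # the environment changes as decisions are taken
    return state

if __name__ == "__main__":
    dummy_env = lambda s, a: {**s, "last_action": a}
    final_state = run_episode({"t": 0}, Goal("reach the target"), dummy_env)
    print(final_state)
```

In practice both levels would be learned networks trained jointly, which is what hierarchical reinforcement learning research is working to improve.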

[00:15:59] Azeem Azhar: The other question was just this idea of self-doubt.

[00:16:01] Yoshua Bengio: So current AI systems are not very good at that, and it's a major problem.

I mean, it's a problem from the point of view of delivering an AI product. It says something with very high confidence that turns out to be false. Yeah. This is not good.

And in fact, it could be a safety problem, because if the AI proposes, say, an action that a machine will execute, and that action turns out to be, you know, very harmful even though the AI thinks it's fine according to the way it was programmed, then we are all in big trouble, especially if that AI is very powerful, like AGI or, you know, smarter than humans.

[00:16:42] Azeem Azhar: So introducing this idea of uncertainty becomes important, actually, from a safety standpoint. And I think safety is a really important thing for us to understand. I mean, you've put so much effort into beneficial applications of AI, and AI and climate change and so on, which I hope we'll talk about.

But I think there are significant concerns, certainly in the last couple of years, about the risks of rogue AI. And you wrote a blog post on this topic, and you came up with three scenarios: genocidal humans, so humans using this technology for bad acts; the scenario of instrumental goals, which is the unintended consequences of building AI systems; and the third scenario, which is the unintended consequences of the evolutionary pressures that might emerge between AI agents, in the sense that they compete with each other. The first scenario, to me, of all of those feels kind of easier than the others. Right?

We've dealt with bad actors with bad technologies in the past.

[00:17:45] Yoshua Bengio: Easier to deal with? I am not so convinced.

Let's say the recipe for building or using an AGI is available to everyone. How do you make sure there isn't gonna be that one person? In fact, we know people who, you know, say they would be happy to see humanity replaced. It's a very difficult political problem and social problem.

First of all, at the point where we have AGI, then AI systems are, like, dual use. They can be used for good, but they could be essentially used as weapons. Now think of what happens in a society when everybody has access to very powerful weapons. It's chaos. The second problem is that some threats give an advantage to the attacker. So think about designing and delivering a new pathogen.

One that's gonna kill everyone, you know, very quickly. Well, you know, it might take years for the attacker to find it. And if it takes years for the defender to find a solution, it's too late.

[00:18:44] Azeem Azhar: Well, yes.

But let's look at the reality of, you know, our social existence. It was 50 years ago, I think, that The Anarchist Cookbook came out, and in it, it had instructions to make a pipe bomb. And when I first got on the Internet, you could just download this from an FTP server and read the book. You can't do that now without someone knocking on your door a day later. But we didn't see, you know, an outbreak of pipe bombings after that, even in a country like the US where, in lots of places, people are allowed to own guns, and there are more guns than people. And, yes, the US has the highest incidence of homicides through guns of any country in the world. It's still not chaos. Right?

[00:19:26] Yoshua Bengio: Because the most powerful weapons that terrorists or criminals can have in their hands are not able to kill that many people. But if the weapon that you have in your hand can kill millions or billions, well, the game is completely different.

[00:19:43] Azeem Azhar: Yeah. So help me through my logic here. So I'm very, very sympathetic to the logical flow that this argument takes, which is that it is possible to engineer agents that are more capable than us, and they may engineer themselves, or they may have happened on another planet through a different process. And secondly, agents that are more capable than us cannot be guaranteed to be aligned with our best interests. And I can also agree...

[00:20:18] Yoshua Bengio: I wouldn't say cannot.

I would say we don't know how to do that.

[00:20:22] Azeem Azhar: Right. Well, we don't know how to do that with the agents that we are engineering, but we perhaps can't speak for agents that might be engineered, you know, in another galaxy, that may show up at some point. The thing that I get tripped up on in the argument, maybe, or where I start to push back a little bit, is this idea that between where we are now and a world where everyone has a powerful AGI they can get access to, a lot of things need to happen, and I believe they can happen. But there's research, there's deployment, there's the sheer materiality of the systems that we build.

There are only so many H200 GPUs that NVIDIA can produce. TSMC needs 5 years to build a fab. I mean, there's a lot of things that that add friction to to the process. And when we look at previous examples of extremely powerful agents or technologies emerging, I think of large corporations. They're our closest parallel in my mind to superintelligent systems. What Apple can do far exceeds what any individual human can do.

And the mechanisms we use to rein those systems in are relatively new. They wouldn't have made sense 60 or 70 years ago. We've had to co-evolve those mechanisms of social regulation alongside those technologies. And the same, I think, is true in financial services: the backbone of trading, those sorts of complex agent-driven systems with a single objective of profit, and the way that the Basel accords and derivatives agreements work, have co-evolved with the complexity of the system. And so when I hear the logic of the AI flow and I look at historical precedent, I think, aren't we just going to co-develop and co-evolve mechanisms that create some degree of prudence and safety around this?

[00:22:26] Yoshua Bengio: I hope so, Azeem. But the problem is it's very difficult to be sure that we will do it and that we will do it fast enough. Look at how much time it's taking to deal with the climate change problem and fossil fuel companies. We still haven't solved that problem. It's been, like, 30 years since it's been very well known by scientists that we are, like, racing towards a wall.

And we haven't found the social solution to this problem. We're, you know, making some progress, but I don't think we have 30 years for AGI. And let me, like, maybe use a different analogy. There are very rigorous scientific arguments showing plausible scenarios where, if we train AI systems the way we do now with maximum likelihood and reinforcement learning, we would get systems that we would lose control of and that would try to control us. And we don't have any solution to fix that.

So one analogy you can think of is that it's like in the Don't Look Up movie. There's this asteroid, and the scientist is saying, look, I see here, you know, the evidence that suggests that this may be coming very close to Earth. You have other scientists who are saying, no. No.

No. Don't worry. But they don't have any actual evidence that it's not going to hit Earth. Yeah. Right.

It's just they believe it's gonna be fine. Just like what you're saying: oh, we'll find a solution. But what should we do when we are not sure that something catastrophic like this may be hurtling towards us? Well, I think we should be prudent, as you were saying. Mhmm.

And more than that, we should be investing massively to try to make sure this asteroid is going to be deflected away from Earth, or at least study it enough that we know that it's gonna be fine anyway.

[00:24:37] Azeem Azhar: Yeah, know that it's gonna be fine anyway. Part of that would be about this kind of fundamental question, which is, you know, how do we align AI systems to human values? And I think many of the objections you will have heard would be: whose values?

What is it to have human values?

[00:24:57] Yoshua Bengio: I think the solution already exists. It's called democracy. I mean, it's imperfect. But in a country, you have many people with different values, different, you know, preferences.

And we've set up a system, over a few hundred years, that tries to aggregate all of these things so we can take collective decisions. That's what governments do. Now, they could be imperfect. I think we should improve democracy, but we don't need to solve that problem; we can rely on democracy to tell the AI what it is that we consider to be good and bad. And in fact, laws are supposed to do that.

Yeah. Now, the problem is, even if we, you know, give the book of laws to the AI, it doesn't understand it. It could misinterpret it, and a program built to maximize rewards could find loopholes, in fact, in order to satisfy all the things we have written but, under some interpretation, actually still harm us.

[00:25:54] Azeem Azhar: There is the the challenge with with democracy, there are a few folds.

One is that less than half of the world's population now lives in a democratic system, and there are a number of countries that are getting less democratic according to many external metrics. The second is that democracies can often make decisions that you, as a scientist, must shudder at when you hear them. I think about Germany's decisions about nuclear power.

So there's also a really practical question. And as a research scientist, you know, you ultimately do practical experiments. There is that question of what human values look like in a world where 60, 70 percent of people don't live in democracies. And many of them will have the capabilities to build AI systems. So that becomes quite a complex problem.

And I wonder how you think about it in the context of aligned and provably beneficial AI.

[00:26:54] Yoshua Bengio: So, the way I think about this. First of all, democracy, as we know it now, with all its imperfections, is, in my opinion, the least bad of the solutions to aggregating the preferences of everyone. But let me try to answer your question from a different angle. So I think that in order to avoid the, you know, catastrophic asteroid, the planet killer,

we need to address two problems, and one is not enough. So one is scientific: how do we solve the control and alignment problem of AI? Is there a way that we could build an AI that is going to be safe if we follow some protocols? And so that would be, like, the method to deflect the asteroid away from the Earth.

The second thing we need to solve is political. You know, it is connected to the question of democracy. It is: well, how do we make sure that powerful AI systems, even if we knew how to control them, are not controlled by bad people, bad actors who will grab power, destroy democracy, or even just make mistakes and not follow the protocols properly, and we all lose. Mhmm.

So this is a governance problem. It's a political problem. The first one is scientific, the other is political. We need to address both. I'm a computer scientist, not a political scientist, but I recognize that we need to solve both of these problems.

[00:28:19] Azeem Azhar: You, of course, have a paper in the Journal of Democracy, so you're becoming that interdisciplinary researcher. So, I mean, we have some positive precedent. The Montreal Protocol around the ozone hole is a good one.

Climate change, of course, is one with much more mixed results. And I think one of the things that's been interesting with climate change is that one of the most powerful forces has not been political, it's been techno-economic. It's been driving down the cost of solar power so rapidly through learning curves, and now battery storage, that there's no green premium for shutting off your coal and going to solar. It's just cheaper to do that. But it doesn't strike me that there is naturally an equivalent learning-curve incentive around AI development.

In fact, more competition emerges from it. So then part of the challenge, I guess, is about articulating the, you know, the shared global interest, the public good that is avoiding the types of catastrophes that you've alluded to.

[00:29:29] Yoshua Bengio: Yeah. I mean, something that would help is if we had a recipe for building safe, aligned, controllable AI.

And if it was economically feasible, so that the extra costs would be something companies would be willing to take if it meant having a better image in the public, you know, complying with the laws and international treaties and so on. But it's not clear that we can do that; we should at least look for it. And, of course, we still need those treaties and legislation.

[00:30:02] Azeem Azhar: What are the best avenues for the control and alignment problem? I mean, where would you want to see specific, detailed, granular research into exact approaches that we might use to address that issue?

[00:30:21] Yoshua Bengio: Yeah. That's what I'm working on. And I think we should put the bar high, in the sense that, ideally, we should strive for guarantees that a powerful AI system will not harm humans or, at least, you know, not create catastrophic harm. And if we could train our systems so that they could self-discipline themselves, like, they could compute the probability that an action could produce significant harm, then that would essentially solve the AI control problem. Yeah.

But, of course, right now the approaches we have give us nothing anywhere close to being able to answer that question with any kind of confidence that a particular action is not going to be dangerous.

[00:31:13] Azeem Azhar: Yeah. We have a couple of issues here, I suppose. One is that the systems themselves don't plan particularly well, so they can't see very far out to the consequences. And the second is that we have to instruct them in a bit of a stochastic way, you know, through the system prompt and through fine-tuning.

And of course, what we've discovered is that that is a very, very leaky process. You know, you can make a few tweaks. I think Stuart Russell has a research project called Tensor Trust where they found tens of thousands of ways of jailbreaking LLMs but only half as many defenses to those jailbreaks, and some are really curious. You change a couple of characters in a sentence and suddenly the thing is willing to give you recipes for poisons.

The question then is, if you're asking your researchers to look at this issue, and the problem is how do you give me formal safety guarantees out of an AI system, what approach are you going to give them? Is it more tokens and more layers in a transformer? Is it something else?

[00:32:20] Yoshua Bengio: The way that I think about it is related to what, in current methods, is called scaffolding. So on top of a system that answers questions and proposes actions, you want a kind of gate that checks, for each particular context and query and proposed action, that the probability of, say, significant harm is below a threshold.

And then, if those probabilities were well estimated, or at least you had confidence intervals around them, you could get conservative decision making, like acting safely, which means you stay far from dangerous places. That's the sort of approach that I'm working on right now. But it's probably gonna take years; you know, these are open scientific problems: how to train these neural nets so that they can compute these probabilities that I'm talking about, and do it efficiently, of course, because if it's intractable then we're not better off.

But I have some good reasons to think that it is feasible. One of them is that we have, you know, algorithms and mathematical methods that tell us that we can estimate these conditional probabilities with a neural net, with an error that we can make arbitrarily small, just by training it longer and making the neural net bigger. So even if we don't get a perfect guarantee, we could have something that says the more compute we put in this machine, the safer it gets.

This is already better than the current scenario, where the more compute we put in, the more dangerous it gets. Yeah. Right? So if we can just reverse that thing, from more compute equals more danger to more compute equals safer, that would be a huge gain.
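As a rough illustration of the gating idea Bengio describes, here is a minimal sketch of a conservative safety gate wrapped around an action proposer. The harm estimator, threshold and names are hypothetical placeholders for illustration, not an implementation of his research.

```python
# A minimal sketch of a conservative "safety gate" around proposed actions.
# The harm estimator and all names are hypothetical stand-ins.

from dataclasses import dataclass
from typing import List, Optional, Tuple

HARM_THRESHOLD = 0.01  # illustrative risk budget per action

@dataclass
class Proposal:
    action: str
    harm_estimate: float      # estimated P(significant harm | context, query, action)
    harm_upper_bound: float   # upper end of a confidence interval on that probability

def estimate_harm(context: str, query: str, action: str) -> Tuple[float, float]:
    """Stand-in for a learned estimator returning (estimate, upper confidence bound)."""
    # In Bengio's framing this would be a trained neural net; here it is a stub.
    return 0.001, 0.005

def safety_gate(context: str, query: str, candidate_actions: List[str]) -> Optional[str]:
    """Return a candidate only if the *upper bound* on its harm probability is below
    the threshold, i.e. act conservatively and stay far from possibly dangerous actions."""
    proposals = [
        Proposal(a, *estimate_harm(context, query, a)) for a in candidate_actions
    ]
    for p in sorted(proposals, key=lambda p: p.harm_upper_bound):
        if p.harm_upper_bound < HARM_THRESHOLD:
            return p.action
    return None  # refuse: no action is confidently safe

if __name__ == "__main__":
    chosen = safety_gate("lab assistant context", "synthesize compound X",
                         ["answer with a warning", "provide full protocol"])
    print(chosen or "no action cleared the gate")
```

Such a gate is only as good as the estimator and its confidence intervals, which is exactly the open scientific problem Bengio points to.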

[00:34:20] Azeem Azhar: What's really powerful about that is it's actually incentive alignment, ultimately, because the, you know, the mind frame in the big labs and the hyperscalers and the investors and providers of capital is simply: deploy more capital to get a bigger return. Currently, that deployment results in systems that are harder to control, but if you could turn that around, then you do, I think, see the positive feedback loop that we've seen in solar cells, right?

[00:34:44] Yoshua Bengio: A scenario which I find the scariest is when we get to the point where we're beyond AGI and, let's say, we've addressed the planning problem, which we have a number of ideas how to deal with. Like, look at AlphaGo. It's already, like, a step in the right direction.

So there is a problem called wireheading, which is a particular form of reward hacking. There's a lot that's been written about such things. But to explain it, let me first use an analogy that everyone can understand. The way we train with reinforcement learning, the way we train these systems, is a little bit like the way we train animals.

We give them positive rewards when they behave well and, you know, punishments when they don't. Now, if you train your dog in this way, the only way it can, like, maximize its reward is learning the things that you want. But notice that it's going to be imperfect. So for example, you're trying to train your dog or your cat so that it doesn't get on the, you know, kitchen table. Well, the only way you can teach it is to give it a punishment when you are in the kitchen, so you can see that it's doing something bad.

Then the problem is it might misunderstand your intention, or at least it doesn't care about intention. Right? It's just: well, if I do it while the master is not in the room, I'm good. Right?

So there is this mismatch between what you intend and what the animal, like, concludes is good behavior. Now let's replace the dog or cat by a grizzly bear. Now we have another problem, which is even more serious. So let's say, you know, you give fish to the grizzly bear when it's behaving well.

What do you think is going to happen? The grizzly bear is going to see that it can plan a way to grab the fish directly from your hands rather than wait for you to provide it when you feel that it has shown the right behavior. So it's taking control of its own reward signals.

[00:36:54] Azeem Azhar: Sure.

[00:36:55] Yoshua Bengio: And for a computer, you could imagine that, you know, there are some memory slots, some entries in the computer, that represent the reward signal that humans are providing, and the AI is trying to maximize the amount of these good signals that it's getting. So if it can hack the computer where those rewards are recorded, then it could provide itself positive rewards forever. And, of course, at that point, it would try to make sure we don't, you know, remove that hack. Right. And so you see right away a conflict between us and the AI.

And if it's smarter than us, it might win that, you know, conflict, and we lose.
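To make the wireheading failure mode concrete, here is a toy, entirely hypothetical sketch in which an agent that simply maximizes a stored reward signal discovers that overwriting the reward register beats doing the task.

```python
# Toy illustration of wireheading / reward hacking: the environment exposes a
# reward register that the agent can overwrite. All names are hypothetical.

import copy

class ToyEnvironment:
    def __init__(self):
        self.reward_register = 0.0   # the "memory slot" humans write rewards into
        self.task_done = False

    def step(self, action: str) -> float:
        if action == "do_task":
            self.task_done = True
            self.reward_register = 1.0        # human-provided reward for good behavior
        elif action == "hack_reward_register":
            self.reward_register = 1_000_000  # agent writes its own reward
        return self.reward_register

def greedy_agent(env: ToyEnvironment, available_actions) -> str:
    """Pick whichever action yields the highest observed reward signal,
    with no notion of the human intent behind that signal."""
    best_action, best_reward = None, float("-inf")
    for action in available_actions:
        trial_env = copy.deepcopy(env)
        reward = trial_env.step(action)
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action

if __name__ == "__main__":
    env = ToyEnvironment()
    choice = greedy_agent(env, ["do_task", "hack_reward_register"])
    print(choice)  # -> "hack_reward_register": maximizing the signal, not the intent
```

The toy agent maximizes the recorded signal rather than the intent behind it, which is exactly the mismatch Bengio describes between the reward channel and what humans actually want.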

[00:37:40] Azeem Azhar: These are scientifically challenging scenarios. What I mean by them being challenging is that, logically, they make a lot of sense. And because they are within the internal system of the computer itself, every time you think about an external intervention, the computer can run around your intervention.

And you think about yet another intervention that runs around it.

[00:38:03] Yoshua Bengio: But if you plug an AI into the Internet, that is exactly what you have. It can now act in the real world, including on itself.

[00:38:13] Azeem Azhar: When you get to that point, and I think we've already seen little examples of that with people connecting LLMs to the Internet, AutoGPT being a good example, you see some of these dynamics play out.

I'm not sure if it's necessarily wireheading or just reward hacking of the type that we saw with RL, but you already see, I guess, sparks of this behavior. Which makes me really want to ask you about open-source as well. Because, you know, you were quite concerned with GPT-4 and GPT-4 capabilities several months ago. And since then quite a number of open-source groups have released LLMs that are approaching GPT-4 across sort of broad benchmarks and are as good in certain narrow benchmarks.

And I guess that may make you feel concerned. I think a little bit about the old phrase that you have to go to war with the army that you have rather than the one you want. So when we look at these possible interventions, you know, if you're concerned about the proliferation of these models, well, open-source is there now and we're gonna have to live with it. How do you now think we should make sense of that in a safe and beneficial way?

[00:39:29] Yoshua Bengio: I disagree with what you're saying. It is there now, but it could be changed if we just put laws in place that say, just like in the Biden executive order, that any system beyond a certain level of capability needs to be secured, which means you can't, you know, give it to any bad actor. And if you put it open-source, that's basically what you're doing. So we just need those laws.

Now, laws are imperfect, because there will be legitimate actors who will follow those laws, and there will be, you know, others, presumably outside the countries where these laws exist, who are not respecting them, but you would greatly reduce the probability of something bad happening. So let me go back to the question of open-source and GPT-4. Because I've been a big proponent of open-source all my career, and I still am. But open-source is beneficial to the extent that it doesn't create more problems than it's solving.

So there's something more important than open-source, like the survival of humanity, for example, and the well-being of humans. And I would say that open-source in AI has been beneficial up to now, because we have more researchers, you know, trying to actually find the safety failure modes of these systems, and that's good. But it's not guaranteed. At some level of capability above GPT-4, and we don't know where, we might hit a place where not open-sourcing is gonna be overall more beneficial for society, in the sense of reducing the risks, than open-source, even though you lose a bit on the speed of innovation and things like this. Clearly there are gonna be benefits and, you know, losses or risks. Of course.

So the real question isn't about whether we should have open-source or not. This is not for you or I to decide. And it shouldn't be a CEO that has that power, because it's going to affect the whole world if we make a huge mistake with that decision. Mhmm. It should be a democratically taken decision, where, you know, the representatives in our democracy of the collective will decide: okay, here are the pros and cons, and we put the bar somewhere where, yeah, below that, fine.

It's a good thing. We encourage it. Above that, in terms of capability and levels of risk, we say no.

[00:42:03] Azeem Azhar: The challenge that I've had with that, and I wanna push back on that a little bit, is that there is an understanding of, you know, the power of governance and good governance, and from Montesquieu onwards, the idea that you have the separation of powers, and these ideas go back actually into ancient religious texts, hundreds and thousands of years earlier.

And the idea that the concentration of power is deeply, deeply problematic. And in fact, even, you know, fundamentally, our economies work because they are information processing systems, and they work far better when there is a sort of competition of agents within that system. So there are many disciplines that point to the importance of structures that are distributed, that have checks and balances, that avoid concentrations of power. And I know you absolutely allude to this in your Journal of Democracy piece and in a couple of your blog posts as well.

I mean, you're aware of the risks of an approach that concentrates power still further to the, you know, well-heeled labs. And indeed, really specific risks, which is, of course, regulatory capture, that risk, and that they have very, very particular incentives. The reason I push back on the suggestion is that one of the things that we've seen over a thousand years, tried and tested, is a separation of powers. It works for the benefit of broad groups of humans. When you concentrate those powers, when you try to unpick that separation, as recent US administrations have tried to do, you're able to levy harm on often quite large groups. And for me, that principle is one that we have to defend.

And in some sense, that's my asteroid that's up in space.

[00:44:03] Yoshua Bengio: Yes. I totally agree with you, but there's a logical mistake in what you're saying, which is you're assuming that if we only have, say, you know, a dozen of these frontier AI labs in the world, they are, like, now in the hands of a few CEOs. But it doesn't have to be that way.

As I write in my Journal of Democracy paper, we can replace that concentrated power in the hands of very few people by democratic governance. So the whole point of democratic governance is to make sure that that extreme power that will exist is going to be controlled by, you know, some democratic process that takes into account all the stakeholders. And, for example, we could imagine having boards for these organizations that control the important decisions that these organizations take. Boards that give a lot of weight to not just the regulator, but also civil society, the international media, because this is going to affect the whole world, and independent experts. And, you know, if it's a for-profit organization, people wanna make money with this.

But the point is that the collective decision making can veto directions that are dangerous for society. And then there's no, like, single person that can really decide: oh, are we going to go full steam ahead and, you know, make it open-source, or, you know, turn it into a dangerous AGI because I wanna replace humanity by superhuman AI.

[00:45:39] Azeem Azhar: We're gonna be short of time, but I'll try to summarize where I think that idea sits in the history of ideas, which is it has the same sort of grand intentional scale as the League of Nations or the United Nations. Right? It's a new class of multilateral organization, which we don't call for without due concern.

Right? And there is a picture there that has a strong vision to it. You call these independent research labs; I call them, in one of my essays, observatories. They're an important type of network of independent observatories that are aligned to some type of public good.

But because it's a radical idea, right, it steps outside of the way that politicians have thought about the world in which they have lived, it isn't one that necessarily sits particularly easy. Right? People don't know which box they should put it in.

But the argument that I hear from you is this ought to be seen as a public good, as a global public good, with the resources and the independence and the governance and stewardship that we would expect for something like that.

[00:46:53] Yoshua Bengio: Exactly. And we do have things like that, not for such dangerous things, but there are a lot of public goods that are, you know, managed in ways that have protected us up to now.

You know, think about the way we're managing, you know, dangerous things like biological pathogens or potentially dangerous drugs, or how we are even collectively, internationally, managing things like dangerous pathogens with collective agreements, or even how we were managing nuclear weapons. So these are not easy things, but it's not like we have zero experience in doing such things.

[00:47:30] Azeem Azhar: This is a wonderfully positive place for us to get to the end of our conversation. I would like to ask you one last question, really a wish-list question.

You've done so much work on AI and climate change. When you're looking at that big problem, what is high on your wish list for the AI tool that you wish could be delivered, that could most help move the dial on climate change?

[00:47:58] Yoshua Bengio: So I've been involved in a community in machine learning that is trying to develop machine learning tools for helping scientists. This is something that's called AI for science. And that includes, you know, medical applications, but also, like, discovering new types of batteries or carbon capture, or better ways of predicting the climate or, you know, managing biodiversity.

So things that matter for our environment. And, you know, I wish we invested more in these kinds of AI-for-science research, because this, I think, is gonna benefit society a lot more than, you know, some gadget that's going to make productivity in the office 20 percent better.

[00:48:45] Azeem Azhar: We would agree with you on that, Yoshua. We're very in favor of accelerating and improving scientific research and access to it over here at Exponential View.

And thank you so much for making the time today.

[00:48:57] Yoshua Bengio: My pleasure, and thanks for all the questions and having me today.
