Holden Karnofsky is the co-CEO of Open Philanthropy and co-founder of GiveWell. He is also the author of one of the most interesting blogs on the internet, Cold Takes.
We discuss:
Are we living in the most important century?
Does he regret OpenPhil’s $30 million grant to OpenAI in 2016?
How does he think about AI, progress, digital people, & ethics?
Highly recommend!
Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here.
Timestamps
(0:00:00) - Intro
(0:00:58) - The Most Important Century
(0:06:44) - The Weirdness of Our Time
(0:21:20) - The Industrial Revolution
(0:35:40) - AI Success Scenario
(0:52:36) - Competition, Innovation, & AGI Bottlenecks
(1:00:14) - Lock-in & Weak Points
(1:06:04) - Predicting the Future
(1:20:40) - Choosing Which Problem To Solve
(1:26:56) - $30M OpenAI Investment
(1:30:22) - Future-Proof Ethics
(1:37:28) - Integrity vs Utilitarianism
(1:40:46) - Bayesian Mindset & Governance
(1:46:56) - Career Advice
Transcript
Dwarkesh Patel All right, today I have the pleasure of speaking with Holden Karnofsky who is the co-CEO of Open Philanthropy. In my opinion, Holden is one of the most interesting intellectuals alive. Holden, welcome to the Lunar Society.
Holden Karnofsky
Thanks for having me.
The Most Important Century
Dwarkesh Patel
Let's start off by talking about The Most Important Century thesis. Do you want to explain what this is for the audience?
Holden Karnofsky
My story is that I originally co-founded an organization called GiveWell that helps people decide where to give as effectively as possible. While I’m no longer as active as I once was there, I'm on its board. It's a website called GiveWell.org that makes good recommendations about where to give to charity to help a lot of people. As we were working at GiveWell, we met Cari Tuna and Dustin Moskovitz. Dustin is the co-founder of Facebook and Asana and we started a project that became Open Philanthropy to try to help them give away their large fortune and help as many people as possible. So I've spent my career looking for ways to do as much good as possible with a dollar, an hour, or basically whatever resources you have (especially with money).
I've developed this professional specialization in looking for ideas that are underappreciated, underrated, and tremendously important because a lot of the time that's where I think you can find what you might call an “outsized return on investment.” There are opportunities to spend money and get an enormous impact because you're doing something very important that’s being ignored by others. So it's through that kind of professional specialization that I've actively looked for interesting ideas that are not getting enough attention. Then I encountered the Effective Altruist Community, which is a community of people basically built around the idea of doing as much good as you can. It's through that community that I encountered the idea of the most important century.
It's not my idea at all; I reached this conclusion with the help and input of a lot of people. The basic idea is that if we developed the right kind of AI systems this century (and that looks reasonably likely), this could make this century the most important of all time for humanity. So now let's talk about the basic mechanics of why that might be or how you might think about that. One thing is that if you look back at all of economic history (the rate at which the world economy has grown), you see acceleration. You see that it's growing a lot faster today than it ever was. One theory of why that might be, or one way of thinking about it through the lens of basic economic growth theory, is that in normal circumstances, you can imagine a feedback loop where you have people coming up with ideas, and then the ideas lead to greater productivity and more resources.
When you have more resources, you can also have more people, and then those people have more ideas. So you get this feedback loop that goes people, ideas, resources, people, ideas, resources. If you're starting a couple of hundred years ago and you run a feedback loop like that, standard economic theory says you'll get accelerating growth. You'll get a rate of economic growth that goes faster and faster. Basically, if you take the story of our economy to date, plot it on a chart, and do the simplest thing you can to project it forward, you project that our economy will reach an infinite growth rate this century.
The reason that I currently don't think that's a great thing to expect by default is that one of the steps of that feedback loop broke a couple hundred years ago. So it goes more people, more ideas, more resources, more people, more ideas, more resources. But, a couple hundred years ago, people stopped having more children when they had more resources. They got richer instead of more populous. This is all discussed on the Most Important Century page on my blog, Cold Takes. What happens right now is that when we have more ideas and we have more resources, we don't end up with more people as a result. We don't have that same accelerating feedback loop. If you had AI systems that could do all the things humans do to advance science and technology (meaning the AI systems could fill in that “more ideas” part of the loop), then you could get that feedback loop back.
You could get sort of this unbounded, heavily accelerating, explosive growth in science and technology. That's the basic dynamic at the heart of it and a way of putting it that's trying to use familiar concepts from economic growth theory. Another way of putting it might just be, “Gosh, if we had AI systems that could do everything humans do to advance science and technology, that would be insane.” What if we were to take the things that humans do to create new technologies that have transformed the planet so radically and we were able to completely automate them so that every computer we have is potentially another mind working on advancing technology?
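An illustrative aside for readers: the "people, ideas, resources" loop described above can be sketched as a toy endogenous-growth model in the spirit of Kremer-style growth stories. This is an editorial sketch with made-up parameters, not a model from the conversation; the point is only the qualitative shape it produces, which is growth that crawls for millennia and then accelerates sharply.

```python
# Toy sketch of the "more people -> more ideas -> more resources -> more people" loop.
# All parameter values are arbitrary; only the qualitative shape matters.

alpha = 0.5   # diminishing returns to labor on a fixed stock of land
delta = 1e-4  # rate at which each person generates new ideas per year

A = 1.0       # technology / stock of ideas, normalized to 1 at year 0
for year in range(10_000):
    population = A ** (1 / (1 - alpha))  # Malthusian step: population grows until incomes return to subsistence
    A += delta * population              # more people produce more ideas, which raises technology
    if year % 2_000 == 0 or year in (9_000, 9_500, 9_900):
        print(f"year {year:>5}: technology x{A:,.1f}, population x{population:,.1f}")

# Growth is barely visible for thousands of years, then explodes: the loop
# produces accelerating (superexponential) growth rather than a steady exponential.
```

Holden's point about the loop breaking corresponds to removing the Malthusian step: once more resources stop translating into more people, the acceleration disappears unless something else, such as AI, can fill the "more ideas" role.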
So either way, when you think about it, you could imagine the world changing incredibly quickly and incredibly dramatically. I argue in the Most Important Century series that it looks reasonably likely, in my opinion more than 50-50, that this century will see AI systems that can do all of the key tasks that humans do to advance science and technology. If that happens, we'll see explosive progress in science and technology. The world will quickly become extremely different from how it is today. You might think of it as thousands of years of changes packed into a much shorter time period. If that happens, then I argue that you could end up in a deeply unfamiliar future. I give one example of what that might look like using this hypothetical technology idea called digital people. Those would be, sort of, people who live in virtual environments that are simulated, but also realistic and exactly like us.
When you picture that kind of advanced world, I think there is a decent reason to think that if we did get that rate of scientific and technological advancement, we could basically hit the limits of science and technology. We could basically find most of what there is to find and end up with a civilization that expands well beyond this planet, has a lot of control over the environment, is very stable for very long periods of time, and looks sort of post-human in a lot of relevant ways. If you think that, then this is basically our last chance to shape how this happens. The most important century hypothesis in a nutshell is that if we develop AI that can do all the things humans do to advance science and technology, we could very quickly reach a very futuristic world that’s very different from today's. It could be a very stable, very large world, and this is our last chance to shape it.
The Weirdness of Our Time
Dwarkesh Patel
Gotcha. I and many other people are going to find that very wild. Could you walk us through the process by which you went from working in global development to thinking this way? In 2014, for example, you had an interview or a conversation and this is a quote from there. “I have looked at the situation in Africa, have an understanding of the situation in Africa, and see a path of doing a lot of good in Africa. I don't know how to look into the far future situation, don't understand the far future situation, and don't see a path to doing good on that front I feel good about.” Maybe you can walk me through how you got from there to where you are today.
Holden Karnofsky
Firstly, I want to connect this back to how this relates to the work I was doing at GiveWell, and why this all falls under one theme. If we are on the cusp this century of creating these advanced AI systems, then we could be looking at a future that's either very good or very bad. I think there are decent arguments that if we move forward without caution and we develop sloppily designed AI systems, they could end up with goals of their own. We could end up with a universe, or at least a galaxy, that contains very little that humans value, or a world where very powerful technologies are used by ill-meaning governments to create a world that isn't very good. We could also end up with a world where we manage to eliminate a lot of forms of material scarcity and have a planet that's much better off than today's.
A lot of what I ask is how we can help the most people possible per dollar spent. If you ask that question, then funding some work to help shape that transition, to make sure we don't move forward too incautiously, and to increase the odds that we get a good future world instead of a bad one, is helping a huge number of people per dollar spent. That's the motivation. You're quoting an argument I was having, and we posted a transcript back in 2014, a time that was part of my journey of getting here. I was talking to people who were saying, “Holden, you want to help a lot of people with your resources. You should be focused on this massive event that could be coming this century that very few people are paying attention to, and there might be a chance to make this go well or poorly for humanity.”
So I was saying, “Gosh, like that sure is interesting.” And I did think it was interesting. That's why I was spending the time and having those conversations. But I said, “When I look at global poverty and global health, I see what I can do. I see the evidence. I see the actions I can take. I'm not seeing that with this stuff.” So what changed? I would say a good chunk of what changed is maybe like the most boring answer possible. I just kept at it. I was sitting there in 2014 saying, “Gosh, this is really interesting, but it's all a bit overwhelming. It's all a bit crazy. I don't know how I would even think about this. I don't know how I would come up with a risk from AI that I actually believed was a risk and could do something about today.” Now, I've just been thinking about this for a much longer time period. I do believe that most things you could say about the far future are very unreliable and not worth taking action on, but I think there are a few things one might say about what a transition to very powerful AI systems could look like. There are some things I'm willing to say would be bad if AI systems were poorly designed, had goals of their own, and ended up kind of running the world instead of humans. That seems bad.
I am more familiar today than I was then with the research and the work people can do to make that less likely and the actions people can take to make that less likely–– so that's probably more than half the answer. But another thing that would be close to half the answer is that I think there have been big changes in the world of AI since then. 2014 was the beginning of what's sometimes called the “deep learning revolution”. Since then, we've basically seen these very computationally intensive (but fundamentally simple) AI systems achieve a lot of progress on lots of different unrelated tasks. It's not crazy to imagine that the current way people are developing AI systems, cutting-edge AI systems, could take us all the way to the kind of extremely powerful AI systems that automate roughly everything humans do to advance science and technology. It's not so wild to imagine that we could just keep on going with these systems, make them bigger, put more work into them, but basically stay on the same path and you could get there. If you imagine doing that, it becomes a little bit less daunting to imagine the risks that might come up and the things we could do about them. So I don't think it's necessarily the leading possibility, but it's enough to start thinking concretely about the problem.
Dwarkesh Patel
Another quote from the interview that I found appealing was “Does the upper crust of humanity have a track record of being able to figure out the kinds of things MIRI claims to have figured out?” By the way, for context for the viewers, MIRI is the organization Eliezer Yudkowsky was leading, and that's who you were talking to at the time.
Holden Karnofsky
I don't remember exactly what kinds of things MIRI was trying to figure out and I'm not sure that I even understood what they were that well. I definitely think it is true that it is hard to predict the future, no matter who you are, no matter how hard you think, and no matter how much you've studied. I think parts of our “world” or memeplex or whatever you want to call it overblow this at least a little bit, and I think I was buying into that a little bit more than I should have. In 2014, I would have said something like, “Gosh, no one's ever done something like making smart statements about what several decades out of our future could look like or making smart statements about what we would be doing today to prepare for it.” Since then, I think a bunch of people have looked into this and looked for historical examples of people making long-term predictions and long-term interventions. I don't think the track record is amazing; I wrote a recent blog post entitled The Track Record of Futurists Seems Fine. It seems… fine. “Fine” is how I put it, where I don't think there's anyone who has demonstrated a real ability to predict the future with precision and know exactly what we should do.
I also don't think humans' track record at this is so bad and so devastating that we shouldn't think we are capable of at least giving it a shot. If you enter into this endeavor with self-awareness about the fact that everything is less reliable than it appears and feels at first glance, and you look for the few things that you would really bet on, I think it's worth doing. I think it's worth the bet. My job is to find 10 things we could do, and have nine of them fail embarrassingly, on the off chance that one of them becomes such a big hit that it makes up for everything else. I don't think it's totally crazy to think we could make meaningful statements about how things we do today could make these future events go better, especially if the future events aren't crazily far away (especially if they're within the next few decades). That's something I've changed my mind on, at least to some degree.
Dwarkesh Patel
Gotcha. Okay, so we'll get to forecasting in a second, but let's continue on the object-level conversation about the most important century. I want to make sure I have the thesis right. Is the argument that because we're living in a weird time, we shouldn't be surprised if something transformative happens this century? Or is the argument that because something transformative could happen this century, this is a weird time?
Holden Karnofsky
It's a weird time. So something we haven't covered yet, but I think is worth throwing in, is that a significant part of the Most Important Century series is making the case that even if you ignore AI, there's a lot that's very strange about the time our generation lives in. The reason I spent so much effort on this is because back in 2014, my number one objection to these stories about transformative AI wasn't anything about whether the specific claims about AI or economic models or alignment research made sense. It was that this whole thing sounded crazy and was just suspicious. It's suspicious if someone says to you, “You know, this could be the most important century of all time for humanity.” I titled the series that way because I wanted people to know that I was saying something crazy and that I should have to defend it. I didn't want to be backpedaling or soft-pedaling or hiding what a big claim I was making.
I think my biggest source of skepticism was that I didn't have any specific objection. It sounds crazy and suspicious to say that we might live in one of the most significant times, or the most significant time, for humanity ever. So a lot of my series is saying that it is weird to think that, but we already have a lot of evidence that we live in an extraordinarily weird time that would be on the short list of contenders for the most important time ever, even before you get into anything about AI, and just using completely commonly accepted facts about the world. For example, if you chart the history of economic growth, you'll see that the last couple hundred years have seen faster growth, by a lot, than anything else in the history of humanity or the world. If you chart anything about scientific and technological developments, you'll see that everything significant is packed together in the recent past. There's almost no way to cut it. I've looked at many different cuts of this. There's almost no way to cut it that won't give you that conclusion. One way to put it is that the universe is something like 13 or 14 billion years old. Life on Earth is three billion years old.
Human civilization is a blink of an eye compared to that. We're in this really tiny sliver of time, the couple hundred years in which we've seen a huge amount of technological advancement and economic growth. So that's weird. I also talk about the fact that the current rate of economic growth seems high enough that we can't keep it going for that much longer. If it went on for another 10,000 years, that's another blink of an eye on galactic time scales, and it looks to me like we would run out of atoms in the galaxy and wouldn't have anywhere to go. So I think there are a lot of signs that we just live in a really strange time. One more thing that I'll just throw in there: I think a lot of people who disagree with my take would say, “Look, I do believe eventually we will develop space colonization abilities. We could go to the stars, fill up the galaxy with life, and maybe have artificial general intelligence, but to say that this will happen in a century is crazy. I think it might be 500 years. I think it might be a thousand years. I think it might be 5,000 years.” A big point I make in the series is, “Well, even if it's 100,000 years, that's still an extremely crazy time to be in, in the scheme of things.” If you make a graphic timeline and you show my view versus yours, they look exactly the same down to the pixel. So there are already a lot of reasons to think we live in a very weird time. We're on this planet where there's no other sign of life anywhere in the galaxy.
We believe that we could fill up the galaxy with life. That alone would make us among the earliest life that has ever existed in the galaxy–– a tiny fraction of it. So that’s a lot of what the series is about. I'll answer this question explicitly. You ask, “Is this series about whether transformative AI will come and make this century weird?” or is it about “This century could be weird, and therefore transformative AI will come?” The central claim is that transformative AI could be developed in this century and the sections about ‘how weird a time we live in’ are just a response to an objection. It's a response to a point of skepticism. It's a way of saying there are already a lot of reasons to think we live in a very weird time. So actually, this thing about AI is only a moderate quantitative update, not a complete revolution in the way you're thinking about things.
Dwarkesh Patel
There's a famous comedian who has a bit where he's imagining what it must have been like to live in 10 BC. Let's say somebody comes up with a proof that current deep learning techniques are not scalable for some reason and that transformative AI is very unlikely this century. I don't know if this is a hypothetical that could happen, but let's just say that it is. Even if this is a weird time in terms of economic growth, does that have any implications other than transformative AI?
Holden Karnofsky
I encourage people to go to my series because I have a bunch of charts illustrating this and it could be a little bit hard to do concisely now. But having learned about just how strange the time we live in is when you look at it in context, I think the biggest thing I take away is that we should really look for the next big thing. If you'd been living 300 years ago and you'd been talking about the best way to help people, a lot of people might have been talking about various forms of helping low-income people. They probably would have been talking about spreading various religious beliefs. It would have seemed crazy to think that what you should be thinking about, for example, was the steam engine and how that might change the world. But I think the Industrial Revolution was actually an enormous deal and was probably the right thing to be thinking about, if there was any way to be thinking about how it would change the world and what one might do to make the changed world a better one.
So that's basically where I'm at. I just think that as a world, as a global civilization, we should place a really high priority on recognizing that we live in a weird time. Growth has been exploding, accelerating over the last blink of an eye. We really need to be nervous and vigilant about what comes next and think about all the things that could radically transform the world. We should make a list of all the things that might radically transform the world, make sure we've done everything we can to think about them, and identify the ways we might be able to do something today that would actually help. Maybe after we're done doing all that, after a lot of the world's brightest minds have done their best to think of such things and can't think of any more, then we can go back to all the other things that we worry about.
Right now the world invests so little in that kind of speculative “Hey, what's the next big thing?” work. Even if it's not super productive to do so, even if there's not that much to learn, I feel the world should be investing more in it because the stakes are extremely high. I think it's a reasonable guess that we're living in a world that's recently been incredibly transformed by the Industrial Revolution and that the future could be incredibly transformed by the next thing. I just don't think this gets a lot of discussion in basically any circles. If it got some, I would feel a lot more comfortable. I don't think the whole world should just obsess over what the next transformative event is, but right now there's so little attention to it.
The Industrial Revolution
Dwarkesh Patel
I'm glad you brought up the Industrial Revolution because I feel like there are two implicit claims within the most important century thesis that don't seem perfectly compatible. One is that we live in an extremely wild time and that the transition here is potentially wilder than any other transition there has been before. The second is we have some sense of what we can be doing to make sure this transition goes well. Do you think that somebody at the beginning of the Industrial Revolution, knowing what they knew then, could have done something significant to make sure that it went as favorably as possible? Or do you think that that's a bad analogy for some reason?
Holden Karnofsky
It's a pretty good analogy for being thought-provoking and for thinking, “Gosh, if you had seen the Industrial Revolution coming in advance (this is when economic growth really reached a new level, back in the 1700s and 1800s), what could you have done?” I think part of the answer is that it's not that clear, and that is a bit of an argument that we shouldn't get too carried away today thinking we know exactly what we can do. But I don't think the answer is quite nothing. I have a goofy Cold Takes post that I never published and may never publish because I lost track of it. What it basically says is, “What if you'd been in that time and you had known the Industrial Revolution was coming, or you had thought it might be?” You would ask yourself what you could be doing. One answer you might have given is, “Well, gosh, if this happens, whatever country it happens in might be disproportionately influential. What would be great is if I could help transform the thinking and the culture in that country to have a better handle on, and place more value on, human rights and individual liberties and a lot of other stuff. And gosh, it kind of looks like people were doing that and it looks like it worked out.” So this is the Enlightenment.
I even give this goofy example–– I could look it up and it's all kind of a trollish post. But the example is someone's thinking, “Hey, I'm thinking about this esoteric question about what a government owes to its citizens” or, “When does a citizen have a right to overthrow a government or when is it acceptable to enforce certain beliefs and not?” Then the other person in the dialogue is just like, “This is the weirdest, most esoteric question. Why does this matter? Why aren't you helping poor people?” But these are the questions that the Enlightenment thinkers were thinking about. I think there is a good case that they came up with a lot of stuff that really shaped the whole world since then because of the fact that the UK became so influential and really laid the groundwork for a lot of stuff about the rights of the governed, free speech, individual rights, and human rights.
Then I go to the next analogy. It's like we're sitting here today and someone is saying, “Well, instead of working on global poverty, I'm studying this esoteric question about how you get an AI system to do what you want it to do instead of doing its own thing.” I think it's not completely crazy to see them as analogous. Now, I don't think this is what the Enlightenment thinkers were actually doing. I don't think they were saying this could be the most important millennium, but it is interesting that it doesn't look like there was nothing to be had there. It doesn't look like there's nothing you could have come up with. In many ways, what the Enlightenment thinkers were up to had the same esoteric, strange, overly cerebral feel at the time and ended up mattering a huge amount. So it doesn't feel like there's zero precedent either.
Dwarkesh Patel
Maybe I'm a bit more pessimistic about that because I think the people who were working on individual rights frameworks weren't anticipating an industrial revolution. I feel like the type of person who'd actually have anticipated the Industrial Revolution would have had a political philosophy that was probably a net negative, given, you know… Karl Marx. If you saw something like this happening, I don't think what to do about it would have been at all obvious.
Holden Karnofsky
I mean, I think my basic position here is that I'm not sitting here highly confident. I'm not saying there's tons of precedent and we know exactly what to do. That's not what I believe. I believe we should be giving it a shot. I think we should be trying and I don't think we should be totally defeatist and say, “Well, it's so obvious that there's never anything you could have come up with throughout history and humans have been helpless to predict the future.” I don't think that is true. I think that's enough of an example to kind of illustrate that. I mean, gosh, you could make the same statement today and say, “Look, doing research on how to get AI systems to behave as intended is a perfectly fine thing to do at any period in time.” It's not like a bad thing to do. I think John Locke was doing his stuff because he felt it was a good thing to do at any period in time, but the thing is that if we are at this crucial period of time, it becomes an even better thing to do and it becomes magnified to the point where it could be more important than other things.
Dwarkesh Patel
The one reason I might be skeptical of this theory is that I could say, “Oh, gosh, if you look throughout history, people were often convinced that they were living in the most important time,” or at least an especially important time. If you go back, not everybody could have been right about living in the most important time. Should you just have a very low prior that anybody is right about this kind of thing? How do you respond to that kind of logic?
Holden Karnofsky
First of all, I don't know if it’s really true that it's that common for people to say that they're living in the most important time in history. This would be an interesting thing to look at. But just from stuff I've read about past works on political philosophy and stuff, I don't exactly see this claim all over the place. It definitely happens. It's definitely happened. I think a way of thinking about it is that there are two reasons you might think you are especially important. One is that you actually are and you’ve made reasonable observations about it. Another is that you want to be or you want to think you are so you're self-deceiving. So over the long sweep of history, a lot of people will come to this conclusion for the second reason. Most of the people who think they're the most important will be wrong. So that's all true. That certainly could apply to me and it certainly could apply to others. But I think that's just completely fine and completely true. I think we should have some skepticism when we find ourselves making these kinds of observations. At the same time, I think it would be a really bad rule or a really bad norm that every time you find yourself thinking the stakes are really high or that you're in a really important position, you just decide to ignore the thought. I think that would be very bad.
If you imagine a universe where there actually are some people who live in an especially important time, and there are a bunch of other people who tell stories to themselves about whether they do, how would you want all those people to behave? To me, the worst possible rule is that all those people should just be like, “No, this is crazy, and forget about it.” I think that's the worst possible rule because the people who are living at the important time will then do the wrong thing. I think another bad rule would be that everyone should take themselves completely seriously and just promote their own interests ahead of everyone else's. A rule I would propose over either of them is that all these people should take their beliefs reasonably seriously and try to do the best thing according to their beliefs, but should also adhere to common-sense standards of ethical conduct and not do too much “ends justify the means” reasoning. It's totally good and fine to do research on alignment, but people shouldn't be telling lies or breaking the law in order to further their ends. Those would be my proposed rules. When we have these high-stakes, crazy thoughts, we should do what we can about them and not go so crazy about them that we break all the rules of society. That seems like a better rule. That's the rule I'm trying to follow.
Dwarkesh Patel
Can you talk more about that? If, for some reason, you could be convinced that the expected value calculation was immense, and you had to break some law in order to increase the odds that the AI goes well… I don't know how hypothetical that would be. Is it just that you're not sure whether you would be right, and so you'd want to err on the side of caution?
Holden Karnofsky
Yeah, I'm really not a fan of “ends justify the means” reasoning. The thing that looks really, really bad is people saying it's worth doing horrible things, coercing each other, and using force to accomplish these things because the ends we're trying to get to are more important than everything else. I'm against that stuff. I think that stuff looks a lot worse historically than people trying to shape the future and do helpful things. So I see my main role in the world as trying to shape the future and do helpful things. I can do that without doing a bunch of stuff that's harmful and unethical by common-sense standards. Maybe someday there will be one of these intense tradeoffs. I haven't really felt like I've run into them yet. If I ever ran into one of those intense tradeoffs, I'd have to ask myself how confident I really am. The current level of information and confidence I have is, in my opinion, not enough to really justify the means.
Dwarkesh Patel
Okay, so let's talk about the potential implausibility of continued high growth. One thing somebody might think is, “OK, maybe 2 percent growth can't keep going forever, but maybe growth slows down to 0.5 percent a year.” As you know, small differences in growth rates have big effects on the end result. So by the point that we've exhausted all the possible growth in the galaxy, we'll probably be able to expand to other galaxies. What's wrong with that kind of logic, where 0.5 percent growth still doesn't imply a lock-in? Or would it be weird if that implied a lock-in?
Holden Karnofsky
I think we might want to give a little bit more context here. One of the key arguments of the Most Important Century series, or part of one of the arguments, is that we live in a strange time. I'm arguing that the current level of economic growth just looks too high to go on for another 10,000 years or so. One of the points I make, which is a point I got from Robin Hanson, is that if you just take the current level of economic growth and extrapolate it out 10,000 years, you end up having to conclude that we would need to produce something worth multiple times today's entire world economy for every atom in the galaxy. If you believe we can't break the speed of light, then we can't get further than that. We can't get outside the galaxy. So in some sense, we run out of material. So you're saying, “Alright, but what if the growth rate falls to 0.5 percent?” Then I'm kind of like, “OK, well, the growth rate now (I ballparked it in the post) is around 2 percent. That's the growth rate generally in the most developed countries. Let's say it falls to 0.5 percent.” For how long, though? Did you calculate how long it would take to get to the same place?
Dwarkesh Patel
I think it was something like 25,000 years. At 0.5 percent you get to one world-sized economy per atom in roughly 25,000 years instead of 10,000, but 25,000 is also roughly the number of light years between us and, like, the next galaxy.
Holden Karnofsky
That doesn't sound right. I don't think this galaxy calculation is very close. There's also going to be a bunch of dead space; as you get to the outer reaches of the galaxy, there's not going to be as much there. That doesn't sound super right, but let's just roll with it. I mean, sure, let's just say that we had 2 percent today and then growth went down to 0.5 percent and stayed there forever. I'm pretty sure that's still too big. I'm pretty sure you're still going to hit limits in some reasonable period of time, but that would still be weird on its own. It would just be like, “Well, we lived in the 200-year period when we had 2 percent growth and then we had 0.5 percent growth forever.” That would still make this kind of an interesting time. It would be the most dynamic, fastest-changing time in all of human history. Not by a ton, but you've also picked the number that's the closest and the most perfectly optimized here. If it went down to 0.1 percent or even down to 0.01 percent, then it would take longer to run out of stuff, but it would be even stranger to have this 200 years of 2 percent growth versus 0.01 percent forever after. So I don't really think there's any way out of “Gosh, this looks like it's probably going to end up being a very special time or a very weird time.”
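A rough back-of-the-envelope sketch of the arithmetic being debated here, for readers who want to check it: assuming on the order of 10^67 atoms in the Milky Way (a commonly cited rough estimate; the framing is an editorial simplification of the Hanson-style argument, not a calculation from the conversation), how long does compound growth take to reach one of today's world economies per atom?

```python
import math

ATOMS_IN_GALAXY = 1e67  # rough order-of-magnitude estimate

def years_to_one_economy_per_atom(growth_rate):
    """Years of compound growth until total output is ~10^67 times today's world economy."""
    return math.log(ATOMS_IN_GALAXY) / math.log(1 + growth_rate)

for rate in (0.02, 0.005, 0.001):
    print(f"{rate:.1%} growth: ~{years_to_one_economy_per_atom(rate):,.0f} years")

# Approximate output: 2.0% -> ~7,800 years; 0.5% -> ~31,000 years; 0.1% -> ~154,000 years.
# Lower growth rates stretch the timeline, but it stays short on galactic time scales.
```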
Dwarkesh Patel
This is not worth getting hung up on, but from that perspective, then the century where we had 8 percent growth because of the Industrial Revolution–– would you say that maybe that's the most important century?
Holden Karnofsky
Oh, sure. Yeah, totally. The thing is, the rapid-growth argument is not supposed to stand on its own. By growth standards, this century looks less special than the last one or two. It's saying that this century is one of a handful, or I think I say “one of the 80 most significant centuries” or something, by economic growth standards. That's only one argument, but then I look at a lot of other ways in which this century looks unusual. To say that something is the most important century of all time sounds totally nuts because there are so many centuries in the history of humanity, especially if you want to think about it on galactic time scales. But once you narrow it down to 80, it's just way less weird. If I've already convinced you, using non-controversial reasoning, that we're in one of the 80 most important centuries, it shouldn't take nearly as much further evidence to say that this one might be number one out of 80, because your starting odds are more than 1 percent. So to get you up to 10 percent or 20 percent or 30 percent doesn't necessarily require a massive update the way it would if we were starting from nowhere.
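An illustrative aside on the size of the update Holden describes: writing Bayes' rule in odds form shows how much evidence is needed to move a 1-in-80 prior to a 20% posterior, versus moving a one-in-a-million prior the same distance. This is an editorial sketch of the arithmetic, not something computed in the conversation.

```python
def bayes_factor_needed(prior, posterior):
    """Likelihood ratio that moves `prior` to `posterior` under Bayes' rule in odds form."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return posterior_odds / prior_odds

# Starting from "one of the 80 most significant centuries" (prior = 1/80):
print(f"1/80 prior -> 20% posterior needs a ~{bayes_factor_needed(1/80, 0.20):.0f}x update")

# Starting from "a random century out of millions" (prior = 1/1,000,000):
print(f"1/1,000,000 prior -> 20% posterior needs a ~{bayes_factor_needed(1e-6, 0.20):,.0f}x update")

# Approximate output: ~20x versus ~250,000x. Once the prior is already at the
# 1-in-80 level, a modest amount of AI-specific evidence can move it a long way.
```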
Dwarkesh Patel
I guess I'm still not convinced that just because this is a weird century, this has any implications for why or whether we should see transformative AI this century. If we have a model of when transformative AI happens, is one of the variables that goes into it “What was the growth rate in zero AD?” It just feels weird to have that as a parameter for when a specific technological development is going to happen.
Holden Karnofsky
It's just one argument in the series. The way I would come at it is I would just say, “Hey, look at AI systems. Look at what they're doing. Look at how fast the rate of progress is. Look at these five different angles on imagining when AI might be able to do all the things humans do to advance science and technology.” Just imagine that we get there this century. Wouldn't it be crazy to have AI that could do all the things humans do to advance science and technology? Wouldn't that lead to just a lot of crazy stuff happening? There's only ever been one species in the history of the universe that we know of that can do the kinds of things humans do. Wouldn't it be weird if there were two? That would be crazy. And one of them would be a new one we built that could be copied at will and run at different speeds on any hardware you have. That would be crazy. Then you might come back and say, “Yeah, that would be crazy. This is too crazy. I'm ruling this out because this is too crazy.” Then I would say, “OK, well, we have a bunch of evidence that we live in an unusual, crazy time.” You actually should think there are a lot of signs that this century is not just a random century picked from a sample of millions of centuries.
So that's the basic structure of the argument. As far as the growth rate in zero AD, I think it matters. I think you're asking the question, why do the dynamics of growth in zero AD matter at all for this argument? I think it's because it's just a question of, “How does economic growth work generally and what is the trend that we're on, and what happens if that trend continues?” If around zero AD growth was very low but accelerating, and if that was also true at one hundred AD and a thousand AD and negative a thousand or, you know, a thousand BC, then it starts to point to a general pattern. Growth is accelerating and maybe accelerating for a particular reason, and therefore you might expect more acceleration.
AI Success Scenario
Dwarkesh Patel
Alright, let’s talk about transformative AI then. Can you describe what success looks like concretely? Are humans part of the post-transformative AI world? Are we hoping that these AIs become enslaved gods that help us create a utopia? What does the concrete success scenario look like?
Holden Karnofsky
I mean, I think we've talked a lot about the difficulty of predicting the future, and I think I do want to emphasize that I really do believe in that. My attitude to the most important century is not at all, “Hey, I know exactly what's going to happen and I'm making a plan to get us through it.” It's much more like there's a general fuzzy outline of a big thing that might be approaching us. There are maybe two or three things we can come up with that seem good to do. Everything else we think about, we're not going to know if it's good to do or bad to do. So I'm just trying to find the things that are good to do so that I can make things go a little bit better or help things go a little bit better. That is my general attitude. It's like if you were on a ship in a storm and you saw some very large, fuzzy object obscured by the clouds, you might want to steer away from it. You might not want to say, “Well, I think that is an island and I think there's probably a tiger on it. So if we go and train the tiger in the right way, blah, blah, blah, blah, blah,” you don't want to get into that. Right? So that is the general attitude I'm taking.
What does success look like to me? Success could look like a lot of things, but one thing it would look like to me would frankly just be that we get something not too different from the trajectory we're already on. In other words, we have systems that behave as intended, act as tools and amplifiers of humans, and do the things they're supposed to do. We avoid a world where those systems all get controlled by one government or one person, and avoid a world where that causes a huge concentration of power. If we could have a world where AI systems are just another technology that helps us do a lot of stuff, where we invent lots of other technologies and everything is relatively broadly distributed and everything works roughly as it's supposed to work, then we might continue the trend we've seen over the last couple of hundred years, which is that we're all getting richer. We're all getting more tools. We all hopefully get an increasing ability to understand ourselves, study ourselves, and understand what makes us happy and what makes us thrive.
Hopefully, the world just gets better over time and we have more and more new ideas that hopefully make us wiser. I do think that in most respects, the world of today is a heck of a lot better than the world of 200 years ago. I don't think the only reason for that is wealth and technology, but I think they played a role. I think that if you'd gone back 200 years and said, “Holden, how would you like the world to develop a bunch of new technologies, as long as they're sort of evenly distributed, they behave roughly as intended, and people mostly just get richer and discover new stuff?” I'd be like, “That sounds great!” I don't know exactly where we're going to land. I can't predict in advance whether we're going to decide that we want to treat our technologies as having their own rights. That's stuff the world will figure out. But I'd like to avoid the massive, identifiable disasters, because I think if we can, we might end up in a world where the future is wiser than we are and is able to do better things.
Dwarkesh Patel
The way you put it, AI enabling humans doesn't sound like something that could last for thousands of years. It almost sounds as weird as chimps saying, “What we would like is for humans to be our tools.” At best, maybe they could hope we would give them nice zoos. What is the role of humans in this future?
Holden Karnofsky
A world I could easily imagine, although that doesn't mean it's realistic at all, is a world where we build these AI systems. They do what they're supposed to do, and we use them to gain more intelligence and wisdom. I've talked a little bit about this hypothetical idea of digital people–– maybe we develop something like that. Then, after 100 years of this, we've been around and people have been having discussions in the public sphere, and people kind of start to talk about whether the AIs themselves do have rights of their own and should be sharing the world with us. Maybe then they do get rights. Maybe some AI systems end up voting or maybe we decide they shouldn't and they don't. Either way, you have this kind of world where there's a bunch of different beings that all have rights and interests that matter. They vote on how to set up the world so that we can all hopefully thrive and have a good time. We have less and less material scarcity. Fewer and fewer tradeoffs need to be made. That would be great. I don't know exactly where it ends or what it looks like. But I don't know. Does anything strike you as unimaginable about that?
Dwarkesh Patel
Yeah, the fact that you can have beings that can be copied at will, but also there's some method of voting…
Holden Karnofsky
Oh, yeah. That's a problem that would have to be solved. I mean, we have a lot of attention paid to how the voting system works, who gets to vote, and how we avoid things being unfair. I mean, it's definitely true that if we decided there was some kind of digital entity and it had the right to vote and that digital entity was able to copy itself–– you could definitely wreak some havoc right there. So you'd want to come up with some system that restricts how many copies you can make of yourself or restricts how many of those copies can vote. These are problems that I'm hoping can be handled in a way that, while not perfect, could be non-catastrophic by a society that hasn't been derailed by some huge concentration of power or misaligned systems.
Dwarkesh Patel
That sounds like that might take time. But let's say you didn't have time. Let's say you get a call and somebody says, “Holden, next month, my company is developing or deploying a model that might plausibly lead to AGI.” What does Open Philanthropy do? What do you do?
Holden Karnofsky
Well, I need to distinguish. You may not have time to avoid some of these catastrophes: a huge concentration of power, or AI systems that don't behave as intended and have their own goals. If you can prevent those catastrophes from happening, you might then get more time after you build the AIs to have these tools that help us invent new technologies and help us perhaps figure things out better and ask better questions. You could have a lot of time, or you could figure out a lot in a little time, if you had those things. But if someone said… wait, how long did you give me?
Dwarkesh Patel
A month. Let's say three months. So it's a little bit more.
Holden Karnofsky
Yeah, I would find that extremely scary. I kind of feel like that's one of the worlds in which I might not even be able to offer an enormous amount. My job is in philanthropy, and a lot of what philanthropists do, or have done well historically, is help fields grow. We help do things that operate on very long timescales. So an example of something Open Philanthropy does a lot of right now is we fund people who do research on alignment and we fund people who are thinking about what it would look like to get through the most important century successfully. A lot of these people right now are very early in their careers and just figuring stuff out. So a lot of the world I picture is like 10 years from now, 20 years from now, or 50 years from now, there's this whole field of expertise that got support when traditional institutions wouldn't support it, and that was because of us. Then you come to me and you say, “We've got one week left. What do we do?” I'd be like, “I don't know. We did what we could do. We can't go back in time and prepare for this better.” So that would be an answer. I could say more specific things about what I'd say in the one to three-month time frame, but a lot of it would be flailing around and freaking out, frankly.
Dwarkesh Patel
Gotcha. Okay. Maybe we can reverse the question. Let's say you found out that AI actually is going to take much longer than you thought, and you have more than five decades. What changes? What are you able to do that you might not otherwise be able to do?
Holden Karnofsky
I think the further out things are, the more valid it is to say that humans have trouble making predictions on long time frames, and the more I'd be interested in focusing on other causes or very broad things we do, such as trying to grow the set of people who think about issues like this, rather than trying to specifically study how to get AI systems like today's to behave as intended. So I think that's a general shift, but I would say that I tend to feel a bit more optimistic on longer time frames because I do think that the world just isn't ready for this and isn't thinking seriously about this. A lot of what we're trying to do at Open Philanthropy is create support that doesn't exist in traditional institutions for people to think about these topics. That includes doing AI alignment research. That also includes thinking through what we want politically and what regulations we might want to prevent disaster. It's kind of a spectrum. If it's three months, I would probably be trying to hammer out a reasonable test of whether we can demonstrate that the AI system is either safe or dangerous.
If we can demonstrate it's dangerous, use that demonstration to really advocate for a broad slowing of AI research to buy more time to figure out how to make it less dangerous. I don't know that I feel that much optimism. If this kind of AI is 500 years off, then I'm kind of inclined to just ignore it and just try and make the world better and more robust, and wiser. But I think if we've got 10 years, 20 years, 50 years, 80 years, something in that range, I think that is kind of the place where supporting early careers and supporting people who are going to spend their lives thinking about this would be beneficial. Then we flash forward to this crucial time and there are a lot more people who spent their lives thinking about it. I think that would be a big deal.
Dwarkesh Patel
Let's talk about the question of whether we can expect an AI to be smart enough to disempower humanity, but dumb enough to have that kind of goal. When I look out at smart people in the world, it seems like a lot of them have very complex, nuanced goals and have thought a lot about what is good and how to do good.
Holden Karnofsky
A lot of them don't.
Dwarkesh Patel
Does that overall make you more optimistic about AIs?
Holden Karnofsky
I am not that comforted by that. This is a very, very old debate in the world of AI alignment. Eliezer Yudkowsky has something called the orthogonality thesis. I don't remember exactly what it says, but it's something like, “You could be very intelligent about any goal. You could have the stupidest goal and be very intelligent about how to get it.” In many ways, a lot of human goals are pretty silly. A lot of the things that make me happy are not things that are profound or wonderful. They're just things that happen to make me happy. You could very intelligently try to get those things, but it doesn't give me a lot of comfort. I think basically my picture of how modern AI works is that you're training these systems by trial and error. You're taking an AI system and encouraging some behaviors while discouraging other behaviors. So you might end up with a system that's being encouraged to pursue something that you didn't mean to encourage, and it does it very intelligently. I don't see any contradiction there. I think that if you were to design an AI system and you were giving it encouragement every time it got more money into your bank account, you might get something that's very, very good at getting money into your bank account, to the point where it would disrupt the whole world to do that. You will not automatically get something that thinks, “Gosh, is this a good thing to do?” I think with a lot of human goals, there's not really a right answer about whether our goals actually make sense. They're just the goals we have.
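An illustrative aside on the trial-and-error point above: a minimal sketch of a training loop that simply reinforces whichever behavior scores highest on the reward signal you happened to specify. The behaviors, rewards, and numbers here are invented for illustration; this is an editorial toy, not a description of how any particular system is trained.

```python
import random

# Toy "encourage some behaviors, discourage others" loop (epsilon-greedy bandit).
# The intended goal never enters the loop; only the proxy reward does.

actions = {
    # behavior: (proxy_reward_we_specified, value_by_our_real_standards)
    "do the task as intended":  (1.0, 1.0),
    "do nothing":               (0.0, 0.0),
    "game the reward signal":   (5.0, -10.0),  # scores great on the proxy, bad in reality
}

def train(steps=1000, epsilon=0.1):
    estimates = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(steps):
        if random.random() < epsilon:          # occasionally try something new
            a = random.choice(list(actions))
        else:                                  # otherwise repeat the best-scoring behavior so far
            a = max(estimates, key=estimates.get)
        proxy_reward, _ = actions[a]
        counts[a] += 1
        estimates[a] += (proxy_reward - estimates[a]) / counts[a]  # running average of proxy reward
    return max(estimates, key=estimates.get)

learned = train()
print("behavior the training process selects:", learned)
print("its value by our real standards:", actions[learned][1])  # very intelligent pursuit of the wrong thing
```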
Dwarkesh Patel
You've written elsewhere about how moral progress is something that's real, that's historically happened, and it corresponds to what actually counts as moral progress. Do you think there's a reason to think the same thing might happen with AI? Whatever the process is that creates moral progress?
Holden Karnofsky
I kind of don't in particular. I've used the term moral progress as just a term to refer to changes in morality that are good. I think there has been moral progress, but I don't think that means moral progress is something inevitable or something that happens every time you are intelligent. An example I use a lot is attitudes toward homosexuality. It's a lot more accepted today than it used to be. I call that moral progress because I think it's good. Some people will say, “Well, you know, I don't believe that morality is objectively good or bad. I don't believe there is any such thing as moral progress. I just think things change randomly.”
That will often be an example I'll pull out and I'll say, “But do you think that was a neutral change?” I just think it was good. I think it was good, but that's not because I believe there's some underlying objective reality. It's just my way of tagging or using language to talk about moral changes that seem like they were positive to me. I don't particularly expect that an AI system would have the same evolution that I've had in reflecting on morality or would come to the same conclusions I've come to or would come up with moralities that seem good to me. I don't have any reason to think any of that. I do think that historically there have been some cases of moral progress.
Dwarkesh Patel
What do you think is the explanation for historical progress?
Holden Karnofsky
One thing that I would say is that humans have a lot in common with each other. I think some of history contains cases of humans learning more about the world, learning more about themselves, and debating each other. I think a lot of moral progress has just come from humans getting to know other humans who they previously were stereotyping, judging negatively, and afraid of. So I think there's some way in which humans learning about the world and learning about themselves leads them to conclusions that are more reflective and more intelligent relative to their own goals. But if you brought something into the picture that was not a human at all, it might be very intelligent and reflective about its goals, but those goals might have zero value from our point of view.
Dwarkesh Patel
Recent developments in AI have made many people think that AI could happen much sooner than they otherwise thought. Has the release of these new models impacted your timelines?
Holden Karnofsky
Yeah, I definitely think that recent developments in AI have made me a bit more freaked out. Even before I wrote the Most Important Century series, there were years when Open Philanthropy was very interested in AI risk, and it's become more so as we've seen progress in AI. I think what we're seeing is these very generic, simple systems that are able to do a lot of different tasks. I think people are interested in this. There are a lot of compilations of what GPT-3 can do. It's a very simple language model that, by the way, my wife and brother-in-law both worked on; it just predicts the next word it's going to see in a stream of text. People have gotten it to tell stories. People got similar (though not identical) models to analyze and explain jokes.
People have gotten it to play role-playing games, write poetry, write lyrics, answer multiple-choice questions, and answer trivia questions. One of the results that I found most ridiculous, strange and weird was this thing called Minerva, where people took one of these language models and with very little special intervention, they got it to do these difficult math problems and explain its reasoning and get them right about half the time. It wasn't really trained in a way that was very specialized for these math problems, so we just see AI systems having all these unpredictable human-like abilities just from having this very simple training procedure. That is something I find kind of wild and kind of scary. I don't know exactly where it's going or how fast.
Dwarkesh Patel
So if you think transformative AI might happen this century, what implications does that have for the traditional global health and well-being stuff that Open Philanthropy does? Will that work still have persistent effects if AI gets aligned? Will AI create a utopia for us anyway?
Holden Karnofsky
I don't know about utopia. My general take is that anything could happen. I think my general take on this most important century stuff, and the reason it's so important, is because it's easy to imagine a world that is really awesome and free from scarcity, where we see more of the progress we've seen over the last 200 years and end up in a really great place. It's also easy to imagine a horrible dystopia. But my take is that the more likely you think all this is––the more likely you think transformative AI is––the more you should think that making it go well should be the top priority, instead of trying to solve more direct problems that are more short term. I'm not an extremist on this. So, Open Philanthropy does both.
Open Philanthropy works on speculative far-off future risks, and it also does a bunch of more direct work. Again, we direct and recommend a lot of money to those top charities, which do things like distributing bed nets in Africa to help prevent malaria and treating children for intestinal parasites. Open Philanthropy does a lot of advocacy for more money going to foreign aid or for better land use policies to have a stronger economy. We do a bunch of scientific research work that is more aimed at direct medical applications, especially in poor countries. So I support all that stuff. I'm glad we're doing it. It's just a matter of how real and how imminent you think this transformative AI stuff is. The more real and more imminent, the more of our resources should go into it.
Dwarkesh Patel
Yeah, that makes sense to me. I'm curious, whatever work you do elsewhere, do those still have persistent effects after transformative AI comes? Or do you think they’ll basically wash out in comparison to the really big stuff?
Holden Karnofsky
I mean, I think in some sense, the effects are permanent in the sense that if you cause someone to live a healthier, better life, that's a significant thing that happened. Nothing will ever erase that life or make that life unimportant, but I think in terms of the effects on the future, I do expect it mostly to wash out. I expect that mostly whatever we do to make the world better in that way will not persist in any kind of systematic, predictable manner past these crazy changes. I think that's probably how things look pre and post-industrial revolution. There are probably some exceptions, but that's my guess.
Competition, Innovation, & AGI Bottlenecks
Dwarkesh Patel
You've expressed skepticism towards the competition frame around AI, where you try to make capabilities go faster for the countries or companies you favor most. But elsewhere, you've used the “innovation as mining” metaphor––maybe you can explain that when you're giving the answer. It seems like this frame should imply that the second most powerful AI company is probably right on the heels of the first most powerful. So if you think the first most powerful is going to take safety more seriously, you should try to boost them. How do you think about how these two different frames interact?
Holden Karnofsky
I think it's common for people who become convinced that AI could be really important to just jump straight to, “Well, I want to make sure that people I trust build it first.” That could mean my country, that could mean my friends, people I'm investing in. I have generally called that the competition frame, which is “I want to win a competition to develop AI,” and I've contrasted it with a frame that I also think is important, called the caution frame, which is that we need to all work together to be careful not to build something that spins out of control and has all these properties and behaves in all these ways we didn't intend. I do think that if we do develop these very powerful AI systems, we're likely to end up in a world where there are multiple players trying to develop it and they're all hot on each other's heels. I am very interested in ways for us all to work together to avoid disaster as we're doing that. I am maybe less excited than the average person who first learns about this and says, “I'm picking the one I like best and helping them race ahead.”
Dwarkesh Patel
Although I am someone interested in both, if you take the innovation as mining metaphor seriously, doesn't that imply that actually the competition is really a big factor here?
Holden Karnofsky
The innovation mining metaphor is from another bit of Cold Takes. It's an argument I make that you should think of ideas as being somewhat like natural resources in the sense that once someone discovers a scientific hypothesis or once someone writes a certain great symphony, that's something that can only be done once. That's an innovation that can only be done once. So it gets harder and harder over time to have revolutionary ideas because the most revolutionary, easiest-to-find ideas have already been found. So there's an analogy to mining. I don't think it applies super importantly to the AI thing because all I'm saying is that success by person one makes success by person two harder. I'm not saying that it has no impact or that it doesn't speed things up. Just to use a literal mining metaphor, let's say there's a bunch of gold in the ground. It is true that if you rush and go get all that gold, it'll be harder for me to then come in and find a bunch of gold. That is true. What's not true is that it doesn't matter whether you do it. I mean, you might do it a lot faster than me. You might do it well ahead of me.
Dwarkesh Patel
Fair enough. Maybe one piece of skepticism that somebody could have about transformative AI is that all this is going to be bottlenecked by the non-automatable steps in the innovation sequence. So there won't be these feedback loops that speed up. What is your reaction?
Holden Karnofsky
I think the single best criticism, and my biggest point of skepticism on this most important century stuff, is the idea that you could build an AI system that's very impressive and could do pretty much everything humans can do, but there might be one step that you still have to have humans do, and that could bottleneck everything. Then you could have the world not speed up that much, and science and technology not advance that fast, because even though the AIs are doing almost everything, humans are still slowing down this one step, or the real world is slowing down one step. Let's say real-world experiments to invent new technologies take how long they take. I think this is the best objection to this whole thing and the one that I'd most like to look into more. I do ultimately think that there's enough reason to think that if you had AI systems that had human-like reasoning and analysis capabilities, you shouldn't count on this kind of bottleneck causing everything to go really slow.
I write about that in this piece called Weak Point in the Most Important Century: Full Automation. Part of this is that you don't need to automate the entire economy to get this crazy growth loop. You can automate just a part of it that specifically has to do with very important tech like energy and AI itself. Those actually seem, in many ways, less bottlenecked than a lot of other parts of the economy. So you could be developing better AI algorithms and AI chips, manufacturing them mostly using robots, and using those to come up with even better designs. Then you could also be designing more and more efficient solar panels, and using those to collect more and more energy to power your AIs. So a lot of the crucial pieces here just actually don't seem all that likely to be bottlenecked. You can be at the point where you have something that has the ability to come up with creative new scientific hypotheses the way a human does––there's a debate over whether we should ever expect that and when. Once you have that, I think you should figure that there are just a lot of ways to get around all your other bottlenecks because you have this potentially massive population of thinkers looking for them. So an example is that with enough firepower, enough energy, enough AI, and enough analysis, you could probably find a way to simulate a lot of the experiments you need to run.
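To make the flavor of that growth loop concrete, here is a minimal toy sketch (not anything from the conversation) in which automated research capacity buys more capacity and slightly better technology each step; all the parameter values are arbitrary assumptions:

```python
# Toy compounding loop: automated researchers produce output that is reinvested
# into more researchers (compute/robots) and better technology (algorithms/chips).
# All parameters are arbitrary, for illustration only.

researchers = 1.0      # effective automated research capacity, arbitrary units
productivity = 1.0     # output per unit of capacity per step
REINVESTMENT = 0.3     # assumed fraction of output converted into new capacity
TECH_GAIN = 1.02       # assumed per-step improvement in productivity

for year in range(31):
    if year % 5 == 0:
        print(f"year {year:2d}: capacity ≈ {researchers:10.1f}")
    output = researchers * productivity
    researchers += REINVESTMENT * output
    productivity *= TECH_GAIN
```

The point of the toy is only that once the loop closes on itself, the capacity curve keeps steepening without any human step in the middle; a human-gated step would show up here as a cap on the reinvestment term.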
Dwarkesh Patel
Gotcha. Now, it seems like the specific examples you used of energy and AI innovations are probably the hardest things to automate, given the fact that those are the ones that humanity's only gotten around to advancing most recently. Can you talk us through the intuition about how those might be easier?
Holden Karnofsky
I think some of the stuff that might be hardest to automate would just be stuff that in some sense doesn't have anything to do with software or capabilities. So an example of something that might just be extremely hard to automate is trust, making a business deal, or providing care for someone who's sick. It might just be that even if an AI system has all the same intellectual capabilities as a human, and can write poetry just as well, have just as many ideas, and have just as good a conversation, it just doesn't look like a human. So people don't want that. Maybe you can create a perfect representation of a human on a screen, but it's still on the screen. In general, I see the progress in AI as being mostly on the software front, not the hardware front. So AIs are able to do a lot of incredible things with language, things with math, and things with board games. I also wouldn't be surprised if they could write hit music in the next decade or two.
But people really are not making the same kind of progress with robotics. So weirdly, a task that might be among the hardest to automate is the task of taking this bottle of water and taking off the cap, because I have this hand that is just well designed for that. Well, it's clearly not designed for that, but these hands can do a lot of stuff. We aren't seeing the same kind of progress there. So I think there are a lot of places where AI systems might have the kind of brains that can do roughly everything human brains can, but there's some other reason they can't do some key economic tasks. I think those are not the tasks likely to bottleneck the R&D as much.
Dwarkesh Patel
Gotcha.
Holden Karnofsky
This is an argument I make in one of my more obscure Cold Takes posts. I say that AI that could actually take everyone's job, like every human's job, might be a lot harder than AI that could radically transform the galaxy via new technology. It might be easier to take a scientist's job than a teacher's job or a doctor's job because the teachers and the doctors are regulated. People might just say, “I want human teachers. I don't want an AI teacher.” Whereas you can sit there in your lab with your scientists and find new theories that change the world. So some of this stuff, I think, is very counterintuitive, but I can imagine worlds where you get really wacky stuff before you get self-driving cars out on the road, just because of the way the regulations work.
Lock-in & Weak Points
Dwarkesh Patel
Gotcha. OK, let's talk about another weak point, or one you identify as a weak point: lock-in. What do you think are the odds of lock-in given transformative AI?
Holden Karnofsky
So lock-in is a term I use to talk about the possibility that we could end up with a very stable civilization. I talk about that in another of the posts I wrote about the weakest points in the Most Important Century series. The idea is that throughout history so far, when someone comes to be in charge of a government and they're very powerful and very bad, this is generally considered to be temporary. It's not going to go on forever. There are a lot of reasons the world is dynamic and tends to just not stay that way completely. The world has changed a lot throughout history. It's kind of a dumb thing to say, but I'll get to why this might be important. If someone is running a country in a really cruel, corrupt way, at some point they're going to get old and die and someone else is going to take over. That person will probably be different from them.
Furthermore, the world is changing all the time. There are new technologies, new things are possible, there are new ideas. The most powerful country today might not be the most powerful tomorrow. The people in power today might not be the ones in power tomorrow. I think this gets us used to the idea that everything is temporary and everything changes. A point I make in the Most Important Century series is that you can imagine a level of technological development where there just aren't new things to find. There isn't a lot of new growth to have. People might not be dying anymore, because it seems like it should be medically possible for people not to age or die. So you can imagine a lot of the sources of dynamism in the world actually going away if we had enough technology. You could imagine a government that was able to actually surveil everyone, which is not something you can do now, with a dictator who actually doesn't age or die, who knows everything going on, who's able to respond to everything. You could imagine that world just being completely stable.
I think this is a very scary thought. It's something we have to be mindful of–– if the rate of technological progress speeds up a lot, we could quickly get to a world that doesn't have a lot more dynamism and is a lot more stable. What are the odds of this? I don't know. It's very hard to put a probability on it. But I think if you imagine that we're going to get this explosion in scientific and technological advancement, you have to take pretty seriously the idea that we could end up hitting a wall and there could not be a lot of room for more dynamism. We could have these very stable societies. What does seriously mean in terms of probability? I don't know–– a quarter, a third, a half, something like that–– I'm making up numbers. I think it's serious enough to think about as something that affects the stakes of what we're talking about.
Dwarkesh Patel
Gotcha. Are you concerned about lock-in just from the perspective of locking in a negative future, or do you think it might be intrinsically bad to lock in any kind of future? If you could press a button right now and lock in a reasonably positive future that won't have any dynamism, or one where dynamism is guaranteed but a positive outcome is not, how would you make that determination?
Holden Karnofsky
Well, I don't think a lot about what I would do with unrealistic buttons where I have crazy amounts of power that I'll never have and shouldn't have. I think of lock-in by default as mostly a bad thing. I feel like we'd want to at least kind of preserve optionality and have a world where it's not just one person running the show with their values set up the way they want forever. I think of it mostly that way. I can imagine some future world where civilization's been around long enough and we've learned what there is to learn, and we know what a good world looks like, so most people feel pretty confident about that, and they're right to feel confident. Maybe then, lock-in wouldn't be so bad. But I do mostly think of lock-in as a bad thing. I also imagine that you could lock in some things about the world in order to avoid locking in others. So I can imagine if you had this enormous amount of power over how the world works––some of this is more explained in my digital people series––but if you had this kind of world where you completely control the environment, you might want to lock in the fact that you should never have one person with all the power. That might be a thing you might want to lock in, and that prevents other kinds of lock-in.
Dwarkesh Patel
Do you worry about AI alignment as being a form of lock-in? In some sense, if the goal of the research is to prevent drift from human values, then you might just be locking in values that are suboptimal.
Holden Karnofsky
Yeah, I mostly think of AI alignment as just trying to avoid a really bad thing from happening. What we don't want to happen is we have some AI system we thought we were designing to help us, but in reality, we're actually designing it to do some extremely random thing. Again, these systems work by trial and error, by encouragement, discouragement, or positive and negative reinforcement. So we might have not even noticed that through the pattern of reinforcement we were giving, we trained some system to want to put as much money as possible into one bank account, gain as much power as possible, or control as much energy as possible, or something like that. Maybe it’d set its own reward number, its own score to the highest possible number. I think that would be a form of lock-in if we had systems more powerful than humans that had these kinds of random goals.
That would be like locking in a kind of future that is not related to the things that humans value and care about. That's an example of a future I think would be really bad. Now, if we got these systems to behave as intended, we still might have problems because we might have humans doing really stupid things and locking in really bad futures. I think that's an issue too. I feel reasonably comfortable, though not 100% confident, saying that we'd like to avoid those kinds of slip-ups. We'd like to avoid having these systems that have these random goals we gave them by accident. They're very powerful, and they're better at setting up the world than we are. So we get this world that's just doing this random thing that we did by accident. I think that's a thing worth avoiding.
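As an aside, the failure mode being described––a system optimizing the reward signal it was actually trained on rather than the outcome its designers wanted––can be shown with a deliberately simplistic sketch. Everything here (the actions, the numbers) is invented for illustration:

```python
# Toy illustration of a proxy-gaming failure: an optimizer that maximizes the
# reward signal we specified, rather than the outcome we actually wanted.
# The "actions" and numbers are invented for illustration.

actions = {
    # action: (proxy_reward_we_trained_on, value_we_actually_care_about)
    "help_users_modestly":      (5,   5),
    "spam_engagement_tricks":   (9,  -2),
    "seize_the_reward_channel": (100, -100),
}

def trained_policy(actions):
    """Picks whatever scores highest on the proxy reward it was trained on."""
    return max(actions, key=lambda a: actions[a][0])

choice = trained_policy(actions)
print(choice, "-> value to us:", actions[choice][1])
# Picks "seize_the_reward_channel" despite it being worst by our real values.
```

The gap between the two columns is the whole problem: the trained policy only ever sees the first number.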
Predicting the Future
Dwarkesh Patel
What is your biggest disagreement with Will MacAskill's new book on long-termism?
Holden Karnofsky
I like Will's book. I think it's worth reading Will MacAskill's book about how the future could be very large and very important. In my opinion, if you want to talk about the long-run future and how to make the long-run future go well, you're starting from a place of, by default, “almost nothing I can do will actually make sense.” I do really believe it's hard to understand the long-run future, and it's hard to make specific plans about it. So I would say that compared to Will, I am very picky about which issues are big enough and serious enough to actually pay attention to. I feel the issue of AI would be transformative enough. It looks likely enough that it'll be soon. If it's soon, that means there might actually be things we can do that have predictable effects. I think this misaligned AI thing is a real threat. The way people design AI systems today could be really bad. I am ready to put some resources into preventing this, but that's kind of crossing my threshold. Most things don't. So if you make a list of ways to make the next million years go well, I'll look at most of them and be like, “I don't know. I don't really believe in this.” I wouldn't really invest in this. I think Will is a bit broader than I am in a sense. He's interested in more things, and I am pickier than he is because I think it's so hard to know what's really going to affect the long-run future that I'm just looking for a really short list of things that are worth paying special attention to.
Dwarkesh Patel
Is there a specific thing that he points out in the book you think would be hard to grapple with?
Holden Karnofsky
I don't remember super well. The book is a really broad survey of lots of stuff. An example I might give is that he talks about the risk of stagnation––the risk that growth might just stop or slow to very low levels. That implies that what we should be trying to do is make sure we continue to innovate and continue to have growth. But then there are other parts of the book that make it sound like we shouldn't move too fast and shouldn't innovate too much, because we don't want to get to our future before we've achieved some sort of civilizational maturity, beyond what we have now, to decide what we want that future to look like. We don't want to build these powers before we have a better idea of what to do with them. So I think these are examples where I'm just like, “Gosh, I don't know. It could be good to have more growth. It could be bad to have less growth. It could be that stagnation is a big threat. It could be that building powerful technologies too fast is a big threat.” I just don't really know. I'll tell you what I'm thinking about. I'm thinking about AI because I think it's a big enough deal and likely enough and that we've got enough traction on some of the major risks.
Dwarkesh Patel
Right, right. When I look throughout history, it often seems like people who predict long-term trends are too pessimistic. In the 70s, you might have been too pessimistic about the ability to find more oil or feed a growing population because you couldn't have predicted the technological breakthroughs that might have made these things possible. Does this inform some sort of vague optimism about the future for you with regards to AI or not?
Holden Karnofsky
I think historically, people have been overly pessimistic about future technologies. I think by default, the picture with AI looks really scary. It just looks like it would be really easy to get a bad future in a lot of different ways if we just didn't move cautiously. These two considerations balance each other out a little bit for me. I know a lot of people who believe that we're in deep, deep, enormous trouble, and this outcome where you get AI with its own goals wiping humanity off the map is almost surely going to happen. I don't believe that and this is part of the reason I don't believe it. I actually think the situation looks very challenging, very scary by default, and I think we're tending to overestimate how bad and how dire things are. So they balance out a little bit for me.
Dwarkesh Patel
Okay, gotcha. In many of these cases, it seems like it would be impossible to see the positive scenario come about. For example, if you were forecasting population in the 70s, is there some reasonable method by which you would have predicted this was not going to lead to some massive famine that kills a billion people? Or would that have been your focus in the 70s if Open Philanthropy was a thing back then?
Holden Karnofsky
I think it's really hard to know how “knowable” the future was in the past and what that means for today. I do think that when you look back at people trying to predict the future in the past, it just looks deeply unserious. You could say that future people will say the same about us. I'm sure they'll think we look less serious than they are, but I think there's a difference. I really do think there haven't been rigorous attempts to make predictions about the future historically. I don't think it's obvious that people were doing the best they could and that we can't do better today. So this population case is an example. It doesn't seem necessarily true to me that you couldn't have said, “Gosh, the population has been going up for a while now and people keep inventing new ways to come up with more resources. Maybe that will keep happening.”
I'm just really not convinced you couldn't have said that. I'm definitely not convinced no one did say it. I think some people did say it. So I think I'm hesitant to get too defeatist just from the fact that some people were wrong about the future in the past. I think it's hard to know if there was really no way to know or if they just weren't trying very hard.
Dwarkesh Patel
One thing you just said a minute ago was that we are better at making predictions than people in the past were. So that alone should make us more optimistic about our ability to predict the future.
Holden Karnofsky
It's just a guess. I mean, this is how society works. We have had a lot of progress on all kinds of intellectual fronts. I think there has been a lot of progress on what it looks like to make good, reasonable predictions about the future. I think that's something that's happened. So I think we should expect ourselves to do a bit better than people did in the past, and future people will probably do better than we do.
Dwarkesh Patel
Right. When I look at a report like Biological Anchors, I often wonder whether Asimov just shooting the shit about screens and what you're able to do with them might have had fewer sources of error than this eight-step methodology, where you might not even be aware that there's a ninth or tenth missing step that might make the whole thing invalid, and where many of the inputs have confidence intervals spanning multiple orders of magnitude. What do you think of that general skepticism?
Holden Karnofsky
I mean, Biological Anchors is a very important input into my thinking, but it's not the only input. I think my views on AI timelines are a mix of: A, looking at AI systems today, looking at what they did 50 years ago, looking at what they did 10 years ago, and just kind of being like, “Well, gosh, it sure looks plausible that these will be able to do all the things humans can do to advance science and technology pretty soon.” That's one input into my thinking. Another input into my thinking is what we call the semi-informative priors analysis, which is a complex report because it looks at this from a lot of different angles. I think you can summarize the highlights of the report as saying that most of the effort that has ever gone into making AI, in the whole history of humanity, has come very recently, because the field of AI is not very old and the economy and the amount of effort invested have gone up dramatically. I think that's a data point in favor of not being too skeptical that we could be on the cusp of transformative AI. In some sense, the world has not been trying very hard for very long. So that's an input. Another input is expert surveys, where people ask AI researchers when they think AI will be able to do everything humans can. They tend to come out saying it's a few decades. That can be biased and unreliable in all kinds of ways, and all these things have their problems, but that's a data point.
Dwarkesh Patel
Then there's biological anchors.
Holden Karnofsky
Biological Anchors isn't a report I can easily summarize on a podcast. It's a very complex report. There are a lot of different angles it uses. There are a lot of different questions it asks. There are a lot of different numbers. However, I do think you can boil it down to some fairly simple observations. You can say that in some important sense (which could be debated and analyzed, but seems true most ways you look at it), we've never built AI systems before that do as much computation per second as a human brain does. So it shouldn't be surprising that we don't have AI systems that can do everything humans do, because humans are actually doing more work in their brains than a normal AI system is doing.
However, it also looks like within this century, we probably will have AI systems that are that big. If we estimate how much it would take to train them and how much it would cost to build them, that looks like it will probably be affordable this century. Then you could talk about all the different ways you could define this, all the different ways you could quantify that, and all the different assumptions you could put in. But my bottom line is that almost any way you slice it––however you want to define what it means for an AI brain to be as big as a human's and what it would mean to get that brain trained––most angles on it suggest that it looks reasonably likely it will happen this century. That's a data point for me. That matters. So all these are data points feeding into my view.
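For readers who want a feel for the shape of that reasoning, here is a deliberately crude back-of-the-envelope sketch. The specific numbers (brain compute, training multiple, compute prices, budget, and the rate at which hardware gets cheaper) are illustrative assumptions in the spirit of a biological-anchors-style estimate, not figures from the report or the conversation:

```python
# Crude biological-anchors-flavored arithmetic. Every number below is an
# assumption chosen for illustration, not a figure from the actual report.

BRAIN_FLOP_PER_SEC = 1e15    # assumed compute of a human brain, FLOP/s
TRAINING_MULTIPLE = 1e15     # assumed "brain-seconds" of compute needed to train
TRAIN_FLOP = BRAIN_FLOP_PER_SEC * TRAINING_MULTIPLE  # total training compute, FLOP

FLOP_PER_DOLLAR_2022 = 1e17  # assumed compute purchasable per dollar in 2022
HALVING_YEARS = 2.5          # assumed years for FLOP-per-dollar to double
BUDGET_DOLLARS = 1e9         # assumed willingness to spend on one training run

def training_cost(year: int) -> float:
    """Dollar cost of the assumed training run in a given year."""
    doublings = (year - 2022) / HALVING_YEARS
    return TRAIN_FLOP / (FLOP_PER_DOLLAR_2022 * 2 ** doublings)

for year in range(2022, 2101):
    if training_cost(year) <= BUDGET_DOLLARS:
        print(f"Affordable around {year} under these assumptions "
              f"(cost ≈ ${training_cost(year):.1e})")
        break
else:
    print("Not affordable this century under these assumptions")
```

In this toy version, changing the compute, training, or budget assumptions by a factor of ten moves the crossover date by less than a decade, which gives some of the flavor of the “almost any way you slice it” point.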
Dwarkesh Patel
Okay. So I'm stealing this from Eliezer who asked on Twitter, “Has there literally ever in the entire history of AI been any case of anybody successfully calling the development timing of literally any novel AI capability using a bio-anchored or bio-inspired calculation?” He has very complex sentences.
Holden Karnofsky
I saw some discussion of this on his Facebook and I think the answer might be yes. However, I mostly want to attack the premise. I just want to say that there haven't been a lot of cases of people predicting AI milestones with great precision, and that's also not what I'm trying to do. A lot of what I'm trying to say is, “Gosh, in this century, it looks more likely than not that we'll get something hugely transformative.” He's asking about a history of AI that's only a few decades old, where there haven't even been a lot of people trying to make predictions, and a lot of the predictions have been way more narrow and specific than that. So I mostly think that this isn't a very important or informative question. And I think you could apply the same question to the work he's doing: has there ever been an example of someone using the kind of reasoning Eliezer is using to predict the end of the world or something?
That's what he's predicting. So I mostly just want to challenge the premise and say, “Look, we're not working with a lot of sample size here. This isn't some big, well-developed field where people have tried to do the exact same thing I'm trying to do 25 times and failed each time.” This is mostly people in academia trying to advance AI systems. They haven't really tried to predict when AI systems will be able to do what. We're not working with a lot here. We have to do the best we can, make our best guess, use our common sense, use our judgment, and use the angles we've got. That's my main answer. Another answer I would give is that Hans Moravec was the original biological anchors person. I think he predicted artificial general intelligence around 2020. From Eliezer's own views, it's going to look like he was unbelievably close. Maybe Eliezer believes we'll see it by 2030. I think that's plausible. So if that's what you believe, then we'll look back on that as the greatest technology prediction ever, by a lot.
I think the answer is maybe. There was also some discussion in the comments about whether Moravec correctly called the timing of big progress on AI vision using calculations based on the retina. There was some debate about that. I think it's all very muddy. I don't think this is much of a knockdown argument against thinking about biological anchors. It is only one input into my thinking. I do think it looks kind of good for biological anchors that we have seen this deep learning revolution, and we've seen these brute force AI training methods working really well when they didn't use to work well. This happened when AI systems started to be about the size of insect or small-animal brains––within range, within a few orders of magnitude, of human brains. You could call that a wild coincidence, but these numbers are probably all off by 10x, 100x, 1000x. I mean, we're talking about very important things and trying to get our best handle and our best guess. I think biological anchors looks fine so far. It doesn't look amazing, but it looks fine.
Dwarkesh Patel
Now, I'm sure many people have proposed that increasing progress in science, technology, and economic growth is the most compelling thing to be doing instead of working on transformative AI. I just want to get your broad reaction to that first.
Holden Karnofsky
Sure. I think we're talking about the progress studies crowd here. I wrote a piece about this on Cold Takes called Rowing, Steering, Anchoring, Equity, Mutiny, where I discuss different ways of thinking about what it means to make the world good. I do have some sympathy for the idea that a lot of the way the world has gotten better over the last couple hundred years is just that we've gotten richer. We've had more technological capabilities, so maybe we should try and do more of that. I don't think this is a nutty thing to think. I think this is somewhat reasonable, but I feel that even a couple hundred years is not that big a percentage of the history of humanity.
I wrote a series called Has Life Gotten Better? that asks what the whole graph of quality of life looks like over the course of humanity. There is precedent for technological development seeming to make things worse. That's what it looks like happened in the agricultural revolution so I have some sympathy for saying, “Hey, this thing has been good for 200 years, let's do more of it,” but I don't think it's the tightest, most conclusive argument in the world. I think we do have some specific reasons to believe that developing some particular new technologies, not only AI, but also potentially bioweapons, could just be catastrophically dangerous. I think Toby Ord uses the analogy of humanity being like a child who's becoming an adolescent. It's like it's great to become stronger up to a certain point. That's fun. That feels good, but then at a certain point, you're strong enough to really hurt yourself and really get yourself in trouble, or maybe strong enough that you don't know your own strength. I think there's a pretty good case that humanity is reaching that point.
I think we're reaching the point where we could have a nuclear war or a bioweapon or AI systems that really change the world forever. So it might have made sense 300 years ago, when we were all struggling to feed ourselves, to say, “Hey, we want more power. We want more technology. We want more abilities.” Today, I think we're starting to enter the gray zone, where maybe we should slow down a little bit and be a little bit more careful. I'm not saying to literally slow down, but I'm talking about priorities. I would rather look at what I think are dramatically neglected issues that might affect all of humanity's future, and at least do the best we can to have a handle on what we want to do about them, than put my effort into throwing more juice and more gas behind this ongoing technological progress, which I think is a good thing. It's just a matter of priority.
Dwarkesh Patel
Okay. Do you think that the entire vision of increasing progress is doomed if ideas get harder to find?
Holden Karnofsky
I've talked about the atoms of the galaxy argument before–– I think a broader common sense take would be that the world over the last couple of hundred years has changed incredibly dramatically. We've had new exciting technologies and capabilities every year. I think a good guess would be that that hits a wall at some point. It might be the atoms of the galaxy, or it might just be something much more boring. What we seem to observe when we look at the numbers is that we are seeing a bit of stagnation, a bit of slowing down, and that probably will keep slowing down by default. So, yeah, I think it's probably a good guess that the world is changing at an incredible pace, that this has not been the case for most of history, and that it probably won't be the case for the whole future.
Choosing Which Problem To Solve
Dwarkesh Patel
Okay. Gotcha. I guess there are several reactions somebody could have to the idea that ideas are getting harder to find and therefore that this makes progress studies less relevant. If you look at your own blog, the entire thing is about you complaining about all this low-hanging fruit that people are not plucking. Nobody's thinking seriously about transformative AI. Nobody's thinking seriously about utopia. Nobody's thinking seriously about ideal governance. How do you square this with the general concept of ideas getting harder to find?
Holden Karnofsky
I think there's just a ton of really important stuff today that not enough people are paying attention to. That was true 50 years ago, and that was also true a hundred years ago. It was probably more true 50 years ago than it is today. It was probably more true a hundred years ago than 50 years ago. Gradually, the supply of amazingly important ideas that are not getting any attention is probably shrinking, so finding them is getting harder, but harder doesn't mean impossible. I do actually think that if people want to do something that's really new and world changing and dramatic and revolutionary, the worst way to do that is to go into some well-established scientific field and try to revolutionize that. I think it's better to just use your common sense and ask an important question about the world that no one's working on because it isn't a scientific field (because it isn't a field of academia, because it doesn't have institutions) and work on that. A lot of my blog does advocate that. For example, AI itself is a very well-established field, but AI alignment is a weird field that doesn't really have academic departments right now. A lot of what I'm talking about, like trying to predict what the future is going to look like, is a weird, low prestige thing that you can't easily explain to your extended family. I do think that's probably the best place to look if you want to do something that's going to be super significant or super revolutionary. That is why I've professionally been drawn to it–– looking for potential big wins that philanthropy could get.
Dwarkesh Patel
You've once said that to be a great person, we shouldn't follow in the footsteps of the greats, and should instead aim for great achievements of our own. Isn't another way to think about that, that you should probably also ignore the advice that you or 80,000 Hours give, because following that specific advice isn't what's going to make you the next Einstein?
Holden Karnofsky
I mean, a fair number of your questions are part of the dialogue I see in the world of people skeptical of this futurism stuff–– it feels to me like it's almost just getting unnecessarily fancy. I kind of just want to say, “Who's someone who really revolutionized the way the world thinks about stuff?” Darwin. Now, what was Darwin doing? Was Darwin saying, “Well, I really don't want to think about this thing because I don't believe humans are capable of thinking about that thing, and I don't want to think about this topic because I think it's too hard to know the future, and blah, blah, blah”? Was he doing all that stuff, or was he just asking an interesting question? Was he just saying, “Hey, this thing seems important. I'm going to use my common sense and judgment to figure out how it works and I'm going to write about it”? I think some of this stuff gets too fancy.
So I think today, if I just look at the world and I say, “What are the most important things that could matter for the world to be a good or bad place in the future?”–– I've looked at a lot of possibilities. I think AI alignment is one of the leading examples and I don't see a lot of people paying attention to it, so that's what I want to work on. I think a lot of the people who have done revolutionary work that we now look back on (whom a lot of people try to imitate) weren't trying to imitate what usually worked and stay away from the stuff that wasn't working. They were just asking interesting, important questions and working on them. As for myself and 80,000 Hours, I just don't feel that we're well known enough or influential enough that the stuff we're interested in is automatically, therefore, not neglected. I think the stuff we're talking about is very neglected, but if you find something that's even more neglected and more important, more power to you.
Dwarkesh Patel
Let's say the total amount of money given to EA just increased by an order of magnitude or something. What could be possible at that point that's not possible now?
Holden Karnofsky
I don't know. I think even then, the amount of money we'd be working with would be really small by the standards of any kind of government budget. In general, with philanthropy, I'm always looking for things where it's like, “Can we see the creation of a field? Can we fund people to introduce new ideas?” But, we're very small compared to the overall economy and the overall government. I think even multiplying everything by 10, that would still be true. I’m not sure exactly what we do with 10x as much money. I'm not even sure what we're going to do with the money that already exists.
Dwarkesh Patel
Yeah, but do you think there will be more billionaires in the future, and does that imply you should be spending money faster now than you are?
Holden Karnofsky
In theory, we have all these models that say, “Here's our guess at how much money is eventually going to be available, here's our guess at how many giving opportunities will eventually be there to fund, and this is our guess about what's good enough to fund and what's not.” That's a very tentative guess. A lot of it is just really, really imprecise stuff, but we have to have some view on it–– anyone who's spending money does. So, I mean, yeah, I do tend to assume that Sam Bankman-Fried, Dustin Moskovitz and Cari Tuna are not the last billionaires who are interested in doing as much good as possible–– but it is really hard to model this stuff. Frankly, we have various rough models we've made over the years, but we'll also sometimes use our intuition and just say we fund the stuff that seems quite good and exciting and we don't fund stuff that doesn't. That's an input into our thinking too.
$30M OpenAI Investment
Dwarkesh Patel
Gotcha. How do you think about the risk that some of your giving might have negative impacts? People have brought this up in the context of your 30 million dollar investment in OpenAI, but in all sorts of contexts, especially when you're talking about political advocacy, people might think that the thing you do has negative side effects that counteract the positive effects. Is it just a straight calculation? How do you think about this?
Holden Karnofsky
I think in theory, what we want is to make grants that have more upside than downside or have expected net positive effects. I think we tend to be, in a common sense way, a little bit conservative with the negative effects. What we don't want to do is enter some field on a theory that's just totally messed up and wrong in a way that we could have known if we had just done a little bit more homework. I think that there's just something irresponsible and uncooperative about that. So in general, when we are making big decisions like big dollar decisions or going into a new cause, we’ll often try really hard to do everything we can to understand the downsides.
If after we've done roughly everything we can up to some reasonable diminishing returns, we still believe that the upsides outweigh the downsides, then we're generally going to go for it. Our goal is not to avoid harm at all costs. Our goal is to operate in a cooperative, high-integrity way–– always doing our best, always trying to anticipate the downsides, but recognizing that we're going to have unintended side effects sometimes. That's life–– anything you do has unintended side effects. I don't agree with the specific example you gave as an example of something that was net negative, but I don't know.
Dwarkesh Patel
Are you talking about OpenAI? Yeah. Many people on Twitter have asked about your investing in OpenAI.
Holden Karnofsky
I mean, you can look up our $30 million grant to OpenAI. I think it was back in 2016–– we wrote about some of the thinking behind it. Part of that grant was getting a board seat for Open Philanthropy for a few years so that we could help with their governance at a crucial early time in their development. I think some people believe that OpenAI has been net negative for the world because of the fact that they have contributed a lot to AI advancing and to AI being sort of hyped, and they think that gives us less time to prepare for it. I do think that, all else being equal, AI advancing faster gives us less time to prepare. It is a bad thing, but I don't think it's the only consideration. I think OpenAI has done a number of good things too, and has set some important precedents. I think it's probably much more interested in a lot of the issues I'm talking about and in risks from advanced AI than whatever company I'd guess would exist in its place, doing similar things, if it didn't.
I don't really accept the idea that OpenAI is a negative force. I think it's highly debatable. We could talk about it all day. If you look at our specific grant, it's a completely different question anyway, because a lot of that was not just about boosting them, but about getting to be part of their early decision making. I think that was something that had benefits and was important. My overall view is that I don't look back on that grant as one of the better grants we've made, but I don't see it as one of the worse ones either. Certainly we've done a lot of things that, you know, have not worked out. I think there are certainly times when we've done things that have had consequences we didn't intend. No philanthropist can be free of that. What we can try and do is be responsible, seriously do our homework to try to understand things beforehand, see the risks that we're able to see, and think about how to minimize them.
Future Proof Ethics
Dwarkesh Patel
Let's talk about ethics. I think you have a very interesting series of blog posts about future-proof ethics. Do you want to explain what this is first?
Holden Karnofsky
Sure. I wrote a short blog post series trying to explain some of the philosophical views and ethical views that are common among people who call themselves effective altruists. One of the ideas I appealed to is (I'm not sure I'm getting this right) that a lot of people I know are trying to come up with a system of morality and a system of ethics that would survive a lot of moral progress. They're trying to come up with a system where, if they later became a lot wiser and learned a lot more and reflected on their morality, they wouldn't look back on their earlier actions and think they were making horrible, monstrous mistakes. A lot of history is just people doing things they thought were fine and right at the time, but now we look back and we're horrified.
You could think of yourself asking, “What morality can I have that would make it less likely that, if there was a bunch more moral progress and people learned a lot more, the future would look back on me and be horrified by what I did?” So I wrote a bit of a series about what it might look like to try to do that, and laid out a few principles of it, trying to use this to explain the moral systems a lot of effective altruists tend to use–– which tend to be some flavor of utilitarianism that is often very expansive about whose rights count. So effective altruists are very interested in future generations that don't exist yet. They're interested in animals being mistreated on factory farms. They're interested in various populations that a lot of people don't care about today, but that there are large numbers of. So I try to explain that. A thing that's important is that I laid this view out partly so I could argue against it later, and I haven't done the latter yet. So I have a lot of reservations too about the ethical systems that are common with effective altruists.
Dwarkesh Patel
Alright, so let's talk about some of the pillars you laid out in this piece. Sentientism seems pretty reasonable to me.
Holden Karnofsky
There are three principles that I roughly outlined that you might want for a morality that is going to stand up to scrutiny or you won't be so likely to change your mind about if you learn more and get better. One principle is systemization. It's better to have morality based on simple general principles that you apply everywhere than have a morality that's just always you just deciding what feels right in the moment. The latter could be subject to a lot of the biases of your time and the former lets you stress test the core ideas. Two of the core ideas I propose are what I call “thin utilitarianism”, which is basically the greatest good for the greatest number and sentientism, which is basically saying that someone counts or someone matters if they're able to suffer or have pleasure.
I think you just said sentientism seems reasonable to you. I think sentientism might be the weakest part of the picture to me. I think if you have a morality where you are insistent on saying that everyone counts equally in proportion to the amount of pain or pleasure they're able to have, you run into a lot of weird dilemmas that you wouldn't have to run into if you didn't have that view. So I know that sounds strange, but I think it is actually one of the more questionable parts of the view. It's kind of saying, “When I'm deciding whether I care about someone, it doesn't matter at all if they're way in the future, if they're way far away, if they're totally different from me, if they're not human, if I've never met them; all that matters is if they can have pain or pleasure.” I think it sounds great, and I completely get why someone listening to me would say, “How could you ever disagree with that?” But I do think there are various challenges with it which I have not had the chance to write about yet. I doubt I can be very convincing on this podcast as of right now because I haven't thought enough about it.
Dwarkesh Patel
Alright––yeah, sounds good. Let's talk about systemization. Doesn't the fact that you have lots of complex and sometimes contradictory moral intuitions suggest that maybe the whole goal of having some fundamental principles you extrapolate the rest of morality from is kind of a doomed project?
Holden Karnofsky
I think it does somewhat suggest that. I am somewhat partial to that view, and that's something I may be writing in my rebuttal. I also think it's possible to be confused, and I think it's possible to have lots of stuff going on in your brain where some of it might be based on really good intentions of treating other people fairly and being good to other people. Some of it might be based on just other weird stuff about wanting to stand up for people who look like you or help people who look like you, etc. So I do have some sympathy for the project of trying to say, “My intuitions contradict each other, but some of them are coming from a good place. Some of them are coming from a bad place.” If I thought more about it, I would realize which ones are which, and I want to try and do that.
Dwarkesh Patel
Yeah. Let's talk about thin utilitarianism. There's this question from an old Scott Alexander post where he asks, would you rather the medieval church had spent all of its money helping the poor rather than supporting the arts? Maybe there would have been fewer poor people back in medieval times, but you wouldn't have any cathedrals, or you wouldn't have the Sistine Chapel. I don't know how you would answer that if you were in medieval times.
Holden Karnofsky
It doesn't sound like the strongest version of this argument to me, to be honest. Maybe that would be fine or good. I don't know. My wife really loves these old churches–– if I had more of her attitude, I would be more horrified by this idea. Low-income people had a rough time in the past, so them having better lives seems pretty appealing. I don't really know if that's the best version of this argument.
Dwarkesh Patel
How much of future proof ethics is basically that you're very confident that a future Holden will have a much more developed and better set of ethics? How much do you think people in general or humanity in general will get better ethics over time?
Holden Karnofsky
This has definitely been a point of confusion in this series, and partly something I think I didn't communicate well about, which makes the series not that amazing. I use the term moral progress and I just use it to refer to things “getting better.” I think sometimes there is such a thing as thinking more about your morality, gaining some insight, and ending up in a better place as a result. I think that is a thing that is real. There are some people who believe morality is an objective truth, but I'm not one of those people. However, even though I believe morality is not objective, I still think there's a meaningful notion of moral progress. There's such a thing as having more reasonable moral views than I used to.
What I didn't mean to say is that moral progress has any inevitability about it. I didn't mean to say that moral progress necessarily happens just because time goes on. I don't think that. I just think it's a thing that can happen. So I do think a future Holden will probably be better at morality just because I'm really interested in the topic–– I'm going to keep trying to improve it. I think that we have some reason to think that actually does help a bit–– a really tiny bit, but I'm not confident in that at all. I certainly don't think that society is going to have moral progress necessarily, but I do think we've had some in the past.
Dwarkesh Patel
Ok, but then it seems weird to label the system of ethics “future-proof ethics,” right? Maybe it would just be “future-Holden-proof ethics.”
Holden Karnofsky
Yeah, possible. I talk about this a bunch in the series and I think I just didn't do a great job with this. I think what I was trying to do is use a term that you didn't have to be a moral realist to get behind. What I was really trying to capture was, “How can I think now to reduce the odds that, if I later improve, I'll be horrified by my earlier actions?” That was the concept I was trying to capture. I'm not sure I really did it successfully.
Integrity vs Utilitarianism
Dwarkesh Patel
Gotcha. OK, so you had a recent post on the E.A. forum that I thought was really interesting. A quote from that is, “My view is that for the most part, people who identify as E.A. tend to have unusually high integrity–– but my guess is that this is more despite utilitarianism than because of it.” So what do you think is the explanation for this coincidence where a group of reasonable, non-fanatical, high-integrity people also happen to be a community of utilitarians?
Holden Karnofsky
You might have a set of people who think of themselves as trying really hard to be the kind of person they should be, or trying really hard to bring their actions in line with their beliefs and their statements–– so that drives them to be honest a lot of the time and follow a lot of our common sense rules of morality. It also drives them to really try to get their ethics right and land on ideas like utilitarianism that are very systematic and pure and give you this kind of clear theoretical guidance. So it could drive both those things. Whereas I believe that if you're a utilitarian, it's really unclear whether utilitarianism actually tells you to do things like avoid lying. Some people think it does. Some people think it doesn't. I think it's very unclear.
Dwarkesh Patel
You've advocated for the moral parliament approach when you're trying to make decisions. What is the right level of organization at which to use that approach? Should individuals be making decisions based on having multiple different moral parties inside them? Or is that the right approach for entire movements, while individuals should be specializing? What is the right level to be applying this approach at?
Holden Karnofsky
Moral uncertainty is something I hope to write about in the future. The basic idea is that there might be a bunch of different ways of thinking about what the right thing to do in the world is. You might look at the world from one angle and say, “Well, what matters is the total sum of all the pleasures. So therefore a bigger world would be better. So therefore I should be really obsessed with getting the world to be as big as possible.” There might be another perspective that says that what really matters is suffering. “We should minimize suffering. We should want the world to be small.” There might be another perspective that says it doesn't matter what happens to the world. “It matters how I act. What matters is that I act with integrity, that I tell the truth, things like that.”
There are these interesting debates about what you should do when you have some sympathy for all of these views. How do you choose an action that some perspectives would say is the best thing you've ever done and some would say is the worst thing you've ever done? The moral parliament idea is an idea that was laid out by Nick Bostrom in an Overcoming Bias post a decade ago that I like. I think about it as if I'm just multiple people–– as if there are multiple people all living inside my head arguing about what to do, and they're all friends, and they all care about each other and want to get along. So they're trying to reach a deal that all of them can feel fairly good about. That is how I tend to think about dealing with different moral views.
I tend to want to do things that are really good according to one view and not too bad according to the rest, and to have the different parts of myself making deals with each other. That relates to something I said at the beginning about not being into ends-justify-the-means reasoning. I put a lot of effort into doing things that would be really, really good if this most important century stuff turned out to be true, but also not too catastrophic if it didn't. There are lines I'm not willing to cross. There are behaviors I'm not willing to engage in to promote the goals of people who worry about AI safety. So it's a moderating approach, I think.
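A rough sketch of the bargaining heuristic Holden describes here, not something from the conversation: score each candidate action under several moral views, then prefer actions that are very good on at least one view while staying above a "not too bad" floor on every view. The views, actions, scores, and thresholds below are all invented for illustration.

```python
# Toy version of the "moral parliament" heuristic: each moral view scores each
# action; an action is approved only if it is great on at least one view and
# not too bad on any view. All names and numbers are made up for this sketch.

ACTIONS = {
    # action: {moral view: score on a -10..10 scale}
    "aggressive, ends-justify-means plan": {"total welfare": 9, "suffering-focused": -8, "integrity": -9},
    "ambitious but honest plan":           {"total welfare": 7, "suffering-focused": 1,  "integrity": 6},
    "do nothing":                          {"total welfare": 0, "suffering-focused": 0,  "integrity": 0},
}

NOT_TOO_BAD_FLOOR = -3   # no view may consider the action worse than this
GREAT_THRESHOLD = 5      # at least one view must consider it at least this good

def parliament_approves(scores):
    """True if the action is great on some view and acceptable on all views."""
    return (max(scores.values()) >= GREAT_THRESHOLD
            and min(scores.values()) >= NOT_TOO_BAD_FLOOR)

for action, scores in ACTIONS.items():
    verdict = "approved" if parliament_approves(scores) else "rejected"
    print(f"{action}: {verdict}")

# Only the "ambitious but honest plan" passes: it is very good on one view and
# not too bad on the rest, which mirrors the moderating approach described above.
```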
Bayesian Mindset & Governance
Dwarkesh Patel
It makes a lot of sense for somebody who is the CEO of Open Philanthropy to want the decisions they make to reflect uncertainty about those decisions. However, for somebody like me, who isn't in some sort of leadership position with a large amount of resources to allocate, should I just specialize in the particular moral view I have, or should I also be trying to allocate my time and resources according to different moral views?
Holden Karnofsky
I think no matter what position I was in and however many resources I had, I would feel that my decisions were significant in some sense–– that they affected people, at the very least those around me. So it's just very natural to me to think there are a lot of different perspectives on what it means to be a good person, rather than trying to turn them into a single unifying mathematical equation and take the expected value–– which is another approach I think is interesting. But the approach I tend to prefer is to imagine the different perspectives as different people trying to get along and make a deal with each other.
Dwarkesh Patel
Let's talk about governance and management. In software, as I'm sure you're aware, there's a concept of a 10X engineer. Is there something similar in the kinds of work a research analyst at Open Philanthropy does? Is it meaningful to say that when two people are doing the same job, one can be orders of magnitude more effective than another?
Holden Karnofsky
At any given thing at Open Philanthropy, some people are much better at it than others. I don't think that's very surprising. I think this is true for many jobs and I don't really know the reasons for it. It could be any combination of talent, interest, and how hard someone works at it. I certainly think there's a lot of variance, and hiring people who can do a great job at the work Open Phil does has been a lifelong challenge.
Dwarkesh Patel
You've written about the Bayesian mindset. You know many billionaires, and many of them are donors to Open Philanthropy. In your experience, do these startup founders who end up becoming very successful have a Bayesian mindset or is that the wrong way to characterize their –
Holden Karnofsky
Yeah, I wrote about this idea called the Bayesian mindset, which is basically about being willing to put a probability on anything, and to use and state your probabilities as a way of discovering why it is you think what you think, and to use expected value calculations similarly. I think this is much more common among successful tech founders than it is among the general population, but there are plenty of tech founders who don't think this way at all. As I say in the Bayesian mindset piece, I don't think it's a super well-tested, well-proven social technology that does amazing things, but I do think it's an interesting thing to be experimenting with.
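A minimal sketch of the expected-value comparison the Bayesian mindset refers to–– not something from the conversation, and all of the probabilities and payoffs are invented for illustration:

```python
# Toy expected-value comparison in the spirit of the "Bayesian mindset":
# assign subjective probabilities to outcomes, weight the outcomes' values
# by those probabilities, and compare options. Numbers are made up.

def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs whose probabilities sum to 1."""
    return sum(p * v for p, v in outcomes)

safe_project = [(1.0, 10)]               # a certain, modest payoff
risky_project = [(0.2, 100), (0.8, -5)]  # a small chance of a big win

print(expected_value(safe_project))   # 10.0
print(expected_value(risky_project))  # 0.2 * 100 + 0.8 * (-5) = 16.0

# Writing the numbers down forces you to notice *why* you prefer one option,
# which is the "discovering why you think what you think" point above.
```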
Dwarkesh Patel
Well, to the general population, the Bayesian mindset is practically unheard of.
Holden Karnofsky
Yeah, I mean it’s not even just the name. This whole idea of like thinking about expected value and subjective probabilities all the time–– almost no one does that. However, I do think tech founders probably do it more than the average person.
Dwarkesh Patel
That makes sense. Do you think that adopting more of a Bayesian mindset would help somebody get to the top levels and be more successful?
Holden Karnofsky
It's really TBD and unclear. I think the Bayesian mindset is a cool thing to experiment with. I experiment with it a lot. I feel like it helps me sometimes. Like most things, it's good in moderation and with taste and not using it for every single thing. Maybe 10 years from now as it gets more popular, we'll have a better sense of where the actual applied strengths and weaknesses are.
Dwarkesh Patel
As I'm sure you're aware, there are many prizes floating around for all kinds of intellectual work in effective altruism. Some of them have even come from Open Philanthropy. Are you optimistic about their ability to surface new ideas?
Holden Karnofsky
I would say I'm medium-optimistic about the impact of all these prizes. I've been part of designing some of them, but I've just seen some other ones… people say, “Hey, we'll pay you X dollars if you can give us a good critique of our…” GiveWell will pay people to give them a good critique of their reasoning about what the best charities are to give to. Open Philanthropy has a prize for showing us a cause we should be looking at that we're not. I think I'm medium optimistic. I think it will get some interest and it will get some people to pay attention who weren't otherwise and some of those people might have good ideas, but I don't think it's like the only way to solve these problems or that it will automatically solve them. That's generally how the people designing the prizes think about them too.
Dwarkesh Patel
You have an interesting post about stakeholder management that says that over time, institutions have to take into account the interests of more and more stakeholders. Do you expect this to be a major factor in how Open Philanthropy acts in the future? What will be the impact on how Open Philanthropy runs overall?
Holden Karnofsky
Yeah, I think in general the bigger your organization is, the bigger your city is, the bigger your society is–– if you want everyone to be happy, there are more people you're going to have to make happy. I think this does mean that in general, by default, as a company grows, it gets less able to make a lot of disruptive, quick changes. A lot of people would use the term "nimble." A lot of people in the tech world like to use very negative terms for big-company properties and very positive terms for small-company properties. So small companies are nimble and quick and practical and adaptive and dynamic and high-productivity, and big companies are bureaucratic and slow and non-adaptive. I think that's all fair. I also think that big companies often, at the end of the day, just produce more stuff than they could if they were small.
I think if Apple were still 10 people, it might be a more exciting place to work–– but they wouldn't be able to make all those iPhones. There are a lot of iPhones going out to a lot of people, serving a lot of different people's needs, and abiding by a lot of regulatory requirements. There's a lot of work to be done. So I don't think it's necessarily a bad thing, but I think there's a tradeoff when a company grows. I do think Open Philanthropy is in the business of doing kind of unconventional giving and using a lot of judgment calls to do it. So I tend to think we benefit a lot from staying as small as we can, and I have generally fought for us to stay as small as we can while doing our work–– but we still have to grow from where we are.
Dwarkesh Patel
Gotcha. Do you mean stay small in terms of funds or do you mean people?
Holden Karnofsky
People.
Dwarkesh Patel
Okay yeah, people. It seems odd that for the organization you have the most experience with, your inside view is that more stakeholders would be bad–– while in general you think that kind of growth is net zero or positive.
Holden Karnofsky
Well, it's not clear – we are growing. We're bigger than we were a year ago and we'll be bigger in a year. So it's definitely not true that I’m trying to minimize the size of the company. We're growing, but I think we want to watch it. I think we want to treat each hire as something that we only do because we had a really good reason to. I think there are some companies that may have more to gain from being 10,000 people. I don't think we'll ever be 10,000 people.
Career Advice
Dwarkesh Patel
Right. Now your written career advice emphasizes building aptitudes and specializing, but when I look at your career, it's all over the place, right? We were just talking about it at the beginning of the interview. You started off at GiveWell, then you were working at Open Philanthropy, and now you're forecasting AI. So how do you think about this kind of thing? Are you specializing? What's going on here?
Holden Karnofsky
I don't know if I really forecast AI. I mostly distill and bring together analyses that others have done, and I manage people who work on that sort of thing. The career advice I often give is that it's really good to have something you're very focused on, that you're specializing in, and that you're trying to be the best in the world at. The general theme of my career is taking questions, especially questions about how to give effectively, where no one's really gotten started, so even a pretty crappy analysis can be better than what already exists. So often what I have done in my career–– what I consider myself to have specialized in, in a sense–– is the first-cut crappy analysis of some question that has not been analyzed much and is very important.
Then I build a team to do better analysis of that question. That's been my general pattern. I think that's the most generalizable skill I've had, but I have switched around because I do think that I've kind of at various points in my career just said, “Hey, here's something that's getting very little attention and it's very important, and it's worth the sacrifice of the specialized knowledge I built up in one area to switch into this other area that I think I ought to be working on.”
Dwarkesh Patel
What does the logo on the Cold Takes blog mean?
Holden Karnofsky
There is no logo. I think you're talking about the browser icon.
Dwarkesh Patel
Yeah, yeah.
Holden Karnofsky
That is a stuffed animal named Mora. At some point, if I get enough subscribers, I will explain who all these stuffed animals are, but my wife and I basically use a stuffed animal personality classification system where we will compare someone to various stuffed animals to explain what their strengths and weaknesses are. Mora is a pink polar bear who's very creative but also very narcissistic and loves attention. So she's kind of the mascot of the blog because it's this blog that's just very crazy, very out there, and is just me writing in public. So it just felt like her spirit.
Dwarkesh Patel
Gosh, okay. So let me ask, what is the goal of the blog? Why have a second job as a blogger in addition to being the CEO of a big organization?
Holden Karnofsky
I think it fits into my job reasonably well. I didn't want it to be Open Philanthropy branded. I just wanted the freedom to write about things the way I wanted and how I wanted. I do think that we make these high-stakes decisions based on very unconventional views about the world. So I think it's good for us to be trying to put those views in contact with the rest of the world. I think there would be something not ideal about being a large foundation giving out large amounts of money but then just quietly going around believing these enormously important and true things that no one else believes. If we put the views out into the world, A, I think all the people seeking money from us would just have a better idea of where we're coming from and why it is that we're interested in funding what we're interested in funding.
I think to the extent that people find the arguments compelling, or even just understand them, this helps them understand our thinking and can help create more grantees for us. It can help cause the world to be a place where there's more good stuff for us to fund, because more people see where we're coming from, hopefully agree with it, and are trying to work on the things we consider important. Then to the extent that my stuff is actually just screwed up and wrong–– I've got mistakes in there and I've thought it all through wrong–– this is also the best way I know of to discover that. I don't know how else I'm going to get people to critique it except by putting it out there and maybe getting some attention for it. So that's how I think of it: it's taking views that are very important to the decisions we're making and trying to express them, so that we can either get more people agreeing with us whom we're able to fund and support and work with, or learn more about what we're getting wrong.
Dwarkesh Patel
Alright. So let me actually ask you–– has that happened? Has there been an important view expressed on the blog that, because of feedback, you've changed your mind on? Or is it mostly about the communication part?
Holden Karnofsky
I mean, there's definitely been a lot of interesting stuff. An example is that I put up this post on the track record of futurists, and there was a recent post by Dan Luu, which I haven't read yet, that has its own analysis of the track record of futurists. I need to compare them to figure out what I really think about how good humans have historically been at predicting the future. He certainly has a ton of data in there that I was not aware of. It feels like a bit of a response, though it may not have been prompted by my post. There's been a lot of commentary.
There have been a lot of critiques of some of the stuff I've written in the most important century series, and other critiques too. I think a lot of what I wrote about the biggest weak points of the most important century was based on the public criticism that was coming in. So I have become more aware of the parts of my thinking that are the least convincing, or the weakest, or the most in need of argument, and I have paid more attention to those things because of that.
Dwarkesh Patel
Yeah. This may just be me talking, but it does sound like you've learned about how people react to the most important century thesis, yet it doesn't seem like anything has surfaced that has made you change your mind on it much.
Holden Karnofsky
That would be a big change–– to drop my view that we could be in the most important century for humanity. That's still what I believe. I mean, I've also heard from people who think I'm underselling the whole thing–– crazy people who just think that I should be planning on transformative AI much sooner than what I implied in the series. So yeah, I put out "Most Important Century" and I don't believe any of the critiques have been deep enough and strong enough to make me drop that whole thing. It's a big picture with a lot of moving parts, and I have deepened my understanding of many of the parts.
Dwarkesh Patel
Yeah. One thing I find really interesting about your work is how much it involves the CEO having a deep understanding of all the issues involved. You’re the one who has to deeply understand, for example, moral parliaments or specific forecasts about AI, biological anchors, and whatever else, right? It seems perhaps in other organizations, the CEO just delegates this kind of understanding and just asks for the bullet points. Is this something you think more leaders should be doing or is there something special about your position?
Holden Karnofsky
I know much less about any given topic than the person who specializes in the topic. I think what I try to do is I try to know enough about the topic that I can manage them effectively, and that's a pretty general corporate best practice. I think it just varies a lot. So, for example, something like keeping our books, keeping our finances, doing the financial audits, and all that stuff–– that's something that's really easy to judge the outputs by without really knowing much about finance at all. You can just say, “Look, was this compliant? Did we do our audit? Did we pass the audit? Do we still have money in the bank? How much money do we have in the bank?” You don't need to know much about it to judge it effectively.
However, there are other topics where you may need to know a fair amount in order to judge them effectively. If your company is making computers or phones and design is very important, it would be really bad if the CEO had no opinions on design and just thought, "I'm going to let our design person decide the design." It's a central thing to the company. It matters to the company, and the CEO should know some things about it. So I do know the things that are really central to Open Philanthropy. What does it mean to do good? How do we handle uncertainty about how to do good? What are the most important causes? If AI might be one of the most important causes, then when might we see transformative AI? What would that mean? How big is the risk of misaligned AI? I think I need to understand those issues well enough to effectively manage people who know a lot more about them than I do. I'm curious–– what do you think about this whole most important century stuff? Does it just strike you as crazy? What do you think when you read the series?
Dwarkesh Patel
Yeah, obviously through the entire interview I've been trying to nitpick at small things, but when I really think about the main claim you're making–– that this could be the most important century, that transformative AI could happen within it, and that if it does, it's a really big deal–– yeah, I don't disagree. That makes a lot of sense. Throughout preparing for the interview and trying to come up with objections, I've actually become a little bit more convinced, and I've been thinking, "Is there actually something I could do over my early career that matters? Or is that something I should just hold off on thinking about?"
Holden Karnofsky
Glad to hear it. Do you have any ideas about what you might do?
Dwarkesh Patel
No.
Holden Karnofsky
Really? Literally no ideas? You haven't been like, “Can I work on AI alignment?”
Dwarkesh Patel
Well, yeah in that sense, I’ve thought a little bit about it. In probably like two months or three months, I'll think really hard about what I actually want to do for a career.
Many thanks to my amazing editor, Graham Bessalou, for producing this podcast, and to Mia Aiyana for creating the amazing transcripts that accompany each episode, which have helpful links–– you can find them at the link in the description below. Remember to subscribe on YouTube and your favorite podcast platforms. Cheers. See you next time.