Edouard Mathieu on Our World in Data
In this episode, Fin and Luca talk to Edouard Mathieu.
Edouard Mathieu is the Head of Data at Our World in Data (OWID), a scientific online publication that focuses on large global problems such as poverty, disease, hunger, climate change, war, existential risks, and inequality.
- What Ed learned from working with governments and the WHO in reporting Covid data
- A simple change the WHO could make to radically improve how countries share data for the next pandemic
- The idea of ‘experimental longtermism’
- How Ed is thinking about collecting data on transformative artificial intelligence and other potential existential risks
- Figuring out the value of making everyone slightly better-informed
- Lessons for starting a career in impact-oriented data science
- And finally… Ed’s favourite OWID chart
- The Signal and the Noise: Why So Many Predictions Fail—But Some Don’t (2012) by Nate Silver
- Factfulness: Ten Reasons We’re Wrong About the World – and Why Things Are Better Than You Think (2018) by Hans Rosling
- Beyond Measure: The Hidden History of Measurement from Cubits to Quantum Constants (2022) by James Vincent
- Five Books
- See Edouard’s own interviews for Five Books
- Our World in Data
- An Ed favourite: You want to reduce the carbon footprint of your food? Focus on what you eat, not whether your food is local
- Another Ed favourite: What are the carbon opportunity costs of our food?
- Another Ed favourite: How much economic growth is necessary to reduce global poverty substantially?
- The website of Edward Tufte
- Edward Tufte’s The Visual Display of Quantitative Information
- Ideas of India podcast: Rukmini Shrinivasan on What Data Can and Cannot Tell Us
- Jess Whittlestone and Jack Clark, ‘Why and How Governments Should Monitor AI Development’
- The GapMinder website
- The Clock of the Long Now
- The Economist: Tracking covid-19 excess deaths across countries
- Data journalism
- Financial Times: How space debris threatens modern life
- Dr. Peter Brecke
Hey, you’re listening to Hear This Idea. In this episode, we spoke to Edouard Mathieu, who is the head of data at a website you might have heard of called Our World in Data, or OWID. OWID finds the best available research on global problems like poverty, climate change, war and pandemics, and then it presents that information through all kinds of interactive charts and really clear write ups. It’s just an amazing resource, and I think I speak for Luca when I say Our World in Data is one of our favourite websites in the whole world. So as you’ll hear, OWID gained an enormous new following during COVID, when they unexpectedly became one of the only outlets in the world collecting together and communicating up-to-date figures on key measures of the pandemic for the world, like reported cases, and later on vaccination uptake. Now, Ed oversaw this effort. And so we spend the first part of the podcast just hearing what a wild story that was, as well as the lessons Ed took from dealing with national and international agencies during a global crisis, and how those lessons might transfer to global catastrophes. We also talk about the idea of experimental longtermism, what Our World in Data has to do with EA, the challenge of collecting and communicating important data on AI and especially transformative AI, quantifying the value of making the world slightly more sane and well informed, whether EA orgs could borrow OWID’s open source model, Ed’s own career and advice for people who might want to work in a place like Our World in Data, useful concepts for better data visualisation, OWID’s future plans, and much more. As always, there are chapter markers in case you want to jump right to the parts of the conversation that interest you most. But without further ado, here’s the episode.
Ed, thanks for joining us.
Wonderful. Well, the first question we ask everyone is: Is there a problem that you’re stuck on right now?
Not necessarily stuck on but one thing I’ve been thinking a lot about and working on recently is the question of how to introduce people to the topic of artificial intelligence in an OWID kind of way - OWID standing for Our World in Data, obviously. So it’s something we’ve been thinking about and discussing for a long time now, for a few months. We want to start working on AI and publishing data on AI and obviously publishing articles on the topic. But it’s also something where we’re very much aware that it’s not as easy to tackle as something like poverty or climate change. It’s something that a lot of people have never heard about, or heard little about, or in a way that didn’t really let them understand the topic. And so we’re keen to think very carefully about how we want to introduce this. We can’t really start straight away with something like AI safety or AGI because that doesn’t make any sense to most people. So we have to start much earlier with just what is artificial intelligence? What do we mean by it? What is it currently doing to society, to many kinds of problems around us? And then to slowly get into more weedy stuff, like, what could possibly happen if AI becomes even more intelligent? Could it reach human capacity? What would happen then? What are the different scenarios? And what do researchers think about this? I tend to think about it as this idea of a train that would go from not knowing anything about AI to knowing everything that people interested in AI safety know. And it’s a train that has to stop at very many different stations. And we have to make sure that we remember to stop at each station because as people who learn a lot about the topic and read a lot about it, we tend to forget what it’s like to have never heard of it. And so, again, we can’t start straight away with weird concepts that don’t make any sense to most people.
Yeah, this gets called the ‘curse of knowledge’, if you’ve heard that phrase.
Yeah, the fact that you cannot possibly remember what it felt like to not know about this because now it seems like common sense to you.
Jumping the gun here a little bit, but what does presenting AI in an Our World in Data style mean? What kind of challenges come up when you’re trying to picture this in data, which is in many ways a forward looking concept, where many of the points are maybe harder to illustrate using data?
I think it’s a tricky thing to define, and that’s actually something that makes it hard for us to hire people to write because it’s a very tricky thing to get right. It’s this perfect balance of not talking to people like they’re idiots. We’re not talking to children. We’re mostly talking to an audience of people who actually know academia and research a little bit who are very interested in data, who are very knowledgeable about many topics, maybe not exactly this one, but about other things. So quite data savvy, quite research savvy in general, and very interested in learning in general. So we need to factor that in. But at the same time, we need to make sure that when we start writing an article, we don’t leave out any kind of assumptions, and that we write as clearly as possible, that all of the points are made extremely clearly. And this basically means that the writing style has to be extremely thought through, it has to be both very direct and impactful, but also very thorough in making everything extra clear. And it’s something that’s very difficult, because typically a journalism style will be not rigorous enough. It will skim over some things, it will exaggerate some points just to make it catchy. And that’s not something we want to do. We want to be more academic than that. But in many ways, the typical academic writing style will be the opposite of that, and will be way too rigorous, way too boring, way too precise, and get lost in methodology aspects. We don’t want people to lose interest when they read. So it’s a balance that’s very hard to find, and that’s very hard to reach. And it’s also, again, quite hard to find people who have the mix of backgrounds to have enough research knowledge to do that right, but also to write well for people who are not their colleagues in academia.
OWID during Covid
Yep, totally. And I look forward to chatting about this a lot more later on. I was thinking about a way to start off this interview, and COVID feels like a very obvious place to start. And I was wondering what question to ask, and I realised that I actually just don’t have a very good picture at all of what goes on behind the scenes from firsthand data collection to the chart ending up on Our World in Data. What does the pipeline look like, speaking to the most naive person about this?
It’s a complicated question, also, because it has varied over time, and it varies depending on different kinds of metrics. The thing we did initially, and that we still do is to aggregate data from different sources. So data that other people in other institutions collect. The best example is the confirmed cases and confirmed deaths that are collected by Johns Hopkins University. What we do with that is basically just take their data, we first analyse it a little bit by adding things like a seven day average, which is not something they do, they just have the very raw data. And so we add that on top, to make it more comprehensible. We calculate things like the case fatality ratio by dividing deaths by cases, things like this. And then what we do as an extra step is to make it pretty by basically packaging that into our main thing right now, which is the Data Explorer, which is this interface where people can select a bunch of countries, switch between metrics easily, and toggle something to look at per capita metrics instead of absolute numbers. And so we provide this interface so that people can more easily look at the data. So that’s the data we get from other sources. Then there’s the part of the work that actually takes us the most time, which is the actual data collection. So for a bunch of metrics, we can’t take the data from anywhere because no one is collecting it. We actually collect it ourselves. We started doing that quite early on, in April of 2020. That’s why I joined OWID in the first place. We started collecting data on testing, because it appeared pretty early on that case numbers didn’t really make much sense if you didn’t know how much testing was being done. So we started collecting data about tests to get the positive rates and things like that. And so this work means pretty much going every day to the websites of 200 different countries and collecting the number. 
And this is also something we’ve done in a much more visible way almost a year later for the vaccination. So in late 2020, in December, when vaccination started in the UK, we started collecting this data again, country by country, by pretty much every day going through the websites of all of these countries and looking at just how many vaccinations had been done the previous day. And in terms of what this actually means, it means a bunch of things. Right now today, it’s pretty much all automated. It’s very easy, because countries have either a bunch of open files, like CSV files, or they have APIs, everything is really clean. And even the countries that don’t have very clean data, the WHO is providing regular updates that the governments give them, and so we get the data from that even from poor countries. The problem is that for a very long time this wasn’t at all like that. For the first few months of both the testing and the vaccinations, the data collection meant literally at first going through places like press releases and media articles, and even Twitter and Facebook, to look at the posts and the tweets of health ministers. For example, a few days after the vaccination started, the French Minister of Health would just send out a tweet saying, ‘Oh, we’ve vaccinated 2000 people.’ And that was the only proof we ever had of that number. We didn’t have any file, any official press release. And so the link in our data was just linked to the tweets, and we were like, ‘2000’, literally by manual input. And so for the first few months, this was extremely time consuming, as you can imagine, also very hard to clarify, because some things you just don’t really understand, because maybe the Prime Minister seems to be saying something slightly different from the Health Minister, you don’t really know who’s right. And so this is extremely hard to do and extremely time consuming. Over time, these things have become better. 
And so, countries have started to publish data that’s in a much more usable format. And so little by little, we’ve been able to automate a lot of these countries. Now, the capacity we have to automate also depends a lot on just what countries make available. In an ideal case, an automation looks like a CSV file that you download every day and look at the latest number. In the worst possible cases, it’s a weird dashboard that you can’t possibly parse because there’s no table or CSV file behind it. So we literally have a script that opens a fake browser, loads the dashboard, knows exactly on the screen where the number is, and collects that number for that day. And as you can imagine, that breaks all the time because maybe one day the developer decides to put the number on the left, so now it doesn’t work anymore. And so a lot of the work we’ve been doing is every morning, launching that data collection. It crashes for a bunch of countries, and so we spend a couple of hours fixing those scripts.
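[Editor's note] The "ideal case" described in this answer (a daily CSV file, plus derived metrics such as a seven-day average, the case fatality ratio, and per capita figures) can be sketched roughly as follows. This is a minimal illustration, not OWID's actual pipeline: the column names, the sample numbers, and the function names are all hypothetical.

```python
import csv
import io

# Hypothetical sample of one country's daily file (cumulative totals).
SAMPLE_CSV = """date,total_cases,total_deaths
2020-04-01,100,2
2020-04-02,120,2
2020-04-03,150,3
2020-04-04,190,4
2020-04-05,240,5
2020-04-06,300,6
2020-04-07,370,8
2020-04-08,450,9
"""


def load_rows(text):
    """Parse the CSV text into a list of dicts with integer totals."""
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": r["date"],
            "total_cases": int(r["total_cases"]),
            "total_deaths": int(r["total_deaths"]),
        })
    return rows


def daily_new(cumulative):
    """Turn a cumulative series into daily increments."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def rolling_mean(values, window=7):
    """Trailing average over `window` values; None until enough data exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def case_fatality_ratio(total_deaths, total_cases):
    """Confirmed deaths divided by confirmed cases."""
    return total_deaths / total_cases if total_cases else None


def per_million(value, population):
    """Express an absolute number per million people."""
    return value * 1_000_000 / population


rows = load_rows(SAMPLE_CSV)
latest = rows[-1]  # "look at the latest number"
new_cases = daily_new([r["total_cases"] for r in rows])
smoothed = rolling_mean(new_cases)
cfr = case_fatality_ratio(latest["total_deaths"], latest["total_cases"])
```

The dashboard-scraping case Ed describes is deliberately left out here, since it depends on a browser automation tool and on the layout of each specific page.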
Why isn’t this standardised? It feels like it’s in literally everyone’s interest to have some norms and formats.
I think part of it is actually that there’s no incentive to do that. It would have to be standardised by an institution. I think it would have to be the role of the WHO, most likely, to have a standard format for that. Maybe there’s a hope that because of COVID people are going to be thinking about that more. But currently, there’s no particular plan to standardise that in any way. I think there’s also the counterproductive thing that, because we made the effort of collecting everything, there’s no incentive for people to standardise it because they know that whatever happened, we need the data, it needs to be available. So they know we’ll make the effort.
Well it definitely sounds like there is some incentive, right? To the degree, as you said, that Our World in Data is creating a lot of value by manually going through and standardising the stuff. It should probably be in somebody’s interest that their data will be able to be compiled with somebody else’s data.
It’s actually in our interest in a way. So what we started doing after a few weeks, and especially as countries became very aware, especially for the vaccinations, that we were the only source, and so if they started vaccinating, and they wanted the data to be on the dashboard, and in the data, they got in touch with us saying like, ‘Oh, we just started vaccinating. Can you please, please, please add the data?’, that gave us leverage to say, ‘Okay, but then can you please put it in a nice format? Can you put it in a spreadsheet at least? Can you add the name of the vaccines or some basic information that you haven’t provided?’ So we did that for a few countries. And even for some countries, we actually got into a position of being involved in government meetings where they asked us exactly how to publish the data, which was nice. But in most countries, that did not happen. And so we just kind of had to deal with the formats they gave us which sometimes was just a pain, but we could still deal with it. But sometimes it actually meant not having enough information. For example, that was not the case in the UK, but in many EU countries, some vaccines were one-dose, some were two-dose. And if you don’t publish the data in a good enough format, there’s basically no way to know which people have gotten the first dose of a one dose vaccine or the first dose of a two dose vaccine, and so whether they’ve completed the protocol already. And so because of that we spent weeks and weeks trying to parse that information, and asking governments to fix that data so we could actually understand.
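[Editor's note] The one-dose versus two-dose problem can be made concrete with a small sketch. It assumes a country publishes doses broken down by vaccine; the vaccine names, the counts, and the `VaccineDoses` structure are hypothetical and only illustrate why the breakdown matters.

```python
from dataclasses import dataclass


@dataclass
class VaccineDoses:
    name: str
    doses_in_protocol: int  # 1 for a one-dose vaccine, 2 for a two-dose vaccine
    first_doses: int
    second_doses: int = 0


def people_with_at_least_one_dose(records):
    """Everyone who received a first dose of any vaccine."""
    return sum(r.first_doses for r in records)


def people_fully_vaccinated(records):
    """People who completed the protocol of the vaccine they received."""
    total = 0
    for r in records:
        if r.doses_in_protocol == 1:
            total += r.first_doses   # a single dose completes the protocol
        else:
            total += r.second_doses  # only the second dose completes it
    return total


# Hypothetical per-vaccine figures for one country.
records = [
    VaccineDoses("one-dose vaccine A", 1, first_doses=1000),
    VaccineDoses("two-dose vaccine B", 2, first_doses=5000, second_doses=3000),
]

at_least_one = people_with_at_least_one_dose(records)  # 6000
fully = people_fully_vaccinated(records)               # 4000

# With only aggregate totals (6000 first doses, 3000 second doses), counting
# second doses alone would give 3000 and miss the 1000 one-dose recipients.
naive_fully = sum(r.second_doses for r in records)
```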
Was the opposite ever true, or did you ever get inklings of it, where governments or some kind of body or authority doesn’t want to make the data easily comparable or super transparent?
Yeah, we had a few instances of this where we kept asking questions, and we either would not get any reply or the reply would be extremely vague, enough to give us a sense that probably in some countries, actually, the national system was not precise enough to give them accurate numbers on a national level, maybe because the systems were not interconnected correctly. And either they were not counting everyone or there was some double counting happening. And so giving us disaggregated data would have revealed that and so they gave us very aggregate numbers, very ballpark numbers. Some exchanges with some countries made it pretty clear that there was something a little fishy with the data. It probably wasn’t on a huge scale. I’m not expecting that any country has only vaccinated 10%, and they’ve pretended to vaccinate 80%. It’s more like, at some point, you realise that actually some country is pretending to know exactly a number, and it’s a ballpark estimate within a few percentage points. It’s actually known that, for example, it is the case in the US - it’s been very clearly documented - that because there’s been some problems with the data in terms of people going to get the second dose, but sometimes being counted as a second first dose, it’s not clear (in the US) exactly how many people have been fully vaccinated with the original protocol versus how many people have gotten the first dose. And to this day, it’s still not really resolved.
Yeah, in general, I wonder if there is some dynamic where there definitely is an incentive to collect and report the data, but maybe there are much weaker incentives to make that data play well with other countries’ data and standardise it to the same kinds of formats, and just generally make it digestible and understandable, because by that point, it’s out of my hands. Without some kind of coordinating mechanism, it’s like, ‘Where does that incentive come from?’ Right? My job is done.
Yeah, exactly. To be fair, the only institution, outside of special circumstances like this one where we had a little bit of leverage temporarily, is the WHO, and they have actual leverage to impose formats. So when, for example, a new pandemic starts, or there’s a new infectious disease episode, they actually publish a case detection form that they send to countries, and that countries have to send back every time they get a new case of the disease. And that lists things like the age of the person, date of birth, gender, symptoms, things like that. And they can impose a common set of reporting standards. The problem is that the whole system of the WHO case detection form has been thought out for diseases with very few cases. So you get a new case, and the form is four pages for one person for one case. And so this makes sense when an outbreak starts. But when you’ve got 100,000 cases a day, which has happened in many countries, you cannot possibly fill those forms. So basically, countries completely stopped filling those forms, and what they should have access to would be some kind of standardised system for aggregated numbers. But that doesn’t exist.
Sounds like it could exist there though, right? So it’s pretty easy to deal with.
Technically speaking, yes. Technically speaking, we’re only talking about some kind of API where each government would have access with a key and they could push new numbers or revise numbers if they need to. But as far as I know, there is no way that they can actually do that. They do that indirectly, by publishing files in the open that we, for example, at Our World in Data, or Johns Hopkins University, pull once, like one time every morning, to get the data. So we create a common standard, but they can still publish the data in all kinds of formats.
And how likely do you think it is? I guess part of the argument here is that COVID was unprecedented in the WHO’s existence - I don’t know if this is true - and when you have to deal with case numbers as big as with COVID, this four page system is broken. Given that, how likely do you think it is that the WHO, or some other international body, is now going to take steps and fix this for the next pandemic?
In terms of willingness, it’s high. I’ve been involved in a few discussions with the WHO. I know of other institutions, and have seen the fact that they genuinely want to do better. I think what makes me sceptical is the fact that I’m not really seeing hints of an actual change. The main reason for that is that the recent example is monkeypox, where we’ve had the same thing of, ‘Oh, here are a few cases and more cases and more cases.’ And most of the discussions I’ve heard were discussions around that form, like, ‘What questions should we put on the form?’, and it took several weeks to get the form right. And obviously, these were important questions like ‘What should we ask countries to report?’, and all that, but again, it took several weeks to get that form right, and during those several weeks, the epidemic just kept growing. And the institution that has done the actual work of counting cases in a way that’s useful and usable, is a private institution. It’s Global.health, which is based at Oxford again, and they’ve been doing the work I think the WHO should be doing. We have a monkeypox data explorer now, and it’s based on their data. It’s not based on WHO data.
So, if I’m understanding you right, to be clear, these four page reports on the individual case have a lot of value. And to the degree that, especially early on in the pandemic, which is a really critical time to have some of this detail and resolve some of these unknowns, this is really important. But it sounds like one of the bottlenecks is: when do you move from that four page stage to a coordinated, international, ‘We’re gonna aggregate and count things’, stage?
Exactly. I think these would ideally need to be two separate systems, where you would still collect individual data on many cases, to be able to analyse at an epidemiological level things like symptoms, gender and age and many kinds of breakdowns. But at some point, you just get to a level that is physically impossible to process. You cannot possibly ask a country to report 100,000 forms per day. This just cannot work. I think probably this system has been designed for smaller outbreaks, like Ebola, or something like that, where it is literally possible to do a few hundred cases a day or a few thousand. But, at some point, it breaks down. And I’m completely willing to think that COVID was the first example of that. And I think that we need to move away from a form-based reporting system where you print out a PDF of four pages, to something that looks more like an API where a country pushes 4273 for today, and that’s the data they have.
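[Editor's note] As Ed says, no such system exists today. Purely as an illustration of the idea (each government holds a key, pushes an aggregate number for a day, and can revise it later), here is a minimal in-memory sketch. The class name, method names, key format, and metric names are all hypothetical, and a real service would need authentication, persistence, and validation well beyond this.

```python
import datetime as dt


class AggregateReportingStore:
    """Toy stand-in for a shared endpoint that accepts daily aggregate numbers."""

    def __init__(self, api_keys):
        self._keys = dict(api_keys)  # API key -> country code
        self._data = {}              # (country, date, metric) -> value
        self._revisions = []         # audit trail of (key, old value, new value)

    def push(self, api_key, date, metric, value):
        """Store or revise a number; returns the previous value, if any."""
        country = self._keys.get(api_key)
        if country is None:
            raise PermissionError("unknown API key")
        if not isinstance(value, int) or value < 0:
            raise ValueError("value must be a non-negative integer")
        day = dt.date.fromisoformat(date)  # rejects malformed dates
        key = (country, day, metric)
        previous = self._data.get(key)
        self._data[key] = value
        self._revisions.append((key, previous, value))
        return previous

    def get(self, country, date, metric):
        """Read back the current value for a country, day, and metric."""
        return self._data.get((country, dt.date.fromisoformat(date), metric))


store = AggregateReportingStore({"hypothetical-key-fra": "FRA"})
first = store.push("hypothetical-key-fra", "2022-07-01", "new_cases", 4273)
revised = store.push("hypothetical-key-fra", "2022-07-01", "new_cases", 4280)
current = store.get("FRA", "2022-07-01", "new_cases")
```

Keeping every revision rather than overwriting silently is one possible design choice here; it would let downstream users see when a country corrected a figure.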
Well, if you happen to be a tech literate, high ranking official in the WHO, you know who to call. I actually had one, probably very naive, question, which is: I’m a government. I am publishing my case numbers on some outbreak. Just very briefly, where am I getting those numbers from? Presumably hospitals are reporting to regions. Is there some sort of bottom-up aggregation?
Yeah, it’s very bottom up. It obviously depends a lot on the country. But the typical thing is hospitals or very local health systems reporting to a regional level, and then people at the regional level sending this to a national authority, and the national authority basically adding up the cases. What this looks like in practice was a lot of Excel spreadsheets, actually, early on, because, again, there’s never been really any need to do this quickly. Usually when an epidemic starts, it’s basically either the flu or nothing. It’s never happened that you get a large national outbreak of anything in a country like France or the UK, and that you need to count them up quickly. I’ve actually seen some spreadsheets, especially vaccinations early on in France, where it was literally bottom-up Excel spreadsheets being sent to a national authority, and a person literally adding them up together to get a national number, which is why it took so long at the beginning, because it’s a long process. And in every single country, there was a process of computer system engineering where they had to build a system from scratch, that they didn’t have, to sum up the numbers, especially for things like testing where they had to get the data bottom-up from pharmacies and hospitals and doctors, and get those pharmacies to connect up to the common system to actually get, at 6pm each day, the number of cases for that day, which is something that, again, if you haven’t designed that system in advance, takes months to build.
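[Editor's note] The bottom-up roll-up described above (local reports summed per region, then regions summed nationally) amounts to something like the following sketch. The region and facility names and the counts are invented for illustration, and the duplicate check is an assumption added to show one way double counting could be caught.

```python
from collections import defaultdict


def aggregate(reports):
    """Sum (region, facility, count) reports into regional and national totals.

    Raises ValueError if the same facility reports twice, since adding both
    rows would double count.
    """
    regional = defaultdict(int)
    seen = set()
    for region, facility, count in reports:
        if (region, facility) in seen:
            raise ValueError(f"duplicate report from {facility} in {region}")
        seen.add((region, facility))
        regional[region] += count
    return dict(regional), sum(regional.values())


# Hypothetical daily reports.
reports = [
    ("North", "Hospital A", 12),
    ("North", "Pharmacy B", 5),
    ("South", "Hospital C", 30),
]
regional_totals, national_total = aggregate(reports)
```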
Which institutions really stood out over COVID? Which did really well on collecting and sharing data?
I think the main ones were all university-based. The main one that started very early on was Johns Hopkins University. They started extremely early counting cases and deaths, and to this day, they are the main data source for this. And then two projects out of Oxford. The first one is the Blavatnik School of Government with the OxCGRT project, which is basically looking at policy and restrictions and things like that. And over time, since the first day of restrictions in the pandemic, they’ve been basically collecting per country per day, whether schools are closed, whether transports are open, whether or not external visitors can come into the country, and things like that, and it’s a very valuable data set. And the other institution is Our World in Data where we’ve been doing, again, testing and vaccinations and also a little bit of hospital and ICU data. Outside of that, I would say that in terms of systematic data collection, the main efforts have been done by the European CDC, which has been doing a pretty good job considering that they’re a tiny institution of a few dozen people, so they published very good data sets. Early on, we also used them for cases and deaths. But the problem is that, on many data sets they actually have a mandate to just do European countries, which is fairly limited, and just only part of what we need. And also, again, they have a small team. And they’re quite good on the actual public health side, but not necessarily the best in terms of data engineering. So the updates can be a little bit late, they have a lot of errors coming through, and so, when we used their data, we emailed them almost every day with mistakes we found in the data, because they hadn’t implemented checks to make sure that they didn’t just add a zero at the end of a number by mistake, which can happen a lot when you do manual input. ECDC, I think, did a good job considering their resources, but not quite up to the standard that we would need. 
And then there’s the WHO that, again, did their best, but I think was late to the party every time. It happened again on vaccinations where, for a good six months, they did not publish any data on vaccination. And it left us in this very weird position of being the only source in the world to let the world know how many people have been vaccinated, which is very strange, also it put a lot of responsibility and pressure on us. And they ended up doing it, they ended up launching their own dashboards, and again, publishing the data that governments were sending to them directly, but it took them months, and also making improvements took them a very long time. Because again, they don’t really have the technical expertise to do that. So for example, when booster shots started, it took them six months to actually add a booster column to the data set, because I’m guessing that it might have been implemented by some kind of subcontractor, and so they didn’t have the technical capacity to add a new column to the file, or something like that. I don’t have the full story, but at least I know that it started in June, in Israel - the booster shots - and it’s only I think, in December, that they added the booster column when hundreds of countries, like more than 100 countries had actually started boosting people. So the WHO is a recurrent story of, ‘They always end up doing stuff, and once they do them, they do them pretty well. But it’s just that, given the timeframe we’re dealing with and given that everything needs to be done fast when a pandemic hits, it doesn’t seem like they have the resources to do that well.’
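[Editor's note] The kind of check Ed mentions for manually entered data (catching an extra zero typed at the end of a number) could look roughly like this. It is a sketch under assumptions: the series is cumulative, the threshold `max_ratio` is an arbitrary illustrative value, and the function name is hypothetical. Genuine fast growth from very low counts could trip it, so flagged values would need human review rather than automatic rejection.

```python
def flag_suspicious_values(cumulative, max_ratio=5.0):
    """Return indices of entries that look like data-entry errors.

    An entry is flagged if it is lower than the last accepted value (a
    cumulative total should not fall) or more than `max_ratio` times it
    (for example, an extra zero typed at the end of the number).
    """
    flagged = []
    last_good = None
    for i, value in enumerate(cumulative):
        if last_good is not None:
            decreased = value < last_good
            jumped = last_good > 0 and value > max_ratio * last_good
            if decreased or jumped:
                flagged.append(i)
                continue  # keep comparing against the last accepted value
        last_good = value
    return flagged


# 1310 mistyped as 13100 on the third day.
suspicious = flag_suspicious_values([1200, 1250, 13100, 1370])
```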
I’m curious, given the constraints of the unprecedented nature of COVID, and also given some of the resource constraints, how did governments, especially in developing countries, or of LMIC backgrounds, do when it came to reporting data? It went through the WHO for a lot of this, but I’m curious in particular how you think that went?
Yeah, a lot of it went through the WHO, and for a lot of countries, we basically had to wait until the WHO started publishing numbers, because we just didn’t have any information outside of that. And for other countries, it was really a mixed bag of some countries published numbers on websites with those being really the ones that were really hard to collect. Most of it was manual collection because it wouldn’t be any kind of open data, open file, or even a regular table, it would just be a wall of text or an image, and within that image there would be a number. And so either you start getting into some kind of image recognition software, or, it was easier and less costly for us to actually have someone go to the website of Malawi every day, and look at the image and type the number, it was just easier. The problem is that you run into problems and in some countries, you just don’t understand the language. So you don’t understand what’s on the image. And maybe they’ve moved the number from one cell to the other. We spent a lot of time doing awful things. I often had my phone on Google Translate in the photo mode, pointed at my computer screen, to scan the text of an image to get the translation of that text to make sure I was looking at the right number. So things like that. It was also a lot of technical constraints. In a lot of developing countries, we found out that actually, the health ministry doesn’t have its official website, because often I’m guessing the reason is that technically it would be considered too unstable, and so the website would crash very often. And so what they do, which is very smart, is to use Facebook, for example, as a publishing platform, because Facebook never crashes. So the information will always be online. 
And so you end up with something like the page of the Health Minister of an African country that publishes the number of cases every day, but that means that what we do is basically go to the page every day, look for the latest post with the number of cases and copy that number by hand. Which again, is something I would never have imagined but something we had to do very often and for a few countries we still did until recently.
I don’t know if you can confirm but our friend Angela mentioned as well that you occasionally got messages from people having just received their booster shot or their vaccination, messaging in and asking if their number can be updated on the tables.
That is true, and we never really understood whether it was a joke or not, or whether people actually thought that we would collect them one by one. But yeah, we got a few of those people, maybe jokingly, or very seriously, emailing us saying, ‘Yeah, I got my booster shot today. Can you please add me to the list?’ So obviously, we emailed back saying, ‘Thank you very much. But hopefully your government knows that, and we’ll get it through to your government.’ But, yeah, it did happen.
I hope it’s not a joke. I think it’s a really sweet gesture of people doing their part.
Yeah, maybe they think we’re just dealing with hundreds of thousands of emails confirming-
‘We got another one!’
How data is useful during global catastrophes
Nice. Maybe it’d be worth talking about the use of data and good data visualisation in times of crises in general, whether it’s a pandemic or something else. So how did you think about OWID’s role in COVID? And how does that extend to other kinds of crises?
I think a lot of it was providing trustworthy data in a context where, as we’ve seen, just because something is very bad and very urgent, doesn’t mean that people don’t have the same problems of mistrust, polarisation, conspiracy theory and all that. And so I think, within those short timeframes of crises, you actually have an even stronger need for trustworthy things and trustworthy data. And so I think OWID’s main role during COVID has been to be a place where people can get online, go see our charts every day and know that what they’re looking at is accurate and trustworthy and up to date, and that they don’t have to second guess what we put up on the website. It also means that what has always made me very happy is when I see people online who fiercely argue about something COVID-related, but both of them, for example, in a Twitter conversation, refer to OWID graphs to make their point, which I think is very cool when that happens, because it rarely happens. If you had people arguing fiercely about some kind of thing related to politics, the main thing they would do would be to quote completely different sources that would just not agree. They would basically criticise each other’s sources, saying, ‘Your source is not trustworthy’, ‘Neither is yours.’ And this would just never end. Whereas I think with COVID numbers, most of the time, people on both sides of the debate have agreed that they can use our data, which makes me really happy and I think provides a lot of value, because again, that means that people don’t have different sources for COVID data, they have a few sources, but they know that the sources don’t contradict each other. Now, obviously, that doesn’t apply to everything. 
So, for example, we started adding some data a few months ago about death rates by vaccination status - so death rates among unvaccinated people, and vaccinated people - and this is something where, when we started pushing that data and putting it online, some people started saying, ‘Oh, this can’t possibly be true. Our World in Data must be controlled, manipulated, financed by some people who want vaccination to happen.’ But for the more reasonable debates about things like restrictions, and whether the restrictions should go on or whether they should be relaxed, it was good to see that people trusted us.
And in that time of crisis, in the most acute time of crisis, I’m imagining 2020 and 2021, as a basic question, do you know what kind of decisions this data would have been used to inform? Is it around when to issue lockdowns or when to ease up? Is it about one-dose versus two-dose strategies? Can you give a picture of what kind of decisions that would influence?
It’s hard to track exactly. But the main thing we provided, and that actually worked really well, in that context was country comparisons. It’s not something I would have necessarily guessed, if you had asked me in early 2020, the extent to which all countries kept comparing each other. It was quite crazy to me that almost every single decision that the French government took was compared in the light of what the German, what the Italian, what the Spanish, what the British government did. And when those governments did things that were deemed good by people in France, then they would be compared, and people would be like, ‘Oh, we need to do the same thing.’ And vice versa, when they did something that looked bad, like, ‘We should really not do the same thing.’ So I think country comparison is the thing that we mostly enabled by just making it available from a user interface point of view, where people could really easily select countries and compare them to one another. And I think that allowed people to push forward restrictions or policies in general that would have taken more time to happen. I think the best example of that is very early vaccination in the EU, where, during the first 10 days of the vaccination in late December, it was very quickly visible that governments had completely different ideas of what it meant to vaccinate the country. Some governments started really fast and did as best as they could to vaccinate a lot of elderly people very quickly. Other governments started extremely slowly, as in a few dozen people per day, for the first few days. And that happened again, especially in France. I live in France, so I got to see this debate firsthand, and when, in the first few days, the government didn’t publish any numbers, and then when they started publishing a first number, after a few days, the number was that 80 people had been vaccinated after four days of vaccination. And in the meantime, Germany had vaccinated several thousand people. 
And I think that country comparison obviously pushed the French government to make their vaccination faster, just because of sheer comparison. There were a few days of literal mockery, where people made fun of the French government for that. And I think that obviously pushed them to make things faster. There’s also been some much more direct things. We were very surprised, for example, to know that, in Hungary, the protocol to allow a vaccine to be used in the country was that a million doses of that vaccine had to be used somewhere in the world. So as soon as the vaccine had been used a million times in the world in general, then that vaccine was deemed safe and could be used in the country. They relied on our data to do that. And so that meant that in a very direct way, the data we provided meant that people could be vaccinated in Hungary with new vaccines, which is pretty crazy as a way to evaluate a vaccine, but I think in some ways, it kind of makes sense. It was a very direct way that we contributed. I think these are the main ones, like country comparisons.
Yeah, I like that. That gives a different flavour, actually, to what I had in mind. This sounds a lot more like, rather than helping somebody like crunch the numbers on the spreadsheet and do some cost benefit analysis and a technocratic bureaucracy, helping enable the democratic process or keeping governments to account.
Yeah, healthy competition.
Yeah, like influencing that decision-making process through that.
Yeah, in many ways we didn’t provide direct analysis. What we provided was the capacity for journalists and people in general and researchers to have that healthy debate in their country about, ‘What should we do now?’ And I think that’s good. I think that’s good also, because the whole pandemic involves some pretty large restrictions put on society, and so I think it’s good that these things were debated. And I think it’s very good that a few institutions tried to make sure that everyone was able to see how cases evolved after restrictions, how that’s evolved, whether things were having an effect. So I think this has been quite essential.
And I imagined, especially after the crisis now, just having this data set available, there’s gonna be a lot more analysis to help prepare for the next pandemic.
Yeah, exactly. One thing we’ve also enabled has been efforts by researchers to do analysis. So there have been hundreds of papers published citing our data, because it just allows people to, either through country comparisons, or even within a country, to look at trends and how things evolved after some things were put in place. And another thing I haven’t mentioned, in terms of the direct impacts of our work, has been the reuses of our data. A lot of people, I think, obviously, still don’t know Our World in Data in the world, and that’s perfectly normal. But I think a lot of people, a huge percentage of people in countries like the UK or the US have actually seen our data somehow through other websites. Because again, especially on vaccination, our data fed into the dashboards of The Guardian, the BBC, The New York Times, The Washington Post, and they would basically put another layout on top of it to make it their own charts and everything, but the data behind those charts would be ours. So a lot of the impact we’ve had is also just enabling those dashboards to exist, so that people who just get their news through the New York Times without really browsing other websites can see data that’s accurate and useful.
I learnt this phrase today, ‘naked numbers’, which is just raw numbers in isolation from any context. Compare this to numbers where you’re able to really easily compare them, maybe even visually compare them to, for instance, case numbers. I don’t know what 10,000 new cases this week means, but I can begin to understand what it means compared to other comparable countries, compared to the last few weeks. Maybe I can extrapolate the curve forwards and figure out what that implies, on top of just having the data in the first place. Right?
Yeah, exactly. I think it’s an intermediate stage that people don’t really think about too much. People either think about the extremely raw data, like just counting cases, or they think about the ultimately very advanced research side of things where you’ve got whole epidemiological models, looking at the trends of an epidemic done by research groups and universities. But I think there’s this intermediate step: you take raw case counts, or raw death counts, and you polish them, and you harmonise the dates, and you harmonise the names of the countries, and you add a seven-day average, and you see the bumps, and you fix them so that the curve makes sense, and you add a map onto that, and you allow people to look at weekly and bi-weekly, and cumulative and things like this. And this is the bulk of the stuff we did, actually: to just make it look prettier, make it look usable, and also provide people with things that we think make sense. For example, we very early on started saying, ‘You should not look at daily cases, because the raw daily number doesn’t make any sense, because it’s too influenced by the day of the week, or whether it’s a weekend or something like that. So you should look at the seven-day average.’ But obviously, for most people, it’s extremely difficult to calculate a seven-day average, they wouldn’t even know where to start. And so we do the work of providing that and making sure that the data behind it is good. If we think the source we’re using is not good enough, then we change it. And we basically allow thousands of people to do that kind of bit of analysis themselves without having to download anything. And if we didn’t do that, then people would have to go to the website of France to look at cases, go to the website of the UK to find cases, put everything in an Excel spreadsheet, divide by population, and it would be extremely time consuming. So very few people would actually do that.
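As a rough illustration of the intermediate step described above (smoothing raw daily counts with a seven-day average and dividing by population so countries can be compared), here is a minimal Python sketch. It is not OWID's actual pipeline: the function names `rolling_average` and `per_million`, the case counts and the population figure are all made up for the example.

```python
from collections import deque


def rolling_average(daily_counts, window=7):
    """Trailing rolling mean; None until a full window of days is available."""
    averages = []
    recent = deque(maxlen=window)  # keeps only the last `window` values
    for count in daily_counts:
        recent.append(count)
        if len(recent) == window:
            averages.append(sum(recent) / window)
        else:
            averages.append(None)
    return averages


def per_million(value, population):
    """Scale a count to a rate per million people."""
    return value * 1_000_000 / population


# Hypothetical daily case counts with a weekend reporting dip (two weeks).
daily_cases = [700, 700, 700, 700, 700, 350, 350,
               700, 700, 700, 700, 700, 350, 350]
population = 2_000_000  # hypothetical country population

smoothed = rolling_average(daily_cases)
latest_per_million = per_million(smoothed[-1], population)

print(smoothed)            # the weekend dips disappear once a full week is averaged
print(latest_per_million)  # smoothed daily cases per million people
```

The raw series swings between 350 and 700 purely because of the weekend effect, while every full seven-day window averages to the same value, which is the reason given above for preferring the seven-day average over raw daily numbers.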
One thing I was curious about is: I don’t know how much duplication of all the infrastructure involved in collecting and presenting data there is across countries and maybe across journalistic outlets as well, but have you thought about just making this a public good, where you just share the infrastructure that you’ve built up with, for instance, other governments and just help them set up their own outlets as well?
There’s definitely a lot of duplication that happens. Also, because, again, governments don’t really understand that what we ask them to do is just to publish a file, even a text file would be fine, or a CSV file, or a spreadsheet, and we don’t really ask them to publish pretty dashboards or anything like that. But they always do the dashboard first, and they don’t do the file. And so they spend weeks, if not months, creating a dashboard and improving it and all that, while forgetting to actually just do the thing that we would need them to do. What we tried to provide there was, first of all, advice, by telling them that we don’t need a dashboard, we can make the dashboard. It makes more sense if one or a few institutions make a dashboard, rather than every single country going through the work of making a website. Making a website takes time. Take a website like the UK COVID dashboard: it’s extremely good, but it took them months to make it, to polish it, to make it good, to make it look nice. You have to do things like make designs and hire developers and improve it and choose colours and whatnot. If you multiply that by 200 countries, it’s a lot of wasted effort.
And then probably it also creates a bunch of heterogeneity, which then reinforces the point of standardising all of this.
Exactly. So it’s a weird situation where we really think they should not focus on that. But at the same time, within the governments, the incentives are opposite. The incentive for somebody like an adviser to the health minister, or a civil servant, is to create a dashboard, because it’s something you can show to the minister, it’s something that actually shows that you’ve done a lot of work, and it’s something that makes your country look nice. Whereas publishing a CSV file is not really valued by people. I mean, it’s valued by people like me, but for most people, if you just provide a spreadsheet and say, ‘My job is done!’, then people will tell you, ‘No, your job is not done. You haven’t analysed the data. You haven’t informed people.’ But I think the people in governments should have the opposite focus and the opposite preoccupation.
Yeah. One more personal question I have, is: it sounds like you were really in the thick of it with the COVID decision-making crisis and also, if I recall, were in some government meetings where people came to you for advice as well - how has all of this shaped your worldview of how the world works? Especially with the question of ‘Where are the adults in the room?’ How do you see the world now differently than you did two or three years ago?
It has changed a lot. Not necessarily for the better. I think it has updated my view on several things. In terms of big international institutions, probably it has updated for the worse, in the sense that I still think those institutions should exist, and I think their work is essential for international cooperation and setting standards, but I think in terms of what I’ve seen these institutions do on a day-to-day basis, it just seems to me like the current model just doesn’t work. The incentives are wrong, they’re too slow. They’re not flexible enough. They don’t have the resources to work properly. They don’t have the technical skills to work properly. If an epidemic hit now, I just don’t really expect much from the WHO, for example. Maybe in a few years, if they manage to fix the way they work, and they hire the right people. But if something happened next month, I will not turn to the WHO and expect much, which I think is pretty sad, and is something I think I would have been more optimistic about a few months ago, or a few years ago.
To prod, somewhat controversially as well: How do you view the ability to have agency as well? The other thing that just strikes me as insane is that this scrappy, Our World in Data startup was able to just go and do this and presumably shape a whole bunch of government decisions and outcomes, and to some degree, the trajectory of how this big crisis went as well. That also just strikes me as like, ‘Oh, wow. The world is way more fragile and messy, but also doable, right?’
Yeah, very much. If you frame it in a positive way, it’s also very exciting. On an individual level, I think it’s pretty cool that we were able to do that, even on a personal level that I was able to do that. If you had told me three years ago, a pandemic is going to hit, and on a personal level, you’re going to have some influence on what people think about this event, I would not have thought about any possible way that could happen, and now it seems like there is. Again, the negative way of framing it is that it can happen because of the lack of agency of institutions. But I think it’s pretty good. I think maybe from an EA perspective, it also supports the idea that you can have an effect. You can have more effect than you possibly think at any given time, because things just don’t really play out the way you think. So I think even going into things like Alvea on the vaccine front, I think it’s interesting to think, ‘Okay, if you look at a system or a landscape of things, you might think it’s impossible to possibly create a company that could compete with pharma companies. But actually, maybe if you try, and maybe if things don’t exactly end up looking the way you thought they did, maybe you have an opportunity to actually do that.’ That’s what happened on the international level. And I think on the national level, it also updated my view a lot, but here I think it’s more towards revising my view of how decisions are made. I was pretty struck by a lot of people I talked to, especially in the French government, around how a lot of these decisions were made, especially around vaccination policies and restrictions, and the fact that - it’s not like this was thought about for weeks and months, because obviously people didn’t have time, which is completely understandable - a lot of decisions, as far as I understood and as far as some people explained it to me, were made by a handful of people. 
Like ‘Tonight, the President is going to make some kind of speech, and then he’s going to announce something. What’s going to be the list of policies in that speech?’ That list of policies will sometimes be defined at 3pm by a group of five people. That’s been very surprising to me to realise that, and it might not have been the case in every country. Obviously, again, because of the shortened time frame and the time compression of a crisis like this, it’s perfectly understandable. But I think that begs the question of: Who are the five people in the room? And are they the right people? Can we incentivize some people to apply for these jobs to maybe make better decisions when that happens? I think it has changed my view on this a lot.
International coordination during global catastrophes
Cool. So maybe we could zoom out now and think about potentially even bigger crises than COVID. I’m curious what you think we can learn from the last two years about what the world’s response might look like to potentially even more serious catastrophes. Potentially up to the level of existential threats.
Yeah. I think on this front, I’ve updated towards being more pessimistic. Not so much because the world has failed to coordinate because I think there’s been a lot of coordination, actually, and most countries have responded in a way that’s been somewhat effective, and a lot of countries have done better than others. I think it’s more the fact that if you look at what happened really early on, like in February, January even, it was extremely slow. And the problem is if you start thinking about things that would be X-risk-like, I mean, there could be some that could play out slowly, but most of them will play out quickly. Most of them will play out on the order of minutes, hours, days, maybe a few weeks, if you think about a slow AGI take off, for example. We won’t have from late December to early April, like we had here. And so I think the fact that it took so long to coordinate and so long to react has made me think that we should be more mindful of those particular moments where things might happen, and that we need to make sure that if something kicks off, we actually react quickly, and we don’t just postpone the decision. And I think building on this idea of the ‘most important century’, I keep referring to the idea of the ‘most important week’, which might be like: if humanity ever disappears, obviously no one will remember, but if someone could remember, they would remember specific dates like April 21st, when the world failed to act when X happened. And I think it’s very important to have some kind of model of what we would want to happen if something started that looked like an X-risk possibility. Something that would be like a pathogen with 50% mortality, or an AGI take off or a potential nuclear war. For some things, like nuclear war, I think we have a pretty good model of what we want countries to do. And governments have plans in place to know how they want to react. But I think for something like AGI, obviously, governments have no idea. 
But even in, I think, Effective Altruism as a community and longtermism, researchers don’t really think too much about or don’t really discuss too much, as far as I know: what do we actually want countries to do? If tomorrow somebody creates an AGI, and we think it has takeoff capacity, what is it that we think governments should do? Should they kill it? Should they take over the company by law, or something like that? It’s obviously very difficult to answer this question, but I’m always worried that if something happens next week, we would actually be pretty bad at it. We basically, for all this time we spend thinking about this stuff, wouldn’t really be that useful. And we wouldn’t really know what to do.
Well, it sounds to me like there’s at least two things going on here. One is, for some particular threat, let’s say it is AGI takeoff, or a really bad pandemic, there are all these somewhat technical questions about what the world should do to mitigate or stop the risk. And then there’s this more general question, which is: do we actually have as a world the core capabilities to respond to an extremely fast moving crisis extremely quickly? And that cuts across the risks, right? And there, I guess you can start thinking about, just in general terms, what this kind of crisis response - maybe it could be a team in a government, maybe it could exist at the international level as well - could look like, and whether it could exist. Or are you just too pessimistic about the existing incentives for this thing to exist at all?
No, I think it should exist. I definitely think it should exist. One thing I’m slightly sceptical of is that the obvious legitimate place where you would want to build this would be at the UN level, for example. But again, given what I’ve seen from international institutions, and the incentives they had and this slowness they have built in, I think it would be a tall order to expect a UN built institution to be truly a rapid response team, to react within a few hours or days. I think I would be extremely surprised by that. What could possibly happen would be some private external system where people would build rapid response teams to at least analyse what happens. But then the problem is that you just get to the level of analysis. Some people in Effective Altruism have been talking about the idea of building rapid response teams so that people are available on the fly, to kind of stop the work they’re doing currently, and if something seems to happen, like the war in Ukraine, for example, to become available within a few hours to work on a problem. The problem is that people would be private citizens in every respect. And so they could provide advice, they could provide analysis that could maybe provide context to people in government, but ultimately, they wouldn’t have any kind of power. So what you would want, ideally, would be a mix of the two where you would have something truly rapid and that can react quickly, but something that has actual leverage. And in the current landscape, it’s hard to imagine exactly what that would be and where this would live. But obviously, I think it is needed.
Speaking of the UN, I agree, it’s a little difficult to imagine some kind of office for saving the world which just nails it within a couple of days. It does feel easier to imagine some kind of convening mechanism where there is some agreement that, when certain conditions are triggered, representatives of countries convene in this place very quickly, and they have certain kinds of clear responsibilities, delegated in advance. And so the UN itself is not calling the shots, but it’s giving a space for coordinating very quickly. And it’s like neutral ground, and it facilitates that discussion, and maybe that buys you a few days.
I think that sounds much more doable, and it does sound useful. It would be an international equivalent of the Moscow-Washington hotline during the Cold War, where you would have some kind of immediate system to convene as many countries as possible to get them to talk, either to get them to cooperate on a problem between countries, like a war, or to get them to cooperate on something country specific, like an AGI. And I think that makes a lot of sense. And I think that sounds much more doable. I’m not exactly sure what that would lead to, because then once they’re on the phone, or they’re gathered in that room, you would actually need them to agree on something. But at least it sounds like an improvement over what currently happens, which is, right now, if a pandemic hits, the WHO takes a few days to react and then kindly invites the representatives of these countries to come next week to agree on something. And it has this delay of a few days, which, again, has weirdly worked for COVID, and for monkeypox and things like that, because we’re facing epidemics that actually kill around, or less than, 1% of the people they infect. But if we think of something like 30% or 50% mortality, this just doesn’t sound like this can wait 10 more days.
It may also be worth spelling out, presumably, when you’re talking about the ‘most important week’ or something like this, it’s about a week where we still have the kind of hinginess where we can do something about it. It’s not necessarily, although it might be the case, that everybody dies within a week, but it is within a week that you can determine whether this becomes a contained kind of pandemic or a COVID-level pandemic or something even worse that spreads. There are these critical moments.
Yeah. And I think part of the work would be to determine what that hinginess hinges on. What are the systems? If something happens within a few days, what are the things we would want governments to do? For example, in the context of a pandemic, would it be the immediate stop of all kinds of international flights? Would it be an instant lockdown in all countries, something extreme like that? I think it’s actually worth discussing what we would want to happen. And I think it’s very difficult to currently know what would happen because we would probably go back to the situation around February, March, 2020, of countries slowly discussing things in a way that’s just way too slow.
Yeah. And there seem to be two challenging dynamics, or at least two that I can think of. One of them is this issue of false positives. Especially when it comes to these global catastrophic risks or existential risks, as you said, we’re talking about things that happen really, really quickly, and you need to act really, really early on. But then the question is: on how much uncertainty can you make a call? How quickly should you act if something like monkeypox is happening? Should you automatically treat this as a COVID-style event and shut down flights and call lockdowns early on? And what kind of criteria do you set for this? And then the other one seems to be that, because we’re talking about things that are on a global level, and these really big risks, there is clearly some tendency here for things to turn authoritarian where you either have to seize an AI lab or shut down international borders. And that just feels like a lot of power and, I think understandably, there’s a lot of hesitation around setting up these kinds of mechanisms that make these things available. So in all senses really what I’m just saying here is that this sounds really difficult and really challenging.
It is extremely difficult. And I think it’s also why it’s worth having those debates beforehand. Because if the thing that blocks us from stopping an AGI is the fact that we haven’t talked about it beforehand, and the fact that a lot of people feel uncomfortable forcing a company to shut down an AGI and so it takes a few days to get parliament to debate about it and maybe parliament decides not to do it, it will just feel like a waste to kill humanity because of that. I think part of it is also cultivating this idea that false positives are a good thing in a way, that the idea of reacting strongly to something that looks like it might be really bad is not something that should be made fun of. There’s a specific case that people in France often referred to during COVID, which is that in 2009, the government and, in particular, the health minister at the time, ordered tens of millions of doses of vaccines against the H1N1 virus. Because at the time there was a, you could call it a panic, or at least some kind of pessimistic analysis that maybe this could be a terribly bad flu year. And so the government spent almost a billion euros to order vaccines. And later on, it turned out to be pretty much almost a normal flu year. The problem is that this was remembered as a blunder by the government, which I think is just a terrible way to look at it. And for years, that particular person was made fun of for that decision, which I think now probably, I hope, in retrospect, because of COVID, looks like a very smart decision. And I think it’s good to cultivate that idea of ‘All things equal, it’s good to react slightly too strongly to crises, rather than react slightly too weakly’.
Yeah. I’ve heard this get called the ‘preparedness paradox’, which I’m sure we’ve talked about in other episodes, but if you do enough to prepare for major crises, one of two things will happen: either you do so well that you make no blunders like that, and then no one notices because nothing bad happens; or you occasionally have these false positives, because you need those mistakes, which are ex ante sensible, in which case people make fun of you. And there’s really no winning, but it’s still very important to do.
Yeah, and I think that’s why some public information campaign about this is very important, because, as you said, if we decided now, rightfully so, to increase the budget on pandemic preparedness, and then another pandemic, similar to COVID, hits in 10 years, and we manage it much better, then maybe people will start saying, ‘Well look at this money we’re spending. Why would we spend so much money on pandemic preparedness? We haven’t had a pandemic in 10 years’, where, actually, the reason why we don’t have them is that we’re catching them. So yeah, I think there’s definitely a curse there, where the more you do and the more successful you are, the less it looks like you should be doing.
Okay, let’s, let’s press on. We’re talking about some of these more X-risk or longtermist topics now. On that topic, it is often said that longtermism lacks certain kinds of feedback loop, which you do get in, for instance, the global health and well-being context. I’m curious, in your experience of the last couple of years at OWID, did you learn anything about where we can find useful feedback loops for some of these big, long term topics?
I think, obviously, the answer should not be that we should wait for big crises, because we don’t know exactly when they’re going to happen, and if we wait for them, it might be too late. I think to me, it’s led me to think that smaller crises - and by small I mean, something like COVID, or maybe even smaller, and I don’t mean to be derogatory, when I say that, obviously, COVID was huge and killed a lot of people, but we’re now talking X-risk level of accidents - are also extremely important to look at. First of all, because obviously, these things happen chronologically, and when something small happens, you don’t know if it’s going to be big. Something big first looks very small, especially something like a pandemic. Second, I think it’s pretty clear that if we can’t do anything to influence a small crisis, like COVID, what exactly do we think we’re going to do for a pandemic that’s 10 times as bad? Some people say, ‘Oh, but the reason why we didn’t do so much about COVID, and we didn’t try very hard is that it was very clear that it was not going to kill humanity.’ I mean, sure, but, first of all, you don’t really know that: the virus could have mutated to a way worse pathogen that could have killed a significant portion of humanity, and in many ways, it kind of did, at least much more than we thought would ever happen. And I think also, it’s very crucial to think that lessons can also transfer from small crises to big crises. It’s this idea that things would not be radically different, between something like COVID and something 10 times as bad as COVID. The only difference would be that you would need to react 10 times as quickly. And so I think it’s the wrong way to think about it to say that anything below X-risk is not worth looking at. I think that’s a really bad way to look at it. And I think also because everything we work on in longtermism is a model. Hopefully we will never actually work on an X-risk, it will not happen. 
And certainly we should not wait for one to actually occur to start building models of what we should be doing. So I think, for anything like speed of international cooperation, preventative measures, efficacy of restrictions, or PPE or vaccine production, we should definitely be looking at smaller crises to look at this. So I think, to me, there’s a whole field of research that can be built to be something around experimental longtermism. So something that isn’t just looking at possible crises, but also looking at something that looks within an order of magnitude of an X-risk, then how has the world behaved? How have we reacted? Could we have done something better? And I think that’s the kind of thing that could potentially help you when an actual X-risk happens to actually be useful and change something.
Yeah, it does sound that one of the challenges with this kind of experimental longtermism idea is that it must be really tricky to be able to differentiate where you’re learning really valuable lessons from these order of magnitude smaller, but still relevant, crises, where certain factors do generalise and where certain factors don’t generalise to when we’re talking about existential risks or some of these scenarios. I’m particularly curious about linking that back to our discussion before. It sounds that Our World in Data was hugely influential and impactful when it came to the question of COVID. But I’m wondering more broadly, what is the role of data collection and data standardisation, when it comes to existential risks?
I think it’s very important. And I think, as you said, it’s important to be honest, and think that if an extremely bad pandemic hit, I’m not sure Our World in Data in its current state would be very useful. Because even though we work much faster than WHO, we still work on the scale of a few days or weeks. And so I’m not exactly sure that we would play such a role. I think the answer here would be systems that are built not to be reactive, but to be proactive. So things that are surveilling what’s happening. I think here the idea of standardising things is very important, because the only way you can build a surveillance system is by systematising things and looking at them every day, even when nothing happens, to make sure that you catch early signs as early as possible. I think one of the ways we could help with this, and we’re starting to think about it, if you take the big example of something like the flu, is to build a surveillance system that would allow researchers and journalists and people in general to monitor what’s happening regularly, even outside of periods of crisis. So obviously, we wouldn’t be doing any kind of direct data collection ourselves, but the WHO, for example, has a data set of weekly flu cases around the world, which they make available, which is pretty good, and it’s pretty timely and available publicly. But the way it’s presented is not so useful, so we could make it available in a format that’s much more usable. We could also merge it with other time series that are available about the flu, about testing capacity of different countries. And also, it’s not just the flu, it’s also what people call flu-like disease - so anything that kind of looks like it has the same symptoms, but isn’t necessarily the flu.
And I think building this monitoring capacity is also something very important, because if we built it in advance, then potentially, if we’re talking about that most important week, within that time frame, we could still be somewhat useful. But I think if we’re talking existential risks, and it’s something that we haven’t anticipated, I don’t think even Our World in Data will be very useful if something happens tomorrow. We might react as quickly as possible. But if it’s gonna kill humanity, I think we need to think about it beforehand.
Yeah, it sounds like here the question for data collection and existential risk is playing the canary in the coal mine, like a really early warning signal. It sounds like some of our previous episodes about metagenomic sequencing and early pandemic warning systems.
Yeah, in many ways, we were able to be useful because COVID had already reached the level where it’s the aggregate data that’s useful. There were so many cases already that the thing that people found useful was to look at seven-day averages, because there was just so much stuff happening already. I think anything before that, it’s still pretty unclear exactly what it’s used for at the moment.
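To make the seven-day averages mentioned above concrete, here is a minimal Python sketch of a trailing rolling mean. The daily counts are invented purely for illustration, and the function name `seven_day_average` is hypothetical; this is not a description of Our World in Data’s actual pipeline.

```python
from collections import deque
from typing import Iterable, List, Optional


def seven_day_average(daily_counts: Iterable[float], window: int = 7) -> List[Optional[float]]:
    """Trailing rolling mean; returns None until a full window is available."""
    recent = deque(maxlen=window)  # holds at most `window` most recent values
    running_total = 0.0
    averages: List[Optional[float]] = []
    for count in daily_counts:
        if len(recent) == window:
            # The oldest value is about to be evicted by the append below.
            running_total -= recent[0]
        recent.append(count)
        running_total += count
        averages.append(running_total / window if len(recent) == window else None)
    return averages


# Invented daily case counts with a weekend reporting dip (illustrative only).
cases = [100, 120, 110, 130, 125, 40, 35, 105, 125, 115]
smoothed = seven_day_average(cases)
```

The smoothing hides the weekend reporting dip in the raw series, which is one reason aggregate views become useful once case numbers are large.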
What does OWID have to do with effective altruism?
Nice. How about we draw a line under this crisis management stuff, and talk about EA, if you’re down for that?
Great. All right. Well, here’s the question: What is the Our World in Data Theory of Change? How do you think Our World in Data has an impact?
I think it’s something we’ve been thinking about much more in the last couple of years, mostly because of the impact we’ve had on the current situation. I think before that, we were content with this idea of just building a website that provides extremely good research and data on the world’s most pressing problems. And I think now we think more carefully about: What exactly are we doing? How can we measure it? How can we exactly gauge the impact we’re having? Some of it is still really meta and hard to measure. In many ways, we want to be a very good epistemic institution for everyone, somewhat like Wikipedia. And that means that it’s hard to measure exactly how you do that, and whether it’s working or not. Some things that obviously could happen would be the amount of citations you get, and people who visit your website and how often people use your charts to make points in debates and stuff like that. But that stuff is hard to exactly grasp. Even harder to grasp are things like the fact that we were very conscious that some specific people or certain specific decisions matter much more than a million viewers. So, in some ways, you could forgo 99% of the traffic we have on the website, and if you replace it with a couple of people in governments, strategically placed, and those people are directly influenced by an article we wrote, and they decide to, for example, open one more nuclear power plant, because of something they read on Our World in Data, then potentially those two people will have much more influence than a lot of the people we have reading articles. The problem is, it’s even harder to measure that. How do you possibly know that a government adviser, before getting into a meeting, googled Our World in Data-something, and then stumbled upon one of our articles, read it and slightly or strongly changed their mind about something? It’s extremely difficult to know.
Same thing for philanthropy, where a lot of the impact we think we’re having is by redirecting or influencing the way that people give money, or the causes they care about. But again, it’s very hard to exactly say, to what extent. But it’s something we’re trying to think about. It’s also something we’re trying to measure. There are some ways of judging that. So for example, the impact you might have on institutions is something you can measure through a number of citations, like UN reports, or like IPCC reports, or things like that. And it’s something we didn’t do initially, we were much more tracking things like mentions in the media. But now, we’re tracking directly institutional reports, to try and get a sense of how much we’re influencing things. But again, it’s still somewhat based on numbers and on people making clear, in public, reports that they used us. But it’s not exactly clear what happens behind closed doors in meetings.
Yeah. I also want to take the opportunity to zoom out of our earlier discussion, which was very X-risk focused and just flag that there’s a lot of tremendous data that Our World in Data has on everything from global health to farmed animal welfare to climate change. I’m curious for you to elaborate a bit more on what having these reliable sources of data looks like for impact, rather than just crisis management?
I think that’s a good question, because I think the crisis management aspect is something people think a lot about us now because of COVID. But before that we were not particularly focused on crises or anything like that. We actually tended to focus much more on long term problems that we could take months writing about without it being a problem, things like poverty and climate change and education around the world and public health. And it’s something where definitely still the bulk of the work that we do is extremely useful. We do this work of finding the right data, getting it into the right format, cleaning it, analysing it, to make charts. And it’s this idea that people don’t have to constantly think about, ‘Okay, I want the most up to date data on maternal mortality across Africa, where can I possibly go to get that?’ And we’ve done the work for them of looking into the data, looking into the different sources for maternal mortality, understanding what are the differences between them, reconciling those differences, building a time series that makes sense within those different sources, and putting that in a chart that is customizable, interactive, and that they can use to analyse the situation. And I tend to think also, as our theory of change, of all those millions of minutes saved, because people don’t have to do that data collection themselves. We also have datasets, for example, on CO2, where we gather data on dozens of different metrics about CO2 emissions across all countries across time, and it comes from multiple different sources. And the idea is that, sure, some people will have to go back to square one and do their own data collection, because they need something other than that, but there will be hopefully thousands, if not more, students and researchers and journalists who need what’s in that file, and they won’t have to do that data collection. 
And so this will save them millions of hours of work thanks to that because we’ve done the effort of providing a data set that’s accurate, trustworthy, in a proper format that they can use without buying it, without buying software, without anything like that, with good documentation, where if they find something odd in the detail, we actually answer their questions. So that’s a lot of the added value we also have: doing that grunt work for people where we analyse stuff, and we clean stuff, and we do all those things that people don’t really want to do, but they still have to do, and we allow them to skip that whole phase.
It is incredibly surprising to me how much of this, at least in my context as a researcher, I hadn’t found in this easily accessible format until I stumbled on Our World in Data. The most recent painful example is that I spent probably hours trying to eyeball what these IPCC climate scenarios mean for GDP and population assumptions and stuff, and I think that is an Our World in Data chart drop that happened more recently. I wish I had had that a year before; that would have easily saved me a week of time.
Yeah, exactly. I think then, once you have that, the challenge becomes to make those things early on, so that you don’t have to make them yourself, basically find all of these things that people keep referring to, and keep asking themselves, and make sure we cover all of them. And that they can easily find them, because it’s not just good enough to make those charts and to cover those problems, but it’s also very important for people to easily find this; things like search engine optimization, and making sure that when they click on the link on the website, they find out what they’re looking for, and that the search works and things like this, which are also just very down to earth things about maintaining your websites, and making sure everything works, and making sure the search is good, and that you have a good user interface and a good user experience, and that everything is pleasant to use. And I think it’s also something that a lot of institutions struggle with. Some institutions have become better. For example, the World Bank, in the last few years, has done a much better job at making the website easy to use, and they have a data website that’s actually pretty cool, and when you look for new variables, you can compare variables. It works well. It’s not slow, it’s not terrible. I think if every institution had a website like that, I don’t think we would be useless. But I think we would be somewhat less useful if all of these institutions made the data truly browsable, accessible, understandable. I think we would find ourselves providing other kinds of value added, but not specifically this one. But in the meantime, we have plenty of added value to provide by just getting that data, and making it available in a format that makes sense.
This may be a huge tangent, but I can imagine some people wondering: well, why not something like Wikipedia, which is just fully distributed, anyone in the world can contribute to it, and there’s no kind of Wikipedia team that writes most of Wikipedia? One thing that comes to mind is in philosophy, there are Wikipedia articles on different philosophers. There is also a philosophy-specific encyclopedia called the Stanford Encyclopedia of Philosophy. And their model is they have a core team of maintainers, and they reach out to experts on different topics. Everyone in philosophy goes to that first, before Wikipedia, and it’s exceptional, and it’s been maintained over all these years and just got better and better. And in general, my impression is that this points to some kind of pattern, which is that in the world of open source software, the projects that do especially well at getting maintained properly over long time frames are those with a core team of maintainers, rather than just this fully distributed, ‘Anyone chips in with some of their spare time’-type thing. I guess that’s one reason why something like an Our World in Data model does so well: you have a team of people whose heads are just fully in keeping this thing up to date and keeping it running.
Yeah, I think so. I think that’s right. And I think the reason why we do this and that it works well, is that, as you said, we know all of the intricacies of all these data sets. We’ve looked into them for thousands of hours, we know exactly why this variable in this particular set of this particular institution looks weird for that country in that year. We know how to reconcile things. We spend a lot of time thinking about that. And the other way to think about it is that we often ask ourselves, ‘Why haven’t there been strong competitors to Our World in Data?’ It’s something we often wonder about. We would have thought that by now there would be at least one website doing almost the exact same thing, and there hasn’t been. I think it’s just there’s a huge cost of entry. Just taking the time to build the database we have of thousands of data sets and thousands of charts would take a huge amount of time. And just because we’ve done it over 10 years now means that we have a treasure that people could start to build. But, first of all, there’s no big incentive for them to do that because we’ve done it and also it would take a huge amount of time and they wouldn’t know where to start, and they would have to start from scratch for all these data sets, they don’t have context of these institutions. So yeah, it would just take a huge amount of time.
I’m wondering what to challenge, or something like that, means: how much other low hanging fruit is there, if you could just concert your efforts and do this big push. One example here that jumps to mind is, not sure if it’s a competitor of Our World in Data or a source, something like Global Burden of Disease which is also really new, right? I think it was a Gates initiative back in 2014, that tried to map out the DALY burden of every disease in the world. And as I understand it, it has a bunch of issues to fix, still. But it’s a massive step ahead of what was there before. I’m just curious what other things might be out there that are ripe if you could just concert either nonprofit or community open source efforts to do a push.
Yeah. I think it’s a key thing. I, obviously, certainly don’t have an answer, because if I did I think I would have told people to do it. But I think there are some potential ideas: when I think about Effective Thesis, there’s a little bit of that solving a coordination problem that doesn’t require thousands of people, it just requires a few people to be extremely motivated. One thing we’ve been thinking about doing and that could be done by potentially another institution would be to tell people about what’s missing in the world of knowledge: what are the datasets that are missing? What are the research papers that are missing? It’s something where we’ve realised only recently that we’re actually in a very good position to do that, because we spend so much time mapping the knowledge space of a given topic, like CO2 emissions, that, actually, by the time we’re able to write about it, we’re also able to write about what we haven’t found. What are the missing things? What particular PhD has never been written? And until recently, we basically just discarded that and never really told people about it. Now we’ve realised that also because of the visibility we have, and the influence we have, it’s actually also useful to just tell people, ‘Hey, look, in the space of CO2 emissions, if you want to be impactful, you could do this particular PhD thesis, you could start this particular NGO, you could count this particular metric, because no one has done that before.’ So I think this is a low hanging fruit that we could start doing, and obviously, I think, other institutions could start doing. Just letting the world know about what are the good things that could be-
The ‘knowledge gaps database’.
I should ask: any examples, like concrete examples that came to your mind?
I don’t know the exact details of it, but I think my understanding is that when my colleague, Saloni, worked on mental health, she came back with this impression that there’s a few things to be said, and there’s a few papers to look at. But what she explained to us is that everything she found was 10 years old, pretty broad, or pretty shady in terms of research methods. And there seems to be a huge space for data collection, data analysis, something that would provide a solid foundation for publishing the kind of entry we have. I think she was pretty disappointed in what she found, and my sense was that there was a lot of room for people working on what felt like much more systematic data collection. Even things like, ‘What are the most common reasons why people are prescribed antidepressants?’ Even a question as basic as this basically doesn’t have a very good answer in the research literature, which is kind of mad because we have national healthcare systems, and we should be able to pull something like this. But apparently, it’s just not available.
Yeah. One channel here is to point this out to researchers and try to get people motivated to work on these questions. Another thing that Our World in Data seems to have more recently done as well is get involved in the, if I can say so, advocacy space, right? Asking certain organisations to take action on this. The particular thing that comes to mind here is the IEA, and some of their energy data sets. I’m wondering if you could tell us a bit more about that. And also, what a future here might look like?
Yeah, so the IEA is the International Energy Agency. What happened there is that they’ve been publishing data for several years now. The problem is that when the agency was set up by its constituent member countries, it was decided that the agency would have to be self-funded. And so the agency has to generate money to get its funding, and that means that all of the data that they publish is behind a paywall. And so you have to pay a hefty cost to get access to it and then once you get access to it, there are extremely severe restrictions on what you’re allowed to publish publicly, or to distribute. Basically you cannot redistribute anything, and you have to be extremely mindful of what you publish. The problem is that this is all public money, and that it’s extremely important data. And as long as it’s not public, especially in the space of energy, the main data set we have to rely on is the BP data sets, which, when you think about it is a very ridiculous situation where to look at energy, we have to trust and use the data of one of the biggest energy companies in the world that doesn’t necessarily have the right incentives. Now, as far as we know, the BP data is very good, and so we’ve been using it. But there is something that feels very suboptimal there. And so we started a campaign a few months ago to basically advocate for the IEA to open up its data. Also, because - I can’t remember the exact number but - the funding gap was extremely small, we’re talking about a few million dollars, something that could be easily filled by countries. And as some people started supporting us, we got support from various NGOs and some journalists started writing about the issue. And recently, the director of the IEA announced that they would start efforts towards publishing that data in an open way, and to make the data completely available. It’s not completely done yet, but I think, right now, it feels like unless they really go back on what they said, they’re gonna do it. 
And I think it feels like there’s a lot of ways that we could replicate that by using that leverage we have now that we’re a little bit more famous and a little bit more visible, to basically ‘shame’ institutions into doing that. It’s a strong word, but I think that’s the idea, the idea of exposing the fact that some things don’t work exactly the way we think they should work, and then in a constructive way, making suggestions and recommendations as to what should happen. And I think that’s a fine line to tread on. We don’t want to be mocking institutions. Most of these institutions, if not all, are doing the best they can, with limited resources. For example, if we’re talking specifically about the IEA, I don’t think the IEA wants to be paywalling its data, I think it just has to because it has to generate money, and it doesn’t know how, and they’re stuck in this situation. But I’m pretty sure if you ask every person working there, they will tell you that they would love to make that data accessible. And so I think the idea is to be a partner in that change, and to have this complicated relationship between advocacy and partnership to get those institutions to slowly change and be forceful about it, but not aggressive about it.
It’s a great example. Since we’re talking about datasets, which should exist, or at least should exist more, one thing that comes to mind is tracking conflict over human history.
The specific example you’re talking about, the idea of conflict and conflict deaths is a weird one where all of the counting that has been done has been done by individual people who spend basically their entire career dedicated to this. And it’s very strange, because obviously, they’ve done the best they can, but they’ve come up with slightly different versions of it. So actually, we’re working on a data set that is going to merge all of this, referring to the different sources, and what people have estimated the casualties of each conflict to be. And we’re going to try to make this visualizable and accessible in a way that makes sense. It’s obviously very difficult, because you’ve got thousands and thousands of these conflicts. They go from one person killed to millions of people killed. Some of them don’t have quite the same definitions, according to different countries that took part in the conflict and how they defend themselves and whether they think it was a war or not. But we’re trying to work on that. And it’s a project we’ve just launched internally. And so-
Watch this space.
Yeah, watch this space for something, probably next year.
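As a rough illustration of what merging conflict death estimates from several sources could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Estimate` and `MergedRecord` types, the `merge_estimates` function, the placeholder conflict and source names, and the invented numbers. It does not describe the dataset Our World in Data is actually building.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Estimate:
    conflict: str  # conflict label (placeholder names below)
    source: str    # who produced the estimate
    deaths: int    # estimated number of deaths


@dataclass(frozen=True)
class MergedRecord:
    conflict: str
    low: int            # smallest estimate across sources
    high: int           # largest estimate across sources
    sources: List[str]  # provenance, so readers can trace each figure


def merge_estimates(estimates: List[Estimate]) -> Dict[str, MergedRecord]:
    """Group estimates by conflict, keeping the range and the sources."""
    grouped: Dict[str, List[Estimate]] = defaultdict(list)
    for est in estimates:
        grouped[est.conflict].append(est)

    merged: Dict[str, MergedRecord] = {}
    for conflict, group in grouped.items():
        deaths = [e.deaths for e in group]
        merged[conflict] = MergedRecord(
            conflict=conflict,
            low=min(deaths),
            high=max(deaths),
            sources=sorted(e.source for e in group),
        )
    return merged


# Invented numbers and placeholder names, for illustration only.
estimates = [
    Estimate("Conflict A", "Source X", 1200),
    Estimate("Conflict A", "Source Y", 900),
    Estimate("Conflict B", "Source X", 50),
]
merged = merge_estimates(estimates)
```

Keeping a low and high value rather than a single number is one way to preserve the disagreement between sources instead of hiding it.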
The value of making the world a little bit more sane
Cool. So just zooming out, we were, at least a while ago, talking about Our World in Data’s Theory of Change. I have a very vague big question, which is: How do you think about the value of making the entire world just a little bit more sane? Imagine that everyone gets to occasionally check in with Our World in Data and have a slightly better, higher resolution picture of just what’s going on, broadly speaking, in the world. That seems good to me, but how do we start thinking about how good?
Yeah, I think that’s exactly where we’re at. We definitely value it, and, personally, it’s something I value hugely and I think one of the main things we do is make the world slightly more sane. But as you said, the problem is that using that as a theory of change is difficult because it’s hard to say exactly if you made the world more sane today, or if you didn’t. So I think part of it is coming back to the idea of measuring what we do. But I think part of it is also accepting that, if that’s your goal, then there’s going to be broad meta ways that you can track this that are going to give you a general hint of whether you are achieving this. But other than that, you will never get an actual metric of whether you made the world more sane. And I think that’s frustrating, and that’s difficult. But I think it’s also accepting that, over time, and given a big enough scale, it’s extremely obvious that Wikipedia has made the world more sane. It’s obvious to me that something like forecasting could make the world more sane, maybe it hasn’t yet, but it can. And I think on some specific things like COVID, I wouldn’t need to justify too much why it has made the COVID situation more sane in some aspects. But it is true that on any given day, on any given article we publish in particular, it’s very difficult to gauge exactly whether we’ve done that.
Yeah. There’s something nicely ironic here about how it’s really hard to quantify the impact of the quantifiers, right?
Yeah, and I think you could ask the same question about any kind of newspaper, right? Is the New York Times making the world more sane? I mean, yes, obviously, I think over its history, and even on any given day, I think it is. But if you stopped publishing it for a week, would the world go wrong? Obviously not. So I think it’s difficult, but it’s this idea that the aggregated output is positive. Any kind of thing related to epistemics is kind of hard to define that way.
It’s worth doing with made up statistics. I’d love to see, maybe even someone listening to this, just trying out some BOTECs, on like, ‘how valuable is making the world more sane in various ways?’ And I’m sure there’s creative ways of trying to put numbers on this, right?
Yeah, I think so. And I think generally speaking, we’ve done that a little bit. At some point, I had done some kind of BOTEC analysis of, ‘How many millions of lives were possibly saved by the vaccination campaign being a little bit faster by a couple of weeks, because of the country comparisons that happened in the EU?’ And that came out to a very nice number, so it was nice. But I think this is doable, because we know exactly how many people died every day because of this. We know exactly the dates and all that. For example, one huge thing that’s been very viral about our website is the article that my colleague Hannah Ritchie wrote about - the several articles she wrote about - the impact of beef on CO2 emissions. And we know for a fact, because of just how many hundreds, if not thousands, of times people have told us about this, that this is a very useful article. And possibly, if we deleted that article today, that would be the most damaging thing we could do in terms of impact. But exactly how many people were convinced to stop eating beef? Who knows? I just have no way to know that. You could potentially try to do a BOTEC for that. But I think the uncertainty interval would be pretty large.
Yeah. But I’m a fan of at least trying to make guesses, even if it’s very uncertain.
No, no, I think that would be good. And obviously, I think it’s important to do it also just to get a sense of whether what you’re doing can have some kind of positive impact or whether you should be spending your time doing something else.
Let’s talk a bit about EA. Is everyone at Our World in Data a card-carrying effective altruist?
No. I was gonna say sadly not, but it’s not sad - I think it’s actually important that not everyone always is a card-carrying EA. OWID definitely has some kind of relationship to EA, which is that it’s definitely EA-aligned in the things we talk about. There’s an idea that OWID is also providing a lot of inputs for EA organisations and EA researchers. And it’s not negligible how often we are cited on the EA forum or in newsletters and articles - it’s definitely something very visible. EA ideas are also feeding into OWID. I mean, our tagline is ‘Working on the world’s most important problems’. So there’s definitely that idea of, ‘What are those most important problems?’ But I think it’s important to know that OWID is not an EA organisation. We were set up kind of at the same time that EA started actually, but in a completely separate way. And people’s opinions on the team vary from just having heard about EA - I would be surprised if somebody had never heard about EA, just because I talked about it - and can go all the way to being somebody quite involved in the community, which is my case and the case of a couple of other people on the team. But there’s all sorts of situations in between, that kind of spectrum of involvement. But definitely not everyone.
With your EA cause-prioritisation hat on, I’m curious how you think about the value of putting together charts about problems which maybe don’t rank in the kind of world’s most pressing problems? I actually think that there are still really good reasons to present the data on those problems, but what are those reasons?
I think the reason why we provide this data is that it’s about providing the data. I think, if it’s about writing articles, and things like this, this is where we stop, because if you take something like terrorism, we might have a couple of old articles about it, but we don’t do that, because we don’t think it’s worth spending a lot of time, because the data shows us that it is not one of the world’s most important problems. However, to be able to say that, you do need the data; otherwise, how are you possibly going to say that it’s not?
And it comes back to the value of comparisons, right? You need both sides of the comparison to make that point.
Exactly. So we definitely do need to every year update our charts on terrorism, because otherwise, we won’t be able to know exactly what to write articles about. EA researchers won’t exactly be able to say that the DALYs generated by terrorism are much less or much more than the ones by malaria. And so I think it’s something about providing an input, and because of that it’s something we need to do. I mean, obviously, you could justify going into every kind of data analysis because of that, you could write about sports or anything like that. But I think if you remain within some kind of reasonable circle of what could be potentially useful, I think it’s very, very important that OWID doesn’t become data but gives more stuff basically. It’s very important that OWID stay something very broad that provides data about GDP and population and terrorism and natural disasters, and everything possibly that you can think of that could be a candidate for an important problem. And then it’s everyone else’s kind of job or opportunity to look at this data and decide what they think is important,
And presumably, for impact or something, the impact that Our World in Data would have is some multiple of how important it is to make progress on this specific cause, and how impactful creating and maintaining these datasets is for this cause, right? We were talking about COVID before where maybe ‘natural occurring pandemics’ isn’t within the normal EA canon of cause areas, but clearly here, having this data being available was hugely impactful.
Yeah, definitely. I think there are also many ways in which we’re impacting the world that don’t necessarily qualify as big in impact by an EA worldview, but that are still very important. I think, for example, if we impact people by letting them know that nuclear energy tends to be safe and effective and efficient in terms of energy generation, I think that’s a good thing. It would probably not qualify as a very important thing to say from an EA point of view, unless we’re talking about extreme climate change. But I think it’s still very important in terms of helping governments make the right decisions, helping journalists talk about the issue in the right way, and helping citizens understand the issue in the right way. I think it also comes back to the question you’re asking about making the world more sane. It’s this idea that there is some kind of intrinsic value in making people better informed about things. And in the long run, if we want to avoid some kind of X-risk, or S-risks, possibly, then potentially there’s intrinsic value in making sure people in society are informed in the best possible way, even if that doesn’t directly impact some kind of existential risks.
I mean all I can think about this is that it seems just really obvious to me that it’s just generally useful, just to know how most parts of the world work without zooming in too quickly.
Yeah, I think it’s very important to understand how the world works, to build a worldview that’s coherent, and also a theory of change that’s coherent. I think it’s also good for the idea we all have that a lot of EA ideas might be wrong. We might find out in a few years that we neglected something in particular. I think a good example for this is education: we might find out in 10 years that actually there’s a growing consensus in EA that actually, on average, striving towards better education of the population is actually a very impactful thing because it has indirect side effects on many other existential risk problems. And so in that case then, it’s going to be extremely useful to have good charts about education.
And in the first place, I guess the only reason that we now have some idea of what problems seem especially important is not that people just started off really fanatical about them and succeeded in pushing for them. It’s that people were able to scan across the horizon of a number of problems and see the ones which happen to stick out, but only because they could make the comparisons to lots of other things, right?
Exactly, yeah. I think we see it as our responsibility to provide that input, that fuel, so that other people can later decide what they think, given the best evidence available, are the most important problems.
OWID for longtermist cause areas
With all this in mind, you mentioned at the very top of this episode that one of the things that you’re currently stuck on is thinking about AI visualisations and with that in mind, that existential risk and longtermist dynamics, I’m keen to delve in a bit more on what Our World in Data for longtermist cause areas would look like. Starting off broadly there, and then we can maybe dig in a bit more into specifics.
I think the two big ones we know we want to be working on and we’ve started kind of diving into are AI and pandemics. I think the pandemic one feels pretty obvious because of the work we’ve done on COVID. I think we also are in a good place to talk about this without people questioning too much why we would be legitimate to talk about it. I think it’s a little bit more difficult for AI because of the stuff I mentioned at the start. For us, it seems pretty obvious; for most people, it seems pretty random. It seems pretty random that this website that has been talking about COVID, and climate change and poverty would suddenly talk about something as specific as artificial intelligence. And so we need to think carefully about how we want to approach this question. I think something like providing better data on things that we already write about, so things like nuclear weapons, for example, and also providing good data we don’t have yet on bioweapons, on AI capacity, things like this, is something we really want to be doing. Then there’s the question of what is available. And the problem is that on most longtermist stuff, there isn’t actually that much data available, because it’s more about potential technologies, about risks, about things like this. And so you can’t really make a line chart of much at all. Like bioweapons - what exactly are you doing? You could do stuff about stockpiles of nuclear weapons, and we have that, but on most things, like AI capacity, it’s much more subtle than just tracking things over time. So a lot of it is about writing and explaining to people why we think this is an important problem to worry about.
Other things we’re looking into, but that are difficult to explain are probabilistic measures and things like forecasts where something like the forecast of an AGI by Metaculus or the forecast for World War III, is something that provides some aggregate estimate of how close we’re getting towards an event that could be impacting the long term future of humanity. The problem is then we get into these more epistemic questions of, ‘How do you talk about forecasts? What are these forecasts?’ And again, it’s something where, as people interested in EA and the EA space, we tend to forget that websites like Metaculus - I know a lot of people at Metaculus, I feel free to say that - for people discovering them, they don’t make any sense. There’s just a bunch of people with usernames, making forecasts without strong justifications for them. And as people who are interested in the topic and who’ve read Tetlock and all that, we know what’s behind this. We know there’s a lot of effort to build consistent forecasts through careful analysis, and that people are judged through scores and through their track records and all that. But again, it’s something that needs to go by carefully.
It goes back to your train analogy, right? A lot of people are five stops down the line, and you’ve got to realise there’s a lot of stops on the way to understanding why this stuff is worth paying attention to.
Yeah, and almost every time I go to an EA conference, people ask me, ‘Why don’t you embed Metaculus’ forecasts into your website? Like on your page about nuclear weapons, why don’t you embed a chart about the chances of a nuclear weapon being used?’ And I think that’s a relevant question, but I think the big problem is that for most people who read our website, it would not make any sense. It would feel completely out of context. We’re showing data about real life measures of things that have happened, like how many weapons have been stockpiled by countries, and it’s very hard from there to just then, under that, lower down the page, show a forecast made by unidentified people about the chances of these weapons being used. I think it’s somewhat different and much easier for us to do when we’re talking about a big institution. So we show probability forecasts by the UN on population, for example. But population is much less contentious, it’s much clearer exactly what these forecasts are based on. They’re based on current population and fertility rates and life expectancy. And it’s much easier to explain, and people would know what the UN is, we don’t need to explain that. Or they know what the IPCC is, so we also show scenarios by the IPCC.
It’s also like a data visualisation question, right? How do I show a line chart, which is probabilistic in nature?
Yeah, exactly. And you have this issue just even on the Metaculus website, where you’ve got the median forecast over time, and then lower down the page, you’ve got the probability distribution of the current situation. And this is something that’s super hard to explain to people, if they don’t know about forecasts.
It’s almost that you want to show the distribution over time. If you have three dimensions, you can kind of do it but-
Exactly, and it’s extremely difficult to do. Saying that the UN thinks there will be a little bit more than 10 billion people by the end of the century, that’s okay. Saying that a bunch of people who are good at forecasting, but we don’t exactly know who they are, think that there is a 5% chance that something might happen, but that it has a 1% chance by some other estimates, and that it has changed by one percentage point over the last two months: this is extremely hard to define.
There is a reasoning transparency point here as well where I would want somebody inferring AI X-risk statistics from Metaculus to also be forced to go through the reasoning transparency process of, ‘What is Phil Tetlock forecasting? What is Metaculus? What caveats do I need to keep in mind here and what biases might I need to correct for? What epistemic certainty should I interpret this with?’ which, especially when you’re producing things for the internet, and people will not read the full article in depth and often will just see the data and see the chart as well, seems really difficult.
Yeah, and something we’ve been discussing a lot with people, especially at Metaculus, is the fact that I think one thing that would make it easier for us to use forecasts would be context. And the problem is that the current landscape of the forecasting space is very much focused on providing forecasts, and letting people actually do the forecast. But it’s only recently that people have started to develop analysis and justification for these forecasts. And now there’s a few newsletters of people who make forecasts and who try to say, ‘Okay, here’s my thought process on why I think this comes down to 2%’, or, ‘Here’s my thought process on why I think AGI probability has gone up by 10%’. And I think these actual analyses are useful. The actual raw number, the actual probability, or like, ‘I think there’s a 35% chance that this happens’, that’s actually not as useful as the thought process behind why you think that.
I often hear this around biotechs as well. That a lot of the value you get is by having to go through this process and make your model explicit, not the number at the end. And if it’s just about sharing the number at the end, and people just read that and take that away, then they’re actually missing the whole value here.
Exactly. Yeah. One of the early uses we could find for forecasts would be something like if Metaculus has a forecast of world population by the end of the century, and it’s significantly different from the UN one, then, if we had on top of this a good lengthy explanation by one of the forecasters of why they think the UN estimate is wrong, I think that could be interesting. And I think people would be interested in that. And it would be easily justifiable for us to publish this on the website and say, ‘Hey, we’ve shown these UN forecasts for a while now, here’s what other people think. And they think that the UN is wrong because they overestimate fertility rate rebounds’ or something.
One other thing I want to throw in the mix here and see your reaction to is that I would love to see more forecasting comparisons between climate change and the IPCC. One thing to say right at the top is that a lot of IPCC analysis uses these five representative frameworks, which they very explicitly say are not a probabilistic assessment. And I think that often gets misinterpreted by almost everyone, especially lay people and decision-making people who aren’t the IPCC authors who are writing this big disclaimer. I would love to see that as well as an opportunity to see how the forecasting community diverges on really long timescales, from what is often mainstream consensus or what are IPCC reference points where this probability distribution is not set because the authors don’t want their work to be interpreted in that way.
Yeah. And I think maybe one of the limitations of having an institution that’s so strong and so respected in that particular topic is that no one really feels legitimate to provide alternative scenarios or alternative forecasts. And people tend to defer to the IPCC a lot, which I think is good overall. The net effect of that is extremely beneficial, that people don’t constantly second guess the IPCC, and that it’s considered to be a source of truth in a way, or at least of very good scientific research. But I think it definitely means that people who might produce estimates that significantly vary from what the IPCC scenario says, or what they’ve kind of estimated to be likely, might be less willing to publish that, or we even as Our World in Data might be less willing to compare them, because the IPCC is considered to be such a strong institution.
And also, to frame this around epistemic status, my understanding at least is that the reason the IPCC doesn’t do probabilistic forecasts and uses these five reference scenarios is because they don’t think they can hit their level of scientific rigour when it comes to probabilistic forecasting, and that they don’t want to dilute the other sciences where they’re able to make much more confident guesses. But in my mind, that just creates this big gap where I want to know the probability distribution, not five reference scenarios, and can somebody please forecast this. And I know this is a live Metaculus question, but when I looked into it, there were like two, and there were maybe 2000 anonymous internet guesses, when I think this is such a thing that deserves more attention.
Yeah, definitely. I think it’s very important. I think I’ve seen quite a few EA reports try to do that, especially trying to estimate what is the current probability of something like a six plus degree scenario. But I think, obviously, if that was directly given by the IPCC, that would be pretty amazing.
Key indicators for the trajectory of the world
Yeah. So zooming out, one thing I was wondering about is what it might look like to have some kind of world dashboard for the key metrics for figuring out whether the world seems to be headed on a really great trajectory for future generations, or maybe things are looking a little shakier. So maybe one question there is: what do you imagine being on this kind of dashboard? Let’s say there’s 10 different indicators, and the needle has moved backwards and forwards each month, or each year - what are those indicators in your mind?
If we forget about the whole discussion we just had about how you explain things to people and all that, and we just take the metrics we would want to use, I think I would definitely include a bit of all the risks we think humanity is facing. So things like nuclear weapons stockpile, some kind of estimates of the probability of a nuclear war happening, things on AI capability, things on greenhouse gases’ concentration in the atmosphere, bioweapons, all that. On that particular dashboard the two of you imagined, I think having forecasts would be very useful because I think they are a type of aggregate measure that takes into account all of these different metrics, but also media discourse and what’s happening around the world, and it’s churning all that by using some kind of giant Bayesian process of like, ‘What does that all mean? And is this going up, or is this going down?’ And I think when forecasting is done well by enough forecasters who update regularly enough, that’s the output you’re getting. Some kind of measure of ‘Are things becoming more dangerous or less dangerous?’. And so I would want to have that on the dashboard. And I think then what I would want, and I don’t think it exists, would be something like the doomsday clock, but in a way that would feel more rigorous. If we had an aggregate metric, even if it’s questionable, even if people could criticise it and think that it should be improved in some way, I think even some kind of attempt to have an index of, ‘Is the world, through 30 different metrics, becoming safer or more dangerous in the last six months?’ I think that would be extremely useful. And obviously it feels like right now, whenever the doomsday clock is updated, because of the way it’s being framed and the way it’s being presented, it can only go up. I would be surprised in the next few years if it went down.
When actually, if you made it as an aggregate metric based on real life metrics, I think it would be legitimate for it to sometimes go down slightly by a few seconds, but I think because of the way it exists and the way the reasoning was created, I don’t think it’s likely to actually go down. So I think I would be really keen to have something like that on the dashboard.
That’s a great idea. To be clear, I think the Bulletin of the Atomic Scientists are really doing amazing things, but I guess I conceptualised the doomsday clock as more of an art project or advocacy project, where at least it wasn’t really set up to track in some granular way what’s actually going on in the world, which is fine, but something which does do that seems really-
It’s interesting because when you go on the website of the Bulletin, they have a dashboard of data. And that dashboard of data has component metrics. They have a bar chart of nuclear weapons over time, they have something about nuclear material stored, they have sea level rise, they have carbon dioxide in the atmosphere, they have temperature differences over time. And so that gives you a sense that they are using those kinds of metrics to think about the world and how it’s evolving, but then it’s quite obvious that when they gather to decide what to do with the clock, they don’t actually churn the numbers, they don’t use those numbers to define that it should go up by 3.4 seconds because of these metrics. They just discuss it. I don’t know exactly how and they decide how much to-
Also they reset irregularly as well. So it’s not like it’s actually going backwards and forwards unpredictably.
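The metrics-based alternative to the doomsday clock discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: the metric names, the numbers, and the choice of an unweighted average of relative changes are all hypothetical, and none of them come from the episode, from OWID, or from the Bulletin.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One component metric, observed at the start and end of a period."""
    name: str
    previous: float
    current: float
    higher_is_more_dangerous: bool = True


def danger_change(metric: Metric) -> float:
    """Relative change over the period, signed so that positive means 'more dangerous'."""
    relative = (metric.current - metric.previous) / metric.previous
    return relative if metric.higher_is_more_dangerous else -relative


def aggregate_index(metrics: list[Metric]) -> float:
    """Unweighted mean of the signed relative changes (an arbitrary illustrative choice)."""
    return sum(danger_change(m) for m in metrics) / len(metrics)


# Made-up values for illustration only; not real data.
metrics = [
    Metric("nuclear_warheads", previous=10000.0, current=9500.0),
    Metric("co2_concentration_ppm", previous=400.0, current=404.0),
    Metric("hypothetical_safety_metric", previous=100.0, current=110.0,
           higher_is_more_dangerous=False),
]

index = aggregate_index(metrics)
print(f"Change in aggregate danger index: {index:+.3f}")
```

Unlike a clock that is set by discussion, an index built this way moves down as readily as up whenever the underlying numbers improve. The hard part, which this sketch skips entirely, is choosing the metrics and their weights.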
I’m keen to revisit the question of what metrics we might add to this idealised dashboard, and especially setting aside the Our World in Data kind of constraints, and thinking what listeners, what other people might be keen to do. One area here that strikes me as particularly relevant is AI capability. I know there have been some new attempts at this as well. So, I don’t know if you’re familiar with I think it’s pronounced Epoch, but around tracking flops over time; you might also consider, I think CSET has done some of this work in the space of tracking military R&D spend on AI. Are there any other particular metrics that you want to shout out as cool for someone to do and track?
Yeah, I think this stuff that Epoch has been doing is extremely useful. And we have already published several charts with their data. Because I think it was thoroughly needed, and it’s very good that they’re doing it. I think the limitation with it is obviously that it’s very difficult to explain exactly what this is. To most people the idea of flops, and the number of parameters and size of the model, doesn’t make any sense. It has a lot of assumptions built in, that you can be okay with, if you already know what you’re looking at, if you already know about AI, and then it’s a very good way to keep track of the situation. But if you don’t know any of this, it’s actually quite a terrible chart to show people. If I showed this to people around me or to my parents to tell them about AI trends, it would be a very bad way of introducing them to the topic. So because this is difficult, on the other side of the AI thing, we tend to fall back to things that are too simple, like number of papers in conferences, which feel like somewhat interesting heuristics of what’s happening, like proxy metrics of the situation, but it’s pretty obvious to everyone that the number of papers in a conference doesn’t really directly tell you about what’s happening in terms of AI safety, it’s just satisfactory at best. And we use that as a proxy, but we want something better. So I think it would help if people could come up with something in between, something that would somehow describe AI capability over time, through some kind of AGI lens of like, ‘How are the best models doing over time in terms of their combined capacity to understand images, produce speech, produce video, beat us at games, or pass the Turing test, and all of that combined? How has it gone up over time? And how close are we to something that will be complete?’ And obviously what I’m describing is almost impossible to do. But if people could try and get closer to that, I think that would be very useful.
Well, I guess this is what AI benchmarks exist for, to some extent: trying to compare some common sense capability across different models. So ImageNet for classifying images, and the Turing test, in some sense, is an informal benchmark for conversational ability. We can try coming up with a new benchmark for truthfulness, TruthfulQA, maybe coming up with new benchmarks which can scale to really advanced systems, and maybe also just tracking the existing benchmarks, and presenting them. Because I don’t really have a good picture for progress over time on any of these things.
And I think something in there that would be a combined benchmark of what we define to be AGI would be good. Something that would combine: a yearly competition where people could submit models that would need to pass the Turing test, plus play a bunch of games, plus successfully pass a high school math exam or something like that, and then we would see the combined output of these models for this competition. I think that would be interesting because I think that’s what we’re getting to when we talk about AGI, something that would be, broadly speaking, intelligent. And I think, again, if you talk about things like a benchmark for ImageNet, or something like that, sure it’s interesting, but again, it’s difficult to explain to people who don’t know about AI what an ImageNet benchmark is. It’s quite tough. It’s doable, but it feels like you’re providing part of the answer, but then you’re really giving them something very partial.
Just thinking out loud, I wonder how much of this is trying to relay to people what the underlying metric is versus the more general lesson of trying to help people internalise what superlinear trends or even exponential trends are. What it really means for something to suddenly be this good at chess one day, and then 10x as good at chess in a year’s time, where it’s not even about, ‘It’s good at chess, and therefore transformative AI’, it’s just that these things can improve really, really quickly. And if we then consider something else replacing this metric, maybe it’s just about really internalising what exponentials are.
And I think in this way, the very big benefit of what Epoch is doing is that they update the data very frequently. And if we’re talking about tracking the capacity of AI in a very short timeframe, that’s very useful, because even if we had this kind of yearly competition of AI generalised benchmark, we would have results late. And again, if we go back to that idea of timeframes and reacting quickly, if something happens, then I think it’s useful to keep track of the model published by Meta last week, and how many parameters it’s used and how this has gone up by like another order of magnitude compared to six months ago. And I think that’s very useful. And I think we should keep doing that.
Tracking attitudes towards big issues
Yeah, just to shift topic slightly. Another thing I was wondering about was whether you’ve thought at Our World in Data about tracking attitudes to various big issues? Maybe looking at survey data, and how attitudes change over different rounds of surveys.
Yeah, it’s definitely another thing we’ve been thinking about, especially in the context of longtermism. Because of the lack of real time and real life data, it feels like surveys can be a very interesting way to at least track something in terms of how often people are hearing about a particular issue. Are they aware that it is being researched? Are they aware that it might be a problem? So if you’re talking about something like AI safety, having a sense of, over time, whether people know about the issue, care about the issue, think the issue deserves more money, for example, would be very interesting. So yeah, we’ve been thinking about surveys. The problem is, again, that there’s very little data about this, even just the surveys. So there was a recent YouGov survey about what people think could be existential risks for humanity.
Bees dying out.
And I think that result was actually interesting, because it shows that some of the rest of the answers were pretty good and pretty interesting, and match some of the things we think, and some of them were completely different. But again, that was just one survey in isolation. So you could produce a bar chart, but obviously, what would be interesting would be a panel survey every six months or every year. The same people being asked again and again, ‘What do you think about AI?’ And if it goes up, then it could provide some kind of evidence that efforts to make people more aware of the dangers of AI are working.
I’m wondering how much of this is, as you were saying, less about making data accessible than it is about, in some ways, narrative or explaining concepts and models. And how much of this then becomes about creating these useful memes or something, again, to make everything about climate change a little bit- I’m thinking here of how the IEA graph of solar power prices decreasing versus the IEA forecasts each year is probably really wonky. And if I tried to explain, in detail, to my parents, what’s going on here, that almost misses the point where the graph by itself is a really powerful narrative tool. And you don’t have to understand all the details of what Wright’s Law is, what solar power prices mean in turn for combating climate change and different technologies and hedging all that. It’s a powerful graph. And thinking what other stories you might be able to capture here in AI or in bio, data tracking aside and more on this narrative element is a more interesting box here.
I think that’s an interesting way to think about it, a way that we would use data to provide evidence but also just to capture an idea of, for example, the fact that in the last few decades AI systems have become much better at winning games over human beings. And you don’t necessarily need to provide the Elo ratings of chess computers over time to do that, because, as you said, then you need to explain chess and Elo, and exactly how it works and why it is going up, and how it is performing. But what you could do would be to provide a bigger set of charts where maybe you would compare AI to humans over 16 different games, and then you would just do a 4x4 grid of charts as small multiples, where you wouldn’t even necessarily try to label the y axis, it might be a common measure, it might be different measures, but you would do small multiples of like, ‘In every single one of these 16 games, AI has become stronger than humans’.
In some ways that reminds me of the Our World in Data meme which I probably see the most, which is like, ‘Things are getting better over time, and here are these 16 graphs of things that were previously red, and now green, and it doesn’t matter what infant mortality or what life expectancy or the poverty line,’ it’s just like, ‘Things are going green, and the world is getting better’.
And I mean, obviously, for that chart, there is a label, and it’s a simplified version of it, which is the world as 100 people. In some ways it wouldn’t be a good idea, but we could remove that label, we could remove the axis, and we could just say, ‘Look, three of them are going up, three of them are going down. The world is getting better.’
Definitely part of the power is that, if you were to scrutinise any of the 16 graphs, you would find a lot of rigour and evidence for it as well. So I understand why it doesn’t fully translate. But I think there is something here, thinking about this through the sense of explaining a narrative versus trying to explain data and methodology, that I still think has something to it.
Yeah. And actually Max Roser did that just a few days ago, when he produced a new chart about decoupling between GDP growth and CO2 emissions. And what he did was that he selected a bunch of countries and, initially - he just told us about this today - he had plotted the actual GDP and the actual CO2 emissions. But then he realised that, actually, it makes the story more complex, because then you’ve got some countries where there’s a bunch of spikes, because something has happened and that has made GDP go up and down. And so it makes the story less clear. And so what he’s done in the final version is that he made it a slope chart where things are just shown in terms of the percentage change between last year and 1990, for example. And so, for every one of those countries, it’s one line going up as an arrow, which is showing the GDP change over those 30 years, and then one arrow going down, which is showing the CO2 emissions change over 30 years. And it’s not necessarily trying to show all the complexity of what’s happened in every single one of these years, but it’s just showing, ‘Hey, over 30 years, all of these countries have managed to grow their GDP, but to shrink their CO2 emissions.’
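The simplification described above, reducing each country’s full time series to a single percentage change between two endpoint years, can be sketched as follows. This is a minimal sketch, not the method behind the actual OWID chart: the country names and values are hypothetical placeholders, and the function names are made up for illustration.

```python
def percent_change(start: float, end: float) -> float:
    """Percentage change from the start-year value to the end-year value."""
    return (end - start) * 100.0 / start


# Hypothetical (start_year_value, end_year_value) pairs; not real data.
countries = {
    "Country A": {"gdp": (100.0, 150.0), "co2": (100.0, 80.0)},
    "Country B": {"gdp": (100.0, 130.0), "co2": (100.0, 95.0)},
}


def slope_chart_rows(data: dict) -> list[tuple[str, float, float, bool]]:
    """One row per country: GDP change, CO2 change, and whether the two have decoupled."""
    rows = []
    for name, series in data.items():
        gdp_change = percent_change(*series["gdp"])
        co2_change = percent_change(*series["co2"])
        decoupled = gdp_change > 0 and co2_change < 0
        rows.append((name, gdp_change, co2_change, decoupled))
    return rows


rows = slope_chart_rows(countries)
for name, gdp_change, co2_change, decoupled in rows:
    print(f"{name}: GDP {gdp_change:+.1f}%, CO2 {co2_change:+.1f}%, decoupled={decoupled}")
```

Each row maps to two arrows on the slope chart. Everything between the two endpoint years is discarded, which is what keeps the story clear and also why the choice of endpoints matters so much.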
I’ve seen so many things struggle by not zooming out during the 30 year comparison, reading way too much into when the dataset ends, which currently at the moment is 2021, and hence, the COVID recession, and hence, emissions going down. And so many academic papers I’ve dug into, their significance is screwed up by using 2021 as an endpoint and comparing it to 2015 or 2016 or something.
It’s something we’ve been struggling with. Also, because institutions are struggling with it. There’s a bunch of data sets right now where we’re still waiting for the latest version, because the latest version includes the pandemic. And so they’ve been taking months to publish them. Most recently, in July, we finally had the latest update of the UN population data. But it took them an extra year to publish it because obviously, it’s very hard. You don’t just extrapolate as you usually did. They had to subtract more people who died from COVID. So life expectancy went down in almost every country. Way more people died, and so the population difference from one year to the next was much lower than it usually is, or much higher for countries where the population is shrinking. And so we have the same thing right now for the Global Burden of Disease study where we’ve been waiting for, I think, more than a year now for the latest version, and it was supposed to be for the summer and now we doubt that it’s gonna be this year. And they’re doing their best. It’s just extremely hard to factor into your model that kind of event like a pandemic, that has never happened before. It’s extremely tough, and obviously, I wish them the best of luck. Yeah, we’re waiting for the data, and as soon as it’s there, we’ll show it, but in the meantime, I can’t imagine how extremely difficult of an effort it must be.
So I’m curious. Just putting aside specific examples we’ve been talking about. Are there any ideas for data visualisations or graphs which you’d really love to see but maybe they just aren’t such a great fit for Our World in Data because they’re a bit too creative or artistic or weird or editorial or anything like that?
Yeah, I don’t necessarily have specific ideas, but as a broad point, I think it’s very good that a lot of media outlets are trying to make those visualisations that just would not make much sense as OWID visualisations. So, we tend to think of our added value as providing stuff that is quite standardised. Basically, most of it looks like a bar chart or line chart or map, or sometimes a scatter plot. And we think our added value is to make sure it’s clean, accurate, up to date, and that there’s good context and a good title, a good subtitle. And that’s basically it. We try to do that in the best possible way. What that means is that we never spend time building very customised, complex visualisations of things. And it’s not because we don’t think they’re valuable. It’s just because we think our value lies in the systematic stuff. So for example, something we really like to read, something I really liked recently, was the Financial Times feature on space debris and space pollution, which was this crazy complicated feature with 3D graphics showing space debris around the planet. And I don’t think I’ve seen something as fancy as that in a few months, or even a few years. But I thought it was extremely good, because it gave a very good sense of what the problem is there. It was extremely beautiful. Extremely well done. To me, the limitation of that is that I don’t think that the FT has a plan to update that, which is not necessarily that big of a problem for space debris, because the particular description of the problem is going to stay current for another few years. But probably the total number shown is gonna go up by another few orders of magnitude in a few years, and then that article will be out of date. And generally speaking, we try to provide what other institutions and media outlets tend to neglect, which are things that are maybe less nice to the eye, less sexy, but that we make sure we can update on a timely basis.
I think also things that would be similar would be heavily customisable scenarios on climate change, where people could explore data but also use sliders, for example, where they would change different assumptions about what happens, or policies that are implemented, to try and get a sense of how this could play out in terms of climate change effects. I think that would be extremely good, and I would like to see that. Things that would also be heavily customised to give people a general sense of time and space are good. I think it’s something that’s very difficult to do. Kurzgesagt published an app that tries to give people a sense of how things compare in space. And I think for obvious reasons, for longtermism, doing that over time would be a very good idea. And I don’t exactly know how this should be done. Something I really liked, by the way - I don’t think it’s available anywhere online - was at EAG Prague, where there was this film that was shown, where they tried to do exactly this, where each sequence of time was one order of magnitude of time. And so it started at the Big Bang, and then it would slowly compress time, up to the point where the last couple of minutes of the film were real time footage of the drinks just before the film was shown, like showing people getting into the theatre to look at the film, and then finally footage of us watching the film. I thought that was really good. And I think more experiments like this to give people a sense of how big time is, and obviously trying to do that for the future as well, could be very good.
Yeah, awesome. Like I said, I think this is a really great idea.
Fin, do you want to say some of your ideas on this as well? Not to put you on the spot.
I mean, not especially different from what Ed was talking about, but I think the core idea is being able to start, like Ed said, on familiar timescales, timescales of days or years, and then you can zoom out to timescales of centuries where we see major historical events. And then you keep zooming out, and you keep zooming out, and you keep zooming out. And after a while of zooming you see just about the point where Earth stops being habitable. We could live that long really feasibly. And then you keep zooming out, and maybe your finger gets tired from scrolling, and you see this is where the last stars begin to form or eventually burn out. And just seeing the incredible differences in orders of magnitude of time. I can imagine that being quite powerful and affecting. Honestly one problem is that if you just look at the numbers, especially on the size of the future, it’s so long that you just have to scroll for so long.
You run into a problem - you’d have to make it interesting. The user experience: at some point, you need to zoom out for one minute entirely to get to an order of magnitude where something has-
There are so many amazing precedents for this. So there’s a video called Powers of Ten, from the 70s. It starts with someone’s eye, and then it’s a camera zooming out and they’re lying on a field of grass, and it zooms out to the entire Earth. And then even through our solar system and the universe. There’s also a time lapse of the universe video, which I’m sure is Googleable. The Stockholm Metro - Anders was telling me this - has an evolution timeline, which you see when you’re in the train in the tunnel and you drive past history, which is really cool.
Maybe just to add: one of the challenges it seems here is to keep people’s attention for that long or to make it interesting that long. I want a visualisation that doesn’t compromise with that. The thought I have here in mind is this push notification where you don’t zoom out to log scale, or orders of magnitude or something, but it’s an incredibly long time over which these push notifications get sent, so that, when I’m 80 and I’m on my deathbed, that’s when I get the push notification that the last few minutes now died.
Getting a feel for how log scales translate to linear scales is really important across a bunch of contexts. And a way to do this is, ‘Here’s how long something would take. Here’s how long it would take if it was two orders of magnitude longer.’ And you just have to wait that long to see. I think another great example I forgot to mention is Carl Sagan’s idea of a Cosmic Calendar, on an episode of his show Cosmos. You imagine the history of the universe from the Big Bang to the present day as a calendar year, and you could ask what happens at different dates in this year if you scale all of history down into a single year. And of course, the punchline is that everything happens in the last five minutes before New Year’s celebrations. But I really love these analogies.
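The Cosmic Calendar scaling described above is simple proportional arithmetic, and a minimal sketch may make it concrete. The age of the universe and the event ages below are rough, assumed round numbers, and the function name `cosmic_calendar_date` is made up for illustration; it is not taken from Sagan or from anything discussed in the episode.

```python
from datetime import datetime, timedelta

# Assumed round figure for the age of the universe, in years.
UNIVERSE_AGE_YEARS = 13.8e9

# The calendar year everything is squeezed into (any non-leap year works).
CALENDAR_START = datetime(2023, 1, 1)
CALENDAR_LENGTH = timedelta(days=365)


def cosmic_calendar_date(years_ago: float) -> datetime:
    """Map an event that happened `years_ago` onto the one-year calendar."""
    # Fraction of cosmic history that had already elapsed when the event occurred.
    fraction_elapsed = 1 - years_ago / UNIVERSE_AGE_YEARS
    return CALENDAR_START + CALENDAR_LENGTH * fraction_elapsed


if __name__ == "__main__":
    # Illustrative, approximate event ages (assumptions, not sourced figures).
    events = {
        "Big Bang": UNIVERSE_AGE_YEARS,
        "Earth forms": 4.5e9,
        "First writing": 5_000,
    }
    for name, years_ago in events.items():
        print(f"{name}: {cosmic_calendar_date(years_ago)}")
```

Changing `UNIVERSE_AGE_YEARS` or the event ages shifts the dates, which is a quick way to see how compressed recent history becomes on a linear scale.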
And I think that’s an important point. Because we tend to use log to solve this question of, ‘it’s not possible to wait for the actual amount of time to show things.’ So we go up by powers of 10, or we go down by powers of 10. But I think there’s something strong to also show things in the actual scale they have. One typical thing that people do when they learn astronomy is to represent the solar system in its actual scale by using a grain of sand for a given planet, and then one person from the classroom has to go to the other side of the school to represent the distance to this planet. And I think that’s really good, because it doesn’t compromise on the scale. And it shows you that things are just mind-bogglingly big.
Some things are really big. People listening can Google, ‘If the Moon Were Only 1 Pixel’, which is exactly the thing you mentioned. Just on your browser, where you just scroll to the right.
I remember this being almost ridiculous, the amount of time you had to spend just scrolling through the page. But I think it’s a good example. Yeah, you have to scroll for a really long time.
We should also maybe plug the Long Now Clock as well.
Right. Have you heard of this Ed?
No, I don’t think so.
So the Long Now Foundation: Stewart Brand’s interested in various advocacy and outreach projects around understanding ourselves as situated in very long timescales. So, for instance, they write out dates as ‘02022’ because, of course, it might go above four figures. The big art project is an enormous clock in a mountain in a desert in Texas called The Clock of the Long Now, and I don’t know the exact details, but it chimes every decade or something like that. It’s great.
I like it. I’m looking at it now. This is pretty cool.
The Value of Transparency
Okay, cool. Let’s press on then. One thing I was curious to talk about is transparency. All the data Our World in Data relies on, it’s all public. It’s on GitHub. It’s open source. Presumably, this actually requires a lot of effort on your part to maintain all that. So I wanted to just ask: where does the value of transparency come from? I think everyone agrees it’s really great. But let’s zoom in on the concrete reasons for being transparent. How do you think about that?
I think there’s a few strong reasons to favour that, and all of them have been pretty obvious during the COVID period. The first one is that as soon as you’re being transparent, you allow people to contribute. If things are closed, or behind a paywall, or just not public, people are basically facing a complete impossibility to contribute to your data. And that mattered because, for COVID, we needed people to provide us with numbers - the first few weeks of the vaccination data collection was mostly me getting emails and comments and tweets from people sending me numbers from their country, and telling me ‘Oh, it was just announced that 11,000 people had been vaccinated.’ And without this kind of contribution, I would have been incapable of doing this, because that was done in dozens of different languages, in countries where I have no idea what kind of media I can trust and things like that. So that was very useful. Another benefit of transparency is that it means that there are always extra pairs of eyes checking what you do, which can sound like a bad idea, or something not really desirable. When you’re working on something that’s quite impactful, and you want to make sure that what you’re doing is right, it can sound a little bit stressful at first to put your code online and your data online and have people second-guess everything you do. But to me, it felt really reassuring. During that period where I was doing all the vaccination counting, it would have been much more stressful for me to make the whole process secret, and then just spit out numbers, and ask people to trust me. The fact that all of the code was online, and all the processing was online, means that, yes, sometimes people told me, ‘Hey, I think what you did here is wrong.’ And maybe it was wrong, and I fixed it. But that felt much more reassuring than trying to do all of it on my own and telling people to trust me.
It also means that sometimes when people do accuse me of having done something wrong, or having changed the numbers or something because I’m pro-vaccination or something, when you use something like GitHub, you have a permanent track record of all the changes you’ve ever made to the data and to the code, which is very useful, because that means you can go back to any point in time and show people how the data used to look, when it has changed, what has made it change, what line of code is responsible for the change. It has rarely been useful. But when it is useful, it’s very useful. It’s very useful to prove to people that, no, you haven’t cheated on the numbers, you haven’t changed anything to look in a particular way, you’ve just imported the data again, and it looks different from the hour before, for example. And so these are all strong reasons to, I think, favour transparency. The last one is more performative, I think. It’s the idea that, let’s be honest, more than 99% of people who browse Our World in Data will not check the data, they will not check the code, they will not open GitHub in any sort of way. But there’s something very strong in letting them know they can. Even if they don’t. The fact that they know they could possibly, if they wanted to, double check what we did, and second-guess our assumptions, lets them know that we’re the kind of organisation that is open, and that doesn’t try to hide anything.
It’s like a signal of trustworthiness, right?
Yeah, it’s purely a signal. And even if, again, it’s not used that much, people trust that signal. Whereas if something is hidden behind closed doors, it signals that maybe you have something a little fishy there.
Yeah, I guess it’s like if a magician is doing a trick, and he’s like, ‘Hey, would you like to shuffle the cards before I do this trick?’ And even if you don’t take the option to shuffle the cards, it’s like a signal that-
Yeah, I like that.
Yeah, it’s very similar to this. For some of the institutions that have been struggling with criticism of their methods, and I think the Global Burden of Disease study is particularly in that situation, part of it comes from actual problems in the data, but also part of it comes from the lack of clear explanation of some of the methods they use. It has also happened with COVID estimates from some institutions, where they’ve churned out forecasts of COVID cases and didn’t make the code available in any sort of way. So I think, when that happens, it’s much easier for people to think, ‘Okay, not only do the results look weird, but they don’t want to make the code available.’ And this is really not a good signal.
Well, you can also just imagine that you would want to fix something as criticisms get pointed out or as feedback comes in rather than to let it all build up, and then suddenly, it’s too much. And then you need to throw your whole model out and start all over again.
Yeah, it also relates to the timeframe that these institutions are working on. I don’t think it’s necessarily good that these big datasets are published every two years with a huge release update every two years, and that if something goes wrong during that update, then that mistake is going to be there for another two years. We just found out yesterday about a small problem with one of the population estimates for the tiny territory of Sint Maarten that belongs to the Netherlands. And it’s not a big deal. It’s a few thousand people, maybe, I’m not sure. But there’s a big problem in the estimates where they basically were showing that on the entire island no one is aged in their 30s. Everyone is either a child or older than 40. And it’s obviously a mistake. And the problem is we asked them and they told us, ‘Yeah, we found out about the mistake. And it’s going to be corrected in 2024 when the next update is done’, which again is something that would never happen in any kind of open source model, where we would just ship an update within the next day or something to fix the mistake. Because the whole process is not done in an open source, interactive way, we rely on the old way of doing things.
Can EA organisations learn from open-source models?
Thinking about open source software, again, it’s cool that, as well as being more democratic and legible, scrutable in many ways, software is often just better in virtue of being open source. It’s often more stable, often more trustworthy, and so on, which prompts this question: how good are the EA orgs at taking on this lesson about transparency, concretely speaking?
It’s something that I think EA organisations should do more of. One limitation of that is that because I’ve never worked in those organisations, it hasn’t been exactly clear to me how much of it is really an opportunity. Recently, in the last few months, I’ve talked to some people at OpenPhil, at GiveWell, at other organisations, trying to understand whether they have datasets that can be published, whether they have code that can be published. Some people told me that I’m kind of overestimating the amount of code that there is or of data that is being used. And that it’s mostly spreadsheets and stuff like that. And the problem is that I think spreadsheets are, on average, a good thing. I think there’s a big trade off. Because initially, I thought my advice should be that they should get rid of the spreadsheets and replace them with code, because code is more transparent, and the problem of a spreadsheet is that it’s very good at showing transparency, but it’s very bad at tracing back what you did. When you land on a given spreadsheet, it’s extremely difficult to actually understand, because you start clicking on cells and seeing how they relate to one another. It’s very hard to understand what they did from the first one. Everything is interconnected, and it’s very difficult. I think, maybe, if code was used, it would be much simpler to just see linearly, like, ‘Okay, there’s this assumption first, and then it’s multiplied by this, and then it’s multiplied by that.’ The problem is then there’s a huge trade off, because there are millions of people who know how to use a spreadsheet, and there’s a fraction of those who know how to use code. And so I think there’s very good arguments for making things available in spreadsheets rather than code.
There’s something really interesting about a fundamental difference here between spreadsheets and code. Whereas as you said, when you’re reading code, you’re literally reading through the methodology. That’s what is happening versus, on a spreadsheet, every cell is a result. And there’s a core kind of difference here in how people can interpret or engage with it.
Yeah, when I land on a spreadsheet, even a very good one, I feel like I need to reverse engineer what happened. I need to trace back the thought process of the people who made it. When I open a piece of code, I’m not reverse engineering, I’m just reading what they decided to do line by line in the exact order that they decided to do it. So I think as an instrument of transparency, code is much better than a spreadsheet. But you don’t just need to think about the transparency, you also need to think about the number of people that are impacted by this transparency. And in that regard, a spreadsheet is many, many times better than code. And if GiveWell was just publishing their analyses as R files or Python scripts, I think that would be a bad idea.
And in an ideal world, part of me thinks, ‘Why can’t you have both?’ Or at least, on the GiveWell level, where there’s a single analysis, you want to have the fancier code, which also lets you do things like probability distributions and stuff like that, but then also have this spreadsheet as more of a layperson thing, where maybe the only thing you are interested in is the central estimate. But, yeah, having both would be useful.
Or things which look a bit like code with the power of spreadsheets, or spreadsheets which are as structured and legible and linear as code. So Guesstimate being an example of the second thing. And then another project from QURI, which I’m very excited about, which is Squiggle: this is a programming language for making estimations where you have a lot of the tools that come in a spreadsheet, but just in a language.
Yeah, it’s a bit of a Catch-22 where you could imagine millions of people know Excel, or tens of millions of people know Excel. Millions of people maybe know Python, and maybe 100 people know Squiggle.
Soon a million.
Yeah, I’m sure to come. Also, just talking about tangents, why isn’t that more popular in banking and consulting? This still really gets to me. Things like Guesstimate or things like Squiggle.
I think some kind of software that would try to bridge that gap and provide both a spreadsheet view of things with the underlying code behind it, for people who want to read the code, I think that would be extremely good. Because the fact that spreadsheets are better leads people to use spreadsheets and not code, which I think is good. But that means that they never publish code, which I think hides some of the assumptions. Part of it is also that I think when EA organisations use code, which, as far as I understand, is still the case sometimes, I think they should publish it. And the problem is that never happens. And this is not so much an EA thing, it’s just a research thing in general. Most researchers hate publishing their code, because they think it looks bad, which it sometimes does, but I think that’s okay. And I think we should also kind of cultivate a culture of making it okay to publish code, even if it’s not the most beautiful piece of code in the world. Whereas right now, I feel like a researcher, if they’ve produced a piece of analysis, they feel like to be able to publish the code, they would need to spend another couple of days cleaning it, polishing it, documenting it. The problem is they need to get on to the next one. And so they don’t take those couple of days, and so that piece of code never gets published. And so we get to a situation where if you want to know exactly how they did it, you need to email them, they need to get back to you, they need to send you the file, and maybe they procrastinate, because, again, they think they need to clean it. And so in the end, you’re not even quite sure if you’re going to get that code. So I think it’s important that people start publishing those pieces of code, again, with this idea that there’s a good chance it’s never going to be read anyway, which is, again, what I was saying about people usually not checking anyway. And I think it’s good if one person next year wants to read your code.
And maybe they’re going to make an extra effort to read it, because you haven’t made it super beautiful and super well documented, but I think it’s better than not publishing anything under the excuse of, ‘Oh, it’s not perfect.’
And on the face of it, this feels like such a cool part of reasoning transparency, which a lot of EA organisations champion, which you’ve spoken with Michael Aird at length about, famously, and it comes back to what you were saying earlier as well: that if you’re trying to make the world better, and you’re wrong, you want to know that you’re wrong, and you want to be corrected. And that’s what matters at the end of the day, and it’s important to spell out assumptions and make things legible.
And I think that feeds into the idea of what would happen if everything was code. If everything was code, that means that EA organisations could publish those analyses, and then other people externally could read them, and create what we call ‘pull requests’, which are requests to change the code. And they could say, ‘Okay, on line 18, you’ve made the assumption that you should multiply this by 4.5. But I think it’s close to 5.2.’ And they could justify why. And then when the EA organisation would maybe validate that change, then the whole pipeline could be run instantly, assuming that change. And the whole report could be published again, without having to rewrite anything because the report would be produced using something like Markdown or Jupyter Notebooks. And I’m describing a world that is quite different from the way that things are currently done, but it’s not that far off. Publishing reports in Jupyter Notebooks is actually something that a lot of researchers do routinely. And it’s just that I think you need to get to that point of making it systematic, maybe some organisations doing it so that other ones look a little bit bad if they don’t. And I think that’s pretty important. For example, right now, as far as I know, Rethink Priorities is the only EA organisation that has a GitHub repo. I looked at it six months ago, so maybe it has changed a little bit, but I couldn’t find any official repo for OpenPhil, GiveWell, or anything else.
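As a rough illustration of the workflow described above, here is a minimal sketch of an analysis whose assumptions are explicit, so that a proposed change (such as 4.5 becoming 5.2) re-runs the whole pipeline and regenerates the report text. Everything in it is hypothetical: the `Assumptions` fields, the numbers, and the report wording are placeholders, not any organisation’s actual model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Assumptions:
    # All names and numbers here are hypothetical placeholders.
    people_reached: float = 1_000.0
    effect_multiplier: float = 4.5  # the kind of value a reviewer might contest


def run_pipeline(a: Assumptions) -> float:
    """Each step is read top to bottom, in the order it is applied."""
    estimate = a.people_reached
    estimate = estimate * a.effect_multiplier
    return estimate


def render_report(a: Assumptions) -> str:
    """Regenerate the report text from the assumptions, with no manual rewriting."""
    return (
        f"Assuming a multiplier of {a.effect_multiplier}, "
        f"the estimated effect is {run_pipeline(a):,.0f}."
    )


original = Assumptions()
# The reviewer's suggested change, as it might arrive in a pull request.
proposed = replace(original, effect_multiplier=5.2)

print(render_report(original))
print(render_report(proposed))
```

In a real setting the change would arrive as a diff to the file holding the assumptions, and the report would be rebuilt by whatever tooling the organisation uses; the point of the sketch is only that the assumption, the calculation, and the published sentence all come from one place.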
It’s so doable, right? Luca, you can speak to this, you’ve been thinking about questions like the social cost of carbon. These are questions which involve lots of complicated inputs, lots of complicated ways of combining them and thinking about them, and often injecting guesswork, where it’s hard to find hard evidence, but then there is a bottom line number. And you can imagine if someone was hosting their attempt at a problem like this on GitHub, or maybe it’s a Jupyter Notebook, you go through, you see, ‘Oh, I maybe disagree with this estimate, this discount rate. I plug in my numbers. Let me make a pull request on your thing. Here’s how it’s different. Here’s how we get a different answer.’
I think one way to frame it is that it’s not just about the tool of GitHub and using it to publish stuff, but it’s also about the culture around it. That culture of pull requests, of writing down issues, of forking a repo to create a copy of it for yourself, doing something slightly different with it. It’s the whole idea of reusing people’s code, reusing people’s data to improve upon it and improve code, usually, but here it will be improving knowledge and basically building up on other people’s work.
And there’s an informative value here, as well: if you let people play around with your assumptions which you flagged as very sensitive or very crux-y, then you get somebody to see, ‘Oh, if I change this by half x or 2x, oh, wow, the bottom line is super sensitive to that.’ I feel like that is often the really informative lesson that you get.
And in some way, if you do that properly, you’re offloading work from yourself, because when you do it in a closed way, you’re going to publish your analysis, and then you’re gonna get 20 different comments asking you to do some kind of sensitivity analysis. They’re gonna ask you, ‘Have you changed this? Have you tried changing that? And what does it do?’ You’re basically creating work for yourself by having to do those analyses yourself. Whereas if your report came with a piece of code on GitHub, people could fork it and actually change the value themselves and see for themselves whether it changes anything. Probably not all of these 20 people would do it, but at least some of them would, and you would make your life a little bit easier.
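The kind of do-it-yourself sensitivity check described above can be sketched in a few lines. This is an assumed toy model, not anyone’s real analysis: `bottom_line`, the parameter names, and the baseline values are invented for illustration, and the halving and doubling factors simply mirror the ‘half x or 2x’ idea from the conversation.

```python
from typing import Callable, Dict, Sequence

Params = Dict[str, float]


def bottom_line(params: Params) -> float:
    """Hypothetical model: total benefit per unit of cost."""
    return params["benefit_per_person"] * params["people_reached"] / params["cost"]


def sensitivity(
    model: Callable[[Params], float],
    baseline: Params,
    factors: Sequence[float] = (0.5, 1.0, 2.0),
) -> Dict[str, Dict[float, float]]:
    """Scale one parameter at a time and record the model's output."""
    results: Dict[str, Dict[float, float]] = {}
    for name, value in baseline.items():
        results[name] = {}
        for factor in factors:
            trial = dict(baseline)  # copy so the baseline is never mutated
            trial[name] = value * factor
            results[name][factor] = model(trial)
    return results


# Placeholder baseline values, chosen only to keep the arithmetic easy to follow.
baseline = {"benefit_per_person": 2.0, "people_reached": 1000.0, "cost": 500.0}

for name, outcomes in sensitivity(bottom_line, baseline).items():
    print(name, outcomes)
```

Someone who forks a repository containing a model like this can swap in their own baseline or factors and see for themselves how much the bottom line moves, without asking the author to run anything.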
I’m curious what you would make of some EA-specific objections to making everything transparent, or making everything code. So one of the reasons here might be that a lot of this analysis stuff is just sensitive. The term ‘info hazard’ often gets thrown around in the bio space, but you could totally imagine - sorry that makes it seem trivial - in biosecurity the term ‘info hazard’ often captures this idea that there is just some information or some analysis that you just don’t want to make available. And then especially within a lot of organisations, it’s often around grant making, and that also involves personal relationships and putting numbers to organisation-specific attributes or stuff that you don’t want to make transparent or explicit. How would you think about incorporating some of these concerns?
I think it’s definitely an important part of the problem. And I think if you were doing some biosecurity analysis, maybe you wouldn’t want all of the assumptions to be made public, or all of the information you have, because maybe you got some piece of information from someone in the government, and you don’t want to make it available. So I think there’s definitely an argument for removing some of the analysis steps, but I think then you should do it that way - you should start from a standard, the default should be transparency, the default should be ‘Let’s show all the steps.’ And then when you think you’re in a particular situation where there might be an info hazard, or there might be something that is easier or better to hide away, then you can hide it away specifically for that. But generally, we’re in the opposite situation where the default is to not really publish anything, and sometimes, through extra effort, publish a piece of code, or send it by email to someone who’s requesting it. But yeah, I think we should just reverse the situation and still be flexible from there.
Let’s talk about data journalism. Obviously, Our World in Data is focused on just publishing objective data, and the idea is that people can make their own minds up about the upshots and what they can take from it. You can imagine other things which are more editorial and more just advocating for what seems important. And you get this idea floating around occasionally of, what if there could just be a news site that is really squarely focused on which big picture stories in the world do actually seem like the most important, and can we communicate them in a really data oriented way? I’m curious if you’ve thought about that, whether you think it’s been tried? Whether it could just happen, or maybe there’s some reason why no one would be interested in it?
I think it could, and I think there’s space for it. And I think, ideally, I would want you to do it. The reason for that is that we sometimes find ourselves in situations where we don’t publish stuff, because we know it’s going to be out of date soon. Like when the war in Ukraine started, we were very tempted to start writing short blog posts, explaining stuff through data. But then that contradicts our general idea of making OWID an evergreen website, like Wikipedia, where you don’t stumble on outdated stuff. That’s a thing we want to go towards, making sure that we don’t have dark corners of the website where people can find very old data. But that’s our stuff. But there’s completely a big space for making the opposite decision of writing daily articles about stuff happening in the world, not necessarily trying to update them regularly, but just making them about important stuff happening. But doing that through a lens of OWID-type analysis. I think some of these elements exist. What I’m describing looks a lot like some of what Vox is trying to do, a lot like what FiveThirtyEight is trying to do. Especially, I think FiveThirtyEight is very close to what I’m describing, except that I think the selection of subjects on FiveThirtyEight doesn’t look like what we want to achieve. They are very focused on short term stuff, whether it’s quote-unquote ‘important stuff’, like politics, like political campaigns, or even a bit more trivial things like sports. But I think something that would take the topic selection of OWID and the way of going about treating it of FiveThirtyEight, I think that would be extremely good. The closest thing I know is what people at the FT and The Economist have been doing recently.
I really find that in the last couple of years, especially because of COVID, but also on other topics, The Financial Times data team and The Economist data team have been doing really, really new stuff about how to deal with something like the Ukraine war, or how to deal with COVID, or how to go about talking about some kind of financial crisis, for example, and doing this through the careful use of data by doing some novel data analysis, not just spinning out something that’s already been done. And they have great people doing that internally. And I have hope that maybe the way to go about doing this is not so much setting up a new website, but taking a newspaper that has strong legitimacy and somewhat stable funding already, and creating a team within that that has the power and freedom to do that kind of analysis.
I’m sure actually there’s lots of established journalists who’d be really excited to do this kind of work, and it feels like it’s not for want of talented journalists. It’s probably not for want of resources, and probably not want of demand. I would enjoy reading it, I mean, I’m a nerd, but it feels like some coordination failure where it hasn’t been tried, and maybe there’s some hesitancy about being a first mover, but I don’t know.
I think there’s a little bit of that. I think what’s also coming out now is this idea of making it, as I said, part of something that’s already working well. I think it’s somewhat similar to Future Perfect and, if somebody had tried to build a specific media publication whose name was Future Perfect, that was completely detached, that needed its own funding its own team, its own journalists, its own admin team, possibly that would have failed because maybe there wasn’t space for just that. Maybe you need to make it part of something that already exists. And maybe the reason why the FT data team has been able to do that is that they are part of the FT. So the FT gets money from advertising because they write about tons of other stuff about politics and finance and leisure stuff and travel and they can get their funding through that. And then when they get into a stable enough situation, they can say, ‘Okay, you know what, you and a bunch of other people are going to create a small five person team to write the best possible data oriented analysis on current events. And we don’t really care if you generate enough traffic, we just want you to try and do that the best possible way.’
One thing I want to give a quick shout out to, if we’re talking about data and news, is the Tim Harford More or Less podcast as well. It’s less about visualising data. It’s a podcast or a radio show in format, so it’s not quite possible, but they do a great job of picking some of the main numbers that come out from week to week and giving context on where each number comes from, and what the methodology is there. And maybe that’s also another thing to consider: what would that look like with more of an EA flavour, of not just picking the headlines that are in the news because other newspapers are talking about them, but selectively picking up a number and then giving the context of how it got produced or what the methodology behind it is?
Ed’s career advice
Free podcast idea, that sounds great. If you’re down, Ed, I’d love to chat about your own career for a bit. Maybe one question is just: what does Ed’s early career look like? And how are you imagining that it would turn out back then?
I certainly did not think I would be doing what I do now. So to trace it back, I had a few bumps in the road. I mean, all of them look like a potentially successful path. It’s just that I didn’t like a bunch of them. I initially started at Sciences Po in Paris, which is roughly an equivalent of a typical Oxbridge PPE programme both in the sense of it’s the same kind of stuff you study when you’re there, but also in the sense of the prestige around it, both in the positive way of your parents are very happy if you get into there, but also in the negative sense of it’s considered quite elitist and a little bit arrogant to have done this, and people don’t necessarily always see it in a good light. So I did that. And then I think when I got into this university, what I mostly wanted to do back then - I mean I was 17 - was to be a journalist. Like an international reporter or something, which is the kind of thing you really think you’re going to be doing when you’re 17. And obviously once you start learning about things, that kind of idea also translates into an unstable job and a precarious situation, and then maybe it’s also something you realise you’re not that interested in doing. So, at the end of my bachelor’s, I realised that actually, I was more interested in doing more technical work. And I think for some reason at the time, maybe just path dependency, it didn’t feel like the right time to leave that university to get a different master’s. So I stuck with the list of master’s degrees available at the time, and I studied marketing, which, because there was some specialisation in digital marketing inside it, felt at the time like the closest possible thing to a computer science degree. Which, let me tell you, did not happen, because, obviously, just doing a Digital Marketing MSc doesn’t give you computer science skills. So yeah, I was pretty disappointed by that.
I did work in marketing, and especially social media marketing and comms for a few years, which was interesting in many ways. And it’s taught me a lot about communication and social media, and how to grab people’s attention and how to publish something that looks interesting, which - we’ll talk about this later but - is actually something I end up using now a little bit. So it’s actually been interesting now. But after a few years in marketing and comms, I realised that was not for me. And so at that point, I decided to start from scratch and learn data science. And thankfully, at the time, that was the golden age of online courses in Coursera and edX and all that. It was 2014, around that time, and everything was free. You didn’t have to pay for anything. And you had basically the best universities in the world, rushing into those websites to make their courses available for free. It’s quite different now, you often need to pay, but at the time it was a goldmine.
Why did this happen? Why were universities just throwing out their courses for free?
Because it was the phase where everybody was talking about MOOCs and online courses and, ‘Oh, we need to do that’. And that was a race for the university that would provide the highest number of free courses online with the best possible quality. And none of them thought about really making people pay for it. And that just lasted for maybe a couple of years at best. Then they started saying, ‘Oh, actually, if you want a certificate, you need to pay 100 quid or something.’ And that’s the situation right now where you can just audit those courses and not do the exams and you don’t get the certificate, right. But at the time, you could get everything for free.
But still, like 100 quid for a certificate versus-
Yeah, I think it’s still very good. And I think it’s still very useful and I often advise people to look into those kinds of online courses. But not only was that the golden age of online courses, but it was also the golden age of data science. And so within those online courses, the one thing that was everywhere was machine learning, data science, and AI. And so I thought, ‘Hey, that looks cool.’ It also sounded appealing, because I realised that one of the few things I liked about the marketing thing I was doing was when clients would ask for a report in a spreadsheet. I actually enjoyed doing that. So I thought, ‘Hey, maybe I could give this data science stuff a go.’ And so I started learning online, dozens of online courses. And I basically spent my evenings and weekends doing that. And then the big jump was getting a job from that, which is difficult because, especially at the time, people didn’t really like online courses, that didn’t really sound like a serious thing that people should be doing. So at the time, I moved to Oxford with my ex-partner. And so I had to find a job. And so I applied for a job at the university to do data science for public health. And through a lot of open mindedness of the people I was interviewed by, and also, me highlighting some things more than others on my CV, I was able to get that job. I think it’s not so much about lying. It’s about getting through that barrier that some people might have. I knew I could do all of this stuff because I had done it repeatedly in courses. But it’s just that some people don’t really like the idea of doing it through courses. And so if you say you’ve done it in your previous job and then they give you the test because you’ve said that and through the test you actually make it and the output of the test is great, then they trust you, but if I had said, ‘Oh I’ve only ever done online courses’, maybe they would have stopped there and not given me the test.
You should build up a portfolio as well whilst you’re doing the courses-
Yeah, so I think what would have been better, and what I advise people now, is not so much to polish and highlight something on your CV, but to actually build a portfolio, which I think is a much better, much more legitimate way of doing it. While you do these online courses, find a question or topic you’re interested in, and start doing smaller pieces of analysis. So that when you apply for a job, you don’t just have the exercises from your courses to show but you also have some personal piece of analysis, even if it’s small, and not super ambitious, but you can show that, ‘Hey, I tried to analyse the CO2 emissions of various types of food. And I made this report, and I published it on a blog. And I did that for a few other pieces of analysis. And I published six different articles or something.’ And if somebody came to me with this kind of portfolio right now, I would be extremely interested. And obviously, I would look at the quality of what they did, and whether it’s actually interesting and insightful. But the fact that they did it would convince me that they are somebody who might be a good fit to work at OWID.
Yeah, I know of instances where EA orgs have been in a position where they can make someone an offer, not just because they have impressive credentials or anything, but because they have done online courses and taken the initiative to just put together some projects under their own steam, and maybe it’s little bits of analysis, like you said, and those little projects didn’t plug into anything bigger, but just the fact that they have gone out and done these things of their own accord shows not only that they’re capable of doing it, but also that they have the drive to do it themselves. It’s great.
And I think that drive aspect is really important to highlight, because when you do these things, I remember thinking, ‘Oh, but what could I possibly write about that would be insightful? I’m just somebody learning, I’m not going to write anything world changing. So therefore I should not write about it.’ And again, the point you made about ‘Drive is important’, when I look at this portfolio, I’m not expecting to read something that will change my mind about some issue in an incredible way. I mean, these are students. I expect to see something that tells me that they are really interested in doing some kind of OWID style analysis of the world. And they are capable of finding the motivation and they have the skills to do it. I don’t expect that they magically found some crazy piece of analysis that no one has ever thought of. But when you’re a student and you do it, you weirdly think that way. You think, ‘Oh, I need to find a very original piece of research.’ And it’s not the case.
Zooming out on when we’re thinking about what makes for a good research-oriented data scientist, can you talk about some other skills or attributes that you think are relatively overrated or underrated?
Yeah. I think at least for the people I tend to hire on my team, I think of some things as overrated. I think people tend to think that you need a PhD to get into Our World in Data. That’s not really the case. It probably is the case for the research team where you would need to either have one or to be on your way to get one. It would be surprising if we hired somebody who didn’t have one at all. But for the data team, that’s really not a thing you need to have. Another thing is lots of programming languages. When people start learning how to programme, they hear about all sorts of languages. And they start building this idea that they need to know three, four or five of them. It’s really not the case. For me, it’s much more useful to hire someone who knows one language really well, especially if it’s Python, rather than someone who knows three of them in a way that’s not really useful because they need to constantly check what they’re doing. Another thing, and it’s an important one, because I keep telling people if you’re going to join us, make sure you don’t care about that, is machine learning, AI, cloud stuff, anything super fancy that most people when they study data science are interested in and excited about. We don’t do any of that. I never run any kind of machine learning model. I don’t use any kind of fancy AI, cloud driven model service. I used to do that in previous jobs. And that was very interesting. But most of the work we do at OWID in the data team is importing CSV files. Some of them are big, most of them are small, and we clean them and we reshape them. We harmonise country names. We check the units, we change a few things, we divide by population, and then we output another CSV, and then we make pretty charts with it. I’m simplifying but it’s basically the case.
And so I think somebody who would be really excited to do some statistical inference, or modelling of projections or forecasts, or just anything that sounds a lot more like actual statistics or machine learning, I think they would be probably pretty quickly disappointed by what we do.
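To make that description concrete, here is a minimal sketch of the kind of pipeline Ed outlines: read a CSV, harmonise country names, divide by population, and write another CSV. It is an illustration only, not OWID’s actual code. The alias table, the column names (`country`, `year`, `value`), the function names and all the numbers are made up for the example.

```python
import csv
import io

# Hypothetical alias table: source spellings mapped to one standard name.
COUNTRY_ALIASES = {
    "USA": "United States",
    "United States of America": "United States",
    "UK": "United Kingdom",
}


def harmonise(name):
    """Return the standard country name, or the trimmed input if no alias is known."""
    cleaned = name.strip()
    return COUNTRY_ALIASES.get(cleaned, cleaned)


def per_capita(raw_csv, population):
    """Read (country, year, value) rows, harmonise names, divide by population."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        country = harmonise(row["country"])
        year = int(row["year"])
        pop = population.get((country, year))
        if pop is None:
            # Skip rows we cannot normalise rather than guess a population.
            continue
        rows.append({
            "country": country,
            "year": year,
            "value_per_capita": float(row["value"]) / pop,
        })
    return rows


def to_csv(rows):
    """Serialise the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["country", "year", "value_per_capita"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up input data, for illustration only.
RAW = "country,year,value\nUSA,2020,1000\n UK ,2020,300\nAtlantis,2020,5\n"
POPULATION = {("United States", 2020): 500, ("United Kingdom", 2020): 100}

ROWS = per_capita(RAW, POPULATION)
print(to_csv(ROWS))
```

In this sketch, rows without a matching population entry are dropped silently. A real pipeline would more likely log or flag them so that someone can check why the match failed.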
What about the skills that people are potentially underrating if they’re aiming at a research oriented data science career?
I think everything that’s underrated is the stuff that’s hard to describe, but that we need as part of our team. It’s also something that when I joined, I didn’t really know that it was even a skill, but then I found out through various people I’ve hired or not hired that there is this intrinsic quality that some people have, which is a mix between knowledge of the research space and how to do research and thoughtful decision-making. And it’s something like, say you’re faced with a situation where you compare two different data sets of CO2 emissions over time, and you have to stitch them together to get a complete time series, but then there’s a weird break in the series between the two data sets and they look completely different for a few years. What do you do? Some people will have pretty good ideas about what to do, maybe what to look for in terms of problems, look at the methods, maybe reconcile what they did, or any kind of thing that would explain what happened here. Some people will be completely puzzled and have no idea how to proceed and they will either be stuck, or they will make a decision that’s extremely bad. Maybe they will remove all of these years, or maybe they will just choose one of them without even looking into what happened. And that kind of thoughtful decision making is usually what you get once you’ve done research. And so if you’ve done a master’s thesis or PhD thesis, usually, slowly over time, you’ll build that sense of thoughtful trade-offs, thoughtful decision making. The problem is that for the data team in particular, we are looking for people who also programme extremely well. And so it’s a tough ask to be looking for people who are very good programmers, but who also have knowledge of research and experience in research.
So usually the best people we have on the team are people who did a PhD, got all of that knowledge and background knowledge of research, know how to do research really well, and then they found out during their research, during their PhD, that (A) they didn’t want to stay in academia, and (B) that they liked coding more than research. And these are the perfect people for me, because they have everything I’m looking for. But obviously what I just listed is a very specific situation. And there are very few people who are exactly like that.
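The stitching problem Ed describes can be sketched in code. The example below is one possible approach under stated assumptions, not a description of how OWID handles it: it compares two series in the years they share and refuses to join them if they disagree by more than a chosen tolerance, so that a person has to look at the break instead of the code silently picking one source. The function names, the 5% tolerance and the figures are all hypothetical.

```python
def overlap_gaps(older, newer):
    """Relative gap (newer vs older) for every year covered by both series.

    Assumes the older series has no zero values in the shared years.
    """
    shared = sorted(set(older) & set(newer))
    return {year: (newer[year] - older[year]) / older[year] for year in shared}


def stitch(older, newer, tolerance=0.05):
    """Join two {year: value} series, preferring the newer one where they overlap.

    Raises ValueError if the series disagree by more than `tolerance` in any
    shared year, because that signals a break that needs human judgement.
    """
    gaps = overlap_gaps(older, newer)
    suspicious = {year: gap for year, gap in gaps.items() if abs(gap) > tolerance}
    if suspicious:
        raise ValueError(f"series disagree in overlapping years: {suspicious}")
    merged = dict(older)
    merged.update(newer)
    return dict(sorted(merged.items()))


# Made-up emissions figures, for illustration only.
OLDER = {1990: 100.0, 1991: 102.0, 1992: 104.0}
NEWER_CONSISTENT = {1992: 105.0, 1993: 107.0}
NEWER_BREAK = {1992: 130.0, 1993: 133.0}

STITCHED = stitch(OLDER, NEWER_CONSISTENT)
print(STITCHED)

try:
    stitch(OLDER, NEWER_BREAK)
except ValueError as error:
    print("needs a closer look:", error)
```

The check only detects the break. Deciding what to do about it (reading both methodologies, rescaling one series, or documenting the discontinuity) is the research judgement Ed is talking about, and the code deliberately leaves that to a person.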
Concepts for getting good at data visualisation
Yeah. What about when it comes to data visualisation, in particular? Are there any useful concepts that you’ve picked up about what separates really great data visualisation from mediocre confusing stuff?
Yeah, I think in general, the reference that everyone likes to give is Edward Tufte, who’s the main person who’s been researching this and writing about this for a very long time. And one concept I like, for example, that I learned from him is the concept of ‘Data-Ink Ratio’. It’s this idea that you should maximise the amount of information given and minimise the amount of ink that you use. And by ink, he obviously means it in a very old fashioned way where you print the charts. But it’s this idea that you should always question, especially when you produce a chart with a default setting in a programming language, do I need those axes? Do I need those ticks? Do I need all this fancy stuff that’s hiding away the information? Do I need a fancy background? And actually, you find out that if you remove a lot of this stuff, your graph is still legible. You should stop when you actually start removing actual information. But you should really think carefully about each of these different pieces of ink that you put on the chart and think, ‘Do people actually need this? Or am I just showing this because it’s typically part of a chart?’
Silly question: when should you use an area chart versus a line graph?
I think that comes down to, very often, just getting used to it. I think once you’ve done this enough, it just makes complete sense. But that brings me to the next thing I was gonna say, which is I think the main thing for me that has made it easier to become better at it is getting feedback. What we usually do internally at OWID is that if someone tries to make a chart that’s a little bit difficult to make, because it’s trying to convey a difficult point, then they make three or four different versions of it, they make four drafts. And then they just put it on Slack and ask everyone, ‘What do you think? Which one is the best? Which one is sending you down a path where you actually misunderstand what I was trying to say? Which one has good elements that maybe you wouldn’t have thought of, but actually, it highlights a specific part of the argument I’m trying to make?’ And then we mix them together, taking the best of each version, and try to turn them into something that really works. I think too many people skip the feedback phase, and they just produce charts, many of them, and they just publish them, and don’t really try to get a sense of how the people who saw these charts actually understood the problem and understood what they were trying to say.
Even just from a design perspective, what’s your favourite OWID chart?
So actually, from a design perspective, none of them are particularly crazily ambitious. We try to stick to line charts, bar charts, very simple stuff. One of the ones I like the best, probably because it’s been the most successful one, and the most viral one, is the chart we have that Hannah made on greenhouse gas emissions across the supply chain. So this is a bar chart that basically shows for each animal, or each type of meat, and animal products, all of the greenhouse gas emissions that are produced by it across things like the farm, the processing, the transport, the packaging, and all that. And it’s a very simple chart, it’s a bar chart, there’s nothing particularly fancy about it. But the way it’s presented has meant that I think this has been the chart that I’ve come across the most times randomly outside of Our World in Data, seeing people who just copied it into books, into articles, into their blogs, onto social media, because they just found it to be extremely useful. Another one, I think, is the - again by Hannah, I think she makes really good charts overall - chart on the opportunity costs of diet changes. So there’s this idea that, very often when people talk about the impact on greenhouse gas emissions of cutting out meat, the only thing they mention is the amount of emissions produced through, for example, the production of beef. And because of that, it tends to underestimate what would happen if we got rid of beef because not only would all the beef stop emitting, especially methane, but also we would be able to use the land that’s currently used for beef production, and we would be able to regrow forests on that land, and actually, most of the effects of cutting out beef would come from that opportunity cost of being able to regrow forests on these lands. And so you end up in a situation where she has this chart that shows that for example, if you cut out beef, lamb and dairy, you would cut out 4.6 gigatonnes of CO2 equivalents.
But then, on top of these 4.6, you have 7.7 gigatonnes, just by being able to put back vegetation on these lands. And I think that’s a really important thing, and she makes this clear through some kind of inverted bar chart that goes from the centre to the left to show subtracting emissions. And I think that’s really cool. And it’s conveying a point that’s extremely important that too many people tend to neglect.
Well, we’ll link it in the write up. But I’m also curious, outside of Our World in Data, are there any data visualisations that you want to flag or highlight?
Here again, I’ll make it a little bit different by not citing one of these crazy visualisations with all sorts of colours that people like to mention, sometimes. I like them, but I think they tend to be a little bit overplayed. I think what I really like is when people try to produce something that is actually usable, that is updatable, and that is impactful at the same time. So I think a very good example of this is The Economist’s dashboard on excess mortality during the pandemic. It does have interesting use of graphics, there’s some heat maps in there, some small multiples. So it’s not like this is completely boring from a graphic point of view, but it’s also not exactly the craziest visualisation in the world. But, I think, ultimately, it’s extremely useful and impactful because it’s easy to understand. It’s easy to use. They didn’t make it a one off effort that they published and then never updated again. They updated it every week. And I think, ultimately, that kind of visualisation ends up being much more beneficial for the world than something crazy with all sorts of different arrows, a static visualisation that looks sexy at first, and people are like, ‘Oh, wow, this was incredibly well made’, and all that. But actually, it’s a little bit difficult to understand. And then because it’s a static thing made on Photoshop or Illustrator, people just never update it. And so it becomes out of date after a few months.
What about the most underrated chart on the Our World in Data website?
I think one of the most underrated ones might be underrated because it’s so simple. It’s some of Max’s charts about income distributions. So in particular, there’s this chart about living standards comparing the income distribution of two different countries. So one example on the website is Max made this chart with the income distribution of Ethiopia, and then next to it the income distribution of Denmark, and the point he made in that article is that, really when you’re looking at income inequality, it doesn’t really matter, first and foremost, where you are within a country or if you’re poor or rich in a country, but it matters basically where you’re born. And that a rich person in Ethiopia will struggle to be richer than a poor person in Denmark, because the income distributions basically almost don’t overlap. And if you look at that chart, it’s basically very boring. It’s two probability distributions next to one another. It looks like a camel’s back. And there’s very little information, but it conveys something extremely important about the state of the world, something that I think a lot of people neglect. A lot of people tend to think that one of the most important problems in the world is income inequality within countries, and it is definitely relevant and something we should work on. But I think a lot of people will tend to not realise how much of a difference it makes just where you’re born. And the fact that even the most successful person in some country will struggle to make ends meet compared to somebody not doing so well in a rich country.
It’s almost like an order of magnitude point, internalising just how big differences can be.
Luca, do you have a favourite Our World in Data graph?
So I got sniped very early by Our World in Data with a lot of econ history things. I thought it was so cool just getting to really zoom back. So a lot of early GDP and industrial revolution texts and stuff I really enjoy. What about you Fin?
I thought, when Ed was speaking, of this graph which I just found. The title is ‘The Yearly Number of Animals Slaughtered for Meat in the World’. And again, there’s really just one point you draw from it when you look at it, which is if the thing you care about is the number of animals killed, the problem of animals being slaughtered is the problem of chickens being slaughtered. 70 billion chickens, the next biggest number is under 2 billion, which is pigs. It’s such a striking graph. And it’s so striking because it’s visual.
It’s kind of a tangent, but is there a graph with fish on that as well?
Actually, unfortunately no. I mean, you could probably look up what the number of fish is. I would expect that it might be-
Because I’ve vaguely looked into this a little bit. I think one of the problems here is that fish almost always get measured in tonnes, rather than number of animals. Whereas chickens get measured in natural units. So it’s really hard to compare.
I can tell you that fishcount.org.uk estimates just under 200 billion farmed fish. But anyway-
Fish and chicken.
Meta-lessons from Ed’s career
Yeah. Cool. So returning to your own career Ed, I realised, we never really asked: Are there any meta lessons that you’ve taken from such a varied career?
I think one of them is that, obviously, it is difficult to switch between careers, but I think on average, people tend to overstate how difficult it is. I think, obviously, for some things, like if you want to become a medical doctor, you probably should choose that pretty early on, and it’s a bad idea to decide that when you’re 30. But for most things, especially things that a lot of people in EA and longtermism are interested in, it is, I think, broadly true that you can pick it up in less than two years. And it’s something that people tend to be really scared of, and be sceptical when people say, ‘Oh, no, AI safety is a recent field, you could learn quickly.’ People tend to be dismissive of that. I think it is actually true. I think some things are very difficult. If you wanted to do impactful research in quantum physics, again, probably good to decide this early on. And it’s a tall order to decide to do this later on. But learning about the current state of AI safety is - again, I might be overplaying it because I’ve done it. But I think if I currently decided to do that, I would be pretty confident that in under two years of work, I could actually catch up, regardless of what I did previously. Another thing is, again, from an EA perspective, to try not to be too attached to one’s professional identity. I think a lot of people get a little bit stuck with the idea of, ‘I’m a researcher’, or, ‘I’m a journalist’, or something like that. And they get this sense of, ‘I want to be the person I thought I was going to be’. And I think what’s helped me in changing those careers has been to try and think, ‘Okay, regardless of what I thought I would be doing at that age, regardless of what I think would be cool, what can be impactful, and what could I possibly learn now?’
It’s like Sunk Cost Fallacy, as applied to your career.
Exactly, and it’s a Sunk Cost Fallacy, both in terms of what you’ve studied and how many years you spend doing it, but also a Sunk Cost Fallacy in terms of emotional attachments and the daydreaming you do. And if I had been stuck to that, for example, I think there would have been a version of me that would have refused to do data analysis because the ideal version of me would be Einstein. I would be somebody who would actually discover stuff about the world and who might get a Nobel Prize for it. The chances of me getting a Nobel Prize for what I do at OWID are basically non-existent, because it’s not really stuff that is being valued by the research world in the same way that a shared discovery might be. But I think when you ask yourself what might be actually impactful, you realise that actually doing the work that we do at OWID is very impactful.
Any other meta lessons?
Yeah, I think, for me, specifically, the idea of using an opportunity that’s being given to you to have an impact, even if it feels a bit risky, is something you should definitely consider. Obviously, you should make sure that you have somewhat of a safe environment around you, and that you’re not jumping into the deep end. So when I was working in data science and I wasn’t exactly sure how to be impactful, and I was doing mostly consultancy work, COVID hit, and then I basically got in touch with Max Roser and asked him, ‘Is there any way I can help?’, and he said, ‘Sure, you can help us with the testing data that we’re trying to collect.’ And it felt like a big risk, because I didn’t know anything about that. I already had a job. So it meant kind of helping them on top of that. But I took that risk. And maybe it could have not played out in any particularly significant way. But I did that. And for a few months, I had to do two jobs at the same time. But I think that was useful. And I don’t know if I would have realised it later on, but if I had not taken that opportunity, I think that would have been a really big waste for me. And actually linked to that, I think another lesson is that throughout all these changes, it’s been reassuring for me to have what 80K often calls a Plan Z, which is this idea of knowing what you’re good at, or at least what people think you’re good at, and having at the back of your mind that if everything else goes wrong, you can always go back to doing that thing. So for me, if OWID stopped, or I was fired, or something terrible happened, I know I could basically go back to data science consultancy work. I would not be happy to do that, but it would pay the bills, and it would work, and I know I could probably somewhat easily get hired because I’ve done it in the past.
And I think for a lot of people, it’s good to have this because having the knowledge of that frees you up in terms of stress and pressure to maybe try other stuff that’s a little bit more ambitious.
OWID’s future plans
I was gonna ask also, with that spirit in mind: what’s next for you and Our World in Data?
I think now at Our World in Data we’re entering into a phase where a lot of the stuff that’s been on our mind in the last couple of years is slowly fading into the background, especially COVID. So now we’re getting back into the fundamentals of, first of all, catching up on all the data updates we haven’t done during COVID, because we were just too busy. So we want to keep maintaining all of this stuff about COVID, about monkeypox, but we also want to work on making OWID more evergreen. So that means that for the first few years of its existence, a lot of OWID stuff was basically blog posts that were published without really thinking about how they were going to be updated and whether they were going to be updated. And now we have the opposite approach where for any piece of information we publish, we directly think, ‘Okay, what is going to be the system that makes sure that next year or whenever new data is available, we update that piece of information, and we update the graph behind it to make sure that people have something that is evergreen?’ And it means changing a lot of the way we work. It means that compared to a few years ago, the data team has a lot more importance compared to the research team that produces articles. Actually, the data team that I manage did not exist before COVID. It was just the devs doing the website and the researchers writing articles. And now we have this idea of a data team that is directly a team responsible for ingesting data, analysing data and also producing data as an output. So it’s this idea that the outputs that people are interested in are not just articles, but also raw data, sometimes in the form of data explorers or charts, and that we don’t always need to write text to go alongside these charts. And then I think what’s also coming along with that is this idea that we mentioned of writing more about specific topics that we think are interesting and that we haven’t really written about yet.
So things like the history of pandemics, for example, is something we want to be writing about; pandemic preparedness; AI is an obvious one I mentioned. There’s a bunch of topics like that which are less obvious candidates for the world’s pressing problems from a broad point of view. So we’ve dealt with climate change and poverty and many things like that in the first few years of OWID. And now it’s probably time for us to try and tackle these slightly more obscure topics, maybe not obscure for EA people, but at least way more obscure for people outside of EA.
Awesome. Okay, homestretch. Let’s ask final questions. First one is related to all this chat about careers: is OWID hiring?
So currently, we don’t have any particular application on the website, but I would advise the people if they’re interested, first of all, look at the open application form we have. It is the case that because we’re looking for people so specific, as I described, people with research background, but also specific skills, we’re kind of always interested to hear from people who know our work really well, think they could add something to it, and are just very interested in working with us. And so I think it is important to know that we welcome applications generally, even when we’re not actively showing something. And beyond that, it is the case that within the next few months, we’ll most likely hire at least one more person for the data team to help us with data updates and data science.
Awesome. And that’s ourworldindata.org/jobs.
Exactly. Yeah. There’s the website that you can monitor, but also social media. So Twitter, LinkedIn @OurWorldinData, and also the ethical job board where we always publish our jobs.
Fantastic. Let’s get to the questions which we like to ask all of our guests at the end. One of these is: what are the three recommendations, books, films, websites, other bits of media, that you would recommend to anyone who wants to learn more about what we talked about here?
Yeah, so I’m gonna go with recommendations that feel very not original to me, but probably are the best ones if people are discovering these topics for the first time, especially the first two. So the first one is Nate Silver’s book, The Signal and the Noise. It’s a book I read probably 10 years ago now, and I’ve read it a bunch of times since. And I think, if you’re discovering all this, the world of data and the world of statistics and evidence-based analysis and all that, it is the best 300-page book that you can read on all of this. And it’s very well written. And it has a bunch of very interesting examples. And I would definitely advise people to read it. I think that another obvious choice, less on the data side, but more on the OWID worldview kind of thing, is Factfulness by Hans Rosling and his family. I think if OWID was a book, that would be the book. And that’s what a lot of people think when they read the book, and we collaborate with Gapminder, and people at Gapminder, frequently because we do essentially a lot of similar work, and we have the same worldview about wanting people to understand the world as it is through data. So I think Factfulness is a very good book to look into and to learn more about how the world works, but also how we should think about learning about how the world works. And finally, a third book is, much more recently, a book about the history of measurements called Beyond Measure by James Vincent, who is a reporter at The Verge, and he published that book, Beyond Measure, I think in June, so very recently, and I think it’s a very good book that goes through the history not only of measurements, in the sense of distance measurements and weight, but also things like ‘How do we measure deaths? How do we measure poverty?’ and things like this. And I think it’s an extremely interesting book that tells you more about the trade-offs that we’ve made through time and space, to measure things.
And what are the problems that also come with this. And I think, if people were interested in the thoughtful trade-offs that I mentioned earlier, that would be a very interesting book to read. And actually, a more meta recommendation I would give would be a website that I really like called Fivebooks.com. Fivebooks is a website that I actually worked for for a couple of years, at some point. I did a few interviews for them. And I think it’s something that might really be interesting to people listening to this podcast, who like what you recently, I think, called cluster reading, which is this idea of choosing a specific topic and, instead of just reading one book and then going to another topic, reading a bunch of books about it with slightly different angles, and then getting out of this process with a very thorough understanding of the different viewpoints about that topic. And Fivebooks is an entire website based on this idea where they choose a topic like quantum physics, and then they will find an expert on quantum physics and ask them to choose the five best books about quantum physics. And then they write a very, very long interview where they talk about the books, talk about the topic, and talk about that person. And even the interview itself is sometimes a great way to learn about the topic. So I think Fivebooks is very interesting. There’s a bunch of interviews there about X-risks that I did, about Effective Altruism with Will MacAskill. I think it’s really an interesting website that people should read.
Yeah, big plus one, actually. I’m a big fan of Fivebooks. I think it’s a great website.
Yep. Plus two. Next question is: is there any research or work that you’d be especially excited to see people listening to this maybe get started on or even help OWID do?
A big one I mentioned is mental health, where I think there’s a lot of opportunity to do more systematic and broader data collection on the topic. And I think my understanding is that it would be very useful for the field and for the knowledge about the issue. I think another one I haven’t mentioned yet is philanthropy, where it’s been a recurrent idea that people have suggested to us that somebody - and I very much agree - should build a database of philanthropy, and philanthropic giving over time. Like going back centuries, if that’s ever possible, but obviously focusing more on the last few decades. Also that would include current giving, by various institutions, and something that would obviously include proper tagging of different categories of giving, maybe, ‘Is it considered to be an effective cause or not? Is it more for global health? Is it more for development? Is it for art or science or education?’ And I think that’s something that on the whole would give a very good sense of how things are evolving through time. I think it would also give effective altruists a sense of like, ‘Are we generally achieving the idea of moving more money towards effective causes, or is everything we’re doing still just a drop in the bucket?’ So I think that would be extremely useful. And it’s something that is, at least for the current landscape of giving, quite doable, because most of these foundations, they actually publish the data, it’s just that they do it in slightly different ways with slightly different formats of tables. And so it just needs someone’s attention, and someone’s time, to scrape all of this data into common formats, and then some automated or manual labelling to get a sense of what are the different categories of giving.
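[Editor’s sketch, not part of the conversation: a minimal illustration of the two steps Ed describes, mapping differently formatted grant tables into one common format and then applying a first pass of automated labelling. Everything here is an assumption made for illustration: the funder names, the column names, the categories and the keyword lists are hypothetical, and a real project would need manual review of the labels.]

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One grant in the common format (a hypothetical schema)."""
    funder: str
    recipient: str
    year: int
    amount_usd: float
    category: str


# Hypothetical: how each funder's own column names map onto the common fields.
COLUMN_MAPS = {
    "foundation_a": {"recipient": "Grantee", "year": "Year",
                     "amount": "Amount (USD)", "purpose": "Purpose"},
    "foundation_b": {"recipient": "organisation", "year": "fiscal_year",
                     "amount": "usd", "purpose": "description"},
}

# Hypothetical keyword lists for a first automated labelling pass.
CATEGORY_KEYWORDS = {
    "global health": {"malaria", "vaccine", "health"},
    "education": {"school", "education", "scholarship"},
    "arts": {"museum", "art", "theatre"},
}


def label_category(purpose: str) -> str:
    """Return the first category whose keywords appear as whole words."""
    words = set(re.findall(r"[a-z]+", purpose.lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "unlabelled"  # left for manual labelling


def parse_amount(raw) -> float:
    """Turn values such as '1,500,000' or '$250000' into a float."""
    return float(str(raw).replace(",", "").replace("$", "").strip())


def normalise(funder: str, rows: list[dict]) -> list[Grant]:
    """Convert one funder's rows into the common Grant format."""
    cols = COLUMN_MAPS[funder]
    return [
        Grant(
            funder=funder,
            recipient=str(row[cols["recipient"]]).strip(),
            year=int(row[cols["year"]]),
            amount_usd=parse_amount(row[cols["amount"]]),
            category=label_category(str(row[cols["purpose"]])),
        )
        for row in rows
    ]


# Made-up example rows in two different source formats.
rows_a = [{"Grantee": "Example Health Org", "Year": "2021",
           "Amount (USD)": "1,500,000", "Purpose": "Malaria bed nets"}]
rows_b = [{"organisation": "Example Arts Trust", "fiscal_year": 2020,
           "usd": 250000, "description": "Museum renovation"}]

grants = normalise("foundation_a", rows_a) + normalise("foundation_b", rows_b)
```

[Keywords are matched on whole words rather than substrings so that, for example, “art” does not match “partnership”; anything that matches no keyword stays “unlabelled” for a person to review.]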
Strongly agree. I think the history of philanthropy seems especially under-explored, and then even in the present day, trying to get a sense of who are the big players in philanthropy. Where’s the money going? Are people, for instance, living up to their giving pledges, especially very big donors? I’ve tried to figure this out, and it’s just really hard to get a sense because no one is trying to aggregate this information.
It’s extremely difficult. And as far as I know, no one has really given this any legitimate effort.
One thing to plug - I don’t think they’ve done it in a data or quantitative sense but - is HistPhil, which is looking into a bunch of case studies around philanthropy, and I think the person running it, Benjamin Soskis, is a great person to reach out to on this.
Forbes has a Philanthropy Score as well, as part of their lists. But I don’t know how they work it out. So I think also things like billionaire impact ranking lists, where you’re just comparing how much of their net worth they’ve committed to giving to impactful causes.
Yeah, that’s very interesting. I think somebody - I can’t remember who it was exactly but I remember somebody a few months ago on Twitter - throwing around the idea of building an index of the coolness of different philanthropists.
Cool. Okay, the very last question is: Where can people find you online?
Mostly Twitter. Twitter is where I spend a lot of time and probably should spend less time. My account is @redouad. So R-E-D-O-U-A-D, or just typing my name, and it’ll get you there. And it’s the place where I publish all of the stuff I do for Our World in Data, and also different things about Effective Altruism and things like that.
Alright. Ed Mathieu, thank you very much.
That was Edouard Mathieu, on Our World in Data. As always, if you want to learn more, you can read the write up, and there’s a link in the show notes. There you’ll find links to all the books, sites and OWID charts that Ed mentioned, plus a full transcript of the conversation. If you find this podcast valuable in some way, one of the most effective ways to help it is just to write a review wherever you’re listening to this, so Apple Podcasts, Spotify, wherever. And you can also follow us on Twitter, we are just @hearthisidea. If you have any more detailed feedback, then we have a new feedback form with a bunch of questions and a free book at the end as a thank you. It should only take 10 or 15 minutes to fill out and you can choose a book from a decently big selection of books we think you’d enjoy if you’re into the kind of topics we talk about on the podcast, and you can find a link to that on our homepage and also at feedback.hearthisidea.com/listener. Okay, a big thanks to our producer Jason for editing these episodes, and to Claudia and Alfie for writing full transcripts. And thank you very much for listening!