A Data Science Super Hero Fighting the Creepy Algorithms (Guest: Cathy O'Neil)

This is a podcast episode titled, A Data Science Super Hero Fighting the Creepy Algorithms (Guest: Cathy O'Neil). The summary for this episode is: More and more of our world today is being evaluated, analyzed, and driven by algorithms. You need look no further than your car insurance, your kids’ educations, or your mortgage to see the results of algorithms processing and delivering verdicts on you and your ability to do something you want to do. Our guest today is Cathy O’Neil, a data scientist, mathematician, activist, and author of the New York Times best-selling book “Weapons of Math Destruction”. She has spent years talking about the dangers of, as she calls them, “creepy algorithms”, and our need to take the effects they may cause seriously - both to ourselves, and those less advantaged than ourselves.

Episode Transcription
Ben: Welcome to the Masters of Data podcast, the podcast that brings the human to data. I'm your host, Ben Newton. More and more of our world today is being evaluated, analyzed, and driven by algorithms. You need look no further than your car insurance, your kids' educations, or your mortgage to see the results of algorithms processing and delivering verdicts on you and your ability to do something that you want to do.
Ben: Our guest today is Cathy O'Neil, a data scientist, mathematician, activist, and author of the New York Times best selling book, Weapons of Math Destruction. She has spent years talking about the dangers of, as she calls them, creepy algorithms, and our need to take seriously the effects that they may cause, both to ourselves and those less advantaged. So without any further ado, let's dig in.
Welcome everybody to the Masters of Data podcast, and I'm very excited to have Cathy O'Neil here with me today. Welcome, Cathy.
Cathy: Thank you, Ben.
Ben: As we always do, we love to start off by getting into your background. You've covered a lot of ground in your career and done a lot of different things, and I definitely enjoyed your book, which I'm sure we'll talk more about later. I'd love to hear how you progressed, how your thinking and your experiences evolved along the way. How'd you become a mathematician, how'd you get started? What drove you in that direction?
Cathy: Both my parents were mathematicians, so it's just kind of the job I thought everyone had. That's part of it. I loved math. I was very excited about the purity and elegance of proof. Especially because I grew up in Lexington, Massachusetts, and we were kind of part of the Revolutionary War, the history of that. Every year in history we were forced to go on the same tour of the same Minuteman statue, and then finally in like 8th grade, for the very first time, we got to talk about something besides the Revolutionary War, and it was like manifest destiny. I'll never forget that moment when I was like, wait, I can't figure out whether my teacher is putting me on about manifest destiny, like whether my teacher believes manifest destiny was a thing that we were supposed to, like God wanted us to kill a bunch of Native Americans to expand our country, or whether this is ... What I realized at that moment was, it's just messy. Everything is messy, and math isn't messy. Math is just crystalline and pure, and you say what you are assuming and then you go from there.
I'm also just, FYI, I come from a long line of people that are on the spectrum, so there was something very appealing about just this kind of system of consistent logical thought that attracted me. I was just like, I'm going to go find refuge in mathematics. I also was talented with it, and in spite of the fact that I had a bunch of teachers that were telling me I didn't need to do math because I was a girl, what I liked about it was ... By the way, many of those teachers were themselves female. But what I liked about it was, you might not have thought I was going to get this answer, but once I had the answer it was irrefutable. That was the other thing, it was like, once you have a proof you have a proof, and it doesn't matter if you're a girl. There were a lot of things that drew me to math, and once I was there I was like, this stuff is awesome. That's how I became a mathematician.
Ben: I read an article you wrote, you said about playing dominoes with your dad, which I thought was really great, just that imagery of you guys playing a game together and increasing your ... I have a seven-year-old daughter, and I just like that idea of how you were able to get that excitement about math from your father. I thought that was really cool.
Cathy: That's the thing about math, is that it's playful. It can be, it's just a game. A lot of math, at least at the beginning, is just, even once you get quite involved in math, you can think of it as a game, you can frame it that way. It's a puzzle, and you can think about it, and you can sleep on it, and you can wake up and still not know it, and then the next day you wake up and you do know how to solve it. That's a very exciting moment. I went to math camp, the summer I turned 15, and if I hadn't already decided to be a mathematician I did that summer, because I learned how to solve the Rubik's Cube using math. I was like dude, this is it. How cool.
Ben: I never had the patience for the Rubik's Cube, so I admire that.
Cathy: But if you looked at it through the world of group theory then you might have, is my point.
Ben: Yeah, clearly.
Cathy: Not only could I solve the Rubik's Cube, I could solve any kind of puzzle of that type. It was really generalizable, it was neat. I would go into a puzzle store and amaze people. It was like a magic super power.
Ben: That's the thing, you can introduce that at the right point, it really helps crystallize in your mind as a young person. I think that's pretty cool.
Cathy: It becomes part of your identity.
Ben: Absolutely. So you went on to school to study mathematics, right?
Cathy: Yeah, I went to Berkeley, in part because they just had a great math department, but in part because I already knew I wanted to go to Budapest for this Hungarian mathematics semester my junior year, and Berkeley was the only college I could find in the country that taught Hungarian.
Ben: Why Budapest? What was in Budapest?
Cathy: I had met some really cool people who had gone to that program when I was in the math camp. It was a society, it was like a nerd society, that I was very happy to be part of.
Ben: What was Berkeley like?
Cathy: Berkeley was just wonderful. It was enormous, but I was lucky because I knew exactly what I wanted to get out of it. I wanted to learn math. I then became obsessed with African drumming, which was also a big thing in Berkeley. The class, I was really, really into it. In a big place like Berkeley you might think you could never get noticed, but if you're a student who's super obsessed with a subject then it turns out you can become friends with your teachers, which is what happened with me. I was very lucky, and I sort of wormed my way into all sorts of math department parties. One of the parties I went to actually had a rule: no graduate students. But I got to go, because I was an undergrad, and there was no rule against undergrads. It turned out the no-graduate-students rule was specifically made, and this might be a lie but I like to believe it, because people would get drunk and famously say completely inappropriate things about their colleagues, and they didn't want graduate students to hear those things. But there I was, I got to hear all of it. It was awesome.
Ben: It's funny you say with the African drums thing too, I interviewed a physicist and was talking about this, and I know, I came from a physics background, a lot of physicists were musicians, and I'm seeing Richard Feynman in my mind now, he was really into drumming. There's something about mathematics and music and rhythm that, they really fit well together.
Cathy: Oh yeah, I was originally going to be a musician by the way, so I have a connection to Richard Feynman, that my cello teacher, who was married to my piano teacher, was also a drummer with Richard Feynman, so I actually met Richard Feynman growing up, because I was taking piano lessons in that same house. It's not an unrelated issue, I was interested in this West African Ghanaian drumming stuff because of the polyrhythms. It was so fascinating to me that you could sustain two different kinds of rhythms in two different hands, or different parts of your body. It really is hard, but it's also very pure, in the same way that mathematics is.
Ben: That's really cool. You enjoyed your time at Berkeley and, is that when you went out back to the east coast to work?
Cathy: I should mention that I went to Budapest and actually ended up hating the kind of math I learned there. Then a friend of mine sent me a book about number theory while I was in Budapest, which revived my interest in math, and I ended up deciding to become a number theorist. I got back to Berkeley and met Barry Mazur, who is a famous number theorist at Harvard, and decided I want to go to Harvard, I want to study with Barry, and that's what I did.
Ben: Right, and how was that, and so you got a degree in mathematics at Harvard as well?
Cathy: Yeah, my PhD at Harvard.
Ben: And you studied number theory; is that what you focused on?
Cathy: Yep. It was a really exciting time in number theory. When I was an undergrad at Berkeley, that's when Fermat's last theorem was solved. I was buddies with Ken Ribet, who was part of that story, he was in the Nova specials. I was in a different but even more exciting nerd society at that point.
Ben: I sense this is a theme.
Cathy: It's always about people. The subject is beautiful, but what makes it beautiful, what makes it exciting and important, of course, is how people think about it.
Ben: Yeah, absolutely. Even with the nerdiest subjects, it's the other nerds that make it more interesting. You graduated and then you went on to Wall Street, right?
Cathy: First, I went to MIT for a postdoc for five years, then I was at Barnard as an assistant professor here in New York, and then I was like, you know what, as much as I love number theory, which I do, I actually wanted a little more interaction with the human race. I just wanted to be part of the city, I loved the city, I wanted to be part of the energy of the city. It was 2006, and I applied to and got a job at the hedge fund D.E. Shaw. I started in 2007, and then two months after I started, the credit crisis hit. At least it hit inside finance. It took another year for Lehman Brothers to fall and for people outside of finance to see what was happening.
Ben: Yeah, I definitely, when I was reading through that in a book, it was really fascinating to hear that from the inside view. What was that like, going through that on the inside? Because most of us were on the outside watching it, and didn't, I don't think ... I can definitely say I didn't really understand everything that was going on. What was that like?
Cathy: First of all, nobody understood what was going on, including from the inside, to be fair. I think that was the most impressive thing about it, is that there I was, I was working on stuff with Larry Summers, he had been the president of Harvard, he'd gotten famous for saying girls might just be bad at math, and I was working with him and I was like, "Uh-huh, uh-huh." Then the crisis hit and he didn't know what was going on, and neither did anyone else. We were all just like oh, something's happening. It was very much along the lines of that picture everybody talks about, of blind men touching different parts of the elephant and trying to describe it. Nobody knew exactly what was going on. But having said that, the disillusionment that I felt, personally, by seeing just how confused and baffled everyone was, was shocking. I was like wait, I thought you guys were experts. I thought the economy was based on something real. I just was more and more like, oh, we're just all trying to make as much money as possible before we retire. It was disillusioning, to say the least.
Ben: Yeah. There's a term that Dr. Rigobon, the guy from MIT I mentioned interviewing earlier, uses: "aggregate confusion." It seems to apply here. Everybody starts to believe that everything's okay because everybody else says it's okay, and nobody actually understands what's going on behind the scenes, and it becomes more self-fulfilling over time, until it all blows up.
Cathy: Yeah. I don't think the book has been written yet that needs to be written about it. I want to say that Larry Summers, Alan Greenspan, they were economists, and people trusted them because they were very arrogant, to be honest. They seemed to be in charge, they seemed to know what was going on. They didn't know the details of, for example, how the credit markets worked, how the credit default swap works. All the stuff that was over the counter. It was complicated, and they didn't think they needed to understand it, but it ended up being pretty important in terms of how risk was being hidden by Lehman Brothers and other banks. They were like, "We understood it at a meta level, and that's good enough, and we're happy with it." It turned out that that wasn't good enough, and we shouldn't have been happy with it. I just feel like that hasn't changed at all. I'm worried. Between you and me, I've left finance. I don't know what's going on in finance. I'm worried. I'm no longer willing to trust a bunch of self-assured economists.
Ben: Yeah. That is interesting, you saying that not a whole lot has changed. One thing that struck me too when you were describing this in your book, you talked about going and working on risk metrics. You went to a company that evaluated risk. I remember, it really struck me, and I had to read it a couple times, that you kind of learned that a lot of the banks didn't actually want to know the risk.
Cathy: Yeah, they don't want to know. To clarify my previous statement, I don't think we're going to have another credit crisis because of the same exact thing. I think we learned that you shouldn't probably give shit tons of mortgages away to people that don't have jobs, that's probably not going to happen again exactly, but as you just said, people don't really want to know what their risk is.
I'll go further. I think there's two different kinds of modeling going on in finance. This is a podcast about modeling, right, about data. There's the kind of modeling, you want it to be right, and then there's the kind of modeling, you want it to be wrong. When you want it to be right it's like, I want to predict the market, and if I'm wrong I don't make money. It's about profit, you want it to be right. Then there's the stuff you want to be wrong, which is about risk. You're like, I want to underestimate my risk. Because if I truly estimate my risk, or overestimate my risk, then I will make less profit. I'd like to slightly if not completely underestimate my risk so that, first of all, I make more profit in good times, and second of all, so that my Sharpe ratio is higher and I can brag about it. Sharpe ratio being just simply profit over risk.
So there really is a deep denial of risk going on, and I don't think that's gone. I think the kind of risk, the particular kind of risk that we ignored, running up to the credit crisis, has probably been acknowledged and avoided now. But then there's other kinds of risk that we're just pretending doesn't happen, because it's just inconvenient.
Ben: Yeah. The traders, their bonuses were based on that ratio you talked about, right?
Cathy: Yeah. There's definitely no incentive to get the risk right.
Ben: No, it's fascinating, so you have these people that just went through all this, the market just bottomed out, and they still don't want to hear the bad news, they don't want to hear about the risk. You'd already been disillusioned, you got even more disillusioned now dealing with this, right?
Cathy: Yeah. In particular, what I realized after trying to work on risk a couple years was that mathematical authority was part of this story. It wasn't just these particular economists who were pretending to understand stuff, it was also people pretending that when there's a math model involved you can trust it, because math doesn't lie. Like wait a second, we're actually building models specifically in order to lie. As a mathematician, I still identify as a mathematician, I'm ashamed of that.
Ben: Yeah. That's part of what you started to then write about, you started a blog, and you really started trying to shed light on this.
Cathy: Mm-hmm (affirmative).
Ben: What brought you to that? How did you decide that that's what you wanted to do? Did you feel like there wasn't a voice out there really pushing this, or you just felt so strongly about it that you just had to write about it?
Cathy: Both. Also I felt like good mathematicians were still being recruited into this field without any pushback, and I was like, let me be that person that says to mathematicians, here's what's really going on here. I originally started Mathbabe, I was thinking my audience would be mathematicians who thought about going into finance, and I'd be like, let me explain a little bit of it before you do that. I wasn't really saying don't do it, I was saying, you should be aware of a few things.
Ben: That makes a lot of sense, because the thing that actually comes to mind is, I was in grad school at the end of the 1990s, and like I said, I was in a physics program, and a lot of physics is math obviously, so a lot of us, we loved math as much as physics, and a lot of people I knew actually went to Wall Street, and I definitely can tell you, nobody had any idea, definitely back then. It was just like, if you want to go use your math skills and make some really good money, that was a good place to go. I can see how getting that information out there is really valuable to people that would be thinking about where they want to go. Can they make a rational decision based on understanding the risks?
Cathy: Yeah, most of the people I worked with were physicists actually.
Ben: Really.
Cathy: Yeah. A lot of them were Eastern European, a lot of them were string theorists.
Ben: As an aside, it makes me very proud that I studied physics, because physicists are like some sort of secret underground mafia that gets everywhere. We invented the internet, and we did big data before anybody else did.
Cathy: That's true. Although different kind of big data, which we can come back to. Because there's a really important difference. I'll just say it now, which is that physicists, the kind of big data they've been doing forever, astronomy especially, it definitely describes the past, it might even predict the future, it often does, but it doesn't change the future the way that Google's search algorithm does. The way that Facebook's newsfeed does. The way that credit card companies who are trying to decide who's going to default do. They don't just say, "Are you going to default," they decide who gets a loan. That changes the future as well as predicts the future.
Ben: That's a perfect segue to talk about the book you decided to write, Weapons of Math Destruction. I have to say as an aside, based on the name of your blog and based on the name of this book, you do very well at naming things. I just have to say that.
Cathy: Thank you. I should say that my friend Aaron Abrams, who's a mathematician, actually came up with the name for my book.
Ben: It's literally my favorite book title ever.
Cathy: It's pretty awesome. I'm writing a new book and I'm like "Aaron, where are the goods? I need another title."
Ben: You can't forget that name, it's so memorable. Weapons of Math Destruction, you're writing about exactly that, is that these algorithms that are actually being used have real societal impact. Talk to me a little bit more about that. What made you decide to write the book? What was the journey to get there?
Cathy: We've pretty much almost finished the whole journey. I left finance, I started my blog, I needed a job. I have three kids in New York City, so I definitely needed a second income beyond my husband's, who's a mathematician at Columbia. I got a job as a data scientist very quickly once I decided to do it, and there I'm working, instead of on prediction models for finance, on predicting clicks in the context of travel, like Expedia, CheapTickets, Orbitz. I'm starting to notice that these models, they're not exactly set up to fail like risk models are, but they are imperfect, deeply imperfect.
I, like many of the other data scientists I was talking to in other fields, was basically using demographic information: browsing behavior, where do you live, what kind of consumer are you, other kinds of behavioral stuff. I was dividing winners from losers and scoring people. Basically, how likely are you to buy something? Therefore, we're going to give you this option, versus if you're not going to buy something, we'll give you a separate option.
In the context of travel it wasn't a big deal really. Honestly, I was just deciding who got to see comparison ads, which is not going to make or break anyone's life, not even their day, forget about their life. But other people, I started to realize, were using the same exact type of techniques to decide who gets a loan, who gets a job, how long do they get sentenced in prison, crazy important things in people's lives, based on demographic data for the most part, and it was really troubling. Because I knew that what I was doing was not well done, you know what I mean? I knew it was false positives here, false negatives there, whatever. Better than guessing, yes, and sometimes it was not that different from what I had been doing in finance, but if you guess wrong 40% of the time in finance you still make money. You just need to be better than guessing. If you guess wrong with people's lives, it matters to them.
The worst part was, they didn't even know they were being scored. That's what I started to realize. I was like wait a second, we are basically propagating past classist, racist, sexist, divisions, because that's how we're deciding who's lucky or unlucky, and then we're making lucky people luckier and making unlucky people unluckier. That's what we're doing, and we're calling it data science as if it's got some kind of scientific authenticity, but we're not making it into a science. We're not experimenting with it, we're not checking to see that it's accurate. There's no sense in which this is deeply scientific, or ethical, or anything.
I actually started blogging about my concerns. I remember my blogs were called like Creepy Algorithms, and then More Creepy Algorithms, and then A Long List of Creepy Algorithms. Every time I had another blog post, my readers, who were really awesome people, this was like the heyday of blogs, when people really read blogs and the commenters were awesome. They were like, "Have you heard about this? Have you heard about that?" I was like, "Oh my God, are you kidding me?" I had this long and growing list of scary algorithms that I suspected were being done with very little oversight and very little science and lots of errors. On the other hand I'd already been disillusioned in finance, as we've discussed.
I realized that the credit crisis, even though it was horrible, and it removed an enormous amount of wealth from Americans, especially Black Americans, had one thing going for it, which was that everyone noticed it happened. Everyone noticed, and it happened internationally. Whereas, I felt that the errors, the flaws, and the suffering caused by the errors and flaws, of these algorithms, were going to happen under the radar. People would not see it happening. Individual people would not know it had happened to them. They would be denied a loan, they would not get a callback after applying for a job because of the application algorithm or what have you, they would just never know they were a victim of a bad algorithm. It would happen to individuals in distant places, in their living rooms. I was like, this is a disaster, because it's like a credit crisis, or like a crisis, that no one is seeing happening. Put that together, I felt like the moral obligation to write a book about it.
Ben: That definitely makes sense. One thing too, taking a pause there, because I remember I’ve listened to a couple other interviews you did, and particularly the word “algorithm,” I know definitely in the field I'm in we throw "algorithm," the word, out a lot, and then definitely some people that are going to be listening to this, it's like, "Algorithm, nerd word, I don't understand what that means." What is an algorithm when you define it like that? How would you define it?
Cathy: Yeah, good question. When I say algorithm I really mean predictive algorithm. If you just stripped it down to the word "algorithm," a computer scientist would just mean like "process". That's not what I'm talking about. I'm talking about a predictive algorithm, so something that's trying to predict success using historical data. It's got two major ingredients. One is the definition of success, and one is the historical data that you train it on.
Loosely speaking, and I think this is sufficient for anything I didn't discuss, look at initial conditions and historical data, and you said, "Did this lead to success? Yes or no?" You look at all the initial conditions you have data for to decide which ones led to success, which ones didn't, and then you have, when confronted with a current set of initial conditions, you say, "Does this look like something that was successful in the past? Yes? Then it's going to be likely to be successful, or if this does not look like something that was successful in the past, it's unlikely to be successful." You basically tag it with a probability of success. So you're just looking for patterns of, what led to success in the past, and then trying to propagate it into the future.
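Cathy's two ingredients, a definition of success and historical data to train on, can be sketched in a few lines. This is a deliberately minimal pattern-matching scorer with invented data (the feature names and records are made up for illustration, not from any real system): it tags a new case with a probability of success based purely on how often similar past cases "succeeded."

```python
# Minimal sketch of a predictive algorithm in Cathy's sense.
# Two ingredients: (1) historical data, (2) a definition of success.
# All records and feature names below are invented for illustration.

from collections import defaultdict

# Each record: (features, succeeded?) -- "success" is whatever the
# modeler chose to call success when labeling the past.
history = [
    ({"degree": True,  "referred": True},  True),
    ({"degree": True,  "referred": False}, True),
    ({"degree": False, "referred": True},  False),
    ({"degree": False, "referred": False}, False),
    ({"degree": True,  "referred": True},  True),
]

def probability_of_success(candidate, history):
    """Score a new case by how often identical past cases 'succeeded'."""
    counts = defaultdict(lambda: [0, 0])  # feature pattern -> [successes, total]
    for features, succeeded in history:
        key = tuple(sorted(features.items()))
        counts[key][0] += int(succeeded)
        counts[key][1] += 1
    key = tuple(sorted(candidate.items()))
    successes, total = counts[key]
    return successes / total if total else 0.5  # no precedent: coin flip

# A candidate who "looks like" past successes scores high. Note the model
# never asks *why* those people succeeded -- it only matches patterns.
print(probability_of_success({"degree": True, "referred": True}, history))
```

This toy makes Cathy's next observation concrete: the scorer is entirely backwards looking, so whatever pattern dominated past "success" is exactly what it will reward in the future.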
Very quick observation about that is, it's backwards looking, and so any kind of prediction of success in the future is going to assume it's going to happen just like it did in the past. That's sometimes useful, but sometimes totally not what you want. If you're saying, who's going to be successful at Fox News, White guys. You know what I mean? Then you're like, "Who should I hire at Fox News for the future?" If you trained it on the past it'd be like, more White guys. Because that qualified Black woman who's just applied, definitely don't hire her because she does not look like somebody who was successful in the past. I'm just making an extreme example to make it clear that algorithms do not ask, why was someone successful? They just fit patterns to success in the past.
Ben: When you've talked, after writing the book, and you've done a lot of speaking and a lot of connecting with people that are kind of across the spectrum of understanding this, what were the examples you think resonated the most with people? You gave a lot of examples in the book, and it was a surprisingly easy read. Even as a background as a mathematician, I think you did a really good job of explaining in terms that a wide variety of people could understand. What do you think were the examples of those creepy algorithms, as you put it? What were the ones that resonated the most with people that you would talk to?
Cathy: Really depends on the audience. You're going to have to be more precise than that.
Ben: For example, there was ones about the teaching, I think, that definitely resonated. Maybe we could even start with that. I think some people have heard about how the algorithms were used to judge teachers in Washington DC.
Cathy: Right. Among educators, definitely the teacher algorithm, which honestly was almost no better than a random number generator, but was used to fire people, teachers in particular. And it was gameable. I had an example of someone whose score was artificially lowered because previous teachers had cheated; she didn't cheat, and so she was punished in a weird but very predictable way because of previous cheating. It was part of this political teacher accountability, No Child Left Behind and Race to the Top stuff. Much more about politics than about math or science, but that's how that works.
In terms of resonating with people, that example resonates, I think, highly, with educators who had to live through this teacher accountability campaign, which was not itself accountable. I'd say that spreads to anybody who cares about the power of unions in general, because it was essentially a tool that was used against teacher unions. If you look around at which states are using the value-added model for teachers, which is the name of the model, it's happening in places where they have a governor who hates unions. It's pretty simple. But it was used, again, with the authority of mathematics. When teachers asked, "Can you explain this score, why did I get such a bad score,” they were told, "You wouldn't understand it, it's math."
Ben: One thing that's interesting when you say that too: I lived on the outskirts of Washington, DC at that same time, and I cannot remember the idea of an algorithm driving this ever entering my consciousness. I remember all the back and forth, and a lot of the, should I say, the controversy about it, but I think at that point in time, even as somebody who would have had the capacity to understand it, it didn't really cut through the fog for me that this was an algorithm gone terribly wrong. Maybe that's because of the way it was presented.
Cathy: Thanks for saying that, because I believe you. Because the marketing isn't focused on, "Hey, does this algorithm work?" That's not how they market it. Let me give you another example from this week, which is the first week of September. Did you hear about California getting rid of cash bail?
Ben: I did. I didn't look much into it, but yeah, go ahead.
Cathy: That's good news right, because cash bail sucks, but do you know what they're replacing it with?
Ben: No.
Cathy: An algorithm that is very much along demographic lines; asking questions like, do you come from a high crime neighborhood? Are you friends with gang members? Did you get suspended in high school? Do you have a job? Have you ever been married? Questions that are proxies for race and class. I personally am not particularly familiar with the algorithm that they're going to use in California, but I know it's an algorithm, and I know about the class of algorithms that are related to it, and they're very troubling. I'm no fan of cash bail, don't get me wrong, but there are ways to get rid of ridiculous pre-trial detention systems without resorting to racist algorithms. And, I will add, even if I'm wrong about it being super racist, where's the discussion about whether the system that they're replacing it with is fair? It goes exactly to your point: I'm sure you heard about the teacher accountability movement in Washington, DC with Michelle Rhee, as the superintendent of schools. I'm sure they talked a lot about holding teachers accountable. But I'm pretty sure they didn't spend much time, as you point out, explaining how that system actually works, and explaining the evidence that it works, and is fair.
Ben: One thing that you mentioned a couple times during the book ... Because I know you related to one we just talked about, about algorithms, about recidivism and the justice system, there's a few different examples you put in there, and it does seem like a lot of these algorithms were designed to fix some perceived injustice and actually trying to do a good thing, but because of the way they got implemented they ended up potentially even being worse, right?
Cathy: Yeah. I'm not claiming that the motives were vile. They're not. But I think what happens in general is, people, they want to improve a system that they know is unfair because they have evidence that it's unfair. Then their blind spot is that the data from the unfair system is itself incredibly bad. In this case the data comes from crime records, which, I hesitate to say the word crime records, because it's not crime records we have, it's arrest records. As long as the police practices are uneven, which we know they are, then the data that stems from police practices like arrests will be biased. Then what happens is, you hand over this data to data scientists and say, "Go at it and try to optimize to accuracy for who's going to get arrested in the future," and they do, and to be completely frank they're predicting the police just as much as they're predicting crime.
It ends up being this self-perpetuating feedback loop. The data's biased, you send police back to the same neighborhoods where they found crime before, and they arrest more people, and then the algorithms end up being proven right. Because yeah, there are still people who would take drugs in that neighborhood. But by the way, there's just as many White people who take drugs, but we don't arrest them. Or another way of looking at it, because it's not just about low level crimes like drugs, it's like people who are addicted are still addicted, people who have mental health problems still have mental health problems, and as long as we are criminalizing that, it's extremely predictable.
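The self-perpetuating feedback loop Cathy describes can be sketched in a toy simulation. The numbers below are entirely hypothetical, not from the episode: two neighborhoods have the same true offense rate, but patrols are allocated in proportion to past arrests, so the neighborhood that started with more arrests accumulates ever more of them, and the "data" appears to confirm the allocation.

```python
import random

random.seed(42)

# Two neighborhoods with the SAME true underlying offense rate.
TRUE_OFFENSE_RATE = 0.10
POPULATION = 1000
neighborhoods = ["A", "B"]
arrests = {"A": 5, "B": 1}  # uneven historical arrest records

for year in range(20):
    # Patrols are allocated in proportion to past arrests (the "model").
    total = sum(arrests.values())
    patrol_share = {n: arrests[n] / total for n in neighborhoods}
    for n in neighborhoods:
        # An arrest requires an offense to occur AND a patrol to observe it.
        offenses = sum(random.random() < TRUE_OFFENSE_RATE
                       for _ in range(POPULATION))
        observed = sum(random.random() < patrol_share[n]
                       for _ in range(offenses))
        arrests[n] += observed

# Arrest counts diverge sharply even though true offense rates are equal:
# the algorithm is "proven right" by the policing pattern it produced.
print(arrests)
```

Because the allocation rule feeds on its own output, neighborhood A ends up with several times the arrests of B despite identical behavior, which is exactly the sense in which such a model predicts the police rather than the crime.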
Sometimes I like to imagine a thought experiment where we use these recidivism risk algorithms to figure out where to allocate resources, rather than where to punish people further.
Ben: One thing that comes to mind too, one of the things I talked to Dr. Rigobon about that I mentioned earlier is that, something he says is that we don't measure enough, and we don't measure early enough. We measure extreme events. It does seem to play into what you're saying here, because these algorithms are so dependent on the data that's being collected and how it's being collected that the very way we measure it is drastically affecting the outcome. If you're not measuring all of the people that are involved, if you're not measuring at all the points in the process ... One of the examples I know he gave was, measuring drug overdoses, not measuring the people taking drugs. If you're only measuring drug overdoses, and then you put in a program to change that, and you don't see any change, it may have changed people that you weren't measuring.
Cathy: Yeah.
Ben: You would never know, but to this point, the algorithms don't know. The algorithms just take the data you give them, you build a model, you put it through the algorithm, and something comes out the end. The algorithms, they're not intelligent, they're not making a judgment call, I guess would be the right word.
Cathy: That's right, and that's a well-said point. Just to, if you don't mind, go further along those lines, sometimes I like to ask, what would it look like to live in a world where we actually do collect all that data, especially around crime. Think about all the friends you've had, I'm sure this has never happened with you particularly, but all your friends who've smoked pot and haven't gotten arrested. Imagine if every single time anyone smoked pot, they got arrested. It's just unimaginable. Or other crimes that we do all the time, without really living in fear of getting arrested.
Ben: People that might have sped when they were younger.
Cathy: Or something like that. I'm sure that's never happened, but you know what I mean. It's hilarious to actually think about what that would look like. It would obviously need video cameras in every single room with some kind of AI that recognized crime when it happened. We don't want to live in that world; nobody wants to live in that world. But then you could kind of imagine that ... Not just imagine, it is a reality, much more for certain people than for other people. It's a reality for people living in projects and inner cities. Hey, guess what, exactly the people that are denoted high risk by these algorithms. It's just full-circle in a certain way.
Ben: Yeah, and to your point, you give all sorts of examples across the board here, from going to college to getting insurance to getting loans. Algorithms are being put into all parts of this, so we've described the problem here, particularly in the last couple years since you wrote the book. So what do we do? Because obviously algorithms aren't going away. They provide benefits in the sense that they allow us to be more effective, be more productive, but you talk about things like the Hippocratic Oath and other things in the book. A couple of years after having written the book, where are you at right now? What do you think we do now and in the future to start getting our arms around the problem?
Cathy: Yeah, it's a big question. I would say, first of all, I agree that they're not going away. Second of all, I would love data scientists to take their ethical obligations seriously, and a Hippocratic Oath is in that direction. Speaking of that, I've seen maybe 12 different versions of Hippocratic Oaths from algorithm designers and data scientists; none of them are, that I've seen, even close to being hard core enough for me. None of them ask the question, “Are you denying people their constitutional, legal, or human rights?” Which is the kind of thing I would actually think about. They're toothless so far. But even if they had teeth, I would venture to say, based on the experience of the country, seeing Mark Zuckerberg going to talk to Congress a few times and claiming that he's going to have AI tools to solve the problem of fake news, which is not true, it's not going to just be up to data scientists to stand up and say, "Hey, I'm not willing to work on this algorithm or in this business model context." Because then they're just going to get fired and replaced by someone else.
The answer to your question is, we need laws to be enforced. We have laws that aren't being enforced: anti-discrimination laws for hiring, for housing, for credit, for insurance. None of them are being enforced. But we also need new laws, and we need enforcement of those laws, and I don't want to pretend that I'm holding my breath for any of this to happen.
Ben: One thing that springs to mind too, I don't know, you tell me if this makes sense to you, but hearing you describe it, I know one of the problems with regulating the banking industry that's always been kind of this tension is that, to really regulate the banking industry and the traders, you had to have people that actually understood it. You had to have people on the government regulation side that actually understood it. I'm wondering, particularly having lived in DC for years, if it's also an issue about having people on the government and regulatory side that actually understand what's going on with this stuff, that actually have the background, that actually could go that couple levels deeper in order to understand what's going on. Do you think that's part of what the problem is, or is that really necessary?
Cathy: That's a good question. I think it's a little bit different. I think the markets really are overly complex, in a way that you have to be kind of a nerd even to understand them. Whereas I think anti-discrimination laws in hiring are pretty simple to understand, and a lawyer could say, "Well, show me how many qualified African American applicants are getting through your system versus qualified White applicants getting through your system." They could ask that question, and the answer will be, "Sorry, the algorithm is too complicated to give you an answer." I think a good lawyer would be like, "Sorry, that's not okay. You have to give us an answer." Then the technology could be developed, and I think it's very, I'm not going to say easy, but very doable, practically doable, to say, if you're going to use a hiring algorithm, you absolutely must be able to provide evidence that it's lawful. Right now regulators simply aren't asking the right questions, and they don't have lawyers who are asking the right questions, and they don't have data people who are building the algorithms that can answer those questions appropriately.
I guess what I'm saying is, I'm envisioning, in 10 years, algorithms that are being used for hiring will have automatically installed monitors that will keep track of what they're doing to make sure they're in compliance with laws. I totally think that's doable. We just haven't been asking for them to do that.
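The kind of automated compliance monitor Cathy envisions could start as something very simple: compute selection rates by group and flag when the ratio between them falls below a threshold. A minimal sketch, with hypothetical numbers, using the US EEOC's informal "four-fifths rule" (a ratio below 0.8 as a red flag) as the illustrative threshold; the episode itself doesn't specify a rule:

```python
def adverse_impact_ratios(selected, applicants):
    """For each group, the ratio of its selection rate to the highest
    group's selection rate. Under the EEOC's informal "four-fifths rule",
    ratios below 0.8 suggest potential disparate impact worth auditing."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    top_rate = max(rates.values())
    return {g: rate / top_rate for g, rate in rates.items()}

# Hypothetical counts coming out of a hiring algorithm:
applicants = {"group_1": 400, "group_2": 400}
selected = {"group_1": 120, "group_2": 60}  # 30% vs 15% selection rate

ratios = adverse_impact_ratios(selected, applicants)
flagged = [g for g, r in ratios.items() if r < 0.8]
print(ratios)   # group_2's rate is half of group_1's
print(flagged)  # group_2 would be flagged for review
```

A ratio check like this is not a full legal analysis, of course, but it shows the point in the conversation: the monitoring is practically doable once someone requires the question to be answered.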
Ben: Yeah. As you said, there's no regulatory framework that's there to actually guide them in that way, because at some point you actually have to have something in place saying, "You must go do this."
Cathy: There's no guidance. There are laws, there's just no guidance for how they apply to algorithms.
Ben: Yeah. I guess one part of that, to kind of put a bow on that, do you feel like it's risen to the public consciousness in a way? Because, I mean, at the end of the day, what's going to drive change like that is that people actually care. They actually feel affected.
Cathy: It's weird. I think the public is now aware of the mistakes that really complicated algorithms like Google and Facebook are making. Trump was talking about it last week, for God's sake. He's wrong on the facts, by the way, but he's right to worry about bias. The problem, of course, is that those are literally the most complicated algorithms of all. They're international algorithms that do unpredictable things, almost like flash crashes; they're so complicated that they're unpredictable. Whereas we could start small, with hiring algorithms, or recidivism risk algorithms, or algorithms that schedule people unreasonably. Those are much, much easier to handle, and to make sure they're in compliance with laws and such, but those aren't the ones that are being scrutinized first. The ones that are being scrutinized first, for good reasons, are the ones that are threatening our concept of truth, our concept of believable truth and information sorting and democracy and stuff like that. Those are big, big, big topics, and these algorithms are going to be the hardest of all to tame, unless we decide just to stop using them, which I don't see happening.
Ben: No. So it's been a couple years since you wrote the book, you said you were working on another book, so is that where you're spending your time now? Is that the next big thing?
Cathy: Yeah.
Ben: When is the book due?
Cathy: When is it due? Oh my God, you never ask a writer that, ever. Take it back.
Ben: Sorry. In the next five years?
Cathy: Absolutely.
Ben: Okay, good.
Cathy: Next time you ask it'll be, definitely in the next five years.
Ben: I can tell you, based on your last book, I'm very excited to see what you come up with next, and we definitely would love to have you back to talk about that in the next five years when it's done.
Cathy: Great.
Ben: Thank you Cathy, this has been a fascinating discussion and we're excited to see what you do next. Thank you so much for your time.
Cathy: Thanks Ben.
Voiceover: Masters of Data is brought to you by Sumo Logic. Sumo Logic is a cloud-native machine data analytics platform, delivering real-time continuous intelligence as a service to build, run, and secure modern applications. Sumo Logic empowers the people who power modern business. For more information, go to SumoLogic.com. For more on Masters of Data, go to MastersofData.com and subscribe, and spread the word by rating us on iTunes or your favorite podcast app.