Intro to Data Bias (Guest: Christian Beedgen)
Ben: Welcome to the Masters of Data podcast, the podcast where we talk about how data affects our businesses and our lives. We talk to the people on the front lines of the data revolution. I'm your host, Ben Newton. I've been honored to know our guest today for several years.
Ben: Christian Beedgen, the co-founder and Chief Technology Officer at Sumo Logic, has had a long history in the world of data, and he is one of my favorite people to have deep conversations about meaty topics with, and that's good, because our topic is meaty. Christian has been spending a lot of time lately thinking about bias, bias in data and data analytics in particular, and that's what we're going to talk about. So without any further ado, let's dig in.
Ben: Welcome everybody to Masters of Data. I am super excited today to have Christian Beedgen with me, who ... He and I worked together for a long time at Sumo Logic. He's the Chief Technology Officer there. Christian, thank you so much for coming on today. I really appreciate it.
Christian: Thanks for having me. We've been trying to schedule this a couple of times already.
Ben: Absolutely. I'm really excited we were able to do this. Christian, you and I have known each other for a long time, but not everybody is going to know you as well as I do, so I'd love to start off on these things to get a little bit into your past, like know you as a person. How did you get to where you are today? How did you get into technology and how did you get to Sumo?
Christian: Technology for me started when I was much younger, couple centuries ago. I was maybe 10, 11 or 12 years old. I don't remember this exactly, but I got into computers by reading books that had programs in them, which is really funny. We had this library bus that would come out to our village, and once I went through all the comics I guess, I ended up with these like dot matrix printed, early computer books where they were talking about Basic and these types of things, and somehow it just caught my attention.
Christian: Some friends got Commodore 64s and were playing games on them and I was ... I don't know, I always liked to read, and this technical reading stuff somehow came naturally to me. I don't really know why. I eventually got a computer myself. I convinced my parents to get me one, so I had an Atari, and I started getting into programming. I'm basically self-taught. I didn't really learn it from anybody telling me.
Ben: What did you start out programming?
Christian: Well, Basic basically. That's kind of what you could get. I always tried to get a Modula-2 compiler, if anybody still remembers those, but you needed more RAM in your box, and that would have cost more money, so that didn't happen.
Ben: What kind of stuff did you start off writing? Was it games or just fun little tasks with Basic or-
Christian: Yeah, the graphic demos and just typing stuff, just copying stuff out of magazines and the books. Then the Atari had an early version of a point and click kind of graphical interface. I think it was called GEM or German, and there was a German guy who built a Basic version that ... A version of Basic, the programming language, that allowed you to do menus and dialog boxes and drop downs and this type of stuff, and I was just endlessly fascinated with that.
Ben: Yeah. You know, I actually had a Commodore 64 too and I had a program that would make the hard drive sing.
Christian: Yeah. Yeah. I remember those. People did all kinds of crazy stuff with that. Then as I got a little bit older, when I turned 16 or so, other things obviously started becoming a little bit more interesting, and then it was a long detour. I ended up after school moving to Berlin. I'm originally from Germany, in case it wasn't painfully obvious from the accent at this point.
Christian: I started studying social sciences, sociology and politics. I was kind of a little bit, I guess, politically aware and kind of looking at stuff. When I was still in school, we stood in Parliament, you know, those type of classic things. When I started social sciences, it became clear that you need to write lots of papers there, and I kind of made a deal with my parents and got a brand new Mac out of it, a Performa 630 if I remember correctly. That was awesome and I ended up writing a bunch of papers, but I could not get away from that Mac. This was the time where you had the CD-ROM drives already, and the magazines at the time, always magazines. The computer magazines would include CD-ROMs or shareware, and I just couldn't get away from it. Then I got back into buying books on various programming and Mac-related topics and Linux and these types of things, and I've always found that I ended up being able to kind of spend a lot of time ... I don't know, you go to bed, you read. I couldn't take a sociological text and get myself to read that for fun, but all these computer things, all this programming stuff, I always thought it was a lot of fun.
Christian: At some point, I realized that I should probably move more into that direction. The internet happened in the meantime, so in '96, I worked construction in the summer and I took the money and bought a modem and the internet had just sort of, kind of arrived in Germany and I had a Supra Fax or something, 28 baud, 28.8k baud or whatever, so and I dialed into some sort of dial-up, online sort of AOL-type thing that we had in Germany and that was not interesting and went straight to the internet and Netscape and holy shit, right and sort of never looked back.
Christian: I ended up studying sort of a combined digital media and computer science program. I wasn't really that good at the digital media part. I had a copy of Photoshop and I applied all the filters and then I walked away thinking I was gonna be an artist. I was gonna make all these psychedelic pictures, which was cool, but I ended up being the guy that somehow figured out how to write programs because I had a little bit of that background, and it all came back. And then an internship. This was a school of applied science and in Germany, that means in the seventh semester, they kick you out and you need to go into an internship. That's mandatory for this type of school and-
Ben: That's pretty cool.
Christian: Through a very sort of ... An incredible sequence of random coincidences, I ended up at Amazon in Seattle in late '98. This was kind of through a German connection and then I got totally sucked into this startup thing, was part of starting two companies, that's what got me back to the US. I've been in the US since early 2000. I arrived just when everything came crashing down. I went to ArcSight after we tanked our own company and you guys all know about my background from ArcSight. I was an early engineer there. That was a security information and event management company. That's where I sort of got introduced to this concept of log management and, to make a long story short, I was there for almost 10 years. That's where a lot of these sort of initial ideas around and behind Sumo Logic came about, from observing what works in the space, and it's a really interesting space: doing data analytics on data that's usually not being analyzed because the formats are hard to wrap your head around and it's not relational data that you can just load into a warehouse, et cetera, et cetera.
Christian: Then the idea for Sumo came up and together with Kumar, who was also an ArcSight guy, we started postulating that this type of tool is very interesting and should be extended not just for security but also for operational analytics use cases and, most importantly, that on-premise software was going to be a bad way of selling it and it had to be a service. That's what got us to start Sumo Logic in 2010.
Ben: That's a great background. It's actually some things I hadn't heard there. So, that actually explains a lot about how you think about I think the stuff we're gonna talk about today.
Christian: So, the access to information is really interesting. It has always fascinated me and I still think that this is probably one of the, to me, most formative things about the internet as well because you can pretty much access everything.
Ben: Yeah, and we've been talking about this for a while now, kind of prepping for this. You've been working with data for a long time. You've been thinking about this for a long time and it's a pretty hot topic now. We've had some cool people on the show so far and we've got some coming up, but what have you been thinking about? What's been rolling around in your head lately about data specifically, kind of bringing it up a level beyond where you and I work with machine data, to that next level up, data as a general area?
Christian: Yeah, so obviously, almost 20 years into doing stuff with data and I think we are solving many interesting use cases here at Sumo and making systems more observable so that the people that run the applications that power their business are becoming more efficient. Something that I've always wondered about, and that's probably natural if you kind of understand a little bit of the background that I have, a lot of the background is, to some degree, accidental obviously, but I have a little bit of background in humanities and then there's this whole data thing and so there's always this kind of ... Sometimes there is this discussion and this dichotomy about qualitative approaches versus quantitative approaches. And just to be clear, when we did social sciences or sociology in particular, the most feared course was in statistics because they had like this hardcore math statistics guy in that department that was full of a bunch of social scientists and guys who were into politics and they really liked reading Marx and all this type of stuff, of course, and then there was this one guy and he was like the hardcore math guy and everybody just absolutely ... I mean he was tough and everybody hated statistics-
Ben: I was a math major and I didn't like statistics. I get that.
Christian: That guy didn't ... His personality didn't really help. Anyway, so I'm not trying to sort of say that if you're into humanities, you're not ever gonna use data, but this has always been sort of in the back of my head. It's like hey, some things are qualitative, some things are quantitative and often people prefer one over the other, usually very strongly and I've always ... I can't find whether I'm on one side or the other, so I feel like there's an interesting aspect there to keeping a balance between these things.
Christian: And so as I'm kind of living my life on the internet, like so many of us, you follow all these links and things take you to other things and you see references to things and one of the things that I always look out for is book recommendations. And so I happened across this book by this guy, his name is hard to pronounce, I think it's Christian Madsbjerg. He wrote a book called Sensemaking and that sort of popped up, and he's basically fundamentally questioning whether decisions should be based purely on data.
Christian: And he brings up concepts like instinct and intuition and context and that totally resonated with me because it was kind of an interesting sort of stance that kind of went against sort of the prevailing wisdom that has come up, especially in the last couple of years, around data-based decision making, the rise of big data. There's been a lot of hype around how the world can be improved by using more data, more data like more sensors, more observations, more data available to everybody, and you can do all kinds of really interesting things. And so this guy walks in and basically says, hey, there might be another way to look at that, and he's coming much more from a qualitative side, from the ethnographic side, et cetera, et cetera, and so I found that sort of just awfully interesting because I think it's good to sort of look at ... Generally, I think it's good if you can to not convince yourself too much of one opinion.
Ben: Yeah, exactly.
Christian: And then I think that's generally something that people do and they pick a side basically and conservative versus liberal or qualitative versus quantitative, left versus right, up versus down, black versus white, et cetera, et cetera, because it feels like that's actually a comfortable spot to be in because there's a lot of other people there and the world's fairly easy. Okay, the world's white, the world's black, pick one. To me, that's never really worked. I just think that's kind of betraying your own intelligence and anyway, so I thought that was awfully interesting.
Christian: As you know, because we've spent a lot of time working together, the concept of context is something that I've always sort of tried to embrace and I've talked a lot about. We've talked about even all the way down to building product features that we have all of these data streams coming in logs and metrics and what have you and they tell you things but they often don't actually include the context in which the stuff happens. What he says in the Sensemaking book is that he's not saying the data is crap or anything but he's basically trying to say, hey, data is not necessarily truth and you need to ask for which context was this data gathered in, what's the further context of the research, so he has one sort of thing that actually stuck in my mind. He calls that sort of the difference between the Savannah and the zoo.
Christian: So, you can put animals in a zoo ... You put them in the zoo and you observe something but you actually observe them in the Savannah in their actual real context-
Ben: You'll see something different.
Christian: You're gonna see something very different and that's actually ... Here's the other thing, I'm a big dog person as you guys all know and ultimately the company's named after my dog, et cetera, et cetera and we have all these dogs in the office, which everybody seems to be really happy with, I am anyways but I read something about that recently as well where ... I don't remember all the details of the context but a lot of the ideas of this term of alpha dog and domination and sort of canine society is kind of derived from observing wolves in captivity and then it turns out, they actually behave completely differently when they are behind bars-
Ben: Versus out in the wild, yeah.
Christian: Than if they're actually in the wild, and it's interesting because I don't think you need to spend more than half a second thinking about it and realizing that that's very true. It actually applies to people as well. I mean I think you would probably be behaving, or I know I would behave, very differently if you put me behind bars here.
Ben: So you're saying reality TV isn't true? Is that [inaudible 00:13:20]
Christian: What is true? That is the huge question here. So, this actually I thought was super interesting. That kind of led me down this other path which is ... so, one thing I know about myself, what I've learned about myself, and it took me a long time to learn that, is that I'm actually a fairly intuitive person. There's just different personality types and you can say, hey all of this psychometric stuff is made up and crap and what have you and what's the actual foundation and what's the data saying, blah, blah, blah, this type of stuff, but I found generally that it can help you observe yourself and see why am I feeling comfortable in this space and why am I not feeling comfortable in this other space. One of the things I learned by going through some of these things is that I'm a fairly intuitive type, and so that's another thing that kind of came out in this kind of approach to looking at ... Sort of questioning whether data really tells you the truth.
Christian: If you have people that can make decisions based on intuition and instinct and so forth and I thought that was just awfully interesting 'cause it kind of felt sort of related to me.
Ben: I think I connected with that part of the book too 'cause Christian is one of my top book recommenders now at this point. That Sensemaking book was great. I think I connected the most with the book when he got to that point where he talked about intuition and about experts being able to act on instinct because of all the experience they've accumulated over time. It's not a bad thing. It's actually even last week, we talked to Matt Ballantine and Matt was talking about the same thing and it just seems to be kind of this topic going on is that where do you use data and how do you use data and not discounting the intuition that people have built over time but then also realizing your own context and my intuition is not necessarily going to be correct for someone that's in a completely separate context too and that's ... I mean since data is driving so much of the technology we use today, that's an incredibly important topic.
Christian: Yeah, no, I agree but here's the kicker. So if you feel like you trust your intuition, which I think has advantages and if I wasn't able to trust my intuition this company would not exist.
Ben: Right, exactly.
Christian: Because the data would have probably said you're f'n crazy. You have no chance to compete against the people that are already in the market and blah, blah, blah, all these things, and I think to make bold decisions, sometimes you have to trust your intuition, but it's very tricky because if you get too full of yourself ... So the thing that I realized next was that maybe there is a self-serving aspect to [inaudible 00:15:41] of the argument in the book, that basically says I can make better decisions than the people who bring data, and then I got really worried because I like to serve myself as much as anybody. But what that led me to is, again, this is like this entire ... The internet is just a crazy thing where, when you hear about a new type of car and then suddenly you drive down the freeway and everywhere, you see this new type of car, or you see a particular license plate that has a particular number on it and suddenly every other license ... I find myself using the internet in my daily habits of ingesting information in a very similar way, where once I'm aware of a topic, then suddenly related things start popping up that I ... It's all serendipitous basically.
Ben: Thus, intuition is core, right?
Christian: Exactly, I guess, yeah, maybe.
Ben: It's making those connections.
Christian: But it just makes yourself ready to sort of perceive, right?
Christian: So, then I came across this other book which I think is ... I think it came out in 2016 and I think a lot of people have looked into it and it kind of sort of started a lot of discussion. The book has a fantastic title, it's called Weapons of Math Destruction.
Ben: Yeah, I love that title.
Christian: Math Destruction, and it's written by a math PhD and the main point ... I think one of the sort of underlying points is that this idea that data in itself speaks to any kind of actual truth, for whatever that definition is, is essentially complete horse crap, because the data gets collected by people, the data gets analyzed by people and those people bring their intuition to the task and they bring their biases to the task. And then I started looking into ... and then I just realized, and this is when I got into this whole loop about the self-serving aspect of trusting your intuition, and it's back to having to somehow find a balance because biases are a real thing, and so I started looking a little bit into bias. So basically, if you look it up in Google, it basically says it's prejudice in favor of or against one thing, person or group, and you go right to prejudice, I mean that's a pretty harsh word.
Christian: So, then I went to Wikipedia and this is just too funny to share. So, if you look at the Wikipedia article on bias, it has the neutrality flag turned on, so basically right under the headline, it has this [inaudible 00:18:14] box that's basically flagged for neutrality. It says, so this is the article on bias and there's this big box that says the neutrality of this article is disputed.
Ben: You can't make that stuff up.
Christian: No, you really can't and this was literally yesterday when I kind of looked all these things back up and it was just so funny. There's this other article that's called ... Like on recursion that people constantly vandalize and just replace the entire text with see recursion. Yeah, so Wikipedia's cool but there's so many ... And so you look at this and when you start wrapping your head around things like bias, then there is just so many and it feels like you're almost powerless, anchoring. You go and the used car sales guy gives you the initial price that's too high and he goes down two grand and you're like, okay, I'll buy it now but if you actually were to do your research, you're still paying two grand too much because the original anchor price was set up, so the anchoring is like the first thing that you basically hear about a particular topic usually forms your opinion and it's very hard to get past that and I think that's probably why these political attack ads work so well, because if you're sitting out ... that's why they spend so much.
Christian: They seem to work well, right? Because people spend so much money on it. Because if you have any sort of topic, ballot or particular person and they do that not just in terms of presidents, et cetera, but this goes all the way to municipal kind of elections and a local ballot things, so they try to basically anchor you on a particular opinion. It seems like generally this is almost impossible to train yourself out of.
Ben: Well and they're not changing opinions, they're picking something that's probably already there and trying to pull it out to a certain extent, right?
Christian: They're setting their opinion, right?
Christian: So, if you believe that anchoring is a cognitive bias that is ... And I think generally people believe that that is something that's true, for whatever definition of truth, but it seems to be more or less people have observed that that actually seems to happen quite a bit and so it seems that if this is true, then you rely on the first bit of information to make a decision.
Ben: Oh, okay, so it's the first thing that you hear and you build off of that?
Christian: Yeah, if I can [inaudible 00:20:22] and say if I'm the conservative guy and I have a liberal guy and then I come in there and I say something really nasty about this liberal person or about their policy ideas, et cetera-
Ben: Then everything works off of that.
Christian: Yeah, it's actually quite interesting. So there's lots of biases and confirmation bias is another one.
Ben: Well, tell me a bit more about that. So, what does confirmation bias actually mean in this context?
Christian: So, confirmation bias is essentially defined as focusing on information that supports our beliefs and paying less attention to information that contradicts that and also sort of if you have ambiguous information you will just assume that it supports your point. And we've been going ... I mean we've had our own game of this for a long time between ... There's engineering and product management and okay, so who has the data wins but it's not that easy.
Christian: Because your piece of data versus my piece of data, if we have different biases, I'm going to take my piece of data supporting my opinion much more seriously and so that's tough.
Ben: And there's something very natural about that because if you actually want to efficiently make decisions on some point, you can't just view all the data all at once, you have to ... It's always that balance, but then I guess recognizing that that's an issue is like the first step.
Christian: Exactly, and I think that's probably the best we can do in all of this and this is like my general takeaway is I don't think I look ... I mean I'm not an expert on any of this stuff but it doesn't look like there's an actual solution here and you just have to be aware and you got to train yourself to sort of stay aware of these things that are influencing you and that are natural, and you can't just shake them. So, you can't sort of turn yourself into a purely rational machine but the trick is that on some level, okay, so I trust my intuition but then I also learn about all these horrible biases that seem to be expressed by people by and large, so that ends up sending you for quite a loop and then other biases are obviously prejudice and classism and having opinions about people based on social class, I think that's something that's very popular, unfortunately. The rich usually look down on the poor. It's often by saying that there's some sort of moral failure.
Christian: I think in this country for sure. I think it happens in a lot of countries and a lot of places it's pretty tough and the other one that I like is lookism.
Ben: I haven't heard of that.
Christian: So, lookism, so basically you judge people by their look. So, take the news anchor thing. So the news anchors are usually really good looking because generally people seem to trust good looking people more, which that really works in my favor because-
Ben: Of course.
Christian: I'm extremely good looking after all, so. Yeah. So one quick way to figure out how to basically figure out what people are biased against and so this is the classic trick I do, you basically use this Google prediction type of [inaudible 00:23:17] prediction thing and so I did that [inaudible 00:23:20] I did it yesterday. So people are ... And you type biased against and it completes it. So, here's what I completed as of yesterday, action in congress, religion, and then introverts, bias against introverts is like the fourth hit.
Christian: I was like what did I ever do to you. Bias against conservatives, okay. Bias against mental illness. Bias against LGBT, unfortunate. Bias against Israel, a huge conflict there. Bias against homeless and then the last one, I'm not gonna say whether it's my favorite or not but bias against conservative students.
Ben: That's very specific.
Christian: Literally I have the screenshot. It's pretty funny.
Christian: So these things are like mirrors, sometimes, but I think the reality is that when you hear about this stuff, you might not actually internalize it to the same degree because I think most of us consider ourselves to be fairly intelligent and aware. I'm not walking around admitting to everybody on the street that I'm a victim to my biases, but it does come through at times and so, there're all kinds of sort of little games that you can play with yourself and try to observe yourself, try to put that second voice in your head that sort of observes.
Christian: So, for example, go through ... So, we're in startup land here, so we go to a lot of companies and we look at the about us pages and we look at the executive team.
Ben: Right, right, that's usually the first place I go.
Christian: Exactly, product marketing, we just look at this and so just try this at home, look at it and keep doing it for a couple companies until you hit one that has an African American CEO and then swallow and then see what your reaction was or a female CEO or I don't know, all executives are from India or something like that. So, my prediction is that what you will find in observing yourself is going to be surprising.
Christian: So, there we are. We want to sort of ... We want to rely on our intuition, biases are a real thing, so we have a bit of a problem here.
Christian: And then people go and say well, but that's why we want to use data in order to iron out all of these things because I'm not the first person to talk about bias, I think that's been around forever basically. So-
Ben: Especially recently, yeah.
Christian: This is not like new insight or so-
Christian: And then what you're getting is and this is where the book comes in that I was talking about just a little bit ago, this is where it gets really interesting because big data analysis and mathematical modeling and so forth ends up being introduced in order to overcome these biases. So, there's one example, this is from the Weapons of Math Destruction book where she's basically talking through a number of scenarios where people went from thinking that sort of qualitative assessments were subject to bias and tried to replace them with more quantitative approaches, so basically models and this type of stuff. And the entire book is essentially, chapter for chapter, a dissection of the good intentions leading to oftentimes, not always but there are some other things where it comes more like predatory lending and these types of things where I don't think you can claim good intentions but there's a lot of examples when it comes to ranking teachers and recidivism is the other example that's also been talked about in other places where the approach was to initially go and say, hey, all this reviewing of people that we're doing, whether it's judging teachers on how efficient they are or how good they are or how much they contribute really to their students advancements-
Christian: Right, I mean that's one way to define efficiency for a teacher if you want to use a cold word like that and then recidivism is about the chances of somebody who has committed a crime-
Ben: Going back to prison.
Christian: To basically commit another crime.
Ben: Yeah, yeah, right.
Christian: And so, when you look at teacher evaluations, then it's all based on peer feedback and so forth and you start looking at, well, but maybe the teacher kind of bought a new car for somebody, or that's a stupid example, but at the end of the day, it's like these kind of soft social things in there and then-
Ben: And then friendlier teachers versus not or something, yeah.
Christian: Exactly, right? And then people try to quantify that and then what happens is that they are focusing on test scores, and specifically there's this one example that she has from Washington DC in the book that is really kind of interesting where they had some ... They thought the school system was underperforming. They brought some reformer in. They built like a teacher assessment tool, which is a mathematical model of some sort and an algorithm basically. I don't think it was like super complex but so what happened was, they called this the value-added model, so they were basically tracking the test scores of students over a year and if the scores went up, that meant the teacher was good, and for those where the scores went down, the teacher got a lower score themselves, and they would cut the bottom 2% every year and then they would cut the bottom 5% of teachers, simply based on this evaluation of essentially the delta between the scores of the students year over year. And what happened, and this became like a national thing, there's like Washington Post articles on this, et cetera: they had a bunch of teachers who just ended up finding themselves getting fired and they couldn't really figure out why.
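Editor's note: To make the mechanics concrete, here is a minimal sketch of the kind of scoring described above, assuming the model is nothing more than the average year-over-year change in test scores per class plus a cut of the lowest-ranked teachers. The function names, teacher labels and numbers are all made up for illustration; the conversation does not spell out the actual model used in Washington DC, which may well have been more involved.

```python
from statistics import mean


def value_added(prior_scores, current_scores):
    """Average year-over-year change in test scores for one teacher's class."""
    return mean(cur - prev for prev, cur in zip(prior_scores, current_scores))


def bottom_fraction(score_by_teacher, fraction):
    """Return the teachers whose score falls in the bottom `fraction` (at least one)."""
    ranked = sorted(score_by_teacher, key=score_by_teacher.get)
    cutoff = max(1, int(len(ranked) * fraction))
    return ranked[:cutoff]


# Hypothetical data: (last year's scores, this year's scores) per class.
classes = {
    "teacher_a": ([60, 65, 70], [66, 70, 77]),
    "teacher_b": ([55, 60, 62], [57, 61, 66]),
    # Hypothetical case: last year's scores were unusually high,
    # so this year's realistic scores look like a steep decline.
    "teacher_c": ([90, 92, 95], [70, 72, 75]),
    "teacher_d": ([70, 72, 68], [71, 74, 70]),
}

deltas = {name: value_added(prev, cur) for name, (prev, cur) in classes.items()}
flagged = bottom_fraction(deltas, 0.25)

print(deltas)
print("flagged:", flagged)
```

In this toy setup the model only ever sees the delta. It has no way to tell whether a drop reflects the teacher's work or a prior-year number that was too high to begin with, and it gives the flagged teacher no explanation beyond the rank.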
Christian: And so the first problem that happened was that the score wasn't actually explained to them because, of course, the people who built this tool didn't want to sort of reveal the algorithm.
Ben: So, you could game it.
Christian: Right, and so what had happened to her most likely, in this case of this one teacher, was that the school that had the class previously ... So there was ... This was kind of, I think, between schools and between grades, and you know the education system here better, but she had them the first time in the first grade that they came over to this school, and at the outgoing school, they had like really high scores, and then when they showed up in her class, they could barely read. And so, of course, the scores dropped because she didn't inflate them artificially, but what then happened, what they reverse engineered, was that because of this value-added model, the difference ... The trajectory of the scores for the students went down and that put her in sort of the bottom rung and then she got fired, and I think she had to raise hell to basically get any kind of explanation out of the model. And this is kind of done generally. One of the things that people bring up that makes things very different and makes things very complicated when you are being judged by an algorithm is that the ecosystem around the algorithm believes the algorithm is correct, and that algorithms often can't explain their results.
Christian: You get that in machine learning all the time. I mean we went through this in our own product actually where anomaly detection, okay but why is it an anomaly and so we ... It's not that easy to explain actually and so if you have a product and it can't explain an anomaly that's one thing because people can just ignore it but if this type of reasoning and this type of machinery is being deployed against people, then it becomes potentially life-altering and so this is sort of one of the examples and you run into this paradox because the algorithm is essentially opaque and it's hard to explain but then when people complain about the actions that are being done based on the evaluation that's coming from the algorithm, they are expected to bring perfect evidence, but if you don't know how you're being judged, how are you gonna have perfect ... and you end up in this endless loop which is very unfortunate for the individuals.
Christian: And then the recidivism example goes into the same direction where basically what happens is that in the design of these instruments and these models and these tools, it's fairly easy to observe when you take a step back that there are clear biases that exist that simply reproduce the biases that were present before that people wanted to build these models for in order to eradicate but in the end, they'll just get the same biases but they don't come with an explanation anymore because a biased human at least has a mouth and if you put enough pressure-
Ben: They can explain themselves.
Christian: On them, they can at least try to explain themselves but if you have an algorithm and a bunch of data that basically spits out a recommendation to a judge in terms of what your risk of recidivism is, you might just end up getting a longer sentence and the longer sentence is gonna typically increase your risk actually because you're gonna be in prison longer, you're gonna be out of social ... Your regular social environment longer, all of these types of things. There's a lot of criminality or criminal stuff happening in prison, et cetera, so it's like another endless loop and so the way that this recidivism model that is being talked about in the book [inaudible 00:32:33] was that they basically ask people questions, it was a survey and then they computed a score and the questions for recidivism were how many priors do you have? Do you take drugs or alcohol? Where you grew up? When were you involved for the first time with the police? Et cetera, et cetera and those are basically all systemic questions to figure out by and large whether or not you're African American or Hispanic and you come from a poor area. It's sort of actually asking for it.
Christian: If you then look at the statistics for New York City stop and frisk, et cetera, and you look at how it breaks down based on race in terms of [crosstalk 00:33:12] to the population, I mean those biases exist in reality, and then you re-encode them into the models but then the outcome of that is that the model is explained as being mathematically sound and questioning the model becomes much harder if not impossible. And so the argument to take away from that then, if you follow that line of reasoning is that with these types of things, potentially things are getting worse than they were before because of unintended consequences and because of this potentially pliant belief that as soon as I have data, I'm right.
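To make the proxy effect described above concrete, here is a small Python sketch of a survey-based score. It is not the instrument discussed in the book; the question names, the weights, and both respondents are invented for illustration. The point is only that a score can separate people by where they grew up and how heavily they were policed without ever asking about race or income.

```python
# Hypothetical weights for a questionnaire-style risk score.
# None of these values come from a real instrument.
WEIGHTS = {
    "prior_arrests": 2,
    "police_contact_before_18": 3,
    "grew_up_in_high_crime_area": 2,
}


def risk_score(answers, weights):
    """Weighted sum of survey answers; a higher value means a higher predicted risk."""
    return sum(weights[question] * answers[question] for question in weights)


# Two invented respondents. Assumption: the second lives in a heavily
# policed neighborhood, so early police contact is far more likely to be recorded.
person_a = {
    "prior_arrests": 0,
    "police_contact_before_18": 0,
    "grew_up_in_high_crime_area": 0,
}
person_b = {
    "prior_arrests": 1,
    "police_contact_before_18": 1,
    "grew_up_in_high_crime_area": 1,
}

print(risk_score(person_a, WEIGHTS))
print(risk_score(person_b, WEIGHTS))
```

The gap between the two scores comes entirely from answers that track neighborhood and policing intensity, which is how an existing bias can be re-encoded into a number that looks neutral.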
Christian: And now you're coming full circle because there's really ... Now what do you do? Intuition is prone to biases, so okay, let's do data but how is the data being collected, how is the data being interpreted? It's not that the data or the algorithm is right themselves, there's always humans involved and the fact is that humans are just messy.
Ben: And bias may be harder to uncover in that instance.
Christian: So, I thought that was ... I think there's a lot of really interesting things going on there and I think it's an interesting discussion in light of the pendulum having swung pretty heavily towards looking at some of these sort of really amazing things that you can do with data, like, I don't know, detecting potential infections in prematurely born babies in a hospital two weeks before they actually show signs that humans can observe and then day after day, being able to sort of analyze seismic activity to predict earthquakes and that is some pretty cool stuff and stuff that really helps people but generally, the issue is that it looks like people have now analyzed and at least there is some set of people that are basically saying hey, there's potentially a cost to some of these applications of data and I think that's really what it's all about and it's about awareness at that point and trying to figure out whether, given the context, how you should interpret the data basically.
Ben: Yeah, hey, well, to wrap up, I want to ask you a question. So, you're a CTO in Silicon Valley where a lot of these algorithms are being written and programs are being written. When you think about it from that vantage point, how do you see the responsibility of these companies that are writing this code and creating these algorithms to address some of the things you're talking about?
Christian: Oh, it's very tough. Commercial interests are at work and we have a startup here. We have commercial interests. I cannot be a bigot about this and say it's the other people's problem. So, I think generally the discussion that has to happen is about the ethics around all of this. I don't really have a perfect answer or even anything close to a perfect answer but I find that following this train of thought and looking at references to these types of topics that we just discussed, it becomes a little bit clear to me that folks need to reflect on these types of things and I think nobody ... There's a couple of books on this and nobody really has an algorithmic solution to this because it is messy.
Ben: No, yeah, right, right, right.
Christian: And so, the Weapons of Math Destruction author suggests some sort of Hippocratic pledge, like for doctors-
Ben: That's interesting.
Christian: For data scientists and then if anybody who's listening wants to dive deeper into the sort of ethics part, there's this lady called Kate Crawford actually who is this super accomplished professor and she's been writing at a very high academic level about data bias and fairness and these types of things and one of the recent things that she wrote about is essentially sort of 10 recommendations for sort of things that you need to keep in mind when you're dealing with data and then so she's touching on things such as always assume that data are people. Don't assume that just because it's a public data set, it's properly anonymized especially when you link public data sets, you can often identify individuals. Don't trust your own anonymization because that can often be reverse engineered very easily because oftentimes there are people out there that have incentive to do that, et cetera, et cetera. So, I think there is a ... I mean there's another three podcasts in that alone, frankly, so yeah, it's an intellectual exercise on some level. I think you just sort of have to want to kind of try to solve this problem on some level but I think it's a sort of step-by-step becoming aware and potentially saying hey, we could build this feature but we won't.
Ben: Yeah, what does it mean? Well, this has been super interesting, Christian. It's nice to see how your mind is thinking about this and I think these will be pretty educational for people listening.
Christian: As I said, humans are messy and so am I.
Ben: But that's what makes life interesting. Well, everybody, thanks for tuning in and Christian, thank you for coming on. Check out the rest of the episodes. We've got some more episodes coming on this very set of topics, based on some recommendations we got from Christian and others and look forward to that in your feed. Thanks everybody.
Speaker 3: Masters of Data is brought to you by Sumo Logic. Sumo Logic is a cloud-native, machine data analytics platform delivering real time, continuous intelligence as a service to build, run and secure modern applications. Sumo Logic empowers the people who power modern business. For more information, go to sumologic.com. For more on Masters of Data, go to mastersofdata.com and subscribe and spread the word by rating us on iTunes or your favorite podcast app.