Free the Data (Guest: Clark Richey)
Ben Newton: Welcome to the Masters of Data podcast, the podcast where we bring the human to data. I'm your host, Ben Newton. In this episode we talk to Clark Richey, Chief Technology Officer and co-founder at FactGem.
Ben Newton: Clark has spent his career working with data and he co-founded FactGem to help companies get more out of their data. Clark is passionate about freeing data from its constraints and breaking down data silos that hamstring good decision making in today's business world. So without any further ado, let's dig in.
Ben Newton: Welcome everybody to the Masters of Data podcast. As always, I've got another great guest on. Today we have Clark Richey, who is the Chief Technology Officer at FactGem. Welcome Clark.
Clark Richey: Thank you. Glad to be here.
Ben Newton: Absolutely. Good to have you on here. It took us a little while to actually get this on the schedule, but now we've got you here and I'm excited to talk to you. As always, Clark, what I love to do is just start off and understand what makes you you, like how did you end up as a CTO in kind of the data realm? Tell us your story.
Clark Richey: Thank you. So I actually came into computer science out of the military. I was active duty in the United States Coast Guard. Coming out of the Coast Guard and into computer science, I pretty naturally landed in a spot where I was doing a lot of government work, so I spent the majority of my career as a computer scientist in the defense and intelligence spaces primarily, working for DOD and a lot of the big three-letter agencies as a contractor for various companies.
Clark Richey: Prior to starting FactGem, I was the Director of Public Sector Sales Engineering for a database company out of San Carlos called MarkLogic, again mostly government. It was public sector, but really it was primarily defense and intelligence.
Clark Richey: That was a great background, because I had gotten the opportunity to work with some of the largest, most complicated data systems on the planet and see how those things were being approached, what challenges there were, what techniques people were using and so on, so that really got me thinking about a lot of stuff.
Clark Richey: Then almost seven years ago now I got the opportunity to co-found FactGem with Megan Kvamme, our CEO, and yeah, I jumped at that and here we are.
Ben Newton: You live in DC, right?
Clark Richey: I live just north of DC, up in Westminster.
Ben Newton: Oh, okay. I lived for years up in Maryland, so I love the area. Between Connecticut and Georgia, outside the beltway, it is definitely a great area. I worked with a lot of people like yourself with that background, so I can definitely understand. And like you said, I feel like my experience consulting with the Department of Defense and other various agencies was a really good way to get into the technology area, because they're using so many different things.
Ben Newton: I remember when I was... I actually worked for the US Army for a little while as a contractor, and I mean we were running the biggest email system on the planet short of Yahoo at that point in time, in the early 2000s, so it was a really good experience.
Clark Richey: They're doing great things.
Ben Newton: Yeah. Absolutely. So back to kind of what... Why did you guys start FactGem? What problem were you trying to solve?
Clark Richey: Yeah, so it was in collaboration. Again, our CEO, Megan, she came out of sort of an investment banking background, and as part of that she was doing really large deals that required her to put together models that were often hundreds of spreadsheets. She was frustrated with that and wanted to understand why isn't there a way where I can take all of these spreadsheets that are about the same thing and bring them together in one spot and understand the big picture?
Clark Richey: She actually talked to a bunch of people in Silicon Valley and other places and they said no, that's just too hard, you can't do that. Around the same time, particularly while working at MarkLogic... It was a great database, still is, but I saw that as the big data space was maturing and interesting tools were coming out, those tools were being targeted for the most part at engineers, at computer scientists.
Ben Newton: Yeah.
Clark Richey: Well, that's great. But I felt that if we worked a little harder on the product side, we could let data analysts, data scientists, and line-of-business managers actually perform data operations themselves, without the time and expense of having computer scientists do it. And we could let the computer scientists work on the really hard problems, not just "I have a whole bunch of data and I need to manage it in a good and careful way at scale."
Clark Richey: So I wanted to see if we could tackle those, so the two really came together. We combined those. Now we've been very focused on let's get the relationships between data, kind of like the LinkedIn for data, and helping people understand data within context, but doing so in very rapid ways without having to employ big teams of engineers.
Ben Newton: This is super interesting, and I think, Clark, when you talk about getting different people into the data, that's something that's come up before on the podcast in talking to people, and I think it's a really interesting topic, because one of the things I've seen is for a lot of these companies out there to become more competitive and, you know, move faster, you can't always have like the technical organizations, the engineers, be like the translators, the high priests of data, right?
Ben Newton: So what happens is you get these data silos like in the organization. So talk to me more about that, because I know it's something you've thought a lot about. I mean what are you seeing? What do you think that you're seeing these kind of... What's the right word? Basically kind of disorganized data strategies in a lot of these companies. What do you think is happening?
Clark Richey: Yeah. There's a lot there. Data silos in general, I mean that's a part of it, and I really... When we talk about a data silo we're typically talking about some database, some data warehouse that is essentially isolated, right? It has a narrow lens into a piece of the organizational or corporate data.
Clark Richey: Those were created for good reasons. They're often still very valuable. They were created because, if you look at the tools that IT has had for 50 years or so, at scale that's kind of what you had to do. You had to separate things to get them to perform the way you needed them to.
Clark Richey: But as I think about these silos now I'd like to draw an analogy and say that in a lot of ways data silos are like dinosaur fossils. That is if I want to understand... If I find a fossil and I want to understand how did that dinosaur live and what was its family structure and why did they die out, I can get a lot of information by really analyzing that fossil, but I can't really see what did it eat. I can't understand the other things around it and the context and how it related to its world and other things.
Clark Richey: I might be able to go and find other fossils in the area, like a fossilized plant, and study that as well, but then I've got to do a whole bunch of more math and things to figure out is it really contemporary to this other thing I found? So I can kind of start to link things together, but it's really, really hard.
Clark Richey: Data silos are the same way. You're able to look at one thing, but you've lost all of that context. And yes, maybe you can go to a different data silo and start to piece it back together, but it's super hard to do that and it consumes a ton of time and resources, and you lose data fidelity as well.
Ben Newton: That makes a lot of sense. I guess these things kind of naturally form, right? I've talked in other contexts about Conway's Law, which basically says that when people build systems, those systems reflect the organization's communication structure, and it seems like even with data and how people use data, it's really reflective of the business structure. That's really hard to overcome, because there's just a natural pooling of data where people work together.
Clark Richey: Right. And that's been supported by the state of data storage and databases for the last 40 or 50 years. It really mirrored that. You couldn't reasonably say I want to have all the information together in one spot. I mean people tried to do that in the last five or six years or so with the new data lakes and they thought that would be the big answer. As we're clearly seeing now, like what's happening with MapR and other companies, that wasn't the answer.
Clark Richey: All you've done now is create a different place with a bunch of stuff, and a whole lot of really expensive engineers are now required to do things. But I think we can get there now. There are new technologies, and we can start to bring those big pictures together in ways that are cost effective and give people that larger picture.
Ben Newton: You mentioned the concept there of data lakes, so maybe take a little bit of a tangent there, because I... Personally that term comes up all the time, and even as a person in the field I have a hard time pinpointing what it actually means. Where does that term come from, and what do people mean when they say a data lake?
Clark Richey: Yeah, don't feel... You're not alone, because I think a lot of times people do use that term to mean very ambiguous things. If you really pin it down, it goes back to the beginnings of Hadoop and the Hadoop Distributed File System, which provided IT organizations with a very inexpensive, highly resilient way to store data on servers.
Clark Richey: I've seen implementations in the government where these exceed a petabyte. That's massive amounts of storage. So the thought was hey, you know, these data warehouses we know are giving us silos. They are providing a keyhole into the organization data. That's not ideal, but we can't really go... We can't put everything into a single data warehouse. The technology isn't scaled that way.
Clark Richey: What if we just dump everything into this ginormous sort of virtual file system? That seems like a great thing to do, and then it's kind of the old South Park skit. Step one, steal all the pants, step three, rule the world. There's that missing step two that they have. They go oh, yeah, step one, data warehouse or data lake, step three, problem solved.
Clark Richey: What's step two? No one wanted to talk about step two, because step two involved getting a lot of very expensive engineers who were familiar with MapReduce and distributed file systems and so forth to write a lot of code to do something, and if you wanted to change that, it was back to engineering.
Ben Newton: I've talked with people before that there's... You know, there's kind of a sense that big data kind of failed in a lot of these efforts. It sounds like probably what you're saying is that it was like a skills gap and like lack of... I don't know, just kind of throwing technology at a problem instead of really thinking through it.
Ben Newton: Do you feel like that's basically what happened, is that organizations looked at the technology or read it in a CIO magazine and were like hey, this will solve it for me, and then they realized it's a lot more complicated than they thought and they didn't have the skillsets to solve it?
Clark Richey: Yeah. Actually I think this happens all the time. Gartner has the famous hype cycle with the trough of disillusionment and everything, and that was certainly happening. At the same time, you had really smart people at really big companies, again like MapR, who were saying this is absolutely the right way to do it. They had really smart people that could explain it, and they had good tools.
Clark Richey: But if we think back, and I'm not going to be making any friends in my software engineering community saying this, it's software engineers writing tools that are designed to be used by other software engineers, often on consulting engagements, and that's essentially a permanent lock-in for software engineering: I'll create a product that's going to force you to employ more software engineers to do stuff.
Clark Richey: It's great for software engineers, not necessarily so great for businesses. That's what people ended up seeing. It just became too expensive to do. The market is really starting to wake up to that and demand more agile, lower-cost solutions that just weren't being provided by those companies and that technology.
Ben Newton: Yeah. Yeah. That makes a lot of sense. I guess that could be a whole different conversation, but what you're getting into is that's one of the hardest problems to solve about developing products and solutions in general, is they tend to be developed by a set of people who then design for themselves, and particularly in a lot of these... When you're solving this, the real audience when you start to make this data worth its weight in gold is not with the people writing code. It's with the people that are running the business and making business decisions, but they're almost never involved in the actual implementation of these, right?
Ben Newton: So we have these organizations that have naturally over time, just because they're human and this is the way that humans work, they're creating data silos. They're dumping it in overpriced data lakes that aren't really working.
Ben Newton: I guess one thing... Data lakes come up a lot in the cloud era, and you hear that a lot with some of the clouds that are out there. Have some of these offerings in the cloud world changed anything, or do you think they're really just exacerbating the problem?
Clark Richey: I think that it's really just exacerbating the problem by shifting the problem away from your servers to theirs. I mean think about... They're a fantastic way if you're Google or Microsoft or Amazon and you're selling server time. What better way to spin more dials on the server than to say move all your data to my storage? That's great.
Clark Richey: Then of course they say and now use my technologies, my Amazon Redshift, or whatever that technology choice might be, to then look at your data. It's still a giant data lake. They're still tools built by engineers for engineers. They're still not super flexible. You're still taking context out of the data.
Clark Richey: Are they starting to change them? Yeah. Both Amazon and Microsoft, for example, have in the last year released major graph based database offerings that are starting to shift the needle on this in saying yeah, you can still use our servers, but here's a technology that actually allows you to keep the context, to keep those relationships, and that's going to be really important going forward.
Ben Newton: I did notice that... And I want to get back to the whole relationship thing you were talking about, but I do remember when I went to AWS re:Invent last year, it was funny to see the emphasis on moving that data, where literally they'll pull a truck up to your data center, or they'll have these on-premises devices that you put in the data center to suck up all your data so you can move it.
Ben Newton: That seems to be kind of shifting the... Like that's the cloud migration strategy, is to like I'm going to suck up all your data and move it, which makes sense, because your data is like the real heart and soul of the business.
Ben Newton: But, as you're saying, they're really only just now getting to the point where they may actually start solving problems as opposed to just dumping it from one place to another.
Clark Richey: Yep. I worked with some really big organizations that said hey, we're going to have this new cloud strategy and we're going to go to Amazon, for example, for all our big, critical stuff. You ask them... You say, "What are you going to do when the region fails?" "Oh, that's Amazon. It's not going to fail." Look at the cloud. About once a year you get a regional outage. It might be an hour, sometimes it's like two days, and small businesses go out of business.
Clark Richey: You might be really big, but when you call Amazon and you're upset because your business is down because the region is down, they really don't care. You're not that big. Sorry. They know it's a problem. They're going to fix it, but you've got to deal with it. And yes, there are multi-region failover strategies that you can utilize, but you're now really racking up some server time.
Clark Richey: People are finally starting to realize too while the cloud is great, it is not foolproof. Cloud servers go down, mistakes happen, and you have to have a strategy for dealing with that just as you do on your own servers.
Ben Newton: Yeah. No, no. Absolutely. It doesn't... It's not like the problems go away. You've been talking a lot about this idea of relationships and context, and I agree with you. I think that's really important. So talk to me a little bit more about what you actually mean there. When you say retaining the context, retaining the relationships, what does that actually mean?
Clark Richey: Nothing happens in a vacuum, right? So if you're a retail company, for example, sure you need to understand your customers and your products and your transactions and information about your store and so on, but you can't look at each of those things in isolation. You can't, for example, just look at store traffic and understand how is the business really doing.
Clark Richey: You have to understand who is shopping at the store, and when, and what they are buying. Even going further out, is the weather impacting things on a certain day? What does inventory have to say about this? It's a big picture you need in order to really understand the business in a deep way.
Clark Richey: Unfortunately, until recently technologies didn't really do that. The great sort of irony of relational databases is that they're no good at relationships. No, they're not. They were really never designed to do that.
Clark Richey: I've actually worked on government projects where I looked at a big relational schema and I got very, very confused because I knew the product and I saw that... If you've ever looked at a relational diagram, you see tables, like little rectangles, and lines drawn between them that represent joins, and there were no lines. And I asked the government project manager... I said, "I'm very confused. There's no lines." He says, "Oh, yeah, yeah. Those were slowing things down so we removed all of the foreign keys from the database," and that's usually the telling thing.
Clark Richey: You can actually do that, and it still works because the relationships are kind of a lie we tell ourselves in a relational database. They don't really exist in there. They're a myth we all agree to believe in, but they don't work super well.
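To make Clark's point about foreign keys concrete, here is a minimal sketch using Python's built-in sqlite3 module; the tables, columns, and data are hypothetical. A relational join matches equal column values at query time, so it returns the same rows whether or not a foreign key is declared. Dropping the constraint only removes the integrity check, which is why the project he describes still "worked."

```python
import sqlite3

def build_db(enforce_fk: bool) -> sqlite3.Connection:
    """Create two tiny hypothetical tables; declare a foreign key only if enforce_fk is True."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON" if enforce_fk else "PRAGMA foreign_keys = OFF")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    ref = " REFERENCES customers(customer_id)" if enforce_fk else ""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        f"customer_id INTEGER{ref}, product TEXT)"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 'umbrella'), (11, 2, 'boots')")
    return conn

def joined_rows(conn: sqlite3.Connection) -> list:
    # The join matches equal values at query time; no stored link is followed.
    return conn.execute(
        "SELECT c.name, o.product FROM orders o "
        "JOIN customers c ON o.customer_id = c.customer_id ORDER BY o.order_id"
    ).fetchall()

def accepts_orphan(conn: sqlite3.Connection) -> bool:
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO orders VALUES (12, 99, 'scarf')")
        return True
    except sqlite3.IntegrityError:
        return False

for enforce in (True, False):
    db = build_db(enforce)
    print(enforce, joined_rows(db), accepts_orphan(db))
```

Both configurations return the same joined rows. The difference is that only the version with a declared foreign key rejects an order pointing at customer 99, who doesn't exist.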
Ben Newton: So what do you do? What's the way forward?
Clark Richey: Personally I think the advent of the stable productized property based graph is really, really a tipping point in the technology. It's not going to solve every problem. There are definitely problems you're still going to want to solve with a relational database or a columnar database, but when you really need to understand the context, understand how customers are related to products and transactions, you need to understand supply chains, for example, and how a supply side disruption is going to trickle all the way down to a product in the store which can be hundreds of degrees of separation away, you have to have all those materialized relationships that you can do analysis on and really understand what is happening at a big picture scale.
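The supply chain case Clark describes comes down to following materialized relationships outward from a disrupted node. Below is a minimal sketch of that idea as a breadth-first traversal over a small, entirely hypothetical supply chain held in a Python dict. It is not how any particular graph database implements traversal. It only shows why stored edges make "what does this disruption reach, and how many hops away?" a direct question.

```python
from collections import deque

# Hypothetical supply chain: each edge means "feeds into".
FEEDS_INTO = {
    "lithium_mine": ["cell_factory"],
    "cell_factory": ["battery_plant"],
    "battery_plant": ["drill_assembly", "flashlight_assembly"],
    "drill_assembly": ["store_sku_drill"],
    "flashlight_assembly": ["store_sku_flashlight"],
    "steel_mill": ["drill_assembly"],
}

def downstream_impact(disrupted: str) -> dict:
    """Breadth-first walk from a disrupted node; maps each reachable node to its hop distance."""
    hops = {disrupted: 0}
    queue = deque([disrupted])
    while queue:
        node = queue.popleft()
        for nxt in FEEDS_INTO.get(node, []):
            if nxt not in hops:  # in BFS, the first visit gives the fewest hops
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[disrupted]  # report only the affected downstream nodes
    return hops

print(downstream_impact("lithium_mine"))
```

In a real chain the same walk can run hundreds of hops deep. Breadth-first order is used here so that each node's recorded distance is the shortest path from the disruption.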
Clark Richey: That's what property graphs are really all about. There's a whole big movement in data science now to redo a lot of existing algorithms and work based on graphs now. I'm not a data scientist, but if you look back on the original work, a lot of the math, the original math, if you strip away the computer side it's graph theory, right? It's how are things related?
Clark Richey: But then what happened was there was no way to implement that on the computer side, so we sort of flattened it out. Now people are going back and saying let's put this back in graph theory. There are even some really big efforts happening to redo neural networks, because the most popular implementations of neural networks now aren't actually graphs. They're flat, table-type structures. So it's a fascinating area of development.
Ben Newton: Let's pull that apart a little bit, particularly for people that are not as familiar with this concept. So when you say a relational database, I guess what we're talking about is you might have a table of sales data, and it might have a column that's like customer... That links to another table that has the customer information. You know, for people that aren't DBAs, is that what a relational database is? Am I getting it right?
Clark Richey: Yeah. Absolutely. A lot of people are familiar with Excel or a spreadsheet, so a table in a relational database looks like an Excel spreadsheet. It's rows and columns, right? Just like you said, if those rows and columns are information about a customer and I want to understand the things the customer bought, I have another sheet or another table that's information about the product, and then I have to find some way to connect those two things, like putting a product ID in the same row as the customer ID to indicate that there's some relationship between them.
Ben Newton: Yeah. Okay. Talking about the graph, when you actually say a graph database, in talking about... What does it actually mean? I mean what are we talking about?
Clark Richey: One of the things I love about a graph database is that I think it's much more intuitive to people; it is much more natural in terms of the way we think. For example, if I were to ask you or any of your listeners to explain your job or your business to me, you would probably not go to a whiteboard and start drawing rows and columns and putting information into cells and tables.
Clark Richey: We don't really think that way. You'd probably start drawing some circles that represent people or places or products or insurance policies, and you talk about those, and then you draw a line to something else to show, for example, that a customer has an account which can have an insurance policy associated with it, et cetera, because that's kind of the way we think about things. We think about concrete entities, people, places, things, and how they relate to other things.
Clark Richey: That's a graph. Those circles and those lines with arrows on them, that's a graph, and it's a very powerful data structure for expressing these concepts and doing analytics on them.
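Clark's whiteboard picture maps almost directly onto a property graph: nodes with a label and properties (the circles) and typed, directed edges (the lines with arrows). The sketch below is a toy in-memory version in plain Python, using his customer, account, and policy example. The class and method names are hypothetical, and this is not how any real graph database stores data. The point is that the question "which policies does this customer have?" is answered by following edges rather than by matching ID columns.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    label: str                                    # e.g. "Customer", "Account", "Policy"
    props: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)     # node id -> Node
    edges: list = field(default_factory=list)     # (from_id, rel_type, to_id)

    def add_node(self, node_id: str, label: str, **props) -> None:
        self.nodes[node_id] = Node(node_id, label, props)

    def relate(self, src: str, rel: str, dst: str) -> None:
        # A directed, typed edge: the "line with an arrow" on the whiteboard.
        self.edges.append((src, rel, dst))

    def out(self, src: str, rel: str) -> list:
        """Nodes reached from src by following edges of type rel."""
        return [self.nodes[d] for s, r, d in self.edges if s == src and r == rel]

g = Graph()
g.add_node("c1", "Customer", name="Ada")
g.add_node("a1", "Account", opened=2019)
g.add_node("p1", "Policy", kind="auto")
g.relate("c1", "HAS_ACCOUNT", "a1")
g.relate("a1", "HAS_POLICY", "p1")

# Customer -> Account -> Policy, read straight off the edges.
policies = [p for a in g.out("c1", "HAS_ACCOUNT") for p in g.out(a.id, "HAS_POLICY")]
print([(p.label, p.props["kind"]) for p in policies])  # [('Policy', 'auto')]
```

A production graph store would index edges by source node instead of scanning a list. The shape of the model, however, is the same one you would draw on the whiteboard.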
Ben Newton: I think that helps to kind of go through that, because then you know... I've even seen... Having been in the technology world for a long time, we tend to throw words around and then you find that people don't actually really know what they mean.
Ben Newton: I think that's really helpful, because fundamentally you're absolutely right. It's like every time... I do a lot of interviews for my job to see if they're like people that we want to bring onboard, and it's like literally one of the first things we ask is like, "Okay, tell me about your organization." I'm literally drawing circles with lines in between them on the board, because that is how we think. It's human relationships.
Ben Newton: I remember we actually had... We had a guy on the podcast before in education and he was, you know, drawing graphs to talk about social relationships and how that affected educational change and reform. When you put a term like graph database on, people are like I don't know what you mean. But all we're really talking about is relationship between things, people, whatever it is. It's representing that.
Clark Richey: Yeah. And it's great. It's one of the things I love most about it, because I can go to a prospect or a customer I work with, and I can show any one of your listeners or yourself the most complex models that we've done in a graph to represent a business, and I can just put it there and walk away, and you can look at it and you will immediately understand what's happening.
Clark Richey: You'll go oh, yeah, customers make transactions that have line items and products, and those transactions occur in a store. If I do the same thing with a relational database or a columnar database, that's voodoo. I mean I've even created some myself in the past and come back to them three months later and looked at them and gone wow, I have no idea what I did here, because they're complicated and you basically have to have a degree in engineering to understand them. These you don't. They're very accessible.
Ben Newton: That makes sense. So going back to where you originally started this, what you're saying is that part of this transition to breaking down the data silos, and making these lakes or ponds or whatever data you've got spread around useful, is to take this concept of a graph, of these relationships. So what does that actually mean practically? What are you actually doing to make that data useful?
Clark Richey: That's right. It's breaking down those barriers. It's letting the people who understand the business best, whether that's at the C-suite or someone who runs a line of business or business analyst... It's letting them essentially go to that whiteboard and say this is how the business works, and even facilitating large organizations, facilitating that conversation amongst groups.
Clark Richey: Then once you do that, that's how we store the data, so when you draw it on the whiteboard it actually gets stored in your IT systems exactly that way. That's very different from what traditionally happens, where an IT group translates it into the technology, even though the technology looks nothing like that. And when that happens you have this disconnect, because the business thinks about the business in a certain way and tries to ask questions based on that, but those questions don't always translate well and you get friction. You get this cognitive disconnect, and that causes errors and problems in the business.
Clark Richey: But if you can eliminate that, so that the way the business thinks and reasons about the data exactly matches the way IT is storing it, together you can do much more powerful things.
Ben Newton: I mean what are some of the practical outcomes that you're seeing? Is it really just about enabling these business people to be able to do more with what they have or-
Clark Richey: Yeah. I think there's basically three areas we see there. There's areas where organizations are able to get to some sort of an end result right now, but it's very expensive, typically in terms of manpower and time. There's scenarios where yes, I can have a team of people that they go and they ask a question of data source one, two, three, et cetera, each time pulling down information locally, then going through some manual process to combine those together to then get an answer. We used to do that all the time. It's expensive and time consuming.
Clark Richey: So by eliminating that and saying we'll do this together and it's connected, then you just ask your question. You've now shortened that time span. You saved a ton of time and money. That's certainly one big area where people are realizing the benefits from this.
Clark Richey: Another big one is very often there are questions people get trained not to ask essentially. You start to understand that the data is siloed in certain ways, and asking questions that cross silos is expensive and time consuming, so you start to think about how can I try to solve the problem without asking those questions?
Clark Richey: Now, like I say, no, no, you can ask those questions. Look, it's really easy to, for example, understand how weather is impacting sales, because you can connect the weather to the location, to the transaction, to the customer and so on, and now you can ask that question and it's easy, so really being able to broaden people's horizons and allow them to get super creative and do and try new things.
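Here is a small illustration of the weather question Clark mentions, once the records are connected. It is a minimal sketch in plain Python with made-up stores, dates, and amounts. Each transaction carries its store and date, and each (store, date) pair links to the weather observed there. "How does weather affect sales?" then becomes a short walk from each transaction to its weather, with the amounts totalled per condition. No external library or product API is assumed.

```python
from collections import defaultdict

# Hypothetical connected records: each (store, date) links to observed weather.
weather = {
    ("store_1", "2019-06-01"): "rain",
    ("store_1", "2019-06-02"): "sun",
    ("store_2", "2019-06-01"): "sun",
}
# Each transaction links to a customer, a store, and a date.
transactions = [
    {"id": "t1", "customer": "c1", "store": "store_1", "date": "2019-06-01", "amount": 40.0},
    {"id": "t2", "customer": "c2", "store": "store_1", "date": "2019-06-02", "amount": 25.0},
    {"id": "t3", "customer": "c1", "store": "store_2", "date": "2019-06-01", "amount": 15.0},
    {"id": "t4", "customer": "c3", "store": "store_1", "date": "2019-06-01", "amount": 10.0},
]

def sales_by_weather(txns: list, wx: dict) -> dict:
    """Follow transaction -> (store, date) -> weather and total the amounts per condition."""
    totals = defaultdict(float)
    for t in txns:
        condition = wx.get((t["store"], t["date"]), "unknown")
        totals[condition] += t["amount"]
    return dict(totals)

print(sales_by_weather(transactions, weather))  # {'rain': 50.0, 'sun': 40.0}
```

The same links let you extend the question, for example to group by customer as well as weather, without building a new data mart first.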
Clark Richey: Then the third is just really agility. There's a lot of times when senior leaders go to the IT shop and through no fault of the IT organization they say, "Hey, I'd like to know the answer to this question." IT goes, "Well, you know, the current technology is the way it is. Okay, I have to bridge two or three data silos, so what I'll probably just do is I'll create a new data mart," a data mart essentially being like a smaller data silo that's built for a specific purpose, maybe in this case to answer that senior leader's question.
Clark Richey: "That's going to take me six or seven weeks and it's going to cost you a couple hundred thousand dollars of internal billing." "No, I need the answer in three days. In six weeks it's too late," so those projects don't happen. But now you have the agility to answer those.
Clark Richey: So I think it's the combination of agility, a real shortening of time to value, and unleashing the intellectual and creative curiosity of the analysts, so they can really explore the data and understand it in new ways that were impossible before.
Ben Newton: That makes a lot of sense. I think whenever you break down barriers and speed things up, as you say, people start doing things that they might not have envisioned before, because the very fact of being able to do it faster just from a human level, we start using it more, you start interacting with it more, and then you start seeing things you never saw before, so that makes total sense.
Ben Newton: So we've gone from data silos to data lakes to graphs to data marts. We've got all the data words. We're making this transition, and clearly... You know, it's not just about throwing your data in one place. It's about thinking about the relationships and putting those in the business context.
Ben Newton: Where is the industry going now? As somebody just in the middle of this every single day and you're talking to people who are doing this, where do you see things going?
Clark Richey: I think businesses are headed more towards graph. The government has actually adopted this sooner than commercial; I work in both spaces, and this is a space where the government is an early adopter. They've been doing this for a little while now because of the nature of their work. Commercial, I think, is catching up. It's about risk factors, certainly.
Clark Richey: But I'm seeing more and more... Let's say five years ago I would meet with a prospect and we'd eventually talk about graph data. They had no idea what a graph database was or why they should care about it. Now people are actually coming to us and saying, "You know, I've got a graph problem. I have a relationship problem. I need to understand these relationships. I need help doing that."
Clark Richey: So leaders are starting to become aware, through mainstream adoption by really big companies, especially Microsoft with Azure Cosmos DB and Amazon with Neptune, that yeah, this is a commercially viable technology. This is real. Let's start looking at this.
Clark Richey: I think businesses are getting a little bit frustrated by how hard it is to get access to data, and the market pressure being placed on a lot of more traditional companies by huge players like Amazon and BJ's and Walmart and so on is forcing people to think long and hard about how they can do more with their data to stay competitive.
Ben Newton: Absolutely. I think that's another part of that whole transition that we're seeing right now. It's like you move fast, you move smart, or you die. There's really no other way.
Clark Richey: Yeah. Absolutely. While those market pressures are strong, they're also of course creating opportunities, right? There's really opportunity for smaller players, even in the digital space, to do really well through... You know, personalization is a big topic right now. How do I make it a personalized customer experience? But that really requires a wide breadth of data about that customer and what they're doing on your website, what they've done in the past, and who their friends are on Instagram and so on. And that requires understanding of relationships and connections.
Ben Newton: Yeah. Absolutely. You know, Clark, I think this has been a great discussion. I mean I even... As much as I read up on this stuff every day, I know I've learned something from this. So I think that kind of transition from old school databases to data lakes, and now through like this concept of really using the interconnectivity of the data to understand I think is fascinating, so I appreciate you taking the time to come on and explain that to us. Thanks for being on the podcast.
Clark Richey: Oh, thanks for having me. I had a great time.
Ben Newton: Thanks everybody for listening and as always, you know, rate us in your favorite podcast app, wherever you go to find podcasts, so that other people can find us, and look for the next one in your feed. Thanks everybody for listening.
Speaker 3: Masters of Data is brought to you by SumoLogic. SumoLogic is a cloud native, machine data analytics platform delivering real time continuous intelligence as a service to build, run and secure modern applications. SumoLogic empowers the people who power modern business.
Speaker 3: For more information go to sumologic.com. For more on Masters of Data go to mastersofdata.com and subscribe and spread the word by rating us on iTunes or your favorite podcast app.