Enterprises face challenges in adopting graph technology, which represents a paradigm shift for most organizations. Part of that shift is a change in mindset: empowering more and more people to envision the possibilities and begin thinking in graphs, which in turn drives demand for graph technology from the business side up.
In this week’s five-minute interview (conducted at GraphConnect 2018 in NYC), we spoke with Dan Woods about his talk at Neo4j GraphTour and his conversations with people implementing graph databases in an enterprise context.
What did you speak about at Neo4j GraphTour?
Dan Woods: My talk was on how to find the hundred-million-dollar graph query.
What it came down to is that if you are really going to adopt graphs systematically, you have to do a number of things. I walked through a typical enterprise maturity model: you first find initial use cases, gradually create a center of excellence, expand it and make it operationally secure, and then put it into production. And you have to think about all those things at once.
That’s pretty much the story with any technology: How do you take it from an experiment or skunkworks project all the way to production?
What do you think is the key to adopting graph technology in the enterprise?
Woods: I found that two things are key to systematic graph adoption.
One is how to enable a large number of people to understand the power of thinking in graphs, helping them understand what questions they can ask with a graph query, a graph algorithm and graph analytics.
If people understand that they can use analytics both to answer specific questions and to examine an entire graph and learn things about it, then when they are working with data, they’ll say, “Wow! If I had a graph, I could answer this question. That would be really cool.” That drives demand for creating graphs and points the way to really powerful applications, because the kernel of the idea started with someone in the business.
That’s the first thing: How do you get as many people as possible thinking in graphs?
The second thing is, once you have those ideas bubbling up, you have to build the graphs.
There are lots of examples of taking relational tables and putting them in a graph. But when you start creating graphs that draw on five or six sources of information, or 10 or 20, you have a graph ETL problem. With relational ETL, you take a source of data, massage it into tabular form, dump it into the RDBMS, and then you’re done. All the complexity of knitting the data together happens later, in the SQL query.
With a graph, it’s different. You collect the data and put it either into a node or into a property on a connection.
Let’s say you dump a bunch of nodes in. Now you have to have a program that understands how to connect those nodes and how to assign the properties to the connections. That’s no longer just dumping data in and then having the complexity come later in the query. That’s a complicated program. And then, of course, you have to understand when you’re updating data and when you’re not.
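To make the two-step pattern Woods describes more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Neo4j’s or Woods’ method: the in-memory `Graph` class, the `Customer`/`Invoice` labels, the `BILLED` relationship type and the sample rows are all hypothetical. Nodes are loaded from each source first; a separate connecting step then decides which nodes to link and which properties the links carry. Both steps are written as upserts, so re-running the load updates existing data instead of duplicating it.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Hypothetical in-memory graph: nodes keyed by (label, id), values are property dicts.
    nodes: dict = field(default_factory=dict)
    # Relationships keyed by (start_key, type, end_key), values are property dicts.
    rels: dict = field(default_factory=dict)

    def upsert_node(self, label, node_id, props):
        """Create a node, or merge new properties into an existing one."""
        key = (label, node_id)
        existing = self.nodes.get(key)
        if existing is None:
            self.nodes[key] = dict(props)
            return "created"
        existing.update(props)
        return "updated"

    def upsert_rel(self, start, rel_type, end, props):
        """Connect two existing nodes; merge properties if the link already exists."""
        if start not in self.nodes or end not in self.nodes:
            raise KeyError(f"cannot connect missing node(s): {start}, {end}")
        self.rels.setdefault((start, rel_type, end), {}).update(props)


# Hypothetical extracts from two source systems that both know about customers.
crm_rows = [
    {"customer_id": "c1", "name": "Acme Corp"},
    {"customer_id": "c2", "name": "Globex"},
]
billing_rows = [
    {"invoice": "i1", "customer_id": "c1", "amount": 1200},
    {"invoice": "i2", "customer_id": "c1", "amount": 300},
    {"invoice": "i3", "customer_id": "c2", "amount": 950},
]


def run_etl(graph):
    # Step 1: dump the nodes in from every source.
    for row in crm_rows:
        graph.upsert_node("Customer", row["customer_id"], {"name": row["name"]})
    for row in billing_rows:
        graph.upsert_node("Invoice", row["invoice"], {})
    # Step 2: the connecting program decides which nodes link and what
    # properties the connection carries (here the amount lives on the link).
    for row in billing_rows:
        graph.upsert_rel(
            ("Customer", row["customer_id"]),
            "BILLED",
            ("Invoice", row["invoice"]),
            {"amount": row["amount"], "source": "billing"},
        )
    return graph
```

In a real deployment this logic would typically be expressed against the database itself (for example with Cypher’s `MERGE`), but the shape of the problem is the same: the join logic that relational ETL leaves to the SQL query has to live in the loading program.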
Graph ETL is the skill, I think, that’s going to be the limiting factor for companies as they adopt graphs. I think what Neo4j is doing with Morpheus will make that easier. Being able to reach out to relational tables, bring them in, and then connect them to a graph in a systematic way is going to be very powerful.
I’ve seen this over and over when I talk to people in cybersecurity, an area where graphs are widely used, about how they create these huge production graphs. It turns out that most of their time was spent on the graph ETL layer, because that’s the hardest problem.
What shifts are you seeing in adoption of graph technology?
Woods: I think people are now moving from adopting graphs project by project to treating graphs as a platform for their technology. I think the next challenge is how to adopt graphs systematically.
Can you talk about Google Kubernetes Engine and Neo4j?
Woods: I find what’s going on with Google Kubernetes Engine and Neo4j really interesting.
Once you’re in Google Kubernetes Engine, you can put code into containers, and those containers are very easily scaled and their environments very easily managed. You can also reach out to the Google Cloud Platform Marketplace, find packaged applications like Neo4j, and pull them very easily into your Google Kubernetes Engine environment.
This is how I think Neo4j is going to be relevant to a lot of containerized applications. You’ll be able to pull it in and then integrate the data held in all of these different microservices into one place, where it becomes a single unified view. And all the plumbing will be easier.
Again, the hard part will be the graph ETL, bringing it all in. But once you have it in there, you’ll be able to look at information that was formerly distributed and trapped inside all these microservices in one unified way, which is the power of graphs.
What overlap do you see between graph technology and data warehouses?
Woods: One of the biggest frontiers where graphs are making an impact is in delivering what we always wanted from the data warehouse, and later from the data lake. In those realms, what did we want?
In the data warehouse, we wanted the integration of data from all of our applications. The data warehouse succeeded brilliantly in creating a canonical model that was basically a language you could use to describe your business. Now, the canonical model didn’t evolve quickly, and there were a lot of problems with the brittleness of the system and with the things you had to do to make it perform. But once you had that canonical model, it radiated value.
Then the data lake said, “Wait, what about all the rest of the data?” Data lakes are much better at storing that data than at integrating it. I think now, especially with knowledge graphs, we can take all the information that was in a data warehouse, all the information collected in a data lake, and all the information that still lives in other applications, bring it together into a unified graph, and keep it up to date.
What do you find is the most interesting part of Neo4j?
Woods: I think what’s happening with Neo4j is really interesting. One very ambitious aspect of the company is that you’re trying not only to be an OLTP (transactional, operational) database that can power applications as a system of record, but also to build the plumbing to be an OLAP analytics database and a first-class analytics player.
This has not happened in the relational world. There we have different categories of databases: Teradata for OLAP, and Oracle, Sybase and others for OLTP. Neo4j is saying, “No, we are not going to be one or the other; we are going to be both.” I think that’s really ambitious. If you pull it off, it is a powerful result, because you don’t have to move the data from one of those databases to the other. I’m really eager to see someone say, “Yes, we’re super operational with Neo4j and we’re also super analytic.”
Want to share your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com.
Download this white paper, The Top 5 Use Cases of Graph Databases, and discover how to tap into the power of graphs for the connected enterprise.