Keith Hare has joined the Neo4j team to spearhead language standards efforts.
My name is Philip Rathle, Neo4j’s VP of Product Management, and I got a chance to sit down with Keith to discuss his thoughts on databases, standards and the future of the industry.
Philip Rathle: Welcome to the team, Keith. As the Convenor for the ISO committees for SQL and now for the GQL Project, you’re a busy person. What have you been up to?
Keith Hare: While I am spending a significant amount of time working with the Neo4j LANGSTAR (Languages, Standards, and Research) team, I am continuing in my roles as the Convenor of the ISO SQL and GQL standards committee, and as the President of JCC Consulting, Inc.
Philip: What’s been your involvement in ISO, and what have you been working on?
Keith: I got started in the US SQL standards process a bit over 30 years ago, partially because it was interesting and partially as a way of keeping track of what was happening in the database industry. In the early 2000s, I attended several international standards meetings and got to know the participants from other national bodies.
In 2005, I was talked into volunteering to serve as the convenor (chair) of ISO/IEC JTC1 SC32 WG3 Database Languages (the full nomenclature for the international SQL standards committee).
Since then, we have published three editions of the SQL Standard (ISO/IEC 9075 Database Language SQL) adding support for temporal, JSON, Row Pattern Recognition, Multidimensional Arrays, and a number of other capabilities. During SC32 WG3 meetings, we frequently have informal discussions of database industry directions and what types of things would benefit from standardization.
These informal discussions often occur outside of regular meetings, with cryptic notes on whatever writing material is available. Because of this, I often refer to possible futures as the “Napkin of Things.”
“Napkin of Things” from 2018 – an informal discussion of standards future topics.
In 2016, graphs and property graphs popped up within SC32 WG3 and in other contexts. In early 2017, SC32 WG3 reached out to LDBC (Linked Data Benchmark Council) about cooperating to develop formal ISO standards around property graphs. Since then, we have created two projects, one for property graph queries within SQL (ISO/IEC 9075-16 SQL/PGQ) and one for a declarative property graph query language (ISO/IEC 39075 Database Language GQL). We are also looking at other topics such as adding support to SQL for streaming data, but the SQL/PGQ and GQL property graph work are the biggest projects right now.
Philip: What is JCC Consulting, Inc.?
Keith: For 35 years, JCC Consulting, Inc. has focused on high performance, high availability, mission critical database systems running on the OpenVMS operating system.
In that time, we have accumulated a lot of experience in database administration, availability, performance, and a host of related technologies. We have also worked with customers to replicate data from their transaction processing databases to other environments to support reporting, data analysis, data lakes, etc.
While we’ve mostly focused on the data and transaction processing applications, we’ve seen the potential for learning things from that data. JCC Consulting is a small company but includes colleagues with whom I’ve worked for a long time.
Philip: What draws you to graph databases?
Keith: Doing database performance work, I’ve seen a variety of complex SQL queries. Few customers focus on queries that are performing well so I’ve spent time looking at the queries that have performance problems.
These performance problems have often occurred because the queries were missing parts of the join criteria. This happens because the person writing the SQL query has to correctly specify the relationships between the tables in the query, and small omissions can lead to incorrect results and poor performance.
Property graph databases require a bit more upfront application work to explicitly specify the relationships between nodes, but this reduces the work for the query writers and therefore reduces the potential for errors in the queries.
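As a rough illustration of the missing-join-criteria problem described above, here is a minimal Python sketch using the standard library's sqlite3 module. The tables (orders, order_lines), their columns, and the sample rows are hypothetical and invented for this example; they do not come from any customer system. The last few lines mimic the property graph idea in plain Python by storing each relationship once as an explicit edge; this is an analogy only, not GQL or SQL/PGQ syntax.

```python
import sqlite3

def build_db():
    # Hypothetical schema: an order is identified by (region, order_id).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (region TEXT, order_id INTEGER, customer TEXT);
        CREATE TABLE order_lines (region TEXT, order_id INTEGER, item TEXT);
        INSERT INTO orders VALUES ('EU', 1, 'Ann'), ('US', 1, 'Bob');
        INSERT INTO order_lines VALUES ('EU', 1, 'pen'), ('US', 1, 'ink');
    """)
    return conn

# Join that states the full relationship between the two tables.
COMPLETE_JOIN = """
    SELECT o.customer, l.item
    FROM orders o JOIN order_lines l
      ON o.region = l.region AND o.order_id = l.order_id
    ORDER BY o.customer, l.item
"""

# Same query with one join criterion omitted (region).
INCOMPLETE_JOIN = """
    SELECT o.customer, l.item
    FROM orders o JOIN order_lines l
      ON o.order_id = l.order_id
    ORDER BY o.customer, l.item
"""

def run(query):
    conn = build_db()
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()

correct_rows = run(COMPLETE_JOIN)   # one row per real order line
wrong_rows = run(INCOMPLETE_JOIN)   # extra rows pairing unrelated orders and items

# Graph-style analogy: the relationship is recorded once, as explicit edges,
# so the "query" only follows edges and has no join predicate to get wrong.
order_nodes = {"o_eu": "Ann", "o_us": "Bob"}
contains_edges = [("o_eu", "pen"), ("o_us", "ink")]
graph_rows = sorted((order_nodes[src], item) for src, item in contains_edges)

print(correct_rows)
print(wrong_rows)
print(graph_rows)
```

With the region criterion omitted, every order with order_id 1 is paired with every line with order_id 1, so the result contains rows that are simply wrong, and on large tables the extra row combinations are also where the performance cost comes from.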
Philip: GQL is big news for the database landscape. You’ve been working with standards since 1988. Can you describe the road ahead?
Keith: In the database languages standards landscape, we have two property graph projects developing in parallel, Property Graph Queries in SQL (SQL/PGQ) and the declarative property graph query language GQL.
GQL is (will be) a complete declarative database language with syntax for defining, managing, querying, and updating property graphs. Wherever possible, the GQL standard will adopt SQL specifications for data types, operators, etc.
Philip: Why another language? Why not just enhance SQL?
Keith: We started into the property graph standardization space with SQL/PGQ. SQL/PGQ provides two interesting pieces, the ability to create a property graph view on a set of existing SQL tables and the ability to embed a property graph query within a traditional SQL query. SQL/PGQ brings property graph query capabilities and power to existing SQL data, which is a powerful tool for SQL users.
Essentially, SQL/PGQ is the intersection between traditional SQL and a property graph query language. When we started SQL/PGQ, there was no standard for a property graph query language that we could reference, so we had to create significant chunks within SQL/PGQ.
As the work progressed, a number of property-graph vendors joined the standards process. It became clear that there was sufficient vendor interest to justify developing a separate property graph query language standard. In 2019, SC32 WG3 got approval to start on a new project, ISO/IEC 39075 Database Language GQL.
The GQL early working draft has incorporated a base structure and definitions derived from the SQL standard, and capabilities from the SQL/PGQ draft. This has helped the work make rapid progress (rapid for a standards development process).
Philip: What does this process look like?
Keith: Writing detailed standards at this level takes effort and time. The current roadmap calls for an ISO Committee Draft (CD) ballot starting in February 2021. This 12-week ballot will prompt experts to carefully review the GQL draft specification looking for inconsistencies, missing pieces and outright bugs.
Resolving comments from the CD ballot will require time and effort, and a couple of additional ballot steps. If the current plan holds, we will have a published international standard by the end of 2022, by which point we should be well into enhancing the next version of the GQL standard.
Philip: What convinced you to join the Neo4j LANGSTAR team?
Keith: When Alastair Green approached me about joining the LANGSTAR team and described what the position entailed, my first response was “You think I can do that? Are you nuts?” My LANGSTAR tasks include working with both the LANGSTAR team and other vendors to develop the property graph standards.
To this end, I have to better understand all of the property graph products. I also have to talk about property graphs and the standards efforts to outside groups (my kids who are all fairly technical will tell you that the challenge is getting me to shut up) and work with the Neo4j product management and engineering teams to plan for implementing GQL.
So, learn new technologies, work on the standards, and talk about it. And the team I’m working with are really smart, really nice people.
Philip: How do you balance your impartiality and advocacy?
Keith: Carefully.
ISO SQL has come about through the participation of a number of companies working together toward a greater cause. Experts from the participating organizations have a long history of collaborating on change proposals, reviews of the draft, and problem solving.
Experts from the Neo4j LANGSTAR team (Languages, Standards, and Research) have fit well into this collaborative, collegial environment and have demonstrated their commitment to developing standards in a way that serves the broader graph and user community. Working in this environment with the LANGSTAR team makes it fairly easy to balance impartiality and advocacy.