A new year (to say nothing of a new decade) is always a great time to reflect on the year gone by – and on where we’re headed.
We leaned on some of our most forward-thinking Neo4j colleagues – Amy Hodler, Michael Hunger and Amit Chaudhry – to take a few minutes and pontificate on their predictions for the graph database space in 2020.
Let’s see what they came up with after gazing into their crystal balls.
Amy Hodler, Analytics & AI Program Manager, Neo4j
How will graph technology and AI continue to evolve in 2020?
Hodler: In 2020, graph feature engineering will be considered the low-hanging fruit for easily increasing machine learning accuracy. If you’re not boosting your ML with graph features, you’ll be playing catch-up.
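To make the idea concrete, here’s a minimal sketch of graph feature engineering using the open-source networkx and scikit-learn libraries (not any particular Neo4j API): graph metrics such as PageRank and triangle counts are computed per node and used as extra columns in an ordinary ML feature table. The graph, labels and feature choices below are illustrative assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical example: a small social graph standing in for a real
# customer or account graph, with a made-up binary label per node.
G = nx.karate_club_graph()
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, G.number_of_nodes())  # placeholder labels

# Graph feature engineering: derive per-node features from the topology.
pagerank = nx.pagerank(G)    # global influence of each node
degree = dict(G.degree())    # local connectivity
triangles = nx.triangles(G)  # cohesion of each node's neighborhood

X = np.array([[pagerank[n], degree[n], triangles[n]] for n in G.nodes()])

# In practice these graph-derived columns would be appended to an existing
# tabular feature set; here they are the whole feature matrix for brevity.
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.score(X, labels))
```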
Likewise in 2020, commercial applications of graph embeddings will spread beyond the image analysis space, where they’ve proven very successful. That will open up machine learning on more complex data structures, such as paths, which can be used to learn things like the best possible customer or patient journeys.
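As an illustration of the concept (not Neo4j’s embedding implementation), here is one of the simplest possible graph embeddings – factorizing the adjacency matrix so that each node becomes a dense vector – sketched with networkx and scikit-learn. Production systems would more likely use random-walk methods such as node2vec or DeepWalk.

```python
import networkx as nx
from sklearn.decomposition import TruncatedSVD

# Stand-in graph; imagine nodes as steps in customer or patient journeys.
G = nx.karate_club_graph()
A = nx.to_numpy_array(G)  # adjacency matrix, shape (n_nodes, n_nodes)

# Spectral embedding: each node becomes a dense 8-dimensional vector that
# downstream ML models (clustering, classification, etc.) can consume.
embeddings = TruncatedSVD(n_components=8, random_state=0).fit_transform(A)
print(embeddings.shape)  # (34, 8)
```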
Are we on the cusp of any particular advancement that will propel AI outcomes or solutions?
Hodler: The data science community is leading the charge here, and the biggest change is really a mental shift: graphs are starting to be considered a standard format for adding context to machine learning, much like tables are used today for other workloads. We’ve already seen this shift start in AI research, with significant results. And we’re about to see it make a big impact in the data science community as a whole.
When graphs become another standard format to enhance machine learning, adding highly predictive relationships will become the norm and we’ll see immediate improvements in ML accuracy across industries.
What role – or rather, how big of a role – will the data supply chain play in the advancement of AI and ML outcomes?
Hodler: The data supply chain will become increasingly important to ethical and responsible AI for two reasons: 1) Understanding our data, its sources and its details is the only way we can build more ethical and unbiased AI. 2) Data lineage and protection against data manipulation are foundational for trustworthy AI.
I think in 2019 we saw a tipping point of public, private, and government interest in creating guidelines for AI systems that better align with cultural values. In 2020, I expect that momentum to increase and to see more frameworks published, such as the EU Ethics Guidelines (with an updated checklist expected in early 2020). The role of the data supply chain will vary by industry, and it may still be a few years before the full impact is recognized.
What are the complex AI problems that we’ll need to overcome in the next 3-5 years?
Hodler: Explainability and interpretability of AI outcomes (e.g., predictions, scores, recommendations) – as well as how those outcomes were derived (i.e., how decisions are made) – pose a big, hairy problem that undermines the utility and appropriateness of AI systems. We need to significantly improve our ability to explain the logic behind results in a way that’s complete and understandable to less expert users.
For example, an attention heat map might show us that the pixels around a bird’s head were influential in an AI system deciding this was a picture of a parrot. However, that says nothing about what drove the decision: was it the color, the angle of the beak, the texture of feathers? We have that same problem with most complex AI solutions today, and that’s not acceptable, especially for high-stakes decisions.
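For context, here’s a minimal sketch (in PyTorch, with a placeholder model and input – not any specific system discussed here) of how such an influence heat map is typically computed. Note that it locates the influential pixels without saying anything about why they mattered.

```python
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True).eval()          # stock classifier
image = torch.rand(1, 3, 224, 224, requires_grad=True)   # placeholder image

score = model(image)[0].max()  # confidence of the top class (e.g., "parrot")
score.backward()               # gradient of that score w.r.t. every pixel

# Per-pixel influence map: shows WHERE the model looked, not WHY it decided.
heatmap = image.grad.abs().max(dim=1)[0]  # shape (1, 224, 224)
```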
Michael Hunger, Head of Developer Relations, Neo4j
What challenges will developers face in 2020?
Hunger: There are a number of challenges that should concern developers.
One is the more widespread use of black-box algorithms and tools to support decision making in areas that affect other humans, sometimes in drastic ways. It would be good if more folks got involved in accountability and responsibility around algorithmic bias and the use of technology to unduly influence or manipulate democratic processes. On the other hand, there are many ways we as a profession can contribute to solving crucial problems in climate, healthcare and business.
One topic I would love to see continuous work on is achieving equality for underrepresented minorities in tech – something each team can work on in its hiring processes and by supporting groups and initiatives that make our space more welcoming, safe and interesting.
On the technical side, we will see more and more complex cloud systems and architectures, introducing challenges around orchestration, versioning, consistent APIs and interoperability. The first (microservice) cloud applications will become legacy, and it will be interesting to see how that’s addressed.
The growth of out-of-the-box solutions and templates will lead to more “low-code” development that favors convention over configuration – and, hopefully, code (DSLs) over configuration. Open source use and collaboration will continue to grow, as we’re seeing in the data processing, machine learning and front-end spaces.
Topics like cryptocurrencies, I think, will be re-evaluated under more scrutiny. Ubiquitous edge computing, mobile apps and IoT will grow both in industry and in personal use, as more and more devices, wearables and sensors collect, process and learn from data. Managing that volume in a sensible, secure and unbiased way that actually benefits users will be a main challenge.
In development, we will see more adoption of new features and languages on popular runtimes like the Java Virtual Machine (JVM).
Working at Neo4j, I am glad that several of those topics will benefit from utilizing connected data at high volume and complexity, so that we can contribute a bit to making developers’ lives easier and the world a bit better.
What will the developer experience look like moving to the cloud?
Hunger: Developers going to the cloud will want to use resources more efficiently, utilizing reactive architectures, serverless functions and serverless (on-demand) services. There will be more efforts to reduce – and, on the vendor side, to increase – the lock-in that developers have to deal with.
Interesting areas are the continued moves to static front-end deployments, GraphQL for APIs and zero-config setups. I already mentioned the orchestration and versioning challenges above. I think the microservice pendulum might swing back a bit toward more realistically sized components.
There is definitely a clear trend toward cloud-hosted infrastructure, from managed databases to serverless functions and low-code approaches. On all levels of the development stack we see very active evolution, be it in runtimes like the JVM, middleware like Kafka or Akka, the API layer with GraphQL or the front-end with React and Vue.js. To reduce time to market, I think opinionated stacks that favor convention over configuration and allow quick turnaround times for application development will continue to dominate. An ongoing challenge is, of course, the management of highly distributed and complex systems (microservices) and software architectures – an area that graph databases are also widely used for.
Amit Chaudhry, Vice President of Product Marketing, Neo4j
Evolving from 2019, what do you think we’ll see continue to take shape in 2020?
Chaudhry: This past year has demonstrated that “big data” may be a thing of the past. It’s not about the size of the data set; it’s about its value. As more enterprises move toward graph database adoption, semantic tech will begin to crack the code on AI, ML, NLP and other learning applications. Graph databases are suited to these technologies in ways that relational databases are not.
As AI and ML continue to be valuable and relevant to the enterprise, there will be an increased need for the refined algorithms that can be achieved with graph technology. We’ve seen that data without context can be damaging to algorithms, and that we must scrutinize the data supply chain (i.e., the learning set) that feeds them. As we begin to incorporate more context into AI, the algorithms we produce will be able to suss out solutions in more abstract scenarios and effortlessly come up with the answers we need.
Looking ahead, graph analytics will only grow in importance and continue to breathe life into business intelligence. What’s been offered to analysts from a business intelligence perspective hasn’t changed for quite a while. We’re going to see traditional business intelligence analysts start to adopt graph algorithms and graph analytics and, from there, deliver connected data that provides a whole new perspective.
Where do you see things taking an entirely new direction?
Chaudhry: Over the next decade, GQL (Graph Query Language) will emerge as the most relevant language standard for how we work with data, and as the common standard among graph database vendors. 2020 will see the decline of RDF and Gremlin for graph queries. GQL is great news for users because it means interoperability, portability and greater availability of skills, and it will only accelerate graph adoption in large, conservative enterprises.
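GQL’s pattern-matching syntax draws heavily on openCypher, so today’s Cypher gives a good taste of the style. Here’s a hypothetical example run through the official Neo4j Python driver – the connection details and data model are made up for illustration.

```python
from neo4j import GraphDatabase

# Hypothetical connection details, for illustration only.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "secret"))

with driver.session() as session:
    result = session.run(
        # Declarative property-graph pattern, in the style GQL builds on:
        # people who bought a product that a given person also bought.
        "MATCH (p:Person {name: $name})-[:BOUGHT]->(:Product)"
        "<-[:BOUGHT]-(other:Person) "
        "RETURN DISTINCT other.name AS name",
        name="Alice",
    )
    for record in result:
        print(record["name"])

driver.close()
```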
Well, there you have it. Hold onto your hats, graphistas!