On Connected Data and Graphs
It’s been a while since I last wrote about data APIs. Recently, I have been experimenting with a powerful tool that is revolutionizing the way we think about data.
Although not a new discovery by any stretch, graphs and graph databases have captured recent attention thanks to newly discovered applications in areas such as recommender systems, link prediction, traffic analysis, and image processing.
My moment of personal epiphany with graphs came a few years ago while participating in a very enlightening research project to explore innovative ways to enhance tools that manipulate knowledge graphs using RDF. Specifically, my team and I implemented a framework to run path queries over weighted RDF graphs. The associated paper presents the findings of this research in more detail.
Little did I know that being part of such a project was going to spark an evergreen interest in this powerful tool.
Just like AI, graphs (and knowledge graphs in particular) went through a period in which inflated initial expectations went unmet, and they quickly slid into the trough of disillusionment.
However, like all timeless technologies, graphs are reaching the point where real applications are delivering on those initial promises. This is a consequence of new research projects and the work of companies such as Neo4j, Dgraph, and TigerGraph.
But why are graphs worth considering?
Connected Data
Metcalfe’s Law of the Network states that:
The value of a network grows in proportion to the square of the number of nodes connected to it. Increasing data’s connectedness further increases its value through additional context.
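To put a number on this, Metcalfe’s Law is commonly formalized by counting the distinct pairwise links that n nodes can form, which is n(n - 1)/2 and therefore grows with the square of n. The snippet below is only a small illustration of that count; the function name `possible_connections` is made up for this post.

```python
def possible_connections(n: int) -> int:
    """Count the distinct undirected links that n nodes can form: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Growing the network by a factor of 10 grows the potential links by roughly 100.
for n in (2, 10, 100, 1000):
    print(f"{n} nodes can form {possible_connections(n)} pairwise connections")
```

Note that this counts potential connections only. Whether each one carries real value depends on the data, which is where the context mentioned above comes in.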
However, data connections are not created by simply dumping said data into a central location. They are created when data relationship information is treated as a first-class entity. Specifically, data relationships must be persisted, assigned properties, and used to develop the context for the applications using the data.
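As a rough sketch of what “relationships as first-class entities” can look like, the Python below models a relationship as its own record with a type and properties, stored next to the nodes it connects. The names `Node`, `Relationship`, and `Graph` are hypothetical and chosen for illustration; this is not the API of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    # The relationship is a stored record of its own, not something derived later.
    start: str
    end: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    """Toy in-memory property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node):
        self.nodes[node.id] = node

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, rel_type=None):
        # Linear scan for simplicity; a real database would index this lookup.
        return [
            r for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


g = Graph()
g.add_node(Node("alice", {"Person"}, {"name": "Alice"}))
g.add_node(Node("acme", {"Company"}, {"name": "Acme"}))
g.relate("alice", "acme", "WORKS_AT", since=2019, role="engineer")

for rel in g.outgoing("alice", "WORKS_AT"):
    print(rel.end, rel.properties)
```

The point of the sketch is that `since` and `role` live on the relationship itself, so an application can use them as context without reconstructing the connection first.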
Once these relationships have been identified, numerous benefits become evident:
- Ability to make more informed decisions in real time without running expensive queries
- A deeper understanding of the information in the data, which comes from the context that relationships give to each piece of information
- Since the data relationships are created and updated in real time, actions and decisions taken on the data are precise and relevant. There is no need to retrain models or run expensive batch analytics processes
As good as this sounds, there is a caveat: Having a deep understanding of data relationships is powerful… and complicated.
Connecting Data with Graphs
Having the right tool for a job matters. Graph databases are the right tool for processing highly connected data. Compared with an RDBMS, they offer a more flexible and intuitive approach to storing data.
Graph databases make data relationships a core component of any data model. In other words, relationships and connections are created, stored, and processed in every step of the data lifecycle: from idea, to design, to implementation, to operation using a query language, and to persistence within a scalable, reliable database system.
As a consequence, a graph database does not need to infer data connections at query time through expensive and complex mechanisms, like joins over foreign keys, or through out-of-band processing, like MapReduce.
Historically, applications such as social networks, recommendation systems, and computer network tools have been the signature killer applications for graph databases. However, their utility and applicability go well beyond these realms. A great example of this is Lyft's Cartography, a cloud security tool that consolidates cloud assets and their relationships.
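To make the contrast concrete, here is a small sketch that answers the same “friends of friends” question twice: once relationally, with a self-join in an in-memory SQLite database, and once graph-style, by following stored adjacency. The table, the names, and the data are invented for illustration, and the adjacency dictionary is only a stand-in for how a graph database keeps connections next to the data, not a description of any real engine’s storage.

```python
import sqlite3

# Hypothetical sample data: (person, friend) pairs, treated as directed links.
friendships = [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("carol", "erin")]

# Relational approach: connections are rows, and each extra hop is another join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendship (person TEXT, friend TEXT)")
conn.executemany("INSERT INTO friendship VALUES (?, ?)", friendships)
rows = conn.execute(
    """
    SELECT DISTINCT f2.friend
    FROM friendship AS f1
    JOIN friendship AS f2 ON f1.friend = f2.person
    WHERE f1.person = ?
    """,
    ("alice",),
).fetchall()
via_join = {row[0] for row in rows}
conn.close()

# Graph-style approach: each person already holds its connections,
# so a two-hop query is just two lookups along stored links.
adjacency = {}
for person, friend in friendships:
    adjacency.setdefault(person, set()).add(friend)

via_traversal = {
    friend_of_friend
    for friend in adjacency.get("alice", set())
    for friend_of_friend in adjacency.get(friend, set())
}

print(sorted(via_join), sorted(via_traversal))
```

Both approaches return the same answer here. The difference shows up as the number of hops and the size of the data grow: the relational query needs one more join per hop, while the traversal keeps following links it already has.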
All things considered, graph databases offer two major advantages over traditional RDBMS:
- Performance: Native graph databases are optimized to store data in a way that matches what it represents, typically keeping relationships alongside the records they connect, so queries over connected data follow stored links instead of rebuilding them with joins. Relational databases are also a poor fit for agile change (adaptation): the data schema needs to be well defined from the beginning, and subsequent changes usually bring undesired side effects.
- Flexibility: No real-world data model remains static for long. Businesses may change their objectives or want to test out new ideas without requiring lengthy database rebuilds. With graph databases, it is easy to add labels to nodes, properties to relationships, or refactor large sections of your graph without ceremony.
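The flexibility point can be sketched in a few lines. Assuming a toy in-memory representation (the dictionaries and the helpers `add_label` and `set_relationship_property` are hypothetical, not a real database API), a new business requirement becomes a change to the data itself rather than a schema migration:

```python
# Hypothetical in-memory records; a real graph database manages these for you.
nodes = {
    "alice": {"labels": {"Person"}, "properties": {"name": "Alice"}},
    "acme": {"labels": {"Company"}, "properties": {"name": "Acme"}},
}
relationships = [
    {"start": "alice", "end": "acme", "type": "WORKS_AT", "properties": {}},
]


def add_label(node_id, label):
    """Attach an extra label to one node; other nodes are untouched."""
    nodes[node_id]["labels"].add(label)


def set_relationship_property(index, key, value):
    """Add or update a property on a single relationship."""
    relationships[index]["properties"][key] = value


# New idea to test: some people are also customers, and employment has a start year.
add_label("alice", "Customer")
set_relationship_property(0, "since", 2019)

print(nodes["alice"]["labels"], relationships[0]["properties"])
```

Nothing about the existing `acme` node or any other record had to change, which is the property the bullet above is describing.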
And that’s it for now! In future posts, we will analyze more practical applications of graphs using tools like Neo4j and the ways we can interact with them using Golang and Rust. Stay tuned!
Image by Gerd Altmann from Pixabay