Building a knowledge graph is easier with a vocabulary in place; such a vocabulary is called an ontology.
Medical Subject Headings (MeSH) is one such ontology, covering many of the medical terms in current use.
It can be downloaded as an RDF file (N-Triples), which makes it easy to import into Neo4j with neosemantics (n10s).
The next three commands import the 2021 MeSH graph directly into Neo4j. Loading all 2 million nodes and 4 million relationships takes a moment.
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;
CALL n10s.graphconfig.init();
CALL n10s.rdf.import.fetch("https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2021/mesh2021.nt","N-Triples");
Exploring the Data
Before I start, I will set the caption to rdfs__label for resources, so the nodes have a name. For ns0__Term, I will use ns0__prefLabel.
Let’s start with the sexiest thing to do: reading the documentation of the RDF data structure of the medical terms used to sort medical papers.
Did I say “sexy”? I meant nerdiest.
I will not go over the full structure; instead, I will select just two elements I think are interesting to start with. Feel free to disagree.
Code in grey blocks, such as this one, is the Cypher query you can run in Neo4j. You do not need it to follow along, but it shows how I got the results, or it can serve as an example that my Cypher is not optimized, up to standard, etc.
Terms, Descriptors, and Concepts
Descriptors, concepts, and terms are very closely related. Descriptors are the broadest. Within a descriptor, you have one or more concepts, one of which is the preferred concept. Concepts have terms, which hold the synonyms for the concept. Each concept has one preferred term, and the descriptor also has one preferred term out of all of them (see picture below).
MATCH (q:ns0__Term)<-[]-(n:ns0__TopicalDescriptor)-[]->(p:ns0__Concept)-[]->(z:ns0__Term) WHERE n.rdfs__label = "Calcimycin" RETURN n, p, q, z
Terms are very useful for labeling text. Concepts can define a part that is smaller than the whole descriptor. The descriptor holds the connection to the rest of the graph (tree, other descriptors, SCR, Qualifiers, etc.). I will mainly focus on the descriptors for graph algorithms.
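The nesting described above (descriptor, then concepts, then terms) can be sketched as a small data structure. This is a minimal illustration, not the MeSH schema itself: the class and method names are made up for this example, every label except "Calcimycin" is a placeholder rather than real MeSH data, and it assumes that the descriptor's preferred term is the preferred term of its preferred concept.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    preferred: bool = False  # True for the concept's preferred term


@dataclass
class Concept:
    label: str
    terms: list[Term] = field(default_factory=list)
    preferred: bool = False  # True for the descriptor's preferred concept


@dataclass
class Descriptor:
    label: str
    concepts: list[Concept] = field(default_factory=list)

    def preferred_term(self) -> str:
        # Assumption: the descriptor's preferred term is the preferred
        # term of its preferred concept.
        concept = next(c for c in self.concepts if c.preferred)
        return next(t.label for t in concept.terms if t.preferred)

    def all_labels(self) -> set[str]:
        # Every term under every concept: the strings you would search
        # for when labeling text.
        return {t.label for c in self.concepts for t in c.terms}


# Placeholder data: only the label "Calcimycin" comes from the article.
calcimycin = Descriptor(
    label="Calcimycin",
    concepts=[
        Concept(
            label="Calcimycin",
            preferred=True,
            terms=[
                Term("Calcimycin", preferred=True),
                Term("placeholder synonym"),
            ],
        ),
        Concept(
            label="placeholder narrower concept",
            terms=[Term("placeholder narrower term", preferred=True)],
        ),
    ],
)

print(calcimycin.preferred_term())
print(sorted(calcimycin.all_labels()))
```

Collecting all term labels per descriptor is what makes terms handy for text labeling: any of the synonyms found in a text can be mapped back to the one descriptor that connects to the rest of the graph.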
Tree Structure
TopicalDescriptors link to tree numbers (ns0__treeNumber) and to other TopicalDescriptors (ns0__broaderDescriptor).
These two hold very similar information but have one use case where they differ: multiple tree locations.
A descriptor can be in more than one tree location at the same time, like the descriptor “Eye”. Eye has tree number A01.456.505.420 as a subcategory of Face, and A09.371 as a subcategory of Sense Organs. This causes problems because these two tree numbers do NOT have the same subcategories!
Eyebrows are part of the eye as part of the face but are NOT part of the eye as part of a sense organ.
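To make the two tree locations of Eye concrete, here is a small sketch of how a tree number encodes its own ancestry. It assumes that the parent of a tree number is the same number with its last dotted segment removed, which is how I read the MeSH tree layout; the function names are made up for this example.

```python
from typing import List, Optional


def parent_tree_number(tree_number: str) -> Optional[str]:
    """Drop the last dotted segment; a top-level number has no parent."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def ancestors(tree_number: str) -> List[str]:
    """All ancestor tree numbers, from the nearest to the top of the tree."""
    result = []
    parent = parent_tree_number(tree_number)
    while parent is not None:
        result.append(parent)
        parent = parent_tree_number(parent)
    return result


# The two tree numbers of Eye mentioned in the text.
for number in ["A01.456.505.420", "A09.371"]:
    print(number, "->", ancestors(number))
```

The two numbers share no ancestors at all, so each location of Eye sits in its own branch with its own set of children.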
If we use ns0__broaderDescriptor to go from Eyebrows back to the broadest descriptor, we run into a problem. The broader descriptor of Eyebrows is Eye, which has two broader descriptors (namely, Sense Organs and Face). Since Eyebrows is not a sense organ, following this path gives a wrong result.
MATCH (n:ns0__TopicalDescriptor)-[:ns0__broaderDescriptor*]->(p:ns0__TopicalDescriptor) WHERE n.rdfs__label = "Eyebrows" RETURN n, p
The other way is to go via the tree numbers. This way, Eyebrows is connected to only one of the two tree numbers of Eye and does NOT get Sense Organs as a broader descriptor.
MATCH (n:ns0__TopicalDescriptor)-[:ns0__treeNumber]->(t:ns0__TreeNumber)-[:ns0__parentTreeNumber*]->(p:ns0__TreeNumber)<-[:ns0__treeNumber]-(d:ns0__TopicalDescriptor) WHERE n.rdfs__label = "Eyebrows" RETURN n, t, p, d
For this reason, I will use ns0__treeNumber to find hierarchical relationships rather than ns0__broaderDescriptor.
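The difference between the two traversals can be reproduced on a toy, in-memory version of the Eyebrows example. This is only a sketch: the two tree numbers of Eye come from the text, while the tree numbers for Eyebrows, Face, and Sense Organs are assumed values picked so the example hangs together, and the dictionaries and function names are made up for illustration.

```python
from typing import Dict, List, Set

# Toy data. Only Eye's two tree numbers come from the article; the rest
# are assumptions for illustration and may not match real MeSH values.
TREE_NUMBERS: Dict[str, List[str]] = {
    "Eyebrows": ["A01.456.505.420.001"],    # hypothetical tree number
    "Eye": ["A01.456.505.420", "A09.371"],  # from the article
    "Face": ["A01.456.505"],                # assumed: Eye's number minus one segment
    "Sense Organs": ["A09"],                # assumed: Eye's number minus one segment
}

# broaderDescriptor links as described in the text.
BROADER: Dict[str, List[str]] = {
    "Eyebrows": ["Eye"],
    "Eye": ["Face", "Sense Organs"],
}


def ancestors_via_broader(name: str) -> Set[str]:
    """Follow broaderDescriptor links transitively."""
    seen: Set[str] = set()
    stack = list(BROADER.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, []))
    return seen


def ancestors_via_tree(name: str) -> Set[str]:
    """Walk up each tree number and collect the descriptors found there."""
    owners = {num: desc for desc, nums in TREE_NUMBERS.items() for num in nums}
    found: Set[str] = set()
    for number in TREE_NUMBERS[name]:
        while "." in number:
            number = number.rpartition(".")[0]  # step to the parent tree number
            if number in owners:
                found.add(owners[number])
    return found


print(sorted(ancestors_via_broader("Eyebrows")))
print(sorted(ancestors_via_tree("Eyebrows")))
```

The broader-descriptor walk picks up Sense Organs through Eye, while the tree-number walk only ever climbs the branch that Eyebrows actually sits in.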
Conclusion
In conclusion, using the Medical Subject Headings (MeSH) ontology to create a knowledge graph is highly beneficial. By importing MeSH as an RDF file into Neo4j with neosemantics (n10s), we can easily explore the extensive collection of medical terms and their relationships.
Descriptors, concepts, and terms are essential components of MeSH. Descriptors encompass broad categories, concepts provide specific definitions within descriptors, and terms offer synonyms for concepts. Understanding the hierarchical structure is crucial for effective graph analysis, with tree numbers being a more reliable way to establish hierarchical relationships than broader descriptors.
In summary, MeSH is a valuable resource for constructing medical knowledge graphs. Leveraging its rich information and employing appropriate graph analysis techniques, researchers can gain meaningful insights from medical literature and data.
MeSH Into Neo4j was originally published in Neo4j Developer Blog on Medium.