Populating the database

For this section of the case study, we will interact with a running Neo4j database. If you do not have such a database available, please take a look at the Neo4j section of this web page to learn how to set up one locally or via Docker.

To upload our data, we need to open a terminal and navigate to the folder where we stored the files from the previous section. We can upload all files using the neo4biom and io modules. Remember to adjust the address and password if those are different for your Neo4j instance!

mako neo4biom -u neo4j -p test -a neo4j://localhost:7688 -cf -count ibd_taxa.tsv 
mako neo4biom -cf -tax ibd_lineages.tsv

Next, we will upload the relationships between species and metabolites. Since the metabolites are all strings, they will be uploaded as separate Metabolite nodes (since Metabolite is the column name in the file). We need to upload files in the correct order, so that the first column always contains nodes that already exist in the database.


mako io -cf -meta microbe_metabolite.tsv
mako io -cf -meta chemclass.tsv
mako io -cf -meta standardmatch.tsv

After completing the above tasks, you should be able to conveniently query all metabolites in the database. For example, we can easily find the chemical classes that certain metabolite clusters belong to with the query MATCH p=(:Metabolite)--(:Chemical_class) RETURN p LIMIT 25.

Screenshot of Neo4j Browser showing Cypher query outcome with Metabolite nodes.
Figure 1: Screenshot of Neo4j Browser showing Cypher query outcome with Metabolite nodes.