Setting up the database

All Python code used on this page is available here: custom_setup.py

Before we can write any queries, we need to populate the database. We will use the BIOM and GraphML files from the Code Ocean capsule here. You can replace these with any other BIOM or GraphML files of your choice, such as the demo data in the vignette.

First, we load all software packages and collect the paths of the files we want to write to the database. For instructions on starting a Python interpreter, see the API section of the manual.


import biom
import networkx as nx
import os
from mako.scripts.neo4biom import Biom2Neo
from mako.scripts.io import IoDriver

filepath = "C:/Users/username/demo/" # change this to where you downloaded the files

biom_filepaths = [filepath + "biomfiles/11766.biom", 
                  filepath + "biomfiles/11888.biom", 
                  filepath + "biomfiles/11947.biom",
                  filepath + "biomfiles/12021.biom",
                  filepath + "biomfiles/12716.biom"]
                  
network_filepaths = [filepath + "networks/11766.graphml", 
                     filepath + "networks/11888.graphml", 
                     filepath + "networks/11947.graphml",
                     filepath + "networks/12021.graphml",
                     filepath + "networks/12716.graphml"]
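If you have more files than the five listed here, you could collect the paths with the standard-library glob module instead of typing them out. This is a minimal sketch that assumes the same folder layout as above (a biomfiles/ and a networks/ subfolder under filepath); the function name collect_paths is illustrative and not part of mako.

```python
import glob
import os


def collect_paths(filepath):
    """Return sorted lists of BIOM and GraphML paths under filepath.

    Hypothetical helper (not part of mako). Assumes the layout used above:
    <filepath>/biomfiles/*.biom and <filepath>/networks/*.graphml.
    """
    biom_paths = sorted(glob.glob(os.path.join(filepath, "biomfiles", "*.biom")))
    network_paths = sorted(glob.glob(os.path.join(filepath, "networks", "*.graphml")))
    return biom_paths, network_paths


# Usage, with the same filepath as above:
# biom_filepaths, network_filepaths = collect_paths(filepath)
```

Sorting keeps the upload order predictable; files with other extensions in those folders are ignored.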

We add the BIOM files first. The next code block connects to the Neo4j database with the Biom2Neo driver, then loops over the file paths, reads each BIOM file and writes it to the database. Here we assume that you are running Neo4j from the Docker container (link to Docker setup), so we use the matching connection settings. The file names, minus their extension, are used as the experiment IDs in the database.


driver = Biom2Neo(uri='neo4j://localhost:7688',
                  user='neo4j',
                  password='test',
                  filepath=filepath,
                  encrypted=False)

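try:
    for file in biom_filepaths:
        # Use the file name without directory or extension as the experiment ID,
        # e.g. "11766" for ".../biomfiles/11766.biom"
        name = os.path.basename(file)
        name = name.split(".")[0]
        biomtab = biom.load_table(file)
        driver.convert_biom(biomfile=biomtab, exp_id=name, obs=True)
finally:
    # Close the connection even if one of the files fails to load
    driver.close()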
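The loop derives each experiment ID by stripping the directory and everything from the first dot onward, so biomfiles/11766.biom becomes 11766. Because the network loop repeats the same two lines, you could factor them into a small helper. The name exp_id_from_path is hypothetical. Note that, like the original code, it cuts at the first dot, so a file named study.v2.biom would get the ID study.

```python
import os


def exp_id_from_path(path):
    """Derive an ID from a file path the same way the upload loops do.

    Hypothetical helper: keeps the file name without its directory and
    drops everything from the first "." onward.
    """
    name = os.path.basename(path)
    return name.split(".")[0]


print(exp_id_from_path("C:/Users/username/demo/biomfiles/11766.biom"))  # 11766
print(exp_id_from_path("networks/study.v2.graphml"))  # study
```

If your file names contain extra dots, os.path.splitext(name)[0] would remove only the last extension instead.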

With the BIOM files uploaded, we can do the same thing for the network files, now using the IoDriver class.


driver = IoDriver(uri='neo4j://localhost:7688',
                  user='neo4j',
                  password='test',
                  filepath=filepath,
                  encrypted=False)

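try:
    for file in network_filepaths:
        # Use the file name without directory or extension as the network ID
        name = os.path.basename(file)
        name = name.split(".")[0]
        net = nx.read_graphml(file)
        driver.convert_networkx(network=net, network_id=name)
finally:
    # Close the connection even if one of the files fails to load
    driver.close()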
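If nx.read_graphml raises an error on one of your own files, a quick standard-library check can tell you whether the GraphML itself parses and roughly how large the network is. This sketch is independent of mako and networkx: it assumes the standard GraphML namespace (http://graphml.graphdrawing.org/xmlns) and simply counts node and edge elements, including any inside nested graphs. The function name graphml_summary is illustrative.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, in ElementTree's {uri}tag notation
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def graphml_summary(path):
    """Return (number of nodes, number of edges) in a GraphML file.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed XML.
    """
    root = ET.parse(path).getroot()
    n_nodes = sum(1 for _ in root.iter(GRAPHML_NS + "node"))
    n_edges = sum(1 for _ in root.iter(GRAPHML_NS + "edge"))
    return n_nodes, n_edges


# Usage with the paths defined earlier:
# for file in network_filepaths:
#     print(file, graphml_summary(file))
```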

That’s it! Now that the Neo4j database is populated, we can start running our own queries.