Writing custom queries

Find all Python code used on this page here: custom_queries.py

All of mako's driver classes can run custom queries, so there is no need to close and recreate the driver to use these functions. For simplicity, the code below uses the ParentDriver class.

First, we need to start up the driver and connect to the database.


from mako.scripts.utils import ParentDriver

driver = ParentDriver(uri='neo4j://localhost:7688',
                      user='neo4j',
                      password='test',
                      filepath=filepath,
                      encrypted=False)

The ParentDriver has only three methods: close, query and write. The close method closes the connection to the Neo4j database. There is no need to call it after every query, but it can be helpful to include it in a script so that you do not keep many connections open at once. The query method processes a read transaction, while the write method processes a write transaction. Both methods take a Cypher query as input, but only the write method can make changes to the database.
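One way to make sure close is always called in a script is the standard library's contextlib.closing, which calls close() when the block exits, even if a query raises an error. The sketch below is only an illustration: StubDriver is a hypothetical stand-in (not part of mako) so that the example runs without a database. In a real script you would pass your ParentDriver instance instead.

```python
from contextlib import closing


class StubDriver:
    """Hypothetical stand-in for a mako driver; not part of mako."""

    def __init__(self):
        self.closed = False

    def query(self, query):
        # A real driver would run a read transaction here.
        return []

    def close(self):
        self.closed = True


stub = StubDriver()
# closing() calls stub.close() when the block exits, even after an error.
with closing(stub) as d:
    results = d.query("MATCH (n:Genus) RETURN n")

print(stub.closed)
```

This relies only on the driver having a close method, which is what the text above describes.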

Let’s first take a look at the structure of some simple query results. We will use this query:
MATCH p=(:Genus {name: 'g__Odoribacter'})--(:Taxon)--(:Edge) RETURN p

The query returns a pattern p, which consists of a Genus node, a Taxon node and an Edge node. When we run the query, we will get a list of all results; the section below shows our first result.


query = "MATCH p=(:Genus {name: 'g__Odoribacter'})--(:Taxon)--(:Edge) RETURN p"
query_results = driver.query(query)

print(len(query_results))
print(query_results[0])

10
{'p': [{'name': 'g__Odoribacter'}, 
        'MEMBER_OF', 
        {'name': '11947-agglom-107'}, 
        'PARTICIPATES_IN', 
        {'name': '7c942282-a6f1-4d38-b7e0-8ac3d89f69b1', 'weight': -0.30529758442530924}]}

We can see that each query result is returned as a dictionary, with each variable in the RETURN clause (in this case only p) as a key. The value is a list in which each item represents one element of the path. Nodes are returned as dictionaries, while the strings between them are relationship types.
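Because nodes and relationships alternate in the path list, it can be convenient to separate them before further processing. The helper below is a small sketch of this; split_path is a name introduced here for illustration, and the result is the first result shown above, copied in so the example runs without a database connection.

```python
def split_path(path):
    """Split a path list into node dictionaries and relationship type strings."""
    nodes = [item for item in path if isinstance(item, dict)]
    relationships = [item for item in path if isinstance(item, str)]
    return nodes, relationships


# The first result shown above, copied here so the example runs on its own.
result = {'p': [{'name': 'g__Odoribacter'},
                'MEMBER_OF',
                {'name': '11947-agglom-107'},
                'PARTICIPATES_IN',
                {'name': '7c942282-a6f1-4d38-b7e0-8ac3d89f69b1',
                 'weight': -0.30529758442530924}]}

nodes, relationships = split_path(result['p'])
print([node['name'] for node in nodes])
print(relationships)
print(nodes[-1]['weight'])
```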

To access values in the query, it can be helpful to process them. Neo4j does not keep track of the “direction” of the pattern matching; if a pattern matches the same three nodes in two ways (e.g. A–B–C and C–B–A), both matches will be returned as results. When studying motifs, it is therefore crucial to filter out these duplicates.

The script below does exactly this: it first makes a list of lists, with each sublist containing all node names (not relationships) in each pattern. Next, the set of each sublist is taken, so that it only contains unique nodes. For this query, this step is not actually necessary, but for loops like (A–B–C–A) it makes sure that we do not return (B–C–A–B) too. Each set is then sorted, so that the same nodes always appear in the same order regardless of the direction of the match. Finally, we return only unique sublists.


# Collect the node names of each path; relationship strings are skipped
all_nodes = [[y['name'] for y in x['p'] if isinstance(y, dict)] for x in query_results]
# Remove repeated nodes within each path, then sort for a deterministic order
all_nodes = [sorted(set(x)) for x in all_nodes]
# Convert to tuples (which are hashable) and keep only unique combinations
all_nodes = set(map(tuple, all_nodes))
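To see the deduplication at work without a database, the sketch below applies the same idea to hand-made results. It uses frozenset instead of sorted tuples, so node order is ignored by construction. The function name unique_node_sets and the toy node names are made up for this illustration.

```python
def unique_node_sets(query_results, key='p'):
    """Return the distinct sets of node names found in path results."""
    unique = set()
    for result in query_results:
        names = frozenset(item['name'] for item in result[key]
                          if isinstance(item, dict))
        unique.add(names)
    return unique


# Made-up results: the same three nodes matched in both directions,
# plus one different match.
toy_results = [
    {'p': [{'name': 'A'}, 'REL', {'name': 'B'}, 'REL', {'name': 'C'}]},
    {'p': [{'name': 'C'}, 'REL', {'name': 'B'}, 'REL', {'name': 'A'}]},
    {'p': [{'name': 'A'}, 'REL', {'name': 'B'}, 'REL', {'name': 'D'}]},
]

motifs = unique_node_sets(toy_results)
print(len(motifs))
```

The first two toy results contain the same nodes in opposite order and therefore collapse into a single entry.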

Note that, if we change the query, we may need to change the code that processes the results accordingly. For example, we can choose to return the genus assignment of the association partners rather than the pattern above. There is no need to specify the Taxon nodes, since the only connection between Genus and Edge nodes is via taxa.


query = "MATCH (:Genus {name: 'g__Odoribacter'})--()--(:Edge)--()--(n:Genus) RETURN n"
query_results = driver.query(query)

print(len(query_results))
print(query_results[0])

8
{'n': {'name': 'g__[Ruminococcus]'}}

We actually have fewer matches than before: apparently, not all associations are with taxa that have a genus assignment. Moreover, the results follow the same structure as before: each result is a dictionary whose key is the variable named in our Cypher RETURN clause. Here, the value is a single node dictionary rather than a path list.
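With single-node results like these, a common next step is to count how often each genus occurs. The sketch below uses collections.Counter from the standard library; count_names is a name introduced here, and the toy results (including g__Bacteroides) are made up to mimic the shape of the output above rather than taken from the database.

```python
from collections import Counter


def count_names(query_results, key='n'):
    """Count how often each node name occurs in single-node results."""
    return Counter(result[key]['name'] for result in query_results)


# Made-up results in the same shape as the output above.
toy_results = [
    {'n': {'name': 'g__[Ruminococcus]'}},
    {'n': {'name': 'g__Bacteroides'}},
    {'n': {'name': 'g__[Ruminococcus]'}},
]

counts = count_names(toy_results)
print(counts.most_common(1))
```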

If we are interested in finding those associations without a genus assignment, we can actually add this to the query. The WHERE NOT clause states that the node n cannot have a link to a Genus, while the WITH clause then uses this node to return the pattern matching the Family node.


# Adjacent string literals inside parentheses are joined into one query string
query = ("MATCH (:Genus {name: 'g__Odoribacter'})--()--(:Edge)--(n) "
         "WHERE NOT (n)--(:Genus) "
         "WITH n MATCH p=(n)--(:Family) RETURN p")
query_results = driver.query(query)

print(len(query_results))
print(query_results[0])

2
{'p': [{'name': '11947-agglom-78'}, 'MEMBER_OF', {'name': 'f__[Barnesiellaceae]'}]}
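For short paths like this one, the results can be reduced to simple (taxon, family) pairs. The sketch below assumes that each path starts at the taxon and ends at the family, as in the output above; taxon_family_pairs is a name introduced for this illustration.

```python
def taxon_family_pairs(query_results, key='p'):
    """Return (taxon name, family name) tuples from two-node path results."""
    pairs = []
    for result in query_results:
        nodes = [item for item in result[key] if isinstance(item, dict)]
        # Assumes the path starts at the taxon and ends at the family,
        # as in the output shown above.
        pairs.append((nodes[0]['name'], nodes[-1]['name']))
    return pairs


# The first result shown above, copied here so the example runs on its own.
result = {'p': [{'name': '11947-agglom-78'},
                'MEMBER_OF',
                {'name': 'f__[Barnesiellaceae]'}]}

pairs = taxon_family_pairs([result])
print(pairs)
```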

Finally, it is also possible to specify a query that is technically correct, but whose pattern does not match any data in the database. For example, Genus nodes can only connect to Taxon nodes, and Taxon nodes cannot directly connect to a Network node. In this case, the query result is simply an empty list.


query = "MATCH p=(:Genus {name: 'g__Odoribacter'})--()--(:Network) RETURN p"
query_results = driver.query(query)

print(query_results)

[]
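Since an empty list is a valid outcome, code that indexes into the results (such as query_results[0]) should check for it first. A minimal sketch, with first_result as a hypothetical helper name:

```python
def first_result(query_results):
    """Return the first result, or None when the query matched nothing."""
    return query_results[0] if query_results else None


print(first_result([]))
print(first_result([{'n': {'name': 'g__Odoribacter'}}]))
```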