Standardized queries

We can use the schema specified below to design sets of standard queries that can be reused across microbiome studies. Because we have a number of standard node labels that should only have connections with certain other node labels, we can omit information from our queries that we would otherwise need to specify.

For example, we know that a Family node is never directly connected to an Edge node; any path between the two must pass through a Taxon node. No other node label can connect taxonomy nodes to edges. As a result, both queries below should always give the same results:

MATCH (:Family {name: "f__Rhodobacteraceae"})--(:Taxon)--(:Edge)--(:Taxon)--(b:Family) RETURN b
MATCH (:Family {name: "f__Rhodobacteraceae"})--()--(:Edge)--()--(b:Family) RETURN b

However, we should still specify the Edge node. According to the database schema, the pattern would otherwise also match all families that are found in the same specimen, since the central node in the pattern could also be a Specimen node.
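
To make the difference concrete, the two patterns can be compared side by side. This is only a sketch based on the schema as described here: it assumes the same Family, Taxon, Edge and Specimen labels and the name property used in the queries above. DISTINCT is added purely for illustration, so that a family reached through several paths is returned once.

```cypher
// Over-broad: the central node is unlabelled, so it may be an Edge node
// but also, for example, a Specimen node. This can therefore also return
// families that merely occur in the same specimen.
MATCH (:Family {name: "f__Rhodobacteraceae"})--()--()--()--(b:Family)
RETURN DISTINCT b

// Restricted: the central node must be an Edge node, so only families
// that are linked through an association are returned.
MATCH (:Family {name: "f__Rhodobacteraceae"})--()--(:Edge)--()--(b:Family)
RETURN DISTINCT b
```

If both queries return the same families on a given database, every co-occurring family is also connected through an edge; in general, the first query is expected to return a superset of the second.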

In this section, we will go through several increasingly complex queries that can be useful when working with mako. Since mako writes all files to the Neo4j database in the same way, the same queries can be reused for most data sets.