MetastatsDriver

This driver extracts nodes and edges from the database that are required for the operations defined in the metastats module.

MetastatsDriver.agglomerate_networks(level=None, weight=True, networks=None)

Agglomerates networks to the specified taxonomic level or, if no level is specified, over all levels. Edges are agglomerated when their nodes share the same taxonomic assignment at the specified level. If weight is set to True, edges are only agglomerated if their weights have the same sign. Agglomeration stops when the pair list is empty, that is, as soon as no remaining edge pair qualifies. By default, agglomeration is done separately for each network in the database, so each network gets an agglomerated version.

The networks parameter can be either a dict or a list. If it is a dict, the keys are the new network names and the values are the old names.
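As a minimal sketch of how the two accepted forms could be normalized into one mapping: the helper normalize_networks, its available argument, and the naming rule applied to list entries are all hypothetical and not part of the driver.

```python
def normalize_networks(networks, available, level="genus"):
    """Return {new_name: old_name} for the networks to agglomerate.

    Assumption: list entries (and the default of all networks) receive a new
    name of the form '<old>_<level>'. The real driver may name them differently.
    """
    if networks is None:
        # Default case: every network in the database is agglomerated.
        networks = list(available)
    if isinstance(networks, dict):
        # Keys are the new names, values are the old names.
        return dict(networks)
    return {f"{old}_{level}": old for old in networks}


mapping = normalize_networks(["soil"], available=["soil", "gut"])
```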

Pseudocode representation:

  1. Duplicate networks
  2. For each edge pair (taxon-level)-taxon-taxon-(taxon-level)
  3. Create new edge
  4. Delete edge pair

  • level: Taxonomic level to agglomerate to
  • weight: If True, takes edge weight sign into account
  • networks: If specified, only these networks are agglomerated
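The pseudocode steps above can be sketched on plain Python data. This is an illustration only, not the driver's implementation: the real method works on the database, whereas here edges are dicts in a list, taxonomy is a hypothetical lookup table, and the helper names (level_key, find_pair, agglomerate) are invented for the example. Giving a merged edge a sign of None when the two signs differ is also an assumption.

```python
from itertools import combinations
from uuid import uuid4


def level_key(edge, taxonomy, level):
    """Order-independent pair of taxon names at `level` for an edge's two nodes."""
    # Nodes without a taxonomy entry (e.g. already agglomerated) keep their own name.
    return tuple(sorted(taxonomy.get(n, {}).get(level, n) for n in edge["nodes"]))


def find_pair(edges, taxonomy, level, weight=True):
    """Return the first two edges that qualify for merging, or None."""
    for a, b in combinations(edges, 2):
        if level_key(a, taxonomy, level) != level_key(b, taxonomy, level):
            continue
        if weight and a["sign"] != b["sign"]:
            continue
        return a, b
    return None


def agglomerate(edges, taxonomy, level, weight=True):
    edges = [dict(e) for e in edges]  # 1. duplicate the network
    while True:
        pair = find_pair(edges, taxonomy, level, weight)  # 2. look for an edge pair
        if pair is None:
            break  # stop condition: no pair qualifies any more
        a, b = pair
        merged = {  # 3. create the new edge
            "id": str(uuid4()),
            "nodes": level_key(a, taxonomy, level),
            # Assumption: differing signs (only possible with weight=False) give None.
            "sign": a["sign"] if a["sign"] == b["sign"] else None,
        }
        edges = [e for e in edges if e is not a and e is not b]  # 4. delete the pair
        edges.append(merged)
    return edges


taxonomy = {
    "otu1": {"genus": "Nitrobacter"},
    "otu2": {"genus": "Nitrosomonas"},
    "otu3": {"genus": "Nitrobacter"},
    "otu4": {"genus": "Nitrosomonas"},
    "otu5": {"genus": "Nitrospira"},
}
edges = [
    {"id": "e1", "nodes": ("otu1", "otu2"), "sign": 1},
    {"id": "e2", "nodes": ("otu3", "otu4"), "sign": 1},
    {"id": "e3", "nodes": ("otu1", "otu5"), "sign": -1},
]
result = agglomerate(edges, taxonomy, "genus")
```

In this toy input, e1 and e2 both connect Nitrobacter to Nitrosomonas with the same sign, so they are replaced by one genus-level edge, while e3 has no partner and is left unchanged. Each merge removes one edge, so the loop always terminates.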

MetastatsDriver.copy_network(source_network, new_network)

Copies a network node and its edges. The new network node is named new_network, and edge IDs are generated with uuid4. Edge weights are not copied, only their signs.

  • source_network: Source network name
  • new_network: New network name
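The copy behaviour described above (fresh uuid4 edge IDs, signs kept, weights dropped) can be sketched with an in-memory stand-in for the database. The networks dict, the edge layout, and the function name copy_network_sketch are assumptions made for the example and do not reflect how the driver stores data.

```python
from uuid import uuid4


def copy_network_sketch(networks, source_network, new_network):
    """Copy the edges of source_network into a new entry named new_network."""
    if new_network in networks:
        raise ValueError(f"network {new_network!r} already exists")
    networks[new_network] = [
        {
            "id": str(uuid4()),  # new edge ID, never reused from the source
            "source": edge["source"],
            "target": edge["target"],
            # Only the sign (-1, 0 or 1) is kept; the weight itself is dropped.
            "sign": (edge["weight"] > 0) - (edge["weight"] < 0),
        }
        for edge in networks[source_network]
    ]
    return networks[new_network]


networks = {
    "soil": [
        {"id": "a", "source": "otu1", "target": "otu2", "weight": 0.8},
        {"id": "b", "source": "otu1", "target": "otu3", "weight": -0.3},
    ]
}
copied = copy_network_sketch(networks, "soil", "soil_copy")
```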

MetastatsDriver.get_pairlist(level, weight, network)

Returns an edge pair. A pair is defined as two edges linked to taxonomic nodes that have the same taxonomic assignment at the specified level, e.g. Nitrobacter-edge-Nitrosomonas.

  • level: Taxonomic level to identify a pair
  • weight: If True, specifies that edge weights should have the same sign
  • network: Name of network that the pairs should belong to

MetastatsDriver.agglomerate_pair(pair, level, weight, network)

For one pair, as returned by get_pairlist, this function creates new agglomerated nodes, deletes old agglomerated nodes, and links taxonomic nodes to the new agglomerated nodes. Moreover, the two old edges are deleted and replaced by a new edge.

  • pair: List containing transaction results of query for pair
  • level: Taxonomic level to identify a pair
  • weight: If True, specifies that edge weights should have the same sign
  • network: Name of network that the pairs should belong to

MetastatsDriver.get_taxlist(level, network)

Returns taxon pairs. A taxon pair is a list containing two edges linked to identical taxonomy, e.g. edge-taxon-Nitrobacter-taxon-edge.

  • level: Taxonomic level to identify a pair
  • network: Name of network that the pair should belong to

MetastatsDriver.agglomerate_taxa(pair, level, network)

For one pair, as returned by get_taxlist, this function merges nodes that have similar taxonomy but different edges. Old nodes are linked to the new agglomerated node, except for Agglom_Taxon; in that case, links to the ancestral nodes are generated.

  • pair: List containing transaction results of query for pair
  • level: Taxonomic level to identify a pair
  • weight: If False, merges edges with different weights

MetastatsDriver.associate_samples(label, null_input=None)

To test the hypothesis that taxa are associated with specific sample properties, the following tests are performed:

  1. For qualitative variables, a hypergeometric test is performed to assess how many edges are expected by chance.
  2. For quantitative variables, Spearman correlation is performed. Because this is a hypothesis-generating tool, multiple-testing correction should be applied with care.
  • label: Label of property (e.g. pH) to query
  • null_input: If missing values are not specified as NA, specify the NA input here
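For readers unfamiliar with the two tests, the following standard-library sketch shows what each one computes. It does not reproduce how the driver builds its counts or vectors from the database: the function names, argument names, and example numbers are illustrative assumptions.

```python
import math


def hypergeom_upper_tail(observed, population, successes, draws):
    """P(X >= observed) when drawing `draws` items without replacement from
    `population` items, of which `successes` carry the property of interest."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    return sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(max(observed, 0), upper + 1)
    ) / total


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Qualitative example: 10 samples, 5 of them 'forest'; a taxon is linked to
# 5 samples and all 5 are 'forest'.
p_value = hypergeom_upper_tail(observed=5, population=10, successes=5, draws=5)

# Quantitative example: pH against a perfectly monotone abundance profile.
rho = spearman_rho([5.5, 6.0, 6.8, 7.2], [1, 4, 9, 30])
```

The hypergeometric tail answers the "expected by chance" question for categories, while Spearman correlation only uses ranks, so it picks up any monotone relationship with a quantitative property, not just a linear one.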

MetastatsDriver.associate_taxon(taxon, null_input, properties)

Tests whether specific sample properties can be associated with a taxon.

  • taxon: Name of a taxon
  • null_input: If missing values are not specified as NA, specify the NA input here
  • properties: List specifying names of properties to query