What does anuran do?

The anuran package contains functions that can be used to establish the existence of conserved or unique patterns across networks. You can find anuran here. At the moment, the README contains several standard commands, and there are two vignettes available that demonstrate how to use anuran. The first vignette uses the demo data and looks at the soil microbiome in three different plots. The second vignette uses a data set of soil samples from the Arctic. Both of these vignettes use the command line interface (CLI): you run anuran from your terminal, and it generates several files containing the outcomes of the analysis. However, you might want to tweak your analysis so you can compare your centrality value of choice, or you might want to use the null models to test the significance of network-level properties not currently supported by anuran. This is where the application programming interface (API) comes in.

What is an API exactly and why would you use it?

While a command line interface specifies how command line input controls the software, the API specifies how you interact with anuran in Python. For example, you don't need to know how anuran works internally to generate null models with it. For nearly all functions in the package, the Sphinx documentation describes their parameters and output. You can also get this documentation by importing a function and running help(function_name).
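
For example, this is how you would pull up the documentation for the null model generator used later in this demo:

from anuran.nulls import generate_null

# Prints the function's docstring, including its parameters and output.
help(generate_null)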

In this demo, we will go through some of the most important functions that may be useful if you want to adapt anuran specifically to your research. You will see how to export null models to a JSON file, and how to add a custom centrality score.

How are the anuran functions structured?

Each function in the package is part of a module. At the moment, there are 7 modules: centrality, draw, graphvals, nulls, sets, stats and utils.

The draw module contains code for generating figures with Seaborn, while the utils module contains functions that need to be imported by the other modules. For example, we generate the null models in parallel to reduce the computational time that anuran needs. The actual edge swapping is therefore implemented in utils, so the nulls module can import it and run several instances in parallel. Since you will not need to access these functions directly, you can ignore them.

The centrality and graphvals modules contain functions for calculating properties from groups of networks. If you want to change the centrality scores or graph properties, you can change functions in these modules to do this. With the nulls module, you can get the null models to do your own analysis, while you might adapt the sets module to change the definition of an intersection. For example, you might choose to define an intersection as a group of edges where two matching edges do not need to have the same association partners, but the association partners do need to be from the same family.

In this case, we will generate some null models first. We will use two networks that we inferred from the Arctic soil data set with two different software tools. Download them from these links: SPIEC-EASI and FlashWeave, by right-clicking the page, selecting ‘Save page as…’ and saving them as .graphml files.

Null models

First, we define all the parameters of the function. The location where you saved the two networks should be given as a list (hence the brackets []), since anuran can accept multiple locations at once. The number n is the total number of null models to generate per network, while core determines the number of processor cores to use. The fraction and prevalence are then used to define a synthetic core; in this case, we will generate five groups of null models. The first group will not have a core network, the next two will have a core network that is 5% of the total network size and occurs in half or all of the networks, and the last two will have a core network that is 10% of the total network size and also occurs in half or all of the networks.

Keep in mind that the core prevalence especially needs to make sense: we cannot generate a core network with edges in 20% of networks, since we only have 2 networks. Additionally, the core network sets are derived from a single network. The two networks that share the same core will both be randomized from either the FlashWeave network or the SPIEC-EASI network, and therefore the set size depends strongly on the total size of these networks.

location = ['C:/Documents/demo']  # folder(s) containing the networks
n = 10                            # number of null models per network
core = 4                          # number of processor cores
fraction = [0.05, 0.1]            # core size as a fraction of total network size
prevalence = [0.5, 1]             # fraction of networks in which the core occurs

After we have defined the parameters, we need to correctly import the networks from the folder. Since anuran can import different groups of networks, it retains this group structure in a Python dictionary where each key is normally the base name of the imported folder. Since we are constructing the dictionary ourselves, we can give the key an arbitrary name. We only need to define a single key-value pair, with the value being the list to which the networks will be added.

The next part is to get all the graph files from the folder. You can use glob, or you can just give the full filenames.

Then, we iterate over the files. For each file, we read the network into memory and then add it to the dictionary as a tuple. The tuple contains the name of the file (without the complete filepath) and a NetworkX object.

Because we add the network as a tuple, we can later use the name if we want to find a specific null model.

import glob
import os
import networkx as nx

networks = {'demo': list()}

# Option 1: find all graphml files in the folder with glob.
files = glob.glob(location[0] + "/**/*.graphml", recursive=True)
# Option 2: just give the full filenames.
files = ['C:/Documents/demo/arctic_flash.graphml',
         'C:/Documents/demo/arctic_spiec.graphml']
for file in files:
    network = nx.read_graphml(file)
    networks['demo'].append((os.path.basename(file), nx.to_undirected(network)))

If you call the networks dictionary, you will see the structure of this object. It should now look like this (with different memory addresses):

networks

{'demo': [('arctic_flash.graphml', <networkx.classes.graph.Graph at 0x1dc4c6b7fd0>),
          ('arctic_spiec.graphml', <networkx.classes.graph.Graph at 0x1dc4c796fd0>)]}

Next, we import the function that generates the null models (appropriately named generate_null) and run it.

from anuran.nulls import generate_null
random, degree = generate_null(networks, n=n, core=core, fraction=fraction, prev=prevalence)

And there we go! Two new objects with all the null models we want. One problem: they are still in memory. We can write specific networks to individual graph files, or we can just loop over all the networks and export them.

If we just want a single null model, we can access the two model objects. Both objects are dictionaries, with a key for each of the groups we imported. Each group maps to another dictionary, which separates the fully randomized null models from the null models with a core.

random_models_only = random['demo']['random']

This object contains two lists, one for each of the networks in the demo folder. Each list contains 10 null models, again organized as tuples with a network name and the NetworkX object. To summarize the full structure: each of the generated model types is a nested dictionary, with important anuran parameters like the folder name and filename used as dictionary keys. The networks themselves are stored as a list of lists: one list per original network, with each list containing the specified number of null models. These are stored as tuples so the file name associated with each model cannot be lost.
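
As a quick sanity check, you can print the keys at each level to see this structure for yourself; one key holds the fully randomized models, while the other holds the models with a synthetic core:

# Top level: one key per imported group.
print(random.keys())
# Second level: one key per model type (fully randomized or with core).
print(random['demo'].keys())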

To get a single FlashWeave network, we just need to index the lists. We can then write the network to disk by accessing the second part of the tuple. Find the different NetworkX-supported formats here.

# First list: null models of the first network (the FlashWeave network).
# First element of that list: a (name, graph) tuple for one null model.
network = random_models_only[0][0]
network
# Write the NetworkX object (the second part of the tuple) to disk.
nx.write_graphml(network[1], 'C:/Documents/demo/flashweave_null.graphml')

To write the entire random object to graphml files, we need to iterate over the object structure. Create a folder first, so you can store the null models there.
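
You can also create the folder from Python; a minimal sketch, where the folder name is chosen to match the one used in the loop below:

import os

# Create the output folder for the null models if it does not exist yet.
os.makedirs('C:/Documents/demo/nulls', exist_ok=True)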

Can you repeat this for the degree object?

for file in random:
    for model in random[file]:
        if isinstance(random[file][model], list):
            # If the value is a list, these are the models without a core.
            for i in range(len(random[file][model])):
                network = random[file][model][i]
                for j in range(len(network)):
                    name = [file, model, str(j), network[j][0]]
                    name = 'C:/Documents/demo/nulls/' + '_'.join(name)
                    nx.write_graphml(network[j][1], name)
        elif isinstance(random[file][model], dict):
            # If the value is a dict, it contains core models
            # and is organized by core size and prevalence.
            for size in random[file][model]:
                for prev in random[file][model][size]:
                    for i in range(len(random[file][model][size][prev])):
                        network = random[file][model][size][prev][i]
                        for j in range(len(network)):
                            name = [file, model, str(size), str(prev), str(j), network[j][0]]
                            name = 'C:/Documents/demo/nulls/' + '_'.join(name)
                            nx.write_graphml(network[j][1], name)
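
The null models can also be exported as a single JSON file. NetworkX objects are not directly JSON-serializable, so they first need to be converted; below is a minimal sketch that builds a random_json object from the fully randomized models using NetworkX's node_link_data converter (the core models, stored as nested dictionaries, can be converted the same way). The random_json name is ours, chosen to match the dump call that follows.

import json
from networkx.readwrite import json_graph

# Convert each fully randomized null model to a JSON-compatible dictionary,
# keeping the group structure and the model names.
random_json = {}
for file in random:
    random_json[file] = {}
    random_json[file]['random'] = [[(name, json_graph.node_link_data(graph))
                                    for name, graph in network_list]
                                   for network_list in random[file]['random']]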

with open('C:/Documents/demo/null_models.json', 'w') as fp:
    json.dump(random_json, fp)

If we want to analyse the null models in R, we can just read the graphml files with igraph. We can parse the filenames to fill in the object structure. After running the code below, the random_networks list contains all fully randomized networks, named by their respective filenames.

Can you import the other networks with a synthetic core to lists as well?

library(igraph)
file <- 'C:/Documents/demo/nulls/'

all_filenames <- list.files(path=file)
random_networks <- list()
for (filename in all_filenames){
  names <- strsplit(filename, '_')
  if (names[[1]][2] == 'random'){
    random_networks[[filename]] = read_graph(paste(file, filename, sep=''), format='graphml')
  }
}

Centralities

Do you want to change the centrality measure for your particular analysis? anuran only has one function that calculates all the centralities. Since most NetworkX centrality functions have interchangeable outputs, you can just adapt this function. Like the other parallelized functions, it is located in the utils module. A simple way to edit anuran is to download the zip file from Github and extract it. Then, navigate to the utils module so you can edit the centralities function.

def _generate_centralities_parallel(model_list):
    """
    This function takes a list of null models or networks,
    where each item in the list is a tuple.
    The tuple contains the network name and the NetworkX object.
    This function adds centrality rankings to the tuple.

    :param model_list: List of list of networks, with networks given as a tuple (name and networkX object)
    :return:
    """
    centrality_list = []
    for network in model_list:
        centrality_list.append((network[0], network[1],
                                {'Degree': _centrality_percentile(nx.degree_centrality(network[1])),
                                 'Closeness': _centrality_percentile(nx.closeness_centrality(network[1])),
                                 'Betweenness': _centrality_percentile(nx.betweenness_centrality(network[1]))}))
    return centrality_list

This function takes absolute scores for centralities and converts these to rankings, so different networks are comparable. We can try adding another centrality from the NetworkX library. Let's add the load centrality. Try to add another centrality that you are interested in as well.

def _generate_centralities_parallel(model_list):
    """
    This function takes a list of null models or networks,
    where each item in the list is a tuple.
    The tuple contains the network name and the NetworkX object.
    This function adds centrality rankings to the tuple.

    :param model_list: List of list of networks, with networks given as a tuple (name and networkX object)
    :return:
    """
    centrality_list = []
    for network in model_list:
        centrality_list.append((network[0], network[1],
                                {'Degree': _centrality_percentile(nx.degree_centrality(network[1])),
                                 'Closeness': _centrality_percentile(nx.closeness_centrality(network[1])),
                                 'Betweenness': _centrality_percentile(nx.betweenness_centrality(network[1])),
                                 'Load': _centrality_percentile(nx.load_centrality(network[1]))}))
    return centrality_list

After you have made your changes, navigate to the folder where you unzipped the anuran repository. It should contain a file called setup.py. Reinstall your version of anuran with the following commands:

python -m pip uninstall anuran
python setup.py install 

You should now be able to run anuran from the command line, and it will calculate your centrality statistic of choice.