anuran package
Submodules
anuran.centrality module
The functions in this module calculate intersections or differences of networks. The first function is a wrapper that subsamples networks from a list of null models to output a dataframe of set sizes.
anuran.centrality.generate_ci_frame(networks, random, degree, fractions, prev, perm, core)
This function estimates centralities from all networks provided in the network, random and degree lists.
The random and degree lists are structured as follows:
- List corresponding to each original network (length networks)
  - List of permutations per original network (length n in generate_null)
The core list is structured as follows:
- List of all shared fractions (length fractions)
  - List corresponding to core prevalence (length core)
    - List of permutations per original network (length networks)
The function returns a pandas dataframe with the size of the intersection, the type of model and the shared fraction as separate columns. The length of the dataset is equal to the number of original networks, the number of permuted sets for the random models and the number of permuted sets for the degree-preserving model.
‘None’ values reflect that the species in question was not found in a network.
Parameters: - networks – List of input networks
- random – Dictionary with permuted input networks without preserved degree distribution
- degree – Dictionary with permuted input networks with preserved degree distribution
- fractions – List with fractions of shared interactions
- prev – List with prevalence of shared interactions
- perm – Number of sets to take from null models
- core – Number of processor cores
Returns: List of lists with set sizes
anuran.centrality.generate_confidence_interval(ranking)
Given a list with centrality rankings calculated from multiple networks, this function calculates the confidence interval.
Parameters: ranking – List of centrality rankings for each network
Returns: Dictionary with nodes as keys and tuples of confidence intervals as values
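The page does not say how the interval is computed, so the following is only a minimal sketch of one common approach, a percentile interval per node. The input layout (one dictionary of node to rank per network) and the names percentile and rank_intervals are assumptions made for illustration, not anuran's API.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def rank_intervals(rankings, lower=2.5, upper=97.5):
    """Collect each node's ranks across networks and return (lower, upper) bounds."""
    per_node = {}
    for ranking in rankings:
        for node, rank in ranking.items():
            per_node.setdefault(node, []).append(rank)
    return {
        node: (percentile(ranks, lower), percentile(ranks, upper))
        for node, ranks in per_node.items()
    }


# Hypothetical rankings from three networks; node "b" is absent from the third.
rankings = [{"a": 0.9, "b": 0.2}, {"a": 0.8, "b": 0.4}, {"a": 1.0}]
intervals = rank_intervals(rankings)
```

Nodes missing from a network simply contribute fewer ranks in this sketch; anuran itself may treat them differently.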
anuran.draw module
The functions in this module visualize set sizes and other anuran outputs. draw_sets visualizes the set sizes of the null models and original networks. draw_samples shows the distribution of set sizes as the number of networks increases, for both null models and the input networks. draw_centralities plots the upper limit of the confidence interval against the lower limit. draw_graphs shows the graph properties for each of the networks used by anuran.
anuran.draw.draw_centralities(data, fp)
This function accepts a pandas dataframe with 7 columns: Node, Network, Network type, Conserved fraction, Centrality, Upper limit, Lower limit. For every centrality, a scatter plot is generated with the upper and lower limits on the x and y axes respectively.
Parameters: - data – Pandas data frame
- fp – Filepath with prefix for name
Returns:
anuran.draw.draw_graphs(data, fp)
This function accepts a pandas dataframe with 7 columns: Network, Name, Group, Network type, Conserved fraction, Property, Value.
Parameters: - data – Pandas data frame
- fp – Filepath with prefix for name
Returns:
anuran.draw.draw_samples(data, fp)
This function accepts a pandas dataframe with the following columns: Network, Network type, Conserved fraction, Set type, Set size. For every combination of set type, a faceted box-and-whisker plot is generated that visualizes the distribution of set sizes per network type.
Parameters: - data – Pandas data frame
- fp – Filepath with prefix for name
Returns:
anuran.draw.draw_set_differences(data, fp)
This function accepts a pandas dataframe with 4 columns: Interval, Set size, Group, Network. The interval is the difference of the intersections.
For the null model networks, the interval is the median.
The function writes a bar plot of the intervals to the given filepath.
Parameters: - data – Pandas data frame
- fp – Filepath with prefix for name
Returns:
anuran.draw.draw_sets(data, fp)
This function accepts a pandas dataframe with 5 columns: Network, Network type, Conserved fraction, Set type, Set size. For every combination of set type, a faceted box-and-whisker plot is generated that visualizes the distribution of set sizes per network type.
Parameters: - data – Pandas data frame
- fp – Filepath with prefix for name
Returns:
anuran.graphvals module
The functions in this module calculate different graph-level properties.
The first function is a wrapper that subsamples networks from a list of null models to output a dataframe of set sizes.
anuran.graphvals.generate_graph_frame(networks, random, degree, fractions, core, perm)
This function estimates graph-level properties of all networks provided in the network, random and degree lists.
The random and degree lists are structured as follows:
- List corresponding to each original network (length networks)
  - List of permutations per original network (length n in generate_null)
The core list is structured as follows:
- List of all shared fractions (length fractions)
  - List corresponding to core prevalence (length core)
    - List of permutations per original network (length networks)
The function returns a pandas dataframe with the size of the intersection, the type of model and the shared fraction as separate columns. The length of the dataset is equal to the number of original networks, the number of permuted sets for the random models and the number of permuted sets for the degree-preserving model.
Parameters: - networks – List of input networks
- random – Dictionary with permuted input networks without preserved degree distribution
- degree – Dictionary with permuted input networks with preserved degree distribution
- fractions – List with fractions of shared interactions
- core – List with prevalence of shared interactions
- perm – Number of sets to take from null models
Returns: List of lists with set sizes
anuran.graphvals.generate_graph_properties(networks)
This function constructs lists with centrality rankings of nodes in multiple networks. Instead of using the absolute degree or betweenness centrality, this takes metric bias into account.
If the graph is not connected, the values are calculated for the largest connected component.
Parameters: networks – List of input networks
Returns: Pandas dataframe with rankings
anuran.main module
anuran: Null models for replicate networks. The script takes a network as input and uses this to generate null models. The output of the null models is presented as a csv of set sizes, and a t-test is used to assess whether set sizes differ from those expected under the null model. Detailed explanations are available in the headers of each file.
anuran uses the file extension to import networks. Generation of null models is done on the adjacency matrix for speed; the NetworkX representation is unfortunately slower.
The demo data for anuran was downloaded from the following publication: Meyer, K. M., Memiaghe, H., Korte, L., Kenfack, D., Alonso, A., & Bohannan, B. J. (2018). Why do microbes exhibit weak biogeographic patterns? The ISME Journal, 12(6), 1404.
anuran.main.main()
anuran.main.model_calcs(networks, args)
Function for generating null models and carrying out calculations.
Parameters: - networks – Dictionary with folder name as key and values as tuples (name, network object).
- args – Settings for running anuran
Returns:
anuran.main.set_anuran()
This parser gets input settings for running anuran. It requires an input format that can be read by NetworkX. Make sure to include the extension in the input filename as this is used to infer the file type.
anuran.nulls module
The null models module contains functions for constructing permutations of input networks. Generation of null models is done on the adjacency matrix for speed; the NetworkX representation is unfortunately slower. The functions can either change (random model) or preserve (degree model) the degree distribution.
The functions in this module also calculate intersections or differences of networks. The first function is a wrapper that subsamples networks from a list of null models to output a dataframe of set sizes.
These functions run operations in parallel. utils.py contains the operations they carry out.
anuran.nulls.generate_null(networks, n, npos, core, fraction=False, prev=False)
This function takes a list of networks. For each network, a list with length n is generated, with each item in the list being a permutation of the original network. This is returned as a list of lists with this structure:
- List corresponding to each original network (length networks)
  - List of permutations per original network (length n)
For the positive controls, this list is inverted:
- List of permutations across networks (length n)
  - List corresponding to a single permuted group of networks
To generate the list through multiprocessing, a dictionary with arguments is generated and provided to a utility function.
Parameters: - networks – List of input NetworkX objects
- n – Number of randomized networks per input network
- npos – Number of positive control randomized networks per group
- core – Number of processor cores
- fraction – Fraction of conserved interactions
- prev – Prevalence of core. If provided, null models have conserved interactions.
Returns: List of lists with randomized networks
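To make the nested return structure and the two model types concrete, here is a rough sketch under simplifying assumptions: networks are plain edge lists rather than NetworkX objects or adjacency matrices, there is no multiprocessing, and the helper names degree_sequence, degree_model and random_model are hypothetical. The degree model uses a standard double edge swap, which may differ from anuran's own procedure.

```python
import random
from itertools import combinations


def degree_sequence(edges):
    """Count how many edges touch each node."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def degree_model(edges, swaps, rng):
    """Rewire (a, b), (c, d) into (a, d), (c, b); every node keeps its degree."""
    current = {tuple(sorted(edge)) for edge in edges}
    for _ in range(swaps):
        (a, b), (c, d) = rng.sample(sorted(current), 2)
        first, second = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or first in current or second in current:
            continue
        current -= {(a, b), (c, d)}
        current |= {first, second}
    return sorted(current)


def random_model(edges, rng):
    """Place the same number of edges on random node pairs; degrees may change."""
    nodes = sorted({node for edge in edges for node in edge})
    return sorted(rng.sample(list(combinations(nodes, 2)), len(set(edges))))


networks = [
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")],
    [("a", "c"), ("b", "d"), ("c", "e"), ("a", "e")],
]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 3  # permutations per original network

# Outer list: one entry per original network; inner list: n permutations.
degree_models = [[degree_model(net, 10, rng) for _ in range(n)] for net in networks]
random_models = [[random_model(net, rng) for _ in range(n)] for net in networks]
```

Both results have length networks on the outside and length n on the inside, matching the first structure described above.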
anuran.sets module
The functions in this module construct Pandas dataframes with set sizes for different operations.
anuran.sets.generate_sample_sizes(networks, random_models, degree_models, sign, core, fractions, prev, perm, sizes, limit, number)
This function wraps the generate_sizes function but passes it only a random subset of the input networks and null models. This shows the effect of increasing sample number on set size.
Parameters: - networks – List of input networks
- random_models – List of permuted input networks without preserved degree distribution
- degree_models – List of permuted input networks with preserved degree distribution
- sign – If true, sets take sign information into account.
- core – Number of processor cores
- fractions – List with fractions of shared interactions
- prev – List with prevalence of shared interactions
- perm – Number of sets to take from null models
- sizes – Size of intersection to calculate. By default 1 (edge should be in all networks).
- limit – Maximum number of resamples.
- number – Sample number to test.
Returns: List of lists with set sizes
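The subsampling idea can be sketched as follows, assuming each network is reduced to a set of edges and that number is the subset size and limit caps the number of resamples. The function name sampled_intersection_sizes is hypothetical, and enumerating all combinations is only practical for a small number of networks.

```python
import random
from itertools import combinations


def sampled_intersection_sizes(networks, number, limit, rng):
    """Intersection sizes for up to `limit` random subsets of `number` networks."""
    subsets = list(combinations(networks, number))
    if len(subsets) > limit:
        subsets = rng.sample(subsets, limit)
    return [len(set.intersection(*subset)) for subset in subsets]


networks = [
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("d", "e")},
    {("a", "b"), ("c", "d"), ("d", "e")},
    {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")},
]
rng = random.Random(1)  # fixed seed so the sketch is reproducible

# One list of intersection sizes per sample number, from 1 network up to all 4.
sizes_per_number = {
    number: sampled_intersection_sizes(networks, number, limit=4, rng=rng)
    for number in range(1, len(networks) + 1)
}
```

Plotting these lists against the sample number gives the kind of distribution that draw_samples is described as showing.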
anuran.sets.generate_size_differences(data, sizes)
Since the intersections are nested, e.g. a 0.9 intersection is always nested inside a 0.5 intersection, we can extract differences of intersections to assess which intersections are relevant. So if we have the 0.9 and the 1 intersection, we know how many edges belong to that range.
This function takes the dataframe from the generate_sizes function and calculates the sizes of the differences between intersections. The difference of intersections is also referred to as the interval.
For the null models, the median set size per interval is returned.
The input dataframe should have the following columns: ‘Conserved fraction’, ‘Group’, ‘Network’, ‘Network type’, ‘Prevalence of conserved fraction’, ‘Samples’, ‘Set size’, ‘Set type’, ‘Set type (absolute)’
The returned dataframe only contains the intersection intervals.
Parameters: - data – Dataframe with set sizes, as returned by generate_sizes
- sizes – Size of intersection to calculate. By default 1 (edge should be in all networks).
Returns: Dataframe with intersection intervals
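As an illustration of the interval arithmetic, the sketch below works on plain dictionaries that map an intersection fraction to a set size, rather than on the dataframe described above. The names interval_sizes and median_intervals and the example numbers are made up for the illustration.

```python
from statistics import median


def interval_sizes(set_sizes):
    """Differences between consecutive nested intersections, keyed by (lower, upper)."""
    fractions = sorted(set_sizes)
    return {
        (low, high): set_sizes[low] - set_sizes[high]
        for low, high in zip(fractions, fractions[1:])
    }


def median_intervals(null_set_sizes):
    """Median interval size across several null models."""
    per_model = [interval_sizes(sizes) for sizes in null_set_sizes]
    return {key: median(model[key] for model in per_model) for key in per_model[0]}


# Hypothetical set sizes for the input networks and for three null models.
observed = {0.5: 40, 0.9: 12, 1.0: 7}
nulls = [
    {0.5: 10, 0.9: 3, 1.0: 1},
    {0.5: 12, 0.9: 2, 1.0: 0},
    {0.5: 9, 0.9: 4, 1.0: 1},
]

observed_intervals = interval_sizes(observed)  # {(0.5, 0.9): 28, (0.9, 1.0): 5}
null_intervals = median_intervals(nulls)       # {(0.5, 0.9): 7, (0.9, 1.0): 2}
```

Here 28 edges are shared by at least half but fewer than 90% of the networks, and 5 edges fall between the 0.9 and the full intersection.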
anuran.sets.generate_sizes(networks, random_models, degree_models, sign, core, fractions, prev, perm, sizes, combos=None)
This function carries out set operations on all networks provided in the network, random and degree lists.
The random and degree lists are structured as follows:
- List corresponding to each original network (length networks)
  - List of permutations per original network (length n in generate_null)
The core list is structured as follows:
- List of all shared fractions (length fractions)
  - List corresponding to core prevalence (length core)
    - List of permutations per original network (length networks)
The function returns a pandas dataframe with the size of the intersection, the type of model and the shared fraction as separate columns. The length of the dataset is equal to the number of original networks, the number of permuted sets for the random models and the number of permuted sets for the degree-preserving model.
Parameters: - networks – List of input networks
- random_models – Dictionary with permuted input networks without preserved degree distribution
- degree_models – Dictionary with permuted input networks with preserved degree distribution
- sign – If true, sets take sign information into account.
- core – Number of processor cores
- fractions – List with fractions of shared interactions
- prev – List with prevalence of shared interactions
- perm – Number of sets to take from null models
- sizes – Size of intersection to calculate. By default 1 (edge should be in all networks).
- combos – Dictionary of networks to combine per network
Returns: List of lists with set sizes
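The roles of the sizes and sign parameters can be sketched as a partial intersection: an edge is kept when it occurs in at least the given fraction of networks, and with sign enabled a positive and a negative edge between the same nodes count as different edges. This is an assumption-laden illustration on dictionaries of edge weights; edge_keys and partial_intersection are hypothetical names, not anuran functions.

```python
from collections import Counter


def edge_keys(network, sign):
    """Undirected edge identifiers, optionally extended with the sign of the weight."""
    keys = set()
    for (u, v), weight in network.items():
        pair = tuple(sorted((u, v)))
        keys.add(pair + (weight > 0,) if sign else pair)
    return keys


def partial_intersection(networks, size=1.0, sign=True):
    """Edges present in at least a fraction `size` of the networks."""
    counts = Counter(key for net in networks for key in edge_keys(net, sign))
    return {key for key, count in counts.items() if count / len(networks) >= size}


# Hypothetical weighted networks; the b-c edge changes sign between networks.
networks = [
    {("a", "b"): 0.8, ("b", "c"): -0.5, ("c", "d"): 0.3},
    {("a", "b"): 0.6, ("b", "c"): 0.4},
    {("b", "a"): 0.7, ("d", "e"): 0.2},
]

full = partial_intersection(networks, size=1.0, sign=True)
half_signed = partial_intersection(networks, size=0.5, sign=True)
half_unsigned = partial_intersection(networks, size=0.5, sign=False)
```

With sign information the b-c edge is split into a positive and a negative variant and misses the 0.5 threshold; without it, the edge is found in two of three networks and is kept.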
anuran.stats module
The functions in this module take previously calculated properties and assess whether these properties are likely to differ, reporting p-values.
Three properties can be assessed for significance:
1. Set sizes
2. Centrality scores
3. Graph properties
All properties are compared to both the null models and different groups of networks (if included).
anuran.stats.compare_centralities(centralities, mc)
The centralities dataframe contains a list of all centrality ranks measured across a group of networks.
The centralities dataframe should have the following columns: Node, Network, Group, Network type, Conserved fraction, Prevalence of conserved fraction, Centrality, Upper limit, Lower limit, Values.
This function carries out a Mann-Whitney test to test whether the ranks are different across the two groups that are being compared. This means that it can compare groups with different n. Consequently, a group is compared to all networks from a specific group. Since it is possible to generate more than one network per original network, this means that it is possible to compare ranks for 6 networks to ranks of 6*10 networks.
Parameters: - centralities – Dataframe with centralities
- mc – Method for multiple-testing correction
Returns: pandas dataframe with p-values for comparisons
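To show why the Mann-Whitney test copes with groups of different size, here is a self-contained sketch of the U statistic with a two-sided p-value from the normal approximation, without tie or continuity correction. It is not anuran's implementation; in practice a library routine such as scipy.stats.mannwhitneyu would normally be used, and the rank values below are invented.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney(sample_a, sample_b):
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # U counts the pairs in which a value from sample_a exceeds one from sample_b;
    # ties count as half.
    u = sum((a > b) + 0.5 * (a == b) for a in sample_a for b in sample_b)
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Ranks of one node in 6 input networks against ranks in 6 * 10 null networks.
original_ranks = [0.91, 0.93, 0.96, 0.97, 0.98, 0.99]
null_ranks = [i / 60 for i in range(60)]
u_stat, p_value = mann_whitney(original_ranks, null_ranks)
```

The statistic only depends on pairwise comparisons between the two samples, so the 6 versus 60 imbalance needs no special handling.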
anuran.stats.compare_graph_properties(graph_properties)
This function takes a dataframe of graph properties. Each graph property is compared to other groups with the Mann-Whitney test.
Takes a pandas dataframe of graph properties with the following columns: Network, Group, Network type, Conserved fraction, Prevalence of conserved fraction, Property, Value.
Parameters: graph_properties – Dataframe with graph properties
Returns: pandas dataframe with p-values for comparisons
anuran.stats.compare_set_sizes(set_sizes)
This function takes a dataframe of set sizes. Each set size is compared against a group (generated from null models); the p-value is computed by assessing how many standard deviations the set size is outside the distribution calculated from the null models.
The Interval property is the difference of intersections (e.g. Intersection 4 subtracted from Intersection 2).
Only the random and degree models, without a core, are likely to follow a normal distribution and therefore meet conditions for this test.
Takes a pandas dataframe of set sizes with the following columns: Network, Group, Network type, Conserved fraction, Prevalence of conserved fraction, Set type, Set size, Set type (absolute)
Parameters: set_sizes – Dataframe with set sizes
Returns: pandas dataframe with p-values for comparisons
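The standard-deviation comparison described for this function can be sketched as a z-score against the null distribution. The two-sided p-value, the function name null_model_p_value and the example numbers are assumptions; the sketch also relies on the normality condition stated above.

```python
from statistics import NormalDist, mean, stdev


def null_model_p_value(observed, null_sizes):
    """z-score of an observed set size against null set sizes, with a two-sided p-value."""
    mu, sigma = mean(null_sizes), stdev(null_sizes)
    if sigma == 0:
        raise ValueError("null set sizes have no variance")
    z = (observed - mu) / sigma
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical intersection sizes from five null models and one observed size.
null_sizes = [10, 12, 11, 9, 13]
z_score, p_value = null_model_p_value(20, null_sizes)
```

An observed size equal to the null mean gives z = 0 and a p-value of 1, while sizes far from the null mean give p-values close to 0.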
anuran.stats.correlate_centralities(group, centralities, mc)
Returns correlations for ordered networks and compares these to the null model correlations. The function returns a dataframe comparing correlations in the ordered networks and their randomized versions.
The centralities dataframe should have the following columns: Node, Network, Group, Network type, Conserved fraction, Prevalence of conserved fraction, Centrality, Upper limit, Lower limit, Values.
Parameters: - group – name of grouped networks
- centralities – Dataframe with centralities
- mc – multiple-testing correction
Returns: Dataframe of correlations
anuran.stats.correlate_graph_properties(group, graph_properties)
Returns correlations for ordered networks and compares these to the null model correlations. The function returns a dataframe comparing correlations in the ordered networks and their randomized versions.
Takes a pandas dataframe of graph properties with the following columns: Network, Group, Network type, Conserved fraction, Prevalence of conserved fraction, Property, Value.
Parameters: - group – name of grouped networks
- graph_properties – Dataframe with graph properties
Returns: Dataframe of correlations
anuran.utils module
The utils module contains functions used by other modules for multiprocessing.