Network analysis - node importance

In this tutorial, you will carry out an analysis of two different networks with manta and anuran. If you have not yet installed these tools, please follow the instructions on their homepages. Both tools have a command line interface, so you do not need to know Python: all commands can be run in a Unix shell or the Windows Command Prompt.

Step 1 - Data

If you have worked through previous tutorials, you may be familiar with the arctic soils network. You can find the paper with the data here. In this analysis, we will compare a FlashWeave network with the SPIEC-EASI network. You can find both files in the repository. Download them from these links: SPIEC-EASI and FlashWeave by right-clicking the page, selecting ‘Save page as…’ and saving them as .graphml files.

The networks are shown below: the SpiecEasi network on the left, and the FlashWeave network on the right.

Step 2 - Checking your installation

Now that you have downloaded the networks, you can run manta on them. Open your terminal and check that your manta installation works by running the help command. If manta is installed correctly, you should get an explanation of all the parameters.

manta -h

Step 3 - Running manta on the SPIEC-EASI and FlashWeave networks

There are two ways to import networks with manta: either you specify the complete file path, or you navigate to the directory where you saved the graphml files. Try navigating to the directory, since it will save you a lot of typing! Unfamiliar with the command line? Check the solutions.

Now that you are in the right location, run manta. Only the -i and -o parameters are mandatory; other parameters implement different features or set parameters of the clustering algorithm itself. With the --layout flag, the software also generates a layout, so the graph is organized by cluster when it is imported into Cytoscape.

manta -i arctic_spiec.graphml -o spiec_clustered.cyjs --layout 

How would you generate cluster robustness scores? And what is the difference between cluster assignments done on the network with edges set to -1 and 1, compared to the edge weights generated by SPIEC-EASI? Which nodes tend to have low robustness? Solution.

Since the FlashWeave network is in a similar format, we can use the same command as with the SPIEC-EASI network. Can manta cluster this network without edge weight conversion?

Step 4 - Running anuran on the SPIEC-EASI and FlashWeave networks

Although both networks qualitatively look rather similar (and have similar clusters), it would be nice to have some idea of just how similar they are. For example, are the most central nodes in these networks identical? We can check such network properties with anuran.

Like before, first check your installation.

anuran -h

While manta runs on one network at a time, anuran accepts (multiple) groups of networks. It will try to import all networks in a folder and run analyses on these. Therefore, move your FlashWeave and SPIEC-EASI networks to a folder and give anuran the name of the folder. We can get some initial figures with the -draw flag. Do you think that the networks are very similar? How big is the overlap? Solution.

anuran -i folder -o demo -draw

With the default plotting, the y-axis is a bit inconvenient because the difference is so much larger than the intersection. We can import the csv file and plot the intersection with ggplot2.

library(ggplot2)
data <- read.csv('demo_sets.csv')
data <- data[data$Set.type == 'Intersection 1',]
data$Network <- factor(data$Network, c('Input', 'Degree', 'Random'))
ggplot(data, aes(x=Network, y=Set.size, colour=Network)) + geom_point() + theme_minimal()

The intersection sizes depend only on the edges that match across networks. However, networks can be similar even if their edges are not. For example, nodes that occupy a central position in one network may also be central in the second network, even if they share no neighbours. We only need to add a single flag to study node centralities with anuran. We will also reduce the number of null models and iterations, so it runs a bit faster.

anuran -i folder -o demo -draw -c -perm 5 -nperm 10
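To see why edge overlap and centrality can disagree, here is a small sketch in base R. The two edge lists are made-up toy data, and the helpers `edge_keys` and `degree` are hypothetical names written for this illustration; none of this is how anuran computes its results.

```r
# Two toy undirected networks as edge lists (hypothetical data).
net_a <- data.frame(from = c("a", "a", "a"), to = c("b", "c", "d"),
                    stringsAsFactors = FALSE)
net_b <- data.frame(from = c("a", "a", "a"), to = c("e", "f", "g"),
                    stringsAsFactors = FALSE)

# Store each undirected edge as a sorted "node1|node2" key,
# so that a-b and b-a count as the same edge.
edge_keys <- function(edges) {
  paste(pmin(edges$from, edges$to), pmax(edges$from, edges$to), sep = "|")
}

# Edges present in both networks.
shared_edges <- intersect(edge_keys(net_a), edge_keys(net_b))

# Degree: the number of edges each node takes part in.
degree <- function(edges) table(c(edges$from, edges$to))

# Node with the highest degree in each network.
top_a <- names(which.max(degree(net_a)))
top_b <- names(which.max(degree(net_b)))

length(shared_edges)
c(top_a, top_b)
```

In this toy case the two networks share no edges at all, yet the same node is the hub in both. An analysis that only counts shared edges would miss that similarity, which is what the centrality comparison is meant to pick up.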

With R, we can import the csv file that was just generated.

centralities <- read.csv('demo_centralities.csv', stringsAsFactors = FALSE)
knitr::kable(head(centralities, 3))
| X | Node | Network | Group | Network.type | Conserved.fraction | Prevalence.of.conserved.fraction | Centrality | Upper.limit | Lower.limit | Values |
|---|------|---------|-------|--------------|--------------------|----------------------------------|------------|-------------|-------------|--------|
| 0 | 28341 | Input | anuran | Input networks | NA | NA | Degree | 1.0000000 | 0.0000000 | [('arctic_flash.graphml', 0.598404255319149), ('arctic_spiec.graphml', 0.38563829787234044)] |
| 1 | 146397 | Input | anuran | Input networks | NA | NA | Degree | 1.0000000 | 0.6221754 | [('arctic_flash.graphml', 0.9335106382978723), ('arctic_spiec.graphml', 0.9867021276595744)] |
| 2 | 209662 | Input | anuran | Input networks | NA | NA | Degree | 0.8764875 | 0.7751082 | [('arctic_flash.graphml', 0.8297872340425532), ('arctic_spiec.graphml', 0.8218085106382979)] |

The table above shows the first three rows of the csv file. For three taxa, it gives the 95% confidence interval of the degree (Upper.limit and Lower.limit). However, we only have two measurements: one centrality value per network. Let’s extract the degree rankings from the rightmost column (Values) and plot them against each other. We can do this with a little string parsing. The snippet below parses the values of a single row; can you loop over the dataframe and generate columns with centralities for FlashWeave and SPIEC-EASI? Keep in mind that taxa may be in one network, but not the other! Solution.

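library(stringr)
# Take the Values string of the first row as an example
value <- centralities$Values[[1]]
# Split on ', ' into: file name, value, file name, value
parts <- strsplit(value, ', ')
# Drop the trailing ')' from the FlashWeave value
flash <- as.numeric(str_sub(parts[[1]][2], end=-2))
# Drop the trailing ')]' from the SPIEC-EASI value
spiec <- as.numeric(str_sub(parts[[1]][4], end=-3))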
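The position-based parsing above breaks when a taxon occurs in only one network. One possible way to handle that, sketched in base R, is to look values up by file name instead. This sketch assumes that the Values column holds a list of ('file.graphml', number) tuples as in the table, that the file names are arctic_flash.graphml and arctic_spiec.graphml, and that file names contain only letters, digits, '_', '.' and '-'. The function `parse_values`, the data frame `example` and its third row are hypothetical.

```r
# Turn one Values string into a named numeric vector (file name -> value).
parse_values <- function(value) {
  # Each network is stored as one "(...)" tuple.
  tuples <- regmatches(value, gregexpr("\\([^()]*\\)", value))[[1]]
  # File name: assumes only letters, digits, '_', '.' and '-' occur in it.
  files <- regmatches(tuples, regexpr("[A-Za-z0-9_.-]+\\.graphml", tuples))
  # Number: everything after the last comma, minus the closing parenthesis.
  numbers <- as.numeric(sub("\\)$", "", sub(".*, *", "", tuples)))
  setNames(numbers, files)
}

# Small stand-in for the centralities data frame; the last row is a
# made-up taxon that is present in the FlashWeave network only.
example <- data.frame(
  Node = c("28341", "146397", "hypothetical_taxon"),
  Values = c(
    "[('arctic_flash.graphml', 0.598404255319149), ('arctic_spiec.graphml', 0.38563829787234044)]",
    "[('arctic_flash.graphml', 0.9335106382978723), ('arctic_spiec.graphml', 0.9867021276595744)]",
    "[('arctic_flash.graphml', 0.25)]"
  ),
  stringsAsFactors = FALSE
)

# Fill one column per network; a missing network stays NA.
example$FlashWeave <- NA_real_
example$SpiecEasi <- NA_real_
for (i in seq_len(nrow(example))) {
  vals <- parse_values(example$Values[[i]])
  example$FlashWeave[i] <- vals["arctic_flash.graphml"]
  example$SpiecEasi[i] <- vals["arctic_spiec.graphml"]
}

example[, c("Node", "FlashWeave", "SpiecEasi")]
```

Replacing `example` with `centralities` should give the FlashWeave and SpiecEasi columns used in the plot further down, provided the assumptions about the Values format hold for your file.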

With the centralities extracted, we can test several things: 1. Is there a strong correlation between high rankings in the FlashWeave and SPIEC-EASI networks? 2. Is this correlation similar for the null models?
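Both questions can be approached with a rank correlation per network type. The sketch below uses base R on made-up numbers; the data frame `toy` is hypothetical, and it assumes that the null-model rows have been parsed into the same FlashWeave and SpiecEasi columns as the input networks.

```r
# Toy centrality rankings for two network types (hypothetical values).
toy <- data.frame(
  Network    = rep(c("Input", "Random"), each = 4),
  FlashWeave = c(0.60, 0.93, 0.83, 0.10, 0.60, 0.93, 0.83, 0.10),
  SpiecEasi  = c(0.39, 0.82, 0.99, 0.20, 0.82, 0.39, 0.20, 0.99),
  stringsAsFactors = FALSE
)

# Spearman correlation within each network type.
# use = "complete.obs" drops taxa that are missing from one of the networks.
rho <- sapply(split(toy, toy$Network), function(d) {
  cor(d$FlashWeave, d$SpiecEasi, method = "spearman", use = "complete.obs")
})

rho
```

Spearman correlation is used here because the values are rankings; comparing the coefficient for the input networks with the one for the null models gives a first impression of whether the agreement is larger than expected by chance.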

Please find the figure for the degree centrality below. Try generating it for the other centralities yourself. Solution.

library(ggplot2)
networks <- centralities[centralities$Network == 'Input',]
networks <- networks[networks$Centrality == 'Degree',]
ggplot(data=networks, aes(x=FlashWeave, y=SpiecEasi)) + geom_point(alpha=0.05) + geom_smooth() + theme_minimal()

Not only is the overlap between associations much larger, the centralities are also rather well correlated. A node that is central in the FlashWeave network tends to be central in the SpiecEasi network. However, nodes with ‘intermediate’ centrality show a lot of variability, and not all nodes that are central in the FlashWeave network are very central in the SpiecEasi network.

One reason for this variability could be that the edges are conserved at higher taxonomic levels: sometimes, one taxon from a specific genus may have a significant association, but in a different network, another taxon from the same genus may get that association. We can look into this by agglomerating abundances by taxonomy.
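As an illustration of what agglomeration means, the following base R sketch sums the counts of taxa that belong to the same genus. The count matrix, the taxon names and the genus labels are invented for this example; this is not necessarily how the genus-level networks in the repository were produced.

```r
# Toy abundance table: taxa in rows, samples in columns (hypothetical data).
counts <- matrix(c(10, 0, 5,
                    0, 8, 2,
                    3, 3, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("otu1", "otu2", "otu3"),
                                 c("s1", "s2", "s3")))

# Genus assignment for each taxon (hypothetical labels).
genus <- c(otu1 = "GenusA", otu2 = "GenusA", otu3 = "GenusB")

# Sum the rows that share a genus; the result has one row per genus.
genus_counts <- rowsum(counts, group = genus[rownames(counts)])

genus_counts
```

After this step, otu1 and otu2 are no longer distinguishable: an association inferred for either of them would show up as an association of their shared genus.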

You don’t need to run FlashWeave and SpiecEasi yourself. Download the networks from the repository and run anuran on them.

anuran -i folder -o demo_genus -draw -c -perm 5 -nperm 10

Now that anuran has been run on the genus-level networks, compare the intersection to the previous intersection. Do you think the variability of the degree is lower? Is the intersection, as a fraction of the network size, larger or smaller? Solution.