Solutions

Try figuring out the solutions yourself before looking them up here. All documentation required to answer the questions are located in the hyperlinks in the tutorial.

Power law and scale-free networks

The fit_power_law functions fits a power law to the degree distribution of the network. The values for the fit are compared to expected values with the Kolmogorov-Smirnov test. The null hypothesis for this test is that the degree distribution is drawn from a reference distribution. In this case, that reference distribution is generated from a power law.

The null hypothesis can only be rejected if the p-value of the test is below 0.05. Here, the p-value is 0.97. Therefore, we cannot conclude that the degree distribution is drawn from a different distribution than the power-law distribution.

Scale-free networks are networks with a degree distribution that follows a power law. Our result indicates that the network may be scale-free and contains nodes with a degree far larger than the average degree. While there is not that much known about the effect of scale-freeness on microbial networks, studies on the internet (for an example, see Cohen and colleagues, 2001) indicate that scale-freeness decreases the network’s sensitivity to random attacks. However, we still do not know to what extent biological networks follow a power law as we have few true biological networks. Lima-Mendez and van Helden (2009) discuss some of the weaknesses of this theory.

Betweenness centrality

The code snippet below shows how to calculate and plot betweenness centrality. If a node has low degree but connects two (or more) subcomponents of a network, it can have low degree but high betweenness centrality. In this case, we have one large central component and no clear subclusters. Therefore, the nodes that have higher degree tend to have higher betweenness centrality as well.

spiec.bet <- betweenness(spiec.graph)
hist(spiec.bet)

plot_network(spiec.graph, phyloseqobj.f, type='taxa', color="Rank3", label=NULL) + geom_point(aes(size=spiec.bet), colour='deepskyblue4', shape=1)

centralities <- data.frame(degree=spiec.deg, betweenness=spiec.bet)
ggplot(data=centralities, aes(x=degree, y=betweenness)) + geom_point() + geom_smooth(method='auto')

We can generate a toy network with one node that has high betweenness centrality, but only a degree of 2.

toy <- graph(edges=c(1, 2, 2, 3, 1, 3, 3, 4, 4, 5, 5, 6, 7, 6, 5, 7), directed=FALSE)
plot(toy)

degree(toy)

## [1] 2 2 3 2 3 2 2

betweenness(toy)

## [1] 0 0 8 9 8 0 0

Closeness centrality

We can remove small values caused by disconnected components to better visualize differences in the network. The network now shows that nodes at the periphery of the network have the lowest closeness centrality. There is one in particular that has an extremely low value; nearly all shortest paths from this node need to traverse 3 extra nodes to reach the central part of the network.

spiec.close <- closeness(spiec.graph)
spiec.close[spiec.close < 0.0001] = NA
hist(spiec.close, nint=35)

centralities$closeness <- spiec.close
centralities$closeness[spiec.close < 0.0001] = NA
ggplot(data=centralities, aes(x=degree, y=closeness)) + geom_point() + geom_smooth(method='auto')

plot_network(spiec.graph, phyloseqobj.f, type='taxa', color="Rank3", label=NULL) + geom_point(aes(size=spiec.close), colour='deepskyblue4', shape=1)

Assortativity

The code snippets below calculate degree assortativity, edge weight assortativity and phylogenetic assortativity. The degree assortativity is close to zero. This indicates that nodes with a high degree are slightly more likely to connect to other high-degree nodes. For edge weight assortativity, nodes with a certain number of positive edges are slightly more likely to connect to nodes with a similar number of edges. This is not the case for negative edges, although this value is extremely small. The phylogenetic assortativity is also interesting: at the class level, nodes are apparently a bit assortative.

# degree assortativity
assortativity.degree(spiec.graph)

## [1] 0.009684691

# edge weight assortativity
negedges = rep(0, length(V(spiec.graph)))
posedges = rep(0, length(V(spiec.graph)))
connecting_edges <- incident_edges(spiec.graph, V(spiec.graph), mode="all")
i = 1
for (node in connecting_edges){
  if (length(node) != 0){
    for (edge in node){
      weight <- get.edge.attribute(spiec.graph, 'edge_weight')[edge]
      if (weight > 0){
        posedges[i] = posedges[i] + 1
      }
      else if (weight < 0){
        negedges[i] = negedges[i] + 1
      }
    }
  }
  i = i + 1
}
assortativity(spiec.graph, posedges)

## [1] 0.04549251

assortativity(spiec.graph, negedges)

## [1] 0.07412277

# taxonomic assortativity
levels <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
for (k in 2:7){
  tax <- tax_table(phyloseqobj.f)[,k]
  # need to convert tax to vertex types; nominal
  tax_dict <- data.frame(unique(tax), seq(1:length(unique(tax))))
  tax_nom <- c()
  for (i in 1:length(tax)){
    tax_nom[[i]] <- tax_dict[tax_dict[,1] == tax[[i]], 2]
  }
  assort <- assortativity_nominal(spiec.graph, tax_nom)
  print(paste(levels[k], assort))
}

## [1] "Phylum 0.0939315886174431"
## [1] "Class 0.110853926655527"
## [1] "Order 0.0897543764418511"
## [1] "Family 0.104111573613561"
## [1] "Genus 0.0387637891273372"
## [1] "Species -0.0136307311028441"

Permutations

The reason why we are not using a t-test is because the distribution of values for some centralities may be rather different for permuted networks. If some rewired edges create a node with exceptionally high degree, the betweenness centrality values for all other nodes will drop. The opposite can happen if a node with high degree is permuted. Changes in the distribution should not affect the position of a node in the centrality ranking too much, so going with the top # can be more robust.

The code snippet below checks for each top # whether nodes that were in the original top # are still present in the permuted top #. The ratio of permutations where the nodes kept their positions is the robustness score. Therefore, a score of 1 indicates that nodes are very robust to errors. As you can see in the final figure, the degree robustness is stable at 1. However, all other centralities do not consistently reach 1; nodes shift in ranking more easily, which explains why the top 1 closeness node has lower robustness than most of the top 10 closeness nodes. At the same time, the graph also indicates that some high-ranking nodes have low robustness and may be dependent on one or two specific edges. For example, the eigenvector centrality is quite robust for some nodes, but highly variable for two others.

permresults = list()
for (i in 1:100){
  perm.deg <- degree(permlist[[i]])
  perm.bet <- betweenness(permlist[[i]])
  perm.close <- closeness(permlist[[i]])
  perm.eigen <- eigen_centrality(permlist[[i]])
  perms <- data.frame(degree=perm.deg, betweenness=perm.bet, closeness=perm.close, eigenvector=perm.eigen$vector)
  permresults[[i]] <- perms
}

nums <- c(1, 5, 10, 19)
results <- list()
for (m in 1:4){
  num = nums[[m]]
  data <- data.frame(matrix(0, ncol=4, nrow=4*num))
  colnames(data) <- c('ID', 'centrality', 'robustness', 'top')
  data$top <- num
  topdegree <- rownames(centralities[order(centralities$degree, decreasing=TRUE),])[1:num]
  topbetweenness <- rownames(centralities[order(centralities$betweenness, decreasing=TRUE),])[1:num]
  topcloseness <- rownames(centralities[order(centralities$closeness, decreasing=TRUE),])[1:num]
  topeigen <- rownames(centralities[order(centralities$eigen, decreasing=TRUE),])[1:num]
  data$ID <- c(topdegree, topbetweenness, topcloseness, topeigen)
  data$centrality <- c(rep("degree", num), rep("betweenness", num), rep("closeness", num), rep("eigen", num))
  for (i in 1:100){
    permdegree <- rownames(permresults[[i]][order(permresults[[i]]$degree, decreasing=TRUE),])[1:num]
    permbetweenness <- rownames(permresults[[i]][order(permresults[[i]]$betweenness, decreasing=TRUE),])[1:num]
    permcloseness <- rownames(permresults[[i]][order(permresults[[i]]$closeness, decreasing=TRUE),])[1:num]
    permeigen <- rownames(permresults[[i]][order(permresults[[i]]$eigen, decreasing=TRUE),])[1:num]
    hits <- c(topdegree %in% permdegree, topbetweenness %in% permbetweenness, topcloseness %in% permcloseness, topeigen %in% permeigen)
    data[hits,3] <- data[hits,3] + 1
  }
  data$robustness <- data$robustness / 100
  results[[m]] <- data
}

data <- rbind(results[[1]], results[[2]], results[[3]], results[[4]])

ggplot(data=data, aes(x=data$top, y=data$robustness, color=data$centrality)) + geom_smooth() + geom_point() + 
  labs(x='Number of top nodes', y='Robustness', color='Centrality measure') + scale_y_continuous(limits=c(0,1))

Node importance - answers

Lisa Rottjers

27 november 2018