查看原文
其他

R总给我惊喜

Y叔叔 biobabble 2020-02-05

Hello!


First of all, I would like to thank you for this wonderful and very powerful package! I have tried to plot a phylogenetic tree with heatmap of associated matrix (with gheatmap).


I found that the row names of matrix doesn't exactly match to the tip names of tree in case if there is missing data in associated matrix.


In other words, if we have two species (e.g., Species198 and Species1981), but only one of them is represented in the associated matrix, we will have colored cells for both species in the heatmap.


Here is a reproducible example:

library(ggtree)
set.seed(111)
# Prepare a species list
tipss <- c("Fomes", "Rozella", "Saitoella", "Entorrhiza", "Cryptococcus", "Tremella", "Puccinia", "Amoeboaphelidium")
tipss <- c(tipss, "Species198", "Species1981")
# Generate random tree
trr <- ape::rtree(n = length(tipss), rooted = TRUE, tip.label = tipss)
# Generate associated data matrix for each species
abunds <- matrix(data = sample(1:100, size = 4*length(tipss)), nrow = length(tipss))
rownames(abunds) <- tipss
colnames(abunds) <- paste("Samp", 1:ncol(abunds), sep="")
# Remove data for some species
abunds <- abunds[-which(rownames(abunds) == "Species198"), ]     # !! no information for this species
abunds <- abunds[-which(rownames(abunds) == "Cryptococcus"), ]
tt <- ggtree(trr) + geom_tiplab()
gheatmap(tt, abunds)

Here is the resulting picture:

You may see that heatmap cells are blank for Cryptococcus (as expected).


However, cells corresponding to Species198 are colored with Species1981 data (marked with red arrow).


With best regards,
Vladimir

有人在GitHub上对我报了这个bug,说我的gheatmap函数不是精确匹配,这不能够吧?!但可重复性的代码已经说明了一切,我翻一下代码,最终发现是R的问题,R再一次给了我们惊喜,比如下面演示的:

> dd
                Samp1 Samp2 Samp3 Samp4
Fomes               47    58     3    13
Rozella            100    83    27    18
Saitoella           36    51    78    63
Entorrhiza          70    32    43    16
Tremella            75    73    35    20
Puccinia            61    53     8    72
Amoeboaphelidium    95    66    92    19
Species1981         30    48    34    41
> dd['Species198',]
           Samp1 Samp2 Samp3 Samp4
Species1981    30    48    34    41

这就是问题之所在,当然这个bug我已经修正了,现在再使用gheatmap将不会有这个问题,但我们在使用R的时候,还是要小心哦,以后按名字取子集的时候,还是不要用[的好,R总是能给人惊喜,比如《你的数据被化了妆?》,以及集中吐槽的《R的诡异事件》,都值得重新温习一遍.


往期精彩

    您可能也对以下帖子感兴趣

    文章有问题?点此查看未经处理的缓存