
Y叔 2018-06-04

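Files produced by MetaPhlAn can now be parsed and visualized with the microbiomeViz package (https://github.com/lch14forever/microbiomeViz). The package is built on ggtree and produces figures similar to those of GraPhlAn.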

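The example below is based on a phyloseq object, a class that ggtree supports natively. A phyloseq object holds metagenomic data that can be used directly for tree visualization, and with ggtree you can produce more polished trees than phyloseq's own plotting offers, annotated with more of the associated data.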

ggtree can parse the output of many software packages, and the evolutionary evidence inferred by that software can be used directly for tree annotation. ggtree is not only infrastructure that brings evolutionary data inferred by commonly used software into R, but also a general tree visualization and annotation tool for the R community, since it supports many S3/S4 objects defined by other R packages.

phyloseq for microbiome data

The phyloseq class, defined in the phyloseq package, was designed for microbiome data. The package implements a plot_tree function using ggplot2. Although the result is a ggplot object that can be customized with theme, scale_color_manual, etc., the most valuable feature of ggplot2, adding layers, is effectively missing. plot_tree provides only a limited set of parameters to control the output, and it is hard to add layers unless the user has expertise in both phyloseq and ggplot2.

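library(phyloseq)
library(ggplot2)   # needed for ggtitle()

# Example dataset shipped with phyloseq
data(GlobalPatterns)
# Drop taxa with zero total abundance across all samples
GP <- prune_taxa(taxa_sums(GlobalPatterns) > 0, GlobalPatterns)
# Keep only taxa from the phylum Chlamydiae
GP.chl <- subset_taxa(GP, Phylum=="Chlamydiae")

# phyloseq's built-in tree plot
plot_tree(GP.chl, color="SampleType", shape="Family", label.tips="Genus", size="Abundance") + ggtitle("tree annotation using phyloseq")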

PS: If we look at the plot carefully, we find that the legend produced by plot_tree is not correct: plot_tree maps SampleType to color and the legend displays this as colored text, yet no such mapping can be found in the plot.

ggtree supports phyloseq objects

One of the advantages of R is its community: R users develop packages that work together and complement each other. ggtree fits into the R ecosystem for phylogenetic analysis. It supports several classes defined in other R packages that are designed to store phylogenetic trees with associated data, including phyloseq.

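library(scales)   # log_trans()
library(ggplot2)  # theme(), ggtitle()
library(ggtree)
# ggtree() accepts the phyloseq object directly
p <- ggtree(GP.chl, ladderize = FALSE) +
   # label internal nodes
   geom_text2(aes(subset=!isTip, label=label), hjust=-.2, size=4) +
   # label tips with the Genus rank
   geom_tiplab(aes(label=Genus), hjust=-.3) +
   # points offset by hjust, colored by SampleType, shaped by Family, sized by Abundance
   geom_point(aes(x=x+hjust, color=SampleType, shape=Family, size=Abundance),na.rm=TRUE) +
   scale_size_continuous(trans=log_trans(5)) +
   theme(legend.position="right") + ggtitle("reproduce phyloseq by ggtree")
print(p)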
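The size scale in the ggtree call passes trans=log_trans(5) to scale_size_continuous(), so point sizes follow a base-5 logarithm of Abundance rather than raw counts. A minimal check using only the scales package, with made-up abundance values chosen as powers of 5, shows what that transformation does:

```r
# Requires only the 'scales' package.
library(scales)

# log_trans(5) builds a transformation object whose $transform is
# log(x, base = 5) and whose $inverse is 5^x.
tr <- log_trans(5)

# Hypothetical abundance values, not taken from GlobalPatterns.
abundance <- c(1, 5, 25, 125)

# On the transformed scale these values are evenly spaced: 0, 1, 2, 3.
print(tr$transform(abundance))

# The inverse maps transformed values back to the original counts.
print(tr$inverse(tr$transform(abundance)))
```

Compressing the range this way keeps both rare and abundant taxa visible. Changing the base only rescales the transformed values by a constant factor, so the relative spacing of point sizes stays the same.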

With ggtree, combining different layers with grammar-of-graphics syntax is more flexible, and also more powerful, since layers can be added without being limited to those predefined in plot_tree. As an example, I extract the barcode sequences from the tree object and use msaplot to visualize them alongside the tree.

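# Convert the phyloseq object into ggtree's tree data frame
df <- fortify(GP.chl)
# Barcode sequences, named by node label
barcode <- as.character(df$Barcode_full_length)
names(barcode) <- df$label
# Keep only entries that actually have a barcode
barcode <- barcode[!is.na(barcode)]
# Draw the sequences as an alignment panel next to the tree p
msaplot(p, Biostrings::BStringSet(barcode), width=.3, offset=.05)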
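The barcode extraction relies on ordinary vector operations on the tree data frame. The base-R sketch below runs the same three steps on a toy stand-in for fortify(GP.chl). The labels and sequences are made up, and it assumes (as the na.rm=TRUE in the point layer suggests) that internal nodes carry NA in the sample-data columns.

```r
# Toy stand-in for fortify(GP.chl): a 'label' column plus a
# 'Barcode_full_length' column that is NA for internal nodes.
# All values here are hypothetical.
df <- data.frame(
  label = c("tipA", "tipB", "tipC", "node1", "node2"),
  Barcode_full_length = c("ACGTAC", "ACGTTC", "ACCTAC", NA, NA),
  stringsAsFactors = FALSE
)

# Coerce to character and name each sequence by its node label.
barcode <- as.character(df$Barcode_full_length)
names(barcode) <- df$label

# Drop nodes without a sequence, leaving a named character vector.
barcode <- barcode[!is.na(barcode)]
print(barcode)
```

The names are what ties each sequence to a tip: msaplot places sequences next to the tips whose labels match those names.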
