作者:需要鼓励的小昱
审核:Listenlii
library(phyloseq)
packageVersion("phyloseq")
## [1] '1.32.0'
data("GlobalPatterns")# phyloseq 自带文件
GlobalPatterns
write.table(otu_table(GlobalPatterns), "GlobalPatterns_otu_table.txt", sep="\t", row.names=TRUE, col.names=NA, quote=FALSE)
在服务器运行这部分代码
# 把文件放到工作文件夹中
echo -n "#OTU Table" | cat - GlobalPatterns_otu_table.txt > GlobalPatterns_otu_biom_table.txt
biom convert -i GlobalPatterns_otu_biom_table.txt -o GlobalPatterns_otu.biom --table-type="OTU table" --to-hdf5
Phyloseq 提供的数据提取不出来序列文件。只能用自己的了
library("phyloseq")
library("dplyr")
library("biomformat")
library("Biostrings")
load(file = "ps_rs_rsingle_CK_AT.rda")#载入一个phyloseq文件
whole_seq_fasta <- readDNAStringSet("../raw_data/RDP-Bacteria-sequences.fasta")# 代表性序列文件
whole_sequences_fasta_ASV <- names(whole_seq_fasta)
whole_sequences_fasta_sequences <- paste(whole_seq_fasta)
whole_sequences_fasta_df <- data.frame(whole_sequences_fasta_ASV,
whole_sequences_fasta_sequences)
从整个数据的fastq文件,提取出来我们想要的丰富种序列。
AT_tax_picrust <- taxa_names(ps_rs_CK_AT)
AT_tax_seq_fasta_df <- whole_sequences_fasta_df[with
(whole_sequences_fasta_df,whole_sequences_fasta_ASV %in% AT_tax_picrust),]
AT_tax_seq_fasta_df_names <- rename(AT_tax_seq_fasta_df,
c( "whole_sequences_fasta_ASV" = "abundance",
"whole_sequences_fasta_sequences" = "sequence")
)
uniquesToFasta(AT_tax_seq_fasta_df_names ,
fout = "../rsingle_data/R_output/PICRUST2_data/AT_tax_rep_seqs.fna", #路径
ids = AT_tax_seq_fasta_df_names$abundance)
最后就得到了PICRUST2需要的OTU表和序列文件了!
PS:
一个问题,如果想分别分析稀有种和优势种得功能,是整个跑Picrust之后再挑出不同类型的物种,还是先挑出不同类型物种再做分析?
github上有人回答建议整个跑出来之后再挑不同类型物种的结果查看。
It would be best to generate the stratified output (mentioned in the tutorial), which you can then analyze based on the different sets of ASVs you're interested in. You might also just want to look at the output of hsp.py specifically, as it sounds like you just want to compare the predicted genome annotations for rare and abundant ASVs. https://github.com/picrust/picrust2/issues/258