探究土壤中的ARGs (抗性基因)与环境的相关关系变化的文章,文章提供了完整的数据和代码,方便大家复现和借鉴。
Soil is an important reservoir of antibiotic resistance genes (ARGs) and understanding how corresponding environmental changes influence their emergence, evolution, and spread is crucial. The soil-dwelling bacterial genus Listeria, including L. monocytogenes, the causative agent of listeriosis, serves as a key model for establishing this understanding. Here, we characterize ARGs in 594 genomes representing 19 Listeria species that we previously isolated from soils in natural environments across the United States. Among the five puta- tively functional ARGs identified, lin, which confers resistance to lincomycin, is the most prevalent, followed by mprF, sul, fosX, and norB. ARGs are pre- dominantly found in Listeria sensu stricto species, with those more closely related to L. monocytogenes tending to harbor more ARGs. Notably, phyloge- netic and recombination analyses provide evidence of recent horizontal gene transfer (HGT) in all five ARGs within and/or across species, likely mediated by transformation rather than conjugation and transduction. In addition, the richness and genetic divergence of ARGs are associated with environmental conditions, particularly soil properties (e.g., aluminum and magnesium) and surrounding land use patterns (e.g., forest coverage). Collectively, our data suggest that recent HGT and environmental selection play a vital role in the acquisition and diversification of bacterial ARGs in natural environments.
Ying-Xian Goh文章提供了全套分析的代码和数据
# Initial imports
import glob
import os
from collections import OrderedDict, defaultdict
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import shapefile
import skbio
import statsmodels.stats.multitest as smt
from mpl_toolkits.basemap import Basemap
from scipy import stats
from scipy.stats import ttest_rel
from statannot import add_stat_annotation
from scipy.stats import mannwhitneyu
from scipy.stats import shapiro
from statsmodels.stats.multitest import multipletests
# Folder paths
out_path = "/projects/leaph/yingxian/AMR_Listeria/Data/Diversity_Analysis/Out/"
blast_sum_path = "/projects/leaph/yingxian/AMR_Listeria/Data/Gene_Detection/Out/"
card_path = "/projects/leaph/yingxian/AMR_Listeria/AMR_Listeria_add/CARD/Data/Out/"
shared = "/projects/leaph/shared/project_data/listeria_US/"
data_path = "/projects/leaph/yingxian/AMR_Listeria/Data/Diversity_Analysis/In/"
# prepare input data
env = pd.read_csv(shared + "environ_all.csv", index_col=0)
genome = pd.read_csv(shared + "Listeria_genomes.csv", index_col=0)
pd.merge(genome, env, how="left", left_on="Sample ID", right_index=True).to_csv(
data_path + "genomes_env.csv"
)
# Both ARG present and functional
df_functional = pd.read_csv(
blast_sum_path + "ARG_functional_prop.csv", header=0, names=["gene", "proportion"]
)
df_functional = df_functional[df_functional["proportion"] != 0]
df_functional["group"] = "Functional"
df_present = pd.read_csv(
blast_sum_path + "ARG_present_prop.csv", header=0, names=["gene", "proportion"]
)
df_present = df_present[df_present["proportion"] != 0]
df_present["group"] = "Truncated"
df_ARG = pd.concat([df_functional, df_present], axis=0)
df_ARG = df_ARG.replace("lmo0919", "lin")
df_ARG = df_ARG.replace("lmo1695", "mprF")
plt.rcParams["figure.figsize"] = (4, 4)
sns.barplot(
data=df_ARG,
x="gene",
y="proportion",
hue="group",
palette=["#D98880", "#A9CCE3"],
edgecolor="black",
)
plt.title("Prevelance of ARGs", size=15)
plt.legend(title="", bbox_to_anchor=(1, 1))
plt.xticks(fontsize=13, rotation=45, ha="right")
plt.xlabel("")
plt.ylabel("Fraction among Listeria genomes", fontsize=13)
# Annotating the bars
for p in plt.gca().patches:
plt.annotate(
format(p.get_height(), '.6f'),
(p.get_x() + p.get_width() / 2., p.get_height()),
ha = 'center', va = 'center',
xytext = (0, 35),
textcoords = 'offset points',
rotation = 90
)
plt.savefig(out_path + "ARG_proportion.pdf", bbox_inches="tight", dpi=300)
plt.show()
plt.rcParams["figure.figsize"] = (8, 4)
genomes_env = pd.read_csv(data_path + "genomes_env.csv", index_col=0)
arg_functional = pd.read_csv(out_path + "ARG_functional_diversity.csv", index_col=0)
spe_functional = pd.merge(
genomes_env[["Species"]],
arg_functional,
how="left",
left_index=True,
right_index=True,
)
spe_functional["group"] = "functional"
spe_functional = spe_functional.sort_values(["richness"], ascending=False)
arg_present = pd.read_csv(out_path + "ARG_present_diversity.csv", index_col=0)
spe_present = pd.merge(
genomes_env[["Species"]], arg_present, how="left", left_index=True, right_index=True
)
spe_present["group"] = "present"
spe_present = spe_present.sort_values(["richness"], ascending=False)
spe_ARG = pd.concat([spe_functional, spe_present], axis=0)
spe_ARG = spe_ARG.groupby("Species").head(len(spe_ARG))
import seaborn as sns
import matplotlib.pyplot as plt
# Bar plot for richness
bar_plot = sns.barplot(
data=spe_ARG,
x="Species",
y="richness",
hue="group",
palette=["#D98880", "#A9CCE3"],
)
# Generate random vertical jitter values
jitter_strength = 0.15 # Adjust this value for more or less jitter
# Create a jittered version of the richness data
spe_ARG['Jittered_richness'] = spe_ARG['richness'] + np.random.uniform(0, jitter_strength, size=len(spe_ARG))
# Overlay data points with stripplot
sns.stripplot(
data=spe_ARG,
x="Species",
y="Jittered_richness",
hue="group",
dodge=True,
palette=["#FFFFFF"] * len(spe_ARG['group'].unique()),
color="#FFFFFF",
alpha=0.6,
jitter=0.3,
marker="o",
linewidth=0.5,
size=2,
)
# Customize labels and title
plt.xlabel("")
plt.ylabel("Richness", size=13)
plt.title("Richness of ARGs by Listeria species", size=15)
plt.xticks(rotation=45, ha="right", size=12)
# Manually create legend for the bar plot only
handles, labels = bar_plot.get_legend_handles_labels() # Get handles and labels from the bar plot
by_label = dict(zip(labels, handles)) # Create a dictionary to remove duplicates
plt.legend(by_label.values(), by_label.keys(), title="", loc='upper right')
# Save and show the plot
plt.savefig(out_path + "ARG_richness_species.pdf", bbox_inches="tight", dpi=300)
plt.show()
# calculate pairwise ANI, then average the distance to all LM genomes, correlation analysis
anv = pd.read_csv(data_path + "ANIb_percentage_identity.tab", index_col=0, sep="\t")
# Turning the lower triangle to zeros and restacking to get the desired matrix
Anv_all = anv.where(np.triu(np.ones(anv.shape)).astype(bool))
Anv_all = Anv_all.stack().reset_index()
Anv_all.columns = ["Row", "Column", "Genetic Similarity"]
# Replacing the Isolate ID with the Species in Group1 and Group2
Species_df = genomes_env.reset_index().iloc[:, [0, 3]].copy()
# print(Species_df.head(10),'\n')
species_name = dict(zip(Species_df["Isolate ID"], Species_df["Species"]))
Anv_all["Species"] = Anv_all["Column"].replace(species_name)
Anv_all["Selected Species"] = Anv_all["Row"].replace(species_name)
Anv_all = Anv_all[["Species", "Selected Species", "Genetic Similarity"]]
# removing all L. monocytogenes from Group1 and leaving only L. monocytogenes in Group2 for the mean
Anv_Lmono = Anv_all[Anv_all["Species"] != "L. monocytogenes"]
Anv_Lmono = Anv_Lmono[Anv_Lmono["Selected Species"] == "L. monocytogenes"]
average_values = pd.DataFrame(
Anv_Lmono.groupby(["Species", "Selected Species"])["Genetic Similarity"].mean()
).reset_index()
arg_div = pd.read_csv(out_path + "ARG_functional_diversity.csv", index_col=0)
arg_div_mod = arg_div.reset_index()
arg_div_mod["Species"] = arg_div_mod["id"].replace(species_name)
arg_div_mod = arg_div_mod[["Species", "shannon", "richness"]]
arg_div_mean = (
arg_div_mod.groupby("Species")
.agg({"shannon": "mean", "richness": "mean"})
.reset_index()
)
arg_div_mean = arg_div_mean[arg_div_mean["Species"] != "L. monocytogenes"]
# Combining the Mean Correlation to arg_div mean Dataframe
arg_div_mean = pd.merge(
arg_div_mean, average_values[["Species", "Genetic Similarity"]], on="Species"
)
# List of species for sensu stricto classification
sensu_stricto_species = [
'L. monocytogenes', 'L. seeligeri', 'L. marthii', 'L. ivanovii',
'L. welshimeri', 'L. innocua', 'L. cossartiae', 'L. farberi',
'L. immobilis', 'L. swaminathanii'
]
# Function to classify species
def classify_species(species):
if species in sensu_stricto_species:
return 'sensu stricto'
else:
return 'sensu lato'
# Add 'Listeria species' column based on 'Species'
arg_div_mean['Listeria species'] = arg_div_mean['Species'].apply(classify_species)
# Define colors for sensu stricto and sensu lato
color_palette = {
'sensu stricto': '#5F9EA0',
'sensu lato': '#FF7F50'
}
# Richness
# Scatter plot with different colors by Listeria species
sns.scatterplot(
x="Genetic Similarity",
y="richness",
hue="Listeria species",
palette=color_palette,
data=arg_div_mean,
s=180,
edgecolor="black",
alpha=0.5
)
plt.suptitle("", size=15, fontweight="bold", style="italic")
# Get current handles and labels for legend
handles, labels = plt.gca().get_legend_handles_labels()
# Rearrange legend handles and labels to put 'sensu stricto' first
order = [1, 0] # Index order for legend items (sensu stricto, sensu lato)
plt.legend([handles[idx] for idx in order], [labels[idx] for idx in order], title="Listeria species", bbox_to_anchor=(1, 1))
# Adjust xlim and ylim
plt.xlim(0.72, 0.91)
plt.ylim(-0.25, 5.5)
# Add a regression line for all points
sns.regplot(
x=arg_div_mean["Genetic Similarity"],
y=arg_div_mean["richness"],
scatter=False,
color="black",
line_kws={"linewidth": 1.5}, # Optional: adjust line width
)
# Spearman correlation
r, p = stats.spearmanr(
list(arg_div_mean["richness"]), list(arg_div_mean["Genetic Similarity"])
)
print(r, p)
g = plt.gca()
g.text(
0.03,
0.9,
"Spearman ρ={:.2f}, $P$={:.2g}".format(r, p),
transform=g.transAxes,
fontsize=13,
color="black",
)
plt.gcf().set_size_inches(4.5, 4)
plt.xlabel("Genetic similarity to L. monocytogenes", size=13)
plt.ylabel("Average richness", size=13)
plt.title("Functional ARGs", size=15)
plt.savefig(
out_path + "ARG_functional_richness_phylodistance_toLM.pdf",
bbox_inches="tight",
dpi=300,
)
plt.show()
arg_div = pd.read_csv(out_path + "ARG_present_diversity.csv", index_col=0)
arg_div_mod = arg_div.reset_index()
arg_div_mod["Species"] = arg_div_mod["id"].replace(species_name)
arg_div_mod = arg_div_mod[["Species", "shannon", "richness"]]
arg_div_mean = (
arg_div_mod.groupby("Species")
.agg({"shannon": "mean", "richness": "mean"})
.reset_index()
)
arg_div_mean = arg_div_mean[arg_div_mean["Species"] != "L. monocytogenes"]
# Combining the Mean Correlation to arg_div mean Dataframe
arg_div_mean = pd.merge(
arg_div_mean, average_values[["Species", "Genetic Similarity"]], on="Species"
)
import matplotlib.patches as mpatches
arg_div_mean['Listeria species'] = arg_div_mean['Species'].apply(classify_species)
# Define colors for sensu stricto and sensu lato
color_palette = {
'sensu stricto': '#5F9EA0',
'sensu lato': '#FF7F50'
}
# Richness
# Scatter plot with different colors by Listeria species
sns.scatterplot(
x="Genetic Similarity",
y="richness",
hue="Listeria species",
palette=color_palette,
data=arg_div_mean,
s=180,
edgecolor="black",
alpha=0.65
)
plt.suptitle("", size=15, fontweight="bold", style="italic")
# Get current handles and labels for legend
handles, labels = plt.gca().get_legend_handles_labels()
# Create a list of custom legend handles with edge color
custom_handles = [mpatches.Circle((0, 0), radius=5, facecolor=color_palette[label], edgecolor='#000000') for label in labels]
# Rearrange legend handles and labels to put 'sensu stricto' first
order = [1, 0] # Index order for legend items (sensu stricto, sensu lato)
plt.legend([handles[idx] for idx in order], [labels[idx] for idx in order], title="Listeria species", bbox_to_anchor=(1, 1))
# Adjust xlim and ylim
plt.xlim(0.72, 0.91)
plt.ylim(1.8, 5.5)
# Add a regression line for all points
sns.regplot(
x=arg_div_mean["Genetic Similarity"],
y=arg_div_mean["richness"],
scatter=False,
color="black",
line_kws={"linewidth": 1.5}, # Optional: adjust line width
)
# Spearman correlation
r, p = stats.spearmanr(
list(arg_div_mean["richness"]), list(arg_div_mean["Genetic Similarity"])
)
print(r, p)
g = plt.gca()
g.text(
0.03,
0.9,
"Spearman ρ={:.2f}, $P$={:.2g}".format(r, p),
transform=g.transAxes,
fontsize=13,
color="black",
)
plt.gcf().set_size_inches(4.5, 4)
plt.xlabel("Genetic similarity to L. monocytogenes", size=13)
plt.ylabel("Average richness", size=13)
plt.title("Present ARGs", size=15)
plt.savefig(
out_path + "ARG_present_richness_phylodistance_toLM.pdf",
bbox_inches="tight",
dpi=300,
)
plt.show()
from matplotlib.lines import Line2D
def gen_map(df, field, species_colors=None, min_marker_size=1):
lats = df["Latitude"].tolist()
lons = df["Longitude"].tolist()
pos = df[field].tolist()
species = df["Species"].tolist()
zoom_scale = 3
bbox = [
np.min(lats) - zoom_scale,
np.max(lats) + zoom_scale,
np.min(lons) - zoom_scale,
np.max(lons) + zoom_scale,
]
fig, (ax, bx) = plt.subplots(
2, 1, figsize=(18, 7), gridspec_kw={"height_ratios": [10, 3]}
)
m = Basemap(
projection="merc",
llcrnrlat=bbox[0],
urcrnrlat=bbox[1],
llcrnrlon=bbox[2],
urcrnrlon=bbox[3],
lat_ts=10,
resolution="i",
ax=ax,
)
m.drawcountries(linewidth=0.5, color="black")
m.fillcontinents(color="#F8F9F9", lake_color="lightblue")
m.drawparallels(np.arange(25, 55, 5), color="lightgray", labels=[1, 0, 0, 0])
m.drawmeridians(
np.arange(-125, -65, 5), color="lightgray", labels=[0, 0, 0, 1], rotation=45
)
m.drawmeridians(
np.arange(-95, -92, 5),
color="#21618C",
labels=[0, 0, 1, 0],
textcolor="#21618C",
fontsize=13,
linewidth=2,
)
m.drawmapboundary(fill_color="lightblue")
handles = []
labels = []
sizes = []
# Define the order for legend based on species
legend_order = [
'L. welshimeri', 'L. immobilis', 'L. seeligeri', 'L. ivanovii',
'L. cossartiae', 'L. swaminathanii', 'L. monocytogenes', 'L. marthii',
'L. innocua', 'L. farberi', 'L. fleischmannii', 'L. grandensis', 'L. grayi',
'L. rocourtiae', 'L. aquatica', 'L. weihenstephanensis', 'L. portnoyi',
'L. newyorkensis', 'L. booriae',
]
for lon, lat, po, sp in zip(lons, lats, pos, species):
sizes.append((round(po, 1)))
x, y = m(lon, lat)
msize = po**min_marker_size + 5
if msize > 5:
if species_colors is not None and sp in species_colors:
marker = m.plot(
x,
y,
"o",
markersize=msize,
color=species_colors[sp],
markeredgecolor="#212121",
alpha=0.5,
)
else:
marker = m.plot(
x,
y,
"o",
markersize=msize,
color="#A9CCE3",
markeredgecolor="#212121",
alpha=0.5,
)
else:
if species_colors is not None and sp in species_colors:
marker = m.plot(x, y, "x", markersize=10, color=species_colors[sp], alpha=0.5)
else:
marker = m.plot(x, y, "x", markersize=10, color="grey", alpha=0.65)
if marker:
handles.append(marker[0])
labels.append(sp)
# Rearrange legend handles and labels based on order
handles = [handles[labels.index(species)] for species in legend_order]
labels = legend_order
legend_elements = [
Line2D(
[0],
[0],
marker="o",
markersize=12,
color="w",
markerfacecolor=species_colors[label],
markeredgecolor="#212121",
label=label,
)
for label in labels
]
legend1 = ax.legend(
handles=legend_elements,
loc="upper left",
bbox_to_anchor=(1.02, 0.9),
ncol=2,
title="Species Color",
)
ax.add_artist(legend1)
sizes = sorted(list(set(sizes)))
markers = [
Line2D(
[0],
[0],
marker="o",
markersize=size * 2,
color="w",
markeredgecolor="black",
label=str(size),
)
for size in sizes
]
legend2 = bx.legend(
handles=markers,
loc="right",
bbox_to_anchor=(0.9, 1.8),
title="Marker Size",
ncol=2,
)
bx.add_artist(legend2)
bx.axis("off")
plt.tight_layout()
return m
import colorcet as cc
import pandas as pd
import seaborn as sns
genomes_env = pd.read_csv(data_path + "genomes_env.csv", index_col=0)
sensu_stricto_species = [
'L. welshimeri', 'L. immobilis', 'L. seeligeri', 'L. ivanovii',
'L. cossartiae', 'L. swaminathanii', 'L. monocytogenes', 'L. marthii',
'L. innocua', 'L. farberi'
]
sensu_lato_species = [
'L. fleischmannii', 'L. grandensis', 'L. grayi', 'L. rocourtiae', 'L. aquatica',
'L. weihenstephanensis', 'L. portnoyi', 'L. newyorkensis', 'L. booriae'
]
# Get the unique species names
unique_species = genomes_env['Species'].unique()
# Define color palettes
color_1 = [
'#9370DB', '#6A5ACD', '#0000FF', '#4169E1',
'#87CEEB', '#00FFFF', '#008080', '#008000',
'#32CD32', '#7FFF00'
]
color_2 = ['#FFFF00', '#FFD700', '#FF8C00', '#FFA500', '#FF4500',
'#E34234', '#FF2400', '#DC143C', '#E81416']
# Create a dictionary mapping species to colors based on sensu stricto criteria
species_colors = {}
for species in unique_species:
if species in sensu_stricto_species:
species_colors[species] = color_1[sensu_stricto_species.index(species)]
elif species in sensu_lato_species:
species_colors[species] = color_2[sensu_lato_species.index(species)]
else:
species_colors[species] = '#000000' # Default color for unspecified species
# functional ARGs
arg_div = pd.read_csv(out_path + "ARG_functional_diversity.csv", index_col=0)
div_loc = pd.merge(
genomes_env[["Latitude", "Longitude", "Species"]],
arg_div,
how="left",
left_index=True,
right_index=True,
)
gen_map(div_loc, "richness", species_colors=species_colors, min_marker_size=1.8)
plt.suptitle("Richness of functional ARGs in Listeria genomes ", size=16)
plt.subplots_adjust(top=0.9)
plt.savefig(out_path + "functional_ARG_richness_map_light.pdf", dpi=500)
plt.show()
# present ARGs
arg_div = pd.read_csv(out_path + "ARG_present_diversity.csv", index_col=0)
div_loc = pd.merge(
genomes_env[["Latitude", "Longitude", "Species"]],
arg_div,
how="left",
left_index=True,
right_index=True,
)
gen_map(div_loc, "richness", species_colors=species_colors, min_marker_size=1.8)
plt.suptitle("Richness of present ARGs in Listeria genomes ", size=16)
plt.subplots_adjust(top=0.9)
plt.savefig(out_path + "present_ARG_richness_map.pdf", dpi=500)
plt.show()
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。