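I have a CSV file containing groups of data, and I use the groupby() method to separate them. Each group is processed with some simple math, including min() and max() on a few columns, plus some subtraction and multiplication to create a new column of data. I then plot each group. This mostly works, but I have two complaints about my code: the plots are separate rather than combined as I would like, and I get a "SettingWithCopyWarning" for every group. From my searching, I believe the solution is either to use .loc or a better split-apply (and possibly combine) approach. I could do this in Excel, but I am trying to learn Python, and now that my code works I would like to improve it.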
import os.path
import sys
import pandas as pd
filename = "data/cal_data.csv"
df = pd.read_csv(filename, header=0)  # one line of headers
# Combine several columns into a single column; this makes grouping straightforward
# and simplifies the graph titles. Not strictly necessary.
df['Test'] = "Model " + df['Model No'] + ", SN " + df['Serial No'].values.astype(str) + ", Test time " + df['Test time'].values.astype(str)
df = df[df.index <= df.groupby('Test')['Test Point'].transform('idxmax')]  # drop rows after each max test point
for title, group in df.groupby('Test'):
    x1, x2 = min(group["Test Reading"]), max(group["Test Reading"])
    x4, x3 = max(group["Test Point"]), min(group["Test Point"])  # min is usually zero
    R = (x2 - x1) / (x4 - x3)  # linearize
    group['Test Point Error'] = 100 * (group['Test Reading'] - (group['Test Point'] * R + x1))
    ax = group.plot(x='Test Point', y='Test Point Error', title=title, grid=True)
    ax.set_ylabel("% error (+/-" + str(Error_Limit) + "% limit)")
Output error:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Edit: added the output of df.head(20), plus images of a few of the plots:
    Test Point  Test Reading  Test
 0           0       0.10453  Model LC-500, SN 937618, Test time 17:20:10
 1          20       0.17271  Model LC-500, SN 937618, Test time 17:20:10
 2          50       0.27838  Model LC-500, SN 937618, Test time 17:20:10
 3         100       0.45596  Model LC-500, SN 937618, Test time 17:20:10
 4         150       0.63435  Model LC-500, SN 937618, Test time 17:20:10
 5         200       0.81323  Model LC-500, SN 937618, Test time 17:20:10
 6         250       0.99252  Model LC-500, SN 937618, Test time 17:20:10
 7         300       1.17222  Model LC-500, SN 937618, Test time 17:20:10
 8         350       1.35219  Model LC-500, SN 937618, Test time 17:20:10
 9         400       1.53260  Model LC-500, SN 937618, Test time 17:20:10
10         450       1.71312  Model LC-500, SN 937618, Test time 17:20:10
11         500       1.89382  Model LC-500, SN 937618, Test time 17:20:10
14           0       0.10468  Model LC-500, SN 937618, Test time 17:31:46
15          20       0.17284  Model LC-500, SN 937618, Test time 17:31:46
16          50       0.27856  Model LC-500, SN 937618, Test time 17:31:46
17         100       0.45609  Model LC-500, SN 937618, Test time 17:31:46
18         150       0.63457  Model LC-500, SN 937618, Test time 17:31:46
19         200       0.81341  Model LC-500, SN 937618, Test time 17:31:46
20         250       0.99277  Model LC-500, SN 937618, Test time 17:31:46
21         300       1.17237  Model LC-500, SN 937618, Test time 17:31:46
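Edit/update, July 23, 2020: I made a few workarounds to get this working, but I would still appreciate any help. Below is the revised for-loop code, which writes each group to a new CSV file to be read later (so that I can add the new column created here), and first deletes the temporary file if it already exists: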
if os.path.exists("data/temp.csv"):
    os.remove("data/temp.csv")
for title, group in df.groupby('Test'):
    x1 = min(group["Test Reading"].head(1))
    x2 = max(group["Test Reading"].tail(1))
    x3 = min(group["Test Point"].head(1))
    x4 = max(group["Test Point"].tail(1))
    R = (x2 - x1) / (x4 - x3)  # linearization scalar
    group['Test Point Error'] = 100 * (group['Test Reading'] - (group['Test Point'] * R + x1)) / (x2 - x1)
    file = open('data/temp.csv', 'a')
    group.to_csv('data/temp.csv', mode="a", index=False, columns=columns, header=False)#, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.'))
    file.close()
Then, reading the temporary CSV back in, I use seaborn (import seaborn as sns and import matplotlib.pyplot as plt) to plot multiple groups together, grouped by serial number, with 4 subplots per row.
df = pd.read_csv('data/temp.csv', header=0)
df['Model/SN']=df['Model No']+" / "+df['Serial No'].values.astype(str)
g = sns.FacetGrid(df, col='Model/SN', hue='Test', col_wrap=4, sharey=False, sharex=False)
g.map(plt.axhline, y=Error_Limit, ls='--', c='red')
g.map(plt.axhline, y=-Error_Limit, ls='--', c='red')
g = g.map(sns.lineplot, 'Test Point', 'Test Point Error', ci=None)
In summary: these fixes are not ideal. They are workarounds, and I still get the "SettingWithCopyWarning".
Answered 2020-07-24 15:32:45
So your requests are:
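The "SettingWithCopyWarning" is raised because you are creating a column and setting values on each group, and each group is itself a copy of some rows of the DataFrame. Instead of setting values on each pass through the loop, I would store each group's "Test_Point_Error" Series in a list, call pd.concat(list) after exiting the for loop, and then add the result to the DataFrame.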
-Edit- Try replacing:
group['Test Point Error']=100*(group['Test Reading']- (group['Test Point']*R+x1))
with (after creating error_list = [] before the loop):
error_list.append(100 * (group['Test Reading']- (group['Test Point']*R+x1)))
Then, after exiting the for loop:
df = df.assign(test_point_error=pd.concat(error_list))
-End of edit-
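To make the list-and-concat idea concrete, here is a minimal sketch of how it might look end to end. The small DataFrame is made-up sample data standing in for the CSV, and the column name "Test Point Error" is just one possible choice; treat it as an illustration under those assumptions rather than a drop-in replacement.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV (two short test runs).
df = pd.DataFrame({
    "Test": ["A", "A", "A", "B", "B", "B"],
    "Test Point": [0, 50, 100, 0, 50, 100],
    "Test Reading": [0.10, 0.30, 0.46, 0.10, 0.29, 0.46],
})

error_list = []
for title, group in df.groupby("Test"):
    x1, x2 = group["Test Reading"].min(), group["Test Reading"].max()
    x3, x4 = group["Test Point"].min(), group["Test Point"].max()
    R = (x2 - x1) / (x4 - x3)  # slope of the straight line through the end points
    # Build a new Series instead of writing into `group`; nothing is assigned
    # to a slice of df, so there is nothing for pandas to warn about.
    error_list.append(100 * (group["Test Reading"] - (group["Test Point"] * R + x1)))

# pd.concat keeps the original row labels, so assign() aligns the values by index.
df = df.assign(**{"Test Point Error": pd.concat(error_list)})
print(df)
```

Because each Series in the list keeps the index of its rows, the order in which the groups are visited does not matter when the result is attached to df.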
If you plot after exiting the for loop, then
df.groupby().plot(subplots=True)
will give you what you want.
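If the goal is to have every group on one set of axes, another common pattern (not the call shown above, but an alternative I am assuming fits the question) is to create the axes once and pass them to each group's plot call. The data below is made up, and the "Agg" backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data that already contains the computed error column.
df = pd.DataFrame({
    "Test": ["A", "A", "A", "B", "B", "B"],
    "Test Point": [0, 50, 100, 0, 50, 100],
    "Test Point Error": [0.0, 2.0, 0.0, 0.0, 1.0, 0.0],
})

fig, ax = plt.subplots()
for title, group in df.groupby("Test"):
    # Reusing the same `ax` draws every group into one figure,
    # with one line (and one legend entry) per group.
    group.plot(x="Test Point", y="Test Point Error", ax=ax, label=title, grid=True)
ax.set_ylabel("% error")
# Call plt.show() or fig.savefig(...) here to view the combined plot.
```

Since this only reads from each group, it does not trigger the warning either.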
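On another topic, I would drop the string concatenation used to build 'Test' and do this instead:
df.groupby(['Model No', 'Serial No', 'Test time'])
If there are many rows, this may make your code faster.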
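As a rough sketch of grouping on the columns directly: when you group by a list of columns, each group key comes back as a tuple, so the plot title can still be built inside the loop. The rows below are a few values borrowed from the df.head(20) output in the question, and the column names are assumed to match the CSV.

```python
import pandas as pd

# A few rows modelled on the question's output; column names are assumed.
df = pd.DataFrame({
    "Model No": ["LC-500"] * 4,
    "Serial No": [937618] * 4,
    "Test time": ["17:20:10", "17:20:10", "17:31:46", "17:31:46"],
    "Test Point": [0, 500, 0, 300],
    "Test Reading": [0.10453, 1.89382, 0.10468, 1.17237],
})

keys = ["Model No", "Serial No", "Test time"]
for (model, serial, test_time), group in df.groupby(keys):
    # The key is a tuple with one entry per grouping column.
    title = f"Model {model}, SN {serial}, Test time {test_time}"
    print(title, len(group))
```

This avoids building a long string for every row just to have something to group on.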
https://stackoverflow.com/questions/62725942