我在熊猫中有两个相同列的数据框架,我们在A列有索引
实际:
  A      B       C 
  1     apple   red
  2     berry   blue 
  3    grapes   green第二数据帧
预期:
  A    B     C
  1   apple  green
  2   guava  blue
  3   grapes  green现在我需要比较这两个数据帧,突出不匹配的单元格和在数据帧中,然后输出到excel。
我的代码:
import pandas as pd
pd.concat([pd.concat([actual,expected,expected]).drop_duplicates(keep=False)]).to_excel(.......)产出:
  A   B   C
  1  apple red
  2  berry blue我需要突出红色和浆果
发布于 2022-01-03 14:15:48
有一个函数- compare,它帮助您比较两个数据集:https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.compare.html
df = pd.DataFrame({
        "A": [1, 2, 3],
        "B": ["apple", "berry", "grapes"],
        "C": ["red", "blue", "green"]
    },
    columns=["A", "B", "C"])
df2 = df.copy()
df2.loc[0, 'C'] = 'green'
df2.loc[2, 'B'] = 'guava'使用它,您可以得到两个数据集的比较:
df.compare(df2)给你:
    B               C
    self    other   self    other
0   NaN     NaN     red     green
2   grapes  guava   NaN     NaN通过筛选不需要的行(相同的行)和列,您可以只使用与原始数据不同的数据来获得dataframe:
compare = df.compare(df2, keep_shape=True).drop('other', level=1, axis=1)
compare = compare.droplevel(1, axis=1).dropna(how='all')
    A   B       C
0   NaN NaN     red
2   NaN grapes  NaN因为我们需要过滤原始数据集中相同的行:
filtered = df.loc[compare.index]现在,我们可以以某种方式“突出”不同的数据:
def color_cells(s):
    if pd.notna(s):
        return 'color:{0}; font-weight:bold'.format('red')
    else:
        return ''
filtered.style.apply(lambda x: compare.applymap(color_cells), axis=None)会产生这样的东西:

发布于 2022-01-03 14:29:55
使用StyleFrame可以很容易地做到这一点,如本文所示:color specific cells in excel python。
首先,运行pip安装样式框架。然后按照以下步骤操作:
from styleframe import StyleFrame, Styler
import pandas as pd
# true and expected data
d_true = {'A':[1,2,3], 'B':['apple', 'berry', 'grapes'], 'C':['red', 'blue', 'green']}
df_true = pd.DataFrame(d_true)
d_exp = {'A':[1,2,3], 'B':['apple', 'guava', 'grapes'], 'C':['green', 'blue', 'green']}
df_exp = pd.DataFrame(d_exp)
# pass df to styleFrame 
sf1 = StyleFrame(df_true)
sf2 = StyleFrame(df_exp)
# Set color for differences 
sf1_diff = Styler(bg_color='#FFCCCC') # red
sf2_diff = Styler(bg_color='#DAF6FF') # blue
# Difference matrix 
ne = sf1.data_df != sf2.data_df
# apply above color style where difereneces are true
for col in ne.columns:
    sf1.apply_style_by_indexes(indexes_to_style=ne[ne[col]].index,
                               styler_obj=sf1_diff,
                               cols_to_style=col)
    sf2.apply_style_by_indexes(indexes_to_style=ne[ne[col]].index,
                               styler_obj=sf2_diff,
                               cols_to_style=col)
# save your excel 
sf1.to_excel('df1_diff_in_color.xlsx').save()
sf2.to_excel('df2_diff_in_color.xlsx').save()输出excel文件:

https://stackoverflow.com/questions/70566100
复制相似问题