如果有人有这个数据帧:
0 1 2
0 RC100_1 RC100_1 RC100_1
1 RC101_1 RC101_1 RC101_1
2 RC101_3 RC102_1 RC102_1
3 RC102_1 RC103_3 RC103_3
4 RC102_3 RC104_1 RC104_1
5 RC103_3 RC109_2 RC109_2
6 RC104_1 RC111_1 RC111_1
7 RC109_2 RC114_2 RC114_2
8 RC111_1 RC115_2 RC115_2
9 RC114_1 RC116_1 RC116_1
10 RC115_4 RC116_2 RC116_2
11 RC116_1 RC117_2 RC117_2
12 RC117_4 RC117_4 RC117_4
13 RC117_4 RC118_2 RC118_2
14 RC118_1 RC119_2 RC119_2
15 RC119_4 RC120_2 RC120_2
16 RC120_4 RC121_2 RC121_2
17 RC121_4 RC122_2 RC122_2
18 RC122_4 RC125_2 RC125_2
19 RC125_2 RC126_3 RC126_3
20 RC126_3 RC129_2 RC129_2
21 RC129_4 RC12_24 RC12_24
是否可以将其转换为每行只包含前五个字符匹配的内容,否则将被排序?我的意思是不需要手动更换每个单元格。以下是我想要将其转换为
0 1 2
0 RC100_1 RC100_1 RC100_1
1 RC101_1 RC101_1 RC101_1
2 RC101_3 NaN NaN
3 RC102_1 RC102_1 RC102_1
4 RC102_3 NaN NaN
5 RC103_3 RC103_3 RC103_3
6 RC104_1 RC104_1 RC104_1
7 RC109_2 RC109_2 RC109_2
8 RC111_1 RC111_1 RC111_1
9 RC114_1 RC114_2 RC114_2
10 RC115_4 RC115_2 RC115_2
11 RC116_1 RC116_1 RC116_1
12 NaN RC116_2 RC116_2
13 RC117_4 RC117_2 RC117_2
14 RC117_4 RC117_4 RC117_4
15 RC118_1 RC118_2 RC118_2
16 RC119_4 RC119_2 RC119_2
17 RC120_4 RC120_2 RC120_2
18 RC121_4 RC121_2 RC121_2
19 RC122_4 RC122_2 RC122_2
20 RC125_2 RC125_2 RC125_2
21 RC126_3 RC126_3 RC126_3
22 RC129_4 RC129_2 RC129_2
23 NaN RC12_24 RC12_24
发布于 2020-10-13 07:19:57
只需列出所有文件的名称,并使用它来索引和对齐所有的文件名。以下是您发布的数据的工作示例。
import pandas as pd
import numpy as np
# in your case do something like names_idx = excel_names + csv_names + txt_names
names_idx = ['RC100_1', 'RC100_1', 'RC100_1',
'RC101_1', 'RC101_1', 'RC101_1',
'RC101_3', 'RC102_1', 'RC102_1',
'RC102_1', 'RC103_3', 'RC103_3',
'RC102_3', 'RC104_1', 'RC104_1',
'RC103_3', 'RC109_2', 'RC109_2',
'RC104_1', 'RC111_1', 'RC111_1',
'RC109_2', 'RC114_2', 'RC114_2',
'RC111_1', 'RC115_2', 'RC115_2',
'RC114_1', 'RC116_1', 'RC116_1',
'RC115_4', 'RC116_2', 'RC116_2',
'RC116_1', 'RC117_2', 'RC117_2',
'RC117_4', 'RC117_4', 'RC117_4',
'RC117_4', 'RC118_2', 'RC118_2',
'RC118_1', 'RC119_2', 'RC119_2',
'RC119_4', 'RC120_2', 'RC120_2',
'RC120_4', 'RC121_2', 'RC121_2',
'RC121_4', 'RC122_2', 'RC122_2',
'RC122_4', 'RC125_2', 'RC125_2',
'RC125_2', 'RC126_3', 'RC126_3',
'RC126_3', 'RC129_2', 'RC129_2',
'RC129_4', 'RC12_24', 'RC12_24']
# This is not needed if you already have them separately from globbing earlier
csv_names = names_idx[::3]
excel_names = names_idx[1::3]
txt_names = names_idx[2::3]
# remove duplicates
names_idx = set(names_idx)
# create an empty dataframe with index as unique file names
df = pd.DataFrame(index=names_idx)
# create empty columns and position the file names
df['csv'] = np.nan
df.csv[csv_names] = csv_names
df['excel'] = np.nan
df.excel[excel_names] = excel_names
df['txt'] = np.nan
df.txt[txt_names] = txt_names
print(df)
输出
csv excel txt
RC111_1 RC111_1 RC111_1 RC111_1
RC125_2 RC125_2 RC125_2 RC125_2
RC114_2 NaN RC114_2 RC114_2
RC116_2 NaN RC116_2 RC116_2
RC129_4 RC129_4 NaN NaN
RC118_1 RC118_1 NaN NaN
RC12_24 NaN RC12_24 RC12_24
RC121_4 RC121_4 NaN NaN
RC126_3 RC126_3 RC126_3 RC126_3
RC129_2 NaN RC129_2 RC129_2
RC102_1 RC102_1 RC102_1 RC102_1
RC101_1 RC101_1 RC101_1 RC101_1
RC109_2 RC109_2 RC109_2 RC109_2
RC119_4 RC119_4 NaN NaN
RC100_1 RC100_1 RC100_1 RC100_1
RC120_2 NaN RC120_2 RC120_2
RC122_2 NaN RC122_2 RC122_2
RC121_2 NaN RC121_2 RC121_2
RC117_4 RC117_4 RC117_4 RC117_4
RC118_2 NaN RC118_2 RC118_2
RC103_3 RC103_3 RC103_3 RC103_3
RC117_2 NaN RC117_2 RC117_2
RC102_3 RC102_3 NaN NaN
RC119_2 NaN RC119_2 RC119_2
RC114_1 RC114_1 NaN NaN
RC116_1 RC116_1 RC116_1 RC116_1
RC101_3 RC101_3 NaN NaN
RC115_2 NaN RC115_2 RC115_2
RC122_4 RC122_4 NaN NaN
RC104_1 RC104_1 RC104_1 RC104_1
RC115_4 RC115_4 NaN NaN
RC120_4 RC120_4 NaN NaN
https://stackoverflow.com/questions/64326084
复制相似问题