
Q: Script to compare a string in two different files

Stack Overflow user

Asked 2011-12-08 10:39:35

4 answers, 2.2K views, 0 followers, 3 votes

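I am brand new to Stack Overflow and to scripting. I am looking for help getting started on a script, not necessarily for someone to write it for me.

Here is what I have: File1.csv contains some information, of which I am only interested in the MAC addresses. File2.csv has some different information but also contains MAC addresses.

I need a script that parses the MAC addresses out of file1.csv and logs a report if any of those MAC addresses show up in file2.csv.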

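The questions:

Any suggestions on which language I should use, preferably perl, python, or bash?
Can anyone suggest some structure for the logic needed (even if just in pseudocode)?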

Update

Using @Adam Wagner's approach, I am really close!

import csv
#Need to strip out NUL values from .csv file to make python happy
class FilteredFile(file):
        def next(self):
                return file.next(self).replace('\x00','').replace('\xff\xfe','')

reader = csv.reader(FilteredFile('wifi_clients.csv', 'rb'), delimiter=',', quotechar='|')
s1 = set(rec[0] for rec in reader)

inventory = csv.reader(FilteredFile('inventory.csv','rb'),delimiter=',')
s2 = set(rec[6] for rec in inventory)

shared_items = s1.intersection(s2)
print shared_items

This always outputs the following (even though I doctored the .csv files to have matching MAC addresses):

set([])

Contents of the csv files

wifi_clients.csv
macNames, First time seen, Last time seen, Power, # BSSID, BSSID, Probed BSSID

inventory.csv
Name, Manufacturer, Device Type, Model, Serial Number, IP Address, MAC Address, .

python

perl

bash

4 Answers

Stack Overflow user

Accepted answer

Answered 2011-12-08 10:50:10

Here is the approach I would take:

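Iterate over each csv file (python has a handy csv module for this), capturing the MAC addresses and placing them in a set (one per file). Once again, python has a great built-in set type. The csv module documentation covers the details.
Next, take the intersection of set1 (file1) and set2 (file2). This gives you the MAC addresses that exist in both files.

Example (in python):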

s1 = set([1,2,3])  # You can add things incrementally with "s1.add(value)"
s2 = set([2,3,4])

shared_items = s1.intersection(s2)
print shared_items

Which outputs:

set([2, 3])

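Logging these shared items can be done with anything from printing (and then redirecting the output to a file), to using the logging module, to saving directly to a file.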
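As a rough sketch of the logging-module option, assuming Python 3 (the snippets in this thread are Python 2) and a hypothetical log file name, `shared_macs.log`, that does not come from the question:

```python
import logging

def log_shared_items(shared_items, log_path="shared_macs.log"):
    """Write one log line per shared item; log_path is a hypothetical name."""
    logger = logging.getLogger("mac_report")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    try:
        # Sort so the report order does not depend on set iteration order.
        for item in sorted(shared_items):
            logger.info("MAC present in both files: %s", item)
    finally:
        # Detach and close the handler so repeated calls do not duplicate lines.
        logger.removeHandler(handler)
        handler.close()
```

Calling `log_shared_items(shared_items)` after computing the intersection would write one timestamped line per shared MAC address.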

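I'm not sure how in-depth an answer you were looking for, but this should get you started.

Update: CSV/Set usage example

Let's assume you have a file "foo.csv" that looks something like this: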

bob,123,127.0.0.1,mac-address-1
fred,124,127.0.0.1,mac-address-2

The simplest way to build the set would be something like this:

import csv

set1 = set()
for record in csv.reader(open('foo.csv', 'rb')):
    user, machine_id, ip_address, mac_address = record
    set1.add(mac_address)
    # or simply "set1.add(record[3])", if you don't need the other fields.

Obviously, you would need something like this for each file, so you may want to put it in a function to make life easier.
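One possible shape for such a function, sketched for Python 3 (the snippets above are Python 2). The name `column_set` and the choice to strip whitespace and skip short rows are my assumptions, not something the question asked for:

```python
import csv

def column_set(path, column_index):
    """Return the set of values found in one column of a CSV file."""
    values = set()
    # newline="" is what the csv module documentation recommends in Python 3.
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            # Skip blank or short rows instead of raising IndexError.
            if len(record) > column_index:
                # Stripping surrounding whitespace is an assumption of this sketch.
                values.add(record[column_index].strip())
    return values
```

With the "foo.csv" above, `set1 = column_set('foo.csv', 3)` would collect the fourth column, and the same call with a different path and index would cover the second file.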

Finally, if you want to go the less verbose but cooler python way, you could also build the set like this:

csvfile = csv.reader(open('foo.csv', 'rb'))
set1 = set(rec[3] for rec in csvfile)   # Assuming mac-address is the 4th column.

Votes: 5

Stack Overflow user

Answered 2011-12-08 10:54:11

I strongly recommend python for this.

Because you haven't given the structure of the csv files, I can only show a skeleton:

def get_MAC_from_file1():
    # ... parse the file to get MAC
    a_MAC_list = []
    return a_MAC_list

def get_MAC_from_file2():
    # ... parse the file to get MAC
    a_MAC_list = []
    return a_MAC_list

def log_MACs():
    MAC_list1, MAC_list2 = get_MAC_from_file1(), get_MAC_from_file2()
    for a_MAC in MAC_list1:
        if a_MAC in MAC_list2:
            pass  # ... write your logs
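If the data set is large, use a dict or set instead of a list, together with the intersect operation. But since these are MAC addresses, I guess your data set is not that large, so keeping the script easy to read matters most.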
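If the lists do grow, a small change keeps the loop readable while avoiding a full list scan for every lookup. This is only a sketch, and `shared_macs` is a name I made up for illustration:

```python
def shared_macs(mac_list1, mac_list2):
    """Return the entries of mac_list1 that also appear in mac_list2, in order."""
    lookup = set(mac_list2)  # set membership tests are O(1) on average
    return [mac for mac in mac_list1 if mac in lookup]
```

Unlike a plain set intersection, this keeps the order (and any duplicates) of the first list, which can be convenient when writing the log.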

Votes: 1

Stack Overflow user

Answered 2011-12-08 11:03:08

Awk is perfect for this

{
   mac = $1  # assuming the mac addresses are in the first column
   mac_in_other_file = ""  # reset first: getline leaves the variable unchanged when grep prints nothing
   do_grep = "grep " mac " otherfilename" # we'll use grep to check if the mac address is in the other file
   do_grep | getline mac_in_other_file  # pipe the output of the grep command into a new variable
   close(do_grep)  # close the pipe
   if(mac_in_other_file != ""){     # if grep found the mac address in the other file
     print mac > "naughty_macs.log"  # append the mac address to the log file
   }
}