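Scenario: we need to find certain fields in the source code that contain customer information.
Version 1: search for a single keyword and print every matching file to the console.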
import os

rootDir = os.getcwd()
KEYWORD = "hello"

def scan_file(filename, dirname):
    # Only files under a source directory are of interest.
    # Note: "/src" assumes POSIX-style path separators.
    if "/src" not in dirname:
        return
    path = os.path.join(dirname, filename)
    # A keyword in the file name counts as a match.
    if KEYWORD in filename:
        print(path)
        return
    # errors="ignore" keeps binary files from raising a decode error.
    with open(path, errors="ignore") as f:
        for line in f:
            if KEYWORD in line:
                print(path)
                break  # one hit is enough, report each file once

for dirName, subdirList, fileList in os.walk(rootDir):
    for fname in fileList:
        scan_file(fname, dirName)
Version 2: search for several keywords and write out each matching file together with the keywords it contains.
import os

rootDir = os.getcwd()
keywords = ["hello", "world", "thanks"]

def scan_file(filename, dirname, keyword):
    # Return True if the file is under /src and its name or contents contain keyword.
    if "/src" not in dirname:
        return False
    if keyword in filename:
        return True
    # errors="ignore" keeps binary files from raising a decode error.
    with open(os.path.join(dirname, filename), errors="ignore") as f:
        for line in f:
            if keyword in line:
                return True
    return False

# Open the output file once instead of reopening it for every write.
with open('test.txt', 'a') as out:
    for dirName, subdirList, fileList in os.walk(rootDir):
        for fname in fileList:
            matched = [k for k in keywords if scan_file(fname, dirName, k)]
            if matched:
                # Same layout as before: keywords on one line, file path on the next.
                out.write("".join(k + " ," for k in matched))
                out.write("\n" + os.path.join(dirName, fname) + "\n")
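This version covers the basic functionality, but it is still far from perfect. Room for further iteration:
1. Performance and code quality: time complexity (each file is read again for every keyword), redundancy, and elegance.
2. Readability of the output: ideally the files would be grouped by module and presented in Excel.
3. Details: files that do not fit the requirement, such as png images, should be excluded.
These are left for the reader to think about.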
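As one possible starting point for the three points above, here is a rough sketch, not a finished solution. It reads each file only once for all keywords, skips files by extension, and writes a CSV file, which Excel can open. The names `SKIP_EXTENSIONS`, `find_keywords`, `scan_tree` and `write_report` are made up for this illustration, the extension list is only a guess at what should be excluded, and "module" is assumed to mean the top-level directory under the scan root. Unlike the versions above, it requires a path component named exactly `src` instead of the substring "/src".

```python
import csv
import os

# Assumption: extensions that never hold relevant source text.
SKIP_EXTENSIONS = {".png", ".jpg", ".gif", ".jar", ".class"}

def find_keywords(path, keywords):
    """Return the keywords found in the file name or contents, reading the file once."""
    name = os.path.basename(path)
    found = {k for k in keywords if k in name}
    remaining = [k for k in keywords if k not in found]
    if remaining:
        # errors="ignore" keeps undecodable bytes from raising an exception.
        with open(path, errors="ignore") as f:
            for line in f:
                hits = [k for k in remaining if k in line]
                if hits:
                    found.update(hits)
                    remaining = [k for k in remaining if k not in found]
                    if not remaining:
                        break  # every keyword already seen, stop reading
    # Keep the order of the original keyword list.
    return [k for k in keywords if k in found]

def scan_tree(root, keywords, required_dir="src"):
    """Yield (module, path, matched_keywords) for files below a `required_dir` directory."""
    for dirname, _subdirs, files in os.walk(root):
        parts = os.path.relpath(dirname, root).split(os.sep)
        if required_dir not in parts:
            continue
        module = parts[0]  # assumption: top-level directory == module
        for fname in sorted(files):
            if os.path.splitext(fname)[1].lower() in SKIP_EXTENSIONS:
                continue
            path = os.path.join(dirname, fname)
            matched = find_keywords(path, keywords)
            if matched:
                yield module, path, matched

def write_report(rows, out_path):
    """Write one CSV row per matching file, sorted so modules stay together."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["module", "file", "keywords"])
        for module, path, matched in sorted(rows):
            writer.writerow([module, path, ", ".join(matched)])
```

A run over the current directory would then look like `write_report(scan_tree(os.getcwd(), ["hello", "world", "thanks"]), "report.csv")`. Sorting the rows before writing keeps files of the same module next to each other, which makes filtering in Excel easier.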