
Extracting specific text from a txt file in Python

Stack Overflow user
Asked 2021-05-03 13:06:55
2 answers, 933 views, 0 followers, 1 vote

I recently learned Python to do some text extraction. I have a dataset that looks like this:

    @article{noauthor_collective_nodate,
    title = {Collective teacher efficacy},
    abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
@article{noauthor_collective_nodate,
    title = {Collective teacher efficacy},
    abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},
}

@article{noauthor_initial_nodate,
    title = {Initial teacher education programs},
    abstract = {Overview Influence: Initial teacher education programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have small positive impact Influence Definition: Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs. Evidence Number of meta-analyses: 5 Number of studies: 117 Number of students: 106,016 Number of effects: 509 Effect size: 0.10},
}

@article{noauthor_professional_nodate,
    title = {Professional development programs},
    abstract = {Overview Influence: Professional development programs Domain: Teacher Sub-Domain: Teacher Education Potential to Accelerate Student Achievement: Likely to have positive impact Influence Definition: Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders. Evidence Number of meta-analyses: 21 Number of studies: 1,151 Number of students: 2,321,242 Number of effects: 2,938 Effect size: 0.37},
    keywords = {Program Development},
}

I want to extract the title and part of the abstract from each entry. Using the following code, I managed to get the output I want:

s = "@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}@article{noauthor_collective_nodate, title = {Collective teacher efficacy}, abstract = {Overview Influence: Collective teacher efficacy Domain: Teacher Sub-Domain: Teacher attributes Potential to Accelerate Student Achievement: Potential to considerably accelerate Influence Definition: The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. Evidence Number of meta-analyses: 2 Number of studies: 61 Number of students: 3,489 Number of effects: 61 Effect size: 1.39},}"


start = s.find("title = {") + len("title = {")
end = s.find("}, abstract")

start2 = s.find("Influence Definition: ") + len("Influence Definition: ")
end2 = s.find("Evidence Number of meta-analyses:")

substring = s[start:end]
substring2 = s[start2:end2]
print(substring+' - '+substring2+";")

Output:

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. ;

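The problems are:

  • This only extracts the first match.
  • I want to run it on the original text file rather than copying its contents into "s".

Can anyone lend a hand?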


2 Answers

Stack Overflow user

Accepted answer

Posted 2021-05-03 13:50:51

This should do it:

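# Read the file line by line; each entry keeps its title and abstract on separate lines.
with open("myfile.txt", "r") as f:
    for x in f:
        if "title = {" in x:
            # Take the text between "title = {" and the next closing brace.
            start = x.find("title = {") + len("title = {")
            end = x.find("}", start)
            substring = x[start:end] + " - "
        if "Influence Definition: " in x:
            # Take the text between the definition label and the evidence label.
            start = x.find("Influence Definition: ") + len("Influence Definition: ")
            end = x.find("Evidence Number of meta-analyses:")
            substring += x[start:end]
            print(substring)
            print()
# The with block closes the file automatically, so no explicit f.close() is needed.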

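For example, if your file is named myfile.txt, it prints the following:

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. 

Collective teacher efficacy - The shared belief by a group of teachers in a particular educational environment that they have the skills to positively impact student outcomes. 

Initial teacher education programs - Initial teacher education or {ITEs} (sometimes at the undergraduate level and sometimes at the post-graduate level) is the entry-level qualification for teaching in numerous countries, including the United States. More recently, there are school-based {ITEs}, non-accredited {ITEs}, and many online {ITE} programs. 

Professional development programs - Professional development relates to courses or interventions aimed to enhance the beliefs, actions, impact of knowledge of teachers and school leaders. 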
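As an alternative to the line-by-line approach above, the whole file can be read at once and searched with a regular expression. The following is only a sketch: the names ENTRY_RE, extract_entries and extract_from_file are hypothetical, and the pattern assumes every entry contains a title followed by both the "Influence Definition: " and "Evidence Number of meta-analyses:" labels. If an entry lacks the definition, a match could run on into the next entry.

```python
import re

# Hypothetical pattern: one match per entry, capturing the title and the text
# between "Influence Definition: " and "Evidence Number of meta-analyses:".
# re.DOTALL lets ".*?" cross line breaks between the title and the abstract.
ENTRY_RE = re.compile(
    r"title = \{(.*?)\}.*?Influence Definition: (.*?)\s*Evidence Number of meta-analyses:",
    re.DOTALL,
)


def extract_entries(text):
    """Return a list of (title, definition) tuples found in text."""
    return ENTRY_RE.findall(text)


def extract_from_file(path):
    """Read the whole file at path and extract every entry from it."""
    with open(path, encoding="utf-8") as f:
        return extract_entries(f.read())


# Small made-up sample in the same shape as the dataset in the question.
sample = (
    "@article{first,\n"
    "    title = {First title},\n"
    "    abstract = {Overview Influence Definition: First definition. "
    "Evidence Number of meta-analyses: 2},\n"
    "}\n"
    "@article{second,\n"
    "    title = {Second title},\n"
    "    abstract = {Overview Influence Definition: Second definition. "
    "Evidence Number of meta-analyses: 5},\n"
    "}\n"
)

for title, definition in extract_entries(sample):
    print(f"{title} - {definition};")
```

To run it on a real file, call extract_from_file("myfile.txt") and loop over the returned tuples in the same way.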

Votes: 1

Stack Overflow user

Posted 2021-05-03 13:18:07

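  1. str.find takes a start parameter. You can use it to skip past earlier matches and find only the next occurrence.
  2. You can use open to read text from a file (note the example code in the documentation, which uses with open("filename")...).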
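The first point can be sketched as a loop that passes the end of each match back to str.find as the start of the next search. The helper name extract_all and the sample string are made up for illustration; only str.find and its start argument come from the answer.

```python
def extract_all(text, start_marker, end_marker):
    """Return every substring of text found between start_marker and end_marker."""
    results = []
    pos = 0
    while True:
        # Search for the next start marker, beginning where the last match ended.
        start = text.find(start_marker, pos)
        if start == -1:
            break
        start += len(start_marker)
        # Search for the end marker only after the start marker.
        end = text.find(end_marker, start)
        if end == -1:
            break
        results.append(text[start:end])
        pos = end + len(end_marker)
    return results


sample = "title = {A}, abstract = {x} title = {B}, abstract = {y}"
titles = extract_all(sample, "title = {", "}")
print(titles)
```

For the second point, the text passed to extract_all could come from a file read inside a with open(...) block instead of a hard-coded string.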
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/67369455
