I want to search a set of strings for a specific pattern.
Given these two character vectors:
actions <- c("taking","using")
nouns <- c("medication","prescription")
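I want to find any combination of an action followed by a noun, in that specific order (action + noun, not noun + action). For example, given the following text, I would like to detect the combinations: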
phrases <- c("he was using medication",
"medication using it",
"finding medication",
"taking the left",
"using prescription medication",
"taking medication drug")
I tried using
grep("\\b(taking|using+medication|prescriptio)\\b", phrases, value = FALSE)
but this is clearly wrong.
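A quick check shows why the attempt fails. In a regex, `+` is a quantifier, not a concatenation operator, so `using+medication` means "usin", then one or more "g", then "medication" with no space in between. In this data only the bare alternative `taking` ever matches, regardless of what follows it. The variable name `attempt` below is just for illustration.

```r
phrases <- c("he was using medication",
             "medication using it",
             "finding medication",
             "taking the left",
             "using prescription medication",
             "taking medication drug")

# "using+medication" requires "usingmedication" (no space), and
# "prescriptio\\b" cannot match inside "prescription", so only the
# alternative "taking" matches, even in "taking the left".
attempt <- grep("\\b(taking|using+medication|prescriptio)\\b", phrases, value = FALSE)
print(attempt)
# [1] 4 6
```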
Posted on 2016-11-12 09:56:15
You can build alternation groups from the actions and nouns values and put them into a larger regular expression:
actions <- c("taking","using")
nouns <- c("medication","prescription")
phrases <- c("he was using medication","medication using it","finding medication","taking the left","using prescription medication","taking medication drug")
grep(paste0("(",paste(actions, collapse="|"), ")\\s+(", paste(nouns,collapse="|"),")"), phrases, value=FALSE)
## => [1] 1 5 6
## and a visual check
grep(paste0("(",paste(actions, collapse="|"), ")\\s+(", paste(nouns,collapse="|"),")"), phrases, value=TRUE)
## => [1] "he was using medication" "using prescription medication" "taking medication drug"
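If the same check is needed in several places, the construction can be wrapped in a small helper. The sketch below uses a hypothetical function name, `find_action_noun`, and additionally wraps the pattern in `\\b` word boundaries (as in the original attempt) so that words such as "retaking" or "medications" do not produce partial matches. It assumes the action and noun values contain no regex metacharacters; values that do would need escaping first.

```r
actions <- c("taking", "using")
nouns <- c("medication", "prescription")
phrases <- c("he was using medication", "medication using it",
             "finding medication", "taking the left",
             "using prescription medication", "taking medication drug")

# Hypothetical helper: indices (or, with value = TRUE, the strings) of the
# elements of x in which an action is followed by whitespace and then a noun.
# \\b on both ends prevents matches inside longer words.
find_action_noun <- function(x, actions, nouns, value = FALSE) {
  pattern <- paste0("\\b(", paste(actions, collapse = "|"), ")\\s+(",
                    paste(nouns, collapse = "|"), ")\\b")
  grep(pattern, x, value = value)
}

find_action_noun(phrases, actions, nouns)
# [1] 1 5 6

# Only the third string matches: "retaking" and "medications" are rejected
# by the word boundaries, while \\s+ accepts the double space.
find_action_noun(c("retaking medication", "using medications",
                   "using  prescription"), actions, nouns)
# [1] 3
```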
The resulting regex looks like this:
(taking|using)\s+(medication|prescription)
See the regex demo.
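Note that the regex shown above is the printed form. Inside an R string literal the backslash has to be doubled, which is why the code writes `\\s+`. A minimal way to inspect the generated pattern is to store it in a variable (named `pattern` here for illustration) and print it with `writeLines()`, which shows the string without escaping:

```r
actions <- c("taking", "using")
nouns <- c("medication", "prescription")

# Same construction as in the grep() call, kept in a variable for inspection.
pattern <- paste0("(", paste(actions, collapse = "|"), ")\\s+(",
                  paste(nouns, collapse = "|"), ")")

# writeLines() prints the raw string, so "\\s" appears as a single backslash.
writeLines(pattern)
# (taking|using)\s+(medication|prescription)
```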
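Details

(taking|using) - an alternation group that matches either taking or using
\s+ - 1 or more whitespace characters
(medication|prescription) - an alternation group that matches either medication or prescription

Note that the (...) capturing groups can be replaced with (?:...) non-capturing groups to avoid keeping submatches in memory.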
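A minimal sketch of the non-capturing variant follows. It passes `perl = TRUE` so that `(?:...)` is handled by the PCRE engine, and it also uses `regexpr()` with `regmatches()` to pull out which action + noun pair was found in each matching phrase. The variable name `hits` is only for illustration.

```r
actions <- c("taking", "using")
nouns <- c("medication", "prescription")
phrases <- c("he was using medication", "medication using it",
             "finding medication", "taking the left",
             "using prescription medication", "taking medication drug")

# (?:...) groups the alternatives without creating capture groups.
pattern <- paste0("(?:", paste(actions, collapse = "|"), ")\\s+(?:",
                  paste(nouns, collapse = "|"), ")")

# Same indices as the capturing version.
grep(pattern, phrases, perl = TRUE)
# [1] 1 5 6

# regexpr() locates the first match per phrase (-1 when there is none);
# regmatches() returns the matched text only for phrases that matched.
hits <- regmatches(phrases, regexpr(pattern, phrases, perl = TRUE))
# hits is c("using medication", "using prescription", "taking medication")
```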
https://stackoverflow.com/questions/40565776