I have only just discovered xmlstarlet, but unfortunately I find working with XML very difficult, so I am hoping for some help.
Say I have this XML file, test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<objects>
<g id="layer3" inkscape:label="hello">
<circle id="circ2" inkscape:label="there"/>
<rect id="rect2" inkscape:label="world"/>
</g>
<g id="layer4">
<circle id="circ3" inkscape:label="more"/>
</g>
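</objects>
What I want to do is this: for every node that has an inkscape:label attribute, copy the value of inkscape:label into the id attribute. The expected output for the file above would be: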
<?xml version="1.0" encoding="UTF-8"?>
<objects>
<g id="hello" inkscape:label="hello">
<circle id="there" inkscape:label="there"/>
<rect id="world" inkscape:label="world"/>
</g>
<g id="layer4">
<circle id="more" inkscape:label="more"/>
</g>
</objects>
How can I do this with xmlstarlet?
Apparently, using the expression string("TEST"), I can replace every id attribute with a fixed value, like this:
$ xmlstarlet edit -N inkscape="http://www.inkscape.org/namespaces/inkscape" --update '//*/@id' --expr 'string("TEST")' test.xml
test.xml:3.40: Namespace prefix inkscape for label on g is not defined
<g id="layer3" inkscape:label="hello">
^
test.xml:4.46: Namespace prefix inkscape for label on circle is not defined
<circle id="circ2" inkscape:label="there"/>
^
test.xml:5.44: Namespace prefix inkscape for label on rect is not defined
<rect id="rect2" inkscape:label="world"/>
^
test.xml:8.45: Namespace prefix inkscape for label on circle is not defined
<circle id="circ3" inkscape:label="more"/>
^
<?xml version="1.0" encoding="UTF-8"?>
<objects>
<g id="TEST" inkscape:label="hello">
<circle id="TEST" inkscape:label="there"/>
<rect id="TEST" inkscape:label="world"/>
</g>
<g id="TEST">
<circle id="TEST" inkscape:label="more"/>
</g>
</objects>
... and with the expression string(../@id) I can "re-insert" the value of the id attribute (so I basically get output identical to the input):
$ xmlstarlet edit -N inkscape="http://www.inkscape.org/namespaces/inkscape" --update '//*/@id' --expr 'string(../@id)' test.xml
test.xml:3.40: Namespace prefix inkscape for label on g is not defined
<g id="layer3" inkscape:label="hello">
^
test.xml:4.46: Namespace prefix inkscape for label on circle is not defined
<circle id="circ2" inkscape:label="there"/>
^
test.xml:5.44: Namespace prefix inkscape for label on rect is not defined
<rect id="rect2" inkscape:label="world"/>
^
test.xml:8.45: Namespace prefix inkscape for label on circle is not defined
<circle id="circ3" inkscape:label="more"/>
^
<?xml version="1.0" encoding="UTF-8"?>
<objects>
<g id="layer3" inkscape:label="hello">
<circle id="circ2" inkscape:label="there"/>
<rect id="rect2" inkscape:label="world"/>
</g>
<g id="layer4">
<circle id="circ3" inkscape:label="more"/>
</g>
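</objects>
... but I cannot use the same trick (the expression string(../@inkscape:label), or string(../@*[local-name()='label'])) to read from the inkscape:label attribute, and I cannot tell whether this is caused by the "Namespace prefix ... is not defined" messages: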
$ xmlstarlet edit -N inkscape="http://www.inkscape.org/namespaces/inkscape" --update '//*/@id' --expr 'string(../@inkscape:label)' test.xml
test.xml:3.40: Namespace prefix inkscape for label on g is not defined
<g id="layer3" inkscape:label="hello">
^
test.xml:4.46: Namespace prefix inkscape for label on circle is not defined
<circle id="circ2" inkscape:label="there"/>
^
test.xml:5.44: Namespace prefix inkscape for label on rect is not defined
<rect id="rect2" inkscape:label="world"/>
^
test.xml:8.45: Namespace prefix inkscape for label on circle is not defined
<circle id="circ3" inkscape:label="more"/>
^
<?xml version="1.0" encoding="UTF-8"?>
<objects>
<g id="" inkscape:label="hello">
<circle id="" inkscape:label="there"/>
<rect id="" inkscape:label="world"/>
</g>
<g id="">
<circle id="" inkscape:label="more"/>
</g>
</objects>
Following "get attribute value using xmlstarlet or xmllint", I can confirm that I am able to target the id attribute with:
xmlstarlet select -N inkscape="http://www.inkscape.org/namespaces/inkscape" --template --value-of '//*/@id' test.xml
... but the corresponding command for inkscape:label returns nothing:
xmlstarlet select -N inkscape="http://www.inkscape.org/namespaces/inkscape" --template --value-of '//*/@inkscape:label' test.xml
It may be a namespace problem, but I do not understand how to ignore the namespace and simply match the attribute names as they appear in the document.
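The warnings above point at a likely cause: test.xml uses the inkscape prefix without ever declaring it, so the -N binding has no namespaced attribute to match. A minimal sketch of that idea, using only the Python standard library (which, unlike the libxml2 parser behind xmlstarlet, rejects an unbound prefix outright instead of warning). The helper names `parses` and `labels` are illustrative, and binding the prefix on the root element is an assumption about how the file could be fixed, not something the original file does:

```python
import xml.etree.ElementTree as ET

INKSCAPE_NS = "http://www.inkscape.org/namespaces/inkscape"

# Shortened test.xml from the question: the inkscape prefix is never declared.
UNDECLARED = """<objects>
  <g id="layer3" inkscape:label="hello">
    <circle id="circ2" inkscape:label="there"/>
  </g>
</objects>"""

# Same content with the prefix bound on the root element (the assumed fix).
DECLARED = UNDECLARED.replace(
    "<objects>", f'<objects xmlns:inkscape="{INKSCAPE_NS}">', 1
)


def parses(text):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def labels(text):
    """Collect every inkscape:label value in document order."""
    key = f"{{{INKSCAPE_NS}}}label"  # ElementTree's {uri}local notation
    return [el.get(key) for el in ET.fromstring(text).iter() if key in el.attrib]


print(parses(UNDECLARED))  # strict parser: unbound prefix is an error
print(labels(DECLARED))    # namespaced lookup works once the prefix is bound
```

Once the prefix is bound in the document itself, a namespace-aware lookup has something to match, which is presumably also what the xmlstarlet -N option needs.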
EDIT: I finally solved this with Python 3:
#!/usr/bin/env python3
# https://stackoverflow.com/questions/30097949/elementtree-findall-to-recursively-select-all-child-elements
# https://stackoverflow.com/questions/13372604/python-elementtree-parsing-unbound-prefix-error
# https://stackoverflow.com/questions/2352840/parsing-broken-xml-with-lxml-etree-iterparse
# https://stackoverflow.com/questions/28813876/how-do-i-get-pythons-elementtree-to-pretty-print-to-an-xml-file
import sys
import lxml
import lxml.etree
import xml.etree.ElementTree as ET
def proc_node(node):
    target_label = 'inkscape:label' # file without namespace, like `test.xml` here
    #target_label = '{http://www.inkscape.org/namespaces/inkscape}label' # file with namespace (like proper Inkscape .svg)
    if target_label in node.attrib:
        node.attrib['id'] = node.attrib[target_label]
    # Iterate over the element directly; getchildren() is deprecated.
    for childel in node:
        proc_node(childel)

# recover=True lets lxml parse the file despite the undeclared prefix
parser1 = lxml.etree.XMLParser(encoding="utf-8", recover=True)
tree1 = ET.parse('test.xml', parser1)
ET.indent(tree1, space=" ", level=0)
proc_node(tree1.getroot())
print(lxml.etree.tostring(tree1.getroot(), xml_declaration=True, pretty_print=True, encoding='UTF-8').decode('utf-8'))
... If I save this as xmlproc.py, the result is:
$ python3 xmlproc.py
<?xml version='1.0' encoding='UTF-8'?>
<objects>
<g id="hello" inkscape:label="hello">
<circle id="there" inkscape:label="there"/>
<rect id="world" inkscape:label="world"/>
</g>
<g id="layer4">
<circle id="more" inkscape:label="more"/>
</g>
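</objects>
... which is exactly what I want.
So, in the spirit of the question as asked: how do I achieve this with xmlstarlet?
Posted on 2022-10-13 22:17:04
This can be done with xmllint in three steps: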
# Step 1 - get label values into an array for elements containing both attributes
labels=( $(printf '%s\n' 'setrootns' 'cat //*[@inkscape:label and @id]/@inkscape:label' | xmllint --shell tmp.xml | sed -rne '/inkscape:label/ s/inkscape:label="(.*)"/\1/p' ) )
# Step 2 - build xpath
xpath=( 'setrootns' )
for i in "${!labels[@]}"; do
    # get current element name ;-)
    xpath[${#xpath[@]}]="xpath name((//*[@inkscape:label and @id])[$i+1])"
    xpath[${#xpath[@]}]="cd (//*[@inkscape:label and @id])[$i+1]/@id"
    xpath[${#xpath[@]}]="set ${labels[$i]}"
done
xpath[${#xpath[@]}]='save'
xpath[${#xpath[@]}]='bye'
# Step 3 - execute XPath
printf "%s\n" "${xpath[@]}" | xmllint --shell tmp.xml
The key XPath expression is the one that finds the elements carrying both attributes, where the node-set index is the labels array index + 1:
(//*[@inkscape:label and @id])[$i+1]
Script output:
/ > setrootns
/ > xpath name((//*[@inkscape:label and @id])[0+1])
Object is a string : g
/ > cd (//*[@inkscape:label and @id])[0+1]/@id
id > set hello
id > xpath name((//*[@inkscape:label and @id])[1+1])
Object is a string : circle
id > cd (//*[@inkscape:label and @id])[1+1]/@id
id > set there
id > xpath name((//*[@inkscape:label and @id])[2+1])
Object is a string : rect
id > cd (//*[@inkscape:label and @id])[2+1]/@id
id > set world
id > xpath name((//*[@inkscape:label and @id])[3+1])
Object is a string : circle
id > cd (//*[@inkscape:label and @id])[3+1]/@id
id > set more
id > save
id > bye
https://stackoverflow.com/questions/74055893
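The pairing used by the script above (zero-based array index i maps to one-based XPath position i + 1) can be sketched in Python. This is only an illustration of the indexing idea, not a port of the xmllint session: it assumes the inkscape prefix is declared on the root element, it uses a document-order list in place of the node-set (//*[@inkscape:label and @id]), and the function name `copy_labels_to_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

INKSCAPE_NS = "http://www.inkscape.org/namespaces/inkscape"
LABEL = f"{{{INKSCAPE_NS}}}label"  # ElementTree's {uri}local notation

# The question's sample, with the prefix declared (an assumption).
SAMPLE = f"""<objects xmlns:inkscape="{INKSCAPE_NS}">
  <g id="layer3" inkscape:label="hello">
    <circle id="circ2" inkscape:label="there"/>
    <rect id="rect2" inkscape:label="world"/>
  </g>
  <g id="layer4">
    <circle id="circ3" inkscape:label="more"/>
  </g>
</objects>"""


def copy_labels_to_ids(root):
    """Copy each label into the id of the same element; return the steps taken."""
    # Document-order list of elements carrying both attributes; it stands in
    # for the node-set (//*[@inkscape:label and @id]).
    matches = [el for el in root.iter() if LABEL in el.attrib and "id" in el.attrib]
    # Step 1 of the script: collect the label values.
    labels = [el.get(LABEL) for el in matches]
    # Steps 2 and 3: list index i corresponds to XPath position i + 1.
    steps = []
    for i, label in enumerate(labels):
        target = matches[i]  # XPath equivalent: (...)[i + 1]
        target.set("id", label)
        steps.append((i + 1, target.tag, label))
    return steps


root = ET.fromstring(SAMPLE)
for position, tag, label in copy_labels_to_ids(root):
    print(position, tag, label)
```

Because the labels and the targets are taken from the same document-order selection, the i-th label always lands on the element it came from, which is the property the shell script relies on.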