python - Writing with lxml emitting no whitespace even when pretty_print=True
I'm using the lxml library to read an XML template, insert or change some elements, and save the resulting XML. One of the elements I'm creating on the fly using the etree.Element and etree.SubElement methods:
    tree = etree.parse(r'xml_archive\templates\metadata_template_pts.xml')
    root = tree.getroot()
    stream = []
    for element in root.iter():
        if isinstance(element.tag, basestring):
            stream.append(element.tag)
            # Find the "keywords" element and insert a new "theme" element
            if element.tag == 'keywords' and 'theme' not in stream:
                theme = etree.Element('theme')
                themekt = etree.SubElement(theme, 'themekt').text = 'none'
                for tk in themekeys:
                    themekey = etree.SubElement(theme, 'themekey').text = tk
                element.insert(0, theme)

This prints to the screen nicely with print etree.tostring(theme, pretty_print=True):
    <theme>
      <themekt>none</themekt>
      <themekey>hydrogeology</themekey>
      <themekey>stratigraphy</themekey>
      <themekey>floridan aquifer system</themekey>
      <themekey>geology</themekey>
      <themekey>regional groundwater availability study</themekey>
      <themekey>usgs</themekey>
      <themekey>united states geological survey</themekey>
      <themekey>thickness</themekey>
      <themekey>altitude</themekey>
      <themekey>extent</themekey>
      <themekey>regions</themekey>
      <themekey>upper confining unit</themekey>
      <themekey>fas</themekey>
      <themekey>base</themekey>
      <themekey>geologic units</themekey>
      <themekey>geology</themekey>
      <themekey>extent</themekey>
      <themekey>inlandwaters</themekey>
    </theme>

However, when I use etree.ElementTree(root).write(out_xml_file, method='xml', pretty_print=True) to write out the XML, this element gets flattened in the output file:
    <theme><themekt>none</themekt><themekey>hydrogeology</themekey><themekey>stratigraphy</themekey><themekey>floridan aquifer system</themekey><themekey>geology</themekey><themekey>regional groundwater availability study</themekey><themekey>usgs</themekey><themekey>united states geological survey</themekey><themekey>thickness</themekey><themekey>altitude</themekey><themekey>extent</themekey><themekey>regions</themekey><themekey>upper confining unit</themekey><themekey>fas</themekey><themekey>base</themekey><themekey>geologic units</themekey><themekey>geology</themekey><themekey>extent</themekey><themekey>inlandwaters</themekey></theme>

The rest of the file is written nicely, but this particular element is causing (purely aesthetic) trouble. Any ideas about what I'm doing wrong?
Below is a snippet of markup from the template XML file (save it as "template.xml" to run it with the code snippet at the bottom). The flattening of tags only occurs when I parse an existing file and insert a new element, not when the XML is created from scratch using lxml.
    <?xml version="1.0" encoding="utf-8"?>
    <?xml-stylesheet type="text/xsl" href="fgdc_classic.xsl"?>
    <metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://water.usgs.gov/gis/metadata/usgswrd/fgdc-std-001-1998.xsd">
      <keywords>
        <theme>
          <themekt>iso 19115 topic categories</themekt>
          <themekey>environment</themekey>
          <themekey>geoscientificinformation</themekey>
          <themekey>inlandwaters</themekey>
        </theme>
        <place>
          <placekt>none</placekt>
          <placekey>florida</placekey>
          <placekey>georgia</placekey>
          <placekey>alabama</placekey>
          <placekey>south carolina</placekey>
        </place>
      </keywords>
    </metadata>

Below is a snippet of the code used with the snippet of markup (above):
    from lxml import etree

    # Create a new theme element to insert into the root
    themekeys = ['hydrogeology', 'stratigraphy', 'inlandwaters']
    tree = etree.parse(r'template.xml')
    root = tree.getroot()
    stream = []
    for element in root.iter():
        if isinstance(element.tag, basestring):
            stream.append(element.tag)
            # Edit the theme keywords
            if element.tag == 'keywords':
                theme = etree.Element('theme')
                themekt = etree.SubElement(theme, 'themekt').text = 'none'
                for tk in themekeys:
                    themekey = etree.SubElement(theme, 'themekey').text = tk
                element.insert(0, theme)

    # Write the XML to a new file
    out_xml_file = 'test.xml'
    etree.ElementTree(root).write(out_xml_file, method='xml', pretty_print=True)
    # Re-open the file and prepend the XML declaration
    with open(out_xml_file, 'r') as f:
        lines = f.readlines()
    with open(out_xml_file, 'w') as f:
        f.write('<?xml version="1.0" encoding="utf-8"?>\n')
        for line in lines:
            f.write(line)
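If you replace this line:

    tree = etree.parse(r'template.xml')

with these lines:

    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(r'template.xml', parser)

then it will work as expected. The trick is to use an XMLParser that has the remove_blank_text option set to True. Existing ignorable whitespace (the indentation and line breaks between tags in the template) is then dropped at parse time. Without that option, the whitespace is kept as text in the tree, the serializer leaves elements that already contain such text untouched, and the newly inserted element comes out flat. With the whitespace removed, it no longer disrupts the subsequent pretty-printing.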
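To see the difference side by side, here is a minimal Python 3 sketch that runs the same insertion twice, once with the default parser and once with remove_blank_text=True. The inline TEMPLATE string and the insert_theme helper are made up for this illustration and are not part of the original question; the template is a cut-down stand-in for template.xml.

```python
from io import BytesIO

from lxml import etree

# Hypothetical cut-down template, standing in for template.xml.
TEMPLATE = b"""<metadata>
  <keywords>
    <place>
      <placekt>none</placekt>
      <placekey>florida</placekey>
    </place>
  </keywords>
</metadata>
"""


def insert_theme(parser=None):
    """Parse TEMPLATE, insert a new <theme> under <keywords>, and serialize."""
    tree = etree.parse(BytesIO(TEMPLATE), parser)
    keywords = tree.getroot().find('keywords')

    theme = etree.Element('theme')
    etree.SubElement(theme, 'themekt').text = 'none'
    for key in ('hydrogeology', 'stratigraphy'):
        etree.SubElement(theme, 'themekey').text = key
    keywords.insert(0, theme)

    return etree.tostring(tree, pretty_print=True, encoding='unicode')


# Default parser: the template's indentation is kept as text nodes,
# so the inserted element is not re-indented.
default_output = insert_theme()

# Blank text removed at parse time: the whole tree can be re-indented.
stripped_output = insert_theme(etree.XMLParser(remove_blank_text=True))

print(default_output)
print(stripped_output)
```

In the first printout the new theme element should appear on a single line, as in the question; in the second, every element sits on its own indented line.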