Python - Retaining split characters
I have the following data:
    <http://dbpedia.org/data/plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerinfo> _:header16125770191335188966549 <http://dbpedia.org/data/plasmodium_hegneri.xml> .
    _:header16125770191335188966549 <http://www.w3.org/2006/http#responsecode> "200"^^<http://www.w3.org/2001/xmlschema#integer> <http://dbpedia.org/data/plasmodium_hegneri.xml> .
    _:header16125770191335188966549 <http://www.w3.org/2006/http#date> "mon, 23 apr 2012 13:49:27 gmt" <http://dbpedia.org/data/plasmodium_hegneri.xml> .
    _:header16125770191335188966549 <http://www.w3.org/2006/http#content-type> "application/rdf+xml; charset=utf-8" <http://dbpedia.org/data/plasmodium_hegneri.xml> .

Now I want to transform this data into the following form, such that the last string enclosed in < > is moved onto its own line before the line in which it appears, with #@ added in front of it.
    #@ <http://dbpedia.org/data/plasmodium_hegneri.xml>
    <http://dbpedia.org/data/plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerinfo> _:header16125770191335188966549 .
    #@ <http://dbpedia.org/data/plasmodium_hegneri.xml>
    _:header16125770191335188966549 <http://www.w3.org/2006/http#responsecode> "200"^^<http://www.w3.org/2001/xmlschema#integer> .
    #@ <http://dbpedia.org/data/plasmodium_hegneri.xml>
    _:header16125770191335188966549 <http://www.w3.org/2006/http#date> "mon, 23 apr 2012 13:49:27 gmt" .
    #@ <http://dbpedia.org/data/plasmodium_hegneri.xml>
    _:header16125770191335188966549 <http://www.w3.org/2006/http#content-type> "application/rdf+xml; charset=utf-8" .

I wrote the following Python code to do this:
    infile = open('testnq.nq', 'r')
    outfile = open('outfile.ttl', 'w')
    while True:
        infileline1 = infile.readline()
        if not infileline1:
            break  # EOF
        splitstring = infileline1.split(' ')
        # The graph URI is the second-to-last token (the last one is the '.').
        line1 = "#@ " + splitstring[len(splitstring)-2]
        outfile.write(line1)
        line2 = ""
        # Re-join the remaining tokens (this is where the spaces get lost).
        for num in range(0, len(splitstring)-2):
            line2 = line2 + splitstring[num]
        outfile.write(line2)
    outfile.close()

But I am not able to get the spaces in the desired places. Can someone please suggest how I can do this in Python or using Linux commands?
At the risk of using a regular expression and complicating things, this may work:
    import re

    # before: everything up to the last <...>; match: that last <...>; after: the rest.
    # The \s* swallows the space after the match so only one space precedes the final '.'.
    pattern = r'^(?P<before>.*)(?P<match><[^>]+>)\s*(?P<after>[^<]*)$'
    replacement = r'#@ \g<match>\n\g<before>\g<after>'

    line = """<http://dbpedia.org/data/plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerinfo> _:header16125770191335188966549 <http://dbpedia.org/data/plasmodium_hegneri.xml> ."""
    print(re.sub(pattern, replacement, line))

    line = """_:header16125770191335188966549 <http://www.w3.org/2006/http#responsecode> "200"^^<http://www.w3.org/2001/xmlschema#integer> <http://dbpedia.org/data/plasmodium_hegneri.xml> ."""
    print(re.sub(pattern, replacement, line))

which outputs:
    #@ <http://dbpedia.org/data/plasmodium_hegneri.xml>
    <http://dbpedia.org/data/plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerinfo> _:header16125770191335188966549 .
    #@ <http://dbpedia.org/data/plasmodium_hegneri.xml>
    _:header16125770191335188966549 <http://www.w3.org/2006/http#responsecode> "200"^^<http://www.w3.org/2001/xmlschema#integer> .
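If a regular expression feels like too much, the same rearrangement can probably be done with plain string methods. The sketch below assumes every statement line ends with " ." and that the token just before it is the < > string to move; `move_graph_first` and `convert` are hypothetical helper names, not part of any library. Splitting from the right with `rsplit` leaves spaces inside quoted literals untouched, which is where splitting on every space and re-joining goes wrong.

```python
def move_graph_first(line):
    """Return '#@ <graph>' on one line and the remaining triple on the next.

    Assumes the line looks like: subject predicate object <graph> .
    Lines that do not fit that shape are returned unchanged.
    """
    body = line.strip()
    if not body.endswith('.'):
        return line  # blank line or something else: leave it alone
    # Drop the final '.', then split off only the last whitespace-separated token.
    parts = body[:-1].rstrip().rsplit(None, 1)
    if len(parts) != 2:
        return line
    triple, graph = parts
    return "#@ {}\n{} .\n".format(graph, triple)


def convert(in_path, out_path):
    """Rewrite every line of in_path into out_path (paths supplied by the caller)."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            dst.write(move_graph_first(line))
```

With the file names from the question, this would be called as `convert('testnq.nq', 'outfile.ttl')`.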