python - Pyparsing: Parse nested, typed parameter list with nestedExpr
I have a typed, optionally nested parameter list to parse.

Input:

    (int:1, float:3, list:(float:4, int:5))

Expected dump:

    [[['int', '1'], ['float', '3'], ['list', [['float', '4'], ['int', '5']]]]]

If the type is omitted, a default type should be chosen depending on the value that follows:

Input:

    (1, float:3, (4, int:5))

Expected dump:

    [[['str', '1'], ['float', '3'], ['tuple', [['str', '4'], ['int', '5']]]]]

As you might expect, I use the types in a parse action to transform the values automatically during parsing. That step works, so I skip it here.
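To pin down the defaulting rule independently of pyparsing, here is a minimal hand-written recursive-descent sketch. It is not the pyparsing grammar from the question; the names `TYPE_NAMES`, `TOKEN_RE` and `parse_typed_list` are made up for illustration, and it assumes well-formed input without backslash escapes.

```python
import re

# Type names taken from the question's cast table.
TYPE_NAMES = ("str", "int", "float", "tuple", "list", "set", "dict")
# A token is a parenthesis, comma, colon, or a run of other non-space characters.
TOKEN_RE = re.compile(r"[(),:]|[^\s(),:]+")


def parse_typed_list(text):
    """Return nested [type, value] pairs.

    Untyped scalars default to 'str', untyped parenthesised groups to 'tuple'.
    Malformed input is not handled gracefully (it raises IndexError or
    AssertionError).
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_sequence():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        items = []
        while tokens[pos] != ")":
            items.append(parse_item())
            if tokens[pos] == ",":
                pos += 1  # skip the separator
        pos += 1  # consume ")"
        return items

    def parse_item():
        nonlocal pos
        kind = None
        # An explicit type is a known name followed directly by ":".
        if tokens[pos] in TYPE_NAMES and tokens[pos + 1] == ":":
            kind = tokens[pos]
            pos += 2
        if tokens[pos] == "(":
            return [kind or "tuple", parse_sequence()]
        value = tokens[pos]
        pos += 1
        return [kind or "str", value]

    return [parse_sequence()]


print(parse_typed_list("(1, float:3, (4, int:5))"))
# → [[['str', '1'], ['float', '3'], ['tuple', [['str', '4'], ['int', '5']]]]]
```

The point of the sketch is that the default is decided by looking at the next token: an opening parenthesis means `tuple`, anything else means `str`.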
My approach to the problem is:
    import pyparsing as pp

    # Maps a type name to the function that casts the raw value.
    dicasttypes = {
        "str": lambda value: value,
        "int": lambda value: int(value),
        "float": lambda value: float(value),
        "tuple": lambda value: tuple(value),
        "list": lambda value: list(value),
        "set": lambda value: set(value),
        "dict": lambda value: dict(value),
    }

    # A backslash-escaped expression; the backslash itself is dropped.
    bsquoted = lambda expr: pp.Literal('\\').suppress() + expr

    def parsingstring(specialsigns='', printables=pp.printables):
        # Special signs (plus the backslash) may only appear escaped.
        sespecialsigns = set(specialsigns).union(set('\\'))
        signs = ''.join(sorted(set(printables).difference(sespecialsigns)))
        allowedliterals = (
            pp.Literal(r"\t").setParseAction(lambda: "\t") |
            pp.Literal(r"\ ").setParseAction(lambda: " ") |
            pp.Literal(r"\n").setParseAction(lambda: "\n") |
            pp.Word(signs) |
            bsquoted('"') |
            bsquoted("'")
        )
        for special in sespecialsigns:
            allowedliterals = allowedliterals | bsquoted(special)
        return pp.Combine(pp.OneOrMore(allowedliterals))

    value = parsingstring('(),=:')
    nestedvalue = pp.Forward()
    # Optional "type:" prefix; defaults to "str" for scalars, "tuple" for sequences.
    castpattern = pp.Optional(pp.oneOf(list(dicasttypes.keys())) + pp.Literal(":").suppress(), "str")("casttype")
    castpatternseq = pp.Optional(pp.oneOf(list(dicasttypes.keys())) + pp.Literal(":").suppress(), "tuple")("casttype")
    parametervalue = pp.nestedExpr(content=(
        pp.Group(
            (castpattern + value("rawvalue")) |
            (castpatternseq + nestedvalue)
        ) | pp.Literal(',').suppress()
    ))
    nestedvalue <<= parametervalue

This implementation basically works, but I have a serious problem with the default values of nested sequences:
    parametervalue.parseString('(int:1, float:3, list:(float:4, int:5))').dump()
    "[[['int', '1'], ['float', '3'], ['list', [['float', '4'], ['int', '5']]]]]"

    parametervalue.parseString('(1, float:3, (4, int:5))').dump()
    "[[['str', '1'], ['float', '3'], [['str', '4'], ['int', '5']]]]"

As you can see, the expected default type tuple is not set for the nested sequence, and the depth of the result list is not correct. I guess nestedExpr() catches the pattern (4, int:5) before it reaches the parser (castpatternseq + nestedvalue). This problem is serious for me, because I plan to call a parse action inside the nestedExpr pattern:
    (castpattern + value("rawvalue")).setParseAction(castparameter) |
    (castpatternseq + nestedvalue).setParseAction(castparameter)

This works well if the type is given explicitly, but of course fails otherwise.
Is there any way to make nestedExpr a little less greedy?
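For context, the casting step that the parse action is meant to perform can be sketched in plain Python. The question does not show `castparameter`, so `cast_items` below is a hypothetical stand-in, and `CAST_TYPES` only mirrors the question's cast table. The sketch assumes the parse result has already been converted to plain nested lists (for example with `asList()`) and that every item carries a type label.

```python
# Hypothetical stand-in for the question's cast table.
CAST_TYPES = {
    "str": str,
    "int": int,
    "float": float,
    "tuple": tuple,
    "list": list,
    "set": set,
    "dict": dict,
}


def cast_items(items):
    """Cast a list of [type, value] pairs, recursing into nested sequences."""
    result = []
    for kind, payload in items:
        if isinstance(payload, list):
            # Cast the children first, then build the container from them.
            payload = cast_items(payload)
        result.append(CAST_TYPES[kind](payload))
    return result


parsed = [[["str", "1"], ["float", "3"], ["tuple", [["str", "4"], ["int", "5"]]]]]
print(cast_items(parsed[0]))
# → ['1', 3.0, ('4', 5)]
```

This is why the missing default matters: an unlabelled nested group has no `kind` to look up, so the two-element unpacking above would fail on it.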
Update 1

Hi guys. After wasting a whole day yesterday, I found a solution to the problem described above this morning.

I added delimitedList to the implementation:
    value = parsingstring('(),=:')
    nestedvalue = pp.Forward()
    castpattern = pp.Optional(pp.oneOf(list(dicasttypes.keys())) + pp.Literal(":").suppress(), "str")("casttype")
    castpatternseq = pp.Optional(pp.oneOf(list(dicasttypes.keys())) + pp.Literal(":").suppress(), "tuple")("casttype")
    # The content is now a comma-separated list of grouped items.
    parametervalue = pp.nestedExpr(content=pp.delimitedList(
        pp.Group(
            (castpattern + value("rawvalue")) |
            (castpatternseq + nestedvalue)
        )
    ))
    nestedvalue <<= parametervalue

This works well, but not perfectly, as you can see in the following examples:
    # Example 1
    parametervalue.parseString('(int:1, (int:2, int:4))').dump()
    "[[['int', '1'], ['tuple', [['int', '2'], ['int', '4']]]]]"

    # Example 2
    parametervalue.parseString('(int:1, ((int:2, int:4), (int:6, int9)) )').dump()
    pyparsing.ParseException: Expected ")" (at char 6), (line:1, col:7)

    # Example 3
    parametervalue.parseString('(int:1, ((int:2, int:4) (int:6, int:9)) )').dump()
    "[[['int', '1'], ['tuple', [[['int', '2'], ['int', '4']], [['int', '6'], ['int', '9']]]]]]"

    # Example 4
    parametervalue.parseString('(int:1, (tuple:(int:2, int:4) tuple:(int:6, int9)) )').dump()
    "[[['int', '1'], ['tuple', [['tuple', [['int', '2'], ['int', '4']]], ['tuple', [['int', '6'], ['str', 'int9']]]]]]]"

The exception in example 2 makes me think that nestedExpr and delimitedList don't work well together and that they are competing for the same pattern. Whatever the reason might be, that seems to be the problem, because if I omit the , in example 3, delimitedList has nothing to catch and the whole pattern matches. The result is still not as expected, because the default types are missing again. Without the , and with explicit types (example 4), parsing works well.
Any ideas?
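One possible workaround, rather than a fix to the grammar itself, is to post-process the parsed lists and label any nested group that arrived without a type. The sketch below is plain Python; `add_default_types` is a hypothetical helper, and it assumes the result has been converted to nested lists in which a typed item is a two-element list starting with the type name.

```python
def add_default_types(items, default="tuple"):
    """Label untyped nested groups in a list of parsed items.

    A typed item looks like [type_name, payload]; an untyped group is a
    list whose first element is itself a list (or an empty list).
    """
    fixed = []
    for item in items:
        if item and isinstance(item[0], str):
            kind, payload = item
            if isinstance(payload, list):
                payload = add_default_types(payload, default)
            fixed.append([kind, payload])
        else:
            # No type label was captured, so fall back to the default.
            fixed.append([default, add_default_types(item, default)])
    return fixed


# The unlabelled output from example 3 (comma omitted between the groups).
parsed = [[['int', '1'],
           ['tuple', [[['int', '2'], ['int', '4']],
                      [['int', '6'], ['int', '9']]]]]]
print(add_default_types(parsed[0]))
# → [['int', '1'], ['tuple', [['tuple', [['int', '2'], ['int', '4']]], ['tuple', [['int', '6'], ['int', '9']]]]]]
```

This only repairs the labels after the fact; it does not help if the cast is supposed to run as a parse action during parsing.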
Update 2

The problem that the statement

    parametervalue.parseString('(int:1, ((int:2, int:4), (int:6, int9)) )').dump()

raises an exception can be solved by altering the implementation a little (although it seems more of a hack than a solution). I've added the expression | pp.Literal(",").suppress():
    value = parsingstring('(),=:')
    nestedvalue = pp.Forward()
    castpattern = pp.Optional(pp.oneOf(list(dicasttypes.keys())) + pp.Literal(":").suppress(), "str")("casttype")
    castpatternseq = pp.Optional(pp.oneOf(list(dicasttypes.keys())) + pp.Literal(":").suppress(), "tuple")("casttype")
    # Commas left over between nested groups are matched and dropped.
    parametervalue = pp.nestedExpr(content=pp.delimitedList(
        pp.Group(
            (castpattern + value("rawvalue")) |
            (castpatternseq + nestedvalue)
        )
    ) | pp.Literal(",").suppress())
    nestedvalue <<= parametervalue