Remove whitespace after +44 and quotation marks around lat/long in JSON string with regex
In the example snippet below I have JSON that needs to be edited (over 1400 entries). I need to achieve 2 things:
In the example line "phone": "+44 2079693900", I need to remove the whitespace between +44 and 2079693900 in all records, resulting in: "+442079693900"
For latitude and longitude I need to get rid of the double quotes around the numbers, because the API I am using accepts these values as floats. Example: "latitude": "51.51736", needs to be: "latitude": 51.51736
I am familiar with Ruby and have done some parsing of JSON in the past, but I thought regex would be the best tool to use for this kind of basic data cleaning task. I have referred to regex101.com and regular-expressions.info but I'm pretty stuck at this point. Thanks in advance!
[
  { "id": "101756", "name": "1 lombard street", "email": "reception@1lombardstreet.com", "website": "http://www.1lombardstreet.com",
    "location": { "latitude": "51.5129", "longitude": "-0.089",
      "address": { "line1": "1 lombard street", "line2": "", "line3": "", "postcode": "ec3v 9aa", "city": "london", "country": "uk" } } },
  { "id": "105371", "name": "108 brasserie", "phone": "+44 2079693900", "email": "enquiries@108marylebonelane.com", "website": "http://www.108brasserie.com",
    "location": { "latitude": "51.51795", "longitude": "-0.15079",
      "address": { "line1": "108 marylebone lane", "line2": "", "line3": "", "postcode": "w1u 2qe", "city": "london", "country": "uk" } } },
  { "id": "108701", "name": "1901 restaurant", "phone": "+44 2076187000", "email": "london.restres@andaz.com", "website": "http://www.andazdining.com",
    "location": { "latitude": "51.51736", "longitude": "-0.08123",
      "address": { "line1": "andaz hotel", "line2": "40 liverpool street", "line3": "", "postcode": "ec2m 7qn", "city": "london", "country": "uk" } } },
  { "id": "102190", "name": "2 bridge place", "phone": "+44 2078028555", "email": "fb@dtlondonvictoria.com", "website": "http://crimsonhotels.comdoubletreelondonvictoriadiningpre-theatre-dining",
    "location": { "latitude": "51.49396", "longitude": "-0.14343",
      "address": { "line1": "2 bridge place", "line2": "victoria", "line3": "", "postcode": "sw1v 1qa", "city": "london", "country": "uk" } } },
  { "id": "102063", "name": "2 veneti", "phone": "+44 2076370789", "email": "2veneti@btconnect.com", "website": "http://www.2veneti.com",
    "location": { "latitude": "51.5168", "longitude": "-0.14673",
      "address": { "line1": "10 wigmore street", "line2": "", "line3": "", "postcode": "w1u 2rd", "city": "london", "country": "uk" } } },
You can use the following regex:
("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"
with the following replacement:
$1$2$3
The idea is to capture what we want to keep and not capture what we do not, and then use backreferences to restore the substrings we want to keep.
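Since the question mentions Ruby, here is a minimal sketch of applying this pattern with String#gsub. The sample string and the variable names are made up for illustration. Note that Ruby writes replacement backreferences as \1\2\3 rather than $1$2$3, and a group that did not take part in the match is substituted as an empty string.

```ruby
require 'json'

# Hypothetical one-record sample; the real input has over 1400 entries.
raw = '{ "phone": "+44 2079693900", "location": { "latitude": "51.51795", "longitude": "-0.15079" } }'

# Same pattern as in the answer, written as a Ruby regex literal.
pattern = /("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"/

# Single quotes keep \1\2\3 as backreferences instead of string escapes.
cleaned = raw.gsub(pattern, '\1\2\3')
puts cleaned

# Parsing the result confirms the coordinates are now numbers, not strings.
record = JSON.parse(cleaned)
puts record['location']['latitude'].class
```

For the full file, the same gsub call could be run on the contents read with File.read and the result written back out; it is worth parsing the output once, as above, to check that the cleaned text is still valid JSON.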
Regex explanation:
The pattern contains 2 alternatives joined with the | alternation operator:
("phone":\s*"\+44)\s+
    ("phone":\s*"\+44) - 1st capturing group matching a literal "phone":, then optional whitespace, an opening " and +44 literally
    \s+ - 1 or more whitespace characters that we'll remove
("(?:latitude|longitude)":\s*)"([^"]+)"
    ("(?:latitude|longitude)":\s*) - 2nd capturing group matching "latitude": or "longitude": followed by 0 or more whitespace characters
    " - a literal " that we'll drop
    ([^"]+) - 3rd capturing group matching 1 or more characters other than " (we'll keep that)
    " - again, a literal " that we'll drop.
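To see the two alternatives at work, the sketch below matches each kind of field separately and prints the captures (the sample strings are illustrative only). Only the groups belonging to the alternative that matched are set; the others are nil, which is why a single replacement that concatenates all three groups keeps exactly the parts we want.

```ruby
pattern = /("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"/

# First alternative: only group 1 participates in the match.
phone_caps = pattern.match('"phone": "+44 2079693900"').captures

# Second alternative: only groups 2 and 3 participate in the match.
lat_caps = pattern.match('"latitude": "51.51736"').captures

p phone_caps
p lat_caps
```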