regex - strsplit by parentheses
Suppose I have a string "a b c (123-456-789)". I'm wondering what's the best way to retrieve "123-456-789" from it.
strsplit("a b c (123-456-789)", "\\(")
[[1]]
[1] "a b c " "123-456-789)"
If you want to extract the digits and - between the parentheses, one option is str_extract. If there are multiple matches within a string, use str_extract_all.
library(stringr)
str_extract(str1, '(?<=\\()[0-9-]+(?=\\))')
#[1] "123-456-789"
str_extract_all(str2, '(?<=\\()[0-9-]+(?=\\))')

In the above code, we use regex lookarounds to extract the numbers and -. The positive lookbehind part, (?<=\\()[0-9-]+, matches numbers along with - ([0-9-]+) in (123-456-789 but not in 123-456-789. The lookahead part, [0-9-]+(?=\\)), matches numbers along with - in 123-456-789) but not in 123-456-789. Taken together, the pattern matches only the cases that satisfy both conditions, such as (123-456-789), and extracts what lies between the lookarounds. It does not match cases like (123-456-789 or 123-456-789).
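The effect of the two lookarounds can be checked without stringr. The following is a minimal base R sketch, assuming the same pattern is passed with perl = TRUE (needed for lookbehind in base R); the three input strings are made up for illustration.

```r
# Same lookaround pattern as above; perl = TRUE enables lookbehind/lookahead.
pattern <- "(?<=\\()[0-9-]+(?=\\))"

# Hypothetical inputs: both parentheses, opening only, closing only.
inputs <- c("(123-456-789)", "(123-456-789", "123-456-789)")

# regexpr() returns -1 where nothing matches; regmatches() drops those elements.
pos <- regexpr(pattern, inputs, perl = TRUE)
hits <- regmatches(inputs, pos)
matched <- as.vector(pos > 0)

print(hits)     # only the fully parenthesised input yields a match
print(matched)  # TRUE FALSE FALSE
```

Only the first input satisfies both conditions, so it is the only one that produces a match.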
With strsplit, we can specify the split as [()]. We keep the () inside square brackets [] so that they are treated as literal characters; otherwise we have to escape the parentheses ('\\(|\\)').
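As a quick check that the two ways of writing the split pattern behave the same, here is a small base R sketch. The variable names are illustrative, and the string is the one from the question.

```r
x <- "a b c (123-456-789)"  # example string from the question

# Character class: ( and ) are literal inside [].
by_class <- strsplit(x, "[()]")[[1]]

# Escaped alternation: each parenthesis needs a doubled backslash in an R string.
by_escape <- strsplit(x, "\\(|\\)")[[1]]

print(by_class)
print(identical(by_class, by_escape))
```

Note that the first piece keeps its trailing space ("a b c "), and strsplit does not return an empty string for the match at the very end of the input.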
strsplit(str1, '[()]')[[1]][2]
#[1] "123-456-789"

If there are multiple substrings to extract from a string, we can loop over the split results with lapply and keep the parts that contain digits with grep.
lapply(strsplit(str2, '[()]'), function(x) grep('\\d', x, value=TRUE))

Or we can use stri_split_regex from stringi, which has an option to remove empty strings (omit_empty=TRUE).
library(stringi)
stri_split_regex(str1, '[()a-z ]', omit_empty=TRUE)[[1]]
#[1] "123-456-789"
stri_split_regex(str2, '[()a-z ]', omit_empty=TRUE)

Another option is rm_round from qdapRegex, if we are interested in extracting the contents inside the brackets.
library(qdapRegex)
rm_round(str1, extract=TRUE)[[1]]
#[1] "123-456-789"
rm_round(str2, extract=TRUE)

Data
str1 <- "a b c (123-456-789)"
str2 <- c("a b c (123-425-478) a", "abc(123-423-428)", "(123-423-498) abcdd",
          "(123-432-423)", "abc (123-423-389) gr (124-233-848) ak")
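For completeness, the same extraction can be sketched in base R alone, without extra packages. This assumes regmatches with gregexpr and perl = TRUE as a stand-in for str_extract_all. The first two strings are taken from str2; the third is a made-up input with no match.

```r
# Two elements from str2 plus one hypothetical string with nothing to extract.
strings <- c("abc(123-423-428)",
             "abc (123-423-389) gr (124-233-848) ak",
             "no digits here")

# Lookaround pattern: digits and hyphens enclosed by ( and ).
pattern <- "(?<=\\()[0-9-]+(?=\\))"

# gregexpr() finds every match per element; regmatches() returns a list.
all_hits <- regmatches(strings, gregexpr(pattern, strings, perl = TRUE))
print(all_hits)
```

Each list element holds all matches for the corresponding string, and a string with no match gives character(0).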