regex - strsplit by parentheses
Suppose I have a string "a b c (123-456-789)". I'm wondering what's the best way to retrieve "123-456-789" from it.
strsplit("a b c (123-456-789)", "\\(")
#[[1]]
#[1] "a b c "       "123-456-789)"
If you want to extract the digits and - between the parentheses, one option is str_extract. If there are multiple patterns within a string, use str_extract_all.
library(stringr)
str_extract(str1, '(?<=\\()[0-9-]+(?=\\))')
#[1] "123-456-789"
str_extract_all(str2, '(?<=\\()[0-9-]+(?=\\))')
In the above code, we are using regex lookarounds to extract the numbers and -. The positive lookbehind (?<=\\()[0-9-]+ matches the numbers along with - ([0-9-]+) in (123-456-789, but not in 123-456-789. The lookahead [0-9-]+(?=\\)) matches the numbers along with - in 123-456-789), but not in 123-456-789. Taken together, the pattern matches only the cases that satisfy both conditions, such as (123-456-789), and extracts what lies between the lookarounds. It does not match cases like (123-456-789 or 123-456-789).
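To check the behaviour of the two lookarounds without loading stringr, here is a minimal sketch in base R. It swaps in regexpr and regmatches with perl = TRUE (base R needs the PCRE engine for lookarounds) in place of str_extract. The vector x and the names found and hits are made up for illustration.

```r
# Hypothetical test strings: only the first one has both parentheses
x <- c("a (123-456-789) b", "a (123-456-789 b", "a 123-456-789) b")

# Same pattern as above; lookarounds require perl = TRUE in base R
pattern <- "(?<=\\()[0-9-]+(?=\\))"
m <- regexpr(pattern, x, perl = TRUE)

# regexpr returns -1 where there is no match; keep NA for those elements
found <- as.vector(m) != -1
hits <- rep(NA_character_, length(x))
hits[found] <- regmatches(x, m)
hits
```

Only the first element should yield a match, since the other two satisfy just one of the two lookaround conditions.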
With strsplit, we can specify the split as [()]. We keep the () inside the square brackets [] so that they are treated as literal characters; otherwise we have to escape the parentheses ('\\(|\\)').
strsplit(str1, '[()]')[[1]][2]
#[1] "123-456-789"
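As a quick check that the character class and the escaped alternation split the same way, here is a small sketch in base R. The names parts_class and parts_escape are only for illustration.

```r
str1 <- "a b c (123-456-789)"

# Parentheses inside [] are literal, so no escaping is needed
parts_class <- strsplit(str1, "[()]")[[1]]

# Outside a character class each parenthesis must be escaped
parts_escape <- strsplit(str1, "\\(|\\)")[[1]]

# Both approaches should give the same pieces
same <- identical(parts_class, parts_escape)
same
parts_class[2]
```

Note that the first piece keeps the space before the opening parenthesis ("a b c "), which is why the second element is the one picked out.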
If there are multiple substrings to extract from a string, we can loop over the split results with lapply and extract the numeric parts with grep.
lapply(strsplit(str2, '[()]'), function(x) grep('\\d', x, value=TRUE))
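A small self-contained run of the same idea, using two of the strings from the data section at the end (the vector name s is made up here), shows what the list looks like when one string has a single match and another has two:

```r
# Two of the example strings: one parenthesised group, then two groups
s <- c("abc(123-423-428)", "abc (123-423-389) gr (124-233-848) ak")

# Split on either parenthesis, then keep only the pieces containing a digit
res <- lapply(strsplit(s, "[()]"), function(x) grep("\\d", x, value = TRUE))
res

# Number of extracted substrings per input string
lengths(res)
```

The result is a list with one character vector per input string, so strings with several parenthesised groups keep all of their matches.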
Or we can use stri_split_regex from stringi, which has an option to remove empty strings (omit_empty=TRUE).
library(stringi)
stri_split_regex(str1, '[()a-z ]', omit_empty=TRUE)[[1]]
#[1] "123-456-789"
stri_split_regex(str2, '[()a-z ]', omit_empty=TRUE)
Another option is rm_round from qdapRegex, if we are interested in extracting the contents inside the brackets.
library(qdapRegex)
rm_round(str1, extract=TRUE)[[1]]
#[1] "123-456-789"
rm_round(str2, extract=TRUE)
data
str1 <- "a b c (123-456-789)"
str2 <- c("a b c (123-425-478) a", "abc(123-423-428)", "(123-423-498) abcdd",
          "(123-432-423)", "abc (123-423-389) gr (124-233-848) ak")