R - Remove duplicate rows

March 15, 2014

This question has already been answered here, but I cannot make it work.

I have the data frame below and want to remove duplicate rows based on 'symbol'. The 'call' column decides which duplicate to keep, with priority p > a > m: if a symbol has rows with p, a and m, keep the p row; if it has a and m, keep the a row; otherwise keep m.

   symbol  intensity call
 1   ddr1  596.95050    p
 2   rfc2  420.28708    p
 3  hspa6  510.73254    p
 4   ddr1 1717.99487    a
 5 guca1a  121.53488    a
 6   uba7 1810.49780    p
 7   uba7  301.51944    m
 8 guca1a   34.53987    a
 9   ccl5 5966.24609    p
10 cyp2e1   95.15707    a
11 cyp2e1  164.95276    m
12  esrra 1024.88745    p
13 cyp2a6  502.48877    a
14   gas6  921.70923    p
15  mmp14  524.96863    a
16   gas6 3069.48462    p
17   fntb  266.77686    a
18   pld1  187.65569    a
19   pld1 1891.04541    p
20   pld1  258.79028    m
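To try the answers, the table can be rebuilt as a data frame. This sketch assumes that any 'call' cells showing up empty in the listing are a (the priority rule mentions a, and the expected output is consistent with those rows being a); the name df matches the code used in the question.

```r
# Rebuild the example data; call cells that look empty are assumed to be "a"
df <- data.frame(
  symbol = c("ddr1", "rfc2", "hspa6", "ddr1", "guca1a", "uba7", "uba7",
             "guca1a", "ccl5", "cyp2e1", "cyp2e1", "esrra", "cyp2a6",
             "gas6", "mmp14", "gas6", "fntb", "pld1", "pld1", "pld1"),
  intensity = c(596.95050, 420.28708, 510.73254, 1717.99487, 121.53488,
                1810.49780, 301.51944, 34.53987, 5966.24609, 95.15707,
                164.95276, 1024.88745, 502.48877, 921.70923, 524.96863,
                3069.48462, 266.77686, 187.65569, 1891.04541, 258.79028),
  call = c("p", "p", "p", "a", "a", "p", "m", "a", "p", "a",
           "m", "p", "a", "p", "a", "p", "a", "a", "p", "m"),
  stringsAsFactors = FALSE
)

# Inspect the column types and a preview of the values
str(df)
```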

I tried the code I found there:

library(data.table)
setDT(df)[, list(call = call[which.min(factor(call, levels = c('p', 'a', 'm')))]), .(symbol)]

but it drops the second column, 'intensity'. Any help is appreciated; I would also like the code to be as fast as possible. Thanks.
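The factor trick in that attempt works because a factor's integer codes follow the order of its levels: listing the levels from highest to lowest priority turns each call into a rank, and the smallest rank marks the row to keep. A minimal base R sketch, using a made-up vector of calls:

```r
# Levels listed from highest to lowest priority: p > a > m
priority <- c("p", "a", "m")

# Hypothetical sample of call values, for illustration only
calls <- c("m", "p", "a", "p")

# The factor's integer codes act as ranks: p = 1, a = 2, m = 3
call_rank <- as.integer(factor(calls, levels = priority))
print(call_rank)  # 3 1 2 1

# which.min returns the first position of the smallest rank (the first "p")
print(which.min(call_rank))  # 2
```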

Expected output:

   symbol  intensity call
 1   ddr1  596.95050    p
 2   rfc2  420.28708    p
 3  hspa6  510.73254    p
 5 guca1a  121.53488    a
 6   uba7 1810.49780    p
 9   ccl5 5966.24609    p
10 cyp2e1   95.15707    a
12  esrra 1024.88745    p
13 cyp2a6  502.48877    a
14   gas6  921.70923    p
15  mmp14  524.96863    a
17   fntb  266.77686    a
19   pld1 1891.04541    p

You can either use order in the i position to sort on the "call" column, converted to a factor with its levels specified in the correct order, and then subset the first observation (.SD[1L]) grouped by 'symbol':

library(data.table)
setDT(df)[order(factor(call, levels = c('p', 'a', 'm'))), .SD[1L], by = symbol]
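The same order-then-take-first idea can be sketched in base R without data.table: sort the rows by the priority factor, then drop every repeated symbol with duplicated(). The data frame here is a hypothetical subset of the question's rows, with empty call cells assumed to be a.

```r
priority <- c("p", "a", "m")

# Hypothetical subset of the question's data (empty calls assumed to be "a")
df <- data.frame(
  symbol = c("ddr1", "ddr1", "guca1a", "guca1a", "cyp2e1", "cyp2e1",
             "pld1", "pld1", "pld1"),
  intensity = c(596.95050, 1717.99487, 121.53488, 34.53987, 95.15707,
                164.95276, 187.65569, 1891.04541, 258.79028),
  call = c("p", "a", "a", "a", "a", "m", "a", "p", "m"),
  stringsAsFactors = FALSE
)

# Sort so the highest-priority call comes first; order() is stable,
# so rows with the same call keep their original relative order
sorted <- df[order(factor(df$call, levels = priority)), ]

# Keep only the first (highest-priority) row for each symbol
result <- sorted[!duplicated(sorted$symbol), ]
print(result)
```

Because the rows are sorted by priority first, the result is not in the original row order; result[order(as.integer(rownames(result))), ] restores it if that matters.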

Or, modifying your code: instead of list(call = ...), you can use .SD to subset the rows.

setDT(df)[, .SD[which.min(factor(call, levels = c('p', 'a', 'm')))], .(symbol)]
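For comparison, the which.min idea can be written in base R by picking one row index per symbol with tapply() and then subsetting whole rows, so 'intensity' is kept and the original row order is preserved. A sketch on a hypothetical subset of the question's data (empty call cells assumed to be a):

```r
priority <- c("p", "a", "m")

# Hypothetical subset of the question's data (empty calls assumed to be "a")
df <- data.frame(
  symbol = c("ddr1", "ddr1", "guca1a", "guca1a", "cyp2e1", "cyp2e1",
             "pld1", "pld1", "pld1"),
  intensity = c(596.95050, 1717.99487, 121.53488, 34.53987, 95.15707,
                164.95276, 187.65569, 1891.04541, 258.79028),
  call = c("p", "a", "a", "a", "a", "m", "a", "p", "m"),
  stringsAsFactors = FALSE
)

# For each symbol, return the row index whose call has the smallest priority code
keep <- tapply(seq_len(nrow(df)), df$symbol, function(i) {
  i[which.min(as.integer(factor(df$call[i], levels = priority)))]
})

# Subset whole rows so every column survives; sort() restores the original order
result <- df[sort(unlist(keep)), ]
print(result)
```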

An option using dplyr is:

library(dplyr)
df %>%
    group_by(symbol) %>%
    arrange(factor(call, levels = c('p', 'a', 'm'))) %>%
    slice(1L)

Or use which.min within slice:

df %>%
    group_by(symbol) %>%
    slice(which.min(factor(call, levels = c('p', 'a', 'm'))))
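One caveat applies to these answers: any call value not listed in levels (an empty string, for example) becomes NA in the factor. which.min skips NA, so a symbol whose calls are all unlisted yields integer(0) and drops out of the which.min-based results, whereas the order-based answers still keep one row for it. A small base R sketch of that behavior, with made-up vectors:

```r
priority <- c("p", "a", "m")

# An empty call is not a listed level, so factor() turns it into NA
blank_only <- factor(c("", ""), levels = priority)
print(is.na(blank_only))  # TRUE TRUE

# which.min skips NA; with nothing left it returns integer(0),
# so such a group would contribute no row
print(which.min(as.integer(blank_only)))  # integer(0)

# With at least one listed level present, which.min still finds it
mixed <- factor(c("", "m"), levels = priority)
print(which.min(as.integer(mixed)))  # 2

# Base R order() places NA last by default, so the listed call sorts first
print(order(factor(c("", "p"), levels = priority)))  # 2 1
```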


JVParth
