R - Compute mismatch by group


I am wondering how to compute mismatching cases by group.

Let's imagine the following data:

```r
sek = rbind(c(1, 'a', 'a', 'a'),
            c(1, 'a', 'a', 'a'),
            c(2, 'b', 'b', 'b'),
            c(2, 'c', 'b', 'b'))
colnames(sek) <- c('group', paste('t', 1:3, sep = ''))
```

The data:

```
     group t1  t2  t3 
[1,] "1"   "a" "a" "a"
[2,] "1"   "a" "a" "a"
[3,] "2"   "b" "b" "b"
[4,] "2"   "c" "b" "b"
```

The result I am after:

```
group 1 : 0
group 2 : 1
```

It would be nice to use the stringdist library to compute this.

Something like:

```r
seqdistgroupstr = function(x) stringdistmatrix(x, method = 'hamming')

sek %>%
  as.data.frame() %>%
  group_by(group) %>%
  seqdistgroupstr()
```

But it is not working.

Any ideas?
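The attempt above fails because `group_by()` does not make the next function in the pipe run once per group: `seqdistgroupstr()` receives the whole grouped data frame, while `stringdistmatrix()` expects a character vector. Below is a minimal base R sketch of the per-group idea, assuming "mismatch" means the Hamming distance summed over every pair of rows within a group (the helpers `hamming` and `group_mismatch` are hypothetical names). For the two-row groups in this example it gives 0 and 1; for larger groups a pairwise sum will generally differ from a per-column count.

```r
# Sample data from the question
sek <- rbind(c(1, 'a', 'a', 'a'),
             c(1, 'a', 'a', 'a'),
             c(2, 'b', 'b', 'b'),
             c(2, 'c', 'b', 'b'))
colnames(sek) <- c('group', paste('t', 1:3, sep = ''))

# Hypothetical helper: Hamming distance between two equal-length
# sequences = number of positions where the states differ.
hamming <- function(x, y) sum(x != y)

# Hypothetical helper: sum the Hamming distance over all pairs of rows.
group_mismatch <- function(rows) {
  n <- nrow(rows)
  if (n < 2) return(0)
  pairs <- combn(n, 2)  # each column is one pair of row indices
  sum(apply(pairs, 2, function(p) hamming(rows[p[1], ], rows[p[2], ])))
}

states   <- sek[, -1, drop = FALSE]                       # drop the group column
by_group <- split(seq_len(nrow(sek)), sek[, "group"])     # row indices per group
mismatch <- sapply(by_group, function(idx) group_mismatch(states[idx, , drop = FALSE]))
mismatch
```

With stringdist installed, pasting each row into one string (`apply(states, 1, paste, collapse = "")`) and calling `stringdistmatrix(x, method = 'hamming')` on each group's strings should yield the same pairwise distances to sum.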

Quick update: how can I handle weights? For example, how do I pass an argument (a value 1, 2, 3, ...) that sets the cost of a mismatch between two characters, so that a mismatch between b and c costs 2 while a mismatch between a and c costs 1, and so on?
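One option for the weights question is a substitution-cost matrix indexed by the pair of characters. As far as I know, stringdist's `weight` argument sets one cost per edit operation (deletion, insertion, substitution, transposition) rather than per pair of characters, so this sketch rolls its own weighted Hamming distance in base R. The b/c = 2 and a/c = 1 costs come from the question; the a/b cost of 1 and the helper names are assumptions.

```r
sek <- rbind(c(1, 'a', 'a', 'a'),
             c(1, 'a', 'a', 'a'),
             c(2, 'b', 'b', 'b'),
             c(2, 'c', 'b', 'b'))
colnames(sek) <- c('group', paste('t', 1:3, sep = ''))

# Symmetric substitution-cost matrix; the diagonal (same state) costs 0.
# b/c and a/c follow the question; a/b = 1 is an assumption.
alphabet <- c("a", "b", "c")
cost <- matrix(0, 3, 3, dimnames = list(alphabet, alphabet))
cost["b", "c"] <- cost["c", "b"] <- 2
cost["a", "c"] <- cost["c", "a"] <- 1
cost["a", "b"] <- cost["b", "a"] <- 1

# Hypothetical helper: weighted Hamming distance. cbind(x, y) is a
# character matrix, so cost[...] looks each position-wise pair up by dimnames.
weighted_hamming <- function(x, y, cost) sum(cost[cbind(x, y)])

# Sum the weighted distance over all pairs of rows in one group.
weighted_group_mismatch <- function(rows, cost) {
  n <- nrow(rows)
  if (n < 2) return(0)
  pairs <- combn(n, 2)
  sum(apply(pairs, 2, function(p) weighted_hamming(rows[p[1], ], rows[p[2], ], cost)))
}

states   <- sek[, -1, drop = FALSE]
by_group <- split(seq_len(nrow(sek)), sek[, "group"])
weighted <- sapply(by_group, function(idx)
  weighted_group_mismatch(states[idx, , drop = FALSE], cost))
weighted
```

Group 2 differs only at t1 (b vs c), so it scores 2 under these costs. A state missing from the dimnames of `cost` raises a "subscript out of bounds" error, which is a cheap check that the cost matrix covers every state.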

The code below gives the number of mismatches by group, where a mismatch is defined as one less than the number of unique values in each column t1, t2, etc., within each level of group. I think you would need to bring in a string distance measure if you need more than a binary measure of mismatch, but a binary measure suffices for the example you gave. Also, if you just want the number of distinct rows in each group, @alex's solution is more concise.

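```r
library(dplyr)
library(reshape2)

sek %>% as.data.frame %>%
  melt(id.vars = "group") %>%                          # long format: one row per group/column/value
  group_by(group, variable) %>%
  summarise(mismatch = length(unique(value)) - 1) %>%  # distinct values per column, minus 1
  group_by(group) %>%
  summarise(mismatch = sum(mismatch))                  # total over t1, t2, t3
```

```
  group mismatch
1     1        0
2     2        1
```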
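The same per-column definition (distinct values minus one, summed over t1, t2, t3) can be cross-checked without dplyr or reshape2. A base R sketch, where `per_column_mismatch` is a hypothetical helper name:

```r
sek <- rbind(c(1, 'a', 'a', 'a'),
             c(1, 'a', 'a', 'a'),
             c(2, 'b', 'b', 'b'),
             c(2, 'c', 'b', 'b'))
colnames(sek) <- c('group', paste('t', 1:3, sep = ''))

# For one group's rows: count distinct values in each column, subtract 1,
# and add the results up across columns.
per_column_mismatch <- function(rows) {
  sum(apply(rows, 2, function(col) length(unique(col)) - 1))
}

states   <- sek[, -1, drop = FALSE]
by_group <- split(seq_len(nrow(sek)), sek[, "group"])
counts   <- sapply(by_group, function(idx) per_column_mismatch(states[idx, , drop = FALSE]))

result <- data.frame(group = names(counts), mismatch = unname(counts),
                     stringsAsFactors = FALSE)
result
```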

Here's a shorter dplyr method to count individual mismatches. It doesn't require reshaping, but it does require some other data gymnastics:

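```r
sek %>% as.data.frame %>%
  group_by(group) %>%
  # per column: number of distinct values minus 1
  # (summarise_each()/funs() are deprecated; across() replaces them)
  summarise(across(everything(), ~ length(unique(.x)) - 1)) %>%
  # add up the per-column counts, skipping the group column
  mutate(mismatch = rowSums(.[-1])) %>%
  select(-matches("^t[1-3]$"))
```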
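@alex's solution is not reproduced in this post. As a separate base R sketch, the number of distinct rows (complete sequences) per group can be counted with `unique()`, which keeps one copy of each distinct row of a matrix; subtracting one gives 0 and 1 here, matching the expected output for this data.

```r
sek <- rbind(c(1, 'a', 'a', 'a'),
             c(1, 'a', 'a', 'a'),
             c(2, 'b', 'b', 'b'),
             c(2, 'c', 'b', 'b'))
colnames(sek) <- c('group', paste('t', 1:3, sep = ''))

states   <- sek[, -1, drop = FALSE]
by_group <- split(seq_len(nrow(sek)), sek[, "group"])

# unique() on a matrix drops duplicated rows; nrow() counts what is left
distinct_rows <- sapply(by_group, function(idx) nrow(unique(states[idx, , drop = FALSE])))
distinct_rows      # distinct sequences per group
distinct_rows - 1  # sequences beyond the first
```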
