pmml in R generating improper variable names
I am using the pmml package in R to generate PMML for a logistic regression model obtained using the glm function, as follows:
    library(pmml)

    var     <- sample(c(1, 2, 3), 100, replace = TRUE)
    var_cat <- sample(c(1, 2, 3, 4), 100, replace = TRUE)
    y       <- sample(c(0, 1), 100, replace = TRUE)

    df <- data.frame(y = as.factor(y), var = as.factor(var), var_cat = as.factor(var_cat))
    model <- glm(y ~ ., data = df, family = binomial)
    pmmloutput <- pmml(model)

The PPMatrix portion of the PMML is shown below:
    <PPMatrix>
      <PPCell value="2" predictorName="var" parameterName="p1"/>
      <PPCell value="3" predictorName="var" parameterName="p2"/>
      <PPCell value="_cat2" predictorName="var" parameterName="p3"/>
      <PPCell value="2" predictorName="var_cat" parameterName="p3"/>
      <PPCell value="_cat3" predictorName="var" parameterName="p4"/>
      <PPCell value="3" predictorName="var_cat" parameterName="p4"/>
      <PPCell value="_cat4" predictorName="var" parameterName="p5"/>
      <PPCell value="4" predictorName="var_cat" parameterName="p5"/>
    </PPMatrix>

For the first variable, the levels appear alright: (var,2) and (var,3). However, there are two lines for each level of the second variable, and in one of them the variable name and level are split at the wrong location.
Instead of getting only (var_cat,2), I also get the name split as (var,_cat2), as highlighted below:
    <PPCell value="_cat2" predictorName="var" parameterName="p3"/>

This seems to happen when there are overlapping variable names (in this case var and var_cat). However, it works fine if only the var_cat variable is present.
Could someone suggest a way to address this issue?
Unfortunately, you are correct; you have found a bug in the R code.
The way it finds the values assumes that different variable names are not substrings of one another.
Since var is a substring of var_cat, you get the error. Notice that var_cat and cat would potentially give the same problem. On the other hand, var_cat1 is not a substring of var_cat2, so that should work.
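To make the failure mode concrete, here is a minimal sketch in base R of how prefix-based matching can produce the extra cell. This is not the pmml package's actual code: the function naive_split and the idea that matching is done by prefix are assumptions made purely for illustration.

```r
# Hypothetical illustration only; NOT the pmml package's implementation.
# Assumption: a variable "owns" a coefficient whenever the coefficient
# name begins with the variable name.
var_names <- c("var", "var_cat")

naive_split <- function(coef_name, var_names) {
  # Keep every variable name that is a prefix of the coefficient name
  hits <- var_names[startsWith(coef_name, var_names)]
  data.frame(
    predictorName = hits,
    # The remainder after the matched prefix is treated as the factor level
    value = substring(coef_name, nchar(hits) + 1),
    stringsAsFactors = FALSE
  )
}

# "var2" matches only "var", giving the expected pair (var, 2)
naive_split("var2", var_names)

# "var_cat2" matches both names, giving (var, _cat2) and (var_cat, 2)
naive_split("var_cat2", var_names)
```

Under this assumption, the coefficient var_cat2 yields two rows, which mirrors the two PPCell entries sharing parameterName p3 in the output above.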
For now, the easiest workaround is to name the variables so that no variable name is a substring of another. Fortunately, we are planning a new release in the next couple of weeks, so I will try to include a fix in that release.
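As a rough sketch of that workaround, the base R snippet below checks a set of predictor names for substring overlaps and renames the offending column before the model is fitted. The helper overlapping_names and the replacement name grp are hypothetical choices for this example, not part of the pmml package.

```r
# Hypothetical helper: return the names that occur inside another name.
overlapping_names <- function(var_names) {
  is_sub <- vapply(var_names, function(v) {
    # fixed = TRUE compares literal text rather than a regular expression
    any(grepl(v, setdiff(var_names, v), fixed = TRUE))
  }, logical(1))
  var_names[is_sub]
}

df <- data.frame(
  y       = as.factor(sample(c(0, 1), 100, replace = TRUE)),
  var     = as.factor(sample(c(1, 2, 3), 100, replace = TRUE)),
  var_cat = as.factor(sample(c(1, 2, 3, 4), 100, replace = TRUE))
)

# Check the predictors only; "y" is the response
overlapping_names(setdiff(names(df), "y"))   # "var"

# Rename the overlapping column ("grp" is an arbitrary example name)
names(df)[names(df) == "var"] <- "grp"
overlapping_names(setdiff(names(df), "y"))   # character(0)
```

Once the check returns an empty vector, the data frame can be passed to glm and pmml as in the question.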