Learning to write functions in R


I am at the point with R where I would like to start writing my own functions, because I tend to need to do the same things over and over. However, I am struggling to see how I can generalize what I write. Looking at source code has not helped me learn, because it seems that .Internal or .Primitive functions (or other commands I do not know) are used extensively. I would just like to start by turning my normal copy-pasted solutions into functions - fancier things can come later!

As an example: a lot of my data formatting requires doing some operation and then filling in the data frame with zeros for all the other combinations that did not have data (e.g., years that did not have observations and were therefore not recorded). I need to do this over and over for different data sets that have different sets of variables, but the idea and the implementation are always the same.

My non-function way of solving this has been (for a specific implementation and minimal example):

    df <- data.frame(county = c(1, 45, 57),
                     year = c(2002, 2003, 2003),
                     level = c("mean", "mean", "mean"),
                     obs = c(1.4, 1.9, 10.2))

    # Create expanded version of the data frame
    counties <- seq(from = 1, to = 77, by = 2)
    years <- seq(from = 1999, to = 2014, by = 1)
    levels <- c("max", "mean")
    expansion <- expand.grid(counties, years, levels)
    expansion[4] <- 0
    colnames(expansion) <- colnames(df)

    # Merge and order them so that the observed value is on top
    df_full <- merge(expansion, df, all = TRUE)
    df_full$duplicate <- with(df_full,
                              paste(year, county, level))

    df_full <- df_full[order(df_full$year,
                             df_full$county,
                             df_full$level,
                             -abs(df_full$obs)), ]

    # Deduplicate by taking the first row that shows up (the observation)
    df_full <- df_full[ !duplicated(df_full$duplicate), ]
    df_full$duplicate <- NULL

I would like to generalize this somehow, so that I could put in a data frame (and select the columns that need to be ordered by, since that changes) and get the expanded version out. My first implementation consisted of a function with too many arguments (the data frame and all the column names I wanted to order/expand.grid by), and it did not work:

    gridexpand <- function(df, col1, col2=NULL, col3=NULL, measure){
      # Started with "expansion" being a global outside of the function
      # It is identical to the first part of the above code
      ex <- merge(expansion, df, all = TRUE)
      ex$dupe <- with(ex,
                     paste(col1, col2, col3))

      ex <- ex[order(with(ex,
                           col1, col2, col3, -abs(measure)))]

      ex <- ex[ !duplicated(ex$dupe)]

      ex <- subset(ex, select = -(dupe))
    }

    df_full <- gridexpand(df, year, county, level, obs)

    Error in paste(col1, col2, col3) : object 'year' not found

I am assuming this did not work because R has no way to know where 'year' came from. I could potentially try paste(df, "$year") to create "df$year", but that would not work either. I also never see anyone else do this in their functions, so I am clearly missing how people reference things in a data frame inside functions.

Ideally I would like to know of resources for thinking about generalization, or if someone can point me in the right direction for solving this particular problem, I think it might help me see what I am doing wrong. I do not know of a better way to ask - I have been trying to read tutorials on writing functions for 3 months and it is not clicking.

At a glance, the biggest thing you can do is to not use non-standard-evaluation shortcuts inside functions: things like $, subset() and with(). These are intended for convenient interactive use, not extensible programmatic use. (See, e.g., the Warning in ?subset, which should also be added to ?with, as well as fortunes::fortune(312) and fortunes::fortune(343).)

fortunes::fortune(312) 

The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad. -- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]]) R-help (February 2012)

fortunes::fortune(343) 

Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to acquire the '[[' and '[' habit early. -- Peter Ehlers (about the use of $-extraction) R-help (March 2013)
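To make the warning concrete, here is a minimal sketch of the same failure in miniature. The data frame `toy` and both helper names are hypothetical, made up for illustration. The first helper leans on with() and expects bare column names; the second takes the names as strings and extracts with [[.

```r
# Hypothetical toy data and helper names, for illustration only.
toy <- data.frame(year = c(2002, 2003), county = c(1, 45))

# NSE version: col1 and col2 are treated as variables to look up,
# not as names of columns in df.
paste_cols_nse <- function(df, col1, col2) {
  with(df, paste(col1, col2))
}

# String version: the column names arrive as character strings
# and [[ extracts the matching columns.
paste_cols_str <- function(df, col1, col2) {
  paste(df[[col1]], df[[col2]])
}

# Capture the error message instead of stopping the script.
nse_result <- tryCatch(paste_cols_nse(toy, year, county),
                       error = function(e) conditionMessage(e))
str_result <- paste_cols_str(toy, "year", "county")

print(nse_result)
print(str_result)
```

Assuming no object called `year` exists in the calling environment, the first call fails in the same way as the question's gridexpand(), while the second returns one pasted string per row.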

When you start writing functions that work on data frames, if you need to reference column names you should pass them in as strings, and then use [ or [[ to get the column based on the string stored in a variable. This is the simplest way to make functions flexible with user-specified column names. For example, here's a simple, stupid function that tests whether a data frame has a column of the given name:

    does_col_exist_1 = function(df, col) {
        return(!is.null(df$col))
    }

    does_col_exist_2 = function(df, col) {
        return(!is.null(df[[col]]))
        # for a column that exists, df[[col]] is equivalent to df[, col]
    }

these yield:

    does_col_exist_1(mtcars, col = "jhfa")
    # [1] FALSE
    does_col_exist_1(mtcars, col = "mpg")
    # [1] FALSE

    does_col_exist_2(mtcars, col = "jhfa")
    # [1] FALSE
    does_col_exist_2(mtcars, col = "mpg")
    # [1] TRUE

The first function is wrong because $ doesn't evaluate what comes after it: no matter what value I set col to when I call the function, df$col looks for a column literally named "col". The brackets, however, evaluate col and see "oh hey, col is set to "mpg", let's look for a column of that name."
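The same idea extends to several columns at once. The sketch below sorts a data frame by whatever columns are named in a character vector, which is the ordering step from the question. It relies on the fact that df[cols] with a character vector returns a data frame (a list of columns), whereas df[[col]] returns a single column as a vector. The function name `order_by_cols` is hypothetical.

```r
# Hypothetical helper: sort a data frame by the columns named in `cols`.
order_by_cols <- function(df, cols) {
  # df[cols] is a list of columns; do.call() hands each one to order()
  # as a separate sort key. unname() keeps a column name from being
  # mistaken for one of order()'s own arguments (e.g. "decreasing").
  ord <- do.call(order, unname(as.list(df[cols])))
  df[ord, , drop = FALSE]
}

sorted <- order_by_cols(mtcars, c("cyl", "mpg"))
head(sorted[c("cyl", "mpg")])
```

Because the column names are ordinary strings, the caller can build them however they like, for instance with setdiff(names(df), "obs").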

If you want a lot more understanding of this issue, I'd recommend the Non-Standard Evaluation section of Hadley Wickham's Advanced R book.

I'm not going to re-write and debug your functions, but if I wanted to, my first step would be to remove $, with(), and subset(), replacing them with [. There's a pretty good chance that's all you need to do.
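For readers who want to see where that advice might lead, here is one possible sketch, not the original poster's code. The name `grid_expand` and its arguments `values`, `measure` and `fill` are hypothetical. It also swaps the merge-everything-then-deduplicate approach for a left join onto the full grid followed by filling the missing measure, which assumes at most one observed row per combination and no genuine NA values in the measure column.

```r
# Hypothetical generalization; names and arguments are illustrative only.
grid_expand <- function(df, values, measure, fill = 0) {
  # values: named list with one element per identifier column, holding
  # every value of that column that should appear in the result.
  id_cols <- names(values)
  expansion <- expand.grid(values, stringsAsFactors = FALSE)

  # Left join: keep every grid row; combinations without data get NA.
  out <- merge(expansion, df[c(id_cols, measure)], by = id_cols, all.x = TRUE)
  out[[measure]][is.na(out[[measure]])] <- fill

  # Order by the identifier columns, all referenced via strings.
  out <- out[do.call(order, unname(as.list(out[id_cols]))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

df <- data.frame(county = c(1, 45, 57),
                 year = c(2002, 2003, 2003),
                 level = c("mean", "mean", "mean"),
                 obs = c(1.4, 1.9, 10.2),
                 stringsAsFactors = FALSE)

df_full <- grid_expand(
  df,
  values = list(year = 1999:2014,
                county = seq(from = 1, to = 77, by = 2),
                level = c("max", "mean")),
  measure = "obs"
)
head(df_full)
```

Passing the grid as a named list keeps the argument count fixed no matter how many identifier columns a data set has, which addresses the "too many arguments" problem from the question.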

