Learning to write functions in R
I am at the point with R where I would like to start writing my own functions, because I tend to need to do the same things over and over. However, I am struggling to see how I can generalize what I write. Looking at source code has not helped me learn, because it seems that .Internal or .Primitive functions (or other commands I do not know) are used extensively. I would just like to start by turning my normal copy-pasted solutions into functions - fancier things can come later!
As an example: a lot of my data formatting requires doing some operation and then filling in the data frame with zeros for all the other combinations that did not have data (e.g., years that did not have observations and were therefore not recorded, etc.). I need to do this over and over on different data sets that have different sets of variables, but the idea and the implementation are always the same.
My non-function way of solving this has been (for a specific implementation and a minimal example):
    df <- data.frame(county = c(1, 45, 57),
                     year = c(2002, 2003, 2003),
                     level = c("mean", "mean", "mean"),
                     obs = c(1.4, 1.9, 10.2))

    # Create expanded version of data frame
    counties <- seq(from = 1, to = 77, by = 2)
    years <- seq(from = 1999, to = 2014, by = 1)
    levels <- c("max", "mean")
    expansion <- expand.grid(counties, years, levels)
    expansion[4] <- 0
    colnames(expansion) <- colnames(df)

    # Merge and order them so that the observed value is on top
    df_full <- merge(expansion, df, all = TRUE)
    df_full$duplicate <- with(df_full, paste(year, county, level))
    df_full <- df_full[order(df_full$year, df_full$county, df_full$level, -abs(df_full$obs)), ]

    # Deduplicate by taking the first that shows up (the observation)
    df_full <- df_full[ !duplicated(df_full$duplicate), ]
    df_full$duplicate <- NULL

I would like to generalize this so that I could somehow put in a data frame (and select the columns I need to order by, since that changes) and get the expanded version out. My first implementation consisted of a function with too many arguments (the data frame and all the column names I wanted to order/expand.grid by), and it did not work:
    gridexpand <- function(df, col1, col2 = NULL, col3 = NULL, measure) {
      # Started with "expansion" being a global outside of the function
      # It is identical to the first part of the above code
      ex <- merge(expansion, df, all = TRUE)
      ex$dupe <- with(ex, paste(col1, col2, col3))
      ex <- ex[order(with(ex, col1, col2, col3, -abs(measure)))]
      ex <- ex[ !duplicated(ex$dupe)]
      ex <- subset(ex, select = -(dupe))
    }

    df_full <- gridexpand(df, year, county, level, obs)
    Error in paste(col1, col2, col3) : object 'year' not found

I am assuming this did not work because R has no way to know where 'year' came from. I could potentially try paste(df, "$year") to create "df$year", but that would not work either. And I do not ever see anything like that in other people's functions, so I am missing how people reference things in a data frame inside the relevant functions.
I would ideally like to know of some resources for thinking about generalization, or, if someone can point me in the right direction for solving this particular problem, I think it might help me see what I am doing wrong. I do not know of a better way to ask - I have been trying to read tutorials on writing functions for 3 months and it is not clicking.
At a glance, the biggest thing you can do is to not use non-standard-evaluation shortcuts inside your functions: things like $, subset() and with(). These are functions intended for convenient interactive use, not extensible programmatic use. (See, e.g., the Warning in ?subset, which should be added to ?with, fortunes::fortune(312), and fortunes::fortune(343).)
fortunes::fortune(312)
    The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad.
    -- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]])
       R-help (February 2012)
fortunes::fortune(343)
    Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to acquire the '[[' and '[' habit early.
    -- Peter Ehlers (about the use of $-extraction)
       R-help (March 2013)
When you start writing functions that work on data frames, if you need to reference column names you should pass them in as strings, and then use [ or [[ to get the column based on the string stored in a variable. This is the simplest way to make functions flexible with user-specified column names. For example, here's a simple, stupid function that tests if a data frame has a column of a given name:
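    does_col_exist_1 = function(df, col) {
        # $ does not evaluate `col`; it looks for a column literally named "col"
        return(!is.null(df$col))
    }

    does_col_exist_2 = function(df, col) {
        # [[ evaluates `col`; for an existing column this matches df[, col]
        return(!is.null(df[[col]]))
    }

These yield: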
    does_col_exist_1(mtcars, col = "jhfa")
    # [1] FALSE
    does_col_exist_1(mtcars, col = "mpg")
    # [1] FALSE
    does_col_exist_2(mtcars, col = "jhfa")
    # [1] FALSE
    does_col_exist_2(mtcars, col = "mpg")
    # [1] TRUE

The first function is wrong because $ doesn't evaluate what comes after it, so no matter what value I set col to when I call the function, df$col will look for a column literally named "col". The brackets, however, will evaluate col and see "oh hey, col is set to "mpg", let's look for a column of that name."
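The same reasoning applies to with(), which is what produces the "object 'year' not found" error in the question. Here is a minimal sketch of that failure next to the string-based alternative; the data frame `toy` and the functions `label_nse` and `label_str` are hypothetical names made up for this illustration.

```r
# Toy data frame; the column names are illustrative only.
toy <- data.frame(year = c(2002, 2003), obs = c(1.4, 1.9))

# NSE version: `col` is passed unquoted, so R ends up looking for an object
# called `year` where the function was called from, not for a column of `df`.
label_nse <- function(df, col) {
  with(df, paste(col))
}

# String version: `col` holds the column name and [[ looks it up in `df`.
label_str <- function(df, col) {
  paste(df[[col]])
}

# Capture the error message instead of stopping the script.
nse_result <- tryCatch(label_nse(toy, year),
                       error = function(e) conditionMessage(e))
str_result <- label_str(toy, "year")

print(nse_result)
print(str_result)
```

Note that the unquoted version only fails loudly when no object named `year` happens to exist in the calling environment; if one did exist, it would be used silently, which is arguably worse.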
If you want lots more understanding of this issue, I'd recommend the Non-Standard Evaluation section of Hadley Wickham's Advanced R book.
I'm not going to re-write and debug your functions, but if I wanted to, my first step would be to remove $, with(), and subset(), replacing them with [. There's a pretty good chance that's all you need to do.
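For what it's worth, here is one way the advice above might look when applied to the grid-expansion problem. This is only a sketch under stated assumptions, not a drop-in replacement for the question's code: the function name `grid_expand` and its arguments `grid_values`, `measure`, and `fill` are hypothetical, and it swaps the merge-then-deduplicate approach for a left join followed by filling the missing values. It assumes at most one row per key combination in `df` and that a genuinely missing (NA) observation may also be replaced by `fill`.

```r
# Hypothetical helper: expand `df` to every combination of the key values.
# `grid_values` is a named list; its names are the key column names (strings).
# `measure` is the name of the value column, also passed as a string.
grid_expand <- function(df, grid_values, measure, fill = 0) {
  key_cols <- names(grid_values)

  # Every combination of the supplied key values
  expansion <- expand.grid(grid_values, stringsAsFactors = FALSE)

  # Left join: keep the whole grid, attach observations where they exist
  out <- merge(expansion, df[, c(key_cols, measure)],
               by = key_cols, all.x = TRUE)

  # Combinations without an observation come back as NA; fill them in
  out[[measure]][is.na(out[[measure]])] <- fill

  # Order by the key columns, all referenced through strings
  out <- out[do.call(order, unname(as.list(out[key_cols]))), ]
  rownames(out) <- NULL
  out
}

# Usage with the data from the question
df <- data.frame(county = c(1, 45, 57),
                 year = c(2002, 2003, 2003),
                 level = c("mean", "mean", "mean"),
                 obs = c(1.4, 1.9, 10.2),
                 stringsAsFactors = FALSE)

df_full <- grid_expand(
  df,
  grid_values = list(county = seq(from = 1, to = 77, by = 2),
                     year = seq(from = 1999, to = 2014, by = 1),
                     level = c("max", "mean")),
  measure = "obs"
)
```

Because every column is referred to by a string, the same function should work for a data set with different key columns: only the names in `grid_values` and the value of `measure` change. Unlike merge(..., all = TRUE) in the question, this version drops any rows of `df` whose keys fall outside the grid.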