R - Differences in quantile function
I am struggling with some strange behaviour in R, with the quantile function.
I have 2 sets of numeric data and a custom boxplot stats function (which someone helped me write, so I am not sure about every detail):
sample_lang = c(91, 122, 65, 90, 90, 102, 98, 94, 84, 86, 108, 104, 94, 110,
                100, 86, 92, 92, 124, 108, 82, 65, 102, 90, 114, 88, 68, 112,
                96, 84, 92, 80, 104, 114, 112, 108, 68, 92, 68, 63, 112, 116)
sample_vocab = c(96, 136, 81, 92, 95, 112, 101, 95, 97, 94, 117, 95, 111, 115,
                 88, 92, 108, 81, 130, 106, 91, 95, 119, 103, 132, 103, 65, 114,
                 107, 108, 86, 100, 98, 111, 123, 123, 117, 82, 100, 97, 89,
                 132, 114)

my.boxplot.stats <- function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE) {
    if (coef < 0)
        stop("'coef' must not be negative")
    nna <- !is.na(x)
    n <- sum(nna)
    #stats <- stats::fivenum(x, na.rm = TRUE)
    stats <- quantile(x, probs = c(0.15, 0.25, 0.5, 0.75, 0.85), na.rm = TRUE)
    iqr <- diff(stats[c(2, 4)])
    if (coef == 0) do.out <- FALSE else {
        # Flag points more than coef * iqr beyond the 25% and 75% quantiles
        out <- if (!is.na(iqr)) {
            x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef * iqr)
        } else !is.finite(x)
        # If anything was flagged, overwrite the first and last stats
        if (any(out[nna], na.rm = TRUE))
            stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
    }
    conf <- if (do.conf)
        stats[3L] + c(-1.58, 1.58) * iqr/sqrt(n)
    list(stats = stats, n = n, conf = conf,
         out = if (do.out) x[out & nna] else numeric())
}
However, when I call quantile and my.boxplot.stats on the same set of data, I get different quantile results for the sample_vocab data (but they appear consistent for the sample_lang data), and I am not sure why:
> quantile(sample_vocab, probs = c(0.15, 0.25, 0.5, 0.75, 0.85), na.rm=TRUE)
  15%   25%   50%   75%   85%
 89.6  94.5 101.0 114.0 118.4
>
> my.boxplot.stats(sample_vocab)
$stats
  15%   25%   50%   75%   85%
 81.0  94.5 101.0 114.0 136.0
Could someone help me understand what is happening? Please note that I am reasonably experienced with programming but have no formal training in R and am learning on my own.
Thanks in advance!
The relevant bit of code is right here:
if (coef == 0) do.out <- FALSE else {
    out <- if (!is.na(iqr)) {
        x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef * iqr)
    } else !is.finite(x)
    if (any(out[nna], na.rm = TRUE))
        stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
}
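Basically, if coef != 0 (in this case coef is 1.5, the default function parameter), the first and last elements of the reported quantiles are replaced with the minimum and maximum data values within coef * iqr of the 25% and 75% quantiles, where iqr is the distance between those quantiles. The replacement only happens when at least one data point lies outside that range. That is the case for sample_vocab, where the value 65 falls below 94.5 - 1.5 * 19.5 = 65.25, but not for sample_lang, which is why its results look consistent.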
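To check this against the data in the question, here is a small sketch that repeats the outlier test by hand. It assumes the default quantile type and coef = 1.5; the helper fence_outliers and the variable vocab_out are hypothetical names introduced only for this illustration, not part of R or of the original function.

```r
sample_lang <- c(91, 122, 65, 90, 90, 102, 98, 94, 84, 86, 108, 104, 94, 110,
                 100, 86, 92, 92, 124, 108, 82, 65, 102, 90, 114, 88, 68, 112,
                 96, 84, 92, 80, 104, 114, 112, 108, 68, 92, 68, 63, 112, 116)
sample_vocab <- c(96, 136, 81, 92, 95, 112, 101, 95, 97, 94, 117, 95, 111, 115,
                  88, 92, 108, 81, 130, 106, 91, 95, 119, 103, 132, 103, 65,
                  114, 107, 108, 86, 100, 98, 111, 123, 123, 117, 82, 100, 97,
                  89, 132, 114)

# Hypothetical helper: returns the points that the outlier test would flag,
# i.e. those more than coef * iqr beyond the 25% and 75% quantiles.
fence_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- diff(q)
  x[!is.na(x) & (x < q[1L] - coef * iqr | x > q[2L] + coef * iqr)]
}

fence_outliers(sample_lang)    # nothing flagged, so the quantiles are left alone
fence_outliers(sample_vocab)   # 65 is flagged, so stats[c(1, 5)] is overwritten

# The overwritten values are the range of the points that were not flagged
vocab_out <- sample_vocab %in% fence_outliers(sample_vocab)
range(sample_vocab[!vocab_out])   # 81 136, matching the $stats output above
```

If the goal is to report the 15% and 85% quantiles unchanged, one option (an assumption about the intent) is to store that range in a separate list element instead of assigning it to stats[c(1, 5)].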