
---++Cheat sheet for Matlab users:

[[http://mathesaurus.sourceforge.net/octave-r.html][Matlab vs. R]] 

---++Some useful R commands:
*# command line execution of R scripts:*

_R CMD BATCH test.r_ (runs the script non-interactively and writes its output to a .Rout file)

*# get help (e.g.):*

_> help("write.table")_

*# write data to file:*

_> write.table(sqd, file="test.dat", col.names = FALSE, quote=FALSE, row.names=FALSE)_
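A minimal round-trip sketch of the write.table call above. It assumes a small hypothetical data frame in place of `sqd` (which is not defined on this page) and a temporary file instead of test.dat. Because the file is written without a header, read.table() names the columns V1, V2, ..., which is why the commands below refer to rtd$V2.

```r
# Hypothetical stand-in for 'sqd': three numeric columns
sqd <- data.frame(a = c(0.001, 0.002, 0.003),
                  b = c(95, 120, 160),
                  c = c(0.0001, 0.0002, 0.0003))

f <- tempfile(fileext = ".dat")  # temporary file instead of "test.dat"
write.table(sqd, file = f, col.names = FALSE, quote = FALSE, row.names = FALSE)

# Without a header line, read.table() assigns the default names V1, V2, V3
back <- read.table(f)
print(names(back))
```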

*# read data from file:*

_> rtd <- read.table("uf100-0239-ws55-rtd.dat")_

_> median(rtd$V2)_

_> summary(rtd)_

| *V1* | *V2* | *V3* |
| Min.   :0.0010 | Min.   :   95 | Min.   :0.0001115 |
| 1st Qu.:0.2507 | 1st Qu.: 3276 | 1st Qu.:0.0038440 |
| Median :0.5005 | Median : 8318 | Median :0.0097611 |
| Mean   :0.5005 | Mean   :12995 | Mean   :0.0152500 |
| 3rd Qu.:0.7502 | 3rd Qu.:18308 | 3rd Qu.:0.0214859 |
| Max.   :1.0000 | Max.   :91660 | Max.   :0.1075688 |
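In the summary, V1 runs from 0.001 to 1, which suggests an RTD file with 1000 runs where V1 is the cumulative success probability i/n and V2 the sorted run length. Assuming that layout (an inference, not documented here), this sketch builds a comparable data frame from simulated exponential run lengths and applies the same commands.

```r
set.seed(42)
n <- 1000
# Assumption: V2 = run length (search steps); simulated, not real solver runs
steps <- sort(ceiling(rexp(n, rate = 1 / 13000)))
rtd <- data.frame(V1 = (1:n) / n,  # empirical success probability i/n
                  V2 = steps)      # sorted run lengths
median(rtd$V2)
summary(rtd)
```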

*# produce histogram of column V2:*

_> hist(rtd$V2)_
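hist() can also return the bins without drawing anything, which is handy for checking or exporting the numbers behind the plot. A sketch on simulated run lengths (hypothetical data, not the RTD file above):

```r
set.seed(42)
x <- rexp(1000, rate = 1 / 13000)  # hypothetical run lengths
h <- hist(x, plot = FALSE)         # compute the bins, do not draw
# h$breaks are the bin edges, h$counts the number of runs per bin
print(data.frame(lower = head(h$breaks, -1), upper = h$breaks[-1], count = h$counts))
```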

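*# plot empirical cdf:*

_> plot(ecdf(rtd$V2))_

(ecdf() is in the stats package, which is loaded by default; the old stepfun package was merged into stats, so library(stepfun) is no longer needed.)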
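ecdf() returns a step function that can also be evaluated directly. This sketch uses simulated, tie-free run lengths (a hypothetical assumption) and shows that, on the sorted sample, the empirical cdf reproduces an i/n column like V1 of an RTD file.

```r
set.seed(42)
x <- rexp(1000, rate = 1 / 13000)  # hypothetical run lengths, no ties
Fn <- ecdf(x)                      # Fn(t) = fraction of runs with x <= t
Fn(median(x))                      # 0.5 for an even, tie-free sample
# On the sorted sample the ECDF equals i/n
all.equal(Fn(sort(x)), (1:1000) / 1000)
```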

*# qq plot against std normal:*

_> qqnorm(rtd$V2); qqline(rtd$V2)_
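With plot.it = FALSE, qqnorm() returns the plotted points instead of drawing them: y is the data and x the matching standard normal quantiles qnorm(ppoints(n)). A sketch on a hypothetical simulated sample:

```r
set.seed(10)
x <- rexp(100)                   # hypothetical skewed sample
q <- qqnorm(x, plot.it = FALSE)  # list(x = theoretical, y = sample)
# The theoretical quantiles are qnorm(ppoints(n)), matched to the ranks of the data
all.equal(sort(q$x), qnorm(ppoints(100)))
```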

*# Wilcoxon rank sum test (compare RTDs), equivalent to the Mann-Whitney U test:*

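(No library() call is needed: wilcox.test() is in the default stats package, into which the old ctest package was merged.)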

_> wilcox.test(rtd$V2,rtd40$V2,paired=FALSE)_

Note: Wilcoxon rank sum test with continuity correction

data:  rtd$V2 and rtd40$V2

W = 440056, p-value = 3.45e-06

alternative hypothesis: true mu is not equal to 0

# -> p < 0.05: reject null hypothesis (null hypothesis = no location shift, i.e. medians are equal) -> medians differ
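Why the rank sum test is "the" Mann-Whitney U test: for tie-free data, the W statistic reported by wilcox.test() is the number of pairs (x_i, y_j) with x_i > y_j. A sketch with two hypothetical simulated RTDs, where the second algorithm is about twice as slow:

```r
set.seed(1)
# Two hypothetical RTDs: algorithm B is about twice as slow as algorithm A
x <- rexp(200, rate = 1 / 10000)
y <- rexp(200, rate = 1 / 20000)
res <- wilcox.test(x, y, paired = FALSE)
# Mann-Whitney U: count the pairs (x_i, y_j) with x_i > y_j (no ties here)
U <- sum(outer(x, y, ">"))
c(W = unname(res$statistic), U = U, p = res$p.value)
```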

*# Kolmogorov-Smirnov test:*

_> ks.test(rtd$V2,rtd50$V2)_

Note: Two-sample Kolmogorov-Smirnov test

data:  rtd$V2 and rtd50$V2

D = 0.029, p-value = 0.7944

alternative hypothesis: two.sided

Warning message:

cannot compute correct p-values with ties in: ks.test(rtd$V2, rtd50$V2)

# -> p > 0.05: do not reject null hypothesis (null hypothesis = both samples come from the same distribution)
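The two-sample KS statistic D is the largest vertical gap between the two empirical cdfs, and it suffices to check the pooled data points. The ties warning above comes from integer run lengths; the hypothetical simulated samples here are continuous, so they have no ties.

```r
set.seed(2)
x <- rexp(500, rate = 1 / 13000)  # hypothetical RTD 1
y <- rexp(500, rate = 1 / 13000)  # hypothetical RTD 2, same distribution
res <- ks.test(x, y)
# D = max |F_x(t) - F_y(t)|, evaluated at all pooled data points
z <- c(x, y)
D <- max(abs(ecdf(x)(z) - ecdf(y)(z)))
c(D = D, ks = unname(res$statistic))
```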

*# Kendall's tau test:*

_> corr <- read.table("flat100-corr-nov+.dat") # xxx_

_> cor.test(corr$V1,corr$V2, method="kendall")_

Note: Kendall's rank correlation tau

data:  corr$V1 and corr$V2

z.tau = 12.9965, p-value < 2.2e-16

alternative hypothesis: true tau is not equal to 0

sample estimates:

     tau

0.8816162

# -> p < 0.05: reject null hypothesis (null hypothesis = no correlation) -> strong positive correlation (tau = 0.88)
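Kendall's tau counts concordant minus discordant pairs, divided by the number of pairs (for tie-free data). A sketch with hypothetical per-instance costs of two correlated algorithms, compared against cor.test():

```r
set.seed(3)
n <- 100
a <- rexp(n)                      # hypothetical per-instance cost, algorithm A
b <- a * exp(rnorm(n, sd = 0.3))  # hypothetical algorithm B, correlated with A
# sign product is +1 for a concordant pair, -1 for a discordant pair
s <- sign(outer(a, a, "-")) * sign(outer(b, b, "-"))
tau <- sum(s[upper.tri(s)]) / choose(n, 2)
res <- cor.test(a, b, method = "kendall")
c(manual = tau, cor.test = unname(res$estimate))
```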

*# Spearman's rank correlation test (alternative to the above):*

_> cor.test(corr$V1,corr$V2, method="spearman")_
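Spearman's rho is simply Pearson's correlation (the default of cor(x,y)) applied to the ranks. A sketch on the same kind of hypothetical simulated data:

```r
set.seed(3)
n <- 100
a <- rexp(n)                      # hypothetical per-instance cost, algorithm A
b <- a * exp(rnorm(n, sd = 0.3))  # hypothetical algorithm B, correlated with A
# Spearman's rho = Pearson correlation of the ranks
rho_manual <- cor(rank(a), rank(b))
res <- cor.test(a, b, method = "spearman")
c(manual = rho_manual, cor.test = unname(res$estimate))
```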

*# Wilcoxon matched-pairs signed-rank test:*

_> wilcox.test(corr$V1,corr$V2, paired=TRUE)_

Note: Wilcoxon signed rank test with continuity correction

data:  corr$V1 and corr$V2

V = 3919, p-value = 1.657e-06

alternative hypothesis: true mu is not equal to 0

# -> p < 0.05: reject null hypothesis (null hypothesis = no significant performance difference) -> performance differs significantly
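For paired data, the V statistic is the sum of the ranks of |A - B| over the instances where A - B is positive. A sketch with hypothetical per-instance run times, where algorithm B is systematically slower:

```r
set.seed(4)
n <- 100
a <- rexp(n, rate = 1 / 100)                  # hypothetical algorithm A, per instance
b <- a * exp(rnorm(n, mean = 0.3, sd = 0.3))  # hypothetical B, systematically slower
res <- wilcox.test(a, b, paired = TRUE)
# V = sum of ranks of |a - b| over instances with a - b > 0 (no zero differences here)
d <- a - b
V <- sum(rank(abs(d))[d > 0])
c(V = V, wilcox = unname(res$statistic), p = res$p.value)
```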

*# Kolmogorov-Smirnov test against exponential distribution:*

_> ks.test(rtd$V2, pexp, 1/mean(rtd$V2))  # rate estimated from the data_

_> ks.test(rtd$V2, pexp, log(2)/29.4)  # rate log(2)/29.4 = exponential with median 29.4_

Note: use ks.test here; chisq.test is _not_ the right goodness-of-fit test for continuous data such as run times!
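A sketch of both ks.test variants on hypothetical simulated run times. Arguments after the cdf are passed on to it, so the third argument is pexp's rate; an exponential with median m has rate log(2)/m. When the rate is estimated from the same sample (1/mean), the standard KS p-value is conservative (too large), since it assumes a fully specified distribution.

```r
set.seed(5)
x <- rexp(300, rate = log(2) / 29.4)  # hypothetical run times with median 29.4
# An exponential distribution with median m has rate log(2)/m
qexp(0.5, rate = log(2) / 29.4)       # 29.4, up to floating-point rounding
# Extra arguments after the cdf are passed on to pexp (here: the rate)
ks.test(x, pexp, log(2) / 29.4)
ks.test(x, "pexp", rate = 1 / mean(x))  # rate estimated from the same data
```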

*# qq plot of RTD vs. simple exponential approximation:*

_> qqplot(rtd$V2,qexp(rtd$V1,1/mean(rtd$V2)))_

_> qqplot(rtd$V2,qexp(rtd$V1,1/mean(rtd$V2)),log="xy")_

_> rtd <- read.table("ihlk-restart-output-1000-7-rtd.dat")_

_> qqplot(rtd$V2,qexp(1:500/500,log(2)/29.4))_
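Note that qexp(1, ...) is Inf, so the last point of qexp(rtd$V1, ...) or qexp(1:500/500, ...) lies at infinity and is not drawn. One common alternative, swapped in here and not the original's choice, is ppoints(), whose probabilities lie strictly inside (0, 1). A sketch on hypothetical simulated run times:

```r
set.seed(6)
x <- rexp(500, rate = log(2) / 29.4)    # hypothetical run times
qexp(1, rate = 1)                       # Inf: the 100% quantile is unbounded
p <- ppoints(500)                       # (i - 1/2)/n for n > 10, all in (0, 1)
theo <- qexp(p, rate = log(2) / 29.4)
qq <- qqplot(x, theo, plot.it = FALSE)  # sorted (sample, theoretical) pairs
```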

*# combine columns into a matrix:*

_> qq <- cbind(rtd$V2,qexp(rtd$V1,1/mean(rtd$V2)))_

*# write 2-column matrix to file (t() is needed because write() outputs values column by column):*

_> write(t(qq), file="qq.dat", ncolumns=2)_
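A round-trip sketch of cbind() plus write(), using hypothetical simulated values and a temporary file instead of qq.dat. write() prints numbers via cat(), i.e. with up to 7 significant digits by default, so the values read back match only to about that precision.

```r
set.seed(7)
v2 <- sort(rexp(5, rate = 1 / 13000))    # hypothetical sorted run lengths
v1 <- (1:5) / 6                          # hypothetical probabilities, all < 1
qq <- cbind(v2, qexp(v1, 1 / mean(v2)))  # 5 x 2 matrix
f <- tempfile(fileext = ".dat")
# write() emits values in column-major order, so transpose first
# to get one matrix row per output line
write(t(qq), file = f, ncolumns = 2)
back <- as.matrix(read.table(f))
```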

*# count number of instances for which algorithm A (column V1) > algorithm B (column V2):*

_> table(corr$V1 > corr$V2)_
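table() on a logical vector counts FALSE and TRUE; sum() gives the TRUE count directly. A sketch with hypothetical per-instance costs:

```r
set.seed(8)
a <- rexp(50)        # hypothetical cost of algorithm A per instance
b <- rexp(50)        # hypothetical cost of algorithm B per instance
tab <- table(a > b)  # counts of FALSE (A <= B) and TRUE (A > B)
tab
sum(a > b)           # the TRUE count as a single number
```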

*# compute correlation of vectors x,y*

_> cor(x,y)_

*# test distribution for normality:*

_> shapiro.test(x)_

Note: Shapiro-Wilk normality test

[if p-value < alpha, the null hypothesis (data are normally distributed) is rejected]
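A sketch contrasting a normal sample with a skewed, RTD-like one (both hypothetical, simulated). Note that shapiro.test() accepts only between 3 and 5000 non-missing values.

```r
set.seed(9)
x_norm <- rnorm(200)          # hypothetical normally distributed sample
x_exp  <- rexp(200)           # hypothetical skewed sample, like a typical RTD
shapiro.test(x_norm)$p.value  # usually large: no evidence against normality
shapiro.test(x_exp)$p.value   # tiny: normality rejected
```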

_from Holger H. Hoos_

-- Main.xulin730 - 22 Apr 2009