Below you will find pages that utilize the taxonomy term “R”
Post
Monty Hall problem - a small simulation in R
The Monty Hall problem is an interesting example of how badly intuition can mislead us in some statistical contexts. Even more disturbing, though, is how long we are prepared to debate and defend an expected result before actually checking our initial guesses with a simple Monte Carlo simulation.
Here is a simple simulation of the Monty Hall game show problem:
In the TV show “Let’s Make a Deal” the host Monty Hall would offer the contestant a choice of three doors.
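A Monte Carlo check of this kind can be sketched in a few lines of R. The sketch below assumes the standard rules, which the excerpt above does not spell out: one door hides the prize, the host always opens a losing door that the contestant did not pick, and the contestant may then switch. The function name `monty_hall` is made up for illustration.

```r
# Simulate n rounds of the three-door game and return the win rate
# of the "stay" and "switch" strategies (standard rules assumed).
monty_hall <- function(n = 10000) {
  prize  <- sample.int(3, n, replace = TRUE)  # door hiding the prize
  choice <- sample.int(3, n, replace = TRUE)  # contestant's first pick
  # The host opens a losing door that is neither the prize nor the pick,
  # so with three doors switching wins exactly when the first pick was wrong.
  stay_wins <- prize == choice
  c(stay = mean(stay_wins), switch = mean(!stay_wins))
}

set.seed(42)
res <- monty_hall()
print(res)
```

The two rates always add up to one in this setup, so the simulation only needs to count how often the first pick was right.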
Post
Slides and blog posts with R and emacs org-mode
Preparing a larger number of slides with R code and plots can be a bit
tedious with standard desktop presentation software like PowerPoint or Keynote.
The manual effort to change the example code, run the analysis and then cut and
paste updated graphs, tables and code is high. Sooner or later one is bound to
create inconsistencies between code and expected results or even syntax errors
Post
Using Data Frames in Feather format (Apache Arrow)
Prompted by the RStudio blog article about feather, I did the one-line install and compared the results on a data frame of 19 million rows. The first results do look promising:
    # build the package
    > devtools::install_github("wesm/feather/R")
    # load an existing data frame (19 million rows with batch job execution results)
    > load("batch-12-2015.
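A comparison along these lines could look roughly like the sketch below. It is not the original benchmark: the toy data frame, its column names and the helper `compare_formats` are invented for illustration, and the feather timings are only collected if the feather package is installed.

```r
# Time writing and reading a data frame as RDS and, if available, as feather.
compare_formats <- function(df) {
  rds_file <- tempfile(fileext = ".rds")
  on.exit(unlink(rds_file), add = TRUE)
  timings <- list(
    rds_write = system.time(saveRDS(df, rds_file))[["elapsed"]],
    rds_read  = system.time(readRDS(rds_file))[["elapsed"]]
  )
  # feather is optional here; skip it quietly when the package is missing
  if (requireNamespace("feather", quietly = TRUE)) {
    feather_file <- tempfile(fileext = ".feather")
    on.exit(unlink(feather_file), add = TRUE)
    timings$feather_write <-
      system.time(feather::write_feather(df, feather_file))[["elapsed"]]
    timings$feather_read <-
      system.time(feather::read_feather(feather_file))[["elapsed"]]
  }
  unlist(timings)  # named numeric vector of elapsed seconds
}

# Small stand-in for the real 19 million row data frame (hypothetical columns)
n <- 1e5
jobs <- data.frame(
  id      = seq_len(n),
  runtime = runif(n),
  status  = sample(c("ok", "failed"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
timings <- compare_formats(jobs)
print(timings)
```

On a frame this small the differences are mostly noise; the numbers only become meaningful at sizes closer to the one described above.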
Post
Setting up an RStudio server for iPad access
Sometimes it can be convenient to run RStudio remotely from an iPad or another machine with little RAM or disk space. This can be done quite easily using the free RStudio Server on OSX via docker. To do this:
Find the rocker/rstudio image on Docker Hub and follow the setup steps here at GitHub.
Once the image is running, you should be able to connect with Safari on the host Mac to the login page eg at
Post
Cached, asynchronous IP resolution
Resolving IP addresses to host names is quite helpful for getting a quick overview of who is connecting from where. This needs some care to avoid straining your DNS server with a large number of repeated lookups. You also may not want to wait for timeouts on IPs that do not resolve. R itself does not support this directly, but it can easily exploit asynchronous DNS lookup tools like adns (on OSX from homebrew) and provide a cache to speed things up.
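The caching half of this idea can be sketched independently of the lookup tool. In the sketch below, `make_cached_resolver` and `stub_resolve` are hypothetical names, and the stub simply invents host names so the example runs offline; in practice the `resolve` argument would wrap a batch call to an external resolver such as adns.

```r
# Wrap any vectorised resolver (IPs in, host names out) with a cache
# so that each distinct IP is looked up at most once.
make_cached_resolver <- function(resolve) {
  cache <- new.env(hash = TRUE)
  function(ips) {
    ips <- as.character(ips)
    seen <- vapply(ips, exists, logical(1), envir = cache, inherits = FALSE)
    unseen <- unique(ips[!seen])
    if (length(unseen) > 0) {
      hosts <- resolve(unseen)  # one batch call for all unseen IPs
      for (i in seq_along(unseen)) assign(unseen[i], hosts[i], envir = cache)
    }
    unname(vapply(ips, get, character(1), envir = cache, inherits = FALSE))
  }
}

# Stand-in resolver (assumption): fabricates names and counts its lookups
calls <- 0
stub_resolve <- function(ips) {
  calls <<- calls + length(ips)
  paste0("host-", gsub(".", "-", ips, fixed = TRUE), ".example.org")
}

lookup <- make_cached_resolver(stub_resolve)
first  <- lookup(c("192.0.2.1", "192.0.2.2", "192.0.2.1"))
second <- lookup(c("192.0.2.2", "192.0.2.3"))
print(calls)
```

Keeping the cache in an environment means repeated calls share results for the lifetime of the R session; persisting it to disk would be a separate step.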
Post
Using R for weblog analysis
Apache Weblog Analysis
Whether you run your own blog or web server or use some hosted service, at some point you may want to know how well your server or your users are doing. Details such as hit frequency, geolocation of users and the distribution of bandwidth used are very useful for this and can be obtained in different ways:
- by instrumenting the page running inside the client browser (eg piwik)
- by analysis of the web server logs (eg webalizer)
For the latter I have used webalizer for several years; it produces nice web-based analysis plots.
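As a rough illustration of the log-based approach in R, the sketch below parses a few made-up records in Apache common log format with a regular expression and derives hits per client and bytes per status code. The sample lines, the pattern and the column names are assumptions for this example, and the code expects every line to match the pattern.

```r
# Made-up access log records (common log format, documentation IP range)
log_lines <- c(
  '192.0.2.10 - - [10/Oct/2015:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326',
  '192.0.2.11 - - [10/Oct/2015:13:56:01 +0200] "GET /about.html HTTP/1.1" 404 512',
  '192.0.2.10 - - [10/Oct/2015:13:57:12 +0200] "GET /index.html HTTP/1.1" 200 2326'
)

# Capture groups: client IP, timestamp, method, path, status, bytes
pattern <- '^(\\S+) \\S+ \\S+ \\[([^]]+)\\] "(\\S+) (\\S+) [^"]*" (\\d{3}) (\\d+|-)$'
parts  <- regmatches(log_lines, regexec(pattern, log_lines, perl = TRUE))
fields <- do.call(rbind, parts)  # column 1 holds the full match

bytes <- fields[, 7]
bytes[bytes == "-"] <- "0"       # Apache logs "-" when no body was sent

hits <- data.frame(
  ip        = fields[, 2],
  timestamp = fields[, 3],
  method    = fields[, 4],
  path      = fields[, 5],
  status    = fields[, 6],
  bytes     = as.numeric(bytes),
  stringsAsFactors = FALSE
)

hits_per_ip      <- table(hits$ip)                       # hit frequency per client
bytes_per_status <- tapply(hits$bytes, hits$status, sum) # bandwidth by status code
print(hits_per_ip)
print(bytes_per_status)
```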
Post
Getting hold of remote weblogs
The last post assumed that the weblogs to analyse are directly accessible to the R session, which may not be the case if your analysis is running on a remote machine. In some cases you may also want to filter out uninteresting log records (eg local clients on the web server or local area accesses from known clients). The next examples show how to modify the previous R script using the R pipe function to take this into account:
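The general shape of such a `pipe()` call can be sketched as follows. This is not the script from the original post: `read_filtered_log` is a hypothetical helper, a temporary local file stands in for the remote log, and the sketch assumes a Unix-like shell with `cat` and `grep` available.

```r
# Read log lines through a shell pipeline and drop records matching `exclude`.
read_filtered_log <- function(fetch_cmd, exclude) {
  cmd <- paste(fetch_cmd, "| grep -v -E", shQuote(exclude))
  con <- pipe(cmd, open = "r")
  on.exit(close(con))
  readLines(con)
}

# Local stand-in for a remote log so the sketch runs without a server
tmp <- tempfile(fileext = ".log")
writeLines(c(
  '127.0.0.1 - - [10/Oct/2015:13:55:36 +0200] "GET /status HTTP/1.1" 200 12',
  '192.168.1.5 - - [10/Oct/2015:13:56:01 +0200] "GET /index.html HTTP/1.1" 200 2326',
  '198.51.100.7 - - [10/Oct/2015:13:57:12 +0200] "GET /index.html HTTP/1.1" 200 2326'
), tmp)

# Exclude the web server itself and a private address range
lines <- read_filtered_log(
  paste("cat", shQuote(tmp)),
  "^(127\\.0\\.0\\.1|192\\.168\\.)"
)
print(lines)

# For a remote server the fetch command might instead look like
# "ssh user@webhost cat /var/log/apache2/access.log" (hypothetical host and path).
unlink(tmp)
```

Filtering on the shell side keeps the uninteresting records from ever being transferred into the R session.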
Post
Connect to a remote, kerberized hadoop cluster
To use a remote Hadoop cluster with Kerberos authentication you will need to get a proper krb5.conf file (eg from /etc/krb5.conf on your remote cluster) and place it at /etc/krb5.conf on your client OSX machine. To use this configuration from your OSX Hadoop client, change your .[z]profile to:
    export HADOOP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
    export YARN_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf"
With Java 1.7 this should be sufficient to detect the default realm, the KDC and any specific authentication options used by your site.