RStudio is a full-featured programming environment for coding in R and, since it comes for free, it’s totally in scope for our Open Analytics duties. The best-known version of this nice piece of software is the desktop one, which is available for Windows, Mac and Linux. It’s not uncommon to see it used as a personal analytics solution, especially where SAS is far too expensive. Sometimes people use RStudio to work locally with R for developing, prototyping and testing, and then deploy their .R files to a remote (heavy) server that runs them with stand-alone R for better performance.

The question is: why do that if one could use RStudio’s bigger brother, RStudio Server?

As a matter of fact, RStudio Server looks exactly the same as RStudio Desktop, but inside a web browser (actually, even the Desktop version has an HTML interface, but it is nicely embedded into the application window). It runs as a remote HTTP daemon, invokes R on the remote machine and saves data in the users’ home directories on the server. This has an obvious performance advantage, as servers are usually more powerful than tiny laptops and noisy workstations.

The only relevant differences are the authentication and resource-scheduling capabilities.

In this tutorial we assume you have a 64-bit Ubuntu server already up and running and that you have sudo privileges on it. If you don’t, you can easily set up a box using Vagrant. You should really consider switching to Vagrant anyway, sooner or later, as it makes test machines cheap to create, reproduce and throw away.

If your system is ready and you are too, let’s start the journey!

Installing R and dependencies

First of all, we’re going to add the CRAN repository to APT, authenticated through its signing key. Replace the URL with that of your nearest CRAN mirror.
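A minimal sketch of this step, not a definitive recipe. It assumes the cloud.r-project.org mirror, the key file published alongside CRAN's Ubuntu packages, and a release served from the -cran40 (R 4.x) repository; check your mirror's Ubuntu README if your release differs.

```shell
# Assumption: the cloud.r-project.org mirror; replace with your nearest CRAN mirror.
CRAN_MIRROR="https://cloud.r-project.org"

# Tools needed to manage extra APT repositories.
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
  software-properties-common dirmngr lsb-release wget

# Trust the key that signs CRAN's Ubuntu packages.
wget -qO- "${CRAN_MIRROR}/bin/linux/ubuntu/marutter_pubkey.asc" \
  | sudo tee /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc > /dev/null

# Register the repository for the running Ubuntu release, then refresh the index.
sudo add-apt-repository -y "deb ${CRAN_MIRROR}/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
sudo apt-get update
```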

Then, it’s time to install Apache, R and all the important dependencies:
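Something along these lines should do. The list of development libraries is an assumption on my side (they are the ones many R packages need in order to compile), so adjust it to your needs.

```shell
# R itself, the headers needed to build packages from source, and Apache.
sudo apt-get install -y r-base r-base-dev apache2 \
  libcurl4-openssl-dev libssl-dev libxml2-dev

# Quick sanity checks: both commands should print a version string.
R --version | head -n 1
apache2ctl -v
```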

It’s useful to do some initial R housekeeping here, and I’m talking about installing a few must-have packages.
It’s important to note that everything you install here is system-wide (and shared among all RStudio users). So don’t add too many packages here!
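As a sketch, a system-wide install can be done in one shot from the shell. The package names below are just a hypothetical starter set, and the repository URL is an assumption; pick your own.

```shell
# Running Rscript through sudo installs into the shared site library
# (on Debian-based systems this is normally /usr/local/lib/R/site-library).
sudo Rscript -e 'install.packages(c("ggplot2", "data.table", "knitr"), repos = "https://cloud.r-project.org")'

# Show the library paths R searches, to confirm where packages live.
Rscript -e '.libPaths()'
```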

(Alternatively, you can start a terminal R session with sudo and run the familiar install.packages from there.)

Installing and configuring RStudio Server

RStudio Server must be downloaded and installed manually. Check the official download page for the URL of the most recent version, then do the following:
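A sketch of the manual install, assuming you go the gdebi route. The URL is deliberately left as a placeholder: paste the one for your Ubuntu release from the download page.

```shell
# Placeholder: paste the .deb URL for your Ubuntu release from the official download page.
RSTUDIO_DEB_URL="<url-of-the-latest-rstudio-server-deb>"

# gdebi installs a local .deb together with its dependencies.
sudo apt-get install -y gdebi-core

# Download the package and install it.
wget "$RSTUDIO_DEB_URL"
sudo gdebi "$(basename "$RSTUDIO_DEB_URL")"
```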

Then, it’s time to do a bit of tuning in /etc/rstudio/rserver.conf. Add or replace the following directives (adjusting the values to your needs and available hardware):
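Here is a hedged sketch covering only the access-related directives. The resource-related ones are left out on purpose, because which of them are available depends on your RStudio Server version and edition. The file name rserver.conf.new is just a staging name I made up, so that you can review the result before touching the real file.

```shell
# Stage the settings in a local file so they can be reviewed before installing.
# www-address / www-port: listen only on the loopback interface, default port.
# auth-required-user-group: only members of this group may log in.
cat > rserver.conf.new <<'EOF'
www-address=127.0.0.1
www-port=8787
auth-required-user-group=rstudio
EOF

# Show the staged file.
cat rserver.conf.new
```

Once it looks right, copy it into place with sudo cp rserver.conf.new /etc/rstudio/rserver.conf. This overwrites the existing file, so merge by hand if you already have custom settings there.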

Among other things (almost all of them resource-related), these directives limit access to users belonging to the rstudio group and restrict connections to those coming from 127.0.0.1. The reasons for these restrictions will become clear later.

Obviously, we need an rstudio group to add RStudio users to.
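Creating the group and enrolling an existing account could look like this (alice is a hypothetical user name, replace it with a real one):

```shell
# Create the group that rserver.conf restricts logins to.
sudo groupadd rstudio

# Add an existing account to it, keeping its other group memberships (-a).
sudo usermod -aG rstudio alice

# Confirm the membership.
getent group rstudio
```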

If you don’t have users already, or for testing purposes, you can use this snippet to create an ad-hoc rstudio user in the group. Don’t forget to write down the password!
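One possible way to do it, assuming the rstudio group already exists. The user gets a home directory, which RStudio needs for storing its data.

```shell
# Create the user with a home directory (-m) and rstudio as its primary group (-g).
sudo useradd -m -g rstudio -s /bin/bash rstudio

# Set the password interactively; you will be prompted twice.
sudo passwd rstudio
```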

Finally, we’re going to set the autostart…

…and do a restart, to apply our changes:
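Both steps, sketched under the assumption that your Ubuntu release uses systemd (on older releases the init system differs, so the first command will not apply as written):

```shell
# Start RStudio Server automatically at boot.
sudo systemctl enable rstudio-server

# Restart it so the new rserver.conf is picked up.
sudo rstudio-server restart

# From the server itself, the login page should now answer on the loopback address.
curl -sI http://127.0.0.1:8787 | head -n 1
```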

RStudio Server is now properly configured and running. In theory, pointing your browser to http://<your_server_ip>:8787, you should get the RStudio login form.

But you won’t, actually. Did something go wrong?

Not really! We just need to do a final configuration task.

Configuring Apache as proxy for RStudio

Remember that we set RStudio’s web server to deny any request coming from an IP other than 127.0.0.1?

Well, the reason behind this apparently insane directive is that RStudio’s internal web server is not designed to provide security (against brute-force login attempts or denial-of-service attacks, for example), load balancing or other advanced features. These meatballs are food for more robust web servers out there!

So, the idea here is to not allow users to access RStudio directly, but to force them to pass through a web proxy. To complete our production-ready environment, we’re going to set up Apache2 as a proxy server for RStudio.

First of all, we need to enable Apache proxy modules. In Ubuntu, this is really trivial:
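With the helper scripts that Ubuntu ships for Apache, enabling the two modules a plain HTTP reverse proxy needs is a one-liner:

```shell
# Enable the generic proxy module and its HTTP backend.
sudo a2enmod proxy proxy_http

# List the modules Apache will load; both proxy entries should show up.
sudo apache2ctl -M | grep proxy
```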

Then, we need to set up a new Virtual Host. Start by copying the default template as rstudio.conf:
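A sketch assuming the Apache 2.4 layout found on recent Ubuntu releases, where the default template is called 000-default.conf (older releases name it differently):

```shell
# Work inside the directory holding the available site definitions.
cd /etc/apache2/sites-available

# Use the default site as a starting point for the new one.
sudo cp 000-default.conf rstudio.conf
```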

Then edit the newly created file (with sudo) so that it contains the following. If you know what you are doing, you can make further changes to the file to suit your needs.
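What follows is a minimal sketch of such a Virtual Host, not necessarily the exact configuration you will end up with. It assumes RStudio Server listens on 127.0.0.1:8787 and should be published under /rstudio. As before, rstudio.conf.new is a made-up staging name.

```shell
# Stage the Virtual Host definition in a local file for review.
cat > rstudio.conf.new <<'EOF'
<VirtualHost *:80>
    # Act as a reverse proxy only, never as an open forward proxy.
    ProxyRequests Off

    # Send /rstudio to /rstudio/ so that relative links resolve correctly.
    RedirectMatch permanent ^/rstudio$ /rstudio/

    # Hand everything under /rstudio/ to the local RStudio Server.
    ProxyPass        /rstudio/ http://127.0.0.1:8787/
    ProxyPassReverse /rstudio/ http://127.0.0.1:8787/
</VirtualHost>
EOF

# Show the staged file.
cat rstudio.conf.new
```

When you are happy with it, install it with sudo cp rstudio.conf.new /etc/apache2/sites-available/rstudio.conf. More recent RStudio Server releases may need extra proxy settings (for WebSockets, for instance), so check the documentation for your version.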

Restart Apache to apply the changes

and finally point your browser to http://<your_server_ip>/rstudio. Log in with valid local-machine credentials for a user that belongs to the rstudio group and start playing.
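The Apache restart mentioned above can be sketched as follows. Enabling the site first is my assumption about what your setup needs. If the stock default site also answers on port 80 it may catch the requests instead, in which case disabling it with sudo a2dissite 000-default is one option.

```shell
# Enable the new site definition.
sudo a2ensite rstudio

# Check the configuration syntax before restarting.
sudo apache2ctl configtest

# Restart Apache to apply the changes.
sudo service apache2 restart

# From the server itself, the proxied path should now answer.
curl -sI http://127.0.0.1/rstudio/ | head -n 1
```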

Tah-Dah! Welcome to your brand-new, full of colors, RStudio Server!

Final notes and next steps

As I said, I consider this a decent production-ready environment for small to mid-sized analytics teams. But if you’re planning to use it in more complex environments, at least three improvements come to mind.

  • Run RStudio in an AppArmor-controlled environment;
  • Configure Apache to serve over HTTPS only;
  • Install R Markdown and a Haskell compiler to build better documentation.

I won’t cover these topics now, as each would probably need a tutorial of its own. But drop a line below if you need them and I’ll be happy to write an article on the specific subject. Thanks for reading and happy coding!
