How to install RStudio Server in a production-ready Ubuntu environment

RStudio is a full-featured programming environment for coding in R and, since it is free, it falls squarely within the scope of our Open Analytics duties. The best-known version of this nice piece of software is the desktop one, which is available for Windows, Mac and Linux. It’s not uncommon to see it used as a personal analytics solution, especially where SAS is far too expensive. Sometimes people use RStudio to work locally with R for developing, prototyping and testing, and then deploy the .R files to a remote (heavy) server, which runs them with stand-alone R for better performance.

The question is: why do that when one could use RStudio’s bigger brother, RStudio Server?


How to build a predictive model using Talend Open Studio and R

There are plenty of scenarios where one would benefit from combining Talend Open Studio and R. The former is perfect for even complex ETL tasks, which by their very nature involve massive data I/O, manipulation, federation and governance, but it completely lacks any serious statistical tool.

On the other hand, R is an absolute standard for statisticians, with a huge number of external packages for practically any kind of analysis one could imagine, but even simple data operations must be hand-coded. R is a very expressive and extensible data language, but one would perhaps prefer to spend time reasoning about the predictive model rather than writing code to get the data out of the database. This is particularly true in data exploitation scenarios, but also in rapid prototyping and, generally speaking, in the whole business world.

As if that were not enough, R is basically a data language plus a command-line executor. This is historically common for statistical software (just think of SAS), so it’s not a flaw in itself. But in a real-life Business Intelligence life cycle, you probably have a corporate standard, a service bus, a protocol for data transfer and so on. A better interface to R is therefore highly advisable.

This is possible using a custom optional component I wrote for Talend. In this tutorial I’ll show you how to use R to build a simple predictive model with data coming from Talend, and how to get the results back into Talend itself, so you can keep all your good ETL habits.
