Getting started with Twitter data analysis using Talend Open Studio

In this step-by-step tutorial I’ll show you how to use Talend Open Studio and the Twitter Components Pack to connect to Twitter, do a simple REST query and build a trivial relevance report on top on it. There’re tons of similar Talend tutorials out there, but no one is focused on my Twitter components pack, which let you do queries and result parsing without writing a single line of custom code. So let’s go into this 101 crash course on how to download tweets and build a real-world analysis on it.

Read more

How to setup Cloudera Hadoop in 10 minutes using Vagrant

Following a survey made for a well estabilished group at LinkedIn, Big Data professionals impute the causes of failure of most Big Data projects to a lack of knowledge on how to set up, deploy and manage a Hadoop cluster. Although I consider this vision a bit too much simplistic, while I endorse a vision where the change management matters, the bootstrap difficulties for the IT departments are indeed huge and steep. Major Hadoop distribution companies, like Cloudera and IBM, are aware of this flaw and provide sophisticated installers and managers, while they offer quick start VMs for developers at same time. First of all, this cannot be enough for those Enterprises which need to put their production environments under control and it’s even not enough for those developers which need to use something more reliable than a single-node virtual cluster. Finally, it’s important to understand how Hadoop works under the hood, but of course without wasting time on configuration details.

Vagrant definitively comes to help. Using a Cloudera Vagrant box, you’ll be able to build a complete scalable cluster that (with some minor tuning) could even be used in production environments. In this tutorial we’re going to build a cluster based on Cloudera Hadoop Distributions in minutes and without user interventions.

Read more

How to install custom components into Talend Open Studio

How to install custom components into Talend Open Studio

Talend Open Studio is a handy ETL tool which amazing extending capabilities and a complete set of tools for building new custom components as I showed in several posts in the past. Talend offers an automatic way to install components through their official marketplace. However, that place is not famous for UX and for being attractive for developers and, as a matter of facts, components hosted there are often poor and outdated. Here I’m going to show a general way to install custom components which works for both Talend Exchange components and third-party hosted ones. For hard-core developers, I’ll also show how to compile a component starting from source code.

Read more

Page 3 of 6123456