In this step-by-step tutorial I’ll show you how to use Talend Open Studio and the Twitter Components Pack to connect to Twitter, run a simple REST query and build a basic relevance report on top of it. There are plenty of similar Talend tutorials out there, but none of them focuses on my Twitter Components Pack, which lets you run queries and parse the results without writing a single line of custom code. So let’s get into this 101 crash course on downloading tweets and building a real-world analysis on top of them.
I’m attending a Coursera class in Social Network Analysis from the University of Michigan. While the course is excellent, I soon realized that it’s rather difficult to obtain enough example datasets to study, especially if you don’t have a solid programming background. Since the course is aimed at data analysts rather than programmers, it seems odd to me that my classmates have to wait for someone to extract, clean up and share these datasets. Network data, however, is usually well structured and low-dimensional, so I think data integration and manipulation software could be an easier way to prepare it. I chose Talend Open Studio for this, as it’s probably the best free data integration platform available today. Out of the box it has no component for writing datasets in a format readable by tools such as Gephi or Pajek, but Talend is Eclipse-based, so it was easy for me to build a custom component that writes .GML files.
In this tutorial I’ll guide you through using this component to write graph files, then I’ll explain a 120-second method for building an endless supply of valid example network datasets, useful for learning, testing, simulations and so on.
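GML itself is a simple plain-text format: a `graph [ ... ]` block containing `node` and `edge` entries. To give an idea of what such a component has to produce, here is a minimal Java sketch (Java being the language Talend generates its jobs in) that turns an edge list into GML. It is not the component’s actual source: the class `GmlSketch` and the method `toGml` are hypothetical names, and the sketch assumes an undirected graph whose node labels contain no double quotes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the Talend component's real code.
public class GmlSketch {

    // Builds a minimal undirected GML document from a list of {source, target} label pairs.
    // Assumes labels contain no double quotes (GML strings cannot hold a raw '"').
    public static String toGml(List<String[]> edges) {
        // Assign integer node ids in order of first appearance.
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String[] e : edges) {
            for (String label : e) {
                ids.putIfAbsent(label, ids.size() + 1);
            }
        }

        StringBuilder sb = new StringBuilder();
        sb.append("graph [\n");
        sb.append("  directed 0\n");

        // One node block per distinct label.
        for (Map.Entry<String, Integer> n : ids.entrySet()) {
            sb.append("  node [\n");
            sb.append("    id ").append(n.getValue()).append('\n');
            sb.append("    label \"").append(n.getKey()).append("\"\n");
            sb.append("  ]\n");
        }

        // One edge block per input pair, referencing the node ids.
        for (String[] e : edges) {
            sb.append("  edge [\n");
            sb.append("    source ").append(ids.get(e[0])).append('\n');
            sb.append("    target ").append(ids.get(e[1])).append('\n');
            sb.append("  ]\n");
        }

        sb.append("]\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] {"alice", "bob"});
        edges.add(new String[] {"bob", "carol"});
        System.out.print(toGml(edges));
    }
}
```

Running `main` prints a three-node, two-edge graph; saved with a `.gml` extension, a file like this should open in Gephi. Assigning integer ids in order of first appearance keeps the edges independent of however the source data names its nodes.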