Talend Open Studio API component development docset for Dash

As anticipated in my previous post, I built a docset that should noticeably speed up Talend Open Studio component development on Mac OS X, especially for those who use Dash, the documentation browser and snippet manager. This docset, built in Apple’s standard format, exposes the full Talend Open Studio 5.2 API in a more organized and searchable way. To use it, follow these steps:

  1. Install Dash from the Mac App Store;
  2. In Preferences > Downloads click the “+” icon and paste the following feed URL: https://s3.amazonaws.com/extra-openanalytics/dash/Talend_Open_Studio.xml;
  3. Click the “Download” button to install the docset automatically;
  4. Close the Preferences window. A new Talend Open Studio entry should be visible among the installed docsets. Now you can explore the full API set or search using the “tos:” prefix (e.g. “tos:IMetadataColumn”).

Unfortunately, namespaces follow Obj-C syntax rather than Java syntax, and I have not been able to fix that yet. I think it’s a bug in my docsetutil configuration; I’ll investigate further. However, this should not detract from the usefulness of the package.

Next steps will include a set of code snippets for Dash, devoted to Talend component development. Have a nice day!

Doxygen/javadoc for Talend Open Studio API for 5.2

Talend Open Studio is a great open source, Eclipse-based platform for developing ETL and data processing workflows, with an easy-to-learn plugin architecture (although not perfect, in my opinion) and a powerful set of APIs for developers. Even though most of these APIs are not meant for ordinary component development (they belong to the designer part and are useful for customizing the GUI perspective), the remaining part is on the classpath of the javajet preprocessor, so it can be used when compiling components. The project is open source, but its documentation is not easily available, so I decided to compile, host and maintain doxygen/javadoc documentation for the whole Talend Open Studio API. It’s fully indexed, supports server-side full-text search, and is built from the svn repository for the 5.2 branch.

Browse the documentation

You can find it here: Talend Open Studio API for 5.2

Here you can find the doxygen definition file used to generate it, in case you need to tweak it and perhaps host the documentation locally. Please remember that you need to adjust all paths (including DOT_PATH, used for graph generation) accordingly!

The next step is to generate a docset, useful to speed up development under Mac OS X, perhaps using an assistant like Dash.

Build GML graphs for Social Network Analysis in Talend

I’m attending a Coursera class in Social Network Analysis from the University of Michigan. While the course is really stunning, I had to realize that it’s kind of difficult to obtain enough example datasets to study, especially if you don’t have a solid programming background. Since the course is targeted at data analysts rather than programmers, it seems really odd to me that my classmates have to wait for someone to extract, clean up and eventually share these datasets. However, network data is usually well structured and low-dimensional, so I think that data integration and manipulation software could be an easier way to prepare these datasets. I chose to do this using Talend Open Studio, as it’s probably the best free data integration platform available nowadays. Out of the box it doesn’t have a component to build datasets in a format readable by, for example, Gephi or Pajek, but Talend is Eclipse-based, so it was easy for me to build a custom component that writes .GML files.

In this tutorial I will guide you through using this component to write graph files, then I’ll explain a 120-second method to build an endless set of valid example network datasets, useful for learning, testing, simulations and so on.
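
If you have never opened a .GML file, it may help to see what such a writer has to produce. What follows is only a minimal sketch in plain Python, not the code of the Talend component: the function name `to_gml` and the tiny sample network are made up for illustration, and it only emits the basic `graph`, `node` and `edge` blocks with `id`, `label`, `source` and `target` keys. Replacing double quotes inside labels with single quotes is a simplification I chose to keep the example short.

```python
def to_gml(nodes, edges, directed=False):
    """Return a minimal GML document as a string.

    nodes: dict mapping an integer node id to its label
    edges: iterable of (source_id, target_id) pairs
    (Hypothetical helper for illustration, not part of Talend.)
    """
    lines = ["graph [", f"  directed {1 if directed else 0}"]

    # One node block per entry, with a numeric id and a quoted label.
    for node_id, label in nodes.items():
        safe_label = str(label).replace('"', "'")  # simplification, see above
        lines += [
            "  node [",
            f"    id {node_id}",
            f'    label "{safe_label}"',
            "  ]",
        ]

    # One edge block per pair; both endpoints must be declared nodes.
    for source, target in edges:
        if source not in nodes or target not in nodes:
            raise ValueError(f"edge ({source}, {target}) uses an unknown node id")
        lines += [
            "  edge [",
            f"    source {source}",
            f"    target {target}",
            "  ]",
        ]

    lines.append("]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up three-person network, just to show the output shape.
    people = {1: "Alice", 2: "Bob", 3: "Carol"}
    friendships = [(1, 2), (2, 3)]
    print(to_gml(people, friendships))
```

Saving the printed text to a file with a .gml extension should give you something a tool like Gephi can open, which is handy for checking by eye what the component writes.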
