Too many tags! Topic modeling of multi-valued tags in data using LDA

Modern web platforms and CMSs follow a general trend of letting users attach a collection of tags to content, i.e. attributes describing the content (a corpus, a dataset, a collection…) that can take multiple values. This is not only true for web data: think of surveys where users can give multiple answers to the same question. As a result, multi-valued tag collections often decorate even otherwise discrete datasets, and their values are free-form by definition. At the end of the day we end up with thousands of distinct tags, most of them so infrequent that they are almost useless for explaining the data and the relations among observations. My idea is to group the tags into topics, both to better understand the data and to reduce dimensionality, using topic modeling techniques such as LDA with Gibbs sampling.
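
To make the idea concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, assuming that each observation's tag set is treated as one "document" and each tag as one "word". The function names (`lda_gibbs`, `doc_topics`, `top_tags`) and the hyperparameter values (`alpha`, `beta`, `iterations`) are hypothetical and purely illustrative; this is a generic textbook sampler, not necessarily the implementation used in the full post. The per-observation topic proportions returned by `doc_topics` are what replace thousands of sparse tag columns with a handful of topic columns.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of tag lists, one per observation (hypothetical input shape).
    Returns (vocab, topic-tag counts, observation-topic counts).
    """
    rng = random.Random(seed)
    vocab = sorted({tag for doc in docs for tag in doc})
    index = {tag: i for i, tag in enumerate(vocab)}
    V, K = len(vocab), num_topics
    nkw = [[0] * V for _ in range(K)]   # how often tag w is assigned to topic k
    ndk = [[0] * K for _ in docs]       # how many tags of observation d are in topic k
    nk = [0] * K                        # total tags assigned to topic k
    z = []                              # current topic of every tag occurrence

    # Random initial assignment of every tag occurrence to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for tag in doc:
            k = rng.randrange(K)
            w = index[tag]
            nkw[k][w] += 1
            ndk[d][k] += 1
            nk[k] += 1
            zd.append(k)
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, tag in enumerate(doc):
                w, k = index[tag], z[d][i]
                # Remove the current assignment from the counts.
                nkw[k][w] -= 1
                ndk[d][k] -= 1
                nk[k] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the new assignment back.
                nkw[k][w] += 1
                ndk[d][k] += 1
                nk[k] += 1
                z[d][i] = k
    return vocab, nkw, ndk


def doc_topics(docs, ndk, alpha=0.1):
    """Smoothed topic proportions per observation (the reduced representation)."""
    K = len(ndk[0])
    return [[(c + alpha) / (len(doc) + K * alpha) for c in row]
            for row, doc in zip(ndk, docs)]


def top_tags(vocab, nkw, n=3):
    """The n most frequent tags in each topic, to help label the topics."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in nkw]


if __name__ == "__main__":
    docs = [["python", "pandas", "numpy"], ["python", "numpy", "scipy"],
            ["beach", "hotel", "flight"], ["travel", "beach", "hotel"]]
    vocab, nkw, ndk = lda_gibbs(docs, num_topics=2, iterations=100)
    print(top_tags(vocab, nkw))
    print(doc_topics(docs, ndk))
```

On a real dataset the number of topics, `alpha` and `beta` would need tuning, and a few hundred sweeps is only a rough starting point for convergence.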

5 tips for effective dashboard design for quantitative marketing

Before we start exploring the deep cliffs of quantitative marketing analysis, I would like to share the set of best practices I always follow when designing an effective dashboard. Although they are focused on a typical marketing use case, they are entirely generic: they provide a set of analytical starting points and some recipes for approaching the design of a visual reporting environment for your company.

Getting started with Twitter data analysis using Talend Open Studio

In this step-by-step tutorial I’ll show you how to use Talend Open Studio and the Twitter Components Pack to connect to Twitter, run a simple REST query and build a trivial relevance report on top of it. There are tons of similar Talend tutorials out there, but none of them focuses on my Twitter Components Pack, which lets you run queries and parse results without writing a single line of custom code. So let’s dive into this 101 crash course on how to download tweets and build a real-world analysis on them.

Read more