This component queries the Majestic SEO backlinks checker to get a list of backlinks pointing to a given URL. It’s built around the excellent set of Majestic SEO web services, using the official Java connector. With this component you can collect a lot of information about who’s linking to a target website, which proves exceptionally useful for SEO analysis.
This component uses the OpenApp protocol to authenticate against Majestic SEO, so it cannot be used with a free Majestic SEO account because of their limitations on API usage. Besides that, please note that you might need to register an OpenApp application to obtain the access token the component needs, so a valid Majestic account is mandatory.
Some features of the component:
- Secure authentication using OpenApp key and access token. No global API passphrase is needed;
- The whole set of attributes of back links is supported;
- No need to bother about data types: crawled data is auto-cast to the target column’s data type (if such a cast is possible; an exception is thrown otherwise);
- Supports both fresh and historical indexes;
- Deleted link retrieval can be disabled if not needed to save resources;
- Sandbox/Live Majestic server switcher (to avoid draining your Majestic resources during development/testing);
- Detailed INFO log with an option to redirect from tLogCatcher instances to stderr;
- Dynamic settings can be used to control every check box during component configuration;
- Written in true OOP.
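To illustrate the auto-cast behaviour mentioned in the list above, here is a minimal sketch of how a raw value could be converted to a column's type, with an exception when the cast is not possible. The class name `ColumnCaster`, its `cast` method and the set of supported types are assumptions made for illustration only; they are not taken from the component's source.

```java
import java.math.BigDecimal;

// Hypothetical helper: NOT the component's actual code, only a sketch of
// "cast to the target column type, or throw if the cast is impossible".
final class ColumnCaster {
    private ColumnCaster() {}

    static Object cast(String raw, Class<?> target) {
        if (raw == null) {
            return null; // nothing to convert
        }
        String value = raw.trim();
        try {
            if (target == String.class) return raw;
            if (target == Integer.class) return Integer.valueOf(value);
            if (target == Long.class) return Long.valueOf(value);
            if (target == Double.class) return Double.valueOf(value);
            if (target == BigDecimal.class) return new BigDecimal(value);
            if (target == Boolean.class) return parseBoolean(value);
        } catch (NumberFormatException e) {
            // The raw value does not fit the numeric target type.
            throw new IllegalArgumentException(
                "Cannot cast '" + raw + "' to " + target.getSimpleName(), e);
        }
        throw new IllegalArgumentException(
            "Unsupported target type: " + target.getName());
    }

    // Strict boolean parsing: anything other than true/false is rejected
    // instead of being silently mapped to false.
    private static Boolean parseBoolean(String value) {
        if (value.equalsIgnoreCase("true")) return Boolean.TRUE;
        if (value.equalsIgnoreCase("false")) return Boolean.FALSE;
        throw new IllegalArgumentException(
            "Cannot cast '" + value + "' to Boolean");
    }
}
```

With this sketch, `ColumnCaster.cast("42", Integer.class)` yields the integer 42, while `ColumnCaster.cast("abc", Integer.class)` throws an `IllegalArgumentException`.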
How to use the component
This component must be used as the start point of a job/subjob. It cannot accept a main data connection, but it can accept an iterate connection if you need to run multiple queries.
After you set the proper OpenApp properties (you may need to register an OpenApp in your Majestic SEO dashboard, API subsection; follow this tutorial and examples on how to do that), set the source page for which you want backlink information. You can enter a domain-level URL (e.g. gabrielebaldassarre.com), a subdomain-level URL (e.g. extras.gabrielebaldassarre.com) or even a page-level URL (e.g. gabrielebaldassarre.com/talend/tmajesticbacklinkinput).
Please note that prefixing the URL with http:// and/or leaving the trailing slash in the URL gives different results!
In my experiments, keeping the http:// prefix or the trailing slash produced a poor result set, so my advice is to strip both.
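If the target URLs come from an upstream flow, the advice above can be applied before the value reaches the component. The following is a minimal sketch; the class name `TargetUrlNormalizer` is hypothetical, and also stripping `https://` is my own assumption, since only `http://` is discussed here.

```java
// Hypothetical helper, not part of the component: strips the scheme and
// any trailing slash from a URL before it is used as the query target.
final class TargetUrlNormalizer {
    private TargetUrlNormalizer() {}

    static String normalize(String rawUrl) {
        String url = rawUrl.trim();
        // Remove a leading http:// (or https://, an assumption), ignoring case.
        url = url.replaceFirst("(?i)^https?://", "");
        // Remove one or more trailing slashes.
        url = url.replaceFirst("/+$", "");
        return url;
    }
}
```

For example, `TargetUrlNormalizer.normalize("http://gabrielebaldassarre.com/")` returns `gabrielebaldassarre.com`, and a URL that is already clean is returned unchanged.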
After that, fill in your output schema and map each column to the proper detail function.
Finally, you can set the query parameters (e.g. which index to use, how to deal with deleted backlinks, which Majestic environment to call) and trigger the query. A detailed header about the response will be printed on-screen (or sent to tLogCatcher), including how many resources have been used and how many remain on your subscription plan.
These options are accessible through the Advanced Parameters tab:
- Backlinks limit: do not retrieve results beyond this limit; useful for heavily linked domains or if you are running out of resources.
- Maximum number of backlinks crawled from the same referral domain
- Maximum number of backlinks crawled from the same source URL (e.g. to count a link plus an image link just once)
- Log redirection from console to tLogCatcher and vice versa.
Please remember that each check box option can be parametrized using the Dynamic Settings tab.
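To make the meaning of the limits in the Advanced Parameters list concrete, here is a small sketch that applies an overall limit and a per-referral-domain limit to a list of source URLs. This only illustrates the semantics on the client side; the class `BacklinkLimiter` and its methods are hypothetical, and I am not claiming the component or the Majestic service enforces the limits this way. For simplicity the sketch treats the host name as the referral domain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "overall limit" and "max per referral domain".
final class BacklinkLimiter {
    private BacklinkLimiter() {}

    // Simplification (assumption): the host part stands in for the referral domain.
    static String referringDomain(String sourceUrl) {
        String s = sourceUrl.replaceFirst("(?i)^https?://", "");
        int slash = s.indexOf('/');
        String host = slash >= 0 ? s.substring(0, slash) : s;
        return host.toLowerCase();
    }

    static List<String> limit(List<String> sourceUrls, int maxTotal, int maxPerDomain) {
        Map<String, Integer> seenPerDomain = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : sourceUrls) {
            if (kept.size() >= maxTotal) {
                break; // overall backlinks limit reached
            }
            String domain = referringDomain(url);
            int seen = seenPerDomain.getOrDefault(domain, 0);
            if (seen >= maxPerDomain) {
                continue; // this referral domain already contributed enough links
            }
            seenPerDomain.put(domain, seen + 1);
            kept.add(url);
        }
        return kept;
    }
}
```

Given three links from `a.com` followed by one from `b.com`, a per-domain limit of 2 keeps the first two `a.com` links and the `b.com` link, while an overall limit of 2 stops after the first two entries.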