Connection configuration between Talend and Cloudera

Cloudera is one of the three major vendors of general-purpose Hadoop distributions, alongside Hortonworks and MapR.

In this post I will show you how to set up a connection to a Cloudera cluster in Talend Open Studio for Big Data (TOSBD).

1. Create a new connection in Talend

In the Metadata section, right-click on Hadoop Cluster and select Create Hadoop Cluster.

In the new window, enter the name of the connection (optionally you can add a purpose and description) and click Next.

In the next window, “Hadoop Configuration Import Wizard”, set the following in order:

  • Distribution = Cloudera
  • Version = in my case it was the highest available version, CDH5.12 with YARN. If you do not see your version, choose the one that is closest to yours.
  • Option = change to “Retrieve configuration from Ambari or Cloudera”.

Once we have everything selected, click the Next button.

Now enter the address of the server where Cloudera Manager is running. The standard port is 7180. You must also provide the user name and password for Cloudera Manager.
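Before clicking through the wizard, it can help to confirm that Cloudera Manager actually answers at that address. Below is a minimal sketch in Python, assuming Cloudera Manager is served over plain HTTP on port 7180 and accepts basic authentication on its REST API. The host name cm.example.com and the admin credentials are placeholders, not values from this setup.

```python
import base64
import urllib.request


def cm_version_url(host: str, port: int = 7180) -> str:
    """Build the URL of the Cloudera Manager API version endpoint."""
    return f"http://{host}:{port}/api/version"


def fetch_cm_api_version(host: str, user: str, password: str, port: int = 7180) -> str:
    """Ask Cloudera Manager which API version it supports (needs network access)."""
    request = urllib.request.Request(cm_version_url(host, port))
    # Basic authentication: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip()


# Example call (placeholder host and credentials):
# print(fetch_cm_api_version("cm.example.com", "admin", "admin"))
```

If the call fails with a connection error or an HTTP 401, the address, port, or credentials are likely to be rejected by the Talend wizard as well.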

When we have all the fields completed, we must:

  1. Click the “Connect” button. After a few seconds, our cluster should appear in the “Discovered clusters” section.
  2. Click on the “Fetch” button.

Then click the “Finish” button.

In the next window, fill in the remaining connection details.

Very important: use host names instead of IP addresses!

The host names may fail to resolve to IP addresses. In that case, add them to the hosts file.

If you do not know how to do it, go to the post: Windows: How to add a server name and IP to the hosts file?
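A quick way to see which host names still need a hosts-file entry is to try resolving them from the machine that runs Talend. The sketch below uses only the Python standard library. The host names and IP address in the usage comment are made up for illustration.

```python
import socket
from typing import Dict, Iterable, Optional


def resolve_hosts(hostnames: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each host name to its IPv4 address, or None if it does not resolve."""
    resolved = {}
    for name in hostnames:
        try:
            resolved[name] = socket.gethostbyname(name)
        except OSError:
            # Resolution failed: this name needs a hosts-file (or DNS) entry
            resolved[name] = None
    return resolved


def hosts_file_line(ip_address: str, hostname: str) -> str:
    """Format one hosts-file entry: IP address, whitespace, host name."""
    return f"{ip_address}\t{hostname}"


# Example (hypothetical cluster nodes):
# for name, ip in resolve_hosts(["node1.example.com", "node2.example.com"]).items():
#     print(name, "->", ip or "NOT RESOLVED")
# print(hosts_file_line("10.0.0.11", "node1.example.com"))
```

Any name reported as not resolved is a candidate for a line in the hosts file, in the format produced by hosts_file_line.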

  • Namenode URI – starts with “hdfs://”. The port is optional; the default is 8020.
  • Resource Manager
  • Resource Manager Scheduler
  • Job History
  • Staging directory
  • User name – the user that will be used for reading/writing data from HDFS.
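To illustrate the rules for the Namenode URI, here is a small Python sketch that validates a value before it is typed into the wizard. It assumes 8020 as the NameNode port when none is given, which is a common HDFS default but may differ on your cluster. The function names are my own and are not part of Talend.

```python
import ipaddress
from typing import Tuple
from urllib.parse import urlsplit

# Assumed default NameNode port; check your cluster configuration
DEFAULT_NAMENODE_PORT = 8020


def is_ip_address(host: str) -> bool:
    """Return True if the string is an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def parse_namenode_uri(uri: str) -> Tuple[str, int]:
    """Split an hdfs:// URI into (host, port), rejecting IP addresses."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs" or not parts.hostname:
        raise ValueError(f"expected hdfs://<host>[:port], got {uri!r}")
    if is_ip_address(parts.hostname):
        raise ValueError(f"use a host name instead of an IP address: {uri!r}")
    # Fall back to the assumed default when the URI has no explicit port
    return parts.hostname, parts.port or DEFAULT_NAMENODE_PORT


# Example (hypothetical host name):
# print(parse_namenode_uri("hdfs://namenode.example.com"))
```

Rejecting IP literals here mirrors the advice above to use host names throughout the connection settings.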

Now check your connection by clicking the “Check Services” button.

You will see a new window where Talend checks the connection to the cluster. If everything is OK, you will see a green bar next to each service.
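If one of the bars does not turn green, a plain TCP check from the same machine can show whether the service port is reachable at all. This sketch is not what Talend does internally; it only tests that a port accepts connections. The host names are placeholders, and the ports in the usage comment (8020, 8032, 8030, 10020) are the usual Hadoop defaults, which your cluster may override.

```python
import socket
from typing import Dict, Tuple


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names
        return False


def check_services(services: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Check every named (host, port) pair and report which ones are reachable."""
    return {name: port_is_open(host, port) for name, (host, port) in services.items()}


# Example (hypothetical hosts, default ports):
# print(check_services({
#     "Namenode": ("namenode.example.com", 8020),
#     "Resource Manager": ("rm.example.com", 8032),
#     "Resource Manager Scheduler": ("rm.example.com", 8030),
#     "Job History": ("jobhistory.example.com", 10020),
# }))
```

A service reported as unreachable usually points to a firewall rule, a wrong port, or a host name that does not resolve.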

Click “Finish”. The defined connection can now be used in subsequent jobs.

