In this post I will show you how to set up a connection between Talend and Cloudera so that you can connect to CDP. By the end of this tutorial you will be able to use Talend with your Cloudera cluster.
Create A New Talend Cloudera Connection
In the Metadata section, right-click on Hadoop Cluster and select Create Hadoop Cluster.
In the new window, enter the name of the connection (optionally you can add a purpose and description) and click Next.
In the next window, “Hadoop Configuration Import Wizard”, set the following in order:
- Distribution = Cloudera
- Version = in my case it was the highest available version, CDH5.12 with YARN. If you do not see your version, choose the one closest to yours.
- Option = change to “Retrieve configuration from Ambari or Cloudera”.
Connect To Cloudera
Once you have everything selected, click the Next button.
Now enter the address of the server where Cloudera Manager runs. The standard port is 7180. You must also provide the user name and password for Cloudera Manager.
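If the wizard cannot connect, it helps to first confirm that the manager port is reachable from the machine where Talend runs. Below is a minimal sketch of such a check. The helper names `cm_base_url` and `is_port_open` are my own, the host name in the usage note is a placeholder, and I am assuming the wizard accepts an address in `http://host:port` form.

```python
import socket

CM_DEFAULT_PORT = 7180  # standard manager port mentioned in this post


def cm_base_url(host: str, port: int = CM_DEFAULT_PORT) -> str:
    """Build the manager address (assumed form: http://host:port)."""
    return f"http://{host}:{port}"


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, `is_port_open("cm.example.com", 7180)` returning `False` suggests a firewall, a wrong port, or a host name that does not resolve. It only tests the TCP connection and says nothing about whether the user and password are correct.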
When all the fields are completed:
- Click “Connect”. After a few seconds, the cluster should appear in the “Discovered clusters” section.
- Click the “Fetch” button.
Then click the “Finish” button.
In the next window, fill in the remaining connection details listed below.
Very important: use host names instead of IP addresses!
Host names may fail to resolve to IP addresses on your machine. In that case, add the host names to the hosts file.
If you do not know how to do that, see the post: Windows: How to add a server name and IP to the hosts file?
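To see quickly which cluster host names your machine cannot resolve, you can use a small sketch like the one below. The function names are hypothetical, and the IP address and host names in the usage note are made-up examples, not values from a real cluster.

```python
import socket


def hosts_entry(ip: str, hostname: str, *aliases: str) -> str:
    """Format one hosts-file line: the IP first, then the name(s)."""
    return " ".join([ip, hostname, *aliases])


def resolves(hostname: str) -> bool:
    """Return True if the local resolver (DNS or hosts file) knows hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror is a subclass of OSError.
        return False
```

If `resolves("node1.example.com")` returns `False`, a line such as the one produced by `hosts_entry("10.0.0.11", "node1.example.com", "node1")` is what you would add to the hosts file, using the real IP address of that node.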
- Namenode URI – starts with “hdfs://”. The port is optional; the default is 8020.
- Resource Manager
- Resource Manager Scheduler
- Job History
- Staging directory
- User name – the user that will be used for reading data from and writing data to HDFS.
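The two rules that are easiest to get wrong in this form are the “hdfs” prefix of the Namenode URI and the use of host names instead of IP addresses. As a rough illustration, here is a sketch that checks a URI against both rules before you paste it into the wizard. `check_namenode_uri` is a name I made up, and the check is independent of Talend itself.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def check_namenode_uri(uri: str) -> List[str]:
    """Return a list of problems found in uri; an empty list means it looks fine."""
    problems = []
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        problems.append("URI must start with hdfs://")
    host = parts.hostname
    if not host:
        problems.append("host name is missing")
    else:
        try:
            ipaddress.ip_address(host)
            problems.append("use a host name instead of an IP address")
        except ValueError:
            pass  # not an IP literal, so it is treated as a host name
    return problems
```

For instance, `check_namenode_uri("hdfs://nn1.example.com")` returns an empty list, while `check_namenode_uri("hdfs://10.0.0.5:8020")` reports the IP address. The host names here are placeholders.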
Now check your connection by clicking the “Check Services” button.
You will see a new window where TOS (Talend Open Studio) checks the connection to the cluster. If everything is OK, you will see a green bar next to each service.
Click “Finish”, and you can now use the defined connection in subsequent jobs. That’s it. I hope this post helps you solve your problem and that you have learned something useful for the future! Enjoy!