Share This Post, Help Others, And Earn My Heartfelt Appreciation! :)
4.8
(69)

When learning the Talend Studio tool [ Tutorial Working with databases ] , we cannot omit the components dedicated to databases. They are an indispensable element of data processing. The database is almost always the target, source or both of them at the same time.

In this training, I will show:

  • how to connect to the database,
  • how to create a table using the Talend component,
  • how to load data from the database and how to save it in the database,
  • how to filter and aggregate data in Talend.

Talend DI Tutorial Working with databases

Creating a database connection

First, let’s create a database connection in the metadata – I will use the MySQL database.

In the repository select Metadata -> Db Connections -> Create Connection

Talend DI Tutorial Working with databases

A window will appear in which you must enter the name of the database connection. Other fields are optional. My connection will be called MySQL-Tutorial. Then select Next to proceed.

Select the database type (MySQL for me), enter login (username), password and server details. The DataBase field is optional.

Talend DI Tutorial Working with databases

Check your connection by selecting Test connection.

You should receive the following message: “<Your database name>” connection successful.

If the connection was successful, select OK and then Finish to close the connection wizard.

Creating tables using metadata

First we need to create a new job process, for me it will be called JOB_100_30_DB. Then look for  tCreateTable component in palette or start typing its name on the designer canvas:

Talend DI Tutorial Working with databases

In the Component view tab, change Property type to Repository, and then click the small button […]. A window will open with the option of choosing a database – choose the connection you just created (for me, this is MySQL-Tutorial).

Talend DI Tutorial Working with databases

After indicating the connection from the repository, the database settings will be automatically loaded into the tCreateTable component. The table we are creating will serve us as the target table of our process. Enter its name as CUSTOMER and change the table settings as Create table if not exits.

Talend DI Tutorial Working with databases

We still have to add a scheme. To do this, select Edit schema […] and then add the columns ORG_ID (Integer), SUM_SALARY (BigDecimal) and CNT_CUST (Integer).

Talend DI Tutorial Working with databases

Adding database components

In this step, we will add the source and target components. Depending on which database you have chosen, find the right component for it, e.g. for MySQL it will be tMysqlInput, for Oracle – tOracleInput. Change Property Type to Respository and point to database connection. Then enter the table name as CUSTOMER_SRC (below you will find a file with sample data to download) and immediately select Schema as Repository.

Source data processed in this training is as follows:

Talend DI Tutorial Working with databases
Talend DI Tutorial Working with databases

To be able to read the table schema automatically, we must add it to the repository. To do this, select your database connection and click Retrive Schema

Talend DI Tutorial Working with databases

In the new window you can filter the table name or simply click Next – then Talend will search all objects available in the database. Then expand the list of tables available in the database and select CUSTOMER_SRC. Click Next.

Talend DI Tutorial Working with databases

In the next step you will see the table scheme. If you do not want to modify it, just click Finish.

We can now return to the tMysqlInput component settings. In the Schema row, click […] and then select the CUSTOMER_SRC table name.

Talend DI Tutorial Working with databases

Also remember to click Guess Query in the component settings, which will match the SQL query for the specified table.

It is time to add the tMysqlOutput component. Set the Component View Property Type as Repository, set the table name as CUSTOMER (built-in type).

Talend DI Tutorial Working with databases

Data filtering and aggregation

Add the tFilterRow component that will filter the data (we will exclude ORG_ID = 20 from further processing). To modify its settings, we must connect this commponent with tMysqlInput.

Add the record in the table using [+] and set InputColumn as ORG_ID (the column by which we will filter the data), Operator as Not equal to and Value as 20.

Talend DI Tutorial Working with databases

In the next step, we will add the tAggregateRow component, which aggregates data according to ORG_ID and calculates the sum of the SALARY field and the number of the CUSTOMER field for the given ORG_ID. To do this, click Edit Schema and then add columns according to the screen below:

Talend DI Tutorial Working with databases

Connect tAggregateRow with tFilterRow without schema propagation. In the Group by table, add the ORG_ID column, and in the Operations table, add SUM_SALARY and CNT_CUST and the appropriate sum (SALARY) and count (ID) functions.

BigData-ETL: Capture 33

Starting the process

Finally, we need to connect tCreateTable with the other components. We will use the On Subjob Ok process trigger for this. We will talk about subprocesses (here: subjob) in the next parts of the training. Now let’s just connect tCreateTable with tMysqlInput. To this end, right-click tCreateTable Trigger -> On Subjob Ok.

Talend DI Tutorial Working with databases

In the final version, our process looks like this:

Talend DI Tutorial Working with databases

We will now go to the Run view tab and start the process by clicking Run.

Then we check the data in the CUSTOMER table:

SELECT * FROM CUSTOMER;
BigData-ETL: Capture 36

If you enjoyed this post please add the comment below and share this post on your Facebook, Twitter, LinkedIn or another social media webpage.
Thanks in advanced!

How useful was this post?

Click on a star to rate it!

Average rating 4.8 / 5. Vote count: 69

No votes so far! Be the first to rate this post.

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments