Could You Please Share This Post? I Appreciate It And Thank YOU! :) Have A Nice Day!

When learning the Talend Studio tool [ Tutorial Working with databases ] , we cannot omit the components dedicated to databases. They are an indispensable element of data processing. The database is almost always the target, source or both of them at the same time.

In this training, I will show:

Talend DI Tutorial Working with databases

Creating a database connection

First, let’s create a database connection in the metadata – I will use the MySQL database.

In the repository select Metadata -> Db Connections -> Create Connection

Talend DI Tutorial Working with databases

A window will appear in which you must enter the name of the database connection. Other fields are optional. My connection will be called MySQL-Tutorial. Then select Next to proceed.

Select the database type (MySQL for me), enter login (username), password and server details. The DataBase field is optional.

Talend DI Tutorial Working with databases

Check your connection by selecting Test connection.

You should receive the following message: “<Your database name>” connection successful.

If the connection was successful, select OK and then Finish to close the connection wizard.

Creating tables using metadata

First we need to create a new job process, for me it will be called JOB_100_30_DB. Then look for  tCreateTable component in palette or start typing its name on the designer canvas:

Talend DI Tutorial Working with databases

In the Component view tab, change Property type to Repository, and then click the small button […]. A window will open with the option of choosing a database – choose the connection you just created (for me, this is MySQL-Tutorial).

Talend DI Tutorial Working with databases

After indicating the connection from the repository, the database settings will be automatically loaded into the tCreateTable component. The table we are creating will serve us as the target table of our process. Enter its name as CUSTOMER and change the table settings as Create table if not exits.

Talend DI Tutorial Working with databases

We still have to add a scheme. To do this, select Edit schema […] and then add the columns ORG_ID (Integer), SUM_SALARY (BigDecimal) and CNT_CUST (Integer).

Talend DI Tutorial Working with databases

Adding database components

In this step, we will add the source and target components. Depending on which database you have chosen, find the right component for it, e.g. for MySQL it will be tMysqlInput, for Oracle – tOracleInput. Change Property Type to Respository and point to database connection. Then enter the table name as CUSTOMER_SRC (below you will find a file with sample data to download) and immediately select Schema as Repository.

Source data processed in this training is as follows:

Talend DI Tutorial Working with databases
Talend DI Tutorial Working with databases

To be able to read the table schema automatically, we must add it to the repository. To do this, select your database connection and click Retrive Schema

Talend DI Tutorial Working with databases

In the new window you can filter the table name or simply click Next – then Talend will search all objects available in the database. Then expand the list of tables available in the database and select CUSTOMER_SRC. Click Next.

Talend DI Tutorial Working with databases

In the next step you will see the table scheme. If you do not want to modify it, just click Finish.

We can now return to the tMysqlInput component settings. In the Schema row, click […] and then select the CUSTOMER_SRC table name.

Talend DI Tutorial Working with databases

Also remember to click Guess Query in the component settings, which will match the SQL query for the specified table.

It is time to add the tMysqlOutput component. Set the Component View Property Type as Repository, set the table name as CUSTOMER (built-in type).

Talend DI Tutorial Working with databases

Data filtering and aggregation

Add the tFilterRow component that will filter the data (we will exclude ORG_ID = 20 from further processing). To modify its settings, we must connect this commponent with tMysqlInput.

Add the record in the table using [+] and set InputColumn as ORG_ID (the column by which we will filter the data), Operator as Not equal to and Value as 20.

Talend DI Tutorial Working with databases

In the next step, we will use the tAggregateRow component in Talend, which aggregates data according to ORG_ID and calculates the sum of the SALARY field and the number of the CUSTOMER field for the given ORG_ID. To do this, click Edit Schema and then add columns according to the screen below:

Talend DI Tutorial Working with databases

Connect tAggregateRow with tFilterRow without schema propagation. In the Group by table, add the ORG_ID column, and in the Operations table, add SUM_SALARY and CNT_CUST and the appropriate sum (SALARY) and count (ID) functions.

BigData-ETL: Capture 33

Starting the process

Finally, we need to connect tCreateTable with the other components. We will use the On Subjob Ok process trigger for this. We will talk about subprocesses (here: subjob) in the next parts of the training. Now let’s just connect tCreateTable with tMysqlInput. To this end, right-click tCreateTable Trigger -> On Subjob Ok.

Talend DI Tutorial Working with databases

In the final version, our process looks like this:

Talend DI Tutorial Working with databases

We will now go to the Run view tab and start the process by clicking Run.

Then we check the data in the CUSTOMER table:

BigData-ETL: Capture 36
Could You Please Share This Post? 
I appreciate It And Thank YOU! :)
Have A Nice Day!


How useful was this post?

Click on a star to rate it!

Average rating 4.8 / 5. Vote count: 69

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

Notify of

Newest Most Voted
Inline Feedbacks
View all comments
Gaius Gracchus

This is puzzling – the ETL process is doing what should be done internally inside the database – the resultant record set would then just need to be passed along to the target. With large datasets, the performance would likely degrade substantially if done as shown above. Databases are OPTIMIZED to do this kind of work so the best practice is to let the database do the filtering and aggregation and pass the finished dataset along to the target. In fact, most modern RDBMS can also concatenate fields in the query. If you want to see an IPaaS/ETL tool that… Read more »

Paweł Cieśla

Thanks for the message. In response to your message, remember the difference between an ETL and an ELT.
Talend also supports the “Pushdown” option you are talking about. Of course, as often as possible, it is recommended to perform all transformations on the base without the need to transport this data over the network to the Talend instance or another tool. Most of the cases described above represent the ETL process and not the ELT. Please bear that in mind!