When learning the Talend Studio tool [ Tutorial" Working with databases ] , we cannot omit the components dedicated to databases. They are an indispensable element of data processing. The database is almost always the target, source or both of them at the same time.
Talend Data Integration" (DI) is a powerful tool for extracting, transforming, and loading data from various sources and platforms. Working with databases is a common use case for Talend" DI, as it allows you to easily connect to and interact with your database" to extract, transform, and load data.
In this training, I will show:
- how to connect to the database,
- how to create a table using the Talend" component,
- how to load data from the database and how to save it in the database,
- how to filter and aggregate data in Talend".
Talend DI Tutorial Working with databases
Creating a database connection
First, let’s create a database connection in the metadata – I will use the MySQL" database.
In the repository select Metadata -> Db Connections -> Create Connection
A window will appear in which you must enter the name of the database connection. Other fields are optional. My connection will be called MySQL-Tutorial. Then select Next to proceed.
Select the database type" (MySQL" for me), enter login" (username), password and server details. The DataBase field is optional.
Check your connection by selecting Test connection.
You should receive the following message: “<Your database name>” connection successful.
If the connection was successful, select OK and then Finish to close the connection wizard.
Creating tables using metadata
First we need to create a new job process, for me it will be called JOB_100_30_DB. Then look for tCreateTable component in palette or start typing its name on the designer canvas:
In the Component view tab, change Property type to Repository, and then click the small button […]. A window will open" with the option of choosing a database – choose the connection you just created (for me, this is MySQL-Tutorial).
After indicating the connection from the repository, the database settings will be automatically loaded into the tCreateTable component. The table we are creating will serve us as the target table" of our process. Enter its name as CUSTOMER and change the table settings as Create table if not exits.
We still have to add a scheme. To do this, select Edit schema […] and then add the columns ORG_ID (Integer), SUM_SALARY (BigDecimal) and CNT_CUST (Integer).
Adding database components
In this step, we will add the source and target components. Depending on which database you have chosen, find the right component for it, e.g. for MySQL" it will be tMysqlInput, for Oracle" – tOracleInput. Change Property Type to Respository and point to database connection. Then enter the table name as CUSTOMER_SRC (below you will find a file with sample data to download) and immediately select Schema as Repository.
Source data processed in this training is as follows:
To be able to read the table schema automatically, we must add it to the repository. To do this, select your database" connection and click Retrive Schema
In the new window you can filter the table name or simply click Next – then Talend" will search all objects available in the database. Then expand the list of tables available in the database and select CUSTOMER_SRC. Click Next.
In the next step you will see the table scheme. If you do not want to modify it, just click Finish.
We can now return to the tMysqlInput component settings. In the Schema row, click […] and then select the CUSTOMER_SRC table name.
Also remember to click Guess Query in the component settings, which will match the SQL" query for the specified table.
It is time to add the tMysqlOutput component. Set the Component View Property Type as Repository, set the table name as CUSTOMER (built-in type).
Data filtering and aggregation
Add the tFilterRow" component that will filter the data (we will exclude ORG_ID = 20 from further processing). To modify its settings, we must connect this commponent with tMysqlInput.
Add the record in the table using [+] and set InputColumn as ORG_ID (the column by which we will filter the data), Operator as Not equal to and Value as 20.
In the next step, we will use the tAggregateRow component in Talend", which aggregates data according to ORG_ID and calculates the sum of the SALARY field and the number of the CUSTOMER field for the given ORG_ID. To do this, click Edit Schema and then add columns according to the screen below:
Connect tAggregateRow with tFilterRow" without schema propagation. In the Group by table, add the ORG_ID column, and in the Operations table, add SUM_SALARY and CNT_CUST and the appropriate sum (SALARY) and count (ID) functions.
Starting the process
Finally, we need to connect tCreateTable with the other components. We will use the On Subjob Ok process trigger for this. We will talk about subprocesses (here: subjob) in the next parts of the training. Now let’s just connect tCreateTable with tMysqlInput. To this end, right-click tCreateTable Trigger -> On Subjob Ok.
In the final version, our process looks like this:
We will now go to the Run view tab and start the process by clicking Run.
Then we check the data in the CUSTOMER table:
SELECT * FROM CUSTOMER;
Here are the general steps for working with databases in Talend" DI:
- Connect to the database: In order to work with a database in Talend" DI, you will first need to connect to it. This can be done through a JDBC connection, which allows you to connect to a wide variety of databases using a standardized interface.
- Extract data from the database: Once you have established a connection to the database, you can use various components in Talend" DI to extract data from the database. For example, you can use the “tMysqlInput” component to extract data from a MySQL" database, or the “tOracleInput” component to extract data from an Oracle" database.
- Transform the data: After extracting the data from the database, you can use various components in Talend" DI to transform and manipulate the data as needed. For example, you can use the “tMap” component to perform transformations such as filtering, sorting, and aggregating data.
- Load the data into another database or file: Once you have transformed the data, you can use various components in Talend" DI to load the data into another database or file. For example, you can use the “tMysqlOutput” component to load the data into a MySQL" database, or the “tFileOutputDelimited” component to write the data to a delimited file.
These are just a few examples of the many ways in which you can work with databases in Talend" DI. If you have any specific questions about working with databases in Talend" DI, feel free to ask.
Could You Please Share This Post? I appreciate It And Thank YOU! :) Have A Nice Day!
We are sorry that this post was not useful for you!
Let us improve this post!
Tell us how we can improve this post?
This Post Has 2 Comments
This is puzzling – the ETL process is doing what should be done internally inside the database – the resultant record set would then just need to be passed along to the target.
With large datasets, the performance would likely degrade substantially if done as shown above. Databases are OPTIMIZED to do this kind of work so the best practice is to let the database do the filtering and aggregation and pass the finished dataset along to the target. In fact, most modern RDBMS can also concatenate fields in the query.
If you want to see an IPaaS/ETL tool that can do it using this method as a better practice you should check out Boomi. (Formerly Dell Boomi)
Thanks for the message. In response to your message, remember the difference between an ETL and an ELT.
Talend also supports the “Pushdown” option you are talking about. Of course, as often as possible, it is recommended to perform all transformations on the base without the need to transport this data over the network to the Talend instance or another tool. Most of the cases described above represent the ETL process and not the ELT. Please bear that in mind!