When learning the Talend Studio tool, we cannot omit the components dedicated to databases. They are an indispensable element of data processing. The database is almost always the target, source or both of them at the same time.
In this training, I will show:
- how to connect to the database,
- how to create a table using the Talend component,
- how to load data from the database and how to save it in the database,
- how to filter and aggregate data in Talend.
Creating a database connection
First, let’s create a database connection in the metadata – I will use the MySQL database.
In the repository select Metadata -> Db Connections -> Create Connection
A window will appear in which you must enter the name of the database connection. Other fields are optional. My connection will be called MySQL-Tutorial. Then select Next to proceed.
Select the database type (MySQL for me), enter login (username), password and server details. The DataBase field is optional.
Check your connection by selecting Test connection.
You should receive the following message: “<Your database name>” connection successful.
If the connection was successful, select OK and then Finish to close the connection wizard.
Creating tables using metadata
First we need to create a new job process, for me it will be called JOB_100_30_DB. Then look for tCreateTable component in palette or start typing its name on the designer canvas:
In the Component view tab, change Property type to Repository, and then click the small button […]. A window will open with the option of choosing a database – choose the connection you just created (for me, this is MySQL-Tutorial).
After indicating the connection from the repository, the database settings will be automatically loaded into the tCreateTable component. The table we are creating will serve us as the target table of our process. Enter its name as CUSTOMER and change the table settings as Create table if not exits.
We still have to add a scheme. To do this, select Edit schema […] and then add the columns ORG_ID (Integer), SUM_SALARY (BigDecimal) and CNT_CUST (Integer).
Adding database components
In this step, we will add the source and target components. Depending on which database you have chosen, find the right component for it, e.g. for MySQL it will be tMysqlInput, for Oracle – tOracleInput. Change Property Type to Respository and point to database connection. Then enter the table name as CUSTOMER_SRC (below you will find a file with sample data to download) and immediately select Schema as Repository.
Source data processed in this training is as follows:
To be able to read the table schema automatically, we must add it to the repository. To do this, select your database connection and click Retrive Schema
In the new window you can filter the table name or simply click Next – then Talend will search all objects available in the database. Then expand the list of tables available in the database and select CUSTOMER_SRC. Click Next.
In the next step you will see the table scheme. If you do not want to modify it, just click Finish.
We can now return to the tMysqlInput component settings. In the Schema row, click […] and then select the CUSTOMER_SRC table name.
Also remember to click Guess Query in the component settings, which will match the SQL query for the specified table.
It is time to add the tMysqlOutput component. Set the Component View Property Type as Repository, set the table name as CUSTOMER (built-in type).
Data filtering and aggregation
Add the tFilterRow component that will filter the data (we will exclude ORG_ID = 20 from further processing). To modify its settings, we must connect this commponent with tMysqlInput.
Add the record in the table using [+] and set InputColumn as ORG_ID (the column by which we will filter the data), Operator as Not equal to and Value as 20.
In the next step, we will add the tAggregateRow component, which aggregates data according to ORG_ID and calculates the sum of the SALARY field and the number of the CUSTOMER field for the given ORG_ID. To do this, click Edit Schema and then add columns according to the screen below:
Connect tAggregateRow with tFilterRow without schema propagation. In the Group by table, add the ORG_ID column, and in the Operations table, add SUM_SALARY and CNT_CUST and the appropriate sum (SALARY) and count (ID) functions.
Starting the process
Finally, we need to connect tCreateTable with the other components. We will use the On Subjob Ok process trigger for this. We will talk about subprocesses (here: subjob) in the next parts of the training. Now let’s just connect tCreateTable with tMysqlInput. To this end, right-click tCreateTable Trigger -> On Subjob Ok.
In the final version, our process looks like this:
We will now go to the Run view tab and start the process by clicking Run.
Then we check the data in the CUSTOMER table:
SELECT * FROM CUSTOMER;
If you enjoyed this post please add the comment below or share this post on your Facebook, Twitter, LinkedIn or another social media webpage.
Thanks in advanced!