Flat files are used in many data processing systems, whether we are talking about data warehouses or Big Data systems. Very often flat files serve as data sources, a result of many years of established practice in companies' business departments. I am sure you have come across data kept by a business department in MS Excel files more than once.
In this tutorial, I will show you how to build a simple integration process that loads data from one flat file into another, how to add components to the designer, and how to link them.
Example Of Flat Files In Talend
Talend provides out-of-the-box components for processing flat files, such as tFileInputDelimited and tFileOutputExcel, both of which we will use in this example.
In the repository, right-click on Job Design -> Create job:
A window will appear in which you must enter the name of the process. The other fields are optional. After clicking Finish, you will see an empty designer canvas.
Adding the source flat file
There are three ways to add a component to the process:
- from the palette, by dragging and dropping the component onto the designer
- from the palette, by placing the mouse cursor on the designer and typing the name of the component
- from the repository
A component added from the palette is completely empty, which means that you need to know the file structure and specify it yourself in the component settings. Adding a file from the repository has the advantage that the file schema is loaded automatically, so you have fewer things to set up. It is also good practice to add all source and target objects to the repository, because they are easier to manage later: a change made in the repository can be propagated to all processes that use the object.
I will show you both ways of adding a source component. First, however, download the source file that we will use in our process.
Adding a flat file schema manually
Find the tFileInputDelimited component in the Files -> Input palette, drag it onto the designer, and then open the component view. Next to the Edit schema attribute you will find a small button; click it. A window will appear in which you must manually enter the column names and set the data types, date patterns and field lengths.
Then set the path to the source file and file parameters:
Adding a flat file using the repository
In the repository, under Metadata, select File Delimited -> Create file delimited.
Enter the path to the file and enter the appropriate format – in my case it is Windows.
In the next step, set:
- the appropriate separator (Comma)
- the CSV format
- the first record as the header
In the last step, Talend will suggest data types for the columns, along with their lengths.
Drag the loaded file to the designer screen.
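To make the settings above more concrete, here is a minimal Java sketch of what reading a comma-separated file with a header row and a typed schema amounts to. It is not the code Talend generates. The class and method names are hypothetical, the column order (FIRST_NAME, NAME, AGE, END_DATE) and the date pattern yyyy-MM-dd are assumptions about the source file, and the simple split does not handle quoted fields that contain commas.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DelimitedReaderSketch {

    // Hypothetical row type holding the four columns used later in the tutorial.
    public static final class Person {
        public final String firstName;
        public final String name;
        public final Integer age;        // null when the field is empty
        public final LocalDate endDate;  // null when the field is empty

        public Person(String firstName, String name, Integer age, LocalDate endDate) {
            this.firstName = firstName;
            this.name = name;
            this.age = age;
            this.endDate = endDate;
        }
    }

    // Assumed date pattern; adjust it to whatever the real file uses.
    private static final DateTimeFormatter DATE_PATTERN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Parses comma-separated lines, treating the first line as a header.
    public static List<Person> parse(List<String> lines) {
        List<Person> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {        // start at 1 to skip the header
            String[] f = lines.get(i).split(",", -1);   // -1 keeps trailing empty fields
            Integer age = f[2].isEmpty() ? null : Integer.valueOf(f[2]);
            LocalDate end = f[3].isEmpty() ? null : LocalDate.parse(f[3], DATE_PATTERN);
            rows.add(new Person(f[0], f[1], age, end));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Person> rows = parse(List.of(
                "FIRST_NAME,NAME,AGE,END_DATE",
                "John,Smith,42,2020-12-31"));
        System.out.println(rows.get(0).firstName + " " + rows.get(0).name);
    }
}
```

The separator, the header flag and the column types in this sketch correspond to the values you confirm in the metadata wizard; in Talend you set them once in the repository instead of writing them in code.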
Adding the target file
As the target file we will use the tFileOutputExcel component. Start typing its name on the designer, then confirm the choice; the component will appear on the designer screen.
In the component tab, set the path to the target file:
We have now created the source and target objects. However, so that the process does more than copy the same data, let's add a transformation: we will combine the FIRST_NAME and NAME attributes. For this we will use the tMap component.
We now have three components in the designer, and we must link them for the process to work properly. Components can be linked in two ways:
- by clicking the output icon (O) on the source component and dragging the arrow that appears to the tMap component
- by right-clicking the source component and selecting Row -> Main (an arrow will appear in the same way and needs to be dragged onto the tMap component)
In the same way, link tMap with the target component.
We now need to configure the tMap component, so double-click it. A window will open in which three columns should be added to the target object: NAME, AGE and END_DATE.
Now we will link source and target attributes. Drag the FIRST_NAME and NAME attributes to the NAME field, then insert a space between them:
row1.FIRST_NAME +" "+ row1.NAME
Drag the AGE and END_DATE attributes to the target.
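The NAME expression above is an ordinary Java string concatenation, so it follows Java's rules for null values: a null FIRST_NAME ends up as the text "null" in the result. The sketch below shows the tutorial's expression next to a null-safe variant. The Row1 class is a hypothetical stand-in for the row structure that tMap exposes as row1, and the class and method names are made up for illustration.

```java
public class FullNameSketch {

    // Hypothetical stand-in for the row that tMap exposes as row1.
    public static final class Row1 {
        public String FIRST_NAME;
        public String NAME;

        public Row1(String firstName, String name) {
            this.FIRST_NAME = firstName;
            this.NAME = name;
        }
    }

    // The expression from the tutorial, unchanged.
    public static String plain(Row1 row1) {
        return row1.FIRST_NAME + " " + row1.NAME;
    }

    // Null-safe variant: missing parts become empty strings and stray spaces are trimmed.
    public static String nullSafe(Row1 row1) {
        String first = row1.FIRST_NAME == null ? "" : row1.FIRST_NAME;
        String last = row1.NAME == null ? "" : row1.NAME;
        return (first + " " + last).trim();
    }

    public static void main(String[] args) {
        System.out.println(plain(new Row1("John", "Smith")));    // John Smith
        System.out.println(plain(new Row1(null, "Smith")));      // null Smith
        System.out.println(nullSafe(new Row1(null, "Smith")));   // Smith
    }
}
```

If the source file can contain empty name fields, the body of the null-safe variant can be written as a single conditional expression in the tMap expression field.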
The process is now ready to run.
Running the job
Go to the Run console and start the process by clicking the Run button.
That's all you need to start processing flat files in Talend. Enjoy!