Flat files are used in many data processing systems, whether data warehouses or Big Data systems. They very often serve as data sources, a result of many years of working practice in companies' business departments. I am sure you have come across data that a business department keeps in MS Excel files more than once.
In this training, I will show you how to build a simple integration process loading data between flat files, how to add components on the designer screen, and how to link them.
Creating the first job
In the repository, right-click on Job Design -> Create job:
A window will appear in which you must enter the name of the process. The other fields are optional. After clicking Finish you will see an empty designer area.
Adding source file
There are three ways to add components to the process:
- adding a component from the palette:
  - drag and drop the component from the palette
  - place the mouse cursor in the designer area and start typing the name of the component
- adding a component from the repository
A component added from the palette comes with an empty configuration, which means that you need to know the file structure and define it yourself in the component settings. Adding a file from the repository has the advantage that the file schema is loaded automatically, so you have fewer things to set up. It is also good practice to add all source and target objects to the repository, because they are then easier to manage in the future: a change made in the repository can be propagated to all processes that use the object.
I will show you how to add a source component in two ways. However, first download the source file that we will use in our process.
Adding a file schema manually
Find the tFileInputDelimited component in the Files -> Input palette, drag it to the designer and then open the component view. Next to the Edit schema attribute you will find a small button; click it. A window will appear in which you must manually enter the column names and set the data types, date mask and field lengths.
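To make it clearer what the manually entered schema describes, here is a minimal Java sketch of converting one raw record into typed values. It is only an illustration, not the code Talend generates. The class and method names are hypothetical, the column names are taken from the attributes used later in this training, and the date mask "yyyy-MM-dd" is an assumption that must be replaced with the mask that matches your file.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class ManualSchemaSketch {

    // Hypothetical typed row: one field per schema column, each with its data type.
    public static final class PersonRow {
        public final String firstName;
        public final String name;
        public final Integer age;
        public final LocalDate endDate;

        public PersonRow(String firstName, String name, Integer age, LocalDate endDate) {
            this.firstName = firstName;
            this.name = name;
            this.age = age;
            this.endDate = endDate;
        }
    }

    // Assumed date mask; it has to match the way dates are written in the file.
    public static final DateTimeFormatter DATE_MASK = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Converts the raw text fields of one record into the types declared in the schema.
    public static PersonRow convert(String[] fields) {
        return new PersonRow(
                fields[0],
                fields[1],
                Integer.valueOf(fields[2].trim()),
                LocalDate.parse(fields[3].trim(), DATE_MASK));
    }

    public static void main(String[] args) {
        // Made-up sample record, used only for illustration.
        PersonRow row = convert(new String[] {"Anna", "Kowalska", "34", "2024-12-31"});
        System.out.println(row.firstName + " " + row.name + ", " + row.age + ", " + row.endDate);
    }
}
```

If a data type or the date mask does not match the file content, the conversion fails, which is why the schema has to reflect the real file structure.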
Then set the path to the source file and file parameters:
Adding a file using repository
In the repository, under Metadata, select File Delimited -> Create file delimited.
Enter the path to the file and enter the appropriate format – in my case it is Windows.
In the next step set:
- appropriate separator (Comma),
- CSV format
- first record as header
In the last step, Talend will suggest data types along with their lengths.
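The separator and header settings chosen above can be pictured with a short Java sketch. It is a simplified illustration under stated assumptions, not Talend's own parser: the class and method names are hypothetical, the sample lines are made up, and quoted fields, which a real CSV reader has to handle, are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedSettingsSketch {

    // Splits each line on the separator and skips the first headerRows lines.
    public static List<String[]> parse(List<String> lines, String separator, int headerRows) {
        List<String[]> rows = new ArrayList<>();
        String separatorRegex = Pattern.quote(separator); // treat the separator literally
        for (int i = headerRows; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            rows.add(line.split(separatorRegex, -1)); // -1 keeps trailing empty fields
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample content: a header record followed by one data record.
        List<String> lines = Arrays.asList(
                "FIRST_NAME,NAME,AGE,END_DATE",
                "Anna,Kowalska,34,2024-12-31");
        for (String[] row : parse(lines, ",", 1)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With the header count set to 1, the first record is used only for the column names and is not loaded as data.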
Drag the loaded file to the designer screen.
Adding target file
As the target file we will use the tFileOutputExcel component. Start typing its name on the designer screen, then confirm the choice, and the component will appear on the designer screen.
In the component tab, set the path to the target file:
We have now created the source and target objects. However, so that the process does not simply copy the same data, let’s add a transformation: we will combine the FIRST_NAME and NAME attributes. For this we will use the tMap component.
We now have three components in the designer, and we must link them for the process to work properly. The components can be linked in two ways:
- clicking with the mouse on the output icon (O) located on the source component and dragging the arrow that appears to the tMap component
- right-clicking on the source component, then selecting Row -> Main (an arrow will likewise appear that needs to be dragged onto the tMap component)
Similarly, tMap should be linked with the target component.
We now need to configure the tMap component, so double-click it. In the window that opens, add three columns to the target object: NAME, AGE and END_DATE.
Now we will link source and target attributes. Drag the FIRST_NAME and NAME attributes to the NAME field, then insert a space between them:
row1.FIRST_NAME +" "+ row1.NAME
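The tMap expression is ordinary Java string concatenation, so it can be tried out outside Talend. The sketch below is a minimal illustration with hypothetical class and method names and made-up sample values. It also shows one thing worth knowing: when a field is null, plain concatenation writes the text "null" into the result, so a null-safe variant is included as one possible alternative.

```java
public class NameMappingSketch {

    // Same logic as the tMap expression: row1.FIRST_NAME + " " + row1.NAME
    public static String plainConcat(String firstName, String name) {
        return firstName + " " + name;
    }

    // Null-safe variant: missing values are treated as empty strings.
    public static String nullSafeConcat(String firstName, String name) {
        String first = firstName == null ? "" : firstName;
        String last = name == null ? "" : name;
        return (first + " " + last).trim();
    }

    public static void main(String[] args) {
        System.out.println(plainConcat("Anna", "Kowalska")); // prints "Anna Kowalska"
        System.out.println(plainConcat("Anna", null));       // prints "Anna null"
        System.out.println(nullSafeConcat("Anna", null));    // prints "Anna"
    }
}
```

Whether the null-safe form is needed depends on whether the source columns can be empty in your data.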
Drag the AGE and END_DATE attributes to the target.
The process is now ready to run.
Go to the Run console and start the process by clicking the Run button.