Flat files are processed in many data systems, whether data warehouses or Big Data platforms. They are very often used as data sources, a consequence of many years of practice in companies' business departments. I am sure you have more than once come across data that a business department keeps in MS Excel files.

Introduction

Flat File

A flat file is a plain-text file that stores data in a simple format: a list of records with no indexes and no embedded relationships between them. Flat files are often used to store data that is not too large and does not need to be processed by a database management system (DBMS).

Flat files are simple and easy to use, and they can be opened and edited with any text editor. They are often used to exchange data between systems that do not use the same DBMS or data storage format.

In a flat file, each line represents a single record, and the values in the record are separated by a delimiter, such as a comma or a tab.
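
To make the record-per-line idea concrete, here is a minimal Java sketch that splits lines into fields. The class name, method name, and sample data are hypothetical, and the sketch deliberately ignores quoted fields, so treat it as an illustration rather than a full parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Minimal sketch: each line is one record, fields are split on a delimiter.
// Class name and sample data are hypothetical.
class FlatFileRecords {

    // Splits one line into fields. Pattern.quote treats the delimiter literally,
    // and the -1 limit keeps trailing empty fields.
    static List<String> parseLine(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        String content = "John,Smith,34\nAnna,Kowalska,\n";
        for (String line : content.split("\n")) {
            System.out.println(parseLine(line, ","));
        }
    }
}
```

The same method works for tab-separated data by passing "\t" as the delimiter.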

Flat files are not as flexible as other data storage formats, such as relational databases or NoSQL databases, because they do not support complex relationships between data or allow for fast data retrieval. However, they are still widely used in many applications due to their simplicity and ease of use.

Flat File Processing

In this tutorial, I will show you how to build a simple integration process that loads data between flat files, how to add components on the designer screen, and how to link them.

Example Of Flat Files In Talend

Talend provides out-of-the-box components for processing flat files, such as:

  • tFileInputDelimited is a component in Talend that allows you to read data from a delimited file, such as a CSV or a tab-separated file. You can use this component to specify the file path, the delimiter character, and the structure of the data in the file.
  • tFileInputExcel is a component in Talend that allows you to read data from an Excel file. You can use this component to specify the file path and the sheet name, and you can select specific columns or rows to read.
  • tFileOutputDelimited is a component in Talend that allows you to write data to a delimited file, such as a CSV or a tab-separated file. You can use this component to specify the file path, the delimiter character, and the structure of the data to be written.
  • tFileOutputExcel is a component in Talend that allows you to write data to an Excel file. You can use this component to specify the file path and the sheet name, and you can select the columns and rows to write.
  • etc…

These components are available in Talend Studio, including the Talend Big Data Platform, and are commonly used to read and write delimited files and Excel files in ETL (extract, transform, load) processes.

Processing Flat Files in Talend

In the repository, right-click on Job Design -> Create job.

A window will appear in which you must enter the name of the process. Other fields are optional. After clicking Finish you see an empty designer box.

Adding source Flat files

There are three ways to add components to the process:

  • adding a component from the palette:
    • drag and drop component from the palette
    • put the mouse cursor in the designer field and start writing the name of the component
  • adding a component from the repository

A component added from the palette starts out unconfigured, which means that you need to know the file structure and define it yourself in the component settings. Adding a file from the repository has the advantage that the file schema is loaded automatically, so you have fewer things to set up. It is also good practice to add all source and target objects to the repository, because they are then easier to manage: a change in the repository can be propagated to all processes that use the object.

I will show you how to add a source component in two ways. However, first download the source file that we will use in our process.

Adding a flat file schema manually

Let’s find the tFileInputDelimited component in the Files -> Input palette, drag it to the designer, and then open the component view. Next to the Edit schema attribute you will find a small button; click it. A window will appear in which you must manually enter the column names and set the data types, date mask, and field lengths.
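
To show what a manually entered schema amounts to, here is a small Java sketch of a typed row. The column names FIRST_NAME, NAME, AGE, and END_DATE come from the mapping used later in this tutorial; the data types, the "yyyy-MM-dd" date mask, and the class itself are my assumptions, so adjust them to your actual file.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical typed row mirroring a manually defined schema.
class PersonRow {
    // Assumed date mask; replace it with the pattern your file really uses.
    static final DateTimeFormatter DATE_MASK = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    final String firstName;   // FIRST_NAME, assumed String
    final String name;        // NAME, assumed String
    final int age;            // AGE, assumed Integer
    final LocalDate endDate;  // END_DATE, assumed Date with DATE_MASK

    PersonRow(String firstName, String name, int age, LocalDate endDate) {
        this.firstName = firstName;
        this.name = name;
        this.age = age;
        this.endDate = endDate;
    }

    // Converts raw text fields into typed values, in column order.
    static PersonRow fromFields(String[] fields) {
        return new PersonRow(
                fields[0],
                fields[1],
                Integer.parseInt(fields[2].trim()),
                LocalDate.parse(fields[3].trim(), DATE_MASK));
    }

    public static void main(String[] args) {
        PersonRow row = fromFields("John,Smith,34,2024-12-31".split(","));
        System.out.println(row.firstName + " " + row.name + ", " + row.age + ", " + row.endDate);
    }
}
```

A wrong type or date mask shows up here as a parse exception, which is roughly the kind of error you would expect when a schema does not match the file.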

Then set the path to the source file and the file parameters.

Adding a flat file using the repository

In the repository, under Metadata, select File Delimited -> Create file delimited.

Enter the path to the file and enter the appropriate format – in my case it is Windows.

In the next step set:

  • appropriate separator (Comma),
  • CSV format
  • first record as header
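
These three settings can be pictured with a short Java sketch: split on commas, respect double-quoted fields (which is roughly what a CSV option implies), and skip the first line as the header. The class and method names are hypothetical, and the code does not reproduce Talend's own parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "comma separator + CSV format + first record as header".
class CsvWithHeader {

    // Splits one CSV line on commas, honouring double-quoted fields;
    // a doubled quote ("") inside a quoted field stands for one literal quote.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Treats the first line as the header and returns only the data rows.
    static List<List<String>> readRows(List<String> lines) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            rows.add(splitCsvLine(lines.get(i)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("FIRST_NAME,NAME,AGE", "John,\"Smith, Jr.\",34");
        System.out.println(readRows(lines));
    }
}
```

Without the quote handling, a value such as "Smith, Jr." would be split into two columns, which is why the CSV option matters for files that contain commas inside values.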

In the last step, Talend will suggest data types along with their lengths.

Drag the loaded file to the designer screen.

Adding target file

As the target file we will use the tFileOutputExcel component. Start typing its name on the designer screen, then confirm the choice, and the component will appear on the designer screen.

In the component tab, set the path to the target file.

Adding transformation

We have already created the source and target objects. However, so that the job does not simply copy the data unchanged, let’s add a transformation: we will combine the FIRST_NAME and NAME attributes. For this we will use the tMap component.

We now have three components in the designer and must link them for the process to work properly. Components can be linked in two ways:

  • clicking with the mouse on the output icon (O) located on the source component and dragging the appearing arrow to the tMap component
  • right-clicking on the source component, then selecting Row -> Main (similarly, an arrow will appear that needs to be dragged onto the tMap component)

Similarly, tMap should be linked with the target component.

We now need to configure the tMap component, so double-click on it. A window will open in which three columns should be added to the target object: NAME, AGE, and END_DATE.

Now we will link source and target attributes. Drag the FIRST_NAME and NAME attributes to the NAME field, then insert a space between them:

row1.FIRST_NAME + " " + row1.NAME
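
Since tMap expressions are Java, the concatenation above behaves like ordinary Java string concatenation, including for missing values. The sketch below compares the plain expression with a null-safe variant; the class and method names are hypothetical, and whether your data contains nulls is an assumption you should check.

```java
// Hypothetical helper comparing the plain concatenation with a null-safe variant.
class FullName {

    // Same logic as the tMap expression: a null value becomes the text "null".
    static String naive(String firstName, String name) {
        return firstName + " " + name;
    }

    // Null-safe variant: missing parts are treated as empty and extra spaces are trimmed.
    static String nullSafe(String firstName, String name) {
        String first = firstName == null ? "" : firstName.trim();
        String last = name == null ? "" : name.trim();
        return (first + " " + last).trim();
    }

    public static void main(String[] args) {
        System.out.println(naive(null, "Smith"));     // prints "null Smith"
        System.out.println(nullSafe(null, "Smith"));  // prints "Smith"
        System.out.println(nullSafe("John", "Smith")); // prints "John Smith"
    }
}
```

If your source file can contain empty names, the same ternary checks could be written directly in the tMap expression instead of a separate helper.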

Drag the AGE and END_DATE attributes to the target.

The process is now ready to run.

Running the job

Go to the run console and start the process by clicking the Run button.

Summary Steps

Talend Studio is a data integration platform that includes a set of tools for designing, developing, and deploying ETL (extract, transform, load) jobs and data pipelines. It allows you to connect to a variety of data sources, extract and transform data, and load it into various target systems.

Talend Studio provides a set of components for processing files, including reading from and writing to delimited files, Excel files, and other types of files. These components can be dragged and dropped onto the design workspace and configured to specify the file path, structure, and other details.

Processing flat files is therefore a regular part of data integration work in Talend, and which of these components you use depends on the specific requirements of your job.

Here are some steps you can follow to process flat files in Talend:

  1. Drag and drop a tFileInputDelimited or tFileInputPositional component from the Palette onto the Designer canvas. These components are used to read flat files with delimited or fixed-width fields, respectively.
  2. Connect the tFileInputDelimited or tFileInputPositional component to a tMap component. The tMap component is used to transform and manipulate data as it is being read from the flat file.
  3. Configure the tFileInputDelimited or tFileInputPositional component by specifying the location and name of the flat file to be read, as well as the delimiter or fixed-width settings.
  4. Use the tMap component to define the mapping and transformation rules for the data being read from the flat file. This can include operations such as filtering rows, merging columns, and performing calculations on data.
  5. Connect the tMap component to a tFileOutputDelimited or tFileOutputPositional component, depending on the format of the output file you want to create. These components are used to write the transformed data to a new flat file.
  6. Configure the tFileOutputDelimited or tFileOutputPositional component by specifying the location and name of the output file, as well as the delimiter or fixed-width settings.
  7. Run the job to process the flat file and create the output file.
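
For readers who think in code, the steps above can be sketched as a plain Java program: read a delimited file, apply a tMap-like transformation, and write a delimited output. This is not the code Talend generates; the class name, the semicolon output delimiter, the column layout, and the temporary files are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical end-to-end sketch of a read -> map -> write job.
class FlatFileJobSketch {

    // The "tMap" part: merge FIRST_NAME and NAME, pass AGE and END_DATE through.
    static String transform(String inputLine) {
        String[] f = inputLine.split(",", -1);
        return f[0] + " " + f[1] + ";" + f[2] + ";" + f[3];
    }

    // Reads the input file, skips its header, transforms each row, writes the output.
    static void run(Path input, Path output) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        out.add("NAME;AGE;END_DATE");
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header
            if (!lines.get(i).trim().isEmpty()) {
                out.add(transform(lines.get(i)));
            }
        }
        Files.write(output, out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real source and target paths.
        Path input = Files.createTempFile("people", ".csv");
        Path output = Files.createTempFile("people_out", ".csv");
        Files.write(input,
                List.of("FIRST_NAME,NAME,AGE,END_DATE", "John,Smith,34,2024-12-31"),
                StandardCharsets.UTF_8);
        run(input, output);
        Files.readAllLines(output, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

In Talend the same flow is expressed visually, with each method here corresponding loosely to one component and the Row -> Main links playing the role of the loop.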

You can also use other components in Talend, such as tFileInputExcel and tFileOutputExcel, to read from and write to Excel files, as well as tFileInputXML and tFileOutputXML to read from and write to XML files.

That’s all about how to start processing flat files in Talend. Enjoy!
