Teradata Primary Index: unique, non-unique, or no PI at all? Three types explained with easy examples


In this post you will learn what the Teradata Primary Index (PI) is, why it is worth choosing carefully, and which types of PI exist. You will also read about a very important Teradata behavior: if you do not specify a primary index, that does not mean the table has none!

Introduction

Teradata

Teradata is a company that specializes in data warehousing and analytics. It offers a variety of products and services to help organizations manage, store, and analyze large amounts of data. Its main product is a database management system that is capable of handling very large datasets and enabling users to perform advanced queries and analyses on the data efficiently.

Teradata is commonly used in businesses and government agencies to support decision making, improve business processes, and extract insights from data.

Database Primary Key

A primary key is a column or set of columns in a database table that is used to uniquely identify each row in the table. A primary key is typically defined as a column or columns that cannot contain null values and must be unique across all rows in the table. In other words, no two rows in the table can have the same primary key value.

The primary key serves several important purposes in a database table:

  1. It ensures the uniqueness and integrity of the data in the table, as no two rows can have the same primary key value.
  2. It allows for efficient searching and indexing of the data in the table, as the primary key can be used as an index to quickly locate specific rows in the table.
  3. It allows for the creation of relationships between tables in a database, as foreign keys in other tables can reference the primary key in the current table.

Database Unique Key

A unique key is a column or set of columns in a database table whose values must be unique across all rows in the table. Unlike a primary key, a unique key may contain null values in many database systems, a table can have more than one unique key, and a unique key does not have to be the main identifier of the table or be used to create relationships with other tables.

Unique keys serve several important purposes in a database table:

  1. They ensure the uniqueness of the data in the table, as no two rows can have the same unique key value.
  2. They allow for efficient searching and indexing of the data in the table, as the unique key can be used as an index to quickly locate specific rows in the table.
  3. They can be used as an alternative to a primary key when the data in the table does not have a single column or set of columns that can be used as a primary key.

Teradata Primary Index – who needs it?

The Teradata Primary Index performs a very important function: it determines which AMP (Access Module Processor) stores each row. If we care about query performance, and anyone choosing Teradata certainly does, it is very important that each AMP stores a similar amount of data. An even distribution of data largely determines how efficiently our operations run.

Each table can have at most one primary index, and it is chosen when the table is created. The primary index columns of a populated table cannot be changed with an ALTER operation.
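Because the primary index is fixed once the table holds data, a common workaround is to copy the rows into a new table that declares the desired index and then swap the names. The following is a minimal sketch, not a definitive procedure: MY_TABLE, MY_TABLE_NEW and NEW_PI_COL are hypothetical names, and a copy made through a subquery may not carry over every attribute of the original definition.

```sql
-- Hypothetical names: MY_TABLE is the existing table,
-- NEW_PI_COL is the column chosen as the new primary index.
CREATE TABLE MY_TABLE_NEW AS (
    SELECT * FROM MY_TABLE
) WITH DATA
PRIMARY INDEX (NEW_PI_COL);

-- Compare the row counts before dropping anything.
SELECT COUNT(*) FROM MY_TABLE;
SELECT COUNT(*) FROM MY_TABLE_NEW;

-- Swap the tables once the copy has been verified.
DROP TABLE MY_TABLE;
RENAME TABLE MY_TABLE_NEW TO MY_TABLE;
```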

Unique Primary Index (UPI)

The Unique Teradata Primary Index (UPI) ensures that the index values are unique and, as a result, that the data is evenly distributed across individual AMPs. What does that mean? Let’s look at the example below. We have 6 subjects: one in study 2000, two in study 2010 and three in study 2050.

SUBJECT_ID  STUDY_ID  SUBJECT_NAME  SUBJECT_STATUS
100         2000      Jan K         1
101         2010      Marzena N     1
102         2010      Anna P        2
103         2050      Mateusz L     1
104         2050      Jakub S       1
105         2050      Zygmunt A     1

Let’s now create a SUBJECT table with a unique primary index.

CREATE TABLE SUBJECT (
SUBJECT_ID INTEGER,
STUDY_ID INTEGER,
SUBJECT_NAME VARCHAR(100),
SUBJECT_STATUS BYTEINT )
UNIQUE PRIMARY INDEX UPI_SUBJECT (SUBJECT_ID);

Cool! We have created a table with a unique primary index. Suppose that the system has 3 AMPs. How was the data spread over them? In this example each AMP received the same share of the records.
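To follow along, the six sample rows can be loaded with plain INSERT statements. The last statement is an extra, made-up row (not part of the sample data) that reuses SUBJECT_ID 100; with the unique primary index in place, Teradata should reject it with a duplicate-key error.

```sql
-- The six sample subjects from the table above.
INSERT INTO SUBJECT VALUES (100, 2000, 'Jan K',     1);
INSERT INTO SUBJECT VALUES (101, 2010, 'Marzena N', 1);
INSERT INTO SUBJECT VALUES (102, 2010, 'Anna P',    2);
INSERT INTO SUBJECT VALUES (103, 2050, 'Mateusz L', 1);
INSERT INTO SUBJECT VALUES (104, 2050, 'Jakub S',   1);
INSERT INTO SUBJECT VALUES (105, 2050, 'Zygmunt A', 1);

-- Made-up row for illustration: SUBJECT_ID 100 already exists,
-- so the unique primary index should cause this insert to fail.
INSERT INTO SUBJECT VALUES (100, 2010, 'Duplicate', 1);
```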


To see why this happened, we need a short explanation of the hash formula that Teradata uses to place records on individual AMPs. Teradata hashes the primary index value of each row, and the result determines which AMP stores the row. With a unique index every row has a different index value, so the hash results generally differ as well. Because the hash values of different index values are unrelated to each other, the records are placed pseudo-randomly but evenly on the AMPs.
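The placement can be inspected directly with Teradata's built-in hash functions. The query below is a small sketch that assumes the SUBJECT table has been loaded with the sample rows; the AMP numbers it returns depend on the configuration of the system it runs on.

```sql
-- For each row: the row hash of the index value, the hash bucket
-- it falls into, and the AMP that owns that bucket.
SELECT SUBJECT_ID,
       HASHROW(SUBJECT_ID)                      AS ROW_HASH,
       HASHBUCKET(HASHROW(SUBJECT_ID))          AS HASH_BUCKET,
       HASHAMP(HASHBUCKET(HASHROW(SUBJECT_ID))) AS AMP_NO
FROM SUBJECT
ORDER BY SUBJECT_ID;
```

Rows that show the same AMP_NO are stored on the same AMP.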

Non-Unique Teradata Primary Index (NUPI)

The Non-Unique Teradata Primary Index (NUPI) guarantees neither the uniqueness of the data nor its even distribution across AMPs. To illustrate NUPI, let’s reuse the subject data presented in the UPI section. Let’s create the table with a non-unique primary index (if you are following along, drop the previous SUBJECT table first):

CREATE TABLE SUBJECT (
SUBJECT_ID INTEGER,
STUDY_ID INTEGER,
SUBJECT_NAME VARCHAR(100),
SUBJECT_STATUS BYTEINT )
PRIMARY INDEX (STUDY_ID);

OK, the table with a NUPI has been created. Notice that the index is now based on the STUDY_ID column, which is not unique. How will the records be spread across our 3 AMPs?

The result of the hash formula for STUDY_ID 2050 is identical for all three of its records, so these records will be kept on one AMP. The same applies to the two records with STUDY_ID 2010.
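A candidate index column can be checked this way before committing to it. The sketch below assumes the SUBJECT table holds the sample rows. The first query counts rows per index value, and the second counts rows per AMP as they would land if STUDY_ID were the primary index. The hash is computed on the expression, so the second query works regardless of the table's current primary index.

```sql
-- Rows per candidate index value: all rows sharing a value
-- end up on the same AMP.
SELECT STUDY_ID, COUNT(*) AS ROW_COUNT
FROM SUBJECT
GROUP BY STUDY_ID
ORDER BY ROW_COUNT DESC;

-- Rows per AMP if STUDY_ID were the primary index.
-- AMPs that would receive no rows do not appear in the result.
SELECT HASHAMP(HASHBUCKET(HASHROW(STUDY_ID))) AS AMP_NO,
       COUNT(*)                               AS ROW_COUNT
FROM SUBJECT
GROUP BY 1
ORDER BY 1;
```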

So, as we can see, the amount of data on individual AMPs differs, which is what we call uneven distribution (skew). Imagine now that one of our AMPs holds 100 million records, while the others hold only 10,000 records each. We have to perform complex calculations on these records. The AMPs with 10,000 records will finish very quickly (let’s assume 5 seconds), while the AMP with 100 million records needs much more time for the same calculations (e.g. 1 minute).

How long will we wait for the result of our query? Yes, 1 minute: the result is returned only when all AMPs have finished their calculations. That is why an even distribution of data across all AMPs is so important.
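For a table that already holds data, the imbalance can be estimated from the per-AMP space figures in the data dictionary. The query below is a sketch that assumes the DBC.TableSizeV view is available (older systems expose it as DBC.TableSize). The skew formula is a commonly used convention rather than an official metric: 0 means perfectly even, and values close to 100 mean that a single AMP holds almost everything.

```sql
-- DBC.TableSizeV has one row per table per AMP; CurrentPerm is the
-- permanent space in bytes that the table uses on that AMP.
SELECT TableName,
       SUM(CurrentPerm) AS TOTAL_BYTES,
       MAX(CurrentPerm) AS MAX_AMP_BYTES,
       AVG(CurrentPerm) AS AVG_AMP_BYTES,
       100 * (1 - CAST(AVG(CurrentPerm) AS FLOAT)
                  / NULLIFZERO(CAST(MAX(CurrentPerm) AS FLOAT))) AS SKEW_PCT
FROM DBC.TableSizeV
WHERE DatabaseName = '<your database name>'
  AND TableName = 'SUBJECT'
GROUP BY TableName;
```

With only six rows the figures are dominated by fixed per-AMP overhead, so the number becomes meaningful only for tables of realistic size.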

Multi-Column Teradata Primary Index

A multi-column primary index is defined on more than one column (up to 64 columns can be used).

The table with multi-column Teradata Primary Index will look like this:

CREATE TABLE SUBJECT (
SUBJECT_ID INTEGER,
STUDY_ID INTEGER,
SUBJECT_NAME VARCHAR(100),
SUBJECT_STATUS BYTEINT )
PRIMARY INDEX (SUBJECT_ID, STUDY_ID);

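Although the index is declared as non-unique here, each combination of SUBJECT_ID and STUDY_ID occurs only once in our data, so the index values are unique in practice (Teradata does not enforce this unless the index is declared UNIQUE). The hash formula uses both columns together, which gives an even distribution of data across the AMPs.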
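Whether a column combination is really unique in the data can be checked with a simple grouping query. If uniqueness should also be enforced, the index can be declared UNIQUE. In the sketch below, SUBJECT_UPI2 is a hypothetical table name used only to avoid clashing with the SUBJECT table.

```sql
-- Any row returned here is a (SUBJECT_ID, STUDY_ID) pair that
-- occurs more than once; an empty result means the pair is unique.
SELECT SUBJECT_ID, STUDY_ID, COUNT(*) AS ROW_COUNT
FROM SUBJECT
GROUP BY SUBJECT_ID, STUDY_ID
HAVING COUNT(*) > 1;

-- Hypothetical variant that enforces uniqueness of the pair.
CREATE TABLE SUBJECT_UPI2 (
SUBJECT_ID INTEGER,
STUDY_ID INTEGER,
SUBJECT_NAME VARCHAR(100),
SUBJECT_STATUS BYTEINT )
UNIQUE PRIMARY INDEX (SUBJECT_ID, STUDY_ID);
```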

What if I do not want to have a primary index?

Tables without a Teradata Primary Index are often used as staging tables for loading data. Their rows are distributed randomly but evenly across all AMPs. If you do not want your table to have a primary index, you must state that explicitly in the table definition!

CREATE TABLE SUBJECT (
SUBJECT_ID INTEGER,
STUDY_ID INTEGER,
SUBJECT_NAME VARCHAR(100),
SUBJECT_STATUS BYTEINT )
NO PRIMARY INDEX ;

If the NO PRIMARY INDEX clause is omitted, a Teradata default kicks in: for a table like ours, with no PRIMARY KEY or UNIQUE constraint and with default system settings, the first column becomes a non-unique primary index (NUPI). Let’s see! We will create the table without defining an index:

CREATE TABLE SUBJECT (
SUBJECT_ID INTEGER,
STUDY_ID INTEGER,
SUBJECT_NAME VARCHAR(100),
SUBJECT_STATUS BYTEINT );

And now we will check whether our table got a primary index:

SELECT DATABASENAME, TABLENAME, COLUMNNAME, INDEXTYPE, INDEXNUMBER, INDEXNAME
FROM DBC.INDICES
WHERE DATABASENAME='<your database name>' AND TABLENAME='SUBJECT';

That’s right, the Teradata Primary Index was created on the first column (SUBJECT_ID).
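Two built-in statements offer a quicker way to confirm the same thing without querying the dictionary views. This is a minimal sketch that assumes the SUBJECT table exists in the current default database.

```sql
-- Returns the table DDL as stored by Teradata; a defaulted
-- primary index should appear as a PRIMARY INDEX clause.
SHOW TABLE SUBJECT;

-- Lists the indexes on the table, including whether each one is
-- unique, whether it is primary or secondary, and its columns.
HELP INDEX SUBJECT;
```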

Summary

Teradata is a company that provides data warehousing and analytic software, services, and hardware. It was founded in 1979 and is headquartered in San Diego, California. The company’s main product is a database management system that is designed specifically for data warehousing and analytics applications. Teradata’s database management system is capable of handling very large amounts of data and is used by businesses and organizations around the world to store and analyze their data. In addition to its database management system, Teradata also offers a range of other products and services related to data warehousing and analytics, including consulting, training, and support.

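In the Teradata database management system, a primary key is a column or set of columns that uniquely identifies each row in a table and enforces the uniqueness and integrity of the data. The primary index is a different concept: it determines how rows are distributed across AMPs, it can be unique (UPI) or non-unique (NUPI), and a table can also be created with no primary index at all.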
