In this post, you will learn what the primary index is (PI), why it is worth determining it, and you will learn about types of PI. You will also read about the very important feature of Teradata – well, if you do not specify the primary index, it does not mean that it is not there!
Primary index – who needs it?
The primary index in Teradata performs a very important function – it determines which AMP will store our data on. If we care about query performance, and when choosing Teradate it certainly is, it is very important that each AMP stores a similar amount of data. Yes, an even distribution of data determines the efficiency of our operations.
The each table can have only one primary index and must be specified when creating the table. It is not possible to specify it during the ALTER operation.
Unique Primary Index (UPI)
The Unique Primary Index (UPI) ensures that the data is unique and evenly distributed across individual AMPs. What does it mean? Let’s look at the example below. We have 6 subjects, one of them in study 2000, two in study 2010 and three of them in study 2050.
Let’s create now a SUBJECT table with a unique primary index.
CREATE TABLE SUBJECT ( SUBJECT_ID INTEGER, STUDY_ID INTEGER, SUBJECT_NAME VARCHAR(100), SUBJECT_STATUS BYTEINT ) UNIQUE PRIMARY INDEX UPI_SUBJECT (SUBJECT_ID);
Cool! We have already created a table with a unique primary index. We can suppose that have 3 AMPs. How was the data spread over them? Each of them got the same portion of records.
To clarify why this happened, we need to introduce the short explanation of the hash formula, which Teradata uses to locate records on individual AMPs. Each index value is different, so it also has a different hash formula result. Due to the lack of relationship between the hash values of the formula, the records are placed randomly but evenly on AMPs.
Non-Unique Primary Index (NUPI)
The Non-Unique Primary Index (NUPI) does not ensure the uniqueness of the data and even distribution on AMPs. To illustrate NUPI, let’s use subject data presented in UPI. Let’s create tables with a non-unique primary index:
CREATE TABLE SUBJECT ( SUBJECT_ID INTEGER, STUDY_ID INTEGER, SUBJECT_NAME VARCHAR(100), SUBJECT_STATUS BYTEINT ) PRIMARY INDEX (STUDY_ID);
Ok, table with NUPI was created. Please notice only that now the index is based on the STUDY_ID column, which is not unique. How will the records be spread across our 3 AMPs?
The result of the hash formula for STUDY_ID equal to 2050 is identical in three cases, therefore these records will be kept on one AMP. Similarly for records with ID 2010.
So as we can see, the amount of data in individual AMPs are different – so we are talking about uneven distribution. Imagine now that one of our AMPs holds 100 million records, while the others keep only 10,000 records. We have to perform complex calculations on our records. AMPs that contain 10,000 records will be able to calculate calculations very quickly (let’s assume 5 seconds), while AMPs containing 100 million records need more time for the same calculations (e.g. 1 minute). What will be the time to receive the result of our query? Yes, it will be 1 minute – the result will be return when all AMPs finish their calculations. That is why the even distribution of data on all AMPs is so important.
Multi-Column Primary Index
Multi-Column Primary Index is used on many columns (a maximum of 64 columns can be used).
The table with multi-column primary index will look like this:
CREATE TABLE SUBJECT ( SUBJECT_ID INTEGER, STUDY_ID INTEGER, SUBJECT_NAME VARCHAR(100), SUBJECT_STATUS BYTEINT ) PRIMARY INDEX (SUBJECT_ID, STUDY_ID);
Although we use a non-unique index here, including many columns in the index makes it unique. The hash formula uses both columns, ensuring even distribution of data in individual AMPs.
What if I do not want to have a primary index?
The creating tables without a primary index is often used to feed stage tables. It allows for random but even distribution of records on all AMPs. If you do not want your table to have a primary index, you must include it in the table definition!
CREATE TABLE SUBJECT ( SUBJECT_ID INTEGER, STUDY_ID INTEGER, SUBJECT_NAME VARCHAR(100), SUBJECT_STATUS BYTEINT ) NO PRIMARY INDEX ;
The lack of the above NO PRIMARY INDEX statement will cause Teradata feature to be called, which consists in assigning the first column as a non-unique main index (NUPI). Let’s see! We will create tables without defining the index:
CREATE TABLE SUBJECT ( SUBJECT_ID INTEGER, STUDY_ID INTEGER, SUBJECT_NAME VARCHAR(100), SUBJECT_STATUS BYTEINT );
And now we will check whether our table got the main index:
SELECT DATABASENAME, TABLENAME, COLUMNNAME, INDEXTYPE, INDEXNUMBER, INDEXNAME FROM DBC.INDICES WHERE DATABASENAME='<your database name>' AND TABLENAME='SUBJECT';
That’s right, the primary index was created on the first column (SUBJECT_ID).