Run Cloudera Quickstart Using Docker – Easy Steps & Setup In 5 Mins!

Run Cloudera QuickStart using Docker

In this short post I will show how you can run Cloudera QuickStart using Docker. As you may know from my previous post, I am a big fan of Docker and everything related to it. It’s a great tool that I use in many situations, because it makes it very easy to run a specific application or set up a complex environment in a few minutes. Additionally, it lets us manage the application easily.

Introduction

What Is Cloudera Data Platform (CDP)?

The Cloudera Data Platform (CDP) is a data cloud designed for businesses. Businesses may use CDP to manage and secure the entire data lifecycle – gathering, enriching, analyzing, testing, and predicting with their data – in order to gain actionable insights and make data-driven decisions. To process enterprise data volumes, multi-stage analytic pipelines are required for the most valuable and disruptive business use cases. CDP enables businesses to succeed in the digital transformation era by enabling them to extract value from large-scale, complex, distributed, and quickly changing data.

Docker And Docker Images

Docker is a containerization platform that allows you to package and run applications in isolated environments called containers. Containers provide a convenient and lightweight way to package and deploy applications, making it easy to run the same code on different systems and environments.

Docker images are the templates that are used to create Docker containers. They contain all the necessary code, libraries, and dependencies needed to run an application. When you run a Docker container, you are running an instance of a Docker image.

To use Docker, you will need to install the Docker engine on your system. Once the engine is installed, you can use the docker command-line tool to manage Docker images and containers. Some common Docker commands include:

  • docker pull IMAGE_NAME: This command downloads a Docker image from a registry (such as Docker Hub) to your local system.
  • docker run IMAGE_NAME: This command runs a new Docker container based on the specified image.
  • docker ps: This command lists all the running Docker containers on your system.
  • docker stop CONTAINER_ID: This command stops the specified Docker container.

For more information on Docker and how to use it, consult the Docker documentation or visit the Docker website.
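
The commands listed above can be strung together into one small lifecycle. The sketch below is only an illustration: the nginx image and the container name demo are placeholders I picked, and run_cmd and DRY_RUN are hypothetical helper names, not part of Docker. By default it only prints each command so you can review it first. Set DRY_RUN=0 to actually execute the commands.

```shell
#!/bin/sh
# Dry-run helper: prints each command instead of executing it,
# unless DRY_RUN=0 is set. (run_cmd and DRY_RUN are illustrative names.)
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Walk one image through pull -> run -> ps -> stop -> rm.
docker_lifecycle() {
    image="$1"   # image name, e.g. nginx (placeholder)
    name="$2"    # container name, e.g. demo (placeholder)
    run_cmd docker pull "$image"                    # download the image
    run_cmd docker run -d --name "$name" "$image"   # start it in the background
    run_cmd docker ps                               # list running containers
    run_cmd docker stop "$name"                     # stop the container by name
    run_cmd docker rm "$name"                       # remove the stopped container
}

docker_lifecycle nginx demo
```

Giving the container a name with --name means you can stop and remove it without looking up its ID in the docker ps output.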

Cloudera Quickstart As Docker Image

Cloudera QuickStart is a fully-functional Apache Hadoop distribution that is packaged in a Docker container. It allows you to quickly and easily set up a Cloudera Hadoop environment on your local machine for testing and development purposes.

To use Cloudera QuickStart with Docker, you will need to have Docker installed on your system. You can then use the docker pull command to download the Cloudera QuickStart Docker image from a registry such as Docker Hub.


One Command -> Cloudera QuickStart Using Docker

To get the Cloudera QuickStart environment up and running, use the command below. I additionally mounted the host’s “/src” directory into the container, but of course you don’t have to do this.

sudo docker run --hostname=quickstart.cloudera --privileged=true -t -i -v /src:/src --publish-all=true -p 8888 -p 7180 cloudera/quickstart /usr/bin/docker-quickstart

Now please wait a few seconds or minutes. If you don’t have the image locally, it will be downloaded first, and after that the container will be created.
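
One thing to keep in mind about the command above: -p 8888 and -p 7180 name only the container port, so Docker picks a random free host port for each of them. The docker port command shows which host port was chosen. The sketch below turns one line of that output into a localhost URL. The helper name url_from_mapping and the sample mapping 0.0.0.0:32768 are my own illustrations, and your port numbers will differ.

```shell
#!/bin/sh
# Turn one line of `docker port` output (for example "0.0.0.0:32768")
# into a localhost URL. url_from_mapping is a hypothetical helper name.
url_from_mapping() {
    mapping="$1"
    # Drop everything up to and including the last colon, leaving the port.
    port="${mapping##*:}"
    echo "http://localhost:${port}"
}

# With a running container you could feed it real output, for example:
#   cid="$(docker ps -q --filter ancestor=cloudera/quickstart | head -n 1)"
#   url_from_mapping "$(docker port "$cid" 8888 | head -n 1)"

# Self-contained demonstration with a sample mapping line:
url_from_mapping "0.0.0.0:32768"
```

If you prefer predictable ports, you can publish them explicitly instead, for example -p 8888:8888, which maps host port 8888 to container port 8888.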

When the container (Cloudera QuickStart) is up and running, look for a log entry like this one: STARTUP_MSG: host = quickstart.cloudera/172.17.0.2. You can now use this IP address to connect to Hue, for instance.
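
If you don’t want to scroll through the output by hand, the IP address can be pulled out of the STARTUP_MSG host line with a small filter. This is a minimal sketch that assumes the line format shown above. The function name extract_startup_ip is hypothetical.

```shell
#!/bin/sh
# Read log text on stdin and print the IP address from the first
# "STARTUP_MSG:   host = <hostname>/<ip>" line.
# extract_startup_ip is a hypothetical helper name.
extract_startup_ip() {
    sed -n 's|^STARTUP_MSG:[[:space:]]*host = [^/]*/\(.*\)$|\1|p' | head -n 1
}

# Against a live container you could run something like:
#   docker logs "$cid" 2>&1 | extract_startup_ip
# where $cid is the container ID reported by `docker ps`.

# Self-contained demonstration using lines copied from the startup log:
printf '%s\n' \
    'STARTUP_MSG: Starting JobHistoryServer' \
    'STARTUP_MSG:   host = quickstart.cloudera/172.17.0.2' \
    'STARTUP_MSG:   args = []' | extract_startup_ip
```

Another option that does not depend on the log format is to ask Docker directly with docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' followed by the container ID.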


Cloudera QuickStart -> Docker Logs

Here you can see the logs from the Cloudera QuickStart container.

Starting mysqld:                                           [  OK  ]

if [ "$1" == "start" ] ; then
    if [ "${EC2}" == 'true' ]; then
        FIRST_BOOT_FLAG=/var/lib/cloudera-quickstart/.ec2-key-installed
        if [ ! -f "${FIRST_BOOT_FLAG}" ]; then
            METADATA_API=http://169.254.169.254/latest/meta-data
            KEY_URL=${METADATA_API}/public-keys/0/openssh-key
            SSH_DIR=/home/cloudera/.ssh
            mkdir -p ${SSH_DIR}
            chown cloudera:cloudera ${SSH_DIR}
            curl ${KEY_URL} >> ${SSH_DIR}/authorized_keys
            touch ${FIRST_BOOT_FLAG}
        fi
    fi
    if [ "${DOCKER}" != 'true' ]; then
        if [ -f /sys/kernel/mm/redhat_transparent_hugepage/defrag ]; then
            echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
        fi

        cloudera-quickstart-ip
        HOSTNAME=quickstart.cloudera
        hostname ${HOSTNAME}
        sed -i -e "s/HOSTNAME=.*/HOSTNAME=${HOSTNAME}/" /etc/sysconfig/network
    fi

    (
        cd /var/lib/cloudera-quickstart/tutorial;
        nohup python -m SimpleHTTPServer 80 &
    )

    # TODO: check for expired CM license and update config.js accordingly
fi
+ '[' start == start ']'
+ '[' '' == true ']'
+ '[' true '!=' true ']'
+ cd /var/lib/cloudera-quickstart/tutorial

+ nohup python -m SimpleHTTPServer 80
nohup: appending output to `nohup.out'
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... STARTED
starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-quickstart.cloudera.out
Started Hadoop datanode (hadoop-hdfs-datanode):            [  OK  ]
starting journalnode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-journalnode-quickstart.cloudera.out
Started Hadoop journalnode:                                [  OK  ]
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Started Hadoop namenode:                                   [  OK  ]
starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-quickstart.cloudera.out
Started Hadoop secondarynamenode:                          [  OK  ]

Setting HTTPFS_HOME:          /usr/lib/hadoop-httpfs
Using   HTTPFS_CONFIG:        /etc/hadoop-httpfs/conf
Sourcing:                    /etc/hadoop-httpfs/conf/httpfs-env.sh
Using   HTTPFS_LOG:           /var/log/hadoop-httpfs/
Using   HTTPFS_TEMP:           /var/run/hadoop-httpfs
Setting HTTPFS_HTTP_PORT:     14000
Setting HTTPFS_ADMIN_PORT:     14001
Setting HTTPFS_HTTP_HOSTNAME: quickstart.cloudera
Setting HTTPFS_SSL_ENABLED: false
Setting HTTPFS_SSL_KEYSTORE_FILE:     /var/lib/hadoop-httpfs/.keystore
Setting HTTPFS_SSL_KEYSTORE_PASS:     password
Using   CATALINA_BASE:       /var/lib/hadoop-httpfs/tomcat-deployment
Using   HTTPFS_CATALINA_HOME:       /usr/lib/bigtop-tomcat
Setting CATALINA_OUT:        /var/log/hadoop-httpfs//httpfs-catalina.out
Using   CATALINA_PID:        /var/run/hadoop-httpfs/hadoop-httpfs-httpfs.pid

Using   CATALINA_OPTS:       
Adding to CATALINA_OPTS:     -Dhttpfs.home.dir=/usr/lib/hadoop-httpfs -Dhttpfs.config.dir=/etc/hadoop-httpfs/conf -Dhttpfs.log.dir=/var/log/hadoop-httpfs/ -Dhttpfs.temp.dir=/var/run/hadoop-httpfs -Dhttpfs.admin.port=14001 -Dhttpfs.http.port=14000 -Dhttpfs.http.hostname=quickstart.cloudera
Using CATALINA_BASE:   /var/lib/hadoop-httpfs/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/run/hadoop-httpfs
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/hadoop-httpfs/hadoop-httpfs-httpfs.pid
Started Hadoop httpfs (hadoop-httpfs):                     [  OK  ]
starting historyserver, logging to /var/log/hadoop-mapreduce/mapred-mapred-historyserver-quickstart.cloudera.out
19/12/20 09:19:27 INFO hs.JobHistoryServer: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting JobHistoryServer
STARTUP_MSG:   host = quickstart.cloudera/172.17.0.2
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0-cdh5.7.0
STARTUP_MSG:   build = http://github.com/cloudera/hadoop -r c00978c67b0d3fe9f3b896b5030741bd40bf541a; compiled by 'jenkins' on 2016-03-23T18:36Z
STARTUP_MSG:   java = 1.7.0_67
************************************************************/
Started Hadoop historyserver:                              [  OK  ]
starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-quickstart.cloudera.out
Started Hadoop nodemanager:                                [  OK  ]
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-quickstart.cloudera.out
Started Hadoop resourcemanager:                            [  OK  ]
starting master, logging to /var/log/hbase/hbase-hbase-master-quickstart.cloudera.out
Started HBase master daemon (hbase-master):                [  OK  ]
starting rest, logging to /var/log/hbase/hbase-hbase-rest-quickstart.cloudera.out
Started HBase rest daemon (hbase-rest):                    [  OK  ]
starting thrift, logging to /var/log/hbase/hbase-hbase-thrift-quickstart.cloudera.out
Started HBase thrift daemon (hbase-thrift):                [  OK  ]
Starting Hive Metastore (hive-metastore):                  [  OK  ]
Started Hive Server2 (hive-server2):                       [  OK  ]
Starting Sqoop Server:                                     [  OK  ]
Sqoop home directory: /usr/lib/sqoop2
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:       -Xmx1024m
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /var/lib/sqoop2/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/tmp/sqoop2
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/sqoop2/sqoop-server-sqoop2.pid
Starting Spark history-server (spark-history-server):      [  OK  ]
Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-quickstart.cloudera.out
hbase-regionserver.
Starting hue:                                              [  OK  ]
Started Impala State Store Server (statestored):           [  OK  ]

Setting OOZIE_HOME:          /usr/lib/oozie
Sourcing:                    /usr/lib/oozie/bin/oozie-env.sh
  setting JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native"
  setting OOZIE_DATA=/var/lib/oozie
  setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
  setting CATALINA_TMPDIR=/var/lib/oozie
  setting CATALINA_PID=/var/run/oozie/oozie.pid
  setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment
  setting OOZIE_HTTPS_PORT=11443
  setting OOZIE_HTTPS_KEYSTORE_PASS=password
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
  setting OOZIE_CONFIG=/etc/oozie/conf
  setting OOZIE_LOG=/var/log/oozie
Using   OOZIE_CONFIG:        /etc/oozie/conf
Sourcing:                    /etc/oozie/conf/oozie-env.sh
  setting JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native"
  setting OOZIE_DATA=/var/lib/oozie
  setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
  setting CATALINA_TMPDIR=/var/lib/oozie
  setting CATALINA_PID=/var/run/oozie/oozie.pid
  setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment
  setting OOZIE_HTTPS_PORT=11443
  setting OOZIE_HTTPS_KEYSTORE_PASS=password
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
  setting OOZIE_CONFIG=/etc/oozie/conf
  setting OOZIE_LOG=/var/log/oozie
Setting OOZIE_CONFIG_FILE:   oozie-site.xml
Using   OOZIE_DATA:          /var/lib/oozie
Using   OOZIE_LOG:           /var/log/oozie
Setting OOZIE_LOG4J_FILE:    oozie-log4j.properties
Setting OOZIE_LOG4J_RELOAD:  10
Setting OOZIE_HTTP_HOSTNAME: quickstart.cloudera
Setting OOZIE_HTTP_PORT:     11000
Setting OOZIE_ADMIN_PORT:     11001
Using   OOZIE_HTTPS_PORT:     11443
Setting OOZIE_BASE_URL:      http://quickstart.cloudera:11000/oozie
Using   CATALINA_BASE:       /var/lib/oozie/tomcat-deployment
Setting OOZIE_HTTPS_KEYSTORE_FILE:     /var/lib/oozie/.keystore
Using   OOZIE_HTTPS_KEYSTORE_PASS:     password
Setting OOZIE_INSTANCE_ID:       quickstart.cloudera
Setting CATALINA_OUT:        /var/log/oozie/catalina.out
Using   CATALINA_PID:        /var/run/oozie/oozie.pid

Using   CATALINA_OPTS:        -Doozie.https.port=11443 -Doozie.https.keystore.pass=password -Xmx1024m -Doozie.https.port=11443 -Doozie.https.keystore.pass=password -Xmx1024m -Dderby.stream.error.file=/var/log/oozie/derby.log
Adding to CATALINA_OPTS:     -Doozie.home.dir=/usr/lib/oozie -Doozie.config.dir=/etc/oozie/conf -Doozie.log.dir=/var/log/oozie -Doozie.data.dir=/var/lib/oozie -Doozie.instance.id=quickstart.cloudera -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=quickstart.cloudera -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://quickstart.cloudera:11000/oozie -Doozie.https.keystore.file=/var/lib/oozie/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native

Using CATALINA_BASE:   /var/lib/oozie/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/lib/oozie
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/oozie/oozie.pid
Starting Solr server daemon:                               [  OK  ]
Using CATALINA_BASE:   /var/lib/solr/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/solr/../bigtop-tomcat
Using CATALINA_TMPDIR: /var/lib/solr/
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/solr/../bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/solr/solr.pid
Started Impala Catalog Server (catalogd) :                 [  OK  ]
Started Impala Server (impalad):                           [  OK  ]
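
The startup output is long, so a quick tally of the status markers can help to confirm that everything came up. The sketch below counts lines that carry the [  OK  ] marker seen above. The [FAILED] marker is an assumption on my side (it is what init scripts on this kind of system usually print) and does not appear in my log. The function name summarize_startup is hypothetical.

```shell
#!/bin/sh
# Count service status markers in startup output read from stdin.
# "[  OK  ]" appears in the log above; "[FAILED]" is an assumed marker.
# summarize_startup is a hypothetical helper name.
summarize_startup() {
    awk '
        /\[ *OK *\]/     { ok++ }
        /\[ *FAILED *\]/ { failed++ }
        END { printf "ok=%d failed=%d\n", ok, failed }
    '
}

# Self-contained demonstration with a few lines copied from the log:
summarize_startup <<'EOF'
Starting mysqld:                                           [  OK  ]
Starting zookeeper ... STARTED
Started Hadoop datanode (hadoop-hdfs-datanode):            [  OK  ]
Started Hadoop namenode:                                   [  OK  ]
EOF
```

Note that services such as ZooKeeper report STARTED instead of a bracketed marker, so they are not counted by this sketch.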

Summary

I hope this tutorial helped you set up your first local development environment with a fully working single-node Hadoop cluster.

For more information on Cloudera QuickStart and how to use it, you can consult the Cloudera documentation or visit the Cloudera website.
