We will use the FileSystem and Path classes from the org.apache.hadoop.fs package to achieve this.
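The teaser does not say which operation the post performs with these classes; as a minimal sketch of one common use, here is a path-existence check against the filesystem configured for Hadoop (the path below is a made-up example):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Picks up core-site.xml from the classpath if present;
// falls back to the local filesystem when no HDFS is configured.
val conf = new Configuration()
val fs   = FileSystem.get(conf)

// Hypothetical path, not taken from the post.
val path = new Path("/tmp/example-data")

val present = fs.exists(path)
println(s"$path exists: $present")
```

The same `fs` handle also offers `mkdirs`, `delete`, and `listStatus` for the usual filesystem operations.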
Cloudera is one of the three major players in the market, alongside Hortonworks and MapR, that provide general-purpose Hadoop distributions.
Today I will show you how you can use the machine learning (ML) library that ships with Spark under the name Spark MLlib.
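The teaser does not name a specific algorithm, so as a minimal sketch, here is a linear regression fitted with Spark's DataFrame-based ML API on a made-up toy dataset (all data below is an assumption for illustration):

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("mllib-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Toy dataset where label = 2 * x, so the fitted slope should come out close to 2.
val training = Seq(
  (0.0, Vectors.dense(0.0)),
  (2.0, Vectors.dense(1.0)),
  (4.0, Vectors.dense(2.0)),
  (6.0, Vectors.dense(3.0))
).toDF("label", "features")

val model = new LinearRegression().setMaxIter(10).fit(training)
println(s"slope: ${model.coefficients(0)}")
```

Note that `org.apache.spark.ml` is the newer DataFrame-based API; the older RDD-based one lives in `org.apache.spark.mllib`.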
In this tutorial I will show you how you can easily install Apache Spark on CentOS.
A short tip on how to check whether a table exists in Hive using Spark.
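One way to do this check is through Spark's catalog API; a minimal sketch, demonstrated on a temp view so it runs locally (for a real Hive metastore you would add `.enableHiveSupport()` to the builder, and the table name below is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Local session; add .enableHiveSupport() to query a real Hive metastore.
val spark = SparkSession.builder()
  .appName("table-exists")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Register a small temp view so the check has something to find.
Seq((1, "a"), (2, "b")).toDF("id", "name").createOrReplaceTempView("people")

// catalog.tableExists covers temp views and, with Hive support, Hive tables.
println(spark.catalog.tableExists("people"))   // true
println(spark.catalog.tableExists("missing"))  // false
```

With Hive support enabled, the two-argument form `spark.catalog.tableExists("mydb", "mytable")` checks a table inside a specific database.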
In this tutorial I will show you the best approach to converting data from one format (CSV, Parquet, Avro, ORC) to another.
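The pattern behind such conversions is to read with one `DataFrameReader` format and write with another. A minimal sketch of CSV-to-Parquet, using a tiny generated sample in a temp directory so it is self-contained (the data and paths are assumptions, not from the post):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("convert-format")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val dir = Files.createTempDirectory("convert").toString

// Write a small sample dataset as CSV first, so there is something to convert.
Seq((1, "alice"), (2, "bob")).toDF("id", "name")
  .write.option("header", "true").csv(s"$dir/csv")

// Read it back as CSV and write it out as Parquet.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(s"$dir/csv")

df.write.mode("overwrite").parquet(s"$dir/parquet")

println(spark.read.parquet(s"$dir/parquet").count())  // 2
```

Swapping `.parquet(...)` for `.orc(...)` or `.format("avro").save(...)` covers the other target formats (Avro needs the `spark-avro` package on the classpath).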
Like in the title :)
In this tutorial, I will walk you through examples of reading data using the DataFrame API in Spark.
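As a minimal sketch of what such reads look like, here are two common `spark.read` variants; the sample files are generated into a temp directory so the snippet runs on its own (file names and contents are made-up examples):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read-examples")
  .master("local[*]")
  .getOrCreate()

// Create small sample files so the reads below actually have input.
val dir = Files.createTempDirectory("read-demo")
Files.write(dir.resolve("people.csv"), "id,name\n1,alice\n2,bob\n".getBytes)
Files.write(dir.resolve("events.json"),
  "{\"event\":\"click\"}\n{\"event\":\"view\"}\n".getBytes)

// CSV with a header row, and line-delimited JSON.
val csvDf  = spark.read.option("header", "true").csv(dir.resolve("people.csv").toString)
val jsonDf = spark.read.json(dir.resolve("events.json").toString)

println(csvDf.count())   // 2
println(jsonDf.count())  // 2
```

The same pattern extends to `spark.read.parquet(...)`, `spark.read.orc(...)`, and JDBC sources via `spark.read.format("jdbc")`.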
In this tutorial I will show you how to create a Scala project in IntelliJ IDEA and run a Spark job locally.
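A project like this typically needs the Spark dependencies declared in its build. A hypothetical `build.sbt` sketch (project name and version numbers are examples, not taken from the post):

```scala
// Hypothetical build.sbt for a minimal Spark project; versions are examples.
name := "spark-example"
version := "0.1.0"

// Scala version must match the Spark build (Spark 3.x ships for 2.12/2.13).
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // "provided" keeps Spark out of the assembly jar when submitting to a cluster;
  // for running locally from the IDE, drop the "provided" scope.
  "org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.5.1" % "provided"
)
```

With this in place, IntelliJ's sbt import resolves the dependencies and the job can be launched with a `local[*]` master from the IDE.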