Apache Spark & Apache Hive: How to check if a table exists (1 simple way)

With Apache Spark this is very simple. When you look up a table in Hive, pass its name in lowercase. spark.sqlContext.tableNames returns an array of table names that contains lowercase names only, and contains on that array compares strings case-sensitively.
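If the table name comes from user input or configuration, you can normalize it before the lookup. Below is a minimal plain-Scala sketch. The object TableLookup, the helper tableExists and the sample array are hypothetical; the array stands in for the result of spark.sqlContext.tableNames. The name is lowercased with Locale.ROOT so that the result does not depend on the JVM's default locale.

```scala
import java.util.Locale

object TableLookup {
  // Hypothetical helper: `tableNames` is assumed to be the array returned by
  // spark.sqlContext.tableNames (Hive metastore names are stored in lowercase).
  // The searched name is lowercased with Locale.ROOT to avoid locale surprises.
  def tableExists(tableNames: Array[String], name: String): Boolean =
    tableNames.contains(name.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    // Sample data standing in for the metastore listing (hypothetical)
    val names = Array("schemas", "orders")
    println(tableExists(names, "schemas")) // true
    println(tableExists(names, "Schemas")) // true
    println(names.contains("Schemas"))     // false: plain contains is case-sensitive
  }
}
```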

Official documentation

Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them automatically. Note that these Hive dependencies must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive.

When working with Hive, one must instantiate SparkSession with Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions. Users who do not have an existing Hive deployment can still enable Hive support. When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started. Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.

Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/.

https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html

Hive & Apache Spark: version 2.0 or higher

import org.apache.spark.sql.SparkSession

// Create a SparkSession with Hive support (enableHiveSupport())
val spark = SparkSession
  .builder()
  .appName("Check table")
  .enableHiveSupport()
  .getOrCreate()

// Select the database in which to look for the table
spark.sql("use bigdata_etl")

// Lowercase name
spark.sqlContext.tableNames.contains("schemas")
// res4: Boolean = true

// Uppercase letters
spark.sqlContext.tableNames.contains("Schemas")
// res5: Boolean = false

From Apache Spark 1.6 to 2.0

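import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Create a HiveContext from the SparkContext
val sparkConf = new SparkConf().setAppName("Check table")
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)

// Select the database in which to look for the table
hiveContext.sql("use bigdata_etl")

// Lowercase name: true if the table exists
hiveContext.tableNames.contains("schemas")

// Uppercase letters
hiveContext.tableNames.contains("Schemas")
// res4: Boolean = false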

Apache Spark: if the table you are looking for exists, you get "true"; otherwise you get "false".
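Both variants switch the database with use before the lookup. If you would rather accept names such as bigdata_etl.Schemas directly, here is a sketch of one option. The helper QualifiedTableLookup.tableExists is hypothetical. It splits the name into database and table parts, lowercases both, and asks a listing function for the tables of that database. In Spark that function could be db => hiveContext.tableNames(db) or db => spark.sqlContext.tableNames(db), since SQLContext also has a tableNames(databaseName) overload. In the sketch, an in-memory Map stands in for the metastore. With a real metastore, listing a database that does not exist may throw instead of returning an empty array, so you may want to wrap the call in scala.util.Try.

```scala
import java.util.Locale

object QualifiedTableLookup {
  // Hypothetical helper: checks whether `name` ("table" or "db.table") exists.
  // `listTables` returns the (lowercase) table names of one database; with Spark
  // it could be db => hiveContext.tableNames(db). Unqualified names use `defaultDb`.
  def tableExists(listTables: String => Array[String], defaultDb: String, name: String): Boolean =
    name.toLowerCase(Locale.ROOT).split('.') match {
      case Array(table) if table.nonEmpty =>
        listTables(defaultDb.toLowerCase(Locale.ROOT)).contains(table)
      case Array(db, table) if db.nonEmpty && table.nonEmpty =>
        listTables(db).contains(table)
      case _ =>
        false // empty parts or more than one dot: not treated as a valid name here
    }

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for the metastore (hypothetical data)
    val metastore = Map("bigdata_etl" -> Array("schemas"), "default" -> Array.empty[String])
    val listTables = (db: String) => metastore.getOrElse(db, Array.empty[String])

    println(tableExists(listTables, "bigdata_etl", "Schemas"))         // true
    println(tableExists(listTables, "bigdata_etl", "default.schemas")) // false
    println(tableExists(listTables, "default", "BIGDATA_ETL.schemas")) // true
  }
}
```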

If you liked this post, please leave a comment below or share it on Facebook, Twitter, LinkedIn or another social media site.
Thanks!