This post introduces HiveQL queries in Apache Hive, along with a "FAILED: SemanticException null" error that occurred when running a select.
————————————
▼1. What is HiveQL?
————————————
With HiveQL, you can manipulate data using SQL-like queries such as create, select, and update.
Hive is an open-source distributed data warehouse; it differs from Spark, which is a framework for big-data analytics.
Hive's metadata is held in a metastore backed by an RDB (MySQL in this example), while the table data itself is stored on HDFS; both are manipulated through HiveQL. Spark, in contrast, keeps data in memory and persists it to external storage such as HDFS, manipulating it with Spark SQL and similar interfaces.
(参考)Comparing Apache Hive vs. Spark
————————————
▼2. Overview of HiveQL statements
————————————
The statements are grouped into four parts; for each, only an overview and links to the detailed documentation are given.
———————————–
2-1. DDL (Data Definition Language) Statements
2-2. DML (Data Manipulation Language) Statements
2-3. Data Retrieval Statements
2-4. Auxiliary Statements
———————————–
2-1. DDL (Data Definition Language) Statements: LanguageManual DDL – Apache Hive – Apache Software Foundation
e.g.) Create Database, Create table like, Create external table, Alter table, Drop table, MSCK REPAIR TABLE, Show tables, Show partitions, Describe {formatted | extended} table
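As a hedged sketch of the DDL statements listed above (the database sampledb, the table name, and the HDFS path are hypothetical placeholders):

```sql
CREATE DATABASE IF NOT EXISTS sampledb;

-- An external table keeps its data at an explicit HDFS location;
-- dropping the table leaves the files in place
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.access_log (name STRING)
PARTITIONED BY (id INT)
LOCATION '/bigdata/hive/warehouse/access_log';

ALTER TABLE sampledb.access_log ADD COLUMNS (note STRING);

-- Re-register partition directories that exist on HDFS but are
-- missing from the metastore
MSCK REPAIR TABLE sampledb.access_log;

SHOW TABLES IN sampledb;
SHOW PARTITIONS sampledb.access_log;
DESCRIBE FORMATTED sampledb.access_log;
```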
2-2. DML (Data Manipulation Language) Statements: LanguageManual DML – Apache Hive – Apache Software Foundation
e.g.) Insert into table
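A sketch of common insert forms; the table logs (one STRING column, partitioned by id INT) is a hypothetical placeholder, and the first insert assumes hive.exec.dynamic.partition.mode=nonstrict so the partition column can be supplied as a value:

```sql
-- Insert literal rows; the trailing value fills the id partition column
INSERT INTO TABLE logs VALUES ('taro', 1), ('hanako', 2);

-- Replace the contents of a single static partition from a query
INSERT OVERWRITE TABLE logs PARTITION (id=1)
SELECT name FROM logs WHERE id = 2;
```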
2-3. Data Retrieval Statements
e.g.) With clause: Hadoop Hive WITH Clause Syntax and Examples – DWgeek.com
Cluster By clause: Hive Cluster By | Complete Guide to Hive Cluster with Examples (educba.com)
Distribute By clause: Sort By, Order By, Distribute By, and Cluster By in Hive – SQLRelease
Group By clause: HiveQL – Select-Group By (tutorialspoint.com)
Having clause: HiveQL – GROUP BY and HAVING Clause – javatpoint
Explain clause (shows the execution plan): LanguageManual Explain – Apache Hive – Apache Software Foundation
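The retrieval clauses above can be sketched as follows; the table sales and its columns are hypothetical placeholders:

```sql
-- CTE (WITH clause) feeding a GROUP BY with a HAVING filter
WITH recent AS (
  SELECT region, amount FROM sales WHERE year = 2022
)
SELECT region, SUM(amount) AS total
FROM recent
GROUP BY region
HAVING SUM(amount) > 100;

-- DISTRIBUTE BY sends rows with the same region to the same reducer;
-- SORT BY then orders rows within each reducer (not globally)
SELECT region, amount FROM sales
DISTRIBUTE BY region
SORT BY region, amount;

-- EXPLAIN prints the execution plan without running the query
EXPLAIN SELECT region, SUM(amount) FROM sales GROUP BY region;
```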
2-4. Auxiliary Statements
e.g.) Analyze table https://cwiki.apache.org/confluence/display/hive/statsdev
CACHE TABLE https://drill.apache.org/docs/hive-metadata-caching/
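A sketch of ANALYZE TABLE against a partitioned table (the table name sales and partition column year are hypothetical):

```sql
-- Gather basic statistics (row counts, file sizes) for all partitions
ANALYZE TABLE sales PARTITION (year) COMPUTE STATISTICS;

-- Gather column-level statistics for the query optimizer
ANALYZE TABLE sales PARTITION (year) COMPUTE STATISTICS FOR COLUMNS;
```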
————————————
▼3. Preparation
————————————
3-1. Prepare an Apache Hive environment.
Apache Hive installation, 3-node environment No.57 –2022/08
3-2. Create a table
hive> create table test(name string) partitioned by (id int);
hive> insert into test(id,name) values(1,'taro'),(2,'hanako'),(3,'ken'),(4,'koji');
hive> select id,name from test;
OK
1 taro
2 hanako
3 ken
4 koji
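Because test is partitioned by id, each inserted id value creates its own partition. This can be verified with:

```sql
-- One partition per distinct id value should be listed (id=1 ... id=4)
SHOW PARTITIONS test;

-- Shows, among other details, the table's location on HDFS
DESCRIBE FORMATTED test;
```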
————————————
▼4. HiveQL select fails with "FAILED: SemanticException null"
————————————
Hive relies on a relational database for its metastore; when the tables that manage this metadata become inconsistent, select and other queries can no longer run.
For example, running the following select does not return the correct result; instead, the error FAILED: SemanticException null is displayed.
hive> select id,name from test;
FAILED: SemanticException null
At this point I ran msck repair table test;, but it returned a NullPointerException and the problem was not fixed.
hive> msck repair table test;
FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Error while invoking FailureHook. hooks: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
)
org.apache.hadoop.hive.ql.metadata.HiveException: Error while invoking FailureHook. hooks: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:302)
at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
... 20 more
4-1. Cause
Hive uses a relational database for its metastore; when a metastore table that manages the metadata is corrupted or inconsistent, the error above occurs.
Here, I deliberately introduced an inconsistency into the MySQL metastore tables to trigger the error, as follows.
$ sudo mysql -u root
mysql> use metastore;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from PARTITIONS;
+---------+-------------+------------------+-----------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------+-------+--------+
| 3 | 0 | 0 | id=1 | 10 | 7 |
| 4 | 0 | 0 | id=2 | 11 | 7 |
| 5 | 0 | 0 | id=4 | 12 | 7 |
| 6 | 0 | 0 | id=3 | 13 | 7 |
+---------+-------------+------------------+-----------+-------+--------+
mysql> set foreign_key_checks=0;
Query OK, 0 rows affected (0.00 sec)
mysql> update PARTITIONS set SD_ID=0 where SD_ID=10;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from PARTITIONS;
+---------+-------------+------------------+-----------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------+-------+--------+
| 3 | 0 | 0 | id=1 | 0 | 7 |
| 4 | 0 | 0 | id=2 | 11 | 7 |
| 5 | 0 | 0 | id=4 | 12 | 7 |
| 6 | 0 | 0 | id=3 | 13 | 7 |
+---------+-------------+------------------+-----------+-------+--------+
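Before repairing, a diagnostic sketch against the MySQL metastore schema shown above can surface this kind of inconsistency: a LEFT JOIN from PARTITIONS to SDS lists any partition whose SD_ID no longer matches a storage descriptor.

```sql
-- List partitions whose SD_ID has no matching row in SDS
-- (a non-empty result indicates a dangling storage-descriptor reference)
SELECT p.PART_ID, p.PART_NAME, p.SD_ID
FROM PARTITIONS p
LEFT JOIN SDS s ON p.SD_ID = s.SD_ID
WHERE s.SD_ID IS NULL;
```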
4-2. Fix
The fix is to restore the value changed in 4-1, SD_ID, back to its original value.
$ sudo mysql -u root
mysql> use metastore;
mysql> update PARTITIONS set SD_ID=10 where SD_ID=0;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from PARTITIONS;
+---------+-------------+------------------+-----------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------+-------+--------+
| 3 | 0 | 0 | id=1 | 10 | 7 |
| 4 | 0 | 0 | id=2 | 11 | 7 |
| 5 | 0 | 0 | id=4 | 12 | 7 |
| 6 | 0 | 0 | id=3 | 13 | 7 |
+---------+-------------+------------------+-----------+-------+--------+
4 rows in set (0.00 sec)
The following is supplementary information: the contents of the SDS table, the definition of the test table involved in this problem, and the query that returns foreign-key checking from its relaxed state back to the default.
mysql> select * from SDS;
+-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
+-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| 9 | 7 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 9 |
| 10 | 7 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=1 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 10 |
| 11 | 7 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=2 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 11 |
| 12 | 7 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=4 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 12 |
| 13 | 7 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=3 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 13 |
+-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
5 rows in set (0.00 sec)
mysql> show create table PARTITIONS;
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PARTITIONS | CREATE TABLE `PARTITIONS` (
`PART_ID` bigint NOT NULL,
`CREATE_TIME` int NOT NULL,
`LAST_ACCESS_TIME` int NOT NULL,
`PART_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`SD_ID` bigint DEFAULT NULL,
`TBL_ID` bigint DEFAULT NULL,
PRIMARY KEY (`PART_ID`),
UNIQUE KEY `UNIQUEPARTITION` (`PART_NAME`,`TBL_ID`),
KEY `PARTITIONS_N49` (`TBL_ID`),
KEY `PARTITIONS_N50` (`SD_ID`),
CONSTRAINT `PARTITIONS_FK1` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`),
CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> set foreign_key_checks=1;
————————————
▼5. References
————————————
(1) Apache Hive installation, 3-node environment No.57 –2022/08
(2) LanguageManual – Apache Hive – Apache Software Foundation
(3) hadoop – SemanticException Unable to fetch table – Stack Overflow
That is all. I hope you find it helpful.