Apache Hive – HiveQL FAILED: SemanticException null No.60

This post introduces HiveQL queries in Apache Hive, and a "FAILED: SemanticException null" error that can occur on select.

————————————
▼1. What is HiveQL
————————————
With HiveQL you can manipulate data using SQL-like statements such as create, select, and update.

Hive is an open-source distributed data warehouse; it is distinct from Spark, which is a framework for big-data analytics.

Hive's metadata is stored in an RDB (the metastore); in this example MySQL is used. The table data itself is stored in HDFS, and all data manipulation is done with HiveQL. Spark, by contrast, holds data in memory and persists it to external storage such as HDFS; there, data is manipulated with Spark SQL and similar APIs.
(Reference) Comparing Apache Hive vs. Spark

————————————
▼2. HiveQL overview
————————————
The queries are grouped into the four parts below; for each, only an overview and links to the details are given.

———————————–
2-1. DDL (Data Definition Language) Statements
2-2. DML (Data Manipulation Language) Statements
2-3. Data Retrieval Statements
2-4. Auxiliary Statements
———————————–


2-1. DDL (Data Definition Language) Statements: LanguageManual DDL – Apache Hive – Apache Software Foundation
e.g., Create Database, Create table like, Create external table, Alter table, Drop table, MSCK REPAIR TABLE, Show tables, Show partitions, Describe {formatted | extended} table
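As a quick sketch of how the DDL statements above fit together (the database, table, and location names here are illustrative, not taken from this article's environment):

```sql
-- Create a database and a partitioned external table (all names illustrative)
CREATE DATABASE IF NOT EXISTS demo;

CREATE EXTERNAL TABLE IF NOT EXISTS demo.events (name STRING)
PARTITIONED BY (id INT)
LOCATION '/bigdata/hive/warehouse/events';

-- Register partition directories created directly on HDFS with the metastore
MSCK REPAIR TABLE demo.events;

-- Show the full table definition, including storage and partition information
DESCRIBE FORMATTED demo.events;
```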

2-2. DML (Data Manipulation Language) Statements: LanguageManual DML – Apache Hive – Apache Software Foundation
e.g., Insert into table

2-3. Data Retrieval Statements
e.g., WITH clause: Hadoop Hive WITH Clause Syntax and Examples – DWgeek.com
CLUSTER BY clause: Hive Cluster By | Complete Guide to Hive Cluster with Examples (educba.com)
DISTRIBUTE BY clause: Sort By, Order By, Distribute By, and Cluster By in Hive – SQLRelease
GROUP BY clause: HiveQL – Select-Group By (tutorialspoint.com)
HAVING clause: HiveQL – GROUP BY and HAVING Clause – javatpoint
EXPLAIN clause (prints the execution plan): LanguageManual Explain – Apache Hive – Apache Software Foundation
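A minimal sketch combining some of these clauses, written against the test table used later in section 3 (assuming it exists):

```sql
-- WITH clause feeding a GROUP BY / HAVING aggregation
WITH counted AS (
  SELECT id, COUNT(*) AS cnt
  FROM test
  GROUP BY id
  HAVING COUNT(*) > 0
)
SELECT id, cnt
FROM counted
-- SORT BY orders rows within each reducer (unlike the global ORDER BY)
SORT BY id;

-- Prefix any query with EXPLAIN to print its execution plan
EXPLAIN SELECT id, name FROM test;
```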

2-4. Auxiliary Statements
e.g., Analyze table: https://cwiki.apache.org/confluence/display/hive/statsdev
CACHE TABLE: https://drill.apache.org/docs/hive-metadata-caching/
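For example, ANALYZE TABLE collects statistics that the query optimizer can use. A sketch against the test table from section 3 (the FOR COLUMNS variant additionally gathers column-level statistics):

```sql
-- Compute basic statistics (row counts, sizes) per partition of test
ANALYZE TABLE test PARTITION (id) COMPUTE STATISTICS;

-- Additionally compute column-level statistics
ANALYZE TABLE test PARTITION (id) COMPUTE STATISTICS FOR COLUMNS;
```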

————————————
▼3. Preparation
————————————
3-1. Prepare an Apache Hive environment.
Apache Hive installation: 3-node environment No.57 –2022/08

3-2. Create a table

hive> create table test(name string) partitioned by (id int);
hive> insert into test(id,name) values(1,'taro'),(2,'hanako'),(3,'ken'),(4,'koji');
hive> select id,name from test;
OK
1	taro
2	hanako
3	ken
4	koji
 

————————————
▼4. HiveQL select failure: "FAILED: SemanticException null"
————————————
Hive uses a relational database to manage its metadata; when the tables managing that metadata become inconsistent, select and other queries can no longer be executed.

For example, the following select does not return the expected result and instead shows a "FAILED: SemanticException null" error.

hive> select id,name from test;
FAILED: SemanticException null
 

In this state I ran msck repair table test;, but it returned a NullPointerException and could not repair the table.

hive> msck repair table test;
FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.metadata.HiveException(Error while invoking FailureHook. hooks: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
	at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
	at org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
	at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2616)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2386)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
)
org.apache.hadoop.hive.ql.metadata.HiveException: Error while invoking FailureHook. hooks: java.lang.NullPointerException
	... (stack trace identical to the one above; truncated here)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
	at org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
	... 20 more

 

4-1. Cause
Hive keeps its metadata in a relational database (the metastore); when a metastore table is corrupted or becomes inconsistent, the error above occurs.

Here I deliberately introduced an inconsistency into the MySQL metastore tables to trigger the error, as follows.

$ sudo mysql -u root
mysql> use metastore;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from PARTITIONS;
+---------+-------------+------------------+-----------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------+-------+--------+
|       3 |           0 |                0 | id=1      |    10 |      7 |
|       4 |           0 |                0 | id=2      |    11 |      7 |
|       5 |           0 |                0 | id=4      |    12 |      7 |
|       6 |           0 |                0 | id=3      |    13 |      7 |
+---------+-------------+------------------+-----------+-------+--------+

mysql> set foreign_key_checks=0;
Query OK, 0 rows affected (0.00 sec)


mysql> update PARTITIONS set SD_ID=0 where SD_ID=10;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from PARTITIONS;
+---------+-------------+------------------+-----------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------+-------+--------+
|       3 |           0 |                0 | id=1      |     0 |      7 |
|       4 |           0 |                0 | id=2      |    11 |      7 |
|       5 |           0 |                0 | id=4      |    12 |      7 |
|       6 |           0 |                0 | id=3      |    13 |      7 |
+---------+-------------+------------------+-----------+-------+--------+

4-2. Fix
The fix is to restore SD_ID, the value changed in 4-1 to create the inconsistency, back to its original value.

$ sudo mysql -u root
mysql> use metastore;
mysql> update PARTITIONS set SD_ID=10 where SD_ID=0;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from PARTITIONS;
+---------+-------------+------------------+-----------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------+-------+--------+
|       3 |           0 |                0 | id=1      |    10 |      7 |
|       4 |           0 |                0 | id=2      |    11 |      7 |
|       5 |           0 |                0 | id=4      |    12 |      7 |
|       6 |           0 |                0 | id=3      |    13 |      7 |
+---------+-------------+------------------+-----------+-------+--------+
4 rows in set (0.00 sec)
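After restoring SD_ID, it is worth confirming the repair from the Hive CLI. Both queries that failed earlier should now succeed (this is the expected behavior; the confirmation was not captured in the original session log):

```sql
-- The select that previously failed with "SemanticException null"
SELECT id, name FROM test;

-- The repair command that previously threw a NullPointerException
MSCK REPAIR TABLE test;
```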

The following is supplementary information: the contents of the SDS table, the definition of the metastore PARTITIONS table involved in this issue, and the query that turns foreign key checks back on (the default).

mysql> select * from SDS;
+-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED                | IS_STOREDASSUBDIRECTORIES                            | LOCATION                                                   | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
|     9 |     7 | org.apache.hadoop.mapred.TextInputFormat | 0x00                         | 0x00                                                 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test      |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        9 |
|    10 |     7 | org.apache.hadoop.mapred.TextInputFormat | 0x00                         | 0x00                                                 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=1 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       10 |
|    11 |     7 | org.apache.hadoop.mapred.TextInputFormat | 0x00                         | 0x00                                                 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=2 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       11 |
|    12 |     7 | org.apache.hadoop.mapred.TextInputFormat | 0x00                         | 0x00                                                 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=4 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       12 |
|    13 |     7 | org.apache.hadoop.mapred.TextInputFormat | 0x00                         | 0x00                                                 | hdfs://masternodetsu:9000/bigdata/hive/warehouse/test/id=3 |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       13 |
+-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
5 rows in set (0.00 sec)

mysql> show create table PARTITIONS;
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PARTITIONS | CREATE TABLE `PARTITIONS` (
  `PART_ID` bigint NOT NULL,
  `CREATE_TIME` int NOT NULL,
  `LAST_ACCESS_TIME` int NOT NULL,
  `PART_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `SD_ID` bigint DEFAULT NULL,
  `TBL_ID` bigint DEFAULT NULL,
  PRIMARY KEY (`PART_ID`),
  UNIQUE KEY `UNIQUEPARTITION` (`PART_NAME`,`TBL_ID`),
  KEY `PARTITIONS_N49` (`TBL_ID`),
  KEY `PARTITIONS_N50` (`SD_ID`),
  CONSTRAINT `PARTITIONS_FK1` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`),
  CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> set foreign_key_checks=1;

————————————
▼5. References
————————————
(1) Apache Hive installation: 3-node environment No.57 –2022/08
(2) LanguageManual – Apache Hive – Apache Software Foundation
(3) hadoop – SemanticException Unable to fetch table – Stack Overflow

That's all. I hope you found this helpful.
