DEV Community

Nowsath for AWS Community Builders


Most common errors when setting up Amazon EMR

In this article, I'll guide you through resolving common errors that often arise during the configuration of Amazon EMR with DynamoDB.

Error - 1

Could not lookup table test_ddb in DynamoDB.
In this case, my DynamoDB table name is: test_ddb

Insufficient permissions to access DynamoDB can lead to this kind of error when you attempt to create an external table backed by DynamoDB.

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Could not lookup table test_ddb in DynamoDB.

Solution:
Add your AWS access key ID and AWS secret access key as properties in the Hadoop configuration file.

File name: core-site.xml
File path: /etc/hadoop/conf/core-site.xml

  <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>NKKIXXXXXXXXTRDQDPNG</value>
  </property>
  <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>TYwQnTXXXXxxxxXXXX9kvVc54</value>
  </property>

In some cases, you may also need to add the same properties to the tez-site.xml file.

File path: /etc/tez/conf/tez-site.xml
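If you need to apply the same property to more than one file, a small script can help avoid copy-paste mistakes. The following is a minimal sketch, not part of the original setup: the helper name set_property is hypothetical, it works on the XML text rather than on the files directly, and it assumes the usual Hadoop layout of property elements under a single configuration root. Note that ElementTree drops XML comments when it re-serializes, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return the config XML with the named property set to value.

    Hypothetical helper: updates the property if it already exists,
    otherwise appends a new <property> element to the root.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                # Property exists but has no <value> child yet.
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property was found, so add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder content and placeholder credentials, for illustration only.
    sample = "<configuration></configuration>"
    updated = set_property(sample, "fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY_ID")
    updated = set_property(updated, "fs.s3.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY")
    print(updated)
```

To use it against a real file, read the contents of core-site.xml (or tez-site.xml) into a string, pass it through set_property once per property, and write the result back.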


Error - 2

Execution errors for any query.

When you query data from the external table, this error may arise as a result of missing properties in the Tez configuration.

hive> select count(*) from ddb_testtable;
Query ID = hadoop_20231112163703_8e8fd7d7-0a00-45ff-97d6-c4cf11a58ad5
Total jobs = 1
Launching Job 1 out of 1
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Solution:
Add the following property to the Hive configuration file.

File name: hive-site.xml
File path: /etc/hive/conf/hive-site.xml

  <property>
    <name>hive.conf.hidden.list</name>
    <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential</value>
  </property>

Error - 3

Hive Runtime Error while processing row.

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:996)

This type of error may occur due to datatype mapping issues arising from unsupported formats in Hive.

Solution:
Set these two properties to false in the Hive terminal.

set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;

Error - 4

Hive Runtime Error while processing writable.

Caused by: java.lang.NumberFormatException: For input string: "240381698172046689239"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

This kind of error can be caused by datatype limits: the number is too big to convert to an integral type. According to the Apache Hive documentation on Numeric Types, the maximum value for a BIGINT is 9223372036854775807, and the input "240381698172046689239" exceeds that limit.

Solution:
Refer to the Apache Hive documentation on Numeric Types to choose a type that can hold long numeric values.
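To find offending values before Hive does, you can check them against the BIGINT range yourself. This is a minimal sketch under my own assumptions: the function name fits_bigint is hypothetical, and it assumes you have the values available as strings (for example, from an export of the table). BIGINT is a signed 8-byte integer, so the range is -2^63 to 2^63 - 1.

```python
# Signed 8-byte integer range used by Hive BIGINT.
BIGINT_MIN = -(2**63)
BIGINT_MAX = 2**63 - 1  # 9223372036854775807


def fits_bigint(text: str) -> bool:
    """Return True if text parses as an integer inside the BIGINT range."""
    try:
        value = int(text.strip())
    except ValueError:
        # Not an integer at all (empty, decimal point, letters, ...).
        return False
    return BIGINT_MIN <= value <= BIGINT_MAX


if __name__ == "__main__":
    for candidate in ["9223372036854775807", "240381698172046689239"]:
        print(candidate, fits_bigint(candidate))
```

Running the check on the value from the stack trace shows it falls outside the range, which matches the NumberFormatException above.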

Conclusion

These are the primary errors I encountered while setting up Amazon EMR with DynamoDB for data backfilling purposes. I will continue to add any additional issues that arise in the future.

If you encounter any other issues, please feel free to mention them in the comment section.
