Nowsath for AWS Community Builders


Most common errors when setting up Amazon EMR

In this article, I'll guide you through resolving common errors that often arise during the configuration of Amazon EMR with DynamoDB.

Error - 1

Could not lookup table test_ddb in DynamoDB.
In this case, my DynamoDB table name is test_ddb.

Insufficient permissions to access DynamoDB can cause this kind of error when you attempt to create an external table backed by DynamoDB.

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Could not lookup table test_ddb in DynamoDB.

Solution:
Add the AWS access key ID and the AWS secret access key as properties in the Hadoop configuration file.

File name: core-site.xml
File path: /etc/hadoop/conf/core-site.xml

  <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>NKKIXXXXXXXXTRDQDPNG</value>
  </property>
  <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>TYwQnTXXXXxxxxXXXX9kvVc54</value>
  </property>

In certain instances, it might be necessary to include the same properties in the tez-site.xml file too.

File path: /etc/tez/conf/tez-site.xml
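
After editing the file, it can help to confirm that both properties are really present and non-empty. The following is a minimal sketch in Python, not part of the original setup: the helper name `missing_properties` is hypothetical, and the sample XML is an inline placeholder rather than the real file.

```python
import xml.etree.ElementTree as ET

# Property names taken from the core-site.xml snippet above.
REQUIRED = ("fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey")


def missing_properties(xml_text, required=REQUIRED):
    """Return the required property names that are absent or empty
    in a Hadoop-style <configuration> document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    present = set()
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name and value:
            present.add(name)
    return [name for name in required if name not in present]


# Inline sample with placeholder values; only one of the two keys is set.
SAMPLE = """<configuration>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>PLACEHOLDER_ACCESS_KEY</value>
  </property>
</configuration>"""

print(missing_properties(SAMPLE))  # ['fs.s3.awsSecretAccessKey']
```

To check a real file, read its contents first, for example `missing_properties(open("/etc/hadoop/conf/core-site.xml").read())`, and repeat for tez-site.xml if you added the properties there.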


Error - 2

Execution errors on any query against the external table.

When you query data from the external table, this error may arise because of missing properties in the Tez configuration.

hive> select count(*) from ddb_testtable;
Query ID = hadoop_20231112163703_8e8fd7d7-0a00-45ff-97d6-c4cf11a58ad5
Total jobs = 1
Launching Job 1 out of 1
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Solution:
Add the following property to the Hive configuration file.

File name: hive-site.xml
File path: /etc/hive/conf/hive-site.xml

  <property>
    <name>hive.conf.hidden.list</name>
    <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential</value>
  </property>

Error - 3

Hive Runtime Error while processing row.

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:996)

This type of error may occur due to datatype mapping issues arising from unsupported formats in Hive.

Solution:
Set these two properties to false in the Hive terminal.

set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;

Error - 4

Hive Runtime Error while processing writable.

Caused by: java.lang.NumberFormatException: For input string: "240381698172046689239"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

This kind of error can be caused by datatype limitations: the number is too big to convert to an integral type. According to the Apache Hive documentation on Numeric Types, the maximum value for a BIGINT is 9223372036854775807 (19 digits), but the input 240381698172046689239 has 21 digits and exceeds that limit.

Solution:
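Refer to the Apache Hive documentation on Numeric Types to handle long numeric values. For example, a column that holds values beyond the BIGINT range can be mapped to DECIMAL or STRING instead.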
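
To see which values would overflow before running the query, the range check can be sketched in Python. This is an illustrative sketch, not something from the original setup: the function name `smallest_hive_type` is hypothetical, and it assumes the values are whole numbers and that DECIMAL is declared with Hive's maximum precision of 38 digits.

```python
# Hive BIGINT is a signed 64-bit integer.
BIGINT_MIN = -2**63
BIGINT_MAX = 2**63 - 1    # 9223372036854775807
DECIMAL_MAX_DIGITS = 38   # assumed: Hive DECIMAL maximum precision


def smallest_hive_type(text):
    """Suggest a Hive type able to hold the whole number written in `text`
    (hypothetical helper for illustration)."""
    value = int(text)  # Python integers have arbitrary precision
    if BIGINT_MIN <= value <= BIGINT_MAX:
        return "BIGINT"
    if len(str(abs(value))) <= DECIMAL_MAX_DIGITS:
        return "DECIMAL(38,0)"
    return "STRING"


print(smallest_hive_type("9223372036854775807"))    # BIGINT
print(smallest_hive_type("240381698172046689239"))  # DECIMAL(38,0)
```

The value from the stack trace above falls outside the BIGINT range but fits within 38 digits, so a DECIMAL column would be one way to hold it.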

Conclusion

These are the primary errors I encountered while setting up Amazon EMR with DynamoDB for data backfilling purposes. I will continue to add any additional issues that arise in the future.

If you encounter any other issues, please feel free to mention them in the comment section.
