Let me tell you about one of my least favorite things to see in PolyBase:

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

This error is not limited to PolyBase but is instead an issue when trying to run MapReduce jobs in Hadoop. There are several potential causes, so let’s cover each of them as they relate to PolyBase and hopefully one of these solves your issue.

SQL Server mapred-site.xml Needs Fixed

The first potential cause of this issue is that your mapred-site.xml file in SQL Server needs a little something-something added to it. Specifically, make sure that it has the following property:

  <property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
  </property>

If you don’t know what I’m talking about, I have an older post on this topic.

SQL Server yarn-site.xml Needs Fixed

If the first option didn’t work, check out your yarn-site.xml file and ensure that its classpath points to the right location. If you’re using Hortonworks Data Platform, be sure to check the configuration post I just linked to, because the default configuration in HDP’s yarn-site.xml will cause you problems.

Hadoop yarn-site.xml Needs Fixed

There is a chance that your problem isn’t on the SQL Server side; it could be on the Hadoop side. Check your yarn-site.xml file in Hadoop and ensure that the classpath is correct. Here is what I have for my yarn.application.classpath value:

$HADOOP_CONF_DIR,{{hadoop_home}}/hadoop/*,{{hadoop_home}}/hadoop/lib/*,{{hadoop_home}}/hadoop-hdfs/*,{{hadoop_home}}/hadoop-hdfs/lib/*,{{hadoop_home}}/hadoop-yarn/*,{{hadoop_home}}/hadoop-yarn/lib/*,{{hadoop_home}}/hadoop-mapreduce/*,{{hadoop_home}}/hadoop-mapreduce/lib/*

The hadoop_home parameter points to someplace like /usr/hdp/3.0.1.0-187 or whatever your specific version of HDP is. That makes it a bit more stable than the SQL Server side, where we don’t have the parameter.
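To see what that classpath value resolves to for a given HDP version, here is a minimal Python sketch that substitutes the {{hadoop_home}} placeholder and lists each entry. The function name expand_classpath is hypothetical, and the example path is simply the one mentioned above; swap in your own version directory. This is only a plain string substitution for inspection, not how Hadoop or Ambari resolves the template internally.

```python
from typing import List

# The yarn.application.classpath value from the Hadoop-side yarn-site.xml above.
TEMPLATE = (
    "$HADOOP_CONF_DIR,"
    "{{hadoop_home}}/hadoop/*,{{hadoop_home}}/hadoop/lib/*,"
    "{{hadoop_home}}/hadoop-hdfs/*,{{hadoop_home}}/hadoop-hdfs/lib/*,"
    "{{hadoop_home}}/hadoop-yarn/*,{{hadoop_home}}/hadoop-yarn/lib/*,"
    "{{hadoop_home}}/hadoop-mapreduce/*,{{hadoop_home}}/hadoop-mapreduce/lib/*"
)


def expand_classpath(template: str, hadoop_home: str) -> List[str]:
    """Hypothetical helper: replace {{hadoop_home}} and split into entries."""
    entries = []
    for entry in template.split(","):
        entry = entry.strip()
        if entry:
            entries.append(entry.replace("{{hadoop_home}}", hadoop_home))
    return entries


if __name__ == "__main__":
    # Assumed example path; use whatever your HDP version directory is.
    for entry in expand_classpath(TEMPLATE, "/usr/hdp/3.0.1.0-187"):
        print(entry)
```

Printing the entries one per line makes it easier to spot a directory that does not exist on the cluster, or to compare against the literal paths on the SQL Server side, where the parameter is not available.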

I found this last cause particularly interesting because I have had success with MapReduce jobs before, but suddenly my old yarn-site.xml settings for HDP 3.0 (the default settings) stopped working and MapReduce jobs wouldn’t work again until I modified yarn-site.xml to correspond with what I use in SQL Server.

Hadoop mapred-site.xml Needs Fixed

One additional option pops up in the error message itself. Here is the full error message:

Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>

It’s looking for three particular properties in the Hadoop-side mapred-site.xml file, so make sure those are there. In my case, they turned out to be there already, but the fact that the error message itself calls them out makes me think this is a common enough cause to be worth mentioning.
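If you would rather not eyeball the file, a rough check along these lines can report which of the three properties are missing a HADOOP_MAPRED_HOME setting. This is a sketch using Python’s standard XML parser; the function names and the sample XML (including its /path/to/hadoop value) are made up for illustration, and in practice you would read the contents of your cluster’s mapred-site.xml instead.

```python
import xml.etree.ElementTree as ET

# The three property names called out in the error message.
REQUIRED = (
    "yarn.app.mapreduce.am.env",
    "mapreduce.map.env",
    "mapreduce.reduce.env",
)


def read_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_mapred_home(props):
    """List required properties that do not set HADOOP_MAPRED_HOME."""
    return [n for n in REQUIRED if "HADOOP_MAPRED_HOME=" not in props.get(n, "")]


# Made-up sample with one property deliberately left out.
SAMPLE = """<configuration>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/path/to/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/path/to/hadoop</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(missing_mapred_home(read_properties(SAMPLE)))
```

An empty list means all three properties are present and set HADOOP_MAPRED_HOME; it does not confirm that the path they point to is correct.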

Conclusion

There are several potential causes for a missing MRAppMaster class when creating MapReduce jobs and the causes tend to revolve around configuring yarn-site.xml and mapred-site.xml on your SQL Server instance and on your Hadoop cluster.
