Polybase MapReduce Container Size

If you like this content, check out the entire Polybase series.

When you submit a MapReduce job to YARN, YARN creates a series of containers, which are isolated bubbles of memory and CPU.  This blog post covers an issue I ran into with my HDP 2.4 VM involving YARN containers and Polybase.

Memory Starvation In A Nutshell

Originally, I assigned 8 GB of RAM to my HDP 2.4 sandbox.  This was enough for basic work like running Hive queries, so I figured it would be enough to run a MapReduce job initiated by Polybase.  It turns out that I was wrong.

Configuring YARN Containers

Container sizes are controlled by a couple of configuration settings in Ambari (on the sandbox, the default Ambari page is http://sandbox.hortonworks.com:8080).  Select the YARN menu item and go to the Configs tab to see the two Container configuration settings.


On my machine, the minimum container size is set to 512 MB and the maximum container size to 1536 MB.  The minimum and maximum number of CPU VCores are set as well, ranging from 1 to 4.
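To make the minimum setting a bit more concrete, here is a small Python model of how a scheduler might treat memory requests under these limits.  It assumes the Ambari settings correspond to the yarn-site.xml properties yarn.scheduler.minimum-allocation-mb, yarn.scheduler.maximum-allocation-mb, and their vcores equivalents, and the rounding rule is a simplified sketch of how YARN's scheduler normalizes requests in Hadoop 2.x, not YARN's actual code.  Requests above the maximum are rejected outright rather than rounded, so the sketch only handles in-range requests.

```python
# Hypothetical model of the sandbox's YARN scheduler limits, keyed by the
# yarn-site.xml property names the Ambari settings are assumed to map to.
SCHEDULER_LIMITS = {
    "yarn.scheduler.minimum-allocation-mb": 512,
    "yarn.scheduler.maximum-allocation-mb": 1536,
    "yarn.scheduler.minimum-allocation-vcores": 1,
    "yarn.scheduler.maximum-allocation-vcores": 4,
}


def normalize_memory(requested_mb: int, min_allocation_mb: int) -> int:
    """Round a container memory request up to the next multiple of the
    minimum allocation (a simplified model, not YARN's implementation)."""
    at_least_min = max(requested_mb, min_allocation_mb)
    # Ceiling division, then scale back up to a multiple of the minimum.
    return -(-at_least_min // min_allocation_mb) * min_allocation_mb


step = SCHEDULER_LIMITS["yarn.scheduler.minimum-allocation-mb"]
for request in (100, 600, 1024, 1536):
    print(f"{request} MB requested -> {normalize_memory(request, step)} MB granted")
```

With a 512 MB minimum, a 600 MB request would be treated as a 1024 MB container, which is one reason small-looking requests can eat memory faster than expected.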

Setting The Stage

Back at PASS Summit, I had a problem getting Polybase to work.  Getting the SQL Server settings correct was critical, but even after that, I still had a problem:  my jobs would just run endlessly.  I let them run for upwards of 30 minutes before I finally had to kill them.  Considering that the job was just pulling 777 rows from the SecondBasemen table, this seemed…excessive…  Also, based on the logs, nothing seemed to happen after the first 20-30 seconds.


On the Friday of PASS Summit, I had a chance to sit down with Bill Preachuk and Scott Shaw of Hortonworks, and they helped me diagnose my issues.  By focusing on what was happening while the MapReduce job ran, they figured out that my laptop was not creating the necessary number of YARN containers because the service was running out of memory, so my MapReduce jobs would just hang.

Our solution was simple:  scale the container minimum and maximum sizes down to 512 MB, which would guarantee that I could create at least three containers, the number the Polybase engine wanted.
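The reasoning behind shrinking the containers is plain integer division: the number of containers YARN can hand out is the memory it has available divided by the container size, rounded down.  The sketch below assumes a hypothetical 2048 MB of free YARN memory purely for illustration; it is not a measurement from my sandbox.

```python
def containers_that_fit(yarn_memory_mb: int, container_mb: int) -> int:
    """How many containers of a given size fit in the memory YARN can hand out."""
    if container_mb <= 0:
        raise ValueError("container size must be positive")
    return yarn_memory_mb // container_mb


# Hypothetical: assume YARN has 2048 MB left to hand out on the sandbox.
available_mb = 2048
for size_mb in (1536, 512):
    print(f"{size_mb} MB containers: {containers_that_fit(available_mb, size_mb)} fit")
```

Under that assumption, 1536 MB containers leave room for only one, while 512 MB containers leave room for four, comfortably covering the three that Polybase wanted.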

Now It Breaks For Real

Once we did that and I restarted all of the services, I got an interesting error message from SQL Server:

Msg 7320, Level 16, State 110, Line 2
Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to JobSubmitter_SubmitJob: Error [org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=1536, maxMemory=512

The error message is pretty clear:  the Polybase service wants to create containers that are 1536 MB in size, but the maximum size I’m allowing is 512 MB.  Therefore, the Polybase MapReduce operation fails.
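The check behind that message is simple enough to model.  Here is a Python sketch of the comparison the error text describes: a request is rejected if it is negative or larger than the configured maximum.  The exception class name is a hypothetical stand-in for YARN's InvalidResourceRequestException, and the function is a simplified model, not YARN's source.

```python
class InvalidResourceRequest(Exception):
    """Hypothetical stand-in for YARN's InvalidResourceRequestException."""


def validate_memory_request(requested_mb: int, max_mb: int) -> None:
    # Reject negative requests and requests above the configured maximum,
    # mirroring the condition spelled out in the error message.
    if requested_mb < 0 or requested_mb > max_mb:
        raise InvalidResourceRequest(
            "Invalid resource request, requested memory < 0, or requested "
            f"memory > max configured, requestedMemory={requested_mb}, "
            f"maxMemory={max_mb}"
        )


try:
    validate_memory_request(1536, 512)
except InvalidResourceRequest as err:
    print(err)
```

With a 512 MB maximum, the 1536 MB request fails immediately; there is no rounding down to fit, which is why lowering the maximum turned a hang into a hard error.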

Let’s Change Some Config Files!

To get around that, I looked up how to set the map and reduce memory sizes for a MapReduce job, which is done in mapred-site.xml.  I went back to the Polybase folder in my SQL Server installation and added the following to its mapred-site.xml:

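<!-- Added inside the existing <configuration> element of mapred-site.xml -->
<!-- Memory (MB) requested for each map task container -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>
<!-- Memory (MB) requested for each reduce task container -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>512</value>
</property>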
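If you would rather script that change than hand-edit the file, here is a sketch using Python's standard xml.etree.ElementTree module.  It adds (or updates) the two properties from the snippet above inside the root configuration element; the EXISTING string is a hypothetical stand-in for a real mapred-site.xml, which will contain other properties.  For what it's worth, stock Hadoop 2.x's mapred-default.xml sets yarn.app.mapreduce.am.resource.mb (the MapReduce ApplicationMaster's container) to 1536 MB, which matches the size in the error; whether Polybase honors an override of that property is untested here.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal mapred-site.xml; a real file has more properties.
EXISTING = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""


def get_property(config: ET.Element, name: str):
    """Return the value of a named property, or None if it is absent."""
    for prop in config.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def set_property(config: ET.Element, name: str, value: str) -> None:
    """Update a named property in place, or append a new <property> block."""
    for prop in config.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(config, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


config = ET.fromstring(EXISTING)
for key in ("mapreduce.map.memory.mb", "mapreduce.reduce.memory.mb"):
    set_property(config, key, "512")
print(ET.tostring(config, encoding="unicode"))
```

Updating in place rather than always appending avoids leaving duplicate property entries behind if the script runs more than once.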

After restarting the Polybase service, I ran the query again and got the same error, so changing those settings did not help.

Conclusions

I got around this problem by increasing the HDP sandbox VM's RAM allocation to 12 GB.  That's a large share of my laptop's total memory, but once I set the maximum container size back to 1536 MB, my Polybase MapReduce jobs ran.

My conjecture is that the 1536 MB size might be hard-coded, but it’s just that:  conjecture.  I have no proof either way.

One last note:  the Polybase service wants to create three containers, so YARN needs enough memory available to allocate three 1536 MB containers, or 4608 MB of RAM in total.
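That sizing rule is just multiplication, but it is handy to check against whatever your node actually gives YARN.  In this Python sketch the 4096 MB node budget is a hypothetical value standing in for yarn.nodemanager.resource.memory-mb; look up the real number in Ambari.

```python
def required_yarn_memory_mb(container_count: int, container_mb: int) -> int:
    """Total memory YARN must be able to allocate for the requested containers."""
    return container_count * container_mb


needed = required_yarn_memory_mb(3, 1536)
print(f"Polybase needs {needed} MB")

# Hypothetical node budget (yarn.nodemanager.resource.memory-mb); check yours.
node_budget_mb = 4096
if node_budget_mb >= needed:
    print("enough memory for all three containers")
else:
    print(f"short by {needed - node_budget_mb} MB")
```

With the hypothetical 4096 MB budget, the check reports a 512 MB shortfall, which is the kind of gap that shows up as a hung job rather than a clear error.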
