Issue
I tried to run a MapReduce job on about 20 GB of data and got a heap space (out of memory) error during the reduce shuffle phase. I then read in several sources that I should decrease the mapreduce.reduce.shuffle.input.buffer.percent property in mapred-site.xml from its default value of 0.7, so I lowered it to 0.2.
I want to ask: does that property affect the running time of my MapReduce job? And how can I configure things properly so that my MapReduce job does not fail with this error?
Solution
The documentation describes the property as follows: mapreduce.reduce.shuffle.input.buffer.percent (default 0.70) is the percentage of memory to be allocated from the maximum heap size for storing map outputs during the shuffle. From this it looks like lowering it to an arbitrary value may degrade the performance of the shuffle phase; there is presumably reasoning and testing behind the default value. You can check other related properties here: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
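For illustration, a minimal sketch of how such an override might look in mapred-site.xml; the value 0.5 is only an example, not a tuned recommendation:

<configuration>
  <!-- Fraction of the reducer heap used to buffer map outputs during shuffle.
       Default is 0.70; 0.5 here is purely illustrative. -->
  <property>
    <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
    <value>0.5</value>
  </property>
</configuration>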
What is the approximate amount of data output by your mappers? If it is huge, you may want to increase the number of mappers. Likewise, if the number of reducers is low, a heap space error is likely to occur during the reduce phase (a sketch of raising the reducer count follows below).
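As a sketch, the default reducer count could be raised with the mapreduce.job.reduces property (not mentioned in the original answer; the value 20 is only an illustrative number):

<property>
  <!-- Assumption: default number of reduce tasks per job; tune to your data size and cluster. -->
  <name>mapreduce.job.reduces</name>
  <value>20</value>
</property>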
You may want to check your job counters and increase the number of mappers/reducers. You may also try increasing the mapper/reducer memory by setting the properties mapreduce.reduce.memory.mb and mapreduce.map.memory.mb.
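A minimal sketch of such memory settings; the sizes are illustrative only, and the java.opts entries (which cap the JVM heap inside each container) are an addition beyond what the answer mentions:

<!-- Illustrative container sizes; tune to your cluster's available memory. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<!-- Assumption: JVM heap kept somewhat below the container size so the container is not killed. -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>
</property>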
Answered By - bl3e