Apache Spark - SnappyCompressionCodec on the master
I'm running a Spark standalone cluster (1.4.0), with applications launched by a scheduler every hour. On one of the executions, the job finished after a few seconds (instead of the usual ~5 minutes), and in the logs on the master I can see the following exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 20, 172.31.6.203): java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1257)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:59)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
    at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:167)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1254)
    ... 11 more
Caused by: java.lang.IllegalArgumentException
    at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:152)
    ... 20 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

This job had been successful many times before and after that run, and other jobs succeeded during the same period.
Any idea what could cause this?

Thanks, nizan
I found the root cause of the problem: the slaves had run out of disk space because of the application logs. (That presumably also explains the IllegalArgumentException thrown from SnappyCompressionCodec's constructor above: with no free disk, Snappy apparently cannot initialize its native library, and the failure surfaces there.)
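To confirm a diagnosis like this, check the free disk space on every slave. A minimal sketch, reusing the slaves file, SSH key, and log path from the cleanup command below:

    for slave in `cat /root/spark/conf/slaves`; do
      echo $slave
      # df shows what's left on the volume; du shows how much the logs occupy
      ssh -i ~/.ssh/mykey -o StrictHostKeyChecking=no root@$slave "df -h /home/hadoop; du -sh /home/hadoop/spark-logs"
    done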
I moved the logs onto a mounted dir and removed the old logs, with the following command:
    for slave in `cat /root/spark/conf/slaves`; do
      echo $slave
      ssh -A -t -i ~/.ssh/mykey -o StrictHostKeyChecking=no root@$slave "rm -rf /home/hadoop/spark-logs/; mkdir /home/hadoop/spark-logs; ln -s /mnt/spark-logs/ /home/hadoop/spark-logs/"
    done

Thanks, nizan
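The symlink only moves the logs onto the bigger /mnt volume; to keep them from growing without bound, Spark can also roll and cap executor logs itself. A sketch for conf/spark-defaults.conf, using the executor log-rolling properties documented for Spark 1.4 (the values are illustrative, not from the fix above):

    # Roll executor stdout/stderr by size and keep only the newest files
    spark.executor.logs.rolling.strategy          size
    spark.executor.logs.rolling.maxSize           134217728
    spark.executor.logs.rolling.maxRetainedFiles  5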