OutOfMemoryError in Scala and Spark


I used this code to extract bigrams from a text file:

import org.apache.spark.{SparkContext, SparkConf}

object ds_e6 {
  def main(args: Array[String]): Unit = {
    case class Bigram(first: String, second: String) {
      def mkReplacement(s: String) = s.replaceAll(first + " " + second, first + "-" + second)
    }

    def stringToBigrams(s: String) = {
      val words = s.split(" ")
      if (words.size >= 2) {
        words.sliding(2).map(a => Bigram(a(0), a(1)))
      } else
        Iterator[Bigram]()
    }

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("bigram")
      .set("spark.executor.memory", "1g")

    val sc = new SparkContext(conf)

    val data = sc.textFile("data/file.txt")
    val bigrams = data.flatMap {
      stringToBigrams
    }.collect()

    val bigramCounts = bigrams.groupBy(identity).mapValues(_.size)
    val threshold = 100
    val topBigrams = bigramCounts.filter(_._2 >= threshold).map(_._1)
    topBigrams.foreach(println)

    val replaced = data.map(r => topBigrams.foldLeft(r)((r, b) => b.mkReplacement(r)))
    val replaced1 = replaced.zipWithIndex()
      .map { case (line, i) => i.toString + "," + line }

    replaced1.coalesce(1).saveAsTextFile("data/output.txt")
  }
}

My input file is 45 MB. When I run the code, it shows me the error below (I think it is related to collect()):

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.spark-project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
    at org.spark-project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
    at org.spark-project.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
    at org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)

How can I solve this problem?

It's possible that you are not getting the 1g of memory you are requesting via SparkConf. The reason is that when the master is "local", the Spark driver and executor run entirely inside the JVM that is running the code shown here, the one that creates the SparkContext. By that time, it's too late to obtain more Java heap than was allocated when the JVM started. You need to add the -Xmx1g argument to the command IntelliJ uses to launch the JVM that runs your code.
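To make that concrete, here is a minimal sketch (the object name HeapIllustration and the printed message are illustrative, not from your code), assuming Spark running in local mode: the usable heap is whatever -Xmx the JVM was launched with, regardless of what spark.executor.memory is set to in SparkConf.

    import org.apache.spark.{SparkConf, SparkContext}

    object HeapIllustration {
      def main(args: Array[String]): Unit = {
        // Asking for executor memory in SparkConf...
        val conf = new SparkConf()
          .setMaster("local")
          .setAppName("heap-illustration")
          .set("spark.executor.memory", "1g")
        val sc = new SparkContext(conf)

        // ...does not enlarge the heap of this already-running JVM. In local
        // mode the driver and executor live inside this same JVM, so the heap
        // you can actually use is fixed by the -Xmx passed at launch.
        val maxHeapMb = Runtime.getRuntime.maxMemory() / (1024 * 1024)
        println(s"Max heap actually available: $maxHeapMb MB")

        sc.stop()
      }
    }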

You don't say how you're running the code in IntelliJ. You'll need to create or modify a "Run Configuration".

In the IntelliJ UI, under the "Run" toolbar menu, select the command "Edit Configurations...". That brings up a window such as the one shown below. It shows a Run Configuration for running in a "Scala Console". In the "VM options:" box you need to include the -Xmx1g JVM argument. In this case it is running with 2.5g of memory.
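For example, to match the 1g you were asking SparkConf for, the "VM options:" field could simply contain:

    -Xmx1g

A larger value such as -Xmx2g works the same way if your machine has the memory to spare.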

IntelliJ might have already created a Run Configuration when you ran your app. If not, use this window to create a new one of the proper type, e.g. "Scala Console".

To check memory from inside a running Scala app, use the following calls to inspect the actual current, max, and free JVM memory and see whether you got the memory you requested (a short sketch follows the list):

  • sys.runtime.totalMemory()
  • sys.runtime.maxMemory()
  • sys.runtime.freeMemory()
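A minimal sketch of such a check (the object name MemoryCheck is illustrative; run it inside the same JVM as your Spark code, i.e. with the same Run Configuration):

    object MemoryCheck {
      def main(args: Array[String]): Unit = {
        val mb = 1024 * 1024
        // Heap the JVM has allocated so far
        println(s"total: ${sys.runtime.totalMemory() / mb} MB")
        // Upper bound the heap may grow to (set by -Xmx)
        println(s"max:   ${sys.runtime.maxMemory() / mb} MB")
        // Unused space within the currently allocated heap
        println(s"free:  ${sys.runtime.freeMemory() / mb} MB")
      }
    }

If the max value is well below the 1g you asked for, the -Xmx setting did not reach the JVM that IntelliJ launched.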

[Screenshot: the IntelliJ "Run/Debug Configurations" window, showing the "VM options:" field]

