scala - ClassCastException when describing a DataFrame


I have a small dataset in CSV format: 2 columns of integers. On computing summary statistics, there should be no missing or bad data:

import org.apache.spark.sql.types._
import org.apache.spark.sql._

val raw = sc.textFile("skill_aggregate.csv")
val struct = StructType(StructField("personid", IntegerType, false)
  :: StructField("numskills", IntegerType, false) :: Nil)
val rows = raw.map(_.split(",")).map(x => Row(x(0), x(1)))
val df = sqlContext.createDataFrame(rows, struct)
df.describe().show()

the last line gives me:

java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

which of course implies bad data. The weird bit is that I can collect() the entire dataset without issue, which implies every row conforms to the IntegerType described in the schema. Odder still, I can't find any NA values when I open the dataset in R.
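For what it's worth, the mismatch can be reproduced without Spark at all: split(",") yields Strings, and Row stores whatever values it is given, so the declared IntegerType is only checked when an operation such as describe() actually casts the values (collect() just hands the rows back). A minimal Spark-free sketch; the Row lines are shown as comments because they need a Spark context, and the toInt fix is an assumption about the intended schema:

```scala
object SplitSketch {
  def main(args: Array[String]): Unit = {
    val line = "101,7"            // hypothetical CSV line: personid,numskills
    val tokens = line.split(",")  // Array[String] -- these are Strings, not Ints
    // Row(tokens(0), tokens(1))             // stores Strings; describe() casts and fails
    // Row(tokens(0).toInt, tokens(1).toInt) // values would now match IntegerType
    val ints = tokens.map(_.toInt)
    println(ints.mkString(" "))
  }
}
```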

Why don't you use the databricks spark-csv reader (https://github.com/databricks/spark-csv)? It is easier and safer to create DataFrames from a CSV file, and it allows you to define the schema of the fields (and avoid cast problems).

The code to achieve this is simple:

val myDataFrame = sqlContext.load("com.databricks.spark.csv",
  Map("header" -> "true", "path" -> myFilePath))

Greetings,

JG

