scala - ClassCastException when describing a DataFrame


I have a small dataset in CSV format: 2 columns of integers. On computing summary statistics, there should be no missing or bad data:

import org.apache.spark.sql.types._
import org.apache.spark.sql._

val raw = sc.textFile("skill_aggregate.csv")
val struct = StructType(StructField("personid", IntegerType, false)
  :: StructField("numskills", IntegerType, false) :: Nil)
val rows = raw.map(_.split(",")).map(x => Row(x(0), x(1)))
val df = sqlContext.createDataFrame(rows, struct)
df.describe().show()

the last line gives me:

java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

which of course implies bad data. The weird bit is that I can collect() the entire dataset without issue, which implies every row conforms to the IntegerType described in the schema. Odder still, I can't find any NA values when I open the dataset in R.
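For what it's worth, the mismatch can be reproduced without Spark at all: split(",") yields Strings, and Row stores whatever values it is given, so the declared IntegerType is only checked when an operation such as describe() actually casts the values (collect() just hands the rows back). A minimal Spark-free sketch; the Row lines are shown as comments because they need a Spark context, and the toInt fix is an assumption about the intended schema:

```scala
object SplitSketch {
  def main(args: Array[String]): Unit = {
    val line = "101,7"            // hypothetical CSV line: personid,numskills
    val tokens = line.split(",")  // Array[String] -- these are Strings, not Ints
    // Row(tokens(0), tokens(1))             // stores Strings; describe() casts and fails
    // Row(tokens(0).toInt, tokens(1).toInt) // values would now match IntegerType
    val ints = tokens.map(_.toInt)
    println(ints.mkString(" "))
  }
}
```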

Why don't you use the databricks spark-csv reader (https://github.com/databricks/spark-csv)? It is easier and safer to create DataFrames from a CSV file, and it allows you to define the schema of the fields (and avoid cast problems).

The code to achieve this is simple:

val myDataFrame = sqlContext.load("com.databricks.spark.csv",
  Map("header" -> "true", "path" -> myFilePath))

Greetings,

JG

