groovy - Apache Spark difference between two RDDs


Say I have an example job (in Groovy, using the Java API):

def set1 = []
def set2 = []
0.upto(10) { set1 << it }   // 0..10 inclusive
8.upto(20) { set2 << it }   // 8..20 inclusive
def rdd1 = context.parallelize(set1)
def rdd2 = context.parallelize(set2)
// what next?

How do I get the set delta between the two? I know union can create an RDD that has all of the data in both RDDs, but how do I do the opposite of that?

If you want set subtraction, subtract is the answer. If you want the "outer" collection (the elements that appear in exactly one of the two RDDs), try:

rdd1.subtract(rdd2).union(rdd2.subtract(rdd1)) 
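
To see what that expression yields for the example data, here is a minimal sketch in plain Java that uses java.util sets as stand-ins for the two RDDs. No Spark is involved, and the class and method names are made up for illustration. It assumes neither input contains duplicates; unlike sets, RDDs can hold repeated elements, so the analogy only holds under that assumption.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Spark.
class SymmetricDifferenceSketch {

    // Inclusive integer range, mirroring Groovy's 0.upto(10).
    static Set<Integer> range(int from, int to) {
        Set<Integer> result = new TreeSet<>();
        for (int i = from; i <= to; i++) {
            result.add(i);
        }
        return result;
    }

    // Stand-in for rdd1.subtract(rdd2).union(rdd2.subtract(rdd1)).
    static Set<Integer> symmetricDifference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> onlyInA = new TreeSet<>(a);
        onlyInA.removeAll(b);      // plays the role of a.subtract(b)
        Set<Integer> onlyInB = new TreeSet<>(b);
        onlyInB.removeAll(a);      // plays the role of b.subtract(a)
        onlyInA.addAll(onlyInB);   // union of the two one-sided differences
        return onlyInA;
    }

    public static void main(String[] args) {
        Set<Integer> set1 = range(0, 10);
        Set<Integer> set2 = range(8, 20);
        System.out.println(symmetricDifference(set1, set2));
    }
}
```

For the ranges in the question, the overlap 8..10 drops out and the result holds 0..7 together with 11..20.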
