groovy - Apache Spark difference between two RDDs
Say I have the following example job (in Groovy with the Java API):

    def set1 = []
    def set2 = []
    // upto is inclusive, so set1 holds 0..10 and set2 holds 8..20
    0.upto(10) { set1 << it }
    8.upto(20) { set2 << it }
    def rdd1 = context.parallelize(set1)
    def rdd2 = context.parallelize(set2)
    // what next?
How do I get the set delta between the two? I know union can create an RDD that has all of the data in both RDDs, but how do I do the opposite of that?
If you just want set subtraction, then subtract is the answer. If you want the "outer" collection (the elements that appear in exactly one of the two RDDs), try:
rdd1.subtract(rdd2).union(rdd2.subtract(rdd1))
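To check what that expression should return for the example ranges, here is a minimal local sketch in plain Java that uses java.util sets in place of RDDs. It is not Spark code: the class SetDeltaSketch and its helpers range, subtract, union and symmetricDifference are hypothetical names made up for this illustration, and unlike an RDD the sets here keep a fixed order and never hold duplicates.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class: mirrors the RDD expression with local sets.
class SetDeltaSketch {

    // Inclusive integer range, like Groovy's from.upto(to).
    static Set<Integer> range(int from, int to) {
        return IntStream.rangeClosed(from, to)
                .boxed()
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Local stand-in for a.subtract(b): elements of a that are not in b.
    static Set<Integer> subtract(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Local stand-in for a.union(b), restricted to sets.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Same shape as rdd1.subtract(rdd2).union(rdd2.subtract(rdd1)).
    static Set<Integer> symmetricDifference(Set<Integer> a, Set<Integer> b) {
        return union(subtract(a, b), subtract(b, a));
    }

    public static void main(String[] args) {
        Set<Integer> set1 = range(0, 10);
        Set<Integer> set2 = range(8, 20);

        // Plain subtraction: prints [0, 1, 2, 3, 4, 5, 6, 7]
        System.out.println(subtract(set1, set2));

        // "Outer" collection: prints 0..7 followed by 11..20 (8, 9, 10 are dropped)
        System.out.println(symmetricDifference(set1, set2));
    }
}
```

For the ranges in the question, the overlap 8, 9 and 10 is the only part that disappears. On a real cluster the order of the collected result is not guaranteed, so compare it as a set rather than as a list.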