How to use Spark classification for unseen instances?
It looks like the training and test sets must both be present at the time the classification model is created in Apache Spark. What if I have unseen instances that arrive later and did not exist when we were creating the model? Do I have to rebuild the model every time I receive an unseen instance? Doesn't that make classification impractical in real scenarios?
It looks like the training and test sets must both be present at the time the classification model is created in Apache Spark?
Test instances can be loaded separately from the training instances, as you can see in the naive Bayes example:
    from pyspark.mllib.classification import NaiveBayes
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint

    def parseLine(line):
        # Each line has the form "label,f1 f2 f3"
        parts = line.split(',')
        label = float(parts[0])
        features = Vectors.dense([float(x) for x in parts[1].split(' ')])
        return LabeledPoint(label, features)

    # sc is the SparkContext (predefined in the pyspark shell)
    data = sc.textFile('data/mllib/sample_naive_bayes_data.txt').map(parseLine)

    # Split data approximately into training (60%) and test (40%)
    training, test = data.randomSplit([0.6, 0.4], seed=0)

    # Train a naive Bayes model.
    model = NaiveBayes.train(training, 1.0)

    # Make predictions and test accuracy.
    predictionAndLabel = test.map(lambda p: (model.predict(p.features), p.label))
    # Index into the pair instead of unpacking it in the lambda signature,
    # which is a syntax error in Python 3.
    accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()

What if I have unseen instances that arrive later and did not exist when we were creating the model?
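This scenario is the same as in scikit-learn and other machine learning tools: the model does not have to be rebuilt, because once it is trained it can be applied to any new instance with model.predict, as long as the instance has the same features. In addition, Spark offers some algorithms that can process streams.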
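To illustrate why no rebuild is needed, here is a minimal pure-Python sketch of a toy multinomial naive Bayes. It is not Spark's implementation; the names `train_naive_bayes` and `predict` and the data are hypothetical, chosen only for illustration. The point is that training produces a small set of parameters, and prediction uses only those parameters, so instances that arrive later can be classified without touching the training data again.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, smoothing=1.0):
    """Fit a toy multinomial naive Bayes from (label, features) pairs.

    Returns {label: (log_prior, log_likelihoods)}; the training data
    itself is not stored in the model.
    """
    num_features = len(samples[0][1])
    label_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * num_features)
    for label, features in samples:
        label_counts[label] += 1
        for i, value in enumerate(features):
            feature_sums[label][i] += value

    total = len(samples)
    num_labels = len(label_counts)
    model = {}
    for label, count in label_counts.items():
        # Additive (Laplace) smoothing on both priors and likelihoods.
        log_prior = math.log((count + smoothing) / (total + smoothing * num_labels))
        denom = sum(feature_sums[label]) + smoothing * num_features
        log_likelihoods = [math.log((s + smoothing) / denom)
                           for s in feature_sums[label]]
        model[label] = (log_prior, log_likelihoods)
    return model


def predict(model, features):
    """Return the label with the highest log-posterior score."""
    def score(label):
        log_prior, log_likelihoods = model[label]
        return log_prior + sum(x * w for x, w in zip(features, log_likelihoods))
    return max(model, key=score)


# Train once on whatever data is available now.
training = [
    (0.0, [1.0, 0.0, 0.0]),
    (0.0, [2.0, 0.0, 0.0]),
    (1.0, [0.0, 1.0, 0.0]),
    (1.0, [0.0, 2.0, 0.0]),
    (2.0, [0.0, 0.0, 1.0]),
    (2.0, [0.0, 0.0, 2.0]),
]
model = train_naive_bayes(training)

# Later: classify instances that did not exist at training time.
unseen = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 5.0]]
predictions = [predict(model, features) for features in unseen]
print(predictions)  # [0.0, 1.0, 2.0]
```

In PySpark the corresponding step would be calling model.predict on the new feature vectors (or on an RDD of them) with the model trained earlier.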