Python MapReduce: convert text into array
I have a file like this:

0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1,1,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1,1,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

I want to make the first item of each line the key and the rest of the items the value, as an array. This code doesn't work:
mrdd = rrdd.map(lambda line: (line[0], (np.array(int(line))))).collect()

My desired output:

(3, (1,1,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
(4, (1,1,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))

My last approach:
import os.path
import numpy as np

basedir = os.path.join('data')
inputpath = os.path.join('mydata', 'matriz_reglas_test.csv')
filename = os.path.join(basedir, inputpath)

reglasrdd = (sc.textFile(filename, 8)
             .cache()
             )
regrdd = reglasrdd.map(lambda line: line.split('\n'))
print regrdd.take(5)
movrdd = regrdd.map(lambda line: (line[0], (int(x) for x in line[1:] if x))).collect()
print movrdd.take(5)

And the error:
PicklingError: can't pickle <type 'generator'>: attribute lookup __builtin__.generator failed

Any help is appreciated.
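For anyone hitting the same error, here is a minimal sketch of what seems to go wrong in that last approach, written in plain Python without Spark so it can be run anywhere. The sample record is made up for illustration. It assumes each record is already a single line, which would mean splitting on '\n' leaves it in one piece, and it shows that a generator expression cannot be pickled while a list comprehension can.

```python
import pickle

# Hypothetical single record, shaped like one line of the input file.
line = "3,1,1,1,0,0,0,1"

# The record contains no newline, so splitting on '\n' returns a
# one-element list that still holds the whole line.
assert line.split('\n') == [line]

# Splitting on the comma is what actually separates the fields.
fields = line.split(',')

# A generator expression is lazy and cannot be pickled.
lazy_values = (int(x) for x in fields[1:] if x)
try:
    pickle.dumps(lazy_values)
    generator_pickled = True
except Exception:  # PicklingError on Python 2, TypeError on Python 3
    generator_pickled = False

# A list comprehension builds the values eagerly, so it pickles fine.
eager_values = [int(x) for x in fields[1:] if x]
record = (int(fields[0]), eager_values)
restored = pickle.loads(pickle.dumps(record))

print(generator_pickled, restored)
```

Spark has to serialize the results of a map before it can collect them, which is presumably where the generator trips it up; swapping the parentheses for square brackets avoids that.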
I finally have a solution:

import os.path
import re
import numpy as np

basedir = os.path.join('data')
inputpath = os.path.join('mydata', 'matriz_reglas_test.csv')
filename = os.path.join(basedir, inputpath)

# \W+ matches runs of non-word characters, i.e. the commas between values
split_regex = r'\W+'

def tokenize(string):
    """ Implementation of input string tokenization
    Args:
        string (str): input string
    Returns:
        list: list of tokens
    """
    s = re.split(split_regex, string)
    # a list comprehension (not a generator) so the result can be pickled
    return [int(word) for word in s if word]

# sc is the existing SparkContext
reglasrdd = (sc.textFile(filename, 8)
             .map(tokenize)
             .cache()
             )
# first token is the key, the remaining tokens are the value
movrdd = reglasrdd.map(lambda line: (line[0], (line[1:])))
print(movrdd.take(5))

Output:
[(0, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), (1, [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), (2, [1, 1, 0, 0, 0, 0, 0, 0, 0]), (3, [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), (4, [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
Thank you!!
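As a possible simplification of the solution above, the sketch below parses a line with a plain comma split instead of a regular expression and returns the values as a tuple, which matches the shape of the desired output in the question. The name parse_line and the sample lines are hypothetical, and the example runs on an ordinary list rather than an RDD.

```python
def parse_line(line):
    """Turn 'k,v1,v2,...' into (k, (v1, v2, ...)) with integer entries."""
    # Skip empty tokens, e.g. from a trailing comma.
    fields = [int(tok) for tok in line.split(',') if tok.strip()]
    return fields[0], tuple(fields[1:])

# Hypothetical shortened records, shaped like lines of the input file.
sample_lines = ["3,1,1,1,0,0,0,1", "4,1,1,1,0,0,0,1,0,1"]

parsed = [parse_line(l) for l in sample_lines]
print(parsed)
```

With Spark, the same function could presumably be passed to map on the RDD returned by sc.textFile, and replacing tuple(...) with np.array(...) would give a NumPy array as the value instead.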