Python MapReduce: convert text into an array


I have a file like this:

    0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    1,1,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    2,1,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    3,1,1,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    4,1,1,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

and I want to make the first item the key and the rest of the items the value, as an array. This code doesn't work:

mrdd = rrdd.map(lambda line: (line[0], (np.array(int(line))))).collect() 

My desired output:

    (3, (1,1,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
    (4, (1,1,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))

My last approach:

    import os.path
    import numpy as np

    basedir = os.path.join('data')
    inputpath = os.path.join('mydata', 'matriz_reglas_test.csv')
    filename = os.path.join(basedir, inputpath)

    reglasrdd = (sc.textFile(filename, 8)
                   .cache()
                )
    regrdd = reglasrdd.map(lambda line: line.split('\n'))
    print(regrdd.take(5))

    movrdd = regrdd.map(lambda line: (line[0], (int(x) for x in line[1:] if x))).collect()
    print(movrdd.take(5))

and the error:

PicklingError: Can't pickle <type 'generator'>: attribute lookup __builtin__.generator failed

Any help is appreciated.
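
The error points at the value in each pair being a generator expression: Spark has to pickle every record before it can move it between the workers and the driver, and generators cannot be pickled. Below is a minimal sketch of the same failure without Spark, using only the standard `pickle` module. The names `to_pair_lazy` and `to_pair_list` are hypothetical, chosen for illustration, and the sample line is shortened.

```python
import pickle


def to_pair_lazy(fields):
    # Mirrors the failing lambda: the value is a generator expression.
    return (fields[0], (int(x) for x in fields[1:] if x))


def to_pair_list(fields):
    # Same logic, but the value is materialized as a list, which pickles fine.
    return (int(fields[0]), [int(x) for x in fields[1:] if x])


# Shortened sample row, split on commas into string fields.
fields = "3,1,1,1,0,0,0,1".split(",")

try:
    pickle.dumps(to_pair_lazy(fields))
    lazy_pickles = True
except (pickle.PicklingError, TypeError):
    # The exception type depends on the Python version and pickler in use.
    lazy_pickles = False

# The list-based pair survives a pickle round trip unchanged.
pair = pickle.loads(pickle.dumps(to_pair_list(fields)))
print(lazy_pickles)
print(pair)
```

Note also that `sc.textFile` already yields one line per record, so splitting on `'\n'` leaves a single-element list; splitting on `','` is presumably what was intended.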

Finally, I have a solution:

    import os.path
    import re
    import numpy as np

    basedir = os.path.join('data')
    inputpath = os.path.join('mydata', 'matriz_reglas_test.csv')
    filename = os.path.join(basedir, inputpath)

    # Split on runs of non-word characters (here, the commas).
    split_regex = r'\W+'

    def tokenize(string):
        """ Implementation of input string tokenization
        Args:
            string (str): input string
        Returns:
            list: list of integer tokens
        """
        s = re.split(split_regex, string)
        return [int(word) for word in s if word]

    # sc is the SparkContext provided by the PySpark shell or notebook.
    reglasrdd = (sc.textFile(filename, 8)
                   .map(tokenize)
                   .cache()
                )

    # The first token is the key; the remaining tokens are the value.
    movrdd = reglasrdd.map(lambda line: (line[0], line[1:]))
    print(movrdd.take(5))

output:

[(0, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), (1, [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), (2, [1, 1, 0, 0, 0, 0, 0, 0, 0]), (3, [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), (4, [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
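
The tokenizing step can be checked locally without a Spark context. This is a small sketch, assuming the separator regex is meant to match runs of non-word characters (`r'\W+'`) and using shortened sample rows; `to_pair` is a hypothetical helper standing in for the lambda passed to `map`.

```python
import re

# Assumption: fields are separated by runs of non-word characters such as commas.
SPLIT_REGEX = r'\W+'


def tokenize(string):
    """Split a line on the separator regex and convert each token to int."""
    return [int(word) for word in re.split(SPLIT_REGEX, string) if word]


def to_pair(tokens):
    # Hypothetical helper: the first token is the key, the rest is the value.
    return (tokens[0], tokens[1:])


# Shortened sample rows in the same comma-separated layout as the input file.
lines = ["0,1,1,1,1,1,0", "1,1,1,1,0,0,1"]
pairs = [to_pair(tokenize(line)) for line in lines]
print(pairs)
```

If a NumPy array is wanted as the value rather than a list, `np.array(tokens[1:])` could replace the slice in `to_pair`.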

Thank you!!

