hadoop - How to move image files from an HDFS directory to HBase?


I have Cloudera CDH 5.3.0.

I have a directory in HDFS containing several gigabytes of image files.

The files are of various types (jpg, png, gif).

For each file picturename.jpg, I want a row in HBase with picturename as the row key and a column holding the image data.

Can anyone explain how to accomplish this?

For background: HBase stores everything as binary. You can put and get binary data, so read the image as a binary file.
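To make "read the image as a binary file" concrete, here is a minimal plain-Java sketch (class and file names are my own, not from the question) that reads a file fully into a byte array, which is exactly the form a value takes in an HBase cell:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadBinary {
    // Read an image (or any file) fully into a byte array.
    public static byte[] readAllBytes(String file) throws IOException {
        return Files.readAllBytes(Paths.get(file));
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file, then read it back as raw bytes.
        Path tmp = Files.createTempFile("sample", ".bin");
        Files.write(tmp, new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        byte[] data = readAllBytes(tmp.toString());
        System.out.println(data.length); // prints 4
        Files.delete(tmp);
    }
}
```

The same byte array can later be passed unchanged as the value of an HBase Put or KeyValue.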

As described, the HBase table would look like:

rowkey <image-name>

cf:data <binary-image-data>
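Assuming the column family is named cf as in the sketch above, such a table could be created from the HBase shell (the table name images is just an example):

```
create 'images', 'cf'
```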

There are several ways to ingest data into HBase:

  • with or without using MapReduce;
  • using Put or bulkload.

Since you have several gigabytes of data, the fastest way is to use MapReduce with bulkload. There is a useful tutorial from Cloudera on bulk loads here: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
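With the bulkload approach, the MapReduce job writes HFiles (via HFileOutputFormat) instead of issuing Puts, and a final step hands those files to HBase. A sketch of that last step, assuming the job wrote its HFiles to /tmp/images-hfiles and the table is named images (both names are examples, not from the question):

```shell
# Move the generated HFiles into the 'images' table's regions.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/images-hfiles images
```

The Cloudera tutorial linked above walks through the full job configuration.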

How do you read the images and supply them to Hadoop?

You can do this in many ways. I'll describe a method using MapReduce, since it's more scalable.

One way to implement it is to write your own Hadoop RecordReader that supplies the binary data to the map.

But in this case I think you can use a shortcut: supply a list of image paths as input. Then, in the map:

setup(..) {
  // prepare the filesystem
  fs = ...
}

map(...) {
  String path = value.toString();
  FSDataInputStream in = fs.open(new Path(path));
  // use in.read() to read the bytes; optionally apply a custom encoding.
  // emit the binary value as a KeyValue if using bulkload, else as a Put object.
  context.write(key, kv);
}

cleanup(..) {
  // close fs, misc.
}
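A fuller sketch of that mapper, using Put for simplicity rather than bulkload (class name, input format, and the cf/data family and qualifier are assumptions matching the table layout above, not code from the answer):

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: the job input is a text file listing one HDFS image path per line.
public class ImageLoadMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private FileSystem fs;

  @Override
  protected void setup(Context context) throws IOException {
    fs = FileSystem.get(context.getConfiguration());
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Path path = new Path(value.toString());
    // Read the whole image into memory (fine for typical image sizes).
    byte[] data = new byte[(int) fs.getFileStatus(path).getLen()];
    try (FSDataInputStream in = fs.open(path)) {
      in.readFully(data);
    }
    byte[] rowKey = Bytes.toBytes(path.getName()); // e.g. "picturename.jpg"
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("data"), data);
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}
```

The driver would wire this to the target table with TableMapReduceUtil; for the bulkload variant you would emit KeyValues and configure HFileOutputFormat instead.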

It's a bit hacky, but I hope you get the idea.

Also, read the comments. There are several design considerations to take into account if you're going to design a system based on this.

Hope this helps.

