hadoop - How to move image files from HDFS directory to HBase?
I have Cloudera CDH 5.3.0 and a directory in HDFS containing several gigabytes of image files of various types (jpg, png, gif). For each file picturename.jpg, I want a row in HBase with picturename as the row key and a column holding the image data. Can anyone explain how to accomplish this?
For background: HBase stores bytes. You can put and get binary data, so you can read an image as a binary file and store it as-is.
For what you describe, the HBase table would look like this:

    rowkey          cf:data
    <image-name>    <binary-image-data>
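
For instance, here is a minimal sketch of putting and getting a single image with the plain client API, written against the HBase 0.98 client that ships with CDH 5.3; the table name "images" and the column family "cf" are my assumptions:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ImagePutGet {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "images"); // assumed table name

            // Put: row key = file name, raw image bytes under cf:data.
            byte[] bytes = Files.readAllBytes(Paths.get("picturename.jpg"));
            Put put = new Put(Bytes.toBytes("picturename.jpg"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("data"), bytes);
            table.put(put);

            // Get: the same bytes come straight back.
            Result result = table.get(new Get(Bytes.toBytes("picturename.jpg")));
            byte[] back = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("data"));

            table.close();
        }
    }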
There are several ways to ingest data into HBase:

- with or without using MapReduce, and
- using Put or bulk load.
Since you have several gigabytes of data, the fastest way is to use MapReduce with bulk load. There is a useful tutorial from Cloudera on bulk loads here: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
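
To make the wiring concrete, here is a rough sketch of a bulk-load driver in the spirit of that tutorial, written against the HBase 0.98 API in CDH 5.3. The class names, the pre-existing "images" table, and the argument layout are my own assumptions; the mapper it references is sketched further down in this answer.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "images-bulkload");
            job.setJarByClass(BulkLoadDriver.class);
            job.setMapperClass(ImageToHBaseMapper.class); // sketched below
            job.setInputFormatClass(TextInputFormat.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            FileInputFormat.addInputPath(job, new Path(args[0])); // list of image paths
            Path hfiles = new Path(args[1]);                      // staging dir for HFiles
            FileOutputFormat.setOutputPath(job, hfiles);

            HTable table = new HTable(conf, "images"); // table must already exist
            // Sets the partitioner, reducer and output format for HFile generation.
            HFileOutputFormat2.configureIncrementalLoad(job, table);

            if (job.waitForCompletion(true)) {
                // Move the generated HFiles into the regions, bypassing the write path.
                new LoadIncrementalHFiles(conf).doBulkLoad(hfiles, table);
            }
            table.close();
        }
    }

The key call is configureIncrementalLoad(), which arranges for the map output to be sorted and partitioned to match the table's regions; doBulkLoad() then just moves the finished HFiles into place.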
How do you read the images and supply them to Hadoop? You can do this in many ways; I'll describe a method using MapReduce, since it is more scalable.
One way to implement this is to write your own Hadoop RecordReader that supplies the binary data to the map.
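
For reference, here is a minimal sketch of such a reader (the class name is mine): it emits each input file whole, with the file name as the key and the raw bytes as the value. You would pair it with a FileInputFormat subclass whose isSplitable() returns false, so each image stays in one split.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class WholeFileRecordReader extends RecordReader<Text, BytesWritable> {
        private FileSplit split;
        private Configuration conf;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) return false;
            Path file = split.getPath();
            byte[] contents = new byte[(int) split.getLength()];
            FileSystem fs = file.getFileSystem(conf);
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.readFully(in, contents, 0, contents.length); // whole file
            }
            key.set(file.getName());
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() { }
    }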
But in your case I think you can use a shortcut: supply a list of image paths as the input, e.g. a text file with one HDFS path per line. Then, in the map:
    // (imports from org.apache.hadoop.* and org.apache.hadoop.hbase.* omitted)
    public class ImageToHBaseMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        private FileSystem fs;

        @Override
        protected void setup(Context context) throws IOException {
            fs = FileSystem.get(context.getConfiguration()); // prep the filesystem
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Path path = new Path(value.toString()); // one image path per input line
            byte[] bytes = new byte[(int) fs.getFileStatus(path).getLen()];
            try (FSDataInputStream in = fs.open(path)) {
                in.readFully(bytes); // read the raw image bytes
            }
            byte[] row = Bytes.toBytes(path.getName()); // row key = file name
            Put put = new Put(row);
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("data"), bytes);
            // write KeyValues instead of Puts if your bulk-load setup expects them
            context.write(new ImmutableBytesWritable(row), put);
        }

        @Override
        protected void cleanup(Context context) {
            // close the filesystem if you opened a dedicated instance; misc. cleanup
        }
    }

It's a bit hacky, but I hope you get the idea.
Also, read the comments: there are several design considerations to take into account if you are going to design a system based on this.

Hope this helps.