hadoop - How to move image files from HDFS directory to HBase?
I have Cloudera CDH 5.3.0.
I have a directory in HDFS with several gigabytes of image files of various types (jpg, png, gif).
For each file, e.g. picturename.jpg, I want a row in HBase with picturename as the row key and a column holding the image data.
Can anyone explain how to accomplish this?
For background, HBase stores everything as binary: Put and Get operate on binary data, so you can read each image in as a binary file. As you described it, the HBase table would look like this:

    rowkey          cf:data
    <image-name>    <binary-image-data>
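As a minimal sketch of that layout in code, assuming a pre-created table named images with a column family cf (both names are placeholders) and the HBase 0.98 client API that ships with CDH 5.3:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleImagePut {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            FileSystem fs = FileSystem.get(conf);

            // Read the whole image from HDFS into a byte array.
            Path image = new Path("/images/picturename.jpg"); // placeholder path
            byte[] data = new byte[(int) fs.getFileStatus(image).getLen()];
            FSDataInputStream in = fs.open(image);
            in.readFully(data);
            in.close();

            // Row key = image name, column cf:data = binary image bytes.
            HTable table = new HTable(conf, "images");
            Put put = new Put(Bytes.toBytes(image.getName()));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("data"), data);
            table.put(put);
            table.close();
        }
    }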
There are several ways to ingest data into HBase:
- with or without using MapReduce
- using Put or bulkload
Since you have several gigabytes of data, the fastest way is to use MapReduce with a bulkload. There is a useful tutorial from Cloudera on bulk loads here: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
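For the bulkload route, the driver would look roughly like the sketch below; it assumes the images table already exists and that the mapper (sketched later in this answer) emits KeyValue objects. Class names and paths are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ImageBulkLoadDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "image-bulkload");
            job.setJarByClass(ImageBulkLoadDriver.class);
            job.setMapperClass(ImagePathMapper.class); // mapper sketched below
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);

            FileInputFormat.addInputPath(job, new Path(args[0])); // list of image paths
            Path hfiles = new Path(args[1]);                      // staging dir for HFiles
            FileOutputFormat.setOutputPath(job, hfiles);

            // Wires in the TotalOrderPartitioner, reducer, and HFile output settings.
            HTable table = new HTable(conf, "images");
            HFileOutputFormat2.configureIncrementalLoad(job, table);

            if (job.waitForCompletion(true)) {
                // Move the generated HFiles into the live table.
                new LoadIncrementalHFiles(conf).doBulkLoad(hfiles, table);
            }
            table.close();
        }
    }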
How do you read the images and supply them to Hadoop? You can do this in many ways; I'll describe a method using MapReduce, since it is more scalable. One way to implement it is to write your own Hadoop RecordReader that supplies the binary data to map.
But in this case I think we can use a shortcut: supply a list of image paths as the input. Then, in the mapper:

    // imports: java.io.IOException, org.apache.hadoop.fs.*, org.apache.hadoop.io.*,
    // org.apache.hadoop.hbase.KeyValue, org.apache.hadoop.hbase.io.ImmutableBytesWritable,
    // org.apache.hadoop.hbase.util.Bytes, org.apache.hadoop.mapreduce.Mapper
    public class ImagePathMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        private FileSystem fs;

        protected void setup(Context context) throws IOException {
            // prep the filesystem
            fs = FileSystem.get(context.getConfiguration());
        }

        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // each input line is an HDFS image path; the file name becomes the row key
            Path path = new Path(value.toString());
            byte[] data = new byte[(int) fs.getFileStatus(path).getLen()];
            FSDataInputStream in = fs.open(path);
            in.readFully(data);   // read the raw bytes; optionally apply a custom encoding
            in.close();
            // emit a KeyValue if using bulkload, else build a Put object instead
            byte[] row = Bytes.toBytes(path.getName());
            context.write(new ImmutableBytesWritable(row),
                    new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("data"), data));
        }

        protected void cleanup(Context context) {
            // close fs, misc.
        }
    }
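To produce that input list of paths, a small one-off program along these lines could work; the /images and /tmp/image-paths.txt paths are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class ListImagePaths {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/tmp/image-paths.txt"));
            // Recursively list every file under the image directory, one path per line.
            RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/images"), true);
            while (it.hasNext()) {
                out.writeBytes(it.next().getPath().toString() + "\n");
            }
            out.close();
        }
    }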
It's a bit hacky, but I hope you get the idea.
Also read the comments: there are several design considerations to take into account if you are going to design a system based on this.
Hope this helps.