Iterative Hash Algorithm for Fast File Check


I want to create a representation of the state of the files in a folder (ignoring order), so that I can send that state to another computer and check whether the two are in sync. The "state representation" is 3 numbers concatenated. They are:

sum . product . number of items 

The "sum" is the numerical addition of the files' MD5 numerical representations.

The "product" is the multiplication of the files' MD5 numerical representations.

The "number of items" is the number of files.
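Here is a minimal Python 3.9+ sketch that computes the three components from scratch. It assumes the MD5 digest is read as a big-endian unsigned 128-bit integer, because the question does not say which byte order is used. The helper names `md5_value` and `folder_state` are hypothetical. To keep the example self-contained, file contents are passed in as bytes, and reading them from disk is left out.

```python
import hashlib
from math import prod


def md5_value(data: bytes) -> int:
    # Assumed encoding: the 16-byte MD5 digest as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def folder_state(contents: list[bytes]) -> str:
    """Build the 'sum.product.count' string for a collection of file contents."""
    values = [md5_value(c) for c in contents]
    # Sum, product and count are all independent of the order of `values`.
    return f"{sum(values)}.{prod(values)}.{len(values)}"
```

Addition and multiplication are commutative, so `folder_state([b"a", b"b"])` and `folder_state([b"b", b"a"])` produce the same string. An empty folder maps to `0.1.0`.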

The main reason for doing this is that it lets me build unique states iteratively and quickly when I add or delete a file (a modification being a combination of a delete and an add). Also, one should end up with the same "state" if the same set of operations is performed in a random order.
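The sketch below illustrates the "delete plus add" view of a modification by updating a `(sum, product, count)` tuple in a single step. The function names are hypothetical, and the digest is again assumed to be read as a big-endian integer.

```python
import hashlib


def md5_value(data: bytes) -> int:
    # Assumed encoding: the 16-byte MD5 digest as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def modify_file(state: tuple[int, int, int], old_data: bytes,
                new_data: bytes) -> tuple[int, int, int]:
    """Replace one file's contents: remove the old hash, then add the new one."""
    total, product, count = state
    old, new = md5_value(old_data), md5_value(new_data)
    if product % old != 0:
        # A non-zero remainder shows the old contents were never added.
        raise ValueError("old contents are not part of this state")
    # The count is unchanged: one decrement plus one increment.
    return (total - old + new, product // old * new, count)
```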

Adding a file

  • generate the file's MD5
  • calculate the MD5's numerical value (x).
  • add x to the sum
  • multiply the product by x
  • increment the number of items.
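The adding steps above could be sketched in Python 3.9+ as follows. It works on a plain `(sum, product, count)` tuple. The names `md5_value`, `add_file` and `EMPTY` are hypothetical, and the big-endian reading of the digest is an assumption.

```python
import hashlib


def md5_value(data: bytes) -> int:
    # Assumed encoding: the 16-byte MD5 digest as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def add_file(state: tuple[int, int, int], data: bytes) -> tuple[int, int, int]:
    """Apply the 'adding a file' steps to a (sum, product, count) state."""
    total, product, count = state
    x = md5_value(data)      # hash the file and take its numerical value
    return (total + x,       # add x to the sum
            product * x,     # multiply the product by x
            count + 1)       # increment the number of items


EMPTY = (0, 1, 0)  # sum, product and count of an empty folder
```

Adding the same two files in either order gives the same tuple, which is the order-independence property the question relies on.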

Removing a file

  • generate the file's MD5
  • calculate the MD5's numerical value (x).
  • subtract x from the sum
  • divide the product by x
  • decrement the number of items.
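Here is a matching sketch of the removal steps (names hypothetical, big-endian digest assumed). The division has to be exact integer division (`//` or `divmod`). Python's `/` would route the result through floating point, which either loses precision or overflows for numbers this large. The remainder check also catches most attempts to remove a file that was never added: a non-zero remainder proves the file was not part of the state, although a zero remainder alone does not prove that it was.

```python
import hashlib


def md5_value(data: bytes) -> int:
    # Assumed encoding: the 16-byte MD5 digest as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def remove_file(state: tuple[int, int, int], data: bytes) -> tuple[int, int, int]:
    """Apply the 'removing a file' steps to a (sum, product, count) state."""
    total, product, count = state
    x = md5_value(data)                       # hash the file, take its value
    quotient, remainder = divmod(product, x)  # exact integer division
    if count == 0 or remainder != 0:
        raise ValueError("file was not part of this state")
    return (total - x,      # subtract x from the sum
            quotient,       # divide the product by x
            count - 1)      # decrement the number of items
```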

Problems

Since the numerical representations of the hashes can be quite large, I may have to use a library that computes results using strings rather than integers, which may be quite slow.

With the limited testing I have done, I have not been able to create "collisions", a collision being 2 different sets of file hashes that produce the same state (remember that I am ignoring the order of the file hashes).
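In Python, a string-based library is not needed because the built-in `int` type has arbitrary precision. Java offers `java.math.BigInteger`, and PHP offers the GMP and BCMath extensions. The speed concern is still valid, though. Each MD5 value is below 2^128, so the product grows by up to 128 bits per file, and multiplication and division slow down as the folder grows. The sketch below measures that growth. The synthetic file contents (`file-0`, `file-1`, and so on) and the helper names are assumptions made only for illustration.

```python
import hashlib


def md5_value(data: bytes) -> int:
    # Assumed encoding: the 16-byte MD5 digest as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def product_bits(n_files: int) -> int:
    """Bit length of the 'product' component after adding n synthetic files."""
    product = 1
    for i in range(n_files):
        product *= md5_value(f"file-{i}".encode())
    return product.bit_length()


# The bit length grows roughly linearly, by at most 128 bits per file.
for n in (1, 10, 100, 1000):
    print(n, product_bits(n))
```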
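One way to see that the (sum, product, count) triple cannot be collision-free in principle is that it does not uniquely determine a multiset of integers. The small multisets {2, 2, 9} and {1, 6, 6} agree on all three components. This example says nothing about how hard it would be to find such a set among real MD5 values. It only shows that the mapping is not injective. The `state` helper is hypothetical.

```python
from math import prod


def state(values: list[int]) -> tuple[int, int, int]:
    """(sum, product, count) of a multiset of integers, ignoring order."""
    return (sum(values), prod(values), len(values))


left = [2, 2, 9]
right = [1, 6, 6]
print(state(left), state(right))  # both (13, 36, 3)
```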

Question

I'm sure I can't be the first person who wants to achieve such a thing. Is there an existing algorithm or iterative hash function that aims to do the same thing, preferably in PHP, Java, or Python? Is there a term for this type of thing, other than what I think of as an "iterative hash"? Is there a flaw in the algorithm that I haven't spotted, such as "collisions" that make the generated state representations non-unique?
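On the terminology question, this kind of construction is usually discussed under names such as incremental hashing or multiset hashing. Bellare and Micciancio's AdHash and MuHash, for example, combine per-element hashes by modular addition and modular multiplication, respectively. The version below is loosely in that spirit and is only a sketch, not a vetted or secure construction. It keeps the state at a fixed size by reducing the sum modulo 2^128 and the product modulo the Mersenne prime 2^521 - 1. Removal then becomes multiplication by a modular inverse (`pow(x, -1, m)`, Python 3.8+). The class name is hypothetical. Unlike the exact-integer version, this one cannot detect removal of a file that was never added.

```python
import hashlib

MOD_SUM = 2 ** 128      # keep the sum at a fixed 128 bits
PRIME = 2 ** 521 - 1    # Mersenne prime; any non-zero MD5 value is invertible mod it


def md5_value(data: bytes) -> int:
    # Assumed encoding: the 16-byte MD5 digest as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class ModularFolderState:
    """Fixed-size, order-independent folder state (illustrative sketch only)."""

    def __init__(self) -> None:
        self.total, self.product, self.count = 0, 1, 0

    def add(self, data: bytes) -> None:
        x = md5_value(data)
        self.total = (self.total + x) % MOD_SUM
        self.product = (self.product * x) % PRIME
        self.count += 1

    def remove(self, data: bytes) -> None:
        x = md5_value(data)
        self.total = (self.total - x) % MOD_SUM
        # Division becomes multiplication by the modular inverse of x.
        self.product = (self.product * pow(x, -1, PRIME)) % PRIME
        self.count -= 1

    def key(self) -> str:
        return f"{self.total}.{self.product}.{self.count}"
```

Two instances that see the same additions and removals, in any order, end up with the same `key()`, and the key never grows beyond a few hundred digits.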

How many states can a file system take? Infinitely many, for practical purposes.

How long is the hash? Short enough to be efficient, and finite in any case.

Will there be collisions? Yes.

So the hash approach seems fine, particularly if it spreads nearby points well, i.e. two file-system states that differ only in the content of one file hash to different values.

However, you should not rely on the hash never producing collisions in the long run. As long as the chance of a collision is not 0, it is a mathematical certainty that the probability of a collision eventually goes to 1.

So to be completely safe, you need a full MD5 exchange. If speed and fast updates are the goal, your scheme sounds good, combined with more infrequent exchanges of the longer keys to stay on the safe side if the sync is mission critical.
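To make the "probability goes to 1" argument concrete, assume that each comparison of two differing states independently collides with some fixed probability p > 0. For an ideal 128-bit hash, one might take p = 2^-128. This is an idealized assumption, not a property proven for the sum/product scheme. After k comparisons:

```latex
P_k = 1 - (1 - p)^k \longrightarrow 1 \quad (k \to \infty), \qquad
P_k \ge \tfrac{1}{2} \iff k \ge \frac{\ln 2}{-\ln(1 - p)} \approx \frac{\ln 2}{p}
```

With p = 2^-128, that works out to roughly 0.69 · 2^128 comparisons. The risk is therefore real in principle but tiny in practice.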
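Here is a minimal sketch of that policy. Compare the compact keys on every sync, and on some syncs also exchange and compare the full list of per-file MD5s. How often the full check runs is up to the caller. The function name and parameters are hypothetical, and the MD5s are assumed to be exchanged as hex strings.

```python
from collections import Counter


def in_sync(local_key: str, remote_key: str,
            local_md5s: list[str], remote_md5s: list[str],
            full_check: bool) -> bool:
    """Cheap comparison of the compact keys, with an optional exhaustive check."""
    if local_key != remote_key:
        return False  # equal multisets always give equal keys, so this is conclusive
    if not full_check:
        return True   # keys match: in sync unless a collision occurred
    # Compare the full multisets of per-file MD5s, ignoring order.
    return Counter(local_md5s) == Counter(remote_md5s)
```

A key mismatch settles the question on its own. Only a key match needs the occasional full check to rule out a collision.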

