algorithm - Fuse tuples to find equivalence classes -
suppose have finite domain d={d1,..dk} containg k elements.
we consider s subset of d^n, i.e. set of tuples of form < a1,..,an >, ai in d.
we want represent (compactly) using s' subset of 2^d^n, i.e. set of tuples of form < a1,..an > ai being subsets of d. implication tuple s' in s' elements in cross product of ai exist in s.
for instance, consider d={a,b,c} k=3, n=2 , tuples s=< a,b >+< a,c >+< b,b >+< b,c >.
we can use s'=<{a,b},{b,c}> represent s.
this singleton solution minimal, s'=<{a},{b,c}>+<{b},{b,c}> solution larger, therefore less desirable.
some sizes, in concrete instances, need handle : k ~ 1000 elements in domain d, n <= 10 relatively small (main source of complexity), |s| ranging large values > 10^6.
a naïve approach consists in first plunging s domain of s' 2^d^n, using following test, 2 two, 2 tuples s1,s2 in s' can fused form single tuple in s' iff. differ 1 component.
e.g.
< a,b >+< a,c > -> <{a},{b,c}> (differ on second component)
< b,b >+< b,c > -> <{b},{b,c}> (differ on second component)
<{a},{b,c}> + <{b},{b,c}> -> <{a,b},{b,c}> (differ on first component)
now there several minimal s', interested in finding one, , approximations of minimisation of kind ok, provided don't give wrong results (i.e. if s' not small be, fast results).
naive algorithm has deal fact newly introduced "fused" tuple match other tuple scales badly on large input sets, n remaining low. need |s'|^2 comparisons ensure convergence, , time fuse 2 elements, i'm retesting every pair (how can improve ?).
a lot of efficiency iteration order dependent, sorting set in way(s) option, or perhaps indexing using hashes, i'm not sure how it.
imperative pseudo code ideal, or pointers reformulation of problem can run solver on help.
here's psuedo (c# code haven't tested) demonstrates s'=<{a},{b,c}>+<{b},{b,c}> method. except space requirements, when using integer index element negligible; overall efficiency , speed add'ing , test'ing tuples should extremely fast. if want practical solution have 1 have use correct adts.
elementtype[] domain = new elementtype[]; // simple array of domain elements filldomain(domain); // insert domain elements sortarray(domain); // sort domain elements k log k time sorteddictionary<int, hashset<int>> subsets; // int's index/ref domain subsets = new sorteddictionary<int, hashset<int>>(); // void addtuple(sorteddictionary<int, hashset<int>> tuples, elementtype[] domain, elementtype first, elementtype second) { int = binarysearch(domain, first); // log k time (binary search) int b = binarysearch(domain, second); // log k time (binary search) if(tuples.containskey(a)) { // log n time (binary search on sorted keys) if(!tuples[a].contains(b)) { // constant time (hash lookup) tuples[a].add(b); // constant time (hash add) } } else { // constant time (instance + hash add) tuples[a] = new hashset<in>(); tuples[a].add(b); } } // bool containstuple(sorteddictionary<int, hashset<int>> tuples, elementtype[] domain, elementtype first, elementtype second) { int = binarysearch(domain, first); // log k time (binary search) int b = binarysearch(domain, second); // log k time (binary search) if(tuples.containskey(a)) { // log n time (binary search on sorted keys) if(tuples[a].contains(b)) { // constant time (hash test) return true; } } return false; } the space savings optimizing tuple subset s' won't outweight slowdown of optimization process itself. size optimization (if know you're k less 65536 use short integers instead of integers in sorteddictionary , hashset. 50 mil integers take 4 bytes per 32bit integer * 50 mil ~= 200 mb.
edit here's approach encoding/mapping tuples string can take advantage of binary string compare , fact utf-16 / utf-8 encoding size efficient. again still doesn't doing merging optimization want, speed , efficiency pretty good.
here's quick pseudo code in javascript.
array.prototype.binarysearch = function(elm) { var l = 0, h = this.length - 1, i; while(l <= h) { = (l + h) >> 1; if(this[i] < elm) l = ++i; else if(this[i] > elm) h = --i; else return i; } return -(++l); }; // map ordered domain elements characters // example javascript's utf-16 should fine // utf-8 work var domain = { "a": string.fromcharcode(1), "b": string.fromcharcode(2), "c": string.fromcharcode(3), "d": string.fromcharcode(4) } var tuplestrings = []; // map tuple string encoding function map(tuple) { var str = ""; for(var i=0; i<tuple.length; i++) { str += domain[tuple[i]]; } return str; } function add(tuple) { var str = map(tuple); // binary search var index = tuplestrings.binarysearch(str); if(index < 0) index = ~index; // insert depends on tuplestring's type implementation tuplestrings.splice(index, 0, str); } function contains(tuple) { var str = map(tuple); // binary search return tuplestring.binarysearch(str) >= 0; } add(["a","b"]); add(["a","c"]); add(["b","b"]); add(["b","c"]); add(["c","c"]); add(["d","a"]); alert(contains(["a","a"])); alert(contains(["d","a"])); alert(json.stringify(tuplestrings, null, "\n"));
Comments
Post a Comment