Partitioning data in Neo4j
Is there a way to physically separate Neo4j partitions? Meaning, could the following query go to node 1:

match (a:user:facebook)

while this query goes to another node (maybe hosted in Docker)?

match (b:user:google)

My use case: I want to store the data of several clients in Neo4j, and there will be lots of them. Now, I'm not sure what the best design is; it has to fulfill a few conditions:
- No mixed data should ever be returned by a Cypher query (it is hard to make sure that no developer forgets to add ":partition1" (for example) to a Cypher query).
- The performance of one client shouldn't affect another client. For example, if one client has lots of data and another has only a small amount, or if a "heavy" query of one client is running, I don't want the "light" queries of the other client to suffer from slow performance.
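To illustrate the first condition, here is a minimal sketch (the `User` and `Client1` labels are hypothetical) of how a forgotten tenant label silently leaks data across clients:

```cypher
// Scoped query: only nodes carrying Client1's label are matched.
MATCH (u:User:Client1)
RETURN u.name;

// A developer forgets the tenant label: this query silently
// returns users belonging to *every* client.
MATCH (u:User)
RETURN u.name;
```

Nothing in Cypher itself flags the second query as wrong, which is why the question calls this condition hard to enforce.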
In other words, if I store everything under one node, at some point in the future, I think, I will have a scalability problem as I get more clients.
By the way, is it common to have a few clusters?
Also, what is the advantage of partitioning over creating a different label for each client? For example: users_client_1, users_client_2, etc.
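For reference, the label-per-client alternative mentioned above might look like this (label and property names are illustrative):

```cypher
// One label per client, with the client baked into the label name.
CREATE (:users_client_1 {name: 'Alice'});
CREATE (:users_client_2 {name: 'Bob'});

// Queries are then scoped by choosing the right client's label.
MATCH (u:users_client_1)
RETURN u.name;
```

This keeps tenants logically separated, but all labels still live in the same database, so it does not by itself address the performance-isolation condition.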
short answer: no, there isn't.
Neo4j has High Availability (HA) clusters that can make a copy of the entire graph on many machines, and serve many requests against those copies quickly, but they don't partition a huge graph so that some of it is stored here and other parts there, all connected by one query mechanism.
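As a rough sketch of what HA replication (not partitioning) looks like in practice, an HA member's configuration in the Neo4j 2.x era included settings along these lines (host names and IDs here are made up):

```
# Hypothetical neo4j.properties fragment for one HA cluster member.
# Every instance holds a FULL copy of the graph; nothing is sharded.
ha.server_id=1
ha.initial_hosts=host1:5001,host2:5001,host3:5001
```

Every member replicates the whole store, which is exactly why HA scales read throughput but does not split a graph that is too big for one machine.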
More detailed answer: graph partitioning is a hard problem, subject to ongoing research. You can read more about it over at Wikipedia, but the gist is that when you create partitions, you're splitting your graph into multiple different locations, and then needing to deal with the complication of relationships that cross partitions. Crossing partitions is an expensive operation, so the real question when partitioning is: how do you partition such that the need to cross partitions in a query comes up as infrequently as possible?
That's a hard question, since it depends not only on the data model but also on the access patterns, which may change.
Here's how bad the situation is (quote taken from Wikipedia):
Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms.[3] However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor.[1] Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist,[4] unless P=NP. Grids are a particularly interesting case since they model the graphs resulting from Finite Element Model (FEM) simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable polynomial algorithms exist for these graphs.
Not to leave you with doom and gloom: plenty of people have partitioned big graphs. Facebook and Twitter do it every day, so you can read about FlockDB on the Twitter side, or avail yourself of the relevant Facebook research. To summarize and cut to the chase: it depends on your data, and the people who partition usually design a custom partitioning strategy; it's not something that software does for them.
Finally, other architectures (such as Apache Giraph) can auto-partition in some senses; if you store your graph on top of Hadoop, and Hadoop automagically scales across a cluster, then technically it is partitioning your graph for you, automagically. Cool, right? Well... cool until you realize that you still have to execute graph traversal operations all over the place, which may perform poorly owing to the fact that all of those partitions have to be traversed, which is exactly the performance situation you were trying to avoid by partitioning wisely in the first place.