python - Retrieve List Index for All Items in a Set


I have a big, huge dictionary (it isn't really a dictionary, but pretend it is because that's easier, and the difference is not relevant) that contains the same strings over and over again. I have verified that I can store a lot more in memory if I do a poor man's compression on the system and instead store ints that correspond to the strings.

animals = ['ape', 'butterfly', 'cat', 'dog']

Each string exists in the list and therefore has an index value, such that animals.index('cat') returns 2.

This allows me to store bobspets = set([2, 3]) in my object rather than cat and dog. For the number of items I have, the memory savings are astronomical. (Really, don't try to dissuade me; it is tested.)
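For illustration, a minimal sketch of the encoding step. The reverse-lookup dict `animal_to_id` and the helper `encode` are hypothetical names, not part of the real code; they only show how each string could be mapped to its int without repeated `animals.index()` scans.

```python
animals = ['ape', 'butterfly', 'cat', 'dog']

# Hypothetical helper table: map each string to its position once, so
# encoding is a dict hit instead of a linear list.index() scan.
animal_to_id = dict((name, i) for i, name in enumerate(animals))

def encode(names):
    """Return the set of int ids for an iterable of animal names."""
    return set(animal_to_id[name] for name in names)

bobspets = encode(['cat', 'dog'])  # the ints 2 and 3
```

The dict is only needed while encoding; the stored object keeps just the ints.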

Currently I convert the ints back to strings with a loop:

tempwordlist = set()
for integofindex in tempset:
    tempwordlist.add(animals[integofindex])
return tempwordlist
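The same decoding can be written more compactly. This is a sketch with a made-up `tempset`, not a claim that either form is faster; set comprehensions are available from Python 2.7 onward.

```python
animals = ['ape', 'butterfly', 'cat', 'dog']
tempset = set([2, 3])  # example ids, assumed for illustration

# Set comprehension: same result as the explicit loop.
words = {animals[i] for i in tempset}

# map() over the sequence's __getitem__ keeps the per-item work in C.
words_via_map = set(map(animals.__getitem__, tempset))
```

Both expressions build the set of strings whose indexes are in `tempset`.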

This code works. It feels "pythonic," but it feels like there should be a better way. I am in Python 2.7 on App Engine, if that matters. It may, since I wonder if numpy has something I have missed.

I have 2.5 million things in my object, each has an average of 3 of these "pets", and there are 7500-ish ints that represent pets. (No, they aren't really pets.)

I have considered using a dictionary keyed by position instead of using the list index. This doesn't seem faster, but I would be interested if anyone thinks it should be. (It took more memory and seemed to be the same speed, or close.)

I am considering running a bunch of tests with numpy and its arrays rather than lists, but before I do, I thought I'd ask the audience and see if I would be wasting time on something I have already reached the best solution for.

Last thing: the solution should be picklable, since I am loading and transferring the data that way.
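A quick round-trip check of the pickling requirement might look like this. `pets_by_owner` is a hypothetical stand-in for the real object, and the data is invented for the example.

```python
import pickle

animals = ['ape', 'butterfly', 'cat', 'dog']
# Hypothetical stand-in for the real object: owner name -> set of pet ids.
pets_by_owner = {'bob': set([2, 3]), 'alice': set([0])}

# Serialize the lookup sequence together with the int-encoded data.
payload = pickle.dumps((animals, pets_by_owner), pickle.HIGHEST_PROTOCOL)

# Load it back and confirm nothing was lost.
restored_animals, restored_pets = pickle.loads(payload)
```

Plain lists, tuples, sets and ints all pickle with the standard library, so this structure needs no custom reducers.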

It turns out that since my list of strings is fixed, and I only wish to look up the string at an index, what I am building is an index array that is immutable. In short, a tuple.

Moving to a tuple rather than a list gains a 30% improvement in speed. That is far more than I would have anticipated.

The bonus is largest on large lists. It seems that each time you cross a bit threshold the bonus increases, so in sub-1024 lists there is no bonus, and at a million it is pretty significant.
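A rough way to reproduce the comparison is sketched below. The sizes, the `pet%d` names and the `time_lookup` helper are assumptions for the example; results will vary by interpreter and machine, so no particular ratio is implied.

```python
import random
import timeit

def time_lookup(container, ids, repeat=3, number=200):
    """Best-of-`repeat` time to decode `ids` against `container`."""
    stmt = lambda: set(container[i] for i in ids)
    return min(timeit.repeat(stmt, repeat=repeat, number=number))

size = 7500  # roughly the number of distinct "pets"
names_list = ['pet%d' % i for i in range(size)]
names_tuple = tuple(names_list)
ids = [random.randrange(size) for _ in range(1000)]

list_time = time_lookup(names_list, ids)
tuple_time = time_lookup(names_tuple, ids)
print('list: %.4fs  tuple: %.4fs' % (list_time, tuple_time))
```

Changing `size` (for example to 1000 or 1000000) lets you see how the gap moves with the length of the sequence.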

The tuple also uses less memory for the same data.
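The container overhead can be compared with `sys.getsizeof`. This sketch assumes CPython and invented names; note that `getsizeof` counts only the container itself, not the strings it refers to.

```python
import sys

names_list = ['pet%d' % i for i in range(7500)]
names_tuple = tuple(names_list)

# Size of the container only (header plus pointer array), in bytes.
list_size = sys.getsizeof(names_list)
tuple_size = sys.getsizeof(names_tuple)
print('list: %d bytes  tuple: %d bytes' % (list_size, tuple_size))
```

The strings are shared between both containers, so any difference comes from the container layout alone.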

As an aside, when playing with the lists of integers, you can make these smaller by using a numpy array, but the advantage doesn't extend to pickling. The pickles are 15% larger. I think this is because of the object description being stored in the pickle, but I didn't spend much time looking.

So in short, the only change was to make the animals list a tuple. I was hoping for an answer that was more exotic.

