python - Pandas DataFrame index by belonging to a set


I have a pandas DataFrame that, among its columns, has one called phone_number. I want the rows whose phone number shows up 50 times or more. My best attempt is this:

    counts = data.phone_number.value_counts()
    counts = counts[counts.values > 50]
    data[data.phone_number in counts.index]

However, I get this error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

What is the best way to get these rows from the data frame in this situation?

Thank you very much!

You can use groupby together with filter.

    import pandas as pd
    import numpy as np

    # generate artificial data
    # ===================================================
    np.random.seed(0)
    # 450 rows/records in total
    df = pd.DataFrame(np.random.randint(1, 10, 450), columns=['phone_number'])

    Out[74]:
         phone_number
    0               6
    1               1
    2               4
    3               4
    4               8
    5               4
    6               6
    7               3
    ..            ...
    442             7
    443             1
    444             9
    445             1
    446             8
    447             7
    448             6
    449             7

    [450 rows x 1 columns]

    # processing
    # ===================================================

    # filtered results: 177 rows
    # len(group) is the row count of each group; unlike group.count(),
    # it stays a scalar when the frame has more than one column
    df.groupby('phone_number').filter(lambda group: len(group) > 50)

    Out[75]:
         phone_number
    2               4
    3               4
    5               4
    8               5
    11              9
    12              9
    17              9
    20              9
    ..            ...
    424             5
    426             4
    428             5
    430             5
    431             5
    436             4
    441             4
    444             9

    [177 rows x 1 columns]

    # reference: 71+54+52 = 177
    df.phone_number.value_counts()

    Out[76]:
    4    71
    9    54
    5    52
    1    50
    8    49
    3    45
    6    44
    2    43
    7    42
    dtype: int64
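As an alternative, the attempt from the question can probably be repaired directly. The expression `data.phone_number in counts.index` asks Python to hash the whole Series, which is what raises the TypeError, whereas `Series.isin` tests membership row by row. Below is a minimal sketch, assuming a small made-up frame (the phone numbers and the `min_count` name are hypothetical, not from the question) and using `>=` to match "50 times or more":

```python
import pandas as pd

# Hypothetical data: one number seen 60 times, one 50 times, one 10 times
data = pd.DataFrame({
    'phone_number': ['555-0100'] * 60 + ['555-0101'] * 50 + ['555-0102'] * 10
})

min_count = 50  # assumed threshold: "50 times or more"

# Count occurrences, keep the numbers that meet the threshold,
# then select rows by element-wise membership with isin
counts = data['phone_number'].value_counts()
frequent = counts[counts >= min_count].index
result = data[data['phone_number'].isin(frequent)]

# Equivalent route: broadcast each group's size back onto its rows
sizes = data.groupby('phone_number')['phone_number'].transform('size')
result_alt = data[sizes >= min_count]
```

Both selections keep the original index and any other columns in the frame. Note that the `> 50` comparison used with filter above drops numbers that appear exactly 50 times; switch it to `>= 50` if those rows should be kept.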
