python - Pandas DataFrame index by belonging to a set
I have a pandas DataFrame that, among other columns, has one called phone_number. I want the rows whose phone number shows up 50 times or more. My best attempt is this:
```python
counts = data.phone_number.value_counts()
counts = counts[counts.values > 50]
data[data.phone_number in counts.index]
```
However, I get this error: TypeError: 'Series' objects are mutable, thus they cannot be hashed
What is the best way to get these rows from the DataFrame in this situation?
Thank you very much!
You can use `groupby` followed by `filter`.
```python
import pandas as pd
import numpy as np

# generate artificial data
# ===================================================
np.random.seed(0)
# 450 rows/records in total
df = pd.DataFrame(np.random.randint(1, 10, 450), columns=['phone_number'])
df
```

```
Out[74]:
     phone_number
0               6
1               1
2               4
3               4
4               8
5               4
6               6
7               3
..            ...
442             7
443             1
444             9
445             1
446             8
447             7
448             6
449             7

[450 rows x 1 columns]
```

```python
# processing
# ===================================================
# filtered results: 177 rows
# len(group) is the number of rows in each phone_number group
df.groupby('phone_number').filter(lambda group: len(group) > 50)
```

```
Out[75]:
     phone_number
2               4
3               4
5               4
8               5
11              9
12              9
17              9
20              9
..            ...
424             5
426             4
428             5
430             5
431             5
436             4
441             4
444             9

[177 rows x 1 columns]
```

```python
# reference: 71+54+52 = 177
df.phone_number.value_counts()
```

```
Out[76]:
4    71
9    54
5    52
1    50
8    49
3    45
6    44
2    43
7    42
dtype: int64
```
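The question's `value_counts` attempt also works once the Python `in` operator is replaced by the element-wise `Series.isin`. Another common idiom broadcasts each group's size back onto its rows with `groupby(...).transform('size')`. Below is a sketch of both, checked against `groupby`/`filter` on artificial data generated the same way as above. It uses `>= 50` to match the question's "50 times or more"; note that the `> 50` used above drops a number that appears exactly 50 times. The name `threshold` is just an illustrative variable.

```python
import numpy as np
import pandas as pd

# artificial data, generated the same way as in the answer above
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 10, 450), columns=["phone_number"])

threshold = 50  # keep numbers that appear 50 times or more

# 1) the question's approach, fixed: isin tests each element against the index
counts = df.phone_number.value_counts()
frequent = counts[counts >= threshold].index
via_isin = df[df.phone_number.isin(frequent)]

# 2) transform('size') gives every row the size of its group, aligned to df's index
group_sizes = df.groupby("phone_number")["phone_number"].transform("size")
via_transform = df[group_sizes >= threshold]

# 3) groupby/filter, using len(group) as the group size
via_filter = df.groupby("phone_number").filter(lambda group: len(group) >= threshold)

print(len(via_isin), len(via_transform), len(via_filter))
```

All three keep the original row order and index. The `isin` and `transform` versions build a boolean mask in one vectorized pass, while `filter` calls the Python lambda once per group.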