1 minute read

Towards Data Science on Medium has been a great source of tips so far, and this article is a great example that highlights the following functions new to me.

> import pandas as pd

> df = pd.DataFrame(data={
    'Name': ['Bob', 'Mark', 'Josh', 'Anna', 'Peter', 'Dexter'],
    'Points': [37, 91, 66, 42, 99, 81]
})

> df['Points'].nlargest(3)  # Display the 3 largest values in the 'Points' series.
4    99
1    91
5    81
Name: Points, dtype: int64

> df.nlargest(3, columns='Points')  # Display the rows associated with the 3 largest 'Points' values.
     Name  Points
4   Peter      99
1    Mark      91
5  Dexter      81

> df['Points'].nsmallest(3)
0    37
3    42
2    66
Name: Points, dtype: int64

> pd.cut(df['Points'], bins=2)  # Bin 'Points' values into 2 discrete intervals.
0    (36.938, 68.0]
1      (68.0, 99.0]
2    (36.938, 68.0]
3    (36.938, 68.0]
4      (68.0, 99.0]
5      (68.0, 99.0]
Name: Points, dtype: category
Categories (2, interval[float64]): [(36.938, 68.0] < (68.0, 99.0]]

> pd.cut(df['Points'], bins=[0, 50, 100])  # Use specific bins.
0      (0, 50]
1    (50, 100]
2    (50, 100]
3      (0, 50]
4    (50, 100]
5    (50, 100]
Name: Points, dtype: category
Categories (2, interval[int64]): [(0, 50] < (50, 100]]

> pd.cut(df['Points'], bins=[0, 50, 100], labels=['Fail', 'Pass'])  # Add labels.
0    Fail
1    Pass
2    Pass
3    Fail
4    Pass
5    Pass
Name: Points, dtype: category
Categories (2, object): ['Fail' < 'Pass']

Comments