Pandas: Named Aggregation
pandas>=0.25
supports named aggregation,
allowing you to specify the output column names when you aggregate a groupby,
instead of renaming.
This will be especially useful for doing multiple aggregations on the same column.
Here’s a simple example from the
Docs:
In [80]: animals
Out[80]:
kind height weight
0 cat 9.1 7.9
1 dog 6.0 7.5
2 cat 9.5 9.9
3 dog 34.0 198.0
In [81]: animals.groupby("kind").agg(
....: min_height=pd.NamedAgg(column='height', aggfunc='min'),
....: max_height=pd.NamedAgg(column='height', aggfunc='max'),
....: average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
....: )
....:
Out[81]:
min_height max_height average_weight
kind
cat 9.1 9.5 8.90
dog 6.0 34.0 102.75
Note that pandas.NamedAgg
is just a namedtuple.
Plain tuples are allowed as well:
animals.groupby("kind").agg(
min_height=('height', 'min'),
max_height=('height', 'max'),
average_weight=('weight', np.mean),
)
If your desired output column names are not valid python keywords, construct a dictionary and unpack the keyword arguments:
animals.groupby("kind").agg(**{
'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
})
Additional keyword arguments are not passed through to the aggregation functions.
Only pairs of (column, aggfunc)
should be passed as **kwargs
.
If your aggregation functions requires additional arguments, partially apply them with functools.partial()
.
Named aggregation is also valid for Series
groupby
aggregations.
In this case there’s no column selection, so the values are just the functions:
animals.groupby("kind").height.agg(
min_height='min',
max_height='max',
)
Via Adam Merberg.
Leave a comment