Pandas: Named Aggregation
pandas>=0.25 supports named aggregation, which lets you specify the output
column names when you aggregate a groupby, instead of renaming them afterward.
This is especially useful when applying multiple aggregations to the same
column. Here’s a simple example from the docs:
In [80]: animals
Out[80]:
  kind  height  weight
0  cat     9.1     7.9
1  dog     6.0     7.5
2  cat     9.5     9.9
3  dog    34.0   198.0

In [81]: animals.groupby("kind").agg(
   ....:     min_height=pd.NamedAgg(column='height', aggfunc='min'),
   ....:     max_height=pd.NamedAgg(column='height', aggfunc='max'),
   ....:     average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
   ....: )
   ....:
Out[81]:
      min_height  max_height  average_weight
kind
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75
Note that pandas.NamedAgg is just a namedtuple. Plain tuples are allowed as
well:
animals.groupby("kind").agg(
    min_height=('height', 'min'),
    max_height=('height', 'max'),
    average_weight=('weight', np.mean),
)
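As a self-contained check, the sketch below rebuilds the animals frame from the docs example and compares the NamedAgg form with the plain-tuple form. The variable names with_namedagg and with_tuples are illustrative only, and the string 'mean' stands in for np.mean, which is a stylistic assumption rather than something the docs example requires.

```python
import pandas as pd

# Same data as the docs example above.
animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Explicit NamedAgg objects: keyword name -> (column, aggfunc).
with_namedagg = animals.groupby("kind").agg(
    min_height=pd.NamedAgg(column="height", aggfunc="min"),
    max_height=pd.NamedAgg(column="height", aggfunc="max"),
    average_weight=pd.NamedAgg(column="weight", aggfunc="mean"),
)

# Plain tuples in the same (column, aggfunc) order.
with_tuples = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
    average_weight=("weight", "mean"),
)

# Both spellings should describe the same aggregation.
print(with_namedagg.equals(with_tuples))
```

The tuple form is shorter, while NamedAgg makes the role of each element explicit; which to use is mostly a matter of taste.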
If your desired output column names are not valid Python identifiers (for example, because they contain spaces), construct a dictionary and unpack it as keyword arguments:
animals.groupby("kind").agg(**{
    'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
})
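A runnable version of the same idea, as a minimal sketch: the spec dictionary and the extra 'max height' column are illustrative additions, not part of the docs example, and the string 'sum' is used in place of the built-in sum.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Labels containing spaces cannot be written as keyword names,
# so they are given as dictionary keys and unpacked with **.
spec = {
    "total weight": ("weight", "sum"),
    "max height": ("height", "max"),
}

totals = animals.groupby("kind").agg(**spec)
print(totals)
```

Building the specification as a dictionary first also makes it easy to generate output names programmatically.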
Additional keyword arguments are not passed through to the aggregation
functions. Only pairs of (column, aggfunc) should be passed as **kwargs. If
your aggregation function requires additional arguments, partially apply them
with functools.partial().
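The partial-application approach might look like the following sketch. The helper height_quantile and its q parameter are hypothetical names introduced for illustration, and the example assumes a reasonably recent pandas release.

```python
from functools import partial

import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})


def height_quantile(series, q):
    """Hypothetical helper: return the q-th quantile of one group's values."""
    return series.quantile(q)


# q cannot be passed as an extra keyword to agg(), because every keyword
# is read as an output column name. It is bound ahead of time instead.
result = animals.groupby("kind").agg(
    min_height=("height", "min"),
    median_height=("height", partial(height_quantile, q=0.5)),
)
print(result)
```

A lambda that closes over the extra argument would be another way to bind it; partial keeps the underlying function name visible.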
Named aggregation is also valid for Series groupby aggregations. In this
case there’s no column selection, so the values are just the functions:
animals.groupby("kind").height.agg(
    min_height='min',
    max_height='max',
)
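For completeness, here is the Series case as a self-contained sketch on the same animals data. The variable name heights is illustrative.

```python
import pandas as pd

animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# The column is already selected, so each keyword maps an output
# name directly to an aggregation function.
heights = animals.groupby("kind").height.agg(
    min_height="min",
    max_height="max",
)
print(heights)
```

The result is a DataFrame indexed by kind, with one column per keyword.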
Via Adam Merberg.