(TIL) Pandas: Named Aggregation

1 minute read

pandas>=0.25 supports named aggregation, allowing you to specify the output column names when you aggregate a groupby, instead of renaming. This will be especially useful for doing multiple aggregations on the same column. Here’s a simple example from the Docs:

In [80]: animals
Out[80]:
  kind  height  weight
0  cat     9.1     7.9
1  dog     6.0     7.5
2  cat     9.5     9.9
3  dog    34.0   198.0

In [81]: animals.groupby("kind").agg(
   ....:     min_height=pd.NamedAgg(column='height', aggfunc='min'),
   ....:     max_height=pd.NamedAgg(column='height', aggfunc='max'),
   ....:     average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
   ....: )
   ....:
Out[81]:
      min_height  max_height  average_weight
kind
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75

Note that pandas.NamedAgg is just a namedtuple. Plain tuples are allowed as well:

animals.groupby("kind").agg(
    min_height=('height', 'min'),
    max_height=('height', 'max'),
    average_weight=('weight', np.mean),
)

If your desired output column names are not valid python keywords, construct a dictionary and unpack the keyword arguments:

animals.groupby("kind").agg(**{
    'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
})

Additional keyword arguments are not passed through to the aggregation functions. Only pairs of (column, aggfunc) should be passed as **kwargs. If your aggregation functions requires additional arguments, partially apply them with functools.partial().

Named aggregation is also valid for Series groupby aggregations. In this case there’s no column selection, so the values are just the functions:

animals.groupby("kind").height.agg(
    min_height='min',
    max_height='max',
)

Via Adam Merberg.

Tags: ,

Categories:

Updated:

Comments