Pandas: Some notes on groupby

  1. The count() aggregation function counts only the non-null values in each group. To count all rows, whether null or non-null, use size().

  2. You can name the aggregated columns by passing them as keyword arguments to the agg function. Here I use a dictionary so that I can use string constants for the column names.

     # Series level
     df.groupby('class')['sepal length (cm)'].agg(**{
         # 'new column': 'function',
         'sepal_average_length': 'mean',
         'sepal_standard_deviation': 'std',
     })

     # DataFrame level
     df.groupby('class').agg(**{
         # 'new column': ('column', 'function'),
         'sepal_average_length': ("sepal length (cm)", "mean"),
         'sepal_standard_deviation': ("sepal length (cm)", "std"),
     })
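
Both notes can be checked on a tiny example. This is only a sketch: the four-row DataFrame below is made-up toy data standing in for the iris dataset (not the real measurements), with one missing length so that count() and size() disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the iris DataFrame; one length is missing.
df = pd.DataFrame({
    "class": ["setosa", "setosa", "virginica", "virginica"],
    "sepal length (cm)": [5.0, np.nan, 6.0, 7.0],
})

grouped = df.groupby("class")["sepal length (cm)"]

# count() skips the NaN in the setosa group; size() counts every row.
counts = grouped.count()
sizes = grouped.size()

# Named aggregation at the DataFrame level: 'new column': ('column', 'function').
summary = df.groupby("class").agg(**{
    "sepal_average_length": ("sepal length (cm)", "mean"),
    "sepal_standard_deviation": ("sepal length (cm)", "std"),
})

print(counts, sizes, summary, sep="\n\n")
```

In this toy data the setosa group has two rows but only one non-null length, so count() and size() return different numbers for it, while they agree for virginica.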

Via Christopher Tao and Soner Yıldırım.