
pandas provides two handy DataFrame methods for combining one DataFrame with another:

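  • The combine method merges two DataFrames column by column: it passes each pair of matching columns to a function you supply and uses the Series that function returns as the result column: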
# Combine using a simple function that chooses the column with the smaller sum.

>>> import numpy as np
>>> import pandas as pd
>>> def take_smaller(s1, s2):
...     return s1 if s1.sum() < s2.sum() else s2
...
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df1
   A  B
0  0  4
1  0  4
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df2
   A  B
0  1  3
1  1  3
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

# Combine using a true element-wise combine function.

>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df1
   A  B
0  5  2
1  0  4
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df2
   A  B
0  1  3
1  1  3
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3
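
When one of the frames contains missing values, combine also accepts a fill_value argument that replaces NaNs in each column before the function is called. The following is a minimal sketch that reuses take_smaller from above; the frames named left and right are made-up illustrative data, not part of the examples above.

```python
import pandas as pd


def take_smaller(s1, s2):
    # Keep whichever column has the smaller sum.
    return s1 if s1.sum() < s2.sum() else s2


# Illustrative data: column B of `left` has one missing value.
left = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
right = pd.DataFrame({"A": [1, 1], "B": [3, 3]})

# fill_value replaces the NaN in left["B"] with -5 before take_smaller sees it,
# so B sums to -1 on the left versus 6 on the right and the left column wins.
result = left.combine(right, take_smaller, fill_value=-5)
print(result)
```

Without fill_value, the NaN would be passed to the function as-is, so the outcome would depend on how the function treats missing values.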
  • The combine_first method fills null values in the calling DataFrame with non-null values from the other one; the result is indexed by the union of both frames' row and column labels:
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df1
     A    B
0  NaN  4.0
1  0.0  NaN
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df2
   B  C
1  3  1
2  3  1
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
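
To make the priority rule concrete, here is a small sketch with made-up frames (left and right are hypothetical names) that share the same labels. Values already present in the caller are kept, and only the gaps are taken from the other frame. For frames with identical labels, fillna with a DataFrame argument should give the same values.

```python
import numpy as np
import pandas as pd

# Illustrative data: both frames have the same index and columns.
left = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})
right = pd.DataFrame({"A": [10.0, 20.0], "B": [30.0, 40.0]})

# Non-null values in `left` win; only its NaNs are filled from `right`.
filled = left.combine_first(right)
print(filled)

# With identical labels, fillna fills the same gaps from `right`.
same = left.fillna(right)
print(same)
```

The two differ once the labels diverge: combine_first also adds rows and columns that exist only in the other frame, while fillna keeps the shape of the caller.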
