Pandas: Regular expressions with str.contains

By default (regex=True), the pd.Series.str.contains method treats its pat argument as a regular expression:

>>> import re
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['item'] = [1, 2, 3, 4, 5, 6]
>>> df['size'] = ['SMALL', 'small', 'medium', 'large', 'large', 'large']
>>> df
   item    size
0     1   SMALL
1     2   small
2     3  medium
3     4   large
4     5   large
5     6   large
>>> df[df['size'].str.contains(pat='small|medium')]
   item    size
1     2   small
2     3  medium

You can also pass regex flags:

>>> df[df['size'].str.contains(pat='small|medium', flags=re.IGNORECASE)]
   item    size
0     1   SMALL
1     2   small
2     3  medium
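
For plain case-insensitive matching, the case parameter of str.contains is an alternative to passing re.IGNORECASE. The sketch below uses a small Series made up for illustration (the name sizes is not part of the original example) and compares the two approaches:

```python
import re

import pandas as pd

# Illustrative data, similar to the 'size' column above.
sizes = pd.Series(['SMALL', 'small', 'medium', 'large'])

# Case-insensitive match using a regex flag.
with_flag = sizes.str.contains('small|medium', flags=re.IGNORECASE)

# Case-insensitive match using the case parameter instead.
with_case = sizes.str.contains('small|medium', case=False)

print(with_flag.tolist())  # [True, True, True, False]
print(with_case.tolist())  # [True, True, True, False]
```

The flags argument is the more general option, since it accepts any flag from the re module.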

Set regex=False to treat pat as a literal string rather than a regular expression.
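
A minimal sketch of the difference, assuming a made-up Series in which one value contains a literal '|' character (the name sizes and its values are illustrative only):

```python
import pandas as pd

# Illustrative data: the first value contains a literal '|'.
sizes = pd.Series(['small|medium', 'small', 'medium', 'large'])

# Default (regex=True): '|' means alternation, so 'small' OR 'medium' matches.
as_regex = sizes.str.contains('small|medium')

# regex=False: pat is matched as the literal text 'small|medium'.
as_literal = sizes.str.contains('small|medium', regex=False)

print(as_regex.tolist())    # [True, True, True, False]
print(as_literal.tolist())  # [True, False, False, False]
```

This matters whenever the search text contains characters that are special in regular expressions, such as '|', '.', '(' or '+'.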

Via pandas.pydata.org.
