Python: Normalize text with unicodedata
With Python's unicodedata module, it is easy to normalize Unicode strings, for example to remove accents:
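import unicodedata
data = 'ïnvéntìvé'
# Decompose accented letters into base letter + combining mark,
# then drop every code point that is not ASCII.
normal = unicodedata.normalize('NFKD', data).encode('ascii', 'ignore')
print(normal)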
The output will be:
b'inventive'
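The result above is a bytes object, because encode() returns bytes. If a regular str is wanted instead, one option is to decode it straight back. The sketch below assumes that is the goal; the helper name to_ascii is hypothetical and not part of unicodedata.

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: return an ASCII-only str version of text."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop anything outside ASCII, then turn the bytes back into a str.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('ïnvéntìvé'))   # inventive
print(to_ascii('smørrebrød'))  # smrrebrd
```

Note the second call: letters such as ø have no decomposition, so they are dropped entirely rather than replaced by a look-alike.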
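NFKD stands for Normalization Form Compatibility Decomposition: characters are decomposed by compatibility equivalence, and multiple combining characters are arranged in a specific order. An accented letter such as é becomes the base letter e followed by a separate combining accent, and encoding to ASCII with 'ignore' then drops the accent because it is not an ASCII character.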
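To make the decomposition visible, here is a small sketch that prints the code points NFKD produces. The sample strings are chosen for illustration only; the calls used (normalize, name, combining) are all from the standard unicodedata module.

```python
import unicodedata

word = 'caf\u00e9'  # "café" with a single precomposed é (U+00E9)
decomposed = unicodedata.normalize('NFKD', word)

# After NFKD the é is two code points: 'e' plus COMBINING ACUTE ACCENT.
for ch in decomposed:
    # combining() returns 0 for base characters, non-zero for combining marks.
    print(f'U+{ord(ch):04X}', unicodedata.name(ch), unicodedata.combining(ch))

# The "K" (compatibility) part also expands characters such as ligatures,
# which the canonical form NFD leaves alone.
ligature = '\ufb01'  # LATIN SMALL LIGATURE FI
print(unicodedata.normalize('NFD', ligature) == ligature)  # True
print(unicodedata.normalize('NFKD', ligature))             # fi
```

If only accents should be removed and other non-ASCII text kept, filtering on unicodedata.combining() instead of encoding to ASCII is a common alternative.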
Via enkipro.com.