Python: Normalize text with unicodedata
With Python's unicodedata module it's easy to normalize any Unicode string,
for example to strip accents:
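import unicodedata

data = 'ïnvéntìvé'
# Decompose each accented character into a base letter plus combining marks,
# then encode to ASCII, silently dropping anything that is not ASCII.
normal = unicodedata.normalize('NFKD', data).encode('ASCII', 'ignore')
print(normal)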
The output will be:
b'inventive'
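Note that the result above is a bytes object, because encode() was called on the string. If a str is wanted instead, one possible approach is to filter out the combining marks after decomposition. This is only a minimal sketch: the helper name strip_accents is hypothetical and not part of the original snippet.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() returns 0 for non-combining characters.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents('ïnvéntìvé'))  # inventive
```

Unlike the encode('ASCII', 'ignore') version, this keeps non-ASCII characters that have no decomposition (such as 'ø') instead of deleting them.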
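NFKD stands for Normalization Form Compatibility Decomposition: characters
are decomposed by compatibility equivalence, and multiple combining characters
are arranged in a specific order. Once each accented letter has been split into
a base letter plus combining marks, encoding to ASCII with 'ignore' drops the
marks and leaves only the base letters.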
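To make the decomposition more concrete, the short sketch below inspects what NFKD does to a few sample inputs. The sample characters are chosen here for illustration and are not from the original post.

```python
import unicodedata

# Decomposition: 'é' becomes 'e' followed by U+0301 (combining acute accent).
decomposed = unicodedata.normalize('NFKD', 'é')
print([hex(ord(ch)) for ch in decomposed])  # ['0x65', '0x301']

# Compatibility: the ligature U+FB01 becomes plain 'f' + 'i' under NFKD,
# while NFD (canonical decomposition only) leaves it unchanged.
ligature = '\ufb01'
print(unicodedata.normalize('NFKD', ligature))  # fi
print(unicodedata.normalize('NFD', ligature) == ligature)  # True

# Ordering: combining marks are sorted by combining class, so the dot below
# (U+0323, class 220) is moved before the acute accent (U+0301, class 230).
reordered = unicodedata.normalize('NFKD', 'a\u0301\u0323')
print(reordered == 'a\u0323\u0301')  # True
```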
Via enkipro.com.