Python: Normalize text with unicodedata

April 5, 2017

Using the unicodedata Python module, it’s easy to normalize Unicode strings (for example, to remove accents):

import unicodedata

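data = 'ïnvéntìvé'
# NFKD splits each accented letter into a base letter plus a combining mark;
# encoding to ASCII with 'ignore' then drops the non-ASCII marks.
normal = unicodedata.normalize('NFKD', data).encode('ASCII', 'ignore')
print(normal)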

The output will be:

b'inventive'
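The result above is a bytes object, and the 'ignore' error handler discards every non-ASCII code point, including letters such as ø or ß that have no decomposition at all. A minimal sketch, assuming you want a str back and want to keep such letters, filters out only the combining marks after NFKD; the helper name strip_accents and the sample word 'søren' are hypothetical, chosen for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() returns a nonzero combining class for marks
    # such as U+0301 COMBINING ACUTE ACCENT; base letters return 0.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


data = 'ïnvéntìvé'
print(strip_accents(data))  # inventive

# The ASCII-encode approach returns bytes; decode it to get a str.
print(unicodedata.normalize('NFKD', data).encode('ASCII', 'ignore').decode('ASCII'))  # inventive

# A letter with no decomposition survives the filter but not the ASCII encode.
word = 'søren'
print(strip_accents(word))  # søren
print(unicodedata.normalize('NFKD', word).encode('ASCII', 'ignore'))  # b'sren'
```

The filter keeps the output as text and only removes what NFKD split off, while the ASCII route guarantees pure ASCII at the cost of losing undecomposable letters.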

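NFKD stands for Normalization Form Compatibility Decomposition: characters are decomposed by compatibility, and sequences of multiple combining characters are put into a specific (canonical) order. Because an accented letter such as é decomposes into a base letter followed by a combining accent, the ASCII encode step can simply drop the accent.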
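To see what compatibility decomposition and combining-character ordering look like in practice, here is a small sketch comparing NFKD with canonical-only NFD. The sample characters (the ﬁ ligature, a superscript two, and an a followed by two combining marks) are illustrative choices, not taken from the original post.

```python
import unicodedata

# Compatibility decomposition: NFKD replaces "compatibility" characters such as
# the fi ligature (U+FB01) and superscript two (U+00B2) with plain equivalents,
# while canonical-only NFD leaves them unchanged.
for ch in ['\ufb01', '\u00b2']:
    print(
        f'U+{ord(ch):04X}',
        unicodedata.decomposition(ch),
        'NFD:', repr(unicodedata.normalize('NFD', ch)),
        'NFKD:', repr(unicodedata.normalize('NFKD', ch)),
    )

# Canonical ordering: combining marks after a base letter are sorted by
# combining class, so dot below (220) is moved ahead of acute accent (230).
s = 'a\u0301\u0323'
ordered = unicodedata.normalize('NFKD', s)
print([f'U+{ord(c):04X}' for c in ordered])         # ['U+0061', 'U+0323', 'U+0301']
print([unicodedata.combining(c) for c in ordered])  # [0, 220, 230]
```

The reordering is what makes two visually identical strings typed with their accents in a different order compare equal after normalization.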

Via enkipro.com.

Francis T. O'Donovan