Because there are so many distinct medical billing codes, it is generally impractical to encode them as one-hot vectors for use as machine learning features: the resulting vectors are very high-dimensional and sparse. The paper "Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes" discusses using canonical correlation analysis (CCA) to reduce this dimensionality and capture the relationships between the codes.
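To make the idea concrete, here is a minimal sketch of one way CCA can turn one-hot codes into low-dimensional vectors. It is not taken from the paper. The choice of views (each code paired with the code that follows it in the sequence), the six-code toy vocabulary, the synthetic sequence, the regularization constant, and the names `one_hot` and `cca_embed` are all assumptions made for illustration.

```python
import numpy as np

def one_hot(indices, vocab_size):
    """Return a (len(indices), vocab_size) one-hot matrix."""
    out = np.zeros((len(indices), vocab_size))
    out[np.arange(len(indices)), indices] = 1.0
    return out

def cca_embed(X, Y, k, reg=1e-3):
    """Regularized CCA via whitening + SVD (illustrative, not the paper's code).

    Returns the projection matrix for view X (one row per code when X is
    one-hot) and the top-k canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariances; reg keeps them invertible, since centered
    # one-hot data is rank-deficient.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, _ = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ U[:, :k], s[:k]

# Synthetic sequence over 6 placeholder codes in two groups, {0,1,2} and
# {3,4,5}. The next code stays in the same group 90% of the time (assumed).
rng = np.random.default_rng(0)
n_codes = 6
seq = [0]
for _ in range(1999):
    group = seq[-1] // 3
    if rng.random() > 0.9:
        group = 1 - group
    seq.append(3 * group + rng.integers(3))
seq = np.array(seq)

X = one_hot(seq[:-1], n_codes)   # view 1: current code
Y = one_hot(seq[1:], n_codes)    # view 2: following code
embeddings, corrs = cca_embed(X, Y, k=2)

print(embeddings.shape)  # one 2-dimensional vector per code
print(corrs)             # canonical correlations, largest first
```

In this toy setup, codes that tend to be followed by similar codes receive similar coordinates along the leading canonical direction, which is the sense in which a CCA projection can capture relationships that one-hot vectors ignore. With a real vocabulary of many thousands of codes, dense covariance matrices like these would likely need to be replaced by sparse or randomized methods.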

