One particularly pleasing aspect of having worked in the field of machine learning for as long as I have is the opportunity to always learn something new. That something new can be a new tool or method (given the rapid development of the machine learning landscape, there’s never a shortage of those), but it can also be the discovery of flawed processes in our work that we had simply never been aware of.
Some of these can be quite obscure and hard to spot at first glance. If these flawed processes do slip into your model development, there’s a good chance they will hurt the model’s predictive power and thus its reliability and, ultimately, its applicability.
In this article, the first in a series exploring common pitfalls in machine learning, we’ll focus on three data handling mistakes that can occur during both the preprocessing phase and the modeling phase:
- Using Numerical Identifiers as Features
- Random Partitioning Instead of Group Partitioning
- Including Feature Values with Insufficient Observations