Introduction
Recently I’ve been working on the domain-specific fine-tuning of several LLMs. The first, and perhaps the most important, part of this task is to collect, scrape, and clean textual data to feed the LLM. I noticed that my code was becoming messy and repetitive, because for every source I identified I was writing a script from scratch that had a lot in common with the other scripts in my codebase. I was not following the “Don’t repeat yourself” (DRY) principle at all. That is why I decided to implement the Template Design Pattern and make my codebase more elegant and efficient.
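To make the problem and the fix concrete, here is a minimal sketch of the idea, not the actual codebase. Everything in it is an assumption made for illustration: the class names `SourcePipeline`, `BlogSource`, and `CsvSource`, the three steps `fetch`, `parse`, and `clean`, and the hard-coded strings that stand in for real scraping so the example runs offline.

```python
from abc import ABC, abstractmethod


class SourcePipeline(ABC):
    """Hypothetical base class holding the skeleton shared by every source."""

    def run(self) -> list[str]:
        # Template method: the order of the steps is written here, once.
        raw = self.fetch()
        records = self.parse(raw)
        # Drop blank records, then apply the cleaning step to the rest.
        return [self.clean(r) for r in records if r.strip()]

    @abstractmethod
    def fetch(self) -> str:
        """Return the raw content of the source."""

    @abstractmethod
    def parse(self, raw: str) -> list[str]:
        """Split the raw content into individual text records."""

    def clean(self, record: str) -> str:
        # Shared default (collapse whitespace); subclasses may override it.
        return " ".join(record.split())


class BlogSource(SourcePipeline):
    def fetch(self) -> str:
        # Stand-in for an HTTP request (assumption: hard-coded sample text).
        return "First   post\n\n  Second post  \n"

    def parse(self, raw: str) -> list[str]:
        return raw.split("\n")


class CsvSource(SourcePipeline):
    def fetch(self) -> str:
        # Stand-in for reading a file (assumption: hard-coded sample text).
        return "title one;  title   two;"

    def parse(self, raw: str) -> list[str]:
        return raw.split(";")


print(BlogSource().run())
print(CsvSource().run())
```

Each new source only has to say how it fetches and parses its content. The sequence of steps and the default cleaning logic live in one place, which is what removes the repetition.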
The Template Design Pattern
I won’t repeat here what a design pattern is and how we classify design patterns based on their functionalities, since I’ve written many articles on the subject. If you are interested in reading my earlier articles on this topic, I’ll leave some references at the end.
In this article, I’ll show you an example related to data processing. Let’s say that in our project we have to deal with different kinds of data that we want to analyze. Some of these data are…