We will work with a dataset containing information on patients who have been diagnosed with diabetes or who do not present the condition. Our goal is to extract a sample of this data focusing on patients over 50 years of age. For each individual in this subset, we need to add a new column specifying whether the patient is classified as normal, with a Body Mass Index (BMI) below 30, or obese, with a BMI of 30 or higher.
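To make the boundaries of these two rules explicit, here is a minimal sketch in plain Python. The function names `in_sample` and `bmi_group` and the labels `"Normal"` and `"Obese"` are illustrative choices, not part of the dataset.

```python
def in_sample(age: int) -> bool:
    """Return True for patients strictly over 50 years of age."""
    return age > 50


def bmi_group(bmi: float) -> str:
    """Label a BMI below 30 as 'Normal' and a BMI of 30 or higher as 'Obese'."""
    return "Normal" if bmi < 30 else "Obese"
```

Note that a patient aged exactly 50 is excluded, while a BMI of exactly 30 falls into the obese group.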
Once the data has been manipulated, it will be exported to a new CSV file and forwarded to the data scientist responsible for further analysis.
To tackle this task, we will use databases, Python, and SQL. First, the data will be imported using Python. Then, we will create a copy of this data in a database, where we will perform the necessary transformations using SQL queries.
After completing the required alterations and additions, the data will be transferred back to a Pandas DataFrame, and finally, we will save the resulting dataset in CSV format.
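The round trip described above can be sketched end to end as follows. This is only an outline under stated assumptions: it uses an in-memory SQLite database, a few made-up rows in place of the real dataset, assumed column names (`Age`, `BMI`, `Outcome`), and hypothetical names for the table (`patients`), the new column (`BMI_Group`), and the output file.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for the real dataset; column names are assumptions.
df = pd.DataFrame(
    {
        "Age": [45, 52, 61, 70],
        "BMI": [33.6, 29.9, 30.0, 41.2],
        "Outcome": [1, 0, 1, 0],
    }
)

# Copy the DataFrame into a database table (in-memory SQLite for this sketch).
conn = sqlite3.connect(":memory:")
df.to_sql("patients", conn, index=False)

# Filter patients over 50 and derive the BMI group with a SQL CASE expression.
query = """
SELECT Age, BMI, Outcome,
       CASE WHEN BMI < 30 THEN 'Normal' ELSE 'Obese' END AS BMI_Group
FROM patients
WHERE Age > 50
ORDER BY Age
"""

# Bring the transformed data back into a Pandas DataFrame.
result = pd.read_sql_query(query, conn)
conn.close()

# Save the resulting dataset in CSV format (hypothetical file name).
result.to_csv("patients_over_50.csv", index=False)
```

Here the new column is computed inside a `SELECT` rather than by altering the table. Adding the column with `ALTER TABLE` and filling it with `UPDATE` would be an equally valid way to do the transformation in SQL.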
We will use the Pima Indians Diabetes Database, a publicly available dataset downloadable here: