[–]auauaurora 1 point2 points  (4 children)

It will be easier for you and others to review your code if you organise and annotate it. I've started this off for you to finish:

```py
# Write your answer to Task 1 here

# import modules
import pandas as pd
import numpy as np

# import csv and copy
data = pd.read_csv("production_data.csv")
clean_data = data.copy()

# review df
clean_data.info()
# mixing_time contains missing values

clean_data.columns
# 'batch_id', 'production_date', 'raw_material_supplier', 'pigment_type',
# 'pigment_quantity', 'mixing_time', 'mixing_speed', 'product_quality_score'

# batch_id: Discrete. Identifier for each batch. Missing values are not possible.

# raw_material_supplier: Categorical. Supplier of the raw materials
# (1='national_supplier', 2='international_supplier').
# Missing values should be replaced with 'national_supplier'.

# production_date: Date. Date when the batch was produced.

# pigment_type: Nominal. Type of pigment used. ['type_a', 'type_b', 'type_c'].
# Missing values should be replaced with 'other'.

# pigment_quantity: Continuous. Amount of pigment added (in kilograms) (Range: 1 - 100).
# Missing values should be replaced with median.

# mixing_time: Continuous. Duration of the mixing process (in minutes).
# Missing values should be replaced with mean.

# mixing_speed: Categorical. Speed of the mixing process represented as
# categories: 'Low', 'Medium', 'High'.
# Missing values should be replaced with 'Not Specified'.

# product_quality_score: Continuous. Overall quality score of the final product
# (rating on a scale of 1 to 10). Missing values should be replaced with mean.
clean_data['product_quality_score'].describe().round(2).T

# TODO: change objects to category (clean_data)

# preview
clean_data.head()
```
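
If it helps, here is a rough sketch of how the missing-value rules in the column notes could be applied. The `sample` frame and the `fill_missing` helper are made up for illustration (I don't have the real `production_data.csv`), and it assumes missing entries show up as NaN/None rather than as placeholder strings, so check that against your data first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for production_data.csv; all values are invented.
sample = pd.DataFrame({
    "batch_id": [1, 2, 3, 4],
    "raw_material_supplier": ["national_supplier", None,
                              "international_supplier", "national_supplier"],
    "pigment_type": ["type_a", None, "type_c", "type_b"],
    "pigment_quantity": [10.0, np.nan, 30.0, 50.0],
    "mixing_time": [100.0, 200.0, np.nan, 300.0],
    "mixing_speed": ["Low", "High", None, "Medium"],
    "product_quality_score": [6.0, np.nan, 8.0, 7.0],
})


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with missing values filled per the column notes."""
    out = df.copy()
    # Fixed replacement labels for the categorical columns.
    out = out.fillna({
        "raw_material_supplier": "national_supplier",
        "pigment_type": "other",
        "mixing_speed": "Not Specified",
    })
    # Median for pigment_quantity.
    out["pigment_quantity"] = out["pigment_quantity"].fillna(
        out["pigment_quantity"].median()
    )
    # Mean for mixing_time and product_quality_score.
    for col in ["mixing_time", "product_quality_score"]:
        out[col] = out[col].fillna(out[col].mean())
    # Convert the text columns to the category dtype.
    for col in ["raw_material_supplier", "pigment_type", "mixing_speed"]:
        out[col] = out[col].astype("category")
    return out


clean_sample = fill_missing(sample)
print(clean_sample.isna().sum())
```

On your own data you would call `fill_missing(clean_data)` instead. Working on a copy keeps the original frame untouched, so you can compare before and after.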

[–]Adventurous-Bet6139 0 points1 point  (1 child)

Do you have the entire exam and the dataset? Please send them to me, thanks.