Machine Learning Flow

AP
Jul 18, 2023


Steps I follow when working on an ML project

Start with Exploratory Data Analysis (EDA):

  1. Importing Libraries and Loading Data
  2. Data Inspection
    - Use head(), info(), describe(), etc. to get a general overview of the data
    - Separate numerical and categorical columns, e.g. with select_dtypes(include=['number']) and select_dtypes(exclude=['number'])
    - Get the unique values of the categorical columns. I used a simple snippet for this at some point (e.g. df[col].unique() for each categorical column); it is a good first start
  3. Handle missing values based on the context of the data. Generally, the mean or median for numerical columns and the mode for categorical columns should suffice, but still check against the data first.
  4. Data Cleaning and Preprocessing
    - Remove duplicate rows, e.g. with df.drop_duplicates()
    - Change data types if necessary:
      df['column'] = df['column'].astype(new_dtype)  # e.g. 'int64', 'float64', 'category'
    - Handle anything else that seems relevant for the particular data.
  5. Data Visualization
  6. Univariate Analysis
  7. Multivariate Analysis
  8. Feature Engineering: create dummy variables, scale features, derive new variables from existing ones, and encode categorical variables.
  9. Outlier Detection
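Steps 1 to 4 can be sketched with pandas roughly as follows. This is a minimal illustration, not the exact code behind this post. It assumes a tiny made-up DataFrame in place of real data (the `data.csv` name in the comment is hypothetical). It treats every non-numeric column as categorical and applies the median/mode rule of thumb from step 3.

```python
import pandas as pd

# Step 1: load the data. A tiny made-up DataFrame stands in for something like
# pd.read_csv("data.csv") (hypothetical file name).
df = pd.DataFrame({
    "age": [25, 32, None, 47, 32, 51],
    "income": [40000.0, 52000.0, 61000.0, None, 52000.0, 75000.0],
    "city": ["Pune", "Delhi", "Pune", None, "Delhi", "Pune"],
    "zip": ["411001", "110001", "411001", "400001", "110001", "400001"],
})

# Step 2: general overview of the data.
print(df.head())
df.info()
print(df.describe())

# Split columns by dtype: numeric vs. everything else (treated as categorical).
num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# Unique values of each categorical column (missing values show up too).
for col in cat_cols:
    print(col, df[col].unique())

# Step 3: impute missing values, median for numerical and mode for categorical.
for col in num_cols:
    df[col] = df[col].fillna(df[col].median())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Step 4: drop exact duplicate rows, then fix data types.
df = df.drop_duplicates().reset_index(drop=True)
df["age"] = df["age"].astype("int64")  # safe now that no NaNs are left
df["city"] = df["city"].astype("category")
print(df)
```

The column split happens before imputation on purpose. A numeric column holding NaNs is stored as float, so converting `age` to an integer type only works once step 3 has filled the gaps.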

EDA is done!
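Steps 8 and 9 could look like the following pandas sketch. It runs on hypothetical, already-cleaned data. The post does not name specific methods, so this example makes its own choices. It uses `pd.get_dummies` for dummy variables and z-score standardization for scaling. For outliers it uses Tukey's 1.5 × IQR rule, one common option swapped in for illustration. The derived column `income_per_age` is an illustrative feature only.

```python
import pandas as pd


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Hypothetical, already-cleaned data standing in for the EDA output.
df = pd.DataFrame({
    "age": [25, 32, 32, 47, 51, 29],
    "income": [40000.0, 52000.0, 61000.0, 52000.0, 75000.0, 400000.0],
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai", "Pune"],
})

# Step 8a: derive a new variable from existing ones (illustrative feature).
df["income_per_age"] = df["income"] / df["age"]

# Step 8b: encode the categorical column as 0/1 dummy variables.
features = pd.get_dummies(df, columns=["city"], prefix="city", dtype=int)

# Step 8c: standardize numerical features to zero mean and unit variance.
num_cols = ["age", "income", "income_per_age"]
for col in num_cols:
    features[col] = (features[col] - features[col].mean()) / features[col].std(ddof=0)

# Step 9: flag outliers in the raw (unscaled) income with the 1.5 * IQR rule.
df["income_outlier"] = iqr_outlier_mask(df["income"])
print(df[df["income_outlier"]])
print(features.round(2))
```

Running the IQR check on the raw column or on the standardized one gives the same flags. Standardization is an increasing linear transform, so the quartiles and fences shift and scale along with the data.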
