Start with Exploratory Data Analysis
- Importing Libraries and Loading Data
- Data Inspection
- Use head(), info(), describe(), etc. to get a general overview of the data
- Get categorical and numerical columns with select_dtypes(include=[]) and select_dtypes(exclude=[])
- Get unique values of categorical columns. Something simple is enough as a first start.
- Handle missing values based on the context of the data. Generally, the mean or median for numerical columns and the mode for categorical columns should suffice, but always check against the data first.
- Data Cleaning and Preprocessing
- Check for and remove duplicates
- Change data types if necessary: df['column'] = df['column'].astype(type)
- Anything else that seems relevant for the data at hand
- Data Visualization
- Univariate Analysis
- Multivariate Analysis
- Feature Engineering: create dummy variables, scale features, derive new variables from existing ones, and encode categorical variables.
- Outlier Detection
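
The inspection, cleaning, encoding, and outlier steps above can be sketched roughly as follows. This is a minimal illustration, not a fixed recipe. The toy DataFrame and its column names (city, age, visits) are made up for the example. Median and mode imputation are just the defaults suggested above, and the 1.5 × IQR rule is one common outlier check chosen here as an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for a real loaded dataset
df = pd.DataFrame({
    "city": ["Pune", "Delhi", None, "Pune", "Pune"],
    "age": [25.0, 32.0, None, 25.0, 41.0],
    "visits": ["3", "7", "2", "3", "5"],  # numbers stored as text
})

# Data inspection: general overview
print(df.head())
df.info()
print(df.describe())

# Split columns into numerical and categorical
num_cols = df.select_dtypes(include=["number"]).columns.tolist()
cat_cols = df.select_dtypes(exclude=["number"]).columns.tolist()

# Unique values of each categorical column
for col in cat_cols:
    print(col, df[col].unique())

# Missing values: median for numerical, mode for categorical
# (assumed defaults; check them against the data first)
for col in num_cols:
    df[col] = df[col].fillna(df[col].median())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Data cleaning: drop duplicate rows, then fix data types
df = df.drop_duplicates().reset_index(drop=True)
df["visits"] = df["visits"].astype(int)

# Feature engineering: dummy variables for a categorical column
encoded = pd.get_dummies(df, columns=["city"])

# Outlier detection with the 1.5 * IQR rule (an assumed choice)
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print(outliers)
```

On real data, df would come from something like pd.read_csv, and the imputation and outlier thresholds should be revisited column by column.
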
EDA is done!