AI for Climate Change / continuation (part 2)

Local Project Philadelphia, USA Chapter

Coordinated by the chapter lead, Imran Yasin.

Status: Completed

Project Duration: 06 Jul 2022 - 03 Aug 2022

Open Source resources available from this project

Project goals.

Model Development: We will train and evaluate at least the following models: ANN, SVM, KNeighborsRegressor, GradientBoostingRegressor, ExtraTreesRegressor, RandomForestRegressor, DecisionTreeRegressor, LGBMRegressor, XGBRegressor, LinearRegression, Lasso, Ridge, ElasticNet.

Model Evaluation: R-Squared (R2), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Squared Error (MSE). The evaluation metric for this project is the Mean Absolute Error (MAE).

Deployment: Build a dashboard or app.
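The four evaluation metrics named in the goals can be written out directly from their definitions. The sketch below is only an illustration in plain Python: the function name regression_metrics and the sample numbers are assumptions made for this example, not code or results from the project. In practice a library such as scikit-learn provides equivalents (mean_absolute_error, mean_squared_error, r2_score).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    # R2 compares the residual sum of squares with the spread of the targets.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R2 is undefined when all targets are identical; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]
scores = regression_metrics(y_true, y_pred)
print(scores)
```

Because MAE averages absolute errors, it is expressed in the same units as the target and is less sensitive to a few large errors than MSE or RMSE, which square each error first.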

Project plan.

    Summary and results.

    Below are highlights of the learning outcomes from the various phases:
    • Data Pre-processing
    Understanding insights about data and preparing it for future phases
    o Finding Numerical and Categorical Fields
    o Finding Null values using missingno, HeatMaps
    o Handling Null values using KNNImputer and SimpleImputer
    o Outlier Detection using IQR and Standard deviation methods
    o Encoding of categorical features (finding unique values for each, merging facility types, creating seasonal quarters, etc.)
     Used pandas get_dummies() and OneHotEncoder()
    o Scaling was done using StandardScaler()
    o Basic visualization for all features
     Distribution plots for numerical features and Count plots for categorical features
     Also explored sweetviz
    • Exploratory Data Analysis (EDA)
    o Univariate analysis for categorical features using barplot() and countplot()
    o Univariate analysis for numerical features using kdeplot(), exploring mean, median, standard deviation, skewness, and kurtosis for each feature
    o Bivariate analysis using corr_matrix() , pairplot(), scatter plots
    o Bivariate analysis (continuous-categorical) using Z-test, t-test, chi-square, and Kruskal-Wallis H test
    o Basic EDA was also done in Tableau
    o Outliers were handled using the IQR or 3-standard-deviation method, whichever was applicable
    • Feature Engineering
    o VIF (Variance Inflation Factor) was used to detect multicollinearity in regression analysis
    o Feature creation (e.g. Combine monthly average temps into seasonal columns)
    o Feature selection and Importance
     Tree-based models such as LightGBM, XGBoost, and RandomForest
     LinearRegression() and QuantileRegressor()
     Permutation-based feature importance
     Dimensionality reduction using RFE (Recursive Feature Elimination) and PCA (Principal Component Analysis)
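Permutation-based feature importance, listed above, can be sketched in a few lines: shuffle one feature column at a time and measure how much the error (here MAE, the project's metric) gets worse. Everything in the block below is an assumption made for illustration, including the function names, the toy_model stand-in for a fitted model, and the synthetic data; it is not the project's code. scikit-learn offers a production version as sklearn.inspection.permutation_importance.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=10, seed=0):
    """Average increase in MAE when each feature column is shuffled.

    `predict` is any callable that maps a list of feature rows to predictions.
    """
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            score = mean_absolute_error(y_true, predict(shuffled))
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted model: uses feature 0 and ignores feature 1.
def toy_model(rows):
    return [3.0 * row[0] for row in rows]


# Synthetic data in which the target depends on feature 0 only.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(200)]
y = [3.0 * row[0] for row in rows]

importances = permutation_importance(toy_model, rows, y)
print(importances)
```

In this toy setup, shuffling feature 1 leaves the predictions unchanged, so its importance is zero, while shuffling feature 0 raises the MAE. Unlike the importances built into tree-based models, this approach works with any fitted model because it only needs predictions.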
