Omdena Chapter Page: Silicon Valley


Upcoming Projects

Silicon Valley, USA Chapter

Project Duration: January 27th – February 19th, 2022

Data Science for Climate Change: Mitigate Greenhouse Gas Emissions by Reducing Energy Consumption of Buildings

The Background

Climate change is a globally relevant, urgent, and multi-faceted issue heavily impacted by energy policy and infrastructure. Addressing climate change involves mitigation (i.e. reducing greenhouse gas emissions) and adaptation (i.e. preparing for unavoidable consequences). Mitigation of greenhouse gas (GHG) emissions requires changes to electricity systems, transportation, buildings, industry, and land use.

According to a report issued by the International Energy Agency (IEA), the lifecycle of buildings, from construction to demolition, was responsible for 37% of global energy-related and process-related CO2 emissions in 2020. Yet it is possible to drastically reduce the energy consumption of buildings through a combination of easy-to-implement fixes and state-of-the-art strategies. For example, retrofitting buildings can reduce heating and cooling energy requirements by 50-90 percent. Many of these energy efficiency measures also result in overall cost savings and yield other benefits, such as cleaner air for occupants. This potential can be achieved while maintaining the services that buildings provide.


The dataset and challenge

The WiDS Datathon dataset was created in collaboration with Climate Change AI (CCAI) and Lawrence Berkeley National Laboratory (Berkeley Lab). The challenge participants will analyse regional differences in building energy efficiency, and build models to predict building energy consumption, an important first step in understanding how to maximize energy efficiency. Accurate predictions of energy consumption can help policymakers target retrofitting efforts to maximize emissions reductions. “We see building retrofitting as a low-hanging fruit to reduce greenhouse gas emissions,” said Nikola Milojević-Dupont, Chair of the Content Committee at Climate Change AI. “Predicting energy consumption of buildings helps identify retrofitting approaches that can ultimately reduce emissions.”


The Problem

In this four-week project, the team will model energy consumption data to help reduce GHG emissions.

Supervised learning methods such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) can be used in the prediction of building energy consumption.

The provided data can be augmented with meteorological data such as temperature, wind speed, and pressure. According to Ref., meteorological data is one of the major variables for building energy prediction.

One of the key challenges will be to choose a subset of appropriate features which impact the energy dynamics of a building to be used in model training.

Researchers suggest that a building energy system with accurate forecasting could save between 10 and 30% of total energy consumption in buildings. Continued effort to improve building energy prediction is therefore essential for more efficient buildings. Advances in data-driven models have produced satisfactory energy estimation results. Without an algorithm that can accurately predict building energy consumption, however, the likely outcome is higher greenhouse gas emissions, the construction of more inefficient buildings, greater energy demand, and lower financial savings. Our goal is to develop one or more models that can accurately predict energy consumption.


The Project Goals

Data merging: Using multiple datasets requires data merging. Therefore, the building dataset and other acquired datasets, such as the meteorological dataset, will be merged.
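As a rough illustration of the merging step, the sketch below left-joins extra weather rows onto building rows. It assumes both tables share the State_Factor and Year_Factor keys from the data dictionary. The function name, the sample values, and the `mean_pressure_hpa` column are hypothetical and are not part of the WiDS data.

```python
from typing import Dict, List, Tuple


def merge_weather(buildings: List[dict], weather: List[dict]) -> List[dict]:
    """Left-join weather rows onto building rows by (State_Factor, Year_Factor)."""
    # Index the weather rows by the join key for constant-time lookup.
    weather_by_key: Dict[Tuple[str, int], dict] = {
        (w["State_Factor"], w["Year_Factor"]): w for w in weather
    }
    merged = []
    for b in buildings:
        key = (b["State_Factor"], b["Year_Factor"])
        extra = weather_by_key.get(key, {})
        # Building columns win on a name clash; unmatched buildings keep
        # only their own columns.
        merged.append({**extra, **b})
    return merged


# Made-up rows for illustration only.
buildings = [
    {"id": 0, "State_Factor": "State_1", "Year_Factor": 1, "floor_area": 61242.0},
    {"id": 1, "State_Factor": "State_2", "Year_Factor": 1, "floor_area": 274000.0},
]
weather = [
    {"State_Factor": "State_1", "Year_Factor": 1, "mean_pressure_hpa": 1013.2},
]
merged = merge_weather(buildings, weather)
```

A left join keeps every building row even when no external weather record matches, so the missing values can be handled explicitly in the cleaning step.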

Data cleaning: Data cleaning involves removing outliers and treating missing data.
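One possible way to carry out both cleaning operations is sketched below: median imputation for missing values and an interquartile-range (IQR) rule for outliers. Both choices are assumptions made for illustration; the project description does not prescribe a specific method.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]
```

For example, `impute_median` could be applied to energy_star_rating and `drop_outliers` to site_eui, although whether extreme EUI values are errors or genuine observations should be checked before dropping them.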

Data conversion: The data may include categorical variables, which will be converted to numeric values suited for ML algorithms.
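A common way to convert categorical variables is one-hot encoding, sketched below. The encoding choice and the sample category values are assumptions for illustration; ordinal or target encoding may suit some columns better.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Made-up rows for illustration only.
rows = [
    {"id": 0, "building_class": "Commercial"},
    {"id": 1, "building_class": "Residential"},
]
encoded = one_hot(rows, "building_class")
```

Columns with many distinct values, such as facility_type, produce many indicator columns under this scheme, which is worth keeping in mind during feature selection.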

Data Normalization: The data will be scaled to a unit norm to avoid problems during modelling. Because the dataset mixes different types of data (e.g., continuous, discrete, and categorical), normalization is essential to remove the influence of differing scales and units and to avoid difficulties during the model development phase.
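The text does not fix a particular scaling method. As one common option, the sketch below applies min-max scaling, which maps a numeric column onto the range [0, 1]; z-score standardization would be an equally reasonable alternative.

```python
def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant column maps to all zeros."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]
```

To avoid leaking information, the minimum and maximum would normally be computed on the training split only and then reused for validation and test data.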

Feature Selection: Features differ in how much they affect a model. We need to keep the features with a larger influence on the model and eliminate the ones that are not relevant, which also helps model performance. Filter-based techniques (e.g. Pearson’s correlation) or embedded methods (e.g. Random Forest) can be explored to select the most suitable input features.
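The filter-based approach can be sketched as follows: compute Pearson’s correlation between each numeric feature and the target, then keep the features whose absolute correlation reaches a threshold. The threshold of 0.3 and the function names are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def select_features(features, target, threshold=0.3):
    """Keep feature names whose |correlation| with the target meets the threshold."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    return [name for name, r in scores.items() if abs(r) >= threshold]
```

Pearson’s correlation only captures linear relationships, so a feature dropped by this filter may still be useful to a non-linear model; embedded methods can serve as a cross-check.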

Model Development: Several models will be explored, such as ANNs and SVMs.

Model Evaluation: Candidate performance measures are R-Squared (R2), Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). MSE and RMSE are the most often used for energy consumption prediction. The evaluation metric for the WiDS Datathon is RMSE, a commonly used measure of the differences between the values predicted by a model and the values actually observed.
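The four measures named above can be written out directly from their standard definitions. The sketch below is a plain-Python version with illustrative function names.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of MSE, in the target's units."""
    return sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-Squared: 1 minus the residual sum of squares over the total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because RMSE squares the residuals before averaging, it penalizes a few large errors more heavily than MAE does, which matters when a handful of buildings have unusually high Site EUI.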


Submissions to WiDS Datathon 2022: Teams of 4 can submit to the competition. This is an optional component of the challenge.


  1. Data pre-processing and exploratory analysis.
  2. An interactive plot/map displaying current energy consumption analytics for each state.
  3. Model past energy consumption and estimate future consumption through time-series analysis.

THE DATA: Data is provided by WIDS for this project. It can be found here.


Data Overview

The WiDS Datathon 2022 focuses on a prediction task involving roughly 100k observations of building energy usage records collected over 7 years and a number of states within the United States. The dataset consists of building characteristics (e.g. floor area, facility type etc), weather data for the location of the building (e.g. annual average temperature, annual total precipitation etc) as well as the energy usage for the building and the given year, measured as Site Energy Usage Intensity (Site EUI). Each row in the data corresponds to a single building observed in a given year. Your task is to predict the Site EUI for each row, given the characteristics of the building and the weather data for the location of the building.
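As a starting point for comparing regions, the sketch below averages Site EUI per anonymized state. It assumes rows are dictionaries keyed by the column names in the data dictionary; the function name is illustrative.

```python
from collections import defaultdict
from statistics import mean


def mean_eui_by_state(rows):
    """Average site_eui for each State_Factor value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["State_Factor"]].append(row["site_eui"])
    return {state: mean(values) for state, values in groups.items()}
```

The resulting per-state averages are the kind of summary that an interactive plot or map of energy consumption could display.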



Data Dictionary


  • id: building id
  • Year_Factor: anonymized year in which the weather and energy usage factors were observed
  • State_Factor: anonymized state in which the building is located
  • building_class: building classification
  • facility_type: building usage type
  • floor_area: floor area (in square feet) of the building
  • year_built: year in which the building was constructed
  • energy_star_rating: the energy star rating of the building
  • ELEVATION: elevation of the building location
  • january_min_temp: minimum temperature in January (in Fahrenheit) at the location of the building
  • january_avg_temp: average temperature in January (in Fahrenheit) at the location of the building
  • january_max_temp: maximum temperature in January (in Fahrenheit) at the location of the building
  • cooling_degree_days: cooling degree day for a given day is the number of degrees where the daily average temperature exceeds 65 degrees Fahrenheit. Each month is summed to produce an annual total at the location of the building.
  • heating_degree_days: heating degree day for a given day is the number of degrees where the daily average temperature falls under 65 degrees Fahrenheit. Each month is summed to produce an annual total at the location of the building.
  • precipitation_inches: annual precipitation in inches at the location of the building
  • snowfall_inches: annual snowfall in inches at the location of the building
  • snowdepth_inches: annual snow depth in inches at the location of the building
  • avg_temp: average temperature over a year at the location of the building
  • days_below_30F: total number of days below 30 degrees Fahrenheit at the location of the building
  • days_below_20F: total number of days below 20 degrees Fahrenheit at the location of the building
  • days_below_10F: total number of days below 10 degrees Fahrenheit at the location of the building
  • days_below_0F: total number of days below 0 degrees Fahrenheit at the location of the building
  • days_above_80F: total number of days above 80 degrees Fahrenheit at the location of the building
  • days_above_90F: total number of days above 90 degrees Fahrenheit at the location of the building
  • days_above_100F: total number of days above 100 degrees Fahrenheit at the location of the building
  • days_above_110F: total number of days above 110 degrees Fahrenheit at the location of the building
  • direction_max_wind_speed: wind direction for maximum wind speed at the location of the building. Given in 360-degree compass point directions (e.g. 360 = north, 180 = south, etc.).
  • direction_peak_wind_speed: wind direction for peak wind gust speed at the location of the building. Given in 360-degree compass point directions (e.g. 360 = north, 180 = south, etc.).
  • max_wind_speed: maximum wind speed at the location of the building
  • days_with_fog: number of days with fog at the location of the building


  • site_eui: Site Energy Usage Intensity is the amount of heat and electricity consumed by a building as reflected in utility bills
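The cooling_degree_days and heating_degree_days definitions above can be made concrete with a short calculation. The sketch below sums the degrees above and below the 65 °F base over a list of daily average temperatures; the function name and the sample temperatures in the usage note are illustrative.

```python
def degree_days(daily_avg_temps_f, base_f=65.0):
    """Return (cooling, heating) degree days summed over the given days."""
    # Degrees by which each day's average exceeds the base temperature.
    cooling = sum(max(t - base_f, 0.0) for t in daily_avg_temps_f)
    # Degrees by which each day's average falls below the base temperature.
    heating = sum(max(base_f - t, 0.0) for t in daily_avg_temps_f)
    return cooling, heating
```

For daily averages of 70, 60, 65, and 80 °F, the days at 70 and 80 contribute 5 and 15 cooling degree days, the day at 60 contributes 5 heating degree days, and the day at exactly 65 contributes to neither.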


The Learning Outcomes

  • Data Pre-processing & Data Insights. An interactive Plot /Map displaying current Energy Consumption analytics of each state.
  • EDA And Feature Engineering
  • Developing Models. ANN, SVM, Energy consumption forecasting through Time-series analysis.
  • Evaluating Models based on Metrics
  • Deploying to a Dashboard or App


The Tasks & Timeline


Week 1 Week 2 Week 3 Week 4

  • Data Pre-processing
  • Data Insights
  • Research
  • Exploratory Data Analysis (EDA)
  • Plots
  • Feature Engineering
  • Model Development
  • Model Evaluation
  • Initiate deployment
  • Submit to Datathon (Optional)
  • Deployment: Build a Dashboard or App


Resources to Understand Climate Change and the Role of Data Science

  • O. Lucon, D. Urge Vorsatz, et al. Buildings. In Climate Change 2014: Mitigation of Climate Change. Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. 2014.
  • Ürge-Vorsatz, Diana, et al. “Advances toward a net-zero global building sector.” Annual Review of Environment and Resources 45 (2020): 227-269.
  • Rolnick, David, et al. “Tackling climate change with machine learning.” arXiv preprint arXiv:1906.05433 (2019).
  • Milojevic-Dupont, Nikola, and Felix Creutzig. “Machine learning for geographically differentiated climate change mitigation in urban areas.” Sustainable Cities and Society (2020): 102526.
  • Kontokosta, Constantine E., and Christopher Tull. “A data-driven predictive model of city-scale energy use in buildings.” Applied energy 197 (2017): 303-317.
  • Kolter, J., and Joseph Ferreira. “A large-scale study on predicting and contextualizing building energy usage.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 25. No. 1. 2011.
  • Jui-Sheng Chou, Duc-Son Tran. “Forecasting energy consumption time series using machine learning techniques based on usage patterns of residential householders”, Energy, Volume 165, Part B, 15 December 2018, Pages 709-726

Completed Projects

Earthquake Safe Path: Devising safe paths to travel in the aftermath of an Earthquake.

The Background:

California is unusually prone to earthquakes because it lies on the San Andreas Fault. The presence of hundreds of fault lines leads to over 10,000 earthquakes per year in California. How can a city prepare to respond to an earthquake? A local city or government can respond in many ways, but protecting the safety and lives of individuals is the priority during any disaster situation. Emergency evacuation planning helps reduce confusion, minimize injuries, and ultimately save lives. The Istanbul case study shows how to devise safe paths in the aftermath of an earthquake, and the approach can be adapted to any local area, including LA. This part of the project only covers the Istanbul case study. A future project or a modification could use data from the LA area.

The Problem

There is much work being done across the world to use AI for predicting earthquakes and damage. For example, one way to help the affected population is by accurately predicting and verifying safe routes between schools, hospitals, workplaces, and homes to reduce the risk of traveling after an earthquake.

The Project Goals

To provide safe and fast route planning for families evacuating after an earthquake.

Possible Modification of an Existing Solution: The methods the Omdena team applied to study safe routes in the aftermath of the Turkey earthquake can be modified and applied to Los Angeles, which is a dense area. The main difference will be to find a new way to collect data. Find more information about the original project here.

A devastating earthquake with a magnitude above 7 on the Richter scale occurred in Turkey in 1999, around 150-200 kilometers from Istanbul. According to official records, 18,373 people died, 48,901 were injured, and 5,840 were lost. Using the map of an Istanbul area, we will build a model that calculates a risk score for each section of the locality using various Machine Learning algorithms, and then calculate the shortest and safest path between two locations. Using Turkey data, we will estimate a path’s safety with distance-to-buildings as a safety proxy, based on the density of buildings and road width. Next, we will apply a pathfinding algorithm that provides the safest and shortest path from A to B.
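To illustrate how a risk score and a pathfinding algorithm might be combined, the sketch below runs Dijkstra’s algorithm on a street graph whose edge cost is the segment length inflated by its risk score. The cost formula, the `risk_weight` parameter, and the graph layout are assumptions made for this example and are not taken from the original project’s code.

```python
import heapq


def safest_shortest_path(graph, start, goal, risk_weight=1.0):
    """Dijkstra over edges costing length * (1 + risk_weight * risk).

    graph maps node -> list of (neighbor, length_m, risk), with risk in [0, 1].
    Returns (path, cost), or (None, inf) if the goal is unreachable.
    """
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # Stale queue entry; a cheaper route was already found.
        for neighbor, length_m, risk in graph.get(node, []):
            new_cost = cost + length_m * (1.0 + risk_weight * risk)
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if goal not in best:
        return None, float("inf")
    # Walk back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Made-up graph: a short but risky route via B, a longer but safe route via C.
streets = {
    "A": [("B", 100.0, 0.9), ("C", 150.0, 0.0)],
    "B": [("D", 100.0, 0.9)],
    "C": [("D", 150.0, 0.0)],
}
```

With `risk_weight=0` the search reduces to a plain shortest path and prefers the route via B, while a positive `risk_weight` trades extra distance for lower risk and prefers the route via C.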

The Learning Outcomes:

This is a computer vision project that will build your skills in the areas of:

1. Deep Learning Models

2. Image Segmentation Models

3. Pathfinding Algorithms

4. Satellite Image Processing

5. Integrating heatmaps with existing street graphs (OSM)

Source Code:

Link to the Original Project: Improving the Aftermath Management of an Earthquake

Project Results:

Silicon Valley Chapter Lead
Nishrin Kachwala


Nishrin has 15+ years of experience developing products, building software infrastructure, and doing technical marketing for Fortune 500 companies. She gives her time to volunteering, mentoring, and teaching in her local school community and in the Data Science organizations WW Code, Women in Big Data, and Women in AI. She has a Master of Science in Physics and did Ph.D. research in Astrophysics.