More about Omdena
Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.
Steps towards building an ethical credit scoring AI system for individuals without a bank account.
Traditional credit scoring requires a bank account and a history of regular transactions. Yet many people, especially in developing nations, still do not have a bank account for a variety of reasons: some do not see the need for one, some cannot produce the necessary documents, for some the cost of opening an account is too high, and others lack awareness of how accounts work, distrust banks, or are unemployed.
Some of these individuals need loans for essentials, perhaps to start a business or, like farmers, to buy fertilizer or seeds. Many of them would be reliable borrowers, but because they cannot access formal funding, they are pushed towards high-cost loans from non-traditional, often predatory lenders.
Low-income individuals are often adept at managing their personal finances. An ethical credit scoring AI system is needed to help these borrowers and keep them from falling into deeper debt.
Omdena partnered with Creedix to build an ethical AI-based credit scoring system so that people get access to fair and transparent credit.
The goal was to determine the creditworthiness of an unbanked customer using both alternative and traditional credit scoring data and methods. The data was focused on Indonesia, but the approach below is applicable to other countries.
It was a challenging project. I believe everyone should be eligible for a loan for essential business ventures, but they should be able to pay it back without facing exorbitant interest rates. Finding that balance was crucial for our project.
We were given three datasets, including information on transactions made by different account numbers, the region, the mode of transaction, and so on.
All of the data was compliant with privacy law and fully anonymized; privacy was imperative, not an afterthought.
Going through the data, we realized we had to use unsupervised learning, since the data was not labeled.
Some of us compared openly available datasets to the dataset at hand, while others worked on sequence analysis and clustering to find anomalous patterns of behavior. Early on, we measured results with the silhouette score, a heuristic for judging whether a given parameterization produces meaningful clusters. The best value is 1, indicating well-separated clusters; the worst is -1, indicating strongly overlapping ones. Our average values were close to 0, which was not satisfactory.
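As a concrete illustration of that heuristic, here is a minimal sketch of computing a silhouette score for a candidate clustering with scikit-learn; the feature matrix and cluster count are placeholders, not the project's actual data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix standing in for the engineered customer features.
X = StandardScaler().fit_transform(np.random.rand(500, 8))

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Ranges from -1 (strongly overlapping clusters) to 1 (well-separated clusters).
print("silhouette:", silhouette_score(X, labels))
```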
With the given data we performed feature engineering. We calculated a per capita income score and segregated management roles from other roles, which allowed us to bucket accounts in areas likely to contain reliable customers. For example, management roles imply a better income and therefore a better ability to pay back a loan.
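A rough sketch of that kind of feature engineering is shown below; all column names, figures, and the management-role flag are hypothetical stand-ins for the actual Creedix transaction schema.

```python
import pandas as pd

# Illustrative transaction-level data; column names are hypothetical.
tx = pd.DataFrame({
    "account_id": [1, 1, 2, 3],
    "region": ["Jakarta", "Jakarta", "Bandung", "Jakarta"],
    "job_title": ["manager", "manager", "farmer", "clerk"],
    "amount": [120.0, 80.0, 35.0, 60.0],
})

# Hypothetical per-region income and population figures (e.g. scraped from Numbeo).
region_stats = pd.DataFrame({
    "region": ["Jakarta", "Bandung"],
    "total_income": [5_000_000.0, 1_200_000.0],
    "population": [10_000_000, 2_500_000],
})
region_stats["per_capita_income_score"] = (
    region_stats["total_income"] / region_stats["population"]
)

# Aggregate transactions per account and attach the regional score.
features = (
    tx.groupby("account_id")
      .agg(region=("region", "first"),
           job_title=("job_title", "first"),
           avg_tx_amount=("amount", "mean"),
           tx_count=("amount", "size"))
      .reset_index()
      .merge(region_stats[["region", "per_capita_income_score"]], on="region", how="left")
)

# Flag management roles, assumed to indicate a better ability to repay.
features["is_management"] = features["job_title"].str.contains("manager", case=False).astype(int)
print(features)
```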
But even with all the feature engineering, we were unable to get a signal from the data given for clustering. How did we proceed?
We scraped additional data from sites such as Indeed and Numbeo. Because of these challenges we could not deliver a single finished solution to the customer, so we improvised: we used dummy data to provide a plan for future analysis.
From Numbeo we obtained the cost of living per area, i.e. how much people spend on living. From Indeed we obtained salary data to assign an average salary to each job title.
With the data scraped online and the features engineered from the given dataset, we tried to find out whether clustering algorithms could produce a usable prediction.
As mentioned above, using the context gathered from Creedix, we engineered and aggregated many features from the transaction time series dataset. Although these features describe each customer better, we could only estimate each feature's importance for a customer's credit score based on our own research, so we consolidated the features per customer accordingly. The actual weight of each feature for credit scoring in Indonesia is up to the Creedix team to decide, for example:
CreditScore = 7*Salary + 0.5*Zakut + 4000*Feature1 + …+ 5000*Feature6
The solutions given to Creedix covered both supervised and unsupervised learning. Even after all the feature engineering and the data found online, we were still getting a low silhouette score, signifying that clusters would overlap.
So we decided to provide solutions for supervised learning (using AutoML) and unsupervised learning, both built on dummy variables; the purpose was to serve as a basis for future analysis and modeling by the Creedix team.
The dataset we used for Supervised Learning — https://www.kaggle.com/c/GiveMeSomeCredit/data
For supervised learning, we built models with both TPOT and auto-sklearn. The intent was that once the Creedix team has access to additional features and target variables that may not be available to Omdena collaborators, they can use this work as a basis for their own models.
Our idea was to create a script that can take any dataset and automatically search for the best algorithm by iterating through classifiers/regressors and hyperparameters according to user-defined metrics.
Our initial approach was to code this from scratch, iterating over individual algorithms from packages such as sklearn, XGBoost, and LightGBM, but then we came across AutoML packages that already do what we wanted to build. We decided to use those readily available packages instead of reinventing the wheel.
We used two different AutoML packages: TPOT and auto-sklearn. TPOT automates the most tedious part of machine learning by intelligently exploring thousands of possible pipelines and finding the best one for your data.
Auto-sklearn frees an ML user from algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction.
Both packages are similar, but TPOT stands out between the two due to its reproducibility: it generates not only the model but also the Python script that reproduces it.
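As a minimal sketch of that workflow on the Give Me Some Credit dataset referenced above, TPOT can search for a pipeline and then export the winning one as a standalone script. The file and target column names follow the Kaggle dataset and should be treated as assumptions if your copy differs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Give Me Some Credit training data (file/column names per the Kaggle competition).
df = pd.read_csv("cs-training.csv", index_col=0).dropna()
y = df.pop("SeriousDlqin2yrs")
X_train, X_test, y_train, y_test = train_test_split(df, y, stratify=y, random_state=42)

# Small search budget for illustration; increase generations/population_size for real runs.
tpot = TPOTClassifier(generations=5, population_size=20, scoring="roc_auc",
                      cv=5, random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print("hold-out ROC AUC:", tpot.score(X_test, y_test))

# The exported script rebuilds the best pipeline, which is what makes TPOT reproducible.
tpot.export("best_pipeline.py")
```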
In the beginning, we used agglomerative clustering (a form of hierarchical clustering), since the preprocessed dataset contained a mix of continuous and categorical variables. Because we had generated many features from the dataset (some of them very similar, based on small variations in their definition), we first had to eliminate most of the correlated ones; without this, the algorithm would struggle to find the optimal number of groupings. After this step, we were left with the following groups of features:
and three single specific features:
In a range of potential clusters from 2 to 14, the average silhouette score was best with 8 clusters, at 0.1027. The customer data was split into 2 large groups and 6 much smaller ones, which was what we were looking for (the smaller groups could be considered anomalous).
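A minimal sketch of that procedure is below: drop one feature out of every highly correlated pair, then sweep cluster counts from 2 to 14 with agglomerative clustering and record the silhouette score for each. The feature table here is a random placeholder for the engineered customer features.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def drop_correlated(df, threshold=0.9):
    """Drop one feature out of every highly correlated pair."""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Placeholder engineered-feature table; in the project this came from the transaction data.
features = pd.DataFrame(np.random.rand(300, 12),
                        columns=[f"feat_{i}" for i in range(12)])

X = StandardScaler().fit_transform(drop_correlated(features))

# Sweep candidate cluster counts and report the silhouette score for each.
for k in range(2, 15):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 4))
```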
This was not a satisfactory result, however. On practical grounds, describing clusters 3 to 8 proved challenging, which is consistent with the relatively low clustering score.
It has to be remembered that the prime reason for clustering was to find reasonably small and describable anomalous groupings of customers.
We therefore decided to apply an algorithm that is efficient at handling outliers within a dataset: DBSCAN. Since the silhouette score is best suited to convex clusters and DBSCAN is known to return complex non-convex clusters, we skipped the clustering scores and focused on analyzing the clusters returned by the algorithm.
While varying the parameters of DBSCAN, we found that the clustering results were stable: the clusters contained similar counts, and customers did not move between the non-anomalous and anomalous clusters.
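That stability check can be sketched as a small parameter sweep; the eps and min_samples values are illustrative, and the feature matrix is again a placeholder.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Placeholder for the standardized, de-correlated feature matrix used above.
X = np.random.rand(300, 8)

# Vary eps/min_samples and compare cluster counts and sizes to check stability;
# the label -1 marks points DBSCAN treats as noise (our anomalous customers).
for eps in (0.5, 0.7, 0.9):
    for min_samples in (5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        sizes = list(np.bincount(labels[labels >= 0])) if (labels >= 0).any() else []
        print(f"eps={eps}, min_samples={min_samples}: "
              f"{n_clusters} clusters, {(labels == -1).sum()} outliers, sizes={sizes}")
```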
When analyzing and trying to describe the various clusters, we also found it easier to characterize the qualities of each cluster, for example:
It is important to note that, for various other sets of features within the data provided, both the hierarchical and DBSCAN methods returned even better clustering scores. However, at this level of anonymity (i.e. without ground truth), one cannot decide on the best split of customers. It might turn out that a rather different set of features best splits customers and provides better entropy scores for these groups when calculated on the creditworthiness category.
By Jake Carey-Rand
One of my favorite quotes at the moment is from Max Tegmark, MIT professor and author of ‘Life 3.0: Being Human in the Age of Artificial Intelligence’. Tegmark talks about avoiding “this silly, carbon-chauvinism idea that you can only be smart if you’re made of meat” in reference to a more inclusive definition of intelligence to include artificial as well as biological intelligence. I’d like to double down on the requirement for an even more inclusive definition of intelligence – or rather, a more inclusive approach to artificial intelligence (AI). An approach where the emphasis is on diversity and collaboration, for meat lovers, vegans, and robots alike.
Outside the tech biosphere, reservations are often expressed about AI. These moral questions can run even deeper for some of us within the AI sector. Fear that AI will put humans out of a job or learn to wage war against humanity is bounced around the social interwebs at will. But ask a machine learning engineer how the AI she’s been developing actually does what it does, and most often you are met by a bit of a shrug of the shoulders beyond a certain point in the process. The truth is, advanced AI is still a bit of a mystery to us mere humans – even the really smart machine learning humans.
Armed with this context, I won’t argue there aren’t potential downsides. AI is built by people. People decide what data goes into the model. People build models. People train the models and ultimately people decide how to productionize the models and integrate them into a broader workflow or business.
Because all of this is (for the moment) directed by people, it means we have choices. Up to a point – we have a choice about how we create AI, what its tasks are, and ultimately the path we direct it to take. The implications of these choices are crystal clear now more than ever. The power of AI to create a better, healthier and arguably more equitable world is tangible and occurring at a very rapid pace. But so is the dark alternative – people have a choice to create models which spread Fear, Uncertainty, and Doubt to hack an election or to steal money.
AI is a tool like any other… well, almost.
The pursuit of ‘AI nirvana’ is thought by some to be a pipedream cluttered with wasted money and resources along the path to mediocre success. Others share a view that AI at-scale is something reserved only for the FAANG companies (plus Microsoft, Uber, etc.). Without diving into the technicalities of data science and machine learning too deeply, the reality is that organizations are still struggling to capture the value of their data with any corresponding models they build. In fact, 87% of data science projects fail to deliver anything of value in production to the business. Challenges I hear time and again from customers, friends and colleagues include:
Critically, some of the most important characteristics of data science success relate to soft skill development – those which make us uniquely human. Yes, we need great programmers, data wranglers, architects, and analysts for everything from data archeology to model training. But it is just as important (I would argue now more important) to curate emotional intelligence if you want to succeed with artificial intelligence. The success of an organization is now judged more heavily based on its ability to build and maintain Cultural Empathy, Critical Thinking, Problem Solving, and Agile Initiatives. Importantly, these skills also lead to a more natural ability to link data science investment directly to organizational (and social) value.
In other words, instilling a culture of diversity, inclusion, and collaboration is integral to AI and ultimately business success. As an organizational psychologist and professor, Tomas Chamorro-Premuzic said in a 2017 Harvard Business Review article, “No matter how diverse the workforce is, and regardless of what type of diversity we examine, diversity will not enhance creativity unless there is a culture of sharing knowledge.” Collaboration is key.
Of all the soft skills, taking an unbiased and collaborative approach to AI is probably the most important thing we can do to positively impact AI development. Omdena has quickly become the world leader in Collaborative AI, demonstrating rapid success in solving some of the world's toughest problems. Experts discuss AI bias at length, but remember that humans create AI. We are not perfect and we certainly are not all-knowing. Imagine if all AI were produced by programmers in Silicon Valley. Even they would agree that a model to predict landslides based on drought patterns from satellite imagery in Southeast Asia would be better built in collaboration with people local to the problem who also understand the farming and economics of the region. Likewise, a model built to analyze mortgage default risk based on social sentiment analysis and financial data mining needs to be built by a diverse, collaborative team. As recent history is teaching us, decisions made by the few expand to elevate systemic division and privilege.
Jack Ma, the world’s wealthiest teacher, said in an address to Hong Kong graduates, ‘Everything we taught our kids over the past 200 years, machines will do better in the future. Educators should teach what machines are not capable of, such as creativity and independent thinking.’
My hope is that schools are adapting to this change, along with all the other changes they must now manage. But for most corporate teams, they have some catching up to do to ensure AI adoption is not only successful but considered a success for all. Let’s start by encouraging a broad, diverse, and collaborative approach to AI. As Tegmark says, “Let’s Build AI that Empowers Us”.
Jake Carey-Rand is a technology executive with nearly 20 years of experience across AI, big data, Internet delivery, and web security. Jake recently joined Omdena as an advisor, to help scale the AI social enterprise.
Omdena is the company “Building Real-World AI Solutions, Collaboratively.” I’ve been watching the impact Omdena and its community of 1,200+ data scientists from more than 82 countries (we call them Changemakers) have been making over the last 12 months. Their ability to solve absolutely critical issues around the world has been inspiring. It has also led to some questions about how these Changemakers have been able to do what so many organizations fail to do time and time again: create real-world AI solutions in such a short amount of time. This has inspired us to explore how we could scale this engine of AIForGood even faster. The Omdena platform can be leveraged by enterprises who, especially during these challenging times, have to accelerate, adapt, and transform their approach to “business as usual” through a more collaborative approach to AI.
Is it possible to estimate with minimum expert knowledge if your street will be safer than others when an earthquake occurs?
We answered how to estimate the safest route after an earthquake with computer vision and route management.
The last devastating earthquake in Turkey occurred in 1999 (>7 on the Richter scale) around 150–200 kilometers from Istanbul. Scientists believe that this time the earthquake will burst directly in the city and the magnitude is predicted to be similar.
The main motivation behind this AI project, hosted by Impacthub Istanbul, is to optimize earthquake aftermath management with AI and route planning.
After kicking off the project and brainstorming with the hosts, collaborators, and the Omdena team about how to better prepare the city of Istanbul for an upcoming disaster, we spotted a problem that is quite simple but at the same time really important for families: getting reunited as soon as possible in the aftermath of an earthquake!
Our target was to provide safe and fast route planning for families, considering not only travel time but also broken bridges, falling debris, and other obstacles usually found in these scenarios.
We settled on two tasks: creating a risk heatmap depicting how dangerous each area of the map is, and a path-finding algorithm providing the safest and shortest path from A to B. The latter would rely on the heatmap to estimate safeness.
Challenge started! Deep learning for earthquake management using computer vision and route management.
At this point, we optimistically trusted open data to address our problem. However, we soon realized that data describing building quality, soil composition, and pre- and post-disaster imagery was hard to find, and complex to model and integrate even when available.
Bridges over streets, building heights, a thousand types of soil, and the interactions among all of them: too many factors to control! So we focused on delivering a simpler approximation.
The question was: how do we estimate street safeness during an earthquake in Istanbul without such a myriad of data? What if we could roughly estimate path safeness by using distance to buildings as a safety proxy? The farther the buildings, the safer the pathway.
For that crazy idea, we first needed building footprints laid on the map. Some people suggested borrowing building footprints from OpenStreetMap, one of the most popular open-source map providers. However, we soon noticed that OpenStreetMap, though quite complete, has blank areas in the building metadata relevant to our task, and footprints were sometimes laid out inaccurately on the map.
A big problem, given the effects any earthquake has on the population, and computer vision comes to the rescue! Using deep learning on satellite imagery, we could detect buildings and then estimate how close pathways are to them.
The next stone on the road was obtaining high-resolution imagery of Istanbul, with enough resolution for an ML model to locate building footprints on the map the way a visually capable human does. We would also need annotated footprints on these images so that our model could train properly.
Instead of labeling hundreds of square meters manually, we relied on SpaceNet (in particular, the images for Rio de Janeiro) as our annotated data provider. This dataset contains high-resolution satellite images and building footprints, nicely pre-processed and organized, which were used in a recent competition.
The modeling phase was really smooth thanks to fast.ai software.
We used a Dynamic U-Net model with an ImageNet-pretrained ResNet34 encoder as a starting point. This state-of-the-art architecture uses many advanced deep learning techniques by default, such as the one-cycle learning rate schedule and the AdamW optimizer.
All these fancy advances in just a few lines of code.
We set up a balanced combination of focal loss and Dice loss, with accuracy and Dice as performance metrics. After several frozen and unfrozen training steps, we obtained good-enough predictions for the next step.
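The exact loss weighting is not given in the write-up, so the snippet below is only a sketch of a balanced focal-plus-Dice loss in plain PyTorch; gamma and the mixing coefficient are illustrative, and the module could be passed as the loss function of a fast.ai unet_learner.

```python
import torch
import torch.nn.functional as F

class FocalDiceLoss(torch.nn.Module):
    """Balanced mix of focal loss and Dice loss for binary segmentation.
    Expects raw logits of shape (N, 1, H, W) or (N, H, W) and binary targets."""
    def __init__(self, gamma=2.0, alpha=0.5, smooth=1.0):
        super().__init__()
        self.gamma, self.alpha, self.smooth = gamma, alpha, smooth

    def forward(self, logits, targets):
        if logits.dim() == 4:
            logits = logits.squeeze(1)
        targets = targets.float()
        if targets.dim() == 4:
            targets = targets.squeeze(1)
        probs = torch.sigmoid(logits)

        # Focal loss: down-weight easy, already well-classified pixels.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = probs * targets + (1 - probs) * (1 - targets)
        focal = ((1 - p_t) ** self.gamma * bce).mean()

        # Dice loss: overlap-based term, robust to foreground/background imbalance.
        intersection = (probs * targets).sum()
        dice = 1 - (2 * intersection + self.smooth) / (probs.sum() + targets.sum() + self.smooth)

        # Illustrative 50/50 mix; the project's actual weighting is not stated.
        return self.alpha * focal + (1 - self.alpha) * dice
```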
For more information about working with geospatial data and tools with fast.ai, see the fast.ai documentation and forums.
Finding high-resolution imagery was the key to our model and at the same time a humongous stone hindering our path to victory.
For the training stage, it was easy to avoid manual annotation and data collection thanks to SpaceNet, but for prediction, obtaining high-resolution imagery of Istanbul was unavoidable.
Thankfully, we stumbled upon Mapbox and its easy-peasy, almost-free download API, which provides high-resolution slippy map tiles all over the world at different zoom levels. Slippy map tiles are 256 × 256 pixel images described by x, y, z coordinates, where x and y are 2D coordinates in the Mercator projection and z is the zoom level. We chose zoom level 18, where each pixel corresponds to roughly 0.596 meters on the ground.
As mentioned on their webpage, they have a generous free tier that allows you to download up to 750,000 raster tiles a month for free, which was enough for us, as we only wanted to grab tiles for a couple of districts.
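For reference, the standard Web Mercator tile math and a toy download loop look roughly like this; the Mapbox tileset ID and URL pattern are assumptions, so check the Mapbox raster tiles documentation for the exact endpoint and token usage.

```python
import math
import requests

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Convert WGS84 lat/lon to slippy-map tile indices (standard Web Mercator formula)."""
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

ZOOM = 18  # ~0.596 m per pixel at the equator: 156543.03 / 2**18
x0, y0 = latlon_to_tile(41.015, 28.979, ZOOM)  # roughly central Istanbul
TOKEN = "YOUR_MAPBOX_TOKEN"

# Assumed URL pattern for Mapbox satellite raster tiles; verify against the current API docs.
URL = "https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.png?access_token={token}"

for x in range(x0, x0 + 3):          # tiny 3x3 patch purely for illustration
    for y in range(y0, y0 + 3):
        resp = requests.get(URL.format(z=ZOOM, x=x, y=y, token=TOKEN), timeout=30)
        resp.raise_for_status()
        with open(f"tile_{ZOOM}_{x}_{y}.png", "wb") as f:
            f.write(resp.content)
```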
Once all required tiles were stealing space from my Google Drive, it was time to switch on our deep learning model and generate prediction footprints for each tile.
Then we geo-referenced the tiles by translating from Mercator coordinates to latitude-longitude pairs (the ones used by mighty explorers). Geo-referencing the tiles was a required step before stitching our prediction piece of art together with GDAL.
The gdal_merge.py command allowed us to glue tiles together using the geo-coordinates embedded in the TIFF images. After some math and computing time… voilà! Our high-resolution prediction map for the district was ready.
Ok, I see my house but should I go through this street?
Building detection was not enough for our task. We needed to determine the distance from any given position on the map to the closest building, so that a person at that location could know how safe it would be to cross the street. The larger the distance, the safer, remember?
The path-finding team would overlay the heatmap below on their graph-based schema and, by intersecting graph edges (streets) with heatmap pixels (user positions), calculate the average distance for each pixel on the edge, thus obtaining a safeness estimate for each street. This would be the input when finding the best A-B path.
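A sketch of how such a safeness-aware route search could look with networkx is shown below; the edge attributes "length" and "safeness" and the alpha trade-off are illustrative, not the project's actual schema.

```python
import networkx as nx

# Tiny illustrative street graph; 'length' is edge length in meters and 'safeness'
# is the mean distance-to-buildings (meters) sampled from the heatmap along that edge.
G = nx.Graph()
G.add_edge("home", "square", length=300, safeness=2.0)   # short but hemmed in by buildings
G.add_edge("home", "park", length=450, safeness=15.0)    # longer but open
G.add_edge("square", "school", length=200, safeness=3.0)
G.add_edge("park", "school", length=250, safeness=12.0)

def cost(u, v, data, alpha=0.5):
    """Combine travel distance with a safety penalty; alpha trades speed against safety."""
    safety_penalty = data["length"] / (1.0 + data["safeness"])  # long, unsafe edges cost most
    return alpha * data["length"] + (1 - alpha) * safety_penalty

print(nx.shortest_path(G, "home", "school", weight=cost))
```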
If a pixel belongs to a building, its distance is zero. If not, we take the minimum Euclidean distance from that pixel to any building pixel. This process, along with NumPy optimizations, was the key to mitigating the quadratic complexity of the computation.
Repeat the process for each pixel and the safeness map comes up.
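The write-up describes a NumPy-based minimum-distance computation; scipy's Euclidean distance transform computes the same per-pixel quantity in one vectorized pass, so the sketch below uses it as a compact stand-in.

```python
import numpy as np
from scipy import ndimage

def safeness_map(building_mask, meters_per_pixel=0.596):
    """building_mask: 2D boolean array, True where the model predicted a building pixel.
    Returns, for every pixel, the Euclidean distance (in meters) to the nearest building
    pixel; building pixels themselves get 0, matching the rule described above."""
    # distance_transform_edt measures the distance to the nearest zero-valued element,
    # so inverting the mask makes buildings the zeros and handles the whole map at once.
    return ndimage.distance_transform_edt(~building_mask) * meters_per_pixel

# Toy 6x6 prediction with one 2x2 building block, just to show the output.
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True
print(np.round(safeness_map(mask), 2))
```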
Can AI help to address energy poverty in Nigeria where more than 100m people lack stable access to electricity?
By Laura Clark Murray
Without stable access to electricity, families can’t light their homes or cook their food. Hospitals and schools can’t dependably serve their communities. Businesses can’t stay open.
Energy poverty shapes and constrains nearly every aspect of life for those who are trapped in it. As the Global Commission to End Energy Poverty puts it, “we cannot end poverty without ending energy poverty.” In fact, energy poverty is considered to be one of humanity’s greatest challenges of this century.
In Nigeria, Africa’s most populous country, more than half of the 191 million citizens live in energy poverty. And though governments have been talking for years about extending national electricity grids to deliver energy to more people, they’ve made little progress.
Rather than focusing on the national electricity grid, Nigerian non-profit Renewable Africa 365, or RA365, is taking a different approach. RA365 is working with local governments to install mini solar power substations, known as renewable energy microgrids. Each microgrid can deliver electricity to serve small communities of 4,000 people. In this way, RA365 aims to address Nigerian energy poverty community-by-community with solar installations.
To be effective, RA365 needs to convince local policymakers of the potential impact of a microgrid in their community. For help they turned to Omdena. Omdena is a global platform where AI experts and data scientists from diverse backgrounds collaborate to build AI-based solutions to real-world problems. You can learn more here about Omdena’s innovative approach to building AI solutions through global collaboration.
Omdena pulled together a global team of AI experts and data scientists. Working collaboratively from remote locations around the globe, the team set about identifying the regions in Nigeria where the energy poverty crisis is most dire and where solar power is likely to be effective.
To determine which regions lack access to electricity, our team looked to satellite imagery for the areas of the country that go completely dark at night. Of those locations, they prioritized communities with large populations that include schools and hospitals. The collaborators also looked at each community's distance from the existing national electricity grid: in reality, if a community is physically far from the grid, it is unlikely to be hooked up anytime soon. By combining the satellite data with population data in this way, the team identified the communities most in crisis.
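The article stays non-technical, but the prioritization logic it describes can be sketched roughly as follows; every column name and threshold here is hypothetical and only illustrates combining night-light, population, facility, and grid-distance signals.

```python
import pandas as pd

# Hypothetical per-community table standing in for the satellite and survey data described above.
communities = pd.DataFrame({
    "name": ["A", "B", "C"],
    "night_light_radiance": [0.1, 3.2, 0.05],   # low values = dark at night
    "population": [6000, 12000, 3500],
    "has_school_or_clinic": [True, True, False],
    "km_to_grid": [45.0, 5.0, 80.0],
})

dark = communities["night_light_radiance"] < 0.5           # effectively unelectrified
score = (
    dark.astype(float)
    * communities["population"].rank(pct=True)              # favor larger communities
    * (1 + communities["has_school_or_clinic"])             # boost if key facilities are present
    * communities["km_to_grid"].rank(pct=True)               # favor places far from the grid
)
print(communities.assign(priority=score).sort_values("priority", ascending=False))
```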
In any machine learning project, the quality and quantity of relevant data is critical. However, unlike projects done in the lab, the ideal data to solve a real-world problem rarely exists. In this case, available data on the Nigerian population was incomplete and inaccurate. There wasn’t data on access to the national electricity grid. Furthermore, the satellite data couldn’t be relied upon. Given this, the team had to get creative. You can read how our team addressed these data roadblocks in this article from collaborator Simon Mackenizie.
The team built an AI system that identifies regional clusters in Nigeria where renewable energy microgrids are both most viable and likely to have high impact on the community. In addition, an interactive map acts as an interface to the system.
RA365 now has the tools it needs to guide local policymakers towards data-driven decisions about solar power installation. What’s more, they’re sharing the project data with Nigeria Renewable Energy Agency, a major funding source for rural electrification projects across Nigeria.
With this two-month challenge, the Omdena team delivered one of the first real-world machine learning solutions to be deployed in Nigeria. Importantly, our collaborators from around the globe join the growing community of technologists working to solve Nigeria’s toughest issues with AI.
Ademola Eric Adewumi, Founder of Renewable Africa 365, shares his experience working with the Omdena collaborators here. Says Adewumi, “We want to say that Omdena has changed the face of philanthropy by its support in helping people suffering from electrical energy poverty. With this great humanitarian help, RA365 hopes to make its mission a reality, bringing renewable energy to Africa.”
Building AI through global collaboration
Omdena is a global platform where changemakers build ethical and inclusive AI solutions to real-world problems through collaboration.