Creating an Automated Redaction Wizard That Utilizes Optical Character Recognition


50 AI engineers collaborated to create an automated redaction wizard that utilizes optical character recognition, natural language processing, and machine learning algorithms.

 

The problem

Most industries use some amount of redaction, but some use it more than others. The medical field, for example, has requirements under HIPAA to protect personal health information (PHI). When documents are redacted, they can be used or published by a wider audience than originally intended without compromising confidentiality.

Redaction is also commonly used to protect other kinds of personal identifying information (PII) like:

  • Social security numbers
  • Driver’s license numbers
  • Financial details
  • Proprietary information or trade secrets
  • Addresses, dates of birth, and names
  • Certain information on legal or medical documents

 

Performing this process manually is time-consuming, and the risk of human error makes the approach unreliable, especially when a large number of documents is involved.

Redactable is an online tool that offers various ways to redact official documents. Search automation, pattern redaction, and manual redaction are some of the options provided by our platform. Our goal is to elevate our users’ experience and improve our document redaction process by harnessing the power of AI and recent advances in the field of Natural Language Processing.

 

The project outcomes

The team built a document redaction pipeline that included collecting, processing, and labeling a custom data set; training and testing multiple state-of-the-art NLP models; and building a training pipeline that allows the model to improve its performance over time.
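To make this concrete, here is a minimal sketch of one way an OCR-plus-NER redaction step can be wired together. It is an illustration under assumptions, not the team’s actual pipeline: it presumes pytesseract for OCR and spaCy’s small English model for entity detection, with an arbitrary choice of entity labels treated as PII.

```python
# A minimal OCR + NER redaction sketch (assumed libraries: pytesseract,
# spaCy with the en_core_web_sm model, Pillow) -- not the team's pipeline.
import pytesseract
import spacy
from PIL import Image, ImageDraw

PII_LABELS = {"PERSON", "GPE", "LOC", "DATE", "ORG"}  # labels treated as PII here

def redact_page(path: str, out_path: str) -> None:
    image = Image.open(path)
    ocr = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

    # Rebuild the page text while remembering each word's character span,
    # so entity offsets can be mapped back to word bounding boxes.
    indices, spans, parts, pos = [], [], [], 0
    for i, word in enumerate(ocr["text"]):
        if not word.strip():
            continue
        indices.append(i)
        spans.append((pos, pos + len(word)))
        parts.append(word)
        pos += len(word) + 1
    text = " ".join(parts)

    nlp = spacy.load("en_core_web_sm")
    entities = [(e.start_char, e.end_char) for e in nlp(text).ents
                if e.label_ in PII_LABELS]

    # Black out every OCR word whose span overlaps a detected entity.
    draw = ImageDraw.Draw(image)
    for idx, (start, end) in zip(indices, spans):
        if any(s < end and start < e for s, e in entities):
            x, y = ocr["left"][idx], ocr["top"][idx]
            w, h = ocr["width"][idx], ocr["height"][idx]
            draw.rectangle([x, y, x + w, y + h], fill="black")
    image.save(out_path)
```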

 

An AI-Driven Chatbot for Refugee Helplines


In this 8-week challenge, a global team of 50 AI engineers developed a chatbot to support a refugee helpline.

 

The problem

The integration process for immigrants in a new country is not an easy task. Even with hard work and the desire to fit in, immigrants may face many challenges, including accessing basic information or understanding some of their basic rights in the country of residence.

With the help of Omdena collaborators, Art For Impact wants to go a step further and use cutting-edge technology to build a bridge of understanding between cultures, combat hatred, show the world that immigrants contribute to their host communities, and allow them to prosper. Currently, most conversations are handled by human agents, and others by simple multiple-choice lists. The majority of the questions are repetitive. The goal is to reduce the time agents spend on simple, repetitive questions by building an intelligent chatbot able to hold intuitive, natural conversations, provide services, and answer those questions.

 

The project outcomes

Starting with data collection and augmentation, the team applied exploratory data analysis (EDA) to gain first insights into the data. Various EDA approaches (word clouds, topic modeling, knowledge graphs, etc.) were helpful in identifying relevant chatbot intents, entities, and responses. 
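As an illustration of the topic-modeling part of this EDA, the sketch below uses scikit-learn’s LDA to surface candidate intents; the helpline messages shown are hypothetical placeholders, not project data.

```python
# Topic modeling as an intent-discovery aid (illustrative; assumes
# scikit-learn; `messages` is a hypothetical stand-in for the real corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

messages = [
    "How do I register for language classes?",
    "Where can I get medical help for my child?",
    "I need help finding housing this week",
]

# Bag-of-words counts feed the topic model.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(messages)

# Each learned topic suggests a candidate chatbot intent.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(counts)

vocab = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [vocab[j] for j in weights.argsort()[-3:][::-1]]
    print(f"topic {i} (candidate intent): {', '.join(top)}")
```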

Next, the team built NLU (natural language understanding) and NLG (natural language generation) models in Rasa. To handle emergency cases, the models identify utterances and messages related to emergencies and group them by intent, with keywords highlighted in emergency messages as entities. Some stories were developed as conversation archetypes.
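The emergency-routing idea can be illustrated with a deliberately simple sketch (not the team’s Rasa models): flag emergency utterances and surface the trigger keywords as entities so the conversation can be escalated to a human agent.

```python
# A toy emergency triage step; the keyword list is a hypothetical placeholder.
EMERGENCY_KEYWORDS = {"ambulance", "attack", "bleeding", "fire", "police", "urgent"}

def triage(message: str) -> dict:
    """Classify a message as emergency/general and extract trigger keywords."""
    tokens = [t.strip(".,!?") for t in message.lower().split()]
    hits = sorted(set(tokens) & EMERGENCY_KEYWORDS)
    return {
        "intent": "emergency" if hits else "general_question",
        "entities": hits,               # keywords highlighted as entities
        "escalate_to_agent": bool(hits),
    }

print(triage("My neighbour is bleeding, please send an ambulance"))
```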

Finally, the application was deployed using Docker. 

 


Detecting Hate Speech in the Tamil Language Using Natural Language Processing


In this high-impact 2-month challenge, a global team of 50 AI changemakers is working to detect hate speech in the Tamil language. There is also a plan to take lessons learned from this initiative to develop a similar tool for the Sinhala language. The partner for this challenge is the social enterprise DreamSpace Academy (DSA). The challenge is supported by the NYU Center on International Cooperation and the Netherlands Ministry of Foreign Affairs.

This challenge requires experience in Data Analysis and NLP.

 

The problem areas

You will focus on the following hate-speech-related categories:

  • Community-based hate speech
  • Religion-based hate speech
  • Gender-based hate speech
  • Political hate speech

 

The project goals 

The data

  • Social media + web platforms.
  • DreamSpace Academy has already built a lexicon-based model for certain hate speech words and has identified social media pages that frequently spread hate speech. Data will be scraped from those pages (a minimal sketch of the lexicon idea follows this list).
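A minimal sketch of the lexicon-flagging idea, with hypothetical placeholders standing in for DSA’s actual Tamil word list and the scraped posts:

```python
# Lexicon-based flagging (illustrative; HATE_LEXICON is a placeholder for
# DSA's real Tamil word list, `posts` for text scraped from flagged pages).
HATE_LEXICON = {"<tamil_term_1>", "<tamil_term_2>", "<tamil_term_3>"}

def flag_posts(posts: list[str], min_hits: int = 1) -> list[tuple[str, list[str]]]:
    """Return (post, matched lexicon words) for posts with enough matches."""
    flagged = []
    for post in posts:
        hits = [tok for tok in post.split() if tok in HATE_LEXICON]
        if len(hits) >= min_hits:
            flagged.append((post, hits))
    return flagged
```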

 

The model

The envisioned outcome is an AI-enabled system that produces graphs and statistics for lexicon reports on hate speech used in Sri Lanka in the Tamil language.

The deliverables are listed as follows:

  • An AI model to detect hate speech in the Tamil language.
  • Classification into religion-, gender-, community-, and politics-based hate speech (an illustrative baseline sketch follows this list).
  • The possibility of retraining the model (and taking lessons learned from this initiative to develop a similar tool for the Sinhala language).
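One plausible baseline for the classification deliverable (illustrative only, not the challenge’s delivered model) pairs character n-gram TF-IDF features, which sidestep Tamil tokenization issues, with logistic regression; the texts and labels below are placeholders.

```python
# A multi-class hate speech baseline sketch (assumes scikit-learn; the
# training texts/labels are placeholders, not the challenge dataset).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["<tamil post 1>", "<tamil post 2>", "<tamil post 3>", "<tamil post 4>"]
labels = ["religion", "gender", "community", "political"]

# Character n-grams work directly on Tamil script without a tokenizer.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Retraining is refitting on a grown corpus, which also makes the planned
# Sinhala adaptation a matter of swapping in Sinhala data.
print(model.predict(["<new tamil post>"]))
```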

 

Why join? The uniqueness of Omdena AI Challenges

A collaborative experience unlike any you have had in your working life! For the next eight weeks, you will not only build AI solutions to make a real-world impact but also go through an entire data science project lifecycle, covering problem scoping, data collection and preparation, and modeling for deployment.

And the best part is that you will join a global and collaborative team of changemakers. Omdena AI Challenges are not a competition or hackathon but a real-world project that will take your experience of what is possible through collaboration to a new level.

Find more information on how an Omdena project works

 

Improving Transparency and Democratizing Access to Public Sector Contracts using NLP


In a two-month challenge, a global team of 50 Omdena collaborators helped democratize access to contract opportunities buried within public government documents and improve public-sector transparency.

 

The problem

Many signals hidden within public government documents indicate upcoming opportunities. Early notification is seen as key to giving companies without an existing relationship (and the related insider information) an opportunity to compete for these contracts. The ultimate goal is better communities in which to raise our families and build our businesses, achieved by improving a broken procurement process in which incumbents win the majority of contracts.

To never miss an upcoming contract opportunity, policy change, or important local government topic, Ontopical combs publicly available information from government departments and then processes and analyzes text and transcripts to deliver unique insights. The bad news is that sorting through that online information is not easy: there is no single place where relevant information is collected.

 

The project outcomes

Ontopical has gathered years of municipal council meeting agendas, minutes, and videos. Using these council meeting minutes, agendas, and video transcriptions, along with data from Request for Proposal (RFP) posting sites, as training data, the team trained a model to recognize upcoming business opportunities in meeting agendas, minutes, and videos long before any RFP is posted. This helps democratize access to upcoming contract opportunities by giving everyone the advance notice required to submit a strong proposal, and it makes the process of responding to RFPs more transparent.
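A hedged sketch of how such training data could be used (not Ontopical’s production system): passages from minutes that preceded a matching RFP serve as positive examples of an early opportunity signal, and a linear classifier is fit on TF-IDF features. All data below are placeholders.

```python
# Early opportunity-signal classifier sketch (assumes scikit-learn; the
# labeled passages are invented placeholders, not Ontopical's data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (passage from council minutes, 1 if an RFP on the topic was later posted)
training = [
    ("council directs staff to scope a new water treatment facility", 1),
    ("budget committee approves study of transit corridor expansion", 1),
    ("minutes of the previous meeting were adopted as circulated", 0),
    ("councillor smith congratulated the local sports teams", 0),
]
passages, labels = zip(*training)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(passages, labels)

# New agendas and minutes can then be scanned for early signals.
print(model.predict(["staff to prepare tender documents for road resurfacing"]))
```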

 

Leverage AI to Crowdsource the Mapping of the Datasphere


In this 8-week challenge, you will join 50 AI changemakers to identify possible maps of the Datasphere and crowdsource datasets that can be used to model different visualization tools from the perspective of individuals and/or organizations.

 

The problem

Data increasingly underpins and reflects most human activities. The volume of data, personal and non-personal, that is collected and produced, stored, utilized, and in transit grows at an accelerating pace. The Datasphere is the notional space where all of this digital data exists.

How we collectively govern the Datasphere will strongly determine the future of human society in the 21st century and our capacity to deal with major global challenges such as health, energy, climate change, and food security. Unfortunately, persistent misuse of data and general mistrust among governments, companies, and civil society hamper the search for cooperative solutions for data governance. The rising tensions call for a bold paradigm shift on data governance models and for innovative mechanisms for perceiving and visualizing the Datasphere.

The Datasphere Initiative seeks to bring a new, holistic and positive approach to the governance of the Datasphere. It aims to be a platform to improve coordination and accelerate the adoption of concrete proposals to overcome the current tensions and polarization around data. It will do so by raising awareness, producing evidence-based analysis, and catalyzing human-centric technical, policy, and institutional innovations. The vision of the Datasphere Initiative is a Datasphere that fosters trust, prosperity, sustainability, and well-being for all.

 

The project goals

This project seeks to develop an AI system to enable the visualization of the Datasphere as a whole and its different dimensions, building on datasets of personal and/or non-personal data. The ultimate objective is to make the Datasphere tangible for users and decision-makers and generate an emotional reaction to catalyze a rethinking of how the Datasphere could be reclaimed and governed. 

This project can be broken down into 4 steps:

  • Step 1: Identify relevant datasets stemming from individuals or organizations, public or private.
  • Step 2: Automate such data collection for mapping and cartography of the Datasphere, enabling predictive modeling of how the Datasphere will continue to expand (a toy projection sketch follows this list).
  • Step 3: Develop an interactive dashboard where the Datasphere can be visualized in its different layers and in segments of different Dataspheres (by user profile or geography; company type or sector; theme).
  • Step 4: Explore the idea of building an AI system to personalize the individual’s current position or “journey through the Datasphere” (e.g. web-based, browser add-on, etc.), while maintaining the notion of connectedness to a broader Datasphere.
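As a toy illustration of the predictive-modeling idea in Step 2, the sketch below fits an exponential curve to global data-volume estimates and projects it forward; the yearly zettabyte figures are illustrative placeholders, and exponential growth is only one plausible model.

```python
# Datasphere growth projection sketch (assumes numpy/scipy; the volume
# figures are illustrative placeholders, not measured estimates).
import numpy as np
from scipy.optimize import curve_fit

years = np.array([2015, 2017, 2019, 2021])
zettabytes = np.array([16.0, 26.0, 41.0, 79.0])  # hypothetical estimates

def exponential(t, a, r):
    """Volume a * exp(r * (t - 2015)), with t in calendar years."""
    return a * np.exp(r * (t - 2015))

(a, r), _ = curve_fit(exponential, years, zettabytes, p0=(16.0, 0.2))
for year in (2023, 2025):
    print(f"{year}: ~{exponential(year, a, r):.0f} ZB projected")
```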

 

 
