Creating an Automated Redaction Wizard That Utilizes Optical Character Recognition
50 AI engineers collaborated to create an automated redaction wizard that utilizes optical character recognition, natural language processing, and machine learning algorithms.
Most industries use some amount of redaction, but some use it more than others. The medical field, for example, is required under HIPAA to safeguard protected health information (PHI). When documents are redacted, they can be used or published by a wider audience than originally intended without compromising confidentiality.
Redaction is also commonly used to protect other kinds of sensitive data, including personally identifiable information (PII), such as:
Social security numbers
Driver’s license numbers
Proprietary information or trade secrets
Addresses, dates of birth, and names
Certain information on legal or medical documents
Performing this process manually is time-consuming and prone to human error, which makes it inefficient, especially when a large number of documents is involved.
Redactable is an online tool that offers various ways to redact official documents. Search automation, pattern redaction, and manual redaction are some of the options provided by our platform. Our goal is to elevate our users' experience and improve our document redaction process by harnessing the power of AI and recent advances in the field of Natural Language Processing.
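To make the idea of pattern redaction concrete, here is a minimal Python sketch. It is not Redactable's implementation: the pattern set, the `redact_patterns` function, and the replacement labels are all assumptions made for illustration, and real identifiers come in more formats than these two regular expressions cover.

```python
import re

# Hypothetical patterns for illustration only; real-world identifier
# formats vary by country and by document type.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # e.g. 123-45-6789
    "DOB": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),   # e.g. 04/12/1987
}

def redact_patterns(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

sample = "Patient DOB 04/12/1987, SSN 123-45-6789."
redacted = redact_patterns(sample)
print(redacted)
```

Fixed patterns like these only catch data with a predictable shape. Names, addresses, and free-text details have no such shape, which is where a trained NLP model becomes useful.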
The project outcomes
The team built a document redaction pipeline that included collecting, processing, and labeling a custom data set, training and testing multiple state-of-the-art NLP models, and building a training pipeline that allows the model to improve its performance over time.
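As a rough illustration of the final step in such a pipeline, the Python sketch below applies a set of labeled character spans to a piece of text. It assumes that OCR has already produced the text and that a model has already produced the spans. The `Span` type, the `apply_redactions` function, and the hard-coded spans are hypothetical stand-ins, not the team's actual code or model output.

```python
from typing import List, Tuple

# (start, end, label) with an exclusive end offset. This span format is
# an assumption; a real entity-recognition model may use a different one.
Span = Tuple[int, int, str]

def apply_redactions(text: str, spans: List[Span]) -> str:
    """Replace each span with its label, assuming spans do not overlap."""
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

text = "Jane Doe lives at 12 Elm Street."
# Hard-coded stand-in for what a trained model would predict.
spans = [(0, 8, "NAME"), (18, 31, "ADDRESS")]
print(apply_redactions(text, spans))
```

Keeping span detection separate from span application means the model can be retrained or swapped without changing the redaction step itself.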