Natural Language Processing for Ethiopian Languages

Project background.

Ethiopia, the oldest independent country in Africa and the only one on the continent with its own alphabet, has a population of almost 120 million people. It is a land of enormous diversity, with more than 80 languages and over 200 dialects. Amharic, or Amharigna, is one of the working languages of the country, along with Oromigna and Tigrigna.

The rest of the world is rapidly adopting machine learning and AI to take advantage of the available language data. Countries with low-resource languages, such as Ethiopia, have remained behind. It’s time for them to catch up. The ability to use current language technologies effectively can bring benefits in a variety of ways, such as increasing literacy, preserving legacy languages, enabling large-scale analysis, and improving efficiency. More data is available on the internet today than ever before, yet leveraging it to build useful projects remains a challenge.

Project plan.

  • Week 1

    Project initiation, platform setup, and team formation

  • Week 2

    Data collection stage

  • Week 3

    Data collection and processing

  • Week 4


