Omdena Academy Courses

Understanding Vision Transformers

December 29, 2023

For whom is this course?

Transformers are now considered state-of-the-art in sequence modeling tasks. Recent work has shown that transformers, which rely on self-attention and a global attention paradigm, also achieve remarkable performance on multiple computer vision tasks. In this course, we will explore the baseline vision transformer models and observe their performance on remote sensing image classification.

What will you learn?

A good understanding of vision transformers
How to deploy transformer models for remote sensing image classification
Good knowledge of model implementation in PyTorch

Prerequisites

Python basics
PyTorch
Deep Learning basics
Linear Algebra basics

Syllabus

Session 1: Understanding Vision Transformers (3 hours)

What is a vision transformer?
The overall structure of a vision transformer
Components of a vision transformer
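To give a flavor of the structure discussed in this session, here is a minimal sketch of one step commonly found at the input of a vision transformer: cutting an image into fixed-size patches that become the token sequence. It is written in plain Python rather than PyTorch so it runs without dependencies. The function names `split_into_patches` and `patch_sequence_length` are hypothetical, and the square, single-channel image and the optional class token are assumptions made for illustration, not details taken from the course material.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order: left to right, then top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size window into one vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def patch_sequence_length(image_size, patch_size, use_class_token=True):
    """Number of tokens for a square image (assumed), plus an optional class token."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side + (1 if use_class_token else 0)


# A toy 4x4 "image" split into 2x2 patches gives 4 tokens of length 4.
toy_image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
toy_patches = split_into_patches(toy_image, patch_size=2)
print(toy_patches[0])                  # [0, 1, 4, 5]
print(patch_sequence_length(224, 16))  # 197
```

Because the token count grows with the square of the image side, changing the image size or patch size directly changes how long the sequence seen by the attention layers is.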

Session 2: The Attention mechanism (2 hours)

What is self-attention?
Role of global attention in transformers
How attention is computed
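As a preview of how attention is computed, the sketch below implements scaled dot-product self-attention in plain Python (standard library only, so no PyTorch is needed to run it). It is an illustration under simplifying assumptions: the function names `softmax` and `self_attention` are hypothetical, there is a single attention head, and the input tokens are used directly as queries, keys, and values, whereas a real transformer first applies learned linear projections.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Each argument is a list of vectors (lists of floats).
    Returns the attended outputs and the attention weights.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of all value vectors.
        attended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(attended)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy tokens; every token attends to every other token (global attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one token attends to all tokens, including itself. This is what "global" means here: no token is restricted to a local neighborhood.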

Session 3: Using vision transformers for remote sensing image classification (8 hours)

Study the impact of data augmentation strategies
Understand the relationship between network depth and transformer performance
Impact of changing the image size on model accuracy

Session 4: Vision Transformers vs CNNs (2 hours)

What is inductive bias?
Understand the impact of Field of View
The difference in data and memory requirements

Instructors

Umaima Rahman

Rasha Salim

Course Info

Certificate: Yes
Duration: 15 hours
Start Date: July 25, 2022
Last Registration Date: July 20, 2022
No. of Students: 40
Skill Level: Intermediate
