lucas-levy/representation_learning

Representation Learning for Computer Vision

Labs from the Master MVA class Representation Learning for Computer Vision, taught by Pietro Gori and Loïc Le Folgoc. This graduate course is an introduction to representation learning in computer vision and medical imaging applications. It covers topics such as Transfer Learning, Self-Supervised Learning, Vision Transformers, and Explainability in Neural Networks, among others. Each lab assignment explores one of these topics.

TP1. Intriguing Properties

This lab reproduces some results from the paper "Intriguing properties of neural networks" (Szegedy et al., 2014). In particular, we craft adversarial examples: adding a small, barely perceptible perturbation to an image can cause the network to misclassify it. This illustrates the non-smoothness of the representations learned by deep neural networks.
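As a rough illustration of the idea, the sketch below perturbs an input of a toy logistic-regression classifier. Note the assumptions: it uses a single signed-gradient step (the fast gradient sign method) rather than the box-constrained L-BFGS search of the paper, and the linear model, the names `fgsm_step`, `w`, `b` and the value of `eps` are all made up for the example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fgsm_step(x, y, w, b, eps):
    """One signed-gradient step that increases the cross-entropy loss.

    x: input vector (d,), y: true label in {0, 1},
    w, b: weights and bias of a logistic-regression classifier (toy model).
    """
    p = sigmoid(w @ x + b)        # predicted probability of class 1
    grad_x = (p - y) * w          # gradient of the cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=100)
b = 0.0

# Build an input with entries of order 1 whose logit is exactly 0.5 (class 1).
x0 = rng.random(100)
x = x0 + (0.5 - (w @ x0 + b)) * w / (w @ w)

x_adv = fgsm_step(x, y=1.0, w=w, b=b, eps=0.05)

print("clean logit:      ", w @ x + b)
print("adversarial logit:", w @ x_adv + b)
print("max pixel change: ", np.max(np.abs(x_adv - x)))
```

Each coordinate moves by at most `eps`, yet the many small changes add up in the logit, which is enough to flip the predicted class in this toy setting.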

visualisation_tp1

TP2. Domain Adaptation

In this lab, we implement the method of "Unsupervised Visual Domain Adaptation Using Subspace Alignment" (Fernando et al., 2013) for Unsupervised Domain Adaptation. The goal is to learn a model from a labeled source dataset that generalizes to an unlabeled target dataset whose input distribution differs (covariate shift), while assuming the labeling function remains the same.
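The core of subspace alignment can be sketched in a few lines of NumPy: compute a PCA basis for each domain, then map the source basis onto the target basis with the matrix M = Ps^T Pt. This is a minimal sketch, not the lab's code; the function names, the per-domain centering and the choice of `d` are assumptions.

```python
import numpy as np

def pca_basis(X, d):
    """Return the top-d principal directions of X as a (D, d) matrix."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:d].T

def subspace_alignment(S, T, d):
    """Project source S and target T (rows = samples) into comparable d-dim spaces."""
    Ps = pca_basis(S, d)              # source subspace
    Pt = pca_basis(T, d)              # target subspace
    M = Ps.T @ Pt                     # alignment matrix between the two bases
    S_aligned = (S - S.mean(axis=0)) @ Ps @ M
    T_proj = (T - T.mean(axis=0)) @ Pt
    return S_aligned, T_proj

rng = np.random.default_rng(0)
S = rng.normal(size=(50, 20))
T = rng.normal(size=(60, 20)) + 1.0   # shifted target domain (toy data)
S_aligned, T_proj = subspace_alignment(S, T, d=5)
print(S_aligned.shape, T_proj.shape)
```

A classifier (for instance nearest neighbour) would then be fitted on `S_aligned` with the source labels and applied to `T_proj`.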

TP3. Self-Supervised Learning 1: Rotation Prediction

Self-supervised learning trains models on unlabeled data by creating an artificial (pretext) task from the data itself, enabling the model to learn meaningful features without human-provided labels. This lab implements the method proposed in "Unsupervised Representation Learning by Predicting Image Rotations" (Gidaris et al., 2018), whose pretext task consists in predicting which rotation (0°, 90°, 180°, 270°) has been applied to an image.
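The pretext dataset is cheap to build: every image is rotated four times and the rotation index serves as the label. A minimal sketch, assuming square images stored as a `(N, H, W, C)` array (the function name `make_rotation_batch` is hypothetical):

```python
import numpy as np

def make_rotation_batch(images):
    """Return the 4 rotated copies of each image and the rotation labels.

    images: array of shape (N, H, W, C) with H == W.
    Label k means a rotation of k * 90 degrees (counterclockwise, as in np.rot90).
    """
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated, axis=0), labels

images = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)
x, y = make_rotation_batch(images)
print(x.shape, y)
```

A 4-way classifier trained on `(x, y)` then has to pick up on object shape and orientation to solve the task.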

visualisation_tp3

TP4. Self-Supervised Learning 2: Contrastive Learning

In this lab, we implement SimCLR, the method introduced in "A Simple Framework for Contrastive Learning of Visual Representations" (Chen et al., 2020). SimCLR relies on contrastive learning: in a self-supervised way, the model is trained to distinguish similar from dissimilar data points by pulling together the representations of positive pairs and pushing apart those of negative pairs.
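The contrastive objective used by SimCLR (the normalized temperature-scaled cross-entropy, NT-Xent) can be written compactly. The NumPy sketch below only computes the loss value for two batches of embeddings; the function name and the temperature of 0.5 are illustrative choices, and a real training loop would use an autodiff framework.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two views z1, z2 of shape (N, D); row i of z1 pairs with row i of z2."""
    n = len(z1)
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via dot product
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative
    # Index of the positive for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
z_close = z1 + 0.01 * rng.normal(size=(8, 16))   # views that agree
z_other = rng.normal(size=(8, 16))               # unrelated embeddings
print(nt_xent_loss(z1, z_close), nt_xent_loss(z1, z_other))
```

The loss is lower when the two views of each sample are close to each other and far from the other samples in the batch.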

TP5. Vision Transformers

In this lab, we reimplement a Vision Transformer (ViT), a model based on self-attention and introduced in "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2021). We then train and test it on the CIFAR-10 dataset.
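The first step of a ViT is to cut each image into non-overlapping patches and flatten them into a sequence of tokens. A minimal sketch of that step, assuming channels-last arrays; the patch size of 4 for 32x32 CIFAR-10 images is an assumption for the example, not necessarily the lab's setting.

```python
import numpy as np

def patchify(images, patch_size):
    """Split images (N, H, W, C) into flattened patches (N, num_patches, P*P*C)."""
    n, h, w, c = images.shape
    p = patch_size
    if h % p != 0 or w % p != 0:
        raise ValueError("image size must be divisible by the patch size")
    x = images.reshape(n, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)            # (N, rows, cols, P, P, C)
    return x.reshape(n, (h // p) * (w // p), p * p * c)

rng = np.random.default_rng(0)
images = rng.random((2, 32, 32, 3))
tokens = patchify(images, patch_size=4)
print(tokens.shape)
```

Each token is then linearly projected, a position embedding is added, and the sequence is fed to the Transformer encoder.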

visualisation_tp5

TP6. Masked Auto-Encoding

In this lab, we pretrain a ViT in a self-supervised way, using the Masked Auto-Encoding (MAE) approach introduced in "Masked Autoencoders Are Scalable Vision Learners" (He et al., 2021). To do so, we mask random patches of the input images and train an auto-encoder to reconstruct the missing pixels. Once pretrained, the ViT encoder provides representations that can be fine-tuned for downstream tasks.
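The random masking step can be sketched as follows: draw one random score per patch, keep the patches with the lowest scores, and remember which ones were dropped so the loss can be restricted to them. The function name is hypothetical, and the 75% mask ratio is the default reported in the MAE paper, not necessarily the value used in the lab.

```python
import numpy as np

def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches per image.

    patches: (N, L, D) sequence of flattened patches.
    Returns the visible patches (N, L_keep, D), a boolean mask (N, L) that is
    True for masked patches, and the indices of the kept patches (N, L_keep).
    """
    n, l, _ = patches.shape
    len_keep = int(l * (1 - mask_ratio))
    noise = rng.random((n, l))                         # one random score per patch
    ids_keep = np.argsort(noise, axis=1)[:, :len_keep]
    visible = np.take_along_axis(patches, ids_keep[:, :, None], axis=1)
    mask = np.ones((n, l), dtype=bool)
    np.put_along_axis(mask, ids_keep, False, axis=1)   # kept patches are not masked
    return visible, mask, ids_keep

rng = np.random.default_rng(0)
patches = rng.normal(size=(2, 64, 48))
visible, mask, ids_keep = random_masking(patches, mask_ratio=0.75, rng=rng)
print(visible.shape, mask.sum(axis=1))
```

Only `visible` goes through the encoder; the decoder receives the encoded tokens plus placeholder mask tokens and is trained to reconstruct the patches where `mask` is True.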

visualisation_tp6

TP7. Variational Auto-Encoder

In this lab, we train a Variational Auto-Encoder (VAE), and compare different VAE models in terms of image generation, reconstruction and disentanglement. In particular, we focus on the $\beta$-VAE, introduced in "beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework" (Higgins et al., 2017). We explore how the $\beta$ hyperparameter controls the trade-off between image reconstruction and disentanglement.
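The role of $\beta$ is easiest to see in the loss itself: a reconstruction term plus $\beta$ times the KL divergence between the approximate posterior and a standard normal prior. The sketch below assumes a diagonal Gaussian posterior parameterized by `mu` and `logvar` and a squared-error reconstruction term; a binary cross-entropy term is another common choice, and the function name is made up for the example.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    """Return (total, reconstruction, KL) averaged over the batch.

    x, x_recon: (N, D) flattened images; mu, logvar: (N, d) posterior parameters.
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    total = np.mean(recon + beta * kl)
    return total, recon.mean(), kl.mean()

x = np.ones((3, 10))
mu = np.ones((3, 2))
logvar = np.zeros((3, 2))
total, recon, kl = beta_vae_loss(x, x, mu, logvar, beta=4.0)
print(total, recon, kl)
```

With $\beta = 1$ this is the usual VAE objective; larger values put more weight on matching the prior, which tends to favour disentangled latents at the cost of blurrier reconstructions.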

TP8. Interpretability

In this last lab, we implement two explainability methods, and visualize their results on two datasets from MedMNIST, a collection of biomedical image datasets.

visualisation_tp8

Here, we visualize occlusion maps, as proposed in "Visualizing and Understanding Convolutional Networks" (Zeiler and Fergus, 2014), on a dataset of blood cell images.
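An occlusion map is obtained by sliding a blank patch over the image and recording how much the score of the class of interest drops. A minimal sketch, where `predict_fn` stands for any function returning class scores for a single image (a hypothetical placeholder for the trained network), and the patch size, stride and fill value are arbitrary:

```python
import numpy as np

def occlusion_map(image, predict_fn, target, patch=4, stride=4, fill=0.0):
    """Average drop in the target score when each region of the image is occluded."""
    h, w = image.shape[:2]
    base = predict_fn(image)[target]
    heat = np.zeros((h, w))
    counts = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drop = base - predict_fn(occluded)[target]
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    return heat / np.maximum(counts, 1)   # average over overlapping windows

# Toy "model" whose class-0 score only depends on the top-left 4x4 corner.
def toy_predict(img):
    s = img[:4, :4].mean()
    return np.array([s, 1.0 - s])

image = np.ones((8, 8))
heat = occlusion_map(image, toy_predict, target=0)
print(heat)
```

Regions with a large value are those the prediction depends on most; in the toy example only the top-left corner lights up.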
