Labs from the Master MVA Representation Learning for Computer Vision class, taught by Pietro Gori and Loïc Le Folgoc. This graduate course is an introduction to representation learning in computer vision and medical imaging applications. It covers topics such as Transfer Learning, Self-Supervised Learning, Vision Transformers, and Explainability in Neural Networks, among others. Each lab assignment explores one of these topics.
This lab's goal is to reproduce some results of the paper "Intriguing properties of neural networks" (Szegedy et al., 2014). In particular, we produce adversarial examples: by applying a small, hardly perceptible perturbation to an image, we can cause the network to misclassify it. This illustrates the non-smoothness of the representations learned by deep neural networks.
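To make the idea of a small input perturbation flipping a prediction concrete, here is a minimal NumPy sketch. It is not the paper's procedure: Szegedy et al. use a box-constrained L-BFGS optimization, whereas this sketch swaps in a single gradient-sign step on a toy linear softmax classifier. The names `input_gradient` and `gradient_sign_perturbation` and the parameter `eps` are hypothetical, chosen for illustration only.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x,
    for a linear softmax classifier with logits W @ x + b."""
    p = softmax(W @ x + b)
    p[y] -= 1.0              # dL/dlogits = softmax - onehot(y)
    return W.T @ p           # chain rule through the linear layer

def gradient_sign_perturbation(W, b, x, y, eps):
    """One gradient-sign step (assumption: pixel values live in [0, 1])."""
    x_adv = x + eps * np.sign(input_gradient(W, b, x, y))
    return np.clip(x_adv, 0.0, 1.0)
```

With a deep network, the same recipe applies once the input gradient is obtained by backpropagation instead of the closed form used here.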
In this lab, we implement the method of "Unsupervised Visual Domain Adaptation Using Subspace Alignment" (Fernando et al., 2013) for Unsupervised Domain Adaptation. The goal is to learn a model from a labeled source dataset that generalizes to an unlabeled target dataset whose input distribution differs (covariate shift), while assuming the labeling function remains the same.
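The core of subspace alignment can be sketched in a few lines of NumPy: compute a PCA basis for each domain, then map the source basis onto the target basis with the matrix `M = Ps.T @ Pt`. This is an illustrative sketch, not the lab's code: the function names are hypothetical, and centering each domain by its own mean is an assumption made here for simplicity.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of X (n_samples, n_features),
    returned as the columns of an (n_features, d) matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def subspace_alignment(Xs, Xt, d):
    """Project source and target data into a shared d-dimensional space."""
    Ps, Pt = pca_basis(Xs, d), pca_basis(Xt, d)
    M = Ps.T @ Pt                             # (d, d) alignment matrix
    Zs = (Xs - Xs.mean(axis=0)) @ Ps @ M      # source in target-aligned coordinates
    Zt = (Xt - Xt.mean(axis=0)) @ Pt          # target in its own subspace
    return Zs, Zt
```

A classifier (for example nearest neighbor) would then be fit on `Zs` with the source labels and evaluated on `Zt`.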
Self-supervised learning trains models on unlabeled data by creating an artificial (pretext) task from the data itself, enabling the model to learn meaningful features without human-provided labels. This lab implements the method proposed in "Unsupervised Representation Learning by Predicting Image Rotations" (Gidaris et al., 2018), whose pretext task consists of predicting which rotation (0°, 90°, 180°, 270°) has been applied to an image.
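The pretext dataset described above is cheap to build: each image yields four rotated copies, and the rotation index serves as the label. The helper below is a hypothetical sketch assuming square images stored as an `(N, H, W, C)` NumPy array; label `k` stands for a rotation of `k * 90°` (counterclockwise, following `np.rot90`).

```python
import numpy as np

def make_rotation_batch(images):
    """Build the rotation-prediction dataset from unlabeled images.

    images: array of shape (N, H, W, C) with H == W.
    Returns (4N, H, W, C) rotated images and (4N,) labels in {0, 1, 2, 3}.
    """
    # Rotate in the (H, W) plane; k quarter-turns for k = 0..3.
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated, axis=0), labels
```

A standard 4-way classifier trained on this output then learns features without any human annotation.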
In this lab, we implement SimCLR, the method proposed in "A Simple Framework for Contrastive Learning of Visual Representations" (Chen et al., 2020). SimCLR is a self-supervised contrastive learning method: it trains a model to distinguish between similar and dissimilar data points by pulling together the representations of positive pairs and pushing apart those of negative pairs.
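The pull-together, push-apart objective can be written as a normalized temperature-scaled cross-entropy loss over cosine similarities. The NumPy sketch below is illustrative only: the function name and the default temperature of 0.5 are assumptions, and row `i` of `z1` and row `i` of `z2` are taken to be embeddings of two augmented views of the same image.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over 2N embeddings; (z1[i], z2[i]) are positive pairs."""
    n = len(z1)
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm -> cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) similarity logits
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative
    # Index of the positive partner of each row.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Row-wise log-softmax, computed stably.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is low when each embedding is closer to its positive partner than to the other `2N - 2` embeddings in the batch.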
This lab's goal is to reimplement a Vision Transformer (ViT), a model based on self-attention mechanisms and introduced in "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2021). We implement the model, then train and test it on the CIFAR-10 dataset.
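The first step of a ViT is to cut the image into non-overlapping patches and flatten each one into a token, before a learned linear projection and the self-attention layers. The sketch below shows only that patch-extraction step in NumPy; the function name `patchify` and the patch size of 4 in the usage note are assumptions for illustration, not the lab's settings.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row.
    """
    H, W, C = image.shape
    if H % patch_size or W % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    gh, gw = H // patch_size, W // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)    # (gh, gw, P, P, C)
    return patches.reshape(gh * gw, patch_size * patch_size * C)
```

For a 32×32×3 CIFAR-10 image and a patch size of 4, this yields 64 tokens of dimension 48.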
In this lab, we pretrain a ViT in a self-supervised way, using the Masked Autoencoder (MAE) approach introduced in "Masked Autoencoders Are Scalable Vision Learners" (He et al., 2021). To do so, we mask random patches of the input images and train an autoencoder to reconstruct the missing pixels. The ViT encoder thereby learns useful representations and can then be fine-tuned for downstream tasks.
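The masking step and the reconstruction loss can be sketched as follows in NumPy. Only the visible patches would be fed to the encoder, and the loss is averaged over the masked patches only, as in the MAE paper. The function names and the explicit `rng` argument are hypothetical, introduced here for illustration.

```python
import numpy as np

def random_masking(patches, mask_ratio, rng):
    """Randomly hide a fraction of the patches.

    patches: (L, D) array of flattened patches.
    Returns the visible patches, a boolean mask (True = masked),
    and the sorted indices of the visible patches.
    """
    L = patches.shape[0]
    n_keep = int(L * (1 - mask_ratio))
    keep = np.sort(rng.permutation(L)[:n_keep])
    mask = np.ones(L, dtype=bool)
    mask[keep] = False
    return patches[keep], mask, keep

def masked_mse(pred, target, mask):
    """Mean squared error computed on the masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].mean()
```

With 64 patches and a mask ratio of 0.75 (the default reported by He et al.), the encoder sees only 16 patches per image, which is what makes pretraining cheap.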
In this lab, we train a Variational Auto-Encoder (VAE), and compare different VAE models in terms of image generation, reconstruction and disentanglement.
In particular, we focus on the
In this last lab, we implement two explainability methods and visualize them on two datasets from MedMNIST, a database of biomedical images.
Here, we visualize occlusion maps, introduced in "Visualizing and Understanding Convolutional Networks" (Zeiler and Fergus, 2014), on a dataset of blood cell images.
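An occlusion map is obtained by sliding an occluding square over the image and recording how much the model's score for the class of interest drops at each position. The sketch below is a minimal NumPy version under stated assumptions: `score_fn` is a hypothetical callable standing in for the trained network (it maps an image to a scalar class score), and the patch size, stride, and fill value are illustrative defaults.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, stride=4, fill=0.0):
    """Heatmap of score drops caused by occluding each image region.

    image: (H, W) or (H, W, C) array.
    score_fn: callable returning a scalar score for an image (assumption).
    """
    H, W = image.shape[:2]
    base = score_fn(image)
    heat = np.zeros(((H - patch) // stride + 1, (W - patch) // stride + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            r, c = i * stride, j * stride
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = fill   # hide one square
            heat[i, j] = base - score_fn(occluded)      # large drop = important region
    return heat
```

Regions with a large score drop are those the model relies on most for its prediction.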




