{"cells": [{"cell_type": "markdown", "id": "72112122", "metadata": {"papermill": {"duration": 0.004307, "end_time": "2025-04-03T19:33:27.420565", "exception": false, "start_time": "2025-04-03T19:33:27.416258", "status": "completed"}, "tags": []}, "source": ["\n", "# Tutorial 11: Vision Transformers\n", "\n", "* **Author:** Phillip Lippe\n", "* **License:** CC BY-SA\n", "* **Generated:** 2025-04-03T19:33:20.850738\n", "\n", "In this tutorial, we will take a closer look at a recent trend: Transformers for Computer Vision.\n", "Since [Alexey Dosovitskiy et al.](https://openreview.net/pdf?id=YicbFdNTTy) successfully applied a Transformer to a variety of image recognition benchmarks, there has been an incredible amount of follow-up work showing that CNNs might no longer be the optimal architecture for Computer Vision.\n", "But how exactly do Vision Transformers work, and what benefits and drawbacks do they offer in contrast to CNNs?\n", "We will answer these questions by implementing a Vision Transformer ourselves and training it on the popular, small CIFAR10 dataset.\n", "We will compare these results to popular convolutional architectures such as Inception, ResNet, and DenseNet.\n", "This notebook is part of a lecture series on Deep Learning at the University of Amsterdam.\n", "The full list of tutorials can be found at https://uvadlc-notebooks.rtfd.io.\n", "\n", "\n", "---\n", "Open in [Google Colab](https://colab.research.google.com/github/PytorchLightning/lightning-tutorials/blob/publication/.notebooks/course_UvA-DL/11-vision-transformer.ipynb)\n", "\n", "Give us a \u2b50 [on Github](https://www.github.com/Lightning-AI/lightning/)\n", "| Check out [the documentation](https://lightning.ai/docs/)\n", "| Join us [on Discord](https://discord.com/invite/tfXFetEZxv)"]}, {"cell_type": "markdown", "id": "51e4452f", "metadata": {"papermill": {"duration": 0.003333, "end_time": "2025-04-03T19:33:27.427428", "exception": false, "start_time": 
"2025-04-03T19:33:27.424095", "status": "completed"}, "tags": []}, "source": ["## Setup\n", "This notebook requires some packages besides pytorch-lightning."]}, {"cell_type": "code", "execution_count": 1, "id": "b9365690", "metadata": {"colab": {}, "colab_type": "code", "execution": {"iopub.execute_input": "2025-04-03T19:33:27.435402Z", "iopub.status.busy": "2025-04-03T19:33:27.435096Z", "iopub.status.idle": "2025-04-03T19:33:28.626249Z", "shell.execute_reply": "2025-04-03T19:33:28.624897Z"}, "id": "LfrJLKPFyhsK", "lines_to_next_cell": 0, "papermill": {"duration": 1.198141, "end_time": "2025-04-03T19:33:28.628868", "exception": false, "start_time": "2025-04-03T19:33:27.430727", "status": "completed"}, "tags": []}, "outputs": [], "source": ["! 
pip install --quiet \"torchmetrics >=1.0,<1.8\" \"torch >=1.8.1,<2.7\" \"torchvision\" \"pytorch-lightning >=2.0,<2.6\" \"seaborn\" \"numpy <3.0\" \"matplotlib\" \"tensorboard\""]}, {"cell_type": "markdown", "id": "92b433e5", "metadata": {"papermill": {"duration": 0.007721, "end_time": "2025-04-03T19:33:28.644847", "exception": false, "start_time": "2025-04-03T19:33:28.637126", "status": "completed"}, "tags": []}, "source": ["