Deep Learning Roadmap

투푸월드 2023. 7. 4. 17:30

My own deep learning mastery roadmap, inspired by Deep Learning Papers Reading Roadmap.

There are some customized differences:

  • not only academic papers but also blog posts, online courses, and other references are included
  • customized for my own plans - may not include RL, NLP, etc.
  • updated for 2019 SOTA

Introductory Courses

Basic CNN Architectures

  •  AlexNet (2012) [paper]
    • Alex Krizhevsky et al. "ImageNet Classification with Deep Convolutional Neural Networks"
  •  ZFNet (2013) [paper]
    • Zeiler et al. "Visualizing and Understanding Convolutional Networks"
  •  VGG (2014)
    • Simonyan et al. "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014) [Google DeepMind & Oxford's Visual Geometry Group (VGG)] [paper]
    • VGG-16: Zhang et al. "Accelerating Very Deep Convolutional Networks for Classification and Detection" [paper]
  •  GoogLeNet, a.k.a Inception v.1 (2014) [paper]
    • Szegedy et al. "Going Deeper with Convolutions" [Google]
    • Original LeNet page from Yann LeCun's homepage.
    •  Inception v.2 and v.3 (2015) Szegedy et al. "Rethinking the Inception Architecture for Computer Vision" [paper]
    •  Inception v.4 and InceptionResNet (2016) Szegedy et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" [paper]
    • "A Simple Guide to the Versions of the Inception Network" [blogpost]
  •  ResNet (2015) [paper]
    • He et al. "Deep Residual Learning for Image Recognition"
  •  Xception (2016) [paper]
    • Chollet, Francois - "Xception: Deep Learning with Depthwise Separable Convolutions"
  •  MobileNet (2017) [paper]
    • Howard et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications"
    • A nice paper about reducing CNN parameter sizes while maintaining performance.
  •  DenseNet (2016) [paper]
    • Huang et al. "Densely Connected Convolutional Networks"

Generative adversarial networks

  •  GAN (2014.6) [paper]
    • Goodfellow et al. "Generative Adversarial Networks"
  •  DCGAN (2015.11) [paper]
    • Radford et al. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"
  •  InfoGAN (2016.6) [paper]
    • Chen et al. "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
  •  Improved Techniques for Training GANs (2016.6) [paper]
    • Salimans et al. "Improved Techniques for Training GANs"
    • This paper proposes several GAN training techniques, such as feature matching, minibatch discrimination, one-sided label smoothing, and virtual batch normalization.
    • It also proposes a well-known metric of generator performance, the Inception Score.
  •  f-GAN (2016.6) [paper]
    • Nowozin et al. "f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization"
  •  Unrolled GAN (2016.7) [paper]
    • Metz et al. "Unrolled Generative Adversarial Networks"
  •  ACGAN (2016.10) [paper]
    • Odena et al. "Conditional Image Synthesis With Auxiliary Classifier GANs"
  •  LSGAN (2016.11) [paper]
    • Mao et al. "Least Squares Generative Adversarial Networks"
  •  Pix2Pix (2016.11) [paper]
    • Isola et al. "Image-to-Image Translation with Conditional Adversarial Networks"
  •  EBGAN (2016.11) [paper]
    • Zhao et al. "Energy-based Generative Adversarial Network"
  •  WGAN (2017.4) [paper]
    • Arjovsky et al., "Wasserstein GAN"
  •  WGAN-GP (2017.5) [paper]
    • Gulrajani et al., "Improved Training of Wasserstein GANs"
    • Improves training stability by adding a gradient penalty (GP) term to the critic's loss function
  •  BEGAN (2017.5) [paper]
    • Berthelot et al. "BEGAN: Boundary Equilibrium Generative Adversarial Networks"
    • Introduces a diversity ratio (an equilibrium hyperparameter) that controls the trade-off between image variety and quality, and proposes a convergence measure based on it.
  •  CycleGAN (2017.5) [paper]
    •  DiscoGAN (2017.5) [paper]
    • DiscoGAN and CycleGAN propose the exact same learning technique for the style transfer task using GANs; the two were developed independently at the same time.
  •  Frechet Inception Distance (FID) (2017.6) [paper]
    • Heusel et al. "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"
    • The paper's main contribution is a technique called the Two Time-Scale Update Rule (TTUR), but it is mostly known for the Frechet Inception Distance, a metric that measures the distance between the distributions of Inception activations for real and generated images.
  •  ProGAN (2017.10) [paper]
    • Karras et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation"
  •  PacGAN (2017.12) [paper]
    • Lin et al. "PacGAN: The power of two samples in generative adversarial networks"
  •  BigGAN (2018) [paper]
  •  GauGAN (2019.3) [paper]
    • Park et al. "Semantic Image Synthesis with Spatially-Adaptive Normalization"
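
To make the FID entry above more concrete, here is a minimal sketch of the Frechet distance between two Gaussians fitted to feature vectors. It rests on loud assumptions: the real metric uses activations from an Inception network and full covariance matrices (with a matrix square root), whereas this sketch takes plain lists of numbers as "features" and assumes diagonal covariance so it can stay in the standard library. The function names `mean_and_variance` and `frechet_distance_diag` are hypothetical, not from the paper.

```python
import math

def mean_and_variance(samples):
    """Per-dimension mean and population variance of a list of feature vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = [sum((s[d] - mean[d]) ** 2 for s in samples) / n for d in range(dim)]
    return mean, var

def frechet_distance_diag(real_feats, fake_feats):
    """Squared Frechet distance between two Gaussians with DIAGONAL covariance.

    Assumption: covariances are diagonal, so the matrix square root in the
    full FID formula reduces to an element-wise sqrt of variance products.
    """
    mu_r, var_r = mean_and_variance(real_feats)
    mu_f, var_f = mean_and_variance(fake_feats)
    # ||mu_r - mu_f||^2
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_f))
    # Tr(S_r + S_f - 2 (S_r S_f)^(1/2)) for diagonal S_r, S_f
    cov_term = sum(a + b - 2.0 * math.sqrt(a * b) for a, b in zip(var_r, var_f))
    return mean_term + cov_term

# Toy "features": the fake set is the real set shifted by 3 in the first dimension.
real = [[0.0, 0.0], [2.0, 2.0]]
fake = [[3.0, 0.0], [5.0, 2.0]]
print(frechet_distance_diag(real, real))  # 0.0
print(frechet_distance_diag(real, fake))  # 9.0
```

Lower is better: identical feature statistics give 0, and the score grows as the means or variances of the two sets drift apart.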

Advanced GANs

  •  DRAGAN (2017.5) [paper]
    • Kodali et al. "On Convergence and Stability of GANs"
  •  Are GANs Created Equal? (2017.11) [paper]
    • Lucic et al. "Are GANs Created Equal? A Large-Scale Study"
  •  SGAN (2017.12) [paper]
    • Chavdarova et al. "SGAN: An Alternative Training of Generative Adversarial Networks"
  •  MaskGAN (2018.1) [paper]
    • Fedus et al. "MaskGAN: Better Text Generation via Filling in the _____"
  •  Spectral Normalization (2018.2) [paper]
    • Miyato et al. "Spectral Normalization for Generative Adversarial Networks"
  •  SAGAN (2018.5) [paper] [tensorflow]
    • Zhang et al. "Self-Attention Generative Adversarial Networks"
  •  Unusual Effectiveness of Averaging in GAN Training (2018) [paper]
    • "Benefitting from training on past snapshots."
    • Uses an exponential moving average (EMA) of the generator's parameters
  •  Disconnected Manifold Learning (2018.6) [paper]
    • Khayatkhoei, et al. "Disconnected Manifold Learning for Generative Adversarial Networks"
  •  A Note on the Inception Score (2018.6) [paper]
    • Barratt et al., "A Note on the Inception Score"
  •  Which Training Methods for GANs do actually Converge? (2018.7) [paper]
    • Mescheder et al., "Which Training Methods for GANs do actually Converge?"
  •  GAN Dissection (2018.11) [paper]
    • Bau et al. "GAN Dissection: Visualizing and Understanding Generative Adversarial Networks"
  •  Improving Generalization and Stability for GANs (2019.2) [paper]
    • Thanh-Tung et al., "Improving Generalization and Stability of Generative Adversarial Networks"
  •  Augustus Odena - "Open Questions about GANs" (2019.4) [distill.pub]
    • A very nice article on the current state of GAN research that discusses problems yet to be solved.
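
The EMA idea mentioned in the averaging entry above can be sketched in a few lines. This is only an illustration under assumptions: parameters are represented as a flat list of floats, the "training updates" are made-up numbers, and `ema_update` and the `decay` value are hypothetical names and settings rather than anything prescribed by the paper.

```python
def ema_update(avg_params, new_params, decay=0.999):
    """Blend the running average with the latest parameters.

    avg <- decay * avg + (1 - decay) * new, applied element-wise.
    """
    return [decay * a + (1.0 - decay) * p for a, p in zip(avg_params, new_params)]

# Hypothetical training loop: `params` stands in for the generator weights
# after each optimizer step; `ema_params` is what would be used for sampling.
params = [0.0, 1.0]
ema_params = list(params)
for step in range(3):
    params = [p + 0.1 for p in params]        # pretend optimizer update
    ema_params = ema_update(ema_params, params, decay=0.9)
```

The averaged copy changes more slowly than the raw weights, which is why it tends to smooth out the oscillations that GAN training produces from step to step.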

Autoencoders

  •  Original autoencoder (1986) [paper]
    • Rumelhart, Hinton, and Williams, "Learning Internal Representations by Error Propagation"
  •  AutoEncoder [science]
    • Hinton et al., "Reducing the Dimensionality of Data with Neural Networks"
  •  Denoising Autoencoders (2008) [paper]
    • Vincent et al. "Extracting and Composing Robust Features with Denoising Autoencoders"
  •  Wasserstein Autoencoder (2017) [paper]
    • Tolstikhin et al. "Wasserstein Auto Encoders"

Autoregressive models

  •  PixelCNN (2016) [paper]
    • van den Oord et al. "Conditional image generation with PixelCNN decoders."
  •  WaveNet (2016) [paper]
    • van den Oord et al. "WaveNet: A Generative Model for Raw Audio"
  •  tacotron?

Layer Normalizations

  •  Batch Normalization (2015.2) [paper]
    • Ioffe et al. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
  • Group Norm
  •  Instance Normalization (2016.7) [paper]
    • Ulyanov et al. "Instance Normalization: The Missing Ingredient for Fast Stylization"
  •  Santurkar et al. "How does Batch Normalization help Optimization?" (2018.5) [paper]
  •  Switchable Normalization (2019) [paper]
    • Luo et al. "Differentiable Learning-to-Normalize via Switchable Normalization"
  •  Weight Standardization (2019.3) [paper]
    • Qiao et al. "Weight Standardization"

Initializations

  •  Xavier Initialization (2010) [paper]
    • Glorot et al., "Understanding the difficulty of training deep feedforward neural networks"
  •  Kaiming (He) Initialization (2015.2) [paper]
    • He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"
  •  All you need is a good init (2015.11) [paper]
    • Mishkin et al., "All you need is a good init"
  •  All you need is beyond a good init (2017.4) [paper]
    • Xie et al. "All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation"

Dropouts

  • Dropout (2014) [paper]
    • Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
  • Inverted Dropouts [notes on CS231n]
    • Scales the kept activations by 1 / keep_prob during training so that their expected values stay consistent with inference (or testing), where no scaling is applied.
  •  Li et al., "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift" (2018.1) [paper]
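
A minimal sketch of inverted dropout as described above, assuming activations are a plain list of floats. The function name `inverted_dropout` and its arguments are hypothetical; real frameworks apply the same idea to tensors.

```python
import random

def inverted_dropout(activations, keep_prob, training, rng=random):
    """Drop units with probability 1 - keep_prob and rescale the survivors.

    At training time each kept activation is multiplied by 1 / keep_prob, so
    the expected value of every unit matches its value at inference time.
    At inference time the activations pass through unchanged.
    """
    if not training:
        return list(activations)
    scale = 1.0 / keep_prob
    return [a * scale if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
train_out = inverted_dropout([1.0] * 10, keep_prob=0.8, training=True, rng=rng)
test_out = inverted_dropout([1.0] * 10, keep_prob=0.8, training=False)
# train_out holds only 0.0 or 1.25; test_out is all 1.0
```

Because the rescaling happens during training, the inference path needs no dropout-specific code at all, which is the main practical appeal of the inverted form.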

Meta-Learning / Representation Learning (Zero-Shot learning, Few-Shot learning)

  •  Zero-Data Learning (2008) [paper]
    • Larochelle et al., "Zero-data Learning of New Tasks"
  •  Palatucci et al., "Zero-shot Learning with Semantic Output Codes" (NIPS 2009) [paper]
  •  Socher et al., "Zero-Shot Learning Through Cross-Modal Transfer" (2013.1) [paper]
  •  Lampert et al., "Attribute-Based Classification for Zero-Shot Visual Object Categorization" (2013.7) [paper]
  •  Dinu et al., "Improving zero-shot learning by mitigating the hubness problem" (2014.12) [paper]
  •  Romera-Paredes et al. - "An embarrassingly simple approach to zero-shot learning" (2015) [paper]
  •  Prototypical Networks (2017.3) [paper]
    • Snell et al., "Prototypical Networks for Few-shot Learning"
  •  Zero-shot learning - the Good, the Bad and the Ugly (2017.3) [paper]
    • Xian et al., "Zero-Shot Learning - The Good, the Bad and the Ugly"
  •  In defence of the Triplet Loss (2017.3) [paper]
    • Hermans et al., "In Defense of the Triplet Loss for Person Re-Identification"
  •  MAML (2017.3) [paper]
    • Finn et al., "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
  •  Triplet Loss and Online Triplet Mining in TensorFlow (2018.3) [Olivier Moindrot Blog]
  •  Few-Shot learning Survey (2019.4) [paper]
    • Wang et al. "Few-shot Learning: A Survey"

Transfer learning

  •  Survey 2018 (2018) [paper]
    • Tan et al. "A Survey on Deep Transfer Learning"

Geometric learning

  • Geometric Deep Learning (2016) [paper]
    • Bronstein et al. "Geometric deep learning: going beyond Euclidean data"

Variational Autoencoders (VAE)

  •  VQ-VAE (2017.11) [paper]
    • van den Oord et al., "Neural Discrete Representation Learning"
  •  Semi-Amortized Variational Autoencoders (2018.2) [paper]
    • Kim et al. "Semi-Amortized Variational Autoencoders"

Object detection

Semantic Segmentation

Sequential Model

  •  Seq2Seq (2014) [paper]
    • Sutskever et al. "Sequence to sequence learning with neural networks."

Neural Turing Machine

  •  Neural Turing Machines (2014) [paper]
    • Graves et al., "Neural turing machines."
  •  Pointer Networks (2015) [paper]
    • Vinyals et al., "Pointer networks."

Attention / Question-Answering

  •  NMT (Neural Machine Translation) (2014) [paper]
    • Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate"
  •  Stanford Attentive Reader (2016.6) [paper]
    • Chen et al. "A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task"
  •  BiDAF (2016.11) [paper]
    • Seo et al. "Bidirectional Attention Flow for Machine Comprehension"
  •  DrQA or Stanford Attentive Reader++ (2017.3) [paper]
    • Chen et al. "Reading Wikipedia to Answer Open-Domain Questions"
  •  Transformer (2017.8) [paper] [google ai blog]
    • Vaswani et al. "Attention is all you need"
  •  [read] Lilian Weng - "Attention? Attention!" (2018) [blog_post]
    • A nice explanation of the attention mechanism and its concepts.
  •  BERT (2018.10) [paper]
    • Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
  •  GPT-2 (2019) [paper (pdf)]
    • Radford et al. "Language Models are Unsupervised Multitask Learners"

Advanced RNNs

Model Compression

  • MobileNet (2017) (see above: Basic CNN Architectures)
  •  ShuffleNet (2017)
    • Zhang et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"

Neural Processes

  •  Neural Processes (2018) [paper]
    • Garnelo et al. "Neural Processes"
  •  Attentive Neural Processes (2019) [paper]
    • Kim et al. "Attentive Neural Processes"
  •  A Visual Exploration of Gaussian Processes (2019) [Distill.pub]
    • Not a neural process, but gives very nice intuition about Gaussian Processes. Good Read.

Self-supervised learning

Data Augmentation

  •  Shake Shake Regularization (2017.5) [paper]
    • Gastaldi, Xavier - "Shake-Shake Regularization"

Interpretation and Theory on Generalization, Overfitting, and Learning Capacity

  •  MDL (Minimum Description Length)
    • Peter Grunwald - "A tutorial introduction to the minimum description length principle" (2004) [paper]
  •  Grunwald et al., - "Shannon Information and Kolmogorov Complexity" (2010) [paper]
  •  Dauphin et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization" (2014.6) [paper]
  •  Choromanska et al. "The Loss Surfaces of Multilayer Networks" (2014.11) [paper]
    • Argues that non-convexity in NNs is not a huge problem
  •  Knowledge Distillation (2015.3) [paper]
    • Hinton et al., "Distilling the Knowledge in a Neural Network"
  •  3-Part Learning Theory by Mostafa Samir
  •  Deconvolution and Checkerboard Artifacts - Odena (2016) [distill.pub article]
  •  Keskar et al. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (2016.9) [paper]
  •  Rethinking Generalization (2016.11) [paper]
    • Zhang et al. "Understanding deep learning requires rethinking generalization"
  •  Information Bottleneck (2017) [paper] [original paper on information bottleneck (2000)] [youtube-talk] [article in quantamagazine]
    • Shwartz-Ziv and Tishby, "Opening the Black Box of Deep Neural Networks via Information"
  •  Neyshabur et al., "Exploring Generalization in Deep Learning" (2017.7) [paper]
  •  Sun et al., "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era" (2017.7) [paper]
  •  Super-Convergence (2017.8) [paper]
    • Smith et al. - "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates"
  •  Don't Decay the Learning Rate, Increase the Batch Size (2017.11) [paper]
    • Smith et al. "Don't Decay the Learning Rate, Increase the Batch Size"
  •  Hestness et al. "Deep Learning Scaling is Predictable, Empirically" (2017.12) [paper]
  •  Visualizing loss landscape of neural nets (2018) [paper]
  •  Olson et al., "Modern Neural Networks Generalize on Small Data Sets" (NeurIPS 2018) [paper]
  •  Lottery Ticket Hypothesis (2018.3) [paper]
    • Frankle et al., "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks"
    • Empirically showed that zeroing small-magnitude weights after training, rewinding the remaining weights to their initial values, and then retraining the pruned network can match or even beat the original network's results.
  •  Intrinsic Dimension (2018.4) [paper]
    • Li et al., "Measuring the Intrinsic Dimension of Objective Landscapes"
  •  Geirhos et al. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" (2018.11) [paper]
  •  Belkin et al. "Reconciling modern machine learning and the bias-variance trade-off" (2018.12) [paper]
  •  Graetz - "How to visualize convolution features in 40 lines of code" (2019) [medium]
  •  Geiger et al. "Scaling description of generalization with number of parameters in deep learning" (2019.1) [paper]
  •  Are all layers created equal? (2019.2) [paper]
    • Zhang et al. "Are all layers created equal?"
  •  Lilian Weng - "Are Deep Neural Networks Dramatically Overfitted?" (2019.4) [lil'log]
    • Excellent article about generalization and overfitting of deep neural networks
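
The prune-and-rewind step from the Lottery Ticket entry above can be sketched as follows. This is a simplified, single-round illustration under assumptions: weights are a flat list of floats, the numbers are invented, and `magnitude_mask` and `rewind` are hypothetical helper names. The actual training and retraining steps are left out.

```python
def magnitude_mask(trained_weights, prune_fraction):
    """Return a 0/1 mask that zeroes the smallest-magnitude trained weights."""
    n_prune = int(len(trained_weights) * prune_fraction)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(trained_weights)), key=lambda i: abs(trained_weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(trained_weights))]

def rewind(initial_weights, mask):
    """Reset surviving weights to their initial values; pruned ones stay at zero."""
    return [w0 * m for w0, m in zip(initial_weights, mask)]

initial = [0.5, -0.4, 0.1, 0.2]      # weights at initialization (made up)
trained = [0.05, -1.2, 0.3, -0.01]   # weights after training (made up)

mask = magnitude_mask(trained, prune_fraction=0.5)
ticket = rewind(initial, mask)
print(mask)    # [0, 1, 1, 0]
print(ticket)  # [0.0, -0.4, 0.1, 0.0]
```

The resulting `ticket` would then be retrained with the mask held fixed, so the pruned positions stay at zero throughout.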

Adversarial Attacks and Defense against attacks (RobustML)

  • RobustML site
  •  Adversarial Examples: Szegedy et al. - "Intriguing Properties of Neural Networks" (2013.12) [paper]
    • Induces misclassification by applying small perturbations to the input
    • This paper coined the term "adversarial example"
  •  Fast Gradient Sign Attack (FGSM) (2014.12)
    • Goodfellow et al., "Explaining and Harnessing Adversarial Examples" (ICLR 2015) [paper]
    • This paper presented the famous "panda example" (also seen in the PyTorch tutorial)
  •  Kurakin et al., "Adversarial Machine Learning at Scale" (2016.11) [paper]
  •  Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks" (2017.6) [paper]
  •  Carlini et al., "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text" (2018.1) [paper]
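
As a rough illustration of the small-perturbation idea behind FGSM, the sketch below perturbs the input by epsilon times the sign of the loss gradient. To stay dependency-free it assumes a tiny linear logistic classifier with labels in {-1, +1}, where the input gradient can be written by hand; the papers above work with deep networks and automatic differentiation. The names `loss_gradient_wrt_input` and `fgsm`, and all numbers, are hypothetical.

```python
import math

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def score(w, x):
    """Linear score w . x; the predicted label is its sign."""
    return sum(wi * xi for wi, xi in zip(w, x))

def loss_gradient_wrt_input(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w.x)) with respect to x."""
    margin = y * score(w, x)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def fgsm(w, x, y, epsilon):
    """Move each input coordinate by epsilon in the direction that raises the loss."""
    grad = loss_gradient_wrt_input(w, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

w = [1.0, -2.0]
x = [0.5, -0.25]                      # score = 1.0, classified as +1
x_adv = fgsm(w, x, y=1, epsilon=0.5)
print(x_adv)                          # [0.0, 0.25]
print(score(w, x_adv))                # -0.5, now classified as -1
```

Each coordinate moves by at most `epsilon`, yet the prediction flips, which is the behaviour the adversarial-example papers point out.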

Neural architecture search (NAS) and AutoML

  • GREAT AutoML Website [site]
    • They maintain a blog, a list of NAS literature, an analysis page, and a web book.
  •  AdaNet (2016.7) [paper] [GoogleAI blog]
    • Cortes et al. "AdaNet: Adaptive Structural Learning of Artificial Neural Networks"
  •  NAS (2016.12) [paper]
    • Zoph et al. "Neural Architecture Search with Reinforcement Learning"
  •  PNAS (2017.12) [paper]
    • Liu et al. "Progressive Neural Architecture Search"
  •  ENAS (2018.2) [paper]
    • Pham et al. "Efficient Neural Architecture Search via Parameter Sharing"
  •  DARTS (2018.6) [paper]
    • Liu et al. "DARTS: Differentiable Architecture Search"
    • Uses a continuous relaxation over the discrete neural architecture space.
  •  RandWire (2019) [paper]
    • Xie et al. "Exploring Randomly Wired Neural Networks for Image Recognition" [Facebook AI Research]
  •  A Survey on Neural Architecture Search (2019) [paper]
    • Wistuba et al., "A Survey on Neural Architecture Search"
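
The continuous relaxation mentioned in the DARTS entry above can be sketched as a softmax-weighted mixture of candidate operations. This is a toy under assumptions: the "operations" are scalar functions rather than convolution or pooling layers, the architecture parameters `alphas` are set by hand instead of being learned by gradient descent, and `CANDIDATE_OPS`, `mixed_op`, and `discretize` are hypothetical names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-ins for candidate operations on one edge of the search cell.
CANDIDATE_OPS = [
    lambda x: x,        # identity (skip connection)
    lambda x: 0.0,      # "zero" op (no connection)
    lambda x: 2.0 * x,  # placeholder for a learned layer
]

def mixed_op(x, alphas, ops=CANDIDATE_OPS):
    """Relaxed edge: output is the softmax(alpha)-weighted sum of all ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

def discretize(alphas):
    """After the search, keep only the op with the largest alpha."""
    return max(range(len(alphas)), key=lambda i: alphas[i])

y = mixed_op(3.0, alphas=[0.0, 0.0, 0.0])   # equal weights on all three ops
best = discretize([0.1, 2.0, -1.0])         # index 1
```

Since `mixed_op` is differentiable in `alphas`, the choice among operations can be optimized with gradients and only made discrete at the end.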

Practical Techniques

DL roadmap reference

Theory

Resources

  • A Selective Overview of Deep Learning (2019) [paper]
    • Fan et al. "A Selective Overview of Deep Learning"
    • A nice overview paper on deep learning up to early 2019 (about 30 pages)