Cheat Sheet for different types of Deep Learning problems

The table below summarizes some common problem types that deep learning can be used to address.

|  | Binary Classification | Multiclass Classification (labels are integers) | Multiclass Classification (labels are one-hot encoded) | Regression |
| --- | --- | --- | --- | --- |
| Purpose | Predicting one of 2 categorical outcomes | Predicting one of >2 categorical outcomes | Predicting one of >2 categorical outcomes | Predicting a continuous outcome |
| Examples | Email is spam or not spam; a patient has a disease or not | A breast cancer ultrasound image is normal (0), benign (1), or malignant (2); a movie is a documentary (0), rom-com (1), action (2), or horror (3) | A song's genre is pop (1 0 0), rap (0 1 0), or country (0 0 1); a flower photo is a daisy (1 0 0 0), rose (0 1 0 0), sunflower (0 0 1 0), or tulip (0 0 0 1) | Stock price; hours of sleep per night |
| Loss Function | Binary Crossentropy | Sparse Categorical Crossentropy | Categorical Crossentropy | Mean Squared Error |
| Final Layer of NN | Dense(1, activation='sigmoid') | Dense(NUM_CLASSES, activation='softmax') | Dense(NUM_CLASSES, activation='softmax') | Dense(1) (linear, i.e., no activation) |
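
As a minimal sketch of how the final layer and loss function pair up (assuming TensorFlow/Keras; NUM_FEATURES and NUM_CLASSES are placeholder values, not from the original):

```python
# Minimal sketch assuming TensorFlow/Keras; NUM_FEATURES and NUM_CLASSES are hypothetical placeholders.
import tensorflow as tf
from tensorflow.keras import layers

NUM_FEATURES = 20   # hypothetical number of input features
NUM_CLASSES = 4     # hypothetical number of classes

def build(final_layer, loss):
    """Build a small MLP whose output layer and loss match one column of the table."""
    model = tf.keras.Sequential([
        layers.Input(shape=(NUM_FEATURES,)),
        layers.Dense(64, activation='relu'),
        final_layer,
    ])
    model.compile(optimizer='adam', loss=loss)
    return model

binary_model     = build(layers.Dense(1, activation='sigmoid'),           'binary_crossentropy')
sparse_model     = build(layers.Dense(NUM_CLASSES, activation='softmax'), 'sparse_categorical_crossentropy')
one_hot_model    = build(layers.Dense(NUM_CLASSES, activation='softmax'), 'categorical_crossentropy')
regression_model = build(layers.Dense(1),                                 'mse')  # linear output, no activation
```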

Loss Functions

The loss function measures how poorly the model is performing, i.e., how far the model's predictions are from the actual values.
A smaller loss means the model is performing better (it has a smaller error).
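
For intuition, here is a rough sketch in plain NumPy (not any framework's implementation) of two losses from the table, showing that a confident correct prediction gives a small loss and a confident wrong prediction gives a large one:

```python
# Illustrative sketch of two losses from the table, written in plain NumPy.
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-likelihood of the true label under the predicted probability."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mean_squared_error(y_true, y_pred):
    """Average squared difference between predictions and actual values."""
    return np.mean((y_true - y_pred) ** 2)

print(binary_crossentropy(np.array([1.0]), np.array([0.95])))  # ~0.05 (good prediction, small loss)
print(binary_crossentropy(np.array([1.0]), np.array([0.05])))  # ~3.0  (bad prediction, large loss)
```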

Activation Functions

Nodes of a neural network (NN) generally employ an activation function. The activation function commonly used in the final (output) layer of a NN is indicated in the table above. For the hidden layers before the output, a good rule of thumb is: ReLU for Dense layers (e.g., in Multi-Layer Perceptrons), ReLU for Conv2D layers (e.g., in Convolutional Neural Networks), and tanh for SimpleRNN/LSTM/GRU layers (e.g., in Recurrent Neural Networks).
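
A hedged sketch of those rules of thumb (assuming TensorFlow/Keras; the input shapes and layer sizes below are illustrative only, and the output layers follow the table above):

```python
# Sketch of typical hidden-layer activations; shapes and layer sizes are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers

# MLP: ReLU in hidden Dense layers, sigmoid output for binary classification.
mlp = tf.keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

# CNN: ReLU in Conv2D layers, softmax output for multiclass classification.
cnn = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])

# RNN: LSTM layers use tanh by default; linear (no activation) output for regression.
rnn = tf.keras.Sequential([
    layers.Input(shape=(30, 8)),
    layers.LSTM(32),   # default activation is tanh
    layers.Dense(1),
])
```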