| | Binary Classification | Multiclass Classification (labels are integers) | Multiclass Classification (labels are one-hot encoded) | Regression |
|---|---|---|---|---|
| Purpose | Predicting 2 categorical outcomes | Predicting >2 categorical outcomes | Predicting >2 categorical outcomes | Predicting a continuous outcome |
| Examples | Email is Spam or Not Spam; Someone has a disease or not | A breast cancer ultrasound image is normal (0), benign (1), or malignant (2); A movie is a Documentary (0), RomCom (1), Action (2), or Horror (3) | Song genre is pop (1 0 0), rap (0 1 0), or country (0 0 1); A flower pic is a daisy (1 0 0 0), rose (0 1 0 0), sunflower (0 0 1 0), or tulip (0 0 0 1) | Stock price; Hours of sleep per night |
| Loss Function | Binary Crossentropy | Sparse Categorical Crossentropy | Categorical Crossentropy | Mean Squared Error |
| Final Layer of NN | Dense(1, activation='sigmoid') | Dense(NUM_CLASSES, activation='softmax') | Dense(NUM_CLASSES, activation='softmax') | Dense(1) # Linear activation, i.e., no activation |
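For concreteness, here is a minimal Keras sketch (assuming TensorFlow/Keras; the 10-feature input shape and NUM_CLASSES = 3 are placeholder values) showing how the final layer and loss function pair up for each column of the table.

```python
# Minimal sketch: final layer + loss function for each task type.
# Assumes TensorFlow/Keras; input shape and NUM_CLASSES are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 3  # placeholder class count

# Binary classification: 1 sigmoid unit + binary crossentropy
binary_model = keras.Sequential([keras.Input(shape=(10,)),
                                 layers.Dense(16, activation='relu'),
                                 layers.Dense(1, activation='sigmoid')])
binary_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Multiclass (integer labels): softmax over NUM_CLASSES + sparse categorical crossentropy
sparse_model = keras.Sequential([keras.Input(shape=(10,)),
                                 layers.Dense(16, activation='relu'),
                                 layers.Dense(NUM_CLASSES, activation='softmax')])
sparse_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Multiclass (one-hot labels): softmax over NUM_CLASSES + categorical crossentropy
onehot_model = keras.Sequential([keras.Input(shape=(10,)),
                                 layers.Dense(16, activation='relu'),
                                 layers.Dense(NUM_CLASSES, activation='softmax')])
onehot_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Regression: single linear output (no activation) + mean squared error
regression_model = keras.Sequential([keras.Input(shape=(10,)),
                                     layers.Dense(16, activation='relu'),
                                     layers.Dense(1)])
regression_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```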
The loss function is a measure of how poorly the model is performing, i.e., the difference between the model's predictions and the actual results. A smaller loss means the model is performing better (the model has a smaller error).
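A small illustration of this (assuming TensorFlow/Keras, with made-up prediction values): the same binary crossentropy loss returns a larger value when the predictions are far from the true labels and a smaller value when they are close.

```python
# Illustration: loss shrinks as predictions move closer to the true labels.
import tensorflow as tf

y_true = tf.constant([1.0, 0.0, 1.0])
bce = tf.keras.losses.BinaryCrossentropy()

print(float(bce(y_true, tf.constant([0.6, 0.4, 0.7]))))   # poorer predictions -> larger loss (~0.46)
print(float(bce(y_true, tf.constant([0.9, 0.1, 0.95]))))  # better predictions -> smaller loss (~0.09)
```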
Nodes of a neural network (NN) generally employ an activation function. The activation function commonly used in the final (output) layer of a NN is indicated in the table above. But what about the layers before the final (output) layer? As a rule of thumb, ReLU is commonly used in Dense layers (e.g., in Multi-Layer Perceptrons) and in Conv2D layers (e.g., in Convolutional Neural Networks), while tanh is commonly used in SimpleRNN/LSTM/GRU layers (e.g., in Recurrent Neural Networks).
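The sketch below (assuming TensorFlow/Keras; all input shapes and layer sizes are placeholder values) shows this rule of thumb: ReLU in hidden Dense and Conv2D layers, tanh in a recurrent layer, and a task-appropriate activation only in the final layer.

```python
# Rule-of-thumb activations for hidden layers, with task-appropriate output layers.
from tensorflow import keras
from tensorflow.keras import layers

# MLP for binary classification: ReLU in hidden Dense layers, sigmoid output
mlp = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

# CNN for 10-class classification (integer labels): ReLU in Conv2D layers, softmax output
cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])

# RNN for regression on sequences: tanh in the LSTM layer (its default), linear output
rnn = keras.Sequential([
    keras.Input(shape=(50, 8)),
    layers.LSTM(32, activation='tanh'),
    layers.Dense(1),
])
```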