| | Binary Classification | Multiclass Classification (labels are integers) | Multiclass Classification (labels are one-hot encoded) | Regression |
|---|---|---|---|---|
| Purpose | Predicting 2 categorical outcomes | Predicting >2 categorical outcomes | Predicting >2 categorical outcomes | Predicting a continuous outcome |
| Examples | Email is Spam or Not Spam; Someone has a disease or not | A breast cancer ultrasound image is normal (0), benign (1), or malignant (2); A movie is a Documentary (0), RomCom (1), Action (2), or Horror (3) | Song genre is pop (1 0 0), rap (0 1 0), or country (0 0 1); A flower pic is a daisy (1 0 0 0), rose (0 1 0 0), sunflower (0 0 1 0), or tulip (0 0 0 1) | Stock price; Hours of sleep per night |
| Loss Function | Binary Crossentropy | Sparse Categorical Crossentropy | Categorical Crossentropy | Mean Squared Error |
| Final Layer of NN | Dense(1, activation='sigmoid') | Dense(NUM_CLASSES, activation='softmax') | Dense(NUM_CLASSES, activation='softmax') | Dense(1) # Linear activation, i.e., no activation |
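For concreteness, here is a minimal Keras sketch (assuming TensorFlow/Keras; the 10-feature input shape and NUM_CLASSES = 3 are placeholder values) showing how the final layer and loss function pair up for each column of the table.

```python
# Minimal sketch: final layer + loss function for each task type.
# Assumes TensorFlow/Keras; input shape and NUM_CLASSES are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 3  # placeholder class count

# Binary classification: 1 sigmoid unit + binary crossentropy
binary_model = keras.Sequential([keras.Input(shape=(10,)),
                                 layers.Dense(16, activation='relu'),
                                 layers.Dense(1, activation='sigmoid')])
binary_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Multiclass (integer labels): softmax over NUM_CLASSES + sparse categorical crossentropy
sparse_model = keras.Sequential([keras.Input(shape=(10,)),
                                 layers.Dense(16, activation='relu'),
                                 layers.Dense(NUM_CLASSES, activation='softmax')])
sparse_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Multiclass (one-hot labels): softmax over NUM_CLASSES + categorical crossentropy
onehot_model = keras.Sequential([keras.Input(shape=(10,)),
                                 layers.Dense(16, activation='relu'),
                                 layers.Dense(NUM_CLASSES, activation='softmax')])
onehot_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Regression: single linear output (no activation) + mean squared error
regression_model = keras.Sequential([keras.Input(shape=(10,)),
                                     layers.Dense(16, activation='relu'),
                                     layers.Dense(1)])
regression_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```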
The loss function is a measure of how poorly the model is performing, i.e., the difference between the model's predictions and the actual results. A smaller loss means the model is performing better (the model has a smaller error).
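A small illustration of this (assuming TensorFlow/Keras, with made-up prediction values): the same binary crossentropy loss returns a larger value when the predictions are far from the true labels and a smaller value when they are close.

```python
# Illustration: loss shrinks as predictions move closer to the true labels.
import tensorflow as tf

y_true = tf.constant([1.0, 0.0, 1.0])
bce = tf.keras.losses.BinaryCrossentropy()

print(float(bce(y_true, tf.constant([0.6, 0.4, 0.7]))))   # poorer predictions -> larger loss (~0.46)
print(float(bce(y_true, tf.constant([0.9, 0.1, 0.95]))))  # better predictions -> smaller loss (~0.09)
```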
Nodes of a neural network (NN) generally employ an activation function. The activation function commonly used in the final (output) layer of a NN is indicated in the table above. But what about the layers before the final (output) layer? As a rule of thumb, ReLU is commonly used in Dense layers (e.g., in Multi-Layer Perceptrons) and in Conv2D layers (e.g., in Convolutional Neural Networks), while tanh is commonly used in SimpleRNN/LSTM/GRU layers (e.g., in Recurrent Neural Networks).
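The sketch below (assuming TensorFlow/Keras; all input shapes and layer sizes are placeholder values) shows this rule of thumb: ReLU in hidden Dense and Conv2D layers, tanh in a recurrent layer, and a task-appropriate activation only in the final layer.

```python
# Rule-of-thumb activations for hidden layers, with task-appropriate output layers.
from tensorflow import keras
from tensorflow.keras import layers

# MLP for binary classification: ReLU in hidden Dense layers, sigmoid output
mlp = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

# CNN for 10-class classification (integer labels): ReLU in Conv2D layers, softmax output
cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])

# RNN for regression on sequences: tanh in the LSTM layer (its default), linear output
rnn = keras.Sequential([
    keras.Input(shape=(50, 8)),
    layers.LSTM(32, activation='tanh'),
    layers.Dense(1),
])
```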