Classification
Classification is a fundamental subcategory of Supervised Learning where the goal is to automatically assign a categorical label to an unlabeled instance. In this setting, a "teacher" or data analyst provides a collection of labeled examples, each consisting of a feature vector (quantitative descriptions) and a corresponding label (the desired category). The learning algorithm uses this dataset to produce a model that can take new, unseen inputs and deduce their correct class membership.
📊 Evaluation: See Classification Metrics for performance measurement
1. Primary Categories of Classification
Classification tasks are defined by the structure and number of their possible outputs:
- Binary Classification: The simplest form, where there are only two possible classes, such as "spam" or "not_spam," often referred to as the positive and negative classes.
- Multiclass Classification: Involves three or more mutually exclusive categories, such as identifying if an image contains a "galaxy," "star," or "planet".
- Multi-label Classification: Occurs when an instance can belong to several categories at once, such as an image tagged with "nature," "concert," and "people".
- One-Class Classification: Focuses on identifying objects of a specific class among all objects, typically used for outlier or anomaly detection where examples of the "attack" or "intrusion" class are rare.
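The difference between multiclass and multi-label targets is easiest to see in how the labels are stored. The following is a minimal sketch; the `TAGS` vocabulary and the `to_indicator` helper are hypothetical names introduced only for illustration.

```python
# Hypothetical tag vocabulary, used only for illustration.
TAGS = ["nature", "concert", "people"]


def to_indicator(tags, vocabulary=TAGS):
    """Encode a set of tags as a 0/1 vector with one slot per known tag."""
    return [1 if tag in tags else 0 for tag in vocabulary]


binary_label = "spam"        # binary: one of exactly two classes
multiclass_label = "galaxy"  # multiclass: exactly one of three or more classes

# Multi-label: any subset of the vocabulary can be active at once.
multilabel_target = to_indicator({"nature", "people"})
print(multilabel_target)  # [1, 0, 1]
```

A multiclass target is a single value per instance, whereas a multi-label target is a vector in which several positions may be 1 at the same time.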
2. The Decision Boundary
The central technical goal of a classification algorithm is to establish a decision boundary (or hypersurface) that separates different classes within the input space.
- Linear Separability: If the classes can be separated exactly by a straight line or a hyperplane, the dataset is called linearly separable.
- Function of the Model: The specific form of this boundary (straight, curved, or more complex) is determined by the chosen algorithm, and how well that form matches the true class structure largely determines the model's accuracy.
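For the linear case, the decision boundary is the set of points where a weighted sum of the features plus a bias equals zero, and classification amounts to checking which side of that hyperplane a point falls on. A minimal sketch follows; the weights, bias, and the `linear_decision` name are assumptions chosen for illustration, not learned values.

```python
def linear_decision(x, weights, bias):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical 2-D boundary: the line x1 + x2 = 1.
weights, bias = [1.0, 1.0], -1.0

print(linear_decision([2.0, 2.0], weights, bias))  # 1  (above the line)
print(linear_decision([0.0, 0.0], weights, bias))  # -1 (below the line)
```

Points that land exactly on the boundary are assigned to the positive side here; that tie-breaking rule is an arbitrary choice in this sketch.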
3. Core Classification Algorithms
Algorithms are often categorized by how they approach the learning task:
- Linear Models:
  - Logistic Regression: Despite its name, this is a classification algorithm that models the probability of class membership using a continuous function (the sigmoid function) whose output is between 0 and 1.
  - Support Vector Machines (SVM): A model-based algorithm that views feature vectors as points in a high-dimensional space and attempts to draw a hyperplane that separates classes with the largest possible margin.
- Instance-Based / Non-Parametric:
  - Decision Trees: A flowchart-like structure that makes decisions by examining specific features at branching nodes until a "leaf" node representing a class is reached.
  - k-Nearest Neighbors (kNN): An instance-based or "lazy" learner that does not build a formal model but instead keeps all training examples in memory and assigns a label to each new data point based on the majority vote of its k closest neighbors.
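Two of the ideas above are compact enough to sketch directly: the sigmoid that logistic regression uses to turn a linear score into a probability, and the majority vote used by kNN. This is an illustrative sketch only; the function names, the toy training set, and the choice of Euclidean distance are assumptions, and the weights are not learned here.

```python
import math
from collections import Counter


def sigmoid(z):
    """Map any real number into (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_probability(x, weights, bias):
    """Probability of the positive class under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_vector, label) pairs; Euclidean distance
    is assumed. Ties in the vote go to the label encountered first.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration.
train = [([0, 0], "a"), ([0, 1], "a"), ([5, 5], "b"), ([6, 5], "b"), ([5, 6], "b")]

print(logistic_probability([0.0, 0.0], [1.0, 1.0], 0.0))  # 0.5
print(knn_predict(train, [0.2, 0.1], k=3))                # a
```

Note that `knn_predict` does no training at all: every prediction scans the stored examples, which is what "lazy" learning refers to.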
4. Handling Multiclass Problems
Many algorithms, such as the standard SVM, are naturally binary. To solve multiclass problems with these tools, two primary strategies are used:
- One-versus-Rest (OvR): Also called One-versus-All (OvA), this involves building C binary classifiers (where C is the number of classes), each trained to distinguish one specific class from all the others.
- One-versus-One (OvO): Involves training a binary classifier for every possible pair of classes, which gives C(C-1)/2 classifiers in total; the final class is typically decided by a majority vote.
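The two strategies can be sketched with stand-in binary classifiers. In the example below, the 1-D class `centers` and the distance-based scoring rules are hypothetical placeholders for real trained classifiers; only the way the binary decisions are combined is the point.

```python
from collections import Counter
from itertools import combinations


def ovr_predict(x, scorers):
    """One-versus-Rest: pick the class whose 'this class vs. rest' score is highest."""
    return max(scorers, key=lambda c: scorers[c](x))


def ovo_predict(x, pairwise):
    """One-versus-One: each pairwise classifier votes; the most-voted class wins."""
    votes = Counter(clf(x) for clf in pairwise.values())
    return votes.most_common(1)[0][0]


# Hypothetical 1-D class prototypes standing in for trained models.
centers = {"galaxy": 0.0, "star": 5.0, "planet": 10.0}

# OvR: C scorers, one per class (closer to the class center = higher score).
scorers = {c: (lambda x, m=m: -abs(x - m)) for c, m in centers.items()}

# OvO: C*(C-1)/2 classifiers, one per unordered pair of classes.
pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centers[a]) <= abs(x - centers[b]) else b)
    for a, b in combinations(centers, 2)
}

print(len(scorers), len(pairwise))  # 3 3
print(ovr_predict(4.0, scorers))    # star
print(ovo_predict(4.0, pairwise))   # star
```

With three classes both strategies need three classifiers, but the OvO count grows quadratically: ten classes would need 45 pairwise classifiers versus 10 for OvR.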
5. Performance Evaluation
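Evaluating a classification model is often more involved than evaluating a regression model because simple accuracy can be misleading, particularly on imbalanced datasets. For example, if only 5% of instances are positive, a model that always predicts the negative class reaches 95% accuracy while finding no positives at all.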
- Confusion Matrix: A table that compares predictions with true labels by counting True Positives (TP), True Negatives (TN), False Positives (FP, Type I errors), and False Negatives (FN, Type II errors).
- Precision and Recall: Precision is the fraction of positive predictions that are correct, TP / (TP + FP), while Recall (Sensitivity) is the fraction of actual positive cases the model finds, TP / (TP + FN).
- F1-Score: The harmonic mean of precision and recall, providing a single number that balances the trade-off between the two.
- Area Under the ROC Curve (AUC): Measures the model's performance across all possible decision thresholds, where a value of 1.0 represents a perfect classifier and 0.5 corresponds to random guessing.
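The metrics above can be computed from scratch in a few lines. This is a minimal sketch for the binary case, not a replacement for a tested library; the function names are hypothetical, returning 0.0 when a denominator is zero is an assumed convention, and the AUC is computed through its pairwise-ranking interpretation (the probability that a random positive is scored above a random negative).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; 0.0 is returned where a ratio is undefined (assumed convention)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Imbalanced toy data: 5 positives, 95 negatives, and a model that always predicts 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(accuracy)                         # high accuracy ...
print(precision_recall_f1(tp, fp, fn))  # ... but precision, recall and F1 are all 0.0
```

The always-negative model looks strong on accuracy alone, while recall exposes that it never finds a positive case. `roc_auc` requires at least one positive and one negative example, and its pairwise loop is quadratic, so it is only suitable for small illustrations.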