Start
At its most fundamental level, machine learning (ML) is a subfield of computer science focused on building algorithms that rely on collections of examples to solve practical problems without being explicitly programmed. Unlike classical programming, where humans input rules to process data, ML systems are trained on data to find the statistical structure or rules themselves.
Explore ML Topics:
- Supervised Learning: Learn about classification and regression
- Performance Metrics: Evaluate your models
- Classification Metrics: Specific metrics for classification tasks
- Definitions and the Learning Problem
A widely accepted engineering definition, provided by Tom Mitchell, states that a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. To have a well-defined learning problem, one must identify these three features: the task, the measure of improvement, and the source of experience.
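Mitchell's three ingredients can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the definition: the task T is separating two synthetic 1-D Gaussian classes, the performance measure P is accuracy on held-out data, and the experience E is the number of labeled training examples. The helper names (`sample`, `fit_threshold`, `accuracy`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n_per_class):
    """Draw n_per_class points from each of two synthetic 1-D Gaussian classes."""
    x0 = rng.normal(loc=-1.0, scale=1.0, size=n_per_class)  # class 0
    x1 = rng.normal(loc=1.0, scale=1.0, size=n_per_class)   # class 1
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return x, y

def fit_threshold(x, y):
    """Learn a decision threshold: the midpoint between the two class means."""
    return (x[y == 0].mean() + x[y == 1].mean()) / 2.0

def accuracy(threshold, x, y):
    """Performance measure P: fraction of correct predictions (class 1 if x > threshold)."""
    return float(np.mean((x > threshold).astype(int) == y))

x_test, y_test = sample(5000)  # held-out data, used only to measure P

results = {}
for n in (1, 10, 1000):        # experience E: training examples per class
    x_train, y_train = sample(n)
    results[n] = accuracy(fit_threshold(x_train, y_train), x_test, y_test)
    print(f"E = {2 * n:5d} examples -> P = {results[n]:.3f}")
```

With very little experience the learned threshold can land far from the ideal boundary at 0, so P tends to be lower and more variable. With more examples it typically approaches the best accuracy this simple model can reach on this data.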
- The Four Branches of Learning
ML is typically categorized based on the type of experience or supervision the algorithm receives:
- Supervised Learning: The dataset consists of labeled examples, where each input feature vector x is associated with a desired output or "ground truth" label y. The goal is to learn a mapping function f(x) that can predict labels for new, unseen data. Common tasks include classification (predicting a category) and regression (predicting a continuous value).
- Unsupervised Learning: The dataset contains only unlabeled examples. The goal is to discover hidden patterns, such as clustering similar items together, dimensionality reduction for data compression, or density estimation.
- Semi-Supervised Learning: The dataset contains a small amount of labeled data and a much larger amount of unlabeled data. The hope is that the unlabeled data helps the algorithm find a better model than supervised learning on the labeled data alone could produce.
- Reinforcement Learning (RL): Unlike the others, RL involves an agent interacting with an environment. The agent learns to take actions that maximize a long-term reward signal through a process of trial and error.
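To illustrate the unsupervised case, where no labels y are available, here is a minimal k-means clustering sketch in NumPy. The two-blob dataset, the function name `kmeans_two_clusters`, and the farthest-point initialization are assumptions chosen to keep the example short. Practical implementations use more careful initialization and convergence checks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled examples: two well-separated synthetic 2-D blobs (no labels are kept).
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[10.0, 10.0], scale=1.0, size=(100, 2)),
])

def kmeans_two_clusters(X, n_iters=20):
    """Minimal k-means for k = 2 with a simple farthest-point initialization."""
    first = X[0]
    second = X[np.argmax(np.linalg.norm(X - first, axis=1))]
    centers = np.stack([first, second])
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: each center moves to the mean of its assigned points.
        centers = np.stack([X[labels == k].mean(axis=0) for k in range(2)])
    return centers, labels

centers, labels = kmeans_two_clusters(X)
print("cluster centers:\n", centers)
print("cluster sizes:", np.bincount(labels))
```

The algorithm never sees which blob a point came from. It recovers the grouping purely from the geometry of the inputs, which is the sense in which it "discovers hidden patterns".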
- The Building Blocks of an ML Algorithm
Nearly all machine learning algorithms can be described using a standard "recipe" that combines four independent components:
- A Dataset: A collection of many examples, often represented as a design matrix where rows are samples and columns are features.
- A Model: The computational machinery or mathematical function (e.g., a neural network or a linear equation) that transforms input data into predictions.
- An Objective/Loss Function: A mathematical way to quantify how "badly" a model is performing by comparing its predictions to reality. Minimizing this function (often the negative log-likelihood) is the goal of training.
- An Optimization Procedure: The algorithm used to adjust the model's parameters (the internal "knobs") to minimize the loss function. Stochastic Gradient Descent (SGD) and its variants are the dominant optimization algorithms for training neural networks today.
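The four components above can be sketched together in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the data is synthetic, the model is linear, the loss is mean squared error, and the learning rate, batch size, and epoch count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Dataset: design matrix X (rows = samples, columns = features) and targets y.
n_samples, n_features = 200, 3
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0, 0.5])            # synthetic "ground truth" weights
y = X @ true_w + rng.normal(scale=0.1, size=n_samples)

# 2. Model: a linear function of the inputs, parameterized by w.
def model(X, w):
    return X @ w

# 3. Loss: mean squared error between predictions and targets.
def mse_loss(w, X, y):
    return float(np.mean((model(X, w) - y) ** 2))

def mse_grad(w, X, y):
    """Gradient of the MSE loss with respect to w."""
    return 2.0 * X.T @ (model(X, w) - y) / len(y)

# 4. Optimization procedure: minibatch stochastic gradient descent.
w = np.zeros(n_features)
learning_rate, batch_size, n_epochs = 0.05, 20, 50   # arbitrary illustrative values
initial_loss = mse_loss(w, X, y)
for epoch in range(n_epochs):
    order = rng.permutation(n_samples)               # reshuffle every epoch
    for start in range(0, n_samples, batch_size):
        batch = order[start:start + batch_size]
        w -= learning_rate * mse_grad(w, X[batch], y[batch])
final_loss = mse_loss(w, X, y)

print("initial loss:", initial_loss)
print("final loss:  ", final_loss)
print("learned w:   ", w)
```

Each component can be swapped without touching the others. For example, replacing `model` with a neural network or `mse_loss` with a negative log-likelihood leaves the training loop structurally unchanged.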
- Key Fundamental Concepts
- Features and Vectors: An example is represented as a feature vector where each dimension contains a value describing a specific attribute of the object being processed.
- Parameters vs. Hyperparameters: Parameters are internal variables (like weights) modified by the learning algorithm based on training data. Hyperparameters are external settings (like the depth of a decision tree or learning rate) that are tuned by the analyst to influence model performance.
- Generalization: The central challenge in ML is not just fitting the training data but performing well on new, previously unseen inputs. This ability is called generalization.
- Overfitting and Underfitting: Overfitting occurs when a model fits the training data so well (including its noise) that it fails to generalize to new data. Underfitting occurs when the model is too simple to capture the underlying pattern in the training data.
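- The No Free Lunch Theorem: This theoretical result states that, averaged over all possible data-generating distributions, every learning algorithm has the same error rate on previously unseen examples, so no single algorithm is universally superior across all possible tasks. In practice, the choice of algorithm should be guided by the specific properties of the data distribution at hand.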
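The concepts of generalization, underfitting, overfitting, and the parameter/hyperparameter split can be seen together in a small polynomial-fitting sketch. The target function, noise level, sample sizes, and candidate degrees are all assumptions made for illustration. The polynomial degree plays the role of a hyperparameter, and the fitted coefficients are the parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Noisy samples of a smooth target function (synthetic, for illustration)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(500)     # stands in for new, previously unseen inputs

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_err, test_err = {}, {}
for degree in (1, 3, 10):                               # hyperparameter: chosen by the analyst
    coeffs = np.polyfit(x_train, y_train, deg=degree)   # parameters: fit from training data
    train_err[degree] = mse(coeffs, x_train, y_train)
    test_err[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree:2d}: train MSE = {train_err[degree]:.4f}, "
          f"test MSE = {test_err[degree]:.4f}")
```

Training error can only go down as the degree increases, because each higher-degree model contains the lower-degree ones. Test error is the quantity that reflects generalization. A degree-1 fit typically has high error on both sets (underfitting), while a degree-10 fit on 15 points often shows a large gap between training and test error (overfitting). The exact numbers depend on the random seed.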
- The Mathematical Foundations
Mastering ML requires understanding three core areas of mathematics:
- Linear Algebra: Used to represent data as vectors and matrices and to perform efficient transformations.
- Probability Theory: Provides the framework for representing and manipulating uncertainty, which is inherent in real-world data and predictions.
- Calculus: Specifically multivariate calculus and differentiation, which are used to compute gradients. Gradients tell the optimization algorithm how to change parameters to reduce error.
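As a small illustration of how these three areas meet, the sketch below computes the gradient of a mean squared error loss for a linear model in matrix form, checks it against central finite differences, and takes one step against the gradient. The data is random and the step size of 0.01 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))     # design matrix (linear algebra)
y = rng.normal(size=50)          # targets drawn at random (probability)
w = rng.normal(size=4)           # current parameters

def loss(w):
    """Mean squared error of a linear model."""
    return float(np.mean((X @ w - y) ** 2))

def analytic_grad(w):
    """Gradient from multivariate calculus: (2/n) X^T (Xw - y)."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def numerical_grad(w, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step) - loss(w - step)) / (2.0 * eps)
    return grad

g = analytic_grad(w)
print("max |analytic - numerical|:", np.max(np.abs(g - numerical_grad(w))))

# A small step against the gradient should reduce the loss.
w_new = w - 0.01 * g
print("loss before:", loss(w), "loss after:", loss(w_new))
```

Comparing an analytic gradient to a finite-difference estimate is a common sanity check when deriving gradients by hand. In larger systems, automatic differentiation libraries usually compute the gradient instead.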
- The Machine Learning Project Lifecycle
In practice, ML is an iterative cycle, not a linear process. It typically involves:
- Project Scoping: Defining goals and criteria for success.
- Data Engineering: Collecting, cleaning, and preparing raw data into a "tidy" format.
- Feature Engineering: Programmatically transforming raw data into informative features that the model can understand.
- Model Development: Selecting algorithms, training models, and tuning hyperparameters using a validation set.
- Deployment and Serving: Making the model accessible to users to generate scores or predictions.
- Monitoring and Maintenance: Continuously tracking performance in production and retraining the model as data distributions shift over time.
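The model development step can be sketched as follows: hold out a validation set, fit one model per candidate hyperparameter value on the training portion, and keep the value with the lowest validation error. The synthetic data, the choice of ridge regression, and the grid of regularization strengths are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic tidy dataset: 120 examples, 10 features, only the first 3 informative.
X = rng.normal(size=(120, 10))
y = X[:, :3] @ np.array([1.5, -2.0, 1.0]) + rng.normal(scale=1.0, size=120)

# Hold out part of the data as a validation set for hyperparameter tuning.
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]   # hypothetical grid of regularization strengths
val_errors = {lam: mse(fit_ridge(X_train, y_train, lam), X_val, y_val)
              for lam in candidates}
best_lam = min(val_errors, key=val_errors.get)

for lam, err in val_errors.items():
    print(f"lambda = {lam:6.1f}: validation MSE = {err:.4f}")
print("selected lambda:", best_lam)
```

Because the validation set is used to choose the hyperparameter, its error is an optimistic estimate of performance in production. A separate test set, or the monitoring described above, gives a more honest picture.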