
Machine Learning


For the journal, see Machine Learning (journal).

• "Statistical learning" redirects here. For statistical learning in linguistics, see Statistical learning in language acquisition.

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and analysis of statistical algorithms that can learn from data and generalise to new, unseen data, and thus perform tasks without explicit instructions.[1] Within this field, the subdiscipline of deep learning has recently surpassed many earlier machine learning approaches in performance; its models, neural networks, are a specific class of statistical algorithms.

ML finds application in many fields, including natural language processing, computer vision, speech recognition, email filtering, agriculture, and medicine. The application of ML to business problems is known as predictive analytics.

At its core, machine learning is built upon the bedrock of statistics and mathematical optimisation (also known as mathematical programming). It shares a significant overlap with data mining, a related field of study that focuses on exploratory data analysis (EDA) through the lens of unsupervised learning.[3][4]

From a theoretical viewpoint, the framework of probably approximately correct (PAC) learning provides a mathematical and statistical language for describing machine learning. Within this framework, most conventional machine learning and deep learning algorithms can be characterised as performing empirical risk minimisation.
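
As a brief illustration of this last point, empirical risk minimisation can be stated in the usual textbook notation (the symbols below are ours, not taken from this article): given n training examples (x_i, y_i), a hypothesis class F, and a loss function L, the learner selects

```latex
\hat{f} \;=\; \underset{f \in \mathcal{F}}{\arg\min} \;\; \frac{1}{n} \sum_{i=1}^{n} L\!\left( f(x_i),\, y_i \right)
```

that is, the model minimising the average loss on the training sample; generalisation then concerns how close this empirical average is to the expected loss on unseen data.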

History

• See also: Timeline of machine learning

The term machine learning was coined in 1959 by Arthur Samuel, an IBM employee and a pioneer in the nascent fields of computer gaming and artificial intelligence.[5][6] The synonym self-teaching computers was also used in this period.[7][8]

The earliest tangible machine learning program emerged in the 1950s, when Arthur Samuel developed a computer program that learned to play checkers by calculating the probability of winning for each side. However, the intellectual roots of machine learning stretch back further, intertwined with the long-standing human ambition to understand and replicate our own cognitive processes.[9] In 1949, the Canadian psychologist Donald Hebb published The Organization of Behavior, in which he proposed a theoretical neural structure that is strengthened through interactions between nerve cells.[10] This simple but influential concept, often summarized as "neurons that fire together, wire together," laid a conceptual groundwork for how AIs and machine learning algorithms would eventually function, using interconnected nodes, or artificial neurons, to process and transmit data.[9] Other researchers, driven by a similar interest in human cognitive systems, also made foundational contributions. The logician Walter Pitts and the neurophysiologist Warren McCulloch proposed some of the first mathematical models of neural networks, attempting to create algorithms that mirrored human thought processes.[9]

By the early 1960s, this theoretical work began to manifest in physical hardware. An experimental "learning machine" called Cybertron, which relied on punched tape for memory, was developed by the Raytheon Company. It was designed to analyze complex patterns in sonar signals, electrocardiograms, and speech using a primitive form of reinforcement learning. A human operator would repetitively "train" it to recognize patterns, and crucially, it was equipped with a "goof" button that, when pressed, would force the machine to re-evaluate its incorrect decisions.[11] A representative text from this era was Nils Nilsson's 1965 book, Learning Machines, which primarily dealt with machine learning for pattern classification.[12] Interest in pattern recognition persisted into the 1970s, as chronicled by Duda and Hart in 1973.[13] By 1981, a report detailed the use of teaching strategies to train an artificial neural network to recognize a set of 40 characters (26 letters, 10 digits, and 4 special symbols) from a computer terminal.[14]

Tom M. Mitchell later provided a more formal, and now widely cited, definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."[15] Rather than defining the field in cognitive terms, this is a fundamentally operational definition. It echoes Alan Turing's proposal in his seminal paper "Computing Machinery and Intelligence", which suggested replacing the question "Can machines think?" with the more practical one, "Can machines do what we (as thinking entities) can do?".[16]

In the modern era, machine learning algorithms are generally categorized into three main types: supervised learning, unsupervised learning, and reinforcement learning algorithms;[17] a brief illustrative sketch of the first two categories follows the list below.

  • Contemporary supervised learning algorithms are primarily focused on the objectives of classification and regression.
  • Contemporary unsupervised learning algorithms are directed towards objectives like clustering, dimensionality reduction, and association rule discovery.
  • Contemporary reinforcement learning algorithms concentrate on sequential decision-making problems where outcomes are uncertain, and are typically divided into model-based and model-free methods.
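
The following is a minimal sketch of the supervised and unsupervised settings, assuming scikit-learn is available; the dataset, models, and hyperparameters are illustrative choices, not prescribed by the taxonomy above.

```python
# Illustrative sketch only: supervised classification vs. unsupervised clustering.
# Assumes scikit-learn is installed; dataset and model choices are arbitrary.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised learning: fit a classifier on labelled examples, evaluate on held-out data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: group the same inputs without using the labels at all.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments for first five samples:", kmeans.labels_[:5])
```

The essential difference is visible in the two fit calls: the classifier is trained on labelled pairs, while the clustering algorithm sees only the inputs.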

The field saw significant breakthroughs in the 2010s. In 2014, Ian Goodfellow and his colleagues introduced generative adversarial networks (GANs), which enabled realistic data synthesis.[18] In 2016, AlphaGo, using advanced reinforcement learning techniques, defeated top human Go players, a feat previously thought to be decades away.[19]

Relationships to other fields

Artificial intelligence

In terms of the hierarchy of fields, deep learning is a specialized subset of machine learning, which in turn is a significant, and currently the dominant, subset of artificial intelligence.[20]

As a scientific pursuit, machine learning emerged from the grand quest for artificial intelligence (AI). During the early days of AI as an academic discipline, a faction of researchers was intrigued by the idea of machines learning from data. Their initial attempts involved a variety of symbolic methods alongside what they termed "neural networks"; these early networks were mostly perceptrons and other models that, in hindsight, were essentially reinventions of the generalised linear models already well-known in statistics.[21] Probabilistic reasoning was another tool in their arsenal, finding particular use in areas like automated medical diagnosis.[22]: 488 

However, a growing emphasis on a logical, knowledge-based approach caused a rift between mainstream AI and the fledgling field of machine learning. Probabilistic systems were beset by both theoretical and practical hurdles related to data acquisition and representation.[22]: 488  By 1980, expert systems had become the dominant paradigm in AI, and statistics had fallen decidedly out of favor.[23] While work on symbolic/knowledge-based learning continued within AI, giving rise to fields like inductive logic programming (ILP), the more statistical line of inquiry moved outside AI proper, into the adjacent fields of pattern recognition and information retrieval.[22]: 708–710, 755  Neural networks research suffered a similar fate, abandoned by both AI and computer science around the same time. This approach, however, was kept alive outside the mainstream by a dedicated group of researchers from other disciplines, including John Hopfield, David Rumelhart, and Geoffrey Hinton, under the banner of "connectionism". Their persistence paid off in the mid-1980s with the reinvention of backpropagation, an algorithm that made training multi-layer networks practical.[22]: 25

Machine learning (ML) eventually regrouped, was recognized as its own distinct field, and began to flourish in the 1990s. The field pragmatically shifted its ambitions away from achieving human-level artificial intelligence and toward tackling solvable, practical problems. It distanced itself from the symbolic approaches inherited from its AI parentage, instead embracing methods and models borrowed from statistics, fuzzy logic, and probability theory.[23]

Data compression

• This section is an excerpt from Data compression § Machine learning.

There is a close connection between machine learning and compression. A system capable of accurately predicting the posterior probabilities of a sequence, given its entire history, can be leveraged for optimal data compression (by applying arithmetic coding to the output distribution). Conversely, an optimal compressor can serve as a predictor (by identifying the symbol that yields the best compression, given the preceding history). This equivalence has been used to justify data compression as a benchmark for "general intelligence".[24][25][26]
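
A small sketch of the prediction-to-compression direction of this equivalence (our own illustration, with a simple adaptive order-0 frequency model standing in for a real predictor): an ideal entropy coder such as an arithmetic coder spends about -log2 p(symbol | history) bits per symbol, so summing these quantities gives the compressed size the predictor could achieve.

```python
# Illustrative sketch: a better predictor implies a smaller ideal compressed size.
# The "model" is a Laplace-smoothed order-0 symbol counter, standing in for any
# sequence predictor; an arithmetic coder would realise roughly these code lengths.
import math
from collections import Counter

def ideal_compressed_bits(data: bytes) -> float:
    counts = Counter({b: 1 for b in range(256)})  # one pseudo-count per byte value
    total = 256
    bits = 0.0
    for symbol in data:
        p = counts[symbol] / total   # predictive probability before seeing the symbol
        bits += -math.log2(p)        # ideal code length for this symbol
        counts[symbol] += 1          # update the model with the observed symbol
        total += 1
    return bits

text = b"abababababababab" * 16
print(ideal_compressed_bits(text), "bits vs", 8 * len(text), "raw bits")
```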

An alternative perspective reveals that compression algorithms implicitly map data strings into abstract feature space vectors, and similarity measures based on compression are, in effect, calculating similarity within these hidden feature spaces. For any given compressor C(.), one can define an associated vector space ℵ, where C(.) maps an input string x to a vector norm ||~x||. While a comprehensive analysis of the feature spaces for all compression algorithms is impractical, an examination of three representative lossless methods—LZW, LZ77, and PPM—can provide insight into this relationship.[27]
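
One widely used compression-based similarity measure in this spirit (not named in the excerpt above) is the normalized compression distance. A rough sketch, using zlib as the compressor C; smaller values indicate greater shared structure between the two strings:

```python
# Rough sketch of a compression-based similarity measure (normalized compression
# distance), with zlib standing in for the compressor C.
import zlib

def C(x: bytes) -> int:
    return len(zlib.compress(x, 9))  # compressed length as a stand-in for C(x)

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
c = b"completely unrelated sequence of characters 12345 " * 20
print(ncd(a, b))  # relatively small: the strings share most of their structure
print(ncd(a, c))  # larger: little shared structure for the compressor to exploit
```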

According to AIXI theory, a concept more directly explored in the Hutter Prize, the ultimate compression of a piece of data x is the smallest possible software program that can generate x. For instance, in this model, the compressed size of a zip file must include both the file itself and the software required to unzip it, as one is useless without the other. The true challenge lies in finding an even more compact combined form.

Examples of AI-powered audio/video compression software include NVIDIA Maxine and AIVC.[28] Software capable of AI-driven image compression includes libraries like OpenCV and TensorFlow, as well as tools like MATLAB's Image Processing Toolbox (IPT) and High-Fidelity Generative Image Compression.[29]

In the realm of unsupervised machine learning, k-means clustering can be utilized to compress data by grouping similar data points into clusters. This technique is useful for simplifying large datasets that lack predefined labels and is widely applied in fields such as image compression.[30] Data compression's primary goal is to reduce file size, which improves storage efficiency and accelerates data transmission. K-means clustering, an unsupervised machine learning algorithm, partitions a dataset into a specified number of clusters, k, with each cluster represented by the centroid of its points. This process effectively condenses vast datasets into a more manageable set of representative points. This is particularly valuable in image and signal processing, where k-means clustering can achieve significant data reduction by replacing groups of data points with their centroids, thus preserving the essential information of the original data while dramatically reducing the storage space required.[31]
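
A short sketch of the colour-quantisation use case described above, assuming NumPy and scikit-learn; the image and the value of k are placeholders.

```python
# Illustrative colour quantisation: compress an RGB image by replacing every pixel
# with the centroid of its k-means cluster. Assumes NumPy and scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

def quantize(image: np.ndarray, k: int = 16) -> np.ndarray:
    """image: (H, W, 3) uint8 array; returns an image that uses only k colours."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Each pixel can now be stored as an index of about log2(k) bits,
    # plus a small table of k centroid colours.
    quantized = km.cluster_centers_[km.labels_].astype(np.uint8)
    return quantized.reshape(h, w, 3)

# Usage with random data standing in for a real image:
fake_image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(quantize(fake_image, k=8).shape)
```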

Large language models (LLMs) have also proven to be surprisingly effective lossless data compressors for certain datasets, as demonstrated by DeepMind's research with their Chinchilla 70B model. Developed by DeepMind, Chinchilla 70B managed to compress data more effectively than conventional methods like Portable Network Graphics (PNG) for images and Free Lossless Audio Codec (FLAC) for audio, achieving compression down to 43.4% and 16.4% of their original sizes, respectively. A note of caution is warranted, however, as there is some concern that the testing dataset may have overlapped with the LLM's training data, which would mean the model is merely an efficient compression tool for data it has already memorized.[32][33]

Data mining

Machine learning and data mining frequently use the same methods and their territories overlap significantly. The key distinction lies in their intent: machine learning focuses on prediction, building models based on known properties learned from training data, while data mining focuses on the discovery of previously unknown properties within the data (this is the analysis step of the broader process known as knowledge discovery in databases). Data mining employs many machine learning methods, but its goals are different. Conversely, machine learning often uses data mining methods, such as "unsupervised learning," as a preliminary step to preprocess data and improve a learner's accuracy. Much of the confusion between these two research communities (which often maintain separate conferences and journals, with ECML PKDD being a notable exception) stems from their foundational assumptions: in machine learning, performance is typically judged on the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the primary objective is to uncover previously unknown knowledge. When evaluated against known knowledge, an uninformed (unsupervised) method will almost always be outperformed by a supervised one; however, in a typical KDD task, supervised methods are often unusable due to the lack of available training data.[citation needed]

Machine learning also has close ties to optimisation: many learning problems are framed as the minimisation of some loss function over a training set of examples. Loss functions quantify the discrepancy between the predictions of the model being trained and the actual ground truth (for instance, in classification, the goal is to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples).[34]
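
As a small worked example of such a loss function (our illustration, not drawn from this article): the cross-entropy loss commonly used for classification penalises a model according to the probability it assigned to the true label.

```python
# Illustrative cross-entropy (log) loss for a single classification example:
# it is small when the model assigns high probability to the true label and
# grows without bound as that probability approaches zero.
import math

def cross_entropy(predicted_probs: list[float], true_label: int) -> float:
    return -math.log(predicted_probs[true_label])

print(cross_entropy([0.1, 0.8, 0.1], true_label=1))  # confident and correct: ~0.22
print(cross_entropy([0.7, 0.2, 0.1], true_label=1))  # mostly wrong: ~1.61
```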

Generalization

Characterizing the generalization capabilities of various learning algorithms, especially for the opaque and complex models found in deep learning, remains an active and critical area of current research.

Statistics

Machine learning and statistics are closely related fields, often employing identical methods, but they are distinguished by their principal goals: statistics is primarily concerned with drawing population inferences from a given sample, whereas machine learning focuses on finding generalisable predictive patterns.[35]

Conventional statistical analysis demands the a priori selection of a model deemed most appropriate for the dataset. Furthermore, it typically includes only variables that are considered statistically significant or theoretically relevant based on prior knowledge. In stark contrast, machine learning is not built upon a pre-specified model; instead, the data itself shapes the model by revealing underlying patterns. The more variables (inputs) used to train the model, the more accurate the final model can potentially become—though this also increases the risk of finding spurious correlations.[36]

The statistician Leo Breiman famously distinguished between two statistical modeling paradigms: the data modeling culture and the algorithmic modeling culture, where the latter corresponds roughly to machine learning methods such as random forests.