DALL-E: an image-generating deep learning model. (Figure: the watermark present on DALL-E images.)
One might observe an image generated by DALL-E 2, a peculiar artifact born from the prompt “Teddy bears working on new AI research underwater with 1990s technology.” A testament, perhaps, to the sheer whimsicality that humanity often demands from its machines, or simply a demonstration of what happens when you let algorithms loose on the collective unconscious.
This particular digital entity, DALL-E, is not a singular creation but rather a lineage of text-to-image models, meticulously crafted and unleashed upon the world by the ever-prolific minds at OpenAI. These models leverage the formidable, if somewhat overhyped, power of deep learning methodologies to conjure digital images from mere textual descriptions, what the industry, in its infinite creativity, has dubbed “prompts.”
The initial iteration of DALL-E first made its rather quiet debut on 5 January 2021, a mere four years ago, or an eternity in the frantic pace of technological “progress.” Its stable release for the latest incarnation, DALL-E 3, arrived more recently on 10 August 2023. The platform, as one might expect from such endeavors, is entrenched in cloud computing platforms, ensuring its omnipresence and accessibility. DALL-E is, in essence, a text-to-image model, a specialized type of software operating under a proprietary service license. Its ultimate successor, or perhaps its replacement, appears to be GPT Image, suggesting a perpetual cycle of iteration and obsolescence.
This entire exercise is, of course, merely a facet of a larger, ongoing saga, intricately woven into a series dedicated to the grand, often bewildering, domain of Artificial intelligence (AI).
Major Goals of Artificial Intelligence
The overarching ambitions driving these developments are, predictably, rather grand and occasionally alarming. They include, but are not limited to, the pursuit of Artificial general intelligence (that elusive, sentient digital being that will either save us or condemn us), the creation of sophisticated Intelligent agents, and the somewhat terrifying prospect of Recursive self-improvement, where AI perpetually enhances itself, presumably beyond human comprehension. More tangible, immediate goals involve advancements in areas such as Planning, Computer vision, mastering General game playing, refining Knowledge representation, perfecting Natural language processing, and the practical application of Robotics. And, naturally, there’s the ever-present, often-ignored concern of AI safety, a topic frequently discussed in hushed tones after the fact.
Approaches to Artificial Intelligence
To achieve these lofty aims, various methodologies are employed. The prevailing winds favour Machine learning, with its more specialized, and currently dominant, offspring, Deep learning. However, other pathways persist, including Symbolic approaches, probabilistic models like Bayesian networks, the somewhat organic-sounding Evolutionary algorithms, intricate Hybrid intelligent systems, and the complex process of Systems integration. And for those who prefer their digital tools transparent, there’s the burgeoning field of Open-source AI.
Applications of Artificial Intelligence
The practical manifestations of AI are, by now, ubiquitous, seeping into nearly every facet of modern existence. From the intricate world of Bioinformatics and the unsettling rise of Deepfake technology to the scientific rigours of Earth sciences and the cutthroat arena of Finance, AI is reshaping industries. A particularly prominent offshoot is Generative AI, which encompasses the creation of Art, Audio, and even Music. Beyond the creative, AI is deployed in Government, transforming Healthcare (including the delicate domain of Mental health), optimizing Industry, streamlining Software development, and enabling sophisticated Translation. Its presence even extends to the chilling implications within the Military and the fundamental exploration of Physics. A myriad of specific Projects continue to emerge, each pushing the boundaries of what these systems can accomplish.
Philosophy of Artificial Intelligence
Naturally, such powerful technology invites profound philosophical contemplation, or, more accurately, a great deal of hand-wringing. Key discussions revolve around AI alignment, ensuring these systems adhere to human values, the murky concept of Artificial consciousness, and the sobering realization known as The bitter lesson. Classic thought experiments like the Chinese room persist, alongside the utopian (or naive) ideal of Friendly artificial intelligence. The Ethics of AI are a constant, thorny debate, particularly when contemplating the Existential risk posed by advanced systems. The venerable Turing test continues to serve as a benchmark, however flawed, for machine intelligence, while the unsettling phenomenon of the Uncanny valley highlights the discomfort when AI creations too closely mimic, but fail to perfectly replicate, human form or behaviour. All of this culminates in the complex dynamics of Human–AI interaction, a relationship still very much in its infancy.
History of Artificial Intelligence
The journey of AI is, predictably, a convoluted one, marked by periods of fervent optimism and crushing disillusionment. Its History is punctuated by a detailed Timeline of developments, charting the Progress through alternating eras of stagnation, dubbed the “AI winter,” and periods of rapid advancement, the “AI boom,” sometimes bordering on an “AI bubble.”
Controversies in Artificial Intelligence
As with any transformative technology, AI has not been without its share of heated Controversies. The disturbing emergence of Deepfake pornography, exemplified by the recent Taylor Swift deepfake pornography controversy, highlights the misuse of these tools. The Google Gemini image generation controversy raised questions about bias and control. Public calls to Pause Giant AI Experiments reflect widespread anxiety, as do internal power struggles like the Removal of Sam Altman from OpenAI. Declarations such as the Statement on AI Risk underscore the gravity of the potential consequences. Historical examples like the infamous Tay (chatbot) serve as stark reminders of how quickly AI can go awry, while artistic disputes like Théâtre D’opéra Spatial and the Voiceverse NFT plagiarism scandal touch upon the thorny issues of creativity and ownership in the digital age.
Finally, for the truly dedicated, there is a comprehensive Glossary to navigate the ever-expanding lexicon of this bewildering field.
DALL-E, DALL-E 2, and DALL-E 3, often stylized with that rather affected interpunct as DALL·E, are the aforementioned text-to-image models that have been meticulously developed by OpenAI. Their purpose, as if it weren’t obvious, is to harness the power of deep learning to generate distinct digital images from mere textual descriptions, which, in the vernacular of the digital age, are referred to as “prompts.”
The inaugural version of DALL-E was first unveiled to a largely unsuspecting public in January 2021. Just a year later, its purported successor, DALL-E 2, made its grand entrance, promising enhanced capabilities. The latest iteration, DALL-E 3, was integrated natively into ChatGPT for those privileged enough to subscribe to ChatGPT Plus and ChatGPT Enterprise, with this rollout occurring in October 2023. Its availability was further extended via OpenAI’s API and their “Labs” platform in early November of the same year. Not one to be left behind, Microsoft, ever eager to integrate the latest shiny object, implemented DALL-E 3 into Bing’s Image Creator tool and has rather predictably announced plans to embed it within their Designer application. Indeed, Microsoft Copilot now proudly runs on DALL-E 3 when utilizing Bing’s Image Creator tool, a testament to its widespread adoption. However, the relentless march of technological “progress” dictates that even DALL-E 3’s reign is finite, as it was slated for replacement in ChatGPT by GPT Image’s native image-generation capabilities in March 2025. One wonders how long until GPT Image itself becomes yesterday’s news.
History and Background
DALL-E’s existence was first made public through an OpenAI blog post on 5 January 2021. It was revealed to be a derivation, or perhaps a mutation, of GPT-3, modified specifically for the rather novel task of generating visual content from textual input. A year later, on 6 April 2022, OpenAI grandly announced DALL-E 2, positioning it as a superior successor. This new model was ostensibly engineered to produce images that were not only more “realistic” but also boasted higher resolutions, with the added claim that it “can combine concepts, attributes, and styles” with newfound proficiency.
The journey to public access for DALL-E 2 was a cautious one, riddled with the usual corporate hesitations. On 20 July 2022, it finally entered a beta phase, with a million invitations grudgingly dispatched to individuals who had languished on a waitlist. These beta users were granted a meager allotment of free image generations each month, with the option, naturally, to purchase more. Access had been deliberately constrained prior to this, limited to a select group of pre-selected users for a “research preview,” a decision supposedly driven by weighty concerns about ethics and safety, concerns that, one might argue, are often an afterthought. Eventually, on 28 September 2022, the floodgates opened, and DALL-E 2 was made available to everyone, the waitlist requirement unceremoniously discarded.
September 2023 saw OpenAI announce their latest iteration, DALL-E 3, claiming it possessed the ability to comprehend “significantly more nuance and detail” than any of its predecessors. Earlier, in November 2022, OpenAI had released DALL-E 2 as an API, an open invitation for developers to integrate this image-generating marvel into their own applications. Microsoft was quick to showcase its adoption of DALL-E 2, embedding it within their Designer app and their Image Creator tool, which, predictably, found its home in Bing and Microsoft Edge. The API operates on a rather straightforward cost-per-image basis, with pricing fluctuating depending on the chosen image resolution. For those large enterprises working directly with OpenAI’s dedicated team, volume discounts are, of course, available, because the future of AI is nothing if not stratified.
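For the developers so graciously invited, the ritual is mercifully brief. A minimal sketch, assuming the v1-style `openai` Python client and an `OPENAI_API_KEY` in the environment; the prompt and size here are illustrative placeholders, not gospel:

```python
# A minimal sketch of image generation through OpenAI's Images API,
# assuming the v1-style `openai` Python client. Prompt and size are
# illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",                                # or "dall-e-2"
    prompt="an armchair in the shape of an avocado",
    size="1024x1024",                                # pricing varies with resolution
    n=1,
)
print(response.data[0].url)  # URL of the generated image
```

One request, one line item on the invoice; the `size` parameter is where the aforementioned price fluctuation happens.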
The rather whimsical name “DALL-E” itself is a clever portmanteau, an amalgamation of Pixar’s animated robot character WALL-E and the iconic Spanish surrealist artist Salvador Dalí. A nod to both mechanical ingenuity and artistic absurdity, fitting for a tool that often produces both.
In February 2024, OpenAI took a step that, frankly, should have been implemented from the start: they began embedding watermarks into images generated by DALL-E. These digital signatures contain metadata conforming to the C2PA (Coalition for Content Provenance and Authenticity) standard, a standard actively championed by the Content Authenticity Initiative. A necessary measure, one might begrudgingly admit, in an increasingly murky digital landscape.
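Actually verifying those signatures requires dedicated C2PA tooling; still, a crude sketch of peeking at whatever metadata an image file exposes, using Pillow, illustrates where such provenance data lives. This is emphatically not a C2PA verifier, and the file name is hypothetical:

```python
# Illustrative only: dump whatever metadata Pillow can see in an image.
# A real C2PA manifest sits in format-specific containers and needs
# dedicated C2PA tooling to parse and cryptographically verify.
from PIL import Image

with Image.open("dalle_output.png") as im:  # hypothetical file name
    print(im.format, im.size)
    for key, value in im.info.items():
        print(f"{key}: {repr(value)[:80]}")
```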
Technology
The foundational technology underpinning DALL-E, the generative pre-trained transformer (GPT) model, was originally conceived by OpenAI way back in 2018. It utilized a Transformer architecture, a design that has since become almost canonical in the realm of large language models. The initial iteration, GPT-1, was subsequently scaled up, giving birth to GPT-2 in 2019. This relentless scaling continued, culminating in the unveiling of GPT-3 in 2020, a model boasting an astounding 175 billion parameters, a number so vast it almost sounds like a boast in itself.
DALL-E (Original)
The original DALL-E model was structured around three core components, a rather intricate ballet of algorithms. First, a discrete VAE (Variational Autoencoder) handled the initial translation of visual data. Second, an autoregressive decoder-only Transformer, a behemoth with 12 billion parameters and a clear architectural kinship to GPT-3, undertook the heavy lifting of generation. Finally, a CLIP (Contrastive Language-Image Pre-training) pair, consisting of an image encoder and a text encoder, served as the critical arbiter of relevance.
The discrete VAE played a pivotal role: it was tasked with converting an image into a sequence of tokens and, conversely, transforming a token sequence back into a coherent image. This seemingly mundane task was absolutely essential, as the Transformer, in its inherent design, was not equipped to directly process raw image data.
The Transformer model itself received as its input a carefully orchestrated sequence. This sequence began with a tokenized image caption, typically provided in English, which was tokenized using byte pair encoding (boasting a vocabulary size of 16384) and could extend up to 256 tokens in length. Following this textual prelude came the tokenized image patches. Each image, originally a 256×256 RGB composition, was meticulously subdivided into a 32×32 grid of patches, each comprising 8×8 pixels. Each of these minute patches was then subjected to conversion by the discrete variational autoencoder into a singular token, drawn from a vocabulary of 8192 possibilities.
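The arithmetic of that layout is easily checked, using nothing but the figures quoted above:

```python
# Back-of-the-envelope check of the original DALL-E's input layout:
# up to 256 BPE text tokens (vocabulary 16384), followed by one dVAE
# token per image patch (vocabulary 8192).
IMAGE_SIDE = 256                      # pixels per side
GRID_SIDE = 32                        # patches per side
PATCH_SIDE = IMAGE_SIDE // GRID_SIDE  # 8 pixels per patch side

TEXT_TOKENS = 256
IMAGE_TOKENS = GRID_SIDE * GRID_SIDE  # 1024 image tokens

print(f"patch size: {PATCH_SIDE}x{PATCH_SIDE} pixels")
print(f"max sequence: {TEXT_TOKENS} text + {IMAGE_TOKENS} image "
      f"= {TEXT_TOKENS + IMAGE_TOKENS} tokens")
```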
DALL-E was not born in isolation; it was developed and simultaneously announced to the public alongside CLIP (Contrastive Language-Image Pre-training). CLIP itself is a distinct model, built upon the principles of contrastive learning. Its training regimen involved an immense dataset of 400 million pairs of images, each meticulously matched with its corresponding text caption, all scraped from the vast, chaotic expanse of the Internet. CLIP’s crucial function was to “understand and rank” DALL-E’s output. It achieved this by predicting which caption, from a formidable list of 32,768 captions randomly selected from its training dataset (only one of which was the “correct” match), was the most fitting for a given generated image.
Ultimately, a fully trained CLIP pair was deployed to filter a larger, initial collection of images generated by DALL-E, sifting through the visual noise to select the image that most closely aligned with the original text prompt. A rather elaborate quality control system, if you ask me.
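Mechanically, that elaborate quality control reduces to embedding everything and keeping the closest match. A schematic sketch, in which `embed_image` and `embed_text` are hypothetical stand-ins for CLIP’s two encoders, not real APIs:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank(candidates, prompt, embed_image, embed_text):
    # Score each generated candidate against the prompt embedding and
    # keep the best match: CLIP as quality control, schematically.
    text_vec = embed_text(prompt)
    scores = [cosine(embed_image(img), text_vec) for img in candidates]
    return candidates[int(np.argmax(scores))]
```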
DALL-E 2
DALL-E 2, in a rather counter-intuitive turn, utilized a comparatively smaller parameter count than its predecessor, clocking in at 3.5 billion. This iteration eschewed the autoregressive Transformer architecture of the original. Instead, DALL-E 2 embraced a diffusion model, a technique that has since gained considerable traction. This diffusion model was conditioned on CLIP image embeddings, which, during the inference phase, were themselves derived from CLIP text embeddings by a preceding “prior” model. This architectural shift, one might note, placed it squarely in line with Stable Diffusion, which arrived on the scene just a few months later, suggesting a convergence of design principles in the generative AI space.
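Reduced to its skeleton, the inference path looks like this; `clip_text_encoder`, `prior`, and `diffusion_decoder` are hypothetical stand-ins for the three trained components, not actual APIs:

```python
def generate_dalle2_style(prompt, clip_text_encoder, prior, diffusion_decoder):
    # Stage 1: embed the prompt into CLIP's text space.
    text_embedding = clip_text_encoder(prompt)
    # Stage 2: the "prior" maps the text embedding to a CLIP image embedding.
    image_embedding = prior(text_embedding)
    # Stage 3: a diffusion decoder, conditioned on that image embedding,
    # denoises its way from noise to actual pixels.
    return diffusion_decoder(image_embedding)
```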
DALL-E 3
For DALL-E 3, a technical report was indeed drafted, as is customary. However, in a move that some might find rather opaque, this report conspicuously omitted the granular details of the model’s training regimen or its precise implementation. Instead, it chose to focus almost exclusively on the purportedly improved “prompt following capabilities” that were developed for DALL-E 3. A curious choice, prioritizing outcome over process, but perhaps indicative of the proprietary nature of these deep learning behemoths.
Capabilities
DALL-E, in its various iterations, possesses the rather remarkable, if somewhat superficial, ability to conjure imagery across a diverse spectrum of styles. This includes everything from the stark realism of photorealistic depictions to the brushstrokes of traditional paintings and even the simplistic charm of emoji. It can, with a surprising degree of precision, “manipulate and rearrange” objects within its generated images, and demonstrate an uncanny knack for correctly positioning design elements in novel compositions, even without explicit, detailed instructions. Thom Dunn, writing for the digital curiosity cabinet BoingBoing, noted with a hint of amusement that “For example, when asked to draw a daikon radish blowing its nose, sipping a latte, or riding a unicycle, DALL-E often draws the handkerchief, hands, and feet in plausible locations.” This suggests a rudimentary understanding of spatial relationships, or at least a highly effective mimicry of such.
Furthermore, DALL-E has exhibited an intriguing capacity to “fill in the blanks,” inferring appropriate details without the need for specific prompts. It might, for instance, spontaneously inject Christmas imagery into prompts commonly associated with the festive celebration, or adeptly apply appropriately placed shadows to images where such details were not explicitly requested. Beyond these specific instances, DALL-E appears to possess a broad, albeit shallow, understanding of prevailing visual and design trends, a capacity that, frankly, often leaves one wondering if it’s merely reflecting our own collective aesthetic biases back at us.
The model’s ability to produce images for a vast array of arbitrary descriptions, rendered from various viewpoints, is generally robust, with failures being a relatively rare occurrence. Mark Riedl, an associate professor at the Georgia Tech School of Interactive Computing, observed that DALL-E could effectively blend concepts, a feat he rightly identified as a key element of human creativity. This raises the perennial question of whether mimicry equates to genuine understanding. Its visual reasoning capabilities are even sufficient to tackle Raven’s Matrices, a type of non-verbal intelligence test often administered to humans. A curious benchmark, given the vastly different cognitive processes involved.
The latest iteration, DALL-E 3, purports to follow complex prompts with superior accuracy and an increased level of detail compared to its predecessors. It also boasts a significantly enhanced ability to generate more coherent and accurate text within images, a perennial challenge for earlier models. This improved fidelity in text generation is particularly notable. As mentioned previously, DALL-E 3 is seamlessly integrated into ChatGPT Plus, bringing its visual generation capabilities directly into conversational AI. This integration positions it as a more versatile tool for a broader range of applications, though one might still question the depth of its creative spark.
Image Modification
Beyond generating images from scratch, DALL-E 2 and DALL-E 3 also possess compelling image modification functionalities. Given an existing image, these models can produce “variations” of the original, presenting a suite of individual outputs that subtly or dramatically diverge from the source material. More impressively, they can directly edit an image, either to modify existing elements or to expand upon its original composition.
The “inpainting” and “outpainting” capabilities are particularly noteworthy. Inpainting allows the models to intelligently fill in missing areas within an image, leveraging the surrounding context to generate new content that is visually consistent with the original medium and, of course, adheres to a given prompt. This means you can effectively erase a portion of an image and have the AI seamlessly reconstruct it.
Outpainting takes this a step further, enabling the expansion of an image beyond its original borders, generating new visual information that extends the scene. For instance, these features can be utilized to seamlessly insert a novel subject into an existing photograph or to broaden the scope of a landscape. OpenAI rather confidently states that “Outpainting takes into account the image’s existing visual elements – including shadows, reflections, and textures – to maintain the context of the original image.” A neat trick, if you ask me, designed to appear more intelligent than it truly is.
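Both tricks are exposed through the API as well. A sketch, again assuming the v1-style `openai` client; variations and masked edits are DALL-E 2 endpoints, and the file names and prompt are hypothetical:

```python
from openai import OpenAI

client = OpenAI()

# "Variations": alternative renderings of an existing image.
variations = client.images.create_variation(
    image=open("source.png", "rb"),
    n=2,
    size="1024x1024",
)

# "Inpainting": the transparent region of the mask is regenerated to fit
# both the surviving pixels and the prompt.
edit = client.images.edit(
    model="dall-e-2",
    image=open("source.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="a sunlit park bench beside the original subject",
    n=1,
    size="1024x1024",
)
print(edit.data[0].url)
```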
Technical Limitations
Despite the fanfare, DALL-E 2’s language understanding is not without its glaring deficiencies. It occasionally struggles with the nuances of linguistic structure, sometimes proving unable to differentiate between “A yellow book and a red vase” and “A red book and a yellow vase.” Similarly, the subtle distinction between “A panda making latte art” and “Latte art of a panda” can be lost in translation. Perhaps most famously, it generated images of an astronaut riding a horse when confronted with the prompt “a horse riding an astronaut,” a rather definitive demonstration of its limitations in grasping true semantic relationships.
Its failures extend beyond simple word order. The model frequently falters when asked to generate images involving more than three distinct objects, and struggles with negation, numerical precision, and the complexities of connected sentences, often resulting in egregious mistakes where object features are mistakenly applied to the wrong subject. Additional, and rather significant, limitations include its abysmal performance in generating legible text, ambigrams, or any form of coherent typography, which almost invariably devolves into a dream-like gibberish that serves as a stark reminder of its non-human origins. Furthermore, the model exhibits a constrained capacity for accurately addressing scientific information, whether in the intricate details of astronomy or the precise representations required for medical imagery.
Observe this rather pathetic attempt to generate Japanese text using the prompt “a person pointing at a tanuki, with a speech bubble that says ‘これは狸です！’”. The result is a jumble of nonsensical kanji and kana, a clear indicator that DALL-E understands the concept of text, but utterly fails at its execution. A machine that cannot even spell correctly; how very human.
Ethical Concerns
DALL-E 2’s inherent reliance on vast, publicly available datasets inevitably shapes its outputs, leading to what is now commonly referred to as algorithmic bias. This bias manifests in various ways, such as the disproportionate generation of men over women for requests that conspicuously omit gender specifications. OpenAI’s attempts to mitigate this, such as filtering its training data to excise violent and sexual imagery, have, ironically, sometimes exacerbated other biases, such as reducing the frequency of women being generated. OpenAI itself has hypothesized that this unintended consequence might stem from women being more frequently sexualized in the unfiltered training data, causing the filter to inadvertently impact their overall representation.
In a rather telling admission in September 2022, OpenAI confirmed to The Verge that DALL-E surreptitiously injects phrases into user prompts to attempt to redress these biases in its results. For example, generic prompts that do not specify gender or race might have terms like “black man” and “Asian woman” invisibly appended. While OpenAI claims to address concerns regarding potential “racy content” (anything containing nudity or sexual themes) in DALL-E 3 through a labyrinthine system of input/output filters, blocklists, ChatGPT refusals, and direct model-level interventions, the problem persists. Despite these efforts, DALL-E 3 continues to disproportionately represent individuals as White, female, and youthful. Users are left with the tedious task of crafting more specific prompts for image generation to counteract these ingrained biases.
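Mechanically, the reported behaviour amounts to silent prompt rewriting. A toy sketch of the idea; the trigger logic is a hypothetical guess, not OpenAI’s actual code, and the appended terms are simply the examples cited in the reporting:

```python
import random

# Toy illustration of invisible prompt amendment. The trigger logic is a
# hypothetical guess; the terms are the examples cited by The Verge.
APPEND_TERMS = ["black man", "Asian woman"]

def amend_prompt(prompt: str, specifies_gender_or_race: bool) -> str:
    if specifies_gender_or_race:
        return prompt  # the user was specific; leave the prompt alone
    return f"{prompt}, {random.choice(APPEND_TERMS)}"  # silently appended
```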
A particularly pressing concern regarding DALL-E 2 and its ilk is their potential weaponization in propagating deepfakes and other insidious forms of misinformation. In a rather transparent, if ultimately flawed, attempt to mitigate this, the software is programmed to reject prompts involving public figures and to block uploads containing human faces. Prompts deemed to contain “potentially objectionable content” are similarly blocked, and uploaded images are subjected to analysis for offensive material. However, the Achilles’ heel of prompt-based filtering is its inherent bypassability; users can easily circumvent these restrictions by employing alternative phrasing that yields a similar, undesirable output. For instance, the word “blood” might be filtered, but “ketchup” or “red liquid” are often not, demonstrating the superficiality of such safeguards.
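Why such filters fail is no mystery: they match strings, not meaning. A toy illustration, with an entirely hypothetical one-word blocklist:

```python
# Toy blocklist filter: it matches surface strings, not semantics, which
# is precisely why a simple paraphrase sails straight through.
BLOCKLIST = {"blood"}  # hypothetical and trivially incomplete

def passes_filter(prompt: str) -> bool:
    return not any(word in BLOCKLIST for word in prompt.lower().split())

print(passes_filter("a puddle of blood"))       # False: blocked
print(passes_filter("a puddle of red liquid"))  # True: same idea, unblocked
```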
Another profound concern, particularly for those who earn their livelihoods through creative endeavors, is the looming threat of technological unemployment for artists, photographers, and graphic designers. The models’ increasing accuracy and burgeoning popularity suggest a future where human skill might be rendered economically redundant. DALL-E 3 attempts to address this by incorporating a mechanism to block users from generating art in the distinct style of currently-living artists, a noble gesture, perhaps, but one that skirts the larger issue. While OpenAI explicitly states that images produced using their models do not require permission for reprinting, selling, or merchandising, this assertion has ignited a furious debate regarding the fundamental question of who truly owns these AI-generated images, a complex legal quagmire given the ambiguity surrounding copyright law.
Perhaps most unsettling are the whispers of military application. In 2023, Microsoft, a major investor in OpenAI, reportedly pitched the United States Department of Defense on the prospect of utilizing DALL-E models to train battlefield management systems. This revelation takes the abstract concerns of AI ethics into a very concrete, and potentially lethal, domain. Compounding this, in January 2024, OpenAI quietly, and rather tellingly, removed its blanket ban on military and warfare use from its usage policies. The implications of this shift are, frankly, chilling.
Reception
The public reception of DALL-E, especially in its early days, was largely dominated by a fascination with a select subset of its outputs, often described as “surreal” or “quirky.” The image generated by DALL-E in response to the prompt “an illustration of a baby daikon radish in a tutu walking a dog” achieved a level of minor celebrity, featuring prominently in articles from Input, NBC, Nature, and a host of other publications. Similarly, its rendition of “an armchair in the shape of an avocado” garnered widespread attention, a perfect encapsulation of the kind of whimsical absurdity that captivated early adopters.
ExtremeTech highlighted DALL-E’s rather impressive grasp of temporal aesthetics, noting that “you can ask DALL-E for a picture of a phone or vacuum cleaner from a specified period of time, and it understands how those objects have changed.” Engadget echoed this sentiment, remarking on its “unusual capacity for ‘understanding how telephones and other objects change over time’.” These observations underscored a capability that went beyond mere image generation, hinting at a deeper, albeit still mechanical, comprehension of cultural evolution.
According to MIT Technology Review, one of OpenAI’s stated objectives with DALL-E was to “give language models a better grasp of the everyday concepts that humans use to make sense of things.” A rather ambitious goal, given the inherent limitations of these systems.
From the perspective of Wall Street, DALL-E 2 was met with an overwhelmingly positive reception. Certain firms, ever eager to spot the next gold rush, posited that it could represent a pivotal moment for a future multi-trillion-dollar industry. The financial backing for OpenAI has been substantial; by mid-2019, it had already secured over $1 billion in funding from Microsoft and Khosla Ventures. Following the high-profile launches of DALL-E 2 and ChatGPT, OpenAI secured an additional $10 billion in funding from Microsoft in January 2023, solidifying its position as a major player in the AI landscape. Money talks, and in this case, it screamed.
However, not all reactions have been so sanguine. Japan’s vibrant anime community, in particular, has met DALL-E 2 and similar models with a decidedly negative response. Artists typically articulate two primary arguments against such software. The first is a philosophical one: the contention that AI-generated art is not, in fact, “art” because it lacks the human intent, emotion, and struggle inherent in genuine creative expression. The juxtaposition of AI-generated images alongside their own diligently crafted work is perceived as deeply degrading, undermining the years of time and skill invested in their craft. As many have pointed out, AI-driven image generation tools have faced intense criticism precisely because they are trained on vast quantities of human-made art, often scraped from the web without explicit permission or compensation.
The second, equally significant, concern revolves around the complex legal quagmire of copyright law and the datasets upon which these text-to-image models are trained. OpenAI has, rather conveniently, refrained from disclosing specific information regarding the datasets utilized for DALL-E 2’s training, a lack of transparency that has incited considerable alarm among artists who fear their work has been used without their consent. The legal framework surrounding these issues remains, at present, frustratingly inconclusive.
More recently, after the integration of DALL-E 3 into Bing Chat and ChatGPT, both Microsoft and OpenAI found themselves under fire for what critics described as excessive content filtering, leading some to lament that DALL-E had been “lobotomized.” Instances such as the flagging of images generated by prompts like “man breaks server rack with sledgehammer” were cited as evidence of this overzealous censorship. In the initial days following its launch, filtering appeared to be so aggressively ramped up that even images generated by Bing’s own suggested prompts were being blocked. TechRadar sagely argued that leaning too heavily on the side of caution risks severely limiting DALL-E’s value as a genuinely creative tool. A rather predictable outcome when corporate caution clashes with artistic freedom.
Open-source Implementations
Given OpenAI’s steadfast refusal to release the source code for any of its DALL-E models, a predictable vacuum emerged, quickly filled by several ambitious attempts to create open-source models offering comparable capabilities. One notable example is Craiyon, formerly known as DALL-E Mini until OpenAI rather proprietarily requested a name change in June 2022. Released in 2022 on Hugging Face’s Spaces platform, Craiyon is an AI model that drew inspiration from the original DALL-E and was trained on unfiltered, raw data harvested from the Internet. It garnered substantial media attention in mid-2022, largely due to its capacity for generating often humorous, sometimes bizarre, imagery. Another prominent example of an open-source text-to-image model that has since risen to prominence is Stable Diffusion, developed by Stability AI. These open-source initiatives highlight the community’s desire to democratize, or at least replicate, the power of proprietary AI tools.
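Running such an open-source model locally is, fittingly, a few lines of code. A minimal sketch using Hugging Face’s `diffusers` library, where the model ID and prompt are illustrative and a CUDA-capable GPU is assumed:

```python
import torch
from diffusers import StableDiffusionPipeline

# Minimal local text-to-image sketch with an open-source diffusion model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")                           # assumes a CUDA-capable GPU

image = pipe("an armchair in the shape of an avocado").images[0]
image.save("avocado_chair.png")
```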