OpenAI's new AI image generator pushes the limits in detail and prompt fidelity

On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model, which features full integration with ChatGPT. DALL-E 3 renders images by closely following complex descriptions and can generate text within the image (such as labels and signs), which challenged earlier models. Currently in research preview, it will be available to ChatGPT Plus and Enterprise customers in early October.

Like its predecessor, DALL-E 3 is a text-to-image generator that creates new images based on written descriptions called prompts. Although OpenAI hasn’t released any technical details about DALL-E 3, the AI model at the heart of previous versions of DALL-E was trained on millions of images created by human artists and photographers, some of them licensed from stock sites like Shutterstock. DALL-E 3 likely follows the same formula, but with new training techniques and more computational training time.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available in terms of following prompts. While OpenAI’s examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and to render objects convincingly with minimal distortions. Compared to DALL-E 2, OpenAI says DALL-E 3 refines small details like hands more effectively, creating engaging images by default without the need for hacks or prompt engineering.

DALL-E 3 image provided by OpenAI with the prompt: “Illustration of an avocado sitting in a therapist’s chair, saying ‘I feel so empty inside’ with a pit-sized hole in its center. The therapist, a spoon, scribbles notes.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “A vast landscape made entirely of different meats spreads out in front of the viewer. Tender, juicy hills of roast beef, trees of chicken thighs, rivers of bacon, and rocks of pork create a surreal yet appetizing scene. The sky is decorated with a pepperoni sun and salami clouds.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “Thumbnail of a café decorated with indoor plants. Wooden beams crisscross above, highlighting a cold beverage station with small bottles and cups.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “A close-up of a hermit crab nestled in wet sand, with sea foam nearby, highlighting the details of its shell and the texture of the sand.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “Paper craft art depicts a girl giving her cat a gentle hug. They both sit amid potted plants, with the cat purring contentedly while the girl smiles. The scene is decorated with handmade paper flowers and leaves.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “A pixel art view of Coit Tower standing tall on Telegraph Hill, with a panoramic view of the city below and birds flying around.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “Tiny potato kings wear majestic crowns, sit on thrones, and oversee a vast potato kingdom filled with potato subjects and potato castles.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “Illustration of a human heart made of transparent glass, standing on a pedestal in the middle of a stormy sea. Sunlight breaks through the clouds, illuminating the heart, revealing a small universe inside. The quote ‘Find the universe within you’ is etched in bold letters across the horizon.”

OpenAI
DALL-E 3 image provided by OpenAI with the prompt: “Middle-aged woman of Asian descent, her dark hair streaked with silver, appears broken and cracked, intricately embedded within a sea of broken porcelain. The porcelain sparkles with splattered paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light porcelain hue, adds an almost mystical quality to her form.”

OpenAI

By comparison, Midjourney, a competing AI image synthesis model from another vendor, renders photorealistic detail well but still requires a lot of unintuitive prompt tinkering to gain any control over the image output.

DALL-E 3 also seems to handle text within images in a way its predecessor couldn’t (some competing models, such as Stable Diffusion XL and DeepFloyd, are getting better at it). For example, a prompt including the words “Illustration of an avocado sitting in a therapist’s chair, saying ‘I feel so empty inside’ with a pit-sized hole in its center” produced a cartoon avocado with the character’s quote rendered perfectly in a speech bubble.

Notably, OpenAI says that DALL-E 3 has been “built natively” on ChatGPT and will arrive as an integrated feature of ChatGPT Plus, allowing conversational refinements to images that use the AI assistant as a brainstorming partner. This also means that ChatGPT will be able to generate images based on the context of the current conversation, which could lead to novel capabilities. Microsoft’s Bing Chat AI assistant, also built on OpenAI technology, has been able to create images in chat since March.

The teapot that created the storm

An AI-generated image from DALL-E 3 of “a 3D rendering of a coffee cup placed on a window sill during a windy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves visible inside the cup. The room is dimly lit, adding to the dramatic atmosphere.”

OpenAI

The original version of DALL-E appeared in January 2021, and OpenAI released its dramatically more capable sequel in April 2022, setting off a new era of AI-generated imagery to much fanfare among its initial closed beta testers. DALL-E models use a technique called latent diffusion, which refines noise into images that the model “recognizes” based on what it learned from its training dataset and on guidance from the prompt. The same technique allowed the Stable Diffusion open-weights model to emerge in August last year.

Given that DALL-E learns concepts about images in training by mining a massive dataset of human-generated artwork, AI image generation technology has been highly controversial since its introduction last year. The technology has sparked protests from artists who fear it will unethically replace or replicate their methods, lawsuits over copyright infringement based on scraped images used as training data without consulting copyright holders, and new copyright rulings from the US Copyright Office and a US District Court judge.

As a nod to these controversies, OpenAI says that DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist. OpenAI also provides a form that creators can use to opt out of having their images used to train future models. These measures seem unlikely to satisfy artists, who typically believe that AI training should be opt-in only rather than having their work included in image datasets by default.

A comparison of “An expressive oil painting of a dunking basketball player, depicted as a nebula explosion” as created by DALL-E 2 (left) and DALL-E 3 (right).

OpenAI

Currently, US copyright policy states that artwork created purely by AI cannot receive copyright protection, so technically any image created with DALL-E 3 would fall into the public domain. Although OpenAI doesn’t explicitly acknowledge this, it does say that “the images you create with DALL-E 3 are yours to use and you don’t need our permission to reprint, sell, or merchandise them.” This is a marked change from last year, when OpenAI restricted the use of DALL-E 2 images based on a license stating that OpenAI “owns all generations.”

In terms of safety, OpenAI says that, as with DALL-E 2, it has implemented keyword and image detection filters in DALL-E 3 to limit its ability to produce violent, sexual, or hateful content. The system is also programmed to decline requests that ask for images of public figures by name, an issue that caused trouble for rival AI image generator Midjourney when it was used to create fake arrest photos of Donald Trump.

OpenAI says it worked with experts known as “red teamers” to identify and mitigate potential risks, such as harmful biases or the generation of propaganda and misinformation. OpenAI hasn’t offered any word about its tool’s potential to warp the historical record with seamless fakes, though it says it’s experimenting with a “provenance classifier” tool that could help determine whether an image was created by DALL-E 3 or not.

At the moment, we don’t have access to DALL-E 3 to test it, but OpenAI says the AI image generator is now in closed testing. It plans to make it available to ChatGPT Plus and Enterprise customers “in October via the API and in Labs later this fall.”

Ayhan

Techsprouts

OpenAI’s new AI image generator pushes the limits in detail and prompt fidelity – Ars Technica
