OpenAI’s new AI image generator pushes the boundaries of detail and prompt fidelity – Ars Technica

On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model, which features full integration with ChatGPT. DALL-E 3 renders images by closely following complex descriptions and by handling text generation within the image (such as labels and signs), which challenged earlier models. Currently in research preview, it will be available to ChatGPT Plus and Enterprise customers in early October.

Like its predecessor, DALL-E 3 is a text-to-image generator that creates new images based on written descriptions called prompts. Although OpenAI hasn’t released any technical details about DALL-E 3, the AI model at the heart of previous versions of DALL-E was trained on millions of images created by human artists and photographers, some of which were licensed from stock sites like Shutterstock. DALL-E 3 likely follows the same formula, but with new training techniques and more computational training time.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available in terms of following prompts. While OpenAI’s examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and to render objects convincingly with minimal distortions. Compared to DALL-E 2, OpenAI says DALL-E 3 refines small details like hands more effectively, creating engaging images by default without the need for hacks or prompt engineering.

By comparison, Midjourney, a competing AI image synthesis model from another vendor, renders realistic detail well but still requires a significant amount of unintuitive tinkering with prompts to gain any control over the image output.


DALL-E 3 also seems to handle text within images in a way its predecessors couldn’t (although some competing models, such as Stable Diffusion XL and DeepFloyd, are getting better at it). For example, a prompt including the words, “Illustration of an avocado sitting on a therapist’s chair saying ‘I feel so empty inside’ with a crater-sized hole in the middle of it,” produced a cartoon avocado with the character’s quote perfectly rendered in a speech bubble.

Notably, OpenAI says that DALL-E 3 has been “built natively” on ChatGPT and will arrive as an integrated feature of ChatGPT Plus, allowing conversational refinements to images that use the AI assistant as a brainstorming partner. This also means that ChatGPT will be able to generate images based on the context of the current conversation, which could lead to novel capabilities. Microsoft’s Bing Chat AI assistant, also built on OpenAI technology, has been able to create images in chat since March.

The teapot that created the storm

A DALL-E 3 AI-generated image of “a 3D rendering of a coffee cup placed on a window sill during a windy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves visible inside the cup. The room is dimly lit, adding to the dramatic atmosphere.”

OpenAI

The original version of DALL-E appeared in January 2021, and OpenAI launched its dramatically more capable sequel in April 2022, ushering in a new era of AI-generated imagery amid great fanfare among its initial closed beta testers. DALL-E models use a technique called latent diffusion that refines noise into images that it “recognizes” from the knowledge it gains from training on a dataset and from the guidance of the prompt. The same technique allowed the open-weights Stable Diffusion model to emerge in August of last year.


Given that DALL-E learns concepts about images in training by mining a massive dataset of human-generated artwork, AI image generation technology has been highly controversial since its mainstream introduction last year. The technology has sparked protests from artists who fear it will unethically replace or replicate their styles, lawsuits over copyright infringement based on scraped images used as training data without consulting copyright holders, and new copyright rulings from the US Copyright Office and a US District Court judge.

As a nod to these controversies, OpenAI says that DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist. OpenAI also provides a form that creators can use to opt out of having their images used to train future models. These measures seem unlikely to satisfy artists who typically believe that AI training should be opt-in only, rather than having their work included in image datasets by default.

Comparison of “An expressive oil painting of a dunking basketball player, depicted as a nebula explosion” as created by DALL-E 2 (left) and DALL-E 3 (right).

OpenAI

Currently, US copyright policy states that artwork created solely by AI cannot receive copyright protection, so technically any image created with DALL-E 3 would fall into the public domain. Although OpenAI doesn’t explicitly acknowledge this, it does say that “the images you create with DALL-E 3 are yours to use and do not need our permission to reprint, sell, or market them.” This is a marked change from last year, when OpenAI restricted the use of DALL-E 2 images based on a license stating that OpenAI “owns all generations.”


In terms of safety, OpenAI says that, as with DALL-E 2, it has implemented keyword and image detection filters in DALL-E 3 to limit its ability to produce violent, sexual, or hateful content. The system is also programmed to decline requests for images of public figures by name, a capability that caused issues for rival AI image generator Midjourney when it was used to create fake arrest photos of Donald Trump.

OpenAI says it worked with experts known as “red teamers” to identify and mitigate potential risks, such as harmful biases or the generation of propaganda and misinformation. OpenAI hasn’t offered any word about its tool’s potential to bend the historical record with seamless fakes, though it says it’s experimenting with a “provenance classifier” tool that could help determine whether an image was created by DALL-E 3 or not.

We don’t have access to DALL-E 3 to test it yet, but OpenAI says the AI image generator is now undergoing closed testing. It plans to make it available to ChatGPT Plus and Enterprise customers “in October via the API and in Labs later this fall.”
