AI at its Finest – DALL·E: Creating Images from Text


You know what? AI is getting crazily serious. You can now chat with a bot modeled on a deceased friend, thanks to Replika. Wanna plan your future buildings? Use Spacemaker's cloud-based AI software.

AI goes as far as helping companies hire and fire people. It can also write something as thoughtful, musical, and literary as poetry!

So you must be thinking: what now?

Meet another marvel of AI, DALL-E.

A 12-billion-parameter version of GPT-3, DALL-E is trained on a large dataset of text-image pairs to generate images from text descriptions.

Hmm. Has a nice ring to it!

Experts found that DALL-E can combine unrelated ideas and concepts in impressive ways, interpret the text it is given, and produce anthropomorphized versions of objects and animals. It can also modify existing images according to the text fed to it.

GPT-3 showed that a massive neural network can be instructed in plain language to perform a variety of text generation tasks. DALL-E applies the same kind of network to generating images.

DALL-E works much like GPT-3, since it is also a transformer language model. It receives the text and the image as a single stream of data containing up to 1280 tokens (up to 256 for the caption and 1024 for the image), and it is trained using maximum likelihood to generate all of the tokens, one after another.
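To make the "single stream" idea concrete, here is a minimal sketch of how a caption and an image could be packed into one 1280-token sequence and scored with a maximum-likelihood objective. The budget figures (256 text tokens, a 32x32 grid of 1024 image tokens, a text vocabulary of 16384 and an image vocabulary of 8192) follow the DALL-E paper; everything else is an illustrative assumption. The names `build_stream`, `stream_nll` and `PAD_ID`, the choice to shift image codes into a shared id space, and the toy probability function are all hypothetical, not OpenAI's actual implementation.

```python
import math

# Token budget described for DALL-E: up to 256 caption tokens plus a
# 32x32 grid of image tokens, for 1280 tokens in total.
TEXT_LEN = 256
IMAGE_GRID = 32
IMAGE_LEN = IMAGE_GRID * IMAGE_GRID  # 1024 image tokens
STREAM_LEN = TEXT_LEN + IMAGE_LEN    # 1280 tokens

TEXT_VOCAB = 16384  # caption (BPE) vocabulary size
IMAGE_VOCAB = 8192  # image code vocabulary size
PAD_ID = 0          # hypothetical padding id, for illustration only


def build_stream(text_tokens, image_tokens):
    """Concatenate caption and image tokens into one sequence.

    Image codes are shifted by TEXT_VOCAB so both kinds of token share one
    id space (an illustrative assumption, not the real implementation).
    """
    if len(text_tokens) > TEXT_LEN:
        raise ValueError("caption is longer than 256 tokens")
    if len(image_tokens) != IMAGE_LEN:
        raise ValueError("expected a full 32x32 grid of image tokens")
    padded_text = list(text_tokens) + [PAD_ID] * (TEXT_LEN - len(text_tokens))
    shifted_image = [TEXT_VOCAB + code for code in image_tokens]
    return padded_text + shifted_image


def stream_nll(stream, token_prob):
    """Negative log-likelihood of the whole stream, one token after another.

    token_prob(prefix, token) returns the model's probability of `token`
    given all earlier tokens. Maximum-likelihood training means minimizing
    this sum over the training data.
    """
    total = 0.0
    for i, token in enumerate(stream):
        total -= math.log(token_prob(stream[:i], token))
    return total
```

As a sanity check, a "model" that assigns the same probability to every one of the 24576 possible ids scores any stream at exactly 1280 times log(24576); a trained transformer would do far better on real caption-image pairs.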

Attribute Control

DALL-E can modify several attributes of an object, as well as the number of times the object appears in the image.

Multiple Objects: Right Here, Right Now!

Controlling several objects, their attributes, and their spatial relationships all at the same time is a real challenge. Take, for instance, the phrase "a hedgehog wearing a red hat, yellow gloves, blue shirt, and green pants."

To interpret this phrase completely and accurately, DALL-E must not only put each piece of clothing on the hedgehog but also bind each color to the right garment: (red, hat), (yellow, gloves), (blue, shirt), and (green, pants), without mixing any of them up.

DALL-E does offer an impressive degree of control. However, its success rate depends heavily on how the caption is phrased and how its words are ordered. DALL-E also becomes more likely to confuse the objects and their colors as the number of objects grows, and its success rate drops accordingly.
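One way to probe this phrasing sensitivity yourself is to write the same (color, garment) bindings in every possible order and check how many bindings survive in each result. The sketch below assumes you already have some hypothetical way to read the (color, garment) pairs back out of a generated image, such as manual labeling or an image-tagging model; `caption_variants` and `binding_accuracy` are illustrative helpers, not part of any DALL-E API.

```python
from itertools import permutations


def caption_variants(subject, pairs):
    """Yield captions that list the same (color, garment) pairs in every order."""
    for order in permutations(pairs):
        items = [f"{color} {garment}" for color, garment in order]
        if len(items) == 1:
            listed = items[0]
        else:
            listed = ", ".join(items[:-1]) + ", and " + items[-1]
        yield f"a {subject} wearing {listed}"


def binding_accuracy(intended, detected):
    """Fraction of garments that ended up with the intended color.

    `detected` is a hypothetical list of (color, garment) pairs read back
    from a generated image; a mixed-up color counts as a miss.
    """
    intended_map = {garment: color for color, garment in intended}
    detected_map = {garment: color for color, garment in detected}
    correct = sum(
        1 for garment, color in intended_map.items()
        if detected_map.get(garment) == color
    )
    return correct / len(intended_map)
```

With the four hedgehog bindings this yields 24 captions. If an image swaps the glove and shirt colors, `binding_accuracy` returns 0.5, which is exactly the kind of color mix-up described above. Averaging the score over several images per caption would show which orderings work best.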

So yup! Phrasing is crucial when you work with DALL-E (be careful!)

3D Modelling

Let me tell you that DALL-E can also render scenes in a 3D style and lets you control the viewpoint from which a scene is shown. Experts put this ability to the test by asking it to draw the head of a well-known figure from a sequence of equally spaced angles. The result? A smooth animation of a rotating head! Wow!
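The rotating-head experiment boils down to requesting one image per angle, with the angles spaced evenly around a full turn, and then playing the images back as frames. Here is a minimal sketch of that prompt schedule. The helper name `rotation_prompts`, the subject text, and the "viewed from N degrees" wording are all hypothetical; the exact captions the researchers used may differ.

```python
def rotation_prompts(subject, num_frames):
    """Return (angle, prompt) pairs at equally spaced angles over 360 degrees.

    Each prompt would be sent to the image model separately; stitching the
    resulting images together in order gives the frames of a rotation.
    The prompt wording here is a hypothetical example.
    """
    if num_frames < 1:
        raise ValueError("need at least one frame")
    step = 360 / num_frames
    frames = []
    for i in range(num_frames):
        angle = i * step
        frames.append((angle, f"a {subject}, viewed from {angle:g} degrees"))
    return frames
```

For example, `rotation_prompts("marble bust of a famous figure", 8)` produces prompts at 0, 45, 90, and so on up to 315 degrees. More frames give a smoother animation, at the cost of more generations.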

Coordinating Unrelated Concepts

They say you can't compare apples and oranges, but tell you what: something amazing happens when AI enters the scene. DALL-E can blend unrelated ideas into strikingly novel objects, such as an armchair in the shape of an avocado. This works because language lets us combine real and imaginary things freely. Experts tested this by transferring qualities of unrelated concepts onto animals and by designing products inspired by unrelated ideas.

Illustrations and Animals: HMMMM!

The program can also get quite artistic, producing illustrations such as emoji, animal chimeras, and anthropomorphic versions of objects and animals.


Come on now! Flex your artistic muscles with the exciting DALL-E.

