You can now edit images in ChatGPT much as you would in Photoshop. It is a very interesting and useful new feature, but one that quickly shows its limitations.
Today, there are many AI image generators capable of producing highly realistic images, and these tools keep gaining new capabilities. ChatGPT Plus subscribers recently gained the ability to edit specific areas of an image, much like in Photoshop. Instead of regenerating the entire image to change a single element, users can now point DALL-E at the area they want to adjust, provide a prompt, and let the AI do the rest. Very convenient indeed, but not without limitations.
Edit Images in ChatGPT
If you are a ChatGPT Plus subscriber, you can generate an image of your choice in seconds. To edit it, simply click on it; a Select button (shaped like a pencil drawing a line) appears in the top-right corner. You then adjust the size of the selection tool to cover the area of the image you wish to modify.
This is where things get interesting: you can regenerate only that selection, whereas previously the whole image had to be regenerated. Once the selection is made, enter new instructions for that specific area; the more precise you are, the better the result. ChatGPT then processes your request.
Based on my experiments, ChatGPT and DALL-E use the same "magic" as tools like Google's Magic Eraser, intelligently filling in the selected area with existing scene information while trying not to alter the rest of the image.
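The in-app editor is purely a ChatGPT interface feature, but OpenAI's Images API exposes a comparable mask-based edit endpoint if you want to script the same kind of inpainting yourself. Here is a minimal sketch, assuming you have an image.png and a mask.png of the same dimensions on disk (the transparent areas of the mask mark the region to regenerate) and an OPENAI_API_KEY set in your environment; the file names and prompt are illustrative, not part of the ChatGPT feature described above.

```python
# Minimal sketch: mask-based image editing with OpenAI's Images API.
# Assumes image.png and mask.png exist locally; transparent pixels in
# mask.png mark the area the model should regenerate.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.edit(
    model="dall-e-2",               # the edit endpoint targets DALL-E 2
    image=open("image.png", "rb"),  # the original picture
    mask=open("mask.png", "rb"),    # same dimensions as image.png
    prompt="Replace the selected area with a clear blue sky",
    n=1,
    size="1024x1024",
)

print(result.data[0].url)  # URL of the edited image
```

The idea is the same as in the ChatGPT editor: only the masked region is regenerated from your prompt, while the rest of the picture is left untouched.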
It’s not the most advanced selection tool, and there are some inconsistencies here and there, but in most cases, the alterations are quite well done. No doubt OpenAI will improve the overall system over time.
Generative AI Shows Its Limits
While this new editing tool excels at changing the color or position of an object in a composition, for example, it struggles much more with something like resizing a giant standing on the ramparts of a castle. Sometimes ChatGPT's solution is simply to make the object that bothers you (and apparently bothers it too) disappear altogether.
This feature is still very new, and OpenAI does not claim that it can replace human editing. Clearly, it cannot. Things will certainly improve over time, but the limitations of this kind of system are already plain to see.
DALL-E and similar models are very good at rearranging pixels and giving you a convincing approximation of a castle, for example, thanks to the many castles they saw during training. But the AI does not really know what a castle is. It does not understand geometry or physical space, which is why some generated castles are oddly shaped. You will often notice this with buildings, furniture, and certain other objects.
These models do not (yet) understand what they are showing you. That is why, in many videos generated by OpenAI's Sora, people vanish as if by magic: the AI rearranges pixels from frame to frame without actually tracking the people in the scene. Similarly, these generative AIs struggle to produce images of couples of different ethnicities, because couples of the same ethnicity are far more common in their training data.
Another area for improvement is these generators' inability to produce completely white backgrounds. They are remarkably capable tools in many respects, but they do not "think" like us and do not understand what they are doing the way a human artist would. Keep this in mind when using them.