
The meteoric advancement of image-generating AI

By Omar Anwar

In the last half-year or so, following the contraction of the crypto and NFT space, the latest innovation to capture the public imagination with its latent potential seems to be generative artificial intelligence (AI). The likes of ChatGPT seem to be all anyone speaks about now, be it in positive tones (help with automating monotonous tasks, an alternative to current search engines) or fearful whispers (potential redundancies through replacement, unchecked development of AI without the necessary legislation to alleviate ethical concerns). The truth often lies in a more boring middle ground (for now), as stated by Sam Altman himself, CEO of OpenAI. He writes that ChatGPT is “incredibly limited” but “good enough at some things to create a misleading impression of greatness”.

Generative AI, however, isn’t limited to chatbots; it also encompasses the field of text-to-image generators such as DALL-E 2, Midjourney and Stable Diffusion, which have similarly created a buzz throughout social media thanks to the accessibility of their striking imagery. The tacit understanding of how difficult it is to create art conventionally, versus the supposed ease, speed and imagination of AI-created art, contributes all the more to the shock people feel when they realise how much more powerful AI has become in the last few years. Even for those who have followed the field’s rapid developments, it’s easy to forget that the first images generated by AlignDraw, the first text-to-image ML model, appeared a mere eight years ago and look nothing like what you see today (see below).

Article: “Generating Images from Captions with Attention”

My first encounter with AI text-to-image generation came through a chance retweet of @images_ai on my Twitter feed back in December 2021 (thank you to whoever retweeted them; for the life of me I can’t remember who it was). What I do clearly remember, however, is that it was this piece.


Something about the colour scheme, and the contrast of the central looking glass against its surrounding background, made me want to find out more about the artist. Reading their profile, I was shocked to find that this piece was composed via a text-to-image model known as VQGAN+CLIP and not solely by human hands.


As with just about everyone who has come across AI art, this was my personal mind-blown moment.

Instantly I wanted to make my own pieces to see just how it was possible. Thankfully, they had a link to Katherine Crowson’s (@RiversHaveWings) user-friendly notebook on Google Colab, as well as a tutorial guide. After a bit of messing around trying to get my head around all these new concepts, I was finally able to make my first pieces. Unfortunately, due to a change of computers, I don’t have these anymore, but I still have one I created with a simpler mobile app released at the time that also used VQGAN+CLIP.

The piece you see on the left is succinctly named “A hearty meal in the gardens on the surface of Mars, Vaporwave chrome render Unreal Engine”. The invention of text-to-image apps such as Wombo AI has greatly lowered the barrier to entry that many of us laypeople previously encountered and has played a key role in the proliferation of generative AI.
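(For the curious: here is roughly what driving one of these models from code looks like. This is a minimal sketch using the open-source Stable Diffusion weights through Hugging Face’s diffusers library, not the actual VQGAN+CLIP notebook or the Wombo AI app, and the output filename is just illustrative.)

    # A minimal, hypothetical sketch of prompting a text-to-image model from
    # code, using the publicly released Stable Diffusion weights via Hugging
    # Face's diffusers library. Not the VQGAN+CLIP notebook or app described
    # above, just a rough modern equivalent; assumes a CUDA-capable GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # open-sourced Stable Diffusion checkpoint
        torch_dtype=torch.float16,         # half precision so it fits consumer GPUs
    ).to("cuda")

    prompt = ("A hearty meal in the gardens on the surface of Mars, "
              "Vaporwave chrome render Unreal Engine")
    image = pipe(prompt).images[0]  # the pipeline returns a list of PIL images
    image.save("mars_meal.png")    # illustrative filename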

But before continuing, I want to highlight the human aspect of AI art and how the best pieces you see often don’t come from simply striking gold with the right words. AI artists go through rigorous experimentation with prompting and post-production editing, and need a discerning eye to decide which image to select from the hundreds often generated along the way. This great thread by @amli_art breaks down her workflow in DiscoDiffusion infinitely better than I could attempt, so please do check it out if you’re interested in the nitty-gritty of the field.
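To make that numbers game concrete, here is a small sketch of the “generate many, keep few” loop: the same prompt rendered under different random seeds, leaving a folder of candidates to curate by eye. To be clear, the diffusers library stands in for her actual DiscoDiffusion notebook here, and the prompt is invented.

    # Hypothetical illustration of the "generate many, keep few" workflow:
    # render one prompt under several fixed seeds so every candidate is
    # reproducible, then curate the saved files afterwards. The diffusers
    # library is a stand-in for DiscoDiffusion; the prompt is made up.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a glowing looking glass in a misty forest, vivid colours, high contrast"
    for seed in range(8):  # in practice artists may run dozens or hundreds
        generator = torch.Generator("cuda").manual_seed(seed)  # reproducible per seed
        image = pipe(prompt, generator=generator).images[0]
        image.save(f"candidate_{seed:03d}.png")  # pick the keepers by hand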

It was from this point that more accessible image generators started to make their way to the public, culminating in an explosion in summer 2022. While similar tools existed before then, the releases of Midjourney (open beta on July 12th), DALL-E 2 (beta phase for 1 million waitlisted users on July 20th) and Stable Diffusion (open-sourced on August 22nd) captured the attention of the wider public, both through the leaps they took in realism and through the sheer volume of pieces available to view online from excited users keen to share their creations.

Interest over time for the term “AI Image generator”. Note the spikes in summer and the zenith one to two weeks after ChatGPT’s release date.

The seemingly exponential rate at which technology in the generative AI field is developing has had the knock-on effect that the ethical and legislative frameworks it requires are not keeping up. The most prominent example is early reports that high school students were using ChatGPT to tackle their assignments. In response, local school boards attempted to ban access to the site on school computers. This was an inadequate response, however, as students would often use their personal devices or wait until they were home to complete coursework. As of a few days ago, OpenAI has developed and released a detection tool (whose accuracy and effectiveness are still to be determined) to help identify the likelihood of a piece of text being AI-written. Conversations that were initially stoked by fear and brainstorming over how to ban access to the tool have, with time, matured in certain quarters into acceptance of its place in the future and attempts to understand how to integrate it as a teaching tool (both as an ideas generator and as a medium for critiquing and developing one’s own critical thinking skills).


Text-to-image models, in turn, come with even more legal grey zones that are currently being debated in a number of cases. Firstly, there is the question of whether one can copyright work generated by an AI model. The current working theory is that if there is enough human involvement in the output of the final piece (i.e. prompt experimentation, post-production tuning and production of multiple images), then that piece can be copyrighted. Beyond that, there is the question of training the models themselves: can copyright-protected data be used to train AI models? Finally, how does compensation or credit work for artists whose work is used in these models? Any potential remedy will require a solution that works both going forward and retroactively, for artists whose work has already been used.


I hope you’ve enjoyed this stream of consciousness on generative AI. As a final point, to illustrate just how far we’ve come in the last eight years, I’ve taken the original prompts used for the first images produced by AlignDraw and fed them into the latest version of Midjourney to bring you the images you see below. If this post has inspired any further questions, or you just want to talk about generative AI, you can always drop me a message!