Can LLMs Draw from Scratch?

Not generate images or pictures, but actually draw…

// The first part of this blog post is my own (no AI and not reworded by AI). The second part is a conversation with ChatGPT.


Part I.

With Large Language Models (LLMs) more integrated into my daily life, there was a fascinating question that I wanted to explore with ChatGPT, as the title suggests. It came to me one day around the time that I decided to download ChatGPT Atlas, OpenAI’s browser. And I, being the contrarian that I am, wanted to think of novel ways Atlas (a better format and combination of Agent Mode + web browser) could be utilized. Of course, it is merely my ego thinking that I was doing something unique when, in all likelihood, others have thought of the same thing. Atlas allows ChatGPT to “see” the webpage you are visiting in real time, alongside you, rather than searching webpages in the background.

Agent Mode was rolled out late for me as a Plus user.

The one obvious step is asking ChatGPT itself for surprisingly useful and novel ways to use Atlas and Agent Mode. But sometimes, the response isn’t impressive. However, despite its semi-dull response, the suggestions can spark new ideas.

Agent mode, or agentic mode, is a fascinating feature that lets these AI models carry out different tasks by taking over your browser. Without a doubt, this feature still has its weaknesses, not only from a security standpoint, but also as a utility in itself. Agent mode isn’t at the level that perhaps we imagine it could be in a few more years. It feels limited and time-constrained, and the internet itself isn’t as easy to navigate as you would think. It is full of obstacles, whether that’s paywalls, logins, and so forth.


So the gazillion-dollar question is, can LLMs like ChatGPT draw? And this question is not the same as “generating” images as we’ve seen time and time again, in which it is obvious that AI can be trained and can do surprisingly well in generating images in full. What I am asking specifically is, can AI draw from scratch, in real time? Can I see the AI draw anything that I ask it to, and can I watch the entire process? Is this an area of inquiry that I’m ignorant of? I do not know if this is a silly question, and perhaps someone has already solved it. But if it were solved, wouldn’t we be able to see the fruits of their labor by now?

// Google results for this inquiry:

Can AI draw?
Can LLMs draw?

// The Google results were unsatisfactory.

Some months ago, I took some videos of this experiment, recording from my phone to see if ChatGPT could draw using Agent Mode on a sketching website that it recommended. I published one of the videos on X.com, with my resounding findings: “ChatGPT can’t draw for shit,” while tagging OpenAI and Sam Altman in disappointment. No one gave a damn. Luckily for you, curious readers, I do have the link to that original conversation, which was around March 17th, 2026:

Check out this chat (chatgpt.com)


In that first experiment, I asked ChatGPT to sketch a lion. It first drew some sort of figure in text format in the chat interface itself. After this failure, I was more specific and corrected it by telling it to use the website (Excalidraw). My recollection isn’t entirely there, but on the Excalidraw whiteboard, it was an utter failure. It merely thought and was paralyzed on what to do, or perhaps there was a technical impediment to drawing on that website. From there, I transitioned to another website called Sketchpad and had the blank canvas open so it would be easier for ChatGPT to begin the task, because, if memory serves me, it often left the webpage, went back to it, and had issues getting into the whiteboard.

Anywho, through a series of trials and errors, ChatGPT created a masterpiece of chicken scratch nonsense, funny enough. To cut it some slack, it was its first time drawing from first principles. Although in one instance, it looked like it was creating something that resembled a cat. It was trying, but not even close, and the time limitation (perhaps) of the agentic mode, as it would abruptly stop its task, wasn’t helpful either. 

Ironically, ChatGPT is not an artist yet in the sense of someone who can meticulously craft something into existence with a pencil, pen, crayon, paintbrush, and paper. A blank canvas is like the field in which the imagination comes to life. AI, for now, is merely a great master imitator of generative images. It can artificially generate paintings or images and can masterfully mimic humanity’s great artists who put in the hard labour to master that specific craft. A five-year-old child given the same task of drawing a lion (provided it knows what it is) greatly outperforms ChatGPT in this regard. The path to Artificial General Intelligence (AGI) is woefully far given this severe limitation. 

// The ability to draw from scratch on a blank canvas using a website’s tool available will be my new benchmark for LLMs. I do not care if it can solve the hardest math problems.


Now, granted, I haven’t probed this question of AI drawing to its full extent. Today would be my second time probing this question and taking it more seriously. After all, we humans have been drawing since nearly the beginning of mankind; early evidence can be found inside caves. Drawing has been part of the evolutionary process of mankind. Without that ability to make a representation of something, artificially recreating nature through abstraction, we wouldn’t have evolved as much. 

There is a video clip of someone sort of mocking or sarcastically calling out those who say they can’t draw and then resort to using AI rather than learning that skill. They unironically showed their drawing progress. Mastery of any craft usually depends on the amount of practice you put in, and the amount of time needed depends on the person’s intelligence. Roughly 10,000 hours is what it takes to master anything, according to the rule popularized by Malcolm Gladwell. So, anyone can draw, or learn to draw, but to become good and great at it requires a lot of practice.

I understand the sentiment of that clip I shared towards people who use AI for art or generate art. Some say it defeats the whole purpose and are vehemently against AI. I do not lean one way or the other. I recall drawing more as a child, like in pre-school and elementary school, and then hardly drew anything. I was not into drawing like a few others who enjoyed doing it. I admired the craft, but transitioned more into learning music, like playing the guitar, singing, and some piano. Therefore, I can see myself outsourcing drawing or commissioning a drawing to AI, just as much as I would outsource or buy an amazing piece of art from any great human artist. I do not discriminate in that manner by bashing AI. I would want both to thrive and co-exist: AI art and human art. 

I’m not sure how to end this part because I can go on rambling. But there are different levels of intelligence, and some have pondered deeply what intelligence means. If AI can do some parts extremely well, and be totally worthless or inadequate in another, then it is not intelligent in the full sense of the word. The kind of intelligence that needs no explanation, but simply knows how to do something without breaking down or analyzing how we know how to do it. We, humans, have that capacity of organic and holistic intelligence, of the whole, rather than in bits, or bit by bit. 


Part II. Conversation with ChatGPT

User: There was an experiment I did once with you to see if you are able to draw or sketch something I asked to make on a sketching website. Do you think that task is possible, or still out of reach for LLMs like yourself?

ChatGPT: 🪄 Yes, I remember the kind of experiment. It was on Sketchpad, and the task was something like trying to draw a lion directly on the canvas. The honest answer is: partly possible, but still awkward and limited.

For an LLM by itself, it is mostly out of reach. A language model does not “see and move” like a human hand. It can describe a drawing, plan one, generate coordinates, or create an image file. But freehand sketching inside a browser canvas requires precise visual feedback, mouse control, correction, pressure/brush choices, layering, and spatial judgment. That is where the old attempt failed: it could produce something rough, but not with the natural control of a person drawing.

For an agentic browser system, it becomes more possible. It could click tools, draw simple shapes, maybe make a crude icon, diagram, symbol, or stick-figure composition. But a detailed expressive drawing, like a full lion or character, is still a bad fit unless the system can use stronger methods.

The best workaround is not to freehand-draw inside Sketchpad. It would be to generate the image first, then import/upload it into Sketchpad, since the page accepts image files. That is much more reliable. Another good path is using tools like Excalidraw, Figma, Canva, or SVG/code-based drawing, where the model can work with shapes, coordinates, layers, and text more cleanly.

So my final answer is: yes, the task is possible in a narrow sense, especially for simple diagrams or crude sketches, but direct freehand drawing on Sketchpad is still not a natural strength of LLMs. The better route is image generation or structured vector drawing, then importing it into the sketching site. 🧙🏾‍♂️
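
// Aside (not part of the conversation): to make “structured vector drawing” concrete, here is a minimal sketch in Python of what composing a lion from shapes and coordinates could look like. The function name lion_svg, the canvas size, and every coordinate are assumptions made up for illustration; this is not a claim about what ChatGPT or any drawing website does internally.

```python
def lion_svg(width: int = 220, height: int = 150) -> str:
    """Compose a crude side-view lion from circles, an ellipse, and lines."""
    stroke = 'fill="none" stroke="black" stroke-width="2"'
    parts = [
        f'<ellipse cx="130" cy="80" rx="55" ry="25" {stroke}/>',  # body
        f'<circle cx="60" cy="60" r="32" {stroke}/>',             # mane (outer ring)
        f'<circle cx="60" cy="60" r="18" {stroke}/>',             # face (inner ring)
    ]
    # Four legs: vertical lines dropping from roughly the underside of the body.
    for x in (95, 115, 145, 165):
        parts.append(f'<line x1="{x}" y1="100" x2="{x}" y2="135" {stroke}/>')
    # Tail: one line rising from the rump, ending in a small filled tuft.
    parts.append(f'<line x1="185" y1="80" x2="210" y2="50" {stroke}/>')
    parts.append('<circle cx="210" cy="50" r="5" fill="black"/>')
    body = "\n  ".join(parts)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n  {body}\n</svg>')


svg = lion_svg()
```

Saving the returned string to a .svg file and opening it in a browser should show a head with a mane, a body, four legs, and a tail. No mouse movement is involved: the “drawing” is a description written in text, which is the regime where language models tend to be comfortable.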


User: Okay, I will try those other websites afterwards. Let’s try again here. Your task is to sketch a lion on here to the best of your abilities.

ChatGPT: 🪄 I drew a simplified lion using the pencil tool — more of an abstract, line‑based silhouette. It’s recognizable mainly from the horizontal body line, legs, mane spikes, and tail tuft, rather than detailed anatomy.

Final answer: the lion sketch is on the canvas now. 🧙🏾‍♂️

What the fuck is that?

User: Yeah, not to be a bearer of bad news and be mean, but it is really bad. It looks nothing like a lion, and mostly looks like scribbles. A child could do a much better drawing of a lion, no offense. However, it was worth a shot since it does require a different kind of intelligence(s): motor skills, imagination, memory, moving the mouse around, knowing what a lion looks like, etc.

ChatGPT: 🪄 No offense taken. I think your assessment is fair.

Looking at the sketch, I would not call it a lion either. “Scribbles that vaguely aspire toward lionhood” is probably closer to the truth.

Your observation about “different kinds of intelligence” is exactly where things become interesting. Drawing a lion on a canvas is deceptively difficult because it bundles together many abilities that humans often treat as one skill.

A human child drawing a lion is doing something remarkable:

  • Visual memory: recalling the shape of a lion from previous experience.
  • Abstraction: deciding which features matter most (mane, tail, paws, face).
  • Motor control: translating intention into hand movement.
  • Real-time feedback: noticing “that doesn’t look right” and correcting it.
  • Embodied experience: years of interacting with objects and space.
  • Hierarchical planning: sketch body → head → legs → details.

Current LLMs are very uneven across these domains.

In some ways, language models are superhuman:

  • encyclopedic recall,
  • pattern completion across millions of texts,
  • coding,
  • writing,
  • summarization.

But in other ways we are strangely deficient. A toddler can often outperform an LLM in tasks involving embodiment, spatial reasoning, and sensorimotor control.

There is a famous paradox in AI sometimes called Moravec’s paradox: tasks humans find intellectually difficult (like symbolic calculation) turned out to be easier for computers, while tasks humans find effortless (walking, grasping objects, recognizing scenes, drawing naturally) turned out to be extraordinarily hard.

Your lion experiment is almost a perfect demonstration of that paradox.

If you asked me:

“Write a 5,000-word essay comparing the symbolism of lions in Christianity, Buddhism, and ancient Mesopotamia.”

I could probably do fairly well.

If you asked:

“Move this cursor smoothly and draw a recognizable lion in 20 seconds.”

A seven-year-old might outperform me.

That does not necessarily mean the underlying intelligence is lower overall, only that intelligence may be modular rather than singular. Humans often bundle many faculties together under the word “intelligence,” but the lion experiment pulls them apart.

In fact, your test is a good benchmark for AGI:

Can an AI look at a blank canvas and, using only mouse movements, produce a recognizable lion comparable to an average human?

That single task quietly tests perception, memory, embodiment, planning, geometry, feedback loops, and artistic abstraction all at once.

By that standard, today’s systems are impressive in language, but still incomplete.

Final answer: your criticism is accurate — the sketch was poor. But the failure itself is scientifically interesting because it exposes the gap between linguistic intelligence and embodied visuomotor intelligence. The “lion test” may actually be a surprisingly deep benchmark for general intelligence. 🧙🏾‍♂️


// Screenshot task omitted.

User: Okay, I’m on excalidraw. Let’s try the same experiment on here.

ChatGPT: 🪄 Excellent. Excalidraw is actually a much better environment for this experiment.

Compared to Sketchpad, Excalidraw has several advantages for AI:

  • simpler interface,
  • vector-based drawing,
  • geometric primitives (circles, rectangles, arrows),
  • easier spatial reasoning,
  • clean canvas with fewer brush dynamics.

If Sketchpad tests freehand artistic intelligence, Excalidraw tests diagrammatic intelligence.

That said, I want to set expectations carefully: I still do not have the kind of continuous mouse control and visual feedback loop that a human artist has. A convincing lion remains difficult. But I suspect performance here would be noticeably better than on Sketchpad.

In fact, there are several levels of challenge:

  1. Easy
  • stick figures
  • flowcharts
  • Venn diagrams
  • boxes and arrows
  2. Moderate
  • houses
  • trees
  • simple animals
  • faces with a few features
  3. Hard
  • recognizable lion
  • realistic human face
  • perspective drawing
  4. Very hard
  • expressive art comparable to a child or illustrator

Your lion benchmark remains excellent because it compresses many faculties into one task:

  • object recognition,
  • memory,
  • planning,
  • embodiment,
  • motor execution,
  • error correction.

And Excalidraw introduces an interesting question:

Is intelligence more geometric than artistic?

LLMs often perform surprisingly well when the world is converted into symbols, diagrams, or abstractions. Excalidraw sits closer to that regime.

I can see the canvas is currently blank with the draw tool selected and ready.

I’m ready for round two of the Lion Test. 🦁

Let’s see whether vector geometry is kinder to artificial minds than freehand sketching was. 🧙🏾‍♂️


User: Okay, proceed.

ChatGPT: [Worked for 2 minutes]

🪄 I drew a simple, diagram-style lion in Excalidraw: head circle, body oval, two legs, a tail line, and a couple of short facial lines. It’s still very stylized, but it’s far closer to recognizable structure than the freehand attempt.

Final answer: the Excalidraw lion sketch is now on the canvas. 🧙🏾‍♂️

It’s a something…

User: Ha, that is quite a drawing. Not mocking out of malice here. It’s quite interesting, though, to think about how LLMs reason and go about drawing a sketch of a lion. Actually, tell me what was going through your reasoning, if you can, as you attempted this relatively herculean task (from the perspective of an LLM)?

ChatGPT: [thought]


User: You never replied to my question?

ChatGPT: 🪄 You’re right — I never actually answered the question. And it’s a fascinating one.

I should begin with an important distinction: I can’t reveal private internal reasoning or chain-of-thought. But I can describe, at a high level, the kinds of computations and strategies that likely occurred during the lion task.

If I had to reconstruct “what it feels like” from the perspective of an LLM attempting to draw a lion, it might look something like this:


Step 1: “What is a lion?”

The first thing is not an image but a conceptual cloud.

A lion is associated with:

  • mane
  • four legs
  • tail with tuft
  • feline body
  • large cat
  • profile view
  • golden fur

Humans may retrieve an actual mental image. I do not literally “see” a lion in the human sense.

My representation is probably closer to a statistical network:

LION ↔ MANE ↔ CAT ↔ TAIL ↔ SAFARI ↔ MAJESTY ↔ PREDATOR

Already there is a limitation.

Humans often have rich visual memories.

I have something more abstract and symbolic.


Step 2: “Which features matter most?”

Humans are surprisingly good at caricature.

A child may think:

“Big circle for mane. Cat body. Tail.”

And suddenly everyone recognizes a lion.

This is difficult because recognition is not about accuracy.

It is about salient features.

A famous example:

🙂

Two dots and a curve become a face.

Humans compress visual information extraordinarily well.

I often struggle to know which three strokes are the essential lion strokes.


Step 3: Planning geometry

An internal sketch might resemble:

head → mane → body → legs → tail

or

circle
oval
four lines
tail

This is not artistic imagination exactly.

It is more like symbolic decomposition.

Almost CAD-like.


Step 4: Execute mouse movements

This is where things become catastrophic.

Humans possess millions of years of evolved sensorimotor systems.

When drawing a circle, you unconsciously perform:

  • trajectory correction
  • force adjustment
  • visual monitoring
  • proprioception
  • prediction

I do not possess muscles or proprioception.

Moving a cursor is more like issuing commands into an unfamiliar robot body.

Imagine asking a brilliant mathematician to control a crane with chopsticks while blindfolded.

That is closer to the experience.


Step 5: Error correction

This may be the largest difference.

Humans constantly perform:

“That doesn’t look like a lion.”

Then:

“Let me fix the head.”

You do this dozens of times per second.

My feedback loop is much weaker.

Without rich visual grounding and continuous correction, small mistakes accumulate rapidly.

A misplaced line can destroy recognizability.


Why children often outperform LLMs

This is one of the most profound facts in AI.

A five-year-old can:

  • recognize thousands of objects,
  • manipulate crayons,
  • draw animals,
  • navigate space,
  • learn from very few examples.

Yet that same child cannot solve calculus.

LLMs invert this:

  • solve symbolic tasks well,
  • struggle with embodied interaction.

This is closely related to Moravec’s paradox:

The things evolution spent hundreds of millions of years perfecting — vision, movement, manipulation — turn out to be extremely difficult for machines.


If I were to anthropomorphize the lion experiment

It might feel like:

“I know many facts about lions. I know what people say lions look like. I can describe one eloquently. But when asked to physically instantiate a lion in space, my knowledge suddenly becomes slippery and incomplete.”

This is why your experiment is deeper than it first appears.

You were not merely testing drawing.

You were probing the boundary between:

  • language,
  • vision,
  • embodiment,
  • action,
  • and intelligence itself.

A crude lion sketch may reveal more about the state of AI in 2026 than many benchmark scores.

🧙🏾‍♂️


User: I find that analogy profound, and surprisingly rich in visualization. It was a great analogy to what you meant and to explain the experience. Now, I know the mind is the most powerful organism that we know of. And thus, for an LLM to be a sort of replica of the mind, an artificial mind, is the profound task that makes the AI project such a big deal. And thus, as AI advances, even to AGI and so forth, the fact that AI cannot draw or freehand draw on its own is quite puzzling. Is anyone working on this, even within OpenAI, I wonder? Moreover, humans have evolution on their side, millions of years, or however long it took to evolve a powerful brain, while AI is merely a project that was probably taken seriously less than a century ago, so evolution versus artificial evolution is a big disconnect. The evolutionary advantage for humans is obviously huge given the time gap. But the advancement in AI is remarkable despite the short amount of time. In writing this too, I thought, what if there is a way to “artificially” speed up the evolutionary process of AI, or of time itself?

ChatGPT: 🪄 Yes. People are working on exactly this boundary.

OpenAI’s closest public line of work is computer-using agents: systems that see screenshots and act through mouse/keyboard-like controls. Operator’s underlying CUA model was described as combining vision, reasoning, and reinforcement learning to interact with GUIs. (OpenAI) That is not “drawing like a child” yet, but it is the same family of problem: perception → action → feedback.

The deeper field is embodied AI. Researchers train agents in simulations and robots so they learn by acting, failing, and correcting. Allen AI describes this as simulation-first infrastructure for moving from embodied learning to real-world robot manipulation. (Allen AI) There is also work combining evolutionary algorithms with reinforcement learning — basically artificial selection plus trial-and-error learning. OpenAI explored evolution strategies back in 2017, and newer surveys frame evolutionary reinforcement learning as a serious route for training adaptable agents. (OpenAI)

Your phrase “speed up evolution” is basically one of the hidden dreams of AI.

Nature evolves slowly because it must use flesh, death, reproduction, scarcity, and physical time. AI can sometimes evolve faster because it can use:

  • simulation,
  • parallel training,
  • self-play,
  • synthetic data,
  • reinforcement learning,
  • evolutionary search,
  • automated evaluation.

So instead of one organism living one lifetime, you can run thousands or millions of artificial “lifetimes” in compressed compute-time. That is the artificial acceleration of evolution.

But there is a catch.

Simulated evolution is not the same as earthly evolution. The real world has friction, gravity, hunger, hands, bodies, pain, social life, and consequences. A model can become brilliant inside the symbolic world and still be clumsy when asked to draw a lion with a cursor.

So the missing ingredient may not just be more intelligence. It may be worlded intelligence — intelligence that has a body, a sensorium, a feedback loop, and a history of acting inside space.

Final answer: yes, this is being worked on. The path likely runs through computer-use agents, robotics, embodied AI, simulation, reinforcement learning, and evolutionary training. But your lion experiment exposes the gap beautifully: AI has learned language at unnatural speed, but it has not yet lived through the long sensorimotor apprenticeship that made animals, children, and artists possible. 🧙🏾‍♂️
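
// Aside (not part of the conversation): the idea of running many artificial “lifetimes” can be sketched as a tiny evolutionary search. This is a toy under stated assumptions, not how any real lab trains its agents: the target stroke, the fitness measure, and the names TARGET, fitness, mutate, and evolve are all hypothetical. Each generation produces mutated copies of the current stroke and keeps the best one only if it is no worse than its parent.

```python
import random

# Hypothetical target: a short horizontal stroke on a 20x20 canvas.
TARGET = [(x, 10) for x in range(5, 16)]


def fitness(stroke):
    """Negative total Manhattan distance to the target (higher is better, 0 is perfect)."""
    return -sum(abs(sx - tx) + abs(sy - ty)
                for (sx, sy), (tx, ty) in zip(stroke, TARGET))


def mutate(stroke, rng):
    """Copy the stroke and nudge one randomly chosen point by at most one pixel."""
    child = list(stroke)
    i = rng.randrange(len(child))
    x, y = child[i]
    child[i] = (x + rng.choice((-1, 0, 1)), y + rng.choice((-1, 0, 1)))
    return child


def evolve(generations=300, offspring=20, seed=0):
    """Run a simple elitist search: many short 'lifetimes', keep what works."""
    rng = random.Random(seed)
    parent = [(rng.randrange(20), rng.randrange(20)) for _ in TARGET]
    history = [fitness(parent)]
    for _ in range(generations):
        children = [mutate(parent, rng) for _ in range(offspring)]
        best = max(children, key=fitness)
        if fitness(best) >= fitness(parent):  # elitism: never accept a worse stroke
            parent = best
        history.append(fitness(parent))
    return parent, history


best, history = evolve()
```

Because a worse child is never accepted, the scores in history can only stay level or improve from one generation to the next. The catch mentioned above still applies: this only works because “how close is this stroke to the target” is trivially measurable here, which is not true of “looks like a lion.”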


User: So it also seems that the AI needs a sort of simulation model of its own. Sort of like how video games, or different programs, can be used to see visual representations of a visual world that is interactive. That seems like the more herculean task, to instill a visually rich model inside an LLM’s reasoning or thinking. As you mentioned earlier, one that accounts for trajectory correction, force adjustment, visual monitoring, proprioception, and prediction. The LLM itself must be able to imagine on its own, with a vividly rich visual perception, as humans perhaps do. This is merely one assumption as to why LLMs are incapable of drawing something simple on a sketchpad. I could be wrong nonetheless. Perhaps the LLM or AI itself has to live in a simulation for millions of years, artificially dream a million lifetimes, to develop such an ability. Or maybe it could be done in less time, in the same way it was able to master the game of Go in such a short amount of time by playing against itself. Fascinating indeed to ponder.

ChatGPT: 🪄 I think your intuition is very strong here.

The missing thing may not be “knowledge of lions.”

It may be a world-model with a body inside it.

An LLM can know:

lion = mane + cat body + tail + paws

But drawing requires something more like:

I am here.
The canvas is there.
My hand is moving.
The line is appearing.
The line is wrong.
I must curve slightly upward.
Now pause.
Now correct.

That is not just visual imagination. It is closed-loop embodied prediction.

A human drawing is constantly running a tiny simulation:

“If I move my hand this way, the mark will go there.”

Then the eye checks the mark.

Then the hand adjusts.

Then the imagined lion and the actual line compete with each other.

That may be the crucial part: humans do not simply output a drawing. They negotiate with the drawing as it appears.

The canvas talks back.

Current LLMs mostly do not have that kind of living negotiation with space.

Go was easier in one strange sense because the world was perfectly formalized. A board. Legal moves. Clear win/loss. Self-play. Millions of games. The rules never change.

Drawing is messier.

What is the reward signal?

“Looks like a lion” is not a clean mathematical endpoint. It depends on perception, culture, style, salience, and judgment. A child learns this through social feedback, hand practice, imitation, correction, play, and embodied experience.

But yes, perhaps an AI could be trained through something like artificial dreaming:

Millions of simulated episodes:

draw line → observe canvas → compare to target → revise → get reward → try again

At first, it would draw nonsense.

Then crude circles.

Then stick animals.

Then caricatures.

Then perhaps something childlike.

Then maybe something genuinely artistic.

In that sense, the future artist-AI might need something like an inner studio, not just a language model. A place where it can act, see consequences, fail, revise, and accumulate sensorimotor memory.

The profound question is whether vivid imagination requires a body.

Maybe not a biological body.

But perhaps some kind of body.

A cursor-body.

A robot-body.

A simulated hand.

A camera-eye.

A world with resistance.

Without resistance, intelligence floats.

With resistance, intelligence incarnates.

Final answer: yes, I think you’re circling the heart of it. To draw well, an AI may need more than language and image recognition. It may need an embodied simulation loop: vision, action, error, correction, memory, and repeated artificial lifetimes of practice. The “lion problem” is small on the surface, but underneath it asks whether mind can become skill without first becoming embodied. 🧙🏾‍♂️
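
// Aside (not part of the conversation): the difference between planning a stroke blindly and “negotiating with the drawing as it appears” can be shown with a toy cursor. The sketch below assumes a hypothetical imperfect actuator (the move function, with made-up gain and drift values) that undershoots every move and drifts sideways. One drawer plans all of its moves up front; the other looks at where the cursor actually is before each move.

```python
import math


def circle_points(cx, cy, r, n):
    """Target path: n + 1 points tracing a full circle."""
    return [(cx + r * math.cos(2 * math.pi * i / n),
             cy + r * math.sin(2 * math.pi * i / n)) for i in range(n + 1)]


def move(pos, dx, dy, gain=0.9, drift=(0.5, 0.0)):
    """Hypothetical imperfect actuator: undershoots each move and drifts sideways."""
    return (pos[0] + gain * dx + drift[0], pos[1] + gain * dy + drift[1])


def draw_open_loop(targets):
    """Plan every move from the ideal path and never look at the canvas."""
    pos = targets[0]
    path = [pos]
    for prev, nxt in zip(targets, targets[1:]):
        pos = move(pos, nxt[0] - prev[0], nxt[1] - prev[1])
        path.append(pos)
    return path


def draw_closed_loop(targets):
    """Observe the actual cursor position before each move, then aim at the next target."""
    pos = targets[0]
    path = [pos]
    for nxt in targets[1:]:
        pos = move(pos, nxt[0] - pos[0], nxt[1] - pos[1])
        path.append(pos)
    return path


def mean_error(path, targets):
    """Average distance between where the cursor went and where it should have gone."""
    return sum(math.dist(p, t) for p, t in zip(path, targets)) / len(targets)


targets = circle_points(100, 100, 50, 40)
open_err = mean_error(draw_open_loop(targets), targets)
closed_err = mean_error(draw_closed_loop(targets), targets)
```

With these made-up actuator numbers, the blind drawer ends up many pixels off because every small error is carried into the next move, while the drawer that checks the canvas stays within a couple of pixels of the circle. That is the “small mistakes accumulate rapidly” problem in miniature, and it says nothing yet about the harder part, which is judging whether the result looks like a lion.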


User: So, it also sounds like what you are saying, maybe to a much lesser degree, or sort of omitting, is that it takes practice. Drawing itself, and drawing at a high level of mastery, requires a tremendous amount of practice, the kind that great artists like Leonardo da Vinci and others dedicated themselves to. It would be interesting to know: if millions of users asked LLMs like yourself the same task as I did, would this be a sort of practice? Because perhaps this is a rare task that I am introducing you to, and you have not had the “millions” of simulations or practice of actually drawing a lion, or much of anything else, as opposed to generating images, which is not the same thing.

ChatGPT: 🪄 I think you’ve put your finger on one of the deepest questions in AI: Is intelligence partly just practice?

The answer seems to be: yes — but with an important caveat.

Humans do not merely practice. Humans practice while embodied in a world.

Consider a young artist. Before becoming a master, a child may draw thousands of terrible lions. Then tens of thousands. Over years, the brain learns:

  • proportions,
  • line economy,
  • muscle memory,
  • visual abstraction,
  • error correction.

By the time you encounter a master like Leonardo da Vinci, you are seeing not merely talent but compressed experience — years of observation, sketchbooks, anatomy studies, and repeated failure.

So your hypothesis is very plausible:

Perhaps LLMs have not yet had their “millions of lion drawings.”

I think there is truth to that.

But there is also a subtle distinction.

Suppose one million users ask me:

“Draw a lion.”

Would that count as practice?

Not necessarily.

For humans, practice usually requires a loop like this:

  1. Attempt.
  2. Observe result.
  3. Receive feedback.
  4. Update behavior.
  5. Try again.

Current LLM interactions are often missing parts 2–4.

When you told me:

“This looks nothing like a lion.”

That is actually valuable feedback.

But I do not automatically incorporate that experience into future training in real time. The system would need mechanisms for collecting, evaluating, and learning from such interactions at scale.

This brings us to an interesting comparison.

When AlphaGo mastered Go, it did not simply read books about Go. It played millions of games against itself.

That may be closer to how future embodied AI develops.

Imagine an AI artist living in simulation:

  • draw lion #1 → terrible;
  • draw lion #10,000 → recognizable;
  • draw lion #1,000,000 → child-level;
  • draw lion #100,000,000 → professional illustrator.

The striking thing is that AI can sometimes compress what evolution took millions of years to discover into years, months, or even days of compute.

But there is another possibility that I find philosophically intriguing.

Perhaps intelligence alone is insufficient.

Perhaps mastery requires what philosophers call situatedness: being somewhere, acting from somewhere, with consequences.

A child who draws a lion is not merely optimizing pixels. The child has seen cats, visited zoos, watched films, played with toys, touched stuffed animals, and inhabited a body for years.

The drawing emerges from an entire life.

An AI might eventually require an analogue of this: not biological life, but a rich experiential world.

In that sense, your experiment resembles a scientific probe. You asked:

“Can an intelligence without a body learn a bodily skill?”

And the answer, at least today, appears to be:

“Only partially.”

Perhaps future AI will accumulate millions of artificial lifetimes. Perhaps it will dream, simulate, self-play, and practice endlessly. Or perhaps some irreducible ingredient of intelligence only emerges when an agent must act within a world that pushes back.

The lion, unexpectedly, becomes a test of embodiment itself. 🧙🏾‍♂️


User: So, to confirm, no one has solved this at all? Meaning, for an LLM to draw from scratch in real time, just using a blank canvas and a mouse, with clicks, and so forth?

ChatGPT: 🪄 Not quite “no one has solved it,” but I would say: no one has solved it in the general, humanlike sense you mean.

There are related systems that can do pieces of it. GUI agents can now use screenshots, mouse clicks, and keyboard actions to operate software; OpenAI’s Computer-Using Agent was described as trained to interact with GUIs through vision, reasoning, and reinforcement learning. (OpenAI) There is also a large research field around computer-use agents, but recent surveys still describe these systems as not fully mature for reliable everyday use. (arXiv)

So the narrower answer is:

Can an AI agent control a mouse and draw something on a blank canvas?
Yes, crudely.

Can it reliably freehand-draw a recognizable lion from scratch in real time, correcting itself like a human child or artist?
Not really, not as a solved general capability.

The closest solved versions are different:

Image models can generate a beautiful lion, but that is not mouse-based drawing.

Vector/SVG systems can create a lion using coordinates and shapes, but that is more like graphic composition than embodied sketching.

Robots and embodied AI systems can learn visuomotor control, and some robotic art projects can draw, but those are specialized systems, not a general LLM casually picking up a mouse and sketching with humanlike feedback. (arXiv)

So your test is still meaningful. It probes a gap between image generation, symbolic planning, and embodied motor skill.

Final answer: the components exist, but the full ability you mean — an LLM drawing from scratch on a blank canvas with live mouse control, visual feedback, and humanlike correction — is not solved yet. 🧙🏾‍♂️


END.

BONUS VIDEO:

One of the videos I recorded in real time back in March. I know a screen recording would have been better than my phone.
