There are two main frames in this animation, which we will call frames A and B. We also show one of the frames in the transition between frames A and B and call this frame C. It is helpful to enlarge them.
Frame A shows 12 men and frame B shows 13 men. The transition frames, including frame C, show that the transformation between frames A and B does not create or destroy any part of the image. The only change is the swapping of two regions. The illusion is that somehow 12 men are transformed into 13 men with this single change.
In the following images, we see frames A, B, and C with colored boxes showing the three regions of the image. The blue region remains fixed, while the red and green regions change places.
The red region is straightforward, with 5 men associated with it both in frame A and in frame B.
But something funny is going on in the green region. In frame A, there are 7 men associated with the green region, while in frame B, there are 8 men associated with the green region.
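The counts described so far can be tallied in a few lines of Python. This is only a bookkeeping sketch: the dictionary layout and names such as `men_per_region` are illustrative and not part of the original puzzle.

```python
# Men associated with each swapped region, as counted in the text.
men_per_region = {
    "A": {"red": 5, "green": 7},
    "B": {"red": 5, "green": 8},
}

# Total number of men seen in each frame.
totals = {frame: sum(regions.values()) for frame, regions in men_per_region.items()}

# Change in count per region when going from frame A to frame B.
change = {
    region: men_per_region["B"][region] - men_per_region["A"][region]
    for region in men_per_region["A"]
}

print(totals)  # {'A': 12, 'B': 13}
print(change)  # {'red': 0, 'green': 1}
```

The whole discrepancy sits in the green region, which is why the rest of the analysis concentrates there.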
It is helpful to put frames A and B into an animated graphic that switches instantly between frames A and B.
Notice that the figures in frame B are generally smaller than the figures in frame A. This is one of the keys to understanding the illusion.
Now we attempt to simplify the puzzle by crossing out figures and seeing what is left.
Starting with frame A, we notice that there are 5 men at the left, all partly in the red region, and 7 men at the right, all partly in the green region. We cross out the two rightmost men in each group. Then we cross out the same portions of the image in frame B.
Now look at frame B with crossouts. At the right, the group of 5 men, associated with the red region, reduces neatly to 3 men when we cross out.
But what happens on the left, with the green region? When we cross out 2 men, does it leave 5 men, as we see clearly in frame A, or 6? We cannot really tell, since what we are really left with are only parts of men. This is another key to the illusion.
Now we are ready to understand how the illusion works. In the process, we will better understand how illusions in general work: they play on assumptions that we take for granted but that are not reliable in all cases.
The question “Where does the extra man come from?” is disingenuous, because the image is made up of pixels, not men. The illusion works because it shifts pixels, not men. The only place the men exist is in our perception.
The image is low resolution, which allows the illusionist to get away with tricks that would not work in high resolution. As computer users, we are used to low-resolution figures, in which a few lines and dots represent a human face or figure to us. Consider that the mere combination of a colon and a right parenthesis is recognizable to most of us as a smiley face.
The generally scruffy and unusual appearance of the men is another technique for diverting our attention from the distorted appearance of many of the faces and other body parts that occur in the image.
Take a close look at frame B. Notice the figure on the extreme left. He has had the top of his head removed, yet we still think of this figure as a man.
Look at the face of the second figure from the right. In frame B, his head appears low, with the line representing his chin well down onto his chest. Yet in frame A, this same line does not represent a chin, but the boundary of a tank shirt.
If we measured the heights of the men in frame A and added them up, we would find that the total matches the corresponding total for frame B. This is consistent with no pixels having been created or destroyed. Because the same total is shared among 13 figures instead of 12, it also means that the average height of the figures in frame B is smaller than in frame A.
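The averaging argument can be checked with a little arithmetic. The sketch below assumes, as the text states, that the combined height of the figures is the same in both frames. The variable names are illustrative, and the total is normalized to 1 because only the ratio matters.

```python
from fractions import Fraction

# Assumption: the combined height of all figures is the same in both frames.
# It is normalized to 1 here, since only the ratio of the averages matters.
total_height = Fraction(1)

avg_height_a = total_height / 12  # 12 figures in frame A
avg_height_b = total_height / 13  # 13 figures in frame B

# How tall the average frame-B figure is relative to the average frame-A figure.
ratio = avg_height_b / avg_height_a
shrink_percent = float((1 - ratio) * 100)

print(ratio)                     # 12/13
print(round(shrink_percent, 1))  # 7.7
```

On this assumption the average figure in frame B is only about 8 percent shorter than in frame A, a difference small enough to escape casual notice.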
If we were allowed to move pixels around as much as we wanted, we could easily make one face or body into two, just by making the images smaller. In that case the size difference between the two sets of images would be obvious, and we would not be fooled into believing that a new man had somehow been created out of nothing. But when we change 12 men into 13, the size difference is not as noticeable, and we are more easily fooled.
Another factor that keeps us from noticing the size difference is the delay of several seconds in the transitions between frames A and B. It is easy for us to forget exactly how big the figures were in the preceding frame.
The 13 images of men are created out of 12 by moving pixels, which naturally changes the average size of the figures. By itself this is not remarkable. The genius of this illusion is that it accomplishes this transformation in a single move, that the figures are recognizably human, that we do not notice the changes in average size, and that we do not mind the distortions in the image.
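The principle of getting one more figure from the same material can be illustrated with a simplified one-dimensional analog. This is not the geometry of the actual image, which swaps two regions and uses irregular figures. It is a hypothetical toy model in which 12 identical bars are cut along a diagonal and the upper pieces slide over by one slot. The function name `slide_upper_pieces` and the bar heights are assumptions made for illustration.

```python
def slide_upper_pieces(n_bars):
    """Cut n_bars equal bars along a diagonal and slide the upper pieces one slot.

    Hypothetical toy model: each bar is n_bars + 1 units tall. The cut leaves
    bar i (1-based) with a lower piece of i units and an upper piece of
    n_bars + 1 - i units.
    """
    bar_height = n_bars + 1
    lower = [i for i in range(1, n_bars + 1)]
    upper = [bar_height - i for i in range(1, n_bars + 1)]

    # Before the move, every lower piece sits under its own upper piece.
    before = [lo + up for lo, up in zip(lower, upper)]

    # Slide every upper piece one slot to the left: the first slot now holds
    # only an upper piece, and the last slot holds only a lower piece.
    shifted_lower = [0] + lower
    shifted_upper = upper + [0]
    after = [lo + up for lo, up in zip(shifted_lower, shifted_upper)]
    return before, after


before, after = slide_upper_pieces(12)
print(len(before), set(before), sum(before))  # 12 {13} 156
print(len(after), set(after), sum(after))     # 13 {12} 156
```

In this toy model a single move turns 12 bars of 13 units into 13 bars of 12 units, and the total length is unchanged. The real illusion is harder to see through because the figures are irregular, so the loss in size is spread unevenly and hidden in distorted faces and bodies.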