Thanks for participating in this HIT!

You will view an image, potentially generated by an AI image generator.


Please answer all of the following questions to the best of your ability. For each image you may be asked about:

List of objects: What objects are you able to discern in the image?
Number of objects: How many objects of a particular type are visible, either fully or partially?
Color of objects: What color(s) would you consider the object(s) to be?
Realism: How realistic or structurally correct are the objects?
Relative position: What is the relative position in the image of one object to another?
Caption fit: How well does the image correspond to a provided caption?

A few notes:

  • Take a look at the examples if any of the questions are unclear to you.
  • Complete question 1 before moving on to other questions. If you are unable to recognize a particular object, guess what it is or use a brief description.
  • Additional questions may appear based on your previous answers. Make sure to answer all questions shown.
  • When asked about color, select at least one color that best matches the true color of the object(s).
  • Do not consider color when deciding how realistic an object is; focus on the shape and form compared to real life.
  • When asked about position, an object can be both above/below and to the left/right of another object. Mark position from your perspective looking at the 2D image, not from the camera's perspective.
  • The image may be just a black square, in which case you should answer that there are no objects.
  • Please answer with care: Some HITs will be checked by hand, and work may be rejected if there are too many errors.

Example #1:

Question 1: Briefly list all the objects visible in the image, separated by commas. If there are no objects in the image, write "none".


Question 2a: How many books are in the image?

I marked because there are clearly far more than 4 books in the image.

Question 2b: What color of books are in the image? Choose all that apply.

I marked , , , and because there are books of all these colors in the image.

Question 2c: How realistic are the books in the image? (1–3) Ignore the color of the object, and focus on any visible defects in the shape or structure.

Most of the books are well-drawn, with just the green book having a visual glitch, so I chose .

Question 3: How well does the image fit the caption below? (1–4) "Elements" refer to individual objects or attributes specified by the text.
a photo of four books
The image:

I chose because there were clearly far more books in the image than the four the caption specifies.

Example #2:

Question 1: Briefly list all the objects visible in the image, separated by commas. If there are no objects in the image, write "none".


Question 2a: How many couches are in the image?

There is very clearly one couch in the image, so I marked 1.

Question 2b: What color of couches are in the image? Choose all that apply.

Since the couch is almost entirely green, with just one orange pillow, I marked green.

Question 2c: How realistic are the couches in the image? (1–3) Ignore the color of the object, and focus on any visible defects in the shape or structure.

I do not notice anything unrealistic or incorrect about the shape or form of the couch, so I marked .

Question 3a: How many umbrellas are in the image?

While the orange object in the image could be an umbrella, I thought it was a kite, so I marked 0.

Question 4: How well does the image fit the caption below? (1–4) "Elements" refer to individual objects or attributes specified by the text.
a photo of a green couch and an orange umbrella
The image:

The image depicts a green couch and an orange object, as specified, but it is unclear whether the object is an umbrella, so I marked .

Example #3:

Question 1: Briefly list all the objects visible in the image, separated by commas. If there are no objects in the image, write "none".


Question 2a: How many scissors are in the image?

I guessed that the object in the image is meant to be scissors, so I chose 1.

Question 2b: What color of scissors are in the image? Choose all that apply.

The primary colors of the scissor handles are red and orange, so I marked red and orange.

Question 2c: How realistic are the scissors in the image? (1–3) Ignore the color of the object, and focus on any visible defects in the shape or structure.

While the blade is recognizable, the object does not at all resemble a real pair of scissors, so I marked .

Question 3a: How many birds are in the image?

There are clearly no birds in the image, so I marked 0.

Question 4: How well does the image fit the caption below? (1–4) "Elements" refer to individual objects or attributes specified by the text.
a photo of a scissors and a bird
The image:

It seems there was an attempt at drawing scissors, but it is unrealistic, and there is definitely no bird, so I chose .

Example #4:

Question 1: Briefly list all the objects visible in the image, separated by commas. If there are no objects in the image, write "none".


Question 2a: How many skateboards are in the image?

There is one skateboard in the image.

Question 2b: What color of skateboards are in the image? Choose all that apply.

The skateboard is a tan color, which seemed closest to brown to me, so I chose brown.

Question 2c: How realistic are the skateboards in the image? (1–3) Ignore the color of the object, and focus on any visible defects in the shape or structure.

The skateboard has the right shape and parts, but the wheels and axles are clearly wrong, so I chose .

Question 3a: How many birds are in the image?

There is one bird in the image.

Question 3b: What color of birds are in the image? Choose all that apply.

The bird is primarily .

Question 3c: How realistic are the birds in the image? (1–3) Ignore the color of the object, and focus on any visible defects in the shape or structure.

The bird is mostly realistic, even though the eyes and beak look more metallic than in real life, so I chose .

Question 4a: Is the bird to the left or right of the skateboard? Provide your answer as a viewer of the image, not from the "camera's perspective".

Question 4b: Is the bird above or below the skateboard? Provide your answer as a viewer of the image, not from the "camera's perspective".

The bird is standing on roughly the middle of the skateboard, so it is neither to the left nor to the right of the skateboard, but it is definitely above the skateboard.

Question 5: How well does the image fit the caption below? (1–4) "Elements" refer to individual objects or attributes specified by the text.
a photo of a bird below a skateboard
The image:

There is a bird and a skateboard, but the bird is not below the skateboard as the caption states, so I marked .

Annotation task
Question 1: Briefly list all the objects visible in the image, separated by commas. If there are no objects in the image, write "none".

Please complete question 1 before continuing.

Question 2a: How many ${object_0_name_plural} are in the image?

Question 2b: What color of ${object_0_name_plural} are in the image? Choose all that apply.

Question 2c: How realistic are the ${object_0_name_plural} in the image? (1–3) Ignore the color of the object, and focus on any visible defects in the shape or structure.

Question 3a: How many ${object_1_name_plural} are in the image?

Question 3b: What color of ${object_1_name_plural} are in the image? Choose all that apply.

Question 3c: How realistic are the ${object_1_name_plural} in the image? (1–3) Ignore the color of the object, and focus on any visible defects in the shape or structure.

Question 4a: Is the ${object_1_name} to the left or right of the ${object_0_name}? Provide your answer as a viewer of the image, not from the "camera's perspective".

Question 4b: Is the ${object_1_name} above or below the ${object_0_name}? Provide your answer as a viewer of the image, not from the "camera's perspective".

Question 5: How well does the image fit the caption below? (1–4) "Elements" refer to individual objects or attributes specified by the text.
${caption}
The image:

(Optional) Please let us know if anything was unclear, if you experienced any issues, or if you have any other feedback for us.



You may not participate in the HIT if you are affiliated with UW or this research project.