Tuesday, 12 February 2013

Gibson: Ecological approach


The central tenet of Gibson’s approach is that “... the world is its own best
representation.” Gibson’s work is especially interesting because it complements
the role of perception in IRM and is consistent with the action-perception
cycle. Gibson postulated (and proved) the existence of affordances.
Affordances are perceivable potentialities of the environment for an action. For example,
to a baby arctic tern, the color red is perceivable and represents the potential
for feeding. An affordance can therefore be seen as a more formal way of defining the
external stimulus in an IRM. But like IRMs, an affordance is only a potential;
it doesn't count until all the other conditions are satisfied (the baby tern is
hungry). An affordance can also be the percept that guides the behavior. The
presence of red releases the feeding behavior in a hungry baby arctic tern,
and that feeding behavior consists of pecking at the red object. So in this case,
red is both the percept that releases the action and the percept that guides it.
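
To make that releaser-plus-guidance idea concrete, here is a minimal sketch of how a reactive robot program might mirror the tern's feeding behavior. Everything in it is illustrative: the color thresholds, the hunger flag, and the "peck at the centroid" rule are assumptions for the sketch, not anything from Gibson or from real terns.

```python
import numpy as np

def find_red(image):
    """Segment 'red enough' pixels in an RGB image; thresholds are illustrative."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return (r > 150) & (g < 80) & (b < 80)

def feeding_behavior(image, hungry):
    """Red as affordance: it releases pecking only when the internal condition
    (hunger) also holds, and the same red region then guides where to peck."""
    mask = find_red(image)
    if not hungry or not mask.any():
        return None                        # potential present, but not acted on
    ys, xs = np.nonzero(mask)
    return (int(xs.mean()), int(ys.mean()))  # aim the peck at the red blob's centroid
```

The "bigger red = better" preference discussed below could be added by weighting candidate regions by their area.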

Gibson referred to his work as an “ecological approach” because he believed
that perception evolved to support actions, and that it is silly to try
to discuss perception independently of an agent’s environment, and its survival
behaviors. For example, a certain species of bee prefers one particular
type of poppy. For a long time, scientists couldn't figure out how the
bees recognized that type of poppy, because as far as color goes it was indistinguishable
from another type of poppy that grows in the same area. Smell?
Magnetism? Neither. They looked at the poppy under UV and IR light. In
the non-visible bands that type of poppy stood out from other poppy species.
And indeed, the scientists were able to locate retinal components sensitive
to that bandwidth. The bee and poppy had co-evolved, where the poppy’s
color evolved to a unique bandwidth while at the same time the bee’s retina
was becoming specialized at detecting that color. With a retina “tuned” for
the poppy, the bee didn't have to do any reasoning about whether there was
a poppy in view and, if so, whether it was the right species. If that color was
present, the poppy was there.
Fishermen have exploited affordances since the beginning of time. A fishing
lure tries to emphasize the key aspects of a fish's desired food, presenting
the strongest stimulus possible: if the fish is hungry, the stimulus of
the lure will trigger feeding. To a human, fishing lures often look
almost nothing like the bait they imitate.
What makes Gibson so interesting to roboticists is that an affordance is directly
perceivable. Direct perception means that the sensing process doesn't
require memory, inference, or interpretation. This means minimal computation,
which usually translates to very rapid execution times (near instantaneous)
on a computer or robot.
But can an agent actually perceive anything meaningful without some
memory, inference, or interpretation? Well, certainly baby arctic terns don’t
need memory or inference to get food from a parent. And they’re definitely
not interpreting red in the sense of: “oh, there’s a red blob. It’s a small oval,
which is the right shape for Mom, but that other one is a square, so it must
be a graduate ethology student trying to trick me.” For baby arctic terns, it’s
simply: red = food, bigger red = better.
Does this work for humans? Suppose you are walking down the hall and somebody
throws something at you. You will most likely duck. You probably
ducked without even recognizing the object, although later you may determine it
was only a foam ball. The response happens too fast for any reasoning: "Oh
look, something is moving towards me. It must be a ball. Balls are usually
hard. I should duck." Instead, you probably used a phenomenon so basic that
you haven't noticed it, called optic flow. Optic flow is a neural mechanism
for determining motion. Animals can determine time to contact quite easily
with it. You are probably somewhat familiar with optic flow from riding in
a car: objects straight ahead seem to be in clear focus, while the side of the
road is a little blurry from the speed. The point in space
that the car is moving toward is the focus of expansion. From that point outward,
there is a blurring effect. The more blurring on the sides, the faster the car is
going. (Science fiction movies use this effect all the time to simulate faster-than-light
travel.) That pattern of blurring is known as a flow field (because
it can be represented by vectors, like a gravitational or magnetic field). It is
straightforward, neurally, to extract the time to contact, represented in the
cognitive literature by the symbol τ.
Gannets and pole vaulters both use optic flow to make last-minute, precise
movements as reflexes. Gannets are large birds which dive from high
altitudes after fish. Because the birds dive from hundreds of feet up in the
air, they have to use their wings as control surfaces to direct their dive at the
targeted fish. But they are plummeting so fast that if they hit the water with
their wings open, the hollow bones will shatter. Gannets fold their wings just
before hitting the water. Optic flow allows the time to contact, τ, to act as a stimulus:
when τ dwindles below a threshold, fold those wings!
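
The neural computation is elegant, but the arithmetic behind τ is simple: for an object being approached at a roughly constant speed, the time to contact equals the object's apparent size divided by the rate at which that size is growing. Below is a rough sketch of a gannet-style releaser built on that relation; the size measurements and the 0.4-second threshold are made-up illustrative values, not data from real gannets.

```python
def time_to_contact(size_prev, size_now, dt):
    """tau ~ apparent size / rate of expansion ('looming'), valid for a
    roughly constant closing speed. 'size' can be any image-plane measure
    of the target, e.g. the apparent diameter of the water surface."""
    growth = (size_now - size_prev) / dt
    if growth <= 0:
        return float("inf")       # not expanding, so no impending contact
    return size_now / growth

TAU_FOLD = 0.4   # seconds; an illustrative threshold, not measured from gannets

def dive_reflex(size_prev, size_now, dt):
    """Release the wing fold once tau drops below the threshold."""
    tau = time_to_contact(size_prev, size_now, dt)
    return "fold wings" if tau < TAU_FOLD else "keep steering"
```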

Pole vaulters also make minute adjustments in where they plant their pole
as they approach the bar. This is quite challenging given that the vaulter
is running at top speed. It appears that pole vaulters use optic flow rather
than reasoning (slowly) about the best place to plant the pole. (Pole vaulting
isn’t the only instance where humans use optic flow, just one that has been
well-documented.)
In most applications, a fast computer program can extract an affordance.
However, this is not the case (so far) with optic flow. Neural mechanisms in
the retina have evolved to make the computation very rapid. It turns out that
computer vision researchers have been struggling for years to duplicate the
generation of an optic flow field from a camera image. Only recently have
we seen algorithms that run in real time on regular computers [48]. The
point is that affordances and specialized detectors can be quite challenging
to duplicate in computers.
Affordances are not limited to vision. A common affordance is knowing
when a container is almost filled to the top. Think about filling a jug with
water or the fuel tank of a car. Without being able to see the cavity, a person
knows when the tank is almost filled by the change in sound. That change
in sound is directly perceivable; the person doesn’t need to know anything
about the size or shape of the volume being filled or even what the liquid is.
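
One way to mimic that auditory affordance on a robot is to track the pitch of the filling sound: as the air column above the liquid shortens, its resonant frequency climbs, and it climbs fastest just before the container is full. The sketch below assumes a stream of short microphone frames; the FFT-peak pitch tracker and the rise threshold are simplifications that would need tuning for a particular container and microphone.

```python
import numpy as np

def dominant_frequency(frame, sample_rate):
    """Strongest frequency component in one short audio frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

def nearly_full(frames, sample_rate, rise_threshold=30.0):
    """Flag 'nearly full' when the pitch starts climbing quickly
    (rise_threshold is in Hz per frame and purely illustrative)."""
    pitches = [dominant_frequency(f, sample_rate) for f in frames]
    rises = np.diff(pitches)
    return len(rises) > 0 and rises[-1] > rise_threshold
```

Like the human listener, the program needs no model of the container's shape or of the liquid; only the change in sound matters.
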
One particularly fascinating application of affordances to robotics, which
also serves to illustrate what an affordance is, is the research of Louise Stark
and Kevin Bowyer [135]. A seemingly insurmountable problem in computer
vision has been to have a computer recognize an object from a picture. Literally,
the computer should say, “that’s a chair” if the picture is of a chair.
The traditional way of approaching the problem has been to use structural
models. A structural model attempts to describe an object in terms of its physical
components: "A chair has four legs, a seat, and a back." But not all chairs fit
the same structural model. A typing chair has only one leg, with supports
at the bottom. Hanging basket chairs don't have legs at all. A bench seat doesn't
have a back. So clearly the structural approach has problems: instead of one
structural representation, the computer has to have access to many different
models. Structural models also lack flexibility. If the robot is presented with a
new kind of chair (say someone has designed a chair to look like your toilet
or an upside down trash can), the robot would not be able to recognize it
without someone explicitly constructing another structural model.
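
The brittleness of the structural approach is easy to see in code. A structural recognizer is essentially a checklist of parts, and any chair that deviates from the canonical parts list fails the check (the part labels below are hypothetical):

```python
def is_chair_structural(parts):
    """Structural recognition: count the canonical components of a chair."""
    legs = parts.count("leg")
    return legs == 4 and "seat" in parts and "back" in parts

# A typing chair (one pedestal) and a bench (no back) both fail,
# even though each clearly supports sitting:
print(is_chair_structural(["leg", "seat", "back"]))    # False
print(is_chair_structural(["leg"] * 4 + ["seat"]))     # False
```
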
Stark and Bowyer explored an alternative to the structural approach called
GRUFF. GRUFF identifies chairs by function rather than form. Under Gibsonian
perception, a chair should be a chair because it affords sitting, or serves
the function of sitting.

Stark and Bowyer represented sittability as a reasonably level and continuous
surface which is at least the size of a person’s butt and at about the
height of their knees. (Everything else like seat backs just serve to specify
the kind of chair.) Stark and Bowyer wrote a computer program which accepted
CAD/CAM drawings from students who tried to come up with nonintuitive
things that could serve as chairs (like toilets, hanging basket chairs,
trash cans). The computer program was able to correctly identify sittable
surfaces that even the students missed.
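
A toy version of that functional test might look like the following. GRUFF itself reasons over full CAD/CAM models; here the candidate surface is reduced to four numbers, and every threshold is an illustrative guess at "butt-sized and knee-high," not a value from Stark and Bowyer's system.

```python
from dataclasses import dataclass

@dataclass
class Surface:
    height: float    # metres above the floor
    width: float     # metres
    depth: float     # metres
    tilt_deg: float  # deviation from horizontal, in degrees

def affords_sitting(s: Surface) -> bool:
    """Functional recognition: roughly knee-high, roughly level, person-sized."""
    knee_high  = 0.35 <= s.height <= 0.60
    level      = s.tilt_deg <= 10.0
    big_enough = s.width >= 0.35 and s.depth >= 0.30
    return knee_high and level and big_enough

# An upturned trash can or a flat rock passes the same test as an office chair:
print(affords_sitting(Surface(height=0.45, width=0.5, depth=0.4, tilt_deg=3.0)))  # True
```
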
It should be noted that Stark and Bowyer are hesitant to make claims about
what this says about Gibsonian perception. The computer vision algorithm
can be accused of some inference and interpretation (“that’s the seat, that’s
the right height”). But on the other hand, that level of inference and interpretation
is significantly different from that involved in trying to determine the
structure of the legs, etc. And the relationship between seat size and height
could be represented in a special neural net that could be released whenever
the robot or animal got tired and wanted to sit down. The robot would start
noticing that it could sit on a ledge or a big rock if a chair or bench wasn’t
around.



