Draft. Please do not quote without permission. Comments welcome.

Psychophysical Methods and the Evasion of Introspection

M. Chirimuuta

Dept. History & Philosophy of Science
1017 Cathedral of Learning
4200 Fifth Avenue
University of Pittsburgh, Pittsburgh, PA 15260

mac289@pitt.edu

Abstract

While introspective methods went out of favour with the decline of Titchener’s analytic school, many important questions concern the rehabilitation of introspection in contemporary psychology. Hatfield (2005) rightly points out that introspective methods should not be confused with analytic ones, and goes on to describe their “ineliminable role” in perceptual psychology. Here I argue that certain methodological conventions within psychophysics reflect a continued uncertainty over appropriate use of subjects’ perceptual observations and the reliability of their introspective judgements.

My first claim is that different psychophysical methods do not rely equally on the introspective capabilities of experimental subjects. I contrast “minimally-introspective” tasks with “introspection-heavy” ones. It is only in the latter, I argue, that introspection can be said to have a non-trivial role in the subjects’ performance. My second claim is that my rough-and-ready distinction maps onto a number of important “dichotomies” in vision science (Kingdom and Prins 2009). Not coincidentally, the introspection-heavy categorisation captures many of the tasks typically considered less able to yield useful information regarding the processes underlying visual sensation.

1. Introduction

Recent work on introspection in psychology has been careful to separate the specific commitments of Titchener’s analytical school from the discussion of introspection more generally. For example, Hatfield (2005) defines introspection broadly as, “a mental state or activity in or through which persons are aware of properties or aspects of their own conscious experience” (p. 260). He later defines introspection as, “deliberate and immediate attention to certain aspects of phenomenal experience,” arguing that, “it continues to be used as a source of evidence in perceptual and cognitive psychology” (p. 279). In this paper I will challenge the appropriateness of Hatfield’s definitions in the branch of perceptual psychology known as psychophysics [1], offering an alternative account.

[1] Psychophysics is defined by Gescheider (1997) as “the scientific study of the relation between stimulus and sensation.” The disciplinary demarcation between psychophysics and perceptual psychology more generally has become somewhat blurry in recent years, with many experiments that are classified as psychophysical dealing with complex perceptual states, not just simple sensation.

Figure 1
A Metameric Match Experiment. The subject is asked to adjust the intensities of R, G and Y monochromatic lights so that the yellows are indistinguishable.

Hatfield discusses the psychophysical task of metameric colour matching (see fig. 1) as one example of a perceptual experiment reliant on introspection (p. 278). However, it is reasonable to question whether Hatfield’s definitions can effectively target those activities which are introspective, or if they are too permissive and encompass a range of activities ordinarily thought of as just perceptual and not reliant on introspection. First, though, there is an important exegetical question over how to understand Hatfield’s claim that introspection is a source of evidence in experiments such as the metameric matching one.

One possible reading, which I reject, is that Hatfield just points to the fact that psychophysics, unlike behaviourist psychology, assumes and moreover requires that experimental subjects have conscious perceptual experiences [2]. For the mission of psychophysics, an experimental approach to the mind inaugurated by Fechner (1860), is to chart and measure the physical energies needed to elicit specific conscious perceptual states. But I strongly doubt that Hatfield intends to characterise methods in psychology as introspective purely in terms of a contrast with the behaviourist research program. In fact, Hatfield is in agreement with Danziger (1980) that our understanding of introspection in psychology has been distorted by the behaviourist reaction to Titchener’s analytical school.

[2] Not to say that the behaviourist psychologists all assumed that human beings were unconscious zombies, but that their experimental methods were indifferent to the presence or absence of consciousness.

What is more, the mere having of conscious states is a different thing from the possession of some ability to report reliably on the nature of those states. It is the latter capacity that is typically identified with introspection. For example, Schwitzgebel’s (2011) sceptical case against introspection does not target the idea that we have conscious states, or that those states are important to our mental economy, but is concerned to argue that, to a greater extent than we care to admit, the contents of those states are indeterminate or unknown to us. My interpretation of Hatfield’s notion of introspection will hinge on this point.
I understand his claim that contemporary perceptual psychology relies on introspective evidence to be the claim that psychologists exploit their subjects’ introspective ability in order to glean information about the human perceptual system, and furthermore it is a presupposition of this experimental practice that subjects are competent introspectors in the sense that they are capable of giving verbal or motor responses which reliably indicate the presence or absence of particular features of their conscious experiences. For example, a psychophysical experiment which measures the absolute detection threshold for a dim spot of light is said to be reliant on the subject’s capacity to introspect in the sense that her subjective awareness of the spot is a crucial data point that the experimenter has access to because of the subject’s capacity to introspect. And thus the experimenter must assume that the subject can faithfully indicate those times that the spot enters her conscious field of view.

Yet a problem with this account is that it is not clear how it can be employed to distinguish introspection from ordinary perception, for doesn’t the subject’s activity in the detection task just boil down to her looking for a dim spot of light? This worry could prompt us to take Hatfield as endorsing a more restrictive definition. For Hatfield also suggests that what characterises introspection over ordinary perception is that one attends to one’s experience of an object, not just the object itself (p. 279, “immediate attention to… phenomenal experience”).
This makes introspection importantly different from perception because, as many would have it, perception is generally “transparent” and our perceptual encounter with the world is not interrupted with moments of attention to experience itself. The difficulty with this reading is that it then becomes unclear how the more restrictive definition of introspection could apply to many of the psychophysical tasks that Hatfield wants it to apply to, such as stimulus detection and the metameric matching experiment. Subjects perform such tasks by directing their attention to external stimuli, namely the coloured lights, and need not attend to their own phenomenal experience, qua experience. Nor do they need to consider their experience in a more fine-grained or detailed way than in ordinary perception.

In short, the problem is that while Hatfield’s restrictive definition has the virtue of allowing one to demarcate introspection from perception, it cannot reasonably be applied to the range of psychophysical tasks that Hatfield claims it applies to. And furthermore a case could be made that it should not apply to any perceptual experiment, since these generally involve attention to external objects, not attention to phenomenal experience itself. Yet the more liberal definition makes all perceptual activity concurrently introspective in a somewhat trivial sense.

It strikes me that a different approach to defining introspection, in the context of psychophysics, is needed, one that does not characterise introspection in terms of an object of attention or focus of awareness.
In this paper I propose that the tasks that should be said to involve introspection are the ones which rely on experimental subjects’ capacity to analyse and compare sensory experiences that bear non-obvious relationships of similarity and difference to each other. Thus on my account introspection can be part of the process of perceiving and attending to an external object, and need not be overtly directed at phenomenal experience. The subject may interpret her task to be simply that of attending to the external stimulus, but she can be reporting on aspects of her phenomenal experience nonetheless. It is also a feature of my view that the extent to which tasks rely on introspection is a matter of degree. In the next section I give a set of examples of common psychophysical tasks that are either “introspection-heavy” or “minimally-introspective”. In the third section I describe how the cluster of introspection-heavy tasks – though not described in this way by scientists themselves – has commonly attracted suspicion from psychophysicists as being less likely to produce data that is “objective” and informative about neural mechanisms. I ask whether this is mere coincidence, or if the methodological norms of psychophysics reflect a certain wariness towards introspection.

2. Introspection in Psychophysics as Controlled Comparison

2.1 Examples Of Introspectively Demanding and Undemanding Psychophysical Tasks.

The metameric match paradigm, illustrated in figure 1, has been used to diagnose specific types of colour vision deficiency since the late 19th century.
Differences in the number of retinal cone types an individual has, and the spectral sensitivities of those cone classes, lead to measurable differences in the proportion of red to green in a composite light that he or she judges to look identical to a yellow monochromatic standard. Note that in this task the only perceptual judgment that the subject need make is over whether the composite and monochromatic light are visually indistinguishable. If the lights are presented as abutting (as in fig. 1) then the subject simply has to judge whether or not the colour field is homogeneous. No attention to the specific qualities of the perceived colour is required.

Figure 2
An Asymmetric Match Experiment. The subject is asked to adjust the proportions of R and G monochromatic lights so that the yellows match in hue. The intensity of the Y light is fixed, so the yellows cannot be matched for brightness and are therefore distinguishable even when hue is judged to be equivalent.

Contrast this task with an asymmetric match paradigm (fig. 2). In this case the two central stimuli are not matched for luminosity but the subject must say whether or not they match in hue regardless of their visible difference in brightness. This requires that the subject analyse her experience of the two colours in terms of separate dimensions of hue and brightness, and make a judgment as to the identity of just one of these dimensions, disregarding the difference in the other.
Thus the subject must make a series of comparisons between pairs of stimuli in order to find the pair that holds the unique but non-obvious relationship of sameness of hue. This relationship is non-obvious in that it is not marked by a simple defining characteristic like a homogeneous spatial profile.

It should be fairly intuitive that this task is “introspection-heavy” in a way that the metameric matching task is not. The contrast between these two tasks points us to the central intuition behind my new characterisation of introspection. The idea is that the metameric matching task is “minimally-introspective” because it can be performed without any careful comparison of the phenomenal qualities one experiences on presentation of the two stimuli. The metameric paradigm relies on introspection only in the minimal sense that it assumes the subject can know and reliably report when her conscious visual field is homogeneous with respect to colour [3]. The asymmetric matching task, on the other hand, is “introspection-heavy” because it does require this careful comparison of sensory experiences that bear non-obvious relationships of similarity and difference to each other.

[3] i.e. relies on introspection defined in the first, permissive sense. To reiterate the discussion of section 1, the problem with the minimal notion of introspection is that it cannot distinguish introspection from ordinary perception.

Asymmetric matching paradigms have been used to study achromatic perception of lightness and darkness (fig. 3a; see Gilchrist 2004) and to study colour constancy. Figure 3b gives an example of an asymmetric task in which the observer views a scene under two different lighting conditions. She is instructed to adjust the colour of the central patch in one image until it looks as if made from the same paper as the central patch in the other (Foster, 2011). Importantly, even when the patches are matched there will still be a visible difference in colour between them, and the experiment relies on the subject having a clear sense of what sameness of material would look like in spite of these differences. Again, the task is “introspection-heavy” in comparison to a task in which the subject just has to report on the absolute identity or distinguishability of two stimuli. In particular, it relies on the subject’s ability to make a “judgment call” on the one best match, given a range of close contenders which vary along a number of different dimensions. I describe the introspection-heavy tasks as requiring controlled comparison because the demand placed on the subject is to perform some kind of analysis and comparison, but within parameters that are pre-specified by the experimenter.

Figure 3
(a) Achromatic asymmetric match experiment where black annulus influences perceived brightness of one of the circles. Subject is asked to determine point of subjective equality of the brightness of the two circles.
(b) Asymmetric colour constancy experiment.
Subject is asked to adjust the colour of one of the patches (marked with arrow) until it looks as if it is made from the same paper as the other. (From Foster 2011, permission needed.)

Another kind of paradigm that intuitively fits the idea of controlled comparison is a rating scale task. In a series of experiments published recently (To et al 2008, 2010, Tolhurst et al 2010) subjects were presented with nearly 300 pairs of photographs – an original and a modified version – and were asked to rate how similar the pairs were on a scale from 0 (completely identical) to any arbitrarily high value (see fig. 4a). In one of these publications, Tolhurst and colleagues (2010) also present results of a simple two-alternative-forced-choice (2-AFC) contrast discrimination experiment in which subjects just had to report which of a pair of otherwise identical photographs contained a small, high contrast central patch (see fig. 4b). They then apply their model of contrast discrimination to the rating scale data.
The rating scale task falls under my introspection-heavy category, while the contrast discrimination task is minimally-introspective. In the former, the subject must make a judgement as to the relative similarity of a large number of pairs of stimuli that differ in different ways, whereas in the latter task she detects the presence or absence of a high-contrast patch in a rather automatic fashion. Figure 4 illustrates how similar stimuli can be used in these two very different experiments, so it is not complexity of stimulus per se that determines how introspectively demanding the task is. Rather, the determining factor is the nature of the response that the subject must make to the stimulus. That is, whether the response is simply a choice between saying the high contrast patch appeared first or second out of two stimuli, or if it calls for a more careful examination of the perceived properties of the stimuli.

Figure 4
Example of stimuli used by Tolhurst et al. 2010.
(a) Rating scale task. For each of 294 image pairs, subjects were asked to rate how similar or different they appeared on a numerical scale of their own devising. (permission needed)
(b) 2-AFC contrast discrimination. Subjects had to report if the high contrast central patch appeared in the first or second stimulus.

Before moving on, I would like to emphasise that my two categories are intended to reflect a qualitative difference in how introspectively demanding these tasks are, and that I will say nothing in this paper about how to quantify this difference, or about how it is that introspective demands admit of degree. For example, the question of whether or not metameric matching is even less introspectively demanding than contrast discrimination will be left unanswered. It seems plausible that introspective demands, like attentional demands, come in degrees but I offer no suggestions of how one might measure this. It is also plausible that there will be some tasks that occupy middle ground between my categories and are hard to classify either way. I will not deal with such cases here. My aim in presenting a set of tasks that are intuitively more reliant on introspection than the others has been to highlight one way that introspection may be said to play a role in perceptual psychology, and to this end I have focused on the most clear-cut cases.

2.2 Other Classifications of Psychophysical Tasks

One of the attractive things about psychophysics as a subject for philosophy of science is the fact that throughout its short history methodological questions about the best way to measure sensory responses have been debated in a perspicacious way by leading protagonists. Moreover, such controversies still resonate in the living memory of the discipline, and are recounted even in the most recent textbooks. One way in which methodological debates commonly unfold is with a distinction first being drawn between two broad classes of psychophysical techniques, and the relative merits of the two classes are subsequently discussed.

In their textbook, Kingdom and Prins (2009) devote a chapter to the “dichotomies” that have been most significant to psychophysicists past and present. The first of these, Brindley’s (1960, 1970) distinction between Class A and Class B observations, is particularly relevant to my account of introspection. Brindley characterised Class A observations as any tasks in which the observer just had to report on the absolute similarity or dissimilarity in the appearance of a pair of stimuli. For example, the measurement of the detection threshold for a spot of light is Class A because the subject need only indicate whether the trial in which the spot is present is distinguishable or not from the reference stimulus in which the spot is absent.
Likewise, the measurement of the discrimination threshold for the brightness of the spot is also Class A, as it just requires the subject to report if the trial in which the luminosity of the spot is increased looks different from the trial in which the luminosity remained at baseline. In contrast, Brindley (1970:133) categorised as Class B, “[a]ny observation that cannot be expressed as the identity or non-identity of two sensations…”; for example, “all those [observations] in which the subject must describe the quality or intensity of his sensations, or abstract from two different sensations some aspect in which they are alike.”

Brindley’s description of Class B observations is interchangeable with my characterisation of introspection-heavy tasks. Indeed, the tasks which I presented as examples of my minimally-introspective category – metameric matching and contrast discrimination – are Class A, whereas all kinds of asymmetric matching and rating scale tasks are Class B. In essence, both of these categorisation schemes can be understood as drawing a distinction between tasks in which the experimental subject is treated somewhat like a thoughtless measuring instrument, and methods that rely on the subject’s status as a critical being who can attend to and reflect on her own conscious states. The point is not that the A/minimally-introspective Class treats the subject as if unconscious, or that it requires the subject to have sensory capacities but not cognitive ones.
Rather, it is that the A/minimally-introspective Class makes no demands on any capacity for reflection on and comparison of occurrent sensory states, whereas tasks in the B/introspection-heavy Class do⁴.

To illustrate this, imagine a machine that can read off the conscious sensory states of a subject performing a contrast discrimination task. In order to predict the subject’s responses to any trial, all the machine must do is assign a number to the intensities of the subject’s experience of contrast for the central regions of the two different stimuli. If they have the same values the machine predicts the answer is ‘same’, and if they differ the machine predicts an answer of ‘different’. Once the non-trivial problem of reading off individuals’ phenomenal states is solved, the rest is uncomplicated! If an equivalent machine were to be built for the rating scale task, the blueprint could not be so simple. There is no one quality of the subject’s conscious experience of the stimuli that the machine could measure and use to predict the response. Rather, the machine would have to rely on some complicated model of how various differences in the experienced qualities of the images are weighted against each other to give an impression of greater or lesser degrees of similarity⁵. In other words, it would need a model of introspective comparison, not a simple measurement algorithm.

The mind-reading machine thought experiment again confronts us with the fact that the distinction being drawn is not between tasks that are in no way introspective and those that completely are. Rather, it is about the extent to which these tasks call upon some putative introspective capacity.
For the first machine, dealing with the contrast discrimination experiment, can still peer into the conscious states of observers, and this captures some minimal notion of introspective activity. Yet the second machine, dealing with the rating scale experiment, needs not only to determine what the subject experiences, but also to determine what the subject makes of her experience, what is more and less salient about the different qualities presented in her visual phenomenology. This is an introspective undertaking of a weightier kind.

4 In support of this idea that the key distinction in play here is between subject-as-measuring-instrument and subject-as-reflective-being, it is worth noting that Brindley’s one example of a psychophysical document explicitly hostile to Class B observations is the 1943 Optical Society of America (OSA) report that, as Stevens (1951:31) relates, “reduces psychophysics to the employment of a human observer as a null instrument under a set of strictly specified conditions.” And Brindley’s one example of a psychophysicist liberal with regard to Class B is Stevens (1951), who explicitly rejects the OSA definition as too narrow and restrictive (and cf. Helson 1949).

5 Interestingly, however, Tolhurst et al. (2010) can predict trends in the similarity rating data to a fair degree of accuracy with a model of entirely unconscious neuronal response functions. The fact that there is a “machine” that can predict responses to the contrast discrimination and rating scale experiments without peering into the conscious states of subjects should not detract from the fact that any hypothetical machine attempting to examine conscious states in order to predict responses would have to treat the two experiments differently.

It is hard to say how influential Brindley’s distinction has been. It came under immediate criticism from Boynton and Onley (1962) but was clearly accepted in some form by Marks (1978) and Teller (1984), and is discussed at length in Gescheider’s (1997) psychophysics textbook. Kingdom and Prins (2009, p.18) choose not to employ it as an overarching basis for classifying psychophysical experiments because of the problem that certain tasks cannot be classified as either A or B.

Kingdom and Prins’ preferred distinction is between tasks that measure performance and those that measure appearance, which they characterise in the following way:

“If the measurement can be meaningfully considered to be better under one condition than under another, then it is a performance measure, if not it is an appearance measure.” (p.22)

Performance tasks are those designed to chart perceptual “limits” (e.g. contrast discrimination, detection of a spot of light against a differently coloured background). An example of an appearance task is an experiment comparing the strength of the Müller-Lyer illusion with fin angles of 45° and 60°. Even if the length of the central bars appears to be more different when the fins are 45°, there is no sense in which the subject is “better” at the task in that condition.
So this Class B observation can also be said to be an appearance measure. Thus there is an overlap with my distinction: appearance tasks tend to be introspection-heavy, and performance tasks tend to be minimally-introspective. But the match is not as close as it is with Class A vs. B. In particular, the metameric match task that I classify as minimally-introspective turns out to be an appearance measure.⁶

6 A related dichotomy is the Type 1 vs. Type 2 distinction of Sperling et al. (1990). In Type 1 experiments the subject’s response may be either correct or incorrect with respect to some physical dimension of the stimulus (e.g. whether line 1 is more oblique than line 2). For Type 2 the experimenter cannot classify responses as correct or incorrect. Note again that the metameric match turns out to be Type 2, even though Class A/minimally-introspective.

3. Not All Psychophysical Methods Were Created Equal

All I have argued so far is that there is an intuitive way of differentiating psychophysical tasks that are more reliant on introspection from those that are not, and that my categorisations turn out to be roughly co-extensional with categorisations of tasks developed within the psychophysical tradition. The question now is what to make of this finding. Is it just a coincidence that the distinctions coincide?
It should come as no surprise to the reader that my next point will be that the categories that line up on the introspection-heavy side have tended to meet with more diffidence and suspicion from psychophysicists than those on the other side.

Brindley (1970) presents Class A observations as especially informative about the physiological mechanisms underlying perception because they can be used to test “psycho-physical linking hypotheses”, that is, hypotheses that two stimuli (e.g. yellow monochromatic light, and a certain mixture of red and green lights) will produce the same neural activity and hence the same sensation.

On the relative status of the two classes he writes that,

“The use of Class A observations as a basis for analysing the function of the eye and visual pathway is not controversial; every writer on vision admits, at least by implication, that they can be legitimately used. On the use of the kinds of observation here called Class B, there have been differences of opinion … The conservative opinion, in its most extreme form, is that only Class A observations are of any value, and in a discussion of visual mechanisms all Class B observations may be entirely disregarded.” (1970, p. 134)

Brindley himself takes this view to be too narrow, but is critical of Stevens’ (1951) “extreme liberal opinion” for failing to make the distinction. Later in the book, when discussing Hering’s opponent theory of colour, he writes as if it is still moot whether the kinds of phenomenological reports presented by Hering in support of his theory can actually be taken as evidence for a kind of colour mechanism (p.208).
One might think that this is all beside the point in a discussion about introspection, since the value of Class B observations was held in question not because they are introspection-heavy, but because of their failure to underwrite psychophysical bridge principles. But I do not think that this problem is so disconnected from the issue of introspection. For if Class B tasks were to be granted some supporting assumptions, like the ones offered for Class A, then one could equally say that they are informative of underlying neural mechanisms. For example, in the case of the asymmetric hue matching experiment, why not assume that when the hue sensations for the two stimuli are equal, this is evidence that there is a neural pathway somewhere between the photoreceptors and the cortex that conveys the same message in both cases? This would be a special case of the assumption made in support of inferences from Class A observations that, “whenever two stimuli cause physically indistinguishable signals to be sent from the sense organs to the brain, the sensations produced by these stimuli, as reported by the subject in words, symbols or actions, must also be indistinguishable” (Brindley 1970, p.133).

Yet Class B observations are treated differently. The reason for this difference is likely that Brindley and other theorists (e.g. Marks 1978) have been wary of attributing to subjects the kind of introspective powers that would be needed to analyse hue separately from all other sensory qualities, and determine exactly the point of equivalence of hue.
In other words, if these theorists had shared Titchener’s faith in the analytical acumen of introspection, they would have had no reason to treat Class B observations differently from Class A.

This pattern of unequal treatment can be seen not just in the discussion of Class A and B observations, but also with respect to the other dichotomies discussed by Kingdom and Prins. They note that it is fairly common for psychophysicists to refer to some tasks as more “objective” or “subjective” than others, with all the value-laden connotations of these terms. Kingdom and Prins explain this usage in the following way:

“All psychophysical experiments are in a trivial sense subjective, because they measure what is going on inside the head, and if this is the intended meaning of the term then the distinction is redundant⁷. The dichotomy is more often invoked, however, to differentiate between different types of psychophysical procedure. The distinction has been used variously to characterize Class A versus Class B observations, tasks for which there is versus tasks for which there is not a correct and an incorrect response⁸, forced-choice versus non-forced-choice procedures, and criterion-dependent versus criterion-free procedures.” (pp. 18-19)

The notion of subjectivity at play here is encapsulated in the idea that experiments are subjective if they are introspection-heavy. For all the tasks on the wrong side of the subjective-objective tracks are ones which rely on the subject’s judgments concerning the appearance of the stimuli, involving complex comparisons which cannot be independently verified by examining the physical properties of the stimuli themselves.
To conclude, there is a sense in which the title of this paper is misleading. I have not shown that psychophysicists have avoided using experimental methods more reliant on introspection, or that the use of such methods has always been questioned. Indeed, when Kingdom and Prins write that, “Both performance-based and appearance-based experiments are important to our understanding of vision. Measures from both types of experiments are probably necessary to fully characterize the system” (p.26), they are articulating a methodological pluralism that many psychophysicists would endorse. However, the crucial point is that the methods on the wrong side of the divide, those more reliant on introspection, continue to need their advocates, whereas those on the other have been accepted without question. This is an indication of the contested status of introspection within the psychophysics tradition.

7 Cf. the worry discussed above that all psychophysical experiments rely on introspection in a trivial or “minimal” way, hence the distinction between introspection and perception is made redundant.

8 I.e. performance vs. appearance or Sperling’s Type 1 vs. Type 2.

References

Boynton, R. M. and J. W. Onley (1962). "A critique of the special status assigned by Brindley to 'Psychophysical Linking Hypotheses' of 'Class A'." Vision Research 2: 383-390.

Brindley, G. S. (1960, 2nd edition 1970). Physiology of the Retina and the Visual Pathway. London, Edward Arnold.

Danziger, K. (1980). "The History of Introspection Reconsidered." Journal of the History of the Behavioral Sciences 16: 241-262.

Fechner, G. (1860/1966). Elements of Psychophysics. Holt, Rinehart and Winston.

Foster, D. H. (2011). "Color Constancy." Vision Research 51: 674-700.

Gescheider, G. A. (1997). Psychophysics: The fundamentals. Mahwah NJ, Lawrence Erlbaum.

Gilchrist, A. (2004). Seeing Black and White. Oxford, Oxford University Press.

Hatfield, G. (2005). Introspective Evidence in Psychology. Scientific Evidence: Philosophical theories and applications. P. Achinstein. Baltimore, Johns Hopkins University Press.

Helson, H. (1949). "Review of 'Introduction to Color'." Psychological Bulletin 46(2): 166-169.

Kingdom, F. A. A. and N. Prins (2009). Psychophysics: A practical introduction. Amsterdam, Elsevier Academic Press.

Marks, L. E. (1978). The Unity of the Senses. New York, Academic Press.

Schwitzgebel, E. (2011). Perplexities of Consciousness. Cambridge MA, MIT Press.

Sperling, G., B. A. Dosher, et al. (1990). "How to study the kinetic depth effect experimentally." Journal of Experimental Psychology: Human Perception and Performance 16: 445-450.

Stevens, S. S. (1951). Handbook of Experimental Psychology. London, Chapman & Hall.

Teller, D. Y. (1984). "Linking Propositions." Vision Research 24(10): 1233-1246.

To, M. P. S., P. G. Lovell, et al. (2008). "Summation of perceptual cues in natural visual scenes." Proc. Royal. Soc. Lond. B. Biol. Sci. 275: 2299-2308.

To, M. P. S., P. G. Lovell, et al. (2010). "Perception of suprathreshold naturalistic changes in colored natural images." J. Vision 10: 1-22.

Tolhurst, D. J., M. P. S. To, et al. (2010). "Magnitude of Perceived Change in Natural Images May be Linearly Proportional to Differences in Neuronal Firing Rates." Seeing and Perceiving 23: 349-372. Reprinted in J. A. Solomon (ed.) 2011, Fechner's Legacy in Psychology. Boston: Brill.