level in a structured interview or by means of some other similar structured procedure. These assessments provide measures that have considerable precision. Moreover, they unquestionably tap important skills. Nevertheless, we should recognize that such investigations differ from other possible studies that would examine what an individual does when he or she is engaged in ongoing activities. For example, consider “leadership,” one of the research areas discussed by Dawson et al. Dawson (2006) described a carefully developed system for assessing a subject’s level of understanding of this concept from responses during an assessment interview. But instead of proceeding this way, an investigator could examine what a subject does upon encountering a particular item while going through his or her “inbox” or when a subordinate asks the subject a specific question in a naturally occurring situation. Thinking about such in situ examples makes it clear that we could not possibly map what a person might do in such situations onto the developmental sequence without drawing on rich prior familiarity with the relevant practices. Therefore, these examples underscore the role played by interpretation. Looking at this matter the other way, the in situ examples help us to see how much actually is involved when we do use the structured assessment procedures that these investigators have so successfully developed. A great wealth of interpretive appreciation of the phenomena is concretized in those measurement procedures.

The in situ examples also raise a new issue: is the 13-level sequence relevant for some or all naturally occurring situations, or is its relevance limited to the kind of skills that can be assessed in the particular ways typically employed in the research in question, which could be called skills at understanding of a more reflective sort? I am not asserting that the complexity sequence would not hold for a broad range of skills involving in situ behavior. I only wish to point out that the sequence might be limited in these ways. The work is interpretive. It is based on procedures that provide concrete examples of certain meaningful phenomena. Therefore, we can ask whether the assessments made in these investigations actually serve as concrete examples of clearly in situ psychological phenomena and, more generally, we can ask what is the range of phenomena that are successfully tapped by the structured assessments.

None of this is to argue against the value of this research. It is possible for raters to draw upon their prereflective understanding and employ the carefully developed manuals and the developmental model to assess complexity levels. Furthermore, it is of great interest that research efforts along these lines have demonstrated that the developmental sequence holds in many different areas when skills are assessed using the kinds of procedures that have been employed. In sum, I believe that the research by Dawson, Fischer, and their colleagues represents examples of excellent, “apparently strong” quantitative research.

5. Caveats concerning possible pitfalls

In my position paper, I made several specific suggestions about how researchers should change the ways they use quantitative methods. For example, I argued for using relational codes instead of always coding discrete behaviors. It should now be clear that my comments along those lines were misleading if they suggested that I believe certain quantitative methods (e.g., discrete behavior codes) are always problematic, or if they suggested that I wanted to rule out quite generally what others might call “strong” quantitative methods. According to my approach, “good” quantitative research includes many examples of what others consider “strong” methods in addition to many examples of “soft” methods. Notwithstanding possible confusion, at this juncture, I also want to state that this does not amount to wholesale approval of all quantitative research. I agree with Stam that there are real dangers in what he calls “Pythagoreanism.” Quantitative methods frequently are employed in a problematic manner. In my opinion, this occurs when they are used in such a way that they cannot serve their interpretive function. For example, measures of decibel levels are likely to fail at assessing angry behavior if the vocalizations in question do not occur in a structured situation in which loudness serves as a concrete example of such behavior (this is circular, and that is the point). In general, quantitative methods are unhelpful in a particular case insofar as they are actually used in a way that conforms to traditional