Toronto CSC 2130 - On the Difficulty of Replicating Human Subjects Studies in Software Engineering


On the Difficulty of Replicating Human Subjects Studies in Software Engineering

Jonathan Lung, Jorge Aranda, Steve Easterbrook and Greg Wilson
Department of Computer Science, University of Toronto
Toronto, Canada, M5S 2E4
{lungj, jaranda, sme, gvwilson}@cs.toronto.edu

ABSTRACT

Replications play an important role in verifying empirical results. In this paper, we discuss our experiences performing a literal replication of a human subjects experiment that examined the relationship between a simple test for consistent use of mental models, and success in an introductory programming course. We encountered many difficulties in achieving comparability with the original experiment, due to a series of apparently minor differences in context. Based on this experience, we discuss the relative merits of replication, and suggest that, for some human subjects studies, literal replication may not be the most effective strategy for validating the results of previous studies.

Categories and Subject Descriptors: A.m General Literature: MISCELLANEOUS

General Terms: Experimentation

Keywords: experience report, empirical, human subjects, replication

1. INTRODUCTION

Replication of empirical studies is frequently advocated but rarely practiced. For example, Basili et al. argue that systematic replication of experiments is crucial for building knowledge [1], while Kitchenham et al. identify the lack of incentive for conducting replications as one of the barriers to evidence-based software engineering [9]. In a recent survey of the empirical software engineering literature, Sjøberg et al. [14] found only twenty instances of published replications, just nine of which were performed by researchers other than the original team. The problem isn't unique to SE – replications are rare in many fields.

Many have speculated on why replication is rare.
Among the reasons cited are the lack of information in published reports, even where materials are available, and that reproducing an experiment requires tacit knowledge that would never be captured in published reports [11]. Also, replications are seen as less interesting than novel research, and there is a perception in the research community that replications are hard to publish [9].

In this paper, we concern ourselves only with replication for experiments involving human subjects. Such experiments are increasingly important for improving our understanding of social and cognitive processes involved in SE. For these experiments, threats to validity are introduced by factors such as variability in human behaviour, difficulty of isolating confounding factors, and researcher bias. Effects observed in a single study might be caused by factors that were not measured or controlled. The aim of replication is to check that the results of an experiment are reliable. In particular, external replication (replication by different researchers) can identify flaws in the way that hypotheses are expressed and can help to identify the range of conditions under which a phenomenon occurs [2].

To properly replicate a human subjects experiment, published reports are usually insufficient. Basili et al. advocate using lab packages, whereby experimenters provide all their experimental materials along with precise details of their data collection and analysis techniques [1].

---
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICSE'08, May 10–18, 2008, Leipzig, Germany.
Copyright 2008 ACM 978-1-60558-079-1/08/05 ...$5.00.
---
Even then, collaboration with the original team is important – possibly even essential.

Unfortunately, there are very few published experience reports of the challenges of replication in SE, beyond those cited above. This leaves many questions about replication unanswered. For example, how much involvement of the original research team is normal or necessary, and how does one achieve a balance between involvement and maintaining independence? How should we balance the goal of attempting a faithful replication against opportunities to improve on the original design? Are there cases where an entirely new study would be more suitable? And, if exact replication is impossible, how close can we get, and how much do variations matter?

In an attempt to better understand replications, we performed one ourselves. We selected a study that was generating considerable buzz on the Internet in 2006. Dehnadi and Bornat had written a draft paper entitled The Camel Has Two Humps, in which they claimed to have developed a test, administered before students were exposed to instructional programming material, that is able to accurately predict which students would succeed in an introductory programming course and which would struggle [5]. The claims were startling enough that, even though the paper was unpublished¹, several groups around the world set out to replicate the experiment. We chose to replicate this particular study for a number of reasons: we were interested in the results ourselves; the study design appeared to be sound, but (like all experiments) had a number of potential threats to validity; and the experimental materials were readily available from the original experimenters, so that performing the replication seemed straightforward.

In attempting to replicate this experiment, we encountered many unexpected challenges. We discuss how we dealt with each of them and reflect on our experiences.

¹ The paper is still, to date, unpublished.
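For readers unfamiliar with the instrument, Dehnadi and Bornat's test presents students, before any programming instruction, with short sequences of assignment statements and asks for the resulting variable values. The sketch below is our paraphrase of the style of question only; the exact wording, notation, and answer options are those of the draft paper, and the original used Java-like pseudocode rather than Python:

```python
# Our illustration (not the original instrument) of the style of
# question in Dehnadi and Bornat's test. Untrained students are shown
# assignments like these and asked for the final values; the test
# scores the *consistency* of the mental model a student applies
# across many such questions, not the correctness of the answers.
a = 10
b = 20
a = b  # one consistent model: copy b's value into a

print("a =", a, "b =", b)  # prints: a = 20 b = 20
```

Other self-consistent (but incorrect) models students apply include copying in the opposite direction or swapping the two values; the test's claim concerned whether a student sticks to any one model.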
We conclude that conducting a literal replication is hard, even with good access to the original researchers and their materials. For this particular study, we now believe we would have learned more by designing a new experiment rather than replicating the existing one. We draw on our experience performing this replication to explain this conclusion.

2. BACKGROUND

2.1 The Role of Replication in SE

Because of the importance of human activities in software development, empirical methods in SE are typically adapted from disciplines that study human behaviour, both at the individual level (e.g. psychology) and the team and organizational levels (e.g. sociology) [7]. The complexity of human behaviour means that these methods only provide limited, qualified evidence about the

