Information Extraction Technique

This document describes the instructions for using the Information Extraction Technique. The document is divided into 5 main sections:
1) An introduction to the technique
2) A glossary where the key terms are defined in this context
3) A procedure of steps that describes clearly WHAT you have to do
4) A set of guidelines, i.e. heuristics to keep in mind as you do the above procedure
5) A set of data entry forms

1) Introduction

The Information Extraction Technique is a structured reading method for extracting information from papers, which can later be analyzed to explore the evidence that supports various hypotheses. In this assignment, you will follow the procedure for identifying and recording hypotheses from papers in the scientific literature. The procedure focuses the search specifically on hypotheses and context descriptions, providing some guidelines to help recognize and abstract them.

2) Glossary

Hypothesis
A hypothesis is a tentative explanation for certain behaviors, phenomena, or events that have occurred or will occur. A good hypothesis states as clearly and concisely as possible the expected relationship (or difference) between two or more variables and defines those variables in operational, measurable terms. Any hypothesis should be stated in such a way that data can be collected that either supports or refutes the hypothesis. For the purpose of our analysis, we classify hypotheses as tested or untested:

a) Tested Hypothesis
A tested hypothesis is a tentative explanation for certain behaviors, phenomena, or events that have occurred in experience or empirical study.

b) Untested Hypothesis (Belief)
An untested hypothesis (otherwise called a belief or assumption) is a tentative explanation for certain behaviors, phenomena, or events without explicit reference to empirical data.

3) Procedure

1) Read the paper, keeping in mind the two kinds of information that you want to identify:
   a. Hypotheses (tested and untested), and
   b. Context descriptions
2) When you find relevant information during your reading, highlight it so that there can be some traceability back to the original source if questions arise later.
3) Transfer the key details to the data entry forms. For complete descriptions of the fields you should complete, see section 5 on “data entry forms.”

4) Some Guidelines

There will be at least one context description associated with each source (possibly more, if the paper describes data that was collected from several projects). Our experience is that the context descriptions usually come shortly after the introduction.

As different studies report different metrics of interest to them, not every paper will have all of the required information for our template. However, the template should be filled out as completely as possible given the information that has been published.

Our experience is that the section of the paper describing the analysis of the empirical data is where most tested hypotheses can be found. The conclusions are a good place to find hypotheses, although these are often repetitions from earlier in the text.

Some hypotheses will also be found in tables and figures; although not explicitly stated in the text of the paper, relationships that are expressed visually for readers will need to be translated into textual form to be inserted into the Hypothesis Form in a usable way.

5) Data Entry Forms

An Excel file contains worksheets for recording:
- Context descriptions
- Hypotheses

5.1) Context Descriptions

Fill out one form for each paper (or each study reported in the paper, if there are multiple). The attributes of the context description form are:

• Paper Title:
  o The title of the paper from which you are extracting the information.
• Complete Reference:
  o A complete bibliographic reference to this paper.
• Topic:
  o Use the IEEE keywords at the following site to denote topic categories.
    ▪ http://www.computer.org/mc/keywords/software.htm
  o Choose the keywords that best describe the subject of the study described in the paper.
• Goals:
  o You should fill in a set of GQM goal templates for the study (using the form: Analyze O for the purpose of X with respect to M from the point of view of P in the context of C). Remember to make clear the entities being studied (i.e., the process, product, model, metric, ...), the attributes of the entities that are of interest, the purpose of the study (i.e., whether the study is aimed at characterizing, understanding, evaluating, predicting, or improving), and for whom the study should be of value (i.e., a researcher, project manager, corporation, ...).
• Variables
  o Describe as many as possible of the following characteristics for each dependent and independent variable in the study:
    ▪ Name: How the variable is referred to in the paper.
    ▪ Possible Values: The possible values for the variable, if controlled.
    ▪ Data Collection Details: Details of the method used to measure the variable, including for example what instrumentation and tool support were used.
• Subjects
  o Describe as many as possible of the following characteristics for the subjects in the study:
    ▪ Category: A generalized description of the experience level of subjects. Possible values here can be:
      • Undergraduate Students;
      • Graduate Students;
      • Students: Students of an unknown type.
      • Professionals;
      • Scientists;
      • Unknown: Not described in the paper.
    ▪ Number: The number of subjects that participated in the experiment.
    ▪ Incentives: Describes the subject recruitment rewards. The possible values are:
      • Grades: If students’ grades were affected by their participation
      • Extra Credit: If participating students received extra credit in the class
      • Payment: The subjects were paid to take part in the experiment.
      • Other Rewards: Please specify.
      • No Rewards
      • Unknown
• Task
  o Category: Categorize the tasks given to subjects according to the tasks applied and the work products they were applied to (e.g. created a design document).
    ▪ Possible values of tasks:
      • Plan
      • Create
      • Modify
      • Analyze
    ▪ Possible work products:
      • Requirements
      • Architecture/design
      • Code
      • Etc.
  o Duration:
    ▪ Duration of the time that subjects had available for the task(s).
  o Work Mode: Select whether subjects performed the task(s) as:
    ▪ Team
    ▪ Individual
  o Application
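
For readers who want to record context descriptions programmatically rather than in the Excel worksheet, the fields above could be captured in a small data structure. The following is only a minimal Python sketch under stated assumptions: all class, field, and method names (GQMGoal, ContextDescription, missing_fields, and so on) are hypothetical and do not come from the original forms, and only a subset of the attributes is modeled. It renders a GQM goal in the "Analyze O for the purpose of X ..." form and lists which parts of a form are still empty, in the spirit of the guideline that the template should be filled out as completely as possible.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SubjectCategory(Enum):
    """Subject categories listed in the context description form."""
    UNDERGRADUATE = "Undergraduate Students"
    GRADUATE = "Graduate Students"
    STUDENTS = "Students"
    PROFESSIONALS = "Professionals"
    SCIENTISTS = "Scientists"
    UNKNOWN = "Unknown"


@dataclass
class GQMGoal:
    """One GQM goal; the comments map each field to the template letter."""
    obj: str        # O: the entity being studied
    purpose: str    # X: characterizing, understanding, evaluating, ...
    focus: str      # M: the attribute of interest
    viewpoint: str  # P: for whom the study should be of value
    context: str    # C: the context of the study

    def render(self) -> str:
        # Fill in the goal template quoted in the Goals attribute.
        return (f"Analyze {self.obj} for the purpose of {self.purpose} "
                f"with respect to {self.focus} from the point of view of "
                f"{self.viewpoint} in the context of {self.context}")


@dataclass
class Variable:
    """A dependent or independent variable of the study."""
    name: str
    possible_values: Optional[str] = None
    data_collection_details: Optional[str] = None


@dataclass
class ContextDescription:
    """Hypothetical, partial model of one context description form."""
    paper_title: str
    complete_reference: str
    topics: List[str] = field(default_factory=list)
    goals: List[GQMGoal] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)
    subject_category: SubjectCategory = SubjectCategory.UNKNOWN
    subject_number: Optional[int] = None

    def missing_fields(self) -> List[str]:
        # Report which optional parts of the form are still empty.
        missing = []
        if not self.topics:
            missing.append("topics")
        if not self.goals:
            missing.append("goals")
        if not self.variables:
            missing.append("variables")
        if self.subject_number is None:
            missing.append("subject_number")
        return missing


if __name__ == "__main__":
    # Placeholder values only; they do not describe any real paper.
    form = ContextDescription(
        paper_title="<paper title>",
        complete_reference="<complete bibliographic reference>",
    )
    form.goals.append(GQMGoal(
        obj="the inspection process",
        purpose="evaluation",
        focus="defect detection rate",
        viewpoint="the project manager",
        context="a university course",
    ))
    print(form.goals[0].render())
    print("Still to fill in:", form.missing_fields())
```

Keeping the subject category as an enumeration, with Unknown as the default, mirrors the form's fixed list of possible values and makes it explicit when the paper does not describe its subjects.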


UMD CMSC 735 - Information Extraction Technique
