The Questionnaire as a Data Source
Whitney, D.R. (1972, April). The questionnaire as a data source. (Technical bulletin No. 13). Iowa City: University Evaluation and Examination Service. Prepared for the American Educational Research Association training session on Data Collection in Educational Research and Evaluation.
"The world is full of well-meaning people who believe that anyone who can write plain English and has a modicum of common sense can produce a questionnaire. The book is not for them."
A. N. Oppenheim
This Bulletin is intended for persons who are interested in using a questionnaire to collect data. It considers the questionnaire as a data source--that is, a scientific instrument for gathering reliable and valid information for some purpose(s). As such, it is somewhat more than just a "how to" list of suggestions for writing good questionnaires. It is intended to be used prior to the development of the questionnaire and/or survey design.
This Bulletin does not venture into the statistical analysis of questionnaire data, except briefly in discussing question form and sampling procedures. Assistance from a competent statistical consultant should be sought early in the planning stage of the study and frequently during the development of the instrument and preparation for data analysis.
Since this is not a text, and not a research paper, it was written to be readable and interesting. Consequently, little attention was given to documenting or referencing the suggestions. Rather, I freely admit to borrowing ideas from a variety of sources. These are listed at the end as references. For that which is useful in the Bulletin, the reader may credit those authors. For that which is not, I must take the responsibility. You should supplement this discussion with two of the texts listed in the reference section. These are by Oppenheim (an overview of survey research) and Payne (a witty exploration of question writing).
This Bulletin was written with the idea that you would "work through" your prospective project in stages as we go along. The order in which topics are treated is roughly the same as you will need to employ in designing your survey.
What is a Questionnaire?
This may seem to be an obvious question, but let's begin by making certain we're talking about the same thing. A questionnaire is a printed form, sent by mail to a respondent who completes the form and returns it. It is usually designed specifically for the study in question. Unlike a test, which yields only a total score, or an inventory, in which each item is interpreted as a part of a scale or group of items, a questionnaire yields many separate pieces of information. Analysis usually consists of tabulation or cross-tabulation of responses to individual items--rarely are more elaborate procedures used.
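The tabulation and cross-tabulation just mentioned are simple counting operations. As a minimal modern sketch (the item names and responses below are hypothetical, invented for illustration):

```python
from collections import Counter

# Hypothetical responses to two questionnaire items; the item names
# ("sex", "q1") and the data are illustrative only.
responses = [
    {"sex": "M", "q1": "yes"},
    {"sex": "M", "q1": "no"},
    {"sex": "F", "q1": "yes"},
    {"sex": "F", "q1": "yes"},
]

# Tabulation: count the responses to a single item.
tabulation = Counter(r["q1"] for r in responses)

# Cross-tabulation: count joint responses to two items,
# giving the cells of a two-way table.
crosstab = Counter((r["sex"], r["q1"]) for r in responses)

print(tabulation)              # counts for each answer to q1
print(crosstab[("F", "yes")])  # count in one cell of the two-way table
```

Nothing more elaborate is usually needed; each item is summarized on its own, or against one grouping variable.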
Questionnaires are most often used when direct (person-to-person) contact with respondents is not possible or necessary. The questionnaire is probably the single most widely-used data source in educational research. Some experts have estimated that as many as half the research studies conducted use a questionnaire as a part of the data collection process.
When Should You Use a Questionnaire?
In spite of its wide usage, the questionnaire is not appropriate for all purposes. Even when you are developing your own questions, there are at least two other methods of data collection which you should consider.
Perhaps the most obvious alternative to using a questionnaire is the personal interview. The advantages of this approach include richness of response, the ability to clear up misconceptions, the opportunity to follow up responses, and, by implication, better data in many situations. Additionally, respondents will usually be more conscientious if the interviewer is present.
The phone interview is very similar to the personal interview, with the additional advantage of requiring less interviewer time. (It is not necessary to travel to conduct the interviews.) Using the phone also allows the respondent somewhat more anonymity than does face-to-face interviewing. Obviously, however, the costs of using this technique are prohibitive unless all respondents live in the same area or the interviewer has access to a toll-free telephone line.
In general, these methods are preferable to the questionnaire. If a questionnaire is used instead of these methods, the form must be simpler, the investigator loses control over the ordering and sequencing of responses, and the study will probably result in a somewhat lower response rate. There are, of course, offsetting advantages gained through the use of a questionnaire. The mailed survey is usually far less expensive and, because the form has been simplified, the analysis is simpler and less costly. Finally, the use of a questionnaire does reduce the influence (and consequent bias) due to the presence of the interviewer. Still interested? Let's go on.
Purposes of Questionnaire Studies
The underlying purpose of the study will, of course, dictate the items to be included on the questionnaire. It will also influence the method of sampling and statistical analyses. Before looking at the "how's" of questionnaire writing, let's look at the "why's". The purposes for most questionnaire studies fall into two categories--descriptive and analytical. Both purposes may be present in some studies.
Descriptive Studies
The task here is to count something. The intent may be to estimate the parameters (population facts) for certain characteristics of a given population or to survey current practices in some field or profession.
Analytical Studies
In these studies, the investigator is usually interested in comparing characteristics among two or more populations. Studies in which the interest lies in exploring the relationship among variables for a single population also qualify as analytical studies.
Assumptions In Using Questionnaires
You may have decided that, in view of the advantages of the mailed questionnaire and the purpose of your study, you want to use a questionnaire. Fine. Before we proceed, however, let me call your attention explicitly to the assumptions we make when we use such an instrument. Typically we assume that the respondent
1. is a competent source of data (that he is able to answer).
2. will provide data willingly.
3. understands questions as the interviewer intends.
4. answers questions in the form intended and with integrity.
If, after reviewing your purpose and potential respondents, these assumptions seem reasonable for your study, we can proceed.
Some Common Mistakes in Using Questionnaires
Before we advance to the "do's" of questionnaire development, let me warn you of some of the most common "don'ts" in such studies. We'll touch on many of these items later in the recommendations sections, but in the hope that they will stick, I'll enumerate them at the outset. Eight of the most common mistakes are
1. asking for information which is more readily and/or accurately available elsewhere
2. failing to create sufficient incentive for the respondent to answer.
3. including questions which the respondent sees as ridiculous or unimportant.
4. including questions which encourage some sort of "favorable" response.
5. using equivocal or ambiguous questions.
6. using responses which are too limited in scope to be useful.
7. not living up to promises made to respondents.
8. developing a form which is too long or complicated.
You are probably saying "That's only common sense. I'd never do that." Fine! Come back to this point after you've roughed out your questionnaire and see if you can still make that claim.
Perhaps you are wondering when we will get to the "good stuff." Have patience. A questionnaire is not an end in itself, only the means to an end. It fits into an overall plan for the survey and needs to be considered in light of other decisions about the survey design. A survey is a planned collection of data for some purpose. As such, it must begin with a clear statement of purpose(s). Only if this is done prior to developing the questionnaire can the instrument be evaluated for quality as a data source.
Questions Involved in Survey Design
A number of questions must be answered in order to develop a survey design. These include:
1. What variables will be measured?
2. What method(s) will be used?
3. Who will be surveyed? Specifically, what population(s)? Will control groups be needed?
4. When will questioning occur?
5. What instruments have to be adapted or developed?
It may be helpful to pause here and try to answer these questions about your study before proceeding further.
Types of Variables
As you may already know, statisticians have a language all their own. However, we may borrow some of their ideas in order to gain a better understanding of the role played by the variables you have identified for study. These roles are:
Independent Variables
These are the "causes" or predictors or antecedents in your study. They are usually manipulated systematically in order to study their effects.
Dependent Variables
These variables are the outcomes or results presumably attributable to the manipulation of the independent variables. They are the crucial foci of your attention in the study.
Controlled Variables
These are the variables we will try to eliminate or "make equal" in the study. This can be accomplished by excluding them from the study, by making them equal across groups (e.g., by selecting equal numbers of men and women in each group), or by holding them constant (e.g., by doing separate analyses for men and women).
Uncontrolled Variables
These are the troublesome ones. They may be confounded with other variables of interest (e.g., men generally earn higher salaries than do women in similar jobs), or be the results of various kinds of errors (e.g., a deficient vocabulary among respondents from lower socio-economic levels). They are present in every study and must be considered when interpreting the results of the survey. Many of the recommendations which follow are intended to minimize the effects of the latter kind of uncontrolled variables. Careful selection of the sampling method can help reduce the effects of the former kind.
Sources of Error in Surveys
We're almost there. But first, a brief review of the common sources of error (uncontrolled variables) in surveys. Again, they may seem obvious and you may be sure that you won't commit them. OK. Check back here periodically as you develop your questionnaire. These sources include:
1. faults in the design of the survey.
2. sampling errors.
3. errors due to non-response.
4. biases due to questionnaire design.
5. lack of reliability and validity in the questionnaire.
6. bias in coding free-response questions.
7. errors in processing or statistical analysis.
8. faulty interpretation of results.
Matters of Fact and Opinion
We've arrived! When you set out to build a questionnaire, one of the first problems is to decide what kind of questions you will ask. For the most part, your questions will ask for factual information or for attitudes and opinions. It is commonly presumed that factual questions yield more reliable responses than do opinion items. There are probably many reasons for this, but two stand out. In general, respondents are usually better able to give factual information. Perhaps more importantly, respondents are more willing to give factual information than to expose their attitudes or opinions.
This generalization has a number of implications for the way in which you design your questionnaire. In some cases, it may be advantageous for you to ask a large number of factual questions (e.g., about attendance at meetings, membership on committees, etc.) to enable you to infer attitudes (in this case, attitude toward the organization). Since attitude and opinion items are often less reliable, it is necessary to group them in some way to achieve reliable results. Obviously, opinion items must be very carefully developed.
The Criteria of Ability and Willingness to Respond
I've alluded to the assumption that respondents be both able and willing to respond truthfully to the questionnaire items. Fortunately, we don't have to leave this to chance. There are a number of things we can do to ensure that this assumption is legitimate.
After listing the variables which are to be measured in your study, you should ask yourself whether or not a better source exists for obtaining some of the data. Validity of your data is maximized to the degree that you can obtain data from "good" sources. For example, can you get some information about the respondent from organizational records and reports? Public records? Archives? The library? In addition to obtaining such information from "hard" sources, this allows you to reduce the number of questions you have to ask the respondents. (A bonus--shortening the questionnaire will generally have a favorable effect on the respondents' attitude or willingness to respond!)
A second consideration stemming from these criteria should be an attempt to make the respondent want to respond truthfully. Some of the things you can do to help generate this attitude include giving the respondent a clear idea of the amount of time which will be required, making clear the kind of responses which will be required, and simplifying the questionnaire by eliminating trivial questions and unnecessary detail. That is, try to look at your questionnaire as the respondent will. Would you be willing (or better, eager) to cooperate?
The letter accompanying your questionnaire plays a key role in obtaining a truthful response. The letter should explain the purpose of the survey clearly and persuasively. The sponsor(s) of the study--both personal and institutional--should be apparent. Endorsement by someone whom the respondents see as prestigious will help. Other useful motivational techniques include convincing the respondent of the importance of the study, assuring him of confidentiality (explain the necessity of coding to avoid later mailings), and making available a summary of the results. Be sure to stress that he is part of a carefully selected sample and you need his response. In order to help you determine the effect of your letter on the respondents, show it to your friends, your spouse, and your secretary. Ask them point blank if they would be willing to respond. No? Go back to work. Yes? Congratulations--step one accomplished.
A caveat. It is common for graduate students to include a "pitch" about the study being "for my dissertation" or "needing the results to graduate." If you are surveying college faculty, they will assume this. If you are surveying others, they won't care--or worse--will react negatively to this appeal. If you can't make the case for the study on its own merits, find another topic.
In a mailed survey, the directions or instructions to the respondent are intended to compensate for the absence of an interviewer. Let this be your guide to writing this part of the questionnaire. Try to tell the respondents as clearly as possible how and where to answer each question. (Even though you may think the form is self-explanatory, the respondent may not!) If you have used terms with which the respondent may not be familiar, give sufficient definition(s) to ensure that all respondents interpret your language the same way. This kind of standardization is crucial to the validity of your results. In so far as is possible, personalize the directions. That is, phrase them as if you were conducting the interview in person. Finally, use boldface, capitals, and italics to draw attention to the directions. This is especially important in the kind of questions which branch to various parts of the questionnaire depending on answers to previous questions. If necessary to avoid overly-complicated forms, ask everyone to answer every item and analyze the responses separately for those responding in different ways to the prior question.
What Kind(s) of Questions?
The variety of questions which may be used in a questionnaire staggers the imagination. (At least it does mine!) The criterion for choosing the kind of questions to be used is "Which kind best serves the purpose you have in mind?" This section should acquaint you with the more common kinds of questions and illuminate the strengths and weaknesses of each. Since your purposes are usually many, your questionnaire will probably employ more than one kind of question.
There are two types of questions: open-ended and closed-ended--that is, those for which the respondent supplies the answer and those for which he selects or marks it from a list supplied by the investigator. Some of the advantages and disadvantages of each may be obvious, but let me try to make them explicit.
Advantages of Supply-Type Questions
The advantages of this kind of question tend to be the disadvantages of the other type and vice versa. The advantages of open-ended questions are that they:
1. are subject to little influence from the investigator.
2. elicit a wide variety of responses.
3. are useful for introducing subjects or new parts of questionnaires.
4. provide background for interpreting results.
5. give respondents a chance to "have their say."
6. are more "courteous".
7. can aid in drafting questions and coding responses (when used in pilot work).
8. give "sparkle" and credibility to your final report.
Advantages of Selection-Type Questions
Conversely, the advantages of closed-ended questions are that they:
1. are interpreted more uniformly by respondents.
2. produce easily tabulated responses.
3. are unaffected by the respondent's verbosity.
4. eliminate some problems of vocabulary and definitions.
5. allow more questions to be asked.
Supply-type questions may range from long essays to the simple fill-in-the-blank variety. When using this kind of question, you should be sensitive to the length of responses requested. Since it takes longer for a person to think and write (as in free-response questions) than it does to think and mark (as in selection-type items), extensive use of free-response questions necessarily lengthens the amount of time required to answer the questionnaire. (Remember--willingness to respond!) Additionally, the amount of space allowed for the response dictates the length of the answer. For these reasons, free-response questions should probably be used only when a selection-type can't be developed to elicit the information. Also, poorly designed free-response questions can make for BIG problems in data tabulation and analysis.
The uses for free-response questions, however, are many and your questionnaire will probably include at least some of this variety. For example, they can be used to solicit suggestions in both pilot work and the final form, as follow-up questions to selection-type items, to obtain reasons for other answers, and as probes for further explanation of previous answers. Another potential use is in the argument form--asking the respondent to list reasons for and against a proposal or action.
At the opposite extreme from free-response questions are those which ask the respondent to choose between two responses (e.g., yes-no, agree-disagree, etc.). Strictly dichotomous items are probably most useful for reporting behavior and for eliciting opinions about proposed or pending actions. There are a number of problems with this type of item, however, which may render it less useful than you might suppose.
For example, the implied alternatives to a dichotomous question are not always strictly complementary. Asking whether a respondent would forbid a certain behavior by his child will usually not yield complementary figures as compared to asking him if he would allow such behavior. For questions of this variety, the pilot study (asking the question both ways for halves of the tryout group) can be used to assess this effect.
When confronted with a two-way question, respondents often feel the need for an "undecided" or "no opinion" option. This begins to blend over to the multiple-choice format which will be considered next. Similarly, respondents may feel the need to qualify a dichotomous answer (e.g., Yes, unless it rains.). In this case, the investigator should try to anticipate the needed qualifications and convert the item to a multiple-choice format. Here, again, the pilot study can help determine the most likely responses if the question is asked in free-response form. The need for middle ground is more prevalent when the extremes are harsh (e.g., favor-oppose) than when they are milder (e.g., good idea-poor idea).
One very useful form of the dichotomous item is the argument type. In this form, the respondent is presented with the two arguments (usually one for and one against an action) and asked to indicate his preference. Or he may be presented with an argument and asked to indicate agreement or disagreement.
The flexibility of the 3-or-more choice item probably makes it the single most useful kind of item for questionnaire work. It allows for obtaining gradations of opinion and for combinations of reasons or actions. It differs from a free-response question in that it draws attention to possible alternatives instead of requiring the respondent to generate them. This is both a blessing and a curse, in that it may suggest responses which would not have occurred to the respondent. For this reason, some cautions on the use of such items are in order.
When faced with a set of responses, respondents tend to exhibit certain kinds of response patterns which you can partially counteract. For example, when asked a question which has numerical responses, respondents may exhibit a preference for the center of the scale. This may be counteracted somewhat when asking for estimates (opinions of fact) by placing the correct answer first or last on the list. If the question stem is rather complex, or if the responses are complex and lengthy, respondents may tend to choose the first or last response. Use of the split-ballot technique in both pilot work and final form may help you counteract this tendency--or at least estimate its effect. (The split-ballot technique consists of giving different forms of the questionnaire to equivalent portions of the sample.)
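The split-ballot technique amounts to randomly dividing the sample and mailing each half a different form. A minimal sketch under assumed conditions (the respondent names, sample size, and random seed below are all illustrative):

```python
import random

# Hypothetical mailing list.
sample = [f"respondent_{i}" for i in range(100)]

rng = random.Random(13)   # fixed seed so the assignment can be reproduced
shuffled = sample[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
form_a = shuffled[:half]  # receives one version of the questionnaire
form_b = shuffled[half:]  # receives the alternative version

# Because the halves are randomly formed, differences between their
# response distributions estimate the effect of the question form itself.
```

The essential point is the random assignment: any systematic split (e.g., alphabetical) risks confounding the form difference with a respondent difference.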
When constructing the list of responses for the multiple-choice items, you should make every effort to ensure that your list exhausts the most likely responses. Use of a free-response form of the question in the pilot work is recommended as a technique for determining the most common responses.
When asking a question for which there is a likely dominant or obvious response (e.g., "it costs too much" or "I don't have the time"), it may be necessary to eliminate that option, either by saying "Aside from price, what . . ." or simply not listing the response (although this practice may be offensive to respondents).
When constructing this type of question do not, if at all possible, allow respondents to mark more than one answer. This prevents difficulties in interpreting results for questions for which the per cent responding may exceed 100%. Additionally, respondents may differ in their tendency to mark multiple answers. This difference in response tendency diminishes the validity of the resulting data. To counteract this, it may be necessary to offer combinations of responses which can be (singly) marked. Here, again, the pilot work can help identify necessary combinations. Use "best answer" directions to emphasize the restriction to one response per item.
Care must be given, when building lists of responses, to the "balance" of the list as a whole. Similar or closely-related responses tend to "bleed" responses from each other. In a list with one negative and four positive responses, for example, the popularity of the negative one may be severely overestimated because it represents the only response in that general direction. Similarly, the popularity of each of the positive ones may be underestimated because of the presence of the others. As a general rule, try to achieve some kind of symmetry in the responses.
You may have already asked "How many responses should I use?" Although there is no right answer to the question--the choice being dependent on so many considerations--most authorities suggest that it is difficult for respondents to keep more than 5 or 6 choices clearly in mind. Your decision will have to involve a compromise between completeness of the list and the need for brevity.
Check Lists
The basic format here is a list of behaviors, actions, or events which the respondents are asked to mark in certain ways (e.g., Which of the following do you do at least once a week?). The key in using this type of item is the development of the list. It should include all activities respondents are likely to engage in, but be brief enough to fit onto a single page and not unduly lengthen the questionnaire. Using a free-response form of the question in the pilot study is an excellent way to develop a brief list of "critical" behaviors or events.
The two-way grid is a variation of this method. The list is essentially the same, but the respondent is asked to check each element on the list for a number of situations or directions. A good example of this would be a list of patient symptoms and potential treatments in which the physician might be asked to indicate which treatment is generally prescribed for each symptom. In this way information is gathered simultaneously on two dimensions--the most common treatment for each symptom and the symptoms for which each treatment was most commonly prescribed. The grid is an efficient way to gather a lot of information quickly. (A bonus--the respondent can answer a lot of questions without having to learn new ways of responding.)
Ranking requires that the respondent arrange a set of objects in order with respect to some common aspect. This can be a rather powerful method to measure the preferences of the respondents or their perceptions of certain relationships. The essential component of this kind of task, the absence of which is probably the major error in questionnaires using ranking, is the need to BE SPECIFIC ABOUT THE CRITERIA FOR RANKING. That is, you should leave no doubt in the respondent's mind about the basis on which the ordering is to be done. This specification of the ranking criterion profoundly affects the validity of the results. Be careful and thorough. You should ask yourself, again, whether the respondent is able to perform the ranking based on good information.
Because ranking is a more complex task than checking, it is essential that the list of objects not be too long (i.e. no more than 10 items or objects). Most respondents will begin by identifying the objects fitting the extremes then struggle with those left in the middle. Don't have so many objects that the middle group is too large to be ranked confidently. (This produces negative attitudes.) If the list must be longer than 10 objects, a two-stage process can be used.
Since ranking makes use of ordinal information (as opposed to check lists which are essentially categorical or non-ordered in nature) powerful statistical techniques can be applied to the resultant data. It is necessary, however, to ensure (via your instructions to the respondents) that ALL OBJECTS ARE RANKED in order to apply these techniques. For your analysis' sake, make this clear to the respondents. Analysis procedures for incomplete rankings are far less useful and far less powerful.
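One elementary analysis of complete rankings is the mean rank assigned to each object, from which a group ordering can be read off. A minimal sketch, assuming every respondent has ranked all objects (the object names and rankings are hypothetical):

```python
# Hypothetical complete rankings: each respondent ranks the same four
# objects from 1 (most preferred) to 4 (least preferred).
rankings = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 2, "B": 1, "C": 3, "D": 4},
    {"A": 1, "B": 3, "C": 2, "D": 4},
]

objects = ["A", "B", "C", "D"]

# Mean rank per object; a lower mean means more preferred overall.
mean_rank = {
    obj: sum(r[obj] for r in rankings) / len(rankings)
    for obj in objects
}

# Order the objects by mean rank to obtain the group ordering.
group_order = sorted(objects, key=lambda obj: mean_rank[obj])
```

Note that the computation breaks down immediately if any respondent leaves an object unranked, which is the practical reason for insisting that all objects be ranked.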
A third step up the scale of response techniques (check lists were first, rankings were second) is the process of rating. A rating consists of the respondents assigning numbers to their judgements about the relative amount of some property which a set of objects possesses. This may be done either by having the respondents assign numerical weights (like percentages) or by having them locate the object on some kind of graphic marking scale. Rating involves more information than ranking, since it introduces the idea of scale or absolute location in addition to the idea of rank or relative location. As such, it allows the use of more sophisticated statistical techniques than do the previous kinds of items. Unfortunately, however, it also possesses some difficulties. For one thing, ratings are more subject to respondent inconsistency than are rankings. (Remember--ability of the respondent to do what is asked!) Secondly, the scale needs to be rather carefully designed. (There are a multitude of methods for scaling responses after they have been obtained to ensure certain properties.) Lastly, the tendency to respond toward the center of the scale may effectively reduce your 10 point scale to a range of only a few points. Again, differences among respondents in their use of extreme ratings clouds the validity of the results.
As with rankings, you need to be rather specific about the criteria for rating. Additionally, the extreme points on the scale should be defined verbally--not just numerically. Ideally, each point on the scale should be defined verbally.
To counteract the tendency of respondents to use only the favorable portion of a scale (called "generosity error"), you should make the scale long enough to be able to salvage some variability among ratings. This is particularly important if your study is of the analytical type. Another kind of response tendency ("halo effect" or the tendency to rate an object or person highly on all attributes if it (he) is seen as "good" in one area) can be partially diminished by arranging the scales so that the favorable end of the scale is alternately on the left and right side of the page.
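When the favorable end of the scale alternates between the left and right side of the page, the reversed scales must be re-coded before analysis so that a high number always means the same thing. A minimal sketch (the item names, scale length, and ratings below are hypothetical):

```python
SCALE_MAX = 7   # assumed 7-point rating scale

# Hypothetical raw ratings; item2 was printed with its favorable
# end on the opposite side of the page.
ratings = {"item1": 6, "item2": 2, "item3": 7}
reversed_items = {"item2"}

def recode(item, score, scale_max=SCALE_MAX):
    """Re-code reversed scales so a high score is always favorable."""
    return scale_max + 1 - score if item in reversed_items else score

recoded = {item: recode(item, score) for item, score in ratings.items()}
```

Forgetting this re-coding step is a classic tabulation error: the alternation that suppresses the halo effect on the printed page will otherwise scramble the analysis.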
Question Sequence and Arrangement
So you've got your questions written. Now what? The most common practice in the layout of items in a questionnaire is to place all those dealing with a single topic or idea together. This is helpful in focusing the respondent's attention on one task at a time. Additionally, however, questions which use the same response format should be grouped to minimize the number of changes in the response rules. Particularly useful is the serialized question idea. Here, questions using the same directions and response options are grouped under the common directions and response categories. This eliminates the need to repeat the response options for each question (saving valuable space) and makes the respondent's job easier. Groupings like this should be short enough to fit on a single page, or the directions and response options should be repeated at the top of each page if the number of questions is large.
A useful method for organizing questions within a single content area is that of the "funnel" approach. Here, the topic is introduced with rather general questions (perhaps free-response) and followed by those which are increasingly detailed or deal with smaller aspects of the problem.
Sometimes you may find it useful to use "filter" questions to elicit answers from respondents only if they have answered the previous questions in a certain way. If this is done, review the earlier suggestions dealing with directions and format.
A more elaborate form of question organization is Gallup's Quintamensional plan. This plan is designed to ensure that the questions explore many aspects of the respondent's opinions. Questions asked (and their order) are:
1. Is the respondent aware of the issue?
2. What are his general feelings about the issue?
3. What are his views on specific parts of the issue?
4. What are the reasons for his views?
5. How intense or strong are his views?
Obviously, this technique involves some aspects of the "funnel" approach, but augments it with other considerations.
There are many other suggestions for question arrangement. The arrangement should be developed in such a way as to avoid implanting ideas in the respondents' minds early in the questionnaire which will influence their later responses. It is often useful (with respect to developing a favorable attitude) to start with relatively easy and impersonal questions and progress to more complex and sensitive ones.
One common practice in question sequencing should be avoided. Many investigators begin with a series of staccato questions designed to elicit biographical responses which will be used for grouping respondents in analytical studies. This practice may lead respondents who are interested in the study and ready to begin the task of answering (the result of a skillful introductory letter) to question whether the information being sought is relevant to the purposes described. Worse, they may be sensitized to the very group differences you hope to identify, and alter their responses accordingly. It is probably a better idea to leave such items for the end of the questionnaire (after obtaining the responses) where they can serve a kind of "cooling down" function from the experience of responding.
Questions for "Other" Purposes
In addition to the questions you include to obtain information about the independent and dependent variables of interest to you, there are a couple of other kinds of questions you may want to consider including in your questionnaire. One is the "sleeper" or "cheater" question designed to let you assess the reliability of the responses. If, for example, you asked the respondents how many hours per week they spent on various leisure activities, you might also ask, in a later section, how many hours they spent working. The responses to the leisure-time questions would be suspect for any respondent whose leisure-plus-working hours left little or no time for sleep. A variation of this type of question involves the introduction of a phony name into a list of objects or firms. Or you might want to ask, in a later section of the questionnaire, a question similar in nature (or perhaps slightly more inclusive or exclusive) to one in an earlier section. The consistency of responses is questionable for anyone reporting far fewer or far more activities in one section than in the other. It should be noted, however, that such questions can only be used with factual information, since wording affects opinion items so markedly. The best defense against purposeful distortion is a healthy, positive attitude developed in the introductory letter.
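The sleeper-question check on hours can be made concrete with a small sketch in modern terms (a computer rather than a hand tally). The respondent data, field names, and the sleep threshold below are all invented for illustration:

```python
# Hypothetical consistency check for "sleeper" questions: flag any
# respondent whose reported weekly leisure plus working hours leave
# fewer than 42 hours (6 per night) for sleep in a 168-hour week.
HOURS_IN_WEEK = 168
MIN_SLEEP = 42  # assumed plausibility threshold

respondents = [
    {"id": 1, "leisure_hrs": 30, "work_hrs": 45},
    {"id": 2, "leisure_hrs": 80, "work_hrs": 60},  # implausible total
    {"id": 3, "leisure_hrs": 20, "work_hrs": 50},
]

def suspect_responses(rows):
    """Return ids whose accounted-for hours crowd out minimal sleep."""
    return [r["id"] for r in rows
            if HOURS_IN_WEEK - r["leisure_hrs"] - r["work_hrs"] < MIN_SLEEP]

print(suspect_responses(respondents))  # respondent 2 is flagged
```

Any respondent flagged this way need not be discarded outright, but his leisure-time responses deserve a second look.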
Before we move on to some administrative and sampling topics, a common-sense rule for question wording is in order. Two things which should be avoided are the use of leading questions and loaded words. By these terms I mean (leading) questions which suggest a desirable or expected response and (loaded) words which are emotionally colored or which engender approval or disapproval. Reasons why certain questions encourage certain responses or why certain words conjure up issues of desirability are legion. Perhaps a few examples will suffice.
Many people are rather comfortable with the status quo. "The law should be changed" may be seen as a more drastic step than "It should be possible." The results from these two wordings of similar questions may be quite different (and perhaps account for more variation in responses than does sampling error). Using the pilot study to try out alternative versions of questions may help avoid this.
Certain words may be loaded with differing amounts of prestige. For example, listing a "sanitary engineer" in an occupational checklist may draw more responses than listing a "janitor." People may claim to do something more often than they really do (e.g., attend concerts, go to art museums) or less often (e.g., smoke, use alcohol, watch TV) because of the status and prestige they associate with these activities. Your strategy here should be to use wording which minimizes this effect (e.g., "Have you had time to" rather than "Did you") and stresses accuracy rather than value judgments.
Needless to say, but I will, avoid embarrassing questions or items of a private nature.
In order to anticipate words with problem connotations (and, in addition, those with multiple meanings), it may be helpful for you to review the excellent list of 1,000 "problem words" developed by Stanley Payne. Use it to check over the first draft of your questionnaire.
The Care and Treatment of Respondents
I've said earlier that it is crucial to develop in your respondents a positive attitude toward the completion of the task you've laid out for them. Let me be a bit more explicit. YOU ARE ASKING RESPONDENTS FOR THEIR TIME AND EFFORT to help you conduct your study. No effort should be spared to make their task enjoyable and worthwhile. We've already explored some ways in which this can be done (via the letter of introduction and looking at the questionnaire from the respondent's point of view). Here are some other suggestions for producing and maintaining positive feelings in your respondents.
Avoid staccato questions and be as polite as possible in your questions and directions. This may sound trite, but remember to say "please" and "thank you." Avoid using technical language and jargon. (Not everyone knows what is meant by right-to-work laws and Title IV programs.) Use a minimal number of response methods in your questionnaire--fewer adjustments for the respondent. To shorten the time required to complete the questionnaire, minimize open-ended questions requiring long responses. Make sure that the final form of the questionnaire looks like it is being used in an important study. Proofread the copy as if your study depended on it--it does! Provide the return-postage envelope along with specific directions for returning the materials.
The language you use in the questionnaire does affect the results. It also affects the attitude of your respondents. Don't "talk down" to your respondents. Give explicit definitions, to be sure, but do it in a way that makes the need for the definition constructive (e.g., What do you think of the work of the company-based lending and savings agency--credit union?) rather than deprecating (e.g., What do you think of the work of the credit union--the part of the company that lends money and pays interest on savings?). Respondents are more likely to read the words "now for you dummies" into the dash in the second version.
Other things to avoid include over-elaboration (especially when the elaboration may give rise to a contradiction), double negatives, lengthy questions and distinctions without real differences. If a respondent thinks "You asked me that before" you can be sure that you've lost some ground from there on.
Review your questionnaire to make sure that antecedents for pronouns are clear. If in doubt, repeat the term or the abbreviation. Also, be careful about abbreviations--they're not as universal as you may think. Whenever you ask respondents to recall or recount their activities, provide a peg on which to hang their answers (e.g., "during the month of August" or "in the average week"). If you ask for such a recounting, be sure that the time period is a natural one (few housewives could tell you how many bars of soap they purchase in a year without figuring how many they buy per week and multiplying by 52--you do the mathematical work) and that the time period is not subject to seasonal variations which cannot be projected over an entire year. (Don't ask businesses for their sales in December and multiply by 12 to get a yearly figure.) If you ask for an estimate of size, length, or frequency, be sure to indicate the desired units of measurement.
Punctuation and grammar can also have subtle effects on responses. Respondents tend to pause at commas and dashes and perhaps jump the gun in their responses. If the question has an important qualifying phrase, place it at the beginning so respondents won't overlook it and interpret the question differently than intended.
Finally (whew!), let's go back to the need for brevity. Some of this should have been accomplished in the construction of your questionnaire (the real length). It may be helpful to look at the apparent length of the instrument as well. Professionally printed questionnaires, in addition to giving a polished and serious impression, also allow for major reductions in the physical length required for your questions. This is important, since the respondent's first impression of length is the number of pieces of paper. Other apparent shortening devices include the provision of plenty of white space (especially in the margins) and dividing the questionnaire into sections with items numbered separately within each section (so that the second indicator of apparent length--the number of the last question--is reduced). All else being equal, shorter questionnaires get better response rates, but this can be offset by motivation, apparent length, and other devices.
Reliability and Validity of Questionnaire Responses
We are considering the questionnaire as a data source--that is, as a scientific instrument for gathering data. As such, the measurement concepts of reliability and validity can, and should, be applied to this kind of instrument, as they are to standardized tests. In case your measurement background is rusty or non-existent, let me explain what is meant by these terms. Reliability is usually defined as the accuracy or consistency with which measurements are taken. That is, how well does the instrument measure whatever it does measure? Validity, on the other hand, refers to the degree to which an instrument measures what it is supposed to measure. For our purposes, we can consider validity to be the accuracy with which responses approach the true facts or attitudes. Obviously, many of the suggestions so far have been meant to improve or maximize both properties in your questionnaire. Clear wording and directions and a positive attitude on the part of the respondent (and other considerations involving the criteria of able and willing to answer) improve both qualities. OK, but how do you go about estimating the reliability and validity of your questionnaire?
Reliability and Validity of Factual Questions
I've said earlier that, in general, people tend to answer factual questions more accurately (reliably) than opinion items. Facts after all, are more observable (he did or he didn't; there are either two cars or three cars in the garage). That's not the whole answer. Some facts can be, and usually are, reported more accurately than others. Questions which ask the respondent to recall past information or which are asked in ways that are unique to the questionnaire are not so accurately reported. The reliability of your factual items can be assessed in a number of ways during both the pilot work and the final project--although it may be sufficient to estimate it from the results of the former.
One method for estimating the accuracy of responses is to ask the same question, or a related one, in another part of the questionnaire. If the results differ markedly and you think they should not, or if they don't differ in the way you think they should, the responses to the item may not be accurate. The split-ballot technique and pilot work allow you to try out different versions of the same questions (with a later check item common to both) to see which version seems to result in the most accurate reporting.
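The repeated-item check lends itself to a simple tabulation. This sketch compares each respondent's answer to a factual item with his answer to the same item asked later in the form; the item and the response values are invented for illustration:

```python
# Internal accuracy check: how often does a respondent give the same
# answer to a factual question asked twice? (Illustrative data only;
# respondents are aligned by position in the two lists.)
item = [3, 5, 2, 4, 1, 5]    # e.g., books read last month (first asking)
check = [3, 5, 2, 1, 1, 5]   # the same question, later in the form

def agreement_rate(a, b):
    """Proportion of respondents giving identical answers both times."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

print(agreement_rate(item, check))  # 5 of 6 agree -> about 0.83
```

In a split-ballot pilot, the version of the question with the higher agreement rate against the common check item is the one to carry into the final form.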
When free-response answers are being coded and classified, an estimate of the coder's objectivity (another aspect of reliability) should be made. Try out your coding rules in the pilot work (or, better yet, try out different coding rules) with different coders to estimate the degree to which coder idiosyncrasies affect the accuracy of the results.
The validity of factual items depends on both the respondent's motivation and the clarity of wording. Wherever independent sources of factual information are available, your results should be cross-checked. This may be done either for individuals or for the group as a whole. (If the local school records show that only 20% of the teachers took advanced degree work last summer, but 70% of your sample claims to have done so, the validity of their responses must be questioned.)
You might have thought of using an independent rater (such as the respondent's spouse or supervisor) to validate the answers to your items. This is acceptable if you can apply the criteria of ability and willingness to answer to their responses as well. (Plus, of course, getting around their desire to make the respondent look good or poor.) If time allows, you might interview a small portion of your sample in order to ask for corroboration or gather additional data to cross-check the original responses.
These techniques are useful for establishing the validity after the fact, but by then it may be too late. Your best safeguards are the rapport you establish with the respondent and your success at reducing reasons for distorting responses. It may surprise you to learn that the body of research on the validity of self-report information in education is largely supportive of its use. That is, people do what they say they'll do and report reasonably accurately what they have and haven't done in the past IF they have no reason not to.
Reliability and Validity of Opinion Questions
Here's the crunch! Since attitude and opinion items depend so heavily on their wording, it is not usually reasonable to use the similar-question internal check on reliability. In most cases, because the reliability of an individual item is so suspect, attitude items should be grouped (logically and empirically) into scales and considered as "tests" of a sort. The pilot work should play a crucial role in determining groupings of attitude items, as well as in eliminating items which do not correlate with others in the same logical grouping. When items are treated in this way, standard procedures for quantifying the internal consistency of the scales can be applied--often with pleasantly high coefficients.
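One standard index of the internal consistency of such a scale is Cronbach's alpha, which can be computed directly from the item responses. The sketch below uses invented data (five respondents, three agree-disagree items scored 1-5):

```python
# Cronbach's alpha for a small attitude scale: k/(k-1) times
# (1 - sum of item variances / variance of the total scores).
# Data are invented for illustration.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of responses per item, respondents aligned by index."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

scale = [  # five respondents, three items
    [4, 5, 2, 4, 3],
    [5, 5, 1, 4, 4],
    [4, 4, 2, 5, 3],
]
print(round(cronbach_alpha(scale), 2))  # about .91 for these data
```

An item whose removal raises alpha is a candidate for elimination from the logical grouping, just as suggested above.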
Generally speaking, there is no criterion available against which attitude responses can be compared. Even the seemingly logical internal check of attitudes against the behaviors reported is faulty, since people often don't follow through on their attitudes and because the behavior may result from other, unmeasured factors.
If data on groups with known attitudes and behaviors is available, it may be used for some crude kinds of checks. Usually, however, the poorer substitute of construct validity must be used (that is, the degree to which attitude scales correlate with responses to other variables). Don't forget to check the literature for similar studies (ideally employing identical items and scales) in the hope of finding some results against which yours can be checked. These are usually available, although not always easily accessible, if yours is a replication of another study. Do you see the advantage of using available instruments and questions?
There is a mathematical relationship between validity and reliability which offers some additional hope for getting more valid results. Essentially, it is accurate to say that the reliability of an instrument sets an upper limit on its validity. If you think of reliability as the correlation between responses on two identical items and validity as the correlation between responses on the item and something else (the criterion), then the relationship should be clear. Responses to an item can't correlate higher with something else than they do with themselves. Increase the reliability and you will usually increase the validity to some degree.
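This relationship can be made concrete with the classical correction-for-attenuation formula, in which the observed validity coefficient is the "true" correlation shrunk by the square roots of the two reliabilities. The numbers below are invented:

```python
# Observed validity is attenuated by unreliability in both measures:
# r_observed = r_true * sqrt(rel_x * rel_y). Figures are illustrative.
import math

def observed_validity(true_r, rel_x, rel_y):
    """Observed correlation after attenuation by unreliability."""
    return true_r * math.sqrt(rel_x * rel_y)

# Even a perfect true correlation cannot exceed sqrt(rel_x * rel_y):
print(observed_validity(1.0, 0.64, 1.0))  # at most 0.8 when reliability is .64
```

So an item with a reliability of .64 can never show a validity coefficient above .8, no matter how good the criterion.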
Selecting a Sample
You may have been wondering where and how you get your respondents. By this time you should already have answered the "where" part. If, in designing the study, you have clearly identified the population(s) you wished to make statements about, you're finished with half of the sampling problem. If not, try to do that before going on.
Now for the "how" part. There are two kinds of samples: judgmental samples and probability samples. (Read that as poor and good samples, respectively.) Judgmental samples, often called "grab samples," are those which arise when you take whomever is available. Your study can be run using such a sample, but you cannot generalize the results to any larger population than the one taking your questionnaire. For example, if you survey all school administrators in the Des Moines, Iowa, school system you can get a good description of that group. BUT, you can't make any legitimate inferences about school administrators in large cities in Iowa or anyplace else. In order to do that, you'll need a probability sample from the larger population.
In order to qualify as a probability sample, two things must be present:
1. Each person in the sampling unit (more about that later) must have an equal likelihood of being chosen, and
2. the choice of one subject must in no way influence the choice of any other subject.
For most studies, a probability sample is necessary so that your readers will answer "I do" to the question "Who cares about that population?". There are many excellent books on sampling, and this topic will be treated only superficially here. This is another time to consult your statistician!
Sampling for Descriptive Studies
If your purpose is simply to estimate the characteristics of a certain population or populations, a simple random sample will probably suffice. (If you don't remember your purpose, go back to page 3.) The two necessary steps in taking a random sample are (1) to enumerate in some way every person in the population (Note: This says people, not school districts, businesses, etc.) and (2) to choose the N subjects by using a table of random numbers. Here the person is the sampling unit. These steps satisfy both requirements for a probability sample, and do so rather simply. In this way you can be sure that any characteristics present in the population are, more-or-less, present to the same degree in your sample. (This is one way to deal with uncontrolled variables.)
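The two steps can be sketched briefly, with a pseudo-random generator standing in for the table of random numbers. The population names and sizes are invented for illustration:

```python
# A simple random sample in two steps: enumerate every person, then
# draw N of them at random without replacement. Names are illustrative.
import random

population = [f"teacher_{i:03d}" for i in range(1, 501)]  # step 1: enumerate
random.seed(13)                                           # reproducible draw
sample = random.sample(population, 50)                    # step 2: draw N

print(len(sample), len(set(sample)))  # 50 distinct persons, no duplicates
```

Sampling without replacement, as `random.sample` does, guarantees that no person is chosen twice.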
A fancier version of this procedure is the stratified random sample. If you want to make sure that some characteristics are present in your sample to exactly the same degree as they are in the population, you will need to divide your population into groups possessing different combinations of those characteristics, determine what proportion of the entire population each group represents (say xx%) and sample randomly xx% of your N subjects from that group. Here the groups are the sampling units. This technique is useful in controlling the effect of the most crucial independent variables.
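The proportional-allocation arithmetic for a stratified sample looks like this; the strata, their population counts, and the sample size are all invented for illustration:

```python
# Proportional stratified allocation: each stratum contributes the same
# share of the sample as it holds of the population. Figures invented.
strata = {"elementary": 300, "junior_high": 120, "senior_high": 80}
N = 50  # total sample size
pop_total = sum(strata.values())

allocation = {name: round(N * size / pop_total) for name, size in strata.items()}
print(allocation)  # 30, 12, and 8 subjects from the three strata
```

Within each stratum the subjects would then be drawn by simple random sampling, exactly as in the previous step.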
Sampling for Analytical Studies
In this case, the sampling steps are similar to those above, but with one major difference. If you want to control the effects of certain variables, you must sample equal numbers of subjects from groups possessing varying amounts of those variables. That is, after grouping your population into k groups according to the presence or absence of the controlled variables, you must sample N/k people from each group. Note that this results in a sample which does not reflect the degree to which the characteristics are present in the total population. This is necessary to control the effects of these variables and to ensure that the parameter estimates for each group will have the same degree of statistical precision. You can always weight responses to obtain good estimates for the total population if necessary.
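The equal-allocation rule and the weighting step can be sketched together. The group names, sizes, and means below are invented for illustration:

```python
# Equal allocation for an analytical design: N/k subjects per group,
# then population weights to recover an overall estimate. Figures invented.
N, groups = 120, {"rural": 2000, "urban": 8000}  # k = 2 groups
per_group = N // len(groups)                     # N/k = 60 from each group

# Suppose the study yields these group means on some response:
group_means = {"rural": 3.0, "urban": 4.0}

# Weighted estimate for the whole population:
pop_total = sum(groups.values())
estimate = sum(group_means[g] * groups[g] / pop_total for g in groups)
print(per_group, estimate)  # 60 per group; weighted overall mean 3.8
```

Note that the unweighted mean of the two groups (3.5) would misstate the population value, since rural people are deliberately over-represented in the sample.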
Obviously, this is no job for an amateur. If you have laid out your lists of variables carefully in the categories described earlier and have identified the specific questions you wish to answer, your statistical consultant will be able to help you develop the appropriate sampling design.
The decision about the appropriateness of your sample involves not only the sample size (which is, of course, important), but also the representativeness of the sample for the purposes of the survey. A large sample is less important than a representative one!
One final consideration. It is safe to assume that whether or not a person responds to your questionnaire is not a random process. Deciding on the quality of a sample of respondents is, therefore, not simply a question of good initial sampling design, although that is a necessary condition. After the responses are in, checks must be made on those who did not respond. Two kinds of checks are possible. The first involves a comparison of known characteristics (from organizational records, addresses, sex, etc.) for responders and non-responders. Do the non-responders differ from responders? The second check is more difficult. It has been found that late-responders often differ from early-responders on matters of opinion and attitude. Further, responses of non-responders, when finally obtained, tend to resemble more closely those of the late-responders. If you have recorded the date of return, you should compare early- and late-responders. If they differ greatly in their responses, it is probably safe to assume that the non-responders differ in a similar way. It is not safe to assume, however, that those who responded were those more interested in or favorable toward the topic.
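The early- versus late-responder check amounts to a comparison of the two waves of returns. This sketch compares mean responses; the scores in the two groups are invented for illustration:

```python
# Early vs. late responders on an attitude item (1-5 scale): a large
# gap between the waves warns that non-responders may differ as well.
# Scores are illustrative only.
early = [4, 5, 4, 3, 5, 4]  # returned before the deadline
late = [2, 3, 2, 3, 2, 3]   # returned only after follow-up

def mean(xs):
    return sum(xs) / len(xs)

gap = mean(early) - mean(late)
print(round(gap, 2))  # a sizable gap on a 5-point scale
```

With larger waves, a significance test on the difference would be appropriate; your statistical consultant can advise on which one.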
Some statistical methods are available for offsetting the effects of non-response, but it is far better to secure a high response rate and avoid the issue altogether. Many previous suggestions are designed to secure a good response rate. Others, involving follow-up procedures, will come later.
The Pilot Work
If you haven't guessed already, the pilot work (or tryout) is the key to the quality of your questionnaire. Unfortunately, it is also commonly neglected in educational research or used for only a few of the possible advantages it offers. I've mentioned a number of uses for the pilot work already, and others will follow. In general, however, if you can't or don't run a pilot project in the development of your instrument, it is questionable whether the final results will be very useful.
An initial stage in the pilot work may simply involve distributing your questionnaire to a large number of friends, associates, and others willing to look at the instrument. This will be most helpful if you accompany the form with some specific questions about clarity, attitude, format, and so on. This is not, however, a substitute for a full pilot study using respondents like those who will take the final form.
One of the areas in which pilot work is most useful is in trying out your questions. The use of the split-ballot technique enables you to compare results received using different versions of the same questions. You can also check for the effects of varying the response locations. Use the pilot work to elicit lists of the critical behaviors for checklists as well as to determine which multiple-choice responses are necessary. The pilot stage is critical for developing and trying out coding rules for free-response questions.
Pilot work is also useful for debugging your administrative procedures. Try a couple of versions of the introductory letter to investigate the effect on response rate. You should use mailing procedures identical to those planned for the final version. Does everything work smoothly?
Handle the returns just as you will in the final project. Do your tabulating and coding procedures work efficiently? How long did it take to get the data ready for analysis? Does it fit with your timetable for the overall project? What will be the cost of the complete project?
Using the data from the pilot project, run the whole study through to its intended conclusion. Are your computer programs or computational aids ready? Did your rating scales yield a satisfactory spread of responses? Try writing up the results, at least briefly, as you intend to do with the final version. Have you asked all the questions you need to? Are there some you can eliminate? Was the response rate high enough? Try to personally contact some non-responders. Ask them why they didn't respond. Will you need to use a bigger N? Are there any strange results or peculiar responses?
The pilot study is also a good opportunity to gather data on the reliability and validity of your questions. Insert "sleeper" questions and internal reliability checks. Which version of the question resulted in more reliable results? Use it!
Use the pilot study to solicit reactions from the respondents on the questionnaire length. Does it need to be changed? Which parts didn't they like or which did they have trouble responding to?
One more point. It is crucial that the pilot work be done on a sample of respondents from the same population(s) you will use. Even if this reduces your final N, the gain in effectiveness will be worth it. If possible, you should use at least 50 subjects in the pilot (or 100 if you are using the split-ballot technique) to ensure that the response frequencies will be stable enough for you to draw the necessary conclusions.
Assembling the Final Form
In general, if you have done a thorough pilot project, the final version of your questionnaire should almost assemble itself. If the final form involves substantial revisions of the pilot form, you should pilot the final version to make sure you haven't inadvertently built in new flaws. Remember that the appearance of the form plays a big part in setting the attitude of the respondent toward the study. Don't just mimeograph it--have it printed and bound into a booklet complete with return address and postage. Make it easy for the respondent to cooperate.
Remember what I have said about response rate? Here is where the game is often won or lost. Nearly every study, no matter how good the original materials were, will need a careful and thorough follow-up design. Be sure to make plans so that you can identify the non-respondents at each stage in the follow-up. It'll save you money and the respondents frustration. Some of the earlier recommendations (e.g., return postage, guarantee of anonymity and confidentiality, timeliness, etc.) have been aimed at increasing the original response rate, since the higher this rate, the less work you will have to do in the follow-up. In the introductory letter, set a deadline for returning the materials--usually one or two weeks--to encourage prompt response and to keep responses from dribbling in long after the study is completed.
A number of possibilities exist for follow-up. One particularly successful survey (which got a 99% return from 600 professional researchers) used the following plan:
1. A card reminder 1-2 weeks after the questionnaire mailing.
2. A second reminder a week later.
3. A second mailer of the questionnaire after another week.
4. A personal letter after the second mailing.
5. A short form of the questionnaire with the "most critical" items.
6. A second mailing of the short form.
7. Supplementary questions for those completing the short form.
8. Personal contact via telephone or telegram.
It is not expected, of course, that you will be able to use all of these techniques. You should, however, plan for enough follow-up contacts to insure that a low response rate will not invalidate an otherwise excellent study.
Preparation For Data Analysis
Questions of statistical treatment of your data are beyond the scope of this paper. You should have involved a statistical consultant well ahead of the time you conducted your pilot study. If not, you may severely limit your ability to answer the questions you had in mind.
Some words are also in order about the administrative side of your data handling task. Prior to your pilot study, you should have planned for the procedures to be used in transferring your results from the questionnaire to tables (if the analysis is to be done by hand) or to punch cards (if the analysis is to be done by the computer). The former is often simplified by using a questionnaire form with offset margins so that all marks and tallies are visible without having to turn a page--a considerable saving of time with longer forms. The latter is greatly facilitated by using optical scanning forms from which a scanner can transfer the data to cards or magnetic tape. If your questionnaire lends itself to this treatment, you can have scannable forms printed with your questions, directions, and so forth. (Bonus--often the back of the form can be used for free-response questions.)
Computers notwithstanding, there is no substitute for looking at the completed questionnaires to determine whether the numbers mean what they appear to mean!
A Final Word
As you begin to see the work involved in a good questionnaire survey, you may have asked yourself if it was worth it. If the study you have planned is worth doing, it is probably worth doing right. If not, why bother? In view of the fact that it commonly takes standardized test publishers 3 to 5 years to develop a new form of their tests, the time required to develop a good questionnaire is negligible.
Selected Books Discussing Questionnaire Surveys
Good, C. V. Introduction to educational research (2nd ed.). New York: Appleton-Century-Crofts, 1963.
Good, C. V., Barr, A. S., & Scates, D. E. The methodology of educational research. New York: D. Appleton-Century, 1935.
Koos, L. V. The questionnaire in education. New York: Macmillan, 1928.
Oppenheim, A. N. Questionnaire design and attitude measurement. New York: Basic Books, 1966.
Payne, S. L. The art of asking questions. Princeton: Princeton University Press, 1951.
Rummel, J. F. An introduction to research procedures in education (2nd ed.). New York: Harper & Row, 1964.
Skager, R. W., & Weinberg, C. Fundamentals of educational research: An introduction. Dallas: Scott, Foresman, 1971.
Wise, J. E., Nordberg, R. B., & Reitz, D. B. Methods of research in education. Boston: D. C. Heath, 1967.