Chapter 4
DESIGNING AND SELECTING THE SAMPLE
This technical chapter is intended mainly for sample specialists, but also for the survey coordinator and other technical resource persons. It will help you to:
• Recognize the features of a proper probability sample design
• Assess whether an existing sample can be used, or
• Assess if a new sample must be designed
• Decide on the sample size
• Be informed about weighting, estimation and sampling errors
Editor's note: Users of the first edition of the Handbook will observe that this chapter is considerably revised. This version reflects revisions arising from valuable sampling lessons learned in the first round of surveys, from recommendations provided by the MICS evaluation team, and from invited commentary on the chapter by several sampling experts.
Features of a Proper Probability Sample Design
Conduct of the multiple indicator survey in your country will be done on a sample basis, as opposed to collecting data for the entire target population. There are different target populations in the survey: households, women aged 15 to 49 years, and children in different age groups. The respondents, however, will usually be the mothers or caretakers of children in each household visited.
Design of an appropriate probability sample for the survey is just as important as development of the various questionnaire modules, in terms of producing results that will be valid and as free from bias as possible. There are a number of ways you can design a probability sample, and each country will undoubtedly have its own situation, conditions, and data needs that dictate the particular sample plan it adopts. There are certain features that should be observed by all countries, however, to meet the requirements of a scientific probability sample:
Use accepted probability sampling methods at every stage of sample selection;
Select a nationally representative sample;
Ensure that the field implementation is faithful to the sample design;
Ensure that the sample size is sufficient to achieve reliability requirements.
In addition to these four requirements, there are other features of sample design that you are strongly recommended to adopt, although each may be modified in certain ways depending upon country situations and needs. They include:
Simple, as opposed to complex, sampling procedures;
Use of the most recent population census as the sampling frame;
A self-weighting sample, if possible.
To avoid sample bias you should use probability sampling to select the respondents. Sample bias depends on the selection techniques, not the sample size. Increasing the sample size will not eliminate sample bias if the selection techniques are wrong.
The use of scientifically grounded probability sampling methods for surveys has been practiced in most countries of the world for decades. If a sample is not accurately drawn from the whole population of interest, by using well-known probability techniques, the survey estimates will be biased. Moreover, the magnitude of these biases will be unknown. It is crucial to ensure that the sampling methodology employs probability selection techniques at every stage of the selection process.
Probability sampling is a means of ensuring that all individuals in the target population have a known chance of being selected into the sample. Further, that chance must be non-zero and calculable. A sure clue of not having a probability sample is when the sampling statistician cannot calculate the selection probabilities of the sample plan being used.
Examples of sampling methods that are not based on probability techniques are judgment samples, purposive samples, and quota samples. The random walk method of selecting children is a quota sample procedure. It is important that you not use such procedures for the multiple indicator survey.
The best way to control sampling bias is to insist on strict probability sampling. There are other biases, non-sampling in origin, including nonresponse, erroneous response, and interviewer errors, but these will occur in varying degrees anyway, no matter what kind of sampling methods are used. Appropriate steps must be taken to control these non-sampling biases as well, including such measures as pretesting, careful interviewer training, and quality control of fieldwork.
A second required feature of sample design for the indicator survey is that the sample should be national in scope and coverage. This is necessary because the indicator estimates to assess attainment of the WSC goals must reflect the situation of the nation as a whole. It is important to include, to the extent practicable, difficult-to-enumerate groups to ensure complete national coverage. Such groups might be nomads, homeless or transient persons, refugee camps, military quarters, as well as settlements in isolated areas that are difficult to access for one reason or another. It is quite likely that children in particular, living in such situations, have different health conditions from those found in more stable or traditional living environments, and excluding them would result in biased indicator estimates.
For probability sampling to be effective, it is essential that the field implementation of the sample selection plan, including the interviewing procedures, be faithful to the design. There have been numerous occasions where lax fieldwork has ruined an otherwise perfectly acceptable sample design. The field supervisors must make certain that the sample selection procedures are followed strictly.
A crucial feature of valid probability sampling is the specification of precision requirements (margin of error), in order to calculate the sample size. This topic, which is fairly complicated, is discussed in Appendix Seven.
Your sample should be designed as simply as possible. It is well known that the more complexity that is built into the sample plan, the more likely its implementation is to go wrong. This can be especially troublesome at the field level if complicated sampling procedures have to be carried out. Moreover, the operational objective to produce the survey results in a timely way may not be met.
It is strongly recommended that the most recent population census be used as the basis for the sample frame, updated if necessary. Nearly all countries of the world now have a recent population census, one conducted within the last 10 years. The frame is essentially the set of materials from which the survey sample is selected. A perfect sampling frame is one that is complete, accurate and up-to-date, and while no frame is 100 percent perfect, the population census comes closest in most countries. The prime use of the census for our survey is to provide a complete list of enumeration areas (EAs) with measures of size, such as population or household counts, for selection of the first-stage sampling units. Maps are usually part of the census of population in most countries, and these might include sketch maps for the enumeration areas. The maps are a useful resource because the selected enumeration areas will likely have to be updated in terms of the current households residing therein, especially if the census is more than a year or two old.
A sample plan is said to be self-weighting when every sample member of the target population is selected with the same overall probability. The overall probability is the product of the probabilities at each of the stages of selection. A self-weighting sample is desirable because various estimates, for example, percentage distributions, can be prepared from the sample figures without weighting, or inflating, them. In keeping with the desire for simplicity in sample design, it is better to have a self-weighting design than a more complicated, non-self-weighting one. Still, self-weighting should not be considered a strict criterion, because weighting the sample results to prepare the estimates can be easily handled by today's computers. Moreover, there are some situations where the sample design cannot be self-weighting.
Example:
Suppose that in your country you will need separate urban and rural indicator estimates, and suppose further that you want the estimates to be equally reliable. This would necessitate selecting a sample of equal size in the urban and rural sectors. Unless the urban and rural populations are equal, the sampling rates in each would be different. Hence, the overall national sample would require weighting for correct results and, therefore, the survey sample would not be self-weighting.
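In such a case the design weights are simply the inverses of the sector sampling fractions. The sketch below uses hypothetical population totals purely for illustration; the point is that equal sample sizes drawn from unequal populations force unequal sampling fractions, and hence unequal weights:

```python
# Hypothetical sector populations and an equal-reliability allocation
urban_pop, rural_pop = 6_000_000, 14_000_000   # assumed totals, for illustration
n_urban = n_rural = 3_000                      # equal samples in each sector

f_urban = n_urban / urban_pop                  # urban sampling fraction
f_rural = n_rural / rural_pop                  # rural sampling fraction

# Design weight = inverse of the overall selection probability
w_urban = 1 / f_urban                          # 2,000
w_rural = 1 / f_rural                          # about 4,667

# Unequal fractions mean unequal weights: the sample is not self-weighting
assert f_urban != f_rural
```

In the national tabulations, each urban household's results would be multiplied by the urban weight and each rural household's by the rural weight before combined estimates are formed.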
Determining What Sample to Use for MICS
Designing, selecting, and implementing a probability sample from beginning to end is a time-consuming and expensive process. For the MICS there is the need to produce the indicator estimates in a comparatively short time frame, and you may not have sufficient time to design a new sample for the survey. Hence, there are two major steps to be followed in determining what sample to use for MICS:
Step 1: Determine if an existing sample can be used for MICS.
Step 2: If no suitable existing sample can be found, develop a sample specific to MICS.
The following sections follow these steps to delineate three options for a MICS sample design.
Use of an Existing Sample (Option 1)
Fortunately, most countries have well-developed survey programmes through their national statistical offices or health ministries. It may be possible in your country, therefore, to use an already-existing sample, one that has been designed for other purposes. This is the recommended option for your survey if the existing sample is a valid probability sample and is available. The existing sample must be evaluated to see if it meets the requirements of probability sampling plus the other features of sound sample design, all discussed earlier in this chapter.
There are various ways in which an existing sample may be used, as follows:
Attaching MICS questionnaire modules to the questionnaires to be used in another survey;
Using the sample, or a subset, from a previous survey;
Using the household listings in the sample enumeration areas of another survey;
Using the enumeration areas from a previous survey with a fresh listing of households.
Each of these choices has advantages and disadvantages. Timing considerations are also key. For example, it will not be possible to utilize the first choice if no other survey is going to be carried out within the prescribed time frame for the MICS. This choice, attaching the questionnaire modules to another survey (sometimes called piggy-backing, because the data for both surveys are collected simultaneously), has an obvious appeal, since the sampling will have already been done, thus saving the sampling costs for the MICS. A major disadvantage, however, can be the problem of respondent burden, since MICS questionnaires are quite long and the parent survey may have its own lengthy questionnaire. These aspects must be carefully evaluated.
The second choice, using the sample from a previous survey, also has the advantage that the sample design is already in place, again saving the sampling costs. If the sample size for the previous survey is too large, it would be a simple matter for the sampling statistician to sub-sample the original sample to bring the size into compliance with the MICS requirements. By contrast, however, if the sample size is too small, expanding it is more problematic. There is also the disadvantage of revisiting the same households from the previous survey, again because of potential problems arising from respondent burden and/or conditioning. Finally, the previous survey must be very recent for this to be a viable choice.
The third choice, using the household listings in sample enumeration areas from a previous survey as a frame for selecting the MICS sample, has the dual advantage that (1) the first-stage units are already sampled and (2) household listings are already available. Hence, again, most of the sampling operations and costs will have already been taken account of. An advantage is that different households would be selected for the MICS, thus eliminating the problems of respondent burden, fatigue, or conditioning. A disadvantage is that the household listings would be out of date if the previous survey is more than a year or two old, in which case this choice would not be viable. In fact, when the household listings are out-of-date, then the fourth of the choices listed above can be considered. This choice requires making a fresh listing of households in the sample enumeration areas before sample selection. While this has the disadvantage of having to update with a new household listing operation and its associated expense, the advantage is that the first-stage units would have already been selected and the sample plan itself is basically in place without further design.
Table 4.2
Option 1: Existing Sample
Pros
• Saves time and cost
• Likely to be properly designed with probability methods
• Adjustments to fit the MICS can be simple

Cons
• Requires updating if old
• Respondents may be overburdened
• Indicator questionnaire may be too long if piggy-backed
• Adjustments to fit the MICS can be complex

Each of these points should be carefully evaluated and a determination made about the feasibility of implementing the necessary modifications before you decide to use an existing sample.
An existing sample that may be an excellent candidate is the Demographic and Health Survey (DHS). Many countries have conducted these surveys recently and others plan to do so in the coming months. The measurement objectives of the DHS are quite similar to those of the MICS. For that reason, the sample design used in the DHS is likely to be perfectly appropriate for your use.
To use the DHS sample, it would be necessary to evaluate its availability, timeliness, and suitability in terms of your requirements. Either a very recent DHS sample could be used to field the MICS, or an upcoming DHS could be used with the MICS as a supplement. It would require agreement and cooperation with the DHS sponsoring or implementing agency in your country, noting the constraints mentioned above about overburdening respondents.
Another survey that many countries have implemented and whose sample may be appropriate for your use is a labour force survey. While the measurement objectives of labour force surveys are quite different from the health-related objectives of the MICS, labour force surveys are frequently designed in a very similar fashion to health surveys in terms of stratification, sample size, and other sampling criteria.
A Specific Sample Design for MICS
As noted in Step 2 earlier in this chapter, when a suitable existing sample is not available for use in MICS, either for a stand-alone survey or a supplement to another survey, a new sample will have to be designed and selected.
In this section of the Manual we recommend the main properties of the design that the MICS sample should possess. The sample size is of course a key feature and it is taken up as a separate section later in the Manual. Two options are presented below, but first the general features are summarized.
In the most general terms, your survey sample should adopt the features included at the beginning of this chapter. It should be a probability sample at all stages of selection, national in coverage, and designed in as simple a way as possible, so that its field implementation can be easily and faithfully carried out with minimum opportunity for deviation from the design. In keeping with the aim of simplicity, both the stratification and the number of stages of selection should be minimal.

Regarding stratification, its prime purpose is to increase the precision of the survey estimates, plus permit oversampling of subnational areas when those areas are of particular interest. A type of stratification that is simple to implement and highly efficient when national-level estimates are the main focus is implicit stratification. It is a form of geographic stratification which, when used together with systematic pps sampling (see illustrations in Appendix Seven), automatically distributes the sample proportionately into each of the nation's administrative subdivisions, as well as the urban and rural sectors. Implicit stratification is carried out by geographically ordering the sample frame in serpentine fashion, separately by urban and rural, before applying systematic pps selection.
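Systematic pps selection over a geographically ordered frame can be sketched as follows. The function and the frame entries are illustrative only; in practice the selection would be carried out by the sampling statistician with the full census frame:

```python
import random

def systematic_pps(frame, n_sample, seed=None):
    """Systematic pps selection from a frame of (name, measure_of_size)
    pairs that has already been sorted geographically in serpentine
    order, separately by urban and rural (implicit stratification)."""
    rng = random.Random(seed)
    total = sum(size for _, size in frame)
    interval = total / n_sample               # sampling interval
    start = rng.uniform(0, interval)          # random start in [0, interval)
    targets = [start + k * interval for k in range(n_sample)]
    selected, cum, i = [], 0.0, 0
    for name, size in frame:
        cum += size
        # a unit whose size exceeds the interval can be hit more than once
        while i < len(targets) and targets[i] < cum:
            selected.append(name)
            i += 1
    return selected

# Illustrative frame: 200 EAs in geographic order, sizes in households
frame = [(f"EA-{k:03d}", 80 + (k % 5) * 10) for k in range(200)]
sample = systematic_pps(frame, n_sample=20, seed=42)
```

Because the frame is in geographic order, the fixed-interval selection spreads the sample proportionately across regions and across the urban and rural sectors, which is exactly the effect of implicit stratification.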
Further, the design should be a three-stage sample. The first-stage, or primary, sampling units (PSUs) should be defined, if possible, as census enumeration areas and they should be selected with pps. The enumeration area is recommended because the PSU should be an area unit around which the fieldwork can be conveniently organized; it should be small enough for mapping, segmentation, or listing of households, but large enough to be easily identifiable in the field.
The second stage would be the selection of segments, and the third stage the particular households within a segment that would be designated for interview in the survey. These households could be selected in a variety of ways through subsampling from an existing list of the households in each segment or a newly created one.
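Under this three-stage plan, the overall selection probability of a household is the product of the three stage probabilities. The sketch below, with illustrative figures, shows why the EA's own size cancels when the measure of size equals its segment count and a fixed household rate is used, leaving a constant overall probability, which is the self-weighting property:

```python
from fractions import Fraction

def overall_probability(n_psus, ea_segments, total_segments, hh_rate):
    """Overall probability for a household under the three-stage design:
    pps selection of the EA (measure of size = its segment count),
    one segment chosen at random, then a fixed household sampling rate."""
    p_ea = Fraction(n_psus * ea_segments, total_segments)  # stage 1: pps
    p_segment = Fraction(1, ea_segments)                   # stage 2: one segment
    return p_ea * p_segment * hh_rate                      # stage 3: fixed rate

# Illustrative: 300 PSUs, a frame of 6,000 segments, a 1-in-5 household rate.
# The probability is the same whether the EA has 1 segment or 3:
p_small = overall_probability(300, 1, 6_000, Fraction(1, 5))
p_large = overall_probability(300, 3, 6_000, Fraction(1, 5))
assert p_small == p_large == Fraction(1, 100)
```

Because the EA's segment count cancels, every household in the frame carries the same overall probability of 1 in 100 under these illustrative figures.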
There is, of course, room for flexibility in this design, depending on country conditions and needs. The design is likely to vary a good deal from one country to another with respect to the number of sample PSUs, the number of segments per PSU, and the number of households per segment, and, hence, the overall sample size.
As a very general rule of thumb:
• The number of PSUs should be in the range of 250 to 350;
• The cluster sizes (that is, the number of households to interview in each segment) should be in the range of 10 to 40, depending upon which of the two options described below is followed;
• The overall sample size should be in the range of 2,500 to 14,000 households.
A country may decide, for its own purposes, that it wants indicator estimates for a few subregions in addition to the national level. In that case, its sample design would undoubtedly include a different stratification scheme and a greater number of PSUs, so as to ensure adequate geographic representation of the sample areas in each subregion. In addition, the sample size for the survey would have to be increased substantially in order to provide reliable estimates for subregions, or for other subnational domains (discussed in more detail later in this chapter).
Standard Segment Design (Option 2)
It was mentioned above that the DHS programme may provide a suitable existing sample for use in the MICS. The standard DHS sample design is in fact a good model for the MICS, if you decide that a new sample has to be designed. The DHS sample model has also been used in other health-related survey programmes such as the PAPCHILD surveys in the Arab countries.
The DHS and PAPCHILD sample models are based on the so-called standard segment design, which has the benefits of probability methodology, simplicity, and close relevance to the MICS objectives, both substantive and statistical. The sampling manuals for DHS and PAPCHILD note that most countries have convenient area sampling frames in the form of enumeration areas of the most recent population census. Sketch maps are normally available for the enumeration areas, as are counts of population and/or households. The census enumeration areas are usually fairly uniform in size. In many countries, there are no satisfactory lists of living quarters or households, nor is there an adequate address system, especially in many rural areas. Consequently, it is necessary to prepare new listings of households to bring the frame up to date.
To apply the standard segment design to the MICS, first arrange the census frame of enumeration areas in geographic sequence to achieve implicit stratification. Some enumeration areas are so large that it is not economically feasible to carry out a new listing of all households if they are selected. Instead, it is more efficient to use segments. This is done by assigning each enumeration area a measure of size equal to the desired number of standard segments it contains. In the DHS and PAPCHILD sampling manuals, it is recommended that the number of standard segments be defined (and computed) by dividing the census population of the enumeration area by 500 and rounding to the nearest whole number. This size for the standard segment is recommended for the MICS, if you decide to use Option 2.
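Computing the measure of size for each enumeration area is straightforward. The helper below is an illustrative sketch, not code from the DHS or PAPCHILD manuals; the half-up rounding and the minimum of 1 (so that very small EAs keep a nonzero chance of selection) are our assumptions:

```python
import math

def n_standard_segments(census_population, segment_population=500):
    """Measure of size for an EA under the standard segment design:
    census population divided by 500, rounded to the nearest whole
    number (half-up), with an assumed minimum of 1."""
    return max(1, math.floor(census_population / segment_population + 0.5))

# A typical EA of about 500 population forms a single standard segment;
# a large EA of 1,430 population gets a measure of size of 3.
assert n_standard_segments(480) == 1
assert n_standard_segments(1_430) == 3
```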
Table 4.3
Option 2: Summary of Standard Segment Design

Features
• Three-stage sampling with implicit stratification
• Selection of enumeration areas by pps
• Mapping and segmentation in enumeration areas with more than one standard segment
• Selection of one segment at random in each enumeration area
• Listing of households in sample segments
• Systematic selection of sample households in segments

Parameters
• Usually, 250 to 350 sample enumeration areas (PSUs)
• Standard segments of 500 population (about 100 households)
• Non-compact clusters of 10 to 40 households (differs from Option 3 below)
• Sample size of, usually, 2,500 to 14,000 households
The next step is to select sample enumeration areas using probability proportionate to this measure of size. Note that the measure of size is also the number of segments. In many cases you may find that the average size of an enumeration area is about 500 population (or, equivalently, 100 households when the average household size is 5); therefore, the typical measure of size will be one.
Segmentation, using the available maps, is the next phase of operation. When the number of segments in a sample enumeration area is equal to one, no segmentation is necessary, because the segment and the enumeration area are one and the same. If the number of segments is greater than one, then segmentation will be necessary. This entails subdividing the sampled enumeration area into parts (equal to the number of segments), with each part containing roughly the same number of households. Segmentation may be done as an office operation if the maps are accurate enough; otherwise a field visit would be necessary, especially in cases where identifiable internal boundaries within the enumeration area are not clearly delineated (see Chapter 6 for details on mapping and segmentation).
After segmentation, one segment is selected at random in each sample enumeration area. In all selected segments, a new household listing is undertaken; again, this will be typically about 100 households. Then, from the listings, using a fixed fraction, a systematic sample of households is chosen in each sample segment for interview.
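The final within-segment selection can be sketched as a systematic sample at a fixed rate. The function below is illustrative; note that over a listing of 112 households, a 1-in-5 rate yields 22 or 23 selections depending on the random start:

```python
import random

def systematic_households(listing, take_every, seed=None):
    """Systematic 1-in-take_every sample from a household listing:
    a random start, then every take_every-th household."""
    rng = random.Random(seed)
    start = rng.randrange(take_every)   # random start in 0..take_every-1
    return listing[start::take_every]

listing = [f"HH-{k:03d}" for k in range(112)]   # 112 newly listed households
sample = systematic_households(listing, take_every=5, seed=7)
assert len(sample) in (22, 23)   # a 1-in-5 rate over 112 households
```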
Table 4.4
Option 2: Standard Segment Design
Pros
• Probability sample
• Minimal mapping and segmentation
• Amount of listing is minimal
• Somewhat more reliable than Option 3 (below)
• Partially corrects for old sampling frame
• Self-weighting design

Cons
• Listing, though minimal, is necessary in every sample segment
• May give widely variable segment sizes, especially if frame is old
Example:
It might be decided to select one fifth of the newly listed households in each sample segment. Thus, if there are, say, 300 segments, then the number of households selected in each segment would be approximately 20 (though it would vary by PSU) and the overall sample size would be approximately 6,000 households.
The standard segment design is convenient and practical. In a typical country, that is, one where the enumeration area averages about 100 households, very little actual segmentation would have to be done. Moreover, the amount of household listing is also limited.
The sample households under Option 2 are contained within non-compact clusters, and the sample is self-weighting. The number of households selected in each sample PSU will vary somewhat because the PSUs are selected based on their census sizes, which will undoubtedly be different from the actual sizes when the new household listing is made.
Example:
Suppose the within-segment selection rate is calculated to be 1 in 5 of the listed households. If a segment is selected on the expectation of 98 households based on the census, but the listing shows there are now 112 households, then a one-fifth sample of the households will yield 22 or 23 households (the correct number) instead of the expected 19 or 20. The procedure not only reflects population change correctly, but it also retains the self-weighting nature of the sample. The deviation in the average segment size should not be great, unless the frame is old.
Modified Segment Design (Option 3)
We have discussed the use of an existing sample as the preferred option for the MICS, whenever a well-designed existing sample is available and relevant. We have also discussed using the DHS and PAPCHILD model sample plan, the standard segment design, as the next best option whenever your country has to design the indicator survey sample from scratch.
Option 3 uses a modification of the standard segment design. The modified segment design is similar to the standard segment design, but there are important differences. Rather than creating standard segments of size 500 population in each sample enumeration area, the latter is subdivided into a predetermined number of segments. This predetermined number is equal to the number of census households in the enumeration area divided by the desired cluster size and rounded to the nearest whole number.
Example:
If the desired cluster size is 20 households, and there are 155 households in the enumeration area, then 8 segments would be created.
As with Option 2, enumeration areas are sampled with probability proportionate to the number of segments they contain. Each selected enumeration area is then segmented into the predetermined number of segments using sketch maps together with a quick count of current dwellings. Carefully delineated boundaries must be formulated in the segmentation, and the number of dwellings in each segment should be roughly equal, although it need not be exact. Note that the quick count can be based on dwellings rather than households, except that for multi-unit dwellings, inquiries would likely be necessary to ascertain the number of units.
After segmentation, one (and only one) segment is selected at random within each sample enumeration area. All the households contained within the boundaries of the sample segment are then interviewed for the survey, the segment thus forming a compact cluster of households.
The other features of the modified segment design are essentially the same as those of the standard segment design: three-stage sampling, implicit stratification, and pps selection of enumeration areas.
Table 4.5
Option 3: Summary of Modified Segment Design

Features
• Three-stage sampling with implicit stratification
• Predetermination of number of segments by PSU
• Selection of census enumeration areas by pps
• Mapping and segmentation in all sample enumeration areas
• Selection of one segment at random in each enumeration area
• Interview of all sample households in selected segment

Parameters
• Usually, 250 to 350 sample enumeration areas (PSUs)
• Compact clusters of 20 to 30 households (minimum size 20)
• Sample size of, usually, 5,000 to 10,500 households
• Segment size and cluster size are synonymous (unlike Option 2)
The modified segment methodology has an advantage over the standard segment design in that no household listings need be undertaken, thus eliminating a major survey expense. The quick-count operation and sketch mapping do, however, bear an additional expense, but the cost of the quick count is minimized since it can be done by visual inspection rather than actually knocking on doors to speak to respondents. In addition, the procedure compensates for using a sampling frame that may be outdated by interviewing all the current households in a sample segment no matter how many there were at the time of the census.
A disadvantage of the modified segment design is that the segments (the clusters) are compact. Therefore, with the same sample size, the sampling reliability for this design will be somewhat less than the standard segment design, where the clusters are non-compact. This could be compensated, however, by sampling more enumeration areas with a smaller sample take within the enumeration areas. Another disadvantage of the modified segment approach is that the segmentation itself requires comparatively small segments to be delineated, which may not be practical in some countries. It can be very problematic in small areas where there are not enough natural boundaries such as roads, lanes, streams, etc. for the segmentation to be accurate or even adequate. For this reason, it is recommended that the segment size under this option be at least 20 households; and to compensate for the decrease in reliability with the compact segment, it should not be greater than 30 households. Boundary delineation is extremely important when forming segments, in terms of controlling sampling bias.
Table 4.6
Option 3: Modified Segment Design

Pros
• Probability sample
• No listing of households required
• Partially corrects for old sampling frame
• Self-weighting design

Cons
• Mapping, quick count, and segmentation necessary in every sample enumeration area
• Creation of small segments may not be practical
• Somewhat less reliable than Option 2 for same sample size
Shortcut Designs Not Recommended
In the mid-decade version of this Manual, considerable attention was devoted to the method of random walk, which is used in the Expanded Programme of Immunization (EPI). The chief objection to using the random walk method for the multiple indicator survey is that the household selection is not based on probability sampling methods, but rather on a procedure that effectively gives a quota sample.
Since the MICS have large sample sizes, the random walk procedure is inappropriate. It is sometimes argued that the small-scale EPI surveys, with their correspondingly small sample sizes, are dominated more by sampling variance than by bias, thus somewhat justifying the use of the random walk method. For the MICS, however, that same argument leads to the reverse conclusion that bias is of greater concern than sampling variance, due to the much greater sample sizes, and so stricter probability methodologies should be used at each stage of selection.
Shortcut procedures, such as random walk, that depart from probability designs are not recommended for the MICS and should be used only as a very last resort.
Deciding on Sample Size, Number of PSUs, and Cluster Sizes
The size of the sample is perhaps the most important parameter of the sample design, because it affects the precision, the cost, and the duration of the survey more than any other factor. Sample size must be considered in terms of both the available budget for the survey and its precision requirements. The latter must be further considered in terms of the requirements for national versus subnational estimates. Moreover, the overall sample size cannot be considered independently of the number of sample areas (PSUs) and the size of the ultimate clusters. So, while there are mathematical formulas to calculate the sample size, it will be necessary to include all of these factors in making your final decision.
Two general rules of thumb govern the choices on the number of PSUs and the cluster sizes: the more PSUs you select the better, as both geographic representation, or spread, and overall reliability will be improved; the smaller the cluster size, the more reliable the estimates will be.
Example:
In a national survey, 600 PSUs with cluster sizes of 10 households each will yield a more reliable survey result than 400 PSUs with clusters of 15 households each, even though they both contain the same overall sample size of 6,000 households. In addition, a cluster size of 10 is better than 15, because the survey reliability is improved with the smaller cluster size as well. So, in summary, it is better to strive for more rather than fewer PSUs and smaller rather than larger clusters, provided other factors are the same.
While, in general, the more PSUs the better, the number of PSUs in your survey will be affected to a great extent by cost considerations and whether subnational estimates are needed. Travel cost is a key factor. If the distances between sample PSUs are great and the same interviewing teams will be traveling from place to place (as opposed to using resident interviewers in each PSU), then decreasing the number of PSUs selected will significantly decrease overall survey costs. In contrast, if your survey requirements call for subnational estimates, there will be pressure to select more rather than fewer PSUs.
The choice of the cluster size for your survey is another parameter that has to be taken into account in determining sample size. Its effect can be assessed by the so-called sample design effect, or deff. The deff is a measure that compares the sampling variance of the actual stratified cluster sample (the MICS in the present case) to that of a simple random sample of the same overall sample size.
Example:
If the calculated value of the deff from the indicator survey were to be 2.0, this would tell you that the survey estimate has twice as much sampling variance as a simple random sample of the same size.
The costs of simple random sampling preclude it from being a feasible option for the MICS, which is why cluster sampling is used instead. The factors that contribute to sample design effects are stratification, the cluster size, and the cluster homogeneity, that is, the degree to which two persons (or households) in the cluster have the same characteristic. The increased likelihood of two
children living in close proximity both having received a given vaccination, compared to two children living at random locations in the population, is an example of cluster homogeneity.
Stratification generally decreases sampling variance, while the homogeneity measure and the cluster size increase it. Hence, an objective in your sample design is to choose your cluster size so as to balance homogeneity, for which a smaller size is better, with cost, for which a larger size is usually better.
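The interplay between cluster size and homogeneity can be illustrated with the standard approximation deff = 1 + (b - 1) x roh, where b is the cluster size and roh is the rate of homogeneity. This approximation and the roh value below are illustrative assumptions, not figures from the Manual:

```python
# Standard approximation for the design effect of a cluster sample,
# ignoring stratification. The roh value (0.05) is hypothetical.

def design_effect(cluster_size: int, roh: float) -> float:
    """Approximate deff = 1 + (b - 1) * roh for cluster size b."""
    return 1 + (cluster_size - 1) * roh

# Even a modest homogeneity inflates variance noticeably as clusters grow:
for b in (10, 15, 30):
    print(b, round(design_effect(b, 0.05), 2))  # 10 -> 1.45, 15 -> 1.7, 30 -> 2.45
```

This is why, for a fixed total sample size, smaller clusters (and hence more PSUs) yield more reliable estimates.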
To calculate the sample size for the indicator survey, the deff must be taken into account in the calculation formula. There are two problems, however. First, while the value of deff can be easily calculated after the survey, it is not often known prior to the survey, unless previous surveys have been conducted on the same variables. Second, the value of deff is different for every indicator and, in fact, every target group, because the cluster homogeneity varies by characteristic. It is not practical, of course, to conduct a survey with different sample sizes for each characteristic based on their variable deffs, even if we knew what they were.
The deffs will not generally be known for indicators prior to the survey, but it is expected that they will be quite small for many indicators, that is, those based on rare subclasses (for example, children 12 to 23 months). If there has been a previous household survey that collected similar data to that of the MICS, and used a very similar sample design, you may be able to use the deffs from this previous survey to assess the likely design effects for MICS. Few household surveys calculate deffs, but the DHS is one good source of such information.
In the tables that follow in the next section, we have taken a conservative approach and assumed the design effects to be somewhat higher than they may be in practice, because we want to ensure that the sample size will be big enough to measure all the indicators. Nevertheless, a rule of thumb in choosing the cluster size, and the recommended approach, is to make sure that the cluster size is as small as can be efficiently accommodated in the field, taking into account related considerations such as the number of PSUs and field costs (discussed above) and achieving interviewer workloads of a convenient size.
Calculating the Sample Size
The Manual provides two tables, Tables 4.9 and 4.10, with sample sizes already calculated on the basis of MICS requirements plus certain assumptions. You may use the table values, if they fit your situation, to get your sample size; otherwise, you or your sampling specialist may calculate the sample size directly, using the formulas given in Appendix Seven.
To calculate the sample size, using the appropriate mathematical formula, requires that several factors be specified and values for others be assumed or taken from previous or similar surveys. These are as follows:
( The precision, or margin of error, wanted
( The level of confidence desired
( The estimated (or known) proportion of the population in a given target group
( The predicted or anticipated coverage rate, or prevalence, for a given indicator
( The sample deff (discussed above)
( The average household size
( An adjustment for potential nonresponse
The calculation of sample size is complicated by the fact that some of these factors vary by indicator. We have already mentioned that deffs differ. Even the margin of error wanted is not likely to be the same for every indicator (and in practice it cannot be). This implies that different sample sizes would be needed for different indicators to achieve the necessary precision. Obviously, we must settle upon one sample size for the survey.
We will define a reliable estimate for the survey differently, depending upon whether it represents high or low coverage (which is the reason there are two tables of sample sizes). For the indicator estimates, it is recommended that the margin of error, or precision, be set at 5 percentage points for rates of coverage (e.g., immunizations) that are comparatively high, greater than 25 percent, and at 3 percentage points for coverage rates that are low, 25 percent or less. This is a departure from the first round of MICS, where the margin of error for all indicators was specified to be 5 percentage points.
The 5-percentage-point margin of error is a plausible tolerance if the coverage or prevalence is comparatively high, say, 25 percent or greater. When coverage is low, however, 5 percentage points is too loose a criterion to obtain an informative estimate. Suppose the coverage is only 15 percent; then the confidence interval with a 5-percentage-point margin of error would be 10 to 20 percent, a wide range that may not be a very useful estimate for many programme planning purposes. Not surprisingly, that kind of very lenient tolerance in the margin of error would not require a large sample.
Defining and Choosing the Key Indicator to Calculate Sample Size
The recommended strategy for calculating the sample size is to choose an important indicator that will yield the largest sample size. This will mean first choosing a target population that comprises a small proportion of the total population. This is generally a target population of a single-year age group (for example, children 12 to 23 months old), which in many countries comprises about 3 percent of the population. We recommend using 3 percent unless you have better estimates available for your country. Second, the particular indicator must be chosen for this same target population. We will label it the key indicator (but only for purposes of calculating the sample size).
In making your choice for the key indicator you will need to pick one with low coverage. Some low-coverage indicators should be excluded from consideration, however. This can be explained by reviewing the indicators in Table 4.7, where examples are given of indicators for which low coverage is undesirable and the World Summit for Children goals are focused on raising the rate (for example, the DPT immunization rate). The second set of indicators in Table 4.7 gives examples for which the situation is the opposite: low coverage is desirable and the goal is to lower it further (an example is stunting prevalence). For indicators where the desirably low coverages are so low that the World Summit for Children goals have already been met, it would not make sense to base your sample size on them, and they should be excluded when picking your key indicator.
Table 4.7
Indicator Coverage Rates, Prevalence, or Proportion
Low Coverage Is Undesirable
( Access to safe water, sanitary facilities
( School attendance
( Antenatal and childbirth care
( Vitamin A supplement coverage
( Breastfeeding rates
( Immunization coverage rates
Low Coverage Is Desirable
( Mortality rates
( Malnutrition prevalence
( Child labour
Table 4.8 contains suggestions for picking the target group and key indicator for purposes of calculating the sample size directly or finding the sample size in Tables 4.9 or 4.10. Note that Table 4.8 does not list the maternal mortality ratio as a candidate for the key indicator. This is because the sample sizes that would be necessary to measure this indicator are much too large (in the tens of thousands) and it is impractical to give them consideration. This does not mean that such indicators should not be measured in the survey, but rather that the sample size for the survey should not be based on them. The survey results for these indicators will have larger sampling errors and, hence, wider confidence intervals than the other indicators.
In making your choice, you must also consider the relative importance of the various indicators in your country. Aside from the list given in Table 4.8, other possibilities could be considered, but calculations for sample size were carried out for likely candidates and it was determined that unless the prevalence is extremely low, the required sample sizes will be less than those for the indicators in the Table.
Table 4.8
Checklist for Target Group and Indicator
To decide on the target group and indicator that you need to determine your sample size:
1. Pick the single-year age group that comprises the smallest percentage of the population, probably about 3 percent.
2. For that target group, pick the lowest from among the following coverage rates:
( DPT immunization level
( Measles immunization level
( Polio immunization level
( Tuberculosis immunization level
3. Do not pick from the desirably low coverage indicators one that is already acceptably low.
Example:
One indicator that might be thought to be a candidate for sample size calculation because of its combination of low coverage and small population percentage is the indicator for moderate and severe underweight prevalence of under-five females. However, the proportion that the target group represents of the population is likely to be comparatively high: about 7.5 percent (that is, 3 percent per single-year age group x 5 x 0.5, with the last two factors for under-fives and gender, respectively). But perhaps more importantly, low coverage is desirable for this indicator. If it is very low to begin with, it is probably not a health problem in your country and so it would not be necessary to focus on it in the survey.
Using the Sample Size Tables
The sample size tables that have been prepared will often fit the situation in your country, so that you can find the sample size from one of the two tables without having to calculate it using the formula in Appendix Seven. The difference between Table 4.9 and Table 4.10 is that the former is to be used when your key indicator (as determined from Table 4.8) is expected to have a moderate-to-high coverage (or prevalence) rate, and the latter pertains when your key indicator coverage rate is expected to be low. Use Table 4.9 if the key indicator you have chosen has moderate-to-high coverage, defined as greater than 25 percent prevalence. Use Table 4.10 if it is 25 percent or less.
In the tables, the level of confidence for the precision of the estimates is prespecified at 95 percent. Varying values are used for the average household size (from 4.5 to 6.5 persons) and for deff (from 1.5 to 2.0). The precision (margin of error) level is set at 5 percentage points for Table 4.9 and at 3 percentage points for Table 4.10. Both tables reflect a 10 percent upward adjustment in sample size to allow for potential nonresponse in the survey. It is crucial to note that both tables also assume that the target population for your key indicator comprises 3 percent of the total population. When it is a different value, you cannot use the tables to find the required sample size. More about what to do in this situation is given later in this section.
Table 4.9
Sample Size (Households) for Smallest Target Population, Moderate-to-High Coverage, and 5 Percent Margin of Error
Average Household Size
(number of persons)    deff = f = 1.5    deff = f = 1.75    deff = f = 2.0
4.5                         4,889             5,704              6,519
5.0                         4,400             5,133              5,867
5.5                         4,000             4,667              5,333
6.0                         3,667             4,278              4,889
6.5                         3,384             3,949              4,513
Use this table when your
( Target population is 3 percent
( Key indicator coverage rate is greater than 25 percent [note: the value of 50 percent was used to calculate the sample sizes since it gives the largest value]
( Margin of error is 5 percentage points
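The tabled values can be reproduced with the usual cluster-survey formula. The sketch below assumes the formula in Appendix Seven takes this standard form (the factor 4 corresponding to the 95 percent confidence level, and 1.1 to the 10 percent nonresponse adjustment); it matches the values in Tables 4.9 and 4.10:

```python
def sample_size_households(r, deff, e, p, hh):
    """Households needed: r = predicted coverage rate, deff = design effect,
    e = margin of error, p = target group's share of the population,
    hh = average household size. The factor 4 corresponds to 95%
    confidence (z ~ 2) and 1.1 is the 10% nonresponse adjustment."""
    return round(4 * deff * r * (1 - r) * 1.1 / (e ** 2 * p * hh))

# Table 4.9 cell (household size 5.5, deff 1.75): coverage 50%, 5-point margin
print(sample_size_households(0.50, 1.75, 0.05, 0.03, 5.5))  # 4667
# Table 4.10 cell (household size 5.5, deff 1.75): coverage 25%, 3-point margin
print(sample_size_households(0.25, 1.75, 0.03, 0.03, 5.5))  # 9722
```

The same function handles the cases the tables do not cover, for example a target population share other than 3 percent or a deff outside the 1.5-2.0 range.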
One of the sample sizes in Table 4.9 should fit your situation if your key indicator coverage is expected to be moderate to high. You should use the middle column to help you decide which size sample is appropriate for your situation if you do not have actual estimates of deff from previous surveys. As mentioned before, deff will vary for each indicator, so unless you have specific information about it you should assume it to be 1.75 for the key indicator you have chosen. If you do have an estimate of deff for the key indicator and it is higher than 2.0 or lower than 1.5, then you can calculate the sample size yourself using the formula in Appendix Seven. If you have a value of deff that you can use and it is between 1.5 and 2.0, interpolation in Table 4.9 (or 4.10) may be used to find your sample size.
An illustration of the use of Table 4.9 follows:
Example:
If your average household size is 5.5 persons and you have no information about deff, you would assume the deff to be 1.75 and the recommended sample size for your situation is thus found in Table 4.9 to be 4,667 households. The figures should not, however, be taken as exact but only as approximate sample sizes; remember that several assumptions were made in calculating the sample sizes. It would make sense to round the sample sizes up or down depending upon budget restraints. You might decide that 4,500 or 5,000 would be appropriate when considering between-PSU travel costs, cluster sizes, or interviewer workloads.
Table 4.10 below should be used if your estimates of rate coverage are low.
Table 4.10
Sample Size (Households) for Smallest Target Population, Low Coverage, and 3 Percent Margin of Error
Average Household Size
(number of persons)    deff = f = 1.5    deff = f = 1.75    deff = f = 2.0
4.5                        10,185            11,883             13,580
5.0                         9,167            10,694             12,222
5.5                         8,333             9,722             11,111
6.0                         7,639             8,912             10,185
6.5                         7,051             8,226              9,402
Use this table when your
( Target population is 3 percent (no change from Table 4.9)
( Key indicator coverage rate is 25 percent or less [note: the value of 25 percent was used to calculate the sample sizes since it gives the largest value]
( Margin of error is 3 percentage points
The sample sizes in Table 4.10 are much larger than those in Table 4.9. This is because the table applies when coverage is lower, in which case the margin of error must be more stringent (3 instead of 5 percentage points) in order for the resulting survey estimates to be meaningful. Therefore, it is very important that you carefully evaluate the currently available estimates of your key indicator candidates before deciding which table to use.
What happens to the sample size calculations if the proportion of children in a single-year age group in your country is not 3 percent but instead is closer to, say, 2 or 4 percent? In the case of the former, you can just multiply all of the numbers in either table by 3/2 to come up with the sample sizes. This, of course, is dramatically larger, since the sample sizes would be 50 percent bigger than those in the tables. If, on the other hand, your one-year age group is nearer to 4 percent of the population, your sample sizes can be calculated by multiplying the figures in Table 4.9 by 3/4, or, in other words, reducing them by 25 percent.
Illustrating Sample Size with Cluster Size and Number of PSUs
We conclude this section with several examples using different scenarios to illustrate the interrelationship of sample size, number of PSUs, and cluster size.
Example 1:
Target group: Children 12 to 23 months old
Percent of population: 2.9 percent
Key indicator: DPT immunization coverage
Prevalence: 55 percent
Deff: No information
Average household size: 6
Under this scenario, use Table 4.9 because coverage of the key indicator is comparatively high and also near 50 percent. The target population, comprising 2.9 percent, is also very close to the 3 percent figure that Table 4.9 is based upon. With no information on the design effect, use the middle-column value of 1.75, and the sample size for an average household size of 6.0 persons is then found to be 4,278 households. [Note: If you had preferred to use the (long) formula in Appendix Seven, it would give a more exact sample size of 4,381, using the precise prevalence of 0.55 and the precise target population percentage of 2.9.]
Suppose that your country is relatively large in geographic size and, further, that there are a large number of provinces, say, 15. You and your sampling staff have concluded, therefore, that you need to have a minimum of 300 PSUs in order to achieve good geographic spread and a sufficient representation in each province. Moreover, you have decided that the budget for the survey would support that number of PSUs. The cluster size may then be calculated as 4,278 divided by 300, or about 15 households. With 15 as the cluster size, this would further imply that you ought to use Option 2. Recall that the recommended minimum segment size for Option 3 is 20 households, because those segments (clusters) are compact and it may not be practical to construct suitable boundaries for such small segments.
Instead of targeting 300 PSUs as your number, you and the survey and sampling staff may have decided, alternatively, that you wanted clusters of a certain size, say, 12, in order to meet operational requirements such as interviewer workload distribution. In this case, you would divide 4,278 by 12 to give you the number of PSUs: about 356. You would then review this number, in terms of cost and other considerations, and either accept it or adjust your cluster size. You might conclude that 325 is the maximum number of PSUs you can field, in which case you would have to adjust the cluster size to 13.
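The arithmetic in this scenario can be sketched as follows. The helper names are hypothetical, and the Manual leaves the final rounding of cluster sizes to judgment (workloads, travel costs):

```python
import math

def implied_cluster_size(sample_size: int, n_psus: int) -> float:
    """Average cluster size implied by a fixed number of PSUs."""
    return sample_size / n_psus

def implied_n_psus(sample_size: int, cluster_size: int) -> int:
    """Number of PSUs needed for a given cluster size (rounded up)."""
    return math.ceil(sample_size / cluster_size)

n = 4278  # households, from Table 4.9 (household size 6.0, deff 1.75)
print(round(implied_cluster_size(n, 300), 1))  # 14.3 -> clusters of about 15
print(implied_n_psus(n, 12))                   # 357  -> about 356-357 PSUs
print(round(implied_cluster_size(n, 325), 1))  # 13.2 -> adjust cluster size to 13
```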
Example 2:
Target group: Children 12 to 23 months old
Percent of population: 3.0 percent
Key indicator: Polio immunization coverage
Prevalence: 23.5 percent
Deff: No information
Average household size: 5.5
Under this scenario, use Table 4.10 because coverage of the key indicator is less than 25 percent. Again using the middle column, for a design effect value of 1.75, the sample size for an average household size of 5.5 persons is found to be 9,722 households. (With the long formula, the more exact sample size calculation is 9,322.)
Because of cost considerations and field workloads, suppose the survey team decides it wants cluster sizes of 30 households if possible. Here, dividing 9,722 by 30 gives 324 PSUs, and you may decide that this is an acceptable number for the fieldwork. If, on the other hand, you decided that you would like to have about 400 PSUs for geographic spread, you would divide 9,722 by 400, which gives 24 as your cluster size. Recall that the smaller the cluster size the more reliable the indicator estimates will be (for all indicators, not just the key indicator). You may decide, therefore, to use the 400-PSU design with its average cluster size of 24 households. Under either case (324 PSUs of size 30, or 400 PSUs of size 24), both Option 2 and Option 3 could be considered for your design, and this decision would be made by taking into account the cost of household listing (Option 2) compared to the segmentation operation (Option 3) and other factors, such as travel costs.
Example 3:
Target group: Children 0 to 11 months old
Percent of population: 3.0 percent
Key indicator: Birth weight under 2.5 kg
Prevalence: 15 percent
Deff: 1.2 (from a previous survey)
Average household size: 6
Under this scenario, you would have to use the long formula in Appendix Seven for a low coverage indicator, since, among other reasons, Table 4.10 does not provide the sample size when the design effect is under 1.5. The formula gives 4,166 households.
Suppose that the survey staff has concluded that the survey can only handle 200 PSUs because of cost considerations, even though this Manual urges 250 to 350 PSUs. In this case, you would figure the cluster size by dividing 4,166 by 200, to give 21 households as the cluster size. (As an alternative, you could use 20 households and just increase the number of PSUs by 8). In this case, either Option 2 or 3 would provide a suitable design.
Table 4.11
Summary Checklist for Sample Size and Design
( Determine target group that is smallest percentage of population
( Determine coverage rate for same target group
( If lowest coverage rate is: greater than 25%, choose sample size from Table 4.9; 25% or less, use Table 4.10; ELSE, calculate sample size with formula given in Appendix Seven
( Decide on cluster size, usually in a range of 10 to 40 households
( Divide sample size by cluster size to get number of PSUs (sample areas)
( Review your choices of n, cluster size, and number of PSUs, IN ORDER TO
( Choose among Options 1, 2, or 3 for sample design
Sample Size for Subnational Estimates, Changes, and Subgroup Analyses
Thus far we have been concerned with sample sizes necessary to generate national estimates of indicators. Many countries, however, will also want to use the survey to provide subnational figures for example, at the level of urban-rural, regions, states or provinces, or possibly districts. Such data would be used for identifying areas where greater efforts are needed, as well as for programming and evaluation purposes.
A crucial limiting factor in providing reliable subnational estimates is sample size. For each reporting domain (that is, subnational area such as a region or urban-rural), the overall sample size must be increased substantially for the results to be acceptably reliable. If equally reliable results are wanted for each domain, it is common practice to increase the national sample size, n, by a factor close to the number of domains, thus selecting n cases in each domain. So, if equally reliable data are wanted for 10 regions of the country, the sample size that had been calculated for the national estimates on the basis of, say, Tables 4.9 or 4.10, would have to be multiplied by a factor of about 10 in order to obtain the regional estimates. This, of course, is clearly impractical for most countries and we do not recommend it.
Compromises will have to be considered, especially if the number of domains is large. A plausible alternative is to restrict the separate reporting domains, such as provinces, to only those that exceed a certain population size. The remaining sub-areas could be combined into regional groupings. Another alternative is to allow the precision levels for the domain estimates to be less strict than that for the national estimate; for example, if the margin of error for the national estimate is set at 5 percentage points (as per Table 4.9), the separate reporting domains might have their margins of error set at 7.5 percentage points. These two alternatives could also be used in combination.
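The second compromise, relaxing the domain-level margin of error, can be quantified with the usual inverse-square relationship between sample size and margin of error. The sketch below uses an illustrative national sample size:

```python
# Sample size varies inversely with the square of the margin of error, so
# relaxing a domain's margin from 5 to 7.5 percentage points cuts the
# per-domain requirement to (5/7.5)**2, about 44%, of the national-precision size.

def domain_sample_size(national_n: int, e_national: float, e_domain: float) -> int:
    """Households per domain for a relaxed domain-level margin of error."""
    return round(national_n * (e_national / e_domain) ** 2)

n_national = 4667  # illustrative value from Table 4.9
per_domain = domain_sample_size(n_national, 0.05, 0.075)
print(per_domain)        # 2074 households per domain
print(10 * per_domain)   # 20740 in total for 10 domains, versus 46,670
```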
Some of the goals are expressed as expected reductions, such as decreasing the prevalence of malnutrition by 20 percent in a five-year period. This type of assessment would require two surveys, one at the beginning and one at the end of the period. The size of the sample necessary to measure the change between two time periods is highly dependent on the magnitude of the change, as well as the magnitude of the two estimates at each point. It is a somewhat complicated matter and impractical to provide short, general guidelines for estimating changes. It is recommended that you seek the help of the National Statistics Office or specialized sampling assistance if your plans include change measurement.
Regarding subgroup analyses, such as indicators by gender or socioeconomic group, the indicator estimates will be less precise than those for the whole sample.
The example below shows how the margins of error increase for ever-smaller subgroups.
Example:
Based on the full (national) sample giving a margin of error (precision) of plus or minus 5 percentage points for a 50-percent coverage rate, the margin of error would be approximately plus or minus:
( 6.3 percentage points for gender-specific indicators, given 50-percent boys and 50-percent girls in the sample
( 8.6 percentage points for a subgroup comprising 20 percent of the overall sample
Thus, reasonably precise results can be obtained for gender-specific indicators as well as for other subgroups making up one fifth or more of the whole sample.
Preparing Estimates and Sampling Errors
In this section we discuss the weighting alternatives for preparing the estimates, plus the need to calculate sampling errors.
Two types of weighting, if appropriate, may be applied in sequence in producing the estimates of the indicators. Unless the sample households have been selected with uniform overall probabilities (that is, a self-weighting design), all sample data should be weighted using the inverse of the overall selection probabilities (the so-called design weights). The design weights should be adjusted to account for nonresponse, however, even if the sample is self-weighting. This might be done in a variety of ways, including weighting up the respondents in each PSU (or cluster) to represent the nonrespondents in that PSU. The main advantage of this approach, unlike the post-stratification adjustments discussed in Appendix Seven, is that it does not require external data. These two steps, applying design weights and nonresponse adjustments, may be all the weighting that is necessary for your survey.
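A minimal sketch of these two weighting steps, with hypothetical numbers:

```python
# Step 1: design weight = inverse of the overall selection probability.
# Step 2: inflate weights within each PSU so respondents represent
# the nonrespondents in that PSU. All figures below are hypothetical.

def design_weight(selection_prob: float) -> float:
    """Design weight for a household selected with the given probability."""
    return 1.0 / selection_prob

def nonresponse_adjusted_weight(design_w: float, selected: int, responding: int) -> float:
    """Adjust a PSU's design weight for nonresponse within that PSU."""
    return design_w * selected / responding

# A self-weighting design with overall probability 1/2000, and a PSU in
# which 18 of the 20 selected households responded:
w = design_weight(1 / 2000)                       # 2000.0
w_adj = nonresponse_adjusted_weight(w, 20, 18)
print(round(w_adj, 1))                            # 2222.2
```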
However, further weighting may be undertaken by adjustment of the design weights to make the weighted sample distribution for some key variables, such as urban-rural or region, conform to an external population distribution. This type of post-stratification weighting should be considered when there have been significant departures from the design at the implementation stage, when approximate procedures have been used because of deficiencies in the sampling frame, or when the sample departs from strict probability criteria (see Appendix Seven for further details).
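A minimal sketch of a one-variable post-stratification adjustment, with hypothetical urban-rural shares (see Appendix Seven for the procedure the Manual actually describes):

```python
# Scale each stratum's weights by the ratio of the external population
# share to the weighted sample share. Shares below are hypothetical.

def poststrat_factors(weighted_shares: dict, external_shares: dict) -> dict:
    """Multiplicative adjustment factor per stratum."""
    return {k: external_shares[k] / weighted_shares[k] for k in weighted_shares}

weighted = {"urban": 0.40, "rural": 0.60}   # from the weighted sample
external = {"urban": 0.35, "rural": 0.65}   # census count or projection
factors = poststrat_factors(weighted, external)
print({k: round(v, 3) for k, v in factors.items()})  # {'urban': 0.875, 'rural': 1.083}
```

Each household's adjusted weight is then its design (and nonresponse-adjusted) weight multiplied by the factor for its stratum.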
Table 4.12
Getting Help
This chapter of the Manual, though fairly detailed, is not intended to make expert sampling statisticians of its readers. Many of the aspects of sample design will likely require assistance from a specialist, either from within the Government National Statistics Office (NSO) or from outside. These may include calculation of the sample size, construction of frame(s), evaluation of the design Options 1 through 3, applying the pps sampling scheme, computation of the weights, and preparation of the sampling error estimates. In any case it is strongly recommended that the NSO be consulted on the design.
Need to Calculate Sampling Errors
As part of the routine preparation of the survey results, sampling errors and associated quantities, such as deffs, should be estimated for the main indicators. This is essential in order to evaluate the reliability of the indicator estimates. Unless the sampling errors are estimated, the confidence intervals incorporating the margin of error around the survey estimates cannot be constructed, and interpretation of the estimates is severely hampered.
Calculation of sampling errors, or standard errors, can be (though are not necessarily) a fairly complicated part of the survey operation. Standard errors should be calculated in a way that takes account of the complex sample design (clustering, stratification, and weighting). The inappropriate application of simple random sampling formulas will, as a rule, seriously underestimate the standard errors.
Since there will undoubtedly be a variety of sample designs used, including some based on existing samples, it is not practical to provide a general scheme for estimating the sampling errors. To facilitate the process, however, no matter what design is used in your country, the data records need to contain PSU identifiers and, if strata are used, the strata need to be identified for all PSUs.
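As one illustration of a cluster-aware method, the sketch below applies a delete-one-PSU (ultimate-cluster) jackknife to a weighted proportion for a single stratum. The data and the single-stratum simplification are illustrative assumptions; the software packages mentioned in this section handle the general stratified, weighted case:

```python
import math

def weighted_prop(psu_data):
    """psu_data: list of (weighted count with trait, weighted count) per PSU."""
    num = sum(y for y, w in psu_data)
    den = sum(w for y, w in psu_data)
    return num / den

def jackknife_se(psu_data):
    """Delete-one-PSU jackknife standard error of the weighted proportion."""
    k = len(psu_data)
    full = weighted_prop(psu_data)
    reps = [weighted_prop(psu_data[:i] + psu_data[i + 1:]) for i in range(k)]
    return math.sqrt((k - 1) / k * sum((r - full) ** 2 for r in reps))

# Four hypothetical PSUs: (weighted count immunized, weighted count of children)
data = [(60.0, 100.0), (45.0, 100.0), (70.0, 100.0), (55.0, 100.0)]
print(round(weighted_prop(data), 3))  # 0.575
print(round(jackknife_se(data), 3))   # 0.052
```

Because the replicates are formed by dropping whole PSUs, the between-cluster variation (and hence the design effect) is reflected in the standard error, which a simple random sampling formula would miss.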
There are a number of software packages that have been developed and can be easily adapted for variance estimation. They include the CLUSTERS computer program, originally developed for the World Fertility Survey and available from the University of Essex; CENVAR, a software product available from the U.S. Bureau of the Census without charge; and WESVAR, a program produced by WESTAT for use with SPSS. Some of the packages are free and can even be downloaded from the Internet, while others are commercially sold.
A household is defined in the multiple indicator survey as a group of persons who live and eat together.
The multiple indicator survey has different target populations depending upon the indicator. Examples include children in 4-month age groupings for the breastfeeding indicator, children 0 to 11 months old, 12 to 23 months, children under 5 years old, children under 5 with diarrhoea, women 15 to 49 years old, and total population.
The term margin of error may be called level of precision in other publications. It is used to set the confidence interval of the survey estimate. A margin of error of 5 percentage points thus gives a confidence interval plus or minus that amount around the point estimate. If the estimate is 50 percent with a 5 percent margin of error, the confidence interval is 45 to 55 percent.
The sampling matters are described in Demographic and Health Surveys: Sampling Manual, Basic Documentation - 8, Macro International Inc., Calverton, Maryland 1987.
This is probability proportionate to size (pps) and it refers to the technique of selecting sample areas proportional to their population sizes; thus an area containing 600 persons would be twice as likely to be selected as one containing 300 persons.
See The Arab Maternal and Child Health Survey, Basic Documentation 5: Sampling Manual, League of Arab States, Cairo, 1990.
A non-compact cluster is one in which the households selected for the sample are spread systematically throughout the entire sample area. A compact cluster is one in which each sample household in a given segment is contiguous to its next-door neighbor. Non-compact clusters give more reliable results than compact clusters, because of their smaller design effects.
There is an alternative procedure when the population is thought to have changed significantly, so that the average segment size might be too variable for efficient field assignments. The segment size may be fixed rather than the fraction of households to select, in which case a different sampling interval would have to be calculated and applied in each sample segment. Each segment would then have a different weight and this would have to be accounted for in the preparation of the indicator estimates.
See a complete description of the modified segment (or cluster) design in Turner, A., R. Magnani, and M. Shuaib, A Not Quite as Quick but Much Cleaner Alternative to the Expanded Programme on Immunization (EPI) Cluster Survey Design, International Journal of Epidemiology, 1996, Vol. 25, No.1.
A type of probability sampling in which n sample units are selected with equal probability from a population of N units, usually without replacement and by using a table of random numbers.
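A short sketch of this equal-probability selection, with `random.sample` standing in for the table of random numbers (the population of 100 units is illustrative):

```python
import random

def simple_random_sample(units, n):
    """Select n units with equal probability, without replacement.
    random.sample plays the role of a table of random numbers."""
    return random.sample(units, n)

population = list(range(1, 101))  # N = 100 units
sample = simple_random_sample(population, 10)
```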
The mathematical expression for deff is a function of the product of the cluster homogeneity and the cluster size. Even if the cluster size is large in terms of total households, it will be small in terms of the particular target population, and so the deff is likely to be small also.
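A common textbook form of this expression (not taken from this Handbook) is deff = 1 + (b - 1)rho, where b is the cluster take for the target population and rho is the intraclass correlation measuring cluster homogeneity. The illustrative values below show why a small target-population take keeps the deff small:

```python
def design_effect(cluster_take, rho):
    """Approximate design effect: 1 + (b - 1) * rho, where b is the
    average cluster take and rho the intraclass correlation."""
    return 1 + (cluster_take - 1) * rho

# A cluster of 25 households (rho assumed 0.1 for illustration)...
print(design_effect(25, 0.1))  # 3.4
# ...may yield only about 5 children in the target age group:
print(design_effect(5, 0.1))   # 1.4
```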
In making your choice of the lowest-percentage population groups, it is strongly recommended that you exclude from consideration the four-month age groupings of children that form the basis for the breastfeeding indicators, because the necessary sample sizes would likely be impractically large.
Regarding sample size for the maternal mortality ratio, a 1997 guide by WHO and UNICEF, The Sisterhood Method for Estimating Maternal Mortality, recommends that if the MMR is 300 (per 100,000 live births), it can be estimated with a sample size of about 4000 respondents with a margin of error of about 60, utilizing the indirect sisterhood method.
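For indicators expressed as proportions, sample-size reasoning of the kind discussed in these notes typically rests on the standard formula n = deff * z^2 * p(1 - p) / e^2; this is a hedged sketch of that textbook formula, not the Handbook's own calculation, and real applications add adjustments for response rates and the share of eligible respondents per household:

```python
import math

def sample_size(p, margin, deff=2.0, z=1.96):
    """Minimum sample size for estimating a proportion p within a given
    margin of error at 95 percent confidence, inflated by the design
    effect (deff). All parameter values here are illustrative."""
    return math.ceil(deff * z**2 * p * (1 - p) / margin**2)

# Worst-case p = 0.5, 5 percentage point margin, deff of 2:
print(sample_size(0.5, 0.05))  # 769
```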
Vijay Verma suggests, instead, increasing the national-level sample size by the factor D^0.65, where D is the number of domains. The reliability of each domain's estimate is somewhat less than the national estimate under this approach. See A Critical Review of MICS Sampling Methodology, report by Verma to UNICEF, April 1995.
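The effect of Verma's suggested factor can be shown with illustrative numbers (a hypothetical national sample of 4,000 spread over 5 domains):

```python
def domain_inflation(national_n, num_domains):
    """Inflate the national-level sample size by D**0.65 for D domains,
    rather than by the full factor D (Verma's suggestion as paraphrased
    in the note above)."""
    return national_n * num_domains ** 0.65

# 5 domains: factor of about 2.85 rather than 5.
print(round(domain_inflation(4000, 5)))
```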
See the unpublished note to UNICEF, Some Proposed Modifications for the WHO Simplified Cluster Sampling Method for Estimating Immunization Coverage, by Graham Kalton, September 1988, page 10.
The discussion on post-stratification weighting is paraphrased from the report, cited in footnote 14, to UNICEF by Vijay Verma, to whom we are indebted for his careful review and commentary on the original Handbook.
A comprehensive review of these programs has been published: Sampling Error Software for Personal Computers, by Jim Lepkowski and Judy Bowles of the University of Michigan. The article appears in The Survey Statistician, No. 35, December 1996, pp. 10-17 (see website: www.fas.harvard.edu/~stats/survey-soft/iass.html).
Table 4.1
Probability Sampling
( Every person in the target population has a chance of being selected
( The selection chance is non-zero
( The chance is calculable mathematically
( Probability techniques are used at every stage of selection
<=34567hijklRV * 5
r
Zd56CJOJQJ5CJOJQJ5CJOJQJj0JH*U60JCJCJ
56CJOJQJ j4 jU5CJj5CJUOJQJ0JC<=>4iTUV $dhx$dh$dhx$$$dPxx$
dP$
$d$d$$d d $d $d <=>4iTUV ?VW@
u
1X9v{ #$%%%#&''G((((+b+++E,,,/r2777&7A7B7G7l77778%8b888M9:{<>>;> ?N@YE4G~HI
X ?VW@
u
qqq#$
&Fdx
$hXw?p` $d
"Xw?p` #$
&Fdx
$hXw?p` $$$
&Fdx
$hXw?p` $dhx
1X9v{ #$8$0d x
V`08p @xHP X !(#$ $1$dhx$dhx
$dhx$&#$0$iK$d%d&d'd/s.s#$z%qz9:ixywLF N !$!"&########$$
$$$$%%#&''''(()I*c..r277A7G7H7S7T7U70J@0JCJ56:6CJCJ5:CJ
jCJU
j.CJ( jUjUhmHnH6j0JH*UH$%%%#&''G((((+b+++E,,#$
&Fdx
$hXw?p` 1$dhx$$dhdhx$dhxdh$d
`0,,/r2777&7A7B7G7l77778%8b8dO$
$$
$dh$
$d$
$d$dhx$dx
"Xw?p` U7V7W7m7n7o7777777777788888$8%8&8'8(8a8b8c8d8e899>;>CCCTEWEF"FIII@JBJKK'L)L+N2NfNNOO4PKPVeVgVVVVVVVVVVVVWWWWWkWlWmWWWWWWWY0J:60J5CJj0JH*U jUOJQJ j4Wb888M9:{<>>;> ?N@YE4G~HIIBJdx
dhdhx
&`0Xw?p` $1$dhx$dhx$$$
dO$
IIBJK(L)LeNfNNO=SfVgVhVrVVVVVWjWWWX4XXYYZZZZ[^`
```;`<`A`d````aa#aqaaaaaacc&deeggghfi_kwkkk|noPpQpRp\pppppq7qsqqqq;rwrrrr
!
XBJK(L)LeNfNNO=SfVgVhVrVVVV$$$$$$d$
$dh$
$$d$
$1$dhxdhxxx$dhxdx
&Fdx
VVWjWWWX4XXYYZZZy$$$$:dO$
$
&F>dO$
$
&FVdO$
$x$$dO$
:
$dO$
:
$dP$
:
YYYYYY[L\-_R_
`;`A`B`M`N`O`P`Q`e`f`g```````````aaa#a$a/a0a1a2a3aOaYapaqarasataaaaaaaaaaaaavdwdeeeeeeeeeeegggj0JCJH*UCJj0JH*U5:CJ
jCJU
j.CJ( j4 jU0J6OJQJLZZ[^`
```;`<`A`d````aa#aqaadO$
d,$dO$
d$$
$dh$
$d$
$1$dhx$dhxaaaaacc&deegggh{vdh$d
`0$dhx !1$d
8$0d x
V`08p @xHP X !(#$$$$
dO$
d
gghiijj`kakbkmknkvkwkkkknTo,p/pQppppppppppppq q
q3q6q8q9q:qtquqvqqqqqqrrrrrrxryrzrrrrrrrbxByyyyyyyyyyzzz*zOJQJ j4 jU0JCJ5:CJ
jCJU
j.CJ(j0JH*U6:Phfi_kwkkk|noPpQpRp\ppppz$$$$$$
$dh$
$d$
$d
`08$0d x
V`08p @xHP X !(#$ $1$dhx$dhxppq7qsqqqq;rwrrrrrrauyy$dhx$$$"$$dO$
,`0Xw?p` d$$$$$dO$
drrrauyyyyyyyz)zVzoztzz {C{D{E{F{i{|)Ղ
)ASNԕ=Dg1Owx8>QRS]<Srsȧէ֧קاӪtuv
Yyyyyyyz)zVzoztzz {C{D{E{F{$dhx$$$
&FdO$
dddO$
ddO$
d$$
$dh$
$d$
*z+z,zWzXzYztzuzzzzzzzzzF{i{)Ղ
VZae*+,78@A`du}Ȏ̎AF_o$)ȑבԕ6CJj0JH*UCJCJ
jCJU
j.CJ(0JCJ65: jUOJQJ j4NF{i{|)Ղ
)t$d x
`0$d x
`08$0d x
V`08p @xHP X !(#$x
&`0Xw?p` $dhx $1$dhx$dh
)ASNԕ=~x$dhx
&`0Xw?p` $1$dhx $1$dhx$dhx$d x
`08$0d x
V`08p @xHP X !(#$Dg1Owx8>QRS
$d$
dhx$dhdhx
&`0Xw?p` $dhxdx
&Fdx
dx
EFGhij
234>GstvC$\c^PR=>TUrקا̨ͨΨu߭CJ jU5560JH*j0JH*U0JCJ6OJQJ j4TS]<Srsȧէ֧
&FdO$
ddO$
ddO$
,`0Xw?p` d$$
$d$
$dh$
֧קاӪtuvdO$
,`0Xw?p` d$$
$dh$
$d$
dhxdhx
$dhx$dhx$$$߭$@cîĮŮݮӱ+;L\]agmstx~øɸϸиعٹڹ@X˿$%+237=CJKOU[
_߭$@cîĮŮzt$dh$$$dPx$
dP$
,`0Xw?p` d0$
&`0Xw?p` 0dPx$
&`0Xw?p` %&ABf®ĮŮƮǮȮӮԮܮݮӱ#(L:Ʒַ+0;@LQи
Eֹٹ$ٻݻABͼͼ j4CJOJQJ6CJOJQJCJOJQJ56CJOJQJ5CJOJQJ0JOJQJ6CJCJ5:CJ
jCJU
j.CJ(CJ6 j4CŮݮӱ+ywr$$
d
$$dh
$d$dhx$$dh$dhx
`0$d
`06$0d
V`08p @xHP X !(#$+;L\]agmstx~\\7$$l4\$ Hl$$$7$$l4\$ Hl$$<$
øɸϸ\\\$<$$$7$$l4\$ Hl$ϸиعٹڹ$}}t]UU$dhx$$l4$ <$
>x$
,`0Xw?p` !BdP$
,`0Xw?p` x<$7$$l4\$ Hl$ @X˿$$<$$$
d
$dh
$d
`08$0d x
V`08p @xHP X !(#$$dhxBCNOWXýǽ˿$]apxb{|}~VWX~E
0ʽʳʳʳCJ(
j.CJ(6 j4CJOJQJOJQJ6CJOJQJCJOJQJ56CJOJQJ5CJOJQJ0JOJQJ6CJCJ5:CJ
jCJUBhd7$$l4\$ Hl$ <$
4$$7$$l4\$ Hl$
%+237=CJKOU[a``\ <$
4$$7$$l4\$ Hl$[ab{UD
;Rh01%>Tqr(?c~<=>?Jwx@MJK[\]huv
!
Zab{UD|}}t]UU$dhx$$l4$ <$
>x$
,`0Xw?p` !BdP$
,`0Xw?p` x<$7$$l4\$ Hl$ D
;Rh0!$1$d
`0$d
`08$0d x
V`08p @xHP X !(#$ $$dhx$d $1$dhx01%>Tqr(?8$0d x
V`08p @xHP X !(#$$d$d
`0>wxyq@A#"+~UxHɿԿԿԶԿԿԿOJQJ j4j0JH*U6@CJOJQJ j4CJOJQJjCJOJQJUCJOJQJ0JOJQJ5:CJ
jCJU
j.CJ(CJ(6CJCJ>?c~<=>?Jwx@dP$
dP$
$
$dh$
$d$
$dhx!1$d $d
`0Mi8$0d x
V`08p @xHP X !(#$1$dhx $1$dhx$dhx$$dh$dhx$$$dO$
dP$
JK[\]huv$
$d$
$dh$
$$d$
$$$dhx
&`0Xw?p` $dhx$d
`$d
`0HIJgh\u$'hm
tuMNu #Flm v w
bc+
B
jCJUH*5:0J6j0JH*U0JCJj0JCJH*USvp tMlv
b
*?@
*CJ66 Heading 3$@&
56CJ8@8 Heading 4$$@&5CJ:@: Heading 5$$dx@&>@> Heading 6$$d@&
H44 Heading 7$$@&5<@< Heading 8$@&5CJOJQJ: : Heading 9 $$@&
5>*CJ<A@<Default Paragraph Font4&@4Footnote Reference&O&CN5:CJ OJQJ,O,CH$5CJ$OJQJ O" IBT56TO2TTL=
8`0Xw?p` &OA&1H5:CJOJQJ$OQ$TT5CJOJQJOaTCJOJQJ"Oq"TFNCJOJQJ, @,Footer
!&)@&Page Number,,Header
!2B@2 Body Text$$d 6P6Body Text 2$$d BCBBody Text Indent$dNRNBody Text Indent 2$dhx62@2
Footnote TextCJDSDBody Text Indent 3
$LT@L
Block Text!!$d
`0CJ8Y"8Document Map"-D OJQJ#
x5?Jv^ads̢Hg
kDc
m~Y 3Fyyyyyyy{{{{~U7Yg*zBH
$,b8BJVZahpyF{)S֧Ů+ϸaD0?v Ir[v G3S333QQAZMZ#[/[[[[[____aemejjklssttt}~+7ߧǨӨBNx TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT6Bkqt~!T!!-T
?2$XLf(IL]$2${~09@H 0(
" ( (
(P
SL. `T`TT`T
B
S ?9 $@~%
%88 9&9y99>>>>??TAWAAAB"B"D-DOESEEEGG%H)HzHHIII*IIIJJRRHSLSSS{VV^^ h%h,j/j3k6k5l9luuyy{{
||}}:}>}'~+~~~<@[_Ɓ5@VZae`dȈ̈AE$)VZkvoxΡԡծ#'0;,0<@MQ#ٵݵ÷ǷǺ˺غܺRVJNvz>B>Bsww{?C",0x|-81<$'hm.2 8?HN~fm """"#"#"@@[[__aa
}
} pp116;OOvwvw̛0RRbchinoyzĲŲʲ˲ֳ׳ot()TU]^ !&'-.89>?EFPQVW\]UX~puQR ef^_9:__lmbc;<KKoqvwjo?D[[ 15FJ<@>?/058AAEHNPvx~bdmn % David5C:\TESSA\Final MICS to Printer Files\EngChapter4.DOCDavid1C:\TESSA\Revised MICS Nov 99\EditedChap4Feb00.docDavidIC:\TESSA\No-Hyphen Final MICS to Printer Files\No-Hyphen EngChapter4.DOCDavidIC:\TESSA\No-Hyphen Final MICS to Printer Files\No-Hyphen EngChapter4.DOCDavid>C:\WINDOWS\TEMP\AutoRecovery save of No-Hyphen EngChapter4.asdDavid>C:\WINDOWS\TEMP\AutoRecovery save of No-Hyphen EngChapter4.asdDavidIC:\TESSA\No-Hyphen Final MICS to Printer Files\No-Hyphen EngChapter4.DOCDavidIC:\TESSA\No-Hyphen Final MICS to Printer Files\No-Hyphen EngChapter4.DOCDavidIC:\TESSA\No-Hyphen Final MICS to Printer Files\No-Hyphen EngChapter4.DOCDavid=C:\TESSA\Final MICS to Printer Files\G_EngChapter4correc.DOC95(\@TFx|Dy2 e(B9H|Rp6 IEd>|*D|0QK|XQL4fb 056CJ(OJQJo(+0o(.0OJQJo(4hhOJQJo(056CJ(OJQJo(+0OJQJo(4hhOJQJo(0OJQJo(40OJQJo(40OJQJo(40o(.hhOJQJo(@4fb(B95xXQDyIEd>*DH0QKRp6@HP LaserJet 4LLPT1:HPPCL5MSHP LaserJet 4LHP LaserJet 4L@g,,@MSUDNHP LaserJet 4L?[a^
HP LaserJet 4L@g,,@MSUDNHP LaserJet 4L?[a^
h@FGST @J@X@GTimes New Roman5Symbol3&Arial7"UniversEMonotype SortsKWP IconicSymbolsAG"Albertus Medium5&Tahoma# hCC5Br#gY d Patricia H. DavidDavidOh+'0
8DP
\hpxssPatricia H. DaviddatratrNormal.dot David.d2viMicrosoft Word 8.0@@z!q@@r#՜.+,D՜.+,@hp
JSI and World Educationg
Title 6>
_PID_GUIDAN{DB16D501-12BA-11D3-9D4C-006097BDE043}
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~Root EntrymmaryInformation FbJ1Tablej+WordDocument77fSummaryInformation(DocumentSummaryInformation8CompObjZObjectPoolbJbJDuP<<<LL4LtLLL
FMicrosoft Word Document
MSWordDoc9q