# Population, sample and sampling

Statistics revolves around the **study of data sets**. In this lesson we look at the main differences between three of the most frequently used concepts in the field: **population, sample and sampling**. Along with a detailed description of each, you will find a comparison chart to help you understand them.

| | Population | Sample | Sampling |
|---|---|---|---|
| Definition | The collection of all the elements that share common characteristics; together they make up the universe. | A subset of the members of the population chosen to participate in the study. | The process used in statistical analysis to select sample items from a population. |
| It includes | Each and every one of the units in the group. | Only a handful of units in the population. | The process of going from the entire population to just a handful of units. |
| Characteristic | Parameter | Statistic | Statistic |
| Data collection | Complete enumeration or census | Sample survey | Simple random sampling or systematic sampling |
| Focuses on | Identifying the characteristics. | Making inferences about the population. | Selecting the elements of a population to be analyzed. |
| Examples | A teacher decides to analyze the performance of his students in the final exam of the subject. The population is the grades of all 300 students. | Following the same example, the sample would consist of taking the grades of 100 of those students and analyzing them (that is, a fraction of the population). | Researchers can use statistical methods to define a confidence interval around a sample mean. |

## What is sampling

Sampling is a **process used in statistical analysis** in which a predetermined number of observations are taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed, and can include **simple random sampling or systematic sampling**.

A sampling method is a **procedure for selecting sample items from a population**. Simple random sampling is a sampling method with the following properties:

- The population consists of N objects.
- The sample consists of n objects.
- All possible samples of n objects have the same probability of occurring.
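The three properties above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the unit labels 1 to 300 and the seed are all assumptions made for the example. The standard library's `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import math
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every size-n subset is equally likely."""
    rng = random.Random(seed)          # seed is only here to make the run repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population of N = 300 unit labels (illustrative only).
population = list(range(1, 301))
sample = simple_random_sample(population, 100, seed=0)

# Number of distinct samples of size n that could have been drawn: "N choose n".
n_possible = math.comb(len(population), len(sample))
print(len(sample), "units drawn out of", len(population))
```

Each of the `n_possible` subsets has probability `1 / n_possible` of being the one selected, which is what the third property states.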

An important benefit of **simple random sampling** is that it allows researchers to use statistical methods to analyze sample results. For example, given a simple random sample, researchers can use statistical methods to define a confidence interval around a sample mean. This kind of statistical inference is not appropriate when non-random sampling methods are used, because the probability of selecting each unit is then unknown.
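As a rough sketch of the confidence interval mentioned above, the following Python computes an approximate 95% interval around a sample mean. It assumes a normal approximation (reasonable for a sample of about 100; a t-based interval would be more usual for small samples) and ignores any finite population correction. The grades are simulated, and the name `mean_confidence_interval` is hypothetical.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)   # s / sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_error, mean + z * std_error

# Simulated data, not real measurements: 300 grades, of which 100 are sampled.
rng = random.Random(42)
grades = [rng.uniform(0, 100) for _ in range(300)]
sample = rng.sample(grades, 100)

low, high = mean_confidence_interval(sample)
print(f"approximate 95% CI for the mean grade: ({low:.1f}, {high:.1f})")
```

The interval is centred on the sample mean and narrows as the sample size grows, since the standard error shrinks with the square root of n.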

## What is a population

In statistics, the term “population” has a slightly different meaning from the one it has in ordinary speech. It need not refer only to people or other living creatures, such as the population of Great Britain or the dog population of London. Statisticians also talk about a **population of objects, events, procedures, or observations**, including things like the amount of lead in urine, doctor visits, or surgical operations. A population is, therefore, a set of creatures, things, cases, and so on.

Although a statistician must clearly define the population they are dealing with, they may not be able to list it exactly. For example, in ordinary usage, the population of England denotes the number of people within the boundaries of England, perhaps as listed in a census. A doctor, however, could embark on a study to try to answer the question “What is the average systolic blood pressure of Englishmen aged 40 to 59?” Who are the “English” mentioned here? Not all English people live in England, and their social and genetic backgrounds can vary. Similarly, a surgeon may study the effects of two alternative operations for gastric ulcer. But how old are the patients? What gender are they? How serious is their illness? Where do they live? And so on. The reader needs **accurate information on these matters to draw valid inferences** from the sample that was studied to the population under consideration.

Quantities such as means and standard deviations, when calculated for a whole population, are called **population parameters**. They are often denoted by Greek letters: the population mean is denoted by μ (mu) and the population standard deviation by σ (lowercase sigma).
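To make the distinction between a parameter and a statistic concrete, here is a small Python sketch. The data are simulated grades invented for the illustration, and the variable names (`mu`, `sigma`, `x_bar`, `s`) are simply conventional labels. It assumes the whole population can be listed, which, as noted above, is often not the case in practice.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 300 simulated exam grades (not real data).
population = [round(rng.uniform(40, 100), 1) for _ in range(300)]

# Population parameters: computed from every unit in the population.
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation (divisor N)

# Sample statistics: computed from a random subset only.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)        # sample mean
s = statistics.stdev(sample)            # sample standard deviation (divisor n - 1)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample values will usually be close to, but not identical to, the population values; that gap is what inferential methods try to quantify.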

## What is a sample

A population commonly contains too many individuals to study properly, so an investigation is often limited to one or more samples drawn from it. A well-chosen sample will contain most of the information about a particular population parameter, but the relationship between the sample and the population must be such that it allows **valid inferences** to be made about the population from the sample.

Consequently, the first important attribute of a sample is that each individual in the population from which it is drawn must have a non-zero probability of being included in it; a natural suggestion is that these probabilities should be equal. We would also like the selections to be made independently; in other words, the choice of one subject should not affect the chance of other subjects being chosen. To ensure this, we make the choice through a process in which only chance operates, such as tossing a coin or, more generally, using a table of random numbers. A sample chosen in this way is called a **random sample**. The word “random” does not describe the sample as such, but the way it is selected.

Taking a satisfactory sample sometimes presents more problems than statistically analyzing the observations made on it. A full discussion of the topic is beyond the scope of this article.

Before drawing a sample, the researcher must **define the population from which it will come**. Sometimes he or she can list its members completely before starting the analysis: for example, all livers studied at necropsy during the previous year, or all patients aged 20 to 44 years admitted to hospital with perforated peptic ulcer in the previous 20 months. In **retrospective studies** of this type, numbers can be assigned serially to each patient or specimen, and a table of random numbers can be read starting from any point.

Suppose we have a population of size 150 and we want to take a sample of size five. The random number table contains a set of computer-generated random digits arranged in groups of five. Pick any row and column, say the last five-digit column. Read only the first three digits of each group and move down the column, starting with the first row. This gives 265, 881, 722, and so on. Whenever a number falls between 001 and 150, we include it in our sample. In order, the sample will therefore contain the subjects numbered 24, 59, 107, 73, and 65. If necessary, we can continue down the next column to the left until the entire sample is chosen.

Using random numbers in this way is generally preferable to taking every alternate patient or every fifth specimen, or acting on some other regular schedule. The regularity of the plan may occasionally coincide by chance with some unforeseen regularity in the presentation of the study material, for example, through hospital appointments of patients from certain practices on certain days of the week, or specimens prepared in batches according to some schedule.
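The table-reading procedure for a population of 150 and a sample of five can be imitated in Python. This is only a sketch under stated assumptions: the five-digit groups are produced by a pseudo-random generator rather than read from a printed table, so the selected numbers will differ from those quoted in the text. Skipping repeated numbers is an added assumption, and the function name `sample_from_digit_groups` is hypothetical.

```python
import random

def sample_from_digit_groups(pop_size, n, seed=None):
    """Pick n distinct subject numbers in 1..pop_size by reading the first
    three digits of successive five-digit random groups."""
    if not (0 < n <= pop_size <= 999):
        raise ValueError("need 0 < n <= pop_size <= 999 for three-digit reading")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        group = f"{rng.randrange(100000):05d}"   # one five-digit group, e.g. "02417"
        number = int(group[:3])                  # read only the first three digits
        # Keep numbers from 001 to pop_size; skipping repeats is an assumption.
        if 1 <= number <= pop_size and number not in chosen:
            chosen.append(number)
    return chosen

subjects = sample_from_digit_groups(pop_size=150, n=5, seed=3)
print(subjects)
```

Most groups are discarded because their first three digits fall outside 001 to 150, which mirrors the experience of reading down a printed column.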

As susceptibility to disease generally varies with age, gender, occupation, family history, exposure to risk, inoculation status, country of residence or countries visited, and many other genetic or environmental factors, it is advisable to examine the samples when they are drawn to see if they are, on average, comparable in these respects. The **random selection process** is intended to achieve this, but it can sometimes lead to disparities. To avoid this possibility, the sampling can be **stratified**. This means that a frame is initially established and the patients or study objects in a random sample are assigned to the compartments of the frame. For example, the frame could have a primary division into males and females and then a secondary division of each of those categories into five age groups, resulting in a frame with ten compartments.
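A frame of this kind, with two sexes and five age groups, can be sketched as follows. Everything specific here is an assumption made for illustration: the ten-year age bands starting at 20, the synthetic list of people, the equal allocation of five units per compartment, and the names `stratified_sample` and `stratum_of`.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample in each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    result = {}
    for key in sorted(strata):
        members = strata[key]
        k = min(per_stratum, len(members))   # never ask for more than a stratum holds
        result[key] = rng.sample(members, k)
    return result

# Synthetic population: 3 people of each sex at every age from 20 to 69.
people = [(sex, age) for sex in ("F", "M") for age in range(20, 70) for _ in range(3)]

def stratum_of(person):
    sex, age = person
    return sex, (age - 20) // 10             # hypothetical ten-year age bands, 0..4

by_stratum = stratified_sample(people, stratum_of, per_stratum=5, seed=1)
print(len(by_stratum), "compartments sampled")
```

Equal allocation keeps the compartments comparable with one another; drawing from each compartment in proportion to its size would be the variant to use when the sample should mirror the population.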

It is important to note that the distributions of the categories in two samples formed in such a frame may well be comparable, but they will not reflect the distribution of these categories in the population from which the sample is drawn unless the compartments in the frame have been designed with that in mind. For example, equal numbers could be admitted to the male and female categories, but men and women are not equally numerous in the general population, and their relative proportions vary with age. Sampling within such a frame is known as **stratified random sampling**.

To sample from a long list, a compromise between strict theory and practicalities is the **systematic random sample**. In this case, we choose subjects from the list at a fixed interval, say every tenth subject, but we choose the starting point within the first interval at random.
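Choosing every tenth subject from a random starting point can be sketched in Python as below. The list of 100 numbered subjects and the function name `systematic_sample` are assumptions for the example.

```python
import random

def systematic_sample(items, interval, seed=None):
    """Take every `interval`-th item, starting at a random position
    inside the first interval."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    rng = random.Random(seed)
    start = rng.randrange(interval)   # random start among positions 0..interval-1
    return items[start::interval]

# Hypothetical list of 100 subjects numbered 1 to 100.
subjects = list(range(1, 101))
chosen = systematic_sample(subjects, interval=10, seed=1)
print(chosen)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined, which is why this method is a practical compromise rather than a simple random sample.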