STATISTICS TUTORIAL-2

Surendhar R
5 min read · Apr 5, 2021

In this second Statistics tutorial, you will learn about Population, Sample, Population Mean, Sample Mean, and the Measures of Dispersion: Range, Variance, and Standard Deviation.

POPULATION AND SAMPLE :

1. POPULATION

The population is the entire group that you want to draw a conclusion about, that is, every data point available. The population size is denoted by N.

For example,

  1. Total students in a CSE department.
  2. All the employees in the Sales department.

2. SAMPLE

A sample is a specific group, or subset, that is collected from the population. The sample size is denoted by n.

For example,

  1. 30 students (a subset of the total number of students) from the CSE department.
  2. 50 employees (a subset of the total number of employees) from the Sales department.

Note :

The sample size n is smaller than the population size N (n < N); if n were equal to N, you would be measuring the entire population.
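To make the idea of drawing a sample concrete, here is a minimal sketch in Python. The population of 120 student roll numbers is a made-up assumption for illustration, and the fixed seed is only there so the result is repeatable.

```python
import random

# Hypothetical population: roll numbers of 120 students in a department.
population = list(range(1, 121))
N = len(population)

# Draw a simple random sample of 30 distinct students from the population.
random.seed(42)  # fixed seed so the sketch gives the same sample every run
sample = random.sample(population, 30)
n = len(sample)

print("Population size N :", N)
print("Sample size n :", n)
```

Every value in the sample comes from the population, and n is smaller than N.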

3. POPULATION MEAN AND SAMPLE MEAN

Let me explain this concept with an example. Suppose you want to find the average height of all the men in America. There are more than 150 million men in America, so here the population is the total number of men in America. To find the answer, we sum the heights of all the men and divide by the total number of men. This is how the mean is calculated from the population, and it is practical only when the population size is small.

Here the population is very large, so it is not feasible to measure every height, add them up, and divide by the population size. Instead, we can take a sample (a subset of the population), say 1 million men, add up their heights, and divide by the sample size. This is how the mean is calculated from the sample, and it is the preferred method when the population is very large.

POPULATION MEAN :

The mean calculated from the population is called the population mean.

SAMPLE MEAN :

The mean calculated from the sample is called the sample mean.

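FORMULA :

Population mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$

Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$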
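To see how the population mean and the sample mean compare in practice, here is a minimal sketch. The heights are simulated, not real data: the population of 10,000 values, the centre of 175 cm, and the spread of 7 cm are all assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm (not real data).
population = [random.gauss(175, 7) for _ in range(10_000)]

# Sample: 500 heights drawn at random from the population.
sample = random.sample(population, 500)

population_mean = mean(population)  # sum of all values divided by N
sample_mean = mean(sample)          # sum of sampled values divided by n

print("Population mean :", population_mean)
print("Sample mean :", sample_mean)
```

The two values should be close to each other, which is why a sample mean is a practical stand-in when the whole population cannot be measured.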

4. MEASURES OF DISPERSION

Measures of dispersion describe the spread of data around a central value (mean, median, or mode). They tell us how much variability there is in the data.

TYPES OF MEASURES OF DISPERSION:

  1. Range
  2. Variance
  3. Standard Deviation

1. RANGE

Range tells you about the spread of a set of data. It is calculated by taking the difference between the highest and lowest values in the dataset. The range is high when the data are widely spread and low when the data are close together.

FORMULA:

$\text{Range} = x_{\max} - x_{\min}$

IMPLEMENTATION USING PYTHON:

# Range = highest value - lowest value
sample_1 = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
range_1 = max(sample_1) - min(sample_1)
print("Range of the sample 1 :", range_1)

sample2 = [1, 2, 5, 4, 8, 9, 12]
range_2 = max(sample2) - min(sample2)
print("Range of the sample 2 :", range_2)

sample3 = [-9, -1, -0, 2, 1, 3, 4, 19]
range_3 = max(sample3) - min(sample3)
print("Range of the sample 3 :", range_3)

OUTPUT :

Range of the sample 1 : 1.77

Range of the sample 2 : 11

Range of the sample 3 : 28

2. VARIANCE

Variance measures how far the data points lie from the mean (the spread of the data). It is calculated as the average squared deviation from the mean of the dataset. The variance is high when the data are widely spread and low when the data are close to the mean.

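FORMULA:

Population variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Here $\mu$ is the population mean and $\bar{x}$ is the sample mean.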

IMPLEMENTATION USING PYTHON:

  1. First import the variance() function from the statistics module.
  2. Then calculate the variance of each sample using variance(). Note that variance() computes the sample variance, which divides by n - 1.

from statistics import variance

# variance() returns the sample variance (divides by n - 1)
sample_1 = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
var = variance(sample_1)
print("Variance of the sample 1 is :", var)

sample2 = [1, 2, 5, 4, 8, 9, 12]
var2 = variance(sample2)
print("Variance of sample 2 is :", var2)

sample3 = [-9, -1, -0, 2, 1, 3, 4, 19]
var3 = variance(sample3)
print("Variance of sample 3 is :", var3)

OUTPUT :

Variance of the sample 1 is : 0.40924

Variance of sample 2 is : 15.80952380952381

Variance of sample 3 is : 61.125
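To show what variance() does internally, here is a minimal step-by-step sketch for sample 2. The variable names (x_bar, squared_deviations, and so on) are my own, chosen for illustration. It also computes the population variance, which divides by the number of values instead of n - 1, and compares both results with the statistics module.

```python
from statistics import mean, pvariance, variance

sample2 = [1, 2, 5, 4, 8, 9, 12]

# Step 1: find the mean of the data.
x_bar = mean(sample2)

# Step 2: square the deviation of each data point from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in sample2]

# Step 3: add up the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 4: divide by n - 1 for the sample variance,
# or by n when the data is treated as the whole population.
sample_var = sum_of_squares / (len(sample2) - 1)
population_var = sum_of_squares / len(sample2)

print("Sample variance (manual) :", sample_var)
print("Sample variance (variance) :", variance(sample2))
print("Population variance (manual) :", population_var)
print("Population variance (pvariance) :", pvariance(sample2))
```

The manual sample variance agrees with variance() up to floating-point rounding, and the population variance is always a little smaller because it divides by a larger number.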

3. STANDARD DEVIATION

Standard deviation is a measure of the dispersion, or variation, of a dataset relative to its mean. It is calculated by taking the square root of the variance, so it is expressed in the same units as the data. If the standard deviation is high, the data points are far from the mean. If the standard deviation is low, the data points are close to the mean.

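FORMULA:

Population standard deviation: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$

Sample standard deviation: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Here $\mu$ is the population mean and $\bar{x}$ is the sample mean.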

IMPLEMENTATION USING PYTHON:

  1. First import the stdev() function from the statistics module.
  2. Then calculate the standard deviation of each sample using stdev().

from statistics import stdev

# stdev() returns the sample standard deviation (square root of the sample variance)
sample_1 = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
stdeviation_1 = stdev(sample_1)
print("Standard Deviation of sample 1:", stdeviation_1)

sample2 = [1, 2, 5, 4, 8, 9, 12]
stdeviation_2 = stdev(sample2)
print("Standard Deviation of sample 2:", stdeviation_2)

sample3 = [-9, -1, -0, 2, 1, 3, 4, 19]
stdeviation_3 = stdev(sample3)
print("Standard Deviation of sample 3:", stdeviation_3)

OUTPUT :

Standard Deviation of sample 1: 0.639718688174732

Standard Deviation of sample 2: 3.9761191895520196

Standard Deviation of sample 3: 7.8182478855559445
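As a quick check on the relationship between the two measures, here is a minimal sketch that takes the square root of the variance of sample 3 and compares it with stdev(). The name sd_from_variance is my own, used only for illustration.

```python
import math
from statistics import stdev, variance

sample3 = [-9, -1, -0, 2, 1, 3, 4, 19]

# Standard deviation is the square root of the variance.
var3 = variance(sample3)
sd_from_variance = math.sqrt(var3)

print("Variance of sample 3 :", var3)
print("Square root of the variance :", sd_from_variance)
print("stdev() of sample 3 :", stdev(sample3))
```

Both approaches give the same standard deviation, up to floating-point rounding.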

CONCLUSION:

I hope that I have explained the concepts clearly. Thank you for your patient reading.

LEARN WELL, GROW WELL…
