STATISTICS TUTORIAL -1

Surendhar R
Mar 20, 2021

In part 1 of this statistics tutorial, you will learn about random variables and their types, central tendency, and the measures of central tendency.

1. RANDOM VARIABLE:

A random variable is a variable that maps each outcome of a random process to a number.

For example,

1) Let tossing a coin be the random experiment.

Let X be the random variable:

X = 1 if it is heads, and X = 2 if it is tails.

2) Let rolling 7 dice be the random experiment.

Let Y be the random variable:

Y = the sum of the upward faces after rolling the 7 dice.
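To make the idea concrete, here is a minimal Python sketch that treats a random variable as a plain function from an outcome to a number. The function names `coin_to_x` and `dice_to_y` are hypothetical, chosen only for illustration.

```python
import random
from typing import List


def coin_to_x(outcome: str) -> int:
    """Random variable X: maps a coin-toss outcome to a number."""
    return 1 if outcome == "heads" else 2


def dice_to_y(faces: List[int]) -> int:
    """Random variable Y: maps the faces of the 7 dice to their sum."""
    return sum(faces)


# Run each random experiment once
toss = random.choice(["heads", "tails"])
rolls = [random.randint(1, 6) for _ in range(7)]

x = coin_to_x(toss)
y = dice_to_y(rolls)
print(toss, "->", x)
print(rolls, "->", y)
```

Whatever the rolls are, Y can only take values from 7 (all ones) to 42 (all sixes).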

Why do we need random variables?

1. They let us do math on the outcomes of a random experiment.

2. They let us ask probability questions about those outcomes, for example: what is the probability that the sum of the upward faces after rolling 7 dice is even?
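The dice question can be answered by brute force: list all 6^7 = 279,936 equally likely outcomes and count those whose sum is even. This is a minimal sketch assuming fair, independent six-sided dice.

```python
from fractions import Fraction
from itertools import product

# Every possible result of rolling 7 fair six-sided dice (6**7 outcomes)
outcomes = product(range(1, 7), repeat=7)

total = 0
even = 0
for faces in outcomes:
    total += 1
    if sum(faces) % 2 == 0:  # Y = sum of the upward faces
        even += 1

p_even = Fraction(even, total)
print(p_even)  # 1/2
```

The answer is exactly 1/2: whatever the first six dice show, exactly three of the six faces of the last die make the total even.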

2. TYPES OF RANDOM VARIABLES:

1. Discrete Random Variable

2. Continuous Random Variable

WHAT IS A DISCRETE RANDOM VARIABLE?

A discrete random variable takes distinct, separate values. Its set of possible values is finite or countably infinite.

For example,

1. Take the example of tossing a coin:

X = 1 if it is heads, and X = 2 if it is tails,

where X is the random variable of this random experiment.

2. The year in which a randomly chosen student was born.

Here Z is the random variable, and its value could be Z = 1986, Z = 1987, Z = 1990, and so on.

WHAT IS A CONTINUOUS RANDOM VARIABLE?

A continuous random variable can take any value within an interval. Its set of possible values is infinite and uncountable.

Example:

Let X and Y be random variables.

X = the mass of a randomly chosen animal in the zoo.

Y = The length of time it takes a truck driver to go from Chennai to Bangalore.
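A quick way to see the difference is to draw one value of each kind. This sketch assumes birth years between 1986 and 1990 and a Chennai-to-Bangalore travel time uniformly distributed between 5 and 8 hours; both ranges are made up purely for illustration.

```python
import random

# Discrete: a birth year can only be one of a few separate whole numbers
birth_year = random.randint(1986, 1990)  # hypothetical range, both ends included

# Continuous: the travel time can be any real number in the interval
travel_hours = random.uniform(5.0, 8.0)  # hypothetical range, in hours

print("Z (birth year):", birth_year)
print("Y (travel time in hours):", round(travel_hours, 3))
```

A computer stores floating-point numbers with finite precision, so `random.uniform()` only approximates a truly continuous variable.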

3. CENTRAL TENDENCY OR AVERAGE:

In statistics, a central tendency is a central or typical value of a probability distribution.

MEASURES OF CENTRAL TENDENCY:

We can measure or find the value of the central tendency using the following:

· Mean

· Median

· Mode

1. Mean:

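The mean is the sum of all the values divided by the total number of values.

Formula: $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the values and $n$ is the number of values.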

For example:

1) Suppose we want the average height of the students in a class.

There are 10 students in the class, and their heights (in cm) are

152.3, 165.3, 174.6, 180.3, 145.6, 157.6, 175.6, 166.5, 176.5, 174.2

The average height of the students in the class = (152.3 + 165.3 + 174.6 + 180.3 + 145.6 + 157.6 + 175.6 + 166.5 + 176.5 + 174.2)/10 = 1668.5/10 = 166.85 cm
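The formula can also be checked in plain Python without NumPy: sum the values and divide by how many there are.

```python
heights = [152.3, 165.3, 174.6, 180.3, 145.6, 157.6, 175.6, 166.5, 176.5, 174.2]

# Mean = (sum of all values) / (number of values)
total = sum(heights)
n = len(heights)
mean_height = total / n

print("Sum:", round(total, 2))         # Sum: 1668.5
print("Mean:", round(mean_height, 2))  # Mean: 166.85
```

Rounding only tidies the printout, since floating-point addition can leave a tiny error in the last digits.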

FINDING THE MEAN USING PYTHON:

1) First, import the NumPy module.

import numpy as np

2) Then find the mean using NumPy's mean() function.

# Using numpy mean() function

heights = [152.3, 165.3, 174.6, 180.3, 145.6, 157.6, 175.6, 166.5, 176.5, 174.2]

mean_height = np.mean(heights)

print("Mean height of the student in a class is : ",mean_height)

Output :

Mean height of the student in a class is :  166.85

2. MEDIAN

To find the median, arrange the values in ascending order and pick the middlemost value. That middle value is the median of the data.

FORMULA:

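i. When n is odd:

If n is odd, add 1 to n and divide by 2; the value at that position in the sorted data is the median: $\text{Median} = x_{\left(\frac{n+1}{2}\right)}$, where $x_{(k)}$ denotes the $k$-th smallest value.

ii. When n is even:

If n is even, there are two middle values, at positions $n/2$ and $n/2 + 1$. The median is their mean: $\text{Median} = \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2}$.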
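The two cases translate directly into code. This is a minimal sketch; the function name `median_by_hand` is hypothetical. Note that Python indexes from 0, so position k in the formula is index k - 1.

```python
def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    data = sorted(values)  # step 1: ascending order
    n = len(data)
    if n % 2 == 1:
        # n odd: position (n + 1) / 2, which is index n // 2
        return data[n // 2]
    # n even: mean of positions n/2 and n/2 + 1, which are indices n//2 - 1 and n//2
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median_by_hand([7, 1, 5]))     # 5
print(median_by_hand([7, 1, 5, 3]))  # 4.0
```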

For example:

1) Suppose we want the median DBMS mark of the students of the CSE A section in the CAT exam.

Let’s assume there are 40 students in the section.

Solution using Python:

1. First, import the NumPy module.

import numpy as np

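2. Then find the median using NumPy's median() function. Here the 40 marks are simulated with random.sample(range(10, 50), 40). Because random.sample() draws without replacement, the marks are exactly the 40 integers from 10 to 49 in random order, so the median is always (29 + 30)/2 = 29.5.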

import random

marks = random.sample(range(10, 50), 40)

median_mark = np.median(marks)

print("The median of the marks of 40 students is : ",median_mark)

Output :

The median of the marks of 40 students is :  29.5
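Real marks can repeat, which random.sample() never produces. A variant that draws with replacement using random.choices() might look like this. The 10 to 49 range is carried over from the example above, and the seed value 42 is arbitrary; it only makes the run repeatable. The sketch also cross-checks NumPy against the standard-library statistics.median().

```python
import random
import statistics

import numpy as np

random.seed(42)  # arbitrary seed so the run is repeatable

# Draw 40 marks between 10 and 49 with replacement, so marks may repeat
marks = random.choices(range(10, 50), k=40)

median_np = np.median(marks)
median_stats = statistics.median(marks)

print("NumPy median:", median_np)
print("statistics median:", median_stats)
```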

3. MODE:

The mode is the value with the highest frequency, that is, the value that is repeated the most times in the data.

For example:

1. Take the values 8, 2, 3, 8, 4, 3, 8, 9, 4, 7, 9, 0, 1.

Mode = 8 (it appears 3 times, more often than any other value).

2. Take the values 100, 200, 100, 650, 700, 200, 100, 550, 963.

Mode = 100 (it appears 3 times).
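Finding the mode by hand means counting how often each value occurs. A minimal sketch with collections.Counter from the standard library; it also shows statistics.multimode() (Python 3.8+), which returns every value tied for the highest frequency.

```python
import statistics
from collections import Counter

data1 = [8, 2, 3, 8, 4, 3, 8, 9, 4, 7, 9, 0, 1]

counts = Counter(data1)                 # value -> frequency
value, freq = counts.most_common(1)[0]  # the most frequent value and its count
print(value, freq)                      # 8 3

# When several values tie for the highest frequency, multimode() returns all of them
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```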

FINDING THE MODE USING PYTHON:

1. First, import the statistics module.

import statistics as stats

2. Then find the mode using the statistics module's mode() function.

data1 = [8,2,3,8,4,3,8,9,4,7,9,0,1]

mode_value = stats.mode(data1)

print("The mode value for data1 is : ",mode_value)

Output :

The mode value for data1 is :  8

data2 = [100,200,100,650,700,200,100,550,963]

mode_value = stats.mode(data2)

print("The mode value for data2 is : ",mode_value)

Output :

The mode value for data2 is :  100

Mean vs Median vs Mode When the Data Has an Outlier:

Suppose the data contains the values 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 100. The value 100 is an outlier. The mean, median, and mode of this data are:

Mean = 133/11 ≈ 12.09

Median = 3.0

Mode = 3

Most of the values are 3, so a typical value should be close to 3. The median and the mode both give 3, but the mean gives 12.09 because the single outlier, 100, pulls it upward. So when the data contains outliers, it is preferable to use the median or the mode, or to remove the outliers first and then use the mean.

Implementation using Python:

import numpy as np

import statistics as stats

data = [3,3,3,3,3,3,3,4,4,4,100]

mean = np.mean(data)

median = np.median(data)

mode = stats.mode(data)

print("The mean of the data is : ",mean)

print("The median of the data is : ",median)

print("The mode of the data is : ",mode)

Output :

The mean of the data is : 12.090909090909092

The median of the data is : 3.0

The mode of the data is : 3
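The text does not say how to decide what counts as an outlier. One common rule of thumb, which is an assumption here and not part of the original example, is the 1.5 × IQR rule: drop values more than 1.5 interquartile ranges below the first quartile or above the third. A sketch using statistics.quantiles() (Python 3.8+):

```python
import statistics

data = [3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 100]

# Quartiles with the statistics module's default ("exclusive") method
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only values inside the 1.5 x IQR fences
cleaned = [x for x in data if low <= x <= high]
print("Kept:", cleaned)
print("Mean before:", round(statistics.mean(data), 2))    # 12.09
print("Mean after:", round(statistics.mean(cleaned), 2))  # 3.3
```

With the outlier removed, the mean (3.3) lands close to the median and mode, as the discussion above suggests.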

CONCLUSION:

I hope I have explained the concepts clearly. Thank you for reading patiently.

Learn Well and Grow Well…
