Statistics in EXCEL part-1


In Descriptive statistics , we see the central tendency[Typical value] of the data , Central tendency are the
1) Mean(Average) = [ Population Mean (u) or Sample Mean (x)]
2) Median =  Middle value in the data set, it's formula shows the location of the median not the value
3) Mode = Highest occurring value in the data set

If they all lies in the same point then that is wonderful(bell shaped),i mean it is symmetric( Normal distributed)
skewed graph =  distribution of Mode is greater than the median and mean.

Variability = How spread apart the data values are.( Unit of difference from mean)
(spread about the mean)
 They are calculated using
1) Range = Max(data) - Min(data)
2) Variance or sample variance ( it is in square form, n-1 for sample , n for population, ( x- mean)^2
3) Standard Deviation = [sqrt (variance)]  it is used more since it have same unit as the data.
4) Standard error = how well a sample represent population. Formula = standard deviation/[sqrt(n)]

Now heading to a excel sheet where stock data of CISCO and GM is represented we have caluclated distributed analysis using Data analysis add-ins of Microsoft Excel.
At first lets go with Typical value, what we should chose, Mean or Median ?The answer depend upon the skewness of the data. There is measure of skewness in the table. 
if skewness is > +1 and < -1 Then it is highly skewed  so use MEDIAN
example: income data ( which a rich man greaterly shift the average(typical) value of income, but that doesnot affect Median)
if skewness is in point ( < 1 and > -1 ) then use MEAN
Here, our exaple have low skewness ( in point) so we have to use Mean
So, Cisco does better than GM in typical Value

Variance, let's look at standard deviation for the info . As standard deviation of CISCO is high 
More spread of data from mean for CISCO
skewness of data ? Not much skewness ie, fairly symmetric Datasets

OUTLIERS

To calculate outliers mathematically, atfirst calculate Q1 and Q3 , first quartile and third quartile 
Inter quartile range aka IQR = Q3 - Q1 ( 50% of data)
OUTLIERS = 1.5 * IQR + Q3   or   Q1 - 1.5 * IQR 

Rule of Thumb  = 95% of your data is within 2 standard deviation of the mean
and yes anything more than 2 standard deviation of mean is called outliers

so mean - 2s = -19%  and mean + 2s = 30%
so it is typical than cisco stocks are changes from this range
Graphically we can highlight outliers using the above data ( mean-2s and mean+2s) and conditional formatting in excel

Statistical Functions PERCENTILE.EXC and PERCENTRANK.EXC
Select data and named it 'cisco' and F3 key will paste the entire data . 
We can add many excel function in the data like , skew(), average() . those function can be extracted from the cell using =FORMULATEXT() 

90ile =PERCENTILE.EXC(CSCO,.9)  = 20.6%
Nine out of 10 month or below and 1 out of 10 month or above
So 21% return, one out of ten month does better than 20.6% 
                         nine out of ten month does worst than 20.6%
Where our data points rank in the data set ??? 
at row F153 ( August 1998) it lose 14% it is bad month but where does that row rank ??
so =PERCENTRANK.EXC(CSCO,F153) = 0.045 
ie 4.5% of the months are worse than august 1998

PERCENTILE returns value from data set ( the value is 20.6) 
PERCENTRANK return a probability ( it is 4.5% probability that the month have worse than 14%)

EXCEL functions 
=FORMULATEXT(cell )
=AVERAGE(array) //mean
=MEDIAN(Array)
=MODE(array)      
=KURT(array) .    //kurtosis
=VAR(array)       //simple variance
=SKEW(array)      //skewness
=STDEV(Array)     //standard deviation
=LARGE(Array,2)   // 2nd largest value in Array
=SMALL(Array,2)   //2nd smallest value in Array
=RANK.EQ(cell,Array,0) //0 for descending 1 for ascending(same rank for same value)
=RANK.AVG(cell,Array,0)  //rank is rank.5 if 2 value is same and rank is .33 for triple same
=GEOMEAN(array(selection))

Geometric Mean ( used for comparing 2 value with very different properties)
Nth root of product of N number 
Note: here we calculate 1+ return
Cagr( compound anual growth rate) = Geometric mean -1 



Comments

Popular Posts