Statistics in EXCEL part-1
In Descriptive statistics , we see the central tendency[Typical value] of the data , Central tendency are the
1) Mean(Average) = [ Population Mean (u) or Sample Mean (x)]
2) Median = Middle value in the data set, it's formula shows the location of the median not the value
3) Mode = Highest occurring value in the data set
If they all lies in the same point then that is wonderful(bell shaped),i mean it is symmetric( Normal distributed)
skewed graph = distribution of Mode is greater than the median and mean.
Variability = How spread apart the data values are.( Unit of difference from mean)
(spread about the mean)
They are calculated using
1) Range = Max(data) - Min(data)
2) Variance or sample variance ( it is in square form, n-1 for sample , n for population, ( x- mean)^2
3) Standard Deviation = [sqrt (variance)] it is used more since it have same unit as the data.
4) Standard error = how well a sample represent population. Formula = standard deviation/[sqrt(n)]
Now heading to a excel sheet where stock data of CISCO and GM is represented we have caluclated distributed analysis using Data analysis add-ins of Microsoft Excel.
if skewness is > +1 and < -1 Then it is highly skewed so use MEDIAN
example: income data ( which a rich man greaterly shift the average(typical) value of income, but that doesnot affect Median)
if skewness is in point ( < 1 and > -1 ) then use MEAN
Here, our exaple have low skewness ( in point) so we have to use Mean
So, Cisco does better than GM in typical Value
Variance, let's look at standard deviation for the info . As standard deviation of CISCO is high
More spread of data from mean for CISCO
skewness of data ? Not much skewness ie, fairly symmetric Datasets
OUTLIERS
To calculate outliers mathematically, atfirst calculate Q1 and Q3 , first quartile and third quartile
Inter quartile range aka IQR = Q3 - Q1 ( 50% of data)
OUTLIERS = 1.5 * IQR + Q3 or Q1 - 1.5 * IQR
Rule of Thumb = 95% of your data is within 2 standard deviation of the mean
and yes anything more than 2 standard deviation of mean is called outliers
so mean - 2s = -19% and mean + 2s = 30%
so it is typical than cisco stocks are changes from this range
Graphically we can highlight outliers using the above data ( mean-2s and mean+2s) and conditional formatting in excel
Statistical Functions PERCENTILE.EXC and PERCENTRANK.EXC
Select data and named it 'cisco' and F3 key will paste the entire data .
We can add many excel function in the data like , skew(), average() . those function can be extracted from the cell using =FORMULATEXT()
90ile =PERCENTILE.EXC(CSCO,.9) = 20.6%
Nine out of 10 month or below and 1 out of 10 month or above
So 21% return, one out of ten month does better than 20.6%
nine out of ten month does worst than 20.6%
Where our data points rank in the data set ???
at row F153 ( August 1998) it lose 14% it is bad month but where does that row rank ??
so =PERCENTRANK.EXC(CSCO,F153) = 0.045
ie 4.5% of the months are worse than august 1998
PERCENTILE returns value from data set ( the value is 20.6)
PERCENTRANK return a probability ( it is 4.5% probability that the month have worse than 14%)
EXCEL functions
=FORMULATEXT(cell )
=AVERAGE(array) //mean
=MEDIAN(Array)
=MODE(array)
=KURT(array) . //kurtosis
=VAR(array) //simple variance
=SKEW(array) //skewness
=STDEV(Array) //standard deviation
=LARGE(Array,2) // 2nd largest value in Array
=SMALL(Array,2) //2nd smallest value in Array
=RANK.EQ(cell,Array,0) //0 for descending 1 for ascending(same rank for same value)
=RANK.AVG(cell,Array,0) //rank is rank.5 if 2 value is same and rank is .33 for triple same
=GEOMEAN(array(selection))
Geometric Mean ( used for comparing 2 value with very different properties)
Nth root of product of N number
Note: here we calculate 1+ return
Cagr( compound anual growth rate) = Geometric mean -1
Comments
Post a Comment