Hi Bill. The book you mentioned is very costly. I am from India. Can you suggest any other good, affordable books, or resources where I can download it?

The “Big Data” issue is a pivotal question in Statistics right now. Will it change the basic way we perform statistical analysis? No. The challenge is that with all of these new sources of data (social media, YouTube, text messages, anything and everything), we are receiving a lot more data, at a much higher rate, and in many more forms than we are historically used to handling. Think of structured data, which fits in rows and columns, versus unstructured data, like a YouTube video. As I said, this will not change our statistical measures, but it will change how we collect, store, and think about data as a society. We need bigger places to store the data and faster ways to analyze it. I hope this messy explanation clears up a little bit about “Big Data”.

Will Big Data (http://en.wikipedia.org/wiki/Big_data) change the way Statistics is performed/used? If the sample size is very close to the target size (I might be mixing up the terms), will it change the nature of statistics, which historically was used to create meaning from small amounts of data? Using your metaphor: If we tagged almost every animal in the woods, would that change how we determine what made the footprints?

For learning stats with minimal assumed prior knowledge this is the best book I’ve found. Although this one partners SPSS there are versions of the same book for R and SAS statistical packages. Andy Field’s ‘Statistics Hell’ website is worth a look too.

Adam, great question. I’m not familiar with Big Data, but one analogy (taken from the discussion here: http://www.reddit.com/r/math/comments/zevvv/a_brief_introduction_to_probability_statistics/) is that it could be like not just measuring the footprint of the bear, but getting its DNA sequence. Whoa! What can we figure out then? [“This bear is going to have kids who have footprints like this…” or even “This bear is going to have high blood pressure” :)].

So yep, I think it’s tagging animals at a super-deep level, to make even crazier predictions.

The Big Data issue is more than just the amount of data, though having large amounts of data is part of the solution. It’s about not necessarily understanding a particular model, but just letting the data, the statistics, provide answers. So in translation systems like Google Translate the idea is to drop any linguistic model. Linguistic models are way too complicated, and they vary over time and context (read urbandictionary.com). There’s no way to keep the models updated enough to produce current, relevant translations. By looking at all the ways in which sets of words are used across masses of data, we get a statistical likelihood that a particular translation will be the best understood. It doesn’t matter if the translation defies all the grammatical rules of a language, and hence has no likelihood of getting into some formal linguistic model, for example when translating some very popular current idiom. All we need to know is what is actually being used, and so what will be understood.
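To make the "let the data answer" idea concrete, here is a toy sketch in Python. It is not how Google Translate actually works (real systems are far more elaborate). It only shows the principle of picking the rendering that people use most often. The phrase pairs and the function names `build_phrase_table` and `best_translation` are made up for illustration.

```python
from collections import Counter, defaultdict


def build_phrase_table(aligned_pairs):
    """Count how often each source phrase was rendered as each target phrase."""
    table = defaultdict(Counter)
    for source, target in aligned_pairs:
        table[source][target] += 1
    return table


def best_translation(table, source):
    """Return the most frequently observed rendering, or None if unseen."""
    if source not in table:
        return None
    return table[source].most_common(1)[0][0]


# Hypothetical observations, invented for this example. No grammar rules are
# used anywhere: the idiom wins simply because it is what people wrote.
observed = [
    ("ça roule", "all good"),
    ("ça roule", "all good"),
    ("ça roule", "it rolls"),
    ("bonjour", "hello"),
]

table = build_phrase_table(observed)
choice = best_translation(table, "ça roule")
unseen = best_translation(table, "au revoir")
```

With these made-up counts the idiomatic rendering beats the literal one, and a phrase that never appeared in the data gets no answer at all, which is why the amount of data matters so much.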

Beyond that, I’ve found most stats books have too many consecutive formulas with minimal explanation and not enough fully worked examples with real data, or they are simply badly explained!

This is why this site is so good. Kalid realizes that understanding is the most important thing first, allowing you to build your knowledge from that solid base all the way to those complex formulas. Good work Kalid!

I love your website because I share your viewpoint on visual thinking.

As a former statistical process control engineer from the Deming school, I would formulate it differently:

statistics is descriptive

probability is predictive

I would rather say “infer the model” than “predict the model”, because once you have the model, you will use it to make predictions over a future data range.

The big problem is that most people will infer that the model is the Normal Law because that’s what they were taught at school. Deming and Shewhart insisted that the Normal should not be assumed in the real world, and that scientists were in fact not very rigorous in how they used it. For example, if you learned the Henry graphical method at school http://fr.wikipedia.org/wiki/Droite_de_Henry (I can’t find an English link), it can be very wrong to infer the Normal Law from it in real-life matters like industrial quality, because it is a static model whereas living things are dynamic (the model shape itself can change over time!).

The purpose of statistical process control is roughly to make the process model of an industrial product stable in time.
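As a rough illustration of "stable in time", here is a minimal Python sketch of one common Shewhart-style tool, the individuals (XmR) chart. The limits come from the data's own point-to-point variation rather than from a fitted Normal curve. The measurements and the function names `individuals_chart_limits` and `out_of_control` are assumptions made up for this example, not something from Deming's or Shewhart's texts.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lower, center, upper) limits for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    center = mean(values)
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is the usual XmR constant, 3 / d2 with d2 = 1.128 for ranges of 2.
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def out_of_control(values, baseline):
    """List (index, value) pairs that fall outside the baseline's limits."""
    lower, _, upper = individuals_chart_limits(baseline)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical measurements, invented for illustration.
baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1]
new_points = [10.0, 10.1, 11.5, 9.9]

lower, center, upper = individuals_chart_limits(baseline)
signals = out_of_control(new_points, baseline)
```

Points inside the limits are treated as routine variation. A point outside them, like the third new measurement here, is a signal that the process itself may have changed and is worth investigating.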

Rule #1 from the Eric book of Stats (may the gods help us if that book is ever written)
“Your results don’t mean anything, until and unless they mean something”

I’ve heard everyone from electrical engineers to political pollsters get all ginned up over statistics and miss the fundamental relationship behind the wall of data they throw up. Don’t get me wrong, I love a good discussion on harmonic mean vs. arithmetic mean as much as the next guy. I’ve just seen a lot of cases where people forget that stats is intended as a tool to better see the world around us, not a veil to falsely lend authority to our preconceived notions.

My Rule #1 above has the following corollary:
There exist 3 things in statistics that have some (maybe smaller than you think) level of importance. Starting with the least important they are
A) your results
B) your explanation of your results
C) your justification of your explanation of your results

An example of A)
-the ‘average’ height of 10 students in class is 5’9"
-the ‘average’ bacteria population in 10 petri dishes after 2 hours is 12,000,000
-the ‘average’ gain of the 10 amplifiers is 2.3 dB

An example of B)
-collected 10 data points and used arithmetic mean
-collected 10 data points and used harmonic mean
-applied same input to 10 devices, collected output on log scale, converted each to linear scale, took arithmetic mean, converted ‘average’ output back to log scale, used ‘average’ log output and log input to determine ‘average’ gain by the following equation…
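To show why the explanation in B) matters, here is a small Python sketch in which the same numbers give different "averages" depending on the method. It is a simplification of the amplifier procedure above: it averages gains directly and assumes they are power gains (dB = 10·log10 of a ratio). All the numbers and the helper names are hypothetical.

```python
import math
from statistics import harmonic_mean, mean


def db_to_linear(gain_db):
    """Convert a power gain in dB to a linear power ratio (assumes 10*log10)."""
    return 10 ** (gain_db / 10)


def linear_to_db(ratio):
    """Convert a linear power ratio back to dB."""
    return 10 * math.log10(ratio)


def average_gain_db(gains_db):
    """Average in the linear domain, then convert the result back to dB."""
    return linear_to_db(mean(db_to_linear(g) for g in gains_db))


# Hypothetical gains for two amplifiers, in dB.
gains_db = [0.0, 10.0]
naive_avg_db = mean(gains_db)              # averages the dB numbers directly
linear_avg_db = average_gain_db(gains_db)  # averages the power ratios instead

# Same story for arithmetic vs. harmonic mean, e.g. speeds over equal distances.
speeds = [40, 60]
arithmetic = mean(speeds)
harmonic = harmonic_mean(speeds)
```

Averaging the dB values directly gives 5 dB, while averaging in the linear domain gives roughly 7.4 dB. The arithmetic mean of the two speeds is 50, while the harmonic mean is 48. Neither result means anything until you say which method you used and why.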

An example of C)
‘I used this equation because…’
‘Here are other equations I could have used, they would have this effect, that model wouldn’t have fit this example because…’

Wow, I have decided to do my final paper about the how and why of statistics, and I have just stumbled onto your website while doing homework on statistics. YAY!!! I have used stats in my work reporting to federal and state agencies but never really understood what they meant or why the agencies even needed them. So that is why I chose this for my final paper. I have a couple of good places to turn, but I am so much more excited about this website than the other ones. Thank you for doing something like this for those of us who are “mathematically challenged”.