Inferential statistics enables researchers to make predictions and draw conclusions about a larger population by analyzing a representative sample, rather than collecting data from every individual. This approach is essential because studying entire populations is often impractical due to time and resource constraints. The lecture covers three key sampling methods: simple random sampling (selecting individuals entirely by chance), stratified sampling (dividing the population into homogeneous groups and sampling from each), and cluster sampling (dividing into natural groupings and randomly selecting entire clusters). Additionally, the course introduces correlation and regression analysis for examining relationships between variables, and hypothesis testing with null and alternative hypotheses to validate statistical inferences.
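The difference between correlation and regression mentioned above can be illustrated with a minimal sketch. The data is made up for illustration, and the helper names `pearson_r` and `fit_line` are hypothetical, not something the lecture defines. Correlation yields a single number for strength and direction; regression yields a line that can be used for prediction.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Strength and direction of a linear relationship, between -1 and 1."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx

# Made-up illustration data: weekly exercise sessions vs. weight change in kg.
sessions = [1, 2, 3, 4, 5]
weight_change = [-0.5, -1.0, -1.5, -2.0, -2.5]

r = pearson_r(sessions, weight_change)                # negative direction
slope, intercept = fit_line(sessions, weight_change)
predicted = slope * 6 + intercept                     # prediction for 6 sessions
```

Here a negative `r` means weight falls as sessions rise, while `slope` and `intercept` allow a prediction for a value of `sessions` that was never observed.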
Inferential Statistics
Dear participants, I welcome all of you to this lecture on inferential statistics. As I already told you, we can categorize statistics into two branches. One is descriptive statistics, which we saw in the last lecture, and in today's class we will discuss inferential statistics. In the last class we saw how to measure centrality (mean, median, mode) and how to measure spread, that is, how our data is scattered, using the range and the standard deviation. We also solved some problems related to centrality and spread.

In today's lecture we'll discuss the idea behind inferential statistics: what its purpose is and why it is so important as far as data science is concerned. We'll see what a population is and what a sample is, and then some sampling methods. We'll also compare correlation and regression: what the difference between them is and what they have in common. Then we'll move on to introduce hypothesis testing.

So why do we need inferential statistics, and what is its purpose? Suppose we want to know the average age of data science professionals in India. There are two possibilities; which one should be used? The first is to meet every data science professional in India, note down their age, and then calculate the overall average. Let's say there are 10,000 professionals across India: you would have to meet every one of them, ask his or her age, and finally compute the average age. The other possibility is to hand-pick a number of
professionals in a city like Bangalore, or you could select Noida as well, note down their ages, and use them to calculate the Indian average. So which approach should be used? The first approach would be highly impractical because of the vast resources and time it would consume: you would have to visit every state, probably every district, and meet the data science professionals there. The second approach appears more viable, but it comes with a significant potential drawback. If the demographics of Bangalore are not representative of India's overall population, there is a substantial risk of inaccurately determining the average age of data science professionals across the nation. So the question remains: which technique can we employ to accurately determine the mean age of data scientists throughout India? This is where inferential statistics helps us.

How do we define inferential statistics? Unlike descriptive statistics, which helps us describe and understand the features of our data, inferential statistics allows us to make predictions or inferences about a larger population from a sample. So how do we select a sample, and what are the different sampling techniques? We will see them very soon. It is like being a statistical detective, investigating and making educated guesses from the data you have. The target is to take a sample, find some value, let's say the mean, and infer something about the larger population. Since we cannot visit every element of the population, because, as we have seen, that is not practical, we take a sample, and then we find out
certain statistics from it and draw inferences about the larger population.

So what is the importance of inferential statistics? First, drawing conclusions about the population from a sample. Second, deciding whether a selected sample is statistically significant with respect to the whole population or not; that matters a great deal. Third, comparing two models to find which one is more statistically significant than the other. Fourth, in feature selection, checking whether adding or removing a variable, or you could say a feature, helps improve the model or not. For all these purposes we can use inferential statistics.

The key concepts we will discuss are population versus sample, sampling methods, correlation and regression, and hypothesis testing.

So what is a population and what is a sample? Every inferential statistic is based on the idea of a population and a sample. The population is the whole group you are interested in. In the last example I was talking about the average age of data science professionals, so the population is every data science professional across India. A sample is a smaller group drawn from that population. Let's say I consider Bangalore and visit every data professional there; that is my sample. The goal is to use the sample data to make inferences about the population.

Take one more example. If we want to know the average height of adult women in India, we cannot measure every individual, since the population is huge. Instead, we take a sample of women and use that data to make an
inference about the average height in the entire population. So we take a sample, and how to select it is what we'll discuss next.

There are different types of sampling methods: simple random sampling, stratified sampling, and cluster sampling. A sample, and the resulting statistic, will be useful only if it is representative of the population. That is why we have different types of sampling: the target is to select the sample in such a way that it is actually representative of the whole population.

The first one is simple random sampling. It is the basic sampling technique, where we select a group of subjects for a study from a larger group, called the population. Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage of the sampling process. That is very important, and it is what we call a good sample.

Let's take a case. Imagine we are conducting a study on the eating habits of high school students in a city, let's say Bangalore. To obtain a simple random sample, we could assign a number to every high school student in the city's public school directory. Suppose Bangalore has 10,000 high school students and we want 100 of them in our study. We collect the public school directory, first give an individual number to each student, and then we use
a random number generator to select 100 students. That would be a good sample.

Next is stratified sampling. This method involves dividing the population into smaller groups, known as strata, whose members share a similar attribute relevant to the research, whatever our research is: measuring adult women to find their average height, say, or surveying data science professionals to find their average salary. Samples are then taken from each stratum, or I would say each group, to ensure that the sample includes members from each segment of the population.

Let me explain it with the example from the previous slide. Continuing with our study on high school students, suppose we want to ensure our sample accurately reflects the city's demographics in terms of socioeconomic status. We can divide the entire student population into different strata, for example low, medium, and high income, and then perform simple random sampling within each stratum. This ensures that students from all economic backgrounds are proportionally represented. In simple random sampling we were not considering different groups, but in stratified sampling we are, and from each of these groups we again perform random sampling. That is stratified sampling.

Now we'll discuss cluster sampling. Cluster sampling is a
technique used when natural groupings are evident in a statistical population. What is a natural grouping? Suppose you have a large number of citizens. We can divide these individuals by state, and similarly by district, so we can form groups district-wise or state-wise. It involves dividing the population into clusters first; then one or more clusters are chosen at random, and everyone within the selected clusters is sampled. The clusters may be, for example, individual villages or geographical areas, individual states, and so on.

Continuing with the previous example, if we wanted to expand our study on eating habits to the national level, it may be impractical to use simple random or stratified sampling because of the large and widespread population, which is now the population of the whole of India. Instead, we could use cluster sampling by dividing the country into clusters based on district or state. We then randomly select a few districts and survey every individual in them. This method can save time and resources while still providing a representative sample.

So we have seen three categories of sampling: simple random sampling, stratified sampling, and cluster sampling. What could be the examples and applications? In medical research, researchers use inferential statistics to determine the effectiveness of a new drug. Suppose a new drug has been discovered by a pharmaceutical company and they want to test its effectiveness. So again, which type of sampling should we take?
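As a rough illustration of the three sampling methods described above, here is a minimal Python sketch using only the standard library. The student directory, the `income` and `district` fields, and the function names are all hypothetical, invented for this example; the lecture does not prescribe an implementation.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def income_group(i):
    # Hypothetical split: 50% low, 30% medium, 20% high.
    return "low" if i % 10 < 5 else "medium" if i % 10 < 8 else "high"

# Hypothetical directory of 10,000 students spread over 20 districts.
students = [
    {"id": i, "income": income_group(i), "district": i % 20}
    for i in range(10_000)
]

def simple_random_sample(population, n):
    # Every individual has the same chance of being selected.
    return rng.sample(population, n)

def stratified_sample(population, key, n):
    # Group by stratum, then draw from each stratum in proportion to its size.
    # Rounding can make the total differ slightly from n in general.
    strata = {}
    for person in population:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, share))
    return sample

def cluster_sample(population, key, n_clusters):
    # Pick whole clusters at random and keep everyone inside them.
    clusters = sorted({person[key] for person in population})
    chosen = set(rng.sample(clusters, n_clusters))
    return [person for person in population if person[key] in chosen]

srs = simple_random_sample(students, 100)
strat = stratified_sample(students, "income", 100)
clus = cluster_sample(students, "district", 2)
```

With these made-up proportions the stratified sample contains 50 low, 30 medium, and 20 high income students, while the cluster sample contains every student from two randomly chosen districts.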
We cannot perform the test on every individual in a nation, so again we select a sample and try to draw an inference from it for the whole population. Economists use inferential statistics to predict future economic conditions; again, they take a sample and try to draw inferences for the whole nation. Businesses use it to understand consumer behavior and to improve products and services. Amazon and Flipkart have a large number of customers, and to understand consumer behavior they can take samples of users from Delhi, from Bangalore, and from other cities, find out their feedback and preferences, and finally improve their products and services. So it is very useful as far as these types of decisions are concerned.

Now we'll see what correlation and regression are. Inferential statistics also encompasses correlation and regression analysis, which allow us to examine the relationship between variables. For instance, we can use regression to predict weight loss based on calorie intake and exercise frequency; that is a regression type of problem. Correlation and regression are both about the relationship between variables. Correlation measures the strength and direction of a relationship between two variables. Direction means it is either positive or negative. If the value of one variable, or feature, is increasing and the value of the other is also
increasing, the direction is positive. If the value of one feature is increasing but the value of the other is decreasing, the direction is negative. Using correlation we can identify this type of relationship between features. Regression allows us to predict one variable based on the value of another. Suppose you want to predict lung disease from certain independent parameters, let's say a person's age, weight, blood pressure level, and several others. Predicting on the basis of these parameters can be done using classification and regression. So that is the relationship between correlation and regression.

Then there are techniques to validate inferences. To determine whether the inferences made from a sample are valid, statisticians use a variety of techniques. They can perform hypothesis testing, and in data science they can also use validation techniques.

Hypothesis testing is one of the main tools of inferential statistics. It allows us to decide whether there is enough evidence to support a particular belief, or hypothesis, about a population. Suppose a new drug comes to market and we want to know whether it is more effective than the previous drug. If someone says this drug is more effective than the previous one, that is a hypothesis: a belief that needs to be validated. The validity of inferences is often tested by setting up a null hypothesis, that there is no effect or no difference, and an alternative hypothesis, that there is an effect or a difference.
The alternative hypothesis says there is a difference: the newly discovered drug is more effective than the previous one. The null hypothesis says the newly discovered drug is not more effective than the previous one, and if there is some change in the result, it is wholly because of chance, not because of the effectiveness of the drug. We'll see the null hypothesis in a separate lecture.

The other technique is cross-validation, such as k-fold cross-validation or a simple split into training and testing data. In data science the whole data set is divided into two subsets: one is called the training data set and the other the testing data set. We train the model on the training data set and validate it using the testing data set. That is one type of validation.

The other is k-fold cross-validation, which is more effective. The first step is to partition the data: you split your data set into k equal-size segments, or folds. The next step is to train and test: you use k - 1 folds for training your model and the remaining fold for testing, rotating the test fold each time. What does that mean? Suppose you are using three-fold cross-validation. At any one time you use two folds for training and one fold for testing, and then you rotate. The first and second folds are used for training and the third for testing; then the second and third are used for training and the first for testing; similarly, the first and
third are used for training and the second for testing. Finally, after testing with each fold, you average the results to get an overall performance metric. That is how k-fold cross-validation works.

To summarize what we have seen in this lecture: under sampling methods we covered simple random sampling, stratified sampling, and cluster sampling; we saw the difference between correlation and regression; and we discussed the techniques that can be used to validate inferences. Thank you so much for joining the class. We'll meet again in the next lecture.

[Music]
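The three-fold rotation described in the lecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is made up, the helper `k_fold_splits` is hypothetical, and the "model" is deliberately trivial (it predicts the mean of the training values) so that the partition, train and test, rotate, and average steps stay visible.

```python
from statistics import mean

def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) pairs, rotating the test fold."""
    indices = list(range(n_items))
    # Fold sizes differ by at most one when n_items is not divisible by k.
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Made-up data; the stand-in model predicts the mean of the training values.
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fold_errors = []
for train, test in k_fold_splits(len(data), k=3):
    prediction = mean(data[i] for i in train)  # "train" on k - 1 folds
    fold_errors.append(mean(abs(data[i] - prediction) for i in test))  # test on the held-out fold

overall_error = mean(fold_errors)  # average across folds
```

In practice the data is usually shuffled before it is split into folds, and a real model replaces the training mean; both are left out here to keep the sketch short.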