In linear regression, the coefficient of determination (R²) measures the proportion of total variation in the response variable explained by the least squares regression line, calculated as the square of the correlation coefficient. A residual plot, which displays residuals against the explanatory variable, helps determine model adequacy: no pattern indicates a linear model is appropriate, while patterns suggest a non-linear model may be better. Additionally, residual plots can reveal violations of constant error variance and identify outliers that may affect model reliability.
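The relationship between R² and the correlation coefficient can be checked numerically. The sketch below is only an illustration: the `pressure` and `wind` values are made-up numbers (the tropical storm data from the lecture is not reproduced on this page), and the helper names are hypothetical. It computes R² as one minus the unexplained share of variation and compares it with the square of r.

```python
from math import sqrt


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correlation(xs, ys):
    """Return the linear correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """Return R^2 = 1 - (unexplained variation) / (total variation)."""
    slope, intercept = least_squares_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    unexplained = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total


# Hypothetical values for illustration only, NOT the data used in the video.
pressure = [1008, 1000, 992, 985, 975, 960, 945]  # explanatory variable
wind = [35, 45, 60, 70, 85, 105, 125]             # response variable

r = correlation(pressure, wind)
r2 = r_squared(pressure, wind)
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}, R^2 = {r2:.4f}")
print(f"share of variation left unexplained = {1 - r2:.4f}")
```

The two printed values for r squared and R² should agree up to rounding, which is the identity the summary relies on for a least squares line with one explanatory variable.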
Deep Dive
Math 150 section 4.3
Hi guys. So this section is on regression diagnostics. We're going to continue studying the notion of linear regression, but we're going to focus more on some extra features of studying linear regression, as well as some problems and things that could happen.

Let me start by mentioning a couple of important points here. Least squares regression can have its drawbacks, including a correlation not necessarily implying that the explanatory variable causes change in the response. Just because two variables are associated with each other, it does not necessarily mean that one directly causes change in the other variable. We talked about this in the last section. And this is because of what are called lurking variables. Lurking variables can influence both variables in the study, and this does what's called confounding the conclusions. So we should say right off the bat that even though they're associated with each other (if one variable goes up, so does the other one, or if one variable goes down, the other variable goes up), it does not necessarily mean that they cause each other to occur. It just means you can use one to predict the change in the other one using a model.

Now, regression models are not perfect: there is always variation from the line in the response variable. We see this happen in all of our scatter diagrams. Whenever we draw a scatter diagram, the dots are never going to line up perfectly, even on your line of best fit. There is always room for error. We call these residuals: these are the differences between the observed value and the predicted value. Now, the amount of variation that happens can actually be determined mathematically, and it's what's known as the coefficient of determination.

So, definition first. The coefficient of determination, R², measures the proportion of total variation in the response variable that is explained by the
least squares regression line. The coefficient of determination, R², is given by the square of the correlation coefficient. So the coefficient of determination is equal to the linear correlation coefficient squared. Again, the coefficient of determination measures the proportion of total variation in the response variable y that is explained by the least squares regression line. In other words, when you think of a proportion, you can think of a percentage: it's the percent of the variation in your response variable y that can be explained by its linear relationship with the explanatory variable when using your least squares regression model.

It sounds a little abstract, but actually it's not too bad. Let's take a look at this example. Let's say we want to interpret the coefficient of determination for the tropical storm data. We used this data in the last section, but I copied the data here for convenience. So grab your calculator; I want you to input the data in L1 and also in L2. First, go to STAT and press ENTER, and make sure you have your data in L1 and L2. Go back to the home screen, then press STAT, go over to CALC, choose number 4, LinReg(ax+b), and press ENTER. Keep the same settings, highlight Calculate, and press ENTER. Here we go. The one we're interested in right now is this one right here, the second one from the bottom. This is the coefficient of determination, R², and that's 0.8638, or 0.864 if you round to three decimal places. So that's it. Another way to get R² is to simply square the value of the correlation coefficient r. If you square negative 0.9294, you're going to get the same thing.

All right, let's go back to our lecture notes. So our coefficient of determination R² is 0.86. This is your coefficient of determination. So when
you're finding the coefficient of determination, it tells you exactly what it says here: it's the proportion or percent, so 86.4 percent, of total variation in the response variable that is explained by the least squares regression line. In order to interpret this, remember that the response variable for the tropical storm data was wind speed. So we interpret it like this: 86.4 percent of total variation in wind speed can be explained by the least squares regression line. In other words, about 86.4 percent of the variation in wind speed is accounted for by its linear relationship with air pressure. Let's write that down: 86.4 percent of total variation in wind speed can be explained by the least squares regression line. By the way, air pressure is our explanatory variable.

So that means, if you subtract 86.4 from 100, you get what? 100 minus 86.4 percent is 13.6 percent. So this can also mean that 13.6 percent of the variation cannot be explained by the regression line. It makes sense, right? 86.4 percent of total variation in wind speed can be explained by the least squares regression line, and 13.6 percent, which is what's left over, cannot be explained by the least squares regression line. That means there are other things that are probably affecting the wind speed of the storms: maybe whether the storm is over land or over sea, or other weather conditions that are not accounted for. So this 13.6 percent can be explained by other variables, such as weather conditions.

So that's pretty much it for the coefficient of determination. Let's go to the next page. So there's one more thing to discuss here
that is, to discuss the fact that even if we do have a linear association between two variables, it does not necessarily mean that a linear model is the most appropriate one for the data that we're studying. Is it better to use one that is not linear? If so, how do we know? Some phenomena you may have seen in past math classes, if you plot them in the plane, let's say something that involves exponential growth, will probably give you something like this: something that might grow really slowly for a bit at the beginning but then begin to increase really rapidly. You might see something like this in exponential growth, and even though it might look linear in certain places, perhaps a model that is exponential is more appropriate. So even if there appears to be a linear association, it does not mean that it's the best model to use.

To explore this in this class, we can use what is called a residual plot to tell whether a linear model is appropriate or not. A residual plot puts the explanatory variable x on the horizontal axis, as usual, and puts the corresponding residuals on the vertical axis. Residuals play an important role in determining the adequacy of the linear model. Using a residual plot, we can determine whether a linear model is appropriate to describe the relation between the explanatory and response variables. Here's the decision criteria: if a residual plot shows no pattern at all, then a linear model is appropriate for the study, and if a residual plot shows a pattern, then a linear model is not appropriate for the study. Very important.

So, for example, take a look at the residual plots below. Look at the one on the left, residual plot A, the first one. This shows no pattern, and notice the points are kind of all over the place. So that means a linear
model is appropriate in the first case right here. Well, why is that? If points are all over the place, that means these residuals occur due to random error, not due to an inappropriate choice of model. That's why a linear model is appropriate. But look at the one on the right. This one clearly shows a pattern; it's kind of a U-shape. Since the plot clearly shows a pattern, a linear model is not appropriate for this one.

Using a residual plot, we can also determine whether the variance of the residuals is constant. Take a look at the residual plot below. Notice that the spread of the residuals is increasing as the explanatory variable x increases. This is the spread at the beginning, but then as x increases, the spread becomes larger. This means that the predictions made using the regression equation will be less reliable when x is large, because there is more variability in y. This is a violation of what's called constant error variance. It says: if a plot of residuals against the explanatory variable shows the spread of the residuals increasing or decreasing as the explanatory variable increases, then constant error variance of the linear model is violated. If the model does not have constant error variance, statistical inference using the regression model is not reliable. So it's quite important to check whether the variance of the residuals is constant. This plot right here is an example where the variance of the residuals is not constant; this plot does not display constant error variance. So this is not good. We don't want this.

And finally, outliers can be found using a residual plot as well, because their residuals will lie very far from the rest of the plot. So they're very easy to spot, actually. In this residual plot right here at the bottom
you can see this point up here. Let me use red to highlight it and circle this one. You can see that this one is very far away from the rest, so you can see that this is an outlier.

So let's talk about how to construct a residual plot. We're going to use these two data sets and construct and compare residual plots. Again, keep in mind that if there is no pattern, then a linear model is appropriate, and if there is a pattern, then the linear model is not appropriate.

The first one is the same tropical storm data. I want you to grab your calculator, and I will show you how to construct a residual plot for this data. First, go to STAT and press ENTER, and make sure you have the data in L1 and L2. I already put it in there, so if you don't have that data in your lists, make sure you put it in right now. This is for the first data set, the tropical storm data. Now, it's very important that after you type your data in, you go to STAT, go to CALC, go down to number 4, LinReg, and then calculate: run that function. Make sure you do that first. Then press 2nd and Y=; that will take you to STAT PLOT. The first one is on; that's what you want. Oh, you know what, I was already playing around with it, so I have something else here. I'm pretty sure this is how it looks on your end. Make sure you turn on Plot1, and make sure the first type is highlighted, which is the scatter plot. Right now you see Xlist is L1 and Ylist is L2. If you hit ZOOM 9, that will give you a regular scatter diagram, not a residual plot. Now, to graph a residual plot, you keep Xlist the same, so you put L1 there, but Ylist you have to change
that to residuals. To do that, press 2nd and STAT, and then number 7; it says RESID, that's residuals. So go down to number 7 and press ENTER. Now the Ylist has the residuals. After that, just press ZOOM and then 9. Here we go: this is the residual plot for the first data set. Notice that when we plot the residuals, we really don't see any pattern at all; the points are all over the place. In terms of linear models, this is a good thing. If the residuals are all over the place, if there is no pattern, it means that a linear model is appropriate. So we can make that conclusion.

All right, let's go back to the lecture notes. I sketched the residual plot here, so we can say that the residual plot shows no pattern; therefore the linear model is appropriate for this data.

Let's go ahead and construct a residual plot for the second data set. Grab your calculator. First, go to STAT and then ENTER, and type your data in. I already did that, so go ahead and pause the video and type the data in L1 and L2. After you do that, before we graph a scatter plot, make sure you run LinReg(ax+b). So go to STAT, CALC, number 4, and then calculate, and you get these values. r equals negative 0.952, which is very close to negative one, which means that there is a strong negative linear association between the two variables. That means the faster the wind speed, the colder it's going to feel. Makes sense. But the question is, is the linear model the most appropriate model to use? It says there is a linear association, but is the linear model the best one to use? We'll see. We're going to plot a residual plot, but before that we're going to plot a regular scatter diagram
okay. So what I'm going to do is come down here to Ylist and change this to L2. Again, right now we're trying to get the regular scatter diagram, not the residual one. So change Ylist to L2, and then press ZOOM and then 9. Look at this; this is what we get. This scatter plot sort of appears linear, but actually it curves, which leads us to believe that even if the variables are linearly associated, a linear model may not be the best model to use, because the plot curves. It's not completely linear, so there might be a better one.

So then what we're going to do is go to STAT PLOT again, press ENTER, and now we're going to look at the residual plot. Change L2 to RESID: press 2nd and STAT, go down to number 7, press ENTER, and then press ZOOM and then 9. Now you get this. This is not the plot for the data; this is the plot for the residuals. And they show a clear pattern, in this case kind of a U-shape or a parabolic pattern. By the way, just because it has a U-shape or a parabolic pattern, it does not mean that a parabola is the best model. That's not what it means. But it does mean that a linear model is probably not very appropriate.

All right, let's go back to the lecture notes and summarize. I sketched the residual plot here, so we can say that there's a pattern in the residuals; therefore it's not best to use a linear model. So that's it for this video. I will see you in the next one. Bye guys.
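For readers without a graphing calculator, the residual plot steps from the lecture can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the data is a made-up curved set (y = x squared), not the second data set from the video, and `least_squares_fit` and `residuals` are hypothetical helper names. It fits a least squares line and lists each residual (observed value minus predicted value) against x, which is what a residual plot displays.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Return observed minus predicted y for each point, using the fitted line."""
    slope, intercept = least_squares_fit(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical curved data (y = x^2) for illustration, NOT the data from the video.
curved_x = [1, 2, 3, 4, 5, 6, 7]
curved_y = [x ** 2 for x in curved_x]

curved_resid = residuals(curved_x, curved_y)

# A text version of the residual plot: x on one axis, residual on the other.
for x, e in zip(curved_x, curved_resid):
    print(f"x = {x}: residual = {e:+.1f}")
```

For this made-up data the residuals work out to 5, 0, -3, -4, -3, 0, 5: positive at both ends and negative in the middle. That is the U-shaped pattern described in the lecture, so a linear model would not be appropriate here even though the correlation is strong.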