In statistics, and in regression modeling in particular, it is crucial to understand the discrepancy between the actual data points and the model’s predictions. This difference is known as a residual.
A residual is the signed vertical distance between an observed value and the corresponding value predicted by the fitted regression line. It quantifies how closely the model fits that specific data point.
The calculation is straightforward:
Residual = Observed value – Predicted value
When visualizing a simple linear regression, the residual for each observation is clearly seen as the vertical line segment extending from the data point to the regression line:

What is a Residual in Regression Analysis?
The residual serves as a core diagnostic tool in statistical modeling. Analyzing these differences helps researchers assess the validity of the model assumptions, such as linearity and homoscedasticity.
If a residual is positive, the model underestimated the actual value; if it is negative, the model overestimated the actual value. Ideally, residuals should be small and randomly distributed around zero, indicating a good fit.
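As a minimal sketch of the sign convention described above, the snippet below computes a few residuals and labels each one. The (observed, predicted) pairs and the helper names `residual` and `describe` are hypothetical, chosen only for illustration.

```python
def residual(observed, predicted):
    """Raw residual: observed value minus predicted value."""
    return observed - predicted


def describe(res):
    """Translate the sign of a residual into plain language."""
    if res > 0:
        return "model underestimated"
    if res < 0:
        return "model overestimated"
    return "exact fit"


# Hypothetical (observed, predicted) pairs, for illustration only.
pairs = [(10.0, 8.5), (7.0, 9.0), (5.0, 5.0)]
residuals = [residual(obs, pred) for obs, pred in pairs]
labels = [describe(r) for r in residuals]

for r, label in zip(residuals, labels):
    print(f"{r:+.2f}  {label}")
```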
However, simply looking at raw residuals can be misleading, especially when comparing observations across different scales or models. This is where normalization becomes necessary, leading us to the concept of standardized residuals.
Introducing Standardized Residuals
Raw residuals are expressed in the units of the response variable, and they do not all share the same variance: a residual at a high-leverage point tends to be smaller because the fitted line is pulled toward that point. To address this, we use a normalized form known as the standardized residual (sometimes referred to as the “internally studentized residual”).
Standardized residuals scale the raw residuals by dividing them by an estimate of their standard deviation. This scaling process makes the errors comparable across all observations, allowing for a more objective identification of influential points or outliers.
These residuals are particularly valuable for diagnostics because they transform the error into a unitless measure, similar to a Z-score, which simplifies the application of standard statistical thresholds.
The Standardized Residual Formula Explained
The calculation of a standardized residual, denoted $r_i$, requires two components beyond the raw residual itself: the model’s residual standard error and the observation’s leverage.
The formula is defined mathematically as:
$$r_i = \frac{e_i}{s(e_i)} = \frac{e_i}{\mathrm{RSE}\,\sqrt{1 - h_{ii}}}$$
Where the components represent the following crucial statistical measures:
- $e_i$: The ith raw residual, calculated as the observed Y minus the predicted Y.
- RSE: The Residual Standard Error of the model. This is an estimate of the standard deviation of the error term (ε).
- $h_{ii}$: The leverage of the ith observation, which is the ith diagonal entry of the hat matrix and lies between 0 and 1. Leverage measures how far an observation’s predictor values are from the mean of the predictor values, indicating its potential influence on the fitted regression line.
By incorporating the RSE and the leverage, the denominator acts as a corrected standard deviation for the specific residual, ensuring the standardization is accurate based on the model’s overall fit and the data point’s position.
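For the special case of simple linear regression (one predictor plus an intercept, n observations), both ingredients of the denominator have closed forms. The block below is a sketch under that assumption. With several predictors, the leverage is instead read off the diagonal of the hat matrix, and n - 2 becomes n minus the number of estimated coefficients.

```latex
% Assumes simple linear regression with an intercept and n observations.
\mathrm{RSE} = \sqrt{\frac{1}{n-2}\sum_{j=1}^{n} e_j^{2}},
\qquad
h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^{2}}{\sum_{j=1}^{n}(x_j - \bar{x})^{2}}

% The variance of the ith residual shrinks with leverage, which motivates the denominator:
\operatorname{Var}(e_i) = \sigma^{2}\,(1 - h_{ii})
\quad\Longrightarrow\quad
s(e_i) = \mathrm{RSE}\,\sqrt{1 - h_{ii}}
```

The second line shows why the correction factor appears: a point far from the mean of X has a larger leverage, so its raw residual has a smaller variance and must be divided by a smaller quantity to be comparable with the others.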
Identifying Outliers using Standardized Residuals
One of the primary uses of standardized residuals is the identification of potential outliers. Since these residuals are standardized, we can apply general rules derived from the normal distribution to flag unusual data points.
A common rule of thumb is that any standardized residual with an absolute value greater than 3 is considered a potential outlier. When the model is correctly specified and the errors are normally distributed, standardized residuals approximately follow a standard normal distribution (like Z-scores), and only about 0.3% of values from that distribution exceed 3 in absolute value. An observation beyond this threshold is therefore highly unlikely under the model.
It is important to note that identifying an outlier does not automatically mean removal. These observations require further investigation to determine if they stem from genuine variation, a data entry error, or a specific anomaly in the process being modeled. Researchers in certain fields may use a slightly stricter threshold, sometimes considering an absolute value greater than 2 to signal observations worthy of closer scrutiny.
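The thresholding rule can be sketched in a few lines. The function name `flag_outliers` and the example values are hypothetical; the function only flags observations for closer inspection and does not remove anything.

```python
def flag_outliers(std_residuals, threshold=3.0):
    """Return the indices whose |standardized residual| exceeds the threshold."""
    return [i for i, r in enumerate(std_residuals) if abs(r) > threshold]


# Hypothetical standardized residuals, for illustration only.
example = [0.4, -1.2, 2.3, -3.4, 0.9]

flagged_at_3 = flag_outliers(example, threshold=3.0)
flagged_at_2 = flag_outliers(example, threshold=2.0)

print(flagged_at_3)  # [3]
print(flagged_at_2)  # [2, 3]
```

Lowering the threshold from 3 to 2 flags more observations, which is why it suits analyses where missing an unusual point is costlier than inspecting a few extra ones.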
Practical Example: Calculating Standardized Residuals
To illustrate the process, let us consider a sample dataset comprising 12 observations, featuring one predictor variable (X) and one response variable (Y). We aim to fit a linear regression model to this data.
The initial dataset is presented below:

Using statistical software (like R, Excel, Python, Stata, etc.), we first fit the optimal linear regression line to this data. For this example, assume the calculated line of best fit is:
ŷ = 29.63 + 0.7553 * X
This equation forms the basis for predicting Y values and calculating the initial raw residuals for every data point.
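For readers who want to see what the software does at this step, here is a minimal sketch of an ordinary least squares fit for a single predictor, written with the Python standard library only. The function name `fit_line` and the five data points are hypothetical; they are not the 12 observations used in this example.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(x, y)
print(f"y_hat = {intercept:.2f} + {slope:.2f} * X")  # y_hat = 1.09 + 1.97 * X
```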
Step-by-Step Calculation Walkthrough
We begin by calculating the predicted value (ŷ) and the raw residual ($e_i$) for each observation. For instance, consider the first observation, where X = 8 and Y = 41:
Predicted Y (ŷ) = 29.63 + 0.7553 * (8) = 35.67
Raw Residual ($e_1$) = Observed Y - Predicted Y = 41 - 35.67 = 5.33
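The same arithmetic can be checked in code. The coefficients and the first observation come from the example above; the function name `predict` is just an illustrative choice.

```python
INTERCEPT = 29.63  # intercept of the fitted line in this example
SLOPE = 0.7553     # slope of the fitted line in this example


def predict(x):
    """Predicted response from the fitted line."""
    return INTERCEPT + SLOPE * x


x1, y1 = 8, 41          # first observation
y_hat = predict(x1)
e1 = y1 - y_hat         # raw residual = observed - predicted

print(round(y_hat, 2))  # 35.67
print(round(e1, 2))     # 5.33
```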
Repeating this procedure for all 12 observations yields the complete list of raw residuals:

Next, we require the Residual Standard Error (RSE) and the leverage ($h_{ii}$) for each point. For this specific model, the RSE is determined to be 4.44. While the calculation of the leverage statistic is typically automated by statistical packages, its values are necessary for the final standardization step:

Finally, we apply the standardized residual formula, using the raw residual ($e_i$), the RSE, and the leverage ($h_{ii}$). For the first observation, where $e_1$ = 5.33 and $h_{11}$ = 0.27:
$$r_1 = \frac{5.33}{4.44\,\sqrt{1 - 0.27}} \approx 1.40$$
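A short sketch of this final step, using the three inputs quoted for the first observation. The function name `standardized_residual` is hypothetical. With the rounded inputs shown here the result is about 1.405, which matches the value in the text to within rounding of the inputs.

```python
import math


def standardized_residual(e_i, rse, h_ii):
    """Internally studentized residual: e_i / (RSE * sqrt(1 - h_ii))."""
    if not 0 <= h_ii < 1:
        raise ValueError("leverage must lie in [0, 1)")
    return e_i / (rse * math.sqrt(1 - h_ii))


# Inputs for the first observation, as quoted in the walkthrough.
r1 = standardized_residual(e_i=5.33, rse=4.44, h_ii=0.27)
print(round(r1, 3))  # 1.405
```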
After calculating ri for every point, the complete list of standardized residuals is compiled:

Visualizing Results and Interpretation
The most effective way to interpret standardized residuals is through a diagnostic plot, typically plotting the predictor values against the calculated standardized residuals. This visualization allows us to check for patterns, non-linearity, and, most importantly, identify extreme values that cross the predetermined outlier thresholds.
Here is the scatterplot generated from our example data:

The plot confirms that none of the standardized residuals exceeds the common absolute threshold of 3. Therefore, based on this metric, none of the observations in this dataset is flagged as a potential outlier.
The choice between an absolute threshold of 2 or 3 often depends on the field of study and the sensitivity required for the analysis. Regardless of the chosen threshold, standardized residuals remain an indispensable tool for model diagnostics, ensuring the integrity and reliability of the regression analysis.
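The full walkthrough (fit, raw residuals, RSE, leverage, standardization, thresholding) can be sketched end to end with the standard library. This assumes simple linear regression with an intercept, for which the leverage has the closed form 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The function name and the 12 data points are hypothetical and are not the dataset from this example. In practice a statistical package would normally supply these values.

```python
import math
from statistics import fmean


def standardized_residuals(x, y):
    """Standardized residuals for simple linear regression with an intercept."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # Least squares fit.
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Raw residuals, residual standard error (n - 2 degrees of freedom), leverage.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [e / (rse * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]


# Hypothetical data with 12 observations, for illustration only.
x = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]
y = [31, 33, 35, 34, 38, 36, 40, 39, 42, 41, 44, 43]

r = standardized_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 3]

for i, ri in enumerate(r):
    print(f"obs {i + 1:2d}: r = {ri:+.3f}")
print("flagged at |r| > 3:", flagged)
```

The sketch does not guard against a perfect fit (RSE of zero), so it is meant for illustration rather than production use.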
Cite this article
Mohammed looti (2025). What Are Standardized Residuals?. PSYCHOLOGICAL STATISTICS. Retrieved from https://statistics.arabpsychology.com/what-are-standardized-residuals/
Mohammed looti. "What Are Standardized Residuals?." PSYCHOLOGICAL STATISTICS, 6 Nov. 2025, https://statistics.arabpsychology.com/what-are-standardized-residuals/.