The Breusch-Pagan Test: Definition & Example

Author: Mohammed looti


The Essential Assumption: Homoscedasticity in Regression

In the field of regression analysis, one foundational assumption governs the validity and reliability of our statistical inferences: the errors in the model must have constant variance. This condition is formally known as homoscedasticity. Under homoscedasticity, the spread of the residuals (the differences between the observed and predicted values) remains uniform across the full range of the independent variables.

When this assumption holds, together with the other Gauss-Markov assumptions (a model that is linear in its parameters, errors with zero conditional mean, uncorrelated errors, and no perfect multicollinearity), the Ordinary Least Squares (OLS) estimator is efficient: it is the Best Linear Unbiased Estimator (BLUE). Constant error variance is also what justifies the usual formula for the standard errors of the coefficient estimates, which in turn form the basis for confidence intervals and hypothesis tests.

Therefore, a linear regression model should be checked for constant error variance before its standard errors are used for inference. If homoscedasticity fails and this goes unnoticed, the coefficient estimates may still look sensible, but the reported standard errors, confidence intervals, and p-values can be misleading.

The Problem of Heteroscedasticity and Its Impact

The violation of the constant variance assumption is termed heteroscedasticity. This occurs when the variance of the residuals changes systematically depending on the level of the predictor variables. For instance, the errors might be very small when the independent variable is low but become much larger and more spread out as the independent variable increases.

Crucially, heteroscedasticity does not bias the OLS coefficient estimates: they remain centered on the true population parameters. However, OLS is no longer efficient (it is no longer BLUE), and the standard errors calculated under the assumption of constant variance become unreliable. They are often underestimated, particularly when the error variance is largest for observations with extreme predictor values, which makes the model appear more precise than it actually is. Underestimated standard errors inflate t-statistics and cause true null hypotheses to be rejected too often, that is, they produce an excess of Type I errors.
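To make the claim about underestimated standard errors concrete, the sketch below works through a small hypothetical design (x = 1, ..., 10, with the error variance assumed to equal x²) and computes two exact quantities for the OLS slope: its true sampling variance, and the expected value of the variance that the classical constant-variance formula would report. No random data are involved; the design and the variance pattern are illustrative assumptions, not part of the article's example.

```python
import math

# Hypothetical fixed design (an assumption, not data from the article):
# x = 1, 2, ..., 10 and Var(eps_i) = x_i^2, i.e. the error SD grows with x.
x = [float(i) for i in range(1, 11)]
sigma2 = [xi ** 2 for xi in x]
n = len(x)
x_bar = sum(x) / n
d = [xi - x_bar for xi in x]             # centred predictor values
sxx = sum(di ** 2 for di in d)

# True sampling variance of the OLS slope when each error has its own variance:
# Var(b1) = sum(d_i^2 * sigma_i^2) / Sxx^2
true_var = sum(di ** 2 * s2 for di, s2 in zip(d, sigma2)) / sxx ** 2

# Expected value of the classical estimate s^2 / Sxx, which assumes one common
# variance. E[s^2] = sum(sigma_i^2 * (1 - h_i)) / (n - 2), h_i = leverage of obs i.
leverage = [1 / n + di ** 2 / sxx for di in d]
expected_s2 = sum(s2 * (1 - h) for s2, h in zip(sigma2, leverage)) / (n - 2)
classical_var = expected_s2 / sxx

print(f"true SE of the slope:                      {math.sqrt(true_var):.4f}")
print(f"SE implied by expected classical variance: {math.sqrt(classical_var):.4f}")
```

For this design the true standard error of the slope is roughly 0.74, while the classical formula implies about 0.68 on average, so the classical standard error understates the uncertainty. A different variance pattern (for example, variance largest near the mean of x) could reverse the direction.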

Before employing formal statistical methods, analysts often attempt a visual assessment of this condition. By plotting the residuals against the fitted values of the regression model, one can look for distinct patterns. If the scatter plot resembles a funnel (widening or narrowing) or displays distinct clusters of varying density, it serves as a strong preliminary indicator that heteroscedasticity is present, necessitating a formal test to confirm the suspicion.

[Figure: Example of heteroscedasticity for a Breusch-Pagan Test]
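Where a plot is not at hand, a rough numeric version of the residual-versus-fitted check is to sort observations by fitted value, split them into bins, and compare the residual spread across bins. The sketch below does this on simulated data (assumed model: y = 3 + 2x with an error SD of 0.5x) using only Python's standard library; the binning approach is an informal stand-in for the plot, not a formal test.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical data (assumed model): y = 3 + 2x + error, error SD = 0.5 * x.
x = [random.uniform(1, 10) for _ in range(1000)]
y = [3 + 2 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# Simple OLS fit with one predictor, using the closed-form formulas.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Numeric stand-in for a residual-vs-fitted plot: sort by fitted value,
# cut into 4 equal-sized bins, and report the residual SD in each bin.
pairs = sorted(zip(fitted, resid))
n_bins = 4
size = len(pairs) // n_bins
bin_sds = []
for j in range(n_bins):
    chunk = pairs[j * size:(j + 1) * size]
    sd = statistics.stdev([r for _, r in chunk])
    bin_sds.append(sd)
    print(f"fitted {chunk[0][0]:6.2f} to {chunk[-1][0]:6.2f}: residual SD = {sd:.2f}")
```

A residual SD that climbs steadily from the first bin to the last is the numeric counterpart of the funnel shape described above.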

Introducing the Breusch-Pagan Test: Definition and Mechanism

To move beyond subjective visual inspection, statisticians rely on formal tests, one of the most widely used of which is the Breusch-Pagan test. Introduced by Trevor Breusch and Adrian Pagan in 1979, this test provides a structured, objective method for detecting non-constant variance in the residuals of a regression model and is a standard diagnostic for checking model assumptions.

The core mechanism of the Breusch-Pagan test is an auxiliary regression. Because the error variance of each observation cannot be observed directly, the test uses the squared residuals as a proxy for it and asks whether they can be explained by the original independent variables. If the independent variables are good predictors of the squared residuals, the error variance appears to change systematically with the levels of those predictors, which is evidence of heteroscedasticity.
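The sketch below illustrates the auxiliary-regression idea with a single predictor, using hypothetical residuals whose size grows with x (e_i = ±0.5·x_i, invented for illustration). With one predictor and an intercept, the R-squared of the auxiliary regression equals the squared correlation between x and the squared residuals, so it can be computed without a matrix solver.

```python
# Hypothetical residuals whose magnitude grows with x (a funnel pattern):
# e_i = +/-0.5 * x_i with alternating signs.
x = [float(i) for i in range(1, 11)]
resid = [0.5 * xi * (-1) ** i for i, xi in enumerate(x)]
sq = [e ** 2 for e in resid]  # squared residuals: the auxiliary response


def r_squared_simple(xs, ys):
    """R^2 of an OLS regression of ys on xs with an intercept (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy ** 2 / (sxx * syy)  # with one predictor, R^2 = squared correlation


r2_aux = r_squared_simple(x, sq)
print(f"auxiliary R^2 = {r2_aux:.3f}")
```

Here the squared residuals are exactly proportional to x², so x explains about 95 percent of their variation, which is the kind of strong pattern the test is designed to pick up.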

Specifically, the test calculates the coefficient of determination (R-squared) from this auxiliary regression. A high R-squared suggests that a substantial portion of the variability in the squared residuals is explained by the predictors. This explanatory power is converted into a Chi-Square test statistic, which lets us compute the probability of observing a result at least as extreme if the null hypothesis of homoscedasticity were true.

Formalizing the Test: Hypotheses and Interpretation

Like all hypothesis testing procedures, the Breusch-Pagan test is structured around two competing hypotheses that address the nature of the residual variance:

Null Hypothesis (H₀): Homoscedasticity is present. The variance of the error terms is constant across all levels of the independent variables; mathematically, Var(ε_i) = σ² for every observation i.
Alternative Hypothesis (H_A): Heteroscedasticity is present. The variance of the error terms is not constant but is dependent on the independent variables or the fitted values of the model.
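In terms of the auxiliary regression, the two hypotheses amount to a joint test on its slope coefficients. The sketch below uses the standard textbook formulation for the n·R² version of the test; the symbols γ_j (auxiliary coefficients) and v_i (auxiliary error) are notation introduced here, and the χ² distribution of the statistic holds only approximately, in large samples.

```latex
\begin{aligned}
\hat{\varepsilon}_i^{2} &= \gamma_0 + \gamma_1 x_{i1} + \gamma_2 x_{i2} + \cdots + \gamma_p x_{ip} + v_i, \qquad i = 1, \dots, n,\\
H_0 &:\ \gamma_1 = \gamma_2 = \cdots = \gamma_p = 0 \quad (\text{constant variance}),\\
H_A &:\ \gamma_j \neq 0 \ \text{for at least one } j,\\
X^2 &= n R^2_{\text{new}} \ \overset{a}{\sim}\ \chi^2_p \quad \text{under } H_0.
\end{aligned}
```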

The decision to reject or fail to reject the null hypothesis is based on the calculated p-value. If the p-value is smaller than the chosen significance level (commonly α = 0.05), there is sufficient statistical evidence to reject H₀. Rejecting the null hypothesis means we conclude that heteroscedasticity is present in the model, so the conventional OLS standard errors are unreliable and require adjustment.

Conversely, if the p-value is greater than the significance level, we fail to reject the null hypothesis. In this case we have no statistical evidence against homoscedasticity and usually proceed as though the assumption holds, treating the model's standard errors as valid for inference. Failing to reject H₀ does not, however, prove that the variance is constant.

Detailed Procedure: The Five Steps of the Breusch-Pagan Test

While statistical software packages automate the computation, understanding the procedural steps of the Breusch-Pagan test is essential for interpreting the output correctly. The test is executed through a sequence of five distinct stages:

1. Fit the Primary Regression Model: Run the original OLS regression model under investigation. This establishes the relationship between the dependent variable and the set of predictor variables.
2. Calculate the Squared Residuals: Once the primary model is fitted, calculate the residual ε̂_i for every observation and square it to obtain ε̂_i². These squared residuals become the response variable for the auxiliary regression, serving as a proxy for the error variance of each observation.
3. Fit the Auxiliary Regression: Fit a second, auxiliary regression in which the squared residuals from Step 2 are the dependent variable and the original predictor variables are again the independent variables. The key output of this regression is its R-squared value, R²_new.
4. Calculate the Chi-Square Test Statistic: The test statistic, denoted X², is computed from the auxiliary regression as X² = n * R²_new, where:
   - n: the total number of observations (the sample size).
   - R²_new: the R-squared value of the auxiliary regression (Step 3).
5. Determine the P-Value: Compare the X² statistic with a Chi-Square distribution whose degrees of freedom (df) equal p, the number of predictor variables in the original regression model (not counting the intercept). The p-value is the probability that such a Chi-Square variable exceeds X². If this p-value is below the significance level α, reject the null hypothesis of homoscedasticity.
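The five steps can be sketched end to end in plain Python. This is a minimal sketch, assuming the X² = n * R²_new form of the statistic and an auxiliary regression on the same predictors as the original model; the helper names (`ols_fit`, `fitted_values`, `r_squared`, `chi2_sf`, `breusch_pagan`) and the simulated data are hypothetical, and the normal-equation solver is chosen for self-containment rather than numerical robustness. The p-value uses the closed-form Chi-Square upper tail that exists for integer degrees of freedom.

```python
import math
import random


def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination with
    partial pivoting. Each row of X must already include a leading 1."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def fitted_values(X, b):
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def r_squared(y, fitted):
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def chi2_sf(x, df):
    """P(Chi-Square with integer df > x), via the closed-form series."""
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def breusch_pagan(y, predictors):
    """n * R^2 Breusch-Pagan statistic, df and p-value.
    predictors: one row of predictor values per observation (no intercept)."""
    X = [[1.0, *row] for row in predictors]
    n, p = len(y), len(predictors[0])
    b = ols_fit(X, y)                                    # Step 1: primary OLS fit
    resid_sq = [(yi - fi) ** 2                           # Step 2: squared residuals
                for yi, fi in zip(y, fitted_values(X, b))]
    g = ols_fit(X, resid_sq)                             # Step 3: auxiliary regression
    r2_new = r_squared(resid_sq, fitted_values(X, g))
    stat = n * r2_new                                    # Step 4: X^2 = n * R^2_new
    return stat, p, chi2_sf(stat, p)                     # Step 5: p-value with df = p


# Hypothetical data: error SD equal to x1, so heteroscedastic by construction.
random.seed(1)
x1 = [random.uniform(1, 10) for _ in range(500)]
x2 = [random.uniform(0, 5) for _ in range(500)]
y = [1 + 2 * a - b + random.gauss(0, a) for a, b in zip(x1, x2)]
stat, df, p_value = breusch_pagan(y, list(zip(x1, x2)))
print(f"X^2 = {stat:.2f}, df = {df}, p = {p_value:.3g}")
```

In practice a library implementation should be preferred; this version is only meant to make each step visible.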

A Practical Application Example

To illustrate this process, consider a scenario where a data scientist is analyzing the performance of 10 professional basketball players. The goal is to predict a player’s rating based on their average points, assists, and rebounds per game. The fundamental question is whether the variability of the prediction errors changes depending on the player statistics.

The initial dataset, containing the key variables for 10 observations, is structured as follows:

[Table: points, assists, rebounds, and rating for each of the 10 players]

The first step involves fitting the primary OLS model to predict the player rating. Suppose the resulting equation is:

rating = 62.47 + 1.12*(points) + 0.88*(assists) - 0.43*(rebounds)
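To connect the fitted equation with the residuals used in the next step, the sketch below wraps the equation in a function and computes the residual and squared residual for one hypothetical player (20 points, 5 assists, 6 rebounds, observed rating 90). These inputs are invented for illustration and are not rows from the article's dataset; `predict_rating` is likewise a hypothetical helper name.

```python
def predict_rating(points, assists, rebounds):
    """Fitted equation from the example: 62.47 + 1.12*points + 0.88*assists - 0.43*rebounds."""
    return 62.47 + 1.12 * points + 0.88 * assists - 0.43 * rebounds


# Hypothetical player (not from the article's dataset).
predicted = predict_rating(20, 5, 6)
residual = 90 - predicted          # actual minus predicted
squared_residual = residual ** 2   # this player's value of the auxiliary response
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}, squared = {squared_residual:.4f}")
```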

Following the procedure, we calculate the residual for each player (Actual Rating - Predicted Rating) and square these residuals. These squared residuals are then used as the dependent variable in the auxiliary regression, which includes Points, Assists, and Rebounds as predictors. The resulting table summarizing the residuals and the squared residuals used in the auxiliary model is shown below:

[Table: residuals and squared residuals for each of the 10 players]

From the output of the auxiliary regression, we extract the necessary components for the test statistic calculation:

n (Sample Size): 10
R²_new (Auxiliary R-squared): 0.600395

Interpreting the Results and Conclusion

Using the extracted statistics, we proceed to calculate the Chi-Square test statistic for the Breusch-Pagan test:

X² = n * R²_new = 10 * 0.600395 = 6.00395.

The degrees of freedom (df) is equal to the number of predictor variables in the original model, which is p = 3 (Points, Assists, Rebounds).

By comparing the calculated X² value (6.00395) against the Chi-Square distribution with 3 degrees of freedom, the corresponding p-value is determined to be approximately 0.111418.
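The arithmetic above can be checked with Python's standard library. For 3 degrees of freedom the Chi-Square upper tail has a closed form, P(X² > x) = erfc(√(x/2)) + √(2x/π)·exp(−x/2), so no statistics package is needed; this sketch simply reproduces the reported numbers from n and R²_new.

```python
import math

n, r2_new, df = 10, 0.600395, 3
stat = n * r2_new  # X^2 = n * R^2_new

# Closed-form upper tail of the Chi-Square distribution with 3 degrees of freedom.
p_value = math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)
reject = p_value < 0.05  # decision at alpha = 0.05
print(f"X^2 = {stat:.5f}, df = {df}, p = {p_value:.4f}, reject H0: {reject}")
```

The sketch gives X² = 6.00395 and p ≈ 0.1114, in line with the value reported above.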

Since the p-value (0.111418) is greater than the conventional significance level (α = 0.05), we fail to reject the null hypothesis (H₀). There is insufficient evidence to conclude that heteroscedasticity is present in the model residuals, so we proceed under the homoscedasticity assumption and treat the calculated standard errors as usable for hypothesis testing. With only 10 observations, however, the test has limited power, so this result is weak evidence of constant variance rather than proof of it.

Leveraging Statistical Software and Remedial Measures

In modern data analysis, the manual calculation of the Breusch-Pagan test is rarely necessary. Statistical software such as R, Python's statsmodels, Stata, and SPSS includes built-in procedures that carry out the multi-step calculation and return the test statistic and its p-value. Implementations differ in their defaults, for example in which variables enter the auxiliary regression and in whether they report the n * R² version described here or the original Breusch-Pagan statistic, so it is worth checking the documentation when comparing results across packages. This automation lets analysts focus on interpreting the result rather than on the mechanics of the auxiliary regression.

However, detection is only the initial phase. If the Breusch-Pagan test leads to the rejection of the null hypothesis (i.e., heteroscedasticity is detected), corrective action must be taken to restore the validity of the statistical inferences. Common remedial strategies include using robust standard errors (such as White’s correction), which adjust the standard error estimates to account for the non-constant variance without changing the coefficient estimates themselves.
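As a minimal sketch of White's correction, the code below computes both the classical standard error and the heteroscedasticity-consistent HC0 standard error of the slope in a simple regression, using hypothetical data (y = 2 + 3x with errors of ±0.5x). The coefficient estimates are the same for both; only the standard-error formula changes, with each squared residual standing in for its own observation's variance. In statsmodels, the analogous option is requesting a robust covariance when fitting, e.g. `sm.OLS(y, X).fit(cov_type="HC0")`.

```python
import math

# Hypothetical data: y = 2 + 3x plus errors of +/-0.5x, whose spread grows with x.
x = [float(i) for i in range(1, 11)]
y = [2 + 3 * xi + 0.5 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
d = [xi - x_bar for xi in x]
sxx = sum(di ** 2 for di in d)
b1 = sum(di * (yi - y_bar) for di, yi in zip(d, y)) / sxx   # OLS slope (unchanged by the correction)
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Classical SE: one common error variance, estimated by s^2.
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_classical = math.sqrt(s2 / sxx)

# White (HC0) SE: each squared residual stands in for its own observation's variance.
se_hc0 = math.sqrt(sum(di ** 2 * e ** 2 for di, e in zip(d, resid)) / sxx ** 2)

print(f"slope = {b1:.4f}, classical SE = {se_classical:.4f}, HC0 SE = {se_hc0:.4f}")
```

In this tiny sample the robust standard error happens to come out smaller than the classical one; in any given sample it can go either way. HC0 is also known to be biased downward in small samples, which is why variants such as HC1 to HC3 are often preferred.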

Alternatively, the analyst might consider transforming the variables (e.g., using logarithmic transformation) to stabilize the variance, or employing Generalized Least Squares (GLS) if the exact form of the heteroscedasticity is known. The choice of remedy depends heavily on the severity and nature of the non-constant variance identified, ensuring that the final regression analysis is built on statistically sound principles.
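When the form of the heteroscedasticity is known, GLS reduces to weighted least squares (WLS), with each observation weighted by the inverse of its error variance. The sketch below assumes, purely for illustration, that Var(ε_i) is proportional to x_i², so the weights are 1/x_i²; the function name `wls_simple` and the data are hypothetical.

```python
def wls_simple(x, y, w):
    """Weighted least squares for y = b0 + b1*x; w_i proportional to 1 / Var(e_i)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    b1 = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    return yw - b1 * xw, b1


# Hypothetical data: y = 2 + 3x plus errors of +/-0.5x, whose spread grows with x.
x = [float(i) for i in range(1, 11)]
y = [2 + 3 * xi + 0.5 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]

# Assumed variance model: Var(e_i) proportional to x_i^2, so weight = 1 / x_i^2.
weights = [1 / xi ** 2 for xi in x]
b0_wls, b1_wls = wls_simple(x, y, weights)
b0_ols, b1_ols = wls_simple(x, y, [1.0] * len(x))  # equal weights give plain OLS
print(f"WLS: intercept {b0_wls:.3f}, slope {b1_wls:.3f}")
print(f"OLS: intercept {b0_ols:.3f}, slope {b1_ols:.3f}")
```

With all weights equal, the function reduces to ordinary least squares, which is a convenient sanity check on the weighting logic.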


Cite this article


Mohammed looti (2025). The Breusch-Pagan Test: Definition & Example. PSYCHOLOGICAL STATISTICS. Retrieved from https://statistics.arabpsychology.com/the-breusch-pagan-test-definition-example/

Mohammed looti. "The Breusch-Pagan Test: Definition & Example." PSYCHOLOGICAL STATISTICS, 6 Nov. 2025, https://statistics.arabpsychology.com/the-breusch-pagan-test-definition-example/.



