In statistical analysis, the primary method used to quantify the linear relationship between two continuous variables is the correlation coefficient. This standardized metric is essential for data scientists and researchers, as it provides a clear measure of both the strength and the direction of the linear association present in the data. Understanding this relationship is often a first step in building predictive models, although correlation alone does not establish causality.
The value of the correlation coefficient is always bounded between -1 and 1. This range offers immediate, intuitive insight into how the variables move relative to one another:
- -1 signifies a perfectly negative linear correlation. This means that as one variable increases, the other variable decreases consistently and proportionally.
- 0 indicates that there is no linear correlation between the two variables. A nonlinear relationship may still exist, so a value of 0 does not by itself mean the variables are independent.
- 1 represents a perfectly positive linear correlation, where both variables increase or decrease together in lockstep.
Furthermore, the magnitude, or absolute value, of the coefficient dictates the strength of the association. Coefficients closer to 1 or -1 indicate a substantially stronger relationship, suggesting that changes in one variable are highly predictable based on changes in the other. Conversely, values near zero suggest a weak or negligible linear link.
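For reference, the coefficient described above is conventionally the Pearson sample correlation. The standard textbook definition is sketched below, where the symbols are introduced here for illustration: x_i and y_i are the n paired observations, and \bar{x} and \bar{y} are their sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Because the numerator is scaled by the spread of both variables, the result is unit-free and cannot fall outside the range from -1 to 1.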
This guide details how to calculate these coefficients in SAS using the dedicated proc corr procedure. For our practical examples, we will use a built-in SAS dataset named sashelp.Fish, which contains morphometric measurements of 159 fish sampled in a Finnish lake.
Initial Data Inspection: Exploring the Fish Dataset
Before initiating any complex statistical analysis, such as calculating correlation, it is considered best practice to examine the structure and initial observations of the dataset. This preliminary step helps ensure data quality, variable suitability, and correct understanding of the data types. For our purposes, we will inspect the sashelp.Fish dataset.
We can efficiently view the first few records of the data using the proc print procedure in SAS. The following code snippet instructs SAS to display only the first 10 observations, providing a quick snapshot of the variables available for analysis:
/*view first 10 observations from Fish dataset*/
proc print data=sashelp.Fish (obs=10);
run;

This initial inspection confirms that the dataset contains several critical numeric variables, including Weight, Length (in multiple forms), Height, and Width. These variables represent continuous measurements and are thus highly suitable for the calculation of the correlation coefficient, which relies on continuous data for meaningful results.
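Beyond printing a few rows, it can help to check variable types and missing values before computing correlations. The sketch below uses proc contents and proc means, which are standard SAS procedures but are not part of the original walkthrough; the choice of statistics requested (n, nmiss, mean, min, max) is an assumption about what is worth checking.

```sas
/*list variable names, types, and lengths (an addition to the walkthrough)*/
proc contents data=sashelp.fish;
run;

/*count non-missing and missing values for each numeric variable*/
proc means data=sashelp.fish n nmiss mean min max;
run;
```

Missing values matter here because proc corr, by default, drops an observation only from the pairs in which one of its values is missing, so the sample size can differ from one pair of variables to another.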
Example 1: Pairwise Correlation Using the VAR Statement
In many research scenarios, the interest lies specifically in quantifying the relationship between a predefined pair of variables rather than the entire dataset. For this example, we will calculate the Pearson correlation coefficient—the most common measure of linear correlation—between the fish measurements Height and Width.
To achieve this specific calculation using proc corr, we employ the powerful VAR statement. The VAR statement explicitly tells proc corr which variables should be included in the analysis, efficiently narrowing the scope to just the two variables of interest:
/*calculate correlation coefficient between Height and Width*/
proc corr data=sashelp.fish;
    var Height Width;
run;

The output generated by proc corr contains two tables of interest. The first, labeled “Simple Statistics,” provides descriptive summaries (N, Mean, Standard Deviation, Sum, Minimum, and Maximum) for the included variables (Height and Width), offering context for the data distribution. The second and most important table, labeled “Pearson Correlation Coefficients,” presents the calculated correlation value along with its P-value.
Reviewing the results for Height and Width, we observe the following key metrics:
- The calculated Pearson correlation coefficient is 0.79288.
- The corresponding P-value is reported as <.0001.
This analysis reveals a strong positive linear correlation (R ≈ 0.79) between the height and width measurements of the fish. Because the P-value is far below the conventional significance threshold of α = .05, we reject the null hypothesis that the true correlation is zero and conclude that the correlation is statistically significant.
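If the coefficient is needed later in a program rather than only in the printed report, one option is to save it to a dataset. The following is a minimal sketch using the OUTP= and NOPRINT options of proc corr; the dataset name corr_out is a hypothetical name chosen for illustration.

```sas
/*store Pearson results in a dataset; corr_out is a hypothetical name*/
proc corr data=sashelp.fish outp=corr_out noprint;
    var Height Width;
run;

/*the output dataset also holds means, standard deviations, and N;
  keep only the rows that contain correlation coefficients*/
proc print data=corr_out;
    where _TYPE_ = 'CORR';
run;
```

The OUTP= dataset stores statistics only, so the P-value shown in the printed table is not included in it.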
Example 2: Generating a Comprehensive Correlation Matrix
When working with datasets that feature numerous numeric variables, manually running pairwise correlations becomes inefficient. A more streamlined approach is to generate a comprehensive correlation matrix, which calculates the Pearson correlation coefficient for every possible unique combination of variables simultaneously. This matrix serves as an invaluable diagnostic tool, offering a holistic view of multivariate relationships.
In proc corr, calculating the correlation among all numeric variables is remarkably simple. We achieve this by initiating the procedure but intentionally omitting the VAR statement. When the VAR statement is absent, SAS automatically identifies and processes all appropriate numeric variables within the specified dataset:
/*calculate correlation coefficient between all pairwise combinations of variables*/
proc corr data=sashelp.fish;
run;

The resulting output is a densely informative correlation matrix. Each cell in this matrix provides three key pieces of information: the correlation coefficient itself, the sample size (N) used for that specific pair, and the corresponding P-value, which aids in assessing statistical significance for each relationship.
A quick examination of this matrix allows us to efficiently analyze relationships involving the fish’s Weight and its various length measurements (Length1, Length2, and Length3). The findings confirm a remarkably consistent and strong association:
- The Pearson correlation coefficient between Weight and Length1 is 0.91644.
- The Pearson correlation coefficient between Weight and Length2 is 0.91937.
- The Pearson correlation coefficient between Weight and Length3 is 0.92447.
These high positive coefficients, all close to R = 0.92, show that the fish’s weight is strongly and linearly related to its length, irrespective of which specific length measurement is used. Because the three length measurements behave almost interchangeably, they are likely to be highly correlated with one another, which would introduce multicollinearity if they were used together as predictors in a regression model.
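To look at the length variables more closely without producing the full matrix, the analysis can be narrowed. The sketch below is an addition to the original examples: the first step correlates the three length measurements with one another, and the second uses the WITH statement to correlate Weight against each of them. The NOSIMPLE option only suppresses the descriptive statistics table.

```sas
/*correlations among the three length measurements only*/
proc corr data=sashelp.fish nosimple;
    var Length1 Length2 Length3;
run;

/*Weight (row) against each length measurement (columns)*/
proc corr data=sashelp.fish nosimple;
    var Length1 Length2 Length3;
    with Weight;
run;
```

When a WITH statement is present, proc corr reports only the correlations between the WITH variables and the VAR variables, which keeps the output compact.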
Example 3: Graphical Confirmation Using Scatterplots
While numerical coefficients provide precise measures of association, relying solely on numbers can be misleading, especially if the relationship is non-linear or masked by outliers. Therefore, visualizing the relationship using a scatterplot is a critical step to confirm the assumption of linearity and to visually assess the strength and direction of the correlation.
The proc corr procedure in SAS can generate graphical output directly through the PLOTS option. To confirm the strong positive correlation observed between Height and Width, we will re-run our pairwise analysis and add the plotting option to the proc corr statement: plots=scatter(nvar=all). The nvar=all sub-option ensures that a scatterplot is created for every pair of variables specified in the VAR statement.
/*visualize correlation between Height and Width*/
proc corr data=sashelp.fish plots=scatter(nvar=all);
    var Height Width;
run;

The resulting scatterplot provides clear visual validation of our numerical findings. The data points form a distinct, tight cloud that slopes upward from left to right, visually confirming the strong positive linear correlation (R = 0.79288). The lack of severe curvature or isolated outliers reinforces the reliability of the calculated correlation coefficient.
A useful feature of this graphical output is the inclusion of key statistical summaries directly on the plot. In the top-left corner, analysts can quickly reference the total number of observations, the exact value of the correlation coefficient, and the associated P-value, integrating both the visual and numerical evidence of the relationship.
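The same PLOTS option can also draw all pairs at once. The following sketch, which goes beyond the original examples, requests a scatterplot matrix with histograms on the diagonal. It assumes ODS Graphics may not already be enabled in your session, so it turns it on explicitly, and the list of variables is chosen here for illustration.

```sas
/*enable ODS Graphics in case the session has it turned off*/
ods graphics on;

/*scatterplot matrix for all numeric fish measurements,
  with a histogram of each variable on the diagonal*/
proc corr data=sashelp.fish plots=matrix(histogram nvar=all);
    var Weight Length1 Length2 Length3 Height Width;
run;
```

A matrix view like this makes it easier to spot pairs whose relationship looks curved rather than linear, in which case the Pearson coefficient may understate the association.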
Summary and Additional Resources for SAS Procedures
The correlation matrix and individual correlation coefficients calculated using proc corr are indispensable tools for initial data exploration and statistical modeling in SAS. By effectively utilizing the VAR statement for specific pairs or omitting it to generate a full matrix, researchers can quickly and accurately assess the linear associations between their variables, confirming data patterns visually through the PLOTS option.
Understanding how to calculate the correlation coefficient is just one aspect of working with large datasets in the SAS environment. To continue developing your statistical programming skills, explore other SAS procedures for data preparation, manipulation, and advanced modeling.
Cite this article
Mohammed looti (2025). Calculate Correlation in SAS (With Examples). PSYCHOLOGICAL STATISTICS. Retrieved from https://statistics.arabpsychology.com/calculate-correlation-in-sas-with-examples/
Mohammed looti. "Calculate Correlation in SAS (With Examples)." PSYCHOLOGICAL STATISTICS, 1 Nov. 2025, https://statistics.arabpsychology.com/calculate-correlation-in-sas-with-examples/.