Use SAS to solve Questions

I don’t understand this Computer Science question and need help to study.

Regression Analysis on House Price in Chicago

Basic concepts on regression analysis.

We have the following Chicago community data in SAS.

/* Chicago Community */

filename webdat url "" ;

/* Import Chicago Community data*/

PROC IMPORT OUT= chicago_cca



proc contents ; run ;

data community ; set chicago_cca ;

income = income/1000;

Black= Black*100;

Unemp= Unemp*100;

Hispanic = Hispanic*100 ;

label Black= "Black Population Ratio" ;

label Hispanic="Hispanic Population Ratio" ;

label unemp="Unemployment Rate" ;

label income="Median Income in $1000" ;

proc means ; run ;

1. Simple regression Model

Let’s investigate the relationship between income (Y) and unemployment rate (X) using SAS.

1) Find the correlation coefficient and covariance between X and Y.
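One possible way to get both statistics in a single step is PROC CORR with the COV option. This is only a minimal sketch, assuming the `community` data set created above:

```sas
/* Sketch: Pearson correlation and covariance between income (Y) and unemp (X). */
/* The COV option prints the covariance matrix alongside the correlations.      */
proc corr data=community cov ;
  var income unemp ;
run ;
```

The off-diagonal entry of each matrix is the value for the income/unemp pair.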

2) Draw a scatter plot of Y against X with the fitted regression line.
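A scatter plot with a fitted line could be drawn with PROC SGPLOT. The sketch below assumes the `community` data set from the data step above and uses the REG statement, which overlays a linear fit on the points:

```sas
/* Sketch: scatter of income against unemp with a fitted regression line. */
proc sgplot data=community ;
  reg x=unemp y=income ;
run ;
```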

3) Perform regression analysis using the following code:

proc reg data= community ;

Model income = unemp;

run ;

4) Using SAS code, save the predicted values of income from the above model as "yhat1" and the residuals as "res1". Make scatter plots of income, yhat1, and res1 with unemp as the X variable. Explain what you find from the plots.
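One way to save the fitted values and residuals is the OUTPUT statement of PROC REG. In the sketch below the output data set name `reg1` is a hypothetical choice, not something given in the assignment; `yhat1` and `res1` are the names the question asks for.

```sas
/* Sketch: save predicted values (yhat1) and residuals (res1).   */
/* reg1 is an assumed name for the output data set.              */
proc reg data=community ;
  model income = unemp ;
  output out=reg1 p=yhat1 r=res1 ;
run ;
quit ;

/* Observed and predicted income against unemp on one set of axes. */
proc sgplot data=reg1 ;
  scatter x=unemp y=income ;
  scatter x=unemp y=yhat1 ;
  yaxis label="Income and predicted income ($1000)" ;
run ;

/* Residuals against unemp, with a reference line at zero. */
proc sgplot data=reg1 ;
  scatter x=unemp y=res1 ;
  refline 0 / axis=y ;
run ;
```

The predicted values should fall on a straight line, and the residual plot is the usual place to look for curvature or changing spread.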

5) Carefully find or calculate the following statistics using the regression output and explain their meanings.

  • Total Sum of Squares (SST)
  • Regression Sum of Squares (SSR)
  • Error Sum of Squares (SSE)
  • R square
  • Adjusted R square
  • Variance of Y
  • Variance of Error
  • Standard Error and variance of b1
  • Standard Error and variance of b2
  • T statistic and P-value of t statistic
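For reference, the quantities in the list above can be written out for the simple model. The notation here is an assumption, since the assignment does not define it: the model is taken to be y_i = b1 + b2 x_i + e_i with b1 the intercept and b2 the slope, n is the number of observations, and sums run over i = 1, ..., n.

```latex
\begin{aligned}
SST &= \sum (y_i - \bar{y})^2, \qquad
SSR = \sum (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum (y_i - \hat{y}_i)^2, \qquad
SST = SSR + SSE \\
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad
\bar{R}^2 = 1 - \frac{SSE/(n-2)}{SST/(n-1)} \\
\widehat{\operatorname{var}}(Y) &= \frac{SST}{n-1}, \qquad
\hat{\sigma}^2 = \frac{SSE}{n-2} \\
\widehat{\operatorname{var}}(b_2) &= \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2}, \qquad
\widehat{\operatorname{var}}(b_1) = \hat{\sigma}^2 \, \frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2} \\
\operatorname{se}(b_j) &= \sqrt{\widehat{\operatorname{var}}(b_j)}, \qquad
t_j = \frac{b_j}{\operatorname{se}(b_j)}
\end{aligned}
```

In the PROC REG output, SSR, SSE, and SST appear in the Analysis of Variance table as the Model, Error, and Corrected Total sums of squares; the Error mean square is the estimated error variance, and Root MSE is its square root.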

6) Perform the following hypothesis tests using the output

  • Ho: B1 = 0, Ha: B1 ≠ 0
  • Ho: B2 = 0, Ha: B2 ≠ 0
  • Ho: B1 = B2 = 0 , Ha: Ho is not true
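The question asks for these tests to be read off the regression output, but the results can be cross-checked with the TEST statement of PROC REG. The sketch below assumes B1 is the intercept and B2 is the coefficient on unemp; the statement labels (`intercept_zero`, `slope_zero`, `joint_zero`) are hypothetical names chosen for readability.

```sas
/* Sketch: cross-check the three hypotheses with TEST statements. */
/* Assumes B1 = intercept and B2 = coefficient on unemp.          */
proc reg data=community ;
  model income = unemp ;
  intercept_zero: test intercept = 0 ;
  slope_zero:     test unemp = 0 ;
  joint_zero:     test intercept = 0, unemp = 0 ;
run ;
quit ;
```

For a single restriction, the F statistic reported by TEST equals the square of the corresponding t statistic in the Parameter Estimates table. Note that the joint test of intercept and slope is not the same as the overall F test in the Analysis of Variance table, which tests the slope only.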

2. Multiple Regression Model

1) Estimate a multiple regression model using the following code:

proc reg data= community ;

Model income = unemp Hispanic black / VIF;

run ;

2) Carefully explain the difference between R square and Adjusted R square. Compare this model with the simple regression model: which one is better, and why?

3) Perform the F test. Setup your hypothesis and perform the test.
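As a sketch of how the overall F test might be set up for this model: the coefficient symbols below are labels introduced here for illustration, and n is the number of communities in the data. The model has three regressors plus an intercept, so four coefficients are estimated.

```latex
\begin{aligned}
H_0 &: \beta_{\text{unemp}} = \beta_{\text{Hispanic}} = \beta_{\text{black}} = 0 \\
H_a &: \text{at least one of these coefficients is not zero} \\
F &= \frac{SSR/3}{SSE/(n-4)} \;\sim\; F_{3,\,n-4} \quad \text{under } H_0
\end{aligned}
```

H0 is rejected at significance level α when the p-value reported next to "F Value" in the Analysis of Variance table is below α, or equivalently when F exceeds the critical value of the F distribution with 3 and n-4 degrees of freedom.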

4) Carefully explain multicollinearity and its consequences. Does the output of the multiple regression model show evidence of multicollinearity?
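Beyond the VIF option already used in the model above, a few more diagnostics may help. The sketch below is only one possible approach: it adds the TOL and COLLIN options to the MODEL statement and prints the pairwise correlations among the regressors.

```sas
/* Sketch: extra collinearity diagnostics for the multiple regression.  */
/* For regressor j, VIF = 1 / (1 - R-square of j on the other           */
/* regressors), and TOL (tolerance) is the reciprocal of VIF.           */
proc reg data=community ;
  model income = unemp Hispanic black / vif tol collin ;
run ;
quit ;

/* Pairwise correlations among the regressors. */
proc corr data=community ;
  var unemp Hispanic black ;
run ;
```

A commonly cited rule of thumb treats a VIF above about 10 as a sign of serious multicollinearity, though the threshold is a convention rather than a formal test.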