/* SAS Example 1: Textbook Chapter 12.2, Question 12.9 */

/* The US Bureau of Labor Statistics produces the consumer price index
   (CPI) as an indicator of the cost of goods and services to the
   consumer. The bureau also produces a CPI_U, a consumer price index
   for all urban consumers (about 80% of the population), and a CPI_W,
   a consumer price index for urban wage earners and clerical workers
   (about 32% of the population). We use the data given to fit a simple
   linear regression line that predicts CPI_U from CPI_W. */

/* Any text enclosed in comment delimiters like these is a comment:
   SAS ignores it when it reads the program. */

OPTIONS PS=52;
OPTIONS LS=78;
OPTIONS NODATE;
OPTIONS PAGENO=1;
/* The four OPTIONS statements above control the layout of the output:
   PS= sets the page size (lines per page), LS= sets the line size
   (characters per line), NODATE suppresses the date, and PAGENO=1
   starts page numbering at 1. */

DATA CPI;
  INPUT ITEM $ 1-23 CPI_U CPI_W;
  /* The INPUT statement names our three variables: ITEM, CPI_U and
     CPI_W. A $ after a variable name marks it as character
     (non-numeric). ITEM $ 1-23 uses column input, so ITEM is read from
     columns 1 through 23 of each data line; this allows item names of
     up to 23 characters that contain blanks or commas. CPI_U and CPI_W
     are then read as numbers from the rest of the line, so the data
     lines below keep the numbers to the right of column 23.
     The CARDS statement that follows tells SAS that the data lines
     come next. The data lines contain no semicolons; a single
     semicolon on its own line marks the end of the data. */
  CARDS;
Food, beverages         145.5 144.9
Housing                 145.9 143.0
Apparel, upkeep         131.1 130.2
Transportation          135.9 135.2
Medical Care            212.2 211.5
Entertainment           150.2 148.3
Other goods             199.4 197.5
Services                164.2 161.6
;

PROC REG DATA=CPI SIMPLE;
  /* PROC REG fits a least-squares regression. Because the MODEL
     statement has a single predictor, this is a simple linear
     regression. The SIMPLE option additionally prints simple
     descriptive statistics (such as the mean and standard deviation)
     for each variable in the model. The MODEL statement models y
     (CPI_U) as a function of x (CPI_W). The P and R options after the
     slash print the predicted values (P) and the residuals (R) with
     our output. */
  MODEL CPI_U = CPI_W / P R;
  OUTPUT OUT=C P=PRED R=RESID;
  /* The OUTPUT statement saves the results in a new data set named C,
     where the predicted values are stored as PRED and the residuals
     as RESID. */
RUN;
QUIT;

PROC PLOT DATA=C;
  /* Plot the data stored in the output data set C created above. */
  PLOT CPI_U*CPI_W='A' PRED*CPI_W='P' / OVERLAY;
  PLOT RESID*CPI_W='E';
  /* These statements produce two plots. The first shows the actual
     values (plotted with the symbol A) and the predicted values
     (plotted with P) against CPI_W on the same axes, because of the
     OVERLAY option. The second plots the residuals (E) against CPI_W
     so we can check them for any pattern. */
RUN;
QUIT;
/* RUN tells SAS to execute a step. PROC REG and PROC PLOT are
   interactive procedures that keep running after RUN, so a QUIT
   statement ends each of them. */
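
/* Optional cross-check (a sketch added for illustration, not part of
   the textbook solution): for simple linear regression the
   least-squares slope and intercept are
     b1 = Sxy / Sxx   and   b0 = ybar - b1 * xbar,
   where x = CPI_W, y = CPI_U, Sxy is the sum of (x - xbar)*(y - ybar)
   and Sxx is the sum of (x - xbar)**2. The steps below compute b0 and
   b1 directly so they can be compared with the Intercept and CPI_W
   parameter estimates that PROC REG prints. The data set names MEANS
   and CHECK and the variables XBAR, YBAR, SXY, SXX, B0 and B1 are
   hypothetical names chosen for this sketch. */

PROC MEANS DATA=CPI NOPRINT;
  VAR CPI_W CPI_U;
  /* Names follow the VAR order: XBAR = mean of CPI_W, YBAR = mean of CPI_U */
  OUTPUT OUT=MEANS MEAN=XBAR YBAR;
RUN;

DATA CHECK;
  /* Read the two means once; variables read with SET are retained */
  IF _N_ = 1 THEN SET MEANS(KEEP=XBAR YBAR);
  SET CPI END=LAST;
  /* Sum statements accumulate the cross-products and squared deviations */
  SXY + (CPI_W - XBAR) * (CPI_U - YBAR);
  SXX + (CPI_W - XBAR)**2;
  /* After the last observation, compute and output the estimates */
  IF LAST THEN DO;
    B1 = SXY / SXX;
    B0 = YBAR - B1 * XBAR;
    OUTPUT;
  END;
  KEEP B0 B1;
RUN;

PROC PRINT DATA=CHECK;
RUN;

/* If the sketch is correct, B1 should match the CPI_W estimate and B0
   the Intercept estimate from PROC REG, up to rounding. */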