/* SAS Example 1a: Examination of HATCO data */

/* Note that SAS ignores any text enclosed in these brackets when reading
   the program. */

OPTIONS PS=52;
OPTIONS LS=78;
OPTIONS NODATE;
OPTIONS PAGENO=1;

/* The OPTIONS statements allow us to set various layout options. The first
   two options listed are Page Size (number of lines on a page) and Line Size
   (number of characters on a line). The NODATE option requests that the date
   not be printed on each page of output (which looks messy). The PAGENO
   option specifies the page number given to the first page of output. */

/* Note that every statement must end with a ; symbol. Often, if a program
   is not running correctly, it is because you forgot this symbol. */

DATA HATCO;
   INFILE 'A:/HATCO_SET.PRN';

/* The DATA statement creates a "working file" (a temporary data set) in SAS
   called HATCO. This will not be saved once you quit SAS, but it can be
   amended and added to whilst using the program, and referred to and used at
   any later point in the program. The "starting point" for this new data set
   will be the imported HATCO_SET.PRN file, although any changes we make won't
   be reflected in this PRN file. */

   INPUT X1-X14;

/* Our imported HATCO_SET file has 14 variables for 100 different
   observations. We are instructing SAS to name these variables X1 through
   X14. Importing more complicated data (such as alphanumeric characters and
   names) will be discussed later.
*/

   LABEL X1 = 'Delivery Speed';
   LABEL X2 = 'Price Level';
   LABEL X3 = 'Price Flexibility';
   LABEL X4 = 'Manufacturer Image';
   LABEL X5 = 'Service';
   LABEL X6 = 'Salesforce Image';
   LABEL X7 = 'Product Quality';
   LABEL X8 = 'Firm Size';
   LABEL X9 = 'Usage Level';
   LABEL X10 = 'Satisfaction Level';
   LABEL X11 = 'Specification Buying';
   LABEL X12 = 'Structure Of Procurement';
   LABEL X13 = 'Type Of Industry';
   LABEL X14 = 'Type Of Buying Situation';
RUN;

/* Even though our variables will still have the names X1 through X14, in our
   output they will be identified by the LABELs listed here. We must include
   a RUN statement once we have finished giving our data entry instructions. */

PROC PRINT DATA=HATCO;
   TITLE 'HATCO Data';
RUN;

/* SAS has many different PROCedures that we will be using throughout the
   course. Here we are asking it to PRINT the HATCO data set that we just
   created and to give the output the title 'HATCO Data'. */

/*********************************************************************/
/* EXAMINING SINGLE VARIABLES */
/*********************************************************************/

PROC UNIVARIATE DATA=HATCO NORMAL PLOT;
   TITLE 'Univariate Statistics for HATCO Data';
   VAR X1;
   QQPLOT;
RUN;

/* The UNIVARIATE procedure gives us basic univariate statistics (mean,
   median, mode, standard deviation) and lets us examine how closely the data
   fit a particular distribution. Specifying NORMAL in the procedure
   statement requests tests of how closely the distribution resembles a
   NORMAL distribution, and specifying PLOT asks for line-printer plots (a
   stem-and-leaf plot, a box plot and a normal probability plot). The normal
   probability plot provided looks very poor, though, so the QQPLOT statement
   has also been included to get a separate, better looking plot. */

/* The VAR statement tells SAS which variables you wish to examine.
   To cut down on output, I only asked for X1, although we should really have
   specified VAR X1-X7 X9 X10 (the quantitative variables). */

/*********************************************************************/
/* EXAMINING BIVARIATE RELATIONSHIPS */
/*********************************************************************/

PROC PLOT DATA=HATCO;
   TITLE 'Plots of Bivariate Relationships';
   PLOT X2*X1;
   PLOT X3*X1;
RUN;

/* The PLOT procedure prints plots of bivariate relationships. The plot
   desired should be specified in the form
   PLOT (Y-AXIS VARIABLE)*(X-AXIS VARIABLE) */

PROC CORR DATA=HATCO;
   TITLE 'Correlations between HATCO variables';
   VAR X1-X4;
RUN;

/* Another important part of examining bivariate relationships is looking at
   correlations. PROC CORR produces a correlation matrix for any set of
   specified VARiables. Again, to cut down on output, I've only requested
   correlations for X1 through X4. */

PROC SORT DATA=HATCO;
   BY X14;
RUN;

/* The above commands sort our data on the variable X14 (a qualitative
   variable with the values 1, 2 or 3). This is required by the following
   procedure. */

PROC BOXPLOT DATA=HATCO;
   TITLE 'Boxplots of Delivery Speed by Type of Buying Situation';
   PLOT X1*X14 / BOXSTYLE=SCHEMATIC;
RUN;

/* The BOXPLOT procedure is used to compare the distribution of X1 (the
   variable of interest) across the different levels of X14 (the group
   variable). In order to do this, the data must be sorted by X14. A new
   TITLE is given here because otherwise the title from PROC CORR would carry
   over. The default box style draws the whiskers out to the minimum and
   maximum values of the data in each group. Since our aim is to identify
   outliers, I have specified the BOXSTYLE=SCHEMATIC option, which draws the
   whiskers only as far as the most extreme observations inside the fences at
   1.5*IQR below the lower quartile and 1.5*IQR above the upper quartile; all
   points outside these fences are plotted individually as outliers. */
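
/*********************************************************************/
/* LISTING BOXPLOT OUTLIERS (OPTIONAL SKETCH) */
/*********************************************************************/

/* A minimal sketch, not part of the original exercise: the fence rule used
   by BOXSTYLE=SCHEMATIC can also be computed directly, listing the X1 values
   that lie more than 1.5*IQR below the lower quartile or above the upper
   quartile within each level of X14. It assumes HATCO is still sorted by X14
   from the PROC SORT step. The data set and variable names X1_QUARTILES,
   X1_OUTLIERS, Q1_X1, Q3_X1, IQR_X1, LOWER_FENCE and UPPER_FENCE are
   hypothetical names chosen for illustration. The quartiles from PROC MEANS
   may differ slightly from those used by PROC BOXPLOT if the two procedures
   use different percentile definitions (see QNTLDEF= in PROC MEANS and
   PCTLDEF= in PROC BOXPLOT). */

PROC MEANS DATA=HATCO NOPRINT;
   BY X14;
   VAR X1;
   /* One output row per level of X14 holding that group's quartiles */
   OUTPUT OUT=X1_QUARTILES Q1=Q1_X1 Q3=Q3_X1;
RUN;

DATA X1_OUTLIERS;
   /* Attach each group's quartiles to its observations (many-to-one merge) */
   MERGE HATCO X1_QUARTILES(KEEP=X14 Q1_X1 Q3_X1);
   BY X14;
   IQR_X1 = Q3_X1 - Q1_X1;
   LOWER_FENCE = Q1_X1 - 1.5*IQR_X1;
   UPPER_FENCE = Q3_X1 + 1.5*IQR_X1;
   /* Keep only non-missing values outside the fences; SAS treats a missing
      value as smaller than any number, so missing X1 is excluded explicitly */
   IF NOT MISSING(X1) AND (X1 < LOWER_FENCE OR X1 > UPPER_FENCE);
RUN;

PROC PRINT DATA=X1_OUTLIERS;
   TITLE 'X1 Values Outside the 1.5*IQR Fences, by X14';
   VAR X14 X1 LOWER_FENCE UPPER_FENCE;
RUN;

/* If no observation lies outside the fences, X1_OUTLIERS will be empty and
   PROC PRINT will produce no listing, which agrees with a schematic boxplot
   that shows no individually plotted points. */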