SAS Workshop
SAS Data Management
Hun Myoung Park, Ph.D.
University Information Technology Services
Center for Statistical and Mathematical Computing
Saturday, October 03, 2015
© 2009-2010 The Trustees of Indiana University
http://www.indiana.edu/~statmath
[email protected] (812) 855-4740, (317) 278-4740
SAS Data Management
October 3, 2015
Outline
•
•
•
•
•
•
•
•
•
INPUT Statement: Input Styles
INFILE Statement
PROC IMPORT/EXPORT
Data Manipulation
IF…ELSE & DO…END Clauses
Recode Variables
LABEL and RENAME Variables
Select Observations & Variables
Working with Data Sets
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
SAS DATA SOURCES
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
DATA STEP
•
•
•
•
DATA steps read and manipulate data.
INPUT tells how SAS reads data.
DATALINES indicates data items to be read
begin from the next line.
INFILE reads data from an external file.
LIBNAME sm ‘c:\temp\sas’;
DATA sm.grade0;
INPUT name $ id stat math;
DATALINES;
John 10091 89 95
…
RUN;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
INPUT STATEMENT
•
There are six input styles.
1.
2.
3.
4.
5.
6.
•
•
List INPUT: simply list variable names
Modified list INPUT: for ill-organized data
Column INPUT: location of data items
Formatted INPUT: format of data items
Named INPUT: variable=data
Mixed INPUT: combine input styles
SAS recognizes styles automatically.
List (previous slide), column, and
formatted INPUT are commonly used.
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
COLUMN INPUT
•
•
•
•
Specify column locations of data items
$ follows a character variable name
Data should be aligned correctly
Some data may be skipped (id below)
DATA sm.grade1;
INPUT name $ 1-6 stat 12-14 math 15-17;
DATALINES;
John 10091 89 95
Rachel10100
99
…
RUN;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
FORMATTED INPUT
•
•
•
•
Specify formats of data items.
Data should be aligned correctly.
$6. means a string 6 characters long.
3.0 or 3. means a 3 digit number without
a decimal point.
DATA sm.grade2;
INPUT name $6. id 5. male 3. stat 3.0;
DATALINES;
John 10091 89 95
Rachel10100
99
…
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
COLUMN CONTROLS
•
•
•
@n moves the input pointer to the nth
column.
+n shifts the input pointer by n column
to the right.
You may skip some variables (e.g., id)
INPUT name $6. +5 stat 3. @15 math 3.;
DATALINES;
John 10091 89 95
Rachel10100
99
…
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
LINE CONTROLS
•
@ and @@ trailing hold input records
INPUT name $ stat math @@;
DATALINES;
John
89 95 Ron 76 100
•
#n moves the input pointer to nth line, /
moves the input pointer to the next line
INPUT name $6. #2 stat 3. / math 3.;
DATALINES;
John
89
95
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
USING INFORMAT
•
•
INFORMAT defines a format of data
entry and FORMAT for data output
COMMA and PERCENT
INPUT income COMMA9. saving PERCENT3.;
DATALINES;
170,200 89%
92,540 95%
•
MMDDYY and DOLLAR
INPUT dates MMDDYY10. sales DOLLAR12.0;
DATALINES;
10/01/2009
$343,750
10/05/2009 $9,784,650
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
INFILE STATEMENT 1
•
INFILE specifies an external data file
that is created in advance.
INFILE ‘c:\temp\sas\grade2.txt’;
INPUT name $6. id 5. stat math (3.);
•
•
If the data file contains missing values,
add the MISSOVER option to INFILE.
The MISSOVER option treats all
remaining variables without values as
missing values.
INFILE ‘c:\temp\sas\grade2.txt’ MISSOVER;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
INFILE STATEMENT 2
•
•
Alternatively, use FILENAME to make
an alias of an external file.
Like LIBNAME, FILENAME makes it
easy to referring a file name especially
when the name and path are long.
FILENAME gFile ‘c:\temp\grade2.txt’;
DATA sm.grade2;
INFILE gFile MISSOVER;
INPUT name $6. id 5. stat math (3.);
RUN;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
INFILE STATEMENT 3
•
•
DLM specifies a delimiter. More than
one delimiter can be used. e.g., *iu*
DSD (Delimiter Sensitive Data)
assumes a comma delimiter and is
needed for missing values
INFILE DATALINES DLM=’,’ DSD;
INPUT name $ id stat math;
DATALINES;
John,10091,89,95
Rachel,10100,,99
…
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
READ ODD STRINGS
•
What if a string variable is long and/or
contain commas? Use ampersand (&)
INFILE DATALINES DLM=‘,’ DSD;
INPUT name : $20. title & $100.;
DATALINES;
Lindblom,”Still Muddling, Not Yet Through”
Park, “Reading “”Small is Beautiful”””
•
How to read quotation marks in a string?
Use tilde(~)
INPUT name : $20 title ~ $100.;
DATALINES;
Lindblom,”Still Muddling, Not Yet Through”
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
READ THROUGH NETWORK
•
SAS can read external data files
through TCP/IP (http and ftp)
FILENAME masil URL
'http://www.masil.org/sas/grade2.txt';
DATA sm.grade2;
INFILE masil FIRSTOBS=1;
INPUT name $6. id 5. stat math (3.);
•
•
URL denotes HTTP (protocol)
FIRSTOBS=1 tells SAS to read
observations from the 1st row
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
VIEW A DATA SET
•
PROC CONTENTS displays data
structures of a data set.
PROC CONTENTS DATA=sm.grade2; RUN;
•
PROC MEANS displays summary
statistics of a data set.
PROC MEANS DATA=sm.grade2; RUN;
•
PROC PRINT displays values of
variables in a data set.
PROC PRINT DATA=sm.grade2;
VAR stat math;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
PROC IMPORT 1
•
PROC IMPORT reads external files such as
CSV, Excel, dBase, and Access.
FILENAME airline URL
'http://www.indiana.edu/~statmath/stat
/all/panel/airline.csv';
PROC IMPORT DATAFILE=‘airline’
OUT=sm.airline DBMS=CSV;
GETNAMES=YES;
DATAROW=2;
•
•
DBMS specifies the format of the file.
IMPORT Wizard is more user-friendly in MSWindows. Click File->Import Data.
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
PROC IMPORT 2
•
An example of reading an Excel file.
PROC IMPORT DATAFILE="c:\airline.xls“
OUT=sm.airline2 DBMS=EXCEL2000;
SHEET=“sheet1";
GETNAMES=YES;
•
SHEET specifies a worksheet with a
range of data to be read in an Excel file.
SHEET=“sheet1$A1:G91";
•
•
GETNAMES=YES reads variable
names from the file.
CAUTION: Excel 2007 file should be
saved as 2003 format for SAS 9.13.
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
PROC EXPORT
•
Export to a CSV file. DATA specifies a
data set to be exported.
PROC EXPORT
DATA=sm.airline
OUTFILE="c:\temp\airline.csv“
DBMS=CSV REPLACE;
•
Export to an Excel file. REPLACE
overwrites the existing data set.
PROC EXPORT
DATA=sm.ariline
OUTFILE="c:\temp\airline2.xls“
DBMS=EXCEL2000 REPLACE;
RUN;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
DATA MANIPULATION
•
•
Data sets are manipulated in DATA steps
SET statement loads existing SAS data sets
DATA sm.grade7;
SET sm.grade2;
sum1 = stat+math;
sum2 = SUM(stat, math);
mean1 = (stat+math)/2;
mean2 = MEAN(stat, math);
RUN;
•
The result is stored into grade7 without
changing anything in grade2
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
IF…THEN CLAUSE
•
IF… THEN
IF mean1 GT 90 THEN DELETE;
IF sum1 >= 98 THEN grade=‘A+’;
IF (stat>90 AND math>90) THEN grade=‘A’;
•
IF… THEN… ELSE
IF mean2 >= 98 THEN grade=‘A+’;
ELSE grade=‘A’;
•
IF… THEN… ELSE IF…
IF mean2 >= 98 THEN grade=‘A+’;
ELSE IF mean2>=92 THEN grade=‘A’;
ELSE IF mean2>=88 THEN grade=‘A-’;
ELSE grade=‘B’;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
DO LOOP
• DO…TO… END
DO i=1 TO 100 BY 1;
random=RANNOR(7654321);
OUTPUT sm.randomtest;
END;
•
•
•
•
DO num= 4, 6, 9, 10; …; END;
DO str=‘Mon’, ‘Tue’, ‘Wed’; …; END;
DO WHILE (n<=10); …; END;
DO OVER for an array variable
DO OVER arr; …; END;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
IF & DO… END
•
This combination executes a list of
statements in a block. Imagine { and } in
C and most computer languages.
IF mean2 GT 90 THEN DO;
…;
…;
END;
ELSE DO;
…;
…;
END;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
INPUT & DO… END
•
This combination enables you to read
experiment data in an easy manner.
INPUT drug $ @@;
DO level=‘High’, ‘Medium’, ‘Low’;
INPUT symptom @@;
OUTPUT;
END;
DATALINES;
Drug1 34 55 37
Drug2 84 86 87
Drug3 87 86 24
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
RECODE VARIABLES 1
•
Use operators and functions
DATA sm.grade8;
SET sm.grade2;
random=RANNOR(7654321);
id=_N_; /* serial observation number */
•
IF… THEN and Special Usage
IF age <= 20 THEN age2=0;
ELSE IF age<=30 THEN age2=1;
ELSE IF age<=40 THEN age2=2;
ELSE age2=3;
Age2 = (age>20)+(age>30)+(age>40);
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
RECODE VARIABLES 2
•
•
ARRAY links between variables and
corresponding arrays (temporary vars.)
Suppose trust has 1 through 5 and
needs reverse recoding.
ARRAY arr trust;
DO OVER arr;
arr=6-arr;
END;
ARRAY arr(5) q1-q5;
DO i=1 TO 5;
arr(i)=6-arr(i);
END;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
MULTIPLE RESPONSE
•
•
Suppose respondents are asked to choose
three items and their choices are stored in the
wide form from c1, c2, c3, respectively.
For frequency analysis, data need to be
rearranged in to the long form using OUTPUT.
DATA sm.patient2;
SET sm.patient;
choice=c1; OUTPUT;
choice=c2; OUTPUT;
choice=c3; OUTPUT;
RUN;
•
The number of observations becomes N*3 .
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
RENAME VARIABLES
•
•
Use the DATA statement in DATA Step.
“old variable= new variable”
DATA sm.grade9( RENAME=(sum1=total1) );
SET sm.grade7;
…
RUN;
•
Use the RENAME statement in a DATA
step. “old variable= new variable”
DATA sm.grade10;
SET sm.grade7;
RENAME sum2=total2 mean2=average2;
…
RUN;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
LABEL VARIABLES
•
Variable labels
SET sm.grade2;
LABEL stat=‘Statistics’ math=‘Math’;
•
•
Value labels using PROC FORMAT. PROC
FORMAT;
VALUE yes_L 1=‘Yes’ 0=‘No’;
VALUE $male_L ‘M’=‘Male’ ‘F’=‘Female’;
A period in label name ($male_L.) may
not be omitted in a FORMAT statement.
DATA sm.survey2;
SET sm.survey1;
FORMAT trust yes_L. male $male_L.;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
SELECT OBSERVATIONS
•
•
Use the IF clause
Keep only obs that meet the condition
IF mean2 >= 90;
IF (stat GE 80 OR math GE 80);
•
Delete obs that meet the condition
IF mean2 LT 80 THEN DELETE;
IF (stat<80 AND math <80) THEN DELETE;
•
WHERE in DATA/PROC uses a subset
of a data set w/o changing the data set
WHERE mean2 GE 90;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
SELECT VARIABLES
•
Use either KEEP or DROP statement;
You may not use both in a DATA step.
DATA sm.grade15;
SET sm.grade7;
DROP sum1 mean1;
RUN;
•
In a DATA statement. Do not omit =.
DATA sm.grade16 (KEEP=sum2 mean2);
SET grade7;
DATA sm.grade17 (DROP=sum1 mean1);
SET grade7;
RUN;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
APPEND DATA SETS
•
Use SET with a list of data sets to be
appended.
DATA sm.grade20;
SET sm.grade12 sm.grade13;
•
You may use PROC APPEND or PROC
DATASETS when the master data set is
huge.
PROC APPEND BASE=sm.grade12 DATA=sm.grade13;
RUN;
•
BASE= specifies a master data set and
DATA= a data set to be appended.
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
SPLIT DATA SETS
•
Use DATA statement. Do not omit =
after DROP and KEEP.
DATA sm.grade21 (DROP=math)
sm.grade22(DROP=stat);
SET sm.grade2;
…
•
Use OUTPUT statement in IF…
THEN… ELSE clause.
DATA …
SET sm.grade7;
IF mean2 GT 90 THEN OUTPUT sm.gradeHigh;
ELSE OUTPUT sm.gradeLow;
…
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
SORT DATA SETS
•
Sort data sets before using BY in a
DATA/PROC step (e.g., match merge)
PROC SORT DATA=sm.grade21 OUT=grade23;
BY id;
•
PROC SORT by default sorts in a
ascending order. For a descending
order, add DESCENDING before the
variable name
PROC SORT DATA=sm.grade22 OUT=sm.grade24;
BY id DESCENDING mean2;
RUN;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
MERGE DATA SETS
•
Simple merge using MERGE.
DATA sm.grade25;
MERGE sm.grade21 sm.grade24;
•
Match merge using MERGE and BY.
DATA sm.grade26;
MERGE sm.grade23 sm.grade24;
BY id;
…
•
•
Be careful when data structure is
different across data sets to be merged.
Can be used for updating data sets.
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
UPDATE DATA SETS
•
•
UPDATE updates a data set using a
transaction data set.
Data sets need to be sorted in advance.
DATA sm.surveyNew;
UPDATE sm.survey08 sm.survey10
UPDATEMODE=MISSINGCHECK;
BY id;
•
MISSINGCHECK prevents missing in
the transaction file from replacing values
in a master file.
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
GET AGGREGATE DATA
•
PROC MEANS generates aggregate
statistics into a data set.
PROC MEANS DATA=sm.airline;
VAR cost output;
CLASS airline;
OUTPUT OUT=sm.air_mean1 MEAN=costM outputM;
•
•
Variables costM and outputM are stored
into a new data set air_mean1.
You may specify other statistics as well.
PROC MEANS DATA=sm.airline N SUM MEAN VAR;
VAR cost output fuel load;
CLASS airline;
OUTPUT OUT=sm.air_mean2;
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
PROC DATASETS
•
Display data sets in the library of
interest.
PROC DATASETS LIBRARY=sm;
RUN;
•
The procedure can also manage (e.g.,
copy, rename, repair, delete) SAS files.
University Information Technology Services
Center for Statistical and Mathematical Computing
SAS Data Management
October 3, 2015
RESOURCES
•
•
•
•
•
•
•
•
•
•
•
•
Burlew, Michele M. 2002. Reading External Data Files Using
SAS: Examples Handbook. Cary, NC: SAS Institute.
Long, J. Scott. 2009. The Workflow of Data Analysis Using
Stata. College Station, TX: Stata Press.
http://support.sas.com/documentation/index.html
http://v9doc.sas.com/sasdoc/
http://www.indiana.edu/~statmath/stat/sas/index.html
http://www.indiana.edu/~statmath/support/bydoc/
http://www.masil.org/sas/datastep.html
http://www.masil.org/sas/statement.html
http://www.masil.org/sas/input.html
http://www.masil.org/sas/import.html
http://www.ats.ucla.edu/stat/sas/
http://www.listserv.uga.edu/archives/sas-l.html
University Information Technology Services
Center for Statistical and Mathematical Computing
Descargar

Presentation Title - Indiana University Bloomington