




How to concatenate and interleave SAS data sets using PROC APPEND and the SET statement. These notes cover the differences in behavior when variables have different attributes, lengths, and types. They also discuss the FORCE option and interleaving with the BY statement and the SORT procedure.
Typology: Study notes
When Variables Have Different Attributes

If a variable has different attributes in the BASE= data set than it does in the DATA= data set, the attributes in the BASE= data set prevail. If only the formats, informats, or labels differ, the concatenation succeeds. If the length of a variable is longer in the BASE= data set than in the DATA= data set, the concatenation also succeeds. If the length of the variable is longer in the DATA= data set than in the BASE= data set, or if the same variable is a character variable in one and numeric in the other, PROC APPEND fails to concatenate the files unless you specify the FORCE option.

Using the FORCE option has the following consequences:

- The length specified in the BASE= data set prevails. Therefore, the SAS System truncates values from the DATA= data set that are too long for the length specified in the BASE= data set (and pads shorter values with blanks).
- The type specified in the BASE= data set prevails. The procedure replaces values of the wrong type (all values for the variable in the DATA= data set) with missing values.
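The length rule can be tried on a small case. The following is a minimal sketch, not taken from the original notes: the data sets WORK.MASTER and WORK.NEWRECS and their values are hypothetical, chosen so that NAME is longer in the DATA= data set than in the BASE= data set.

```sas
/* Hypothetical BASE= data set: NAME is a character variable of length 5 */
data work.master;
   length name $ 5;
   input name $;
   datalines;
Ann
Bob
;
run;

/* Hypothetical DATA= data set: NAME has length 10, longer than in BASE= */
data work.newrecs;
   length name $ 10;
   input name $;
   datalines;
Alexander
;
run;

/* Without FORCE this step fails because NAME is longer in DATA=.  */
/* With FORCE the BASE= length (5) prevails and the value is       */
/* truncated to fit.                                               */
proc append base=work.master data=work.newrecs force;
run;

proc print data=work.master;
run;
```

With FORCE, the appended value should be stored as Alexa, since only the first five characters fit into the BASE= length.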
Choosing between PROC APPEND and the SET Statement

If two data sets contain the same variables and the variables have the same attributes, the file that results from concatenating them with PROC APPEND is the same as the file that results from concatenating them with the SET statement. However, PROC APPEND is much faster, especially if the BASE= data set is large, because it does not process the observations that are already in the BASE= data set. When the variables or their attributes don’t match, the two methods behave differently enough that you must consider those differences before you decide which method to use.
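For the matching case, the two methods can be put side by side. This is a sketch under stated assumptions: WORK.SALES_Q1, WORK.SALES_Q2, and their rows are hypothetical and serve only to give both methods identical input.

```sas
/* Hypothetical data sets with the same variables and attributes */
data work.sales_q1;
   input region $ amount;
   datalines;
East 100
West 200
;
run;

data work.sales_q2;
   input region $ amount;
   datalines;
East 150
;
run;

/* SET statement: reads every observation from both data sets */
/* and writes them to a new data set                          */
data work.sales_all;
   set work.sales_q1 work.sales_q2;
run;

/* PROC APPEND: adds the DATA= observations to the end of the */
/* BASE= data set without reading the BASE= observations      */
proc append base=work.sales_q1 data=work.sales_q2;
run;
```

One practical difference is that PROC APPEND changes the BASE= data set itself, whereas the DATA step leaves both inputs untouched and creates WORK.SALES_ALL. After both steps, WORK.SALES_ALL and WORK.SALES_Q1 hold the same three observations.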
Different lengths
  SET statement: If the same variable has a different length in two or more data sets, uses the length from the data set you name first in the SET statement.
  PROC APPEND: Requires the FORCE option if the length of a variable is longer in the DATA= data set. Truncates the values of the variable to match the length in the BASE= data set.

Different types
  SET statement: Doesn’t concatenate.
  PROC APPEND: Requires the FORCE option to concatenate. Uses the type from the BASE= data set and assigns missing values to the variable in observations from the DATA= data set.
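The SET statement's length rule can be sketched the same way. The data sets WORK.SHORT and WORK.LONG below are hypothetical. The second DATA step uses a LENGTH statement before SET, which is a common workaround rather than something these notes prescribe.

```sas
/* Hypothetical data sets: CITY is $4 in one and $9 in the other */
data work.short;
   length city $ 4;
   city = 'Rome';
run;

data work.long;
   length city $ 9;
   city = 'Barcelona';
run;

/* WORK.SHORT is named first, so CITY gets length 4 and the */
/* longer value is truncated to 'Barc'                      */
data work.both;
   set work.short work.long;
run;

/* Declaring the length before SET sets CITY to $9, so no   */
/* value is truncated                                       */
data work.both_full;
   length city $ 9;
   set work.short work.long;
run;
```

Naming WORK.LONG first in the SET statement would also avoid the truncation, at the cost of changing the order of the observations.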
***** Interleaving SAS Data Sets *****

Interleaving combines individually sorted SAS data sets into one sorted data set, using a SET statement and a BY statement. The number of observations in the new data set is the sum of the number of observations in the original data sets.

This section covers:

- How to use the BY statement
- How to sort data sets to prepare for interleaving
- How to use the SET and BY statements together to interleave observations

Using BY-Group Processing

The BY statement specifies the variable or variables by which you want to interleave the data sets. To understand this, we first review BY variables, BY values, and BY groups.

BY variable – a variable named in the BY statement.
BY value – the value of a BY variable.
BY group – all observations with the same value for all BY variables. In discussions of interleaving, BY groups commonly span more than one data set. If you use more than one variable in a BY statement, a BY group is a group of observations with a unique combination of values for those variables.
We have two data sets (from two divisions), each containing the following variables:

- Project is a unique code that identifies the project.
- Dept is the name of the department involved in the project.
- Manager is the name of the manager of the department.
- Headcoun is the number of people working for the manager on the project.
(See program SAS_Create_interleav_randd.sas)
Note: Data set randd is already sorted by PROJECT.

(See program SAS_Create_interleav_pubs.sas)
Note: Data set pubs has its variables in a different order and is not sorted by PROJECT.

We want to combine the data sets by PROJECT so that the new data set shows the resources that both divisions are devoting to each project. Both data sets must be sorted by PROJECT before you can interleave them.
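Because pubs is not yet sorted by PROJECT, it needs a sort step before interleaving. A minimal sketch follows, assuming the INTLEAVE libref has already been assigned by the programs referenced above. Sorting in place is a choice made here for brevity, not something the notes specify.

```sas
/* Assumes the INTLEAVE libref is already assigned.            */
/* Sorts INTLEAVE.PUBS in place by PROJECT; adding an OUT=     */
/* option would write the sorted copy to a different data set. */
proc sort data=intleave.pubs;
   by project;
run;
```

The order of the variables within pubs does not need to change. Only the order of the observations matters for the BY statement.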
Interleaving the Data Sets

To interleave the data set INTLEAVE.RANDD and the data set INTLEAVE.PUBS, use the SET statement and the BY statement as follows:

    data randdpub;
       set intleave.randd intleave.pubs;
       by project;
    run;

(See program SAS_Interleave_randd-pubs.sas)

Note: We did not have to sort INTLEAVE.RANDD because it was already sorted by PROJECT; otherwise, we would have had to sort it first.
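To see the effect of interleaving without the original files, the same pattern can be run on stand-in data. The rows below are hypothetical, since the contents of INTLEAVE.RANDD and INTLEAVE.PUBS are not shown in these notes. Both stand-in data sets are entered already sorted by PROJECT.

```sas
/* Hypothetical stand-ins for the two divisions' data sets, */
/* both entered in PROJECT order                            */
data work.randd;
   input project $ dept $ headcoun;
   datalines;
P100 Design 4
P200 Testing 3
;
run;

data work.pubs;
   input project $ dept $ headcoun;
   datalines;
P100 Editing 1
P200 Writing 2
;
run;

/* Interleave: SET names both data sets, BY names the sort key */
data work.randdpub;
   set work.randd work.pubs;
   by project;
run;

proc print data=work.randdpub;
run;
```

The result has four observations, the sum of the two inputs. Within each PROJECT value, the observation from WORK.RANDD comes before the one from WORK.PUBS, because WORK.RANDD is named first in the SET statement.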