Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Interacting with External Files in SAS: A Deep Dive into FILENAME Statement, Study notes of Finance

The FILENAME statement in SAS, which enables data access from external files. It covers various data layouts, such as fixed width, delimited, and hierarchical files, as well as writing to external files and interacting with FTP servers. Additionally, it discusses advanced topics like pipes, URL access, and SOAP web services.

What you will learn

  • What are the advantages of using the URL access method with the FILENAME statement?
  • How can you use pipes with the FILENAME statement?
  • How do you write to external files using the FILENAME statement?
  • What is the role of the FILENAME statement in interacting with FTP servers?
  • What are the different layouts for reading data into SAS using the FILENAME statement?

Typology: Study notes

2021/2022

Uploaded on 09/27/2022

astarloa
astarloa 🇺🇸

4.2

(12)

299 documents

1 / 29

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Paper 119-2012
The FILENAME Statement: Interacting with the world outside of SAS®
Chris Schacherer, Clinical Data Management Systems, LLC
ABSTRACT
The FILENAME statement has a very simple purposecreating a symbolic link to an external file or device. The
statement itself does not process any data, specify the format or shape of a data set, or directly produce output of any
type; yet, this simple statement is an invaluable construct that allows SAS programs to interact with the world outside
of SAS. Through specification of the appropriate device type, the FILENAME statement allows you to symbolically
reference external disk files, interact with FTP servers, send e-mail messages, and integrate data from external
programs and processesincluding the local operating system and remote web services. The current work provides
examples of how you can use the different device types to perform a variety of data management tasks.
INTRODUCTION
The wide array of data access methods available to the SAS programmer today has resulted in a decrease in the
number of new SAS programmers who explore the FILENAME statement. In fact, in one recent survey, only 14% of
users reported using the FILENAME statement as a means of accessing their source data (Milum, 2011). Whereas
this decline has come about due, largely, to an improved toolset that makes the majority of data access tasks more
efficient, this increased efficiency has come at the cost of new SAS users not being exposed to this powerful tool.
The current work attempts to close this knowledge gap by introducing the reader to the FILENAME statement,
providing some examples of frequently used techniques for manipulating flat files with the INFILE and FILE
statements, and demonstrating several techniques for leveraging the FILENAME statement against common data
management tasks.
To begin this exploration, consider how the FILENAME statement was described in The SAS Language Guide for
Personal Computers (Release 6.03 Edition) (SAS, 1988):
The FILENAME statement associates a SAS fileref (a file reference name) with an
external file's complete name (directory plus file name). The fileref is then used as a
shorthand reference to the file in the SAS programming statements that access external
files (INFILE, FILE, and %INCLUDE). Associating a fileref with an external file is also
called defining the file. Use the FILENAME statement to define a file before using a fileref
in an INFILE, FILE, OR %INCLUDE statement.
In short, the FILENAME statement creates a symbolic link to an external file (or device) that, in turn, can be used to
refer to that file (or device) elsewhere in the SAS program. In the following code, the file reference "claims" is defined
as the physical file "\\finance\analytics\report\source\2011_07.txt". When this fileref is later encountered in the INFILE
statement, SAS interprets the symbol "claims" as the fully qualified path and file name defined in the FILENAME
statement.
FILENAME claims DISK '\\finance\analytics\report\source\2011_07.txt';
DATA work.claims_2011_07;
INFILE claims; {interpreted as: INFILE '\\finance\analytics\report\source\2011_07.txt';}
INPUT svc_date mmddyy10. account_no $10. principle_dx $8. billed_charges;
RUN;
READING & WRITING EXTERNAL DISK FILES
Far and away the most frequent use of the FILENAME statement is the reading and writing of external disk files. In
the preceding example the file "2011_07.txt" was referenced in the FILENAME statement and subsequently read into
the SAS data file "claims_2011_07" using the INFILE and INPUT statements
1
. Although an exhaustive discussion of
using INFILE and INPUT statements to read external files is beyond the scope of this discussion, a few examples are
provided here. For more in-depth explanation of the INFILE and INPUT statements, the reader is referred to Aster
and Seidman (1997) and Kolbe (1997).
1
The DISK device type was explicitly specified in this filename statement, but because DISK is the default access method, this
keyword can be omitted when reading/writing external disk files.
Data Mana
g
ement
SAS Global Forum 20
1
2
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d

Partial preview of the text

Download Interacting with External Files in SAS: A Deep Dive into FILENAME Statement and more Study notes Finance in PDF only on Docsity!

Paper 119-

The FILENAME Statement: Interacting with the world outside of SAS®

Chris Schacherer, Clinical Data Management Systems, LLC

ABSTRACT

The FILENAME statement has a very simple purpose—creating a symbolic link to an external file or device. The statement itself does not process any data, specify the format or shape of a data set, or directly produce output of any type; yet, this simple statement is an invaluable construct that allows SAS programs to interact with the world outside of SAS. Through specification of the appropriate device type, the FILENAME statement allows you to symbolically reference external disk files, interact with FTP servers, send e-mail messages, and integrate data from external programs and processes—including the local operating system and remote web services. The current work provides examples of how you can use the different device types to perform a variety of data management tasks.

INTRODUCTION

The wide array of data access methods available to the SAS programmer today has resulted in a decrease in the number of new SAS programmers who explore the FILENAME statement. In fact, in one recent survey, only 14% of users reported using the FILENAME statement as a means of accessing their source data (Milum, 2011). Whereas this decline has come about due, largely, to an improved toolset that makes the majority of data access tasks more efficient, this increased efficiency has come at the cost of new SAS users not being exposed to this powerful tool. The current work attempts to close this knowledge gap by introducing the reader to the FILENAME statement, providing some examples of frequently used techniques for manipulating flat files with the INFILE and FILE statements, and demonstrating several techniques for leveraging the FILENAME statement against common data management tasks.

To begin this exploration, consider how the FILENAME statement was described in The SAS Language Guide for Personal Computers (Release 6.03 Edition) (SAS, 1988):

The FILENAME statement associates a SAS fileref (a file reference name) with an external file's complete name (directory plus file name). The fileref is then used as a shorthand reference to the file in the SAS programming statements that access external files (INFILE, FILE, and %INCLUDE). Associating a fileref with an external file is also called defining the file. Use the FILENAME statement to define a file before using a fileref in an INFILE, FILE, OR %INCLUDE statement.

In short, the FILENAME statement creates a symbolic link to an external file (or device) that, in turn, can be used to refer to that file (or device) elsewhere in the SAS program. In the following code, the file reference "claims" is defined as the physical file "\finance\analytics\report\source\2011_07.txt". When this fileref is later encountered in the INFILE statement, SAS interprets the symbol "claims" as the fully qualified path and file name defined in the FILENAME statement.

FILENAME claims DISK '\finance\analytics\report\source\2011_07.txt';

DATA work.claims_2011_07; INFILE claims; {interpreted as: INFILE '\finance\analytics\report\source\2011_07.txt';} INPUT svc_date mmddyy10. account_no $10. principle_dx $8. billed_charges; RUN ;

READING & WRITING EXTERNAL DISK FILES

Far and away the most frequent use of the FILENAME statement is the reading and writing of external disk files. In the preceding example the file "2011_07.txt" was referenced in the FILENAME statement and subsequently read into the SAS data file "claims_2011_07" using the INFILE and INPUT statements^1. Although an exhaustive discussion of using INFILE and INPUT statements to read external files is beyond the scope of this discussion, a few examples are provided here. For more in-depth explanation of the INFILE and INPUT statements, the reader is referred to Aster and Seidman (1997) and Kolbe (1997).

(^1) The DISK device type was explicitly specified in this filename statement, but because DISK is the default access method, this

keyword can be omitted when reading/writing external disk files.

The FILENAME Statement: Interacting with the world outside of SAS®, continued

FIXED WIDTH, UNIFORM LAYOUT

One of the fundamental characteristics that determine the appropriate approach for reading a flat file into SAS is the manner in which the variables are positioned on the data record. “Fixed-Width” data records contain data elements that are located in defined, absolute (i.e., “fixed”) positions on the record; “Delimited” records are those in which values are written one after the other with a delimiter (such as a comma or a tab) defining their relative position on the record. In the case of the medical claims file "2011_07.txt" in the preceding example, the values of the variables are identified in terms of their fixed positions within the data record.

svc_date account_no principle_dx billed_charges 07/01/2011 VGH3344562 V712.4 $234. 07/01/2011 XWY3928957 325.4 $34. |||||||||||||||||||||||||||||||||||||||||| 1 10 20 30 40

The value of "svc_date", for example, lies between positions 1 and 10, "account_no" between positions 13 and 22, etc. When this file is read by the following DATA step, the INFILE statement fetches each data record from the file into the input buffer. The INPUT statement then parses the data into values associated with the listed variables using explicitly defined INFORMATS; "svc_date" is identified to SAS as representing a numeric date in the format of MM/DD/YYYY (or, mmddyy10.), "account_no" is indicated to be a 12-character text string, and "billed_charges" is identified as a numeric value that is being represented in the external file using the dollar12. format.

FILENAME claims DISK '\finance\analytics\report\source\2011_07.txt';

DATA work.claims_2011_07; INFILE claims FIRSTOBS= 2 ; INPUT svc_date mmddyy10. account_no $ 12. prin_dx $ 8. billed_charges dollar12.; RUN ;

Once the DATA step completes, the new dataset "work.claims_2011_07" contains "svc_date" as a SAS date, "billed_charges" as a numeric value, and "account_no" and "prin_dx" as character data.

This form of the INPUT statement is referred to as LIST ENTRY because the variables are simply listed in the order they are encountered on the data record. An alternative form of the input statement (one that is strongly recommended for fixed-width files) is to use an explicit column pointer (@) followed by the position at which the variable is located, the variable name, and the INFORMAT used to read the data into that variable. This method of declaring variables in the INPUT statement is referred to as COLUMN ENTRY. Using column entry, the previous example might be rewritten as follows:

FILENAME claims DISK '\finance\analytics\report\source\2011_07.txt';

DATA work.claims_2011_07; INFILE claims FIRSTOBS= 2 ; INPUT @ 1 svc_date mmddyy10. @ 13 account_no $10. @ 25 prin_dx $6. @ 31 billed_charges dollar12.; RUN ;

In addition to forcing you to confirm your knowledge of the data structure before reading in the data, this method also facilitates validation of your program against data mapping documentation supplied with the file. By explicitly identifying the location and INFORMAT of each variable being read, the intentions of your program are made explicit and errors in the INPUT statement can be detected more readily.

The preceding two examples also introduce an important OPTION to the INFILE statement—FIRSTOBS. FIRSTOBS identifies the row at which the INFILE statement begins reading data from the file. Because the first record on the file "2011_07.txt" contains variable names, FIRSTOBS is set to "2".

The FILENAME Statement: Interacting with the world outside of SAS®, continued

svc_date,account_no,principle_dx,billed_charges,...,city_state 07/01/2011,VGH3344562,V712.4,$234.55,…,"LaCrosse, WI" 07/01/2011,XWY3928957,325.4,$34.55,…,"Mount Horeb, WI" ||||||||||||||||||||||||||||||||||||...||| 1 10 20 30 ...

If no value is provided for LRECL, the values for "city_state" in the following two records would (by default) be truncated at the 256 th character. DATA work.claims_2011_07; INFILE claims FIRSTOBS= 2 DLM =',' DSD MISSOVER; INPUT svc_date :mmddyy10. account_no :$10. city_state :$25.; RUN ;

With the LRECL option assigned to accommodate the longest record in the file, the entire line of data will be read into the (expanded) buffer and the full values of "city_state" will be assigned.

DATA work.claims_2011_07; INFILE claims FIRSTOBS= 2 DLM =',' DSD LRECL= 500 MISSOVER; INPUT svc_date :mmddyy10. account_no :$10. city_state :$25.; RUN ;

END-OF-LINE OPTIONS - If the INPUT statement reaches the end of a record without having assigned all variables a value (even a “null” value), the default behavior is for the INPUT statement to continue reading data by proceeding to the next record and looking for values to assign to the remaining variables in the "current" record—the FLOWOVER option. In the following example, the first record of the dataset has no value for "city_state". If the default FLOWOVER option is allowed to control the reading of the data from this dataset, INPUT will continue to the second line and read "07/01/2011" as the value of "city_state".

svc_date,account_no,principle_dx,billed_charges,city_state 07/01/2011,VGH3344562,V712.4,$234. 07/01/2011,XWY3928957,,,"Mount Horeb,WI"

This is rarely the desired behavior. To correct this situation, you can use the MISSOVER option, which keeps the INPUT statement from reading data from the next line and, instead, assigns missing values to all variables that do not yet have a value assigned at the time the end of the data line is reached. Similarly, when reading delimited data (or list-entry fixed-width data), the TRUNCOVER option generates the same result 2

. The STOPOVER option, on the other hand, raises an error and ends the DATA step if the end of a record is reached before all variables have been assigned a value.

To process the preceding dataset without error, the MISSOVER option is used—resulting in the first record having a missing value for "city_state".

DATA work.claims_2011_07; INFILE claims FIRSTOBS= 2 DLM =',' DSD LRECL= 500 MISSOVER; INPUT svc_date :mmddyy10. account_no :$10. prin_dx :$6. billed_charges :dollar12. city_state :$25.; RUN ;

(^2) See Cates (2001) for in-depth explanation of situations in which these options differ.

The FILENAME Statement: Interacting with the world outside of SAS®, continued

HIERARCHICAL FILES

The previous examples of reading data from a physical file assume that the file has a consistent structure across all lines of data. However, source system files can take on many forms that violate this assumption. One common variant is the hierarchical file—in which header records containing a full complement of identifying information are followed by subordinate records in which data elements from the header are implied by the order of the records on the file.

member svc_date account_no cpt charge

Jones, Mary 07/02/2011 VGH3344562 73564 410. 99214 150. Smith, Michael 07/02/2011 XWY3928957 90761 192. 82565 25. 82310 18. ||||||||10||||||||20||||||||30||||||||40||||||||50||||||||60|

If you were to read the preceding file using the methods demonstrated in the previous examples, only two of the five rows of data (the header records) would contain the member name, service date, and account number—making programmatic summarization of these data by member, service date, or account number impossible.

FILENAME clm_dtl DISK 'C:_CDMS\SGF 2012\filename_Data\hierarchy.txt';

DATA work.claim_detail; INFILE clm_dtl FIRSTOBS= 2 ; INPUT @ 1 member $18. @ 20 svc_date mmddyy10. @ 32 account_no $10. @ 46 cpt $5. @ 52 billed_charges best12.; RUN ;

The missing data elements in the subordinate rows need to be populated with data from the corresponding header row; this can be achieved using the RETAIN statement. The RETAIN statement allows you to define a variable such that it will "retain" its value across records—until you assign the retained variable a new value. The following code takes advantage of this characteristic of "retained" variables to populate the "member", "svc_date", and "account_no" variables in the "claim_detail" dataset. First, the RETAIN statement specifies the variables that will retain their values. Then, as each record is processed, the value of "member" is assessed to determine if it has a value. If the current record has a non-null value for “member”, the values of the retained variables “member_r”, "svc_date_r", and "account_no_r" are assigned the values of "member", "svc_date", and "account_no", respectively, from that record. Once assigned a value, these retained variables carry that value forward from record to record until a new value is assigned—which occurs, in this example, when the next header record is encountered. When records with null values of "member" (i.e., the subordinate rows) are encountered, the retained values are used to assign values to "member", "svc_date", and "account_no".

FILENAME clm_dtl DISK '\finance\analytics\report\source\2011_07_DETAIL.txt';

DATA work.claim_detail; RETAIN member_r svc_date_r account_no_r; INFILE clm_dtl FIRSTOBS= 2 ;

The FILENAME Statement: Interacting with the world outside of SAS®, continued

WRITING TO EXTERNAL FILES

Finally, in addition to reading from external files, one can also write to a fileref using FILE and PUT statements. In the following example, the "claim_detail" dataset is written to "c:\test.txt" as a comma-delimited file with the "member" field excluded. It should be noted here that not only is it easy to control which columns are included in the output file (by including the variable name in the PUT statement), but you can also rearrange the order of the variables in the new file by placing them in the desired output order in the PUT statement.

FILENAME clm_dtl DISK 'c:\test.txt';

DATA NULL;

FILE clm_dtl DLM=','; SET work.claim_detail; PUT account_no svc_date :MMDDYY10. cpt billed_charges :DOLLAR12.2; RUN ;

Similarly, using column pointers to place variables at specific locations on the file record, you could create a new fixed-width text file.

FILENAME clm_dtl DISK 'c:\test.txt';

DATA NULL;

FILE clm_dtl; SET work.claim_detail; PUT @10 account_no @30 svc_date mmddyy 10. @50 cpt @70 billed_charges dollar10.2; RUN ;

This type of manipulation is often required for the submission of flat file datasets to regulatory agencies or vendors that require specific file layouts for input to their data processing systems.

VGH3344562 07/02/2011 73564 $410. VGH3344562 07/02/2011 99214 $150. XWY3928957 07/02/2011 90761 $192. XWY3928957 07/02/2011 82565 $25. XWY3928957 07/02/2011 82310 $18. ||||||||10||||||||20||||||||30||||||||40||||||||50||||||||60||||||||70|||||||

INTERACTING WITH FTP SERVERS

The examples in the previous section will help you get started reading and writing files to/from your local computer or LAN directory, but enterprise-level files are sometimes made available only through a central FTP server. Although obtaining these files via your favorite FTP client is usually a relatively simple task, the FILENAME statement's FTP access method can be used to fully automate the retrieval and processing of these files.

In the following example, the "claims" fileref is specified as a pointer to the file "2011_07.txt" via the FTP access method. When the fileref is referenced later in the program, SAS will attempt to establish an FTP connection to the server (HOST) "cdms-llc.com", using the credentials of user "chris". Once connected to the FTP server, the CD option is used to change the working directory on the FTP server to the directory containing the source file

The FILENAME Statement: Interacting with the world outside of SAS®, continued

“20 1 1_07.txt” 3. Finally, the PROMPT and DEBUG options are included, respectively, to issue a prompt for entry of the user's password and to write informational messages about the FTP transaction to the SAS log.

FILENAME claims FTP '2011_07.txt' USER = 'chris' HOST = 'cdms-llc.com' CD = "ftproot\public" PROMPT DEBUG;

Once the fileref is defined, you are ready to use it in a DATA step. Note in the following example that the DATA step also creates a new variable "discount_rate" based on the conditional processing of a variable being read by the INPUT statement. This demonstrates that as soon as a variable is defined in the INPUT statement, it exists as a variable on the current record and can be used in other transformations performed in the DATA step.

DATA monthly_claims; INFILE claims FIRSTOBS= 2 MISSOVER LRECL= 300 ; INPUT @ 1 svc_date mmddyy10. @ 13 account_no $10. @ 25 prin_dx $6. @ 273 billed_charges dollar12.;

IF billed_charges ne. THEN discount_rate = .8 *billed_charges; ELSE discount_rate = 0. ; RUN ;

Once the fileref is accessed by the INFILE statement, the specified FTP connection is attempted and the user will be prompted with a dialog box asking for the password associated with the USER account. After the password is provided and authenticated by the HOST, the connection is completed. At this point the CD command is executed— navigating to the directory containing the source file—and retrieval of the file begins. The data lines are then read sequentially from the input buffer and are parsed into values that are assigned to the defined variables. Because the DEBUG option was specified, the LOG includes the following messages describing the connection and data transfer:

NOTE: >>> USER chris NOTE: <<< 331 Password required for chris. NOTE: >>> PASS XXXXXXXXXXXX NOTE: <<< 230-Welcome to FTP server NOTE: <<< 230 User chris logged in. NOTE: >>> PORT 192,168,1,105,213, NOTE: <<< 200 PORT command successful. NOTE: >>> TYPE A NOTE: <<< 200 Type set to A. NOTE: >>> CWD ftproot\public
NOTE: <<< 250 CWD command successful. NOTE: >>> PWD NOTE: <<< 257 "/chris/ftproot/public" is current directory. NOTE: >>> RETR 2011_07.txt NOTE: <<< 150 Opening ASCII mode data connection for 2011_07.txt(372 bytes). NOTE: User chris has connected to FTP server on Host cdms-llc.com. NOTE: The infile CLAIMS is: Filename=2011_07.txt, Pathname= "/chris/ftproot/public" is current directory, Local Host Name=M2400, Local Host IP addr=192.168.1.100, Service Hostname Name=cmds-llc.com, Service IP addr=64.78.48.48,Service Name=FTP, Service Portno=21,Lrecl=300,Recfm=Variable NOTE: <<< 226 Transfer complete. NOTE: >>> QUIT

(^3) It should be noted that the FTP options require syntax specific to the OS and file system hosting the FTP server. As such, you

should work with the administrator of the FTP server if you encounter problems with the FTP options sent by your fileref.

The FILENAME Statement: Interacting with the world outside of SAS®, continued

Once the macro variable is assigned the value of the password, it can be assigned as the value of "PASS", and the password is no longer shared in the .sas file.

FILENAME claims FTP '2011_07.txt' USER='chris' PASS=&PASSWORD HOST = 'schacherer.com' CD ="ftproot\public" DEBUG;

The previous FTP example involved reading data from a single file on a single server. However, when creating analytic datasets, the product of your data processing sometimes needs to be sent to another location for further processing (e.g., loaded into a business intelligence system, sent to a third-party data aggregator, or reported to a regulatory agency). In the following example, two filerefs ("source" and "target") based on the FTP access method are defined. Following assignment of the filerefs, a DATA NULL step is executed to read the external file "source", perform a series of data transformations, and write a comma-delimited file out to the file "target".

FILENAME source FTP '2011_07.txt' USER='chris' PASS=&PASSWORD HOST = 'cdms-llc.com' CD ="ftproot\public" DEBUG; FILENAME target FTP 'HOSPITALXYZ_QUALRPT_2011_07.txt' USER='chris' PASS=&PASSWORD HOST = '' CD ="reports\hospital\XYZ" DEBUG;

DATA NULL;

INFILE source FIRSTOBS= 2 MISSOVER LRECL= 300 ; INPUT @ 1 svc_date mmddyy10. @ 13 account_no $10. @ 25 prin_dx $6. @ 273 billed_charges dollar12.; IF billed_charges ne. THEN discount_rate = .8 *billed_charges; ELSE discount_rate = 0. ; … … FILE target DLM=','; PUT account_no svc_date :mmddyy10. cpt billed_charges :dollar12.2; RUN ;

If you wanted to automate this program to run each month as a scheduled batch job, you could change the program in the following manner to enable it to run on the first day of each month without having to edit the program to change the source file name. In this example, it is assumed that the source file is written on the last day of each month with the filename in the format of "YYYY_MM". Therefore, using the appropriate nested functions, you could dynamically specify the name of the source file to be read. When run in September, 2011 the concatenated macro variables in the FILENAME statement resolve to the name of the source file generated at the end of the previous month— "2011_08.txt".

%LET SOURCE_FILE1 = %SYSFUNC(YEAR(%SYSFUNC(INTNX(MONTH,%SYSFUNC(TODAY()),-1))))_; %LET SOURCE_FILE2 = %SYSFUNC(MONTH(%SYSFUNC(INTNX(MONTH,%SYSFUNC(TODAY()),-1))),Z2.).txt;

FILENAME source FTP "&source_file1&source_file2" USER='chris' PASS=&PASSWORD HOST = 'schacherer.com' CD ="ftproot\public" DEBUG;

Whether you use macro variables to dynamically assign filenames, directory locations, or other parts of the fileref or hard-code values and manage changes to them manually, the FTP access method can significantly improve the efficiency with which you process data files stored on FTP servers. It should be noted, however, that because the FTP servers with which you will be interacting are often on different hardware/software platforms than the one on which your SAS software is operating, you may encounter unfamiliar errors referring to system constructs you do not understand. When that happens, work with the administrator of the FTP server to troubleshoot the issue; he or she has expertise on the particular platform to which you are attaching and will likely be able to help you debug your code.

In the previous examples the FTP device type was used to connect to remote FTP servers, issue FTP commands, and retrieve files for processing in SAS. Similarly, the PIPE command can be used to initiate a wide array of processes and capture the data returned by those processes.

The FILENAME Statement: Interacting with the world outside of SAS®, continued

PIPES

"A pipe is a channel of communication between two processes." (SAS, 2011b) An unnamed pipe establishes one- way communication between two processes—wherein one process writes to the pipe and the other reads from it. A named pipe enables two-way communication wherein data can pass in both directions. The examples provided here focus on unnamed pipes in which SAS invokes a process outside of SAS and then reads the data written to the pipe by the invoked process. The following example (adapted from Varney, 2008) shows how to use the PIPE device type to create a dataset containing the hierarchical directory structure of the directory "C:\ETL". The fileref "contents" is defined in terms of a call to the DOS command "TREE". When the INFILE statement initiates retrieval of the fileref "contents" the DOS command "TREE C:\ETL" is executed with the switches "/F" and "/A". Instead of returning these data to a DOS command window, as would happen if you were executing this command from the command line, the data are returned through the communication channel (the pipe) defined in the filename statement.

FILENAME contents PIPE 'TREE "C:\ETL" /F /A' LRECL= 2000 ;

DATA etl_source; INFILE contents TRUNCOVER; INPUT content_entry $char2000.; RUN ;

You could use this information (for example) in an automated program to check for the existence of a given file and conditionally stop the program if the file is not found (Flavin & Carpenter, 2001; Schacherer, 2010). Having read the directory contents with the preceding DATA step, the following code queries the "etl_source" dataset to determine whether the "C:\ETL" directory contains the source file your program will later attempt to process. The result of this query is assigned to the macro variable "file_exists", and the value of "file_exists" can then be evaluated to determine the appropriate course of action.

PROC SQL NOPRINT; SELECT count(*) INTO :file_exists FROM etl_source WHERE compress(content_entry) = "2011_08.txt"; QUIT ;

If the source file was found in the "etl_source" dataset, the macro variable "file_exists" will have a value of "1" and the outcome of the following DATA NULL step will be the assignment of a null value to the macro variable "terminator". When "&terminator" is encountered later in the program, it will resolve to a null, nonexistent value that is not interpreted as executable SAS code. If, on the other hand, the file was not found in the query of "etl_source" (i.e., "file_exists" = 0), "terminator" is assigned the value "ENDSAS;", and when "&terminator" resolves following the DATA step, the SAS session is terminated.

DATA NULL; IF &file_exists = 1 THEN CALL SYMPUT('terminator',''); ELSE CALL SYMPUT('terminator','ENDSAS;'); RUN ;

&terminator

The FILENAME Statement: Interacting with the world outside of SAS®, continued

DATA NULL;

IF &file_size < 500 THEN CALL SYMPUT('terminator','%PUT Error-Source file too small to continue; ENDSAS;'); ELSE CALL SYMPUT('terminator',''); RUN ;

&terminator

Finally, in addition to using PIPE to execute commands that generate information consumed by SAS, the PIPE device type can be used to execute commands that impact the system on which SAS is running. As demonstrated by Varney (2008) and Wei (2009), filerefs indicating the PIPE device type can be used to issue MKDIR commands to create new directories in either a static or dynamic manner. In the following example, a new directory "c:\transformed claims" is created so that it can be used to save a copy of a SAS dataset from the WORK library to a persistent storage area.

FILENAME new_dir PIPE 'MKDIR "c:\transformed claims"' ;

DATA NULL;

INFILE new_dir; RUN ;

LIBNAME persist 'c:\transformed claims';

DATA persist.claims_2011_ 08 ; SET work.claims_2011_ 08 ; RUN ;

These examples merely scratch the surface of what can be achieved with the PIPE device type. The ability to interact with other programs and processes opens up a number of possibilities for developing creative SAS solutions. Another device type that allows you to do some very creative things with SAS is the EMAIL device type.

E-MAILING FROM SAS

Using the EMAIL device type, you can send "success/failure" messages at the end of long-running processes, keep track of multi-stage processes by sending status messages following milestone events in your programs, and automate the delivery of reports and output. For example, you could put the following code at the end of your SAS program to notify you when the program has finished. In this example, the fileref "job_done" is defined with the EMAIL access method, and the options "to" and "subject" are added to define the recipient of the e-mail and the subject line. When referenced in the DATA NULL step, the fileref, by default, invokes the installed MAPI client to send the e-mail using the default user account. The PUT statements that follow the FILE statement define the content of the e-mail message, and the message is queued for delivery when the RUN statement is encountered.

FILENAME job_done EMAIL to="cschacherer@cdms-llc.com" subject="Job XYZ Finished"; DATA NULL; FILE job_done; PUT "Job XYZ has finished running."; PUT " "; PUT "Please review the .LOG file for job details"; RUN ;

The delivered message is depicted below in the recipient's e-mail inbox.

The FILENAME Statement: Interacting with the world outside of SAS®, continued

As previously mentioned, the EMAIL access method defaults to using your system's MAPI client. Unfortunately, for many users, programmatically invoking your e-mail client will result in a dialog box similar to the following:

The user must then explicitly "Allow" the e-mail message to be sent; which is fine if you are working with your SAS session interactively, but causes significant problems if your intention is to schedule the SAS job to run unattended. In the case of unattended SAS programs, you will want to change the option specifying the e-mail system to use when sending e-mail from SAS. By specifying SMTP as your default e-mail system, you bypass the MAPI client and access you SMTP server directly—essentially, generating the e-mail directly on the server instead of sending it through your MAPI client. An example of this approach is provided below.

OPTIONS EMAILSYS="SMTP" EMAILHOST="mail..com" EMAILPORT= 25 EMAILID="chris@.com" EMAILPW=mysecretpassword;

Now when the "job_done" fileref is defined with the EMAIL access method, it is referencing SMTP as the e-mail system, instead of MAPI. It will use the e-mail credentials specified in EMAILID and EMAILPW to connect to the SMTP server through port 25 and send the e-mail without going through the e-mail client and requiring manual intervention.

FILENAME job_done EMAIL to="cschacherer@cdms-llc.com" subject="Job XYZ Finished"; DATA NULL; FILE job_done; PUT "Job XYZ has finished running."; PUT " "; PUT "Please review the .LOG file for job details"; RUN ;

The FILENAME Statement: Interacting with the world outside of SAS®, continued

FILENAME job_done EMAIL to="cschacherer@cdms-llc.com" subject="Job XYZ Finished";

When established as part of the fileref definition, these options serve as the default values for emails generated by invoking that fileref. However, EMAIL options can also be assigned in the FILE statement, and options indicated in the FILE statement override those defined in the FILENAME statement. In the following example, even though the "job_done" fileref specifies "cschacherer@cdms-llc.com" as the recipient, the e-mail will be sent to "admin@cdms- llc.com" as specified in the FILE statement.

DATA NULL; FILE job_done to="admin@cdms-llc.com" subject="Job 123 Finished"; PUT "Job XYZ has finished running."; PUT " "; PUT "Please review the .LOG file for job details"; RUN ;

Similarly, the !EM_! notation embedded in a PUT statement can be used to override options specified in the FILENAME and FILE statements. This notation can also be used to control the e-mail activity being generated by the DATA step. The following example (adapted from Tilanus, 2008 & SAS, 2011c), depicts a dataset of monthly healthcare claim totals for a set of client companies. Each record in the dataset contains the company name, the e- mail address of a contact person at that company, and the healthcare claims total for the month. Using the PUT statement and the !EM_! notation, an e-mail containing the total claims for a given company will be sent to that company's contact.

As the dataset "claim_total_list" is processed in the following DATA NULL step, each invocation of the "claims" file reference will generate an e-mail for which the "subject" and "to" attributes are dynamically assigned using the values of "company" and "email_address" in the current record, and the body of each e-mail will dynamically change to include the "claim_total".

FILENAME claims EMAIL;

DATA NULL ;

SET claim_total_list END=last_record; FORMAT claim_total dollar12.; FILE claims ; PUT '!EM_SUBJECT! Monthly Claim Total for Client:' company; PUT '!EM_TO!' email_address; PUT 'Total Claims for the Month: ' claim_total; PUT '!EM_SEND!'; PUT '!EM_NEWMSG!'; IF last_record THEN PUT '!EM_ABORT!'; RUN ;

!EM_SEND! and !EM_NEWMSG! play an important role in generating multiple e-mail messages from the "claim_total_list" dataset. Without these two directives, only one e-mail would be sent; after all of the records have been processed, a single e-mail would be sent to the e-mail address listed on the last record and the subject line of the e-mail would include the name of the company on that last record. As each record is encountered in the DATA step, the !EM_TO! and !EM_SUBJECT! directives reassign the value of the "subject" and "to" e-mail attributes, but by default no e-mail message is sent until the DATA step completes. Interestingly, however, the PUT statements are still being executed for each record in the dataset; so, in addition to the "to" and "subject" attributes being reassigned for each record, a line of text stating "Total Claims for the Month: …" is being generated to the body of the e-mail.

The FILENAME Statement: Interacting with the world outside of SAS®, continued

The !EM_SEND! directive makes sure that a "send" is executed for each record in the dataset. After the e-mail message is completed and sent, the !EM_NEWMSG! directive purges the content of the message associated with the current record, so that the next record processed by the DATA step creates a new message associated with that record. Finally, after the last record is processed, the !EM_ABORT! directive is issued to stop processing directed at the fileref at the top of the DATA step. The result is one e-mail sent for each record in the dataset.

In addition to generating data-driven, dynamic e-mails that incorporate SAS data into the body of the e-mail, the EMAIL access method can also be used to deliver output generated by the Output Delivery System® (ODS) as an attachment to the e-mail message. In the following example, a monthly claims report is generated for client company "xyz" by running a PROC TABULATE within an ODS statement.

ODS PDF FILE='C:\monthly claims report.pdf' STYLE = Journal TITLE 'Claim Payments by Paid Date'; PROC TABULATE DATA=xyz; CLASS datepaid; VAR payment ; TABLE datepaid='',sum=''(payment='Claim Total')FORMAT=DOLLAR20. /BOX='Paid Date'; RUN ; ODS PDF CLOSE;

Once the file is generated, it is attached to an e-mail by specifying the name and location of the file in the ATTACH atttribute of the e-mail.

FILENAME report EMAIL;

DATA NULL;

FILE report to="cschacherer@cdms-llc.com" subject="Monthly Claims Report for Company XYZ " attach="c:\monthly claims report.pdf"; PUT "Attached is the monthly report for Company XYZ."; PUT " "; PUT "Please let us know if you have any questions about the summarized claims."; RUN ;

The FILENAME Statement: Interacting with the world outside of SAS®, continued

FILENAME claims URL 'http://www.cdms-llc.com/sgf2012/data/2011_0 7 .txt';

DATA work.claims_2011_0 7 ; INFILE claims FIRSTOBS= 2 ; INPUT svc_date mmddyy10. account_no $12. prin_dx $8. billed_charges dollar12.; RUN ;

In order to facilitate the resolution of errors encountered using the URL access method, it is a good idea to add the DEBUG option to the FILENAME statement. The DEBUG option will generate detailed information to the LOG regarding the HTTP communications between SAS and the web server. Without the DEBUG option, the following error (caused by misspelling the source file name) can be a bit confusing.

38 FILENAME claims URL 'http://www.cdms-llc.com/sgf2012/data/ 201107.txt '; 39 40 DATA work.claims_2011_07; 41 INFILE claims FIRSTOBS=2; 42 INPUT svc_date mmddyy10. account_no $12. prin_dx $8. billed_charges dollar12.; 43 RUN;

ERROR: Invalid reply received from the HTTP server. Use the debug option for more info.

Re-running the same code with the DEBUG option specified, you can see the GET request being sent from SAS, and the response informs you that the specified web page was not found ("HTTP/1.1 404 Not Found").

FILENAME claims URL 'http://www.cdms-llc.com/sgf2012/data/ 201107 .txt ' DEBUG;

DATA work.claims_2011_0 7 ; INFILE claims FIRSTOBS= 2 ; INPUT svc_date mmddyy10. account_no $12. prin_dx $8. billed_charges dollar12.; RUN ;

NOTE: >>> GET /sgf2012/data/201107.txt HTTP/1. NOTE: >>> Host: www.cdms-llc.com NOTE: >>> Accept: /. NOTE: >>> Accept-Language: en NOTE: >>> Accept-Charset: iso-8859-1,*,utf- NOTE: >>> User-Agent: SAS/URL NOTE: >>> NOTE: <<< HTTP/1.1 404 Not Found NOTE: <<< Content-Length: 1635 NOTE: <<< Content-Type: text/html NOTE: <<< Server: Microsoft-IIS/6. NOTE: <<< X-Powered-By: ASP.NET NOTE: <<< Date: Thu, 25 Aug 2011 20:20:32 GMT NOTE: <<< Connection: close NOTE: <<< ERROR: Invalid reply received from the HTTP server. Use the debug option for more info.

Correcting the file name addresses this issue, and you are able to access the data file.

The FILENAME Statement: Interacting with the world outside of SAS®, continued

Admittedly, serving up raw text files as web pages is not very common; it is more likely that the data you will access with the URL method are served up via the web in a more "human readable" form that makes it attractive and more user-friendly to someone viewing the data—what Richardson & Ruby (2007) refer to as the "human web". The data you are trying to access programmatically is likely in the form of HTML or some other web-centric format focused on making data consumable by humans.

As a result, the preceding table of July 2011 claims data is very easy for a human to understand, however, the following example of the data contained in the file "2011_07.htm" suggests that you will need to parse each line of the file using conditional logic and SAS functions in order to produce the corresponding SAS dataset. Although it is absolutely achievable, it often requires a significant bit of trial and error.

**07/01/2011**

_[additional html]_

**VGH3344562**

_[additional html]_

**V712.4**

_[additional html]_

**$234.55**

The flip-side of the "human web" is the "programmable web." A term Richardson & Ruby (2007) use to describe "programmer-friendly technologies for exposing a web site's functionality in officially sanctioned ways—RSS, XML- RPC, and SOAP." They argue that these two webs should be reunited with the goal of creating a "network that you can use whether you're serving data to human beings or computer programs." Whereas the current author does not disagree with this goal, he is admittedly biased toward the programmable web—with its drive toward providing data to eager analysts.

In the following example (adapted from Mack, 2010), the URL method is used to query a RESTful web service available at "hipaaspace.com". In this example, the proprietary "getcode" service associated with National Drug Code (NDC) data is being queried to return descriptive data associated with a specific NDC^5. In order to send a request to the "getcode" service using the URL access method, the parameters expected by the service are assigned to the macro variables "ndc", "return_type", and "pass_code"—representing, respectively, the NDC code for which information is being requested (ndc), the form of the data to be returned in response to the request (return_type), and a security token that identifies you as an authorized user (pass_code) 6

. In assigning the value of macro variable "ndc", %QSYSFUNC (which masks special characters in the arguments of the executed functions) executes the URLENCODE function (which encodes arguments with the URL escape syntax).

5 Please note that in this example, values for the "pass_code" and "target" macro variables have been wrapped to maintain the readability of the code in the paper; in order to run this example, the whitespace inserted in these values should be removed. (^6) As of the time the paper was written, the security token in this example was available from hipaaspace.com to allow prospective

users to test their web services