
SAS data processing technology (1)

2022-08-10 23:56:00 metaX

SAS Data Processing Techniques


In most cases, the DATA step is a good choice for data processing. It provides:

  • Flexible programming capabilities
  • Rich data processing functions

Other tools and techniques

  • The SORT, SQL, and TRANSPOSE procedures are very useful for processing and transforming data.

  • The SAS macro facility makes code more flexible.

  • Debugging techniques help identify logic errors.

SAS program flow


Review of key points

  • The OUT= option of PROC SORT names a new output data set, so the original data set is not overwritten.

  • PROC FORMAT creates user-defined formats and informats. By default, the created formats are stored in the WORK.FORMATS catalog.
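
As a minimal sketch of the two points above, the following sorts a copy of a data set with OUT= and applies a user-defined format. It assumes the SASHELP.CLASS sample data set is available; the names work.class_sorted and $sexfmt are made up for illustration and do not come from the original course material.

```sas
/* Sort into a new data set; sashelp.class itself is left untouched */
proc sort data=sashelp.class out=work.class_sorted;
    by Age;
run;

/* Define a character format (hypothetical name $sexfmt) */
proc format;
    value $sexfmt 'M'='Male'
                  'F'='Female';
run;

/* Apply the format at print time only; stored values do not change */
proc print data=work.class_sorted noobs;
    format Sex $sexfmt.;
run;
```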

libname orion 's:\workshop';
data work.qtrlsalesrep;
proc sort data=work.qtrlsalesrep;

proc format;
value $ctryfmt 'AU'='Australia'
	       'US'='United States';
run;

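(Answer: b. Option a is the DATA statement; option c is the SET statement.)

  • Variables created by INPUT statements and assignment statements are reinitialized (set to missing) at the start of each DATA step iteration.
  • Variables read with a SET statement are not reinitialized; they keep their values until the next observation overwrites them.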
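
The reinitialization rule can be observed with a small experiment. This is a sketch with made-up names (work.reinit_demo, X, Flag): Flag is assigned only when X is positive, so any value carried over from an earlier iteration would show up in the second observation.

```sas
data work.reinit_demo;
    input X;
    /* Flag is created by an assignment statement, so it is reset to
       missing at the top of every iteration */
    if X > 0 then Flag = 'Y';
    datalines;
1
-1
;
run;

/* Flag is 'Y' for X=1 and blank for X=-1: the 'Y' was not carried over */
proc print data=work.reinit_demo;
run;
```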

Transposing data with the OUTPUT statement

Writing multiple observations per input record

  • In this example, each input record generates three observations. Each new observation contains an identifier, ID, and one test value, SCORE.
data A;
	input ID $ score1-score3;
	drop score1-score3; /* do not write the original score columns to the output */
	score=score1; output;
	score=score2; output;
	score=score3; output;
cards;
02126 99 96 94
02128 89 90 88
;
proc print;
run;

Creating multiple SAS data sets in one DATA step

  • In a DATA step, you can create multiple data sets by listing the output data set names in the DATA statement.
  • Naming a data set in the OUTPUT statement directs the observation to one or more specific data sets.
data usa australia other;
	set orion.employee_addresses;
	if Country='AU' then output australia;
	else if Country='US' then output usa;
	else output other;
run;

/* Or, equivalently, with a SELECT group */
data usa australia other;
	set orion.employee_addresses;
	select (Country);
		when ('US') output usa;
		when ('AU') output australia;
		otherwise output other;
	end;
run;

Within a SELECT group, a DO-END block lets a WHEN clause execute several statements when its expression is true.
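
A DO-END block inside a SELECT group might look like the sketch below. It uses the SASHELP.CLASS sample data set rather than the orion library from the example above, and the names work.males, work.females, and Group are hypothetical.

```sas
data work.males work.females;
    set sashelp.class;
    length Group $ 6;          /* long enough for 'Female' */
    select (Sex);
        when ('M') do;         /* two statements run for this branch */
            Group = 'Male';
            output work.males;
        end;
        otherwise do;
            Group = 'Female';
            output work.females;
        end;
    end;
run;
```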

Use conditional statements to control which observation is output to which dataset

  • In the DATA step, use the DROP and KEEP statements to control which variables are written to the output data set.

  • By default, SAS processes every observation in a data set, one at a time. Use the FIRSTOBS= and OBS= options to control which observations are processed.
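
As an illustration of both points, the following sketch reads only observations 5 through 10 of the SASHELP.CLASS sample data set and keeps three variables. The output name work.class_subset is made up. Note that OBS= gives the number of the last observation to process, not a count.

```sas
data work.class_subset;
    /* firstobs=5 obs=10 processes observations 5, 6, 7, 8, 9, and 10 */
    set sashelp.class(firstobs=5 obs=10);
    keep Name Age Height;      /* only these variables are written */
run;

proc print data=work.class_subset;
run;
```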

Creating an accumulator variable

  • The general form of a sum statement:
variable + expression;
  • The sum statement:
    • Creates the variable to the left of the plus sign if it does not already exist
    • Initializes the variable to 0 before the first iteration of the DATA step
    • Automatically retains the variable across iterations
    • Adds the value of the expression to the variable at execution
    • Ignores missing values
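
A small sketch of these properties, using made-up names (work.running_total, Amount, Total) and inline data that includes one missing value:

```sas
data work.running_total;
    input Amount;
    /* Total starts at 0, is retained across iterations, and a missing
       Amount leaves it unchanged */
    Total + Amount;
    datalines;
10
.
5
20
;
run;

/* Total is 10, 10, 15, 35 across the four observations */
proc print data=work.running_total;
run;
```

Had the step used Total = Total + Amount; instead, Total would be missing in every observation, because a plain assignment neither retains the variable nor ignores missing values.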

Cumulative summation of grouped data

  • Use a BY statement to define FIRST.variable and LAST.variable processing
  • Calculate the cumulative sum within each group
  • Use a subsetting IF statement to output only the specified observations
data deptsals(keep=Dept Deptsal);
	set SalSort;
	by Dept;
	if First.Dept then Deptsal=0;
	Deptsal+Salary;
	if Last.Dept;
run;
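
/* BY-group processing requires data sorted by the BY variable */
proc sort data=sashelp.cars
			out=cars;
	by Make;
run;

data total_cars(keep=Make MSRP sum_price);
	set cars;
	by Make;
	if first.Make then sum_price=0; /* reset at the start of each group */
	sum_price+MSRP;                 /* accumulate within the group */
	if last.Make;                   /* output only the last row per group */
run;

proc print data=total_cars noobs;
	var Make sum_price;
	format sum_price dollar10.2;
run;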

Different types of data are read in different ways


Column input, formatted input, and list input are the three ways an INPUT statement can read data.

Method             When to use
Column input       Standard data in fixed columns
Formatted input    Standard and nonstandard data in fixed columns
List input         Standard and nonstandard data separated by blanks or other delimiters

Column input: conditions the data must satisfy

  • The values are in fixed fields

  • The values are standard character or numeric values (for example 58 -23 67.23 00.99 5.67E5 1.2E-2)

  • General form of the INPUT statement for column input: INPUT variable <$> startcol-endcol ...;
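
A minimal sketch of column input on inline data follows. The data set name, variable names, and column positions are invented for illustration.

```sas
data work.emp_col;
    /* ID in columns 1-5, Name in 7-14, Salary in 16-20 */
    input ID $ 1-5 Name $ 7-14 Salary 16-20;
    datalines;
02126 Anderson 52000
02128 Baker    48500
;
run;

proc print data=work.emp_col;
run;
```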

Reading raw data files with formatted input

  • General form of the INPUT statement for formatted input: INPUT pointer-control variable informat ...;
  • Formatted input reads data by:
    • Moving the input pointer to the starting position
    • Naming the variable
    • Specifying the informat
    • Example: input @5 FirstName $10.;
    • Column pointer controls: @n moves the pointer to column n; +n moves the pointer forward n columns
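
The sketch below combines @n, +n, and informats to read a nonstandard date and a salary with an embedded comma. The layout and names (work.emp_fmt, HireDate, and so on) are assumptions made for the example.

```sas
data work.emp_fmt;
    input @1  ID        $5.
          @7  FirstName $10.
          @18 HireDate  mmddyy10.   /* nonstandard: date with slashes */
          +1  Salary    comma7.;    /* nonstandard: embedded comma */
    format HireDate date9. Salary dollar10.;
    datalines;
02126 Alice      01/15/2010 52,000
02128 Bob        03/02/2015 48,500
;
run;

proc print data=work.emp_fmt;
run;
```

After HireDate is read the pointer sits at column 28, so +1 skips the blank and Salary is read starting at column 29.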

Controlling when records are loaded

  • Read multiple records from the raw data file as one observation.

  • DATA SAS-data-set;
    	INFILE 'raw-data-file-name';
    	INPUT specifications;
    	INPUT specifications;
    	<additional SAS statements>
    RUN;
    
  • Read a raw data file of mixed-type records.

  • Line pointer controls determine when a new record is loaded

    • DATA SAS-data-set;
      	INFILE 'raw-data-file-name';
      	INPUT specifications/
      	specifications;
      	<additional SAS statements>
      RUN;
      
    • When SAS encounters a "/", it loads the next record. Line pointer controls: #n loads record n; / loads the next record

  • Read a subset of raw data files for mixed-type records.
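
To make the "/" pointer control concrete, here is a sketch in which every observation is built from two input records. It reads inline data with DATALINES instead of an INFILE statement so that it is self-contained; the names and layout are hypothetical.

```sas
data work.contacts;
    input Name $ 1-20              /* record 1: name */
        / City $ 1-15 State $ 17-18;  /* "/" loads record 2: city and state */
    datalines;
Alice Smith
Austin          TX
Bob Jones
Boston          MA
;
run;

/* Four input records produce two observations */
proc print data=work.contacts;
run;
```

The same result could be written with two separate INPUT statements, since each INPUT statement loads a new record by default.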

Other tricks for the list input method

  • When values at the end of a record are missing, use MISSOVER: INFILE 'raw-data-file' MISSOVER;

  • When missing values are indicated by two consecutive delimiters, use DSD: INFILE 'file-name' DSD;

  • When each record contains multiple observations, end the INPUT statement with a double trailing @: INPUT var1 var2 var3 ... @@;
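
The three options can be tried on inline data, as in the sketch below. It uses INFILE DATALINES in place of an external file, and all data set and variable names are made up for illustration.

```sas
/* DSD: two consecutive commas mean a missing value.
   MISSOVER: a short record sets the remaining variables to missing
   instead of reading them from the next record. */
data work.dsd_demo;
    infile datalines dsd missover;
    input Name :$10. Age Score;
    datalines;
Alice,30,88
Bob,,75
Carol,41
;
run;

/* Double trailing @: hold the record so each ID/Score pair becomes
   its own observation (five observations from two records) */
data work.pairs;
    input ID Score @@;
    datalines;
1 90 2 85 3 77
4 60 5 95
;
run;
```

In work.dsd_demo, Age is missing for Bob and Score is missing for Carol.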

Original site

Copyright notice
This article was written by [metaX]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/222/202208102331542660.html