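I have a large dataset with more than 80,000 rows, sorted by "name" and "income" (both names and incomes contain duplicates). For the first name, I want the five lowest incomes. For the second name, I want its five lowest incomes, except that any income already drawn for the first name is disqualified from selection. And so on through the last name (if any incomes remain by then).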
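A small illustrative dataset can make the rule concrete. The dataset name toy and all of its values are invented for this sketch and are not taken from the real data. Under the rule as described, name A would keep 10, 20, 30, 40 and 50; name B would then keep 60, 70, 80, 90 and 100, because 30 and 50 were already drawn for A.

```sas
/* Hypothetical toy data, sorted by name and income */
data toy;
   input name $ income;
   datalines;
A 10
A 20
A 30
A 40
A 50
A 60
B 30
B 50
B 60
B 70
B 80
B 90
B 100
;
run;
```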
Posted on 2020-10-26 18:55:44
data have;
   /* Test data: 8e5 names (n) with 100 rows each */
   do n = 1 to 8e5;
      do _N_ = 1 to 100;
         income  = ceil(rand('uniform') * 1e4);
         address = cats('Address_', _N_);
         output;
      end;
   end;
run;
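data want(drop=c);
   if _N_ = 1 then do;
      /* h: rows of the current name, iterated in ascending income order */
      dcl hash h(dataset : 'have(obs=0)', ordered : 'a', multidata : 'y');
      h.definekey('income');
      h.definedata(all : 'y');
      h.definedone();
      dcl hiter i('h');
      /* inc: incomes already output for any earlier name */
      dcl hash inc();
      inc.definekey('income');
      inc.definedone();
   end;

   /* Load all rows of one name into h */
   do until (last.n);
      set have;
      by n;
      h.add();
   end;

   /* Output up to five of the lowest incomes that were not used before */
   do c = 0 by 0 while (i.next() = 0);
      if inc.add() = 0 then do;
         c + 1;
         output;
      end;
      if c = 5 then leave;
   end;

   /* Move the iterator off the table (first, then prev) so h can be cleared */
   _N_ = i.first();
   _N_ = i.prev();
   h.clear();
run;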
Posted on 2020-10-26 11:19:27
First, rank the incomes within each name. So:
proc rank data=yourdata out=temp ties=low;
   by name;
   var income;
   ranks incomerank;
run;
Then filter for the five lowest incomes per name, so:
proc sql;
   create table want as
   select distinct *
   from temp
   where incomerank < 6;
quit;
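If the goal is the five lowest distinct income values per name, one variant worth trying is TIES=DENSE, which numbers tied values consecutively, so ranks 1 through 5 correspond to the five smallest distinct incomes. The following is only a sketch: it reuses the placeholder dataset name yourdata from above, and temp_dense and want_dense are hypothetical names. Like the code above, it ranks each name on its own and does not exclude incomes already picked for earlier names.

```sas
/* Dense ranks: tied incomes share a rank and no rank numbers are skipped */
proc rank data=yourdata out=temp_dense ties=dense;
   by name;
   var income;
   ranks incomerank;
run;

proc sql;
   create table want_dense as
   select distinct name, income
   from temp_dense
   /* BETWEEN also drops rows whose rank is missing */
   where incomerank between 1 and 5;
quit;
```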
Posted on 2020-10-26 13:09:18
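You need to sort and track the incomes within each name. Use an array to sort and track the five lowest income values of a name, and a hash to track and check whether an income has already been output and is therefore ineligible for output under later names. Example:
Each eligible low income is placed with an insertion sort, which is fast because the array holds only 5 items.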
data have;
   call streaminit(1234);
   do name = 1 to 1e6;
      do seq = 1 to rand('integer', 20);
         income = rand('integer', 20000, 1000000);
         output;
      end;
   end;
run;
data
   want (label='Lowest 5 incomes (first occurring over all names) of each name')
   want_barren(keep=name label='Names whose all incomes were previously output for earlier names')
;
   array x(5) _temporary_;

   if _n_ = 1 then do;
      if 0 then set have;
      * incomes already claimed by an earlier name;
      declare hash incomes();
      incomes.defineKey('income');
      incomes.defineDone();
   end;

   * reset the per-name tracking of the five lowest eligible incomes;
   _maxmin5 = 1e15;
   do _i_ = 1 to 5;
      x(_i_) = 1e15;
   end;

   * pass 1 - find the five lowest incomes of this name not observed previously;
   do _n_ = 1 by 1 until (last.name);
      set have;
      by name;
      if incomes.check() = 0 then continue;
      if income > _maxmin5 then continue;
      * insert sort into x, skipping a value already held for this name;
      do _i_ = 1 to 5;
         if income = x(_i_) then leave;
         if income < x(_i_) then do;
            do _j_ = 5 to _i_+1 by -1;
               x(_j_) = x(_j_-1);
            end;
            x(_i_) = income;
            _maxmin5 = x(5);
            leave;
         end;
      end;
   end;

   * claim only the values that survived in x, so an income bumped out stays eligible;
   do _i_ = 1 to 5;
      if x(_i_) < 1e15 then do;
         income = x(_i_);
         incomes.add();
      end;
   end;

   * pass 2 - reread the same rows and output those holding a selected income;
   _outflag = 0;
   do _n_ = 1 to _n_;
      set have;
      if income in x then do;
         _outflag = 1;
         output want;
      end;
   end;

   if not _outflag then
      output want_barren;

   drop _:;
run;
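A quick sanity check on the result could look like the sketch below. It assumes the output dataset is named want with variables name and income, as in the step above; the column aliases n_names and n_incomes are made up for the illustration. If the selection rule held, both queries should list no rows.

```sas
proc sql;
   /* Incomes that appear under more than one name; ideally none are listed */
   select income, count(distinct name) as n_names
   from want
   group by income
   having count(distinct name) > 1;

   /* Names with more than five distinct selected incomes; ideally none are listed */
   select name, count(distinct income) as n_incomes
   from want
   group by name
   having count(distinct income) > 5;
quit;
```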
https://stackoverflow.com/questions/64535558