假设我有一个表:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
我想设置RN为1到10这是确定的数,整个表重复之间的随机数,但它的坏的重复次数的在任何给定的HH。 。例如,:
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
这是对Netezza公司,如果这有什么差别。这一个是一个真正的headscratcher我。在此先感谢!
This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!
好了,我不能让一个漂亮的解决方案,所以我做了黑客:
Well, I couldn't get a slick solution, so I did a hack:
创建一个新的名为整型字段 rand_inst
。
分配一个随机数,以每一个空槽。
更新 rand_inst
来成为这个家庭中的随机数的实例编号。例如,如果我得到两个3的,那么第二个3将有 rand_inst
设置为2。
更新表,任何地方分配一个不同的随机数 rand_inst> 1
。
重复作业和更新,直到我们收敛于一个解决方案。
Created a new integer field called rand_inst
.
Assign a random number to each empty slot.
Update rand_inst
to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst
set to 2.
Update the table to assign a different random number anywhere that rand_inst>1
.
Repeat assignment and update until we converge on a solution.
下面是什么样子。懒得其匿名化,这样的名字有点不同,我原来的职位:
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;