(qkdb+) How to group data with slight date discrepancies (+-1 day) for unique identifier creation - Stack Overflow

I am trying to create uniqueid column for each child (as they appear more than once in the table) so th

I am trying to create uniqueid column for each child (as they appear more than once in the table) so that I can identify between children with same name (no child has the same dadname, mumname and childdob)

q)group select dadname, mumname, childdob from peopletb
dadname         mumname            childdob   |                  
----------------------------------------------| -----------------
simon           jessica             2007.03.27| 0 1658 2292 18295
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..
james           lucy                2005.04.09| 2 12756 21354
francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..

and so on

The issue is, is that there are some slight discrepancies in the childdob where it has been incremented by one day or the opposite it has gone back one day.

when grouping the data i want it to group childdob +1 or -1 from each other so that the discrepancy is eliminated.

e.g.

mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..
&
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..

would have the same uniqueid

as would

francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
&
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..

I am trying to create uniqueid column for each child (as they appear more than once in the table) so that I can identify between children with same name (no child has the same dadname, mumname and childdob)

q)group select dadname, mumname, childdob from peopletb
dadname         mumname            childdob   |                  
----------------------------------------------| -----------------
simon           jessica             2007.03.27| 0 1658 2292 18295
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..
james           lucy                2005.04.09| 2 12756 21354
francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..

and so on

The issue is, is that there are some slight discrepancies in the childdob where it has been incremented by one day or the opposite it has gone back one day.

when grouping the data i want it to group childdob +1 or -1 from each other so that the discrepancy is eliminated.

e.g.

mike            rachel              2007.02.05| 10 4292 10946 33339 56751 67077 82844 96244 99381 ..
&
mike            rachel              2007.02.06| 1 4293 10976 33439 53751 63077 81844 90244 98782 ..

would have the same uniqueid

as would

francis         anne                2007.04.12| 3 4151 8252 159075 172735 175531 177199 206953 40..
&
francis         anne                2007.04.13| 7 3255 7292 128021 143595 157839 168120 186578 21..
Share Improve this question asked Nov 18, 2024 at 18:28 bashprofile84bashprofile84 992 silver badges11 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 2

There's a few tricky edge cases when you don't know the true dob, and if you have datapoints both above and below, but something like this might be a starting point:

peopletb:([]dadname:100?`simon`mike`james;mumname:100?`jessica`rachel`lucy;childdob:100?2007.01.03 2007.01.02 2007.01.04 2007.06.01 2007.08.10 2007.08.09 2007.10.23 2007.10.24)

/default grouping
q)select idx:i by dadname,mumname,childdob from `childdob xasc peopletb
dadname mumname childdob  | idx
--------------------------| --------------
james   jessica 2007.01.02| ,9
james   jessica 2007.01.04| ,26
james   jessica 2007.08.09| 56 58 59
james   jessica 2007.08.10| 75 77
james   jessica 2007.10.24| ,96
james   lucy    2007.01.02| ,3
james   lucy    2007.01.03| 13 21 24
...

/step 1 - fudge the date
q)fudge:update childdob:{(y;x)2>y-x}\[childdob]by dadname,mumname from `childdob xasc peopletb;
/step 2 - group the fudged dates
q)select idx:i by dadname,mumname,childdob from fudge
dadname mumname childdob  | idx
--------------------------| --------------------------
james   jessica 2007.01.02| ,9
james   jessica 2007.01.04| ,26
james   jessica 2007.08.09| 56 58 59 75 77
james   jessica 2007.10.24| ,96
james   lucy    2007.01.02| 3 13 21 24
james   lucy    2007.01.04| ,34
james   lucy    2007.08.09| 50 51 55 60 65 66 70 76 78
...

Notice how james/jessica combined 2007.08.09/10 into 09 but didn't combine 2007.01.02/04 (ambiguous, doesn't satisfy 2>y-x). However you could increase the fudge to 3>y-x for more leniency

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745601746a4635442.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信