2019-10-09
Grouping date-ranges and Islands with uniq ID's without Partition by seperator
stackoverflow
Question

I need some help with grouping based on date-ranges, where I do not have any obvious Partition by seperator. I have this dataset (Table New_Test), where every ID has an unknown amount of month+year entries but if they are connectected, they should get the same grouping ID (in a new column). E.g. If Month 9, Year 2016 is followed by Month 10, Year 2016 they should both get the Grouping-ID 1. If there is a gap, which is the case from Month 3, 2018 to Month 5, 2018 a new Grouping ID should be assigned.

To find the consecutive order of month, I draw out Match1 and Match2 (helper-columns) which is a lead and lag function of the previous and next Start- and EndDate.

To assign a Grouping-ID I've tried an IIF-formula, where a match between either StartDate = Match2 or EndDate = Match1 is assigned the value 1 or else 0. I've tried substituting the 1 with various versions of Dense_rank, rank, Row_number. If I use Dense_Rank() OVER (PARTITION BY ID ORDER BY ID), I get the Grouping-ID values 1,0,1 for the three group ranges regarding ID 1 instead of the goal of 1,2,3 as I have no usable seperator in my dataset. This means that when I later on wants to group these data by date-range MIN and MAX these 2 islands will be combined into 1 which is not what I want.

I hope somebody has some great inputs to this! :)

SELECT 
ID
,StartDate
,EndDate
,LEAD(DATEADD(day,-1,StartDate),1) OVER (ORDER BY ID, Year, Month) AS Match1
,LAG(DATEADD(day,1,EndDate),1) OVER (ORDER BY ID, Year, Month) AS Match2
,IIF(StartDate= LAG(DATEADD(day,1,EndDate),1) OVER (ORDER BY ID, Year, Month)
OR EndDate =LEAD(DATEADD(day,-1,StartDate),1) OVER (ORDER BY ID, Year, Month)
,1,0) AS Grouping-ID
,Year
,Month
FROM NEW_Test

My data looks like after first edit:

ID  StartDate   EndDate     Match1      Match2      Year    Month
1   01-09-2016  30-09-2016  30-09-2016  NULL        2016    9
1   01-10-2016  31-10-2016  31-10-2016  01-10-2016  2016    10
1   01-11-2016  30-11-2016  30-11-2016  01-11-2016  2016    11
1   01-12-2016  31-12-2016  31-12-2016  01-12-2016  2016    12
1   01-01-2017  31-01-2017  31-01-2017  01-01-2017  2017    1
1   01-02-2017  28-02-2017  28-02-2017  01-02-2017  2017    2
1   01-03-2017  31-03-2017  31-03-2017  01-03-2017  2017    3
1   01-04-2017  30-04-2017  30-04-2017  01-04-2017  2017    4
1   01-05-2017  31-05-2017  31-05-2017  01-05-2017  2017    5
1   01-06-2017  30-06-2017  30-06-2017  01-06-2017  2017    6
1   01-07-2017  31-07-2017  31-07-2017  01-07-2017  2017    7
1   01-08-2017  31-08-2017  31-08-2017  01-08-2017  2017    8
1   01-09-2017  30-09-2017  30-09-2017  01-09-2017  2017    9
1   01-10-2017  31-10-2017  31-10-2017  01-10-2017  2017    10
1   01-11-2017  30-11-2017  30-11-2017  01-11-2017  2017    11
1   01-12-2017  31-12-2017  31-12-2017  01-12-2017  2017    12
1   01-01-2018  31-01-2018  31-01-2018  01-01-2018  2018    1
1   01-02-2018  28-02-2018  28-02-2018  01-02-2018  2018    2
1   01-03-2018  31-03-2018  30-04-2018  01-03-2018  2018    3
1   01-05-2018  31-05-2018  31-10-2018  01-04-2018  2018    5
1   01-11-2018  30-11-2018  30-11-2018  01-06-2018  2018    11
1   01-12-2018  31-12-2018  NULL        01-12-2018  2018    12
2   01-09-2016  30-09-2016  30-09-2016  NULL        2016    9
2   01-10-2016  31-10-2016  31-10-2016  01-10-2016  2016    10
2   01-11-2016  30-11-2016  30-11-2016  01-11-2016  2016    11
2   01-01-2017  31-01-2017  31-01-2017  01-01-2017  2017    1
2   01-02-2017  28-02-2017  28-02-2017  01-02-2017  2017    2
2   01-03-2017  31-03-2017  31-03-2017  01-03-2017  2017    3
2   01-04-2017  30-04-2017  30-04-2017  01-04-2017  2017    4
2   01-05-2017  31-05-2017  31-05-2017  01-05-2017  2017    5

What should be the end-result:

ID  StartDate   EndDate     Match1      Match2      Year Month  Grouping-ID
1   01-09-2016  30-09-2016  30-09-2016  NULL        2016    9   1
1   01-10-2016  31-10-2016  31-10-2016  01-10-2016  2016    10  1
1   01-11-2016  30-11-2016  30-11-2016  01-11-2016  2016    11  1
1   01-12-2016  31-12-2016  31-12-2016  01-12-2016  2016    12  1
1   01-01-2017  31-01-2017  31-01-2017  01-01-2017  2017    1   1
1   01-02-2017  28-02-2017  28-02-2017  01-02-2017  2017    2   1
1   01-03-2017  31-03-2017  31-03-2017  01-03-2017  2017    3   1
1   01-04-2017  30-04-2017  30-04-2017  01-04-2017  2017    4   1
1   01-05-2017  31-05-2017  31-05-2017  01-05-2017  2017    5   1
1   01-06-2017  30-06-2017  30-06-2017  01-06-2017  2017    6   1
1   01-07-2017  31-07-2017  31-07-2017  01-07-2017  2017    7   1
1   01-08-2017  31-08-2017  31-08-2017  01-08-2017  2017    8   1
1   01-09-2017  30-09-2017  30-09-2017  01-09-2017  2017    9   1
1   01-10-2017  31-10-2017  31-10-2017  01-10-2017  2017    10  1
1   01-11-2017  30-11-2017  30-11-2017  01-11-2017  2017    11  1
1   01-12-2017  31-12-2017  31-12-2017  01-12-2017  2017    12  1
1   01-01-2018  31-01-2018  31-01-2018  01-01-2018  2018    1   1
1   01-02-2018  28-02-2018  28-02-2018  01-02-2018  2018    2   1
1   01-03-2018  31-03-2018  30-04-2018  01-03-2018  2018    3   1
1   01-05-2018  31-05-2018  31-10-2018  01-04-2018  2018    5   2
1   01-11-2018  30-11-2018  30-11-2018  01-06-2018  2018    11  3
1   01-12-2018  31-12-2018  NULL        01-12-2018  2018    12  3
2   01-09-2016  30-09-2016  30-09-2016  NULL        2016    9   4
2   01-10-2016  31-10-2016  31-10-2016  01-10-2016  2016    10  4
2   01-11-2016  30-11-2016  30-11-2016  01-11-2016  2016    11  4
2   01-01-2017  31-01-2017  31-01-2017  01-01-2017  2017    1   5
2   01-02-2017  28-02-2017  28-02-2017  01-02-2017  2017    2   5
2   01-03-2017  31-03-2017  31-03-2017  01-03-2017  2017    3   5
2   01-04-2017  30-04-2017  30-04-2017  01-04-2017  2017    4   5
2   01-05-2017  31-05-2017  31-05-2017  01-05-2017  2017    5   5
Answer
1

This is a variation of the gaps-and-island problem.

I would proceed as follows:

  • in a subquery, rank the records by id and start date, using ROW_NUMER(), and set up a flagged that checks if the start date of the next record is contiguous to the end of the current record and if it has the same id
  • in the outer query, do a window sum on the flag; the difference between the row number and the flag gives you the grouping id

Consider:

SELECT 
    x.*
    1 + rn - SUM(matched) OVER(ORDER BY id, rn) AS GroupingID
FROM (
    SELECT
        t.*
        ROW_NUMBER() OVER(ORDER BY id, StartDate) rn,
        CASE 
            WHEN 
                id = LEAD(id) OVER(ORDER BY id, StartDate)
                AND DATEADD(day, 1, EndDate) = LEAD(StartDate) OVER(ORDER BY id, StartDate)
            THEN 1 
            ELSE 0 
        END matched
    FROM mytable
) x
Grouping date-ranges and Islands with uniq ID's without Partition by seperator
See more ...