I currently have a tricky problem at work to write an algorithm for. I googled around a bit, but "time intervals" did not come up with anything useful. Although I have some rough idea of how I could solve it, I still would value some input.
I have information of (It has only couple dozen entries.) ServiceNum, DollarCost
and a input database table in the form (several GBytes): ClientNumber (CN), BeginDate(BD), EndDate(ED), ServiceNumber_Purchased(SN) --Date intervals are always [closed, closed]
The output is: ClientNumber(CN), BeginDate(BD), Enddate(ED), DollarCost(DC)
With the following requirements: 1) The input dates can be overlapping dates. The output has to be non-overlapping "broken up" dates
Example: (assuming SN 1=$10 and SN 2=$20) input (CN, BD, ED ,SN): 10, 1/1/2001,1/1/2005,1 10, 1/1/2002,1/1/2004,2
should result in: output (CN, BD, ED, DC): 10, 1/1/2001, 12/31/2001, $10 10, 1/1/2002, 1/1/2004, $30 10, 1/2,2004, 1/1/2005, $10
2) if the same service is purchased twice for an interval it should be billed only once. Example: input: 10, 1/1/2001, 1/1/2005, 1 10, 1/1/2004, 1/1/2007, 1
output: 10, 1/1/2001, 1/1/2007, $10
Here are my thoughts about a solution so far:
1. fetch the input data sorted by clientNumber, Begindate, Enddate and ServiceNumber
2. read the row, store as temporary good interval, then read another row
3. if new begin-date is bigger than previous good interval end-date (or previously saved end-date), output previous good interval, start new good interval
4. else create new good interval entry with good interval begin-date and current row begin-date, store good interval end-date into a list or something (so we always compare against smallest end-date).
5. Use "bitwise or" on a service bitfield to add services and caluate the total
As you can see my algorithm is a bit scetchy right now, but at this point I am more interested to determine the best data structure for the date interval problem (stack, queue, set,...).