I usually use a compound key (CreatedTime, Status) for my Log table, but I'm reconsidering this design. Since CreatedTime is almost always unique and Status only has 3-5 possible values, it seems that Status might not add much further filtering after CreatedTime.
Most of my queries retrieve logs for a specific time range, optionally filtering or counting by Status. Conceptually, if I were working with a physical log book sorted by time, identifying entries with a specific Status (e.g., "Successful") would be cumbersome. On the other hand, having separate log books for each Status, all sorted by time, could make searching more efficient, though combining and re-sorting results for all Status values might complicate things. Does the database optimize for such scenarios?
I've already asked three different AIs about this, but their answers were vague and contradictory (even the same AI gives different answers when asked slightly differently), and I can't find much on Google or Stack Overflow. Could someone confirm whether my intuition here is correct?
asked Mar 22 at 2:44 by Luke Vo
- I'm not totally sure what you are asking, to be honest, but the answer is: it depends, specifically on the exact queries you end up using. The general rule of thumb is to index first on the keys that give the most selectivity, i.e. a date before a status. But depending on your queries, you might not benefit from adding status to the index at all. Your best bet is to actually test it. – Dale K
- How many records are we talking about here? If you use an index that has CreatedTime first, it's going to be efficient at retrieving date ranges, but if you want to count by status in addition to selecting the date range, it's going to scan all the index entries between the two dates. It definitely helps to include the status in the index. On the other hand, if you index by status first and then CreatedTime, that helps with queries that filter by a single status and a date range. – boggy
2 Answers
Your indexes should be based on how you typically query your data. Since you typically query logs for a given date range, an index on CreatedTime would be most efficient. I doubt that the secondary key on Status makes a significant difference unless you query a very large number of logs within your date range and only a few match the status you want. Also, since you most likely do not have multiple logs with different statuses at the exact same time, the secondary key is not helping: the engine is going to have to scan all index records for the given timeframe anyway.
Indexing by Status, CreatedTime is not going to be significantly faster than CreatedTime, Status if you want logs of one status, and it will be less performant if you force the engine to scan through several statuses and consolidate the results.
Of secondary importance is how the data is added. Since you almost certainly add logs sequentially, indexing by Status, CreatedTime will be less efficient because you'll often be inserting records into the middle of the index. Indexing on CreatedTime means you'll almost always be appending records at the end of your table, barring unusual activity like bulk imports of older logs.
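To make the trade-off concrete, here is a minimal sketch using SQLite (the table, column, and index names are illustrative, not taken from the question). With the time-leading index, a range predicate on CreatedTime lets the engine seek straight to the time window and then check Status within it:

```python
# Minimal sketch in SQLite; table/column/index names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (created_time TEXT, status TEXT, message TEXT)")
con.execute("CREATE INDEX ix_time_status ON log (created_time, status)")

# Count one status within a time range: the planner can range-seek on
# created_time (the leading column) and filter status inside that window.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(*) FROM log "
    "WHERE created_time BETWEEN '2025-01-01' AND '2025-01-02' "
    "AND status = 'Error'"
).fetchall()
for row in plan:
    print(row[-1])  # plan detail, e.g. a SEARCH using ix_time_status
```

The plan shows a search on ix_time_status bounded by the time range; every entry in that range is still visited to test the status, which matches the "scan all index records for the given timeframe" point above.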
I'd recommend two indexes:
- CreatedTime
- Status, CreatedTime WHERE Status != 'OK' (a partial index)
You'd want to query all logs for some time period and also, for example, all entries with Status = 'Error' for some time period. Those two indexes will help with both. Since the vast majority of log entries will likely have the OK status (or equivalent), the second index will be much smaller.
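As a sketch of this two-index scheme, here is a small SQLite example (SQLite also supports partial indexes; the table and index names are illustrative, not from the question):

```python
# Sketch of the two-index scheme in SQLite; names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (created_time TEXT, status TEXT)")

# Index 1: plain index on the timestamp, for "all logs in a range" queries.
con.execute("CREATE INDEX ix_created_time ON log (created_time)")

# Index 2: partial index covering only the rare non-OK rows,
# ordered by status, then time. It stays small if most rows are 'OK'.
con.execute(
    "CREATE INDEX ix_status_time ON log (status, created_time) "
    "WHERE status <> 'OK'"
)

con.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("2025-01-01T10:00", "OK"),
     ("2025-01-01T11:00", "Error"),
     ("2025-01-02T09:00", "OK"),
     ("2025-01-02T09:30", "Error")],
)

# Typical query: one rare status within a time range.
errors = con.execute(
    "SELECT created_time FROM log "
    "WHERE status = 'Error' AND created_time < '2025-01-02'"
).fetchall()
print(errors)  # [('2025-01-01T11:00',)]
```

Note that whether the planner actually uses a partial index for a given query depends on the engine's ability to prove the query's predicate implies the index's WHERE clause, so it is worth checking the query plan on your own database.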
Storing logs in a database is generally not very cost-efficient. It can cause long backup and restore times, sudden increases in storage usage leading to out-of-space errors, unplanned IO spikes while the table is vacuumed, and other problems.