postgresql - Postgres collation for sorting text containing decimal numbers - Stack Overflow


I'm trying to sort product names in a natural way. This means sorting text containing numbers by those numbers (when the surrounding text is the same). Note that the example below is a simplification; in the real world the names are not so structured. Ideally I want to use a collation for this, as that seems like the easiest way.

SELECT col FROM (
    VALUES ('test 0.10'), 
           ('test 0.05'), 
           ('test 0.200'), 
           ('test 5'), 
           ('test 20'), 
           ('test 0.3')) 
AS t(col) ORDER BY col COLLATE "natural";

I want this to return

    col     
────────────
 test 0.05
 test 0.10
 test 0.200
 test 0.3
 test 5
 test 20

I've read the question "Postgres: Sort a list of strings that contain numbers", but here I specifically want to handle both decimal and whole numbers.

For a more realistic example of the mess that I need to handle:

SELECT  col
FROM (
    VALUES ('apple 0.10'),
           ('banana 0.05'),
           ('apple 0.200'),
           ('banana 5'),
           ('apple 20'),
           ('apple 3'),
           ('banana 0.3 or 4.5?'),
           ('flour 100g, 20 bags'),
           ('flour 100g, 3 bags'),
           ('no number'))
AS t(col)
ORDER BY col;

which should be sorted like this:

 apple 0.10
 apple 0.200
 apple 3
 apple 20
 banana 0.05
 banana 0.3 or 4.5?
 banana 5
 flour 100g, 3 bags
 flour 100g, 20 bags
 no number

Asked Mar 6 at 11:11 by rutchkiwi; edited Mar 11 at 14:04.
  • Surely 0.3 and 0.200 in your "wanted" output should be the other way around ...? – C3roe, Mar 6 at 11:26
  • The answer you link to DOES seem to handle both integers and decimals. Am I missing something? – Richard Huxton, Mar 6 at 11:34
  • @RichardHuxton Yes, that will sort 0.3 before 0.05. But I don't think there is a better collation to approximate the desired behavior. – Laurenz Albe, Mar 6 at 12:20
  • Write a regex that finds the number, cast to numeric and sort? – Frank Heikens, Mar 6 at 14:43
  • My post suggested the dot does work as a decimal point, which is untrue. I went back and edited that; thanks for catching it. As to how to make that work, unless ICU supports it, I don't think there's another way than a custom substring-based sort. – Zegarek, Mar 6 at 16:48
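Laurenz Albe's point about 0.3 and 0.05 can be checked with a collation named "natural", the name the question uses. This is a minimal sketch, assuming PostgreSQL 10 or later built with ICU support; `en-u-kn-true` is an ICU locale string with numeric ordering switched on, in which every run of digits is compared as an integer.

```sql
-- Assumption: PostgreSQL 10+ compiled with ICU.
-- "kn-true" enables ICU numeric ordering: each run of digits compares as an integer.
CREATE COLLATION IF NOT EXISTS "natural" (provider = icu, locale = 'en-u-kn-true');

SELECT col FROM (
    VALUES ('test 0.10'),
           ('test 0.05'),
           ('test 0.200'),
           ('test 5'),
           ('test 20'),
           ('test 0.3'))
AS t(col) ORDER BY col COLLATE "natural";
-- Whole numbers come out right (5 before 20), but the digits after the dot are
-- read as a separate integer (3 < 5 < 10 < 200), so the order is:
-- test 0.3, test 0.05, test 0.10, test 0.200, test 5, test 20
```

So a numeric ICU collation handles whole numbers, but the dot is treated as an ordinary separator rather than a decimal point; decimals need a sort key computed from the string instead.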

1 Answer


This works by using a regex to find the first number in each value, casting it to NUMERIC, and sorting on that:

SELECT  col
    ,   CAST((regexp_match(col, '([0-9]+(\.[0-9]+)?)'))[1] AS NUMERIC) AS first_number  -- NULL when there is no number
FROM (
    VALUES ('test 0.10'),
           ('test 0.05'),
           ('test 0.200'),
           ('test 5'),
           ('test 20'),
           ('test 0.3 or 4.5?'),
           ('no number'))
AS t(col)
ORDER BY
    first_number ASC NULLS LAST;  -- smallest number first, rows without a number last

Edit: To support multiple numbers in a single value, use regexp_matches with the global ('g') flag and collect the matches into a NUMERIC array; arrays are compared element by element:

SELECT
    t.col,  -- Or any other columns you'd like to return
    (
        -- all numbers in the string, kept in the order they appear
        SELECT array_agg(CAST(m[1] AS NUMERIC) ORDER BY n)
        FROM regexp_matches(t.col, '(\d+(\.\d+)?)', 'g') WITH ORDINALITY AS r(m, n)
    ) AS matches
FROM (
    VALUES ('apple 0.10'),
           ('banana 0.05'),
           ('apple 0.200'),
           ('banana 5'),
           ('apple 20'),
           ('apple 3'),
           ('banana 0.3 or 4.5?'),
           ('flour 100g, 20 bags'),
           ('flour 100g, 3 bags'),
           ('no number'))
AS t(col)
ORDER BY matches ASC NULLS LAST;  -- rows without any number go last
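The queries above sort only by the extracted numbers, so with the larger example the apple, banana and flour rows end up interleaved. One way to also respect the text between the numbers is a sort key in which every number is zero-padded to a fixed width, so that a plain bytewise comparison orders the numbers numerically. This is a minimal sketch, not a built-in feature: `natural_sort_key` is a hypothetical helper, and it assumes every number has at most 20 digits before and 20 digits after the dot (lpad/rpad truncate longer parts) and that a dot between digits is always a decimal point.

```sql
-- Hypothetical helper, not part of PostgreSQL.
-- Assumes at most 20 integer and 20 fractional digits per number.
CREATE OR REPLACE FUNCTION natural_sort_key(s text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT coalesce(string_agg(
               CASE
                   -- numeric chunk: left-pad the integer part and right-pad the
                   -- fractional part, so '3' and '0.3' both become 41 characters
                   WHEN m[1] IS NOT NULL THEN
                       lpad(split_part(m[1], '.', 1), 20, '0') || '.' ||
                       rpad(split_part(m[1], '.', 2), 20, '0')
                   -- text chunk: keep it unchanged
                   ELSE m[2]
               END,
               '' ORDER BY n), '')
    -- split the string into alternating number and non-digit chunks
    FROM regexp_matches(s, '(\d+(?:\.\d+)?)|(\D+)', 'g') WITH ORDINALITY AS r(m, n)
$$;

SELECT col
FROM (
    VALUES ('apple 0.10'),
           ('banana 0.05'),
           ('apple 0.200'),
           ('banana 5'),
           ('apple 20'),
           ('apple 3'),
           ('banana 0.3 or 4.5?'),
           ('flour 100g, 20 bags'),
           ('flour 100g, 3 bags'),
           ('no number'))
AS t(col)
-- "C" compares bytes, so the padded digits order numerically
ORDER BY natural_sort_key(col) COLLATE "C";
```

Because the key is compared under "C", uppercase letters sort before lowercase ones; applying lower() to the text chunk is one way to make the comparison case-insensitive.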
