google bigquery - Filtering Big Query view with partition column - Stack Overflow

I have a case where I need to filter data using a view and a partition column.Table:CREATE TABLE `tes

I have a case where I need to filter data using a view and a partition column.

Table:

CREATE TABLE `testing-442412.test_dataset.Product`(
  Id INTEGER NOT NULL,
  UserId INTEGER NOT NULL,
  CompanyId INTEGER NOT NULL,
  DateCreated DATETIME NOT NULL
) PARTITION BY DATE_TRUNC(DateCreated, MONTH) CLUSTER BY UserId, CompanyId;

View:

CREATE VIEW `testing-442412.test_dataset.ProductView` AS
SELECT
  UserId,
  CompanyId,
  EXTRACT(YEAR FROM DateCreated) AS Year,
  EXTRACT(MONTH FROM DateCreated) AS Month,
  COUNT(DISTINCT Id) AS Total
FROM `testing-442412.test_dataset.Product`
GROUP BY
  UserId,
  CompanyId,
  Year,
  Month

I WANT TO FILTER MY AGGREGATED DATA USING QUERY LIKE THIS:

SELECT * FROM `testing-442412.test_dataset.ProductView` WHERE Year = 2024 AND Month = 1;

However, when I do this, I notice that partition pruning doesn't seem to be working, as many records are read in the execution details. How can I improve my view in this case? Does partition pruning even work with the EXTRACT function? Or do I need to use the DateCreated column directly without any manipulation, with a query like this:

SELECT FROM testing-442412.test_dataset.Product WHERE DateCreated BETWEEN DATETIME("2024-01-01") AND DATETIME("2024-01-30");

I don't really want to force my clients to use table Product directly, so I hope you will be able to help me.

I have a case where I need to filter data using a view and a partition column.

Table:

CREATE TABLE `testing-442412.test_dataset.Product`(
  Id INTEGER NOT NULL,
  UserId INTEGER NOT NULL,
  CompanyId INTEGER NOT NULL,
  DateCreated DATETIME NOT NULL
) PARTITION BY DATE_TRUNC(DateCreated, MONTH) CLUSTER BY UserId, CompanyId;

View:

CREATE VIEW `testing-442412.test_dataset.ProductView` AS
SELECT
  UserId,
  CompanyId,
  EXTRACT(YEAR FROM DateCreated) AS Year,
  EXTRACT(MONTH FROM DateCreated) AS Month,
  COUNT(DISTINCT Id) AS Total
FROM `testing-442412.test_dataset.Product`
GROUP BY
  UserId,
  CompanyId,
  Year,
  Month

I WANT TO FILTER MY AGGREGATED DATA USING QUERY LIKE THIS:

SELECT * FROM `testing-442412.test_dataset.ProductView` WHERE Year = 2024 AND Month = 1;

However, when I do this, I notice that partition pruning doesn't seem to be working, as many records are read in the execution details. How can I improve my view in this case? Does partition pruning even work with the EXTRACT function? Or do I need to use the DateCreated column directly without any manipulation, with a query like this:

SELECT FROM testing-442412.test_dataset.Product WHERE DateCreated BETWEEN DATETIME("2024-01-01") AND DATETIME("2024-01-30");

I don't really want to force my clients to use table Product directly, so I hope you will be able to help me.

Share Improve this question edited Mar 10 at 15:43 M_G asked Mar 10 at 15:42 M_GM_G 112 bronze badges 1
  • Kudos on your first question on SO. It is high quality with plenti of info and sample code. – Yun Zhang Commented Mar 12 at 4:14
Add a comment  | 

1 Answer 1

Reset to default 0

(Unfortunately) BigQuery has relatively sensitive rules on what predicate can be used as a partition filter. Not every reasonable combination is now detected as a valid partition filter. I personally think your filter is reasonable and could be expected for "monthly" partition but it is indeed not supported.

Here's the best way of confirming by set the table option require_partition_filter to true, then when the table is not access with a "detected" partition filter, the query will just fail

CREATE OR REPLACE TABLE mydataset.temp (
  Id INTEGER NOT NULL,
  UserId INTEGER NOT NULL,
  CompanyId INTEGER NOT NULL,
  DateCreated DATETIME NOT NULL
) 
PARTITION BY DATE_TRUNC(DateCreated, MONTH) 
CLUSTER BY UserId, CompanyId
OPTIONS (require_partition_filter=true); -- <=== this one

The here's an very non-exhaustive list of what works and doesn't work:

select * from mydataset.temp
-- where EXTRACT(MONTH FROM DateCreated) = 1;            # <=== doesn't work
-- where EXTRACT(DATE FROM DateCreated) = '2020-01-01';  # <=== works
-- where DATE(DateCreated) = '2020-01-01';               # <=== works
-- WHERE DateCreated BETWEEN DATETIME("2024-01-01") AND DATETIME("2024-01-30"); # <=== works

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1744837339a4596343.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信