bigquery external table on bigtable - fetch perf with rowkeys join on smaller bq table - Stack Overflow

Scenario:Bigquery EXTERNAL table on bigtable table (Bigtable table has ~250 million rowkeys)Bigquery s

Scenario:

  1. Bigquery EXTERNAL table on bigtable table (Bigtable table has ~250 million rowkeys)
  2. Bigquery stored table that has 100,000 rowkeys (subset of rowkeys on BT)

When running a query like below, it takes pretty long to get the results of bigquery query, even though we know that querying bigtable with a given rowkey is pretty fast

Query:

SELECT
        bt.rowkey,
        bt.column1,
        ..., 
        bt.column14
FROM
      `bq_dataset_mine.my_bq_small_table_with_rowkeys` sm

JOIN  `bq_dataset_mine.my_bq_external_table_on_bt_table1` bt
  ON
      bt.rowkey = sm.rowkey

Is there a possibility of speeding up this bigquery query that joins a small bigquery table having rowkeys with the larger BQ external table on BT?

Here is what my query plan looks like on BQ

Step details
  READ
  $1:rowkey, $2:my_cf1.cq1.cell.timestamp, $3:my_cf1.cq2.cell.value, $4:my_cf2.cq3.cell.timestamp, $5:my_cf1.cq4.cell.value, ..., $11:my_cf1.cq10.cell.value, ...

  FROM my-proj.my-dset.my_bt_ext_table

  READ
  $30

  FROM __stage00_output

  JOIN
  $40 := $1, $41 := $2, $42 := $3, $43 := $4, $44 := $5, $45 := $6, $46 := $7, $47 := $8, $48 := $9, $49 := $10, $50 := $11, $51 := $12, $52 := $13, $53 := $14, $54 := $15, $55 := $16

  SEMI HASH JOIN EACH  WITH ALL  ON $1 = $30

  WRITE
  $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55

  TO __stage01_output

Shuffle
  Shuffle output bytes    2.2 MiB

Looking at the plan, seems like the explanation might be because of the fact that we are selecting lot of the columns from the BT table rather than just testing for existence of the rowkey, so all of those columns need to be stored into the hash table (Hash Semi Join) thus maybe the time and cost go up!

If someone from Google Bigtable or Bigquery team can look at the plan and see if there is a possibility of speeding up here, it would be great.

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745085211a4610364.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信