How to profit fromconfigureSDFBoundedSourceReader while reading from BigQuery in Python? - Stack Overflow

Situation:I read 25M+ rows, ~300 columns from BigQuery and write to SQL Server with JDBC and it takes

Situation: I read 25M+ rows, ~300 columns from BigQuery and write to SQL Server with JDBC and it takes too much time. When I look into for figuring out which step takes the most time inefficiently, I encounter with SDFBoundedSourceReader. SDFBoundedSourceReader step gets elements one by one and this increases pipeline elapsed time as well as requires of using a lot of vCPUs and get many errors like

Error message from worker: Error encountered with the status channel: SDK harness sdk-0-0 disconnected.

Operation ongoing in bundle process_bundle-3484514982132990920-35 for at least 12m39s without outputting or completing:

Completed work item 7528337809130607698 UNSUCCESSFULLY: CANCELLED: [type.googleapis/util.MessageSetPayload='[dist_proc.dax.internal.StackTraceProto] { stack_top_loc { filepath: .... [dist_proc.dax.workflow.workflow_utils_message_ext]: WORK_PROGRESS_UPDATE_LEASE_ALREADY_CANCELLED }']

I tried:

  • adding cluster and/or partition to BigQuery table
  • adding experiment, pre_optimize=all
  • num workers: 5, 10, 15, 20 (however of course I would love to use minimum worker)
  • machine type: n2-standard-4, n2-standard-8, n2-standard-16
  • autoscaling algorithm: THROUGHPUT_BASED (always)
  • write batch size: 10000, 25000, 50000, 100000, 250000

ReadFromBigQuery configurations:

  • method: EXPORT
  • method: DIRECT_READ
  • reading from query
  • reading from table
  • output type: BEAM_ROW
  • output type: PYTHON_DICT and giving schema I created (..NamedTuple)

I'd prefer getting output as BEAM_ROW because giving schema also takes much time because of ~300 columns however if you have any idea to get better performance, welcome.Please check images from dataflow below.

example dataflow

example dataflow2

example dataflow

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745282789a4620378.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信