Scaling gRPC Server Streaming to 100K Subscribers on GKE (Java Spring Boot Reactive) - Stack Overflow

I'm building a gRPC server streaming application using Java, Spring Boot, and reactive programming that acts as a wrapper around Google Cloud Pub/Sub. The application has two methods: /publish and /subscribe. I expect a small number of publishers and a large number of subscribers (80K - 100K).

Here's how it works:

  • Publishers use the /publish method to send messages to a specific topic.

  • The MQ server (this application) publishes the message to the specified Pub/Sub topic.

  • The MQ server also subscribes to the same topic and receives the message.

  • The MQ server then streams the message to all connected subscribers.
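The last step of the flow above can be sketched in plain Java. This is only an illustration of the fan-out idea, not the actual implementation: the class name TopicFanOut and its methods are hypothetical, messages are plain strings, and a Consumer<String> stands in for whatever object writes to a subscriber's gRPC response stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch: one upstream message per topic is copied to every
// subscriber currently registered for that topic.
final class TopicFanOut {
    // topic name -> subscriber callbacks (each stands in for one response stream)
    private final Map<String, Set<Consumer<String>>> subscribersByTopic =
            new ConcurrentHashMap<>();

    /** Registers a subscriber and returns a handle that unregisters it again. */
    Runnable subscribe(String topic, Consumer<String> stream) {
        subscribersByTopic
                .computeIfAbsent(topic, t -> ConcurrentHashMap.newKeySet())
                .add(stream);
        return () -> {
            Set<Consumer<String>> streams = subscribersByTopic.get(topic);
            if (streams != null) {
                streams.remove(stream);
            }
        };
    }

    /** Called once per message received from upstream; returns the delivery count. */
    int onMessage(String topic, String payload) {
        Set<Consumer<String>> streams =
                subscribersByTopic.getOrDefault(topic, Collections.emptySet());
        int delivered = 0;
        for (Consumer<String> stream : streams) {
            stream.accept(payload);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        TopicFanOut fanOut = new TopicFanOut();
        List<String> received = new ArrayList<>();

        Runnable cancelFirst =
                fanOut.subscribe("test", msg -> received.add("subscriber-1: " + msg));
        fanOut.subscribe("test", msg -> received.add("subscriber-2: " + msg));

        System.out.println(fanOut.onMessage("test", "hello")); // 2 deliveries
        cancelFirst.run();                                     // first subscriber disconnects
        System.out.println(fanOut.onMessage("test", "again")); // 1 delivery
        System.out.println(received);
    }
}
```

In this sketch delivery is sequential on the calling thread, so one slow subscriber would delay all the others; with 80K to 100K streams that is the part most likely to need per-subscriber buffering or backpressure.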

Setup:

  • gRPC Server: Java with Spring Boot and reactive programming.

  • GKE: e2-standard-4 machine type.

  • OS Limits: Increased ulimit to 20000.
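A quick back-of-the-envelope check on these numbers, assuming the limit of 20000 is the open-file limit of the server process and that every subscriber opens its own connection: each open socket holds one file descriptor, so a single process cannot hold 100K connections under that limit. The reserve of 1000 descriptors for the JVM, log files, and outbound connections is a made-up figure for illustration, and FdBudget is a hypothetical helper.

```java
// Rough file-descriptor budget. Only the 20000 limit and the 100K subscriber
// target come from the setup above; the reserve is an illustrative assumption.
final class FdBudget {
    /** Connections one process can hold given its descriptor limit and a reserve. */
    static int maxConnections(int fdLimit, int reservedFds) {
        return Math.max(0, fdLimit - reservedFds);
    }

    /** Minimum number of server processes needed to hold all subscribers. */
    static int minInstances(int subscribers, int fdLimit, int reservedFds) {
        int perInstance = maxConnections(fdLimit, reservedFds);
        if (perInstance == 0) {
            throw new IllegalArgumentException("no descriptors left for connections");
        }
        return (subscribers + perInstance - 1) / perInstance; // ceiling division
    }

    public static void main(String[] args) {
        int fdLimit = 20_000;  // from the setup above
        int reserved = 1_000;  // assumption: descriptors not available for subscribers

        System.out.println(maxConnections(fdLimit, reserved));        // 19000
        System.out.println(minInstances(100_000, fdLimit, reserved)); // 6
    }
}
```

Under these assumptions, 100K subscribers need at least six server processes, or a higher descriptor limit on each one.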

The Problem:

I'm using the script below to simulate a large number of subscribers locally:

#!/bin/bash

# Number of grpcurl requests to start in this instance
NUM_REQUESTS=$1
# Offset added to the loop index to form each subscriber ID
SUBSCRIBER_OFFSET=$2

# Start NUM_REQUESTS grpcurl requests in the background
for i in $(seq 1 "$NUM_REQUESTS"); do
  subscriber_id=$((SUBSCRIBER_OFFSET + i))

  grpcurl -cert /home/client-cert.pem -key /home/client-cert-key.pem \
    -import-path /home \
    -proto contract.proto \
    -d "{\"topic_name\": \"test\", \"subscriber_id\": \"subscriber-$subscriber_id\"}" \
    mq.abc:443 SubscriptionService.Subscribe &

done

echo "All grpcurl connections have been started in the background."

When I try to establish more than 5,000 connections, I get the following error:

Failed to dial target host "mq.abc:443": context deadline exceeded

Active connections also start dropping after a while.

Troubleshooting:

  • Increased ulimit to 20000 to allow more open files/sockets.

  • Increased the connection timeout on the client side with the -connect-timeout option.

Questions:

  • Could the context deadline exceeded error be related to network latency, request timeouts, or server overload?

  • Are there specific configurations for Spring Boot Reactive gRPC to handle a large number of concurrent streams efficiently?

  • Are there limits on concurrent connections imposed by GKE's networking or my Ingress configuration?

  • What strategies can I use to optimize gRPC performance for a large number of subscribers in a reactive Spring Boot application (e.g., connection pooling, flow control)?

  • Are there alternative solutions I should consider for this use case, given the high fan-out requirement?

Any help would be greatly appreciated!

Posted by: admin. Please credit the source when reposting: http://www.yc00.com/questions/1745591668a4634871.html
