How to perform Geo-Replication in Apache Pulsar without creating duplicate messages?

I’m designing a multi-datacenter architecture using Apache Pulsar with geo-replication enabled.

Architecture Overview:

  • Apache Pulsar version: 4.0.2
  • Helm Chart version: pulsar-3.9.0
  • BookKeeper: 5 replicas
  • Broker: 5 replicas
  • Proxy: 3 replicas
  • ZooKeeper: 3 replicas
  • Recovery: 1 replica
  • Deployed on Kubernetes (Rancher)
  • Bookie storage: vSphere CSI via Persistent Volume Claims (PVC)

The architecture includes 2 different datacenters, and each of them hosts local consumers (around 200 per DC). My goal is to enable geo-replication between these DCs, but at the same time I must strictly prevent message duplication during consumption — messages should be processed exactly once, even in failover scenarios.

Requirements:

  • Messages must be durable (no data loss allowed)
  • Active-active or active-passive setup is acceptable
  • Each datacenter has its own Pulsar cluster
  • Consumer duplication must not happen, even during failover or replay
  • Pulsar Deduplication and Failover Subscriptions are enabled

Questions:

  1. What is the best practice to ensure geo-replication between clusters without consumers processing the same message multiple times?
  2. Is it possible to achieve synchronous geo-replication via BookKeeper, writing each message to multiple DCs before ack?
  3. Would a combination of deduplication + idempotent consumer logic + failover subscription be enough?
  4. Any gotchas or caveats you’ve experienced with Pulsar multi-cluster deployments in this scenario?

Thanks in advance!

I deployed Apache Pulsar 4.0.2 using Helm chart pulsar-3.9.0 on a Kubernetes (Rancher) cluster across 2 datacenters. I enabled geo-replication, created separate clusters for each DC, and connected them via pulsar-admin CLI using the set-replication-clusters and set-clusters commands.
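
For reference, the wiring looked roughly like the following; the cluster, tenant, and namespace names (dc-east, dc-west, my-tenant/my-ns) are placeholders rather than my actual values:

    # On the dc-east cluster, register the remote dc-west cluster
    pulsar-admin clusters create dc-west \
      --url http://pulsar-proxy.dc-west:8080 \
      --broker-url pulsar://pulsar-proxy.dc-west:6650

    # Allow the tenant to span both clusters
    pulsar-admin tenants create my-tenant --allowed-clusters dc-east,dc-west

    # Replicate the namespace to both DCs
    pulsar-admin namespaces set-clusters my-tenant/my-ns --clusters dc-east,dc-west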

I also configured:

  • deduplication on the topics
  • failover subscription on the consumers
  • idempotent logic in the consumer code using message_id checking with Redis (see the sketch after this list)
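
Deduplication was enabled at the namespace level (pulsar-admin namespaces set-deduplication my-tenant/my-ns --enable, names again placeholders). The consumer-side idempotency check is roughly the following Python sketch; the Redis key prefix, TTL, topic, and connection details are illustrative, not my exact code:

    import pulsar
    import redis

    # Placeholder connection details, topic, and subscription names.
    dedup_store = redis.Redis(host="redis.local", port=6379)
    client = pulsar.Client("pulsar://pulsar-proxy.dc-east:6650")
    consumer = client.subscribe(
        "persistent://my-tenant/my-ns/orders",
        subscription_name="orders-sub",
        consumer_type=pulsar.ConsumerType.Failover,  # failover subscription
    )

    def process(msg):
        ...  # application-specific handling

    while True:
        msg = consumer.receive()
        key = "seen:" + str(msg.message_id())
        # SET NX succeeds only the first time this message id is seen;
        # the 24h TTL bounds how much dedup state Redis has to hold.
        if dedup_store.set(key, 1, nx=True, ex=86400):
            process(msg)
        consumer.acknowledge(msg)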

What I expected:

  • Each message would be written once and consumed only once per global system
  • No duplicates during failover or network interruptions

What actually happened:

  • During failover testing, I noticed duplicate consumption in some edge cases (likely due to replay after disconnect)
  • I'm looking for a more reliable strategy to ensure exactly-once processing across DCs
