I’m designing a multi-datacenter architecture using Apache Pulsar with geo-replication enabled.
Architecture Overview:
- Apache Pulsar version: 4.0.2
- Helm Chart version: pulsar-3.9.0
- BookKeeper: 5 replicas
- Broker: 5 replicas
- Proxy: 3 replicas
- ZooKeeper: 3 replicas
- Recovery: 1 replica
- Deployed on Kubernetes (Rancher)
- Bookie storage: vSphere CSI via Persistent Volume Claims (PVC)
The architecture spans 2 datacenters, each hosting its own local consumers (around 200 per DC). My goal is to enable geo-replication between these DCs while strictly preventing message duplication during consumption — messages should be processed exactly once, even in failover scenarios.
Requirements:
- Messages must be durable (no data loss allowed)
- Active-active or active-passive setup is acceptable
- Each datacenter has its own Pulsar cluster
- Consumer duplication must not happen, even during failover or replay
- Pulsar Deduplication and Failover Subscriptions are enabled
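For reference, broker-side deduplication can be enabled at the namespace level with `pulsar-admin` (the tenant/namespace names below are placeholders, not from my actual setup):

```shell
# Enable producer-sequence-based deduplication for every topic in the namespace
pulsar-admin namespaces set-deduplication my-tenant/my-ns --enable
```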
Questions:
- What is the best practice to ensure geo-replication between clusters without consumers processing the same message multiple times?
- Is it possible to achieve synchronous geo-replication via BookKeeper, writing each message to multiple DCs before ack?
- Would a combination of deduplication + idempotent consumer logic + failover subscription be enough?
- Any gotchas or caveats you’ve experienced with Pulsar multi-cluster deployments in this scenario?
Thanks in advance!
I deployed Apache Pulsar 4.0.2 using Helm chart pulsar-3.9.0 on a Kubernetes (Rancher) cluster across 2 datacenters. I enabled geo-replication, created separate clusters for each DC, and connected them via the `pulsar-admin` CLI using the `set-replication-clusters` and `set-clusters` commands.
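The cluster-linking steps above look roughly like this (cluster names, URLs, tenant, and namespace are placeholders for illustration):

```shell
# On dc1: register the remote cluster (repeat symmetrically on dc2)
pulsar-admin clusters create dc2 \
  --url http://pulsar-dc2.example.com:8080 \
  --broker-url pulsar://pulsar-dc2.example.com:6650

# Allow the tenant to use both clusters
pulsar-admin tenants update my-tenant --allowed-clusters dc1,dc2

# Enable replication for the namespace across both DCs
pulsar-admin namespaces set-clusters my-tenant/my-ns --clusters dc1,dc2
```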
I also configured:
- deduplication on the topics
- failover subscription on the consumers
- idempotent logic in the consumer code using `message_id` checking with Redis
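The idempotent check can be sketched as follows. `SeenStore` is an in-memory stand-in for the Redis call (in production this would be `redis.Redis().set(key, 1, nx=True, ex=ttl)`), and `process_once` / `handler` are illustrative names, not my actual code. One caveat worth noting: a geo-replicated message is assigned a different broker `message_id` in each cluster, so for cross-DC deduplication it is safer to key on a producer-assigned business key or sequence number rather than the raw `message_id`.

```python
class SeenStore:
    """In-memory stand-in for the Redis-backed seen-set (SET key NX EX ttl)."""
    def __init__(self):
        self._seen = set()

    def mark_if_new(self, dedup_key: str) -> bool:
        # Atomically claim the key; False means another consumer already did.
        if dedup_key in self._seen:
            return False
        self._seen.add(dedup_key)
        return True

    def release(self, dedup_key: str) -> None:
        # Undo a claim so a failed message can be retried.
        self._seen.discard(dedup_key)


def process_once(store: SeenStore, dedup_key: str, payload, handler) -> bool:
    """Run handler(payload) at most once per dedup_key.

    Returns True if the message was processed, False if it was a duplicate.
    Either way the caller should ack the message afterwards.
    """
    if not store.mark_if_new(dedup_key):
        return False  # duplicate: ack without reprocessing
    try:
        handler(payload)
        return True
    except Exception:
        store.release(dedup_key)  # allow redelivery to retry
        raise
```

In the real consumer loop this wraps each received message: compute the dedup key from the payload, call `process_once`, then `consumer.acknowledge(msg)` regardless of whether it was a duplicate, so the duplicate is not redelivered.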
What I expected:
- Each message would be written once and consumed only once per global system
- No duplicates during failover or network interruptions
What actually happened:
- During failover testing, I noticed duplicate consumption in some edge cases (likely due to replay after disconnect)
- I'm looking for a more reliable strategy to ensure exactly-once processing across DCs