Did the user really ask for Exactly Once? Fault Tolerance

Timothy Spann · Originally published at datainmotion.dev

Exactly Once Requirements

Exactly-once delivery is tricky to get right and can degrade performance. If your users can live with at-least-once, always go with that. Data sinks such as Kudu, where you can do an upsert, make exactly-once less necessary: a replayed record overwrites the same row instead of creating a duplicate.
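To see why an upsert sink takes the pressure off, here is a minimal sketch in plain Python. A dict stands in for an upsert-capable table such as Kudu and a list stands in for an append-only sink; the record layout and the helper names `upsert` and `append` are assumptions made for illustration, not any real client API.

```python
def upsert(table, record):
    """Insert or overwrite the row for this key; replaying a record is harmless."""
    table[record["id"]] = record["value"]


def append(log, record):
    """Append-only sink: every redelivery adds another copy."""
    log.append((record["id"], record["value"]))


records = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

# At-least-once delivery: record 1 arrives a second time after a hypothetical retry.
delivered = [records[0], records[1], records[0]]

table, log = {}, []
for r in delivered:
    upsert(table, r)
    append(log, r)

print(table)     # one row per key despite the duplicate delivery
print(len(log))  # the append-only sink kept the duplicate
```

This only holds when rows have a stable key and replays arrive in their original order, so that the last write for a key is also the newest one.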

https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-kafka.html

Apache Flink, Apache NiFi Stateless and Apache Kafka can all participate in exactly-once processing.

For CDF Stream Processing and Analytics with Apache Flink 1.10 streaming:

Both Kafka sources and sinks can be used with exactly-once processing guarantees when checkpointing is enabled.

End-to-End Guaranteed Exactly-Once Record Delivery

The data source and the data sink both need to support exactly-once state semantics and take part in checkpointing.
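What "taking part in checkpointing" buys you can be modeled in a few lines of plain Python. This is a simplified sketch, not Flink's actual two-phase commit protocol: `ReplayableSource`, `Sink`, `run`, and the `fail_at` crash point are all hypothetical names invented for the illustration. The source rewinds to the last checkpointed offset after a crash, and the transactional sink only exposes output once a checkpoint completes.

```python
class ReplayableSource:
    """Source that can rewind to a saved offset, as a Kafka consumer can."""

    def __init__(self, records):
        self.records = list(records)
        self.offset = 0

    def has_next(self):
        return self.offset < len(self.records)

    def poll(self):
        record = self.records[self.offset]
        self.offset += 1
        return record


class Sink:
    """Hypothetical sink; transactional=True holds writes until commit()."""

    def __init__(self, transactional):
        self.transactional = transactional
        self.committed = []  # output visible to downstream readers
        self.pending = []    # output written since the last checkpoint

    def write(self, record):
        target = self.pending if self.transactional else self.committed
        target.append(record)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()

    def abort(self):
        self.pending.clear()


def run(records, checkpoint_every, fail_at, transactional):
    """Process all records, crashing once just before offset fail_at is read."""
    source, sink = ReplayableSource(records), Sink(transactional)
    checkpointed_offset = 0
    crashed = False
    while source.has_next():
        if not crashed and source.offset == fail_at:
            # Simulated failure: drop uncommitted output, rewind the source.
            crashed = True
            sink.abort()
            source.offset = checkpointed_offset
            continue
        sink.write(source.poll())
        if source.offset % checkpoint_every == 0:
            # Checkpoint: save the source offset and commit the sink together.
            checkpointed_offset = source.offset
            sink.commit()
    sink.commit()  # final checkpoint at end of input
    return sink.committed


records = list(range(7))
exactly_once = run(records, checkpoint_every=3, fail_at=5, transactional=True)
at_least_once = run(records, checkpoint_every=3, fail_at=5, transactional=False)
print(exactly_once)
print(at_least_once)
```

With the transactional sink each record appears once. With the non-transactional sink, the records written between the last checkpoint and the crash are replayed and show up twice, which is the at-least-once behavior.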

Data Sources

  • Apache Kafka: must have exactly-once selected, transactions enabled and the correct driver.

Select: Semantic.EXACTLY_ONCE

Data Sinks

  • HDFS BucketingSink
  • Apache Kafka

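For Kafka, check that the transaction timeouts line up with your checkpoint settings: the producer's transaction.timeout.ms must not exceed the broker's transaction.max.timeout.ms, and it should be long enough to cover the checkpoint interval plus any expected recovery time. See https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html#kafka-producers-and-fault-tolerance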
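As a rough aid for the timeout check, here is a small plain-Python sketch. The function name `check_kafka_timeouts` and its messages are hypothetical. The example values assume the defaults described in the linked Flink documentation (a one-hour producer transaction timeout set by the Flink Kafka producer, against a 15-minute broker maximum), so verify them against your own cluster.

```python
def check_kafka_timeouts(checkpoint_interval_ms,
                         producer_txn_timeout_ms,
                         broker_max_txn_timeout_ms):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if producer_txn_timeout_ms > broker_max_txn_timeout_ms:
        problems.append(
            "producer transaction.timeout.ms exceeds broker transaction.max.timeout.ms"
        )
    if producer_txn_timeout_ms <= checkpoint_interval_ms:
        problems.append(
            "transaction.timeout.ms should be longer than the checkpoint interval"
        )
    return problems


# Assumed defaults: 1 hour on the Flink producer side, 15 minutes on the broker.
default_problems = check_kafka_timeouts(
    checkpoint_interval_ms=60_000,
    producer_txn_timeout_ms=3_600_000,
    broker_max_txn_timeout_ms=900_000,
)

# A producer timeout lowered to 10 minutes fits under the broker maximum.
tuned_problems = check_kafka_timeouts(
    checkpoint_interval_ms=60_000,
    producer_txn_timeout_ms=600_000,
    broker_max_txn_timeout_ms=900_000,
)

print(default_problems)
print(tuned_problems)
```

The check deliberately ignores recovery time, which you would need to add on top of the checkpoint interval when choosing a real value.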

Posted by Timothy Spann (@tspannhw)

I am a Principal Field Engineer for Data in Motion at Cloudera. I work with Apache NiFi, Apache Kafka, Apache Spark, Apache Flink, IoT, MXNet, DLJ.AI, Deep Learning, Machine Learning, Streaming...
