About This Project

Why this project

In data processing systems, it is commonly recommended to send records in batches rather than making one API call per record. Batching reduces network overhead and improves throughput. However, it has pitfalls. The remote system may fail, your client’s network connection could drop, or your client program itself might be killed. To avoid data loss, we need a fault-tolerance mechanism.

In my professional experience, I have worked on projects developing data producer programs for various platforms, including Kafka, AWS Kinesis, AWS CloudWatch, and Splunk. Some streaming systems offer official client libraries to address this issue, but the implementations vary considerably. Some client libraries require a long-running system service that acts as an agent and sends batch data to the stream system. Others rely on external databases to persistently buffer data, while some implement their own in-memory buffering mechanisms. Unfortunately, not all of these solutions support Python. This fragmented approach places a substantial burden on developers and lacks the reusability needed to cover a broader range of streaming systems. Consequently, I decided to create a universal Producer library to tackle these challenges comprehensively.

Project Objectives

This project provides an abstraction layer for producers. It has three low-level modules:

  1. AbcRecord: the data model base class for your record. It serializes and deserializes your record.

  2. AbcBuffer: packs records into batches and persists the data for fault tolerance.

  3. AbcProducer: takes batch data from the buffer and sends it to the target system. It provides auto-retry, exponential backoff, error handling, and debugging features out of the box.
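The auto-retry with exponential backoff mentioned for the producer can be sketched as follows. This is a minimal illustration, not this project's AbcProducer implementation: the function `send_with_retry` and its parameters (`max_attempts`, `base_delay`, `max_delay`, `sleep`) are hypothetical names chosen for this example.

```python
import time
from typing import Callable, List


def send_with_retry(
    send: Callable[[List[bytes]], None],
    batch: List[bytes],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(batch) until it succeeds; return the number of attempts used.

    All names here are assumptions for illustration only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(batch)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # give up; the caller still holds the batch
            # The delay doubles after every failure: base, 2*base, 4*base, ...
            # and is capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Because the batch is only discarded after `send` succeeds, a persistent buffer can replay it after a crash.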

Based on these three low-level modules, this project also provides concrete producer client libraries for popular streaming systems. You can also build your own producer client library for other streaming systems by inheriting from the three low-level modules.

Features

DataClassRecord

A record class based on dataclasses.
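To illustrate what a dataclasses-based record with serialize and deserialize methods might look like, here is a minimal sketch using only the standard library. It does not use this project's DataClassRecord; the class `LogRecord`, its fields, and the method names are assumptions made for the example.

```python
import dataclasses
import json


@dataclasses.dataclass
class LogRecord:
    """Hypothetical record type; the field names are illustrative only."""

    id: str
    message: str

    def serialize(self) -> str:
        # One JSON document per record. json.dumps escapes newlines,
        # so each record can safely be stored on a single line.
        return json.dumps(dataclasses.asdict(self))

    @classmethod
    def deserialize(cls, data: str) -> "LogRecord":
        # Rebuild the record from the JSON text produced by serialize().
        return cls(**json.loads(data))


record = LogRecord(id="1", message="hello\nworld")
restored = LogRecord.deserialize(record.serialize())
```

A round trip through `serialize` and `deserialize` yields an equal record, which is the property a buffer relies on when it writes records to disk and reads them back.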

FileBuffer

A file-based buffer that writes batch data to the local file system for fault tolerance.
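The idea of persisting buffered records to a local file can be sketched as below. This is not the project's FileBuffer; the class `TinyFileBuffer`, its methods (`put`, `peek_batch`, `commit`), and the one-JSON-object-per-line file layout are all assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List, Optional


class TinyFileBuffer:
    """Hypothetical sketch: append records to a file, emit a batch once full."""

    def __init__(self, path: Path, max_records: int = 3):
        self.path = Path(path)
        self.max_records = max_records

    def _read(self) -> List[dict]:
        if not self.path.exists():
            return []
        with self.path.open("r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def put(self, record: dict) -> None:
        # Flush and fsync before returning so the record survives a crash.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def peek_batch(self) -> Optional[List[dict]]:
        # Return the pending batch without deleting it from disk.
        records = self._read()
        if len(records) < self.max_records:
            return None
        return records[: self.max_records]

    def commit(self, n: int) -> None:
        # Drop the first n records only after they were sent successfully.
        rest = self._read()[n:]
        tmp = self.path.with_suffix(".tmp")
        with tmp.open("w", encoding="utf-8") as f:
            f.writelines(json.dumps(r) + "\n" for r in rest)
        os.replace(tmp, self.path)  # atomic swap of the buffer file


# Usage: records written before a simulated restart are still available.
workdir = Path(tempfile.mkdtemp())
buf = TinyFileBuffer(workdir / "buffer.log", max_records=2)
buf.put({"id": 1})
buf.put({"id": 2})
buf.put({"id": 3})
restarted = TinyFileBuffer(workdir / "buffer.log", max_records=2)
batch = restarted.peek_batch()
```

Separating `peek_batch` from `commit` is the design choice that gives fault tolerance in this sketch: if the process dies between reading a batch and sending it, the records are still on disk and are read again on the next start.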

SimpleProducer

A simple producer that sends data to a target file on your local machine in append-only mode. This producer is intended for demos and testing.

AwsCloudWatchLogsProducer

A simple AWS CloudWatch Logs producer, based on FileBuffer.

AwsKinesisStreamProducer

A simple AWS Kinesis Data Streams producer, based on FileBuffer.