DEV Community

Rajneesh shukla


Apache Kafka tool evaluation queries

Hello All,

I need the following information about Apache Kafka as a tool for data integration and ETL:

Development effort:

Are the development effort, time, and complexity generally high?

Maintainability:

Is it less maintainable?

Error Handling:

Does it have only a single log file, or does every transform have its own log and error port?
What kind of errors can be handled?

Various teams needed:

Does it need a separate administration team, or can a Unix or NT admin handle the required work, so that no dedicated administrator is needed?

File Structure:

Can it read only records with a single type of delimiter?

Data Integration Capability:

Does it offer a comparatively narrow range of data integration products and capabilities, including related functions such as profiling and data quality? If it offers these capabilities, are they mainstream in nature?

Market Segments:

Does it serve medium to large-scale companies?

Debugging:

Does it offer easy debugging? For example, can you just place watchers at the required points and have intermediate data saved to temporary files for easy viewing, or does debugging require a complex process through a debugger?

Company Strategy:

Can you download a scaled-down free version of the software, and is plenty of free documentation available on the internet?

Go live rate:

Does it have a high “go-live” success rate? Are there any known issues during deployment?

Scalability:
Are there any issues with stability? If so, what causes them and what is the impact?
Which kind of scalability is supported: horizontal or vertical?

Performance:
Can it support high volumes of data movement, transformation, and integration (ETL operations)?
What about parallelism: mapping-level parallelism, session-level parallelism, and multiple parallel source and target data loads?

Heterogeneous system:
Does it integrate data from heterogeneous systems, such as various databases (SQL Server, Oracle, DB2, etc.) and files (XML, XLS, CSV, text, etc.)?
Can targets be any type of database, file, etc.?

Big Data support:
Can it be integrated and used for Big Data?

On cloud solution:
Is it available on both cloud and on-premises platforms?

Pricing:
Is it freeware or open source? Does it come in basic, standard, and enterprise editions? If so, are all editions free?

Repository:
Does it offer repositories? Are those repositories for metadata?
Does the host for the repositories have to be a relational database?

Push down mechanism:
Does it have a pushdown optimization concept, where it generates SQL statements from the workflow/mapping that can be executed directly on the database?
Is it an ETL or an ELT tool?

Job scheduling:
Does it come with a built-in scheduler?

Version controlling:

Does it offer version control?
If so, is it tightly controlled or moderate?

Tool Bugs:
Are there any known tool bugs? Any issues caused by those bugs?

Anything else you want to highlight?

Thanks,

Rajneesh
