Building a Highly Generic Tool for Copying Transactional Data
Years ago, I set out to build a tool to copy transactional data from one environment to another. A single transaction could span 350+ tables, though not every transaction used all of them; different transaction types touched different subsets. Given this complexity, coding it in Java seemed like a massive undertaking. It was a pet project I worked on during weekends, alongside my more important projects, so I had to find shortcuts and squeeze maximum reuse out of every line of code.
The challenge?
- Writing 100,000+ lines of code to handle individual table reads and inserts. 
- Maintaining this code as table structures changed almost every month to support new features. 
- Ensuring key values were correctly generated, as they relied on Oracle sequences rather than direct copies. 
- Minimizing manual effort, complexity, and long-term maintenance. 
I needed a solution that required minimal code, could adapt dynamically to schema changes, and eliminated ongoing maintenance.
First Attempt: Scripts & Data Pump Utility
Initially, I explored using scripts and the Oracle Data Pump utility, which supports complex queries. While this seemed like a viable option, I hit a political roadblock — the DB team didn’t allow non-DB personnel to run complex queries. Teams in large corporations can be territorial, and I had to find another way.
The Breakthrough: A Highly Generic Approach
I had to design the tool for maximum code reuse, following one of my core principles of genericity:
If you write even a single line of code, ensure it gets reused by wrapping it inside a function.
With that in mind, I designed the solution around a main orchestrator class and a second class that implemented the database operations through just two functions, plus a few supporting helpers inside the class:
- A function to run a SELECT query. 
- A function to run an INSERT query. 
No individual queries for 350+ tables, just a handful of highly generic functions! However, the queries needed to come from outside the code to avoid constant modifications.
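To make that concrete, here is a minimal sketch in plain JDBC of what those two generic functions could look like. The class and method names are my own illustration, not the original code; the point is that the same two methods serve all 350+ tables, with the actual SQL supplied from outside.

```java
import java.sql.*;
import java.util.*;

// Hypothetical sketch: two generic JDBC functions that serve every table.
public class GenericDbOperations {

    private final Connection connection;

    public GenericDbOperations(Connection connection) {
        this.connection = connection;
    }

    // Runs any SELECT supplied by the config file and returns rows as
    // column-name -> value maps, so no per-table object model is needed.
    public List<Map<String, Object>> select(String selectSql) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery(selectSql)) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.put(meta.getColumnName(i), rs.getObject(i));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    // Builds and runs a parameterized INSERT for any table from a row map.
    public int insert(String tableName, Map<String, Object> row) throws SQLException {
        String columns = String.join(", ", row.keySet());
        String placeholders = String.join(", ", Collections.nCopies(row.size(), "?"));
        String insertSql = "INSERT INTO " + tableName + " (" + columns + ") VALUES (" + placeholders + ")";
        try (PreparedStatement ps = connection.prepareStatement(insertSql)) {
            int index = 1;
            for (Object value : row.values()) {
                ps.setObject(index++, value);
            }
            return ps.executeUpdate();
        }
    }
}
```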
Dynamic Query Generation Using a Config File
Instead of hardcoding queries, I structured everything using a configuration file that:
- Contained SELECT statements for each table in the correct sequence of insertion. 
- Used subqueries to relate data back to the main primary key — sometimes with nested dependencies. 
- Ensured foreign key relationships were handled dynamically. 
- Eliminated code changes when tables evolved — only the config needed updating. 
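The post doesn't show the actual file format, so the snippet below is purely illustrative (the table names, the :txnId placeholder, and the layout are all assumptions): an ordered list of per-table SELECT statements, where each query relates its rows back to the primary key of the transaction being copied, sometimes through nested subqueries.

```
# Hypothetical config layout: one entry per table, listed in insertion order.
# :txnId stands for the primary key of the transaction being copied.
ORDER_HEADER  = SELECT * FROM order_header WHERE order_id = :txnId
ORDER_LINE    = SELECT * FROM order_line WHERE order_id = :txnId
LINE_DISCOUNT = SELECT * FROM line_discount WHERE line_id IN
                  (SELECT line_id FROM order_line WHERE order_id = :txnId)
```

With a layout like this, adding a new table means adding one line to the file, exactly as described later in the post.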
Handling Schema Changes at Runtime
Since the tool worked dynamically at runtime, I couldn’t rely on an object model. But that wasn’t a problem — every result set contained field names, effectively allowing dynamic object modeling.
The next challenge? Inserting data with the correct datatype.
- Oracle’s data dictionary views (ALL_TAB_COLUMNS and similar) provided the column and datatype details. 
- I mapped data types properly to prevent query failures. 
- Special handling for BLOBs, CLOBs, and other non-standard types was added. 
This approach meant the tool looped over every field in every table, determining its type dynamically and constructing the correct INSERT statement at runtime.
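A rough sketch of that type-aware binding step, assuming plain JDBC (the helper name and structure are mine, not the original code): column types would come from the SELECT's ResultSetMetaData or from the data dictionary, and LOB columns get their dedicated setters.

```java
import java.sql.*;

// Illustrative helper, not the original code: bind each value with
// type-aware handling so BLOBs, CLOBs and timestamps survive the copy.
public class TypeAwareBinder {

    public void bindValue(PreparedStatement ps, int index, int sqlType, Object value)
            throws SQLException {
        if (value == null) {
            ps.setNull(index, sqlType);
            return;
        }
        switch (sqlType) {
            case Types.BLOB:
                ps.setBlob(index, (Blob) value);        // large binary payloads
                break;
            case Types.CLOB:
                ps.setClob(index, (Clob) value);        // large character payloads
                break;
            case Types.TIMESTAMP:
                ps.setTimestamp(index, (Timestamp) value);
                break;
            default:
                // For standard scalar types the generic setter is enough.
                ps.setObject(index, value, sqlType);
        }
    }
}
```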
Performance Bottleneck & Optimization
The first run took over 40 minutes, which was totally unacceptable. The solution? Multi-threading.
However, I strictly followed another personal principle for multi-threaded programming:
Do not share data or resources between threads if you want true parallel execution.
The existing config file wasn’t structured for parallel execution, so I:
- Grouped tables logically, allowing independent execution. 
- Designed threads to run in parallel without dependencies. 
- Ensured data integrity despite concurrent operations. 
This restructuring took a day of analyzing 350+ table relationships, but the payoff was enormous — execution time dropped from 40 minutes to under 1 minute!
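A minimal sketch of that threading model, under the assumption that each independent group of tables gets its own worker, its own connection, and its own slice of the config, so nothing is shared between threads (class and method names are illustrative):

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch: independent table groups copied in parallel.
public class ParallelCopyRunner {

    public void copyAll(List<List<String>> independentGroups) throws InterruptedException {
        // One worker per independent group; the groups share nothing.
        ExecutorService pool = Executors.newFixedThreadPool(independentGroups.size());
        for (List<String> group : independentGroups) {
            pool.submit(() -> copyGroup(group));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }

    private void copyGroup(List<String> tablesInOrder) {
        // Each worker would open its own connection and walk its tables in
        // config order, so no data or resources are shared between threads.
        for (String table : tablesInOrder) {
            System.out.println("Copying " + table);  // placeholder for the real select/insert step
        }
    }
}
```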
Connection Management
I didn’t use HikariCP or any similar connection-pooling JAR (I can’t recall why), but I had extensive C# experience handling database connections manually. So I implemented a lightweight connection pooling mechanism: keeping connections open for a limited number of operations before refreshing them. This home-grown pooling turned out to be a good fit precisely because it was tailored to my use case.
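The original implementation isn't shown in the post, but a lightweight version of the idea might look like this: hand out the same connection for a fixed number of operations, then close and reopen it (the class name and synchronized accessor are my assumptions).

```java
import java.sql.*;

// Illustrative sketch of a "refresh after N operations" connection holder,
// not the original implementation.
public class RefreshingConnection {

    private final String url;
    private final String user;
    private final String password;
    private final int maxOperations;

    private Connection connection;
    private int operationCount;

    public RefreshingConnection(String url, String user, String password, int maxOperations) {
        this.url = url;
        this.user = user;
        this.password = password;
        this.maxOperations = maxOperations;
    }

    // Returns an open connection, recycling it after maxOperations uses.
    public synchronized Connection get() throws SQLException {
        if (connection == null || connection.isClosed() || operationCount >= maxOperations) {
            if (connection != null && !connection.isClosed()) {
                connection.close();
            }
            connection = DriverManager.getConnection(url, user, password);
            operationCount = 0;
        }
        operationCount++;
        return connection;
    }
}
```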
Impact & Value Delivered
This tool dramatically improved the productivity of developers and QA teams:
✅ Drastically reduced test data setup time — from hours to mere minutes.
✅ Enabled easy replication of test scenarios across environments.
✅ Adapted dynamically to schema changes without modifying code.
✅ Required just ~5,000 lines of code instead of 100,000+ — a 95% reduction!
Whenever a new table was added, only a single line was added in the config file. The code remained untouched — a true victory in generic, maintainable software design.
And yes, for those curious — the UI was built in AngularJS.
Final Thoughts
This was one of my favorite pet projects because it embodied high genericity, efficiency, and maintainability. I love designing solutions that eliminate unnecessary work, reduce technical debt, and maximize long-term value, and this tool was a perfect example of that. Because the code contained no application-specific logic, I wanted to offer it to other groups running other applications: only the config file had to be rewritten for the new schema, and the connection configuration had to point to the new database so it would populate in the dropdown. In other words, it was essentially a product that any team using any Oracle DB schema could adopt. I hope you learnt a thing or two. I always look back on this project with a sense of pride and satisfaction.
LinkedIn: https://www.linkedin.com/in/vaseem-anjum-07514a7/
My new Pet Project is a No Code Low Code Digitization Platform, check it out here: https://www.digitifynow.com/