Background:
Curt Larson, the Principal Engineer at our organization, undertook a critical project for a large regional bank as a Consultant specializing in Software Development and Data Engineering. The bank was struggling with legacy systems and needed to modernize its data processing infrastructure.
Data Pipeline Modernization
As part of the bank's effort to retire its mainframe, Curt was tasked with creating a Proof of Concept (POC) data pipeline to load EBCDIC files from a vendor into Teradata. In the existing pipeline, the EBCDIC files were sent directly to the mainframe and processed there. The client had a quote from the vendor to rework the nightly data feed into an ASCII CSV file, but the estimate was too expensive. To work around this, the POC project was proposed to determine whether an ETL (Extract, Transform, Load) job written in Java and run on Linux could process files originally meant to be handled by the mainframe. If successful, the solution would save the company a significant amount of money and provide a pattern for moving similar jobs off of the mainframe.
Challenges:
- Finding a well-maintained open-source library for parsing EBCDIC files described by COBOL copybooks.
- Handling errors and edge cases in the EBCDIC data files.
- Ensuring compatibility with existing mainframe processes.
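To illustrate the kind of low-level work the challenges above involve, here is a minimal Java sketch of decoding two common EBCDIC field types: a text field (code page 037) and a COMP-3 packed-decimal field. This is an illustrative example only, not code from the project; the class and method names are hypothetical, and libraries like Cobrix handle these conversions (and many more edge cases) automatically.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.Charset;

public class EbcdicDecode {

    // Decode an EBCDIC (code page 037) text field into a Java String.
    // The JDK ships this charset under the name "IBM037".
    static String decodeText(byte[] field) {
        return new String(field, Charset.forName("IBM037"));
    }

    // Decode a COMP-3 (packed decimal) field: two digits per byte,
    // with the low nibble of the last byte holding the sign
    // (0xD means negative; 0xC and 0xF mean positive).
    static BigDecimal decodePacked(byte[] field, int scale) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < field.length; i++) {
            int hi = (field[i] >> 4) & 0x0F;
            int lo = field[i] & 0x0F;
            digits.append(hi);
            if (i == field.length - 1) {
                if (lo == 0x0D) {
                    digits.insert(0, '-');
                }
            } else {
                digits.append(lo);
            }
        }
        return new BigDecimal(new BigInteger(digits.toString()))
                .movePointLeft(scale);
    }
}
```

For example, the bytes `0x12 0x34 0x5C` with a scale of 2 decode to the value 123.45, while a trailing `0xD` nibble flips the sign. Malformed sign nibbles and truncated records are exactly the sort of edge cases the POC had to guard against.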
Curt's Approach:
Curt collaborated closely with the bank's mainframe team to ensure that the new data pipeline mirrored the functionality of the existing mainframe pipeline. To achieve this, he implemented the pipeline using Cobrix, an Apache Spark library for COBOL data, leveraging the bank's existing Spark cluster. Cobrix parses COBOL copybooks, supports a range of EBCDIC record formats, and processes very large files efficiently by distributing the work across Spark.
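A pipeline along these lines can be sketched with Spark's Java API, since Cobrix registers a `cobol` data source. This is a hedged illustration rather than the project's actual code: the file paths are placeholders, the options shown are a minimal subset of what Cobrix accepts, and the real job would load the staged output into Teradata (for example over JDBC) rather than stop at Parquet.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EbcdicPocJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ebcdic-poc")
                .getOrCreate();

        // Cobrix's "cobol" data source reads fixed-length EBCDIC records;
        // the copybook describes the record layout. Paths are placeholders.
        Dataset<Row> records = spark.read()
                .format("cobol")
                .option("copybook", "/path/to/vendor_layout.cpy")
                .load("/path/to/vendor_feed.dat");

        // Stage the decoded records; loading into Teradata would follow
        // in the full pipeline.
        records.write().mode("overwrite").parquet("/path/to/staging");

        spark.stop();
    }
}
```

Running this requires the Cobrix package (`spark-cobol`) on the Spark classpath, typically supplied via `spark-submit --packages`.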
Results:
- Successful creation of a POC data pipeline for EBCDIC file loading into Teradata.
- Validated the feasibility of running EBCDIC ETL jobs on Linux.
- Close alignment with existing mainframe processes.
- The POC was a resounding success, paving the way for the bank to move forward with the modernization effort.
Conclusion:
Curt Larson's expertise in software development and data engineering proved invaluable to the large regional bank. His work modernizing the data pipeline demonstrated his problem-solving skills, technical depth, and dedication to the bank's operational efficiency. Thanks to Curt's efforts, the bank is now well-positioned to continue its modernization with confidence.