While I was working with a customer on an import process, she commented, “If we were doing true EDI, we’d [Do X].” That got me thinking: aren’t we working on an EDI process? As someone who frequently fills multiple roles on projects, I had perhaps lumped all the various three-letter acronyms into the same spaghetti salad. In this episode of the podcast, Eugene and I chat about the differences and how they might impact your team. Data projects can get complex, with lots of moving pieces, and understanding how those pieces differ can help as you work to solve your specific part of the puzzle.
Microsoft MVP Kevin Feasel was not able to join the discussion but did have this to say about EDI and ETL:
“EDI is an interface, a contract for how things are expected. I expect data of this file format with this data structure. It should contain these data elements with these attributes. For example, I’m expecting Parquet files with 37 columns, the first of which is an integer ID, the second a string description, and so on.
ETL, meanwhile, is all about the process. We pull data from a source system, transform it in-memory, and then load it into a destination system. Alternatively, ELT: pull data from a source system, land it as-is in the destination system, and then transform it using the destination system’s resources into a structure which fits the destination system. ETL is the venerable technique with a long history of existence. ELT, meanwhile, is a bit newer of an idea but works really well when we don’t want to tax our source systems very heavily and have a huge amount of data–more than we’d be able to transform in a reasonable amount of time on cheap data mover servers.
With ETL, we of course care about file formats, data elements, and data attributes, but I can talk you through an entire ETL implementation without getting into the details on any of those. That’s because the techniques are independent of file format, data element, and data attribute. There are tuning options available based on this–for example, packing data in the right way to maximize how much fits in a single buffer of memory, or minimizing network transit times for data movement.
With EDI, meanwhile, we can talk through the entire interchange without a single word about how data gets from you to me and what I do with it afterward. This provides us with a nice separation of duties, as we can have business analysts discussing the makeup of these files and data engineers discussing the processing and analysis of the data and outside of a few touchpoints, that can be handled independently.”
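To make Kevin's separation concrete, here is a minimal Python sketch, not a real EDI implementation. Everything named in it is an assumption for illustration: the two-field `CONTRACT`, the `violations` helper, the `etl` and `elt` functions, and the table names are all hypothetical, and an in-memory SQLite database stands in for the destination system. The first half only describes what the data should look like; the second half only describes how the data moves.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: type


# The "EDI side": a hypothetical contract describing what each record must contain.
CONTRACT = (Field("id", int), Field("description", str))


def violations(record: dict, contract=CONTRACT) -> list:
    """Return a list of ways the record breaks the contract (empty if it conforms)."""
    problems = []
    for field in contract:
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
        elif not isinstance(record[field.name], field.type):
            problems.append(f"{field.name}: expected {field.type.__name__}")
    extra = set(record) - {f.name for f in contract}
    problems.extend(f"unexpected field: {name}" for name in sorted(extra))
    return problems


# The "ETL side": how data moves. Neither function looks at the contract.
def etl(rows, conn):
    """Transform in the data mover's memory, then load the finished rows."""
    cleaned = [(r["id"], r["description"].strip().upper()) for r in rows]
    conn.execute("CREATE TABLE items (id INTEGER, description TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", cleaned)


def elt(rows, conn):
    """Land the rows as-is, then transform them with the destination's SQL engine."""
    conn.execute("CREATE TABLE staging (id INTEGER, description TEXT)")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?)",
        [(r["id"], r["description"]) for r in rows],
    )
    conn.execute(
        "CREATE TABLE items AS "
        "SELECT id, UPPER(TRIM(description)) AS description FROM staging"
    )


def load_with(pipeline, rows):
    """Run a pipeline against a fresh in-memory database and return the result."""
    conn = sqlite3.connect(":memory:")
    pipeline(rows, conn)
    result = conn.execute("SELECT id, description FROM items ORDER BY id").fetchall()
    conn.close()
    return result


rows = [{"id": 1, "description": " widget "}, {"id": 2, "description": "gadget"}]
good_rows = [r for r in rows if not violations(r)]
etl_result = load_with(etl, good_rows)
elt_result = load_with(elt, good_rows)
```

Both pipelines should end up with the same `items` table; the difference is where the transformation work happens. Changing `CONTRACT` would not require touching `etl` or `elt`, which is the separation of duties the quote describes.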