The Random Thoughts of GeniusMusing

Just another random blog

10-52-2022-The Power of Python

- Posted in 52Posts-2022 by

It is, I assume, that most people who use Python well understand the power of Python. I have known but also just got a reminder. The current project/job I am working on is moving a companies data from one ERP system to a new ERP system. That involves a lot of data and as the current system is around ten years old, it has a lot of data as well as a lot of "dead" data.

Dead data?

Customers, parts, BOM's and other related items that don't need to move over to the new system. This is the basic data flow.

Create data views in the database to collect the required data, hopefully in one file. Export data from existing ERP system into csv files. Sort through csv files to remove the old items and format for the new system Save the "new" data as a csv file for import. Test importing the data. Find the issues. Correct the data. Rinse and Repeat.

There are other factors that also come into play like document versions. I have access to the master list so that when the data is ready for import it has the current version associated with the part number.

Why?

Because sometimes versions change while a part has not been running and the ERP data still has the older revision and might not be updated until it is to be run again and it could be months or more sometimes.

There are also links between customer part numbers and the internal part numbers. This is what re awakened me to the power of Python.

A little back story, I had been using Powershell for this work and it had been going OK up to a point.

Why Powershell?

It was there and seemed, at least at the time, to be working pretty OK but the run time seemed to keep getting longer and I didn't want to install something not on the system already and rebooting during working hours was not an option should it be needed.

Or did it just seem that way?

So I added a total time to run in the script, it was getting longer and longer as I added things. The longest run I had was around 7 minutes for one (yes, just one of the many scripts I had written) to run. And that was just loading the data and doing some initial filtering, not the final formatting and writing of the csv file.

I had also tried to clean up the incoming data for what I could, removing unneeded and unwanted things to make it as small as possible. While I don't have a lines read count it was probably around 300,000 lines.

They also have setup a development version of the new ERP system so that things can be tested and tried without affecting anything else so that is where I installed Python and got stared.

Actually, before that, I wrote a quick test script on my home machine to see just how much of a difference it would make.

Reading in just one file that I had timed in Powershell and doing the same operations in Python that I had in Powershell.

Powershell time: 3 minutes 20 seconds.

Python time: 0.06 seconds.

WOW!

So as I have been rewriting the script I have reached the point where the script functions are pretty much the same.

Powershell time: ~7 minutes.

Python time: ~7 seconds.

I am also now using unclean file with Python because it is much easier and speed is not an issue. One of the csv files being read in is 28MB.

The end result so far is in reading in over 1,000,000 lines of data and doing filtering takes ~7 seconds VS ~7 minutes.

OK, so why is this a big deal?

Like many things, getting the data filtered, sorted, formatted for the new system and not to mention the random bugs that creep up requires many iterations of testing. The faster I can iterate, the quicker it move forward. Assuming minimal time between iterations, I could get about 10 per hour in Powershell VS ~50 per hour in Python.

Behold the power of Python.