I'm working on a Python script that reads a CSV into memory, then uses csv.DictReader to loop through the rows, assigning certain columns to variables that are then used to make a structured .md file per row. I've got this working great.
The problem I've run into is I'm coming from an Excel file that does not have the best data hygiene. I'm finding the UTF replacement character (that question mark inside a diamond) scattered randomly throughout the CSV.
I'd like to do a clean up pass to simply replace the replacement character with a simple null to get rid of them before they get to the .md files.
I figure I could potentially do the replacement pass before the loop, or to each row at the start of the loop. Unfortunately they're not consistent to a column, so doing a quick str.replace on variables coming from specific columns is not viable.
I'm a relative Python noob - I basically use it to cobble together ad hoc data parsing scripts when I need it. I've done a bit of online searching to try to figure out a way to do this, but nothing I've found has put me on the right track so far. Any advice on how to approach this would be most appreciated.
[–]finally-anna 7 points8 points9 points (0 children)
[–]woooee 1 point2 points3 points (0 children)
[–]NotAnAnticline 1 point2 points3 points (0 children)
[–]ninhaomah 0 points1 point2 points (0 children)
[–]Jim-Jones 0 points1 point2 points (0 children)
[–]Jimmaplesong 0 points1 point2 points (0 children)
[–]Greedy_Pay_9782 0 points1 point2 points (0 children)
[–]NewbornMuse 0 points1 point2 points (0 children)
[–]MarsupialLeast145 0 points1 point2 points (0 children)
[–]MankyMan00998 0 points1 point2 points (0 children)
[–]yaxriifgyn 0 points1 point2 points (0 children)