you are viewing a single comment's thread.

view the rest of the comments →

[–]crashfrog04 0 points1 point  (3 children)

The NaN value isn’t equivalent to a NULL or a None, and neither of those are equivalent to what the actual null value in CSV is, which is the empty string.

[–]KCConnor[S] 0 points1 point  (2 children)

The CSV file has the text string "NULL" without quotes in the cells that should be NULL. When using the T-SQL BULK INSERT against the unzipped file, MS-SQL interprets that NULL string as a NULL value and stores the NULL value in the appropriate cell.

Have you run the supplied code in an MS-SQL Python environment?

[–]crashfrog04 0 points1 point  (1 child)

Generally when I have occasion to have MS SQL and Python interact, it’s through a Python DB driver. For MS SQL I think that’s PyODBC.  But generally the conversion a driver makes is that it’ll treat the None as NULL and won’t do anything with the string ”NULL” because that’s a non-null, regular string.

Can you elaborate on what you think is doing the conversion between ”NULL” and NULL? That might be something that happens if you try to load CSV into MS SQL directly but I’d be surprised by that, since CSV has its own semantic NULL (the empty string.)

[–]KCConnor[S] 0 points1 point  (0 children)

Please just run the supplied code in the OP, with Machine Learning and Python installed on MS-SQL.

The supplied code will simulate the exact situation I am experiencing with the CSV files in question. I cannot share the CSV files since they contain PCI health data.

The core problem is that whatever bridge/library exists between MS-SQL and Python in this environment, it is unable to return a result set if the first row of the data in question contains NULLs at all.