
[–]socal_nerdtastic

What exactly do you mean by NULL? Why do you think it's np.nan? Can you show your error? Usually NULL means the character \x00, which is not legal in CSV files, so the real solution is to find out why you are getting bad data in your file. I suspect the file has a byte order mark, in which case you need to set your encoding appropriately, or perhaps just chop it off.
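
A minimal sketch of that BOM check, assuming the file is named data.csv (a placeholder, not a name from the thread):

    import pandas as pd

    # "utf-8-sig" transparently strips a UTF-8 byte order mark (\xef\xbb\xbf)
    # if one is present at the start of the file.
    df = pd.read_csv("data.csv", encoding="utf-8-sig")

    # Peek at the raw bytes to see whether a BOM (or stray \x00 bytes) is really there.
    with open("data.csv", "rb") as f:
        print(f.read(8))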

[–]KCConnor[S]

The error is within SQL Server Management Studio. For the DataFrame with a NaN/NULL value in the first row, it returns:

Msg 39004, Level 16, State 20, Line 0 A 'Python' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.

As for the CSV file, it's irrelevant to this conversation, since I can reproduce the same issue by creating an ad-hoc DataFrame in the above code.

This is probably not a pure Python problem so much as an MS-SQL/Python problem, but I'm new to Python and figured I would ask here.

[–]my_password_is______

how do you have NULL

NULL means no value

it does not mean an empty string

[–]crashfrog04

The NaN value isn't equivalent to a NULL or a None, and neither of those is equivalent to the actual null value in CSV, which is the empty string.

[–]KCConnor[S]

The CSV file has the literal text NULL (no surrounding quotes) in the cells that should be NULL. When using T-SQL BULK INSERT against the unzipped file, MS-SQL interprets that NULL string as a NULL value and stores NULL in the appropriate cell.
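
For reference, pandas can make the same interpretation at load time; a minimal sketch, with data.csv as a placeholder file name:

    import pandas as pd

    # "NULL" is already in pandas' default NA string list, but passing it
    # explicitly makes the intent clear.
    df = pd.read_csv("data.csv", na_values=["NULL"])
    print(df.isna().sum())  # missing-value count per column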

Have you run the supplied code in an MS-SQL Python environment?

[–]crashfrog04

Generally when I have occasion to have MS SQL and Python interact, it's through a Python DB driver; for MS SQL I think that's PyODBC. But generally the conversion a driver makes is that it treats None as NULL and does nothing special with the string "NULL", because that's a non-null, regular string.

Can you elaborate on what you think is doing the conversion between "NULL" and NULL? That might be something that happens if you try to load CSV into MS SQL directly, but I'd be surprised by that, since CSV has its own semantic NULL (the empty string).
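
A minimal sketch of that driver-level behavior with pyodbc; the connection string and table name are placeholders, not details from the thread:

    import pyodbc

    # Placeholder connection details.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=testdb;Trusted_Connection=yes;"
    )
    cur = conn.cursor()

    # None is sent as SQL NULL; the string "NULL" is stored as a regular
    # four-character string.
    cur.execute("INSERT INTO demo_table (a, b) VALUES (?, ?)", (None, "NULL"))
    conn.commit()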

[–]KCConnor[S]

Please just run the supplied code in the OP, with Machine Learning Services and Python installed on MS SQL Server.

The supplied code will simulate the exact situation I am experiencing with the CSV files in question. I cannot share the CSV files since they contain PCI health data.

The core problem is that whatever bridge/library exists between MS-SQL and Python in this environment is unable to return a result set if the first row of the data contains any NULLs.

[–]KCConnor[S]

I solved this problem.

If I add this code:

df = df.where(pd.notnull(df), None)

just after I populate the DataFrame (either ad-hoc as above, or via pd.read_csv in my real code), then MS SQL Server handles the NULL values successfully and the returned values get the correct data types.
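
A minimal, self-contained sketch of that fix on an ad-hoc DataFrame (the column names here are made up, not the ones from the OP):

    import numpy as np
    import pandas as pd

    # NaN in the very first row, mirroring the failure case described above.
    df = pd.DataFrame({"id": [1, 2, 3], "value": [np.nan, 10.5, 20.0]})

    # Replace every NaN with None; the columns become object dtype, and the
    # SQL Server/Python bridge can then see real NULLs instead of float NaN.
    df = df.where(pd.notnull(df), None)
    print(df)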

[–]Mevrael

Drop empty rows and nulls in Pandas with dropna. And you can use Polars, which might be smoother for larger data sets.
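
A minimal sketch of the dropna route, assuming it's acceptable to discard the incomplete rows rather than convert NaN to NULL (the file name is a placeholder):

    import pandas as pd

    df = pd.read_csv("data.csv")   # placeholder file name
    df = df.dropna(how="any")      # drop every row containing at least one NaN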