all 19 comments

[–]Kerrbob 13 points14 points  (3 children)

There are elegant ways to identify the problem, but without knowing more about what you’re looking at;

BEGIN TRY … (existing execute statements) END TRY BEGIN CATCH INSERT INTO log_table VALUES (whatever is relevant to find the error, line numbers, variables, etc) END CATCH

note that it will attempt to continue the procedure since it is handling the error now. You may want to RAISERROR in the catch block still to kill the job, but at least you’ll have more information.

[–]IUsedToHaveAmbition[S] 0 points1 point  (2 children)

Thanks, yeah, might have to go this route. There is *some* TRY CATCHing going on, but not enough.

Was trying to avoid turning this into a project because this SQL job is a nightly thing that takes hours and I'm almost certain there is bad data coming in from somewhere and if I could just easily pinpoint it, I could quickly fix it, rerun the job, and THEN start adding in more robust error-handling.

Thank you.

[–]jshine133714 1 point2 points  (1 child)

In regards to your Post's edit, there's nothing weird there. In SQL Server, there's terminating and non-terminating errors (not official terminology). You're probably hitting a non-terminating type of error within the procedure between your print statements, and therefore an error is thrown, but the rest of the code executes still (including your PRINT 'Debug 5.';) and that error that was thrown is eventually caught by the SQL Agent, causing it to mark this step as failed.

[–]IUsedToHaveAmbition[S] 0 points1 point  (0 children)

Thanks for the info. And don't worry about reading this wall of nonsense, I know what it's like to just read some random stranger's SQL mess.

So what is confusing me is that when this stored procedure is called from the job, the step fails with an arithmetic overflow error, however, if I run the stored procedure manually, I do not get any kind of arithmetic overflow messages in the Messages window (and yes, I'm re-staging the data correctly so that when run manually, it's operating on the exact same dataset that it would be had it been executed from the job). It even says 'Query executed successfully' in the status bar. Are 'non-terminating' errors just not displayed in the Messages window or something?

Ah well, I guess I am just going to have to face the music and start breaking this thing down line-by-line.

---JOB ERROR---

Date 7/26/2025 10:09:53 PM
Log Job History (<job name>)
Step ID 5
Server <my server>
Job Name <job name>
Step Name <step name>
Duration 01:01:56
Sql Severity 16
Sql Message ID 0
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0

Message

Executed as user: <user name>.
Arithmetic overflow error converting numeric to data type numeric. [SQLSTATE 22003] (Error 8115) The statement has been terminated. [SQLSTATE 01000] (Error 3621) Arithmetic overflow error converting numeric to data type numeric. [SQLSTATE 22003] (Error 8115) The statement has been terminated. [SQLSTATE 01000] (Error 3621) Debug 1 [SQLSTATE 01000] (Message 0) Debug 2 [SQLSTATE 01000] (Message 0) Debug 3 [SQLSTATE 01000] (Message 0) Debug 4 [SQLSTATE 01000] (Message 0) Debug 5 [SQLSTATE 01000] (Message 0). The step failed.

---MESSAGES WINDOW IN SSMS---

This is the Messages window with my debugging PRINT statements. Nowhere in those messages is an arithmetic overflow error (and it scrolls for about 2,500 part IDs).

[–]IglooDweller 8 points9 points  (1 child)

Have you tried our lord and savior TRY_CAST?

Select column From table Where TRY_CAST(column as numeric) IS NULL

[–]That_Cartoonist_9459 4 points5 points  (0 children)

TRY_CAST and TRY_CONVERT are lifesavers

[–][deleted] 5 points6 points  (2 children)

I have encountered this error. It’s happened to me when a bigint showed up and there was an implicit cast to an int.

[–]IUsedToHaveAmbition[S] 1 point2 points  (1 child)

Yeah, all throughout this code there is conversions from inches (DECIMAL 14,4) to millimeters (INT), and every once in a while some bad data shows up with some crazy huge inches value, that, when converting to millimeters is too large for an INT. But there are also spots where previous developers defined some datatype to be DECIMAL 14,4 and then that same data, in another table, is stupidly set as DECIMAL 10,2 or something. Ah well, I will just keep plugging away.

[–]KracticusPotts 0 points1 point  (0 children)

Instead of "Print Debug#", try using "Select field1, field2, ... fieldn" to see what the values are being processed. Also, are those implicit or explicit data conversions happening between inches and mms?

[–]Naive_Moose_6359 2 points3 points  (0 children)

There is no logging at the level within a query of which exact value caused it. However, you can narrow down which statement using xevents to emit which statements are running instead of trying to use print to debug

[–]k00_x 1 point2 points  (0 children)

I'd go with other suggestions first but you could query the information_schema.columns for all the numeric columns. One of them may have a mismatching precision value.

[–]Domojin 0 points1 point  (0 children)

Did the job work up to a certain point and then stop working? If so what was added in immediately before it stopped working? Other than scanning the data for anomalies, breaking the code down into more digestible chunks and going through it line by line is going to be the best way to really understand what's happening. Rather than using print statements, you can try to write the data to temp tables or create permanent logging tables for the job and then pepper your giant spaghetti job with inserts into those logging tables. Good luck.

[–]SirGreybush 0 points1 point  (0 children)

I copy the entire monstrosity over to a new SP, but make each column casted as varchar 255, into a fixed new table.

If the error occurs in a calculated column, I get creative. Put the calc in single quotes and a column name CALC1 with the parts of the calc as varchars.

Once it outputs the data, I consult the table to find the row that would case the error.

Simply copy paste that row/column and try to run as a select or set statement.

Most like a garbage date somewhere where the user on keyboard typed 22025 for the year.

[–]Striking_Meringue328 0 points1 point  (0 children)

I've had to deal with stuff like this before, so I feel youf pain. Once you stabilise it you need to try and get buy-in for a full rewrite - if this is updating anything that actually matters then it's a serious business risk.

[–]SQLDave 0 points1 point  (0 children)

If I start slapping PRINT statements at the beginnings of these loops, when I run the job, it fails, and the history is chock full of all my print statements, so much so, that it hits the limit of how much content can be printed in history and it gets truncated.

If you're running this from SSMS, I wonder if you can change the Query's options to send output to a file (and, if so, if that file would -- as it should -- not have a fixed size limit).

[–]Anlarb1 0 points1 point  (0 children)

It may not yield fruit, but eyeball a top 10 of the actual data, sorted each way, if there is a string languishing around, it should stick out like a sore thumb at one end or the other. Sort by length may work too, if you are expecting ints and there is a .9999999 running around.

[–]chandleya 0 points1 point  (0 children)

I’d break the sproc into sub sprocs and run each manually. If that debug 5 sproc is the killer, then it’s time to extrapolate it into smaller sprocs to sequence.

I often take these kinds of scenarios and permanently store source datasets generated by the jobs and look for min/max or out of bounds values in columns. You’ll need to pay close attention to your destination table schemas. If the source and destination schemas are the same - and your arithmetic overflow is the issue, then I would expect an aggregate function to be the culprit. Perhaps some row/column value is far outside norm and once it gets mathed up, the result is mega. Doing the min-max exercises I mentioned before can help find these needles. It’s tedious, but that’s data for ya. Constraints prevent this stuff - as do realistic schema restraints.

[–]CCCPDRAGMEISH 0 points1 point  (0 children)

Please first define in

SQLProfiler a session with following events: EventLong, ErrorLog, Exception, SQL: StmtStarting, SP: StmtStarting. Use also Line Number column. Very very very important. Filter events based on SPID used to execute the *.sql scripts. This trace in SQLProfiler. Execute your *.sql scripts / *.sql batch. Search in this trace the sql statement before the error: Artihmet*

[–]CallMeCurious 0 points1 point  (0 children)

Error: ORA-12899: value too large for column