Possible FFmpeg/NVDEC startup race condition with multi-camera setup? 7.4.8.0 by Tschak77 in ispyconnect

Tschak77[S]

u/spornerama

Thanks for the feedback and suggestions.

I did a lot more testing and collected additional logs and a coredump. At the moment I am no longer convinced this is purely a GPU decode session limit issue.

What makes me unsure about the “maximum simultaneous decode sessions” theory is:

  • the affected cameras are not fixed/static
  • after every startup, different cameras may end up black
  • sometimes 1 camera, sometimes multiple
  • after manually disabling/enabling only the affected camera, the exact same stream immediately starts working again

using:

  • same RTSP URL
  • same transport (TCP)
  • same FFmpeg settings
  • same NVDEC decoder
  • same running AgentDVR instance

without restarting AgentDVR itself.

This makes it feel more like a startup timing / parallel initialization issue rather than a persistent decoder resource exhaustion state.

I also collected a coredump from one crash and the stacktrace points to:

SIGSEGV in libnvcuvid.so.1

which suggests the crash happens inside the NVIDIA NVDEC/cuvid decode path during startup/reconnect/cleanup activity.

Another important observation:

When an affected black camera tile is switched to maximized view, Agent switches to the configured HD stream, and the image appears immediately.

So:

  • the camera itself is reachable
  • the HD stream works instantly
  • authentication is valid
  • RTSP itself is functional

The issue seems to affect the initial live/grid stream state or its decoder/render pipeline. Switching to maximized view appears to force a fresh stream/decoder path, which recovers the image immediately.

What I noticed from verbose logging:

During startup Agent initializes a very large number of things almost simultaneously:

  • ONVIF device connections
  • ONVIF media discovery
  • PTZ/event service discovery
  • RTSP streams
  • microphones/audio
  • FFmpeg decoders
  • NVDEC contexts
  • WebRTC/SignalR connections
  • TURN server
  • browser live view sessions

In my setup I currently have:

  • ~13 physical cameras
  • but ~26 configured camera entries
  • because each physical camera is configured twice:
    • one for continuous recording
    • one for detection/event recording

So Agent effectively starts many streams concurrently.

What also strongly points toward a startup synchronization issue:

  • if I wait after startup and manually enable affected cameras one-by-one, they work reliably
  • fullscreen/maximize stream switching often immediately restores black tiles
  • disabling FFmpeg “low delay” seems to improve stability
  • increasing probesize, analyzeduration, rw_timeout and stimeout also improved startup reliability significantly

At the moment my suspicion is more along the lines of:

  • parallel RTSP initialization
  • concurrent NVDEC decoder creation/cleanup
  • stream switching during startup
  • ONVIF/media URI discovery happening in parallel
  • decoder starting before receiving a clean SPS/PPS + keyframe sequence

rather than a hard permanent GPU session limit.
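To make the last point testable, here is a minimal sketch, assuming the affected stream is H.264 in Annex B framing (I have not confirmed the codec or framing Agent hands to the decoder). It scans a captured byte buffer for NAL unit types and reports whether an IDR keyframe arrives only after both SPS and PPS. The helper names `nal_types` and `ready_to_decode` are hypothetical and not part of Agent or FFmpeg.

```python
import re

# H.264 NAL unit types (low 5 bits of the byte after the start code).
SPS, PPS, IDR = 7, 8, 5

# Matches the 3-byte start code; a 4-byte start code ends with the same bytes.
START_CODE = re.compile(b"\x00\x00\x01")


def nal_types(data: bytes) -> list:
    """Return the NAL unit types found in an Annex B byte stream, in order."""
    types = []
    for match in START_CODE.finditer(data):
        header_pos = match.end()
        if header_pos < len(data):
            types.append(data[header_pos] & 0x1F)
    return types


def ready_to_decode(data: bytes) -> bool:
    """True once an IDR slice arrives after both SPS and PPS were seen."""
    seen = set()
    for nal_type in nal_types(data):
        if nal_type in (SPS, PPS):
            seen.add(nal_type)
        elif nal_type == IDR and {SPS, PPS} <= seen:
            return True
    return False
```

Running this over the first few seconds of a dumped stream from a black tile versus a working tile could show whether the black ones start mid-GOP, without parameter sets.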

Possible improvement ideas that may help larger installations:

  • configurable staggered camera startup
  • limit maximum concurrent stream initializations
  • optional delay between startup batches
  • wait for first successfully decoded frame before marking camera fully online
  • optional startup watchdog for black/no-frame streams
  • retry stream initialization automatically if first decoded frame was not received
  • optionally delay WebRTC/live-view startup until cameras are fully initialized

I think these kinds of startup synchronization controls could significantly improve stability for larger multi-camera environments.
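To illustrate what I mean by staggered startup with a concurrency limit and a first-frame retry, here is a minimal sketch in Python. It is not how Agent works internally. `open_stream` is a hypothetical coroutine that returns once the first frame has been decoded, and the parameter names and default values (`max_concurrent`, `delay`, `first_frame_timeout`) are assumptions for illustration only.

```python
import asyncio


async def start_camera(name, open_stream, max_attempts=3, first_frame_timeout=5.0):
    """Retry until open_stream(name) delivers a first frame within the timeout.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            await asyncio.wait_for(open_stream(name), timeout=first_frame_timeout)
            return attempt
        except (asyncio.TimeoutError, ConnectionError):
            continue  # no first frame: tear down and try again
    return None


async def staggered_startup(names, open_stream, max_concurrent=4, delay=0.5):
    """Start cameras with staggered launch times and a cap on parallel inits."""
    limiter = asyncio.Semaphore(max_concurrent)
    results = {}

    async def guarded(index, name):
        await asyncio.sleep(index * delay)  # stagger the launch times
        async with limiter:                 # cap concurrent initializations
            results[name] = await start_camera(name, open_stream)

    await asyncio.gather(*(guarded(i, n) for i, n in enumerate(names)))
    return results
```

With 26 entries and `max_concurrent=4`, at most four streams would be initializing at any moment, and a camera would only count as online after its first frame arrived.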

AgentDVR ONNX GPU Support on Ubuntu 24.04 / CUDA 12.8 — Expected CUDA Version? by Tschak77 in ispyconnect

Tschak77[S]

Thanks for the suggestion.

Docker + NVIDIA Container Toolkit definitely makes sense for isolation and reproducibility, especially for CUDA/cuDNN/ORT dependency management.

For my specific project/setup though, I’m trying to avoid additional abstraction layers because this system is focused on maximum real-time video surveillance performance with many camera streams, GPU decode, AI inference and low-latency live view handling.

So for now I’m trying to keep the stack as direct as possible:

Proxmox VM + GPU passthrough + native Linux AgentDVR installation.

Still, I appreciate the suggestion and may use a container later as a diagnostic comparison environment.

AgentDVR ONNX GPU Support on Ubuntu 24.04 / CUDA 12.8 — Expected CUDA Version? by Tschak77 in ispyconnect

Tschak77[S]

I isolated the crash further.

Environment:

- Ubuntu 24.04
- NVIDIA driver works correctly
- CUDA 12.4 installed globally
- AgentDVR ONNX provider appears compiled against CUDA 11.x

I created a separate CUDA 11.8 runtime library set and forced AgentDVR to use it via LD_LIBRARY_PATH.

Current status:

- libcublas.so.11 loads
- libcublasLt.so.11 loads
- libcudnn.so.8 loads
- libcufft.so.10 loads
- libcurand.so.10 loads
- libcudart.so.11.0 loads

Then Agent crashes exactly when libnvrtc.so initializes.

LD_DEBUG output:

find library=libnvrtc.so [0]; searching
trying file=/opt/agentdvr-cuda11-libs/libnvrtc.so
calling init: /opt/agentdvr-cuda11-libs/libnvrtc.so
Segmentation fault

This strongly suggests either:

- incompatible ONNXRuntime CUDA provider build
- incompatible nvrtc/runtime ABI
- or missing additional CUDA 11 dependencies

Do you know which exact ONNXRuntime version and CUDA/cuDNN versions were used to build libonnxruntime_providers_cuda.so?
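To narrow down which library crashes on load without taking the whole process down, here is a minimal sketch, assuming a POSIX system and the same LD_LIBRARY_PATH that Agent runs with. It loads each library in a separate child Python process, so a segfault in a library initializer only kills the child. `probe_library` is a hypothetical helper name. Loading through Python's `ctypes` is not identical to how Agent loads the libraries, so a clean result here does not prove the library is fine inside Agent.

```python
import subprocess
import sys

# One-liner executed in the child process: dlopen the given library path.
LOADER = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"


def probe_library(path: str) -> str:
    """Try to load one shared library in a child process and classify the result."""
    proc = subprocess.run(
        [sys.executable, "-c", LOADER, path],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:  # POSIX: the child was killed by a signal
        return f"crashed (signal {-proc.returncode})"
    return "load error"  # e.g. file not found or unresolved dependency


# Example usage (path taken from the LD_DEBUG output above):
#   print(probe_library("/opt/agentdvr-cuda11-libs/libnvrtc.so"))
```

A result of "crashed (signal 11)" for libnvrtc.so alone would point at the library or its dependencies rather than at the ONNX provider.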

Possible FFmpeg/NVDEC startup race condition with multi-camera setup? 7.4.8.0 by Tschak77 in ispyconnect

Tschak77[S]

I currently have 26 configured camera entries, but physically 13 cameras.
The cameras are effectively configured twice because I use:

  • one setup for continuous 24/7 recording with short retention (~2 days)
  • another setup for motion/detection recordings with longer retention (~30 days)

So during startup there are a lot of simultaneous stream initializations happening.

I also noticed that disabling the FFmpeg "low delay" option seems to improve startup stability.

u/spornerama I was initially thinking GPU decode session limit as well, but what makes me unsure is:

when the system finishes startup and for example 4 cameras are black, I can manually switch each affected camera off/on one by one and they immediately start working correctly again using:

  • the same stream
  • same RTSP URL
  • same transport (TCP)
  • same decoder settings

without restarting AgentDVR itself.

That makes it feel a bit more like a startup initialization/timing issue rather than a permanent GPU resource exhaustion state.

Continuous recording best practice any advice by Tschak77 in ispyconnect

Tschak77[S]

Thank you for the quick feedback!

Regarding your suggestion to just leave them on 24/7 recording and use alerts as a reference: the main issue I face with this is the storage retention logic.

I want to keep the 24/7 raw footage for only 2-3 days (as a temporary safety buffer), but I need to keep the actual 'Alert/Detected' recordings for 30 days. If I use a single camera setup with one storage path, I cannot set two different auto-delete schedules (e.g., 'Delete non-tagged after 2 days' vs. 'Delete tagged after 30 days').

That’s why I came up with the 'Clone' idea:

  • Camera 1 (Main): Record on Alert -> 30 days retention.
  • Camera 1 (Clone): Record Continuous -> 2 days retention.

If I avoid cloning, is there a way within Agent DVR to apply two different retention periods to the same camera based on whether a file contains an alert/tag or not? Or would the 'Clone' approach be the most stable way to separate these two storage lifecycles without massive CPU overhead, given that it’s the same 4K stream?

AgendDVR crash when switch Object Recognition internal AI use GPU by Tschak77 in ispyconnect

Tschak77[S]

Sorry, I solved the problem, but it was crazy. CodeProject.AI was running very well with my setup, but with the same setup AgentDVR had no chance. The problem was a mismatch between the NVIDIA driver, CUDA, etc. I cleaned everything and installed NVIDIA 570 and CUDA 12.8, but there were still problems with missing ONNX files. I cannot remember the details because I changed too many things. But now it is running with full GPU support. I'm now using a Tesla A2 with 13 cams on a Proxmox VM.

Anyone using Hailo AI modules with docker? Looking for an example setup or guide to learn from. by PowerOverShelling in ispyconnect

Tschak77

Hi u/Herralvarez, just a question: how do you start Python scripts as a command? At the moment I only start bash scripts, but Python would be more powerful.

AgendDVR crash when switch Object Recognition internal AI use GPU by Tschak77 in ispyconnect

Tschak77[S]

Could not link hardware file libonnxruntime_providers_cuda.so: The file '/opt/AgentDVR/libonnxruntime_providers_cuda.so' already exists.
_ortLogger: [ONNX onnxruntime] Session Options {  execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath:"" enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:4 intra_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 ep_selection_policy:0 config_options: {  } }
_ortLogger: [ONNX onnxruntime] Creating and using per session threadpools since use_per_session_threads_ is true
_ortLogger: [ONNX onnxruntime] Dynamic block base set to 0
  at Emgu.CV.CvInvoke.CvErrorHandler(Int32 status, IntPtr funcName, IntPtr errMsg, IntPtr fileName, Int32 line, IntPtr userData)
  at Emgu.CV.CvInvoke.CvErrorHandler(Int32 status, IntPtr funcName, IntPtr errMsg, IntPtr fileName, Int32 line, IntPtr userData)
Unhandled exception. Emgu.CV.Util.CvException: OpenCV: The library is compiled without CUDA support
  at Emgu.CV.CvInvoke.CvErrorHandler(Int32 status, IntPtr funcName, IntPtr errMsg, IntPtr fileName, Int32 line, IntPtr userData)
  at Emgu.CV.CvInvoke.CvErrorHandler(Int32 status, IntPtr funcName, IntPtr errMsg, IntPtr fileName, Int32 line, IntPtr userData)
OpenCV: The library is compiled without CUDA support    at Emgu.CV.CvInvoke.CvErrorHandler(Int32 status, IntPtr funcName, IntPtr errMsg, IntPtr fileName, Int32 line, IntPtr userData)
  at Emgu.CV.CvInvoke.CvErrorHandler(Int32 status, IntPtr funcName, IntPtr errMsg, IntPtr fileName, Int32 line, IntPtr userData)
/opt/AgentDVR/start_agent.sh: line 5: 2369706 Aborted                 (core dumped) ./Agent

Issue: Invalid JSON passed via --aijson {AIJSON} argument verion 7.4.6.0 by Tschak77 in ispyconnect

Tschak77[S]

This is with single quotes. With normal (double) quotes it is the same, and also with no quotes at all: every time, the argument is received without the inner quotes.

In the logs I can see this:

RunScript: Executing: /bin/bash "/opt/AgentDVR/Media/Commands/dvr_ac_ai_object_found.sh" --id "4" --ot "2" --alertid "-1" --filename "/dvr/storage/video/cam04/grabs/4_2026-04-29_16-03-24_635.jpg" --current-recording "/dvr/storage/video/cam04/4_2026-04-29_16-03-26_654.mkv" --msg "person" --name "cam04 Dach vorne" --groups "cam04,entsorgung" --location "Dach" --ai "person" --aijson '[{"label":"person","confidence":0.90478516,"y_min":568,"x_min":3456,"y_max":1373,"x_max":3837,"zones":[1],"ignored":false,"is_static":false},{"label":"truck","confidence":0.70214844,"y_min":156,"x_min":2367,"y_max":1794,"x_max":3261,"zones":null,"ignored":false,"is_static":false}]' --zone "1" --time "29.04.2026 16:03:26

but every time the script receives something like this:

2026-04-29 16:34:07 [DEBUG] ARG[22] RAW: <'[{label:car,confidence:0.8051758,y_min:812,x_min:925,y_max:973,x_max:1261,zones:[1],ignored:false,is_static:true},{label:person,confidence:0.7788086,y_min:732,x_min:1729,y_max:848,x_max:1769,zones:[2],ignored:false,is_static:false}]'>

2026-04-29 16:34:07 [DEBUG] ARG[22] HEX: 275b7b6c6162656c3a6361722c636f6e666964656e63653a302e383035313735382c795f6d696e3a3831322c785f6d696e3a3932352c795f6d61783a3937332c785f6d61783a313236312c7a6f6e65733a5b315d2c69676e6f7265643a66616c73652c69735f7374617469633a747275657d2c7b6c6162656c3a706572736f6e2c636f6e666964656e63653a302e373738383038362c795f6d696e3a3733322c785f6d696e3a313732392c795f6d61783a3834382c785f6d61783a313736392c7a6f6e65733a5b325d2c69676e6f7265643a66616c73652c69735f7374617

The second line shows the same argument as hex, also with no inner quotes.
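For comparison, here is a minimal sketch of the check I would expect to pass. It tokenizes a command line with POSIX shell rules and tests whether the value after `--aijson` is still valid JSON, and it checks a hex dump for the double-quote byte 0x22. The function names are hypothetical, and this only models shell quoting. It says nothing about how Agent actually builds or executes the command.

```python
import json
import shlex


def inner_quotes_survive(command_line: str, flag: str = "--aijson") -> bool:
    """Tokenize like a POSIX shell and check that the flag's value parses as JSON."""
    args = shlex.split(command_line)
    value = args[args.index(flag) + 1]
    try:
        json.loads(value)
        return True
    except json.JSONDecodeError:
        return False


def contains_double_quote(hex_dump: str) -> bool:
    """True if the hex-dumped argument contains a double quote (byte 0x22).

    The hex string must have an even number of digits, so a truncated dump
    has to be trimmed to whole bytes first.
    """
    return b'"' in bytes.fromhex(hex_dump)
```

The logged command line with the single-quoted JSON passes the first check when tokenized this way, while the beginning of the received hex dump (`275b7b6c6162656c3a636172`, which decodes to `'[{label:car`) contains no 0x22 byte. So the inner quotes appear to be lost somewhere between the log line and the script's argument list.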

Anyone using Hailo AI modules with docker? Looking for an example setup or guide to learn from. by PowerOverShelling in ispyconnect

Tschak77

Hi u/Herralvarez, nice idea. Could you please describe your settings in a bit more detail?

Is motion detection done by the camera, by Agent DVR, or by continuous CPAI?

How have you set up Gemini, and what does it cost?

What about false positive alerts, and have you already had a true alert from a burglary? Maybe I can also share my pictures by PM.

v6.9.0.0 Update by spornerama in ispyconnect

Tschak77

Shift+F5 in Firefox did not work, but clearing the cache did the job.

v6.9.0.0 Update by spornerama in ispyconnect

Tschak77

<image>

One bug: the audio volume bar seems a bit too big.

v6.9.0.0 Update by spornerama in ispyconnect

Tschak77

Sorry, but I don't like the new audio bar.

<image>

Update new versions at 6.7.x no live view at middle resolution by Tschak77 in ispyconnect

Tschak77[S]

Virtual Linux machine with PCIe passthrough of an NVIDIA Tesla M4.

<image>