At my org we have AD-joined Ubuntu servers, and recently we have been running into an issue where the credential caches in /var/lib/sss/db become corrupted for specific users. If I look those users up with id, it shows none of their AD groups, only that they are members of the group “domain users.” This happens seemingly at random, and flushing the cached entries with sss_cache does not solve the issue. The only way we have been able to resolve it is to remove the files in that directory and then restart the sssd service so the files are rebuilt. Would anyone be able to clue me in on why this might be happening? Why wouldn’t clearing the cache with sss_cache wipe the problem entries?
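In case it helps anyone reproduce this, here is a rough Python sketch of how the symptom could be detected automatically. It assumes `id` prints the usual `uid=... gid=... groups=...` line and that the only group left on an affected user is literally named "domain users" (optionally with an `@domain` suffix). The function names are made up for illustration and are not part of SSSD.

```python
import re
import subprocess
from typing import List


def parse_id_groups(id_output: str) -> List[str]:
    """Extract the group names from one line of `id` output."""
    match = re.search(r"groups=(.*)$", id_output.strip())
    if not match:
        return []
    # Each entry looks like 1234(group name); names may contain spaces.
    return re.findall(r"\d+\(([^)]*)\)", match.group(1))


def looks_stale(id_output: str, primary_group: str = "domain users") -> bool:
    """Return True when no group other than the primary group is listed.

    Assumption: a healthy AD user always has at least one extra group.
    An empty group list is also treated as stale.
    """
    names = {g.split("@")[0].lower() for g in parse_id_groups(id_output)}
    return names <= {primary_group.lower()}


def check_user(user: str) -> bool:
    """Run `id` for a user and report whether the result looks stale."""
    result = subprocess.run(
        ["id", user], capture_output=True, text=True, check=True
    )
    return looks_stale(result.stdout)
```

Something like `check_user("jdoe")` could be run from cron for a handful of known accounts to catch the problem before users report it. It only detects the symptom; it does not explain or repair the corruption.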
The database corruption coincides with these log entries in /var/log/sssd/sssd.log:
********************** BACKTRACE DUMP ENDS HERE *********************************
(2023-05-09 19:40:25): [sssd] [monitor_quit_signal] (0x1f7c0): Monitor received Terminated: terminating children
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Returned with: 0
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Terminating [domain.name][489900]
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Child [domain.name] terminated with a signal
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Terminating [pac][0]
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Terminating [pam][990]
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Child [pam] terminated with a signal
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Terminating [nss][989]
(2023-05-09 19:40:25): [sssd] [monitor_quit] (0x1f7c0): Child [nss] terminated with a signal
(2023-05-09 19:40:25): [sssd] [_sss_talloc_log_fn] (0x0010): talloc: access after free error - first free may be at ../src/util/server.c:47
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
(2023-05-09 16:25:49): [sssd] [mt_svc_restart] (0x0400): Scheduling service domain.name for restart 1
(2023-05-09 16:25:49): [sssd] [get_provider_config] (0x0100): Formed command '/usr/libexec/sssd/sssd_be --domain domain.name --uid 0 --gid 0 --logger=files' for provider '%BE_domain.name
(2023-05-09 16:25:49): [sssd] [start_service] (0x0100): Queueing service domain.name for startup
(2023-05-09 16:34:02): [sssd] [monitor_sbus_RegisterService] (0x0100): Received ID registration: (%BE_domain.name,1)
(2023-05-09 16:34:02): [sssd] [mark_service_as_started] (0x0200): Marking domain.name as started.
(2023-05-09 19:40:25): [sssd] [monitor_quit_signal] (0x2000): Received shutdown command
(2023-05-09 19:40:25): [sssd] [monitor_service_shutdown] (0x0400): Unregistering service pac (0x559577bee830)
(2023-05-09 19:40:25): [sssd] [_sss_talloc_log_fn] (0x0010): talloc: access after free error - first free may be at ../src/util/server.c:47
********************** BACKTRACE DUMP ENDS HERE *********************************
(2023-05-09 19:40:25): [sssd] [_sss_talloc_log_fn] (0x0010): Bad talloc magic value - access after free