View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0001406||bareos-core||file daemon||public||2021-12-08 07:15||2022-01-31 09:31|
|Status||closed||Resolution||unable to reproduce|
|Platform||64bit||OS||Windows||OS Version||Server 2016|
|Summary||0001406: file daemon crashes on Windows Server 2016|
|Description||Sometimes the file daemon crashes on Windows Server 2016. This happened two times in the last month.|
Patch level of the Server is from Microsoft October 2021 Patchday.
File daemon Version: 19.2.7 (16 April 2020) VSS Linux Cross-compile Win64
Microsoft Windows Server 2012 Standard Edition (build 9200), 64-bit
Error in the Windows event log:
Event 1000, Application Error
Faulting application name: bareos-fd.exe, version: 0.0.0.0, time stamp: 0x5e98651f
Faulting module name: libbareos.dll, version: 0.0.0.0, time stamp: 0x5e9864a8
Faulting process ID: 0x3db8
Faulting application start time: 0x01d7e4e8765df91e
Faulting application path: C:\Program Files\Bareos\bareos-fd.exe
Faulting module path: C:\Program Files\Bareos\libbareos.dll
Faulting package full name:
Faulting package-relative application ID:
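The hexadecimal values in the event log entry can be decoded to cross-check them against the reported daemon version. The following is a minimal sketch, assuming that the application and module time stamps are PE link time stamps (seconds since 1970-01-01 UTC) and that the start time is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC). The function names are hypothetical helpers and not part of Bareos.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Event 1000 reports the PE header time stamp as Unix seconds.
def pe_timestamp_to_utc(value: int) -> datetime:
    """Convert a PE link time stamp to an aware UTC datetime."""
    return datetime.fromtimestamp(value, tz=timezone.utc)

# Assumption: the process start time is a FILETIME (100 ns ticks since 1601).
def filetime_to_utc(value: int) -> datetime:
    """Convert a Windows FILETIME to an aware UTC datetime."""
    epoch_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch_1601 + timedelta(microseconds=value // 10)

if __name__ == "__main__":
    # Values copied from the event log entry above.
    print("bareos-fd.exe built:", pe_timestamp_to_utc(0x5E98651F).isoformat())
    print("libbareos.dll built:", pe_timestamp_to_utc(0x5E9864A8).isoformat())
    print("process started:    ", filetime_to_utc(0x01D7E4E8765DF91E).isoformat())
```

Under these assumptions the bareos-fd.exe time stamp decodes to 16 April 2020, which agrees with the version string "19.2.7 (16 April 2020)" quoted above.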
|Tags||No tags attached.|
Would you like to help us understand what is going on?
Could you describe your FD configuration in more detail (config files, for example; you can blank out the passwords),
as well as the job and fileset involved, plugins used, etc.?
Is this happening when the FD is doing something special, and how often does it occur (number of times per day, week, month)?
Could you try to increase the debug level to 200 on the client to get a nicely timestamped trace, and report it here?
file daemon configuration:
Name = igms00-fd
Maximum Concurrent Jobs = 20
Name = bareos-dir
Password = "xxx"
Description = "Allow the configured Director to access this file daemon."
These are the fileset and job during which the FD crashed last time.
When the crash happened, the job had been running for about 36 of an estimated 96 hours. The total backup volume of a successful job would have been about 18 TB in several million files.
Name = "FileSetIGMS00_bilddaten"
Enable VSS = yes
Signature = MD5
Drive Type = fixed
IgnoreCase = yes
# if supported by the OS, the read time won't be adapted
# this would generate a bunch of writes for no reason on the client machine.
noatime = yes
# If enabled, the Client will check the size and age of each file after its backup
# to see whether it changed during the backup. If the time or size mismatches, an error is raised.
# In general, it is recommended to use this option.
checkfilechanges = yes
WildFile = "[A-Z]:/pagefile.sys"
WildDir = "[A-Z]:/RECYCLER"
WildDir = "[A-Z]:/$RECYCLE.BIN"
WildDir = "[A-Z]:/System Volume Information"
WildDir = "[A-Z]:/tmp/bareos-restores"
WildDir = "[A-Z]:/Temp"
Exclude = yes
File = "d:/Bilddaten"
# Don’t add trailing /
File = "d:/Bilddaten/_archivieren"
File = "d:/Bilddaten/_restored"
Name = "filebackup_bilddaten-igms00-fd"
JobDefs = "DefaultFileJob"
Pool = Bilddaten
# Pools must be specified explicitly; otherwise the pools from "DefaultFileJob" are used!
Full Backup Pool = Bilddaten
Differential Backup Pool = Bilddaten
Incremental Backup Pool = Bilddaten
Client = "igms00-fd"
FileSet = "FileSetIGMS00_bilddaten"
Schedule = "YearlyCycle"
Enabled = yes
The previous crash happened while five jobs were running in parallel.
The fileset and job configurations were different but similar to the one above.
I started the filedaemon with option "-d 200", see screenshot attached.
Is this the correct syntax for the Windows version of the file daemon?
How can I verify that the service is running with debug level 200?
>Is this happening when the FD is doing something special, and how often does it occur (number of times per day, week, month)?
Nothing special was done. The jobs and filesets running didn't change for months.
The crash happened two times, on 2021-12-08 and 2021-11-26.
These were the only occurrences so far.
I wouldn't have changed the way the daemon is started (especially on nitpicky Windows); the command given in the previous comment allows you to set and remove the debug level dynamically.
If you want to do so, you can refer to the documentation.
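The command referred to here is not preserved in this thread. As an illustration only, the sketch below assumes it is the bconsole `setdebug` command and builds the command line for this client. The helper names are hypothetical, and `run_in_bconsole` assumes a `bconsole` binary that is on the PATH and already configured to reach the Director.

```python
import subprocess

def build_setdebug(client: str, level: int, trace: bool = True) -> str:
    """Build a bconsole setdebug command for one client (assumed syntax)."""
    if level < 0:
        raise ValueError("debug level must not be negative")
    return f"setdebug client={client} level={level} trace={1 if trace else 0}"

def run_in_bconsole(command: str, bconsole: str = "bconsole") -> str:
    """Pipe a single command into bconsole and return its output.

    Assumption: bconsole is installed and can connect to the Director.
    """
    result = subprocess.run(
        [bconsole],
        input=command + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Only prints the commands; call run_in_bconsole() to actually send them.
    print(build_setdebug("igms00-fd", 200))             # switch tracing on
    print(build_setdebug("igms00-fd", 0, trace=False))  # switch it off again
```

Setting the level back to 0 afterwards stops the trace output without restarting the service.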
As the crash occurred twice quite recently, it would be interesting to check whether any traces were generated.
Could you check whether you can find files with the .traceback and .bactrace extensions on the system (normally they are located in the Bareos working directory, but I can't be 100% sure under Windows)?
If so, could you please attach them here.
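The search for these files can be scripted. The following is a minimal sketch, assuming the files sit somewhere below a directory you pass in; the function name is a hypothetical helper, and the location of the Bareos working directory on a given Windows installation has to be looked up locally.

```python
from pathlib import Path

# File extensions named in the note above.
CRASH_SUFFIXES = {".traceback", ".bactrace"}

def find_crash_traces(root):
    """Return all files below root whose extension marks a crash trace."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in CRASH_SUFFIXES
    )

if __name__ == "__main__":
    import sys
    # Usage: python find_traces.py <bareos-working-directory>
    for path in find_crash_traces(sys.argv[1]):
        print(path)
```

Pointing the script at the working directory rather than a whole drive avoids walking protected system folders.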
Ping? Any news on this?
Sorry, my colleague is out of office. Unfortunately, I have no spare time to help you with this. My colleague will be back after Christmas.
>If you want to do so you can refer to the documentation
Thank you for pointing me to the right documentation chapter. With the new documentation system it is very hard to find information on a specific topic, since the search is not working well; see bug 0001351.
This helped me to start the Windows service in debug mode. It showed that the method I used was not working correctly.
>As the crash occurred twice quite recently, it would be interesting to check whether any traces were generated.
I could not find any .traceback or .bactrace files.
So far the crash did not occur again. I will inform you as soon as it happens again and since the debug mode is working now I hopefully will also be able to provide a trace file.
Hello, beware that using a debug level may create really large *trace* files.
Have a look from time to time, and rotate them manually if they become too big.
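The periodic check can be automated. The following is a minimal sketch, assuming the trace files end in `.trace` and live directly in one directory; the function name and the size limit are hypothetical. It only reports oversized files and leaves the rotation itself to the administrator, since a running daemon may keep the file open.

```python
from pathlib import Path

def oversized_traces(directory, limit_bytes):
    """List (file name, size) for *.trace files larger than limit_bytes."""
    hits = []
    for path in Path(directory).glob("*.trace"):
        size = path.stat().st_size
        if size > limit_bytes:
            hits.append((path.name, size))
    # Largest files first, so the worst offender is at the top.
    return sorted(hits, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    import sys
    # Usage: python check_traces.py <trace-directory> <limit-in-MiB>
    limit = int(sys.argv[2]) * 1024 * 1024
    for name, size in oversized_traces(sys.argv[1], limit):
        print(f"{name}: {size / (1024 * 1024):.1f} MiB")
```

Run from a scheduled task, such a check gives a warning before the volume holding the trace files fills up.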
Haven't you had any crash, and therefore a trace, in the past month?
Maybe it would also be time to upgrade to the latest stable 21 release.
Without a crash or debug output, keeping this ticket open doesn't make much sense.
>Hello, beware that using a debug level may create really large *trace* files.
>Have a look from time to time, and rotate them manually if they become too big.
I had already learned that the hard way before your hint, when our server ran out of disk space ;)
Unfortunately I had to stop tracing because of that. Our server does not have enough free space on C: to store the trace from a full backup job.
But it wasn't a big loss, since the crash did not reappear.
>Maybe it would also be time to upgrade to the latest stable 21 release.
I agree - I am already planning for that.
>Without a crash or debug output, keeping this ticket open doesn't make much sense.
Yes, you can close the ticket.
Many thanks for your effort!
|2021-12-08 07:15||Int||New Issue|
|2021-12-09 10:31||bruno-at-bareos||Note Added: 0004385|
|2021-12-10 13:11||Int||File Added: bareos-fd_debug200.png|
|2021-12-10 13:11||Int||Note Added: 0004388|
|2021-12-10 13:16||Int||Note Added: 0004389|
|2021-12-13 17:00||bruno-at-bareos||Note Added: 0004391|
|2021-12-22 16:33||bruno-at-bareos||Note Added: 0004407|
|2021-12-22 16:42||Int||Note Added: 0004410|
|2021-12-27 09:38||Int||Note Added: 0004420|
|2022-01-10 10:43||bruno-at-bareos||Assigned To||=> bruno-at-bareos|
|2022-01-10 10:43||bruno-at-bareos||Status||new => assigned|
|2022-01-10 10:44||bruno-at-bareos||Status||assigned => feedback|
|2022-01-10 10:44||bruno-at-bareos||Note Added: 0004461|
|2022-01-27 16:24||bruno-at-bareos||Note Added: 0004490|
|2022-01-28 07:56||Int||Note Added: 0004492|
|2022-01-28 07:56||Int||Status||feedback => assigned|
|2022-01-31 09:31||bruno-at-bareos||Status||assigned => closed|
|2022-01-31 09:31||bruno-at-bareos||Resolution||open => unable to reproduce|
|2022-01-31 09:31||bruno-at-bareos||Note Added: 0004495|