0001406: file daemon crashes on Windows Server 2016

ID	Project	Category	View Status	Date Submitted	Last Update

0001406	bareos-core	file daemon	public	2021-12-08 07:15	2022-01-31 09:31

Reporter	Int	Assigned To	bruno-at-bareos
Priority	normal	Severity	crash	Reproducibility	sometimes
Status	closed	Resolution	unable to reproduce
Platform	64bit	OS	Windows	OS Version	Server 2016
Product Version	19.2.11

Summary	0001406: file daemon crashes on Windows Server 2016
Description	Sometimes the file daemon crashes on Windows Server 2016. This happened twice in the last month. The server's patch level is the Microsoft October 2021 Patchday.
File daemon version: 19.2.7 (16 April 2020) VSS Linux Cross-compile Win64 Microsoft Windows Server 2012 Standard Edition (build 9200), 64-bit
Error in the Windows event log (translated from German):
Event 1000, Application Error
Faulting application name: bareos-fd.exe, version: 0.0.0.0, time stamp: 0x5e98651f
Faulting module name: libbareos.dll, version: 0.0.0.0, time stamp: 0x5e9864a8
Exception code: 0xc0000005
Fault offset: 0x0000000000022e8b
Faulting process ID: 0x3db8
Faulting application start time: 0x01d7e4e8765df91e
Faulting application path: C:\Program Files\Bareos\bareos-fd.exe
Faulting module path: C:\Program Files\Bareos\libbareos.dll
Report ID: 37b0a4b3-d23d-4cf0-9149-b94dce2d6d2d
Faulting package full name:
Faulting package-relative application ID:
Tags	No tags attached.
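The hexadecimal fields in the event log can be decoded to cross-check the report. The following is a minimal Python sketch, not part of the original report. It assumes that the two "time stamp" fields are PE link timestamps (seconds since 1970-01-01 UTC) and that the application start time is a Windows FILETIME (100 ns ticks since 1601-01-01 UTC). All function and constant names are illustrative and not part of Bareos.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the event log in the description.
LINK_TIMESTAMP_FD = 0x5E98651F       # bareos-fd.exe
LINK_TIMESTAMP_LIB = 0x5E9864A8      # libbareos.dll
PROCESS_START = 0x01D7E4E8765DF91E   # start time of the faulting application
EXCEPTION_CODE = 0xC0000005

# Only the code seen in this report; extend as needed.
KNOWN_EXCEPTIONS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def pe_timestamp_to_utc(value: int) -> datetime:
    """Interpret a PE header time stamp as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


def filetime_to_utc(value: int) -> datetime:
    """Interpret a 64-bit FILETIME as 100 ns ticks since 1601-01-01 UTC (assumption)."""
    epoch_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch_1601 + timedelta(microseconds=value // 10)


def describe_exception(code: int) -> str:
    """Return a symbolic name for a known exception code, else the hex value."""
    return KNOWN_EXCEPTIONS.get(code, hex(code))


if __name__ == "__main__":
    print("bareos-fd.exe linked:", pe_timestamp_to_utc(LINK_TIMESTAMP_FD).isoformat())
    print("libbareos.dll linked:", pe_timestamp_to_utc(LINK_TIMESTAMP_LIB).isoformat())
    print("process started:     ", filetime_to_utc(PROCESS_START).isoformat())
    print("exception:           ", describe_exception(EXCEPTION_CODE))
```

Under these assumptions the bareos-fd.exe time stamp decodes to 2020-04-16 UTC, which agrees with the reported file daemon build 19.2.7 (16 April 2020), and 0xc0000005 is the Windows access-violation status code.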

bruno-at-bareos 2021-12-09 10:31 manager ~0004385	Would you like to help us understand what's going on? Could you describe your FD configuration in more detail (config files, for example; you can blank out passwords), as well as the job and fileset involved, plugins used, etc.? Is this happening when the FD is doing something special, and how often does it occur (number of times per day, week, month)? Could you try to increase the debug level to 200 on the client to get a nicely timestamped trace and report it here?

Int 2021-12-10 13:11 reporter ~0004388	File daemon configuration:

myself.conf

Client {
  Name = igms00-fd
  Maximum Concurrent Jobs = 20
}

bareos-dir.conf

Director {
  Name = bareos-dir
  Password = "xxx"
  Description = "Allow the configured Director to access this file daemon."
}

This is the fileset and job during which the FD crashed last time. When the crash happened, the job had been running for about 36 of an estimated 96 hours. The total backup volume of a successful job would have been about 18 TB in several million files.

Fileset:

FileSet {
  Name = "FileSetIGMS00_bilddaten"
  Enable VSS = yes
  Include {
    Options {
      Signature = MD5
      Drive Type = fixed
      IgnoreCase = yes
      # if supported by the OS, the read time won't be adapted
      # this would generate a bunch of writes for no reason on the client machine.
      noatime = yes
      # If enabled, the Client will check size, age of each file after their backup
      # to see if they have changed during backup. If time or size mismatch, an error will raise.
      # In general, it is recommended to use this option.
      checkfilechanges = yes
      WildFile = "[A-Z]:/pagefile.sys"
      WildDir = "[A-Z]:/RECYCLER"
      WildDir = "[A-Z]:/$RECYCLE.BIN"
      WildDir = "[A-Z]:/System Volume Information"
      WildDir = "[A-Z]:/tmp/bareos-restores"
      WildDir = "[A-Z]:/Temp"
      Exclude = yes
    }
    File = "d:/Bilddaten"
  }
  Exclude {
    # Don't add trailing /
    File = "d:/Bilddaten/_archivieren"
    File = "d:/Bilddaten/_restored"
  }
}

Job:

Job {
  Name = "filebackup_bilddaten-igms00-fd"
  JobDefs = "DefaultFileJob"
  Pool = Bilddaten
  # Pools must be specified explicitly, otherwise the pools from "DefaultFileJob" are used!
  Full Backup Pool = Bilddaten
  Differential Backup Pool = Bilddaten
  Incremental Backup Pool = Bilddaten
  Client = "igms00-fd"
  FileSet = "FileSetIGMS00_bilddaten"
  Schedule = "YearlyCycle"
  Enabled = yes
}

The previous crash happened while five jobs were running in parallel. The fileset and job configurations were different from, but similar to, the ones above.
I started the file daemon with the option "-d 200", see the attached screenshot. Is this the correct syntax for the Windows version of the file daemon? How can I verify that the service is running with debug level 200?
Attachment: bareos-fd_debug200.png (10,642 bytes)
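To get a feel for which paths the WildFile and WildDir lines in the Options block of the fileset would select for exclusion, the patterns can be tried out offline. This is a rough Python sketch, not Bareos's own matching code: it approximates the matching with Python's fnmatch rules, applies them case-insensitively to mirror IgnoreCase = yes, and ignores the distinction between WildFile and WildDir. The function name and the sample paths are hypothetical.

```python
import fnmatch
import re

# Patterns copied from the Options block of the fileset in this note.
EXCLUDE_PATTERNS = [
    "[A-Z]:/pagefile.sys",
    "[A-Z]:/RECYCLER",
    "[A-Z]:/$RECYCLE.BIN",
    "[A-Z]:/System Volume Information",
    "[A-Z]:/tmp/bareos-restores",
    "[A-Z]:/Temp",
]


def is_excluded(path: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """Return True if the path matches any pattern, ignoring case.

    Approximation only: shell-style wildcards via fnmatch.translate,
    compiled with re.IGNORECASE to mimic IgnoreCase = yes.
    """
    for pattern in patterns:
        regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
        if regex.match(path):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical sample paths, not taken from the reporter's system.
    for sample in ["c:/pagefile.sys", "d:/$RECYCLE.BIN", "d:/Bilddaten/scan.tif"]:
        print(sample, "->", "excluded" if is_excluded(sample) else "kept")
```

Note that the two directories listed in the separate Exclude block (d:/Bilddaten/_archivieren and d:/Bilddaten/_restored) are plain File entries, not wildcards, so they are not covered by this sketch.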

Int 2021-12-10 13:16 reporter ~0004389	>Is this happening when the FD is doing something special, what the occurences (number of time per day, week, month)
Nothing special was done. The jobs and filesets that were running hadn't changed for months. The crash happened twice, on 2021-12-08 and 2021-11-26. These were the only occurrences so far.

bruno-at-bareos 2021-12-13 17:00 manager ~0004391	I wouldn't have changed how the daemon is started (especially on nitpicky Windows); the command given in the previous comment allows you to set and remove the debug level dynamically. If you want to do so anyway, you can refer to the documentation: https://docs.bareos.org/master/TasksAndConcepts/TheWindowsVersionOfBareos.html?highlight=windows#windows-service
As the crash occurred twice quite recently, it would be interesting to check whether any traces were generated. Could you check whether you can find files with the .traceback and .bactrace extensions on the system (normally they are located in the Bareos working directory, but I can't be 100% sure under Windows)? If so, could you please attach them here?

bruno-at-bareos 2021-12-22 16:33 manager ~0004407	Ping? Any news on this?

Int 2021-12-22 16:42 reporter ~0004410	Sorry, my colleague is out of office. Unfortunately, I have no spare time to help you with this. My colleague will be back after Christmas.

Int 2021-12-27 09:38 reporter ~0004420	>If you want to do so you can refer to the documentation
>https://docs.bareos.org/master/TasksAndConcepts/TheWindowsVersionOfBareos.html?highlight=windows#windows-service
Thank you for pointing me to the right documentation chapter - with the new documentation system it is very hard to find information on a specific topic, since the search is not working well (see bug 0001351). This helped me to start the Windows service in debug mode. It also showed that the method I had used was not working correctly.
>As the crash occur two times quite recently, it would be interesting to check if there's any traces that would have been generated.
I could not find any .traceback or .bactrace files. So far the crash has not occurred again. I will inform you as soon as it happens again, and since debug mode is working now, I will hopefully also be able to provide a trace file.

bruno-at-bareos 2022-01-10 10:44 manager ~0004461	Hello, beware that running with a debug level may create really large trace files. Have a look from time to time, and rotate them manually if they become too big.
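The "have a look from time to time" advice can be partly automated. Below is a minimal Python sketch, assuming a scheduled task runs it on the client. It only reports when the trace file exceeds a size limit or when free space on its volume drops below a threshold; it does not rotate anything. The trace path and both thresholds are hypothetical placeholders, and the function name is illustrative.

```python
import os
import shutil


def trace_warnings(trace_path: str, max_trace_bytes: int, min_free_bytes: int) -> list:
    """Return human-readable warnings about trace size and free disk space."""
    warnings = []

    # Size of the trace file itself.
    size = os.path.getsize(trace_path)
    if size > max_trace_bytes:
        warnings.append(f"trace file is {size} bytes (limit {max_trace_bytes})")

    # Free space on the volume that holds the trace file.
    volume = os.path.dirname(os.path.abspath(trace_path))
    free = shutil.disk_usage(volume).free
    if free < min_free_bytes:
        warnings.append(f"only {free} bytes free on {volume} (minimum {min_free_bytes})")

    return warnings


if __name__ == "__main__":
    # Hypothetical location and limits; adjust to the actual working directory.
    path = r"C:\ProgramData\Bareos\working\example-fd.trace"
    if os.path.exists(path):
        for line in trace_warnings(path, max_trace_bytes=5 * 1024**3,
                                   min_free_bytes=20 * 1024**3):
            print("WARNING:", line)
```

Reporting rather than renaming is a deliberate choice here: whether a trace file that the running service still has open can be moved safely is not established in this thread.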

bruno-at-bareos 2022-01-27 16:24 manager ~0004490	Hello again. Have you had no crash, and therefore no trace, in the past month? It may also be time to upgrade to the latest stable 21 release. Without a crash or a debug trace, keeping this ticket open doesn't make much sense.

Int 2022-01-28 07:56 reporter ~0004492	Hello,
>Hello, beware that using a debug level may create really large trace file.
>Have a look from time to time, rotate them manually if they became to big.
I had already learned that the hard way before your hint, when your server ran out of disk space ;) Unfortunately, I had to stop tracing because of that. Your server does not have enough free space on C: to store the trace from a full backup job. But it wasn't a big loss, since the crash did not reappear.
>Maybe also it would be the time to upgrade to last stable 21 release.
I agree - I am already planning for that.
>Without crash or debug, keeping this ticket open doesn't make too much sense.
Yes, you can close the ticket. Many thanks for your effort!

bruno-at-bareos 2022-01-31 09:31 manager ~0004495	Not reproducible.

Date Modified	Username	Field	Change
2021-12-08 07:15	Int	New Issue
2021-12-09 10:31	bruno-at-bareos	Note Added: 0004385
2021-12-10 13:11	Int	File Added: bareos-fd_debug200.png
2021-12-10 13:11	Int	Note Added: 0004388
2021-12-10 13:16	Int	Note Added: 0004389
2021-12-13 17:00	bruno-at-bareos	Note Added: 0004391
2021-12-22 16:33	bruno-at-bareos	Note Added: 0004407
2021-12-22 16:42	Int	Note Added: 0004410
2021-12-27 09:38	Int	Note Added: 0004420
2022-01-10 10:43	bruno-at-bareos	Assigned To	=> bruno-at-bareos
2022-01-10 10:43	bruno-at-bareos	Status	new => assigned
2022-01-10 10:44	bruno-at-bareos	Status	assigned => feedback
2022-01-10 10:44	bruno-at-bareos	Note Added: 0004461
2022-01-27 16:24	bruno-at-bareos	Note Added: 0004490
2022-01-28 07:56	Int	Note Added: 0004492
2022-01-28 07:56	Int	Status	feedback => assigned
2022-01-31 09:31	bruno-at-bareos	Status	assigned => closed
2022-01-31 09:31	bruno-at-bareos	Resolution	open => unable to reproduce
2022-01-31 09:31	bruno-at-bareos	Note Added: 0004495
