
ID: 0001406
Project: bareos-core
Category: file daemon
View Status: public
Last Update: 2022-01-31 09:31
Reporter: Int
Assigned To: bruno-at-bareos
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: closed
Resolution: unable to reproduce
Platform: 64bit
OS: Windows
OS Version: Server 2016
Product Version: 19.2.11
Summary: 0001406: file daemon crashes on Windows Server 2016
Description: Sometimes the file daemon crashes on Windows Server 2016. This happened twice in the last month.
The server's patch level is from the Microsoft October 2021 Patch Day.

File daemon Version: 19.2.7 (16 April 2020) VSS Linux Cross-compile Win64
Microsoft Windows Server 2012 Standard Edition (build 9200), 64-bit

Error in the Windows event log:

Event 1000, Application Error

Faulting application name: bareos-fd.exe, version: 0.0.0.0, time stamp: 0x5e98651f
Faulting module name: libbareos.dll, version: 0.0.0.0, time stamp: 0x5e9864a8
Exception code: 0xc0000005
Fault offset: 0x0000000000022e8b
Faulting process id: 0x3db8
Faulting application start time: 0x01d7e4e8765df91e
Faulting application path: C:\Program Files\Bareos\bareos-fd.exe
Faulting module path: C:\Program Files\Bareos\libbareos.dll
Report Id: 37b0a4b3-d23d-4cf0-9149-b94dce2d6d2d
Faulting package full name:
Faulting package-relative application ID:
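Some raw values in this event can be decoded to cross-check the report. Following the usual Windows conventions, the two module time stamps (0x5e98651f, 0x5e9864a8) are PE link times in seconds since 1970, the process start time (0x01d7e4e8765df91e) is a FILETIME counting 100-nanosecond intervals since 1601-01-01 UTC, and exception code 0xc0000005 is STATUS_ACCESS_VIOLATION. The minimal Python sketch below assumes those interpretations; the helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100 ns ticks from this epoch.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Only the code seen in this report; extend as needed.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME value (100 ns ticks since 1601-01-01 UTC) to UTC."""
    # 10 ticks of 100 ns make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def pe_timestamp_to_datetime(stamp: int) -> datetime:
    """Convert a PE header TimeDateStamp (seconds since 1970-01-01 UTC) to UTC."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


if __name__ == "__main__":
    print("bareos-fd.exe linked:", pe_timestamp_to_datetime(0x5E98651F))
    print("libbareos.dll linked:", pe_timestamp_to_datetime(0x5E9864A8))
    print("process started:", filetime_to_datetime(0x01D7E4E8765DF91E))
    print("exception:", NTSTATUS_NAMES.get(0xC0000005, "unknown"))
```

Under these assumptions both link times fall on 16 April 2020, consistent with the 19.2.7 build date quoted above, and the process start time falls on 29 November 2021 (UTC).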
Tags: No tags attached.

Activities

bruno-at-bareos

2021-12-09 10:31

developer   ~0004385

Would you like to help us understand what's going on?

Could you describe your FD configuration in a bit more detail (config files, for example; you can blank out the passwords),
as well as the job and fileset involved, plugins used, etc.?
Does this happen when the FD is doing something special? How often does it occur (number of times per day, week, month)?

Could you try increasing the debug level to 200 on the client to get a nice timestamped trace, and report it here?

Int

2021-12-10 13:11

reporter   ~0004388

file daemon configuration:
myself.conf
Client {
  Name = igms00-fd
  Maximum Concurrent Jobs = 20
}
bareos-dir.conf
Director {
  Name = bareos-dir
  Password = "xxx"
  Description = "Allow the configured Director to access this file daemon."
}

This is the fileset and job during which the FD crashed last time.
When the crash happened, the job had been running for about 36 hours of an estimated 96 hours. The total backup volume of a successful job would have been about 18 TB in several million files.

Fileset:
FileSet {
  Name = "FileSetIGMS00_bilddaten"
  Enable VSS = yes
  Include {
    Options {
      Signature = MD5
      Drive Type = fixed
      IgnoreCase = yes
      # if supported by the OS, the read time won't be adapted
      # this would generate a bunch of writes for no reason on the client machine.
      noatime = yes
      # If enabled, the Client will check the size and age of each file after its backup
      # to see if it has changed during the backup. If the time or size does not match, an error is raised.
      # In general, it is recommended to use this option.
      checkfilechanges = yes

      WildFile = "[A-Z]:/pagefile.sys"
      WildDir = "[A-Z]:/RECYCLER"
      WildDir = "[A-Z]:/$RECYCLE.BIN"
      WildDir = "[A-Z]:/System Volume Information"
      WildDir = "[A-Z]:/tmp/bareos-restores"
      WildDir = "[A-Z]:/Temp"
      Exclude = yes
    }
    File = "d:/Bilddaten"
  }
  Exclude {
    # Don’t add trailing /
    File = "d:/Bilddaten/_archivieren"
    File = "d:/Bilddaten/_restored"
  }
}
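To see which paths the Options and Exclude blocks above are meant to skip, the following Python sketch approximates the matching with the standard fnmatch module. This is an assumption for illustration: Bareos's own wildcard matcher and its handling of IgnoreCase may differ in detail, and the helper names are hypothetical.

```python
import fnmatch

# Patterns copied from the Options block (Exclude = yes, IgnoreCase = yes).
WILD_FILES = ["[A-Z]:/pagefile.sys"]
WILD_DIRS = [
    "[A-Z]:/RECYCLER",
    "[A-Z]:/$RECYCLE.BIN",
    "[A-Z]:/System Volume Information",
    "[A-Z]:/tmp/bareos-restores",
    "[A-Z]:/Temp",
]
# Directory trees listed in the Exclude block.
EXCLUDED_TREES = ["d:/Bilddaten/_archivieren", "d:/Bilddaten/_restored"]


def _wild_match(pattern: str, path: str) -> bool:
    # Approximates IgnoreCase = yes by lower-casing both sides.
    return fnmatch.fnmatchcase(path.lower(), pattern.lower())


def is_excluded(path: str, is_dir: bool) -> bool:
    """Rough approximation of whether the FileSet above would skip `path`."""
    lowered = path.lower()
    # Exclude block: the listed directory and everything below it.
    for tree in EXCLUDED_TREES:
        tree = tree.lower()
        if lowered == tree or lowered.startswith(tree + "/"):
            return True
    # Options block: WildDir applies to directories, WildFile to files.
    patterns = WILD_DIRS if is_dir else WILD_FILES
    return any(_wild_match(p, path) for p in patterns)
```

In this approximation the WildFile/WildDir patterns are anchored at a drive root, so below d:/Bilddaten only the two Exclude trees change the result.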

Job:
Job {
  Name = "filebackup_bilddaten-igms00-fd"
  JobDefs = "DefaultFileJob"
  Pool = Bilddaten
  # Pools must be specified explicitly, otherwise the pools from "DefaultFileJob" are used!
  Full Backup Pool = Bilddaten
  Differential Backup Pool = Bilddaten
  Incremental Backup Pool = Bilddaten
  Client = "igms00-fd"
  FileSet = "FileSetIGMS00_bilddaten"
  Schedule = "YearlyCycle"

  Enabled = yes
}

The crash before that happened while five jobs were running in parallel.
Their fileset and job configurations were different from, but similar to, the one above.


I started the file daemon with the option "-d 200", see the attached screenshot.
Is this the correct syntax for the Windows version of the file daemon?
How can I verify that the service is running with debug level 200?
bareos-fd_debug200.png (10,642 bytes)   

Int

2021-12-10 13:16

reporter   ~0004389

>Is this happening when the FD is doing something special, what the occurences (number of time per day, week, month)

Nothing special was being done. The jobs and filesets that were running hadn't changed for months.
The crash happened twice, on 2021-12-08 and 2021-11-26.
These were the only occurrences so far.

bruno-at-bareos

2021-12-13 17:00

developer   ~0004391

I wouldn't have changed the way the daemon is started (especially on nitpicky Windows); the command given in the previous comment allows you to set and remove the debug level dynamically.
If you want to do so anyway, you can refer to the documentation:
https://docs.bareos.org/master/TasksAndConcepts/TheWindowsVersionOfBareos.html?highlight=windows#windows-service

As the crash occurred twice quite recently, it would be interesting to check whether any traces were generated.

Could you check whether you can find files with the .traceback and .bactrace extensions on the system (normally they are located in the Bareos working directory, but I can't be 100% sure under Windows)?
If so, could you please attach them here?
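One way to look for such files is to search the working directory recursively. The minimal Python sketch below assumes the files end in .traceback or .bactrace as named above; the default path C:\ProgramData\Bareos\working is only a guess at the Windows working directory and should be replaced with the one actually configured.

```python
import sys
from pathlib import Path

# Extensions mentioned in the request above.
CRASH_SUFFIXES = {".traceback", ".bactrace"}


def is_crash_dump(path: Path) -> bool:
    """True if the file name ends in one of the crash-dump extensions."""
    return path.suffix.lower() in CRASH_SUFFIXES


def find_crash_dumps(root: str) -> list[Path]:
    """Return all crash-dump files below `root`, or [] if it does not exist."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("*") if p.is_file() and is_crash_dump(p))


if __name__ == "__main__":
    # Hypothetical default; pass the real working directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else r"C:\ProgramData\Bareos\working"
    for dump in find_crash_dumps(root):
        print(dump)
```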

bruno-at-bareos

2021-12-22 16:33

developer   ~0004407

Ping? Any news on this?
Int

Int

2021-12-22 16:42

reporter   ~0004410

Sorry, my colleague is out of the office. Unfortunately, I don't have spare time to help you with this. My colleague will be back after Christmas.

Int

2021-12-27 09:38

reporter   ~0004420

>If you want to do so you can refer to the documentation
>https://docs.bareos.org/master/TasksAndConcepts/TheWindowsVersionOfBareos.html?highlight=windows#windows-service

Thank you for pointing me to the right documentation chapter. With the new documentation system it is very hard to find information on a specific topic, since the search does not work well (see bug 0001351).
This helped me start the Windows service in debug mode. It showed that the method I had used did not work correctly.

>As the crash occur two times quite recently, it would be interesting to check if there's any traces that would have been generated.
I could not find any .traceback or .bactrace files.

So far the crash has not occurred again. I will let you know as soon as it happens again, and since debug mode is working now, I will hopefully also be able to provide a trace file.

bruno-at-bareos

2022-01-10 10:44

developer   ~0004461

Hello, beware that using a debug level may create a really large *trace* file.
Have a look from time to time, and rotate it manually if it becomes too big.
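To keep an eye on trace growth as suggested, a small scheduled check can warn before the volume fills up. The Python sketch below is a hypothetical helper: it assumes the trace files end in .trace and sit in the directory passed on the command line, and the 5 GiB / 10 GiB thresholds are arbitrary placeholders.

```python
import shutil
import sys
from pathlib import Path

GIB = 1024 ** 3


def trace_warnings(size: int, free: int, max_bytes: int, min_free_bytes: int) -> list[str]:
    """Return human-readable warnings for one trace file; empty if all is fine."""
    warnings = []
    if size > max_bytes:
        warnings.append(f"trace file is {size / GIB:.1f} GiB (limit {max_bytes / GIB:.1f} GiB)")
    if free < min_free_bytes:
        warnings.append(f"only {free / GIB:.1f} GiB free on the volume")
    return warnings


def check_trace_dir(root: str, max_bytes: int = 5 * GIB,
                    min_free_bytes: int = 10 * GIB) -> dict[Path, list[str]]:
    """Map each *.trace file in `root` that needs attention to its warnings."""
    base = Path(root)
    if not base.is_dir():
        return {}
    free = shutil.disk_usage(base).free
    report = {}
    for trace in sorted(base.glob("*.trace")):
        found = trace_warnings(trace.stat().st_size, free, max_bytes, min_free_bytes)
        if found:
            report[trace] = found
    return report


if __name__ == "__main__":
    # Hypothetical default; pass the real working directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else r"C:\ProgramData\Bareos\working"
    for trace, found in check_trace_dir(root).items():
        print(trace, "; ".join(found))
```

The sketch only reports; rotating the file is still a manual step.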

bruno-at-bareos

2022-01-27 16:24

developer   ~0004490

Hello again,

Didn't you get any crash, and therefore a trace, in the past month?
Maybe it would also be time to upgrade to the latest stable release, 21.

Without a crash or debug output, keeping this ticket open doesn't make much sense.

Int

2022-01-28 07:56

reporter   ~0004492

Hello,

>Hello, beware that using a debug level may create really large *trace* file.
>Have a look from time to time, rotate them manually if they became to big.

I had already learned that the hard way before your hint, when our server ran out of disk space ;)
Unfortunately, I had to stop tracing because of that. Our server does not have enough free space on C: to store the trace of a full backup job.
But it wasn't a big loss, since the crash did not reappear either.

>Maybe also it would be the time to upgrade to last stable 21 release.
I agree - I am already planning for that.

>Without crash or debug, keeping this ticket open doesn't make too much sense.
Yes, you can close the ticket.

Many thanks for your effort!

bruno-at-bareos

2022-01-31 09:31

developer   ~0004495

Not reproducible.

Issue History

Date Modified Username Field Change
2021-12-08 07:15 Int New Issue
2021-12-09 10:31 bruno-at-bareos Note Added: 0004385
2021-12-10 13:11 Int File Added: bareos-fd_debug200.png
2021-12-10 13:11 Int Note Added: 0004388
2021-12-10 13:16 Int Note Added: 0004389
2021-12-13 17:00 bruno-at-bareos Note Added: 0004391
2021-12-22 16:33 bruno-at-bareos Note Added: 0004407
2021-12-22 16:42 Int Note Added: 0004410
2021-12-27 09:38 Int Note Added: 0004420
2022-01-10 10:43 bruno-at-bareos Assigned To => bruno-at-bareos
2022-01-10 10:43 bruno-at-bareos Status new => assigned
2022-01-10 10:44 bruno-at-bareos Status assigned => feedback
2022-01-10 10:44 bruno-at-bareos Note Added: 0004461
2022-01-27 16:24 bruno-at-bareos Note Added: 0004490
2022-01-28 07:56 Int Note Added: 0004492
2022-01-28 07:56 Int Status feedback => assigned
2022-01-31 09:31 bruno-at-bareos Status assigned => closed
2022-01-31 09:31 bruno-at-bareos Resolution open => unable to reproduce
2022-01-31 09:31 bruno-at-bareos Note Added: 0004495