View Issue Details

ID: 0001037
Project: bareos-core
Category: [All Projects] storage daemon
View Status: public
Last Update: 2019-07-22 13:28
Reporter: r7
Assigned To:
Priority: normal
Severity: crash
Reproducibility: always
Status: feedback
Resolution: open
Platform: Windows
OS: Windows
OS Version: 7
Product Version: 18.2.4-rc2
Fixed in Version:
Summary: 0001037: Windows bareos-fd is crashing during backup
Description: After a successful full backup, incremental jobs always fail. The backup scheme is Always Incremental.
For background, please see https://groups.google.com/forum/?fromgroups=#!topic/bareos-users/fbu6iz-Ugbc

Attached are traces, reports, memory dumps and dir logs for one such Windows 7 host with failing FD
- helium-WER.7z.* - contents of C:\ProgramData\Microsoft\Windows\WER\ReportQueue
- helium-traces.7z.* - crashing FD job traces (setdebug client=lab-helium level=200 trace=1)
- helium-dir.log - job logs from director
Steps To Reproduce: No specific steps. The FD simply crashes during backup.
Additional Information:
Director: 18.2.4rc2
Storage: 18.2.4rc2
FD: both 17.2.4 and 18.2.4rc2 fail, on Windows 7 as well as Windows 10
Tags: always incremental, crash, fd

Activities

r7

2019-01-24 18:26

reporter  

helium-dir.log (75,985 bytes)
r7

2019-01-24 18:29

reporter   ~0003207

It is almost impossible to upload files to Mantis with your limits.

APPLICATION ERROR 0000500

File upload failed. This is likely because the filesize was larger than is currently allowed by this PHP installation.

Please use the "Back" button in your web browser to return to the previous page. There you can correct whatever problems were identified in this error or select another action. You can also click an option from the menu bar to go directly to a new section.
r7

2019-01-24 18:37

reporter   ~0003208

It is really a quest to attach files. What are the limits for attachments?

APPLICATION ERROR #2800

Invalid form security token. This could be caused by a session timeout, or accidentally submitting the form twice.

Please use the "Back" button in your web browser to return to the previous page. There you can correct whatever problems were identified in this error or select another action. You can also click an option from the menu bar to go directly to a new section.
r7

2019-01-24 19:07

reporter   ~0003209

- helium-WER.7z.* - contents of C:\ProgramData\Microsoft\Windows\WER\ReportQueue. There are reports with memory dumps.

P.S.: I've been struggling with your Mantis timeouts, its attachment limits and PHP POST limits for 3 hours to complete this report. I've repacked the crash report files 3 or 4 times to get through the attachment limits.

helium-WER.7z.001 (1,966,080 bytes)
r7

2019-01-24 19:07

reporter  

helium-WER.7z.002 (1,966,080 bytes)
r7

2019-01-24 22:36

reporter  

helium-WER.7z.003 (1,966,080 bytes)
r7

2019-01-24 22:36

reporter  

helium-WER.7z.004 (1,966,080 bytes)
helium-WER.7z.005 (1,966,080 bytes)
helium-WER.7z.006 (1,966,080 bytes)
r7

2019-01-24 22:36

reporter  

helium-WER.7z.007 (1,966,080 bytes)
helium-WER.7z.008 (1,966,080 bytes)
helium-WER.7z.009 (487,341 bytes)
r7

2019-01-24 22:53

reporter  

helium-traces.7z.001 (2,097,152 bytes)
helium-traces.7z.002 (2,097,152 bytes)
helium-traces.7z.003 (2,097,152 bytes)
r7

2019-01-25 00:07

reporter  

helium-traces.7z.004 (2,097,152 bytes)
helium-traces.7z.005 (2,097,152 bytes)
helium-traces.7z.006 (700,997 bytes)
r7

2019-01-25 13:41

reporter   ~0003210

Sorry, the category is incorrect. Please change it to "[All Projects] file daemon".
teka74

2019-02-05 01:49

reporter   ~0003246

Now I have the same problem: I switched to Always Incremental backup after updating to 18.2.5.

The first backup was automatically upgraded to Full and worked. The next evening the AI job was scheduled normally, and nothing happened.

bconsole output:
Connecting to Client server01-fd at server01:9102
 Handshake: Cleartext, Encryption: None

server01-fd Version: 17.2.4 (21 Sep 2017) VSS Linux Cross-compile Win64
Daemon started 07-Jan-19 07:55. Jobs: run=29 running=1.
Microsoft Windows Server 2008 R2 Small Business Server Service Pack 1 (build 7601), 64-bit
 Heap: heap=0 smbytes=57,056,510 max_bytes=57,056,704 bufs=208 max_bufs=369
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=1 bwlimit=0kB/s

Running Jobs:
server01-mon (director) connected at: 07-Jan-19 08:02
server01-mon (director) connected at: 17-Jan-19 15:05
JobId 616 Job server01-ai.2019-02-05_01.00.00_19 is running.
    Incremental System or Console Job started: 05-Feb-19 01:00
    Files=0 Bytes=0 Bytes/sec=0 Errors=0
    Bwlimit=0
    Files Examined=0
    SDReadSeqNo=3 fd=720
bareos-dir (director) connected at: 05-Feb-19 01:40
====

The daemon is still running but doing nothing. On my Linux machine (the director), mysql has been at 40% CPU since the backup started, and the webui is lagging.
teka74

2019-02-05 02:52

reporter   ~0003247

After more than 1 hour of waiting, I got an email from the Bareos director:

05-Feb 02:43 bareos-dir: ERROR in dird/authenticate_console.cc:393 Number of console connections exceeded MaximumConsoleConnections


I cancelled the backup job and am now switching back to normal backup.
r7

2019-02-05 16:36

reporter  

joblog.1835.txt (6,060 bytes)
Connecting to Director localhost:9101
 Encryption: ECDHE-PSK-CHACHA20-POLY1305
1000 OK: bareos-dir Version: 18.2.5 (30 January 2019)
bareos.org build binary
bareos.org binaries are UNSUPPORTED by bareos.com.
Get official binaries and vendor support on https://www.bareos.com
You are connected using the default console

Enter a period to cancel a command.
list joblog jobid=1835
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
 2019-02-05 13:01:00 bareos-dir JobId 1835: Start Backup JobId 1835, Job=lab-helium.2019-02-05_13.01.00_05
 2019-02-05 13:01:01 bareos-dir JobId 1835: Connected Storage daemon at fs32.lan:9103, encryption: ECDHE-PSK-CHACHA20-POLY1305
 2019-02-05 13:01:01 bareos-dir JobId 1835: Created new Volume "lab-helium.2019-02-05-13-01.aii_1835" in catalog.
 2019-02-05 13:01:01 bareos-dir JobId 1835: Using Device "disk-fs32-r6s3" to write.
 2019-02-05 13:01:01 bareos-dir JobId 1835: Probing client protocol... (result will be saved until config reload)
 2019-02-05 13:01:01 bareos-dir JobId 1835: Connected Client: lab-helium at lab-helium.lan:9102, encryption: PSK-AES256-CBC-SHA
 2019-02-05 13:01:01 bareos-dir JobId 1835:    Handshake: Immediate TLS
 2019-02-05 13:01:01 bareos-dir JobId 1835:  Encryption: PSK-AES256-CBC-SHA
 2019-02-05 13:01:02 bareos-dir JobId 1835: Sending Accurate information.
 2019-02-05 13:01:12 fs32-sd JobId 1835: Labeled new Volume "lab-helium.2019-02-05-13-01.aii_1835" on device "disk-fs32-r6s3" (/_bareos).
 2019-02-05 13:01:12 fs32-sd JobId 1835: Wrote label to prelabeled Volume "lab-helium.2019-02-05-13-01.aii_1835" on device "disk-fs32-r6s3" (/_bareos)
 2019-02-05 13:01:12 bareos-dir JobId 1835: Max Volume jobs=1 exceeded. Marking Volume "lab-helium.2019-02-05-13-01.aii_1835" as Used.
 2019-02-05 13:01:05 lab-helium JobId 1835: Created 29 wildcard excludes from FilesNotToBackup Registry key
 2019-02-05 13:01:05 lab-helium JobId 1835: Connected Storage daemon at fs32.lan:9103, encryption: PSK-AES256-CBC-SHA
 2019-02-05 13:01:22 lab-helium JobId 1835: Generate VSS snapshots. Driver="Win64 VSS", Drive(s)="CF"
 2019-02-05 13:01:23 lab-helium JobId 1835: VolumeMountpoints are not processed as onefs = yes.
 2019-02-05 13:01:23 lab-helium JobId 1835: VolumeMountpoints are not processed as onefs = yes.
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "Task Scheduler Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "VSS Metadata Store Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "Performance Counters Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "MSSearch Service Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "ASR Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "BITS Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "Registry Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "Shadow Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:22 lab-helium JobId 1835: VSS Writer (BackupComplete): "COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE)
 2019-02-05 14:29:23 lab-helium: ABORTING due to ERROR in lib/smartall.cc:229
Overrun buffer: len=41300 addr=34a00a8 allocated: filed/accurate_htable.cc:49 called from /home/abuild/rpmbuild/BUILD/bareos-18.2.4rc2/src/filed/accurate_htable.cc:193
 2019-02-05 14:29:23 bareos-dir JobId 1835: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
 2019-02-05 14:29:23 fs32-sd JobId 1835: Fatal error: stored/append.cc:173 Error reading data header from FD. ERR=Connection reset by peer
 2019-02-05 14:29:23 fs32-sd JobId 1835: Releasing device "disk-fs32-r6s3" (/_bareos).
 2019-02-05 14:29:23 bareos-dir JobId 1835: Fatal error: No Job status returned from FD.
 2019-02-05 14:29:23 bareos-dir JobId 1835: Error: Bareos bareos-dir 18.2.5 (30Jan19):
  Build OS:               Linux-4.4.92-6.18-default debian Debian GNU/Linux 9.7 (stretch)
  JobId:                  1835
  Job:                    lab-helium.2019-02-05_13.01.00_05
  Backup Level:           Incremental, since=2019-01-16 13:00:08
  Client:                 "lab-helium" 18.2.4rc2 (18Dec18) Microsoft Windows 7 Professional Service Pack 1 (build 7601), 64-bit,Cross-compile,Win64
  FileSet:                "win-all" 2018-07-27 16:59:34
  Pool:                   "aii" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "disk-fs32-r6" (From Job resource)
  Scheduled time:         05-Feb-2019 13:01:00
  Start time:             05-Feb-2019 13:01:02
  End time:               05-Feb-2019 14:29:23
  Elapsed time:           1 hour 28 mins 21 secs
  Priority:               12
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         lab-helium.2019-02-05-13-01.aii_1835
  Volume Session Id:      3
  Volume Session Time:    1549321963
  Last Volume Bytes:      3,330,367,609 (3.330 GB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Running
  Bareos binary info:     bareos.org build: Get official binaries and vendor support on bareos.com
  Termination:            *** Backup Error ***
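When comparing several failing jobs, it can help to pull the abort details out of each job log automatically. The following is a minimal sketch in Python, assuming the two abort lines always have the wording seen in joblog.1835.txt; the patterns, the `summarize_abort` helper and the dictionary keys are illustrative names, not part of Bareos.

```python
import re

# Patterns modelled on the two abort lines in joblog.1835.txt; other Bareos
# versions may word these messages differently.
ABORT_RE = re.compile(r"ABORTING due to ERROR in (?P<site>\S+:\d+)")
OVERRUN_RE = re.compile(
    r"Overrun buffer: len=(?P<length>\d+) addr=(?P<addr>[0-9a-fA-F]+) "
    r"allocated: (?P<allocated_at>\S+:\d+) "
    r"called from (?P<called_from>\S+:\d+)"
)

# The two relevant lines, copied from the job log above.
SAMPLE = """\
 2019-02-05 14:29:23 lab-helium: ABORTING due to ERROR in lib/smartall.cc:229
Overrun buffer: len=41300 addr=34a00a8 allocated: filed/accurate_htable.cc:49 called from /home/abuild/rpmbuild/BUILD/bareos-18.2.4rc2/src/filed/accurate_htable.cc:193
"""


def summarize_abort(log_text):
    """Return the abort details as a dict, or None if either line is missing."""
    abort = ABORT_RE.search(log_text)
    overrun = OVERRUN_RE.search(log_text)
    if abort is None or overrun is None:
        return None
    return {
        "abort_site": abort.group("site"),
        "length": int(overrun.group("length")),
        "addr": overrun.group("addr"),
        "allocated_at": overrun.group("allocated_at"),
        "called_from": overrun.group("called_from"),
    }


if __name__ == "__main__":
    print(summarize_abort(SAMPLE))
```

Running the same helper over the logs of other crashing clients would show whether they all report the same allocation site in filed/accurate_htable.cc.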
r7

2019-02-05 16:36

reporter   ~0003250

After upgrading DIR and SD to 18.2.5, the error reported in the DIR logs for the job with the crashing 18.2.4rc2 FD has changed:

2019-02-05 14:29:23 lab-helium: ABORTING due to ERROR in lib/smartall.cc:229
Overrun buffer: len=41300 addr=34a00a8 allocated: filed/accurate_htable.cc:49 called from /home/abuild/rpmbuild/BUILD/bareos-18.2.4rc2/src/filed/accurate_htable.cc:193
 2019-02-05 14:29:23 bareos-dir JobId 1835: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
 2019-02-05 14:29:23 fs32-sd JobId 1835: Fatal error: stored/append.cc:173 Error reading data header from FD. ERR=Connection reset by peer
 2019-02-05 14:29:23 fs32-sd JobId 1835: Releasing device "disk-fs32-r6s3" (/_bareos).
 2019-02-05 14:29:23 bareos-dir JobId 1835: Fatal error: No Job status returned from FD.
 2019-02-05 14:29:23 bareos-dir JobId 1835: Error: Bareos bareos-dir 18.2.5 (30Jan19):

You can find full joblog attached.
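
For readers unfamiliar with this kind of abort: the "Overrun buffer" message suggests that the memory checker noticed a write past the end of a buffer allocated at filed/accurate_htable.cc:49. A common way to detect this is to place a guard byte directly after the usable area and verify it later. The sketch below illustrates that general technique in Python. It is not the Bareos smartall code, and `GuardedBuffer`, `GUARD` and the method names are hypothetical.

```python
GUARD = 0xC5  # arbitrary sentinel value chosen for this sketch


class GuardedBuffer:
    """Fixed-size buffer with one hidden guard byte after the usable area."""

    def __init__(self, size):
        self.size = size
        # Usable bytes occupy indices 0..size-1; the guard sits at index size.
        self._raw = bytearray(size + 1)
        self._raw[size] = GUARD

    def write(self, offset, data):
        # Deliberately unchecked against `size`, like a raw memcpy: a write
        # that runs past the usable area overwrites the guard byte.
        for i, byte in enumerate(data):
            if offset + i < len(self._raw):
                self._raw[offset + i] = byte

    def is_intact(self):
        # The check can miss an overrun that happens to write the guard
        # value itself; real checkers accept the same limitation.
        return self._raw[self.size] == GUARD


if __name__ == "__main__":
    buf = GuardedBuffer(8)
    buf.write(0, b"12345678")   # fits exactly
    print(buf.is_intact())
    buf.write(4, b"ABCDEF")     # two bytes too long
    print(buf.is_intact())
```

The point of the technique is that the overrun is only noticed when the guard is checked, which may be long after the faulty write. That would be consistent with the FD aborting at the end of the job rather than at the moment the buffer was overwritten.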
arogge

2019-07-22 13:28

developer   ~0003520

Does this happen with 18.2.5 or 18.2.6?
Are you able to retry with the nightly build from https://download.bareos.org/bareos/experimental/nightly/?

Can you check whether the problem persists once you fix the TLS configuration problem? Your client seems to be misconfigured, as TLS-PSK fails; maybe the name does not match.

Issue History

Date Modified Username Field Change
2019-01-24 18:26 r7 New Issue
2019-01-24 18:26 r7 Tag Attached: always incremental
2019-01-24 18:26 r7 Tag Attached: crash
2019-01-24 18:26 r7 Tag Attached: fd
2019-01-24 18:26 r7 File Added: helium-dir.log
2019-01-24 18:29 r7 Note Added: 0003207
2019-01-24 18:37 r7 Note Added: 0003208
2019-01-24 19:07 r7 File Added: helium-WER.7z.001
2019-01-24 19:07 r7 Note Added: 0003209
2019-01-24 19:07 r7 File Added: helium-WER.7z.002
2019-01-24 22:36 r7 File Added: helium-WER.7z.003
2019-01-24 22:36 r7 File Added: helium-WER.7z.004
2019-01-24 22:36 r7 File Added: helium-WER.7z.005
2019-01-24 22:36 r7 File Added: helium-WER.7z.006
2019-01-24 22:36 r7 File Added: helium-WER.7z.007
2019-01-24 22:36 r7 File Added: helium-WER.7z.008
2019-01-24 22:36 r7 File Added: helium-WER.7z.009
2019-01-24 22:53 r7 File Added: helium-traces.7z.001
2019-01-24 22:53 r7 File Added: helium-traces.7z.002
2019-01-24 22:53 r7 File Added: helium-traces.7z.003
2019-01-25 00:07 r7 File Added: helium-traces.7z.004
2019-01-25 00:07 r7 File Added: helium-traces.7z.005
2019-01-25 00:07 r7 File Added: helium-traces.7z.006
2019-01-25 13:41 r7 Note Added: 0003210
2019-02-05 01:49 teka74 Note Added: 0003246
2019-02-05 02:52 teka74 Note Added: 0003247
2019-02-05 16:36 r7 File Added: joblog.1835.txt
2019-02-05 16:36 r7 Note Added: 0003250
2019-07-22 13:28 arogge Status new => feedback
2019-07-22 13:28 arogge Note Added: 0003520