View Issue Details

ID: 0000361
Project: bareos-core
Category: [All Projects] storage daemon
View Status: public
Last Update: 2016-01-14 13:54
Reporter: mgostomski
Assigned To:
Priority: high
Severity: crash
Reproducibility: unable to reproduce
Status: closed
Resolution: suspended
Platform: Windows
OS: Server
OS Version: 2008 64bit
Product Version: 14.2.1
Fixed in Version:
Summary: 0000361: Storage crash when running e.g. two concurrent jobs
Description: When I try to run two concurrent jobs on the same SD (with multiple devices), it crashes and logs this error in the Windows Event Log:

Problem signature:
   Problem Event Name: APPCRASH
   Application Name: bareos-sd.exe
   Application Version: 1.0.0.0
   Application Timestamp: 543d899b
   Fault Module Name: pthreadGCE2.dll
   Fault Module Version: 2.8.0.0
   Fault Module Timestamp: 00790070
   Exception Code: c0000005
   Exception Offset: 0000000000005471
   OS Version: 6.1.7601.2.1.0.274.10
   Locale ID: 1045
   Additional Information 1: f9b3
   Additional Information 2: f9b3fd4dd4211be525ef3120e05e4107
   Additional Information 3: 5f9e
   Additional Information 4: 5f9e152a0a74f7ea07fc904fcf144ab6


Steps To Reproduce:
1. Create and configure a storage daemon.
2. Create multiple devices on the same disk but in different directories.
3. Run two jobs at the same time.
4. While one job is running, the bareos-sd daemon crashes as soon as the next job connects to the storage. After a restart, bareos-dir tries to rerun the job, and bareos-sd crashes again... and again... and again...
Additional Information:
2014-11-06 19:57:51 avn01bac001-dir JobId 1030: Rescheduled Job Backup_Users.2014-11-06_19.35.58_25 at 06-Nov-2014 19:57 to re-run in 60 seconds (06-Nov-2014 19:58).
2014-11-06 19:57:51 avn01bac001-dir JobId 1030: Job Backup_Users.2014-11-06_19.35.58_25 waiting 60 seconds for scheduled start time.
2014-11-06 19:58:53 avn01bac001-dir JobId 1030: Start Backup JobId 1030, Job=Backup_Users.2014-11-06_19.35.58_25
2014-11-06 19:58:54 avn01bac001-dir JobId 1030: Using Device "avn02adc001" to write.
2014-11-06 20:07:37 avn02adc001 JobId 1030: Volume "Avena0114" previously written, moving to end of data.
2014-11-06 20:07:37 avn02adc001 JobId 1030: Ready to append to end of Volume "Avena0114" size=205
2014-11-06 20:07:37 rwodzik-fd JobId 1030: Created 28 wildcard excludes from FilesNotToBackup Registry key
2014-11-06 20:07:40 rwodzik-fd JobId 1030: Generate VSS snapshots. Driver="Win64 VSS", Drive(s)="C" VMP(s)=0
2014-11-06 20:08:17 avn02adc001 JobId 1030: Fatal error: stored/append.c:191 FI=8 from FD not positive or sequential=0
2014-11-06 20:08:17 avn02adc001 JobId 1030: Elapsed time=00:00:39, Transfer rate=0 Bytes/second
2014-11-06 20:08:17 rwodzik-fd JobId 1030: Error: lib/bsock_tcp.c:422 Write error sending 6124 bytes to Storage daemon:188.252.6.146:9103: ERR=Input/output error

2014-11-06 20:08:17 rwodzik-fd JobId 1030: Fatal error: filed/backup.c:984 Network send error to SD. ERR=Input/output error

2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "Task Scheduler Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "VSS Metadata Store Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "Performance Counters Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "Shadow Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "ASR Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "Registry Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "BITS Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 20:08:20 rwodzik-fd JobId 1030: VSS Writer (BackupComplete): "MSSearch Service Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-06 19:59:47 avn01bac001-dir JobId 1030: Error: Bareos avn01bac001-dir 14.2.1 (12Sep14):
  Build OS: x86_64-pc-linux-gnu debian Debian GNU/Linux 7.0 (wheezy)
  JobId: 1030
  Job: Backup_Users.2014-11-06_19.35.58_25
  Backup Level: Full
  Client: "rwodzik-fd" 14.4.0 (08Oct14) Microsoft Windows 7 Professional Service Pack 1 (build 7601), 64-bit,Cross-compile,Win64
  FileSet: "Users" 2014-11-06 11:02:52
  Pool: "AvenaFull" (From Job FullPool override)
  Catalog: "MyCatalog" (From Client resource)
  Storage: "avn02adc001" (From command line)
  Scheduled time: 06-Nov-2014 19:35:58
  Start time: 06-Nov-2014 19:58:54
  End time: 06-Nov-2014 19:59:47
  Elapsed time: 53 secs
  Priority: 1
  FD Files Written: 8
  SD Files Written: 0
  FD Bytes Written: 0 (0 B)
  SD Bytes Written: 0 (0 B)
  Rate: 0.0 KB/s
  Software Compression: 100.0 %
  VSS: yes
  Encryption: no
  Accurate: no
  Volume name(s):
  Volume Session Id: 1
  Volume Session Time: 1415300855
  Last Volume Bytes: 0 (0 B)
  Non-fatal FD errors: 1
  SD Errors: 1
  FD termination status: Error
  SD termination status: Error
  Termination: *** Backup Error ***
Tags: No tags attached.

Activities

pstorz (administrator), 2014-11-07 09:14, note ~0001043

Can you please give me the exact Windows version you are running the server on?

Is your Director also running on Windows?

What version is your Director?

Please give as much information on your setup as possible so that we can hopefully reproduce it.

mgostomski (reporter), 2014-11-07 11:02, note ~0001045

If we run only one job, on any device, the job runs fine; but if we add a second job, the storage daemon crashes.

Bareos version of the Windows storage daemon: 14.4.0

My Director runs on Debian Linux (avn01bac001-dir, version 14.2.1).

#####################
# WINDOWS STORAGE #
# bareos-sd.conf #
#####################

Storage { # definition of myself
  Name = avn02adc001
  Heartbeat Interval = 20
  Maximum Concurrent Jobs = 25
}

Director {
  Name = avn01bac001-dir
  Password = "#SERCRETPASS#"
}

Director {
  Name = avn02adc001-mon
  Password = "#SERCRETPASS#"
  Monitor = yes
}

Device {
  Name = avn02adc001
  Device Type = File
  Media Type = File
  Archive Device = H:\Bareos\
  Random Access = yes
  RemovableMedia = no
  Autoselect = yes
  Requires Mount = no
  LabelMedia = yes
  Maximum Concurrent Jobs = 25
}

Device {
  Name = avn02adc001Mgostomski
  Device Type = File
  Media Type = FileMgostomski
  Archive Device = H:\BareosMgostomski\
  Random Access = yes
  RemovableMedia = no
  Autoselect = yes
  Requires Mount = no
  LabelMedia = yes
  Maximum Concurrent Jobs = 25
}

Device {
  Name = avn02adc001Dnaczk
  Device Type = File
  Media Type = FileDnaczk
  Archive Device = H:\BareosDnaczk\
  Random Access = yes
  RemovableMedia = no
  Autoselect = yes
  Requires Mount = no
  LabelMedia = yes
  Maximum Concurrent Jobs = 25
}

Messages {
  Name = Standard
  director = avn01bac001-dir = all
}

#########################
# END OF BAREOS-SD.CONF #
#########################

#####################
# DIRECTOR ON LINUX #
# storages.conf #
#####################

Storage {
  Name = avn02adc001
  Address = #SERCRETPASS#
  Password = "#SERCRETPASS#"
  Device = avn02adc001
  Device = avn02adc001Mgostomski
  Device = avn02adc001Dnaczk
  Media Type = File
  SDPort = 9103
  Maximum Concurrent Jobs = 8
}

Storage {
  Name = avn02adc001Mgostomski
  Address = #SERCRETPASS#
  Password = "#SERCRETPASS#"
  Device = avn02adc001Mgostomski
  Media Type = FileMgostomski
  SDPort = 9103
  Maximum Concurrent Jobs = 8
}

Storage {
  Name = avn02adc001Dnaczk
  Address = #SERCRETPASS#
  Password = "#SERCRETPASS#"
  Device = avn02adc001Dnaczk
  Media Type = FileDnaczk
  SDPort = 9103
  Maximum Concurrent Jobs = 8
}
#########################
# END OF STORAGES.CONF #
#########################

mvwieringen (developer), 2014-11-07 11:13, note ~0001046

The crash is on an assert in the SD code:

Fatal error: stored/append.c:191 FI=8 from FD not positive or sequential=0

It seems the filed (file daemon) sends FileIndex 8 first instead of 1.
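
The SD error message can be read as an ordering rule on the FileIndex values it receives from the FD. This is a minimal C++ sketch that rests on two assumptions. First, it assumes the rule is "positive, and either equal to the previous FileIndex (another stream of the same file) or exactly one greater". Second, it assumes the trailing `sequential=0` in the message is the previously seen FileIndex. The function name `file_index_is_sequential` is hypothetical, and this is not the code in stored/append.c:

```cpp
#include <cstdint>

// Hypothetical sketch, not the code in stored/append.c: a FileIndex received
// from the FD is accepted only if it is positive and either repeats the
// previous FileIndex (assumed: another stream of the same file) or is exactly
// one greater than it (the next file). The +1 is done in 64 bits to avoid
// overflow.
constexpr bool file_index_is_sequential(std::int32_t last_file_index,
                                        std::int32_t file_index) {
  return file_index > 0 &&
         (file_index == last_file_index ||
          file_index == static_cast<std::int64_t>(last_file_index) + 1);
}
```

Under this reading, a first FileIndex of 8 following 0 fails the check, which matches the fatal error in the job log. A first FileIndex of 1 would pass.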

mgostomski (reporter), 2014-11-07 11:41, note ~0001049

@mvwieringen
How can we resolve this issue?
When we run only one job, everything is OK; the storage daemon crashes only when we run two or more concurrent jobs.

mgostomski (reporter), 2014-11-07 13:02, note ~0001051 (last edited 2014-11-07 13:03)

Error from the first job after the second job was started.
This job backed up 206 MB, and then the crash occurred.



2014-11-07 12:48:11 avn01bac001-dir JobId 15: Start Backup JobId 15, Job=Backup_Users.2014-11-07_12.48.09_22
2014-11-07 12:48:12 avn01bac001-dir JobId 15: Using Device "avn02adc001" to write.
2014-11-07 12:56:58 avn02adc001 JobId 15: Volume "Avn010002" previously written, moving to end of data.
2014-11-07 12:56:58 avn02adc001 JobId 15: Ready to append to end of Volume "Avn010002" size=463712440
2014-11-07 12:56:58 avn02adc001 JobId 15: Spooling data ...
2014-11-07 12:56:58 dnaczk-fd JobId 15: Created 28 wildcard excludes from FilesNotToBackup Registry key
2014-11-07 12:57:01 dnaczk-fd JobId 15: Generate VSS snapshots. Driver="Win32 VSS", Drive(s)="C" VMP(s)=0
2014-11-07 12:57:46 dnaczk-fd JobId 15: Error: lib/bsock_tcp.c:422 Write error sending 65536 bytes to Storage daemon:188.252.6.146:9103: ERR=Input/output error

2014-11-07 12:57:46 dnaczk-fd JobId 15: Fatal error: filed/backup.c:984 Network send error to SD. ERR=Input/output error

2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "Task Scheduler Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "VSS Metadata Store Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "Performance Counters Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "ASR Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "MSSearch Service Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "Shadow Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "Registry Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:57:51 dnaczk-fd JobId 15: VSS Writer (BackupComplete): "COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE)
2014-11-07 12:49:00 avn01bac001-dir JobId 15: Error: Director's comm line to SD dropped.
2014-11-07 12:50:08 avn01bac001-dir JobId 15: Error: Bareos avn01bac001-dir 14.2.1 (12Sep14):
  Build OS: x86_64-pc-linux-gnu debian Debian GNU/Linux 7.0 (wheezy)
  JobId: 15
  Job: Backup_Users.2014-11-07_12.48.09_22
  Backup Level: Full
  Client: "dnaczk-fd" 14.4.0 (08Oct14) Microsoft Windows 7 Professional Service Pack 1 (build 7601), 32-bit,Cross-compile,Win32
  FileSet: "Users" 2014-11-07 12:13:51
  Pool: "AvenaFull" (From Job FullPool override)
  Catalog: "MyCatalog" (From Client resource)
  Storage: "avn02adc001" (From Job resource)
  Scheduled time: 07-Nov-2014 12:48:09
  Start time: 07-Nov-2014 12:48:12
  End time: 07-Nov-2014 12:50:08
  Elapsed time: 1 min 56 secs
  Priority: 1
  FD Files Written: 770
  SD Files Written: 0
  FD Bytes Written: 208,004,626 (208.0 MB)
  SD Bytes Written: 0 (0 B)
  Rate: 1793.1 KB/s
  Software Compression: None
  VSS: yes
  Encryption: no
  Accurate: no
  Volume name(s):
  Volume Session Id: 1
  Volume Session Time: 1415361377
  Last Volume Bytes: 0 (0 B)
  Non-fatal FD errors: 2
  SD Errors: 0
  FD termination status: Error
  SD termination status: Error
  Termination: *** Backup Error ***

pstorz (administrator), 2014-11-18 16:15, note ~0001064

The program blows up at line 786 of vol_mgr.c:

(gdb) n
784 free(vol->vol_name);
(gdb)
785 vol->vol_name = NULL;
(gdb)
786 vol->destroy_mutex();
(gdb)

Program received signal SIGSEGV, Segmentation fault.
0x000000006ea05471 in ?? () from c:\Program Files\Bareos\pthreadGCE2.dll
(gdb)
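
The fault address that gdb reports is consistent with the Windows Event Log signature in the original description. Subtracting the logged offset 0000000000005471 from 0x6ea05471 gives a load address of 0x6ea00000 for pthreadGCE2.dll. That load address is an inference, since neither report states it, and the gdb session may have run on a different machine from the one that produced the event log. This is a minimal C++ sketch of the arithmetic as a compile-time check:

```cpp
#include <cstdint>

// Values copied from the report: the SIGSEGV address printed by gdb and the
// exception offset in the Windows APPCRASH signature for pthreadGCE2.dll.
constexpr std::uint64_t kGdbFaultAddress = 0x000000006ea05471ULL;
constexpr std::uint64_t kEventLogExceptionOffset = 0x0000000000005471ULL;

// If both describe the same instruction, the DLL was loaded at this address.
// This is an inference; neither report states the load address.
constexpr std::uint64_t kInferredModuleBase =
    kGdbFaultAddress - kEventLogExceptionOffset;

static_assert(kInferredModuleBase == 0x6ea00000ULL,
              "0x6ea05471 - 0x5471 == 0x6ea00000");
```

The inferred base is 64 KiB aligned, which is consistent with how Windows places loaded images.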

pstorz (administrator), 2014-11-18 16:21, note ~0001065

The mutex is zero:

VOLRES::destroy_mutex (this=0x24fee18) at ../../stored/vol_mgr.h:66
66 void destroy_mutex() { pthread_mutex_destroy(&m_mutex); };
(gdb) p m_mutex
$3 = (pthread_mutex_t) 0x0
(gdb)
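
The trace shows destroy_mutex() being called on a mutex handle that is zero. One defensive pattern is for the object to record whether its mutex is live, so that destroying an uninitialized or already-destroyed mutex becomes a no-op. This is a minimal C++ sketch that assumes POSIX threads. `VolumeReservation` and `mutex_initialized_` are hypothetical names, not the actual VOLRES class in vol_mgr.h:

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical illustration, not the VOLRES class from stored/vol_mgr.h:
// the object remembers whether its mutex is live, so destroy_mutex() is a
// no-op for a mutex that was never initialized or was already destroyed,
// instead of passing an invalid handle to pthread_mutex_destroy().
class VolumeReservation {
 public:
  VolumeReservation() = default;
  VolumeReservation(const VolumeReservation&) = delete;
  VolumeReservation& operator=(const VolumeReservation&) = delete;
  ~VolumeReservation() { destroy_mutex(); }

  void init_mutex() {
    if (!mutex_initialized_ && pthread_mutex_init(&m_mutex, nullptr) == 0) {
      mutex_initialized_ = true;
    }
  }

  void destroy_mutex() {
    if (mutex_initialized_) {
      pthread_mutex_destroy(&m_mutex);
      mutex_initialized_ = false;
    }
  }

  bool mutex_initialized() const { return mutex_initialized_; }

 private:
  pthread_mutex_t m_mutex;
  bool mutex_initialized_ = false;  // plain bool: not safe if two threads call these concurrently
};
```

A flag like this only hides the symptom for a single owner. If two concurrent jobs were releasing the same volume reservation, the flag itself would race. The report does not establish that this is what happens. If it were, the fix would have to be in how reservations are owned and freed.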

stephand (developer), 2015-11-13 22:30, note ~0001964

This is probably related to issue 0000414.

stephand (developer), 2015-11-26 12:13, note ~0002016

Could you please check whether you still get this error with the new version:
http://download.bareos.org/bareos/release/15.2/windows/winbareos-15.2.2-postvista-64-bit-r35.1.exe

Issue History

Date Modified    | Username    | Field       | Change
2014-11-06 20:25 | mgostomski  | New Issue   |
2014-11-07 09:14 | pstorz      | Note Added  | 0001043
2014-11-07 09:15 | pstorz      | Assigned To | => pstorz
2014-11-07 09:15 | pstorz      | Status      | new => assigned
2014-11-07 11:02 | mgostomski  | Note Added  | 0001045
2014-11-07 11:13 | mvwieringen | Note Added  | 0001046
2014-11-07 11:41 | mgostomski  | Note Added  | 0001049
2014-11-07 13:02 | mgostomski  | Note Added  | 0001051
2014-11-07 13:03 | mgostomski  | Note Edited | 0001051
2014-11-18 16:15 | pstorz      | Note Added  | 0001064
2014-11-18 16:21 | pstorz      | Note Added  | 0001065
2015-03-31 14:42 | pstorz      | Status      | assigned => confirmed
2015-03-31 14:58 | mvwieringen | Assigned To | pstorz =>
2015-11-13 22:30 | stephand    | Note Added  | 0001964
2015-11-26 12:13 | stephand    | Note Added  | 0002016
2015-11-26 12:13 | stephand    | Assigned To | => stephand
2015-11-26 12:13 | stephand    | Status      | confirmed => feedback
2016-01-14 13:54 | mvwieringen | Status      | feedback => closed
2016-01-14 13:54 | mvwieringen | Assigned To | stephand =>
2016-01-14 13:54 | mvwieringen | Resolution  | open => suspended