View Issue Details

IDProjectCategoryView StatusLast Update
0000498bareos-corefile daemonpublic2016-04-26 10:15
ReporterGunther Assigned To 
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionunable to reproduce 
Platformx86_64 on vmware vsphere 5.5 (2)OSCentOSOS Version 6.6
Product Version14.2.2 
Summary0000498: centos - backup fails after 5 minutes because of clientside network reset
DescriptionOld bug - existed already with bacula. (google finds old references)
Starting a backup-run of a centos-client will end up in a "broken connection" / network reset caused by the client. Reset takes place after 5 minutes (and 1 or 2 seconds).

..............


Already changed on client:

net.ipv4.tcp_keepalive_time = 600 (instead 7200)

no effect..

All other clients (>100; different linux flavours) work flawlessly.
Steps To ReproduceStart backup of centos-client - wait for the error.
Additional Information/usr/sbin/bareos-fd -d 200 -f -c /etc/bareos/bareos-fd.conf

start backup-run::
bareos-fd: filed_conf.c:484-0 Inserting director res: bareos-server-mon
centos-client-fd: jcr.c:141-0 read_last_jobs seek to 192
centos-client-fd: jcr.c:148-0 Read num_items=9
centos-client-fd: fd_plugins.c:1560-0 plugin dir is NULL
centos-client-fd: filed.c:267-0 filed: listening on port 9102
centos-client-fd: bnet_server_tcp.c:166-0 Addresses host[ipv4;0.0.0.0;9102]
centos-client-fd: dir_cmd.c:576-0 Conn: Hello Director bareos-server-dir calling
centos-client-fd: dir_cmd.c:582-0 Got a DIR connection at 29-Jul-2015 10:38:30
centos-client-fd: dir_cmd.c:444-0 Calling Authenticate
centos-client-fd: cram-md5.c:68-0 send: auth cram-md5 <1794404934.1438159110@centos-client-fd> ssl=0
centos-client-fd: cram-md5.c:123-0 cram-get received: auth cram-md5 <1285386616.1438159110@bareos-server-dir> ssl=0
centos-client-fd: cram-md5.c:143-0 sending resp to challenge: m9Q+ZmYMq4Z92n4pA9Fc9A
centos-client-fd: dir_cmd.c:449-0 OK Authenticate
centos-client-fd: dir_cmd.c:463-0 <dird: JobId=69 Job=centos-client.2015-07-29_10.38.28_25 SDid=64 SDtime=1438092081 Authorization=JPMM-EIGA-HIAJ-HGNO-GHKF-MNPN-MGIP-KEBO
centos-client-fd: dir_cmd.c:474-0 Executing JobId= command.
centos-client-fd: dir_cmd.c:1430-0 set sd auth key
centos-client-fd: dir_cmd.c:810-0 JobId=69 Auth=JPMM-EIGA-HIAJ-HGNO-GHKF-MNPN-MGIP-KEBO
centos-client-fd: fd_plugins.c:1691-0 plugin list is NULL
centos-client-fd: dir_cmd.c:463-0 <dird: level = full mtime_only=0
centos-client-fd: dir_cmd.c:474-0 Executing level = command.
centos-client-fd: dir_cmd.c:1265-0 level_cmd: level = full mtime_only=0
centos-client-fd: dir_cmd.c:463-0 <dird: fileset vss=1
centos-client-fd: dir_cmd.c:474-0 Executing fileset command.
centos-client-fd: fileset.c:251-0 I
centos-client-fd: fileset.c:251-0 O fMZ6
centos-client-fd: fileset.c:251-0 N
centos-client-fd: fileset.c:251-0 F /
centos-client-fd: fileset.c:251-0 N
centos-client-fd: fileset.c:251-0 N
centos-client-fd: fileset.c:251-0 E
centos-client-fd: fileset.c:251-0 F /proc
centos-client-fd: fileset.c:251-0 F /sys
centos-client-fd: fileset.c:251-0 F /tmp
centos-client-fd: fileset.c:251-0 F /.journal
centos-client-fd: fileset.c:251-0 F /.fsck
centos-client-fd: fileset.c:251-0 F /srv/chroot/el5/dev
....
....
tail:
.....
centos-client-fd: backup.c:627-0 filed: sending /usr/share/doc/gcc-c++-4.4.7/ChangeLog-2003.bz2 to stored
centos-client-fd: crypto_openssl.c:529-0 crypto_digest_new jcr=7f14480018c8
centos-client-fd: backup.c:1501-0 No strip for /usr/share/doc/gcc-c++-4.4.7/ChangeLog-2003.bz2
centos-client-fd: backup.c:748-0 type=3 do_read=1
centos-client-fd: bfile.c:1101-0 bopen: fname /usr/share/doc/gcc-c++-4.4.7/ChangeLog-2003.bz2, flags 0, mode 0, rdev 0
centos-client-fd: bfile.c:1111-0 open file /usr/share/doc/gcc-c++-4.4.7/ChangeLog-2003.bz2
centos-client-fd: backup.c:177-0 end blast_data ok=0
centos-client-fd: dir_cmd.c:1712-0 Error in blast_data.
centos-client-fd: dir_cmd.c:477-0 Quit command loop. Canceled=1
centos-client-fd: runscript.c:149-0 runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=f
centos-client-fd: dir_cmd.c:513-0 End FD msg: Jmsg Job=centos-client.2015-07-29_10.38.28_25 type=4 level=1438159416 centos-client-fd JobId 69: Error: bsock_tcp.c:422 Write error sending 41896 bytes to Storage daemon:bareos-server.backup-domain:9103: ERR=Broken pipe

centos-client-fd: dir_cmd.c:533-0 Calling term_find_files
centos-client-fd: htable.c:84-0 free malloc buf=7f14634c2038
centos-client-fd: mem_pool.c:432-0 Freed mem_pool count=33 size=25,957
Pool Maxsize Maxused Inuse
NoPool 256 3 1
NAME 256 3 0
FNAME 2304 20 3
MSG 512 10 3
EMSG 1024 3 0
BSOCK 65636 5 2
RECORD 128 0 0

centos-client-fd: htable.c:590-0 Done destroy.
centos-client-fd: dir_cmd.c:538-0 Done with term_find_files
centos-client-fd: jcr.c:184-0 write_last_jobs seek to 192
centos-client-fd: dir_cmd.c:540-0 Done with free_jcr
centos-client-fd: mem_pool.c:394-0 garbage collect memory pool

................
................
Server logs:
29-Jul 10:38 bareos-server-dir JobId 69: No prior Full backup Job record found.
29-Jul 10:38 bareos-server-dir JobId 69: No prior or suitable Full backup found in catalog. Doing FULL backup.
29-Jul 10:38 bareos-server-dir JobId 69: Start Backup JobId 69, Job=centos-client.2015-07-29_10.38.28_25
29-Jul 10:38 bareos-server-dir JobId 69: Using Device "bareos-server-div-storage" to write.
29-Jul 10:38 bareos-server-sd JobId 69: Volume "div0040" previously written, moving to end of data.
29-Jul 10:38 bareos-server-sd JobId 69: Ready to append to end of Volume "div0040" size=11965573289
29-Jul 10:43 bareos-server-dir JobId 69: Fatal error: Network error with FD during Backup: ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt
29-Jul 10:43 bareos-server-dir JobId 69: Fatal error: No Job status returned from FD.
29-Jul 10:43 bareos-server-dir JobId 69: Error: Bareos bareos-server-dir 14.2.2 (12Dec14):
  Build OS: x86_64-pc-linux-gnu debian Debian GNU/Linux 7.0 (wheezy)
  JobId: 69
  Job: centos-client.2015-07-29_10.38.28_25
  Backup Level: Full (upgraded from Incremental)
  Client: "centos-client-fd" 14.2.2 (12Dec14) x86_64-unknown-linux-gnu,redhat,CentOS release 6.2 (Final)
  FileSet: "centos-client-fs" 2015-07-28 15:57:12
  Pool: "Div" (From Job resource)
  Catalog: "MyCatalog" (From Client resource)
  Storage: "bareos-server-sd-div" (From Pool resource)
  Scheduled time: 29-Jul-2015 10:37:21
  Start time: 29-Jul-2015 10:38:30
  End time: 29-Jul-2015 10:43:36
  Elapsed time: 5 mins 6 secs
  Priority: 10
  FD Files Written: 0
  SD Files Written: 0
  FD Bytes Written: 0 (0 B)
  SD Bytes Written: 10,994,848 (10.99 MB)
  Rate: 0.0 KB/s
  Software Compression: None
  VSS: no
  Encryption: no
  Accurate: no
  Volume name(s):
  Volume Session Id: 64
  Volume Session Time: 1438092081
  Last Volume Bytes: 0 (0 B)
  Non-fatal FD errors: 1
  SD Errors: 0
  FD termination status: Error
  SD termination status: Running
  Termination: *** Backup Error ***

............
............
Status client (bconsole):

Connecting to Client centos-client-fd at centos-client.backup-domain:9102

centos-client-fd Version: 14.2.2 (12 December 2014) x86_64-unknown-linux-gnu redhat CentOS release 6.2 (Final)
Daemon started 29-Jul-15 10:36. Jobs: run=1 running=0.
 Heap: heap=36,864 smbytes=93,929 max_bytes=2,240,953 bufs=65 max_bufs=214
 Sizeof: boffset_t=8 size_t=8 debug=200 trace=0 bwlimit=0kB/s

Running Jobs:
Director connected at: 29-Jul-15 10:57
No Jobs running.
====

Terminated Jobs:
 JobId Level Files Bytes Status Finished Name
======================================================================
112616 Full 83,474 841.4 M Error 28-Jul-15 09:59 centos-client
112617 Full 44,966 430.8 M Error 28-Jul-15 10:05 centos-client
112623 Full 127,076 1.048 G Error 28-Jul-15 10:28 centos-client
112624 Full 54,257 572.9 M Error 28-Jul-15 11:08 centos-client
112625 Full 79,922 829.4 M Error 28-Jul-15 11:35 centos-client
     4 Full 0 0 Error 28-Jul-15 16:00 centos-client
     5 Full 34,339 351.6 M Error 28-Jul-15 16:06 centos-client
     6 Full 65,083 742.3 M Error 28-Jul-15 16:28 centos-client
     7 Full 59,222 674.6 M Error 28-Jul-15 16:42 centos-client
    69 Full 76,035 813.4 M Error 29-Jul-15 10:43 centos-client
====
*

(new jobid because of re-installation of bareos)

*est
Select Job resource (1-63): 12
Using Catalog "MyCatalog"
Connecting to Client centos-client-fd at centos-client.backup-domain:9102
2000 OK estimate files=128,588 bytes=3,221,471,374
*

TagsNo tags attached.

Activities

lexa500

lexa500

2015-08-03 11:49

reporter   ~0001793

I have very similiar situation.
my file daemon is on Gentoo Version: 14.2.1
my storage and director are on Ubuntu Version: 15.2.0
They are placed on one network (so nothing can break the connection)
so when I try to run backup i get this on FD
mail-fd: backup.c:989-0 Send data to SD len=59174
mail-fd: backup.c:989-0 Send data to SD len=59232
mail-fd: backup.c:989-0 Send data to SD len=59547
mail-fd: heartbeat.c:98-0 wait_intr=0 stop=0
mail-fd: heartbeat.c:98-0 wait_intr=0 stop=0
mail-fd: heartbeat.c:98-0 wait_intr=0 stop=0

and on SD i get

bareos-sd (50): askdir.c:177-1769 <dird 1000 OK VolName=lab00 VolJobs=3 VolFiles=0 VolBlocks=0 VolBytes=146958542 VolMounts=3 VolErrors=0 VolWrites=1 MaxVolBytes=268435456000 VolCapacityBytes=0 VolStatus=Append Slot=0 MaxVolJobs=0 MaxVolFiles=0 InChanger=0 VolReadTime=0 VolWriteTime=27 EndFile=0 EndBlock=0 LabelType=0 MediaId=144 EncryptionKey= MinBlocksize=0 MaxBlocksize=0
bareos-sd (50): askdir.c:213-1769 do_get_volume_info return true slot=0 Volume=lab00, VolminBlocksize=0 VolMaxBlocksize=0
bareos-sd (50): askdir.c:215-1769 setting dcr->VolMinBlocksize(0) to vol.VolMinBlocksize(0)
bareos-sd (50): askdir.c:217-1769 setting dcr->VolMaxBlocksize(0) to vol.VolMaxBlocksize(0)
bareos-sd (150): reserve.c:134-1769 Dec reserve=0 dev="device-synology" (/mnt/data/backup/storage)
bareos-sd (50): append.c:89-1769 Begin append device="device-synology" (/mnt/data/backup/storage)
bareos-sd (100): append.c:100-1769 Just after acquire_device_for_append
bareos-sd (130): label.c:803-1769 session_label record=980053c0
bareos-sd (150): label.c:848-1769 Write sesson_label record JobId=1769 FI=SOS_LABEL SessId=1 Strm=1769 len=153 remainder=0
bareos-sd (150): label.c:852-1769 Leave write_session_label Block=146
958542d File=0d

So at some moment FD looses connection to SD.
IDK if I should open a new ticket or this is the same problem
maik

maik

2015-11-06 17:44

administrator   ~0001942

Probably an individual network problem. Please reopen, if it occurs on other clients, too.

Issue History

Date Modified Username Field Change
2015-07-29 11:03 Gunther New Issue
2015-08-03 11:49 lexa500 Note Added: 0001793
2015-11-06 17:44 maik Note Added: 0001942
2015-11-06 17:44 maik Status new => closed
2015-11-06 17:44 maik Resolution open => unable to reproduce