View Issue Details

ID: 0000470
Project: bareos-core
Category: file daemon
View Status: public
Last Update: 2015-05-26 11:53
Reporter: tilman
Assigned To:
Priority: normal
Severity: block
Reproducibility: always
Status: closed
Resolution: no change required
Platform: Linux
OS: Ubuntu
OS Version: 14.04
Product Version: 14.2.2
Summary: 0000470: Passive Client: Backup itself seems to be successful, job however fails
Description: Backup attempts of Tgvs2 fail.
- Tgvs2 is a passive client on an ubuntu server.
- The director is in a different network, nat'ed and behind a firewall.

Here is the output of bconsole:
*status director
qtron-dir Version: 14.2.2 (12 December 2014) i686-pc-linux-gnu ubuntu Ubuntu 14.04 LTS
Daemon started 25-Apr-15 06:49. Jobs: run=11, running=0 mode=0
 Heap: heap=393,216 smbytes=185,752 max_bytes=234,397 bufs=718 max_bufs=1,159
No Scheduled Jobs.
====

Running Jobs:
Console connected at 22-May-15 21:35
No Jobs running.
====

Terminated Jobs:
 JobId Level Files Bytes Status Finished Name
====================================================================
    37 Full 0 0 Cancel 25-Apr-15 15:08 BackupTgvs2ToDisk
    38 Full 0 0 Error 09-May-15 23:09 BackupTgvs2ToDisk
    39 Full 0 0 Error 10-May-15 01:19 BackupWtronToDisk
    40 Full 6,300 4.295 G OK 11-May-15 00:18 BackupWtronToDisk
    41 Full 0 0 Error 11-May-15 02:52 BackupWtronToDisk
    42 Full 0 0 Error 11-May-15 05:33 BackupTgvs2ToDisk
    43 Full 0 0 Error 11-May-15 18:39 BackupTgvs2ToDisk
    44 Full 0 0 Error 12-May-15 05:30 BackupTgvs2ToDisk
    45 Full 6,316 4.620 G OK 17-May-15 16:41 BackupWtronToDisk
    46 Full 0 0 Error 22-May-15 19:21 BackupTgvs2ToDisk

====
You have messages.
*status client=tgvs2-fd
Connecting to Client tgvs2-fd at tgvs2:9102

tgvs2-fd Version: 14.2.2 (12 December 2014) x86_64-pc-linux-gnu ubuntu Ubuntu 14.04 LTS
Daemon started 25-Apr-15 06:34. Jobs: run=6 running=0.
 Heap: heap=167,936 smbytes=107,723 max_bytes=171,696 bufs=84 max_bufs=185
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 bwlimit=0kB/s

Running Jobs:
Director connected at: 22-May-15 21:39
No Jobs running.
====

Terminated Jobs:
 JobId Level Files Bytes Status Finished Name
======================================================================
    37 Full 5,521 2.474 G OK 25-Apr-15 14:53 BackupTgvs2ToDisk
    38 Full 5,834 2.335 G OK 09-May-15 21:33 BackupTgvs2ToDisk
    42 Full 5,968 2.680 G OK 11-May-15 03:57 BackupTgvs2ToDisk
    43 Full 5,968 2.680 G OK 11-May-15 17:04 BackupTgvs2ToDisk
    44 Full 5,973 2.680 G OK 12-May-15 03:55 BackupTgvs2ToDisk
    46 Full 5,973 2.694 G OK 22-May-15 17:48 BackupTgvs2ToDisk
====
*m
22-May 17:25 qtron-dir: Console [default] from [127.0.0.1] cmdline status tgvs2-fd
22-May 17:25 qtron-dir: Console [default] from [127.0.0.1] cmdline status client=tgvs2-fd
22-May 17:29 qtron-dir: Console [default] from [127.0.0.1] cmdline status director
22-May 17:45 qtron-sd JobId 46: Elapsed time=00:24:52, Transfer rate=1.806 M Bytes/second
22-May 19:21 qtron-dir JobId 46: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
22-May 19:21 qtron-dir JobId 46: Fatal error: No Job status returned from FD.
22-May 19:21 qtron-dir JobId 46: Error: Bareos qtron-dir 14.2.2 (12Dec14):
  Build OS: i686-pc-linux-gnu ubuntu Ubuntu 14.04 LTS
  JobId: 46
  Job: BackupTgvs2ToDisk.2015-05-22_17.20.59_38
  Backup Level: Full
  Client: "tgvs2-fd" 14.2.2 (12Dec14) x86_64-pc-linux-gnu,ubuntu,Ubuntu 14.04 LTS
  FileSet: "tgvs2-FileSet" 2015-04-25 14:09:53
  Pool: "Tgvs2-Disk" (From Job resource)
  Catalog: "QtronCatalog" (From Client resource)
  Storage: "FileStorage1" (From command line)
  Scheduled time: 22-May-2015 17:20:49
  Start time: 22-May-2015 17:21:04
  End time: 22-May-2015 19:21:08
  Elapsed time: 2 hours 4 secs
  Priority: 10
  FD Files Written: 0
  SD Files Written: 5,973
  FD Bytes Written: 0 (0 B)
  SD Bytes Written: 2,695,453,866 (2.695 GB)
  Rate: 0.0 KB/s
  Software Compression: None
  VSS: no
  Encryption: no
  Accurate: yes
  Volume name(s): Tgvs2-0045
  Volume Session Id: 17
  Volume Session Time: 1428283496
  Last Volume Bytes: 2,697,624,629 (2.697 GB)
  Non-fatal FD errors: 1
  SD Errors: 0
  FD termination status: Error
  SD termination status: OK
  Termination: *** Backup Error ***

22-May 21:35 qtron-dir: Console [default] from [127.0.0.1] cmdline status direcror
22-May 21:37 qtron-dir: Console [default] from [127.0.0.1] cmdline status client=tgvs2-fd

Thanks

Tilman
Tags: No tags attached.

Activities

pstorz

2015-05-23 23:17

administrator   ~0001738

Hello,

please understand that this is a bugtracker. This means that it is intended to track bugs.

If bareos does not behave as you expect, this does not mean that you have found a bug.

Please post such problems to the bareos-users mailing list instead of filing a bug.


Regarding your problem:

"Elapsed time: 2 hours 4 secs" sounds like your firewall has a maximum session time of two hours and kills the connection.

Either configure your firewall to allow a long enough session time, or use the "Heartbeat Interval" directives to create network traffic that keeps the connection open.


I hope this solves your problem.

Best regards,

Philipp
tilman

2015-05-24 19:53

reporter   ~0001739

Hello Philipp

>If bareos does not behave as you expect, this does not mean that you have found a bug.
Absolutely

I interpreted the log like this:
22-May-2015 17:21: Job Started
22-May-2015 17:48: Client successfully finished backup: 2.694 GB in 5,973 files sent to storage daemon. Status (on client side): OK.
The Director misses the state change, potentially because the client side closes the socket connection to the director prematurely.
22-May-15 19:21: The Director times out after 2 hours, and as the connection is already closed, it records a network error.
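
As a rough cross-check of the timeline above, the intervals can be computed from the timestamps in the job report. This is only a sketch: the variable names are made up for illustration, and the client finish time is known only to the minute (17:48), so its seconds are assumed to be zero.

```python
from datetime import datetime

# Timestamps copied from the JobId 46 report (Start time / End time).
start = datetime(2015, 5, 22, 17, 21, 4)
end = datetime(2015, 5, 22, 19, 21, 8)

# Assumption: the client status shows only 17:48, so seconds are set to 0.
client_done = datetime(2015, 5, 22, 17, 48, 0)

elapsed = end - start          # total job duration as seen by the director
waiting = end - client_done    # time between client finish and director error

elapsed_s = int(elapsed.total_seconds())
print("elapsed seconds:", elapsed_s)                    # 7204
print("seconds over two hours:", elapsed_s - 2 * 3600)  # 4
print("minutes from client finish to error:",
      int(waiting.total_seconds() // 60))               # 93
```

The job fails almost exactly two hours after it started, not two hours after the client finished. That would be consistent with a fixed two-hour limit on a connection opened at job start, although the log alone does not prove it.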

With a small amount of data to be stored (a few tens of MB), the problem did not occur, and the jobs completed successfully.

I do not know the protocol between client and director or its underlying state machine. It is, however, obvious that it had to be changed to implement passive mode, meaning that this part of the code contains fairly recent additions.

I am still not convinced that it is not a bug. I have read that you thoroughly test the software before releasing a new feature. Hence I suspect a glitch in the protocol state machine for passive mode that only shows under special preconditions (such as an amount of data above a certain threshold combined with client and director being in different networks).

I will read about the "Heartbeat Interval" directive, and give this a go.

Thanks

Tilman
tilman

2015-05-25 05:08

reporter   ~0001740

Dear Philipp

setting the heartbeat interval fixes the issue. So it is apparently not a bug, but a user issue on my part.

Thanks for your support

Tilman

Issue History

Date Modified Username Field Change
2015-05-22 23:13 tilman New Issue
2015-05-23 23:17 pstorz Note Added: 0001738
2015-05-23 23:17 pstorz Status new => feedback
2015-05-24 19:53 tilman Note Added: 0001739
2015-05-24 19:53 tilman Status feedback => new
2015-05-25 05:08 tilman Note Added: 0001740
2015-05-26 11:53 pstorz Status new => closed
2015-05-26 11:53 pstorz Assigned To => pstorz
2015-05-26 11:53 pstorz Resolution open => no change required
2015-05-26 11:53 pstorz Assigned To pstorz =>