View Issue Details

ID: 0001466
Project: bareos-core
Category: file daemon
View Status: public
Last Update: 2022-11-10 16:53
Reporter: khvalera
Assigned To: bruno-at-bareos
Priority: urgent
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Platform: MacOS
OS: MacOS X
OS Version: 10
Product Version: 21.1.3
Fixed in Version: 22.0.0
Summary: 0001466: Backups triggering twice with "Run On Incoming Connect Interval"
Description

I am currently setting up backups for laptops using the "Run On Incoming Connect Interval" feature. So far, backups are triggering upon connection every 12 to 16 hours, so I'm happy with that. However, every time a backup is triggered, a second backup is scheduled to run immediately after it. Backups running on normal schedules do not have this problem. Both the client (MacOS) and the director (Docker running on Linux) are running version 20.0.1. Here is an example of the log:

1623 Backup Cosmos 2021-12-27 22:56:09 2021-12-27 22:56:15 MacHome Incremental 18 6.37 MB
1622 Backup Cosmos 2021-12-27 22:55:09 2021-12-27 22:55:24 MacHome Incremental 385 291.17 MB
1610 Backup Cosmos 2021-12-27 09:06:38 2021-12-27 09:08:24 MacHome Incremental 2340 4.69 GB
1609 Backup Cosmos 2021-12-27 09:04:48 2021-12-27 09:06:34 MacHome Incremental 2339 4.7 GB
1608 Backup Cosmos 2021-12-26 18:31:34 2021-12-26 18:33:37 MacHome Incremental 5138 5.15 GB
1607 Backup Cosmos 2021-12-26 18:29:04 2021-12-26 18:31:23 MacHome Incremental 5138 5.15 GB

Steps To Reproduce

The configuration of the job is as follows:

Job {
  Name = "Backup Cosmos"
  Client = "cosmos"
  JobDefs = "DefaultJob"
  FileSet = "MacHome"

  RunOnIncomingConnectInterval = 12h

  # Run Script which would ping the monitoring system (removed for simplicity)
}

JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Storage = File
  Messages = Standard
  Pool = Incremental
  Priority = 10
  Write Bootstrap = "/var/lib/bareos/%c.bsr"
  Full Backup Pool = Full
  Differential Backup Pool = Differential
  Incremental Backup Pool = Incremental
  Max Diff Interval = 10d
  Max Full Interval = 35d
}

Client {
   Name = cosmos
   Address = x.x.x.x
   Password = xyz

   Connection From Client To Director = yes
   Heartbeat Interval = 60
}

On the client, the following line is added to myself.conf:

Connection From Client To Director = yes
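
For reference, a minimal sketch of the client-side counterpart (per the Bareos network setup documentation, client-initiated connections are configured in the file daemon's Director resource; the file path and hostname below are illustrative, not taken from this report):

# bareos-fd.d/director/bareos-dir.conf (illustrative path)
Director {
  Name = bareos-dir                          # must match the Director's name
  Password = "xyz"                           # same password as in the Director's Client resource
  Address = bareos-dir.example.com           # illustrative: address the client connects to
  Connection From Client To Director = yes   # the client opens and keeps the connection to the Director
}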
Additional Information

Looking at the client log, I noticed that when the first backup job is started on a client, a second socket is opened:

elita (100): lib/bsys.cc:576-0 ReadStateFile seek to 192
elita (100): lib/recent_job_results_list.cc:57-0 Read num_items=10
elita (10): filed/socket_server.cc:128-0 filed: listening on port 9102
elita (100): lib/jcr.cc:195-0 Construct JobControlRecord
elita (100): lib/bnet_server_tcp.cc:160-0 Addresses host[ipv4;0.0.0.0;9102]
elita (100): lib/bsock.cc:84-0 Construct BareosSocket
elita (100): lib/bsock.cc:163-0 All source addresses
elita (100): lib/bsock_tcp.cc:234-0 Current host[ipv4;10.0.1.204;9101] All host[ipv4;10.0.1.204;9101]
elita (100): lib/bsock_tcp.cc:152-0 who=bareos-dir host=10.0.1.204 port=9101
elita (100): lib/tls_openssl_private.cc:65-0 Construct TlsOpenSslPrivate
elita (100): lib/tls_openssl_private.cc:606-0 Set tcp filedescriptor: <4>
elita (100): lib/tls_openssl_private.cc:618-0 Set protocol: <>
elita (100): lib/tls_openssl_private.cc:552-0 Set ca_certfile: <>
elita (100): lib/tls_openssl_private.cc:558-0 Set ca_certdir: <>
elita (100): lib/tls_openssl_private.cc:564-0 Set crlfile_: <>
elita (100): lib/tls_openssl_private.cc:570-0 Set certfile_: <>
elita (100): lib/tls_openssl_private.cc:576-0 Set keyfile_: <>
elita (100): lib/tls_openssl_private.cc:594-0 Set dhfile_: <>
elita (100): lib/tls_openssl_private.cc:612-0 Set cipherlist: <>
elita (100): lib/tls_openssl_private.cc:600-0 Set Verify Peer: <false>
elita (50): lib/tls_openssl.cc:65-0 Preparing TLS_PSK CLIENT context for identity R_CLIENT elita
elita (100): lib/tls_openssl_private.cc:537-0 psk_client_cb. identity: R_CLIENT elita.
elita (50): lib/bnet.cc:196-0 TLS client negotiation established.
elita (10): filed/dir_cmd.cc:2331-0 Opened connection with Director bareos-dir
elita (100): lib/cram_md5.cc:166-0 cram-get received: auth cram-md5 <489928684.1641010677@R_DIRECTOR::bareos-dir> ssl=1
elita (50): lib/cram_md5.cc:62-0 my_name: <R_CLIENT::elita> - challenge_name: <R_DIRECTOR::bareos-dir>
elita (99): lib/cram_md5.cc:231-0 sending resp to challenge: m89SGwF7Ax+VTU++K5+zEC
elita (50): lib/cram_md5.cc:106-0 send: auth cram-md5 <1288004040.1641010677@R_CLIENT::elita> ssl=1
elita (50): lib/cram_md5.cc:134-0 Authenticate OK f59utyt55ldroh+th4+gFD
elita (100): lib/jcr.cc:195-0 Construct JobControlRecord
elita (100): filed/dir_cmd.cc:462-0 <dird: JobId=1695 Job=Backup_Elita.2022-01-01_04.18.15_21 SDid=24 SDtime=1640922268 Authorization=LCEA-AANP-JPBG-CCLB-FEOF-NNMP-FMBO-PIFL ssl=4

elita (100): lib/bsock.cc:84-0 Construct BareosSocket
elita (100): lib/bsock.cc:163-0 All source addresses
elita (100): filed/dir_cmd.cc:473-0 Executing JobId= command.
elita (100): lib/bsock_tcp.cc:234-0 Current host[ipv4;10.0.1.204;9101] All host[ipv4;10.0.1.204;9101]
elita (5): filed/dir_cmd.cc:1623-0 set sd auth key
elita (5): filed/dir_cmd.cc:1626-0 set sd ssl_policy to 4
elita (100): lib/bsock_tcp.cc:152-0 who=bareos-dir host=10.0.1.204 port=9101
elita (100): lib/tls_openssl_private.cc:65-0 Construct TlsOpenSslPrivate
elita (100): lib/tls_openssl_private.cc:606-0 Set tcp filedescriptor: <5>
elita (100): lib/tls_openssl_private.cc:618-0 Set protocol: <>
elita (100): lib/tls_openssl_private.cc:552-0 Set ca_certfile: <>
elita (100): lib/tls_openssl_private.cc:558-0 Set ca_certdir: <>
elita (100): lib/tls_openssl_private.cc:564-0 Set crlfile_: <>
elita (100): lib/tls_openssl_private.cc:570-0 Set certfile_: <>
elita (100): lib/tls_openssl_private.cc:576-0 Set keyfile_: <>
elita (100): lib/tls_openssl_private.cc:594-0 Set dhfile_: <>
elita (100): lib/tls_openssl_private.cc:612-0 Set cipherlist: <>
elita (100): lib/tls_openssl_private.cc:600-0 Set Verify Peer: <false>
elita (50): lib/tls_openssl.cc:65-0 Preparing TLS_PSK CLIENT context for identity R_CLIENT elita
elita (100): lib/tls_openssl_private.cc:537-0 psk_client_cb. identity: R_CLIENT elita.
elita (50): lib/bnet.cc:196-0 TLS client negotiation established.
elita (10): filed/dir_cmd.cc:2331-0 Opened connection with Director bareos-dir
elita (100): filed/dir_cmd.cc:462-0 <dird: level = differential mtime_only=0

elita (100): filed/dir_cmd.cc:473-0 Executing level = command.
elita (10): filed/dir_cmd.cc:1447-0 LevelCmd: level = differential mtime_only=0
elita (100): filed/dir_cmd.cc:462-0 <dird: level = since_utime 1640620669 mtime_only=0 prev_job=Backup_Elita.2021-12-27_15.57.08_06

elita (100): filed/dir_cmd.cc:473-0 Executing level = command.
elita (100): lib/cram_md5.cc:166-0 cram-get received: auth cram-md5 <642302895.1641010697@R_DIRECTOR::bareos-dir> ssl=1
elita (10): filed/dir_cmd.cc:1447-0 LevelCmd: level = since_utime 1640620669 mtime_only=0 prev_job=Backup_Elita.2021-12-27_15.57.08_06
elita (100): filed/dir_cmd.cc:1499-0 since_time=1640620669 prev_job=Backup_Elita.2021-12-27_15.57.08_06
elita (50): lib/cram_md5.cc:62-0 my_name: <R_CLIENT::elita> - challenge_name: <R_DIRECTOR::bareos-dir>
elita (99): lib/cram_md5.cc:231-0 sending resp to challenge: K++4U7+Bt4UHJ4+U4QoiSB
elita (50): lib/cram_md5.cc:106-0 send: auth cram-md5 <861086444.1641010697@R_CLIENT::elita> ssl=1
elita (50): lib/cram_md5.cc:134-0 Authenticate OK cxBcYkYgy5xq17/ucR+b1D

According to this page (https://docs.bareos.org/TasksAndConcepts/NetworkSetup.html), this is normal behavior:

When a waiting connection is used for a job, the Bareos File Daemon will detect this and creates an additional connection. This is required, to keep the client responsive for additional commands, like cancel.

The problem is that the director detects this as a new incoming connection and triggers a second backup. I can even see the connect time change for the client in the Bareos-Dir Status (Client Initiated Connections (waiting for jobs)). Is there a special way to ignore this second connection, or is it possible that "Run On Incoming Connect Interval" is broken?
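
As a side note, a mitigation that appears later in this report is to disallow duplicate jobs, so the spurious second job is canceled rather than run; a minimal sketch based on the job definition above (a workaround, not a fix):

Job {
  Name = "Backup Cosmos"
  Client = "cosmos"
  JobDefs = "DefaultJob"
  FileSet = "MacHome"

  RunOnIncomingConnectInterval = 12h
  Allow Duplicate Jobs = no   # the extra job triggered by the second connection is canceled instead of running
}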
Tags: No tags attached.

Relationships

related to 0001433 (closed, arogge): Release Bareos 22.0.0
has duplicate 0001417 (closed, bruno-at-bareos): Backups triggering twice with "Run On Incoming Connect Interval"

Activities

Beshiros

2022-05-27 23:11

reporter   ~0004626

I had the problem when running Bareos 20.1. After upgrading to Bareos 21, I still have the problem.
khvalera

2022-05-27 23:11

reporter   ~0004627

Just ran into this same problem :( Linux Version 21.1.3
brockp

2022-09-14 05:00

reporter   ~0004748

Still an issue on 21.1.4
bruno-at-bareos

2022-09-21 14:38

manager   ~0004749

Thanks for your report. After checking the code, we may have found the issue.
Work is in progress, and I will be able to propose packages to test soon.

Would someone be able to test them? (They will be based on master.)
bruno-at-bareos

2022-09-21 15:52

manager   ~0004754

I've now published packages that should contain the fix at https://download.bareos.org/bareos/experimental/CD/PR-1265/
We would really appreciate it if any of you (or all) could confirm that this works for your use case.
If confirmed quickly, it can and should become part of 21.1.5.

Thanks
Beshiros

2022-09-21 19:21

reporter   ~0004761

What component do we need to update to test? Director? Client?
bruno-at-bareos

2022-09-22 09:58

manager   ~0004763

The included patch concerns only the director. Currently, master (22.0~pre) is fully compatible with the 21 database schema,
so you can run the test director on a 21 database without trouble. Making a dump of your database and configuration beforehand is always a good idea.
bruno-at-bareos

2022-09-22 17:19

manager   ~0004769

The best binaries to test will be those published on 22 September after 17h30.
The fix has been improved to also handle jobs that have just been created but have not yet started.
brockp

2022-09-23 04:59

reporter   ~0004770

I loaded the packages from PR-1265 earlier and will report back.

Not sure how to grab the packages published after 17h30 (?)
Beshiros

2022-09-23 05:24

reporter   ~0004771

Same here, I loaded the new packages on my director. It will take a couple of days to make sure there are no duplicates.
brockp

2022-09-26 03:19

reporter   ~0004773

I'm still getting some jobs killed by:

25-Sep 11:38 myth-dir JobId 66664: Fatal error: JobId 66658 already running. Duplicate job not allowed.

which I would expect this fix to prevent. Before, I would get several of these a day; now I get 1-2, so it's better, but it is still kicking off extras.
bruno-at-bareos

2022-09-28 13:14

manager   ~0004780

I'm trying to get a new build out, but we are facing some trouble with our Jenkins at the moment.
In the meantime, would you mind telling me a bit more about your

25-Sep 11:38 myth-dir JobId 66664: Fatal error: JobId 66658 already running. Duplicate job not allowed.

What kind of jobs are they, do they run a pre/post script, what is their status before being run, etc.?
Beshiros

2022-09-29 03:41

reporter   ~0004783

I'm still seeing duplicate jobs:

10965 Backup UltraMagnus ultramagnus Backup Incremental 2022-09-28 20:40:36 2022-09-28 20:40:36 00:00:00 0 0.00 B 1 Canceled
10964 Backup UltraMagnus ultramagnus Backup Incremental 2022-09-28 20:40:19 2022-09-28 20:40:54 00:00:35 1047 664.75 MB 0 Success

The second job gets cancelled because I use the following option

 Allow Duplicate Jobs = no

The configuration for the backup is as follows

Client {
   Name = ultramagnus
   Address = 10.0.1.2 # Symbolic IP
   Password = XYZ

   Connection From Client To Director = yes
   Connection From Director To Client = no
   Heartbeat Interval = 1 min
}

Job {
  Name = "Backup UltraMagnus"
  Client = "ultramagnus"
  JobDefs = "DefaultJobLaptop"
  FileSet = "MacHome"

  RunOnIncomingConnectInterval = 6h
  Allow Duplicate Jobs = no

  Run Before Job = "curl -m 10 --retry 5 http://hc-ping.com/72f87410-63c1-4e56-8002-e06638b67de8/start"
  Run After Job = "curl -m 10 --retry 5 http://hc-ping.com/72f87410-63c1-4e56-8002-e06638b67de8"
  Run After Failed Job = "curl -m 10 --retry 5 http://hc-ping.com/72f87410-63c1-4e56-8002-e06638b67de8/fail"
}

The curl calls are used for monitoring (cronitor) and return instantly.
bruno-at-bareos

2022-09-29 17:10

manager   ~0004790

Thanks all for participating. I was finally able to get the new build out.

The new packages are available at https://download.bareos.org/bareos/experimental/CD/PR-1265/
dated 2022-09-29 15:39.

Would you mind testing them as before, and reporting any remaining failures?
bruno-at-bareos

2022-09-29 17:11

manager   ~0004791

GitHub PR 1265
brockp

2022-10-01 04:12

reporter   ~0004793

Will do; I loaded it today. Will see what happens.
brockp

2022-10-05 15:43

reporter   ~0004813

Been running for a few days; working flawlessly so far.
bruno-at-bareos

2022-10-06 17:11

manager   ~0004814

Thanks for the success report. I will now do the rest of the administrative work to get this included into master, and maybe backported to 21 (but no guarantee that will be accepted).
bruno-at-bareos

2022-10-31 10:03

manager   ~0004819

PR 1265 has been merged into master. No backport to a previous version is planned.

Related Changesets

bareos: master 900f164b

2022-09-21 16:29

bruno-at-bareos


Committer: pstorz

Ported: N/A

core: BareosDb::FindLastJobStartTimeForJobAndClient: take into account Running job

- Adding Created and Running jobs to the query will avoid the start
of a second backup job for the same client while starttime is
overdue.
As we need to return a time_t and jobstatus = C doesn't have
starttime set, now() is used with casting forced to expected
timestamp without DST

Fix [BUG 0001466]

Signed-off-by: Bruno Friedmann <bruno.friedmann@bareos.com>
Affected Issues
0001466
mod - core/src/cats/sql_find.cc

Issue History

Date Modified Username Field Change
2022-05-27 23:11 khvalera New Issue
2022-05-27 23:11 khvalera Issue generated from: 0001417
2022-05-27 23:11 khvalera Note Added: 0004626
2022-05-27 23:11 khvalera Note Added: 0004627
2022-09-14 05:00 brockp Note Added: 0004748
2022-09-21 14:37 bruno-at-bareos Assigned To => bruno-at-bareos
2022-09-21 14:37 bruno-at-bareos Status new => assigned
2022-09-21 14:38 bruno-at-bareos Status assigned => acknowledged
2022-09-21 14:38 bruno-at-bareos Note Added: 0004749
2022-09-21 15:17 bruno-at-bareos Relationship added has duplicate 0001417
2022-09-21 15:52 bruno-at-bareos Note Added: 0004754
2022-09-21 19:21 Beshiros Note Added: 0004761
2022-09-22 09:58 bruno-at-bareos Note Added: 0004763
2022-09-22 17:19 bruno-at-bareos Note Added: 0004769
2022-09-23 04:59 brockp Note Added: 0004770
2022-09-23 05:24 Beshiros Note Added: 0004771
2022-09-26 03:19 brockp Note Added: 0004773
2022-09-28 13:14 bruno-at-bareos Note Added: 0004780
2022-09-29 03:41 Beshiros Note Added: 0004783
2022-09-29 17:10 bruno-at-bareos Note Added: 0004790
2022-09-29 17:11 bruno-at-bareos Status acknowledged => confirmed
2022-09-29 17:11 bruno-at-bareos Note Added: 0004791
2022-10-01 04:12 brockp Note Added: 0004793
2022-10-05 15:43 brockp Note Added: 0004813
2022-10-06 17:11 bruno-at-bareos Note Added: 0004814
2022-10-28 17:11 pstorz Changeset attached => bareos master 900f164b
2022-10-31 10:03 bruno-at-bareos Status confirmed => closed
2022-10-31 10:03 bruno-at-bareos Resolution open => fixed
2022-10-31 10:03 bruno-at-bareos Fixed in Version => 22.0.0
2022-10-31 10:03 bruno-at-bareos Note Added: 0004819
2022-11-10 16:53 arogge_adm Relationship added related to 0001433