View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001466 | bareos-core | file daemon | public | 2022-05-27 23:11 | 2022-11-10 16:53 |
Reporter | khvalera | Assigned To | bruno-at-bareos | ||
Priority | urgent | Severity | minor | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Platform | Mac | OS | MacOS X | OS Version | 10 |
Product Version | 21.1.3 | ||||
Fixed in Version | 22.0.0 | ||||
Summary | 0001466: Backups triggering twice with "Run On Incoming Connect Interval" | ||||
Description | I am currently setting up backups for laptops using the "Run On Incoming Connect Interval" feature. So far, backups are triggered upon connection every 12 to 16 hours, so I'm happy with that. However, every time a backup is triggered, a second backup is scheduled to run immediately after it. Backups running on normal schedules do not have this problem. Both the client (macOS) and the director (Docker running on Linux) are running version 20.0.1. Here is an example of the log:
1623 Backup Cosmos 2021-12-27 22:56:09 2021-12-27 22:56:15 MacHome Incremental 18 6.37 MB
1622 Backup Cosmos 2021-12-27 22:55:09 2021-12-27 22:55:24 MacHome Incremental 385 291.17 MB
1610 Backup Cosmos 2021-12-27 09:06:38 2021-12-27 09:08:24 MacHome Incremental 2340 4.69 GB
1609 Backup Cosmos 2021-12-27 09:04:48 2021-12-27 09:06:34 MacHome Incremental 2339 4.7 GB
1608 Backup Cosmos 2021-12-26 18:31:34 2021-12-26 18:33:37 MacHome Incremental 5138 5.15 GB
1607 Backup Cosmos 2021-12-26 18:29:04 2021-12-26 18:31:23 MacHome Incremental 5138 5.15 GB | ||||
Steps To Reproduce | The configuration of the job is as follows:
Job {
  Name = "Backup Cosmos"
  Client = "cosmos"
  JobDefs = "DefaultJob"
  FileSet = "MacHome"
  RunOnIncomingConnectInterval = 12h
  # Run Script that would ping the monitoring system removed for simplicity
}
JobDefs {
  Name = "DefaultJob"
  Type = Backup
  Storage = File
  Messages = Standard
  Pool = Incremental
  Priority = 10
  Write Bootstrap = "/var/lib/bareos/%c.bsr"
  Full Backup Pool = Full
  Differential Backup Pool = Differential
  Incremental Backup Pool = Incremental
  Max Diff Interval = 10d
  Max Full Interval = 35d
}
Client {
  Name = cosmos
  Address = x.x.x.x
  Password = xyz
  Connection From Client To Director = yes
  Heartbeat Interval = 60
}
On the client, the following line is added to myself.conf:
Connection From Client To Director = yes | ||||
Additional Information | Looking at the client log, I noticed that when the first backup job is started on a client, a second socket is opened:
elita (100): lib/bsys.cc:576-0 ReadStateFile seek to 192
elita (100): lib/recent_job_results_list.cc:57-0 Read num_items=10
elita (10): filed/socket_server.cc:128-0 filed: listening on port 9102
elita (100): lib/jcr.cc:195-0 Construct JobControlRecord
elita (100): lib/bnet_server_tcp.cc:160-0 Addresses host[ipv4;0.0.0.0;9102]
elita (100): lib/bsock.cc:84-0 Construct BareosSocket
elita (100): lib/bsock.cc:163-0 All source addresses
elita (100): lib/bsock_tcp.cc:234-0 Current host[ipv4;10.0.1.204;9101] All host[ipv4;10.0.1.204;9101]
elita (100): lib/bsock_tcp.cc:152-0 who=bareos-dir host=10.0.1.204 port=9101
elita (100): lib/tls_openssl_private.cc:65-0 Construct TlsOpenSslPrivate
elita (100): lib/tls_openssl_private.cc:606-0 Set tcp filedescriptor: <4>
elita (100): lib/tls_openssl_private.cc:618-0 Set protocol: <>
elita (100): lib/tls_openssl_private.cc:552-0 Set ca_certfile: <>
elita (100): lib/tls_openssl_private.cc:558-0 Set ca_certdir: <>
elita (100): lib/tls_openssl_private.cc:564-0 Set crlfile_: <>
elita (100): lib/tls_openssl_private.cc:570-0 Set certfile_: <>
elita (100): lib/tls_openssl_private.cc:576-0 Set keyfile_: <>
elita (100): lib/tls_openssl_private.cc:594-0 Set dhfile_: <>
elita (100): lib/tls_openssl_private.cc:612-0 Set cipherlist: <>
elita (100): lib/tls_openssl_private.cc:600-0 Set Verify Peer: <false>
elita (50): lib/tls_openssl.cc:65-0 Preparing TLS_PSK CLIENT context for identity R_CLIENT elita
elita (100): lib/tls_openssl_private.cc:537-0 psk_client_cb. identity: R_CLIENT elita.
elita (50): lib/bnet.cc:196-0 TLS client negotiation established.
elita (10): filed/dir_cmd.cc:2331-0 Opened connection with Director bareos-dir
elita (100): lib/cram_md5.cc:166-0 cram-get received: auth cram-md5 <489928684.1641010677@R_DIRECTOR::bareos-dir> ssl=1
elita (50): lib/cram_md5.cc:62-0 my_name: <R_CLIENT::elita> - challenge_name: <R_DIRECTOR::bareos-dir>
elita (99): lib/cram_md5.cc:231-0 sending resp to challenge: m89SGwF7Ax+VTU++K5+zEC
elita (50): lib/cram_md5.cc:106-0 send: auth cram-md5 <1288004040.1641010677@R_CLIENT::elita> ssl=1
elita (50): lib/cram_md5.cc:134-0 Authenticate OK f59utyt55ldroh+th4+gFD
elita (100): lib/jcr.cc:195-0 Construct JobControlRecord
elita (100): filed/dir_cmd.cc:462-0 <dird: JobId=1695 Job=Backup_Elita.2022-01-01_04.18.15_21 SDid=24 SDtime=1640922268 Authorization=LCEA-AANP-JPBG-CCLB-FEOF-NNMP-FMBO-PIFL ssl=4
elita (100): lib/bsock.cc:84-0 Construct BareosSocket
elita (100): lib/bsock.cc:163-0 All source addresses
elita (100): filed/dir_cmd.cc:473-0 Executing JobId= command.
elita (100): lib/bsock_tcp.cc:234-0 Current host[ipv4;10.0.1.204;9101] All host[ipv4;10.0.1.204;9101]
elita (5): filed/dir_cmd.cc:1623-0 set sd auth key
elita (5): filed/dir_cmd.cc:1626-0 set sd ssl_policy to 4
elita (100): lib/bsock_tcp.cc:152-0 who=bareos-dir host=10.0.1.204 port=9101
elita (100): lib/tls_openssl_private.cc:65-0 Construct TlsOpenSslPrivate
elita (100): lib/tls_openssl_private.cc:606-0 Set tcp filedescriptor: <5>
elita (100): lib/tls_openssl_private.cc:618-0 Set protocol: <>
elita (100): lib/tls_openssl_private.cc:552-0 Set ca_certfile: <>
elita (100): lib/tls_openssl_private.cc:558-0 Set ca_certdir: <>
elita (100): lib/tls_openssl_private.cc:564-0 Set crlfile_: <>
elita (100): lib/tls_openssl_private.cc:570-0 Set certfile_: <>
elita (100): lib/tls_openssl_private.cc:576-0 Set keyfile_: <>
elita (100): lib/tls_openssl_private.cc:594-0 Set dhfile_: <>
elita (100): lib/tls_openssl_private.cc:612-0 Set cipherlist: <>
elita (100): lib/tls_openssl_private.cc:600-0 Set Verify Peer: <false>
elita (50): lib/tls_openssl.cc:65-0 Preparing TLS_PSK CLIENT context for identity R_CLIENT elita
elita (100): lib/tls_openssl_private.cc:537-0 psk_client_cb. identity: R_CLIENT elita.
elita (50): lib/bnet.cc:196-0 TLS client negotiation established.
elita (10): filed/dir_cmd.cc:2331-0 Opened connection with Director bareos-dir
elita (100): filed/dir_cmd.cc:462-0 <dird: level = differential mtime_only=0
elita (100): filed/dir_cmd.cc:473-0 Executing level = command.
elita (10): filed/dir_cmd.cc:1447-0 LevelCmd: level = differential mtime_only=0
elita (100): filed/dir_cmd.cc:462-0 <dird: level = since_utime 1640620669 mtime_only=0 prev_job=Backup_Elita.2021-12-27_15.57.08_06
elita (100): filed/dir_cmd.cc:473-0 Executing level = command.
elita (100): lib/cram_md5.cc:166-0 cram-get received: auth cram-md5 <642302895.1641010697@R_DIRECTOR::bareos-dir> ssl=1
elita (10): filed/dir_cmd.cc:1447-0 LevelCmd: level = since_utime 1640620669 mtime_only=0 prev_job=Backup_Elita.2021-12-27_15.57.08_06
elita (100): filed/dir_cmd.cc:1499-0 since_time=1640620669 prev_job=Backup_Elita.2021-12-27_15.57.08_06
elita (50): lib/cram_md5.cc:62-0 my_name: <R_CLIENT::elita> - challenge_name: <R_DIRECTOR::bareos-dir>
elita (99): lib/cram_md5.cc:231-0 sending resp to challenge: K++4U7+Bt4UHJ4+U4QoiSB
elita (50): lib/cram_md5.cc:106-0 send: auth cram-md5 <861086444.1641010697@R_CLIENT::elita> ssl=1
elita (50): lib/cram_md5.cc:134-0 Authenticate OK cxBcYkYgy5xq17/ucR+b1D
According to this page ( https://docs.bareos.org/TasksAndConcepts/NetworkSetup.html ) this is normal behavior: "When a waiting connection is used for a job, the Bareos File Daemon will detect this and creates an additional connection. This is required, to keep the client responsive for additional commands, like cancel." The problem is that the director detects this as a new incoming connection and triggers a second backup.
I can even see the connect time change for the client in the Bareos-Dir status (Client Initiated Connections (waiting for jobs)). Is there a special way to ignore this second connection, or is it possible that "Run On Incoming Connect Interval" is broken? | ||||
Tags | No tags attached. | ||||
related to | 0001433 | closed | arogge | Release Bareos 22.0.0 |
has duplicate | 0001417 | closed | bruno-at-bareos | Backups triggering twice with "Run On Incoming Connect Interval" |
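The behavior described in the report can be sketched as follows. This is a hypothetical simulation, not Bareos source code: on every incoming client connection the director looks up the last start time of the client's connect-triggered job and starts the job again once the interval is overdue. If only terminated jobs carry a start time, the extra socket that the file daemon opens for the already-running job passes the same overdue check and triggers a duplicate backup.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(hours=12)

# Hypothetical in-memory job catalog; in Bareos this lives in the catalog database.
# Each entry: {"status": "C" (created) / "R" (running) / "T" (terminated),
#              "start_time": datetime or None}
jobs = []

def last_job_start_time(jobs):
    """Buggy variant: only terminated jobs ("T") are considered, so
    Created ("C") and Running ("R") jobs are invisible to the check."""
    times = [j["start_time"] for j in jobs if j["status"] == "T"]
    return max(times) if times else None

def on_incoming_connection(jobs, now):
    """Fire the connect-triggered job when the interval is overdue."""
    last = last_job_start_time(jobs)
    if last is None or now - last >= INTERVAL:
        # The freshly started job has no StartTime recorded yet.
        jobs.append({"status": "R", "start_time": None})
        return True
    return False

# First connection after 13 hours: a backup is correctly triggered.
now = datetime(2021, 12, 27, 22, 55)
jobs.append({"status": "T", "start_time": now - timedelta(hours=13)})
first = on_incoming_connection(jobs, now)

# Seconds later the fd opens its extra socket for the running job; the
# running job is invisible to the check, so a duplicate backup fires.
second = on_incoming_connection(jobs, now + timedelta(seconds=30))
print(first, second)  # both True -> duplicate backup
```

This matches the reporter's log, where the duplicate incremental starts less than a minute after the legitimate one.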
I had the problem when running Bareos 20.1. I upgraded to Bareos 21 and I still have the problem. | |
Just ran into this same problem :( Linux Version 21.1.3 | |
Still an issue on 21.1.4 | |
Thanks for your report. After checking the code, we may have found the issue. A fix is currently a work in progress, and I should be able to propose packages for testing soon. Would anyone be able to test them? (They will be based on master.) |
|
I've now published packages that should contain the fix at https://download.bareos.org/bareos/experimental/CD/PR-1265/ We would really appreciate it if any of you (or all) could confirm that this works for your use case. If confirmed quickly, it can and should be part of 21.1.5. Thanks |
|
What component do we need to update to test? Director? Client? | |
The included patch concerns only the director. The current master (22.0~pre) is fully compatible with the 21 database schema, so you can run the test director without trouble on a 21 database. Making a dump of your database and configuration beforehand is always a good idea. |
|
The best binaries to test will be those published on 22 September after 17:30. The fix has been improved to also handle jobs that have just been created but have not yet started. |
|
I loaded the packages in PR-1265 earlier and will report back. Not sure how to grab the packages published after 17:30 (?) |
|
Same here; I loaded the new packages on my director. It will take a couple of days to make sure there are no duplicates. | |
I'm still getting some jobs killed by:
25-Sep 11:38 myth-dir JobId 66664: Fatal error: JobId 66658 already running. Duplicate job not allowed.
which I would expect this fix to address. Before, I would get several of these a day; now I get 1-2. So it's better, but it is still kicking off extras. |
|
I'm trying to get a new build out, but we are facing some trouble with our Jenkins at the moment. In the meantime, would you mind telling me a bit more about your:
25-Sep 11:38 myth-dir JobId 66664: Fatal error: JobId 66658 already running. Duplicate job not allowed.
What kind of jobs are they, do they run a pre/post script, what is their status before being run, etc.? |
|
I'm still seeing duplicate jobs:
10965 Backup UltraMagnus ultramagnus Backup Incremental 2022-09-28 20:40:36 2022-09-28 20:40:36 00:00:00 0 0.00 B 1 Canceled
10964 Backup UltraMagnus ultramagnus Backup Incremental 2022-09-28 20:40:19 2022-09-28 20:40:54 00:00:35 1047 664.75 MB 0 Success
The second job gets cancelled because I use the option Allow Duplicate Jobs = no. The configuration for the backup is as follows:
Client {
  Name = ultramagnus
  Address = 10.0.1.2 # Symbolic IP
  Password = XYZ
  Connection From Client To Director = yes
  Connection From Director To Client = no
  Heartbeat Interval = 1 min
}
Job {
  Name = "Backup UltraMagnus"
  Client = "ultramagnus"
  JobDefs = "DefaultJobLaptop"
  FileSet = "MacHome"
  RunOnIncomingConnectInterval = 6h
  Allow Duplicate Jobs = no
  Run Before Job = "curl -m 10 --retry 5 http://hc-ping.com/72f87410-63c1-4e56-8002-e06638b67de8/start"
  Run After Job = "curl -m 10 --retry 5 http://hc-ping.com/72f87410-63c1-4e56-8002-e06638b67de8"
  Run After Failed Job = "curl -m 10 --retry 5 http://hc-ping.com/72f87410-63c1-4e56-8002-e06638b67de8/fail"
}
The curl calls are used for monitoring (cronitor) and return instantly. |
|
Thanks all for participating. I was finally able to get the new build out. The new packages, dated 2022-09-29 15:39, are available at https://download.bareos.org/bareos/experimental/CD/PR-1265/ Would you mind testing them as before and reporting any remaining failures? |
|
github PR 1265 | |
Will do; I loaded it today. Will see what happens. | |
It has been running for a few days, working flawlessly so far. | |
Thanks for the success reports. I will now do the rest of the administrative work to get it included into master, and maybe backported to 21 (but no guarantees that this will be accepted). | |
PR#1265 merged in master. No backport to previous version has been set. | |
bareos: master 900f164b 2022-09-21 16:29 Committer: pstorz Ported: N/A Details Diff |
core: BareosDb::FindLastJobStartTimeForJobAndClient: take into account Running job
Adding Created and Running jobs to the query will avoid the start of a second backup job for the same client while starttime is overdue. As we need to return a time_t and jobstatus = C doesn't have starttime set, now() is used with casting forced to expected timestamp without DST.
Fix [BUG 0001466]
Signed-off-by: Bruno Friedmann <bruno.friedmann@bareos.com> |
Affected Issues 0001466 |
|
mod - core/src/cats/sql_find.cc | Diff File |
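The idea of the fix, as described in the commit message, can be sketched with the same hypothetical simulation used above (the real change adjusts an SQL query in core/src/cats/sql_find.cc, not Python): Created and Running jobs are now included when looking up the last start time, and since a Created job has no StartTime yet, now() stands in for it.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(hours=12)

def last_job_start_time_fixed(jobs, now):
    """Fixed variant: Created ("C") and Running ("R") jobs count as well.
    A job without a StartTime is treated as having started now(),
    mirroring the commit's use of now() in the SQL query."""
    times = [j["start_time"] if j["start_time"] is not None else now
             for j in jobs if j["status"] in ("C", "R", "T")]
    return max(times) if times else None

def on_incoming_connection(jobs, now):
    """Fire the connect-triggered job only when the interval is overdue."""
    last = last_job_start_time_fixed(jobs, now)
    if last is None or now - last >= INTERVAL:
        jobs.append({"status": "R", "start_time": None})
        return True
    return False

now = datetime(2021, 12, 27, 22, 55)
jobs = [{"status": "T", "start_time": now - timedelta(hours=13)}]
first = on_incoming_connection(jobs, now)                           # triggers: overdue
second = on_incoming_connection(jobs, now + timedelta(seconds=30))  # suppressed: running job counts
print(first, second)  # True False
```

With the running job visible to the check, the file daemon's extra connection no longer looks overdue, so no duplicate job is started.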
Date Modified | Username | Field | Change |
---|---|---|---|
2022-05-27 23:11 | khvalera | New Issue | |
2022-05-27 23:11 | khvalera | Issue generated from: 0001417 | |
2022-05-27 23:11 | khvalera | Note Added: 0004626 | |
2022-05-27 23:11 | khvalera | Note Added: 0004627 | |
2022-09-14 05:00 | brockp | Note Added: 0004748 | |
2022-09-21 14:37 | bruno-at-bareos | Assigned To | => bruno-at-bareos |
2022-09-21 14:37 | bruno-at-bareos | Status | new => assigned |
2022-09-21 14:38 | bruno-at-bareos | Status | assigned => acknowledged |
2022-09-21 14:38 | bruno-at-bareos | Note Added: 0004749 | |
2022-09-21 15:17 | bruno-at-bareos | Relationship added | has duplicate 0001417 |
2022-09-21 15:52 | bruno-at-bareos | Note Added: 0004754 | |
2022-09-21 19:21 | Beshiros | Note Added: 0004761 | |
2022-09-22 09:58 | bruno-at-bareos | Note Added: 0004763 | |
2022-09-22 17:19 | bruno-at-bareos | Note Added: 0004769 | |
2022-09-23 04:59 | brockp | Note Added: 0004770 | |
2022-09-23 05:24 | Beshiros | Note Added: 0004771 | |
2022-09-26 03:19 | brockp | Note Added: 0004773 | |
2022-09-28 13:14 | bruno-at-bareos | Note Added: 0004780 | |
2022-09-29 03:41 | Beshiros | Note Added: 0004783 | |
2022-09-29 17:10 | bruno-at-bareos | Note Added: 0004790 | |
2022-09-29 17:11 | bruno-at-bareos | Status | acknowledged => confirmed |
2022-09-29 17:11 | bruno-at-bareos | Note Added: 0004791 | |
2022-10-01 04:12 | brockp | Note Added: 0004793 | |
2022-10-05 15:43 | brockp | Note Added: 0004813 | |
2022-10-06 17:11 | bruno-at-bareos | Note Added: 0004814 | |
2022-10-28 17:11 | pstorz | Changeset attached | => bareos master 900f164b |
2022-10-31 10:03 | bruno-at-bareos | Status | confirmed => closed |
2022-10-31 10:03 | bruno-at-bareos | Resolution | open => fixed |
2022-10-31 10:03 | bruno-at-bareos | Fixed in Version | => 22.0.0 |
2022-10-31 10:03 | bruno-at-bareos | Note Added: 0004819 | |
2022-11-10 16:53 | arogge_adm | Relationship added | related to 0001433 |