0000855: Update Volume or Purge Volume Hangs on Using Catalog

ID	Project	Category	View Status	Date Submitted	Last Update

0000855	bareos-core	director	public	2017-10-02 04:23	2023-09-11 17:13

Reporter	Ryushin	Assigned To	bruno-at-bareos
Priority	normal	Severity	minor	Reproducibility	always
Status	closed	Resolution	unable to reproduce
Platform	Linux	OS	Debian	OS Version	8
Product Version	17.2.3

Summary	0000855: Update Volume or Purge Volume Hangs on Using Catalog
Description	This just started happening after the last couple of nightly updates. Pretty much at the time when the new mysql schema was updated in the Debian packages. I've been using the same scripts that I've created the for the last couple of years without a problem. Now the scripts fail to run and hang when sending the command using echo and piping it to bconsole. Example commands: echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate" \| bconsole echo "purge volume=vchanger_monthly_1.5tb_drives_0003_0001 action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool" \| bconsole echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 volstatus=Append storage=vchanger_monthly_1.5tb pool=Off-site_Pool" \| bconsole Example Output: echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate" \| bconsoleConnecting to Director windwalker:9101 1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017) Enter a period to cancel a command. update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" <---- hangs here Running bconsole directly, and then running the commands from inside bconsole seems to work most of the time. But after testing a few times, even inside of bconsole it hanges on the command. I restart the dir and sd daemons to unlock the issue (not sure if the sd daemon needs to be restarted or not) It seems to be doing this with all of my scripts right now.
Additional Information	Script used: bareos_purge_volumes_on_1.5tb_monthly_drive.sh: #!/bin/bash if [ "$1" = "" ] then echo "Syntax: $0 <path where bareos volumes exist>" echo "I.e. $0 /mnt/vchanger/Monthly_1.5TB_01" exit 1 fi echo "update slots storage=vchanger_monthly_1.5tb" \| bconsole for vol in $(ls $1/volumes/vchanger_monthly_1.5tb_drives_*) do echo "update volume=$(basename $vol) ActionOnPurge=Truncate" \| bconsole echo "purge volume=$(basename $vol) action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool" \| bconsole echo "update volume=$(basename $vol) volstatus=Append storage=vchanger_monthly_1.5tb pool=Off-site_Pool" \| bconsole done
Tags	No tags attached.

Ryushin 2017-10-02 04:40 reporter ~0002768	BTW, this is occurring on two different installations.

joergs 2017-10-02 16:47 developer ~0002770	I don't think, that this issue is related to the catalog. However, there have been changes to the prune/purge and ActionOnPurge commands. However, I'm not sure, if your script, even if it run, does what you expect. In my example, with 17.2.4-rc1, volumes don't get truncated. I would use: <your volume loop> purge storage=File pool=Full volume=$VOLUME # truncate all volumes with volstatus=Purged in pool Full truncate volstatus=Purged pool=Full yes Setting volstatus to append should not be required. Back to your main problem: no idea. Try to enable debug before the commands. Maybe the problem than gets obvious: setdebug director level=200

Ryushin 2017-10-04 02:20 reporter	windwalker-dir.trace (214,144 bytes)

Ryushin 2017-10-04 02:21 reporter ~0002772	I'll update the script with your new commands. The script use to purge and truncate the volumes just fine. I had to wait until I could change out the backup drive to another one so I can test again. Ran the script and it hung again. I turned setdebug to 200 and I recorded a trace file. I think everything after line 49 could be deleted (I think the webui queried at that time), but I left it just in case.

Ryushin 2017-10-07 14:00 reporter ~0002776	Was the attached trace file useful? Is there anything else I can do?

joergs 2017-10-08 11:43 developer ~0002778	As far as I can see, the relevant part starts at line 686. windwalker-dir (10): ua_audit.c:143-0 : Console [default] from [192.168.9.1] cmdline update volume=vchanger_monthly_1.5tb_drives_0001_0001 ActionOnPurge=Truncate I can't see an error there. It did what it should: windwalker-dir (100): sql_query.c:124-0 called: bool B_DB::sql_query(const char*, int) with query UPDATE Media SET VolJobs=1,VolFiles=12,VolBlocks=832203,VolBytes=53687079247,VolMounts=1,VolErrors=0,VolWrites=832204,MaxVolBytes=53687091200,VolStatus='Append',Slot=1,InChanger=1,VolReadTime=0,VolWriteTime=583967158,LabelType=0,StorageId=3,PoolId=6,VolRetention=7344000,VolUseDuration=0,MaxVolJobs=0,MaxVolFiles=0,Enabled=1,LocationId=0,ScratchPoolId=0,RecyclePoolId=0,RecycleCount=0,Recycle=1,ActionOnPurge=1,MinBlocksize=0,MaxBlocksize=0 WHERE VolumeName='vchanger_monthly_1.5tb_drives_0001_0001' Also the follow up commands also got through: windwalker-dir (10): ua_audit.c:143-0 : Console [default] from [192.168.9.1] cmdline purge volume=vchanger_monthly_1.5tb_drives_0001_0001 action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool ... Did you run the commands again manually? Otherwise I don't see an error here. Adding the timestamp to the trace might help to figure out, when a command is issued. setdebug director level=200 timestamp=1 So, I can't reproduce the problem and the trace file looks as if everything run through. Without further information, I afraid, I can't help with this.

Ryushin 2017-10-08 15:46 reporter ~0002779	Newish error, I've seen it a couple of times. Ran: ./bareos_purge_volumes_on_1.5tb_monthly_drive.sh /mnt/vchanger/Monthly_1.5TB_01 Output: Connecting to Director windwalker:9101 1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017) Enter a period to cancel a command. update slots storage=vchanger_monthly_1.5tb Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" Connecting to Storage daemon vchanger_monthly_1.5tb at windwalker:9103 ... 3306 Issuing autochanger "list" command. <---hangs here I've attached another trace file.

Ryushin 2017-10-08 15:50 reporter	windwalker-dir.trace_20171008 (10,636 bytes)

Ryushin 2017-10-08 15:59 reporter ~0002780	Restarting bareos-dir and bareos-sd daemons got it to get past the list command. Output: Connecting to Director windwalker:9101 1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017) Enter a period to cancel a command. update volume=vchanger_monthly_1.5tb_drives_0001_0001 ActionOnPurge=Truncate Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" <--- hangs here I've attached another trace file.

Ryushin 2017-10-08 16:01 reporter	windwalker-dir.trace_20171008_2 (57,958 bytes)

Ryushin 2017-10-19 14:49 reporter ~0002804	Was the latest trace file helpful? Is there anything else I can provide?

Ryushin 2017-12-28 00:11 reporter ~0002849 Last edited: 2017-12-28 00:13	I'm thinking what might be the issue is the mysql operation is taking so long to complete the script appears to hang since bconsole never sends back a response. If the mysql completes its operation quickly, then the script appears to work. Only a theory at this point while watching mytop. MySQL on localhost (10.0.32) load 1.78 1.23 1.15 2/1055 5752 up 19+09:45:45 [16:08:59] Queries: 10.6k qps: 0 Slow: 0.0 Se/In/Up/De(%): 81471/00/00/00 Sorts: 0 qps now: 1 Slow qps: 0.0 Threads: 4 ( 2/ 6) 00/00/00/00 Key Efficiency: 0.1% Bps in/out: 0.2/ 25.3 Now in/out: 21.3/ 3.0k Id User Host/IP DB Time Cmd State Query -- ---- ------- -- ---- --- ----- ---------- 266 snort localhost snort 738 Sleep 468041 bareos localhost bareos 72 Query updating DELETE FROM File WHERE JobId IN (4990) 466542 bareos localhost bareos 9 Sleep 462612 root localhost 0 Query init show full processlist

Ryushin 2017-12-28 01:55 reporter ~0002850 Last edited: 2017-12-28 01:57	Actually, I don't think this a bug after all. It's a mysql problem. The script is just taking far, far longer to run. To the point that it seems hung. mytop is still showing a query after what must be over an hour at this point: MySQL on localhost (10.0.32) load 2.10 2.48 2.43 1/1106 30277 up 19+11:29:46 [17:53:00] Queries: 15.3k qps: 0 Slow: 2.0 Se/In/Up/De(%): 56385/00/00/00 Sorts: 0 qps now: 1 Slow qps: 0.0 Threads: 4 ( 2/ 8) 125/00/00/00 Key Efficiency: 0.1% Bps in/out: 0.2/ 36.7 Now in/out: 21.4/ 3.1k Id User Host/IP DB Time Cmd State Query -- ---- ------- -- ---- --- ----- ---------- 468894 bareos localhost bareos 3372 Query updating DELETE FROM File WHERE JobId IN (5034,5035,5036,5037,503 266 snort localhost snort 1342 Sleep 466542 bareos localhost bareos 10 Sleep 462612 root localhost 0 Query init show full processlist Is there some kind of mysql schema code update that fixes how long the updating DELETE File WHERE JobId IN <JOB> takes or some kind of optimization I can make? The query is just taking far longer since the schema update.

Ryushin 2017-12-28 13:42 reporter ~0002851	After four hours I stopped waiting for it to finish and I went to bed. It had only gotten to 47 out of 140 volumes. I spent quite a bit of time trying to improve the performance of mysql with no noticeable improvement as I'm definitely I/O bound right now. The File.ibd database is over 26GB in size. I run a monthly mysql optimize. The mysql database resides on zfs raidz2 volume made up of four 300GB 15K drives. My mysql performance parameters are as follows: default_storage_engine=innodb innodb_file_per_table=1 innodb_buffer_pool_size=4096M innodb_log_file_size=128M innodb_flush_log_at_trx_commit=2 innodb_flush_method=O_DIRECT innodb_buffer_pool_instances=8 innodb_thread_concurrency=8 innodb_checksum_algorithm=none innodb_doublewrite = 0 Not sure what else I can try and do to solve this. Before the schema update, the longest it took was about 45 minutes.

bruno-at-bareos 2023-09-11 17:13 manager ~0005406	Closing oldies

Date Modified	Username	Field	Change
2017-10-02 04:23	Ryushin	New Issue
2017-10-02 04:40	Ryushin	Note Added: 0002768
2017-10-02 16:47	joergs	Note Added: 0002770
2017-10-02 16:47	joergs	Assigned To	=> joergs
2017-10-02 16:47	joergs	Status	new => feedback
2017-10-04 02:20	Ryushin	File Added: windwalker-dir.trace
2017-10-04 02:21	Ryushin	Note Added: 0002772
2017-10-04 02:21	Ryushin	Status	feedback => assigned
2017-10-07 14:00	Ryushin	Note Added: 0002776
2017-10-08 11:43	joergs	Note Added: 0002778
2017-10-08 11:43	joergs	Status	assigned => feedback
2017-10-08 15:46	Ryushin	Note Added: 0002779
2017-10-08 15:46	Ryushin	Status	feedback => assigned
2017-10-08 15:50	Ryushin	File Added: windwalker-dir.trace_20171008
2017-10-08 15:59	Ryushin	Note Added: 0002780
2017-10-08 16:01	Ryushin	File Added: windwalker-dir.trace_20171008_2
2017-10-19 14:49	Ryushin	Note Added: 0002804
2017-12-28 00:11	Ryushin	Note Added: 0002849
2017-12-28 00:13	Ryushin	Note Edited: 0002849
2017-12-28 01:55	Ryushin	Note Added: 0002850
2017-12-28 01:56	Ryushin	Note Edited: 0002850
2017-12-28 01:57	Ryushin	Note Edited: 0002850
2017-12-28 13:42	Ryushin	Note Added: 0002851
2019-09-27 16:42	joergs	Assigned To	joergs =>
2023-09-11 17:13	bruno-at-bareos	Assigned To	=> bruno-at-bareos
2023-09-11 17:13	bruno-at-bareos	Status	assigned => closed
2023-09-11 17:13	bruno-at-bareos	Resolution	open => unable to reproduce
2023-09-11 17:13	bruno-at-bareos	Note Added: 0005406

Reporting new Issues is disabled, please Report new Issues at https://github.com/bareos/bareos/issues

View Issue Details

Activities

Issue History