View Issue Details

IDProjectCategoryView StatusLast Update
0000855bareos-coredirectorpublic2023-09-11 17:13
ReporterRyushin Assigned Tobruno-at-bareos  
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionunable to reproduce 
PlatformLinuxOSDebianOS Version8
Product Version17.2.3 
Summary0000855: Update Volume or Purge Volume Hangs on Using Catalog
DescriptionThis just started happening after the last couple of nightly updates. Pretty much at the time when the new mysql schema was updated in the Debian packages.

I've been using the same scripts that I've created the for the last couple of years without a problem. Now the scripts fail to run and hang when sending the command using echo and piping it to bconsole.

Example commands:
echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate" | bconsole
echo "purge volume=vchanger_monthly_1.5tb_drives_0003_0001 action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole
echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 volstatus=Append storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole

Example Output:
echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate" | bconsoleConnecting to Director windwalker:9101
1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017)
Enter a period to cancel a command.
update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog" <---- hangs here


Running bconsole directly, and then running the commands from inside bconsole seems to work most of the time. But after testing a few times, even inside of bconsole it hanges on the command. I restart the dir and sd daemons to unlock the issue (not sure if the sd daemon needs to be restarted or not)

It seems to be doing this with all of my scripts right now.
Additional InformationScript used:


bareos_purge_volumes_on_1.5tb_monthly_drive.sh:
#!/bin/bash
if [ "$1" = "" ]
then
    echo "Syntax: $0 <path where bareos volumes exist>"
    echo "I.e. $0 /mnt/vchanger/Monthly_1.5TB_01"
    exit 1
fi

echo "update slots storage=vchanger_monthly_1.5tb" | bconsole

for vol in $(ls $1/volumes/vchanger_monthly_1.5tb_drives_*)
do
    echo "update volume=$(basename $vol) ActionOnPurge=Truncate" | bconsole
    echo "purge volume=$(basename $vol) action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole
    echo "update volume=$(basename $vol) volstatus=Append storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole
done



TagsNo tags attached.

Activities

Ryushin

Ryushin

2017-10-02 04:40

reporter   ~0002768

BTW, this is occurring on two different installations.
joergs

joergs

2017-10-02 16:47

developer   ~0002770

I don't think, that this issue is related to the catalog. However, there have been changes to the prune/purge and ActionOnPurge commands.

However, I'm not sure, if your script, even if it run, does what you expect.
In my example, with 17.2.4-rc1, volumes don't get truncated.

I would use:
<your volume loop>
  purge storage=File pool=Full volume=$VOLUME
# truncate all volumes with volstatus=Purged in pool Full
truncate volstatus=Purged pool=Full yes

Setting volstatus to append should not be required.

Back to your main problem:
no idea. Try to enable debug before the commands. Maybe the problem than gets obvious:

setdebug director level=200
Ryushin

Ryushin

2017-10-04 02:20

reporter  

windwalker-dir.trace (214,144 bytes)
Ryushin

Ryushin

2017-10-04 02:21

reporter   ~0002772

I'll update the script with your new commands. The script use to purge and truncate the volumes just fine.

I had to wait until I could change out the backup drive to another one so I can test again. Ran the script and it hung again. I turned setdebug to 200 and I recorded a trace file. I think everything after line 49 could be deleted (I think the webui queried at that time), but I left it just in case.
Ryushin

Ryushin

2017-10-07 14:00

reporter   ~0002776

Was the attached trace file useful? Is there anything else I can do?
joergs

joergs

2017-10-08 11:43

developer   ~0002778

As far as I can see, the relevant part starts at line 686.

windwalker-dir (10): ua_audit.c:143-0 : Console [default] from [192.168.9.1] cmdline update volume=vchanger_monthly_1.5tb_drives_0001_0001 ActionOnPurge=Truncate

I can't see an error there. It did what it should:
windwalker-dir (100): sql_query.c:124-0 called: bool B_DB::sql_query(const char*, int) with query UPDATE Media SET VolJobs=1,VolFiles=12,VolBlocks=832203,VolBytes=53687079247,VolMounts=1,VolErrors=0,VolWrites=832204,MaxVolBytes=53687091200,VolStatus='Append',Slot=1,InChanger=1,VolReadTime=0,VolWriteTime=583967158,LabelType=0,StorageId=3,PoolId=6,VolRetention=7344000,VolUseDuration=0,MaxVolJobs=0,MaxVolFiles=0,Enabled=1,LocationId=0,ScratchPoolId=0,RecyclePoolId=0,RecycleCount=0,Recycle=1,ActionOnPurge=1,MinBlocksize=0,MaxBlocksize=0 WHERE VolumeName='vchanger_monthly_1.5tb_drives_0001_0001'


Also the follow up commands also got through:
windwalker-dir (10): ua_audit.c:143-0 : Console [default] from [192.168.9.1] cmdline purge volume=vchanger_monthly_1.5tb_drives_0001_0001 action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool
...

Did you run the commands again manually? Otherwise I don't see an error here. Adding the timestamp to the trace might help to figure out, when a command is issued.

setdebug director level=200 timestamp=1

So, I can't reproduce the problem and the trace file looks as if everything run through.

Without further information, I afraid, I can't help with this.
Ryushin

Ryushin

2017-10-08 15:46

reporter   ~0002779

Newish error, I've seen it a couple of times.

Ran: ./bareos_purge_volumes_on_1.5tb_monthly_drive.sh /mnt/vchanger/Monthly_1.5TB_01
Output:
Connecting to Director windwalker:9101
1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017)
Enter a period to cancel a command.
update slots storage=vchanger_monthly_1.5tb
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
Connecting to Storage daemon vchanger_monthly_1.5tb at windwalker:9103 ...
3306 Issuing autochanger "list" command. <---hangs here

I've attached another trace file.
Ryushin

Ryushin

2017-10-08 15:50

reporter  

Ryushin

Ryushin

2017-10-08 15:59

reporter   ~0002780

Restarting bareos-dir and bareos-sd daemons got it to get past the list command.

Output:
Connecting to Director windwalker:9101
1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017)
Enter a period to cancel a command.
update volume=vchanger_monthly_1.5tb_drives_0001_0001 ActionOnPurge=Truncate
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog" <--- hangs here

I've attached another trace file.
Ryushin

Ryushin

2017-10-08 16:01

reporter  

Ryushin

Ryushin

2017-10-19 14:49

reporter   ~0002804

Was the latest trace file helpful? Is there anything else I can provide?
Ryushin

Ryushin

2017-12-28 00:11

reporter   ~0002849

Last edited: 2017-12-28 00:13

I'm thinking what might be the issue is the mysql operation is taking so long
to complete the script appears to hang since bconsole never sends back a
response. If the mysql completes its operation quickly, then the script
appears to work. Only a theory at this point while watching mytop.

MySQL on localhost (10.0.32) load 1.78 1.23 1.15 2/1055 5752 up 19+09:45:45 [16:08:59]
 Queries: 10.6k qps: 0 Slow: 0.0 Se/In/Up/De(%): 81471/00/00/00
 Sorts: 0 qps now: 1 Slow qps: 0.0 Threads: 4 ( 2/ 6) 00/00/00/00
 Key Efficiency: 0.1% Bps in/out: 0.2/ 25.3 Now in/out: 21.3/ 3.0k

       Id User Host/IP DB Time Cmd State Query
       -- ---- ------- -- ---- --- ----- ----------
      266 snort localhost snort 738 Sleep
   468041 bareos localhost bareos 72 Query updating DELETE FROM File WHERE JobId IN (4990)
   466542 bareos localhost bareos 9 Sleep
   462612 root localhost 0 Query init show full processlist

Ryushin

Ryushin

2017-12-28 01:55

reporter   ~0002850

Last edited: 2017-12-28 01:57

Actually, I don't think this a bug after all. It's a mysql problem.
The script is just taking far, far longer to run. To the point that it seems
hung. mytop is still showing a query after what must be over an hour at this point:

MySQL on localhost (10.0.32) load 2.10 2.48 2.43 1/1106 30277 up 19+11:29:46 [17:53:00]
 Queries: 15.3k qps: 0 Slow: 2.0 Se/In/Up/De(%): 56385/00/00/00
 Sorts: 0 qps now: 1 Slow qps: 0.0 Threads: 4 ( 2/ 8) 125/00/00/00
 Key Efficiency: 0.1% Bps in/out: 0.2/ 36.7 Now in/out: 21.4/ 3.1k

       Id User Host/IP DB Time Cmd State Query
       -- ---- ------- -- ---- --- ----- ----------
   468894 bareos localhost bareos 3372 Query updating DELETE FROM File WHERE JobId IN (5034,5035,5036,5037,503
      266 snort localhost snort 1342 Sleep
   466542 bareos localhost bareos 10 Sleep
   462612 root localhost 0 Query init show full processlist


Is there some kind of mysql schema code update that fixes how long
the updating DELETE File WHERE JobId IN <JOB> takes or some kind
of optimization I can make?

The query is just taking far longer since the schema update.

Ryushin

Ryushin

2017-12-28 13:42

reporter   ~0002851

After four hours I stopped waiting for it to finish and I went to bed. It had only gotten to 47
out of 140 volumes. I spent quite a bit of time trying to improve the performance
of mysql with no noticeable improvement as I'm definitely I/O bound right now. The
File.ibd database is over 26GB in size. I run a monthly mysql optimize.
The mysql database resides on zfs raidz2 volume made up of four 300GB 15K drives.

My mysql performance parameters are as follows:
default_storage_engine=innodb
innodb_file_per_table=1
innodb_buffer_pool_size=4096M
innodb_log_file_size=128M
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_buffer_pool_instances=8
innodb_thread_concurrency=8
innodb_checksum_algorithm=none
innodb_doublewrite = 0

Not sure what else I can try and do to solve this. Before the schema update, the
longest it took was about 45 minutes.
bruno-at-bareos

bruno-at-bareos

2023-09-11 17:13

manager   ~0005406

Closing oldies

Issue History

Date Modified Username Field Change
2017-10-02 04:23 Ryushin New Issue
2017-10-02 04:40 Ryushin Note Added: 0002768
2017-10-02 16:47 joergs Note Added: 0002770
2017-10-02 16:47 joergs Assigned To => joergs
2017-10-02 16:47 joergs Status new => feedback
2017-10-04 02:20 Ryushin File Added: windwalker-dir.trace
2017-10-04 02:21 Ryushin Note Added: 0002772
2017-10-04 02:21 Ryushin Status feedback => assigned
2017-10-07 14:00 Ryushin Note Added: 0002776
2017-10-08 11:43 joergs Note Added: 0002778
2017-10-08 11:43 joergs Status assigned => feedback
2017-10-08 15:46 Ryushin Note Added: 0002779
2017-10-08 15:46 Ryushin Status feedback => assigned
2017-10-08 15:50 Ryushin File Added: windwalker-dir.trace_20171008
2017-10-08 15:59 Ryushin Note Added: 0002780
2017-10-08 16:01 Ryushin File Added: windwalker-dir.trace_20171008_2
2017-10-19 14:49 Ryushin Note Added: 0002804
2017-12-28 00:11 Ryushin Note Added: 0002849
2017-12-28 00:13 Ryushin Note Edited: 0002849
2017-12-28 01:55 Ryushin Note Added: 0002850
2017-12-28 01:56 Ryushin Note Edited: 0002850
2017-12-28 01:57 Ryushin Note Edited: 0002850
2017-12-28 13:42 Ryushin Note Added: 0002851
2019-09-27 16:42 joergs Assigned To joergs =>
2023-09-11 17:13 bruno-at-bareos Assigned To => bruno-at-bareos
2023-09-11 17:13 bruno-at-bareos Status assigned => closed
2023-09-11 17:13 bruno-at-bareos Resolution open => unable to reproduce
2023-09-11 17:13 bruno-at-bareos Note Added: 0005406