Bareos Bug Tracker
Bareos Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0000855bareos-core[All Projects] directorpublic2017-10-02 04:232017-12-28 13:42
ReporterRyushin 
Assigned Tojoergs 
PrioritynormalSeverityminorReproducibilityalways
StatusassignedResolutionopen 
PlatformLinuxOSDebianOS Version8
Product Version17.2.3 
Target VersionFixed in Version 
Summary0000855: Update Volume or Purge Volume Hangs on Using Catalog
DescriptionThis just started happening after the last couple of nightly updates. Pretty much at the time when the new mysql schema was updated in the Debian packages.

I've been using the same scripts that I've created the for the last couple of years without a problem. Now the scripts fail to run and hang when sending the command using echo and piping it to bconsole.

Example commands:
echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate" | bconsole
echo "purge volume=vchanger_monthly_1.5tb_drives_0003_0001 action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole
echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 volstatus=Append storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole

Example Output:
echo "update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate" | bconsoleConnecting to Director windwalker:9101
1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017)
Enter a period to cancel a command.
update volume=vchanger_monthly_1.5tb_drives_0003_0001 ActionOnPurge=Truncate
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog" <---- hangs here


Running bconsole directly, and then running the commands from inside bconsole seems to work most of the time. But after testing a few times, even inside of bconsole it hanges on the command. I restart the dir and sd daemons to unlock the issue (not sure if the sd daemon needs to be restarted or not)

It seems to be doing this with all of my scripts right now.
Additional InformationScript used:


bareos_purge_volumes_on_1.5tb_monthly_drive.sh:
#!/bin/bash
if [ "$1" = "" ]
then
    echo "Syntax: $0 <path where bareos volumes exist>"
    echo "I.e. $0 /mnt/vchanger/Monthly_1.5TB_01"
    exit 1
fi

echo "update slots storage=vchanger_monthly_1.5tb" | bconsole

for vol in $(ls $1/volumes/vchanger_monthly_1.5tb_drives_*)
do
    echo "update volume=$(basename $vol) ActionOnPurge=Truncate" | bconsole
    echo "purge volume=$(basename $vol) action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole
    echo "update volume=$(basename $vol) volstatus=Append storage=vchanger_monthly_1.5tb pool=Off-site_Pool" | bconsole
done



TagsNo tags attached.
bareos-master: impactyes
bareos-master: action
bareos-17.2: impactyes
bareos-17.2: action
bareos-16.2: impact
bareos-16.2: action
bareos-15.2: impactno
bareos-15.2: action
bareos-14.2: impactno
bareos-14.2: action
bareos-13.2: impactno
bareos-13.2: action
bareos-12.4: impactno
bareos-12.4: action
Attached Files? file icon windwalker-dir.trace [^] (214,144 bytes) 2017-10-04 02:20
? file icon windwalker-dir.trace_20171008 [^] (10,636 bytes) 2017-10-08 15:50
? file icon windwalker-dir.trace_20171008_2 [^] (57,958 bytes) 2017-10-08 16:01

- Relationships

-  Notes
(0002768)
Ryushin (reporter)
2017-10-02 04:40

BTW, this is occurring on two different installations.
(0002770)
joergs (administrator)
2017-10-02 16:47

I don't think, that this issue is related to the catalog. However, there have been changes to the prune/purge and ActionOnPurge commands.

However, I'm not sure, if your script, even if it run, does what you expect.
In my example, with 17.2.4-rc1, volumes don't get truncated.

I would use:
<your volume loop>
  purge storage=File pool=Full volume=$VOLUME
# truncate all volumes with volstatus=Purged in pool Full
truncate volstatus=Purged pool=Full yes

Setting volstatus to append should not be required.

Back to your main problem:
no idea. Try to enable debug before the commands. Maybe the problem than gets obvious:

setdebug director level=200
(0002772)
Ryushin (reporter)
2017-10-04 02:21

I'll update the script with your new commands. The script use to purge and truncate the volumes just fine.

I had to wait until I could change out the backup drive to another one so I can test again. Ran the script and it hung again. I turned setdebug to 200 and I recorded a trace file. I think everything after line 49 could be deleted (I think the webui queried at that time), but I left it just in case.
(0002776)
Ryushin (reporter)
2017-10-07 14:00

Was the attached trace file useful? Is there anything else I can do?
(0002778)
joergs (administrator)
2017-10-08 11:43

As far as I can see, the relevant part starts at line 686.

windwalker-dir (10): ua_audit.c:143-0 : Console [default] from [192.168.9.1] cmdline update volume=vchanger_monthly_1.5tb_drives_0001_0001 ActionOnPurge=Truncate

I can't see an error there. It did what it should:
windwalker-dir (100): sql_query.c:124-0 called: bool B_DB::sql_query(const char*, int) with query UPDATE Media SET VolJobs=1,VolFiles=12,VolBlocks=832203,VolBytes=53687079247,VolMounts=1,VolErrors=0,VolWrites=832204,MaxVolBytes=53687091200,VolStatus='Append',Slot=1,InChanger=1,VolReadTime=0,VolWriteTime=583967158,LabelType=0,StorageId=3,PoolId=6,VolRetention=7344000,VolUseDuration=0,MaxVolJobs=0,MaxVolFiles=0,Enabled=1,LocationId=0,ScratchPoolId=0,RecyclePoolId=0,RecycleCount=0,Recycle=1,ActionOnPurge=1,MinBlocksize=0,MaxBlocksize=0 WHERE VolumeName='vchanger_monthly_1.5tb_drives_0001_0001'


Also the follow up commands also got through:
windwalker-dir (10): ua_audit.c:143-0 : Console [default] from [192.168.9.1] cmdline purge volume=vchanger_monthly_1.5tb_drives_0001_0001 action=truncate storage=vchanger_monthly_1.5tb pool=Off-site_Pool
...

Did you run the commands again manually? Otherwise I don't see an error here. Adding the timestamp to the trace might help to figure out, when a command is issued.

setdebug director level=200 timestamp=1

So, I can't reproduce the problem and the trace file looks as if everything run through.

Without further information, I afraid, I can't help with this.
(0002779)
Ryushin (reporter)
2017-10-08 15:46

Newish error, I've seen it a couple of times.

Ran: ./bareos_purge_volumes_on_1.5tb_monthly_drive.sh /mnt/vchanger/Monthly_1.5TB_01
Output:
Connecting to Director windwalker:9101
1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017)
Enter a period to cancel a command.
update slots storage=vchanger_monthly_1.5tb
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
Connecting to Storage daemon vchanger_monthly_1.5tb at windwalker:9103 ...
3306 Issuing autochanger "list" command. <---hangs here

I've attached another trace file.
(0002780)
Ryushin (reporter)
2017-10-08 15:59

Restarting bareos-dir and bareos-sd daemons got it to get past the list command.

Output:
Connecting to Director windwalker:9101
1000 OK: windwalker-dir Version: 17.2.3 (14 Aug 2017)
Enter a period to cancel a command.
update volume=vchanger_monthly_1.5tb_drives_0001_0001 ActionOnPurge=Truncate
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog" <--- hangs here

I've attached another trace file.
(0002804)
Ryushin (reporter)
2017-10-19 14:49

Was the latest trace file helpful? Is there anything else I can provide?
(0002849)
Ryushin (reporter)
2017-12-28 00:11
edited on: 2017-12-28 00:13

I'm thinking what might be the issue is the mysql operation is taking so long
to complete the script appears to hang since bconsole never sends back a
response. If the mysql completes its operation quickly, then the script
appears to work. Only a theory at this point while watching mytop.

MySQL on localhost (10.0.32) load 1.78 1.23 1.15 2/1055 5752 up 19+09:45:45 [16:08:59]
 Queries: 10.6k qps: 0 Slow: 0.0 Se/In/Up/De(%): 81471/00/00/00
 Sorts: 0 qps now: 1 Slow qps: 0.0 Threads: 4 ( 2/ 6) 00/00/00/00
 Key Efficiency: 0.1% Bps in/out: 0.2/ 25.3 Now in/out: 21.3/ 3.0k

       Id User Host/IP DB Time Cmd State Query
       -- ---- ------- -- ---- --- ----- ----------
      266 snort localhost snort 738 Sleep
   468041 bareos localhost bareos 72 Query updating DELETE FROM File WHERE JobId IN (4990)
   466542 bareos localhost bareos 9 Sleep
   462612 root localhost 0 Query init show full processlist

(0002850)
Ryushin (reporter)
2017-12-28 01:55
edited on: 2017-12-28 01:57

Actually, I don't think this a bug after all. It's a mysql problem.
The script is just taking far, far longer to run. To the point that it seems
hung. mytop is still showing a query after what must be over an hour at this point:

MySQL on localhost (10.0.32) load 2.10 2.48 2.43 1/1106 30277 up 19+11:29:46 [17:53:00]
 Queries: 15.3k qps: 0 Slow: 2.0 Se/In/Up/De(%): 56385/00/00/00
 Sorts: 0 qps now: 1 Slow qps: 0.0 Threads: 4 ( 2/ 8) 125/00/00/00
 Key Efficiency: 0.1% Bps in/out: 0.2/ 36.7 Now in/out: 21.4/ 3.1k

       Id User Host/IP DB Time Cmd State Query
       -- ---- ------- -- ---- --- ----- ----------
   468894 bareos localhost bareos 3372 Query updating DELETE FROM File WHERE JobId IN (5034,5035,5036,5037,503
      266 snort localhost snort 1342 Sleep
   466542 bareos localhost bareos 10 Sleep
   462612 root localhost 0 Query init show full processlist


Is there some kind of mysql schema code update that fixes how long
the updating DELETE File WHERE JobId IN <JOB> takes or some kind
of optimization I can make?

The query is just taking far longer since the schema update.

(0002851)
Ryushin (reporter)
2017-12-28 13:42

After four hours I stopped waiting for it to finish and I went to bed. It had only gotten to 47
out of 140 volumes. I spent quite a bit of time trying to improve the performance
of mysql with no noticeable improvement as I'm definitely I/O bound right now. The
File.ibd database is over 26GB in size. I run a monthly mysql optimize.
The mysql database resides on zfs raidz2 volume made up of four 300GB 15K drives.

My mysql performance parameters are as follows:
default_storage_engine=innodb
innodb_file_per_table=1
innodb_buffer_pool_size=4096M
innodb_log_file_size=128M
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_buffer_pool_instances=8
innodb_thread_concurrency=8
innodb_checksum_algorithm=none
innodb_doublewrite = 0

Not sure what else I can try and do to solve this. Before the schema update, the
longest it took was about 45 minutes.

- Issue History
Date Modified Username Field Change
2017-10-02 04:23 Ryushin New Issue
2017-10-02 04:40 Ryushin Note Added: 0002768
2017-10-02 16:47 joergs Note Added: 0002770
2017-10-02 16:47 joergs Assigned To => joergs
2017-10-02 16:47 joergs Status new => feedback
2017-10-02 16:48 joergs bareos-master: impact => yes
2017-10-02 16:48 joergs bareos-17.2: impact => yes
2017-10-02 16:48 joergs bareos-15.2: impact => no
2017-10-02 16:48 joergs bareos-14.2: impact => no
2017-10-02 16:48 joergs bareos-13.2: impact => no
2017-10-02 16:48 joergs bareos-12.4: impact => no
2017-10-04 02:20 Ryushin File Added: windwalker-dir.trace
2017-10-04 02:21 Ryushin Note Added: 0002772
2017-10-04 02:21 Ryushin Status feedback => assigned
2017-10-07 14:00 Ryushin Note Added: 0002776
2017-10-08 11:43 joergs Note Added: 0002778
2017-10-08 11:43 joergs Status assigned => feedback
2017-10-08 15:46 Ryushin Note Added: 0002779
2017-10-08 15:46 Ryushin Status feedback => assigned
2017-10-08 15:50 Ryushin File Added: windwalker-dir.trace_20171008
2017-10-08 15:59 Ryushin Note Added: 0002780
2017-10-08 16:01 Ryushin File Added: windwalker-dir.trace_20171008_2
2017-10-19 14:49 Ryushin Note Added: 0002804
2017-12-28 00:11 Ryushin Note Added: 0002849
2017-12-28 00:13 Ryushin Note Edited: 0002849 View Revisions
2017-12-28 01:55 Ryushin Note Added: 0002850
2017-12-28 01:56 Ryushin Note Edited: 0002850 View Revisions
2017-12-28 01:57 Ryushin Note Edited: 0002850 View Revisions
2017-12-28 13:42 Ryushin Note Added: 0002851


Copyright © 2000 - 2018 MantisBT Team
Powered by Mantis Bugtracker