View Issue Details

ID: 0000359
Project: bareos-core
Category: [All Projects] file daemon
View Status: public
Last Update: 2015-11-06 18:05
Reporter: tigerfoot
Assigned To:
Priority: low
Severity: feature
Reproducibility: always
Status: acknowledged
Resolution: open
Platform: Linux
OS: openSUSE
OS Version: 13x
Product Version: 13.2.3
Fixed in Version:
Summary: 0000359: Make accurate multi-thread?
Description: On a big fileset with 4G files, an incremental backup using a high level of accurate takes as much time as running a full backup.

Incremental
05-Nov 05:21 orville-sd JobId 62: Despooling elapsed time = 00:00:29, Transfer rate = 495.2 M Bytes/second
05-Nov 05:21 orville-sd JobId 62: Elapsed time=04:50:20, Transfer rate=823.3 K Bytes/second
05-Nov 05:21 orville-sd JobId 62: Sending spooled attrs to the Director. Despooling 12,563,395 bytes
Steps To Reproduce: Have a very large number of files; in our case, more than 4,000,000,000 for a total of 3 TB of data.
Set Job Accurate = Yes
Set in fileset accurate = mcspug1

Run a full, then an incremental.
Additional Information: bareos-fd is only using one core (8 available).
Couldn't making the process use the available cores, or a number given in the configuration,
improve the handling of accurate?
Tags: No tags attached.

Activities

mvwieringen

2014-11-07 11:24

developer   ~0001048

Maybe you can explain to me how multi-threading will make things go faster.

The accurate code fetches the values from the database and sends them over
the socket to the filed. Then they are stored in an in-memory hash table or in
an LMDB for bareos-14.2.

So what will multi-threading bring unless you open multiple sockets etc.?
E.g. you have resource starvation on the socket. The only thing you could
think about is, instead of sending the items one by one, sending them in bigger
chunks or using some socket compression.
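
To illustrate the "bigger chunks" idea above, here is a minimal Python sketch. It is not the Bareos wire protocol: the function names `batch_records` and `unpack_chunk`, the 4-byte length prefix, and the `max_bytes` limit are all assumptions made for the example. It only shows how many small items can be grouped so that each send carries one chunk instead of one item.

```python
from typing import Iterable, Iterator, List


def batch_records(records: Iterable[bytes], max_bytes: int) -> Iterator[bytes]:
    """Group length-prefixed records into chunks of at most max_bytes.

    Hypothetical framing: each record is preceded by a 4-byte big-endian
    length. A record larger than max_bytes still gets a chunk of its own.
    """
    chunk: List[bytes] = []
    size = 0
    for rec in records:
        framed = len(rec).to_bytes(4, "big") + rec
        # Flush the current chunk if this record would overflow it.
        if chunk and size + len(framed) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(framed)
        size += len(framed)
    if chunk:
        yield b"".join(chunk)


def unpack_chunk(chunk: bytes) -> List[bytes]:
    """Split one chunk back into the original records."""
    out: List[bytes] = []
    pos = 0
    while pos < len(chunk):
        n = int.from_bytes(chunk[pos:pos + 4], "big")
        pos += 4
        out.append(chunk[pos:pos + n])
        pos += n
    return out
```

Each chunk would then be written to the socket with a single send, and the receiver would call `unpack_chunk` before inserting the items into its table. Whether this helps depends on whether the per-item round trip is really the bottleneck, which only measurement can tell.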

tigerfoot

2014-11-07 12:15

developer   ~0001050

Sorry if I was not clear enough, and I am certainly using the wrong words; this is non-dev language translated to dev.
Don't shoot the fool :-)

So from your point of view, there's no way to check whether files have changed in a multi-thread or multi-process way? If a file changed -> put it on the tobedone-queue.
I guess that checksumming 4 to 8 files at the same time results in a shorter overall backup time than doing the checksums one by one.

Or, to check where most of the time is spent: what would be the procedure to see how much time is needed to build the accurate information on the dir,
the time spent sending it to the -fd, the time to build the hash/LMDB,
and the time spent locally building what has to be backed up?
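
The idea of checksumming several files at the same time can be sketched in Python as follows. This is only an illustration of the technique, not code from bareos-fd: `file_digest`, `changed_files`, the `known` dictionary of previously recorded digests, and the default of 8 workers are assumptions for the example, and SHA-1 is used simply as an example digest.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def file_digest(path: str, block_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file, read in blocks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(paths: Iterable[str], known: Dict[str, str],
                  workers: int = 8) -> List[str]:
    """Checksum files on a thread pool and return those whose digest
    differs from (or is missing in) the known table, in input order."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input paths.
        digests = list(pool.map(file_digest, paths))
    return [p for p, d in zip(paths, digests) if known.get(p) != d]
```

The returned list plays the role of the tobedone-queue. Any gain depends on the storage being able to serve several readers at once; on a single slow disk, parallel reads may not be faster than sequential ones.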

mvwieringen

2014-11-07 14:12

developer   ~0001052

Last edited: 2014-11-07 14:17

If you want to see where time is spent you need to use profiling.

It might be worthwhile to split the scan process, but that means decoupling
the scan process done in the findlib from the actual saving of the data.

It may also mean we need more read-only transactions for the LMDB, as multiple
threads will be accessing the accurate data. The same is true for the in-memory
hash, as multiple threads may be updating the data.

Given our current workload, I don't think we will be spending much time on this,
but if you want to try, fork the code and try it out. I'm not sure how hard it will
be.
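
As a rough picture of what decoupling the scan from the save could look like, here is a small Python sketch. It is a toy model under stated assumptions, not the findlib or the Bareos accurate code: `AccurateTable`, `check_and_mark`, `deleted`, and `run_backup` are hypothetical names, the table only compares modification times, and a single lock stands in for whatever synchronization (or LMDB read-only transactions) a real implementation would need.

```python
import queue
import threading
from typing import Dict, Iterable, List, Tuple


class AccurateTable:
    """In-memory stand-in for the accurate hash table, guarded by a lock."""

    def __init__(self, entries: Dict[str, float]) -> None:
        self._mtime = dict(entries)
        self._seen = set()
        self._lock = threading.Lock()

    def check_and_mark(self, path: str, mtime: float) -> bool:
        """Mark path as seen; return True if it is new or changed."""
        with self._lock:
            self._seen.add(path)
            return self._mtime.get(path) != mtime

    def deleted(self) -> List[str]:
        """Paths known from the previous backup that were never seen."""
        with self._lock:
            return sorted(set(self._mtime) - self._seen)


def run_backup(scanned: Iterable[Tuple[str, float]], table: AccurateTable,
               scanners: int = 4) -> List[str]:
    """Scan on several threads and hand changed paths to one saver thread."""
    work: "queue.Queue[Tuple[str, float]]" = queue.Queue()
    todo: "queue.Queue" = queue.Queue()
    saved: List[str] = []
    for item in scanned:
        work.put(item)

    def scan() -> None:
        while True:
            try:
                path, mtime = work.get_nowait()
            except queue.Empty:
                return
            if table.check_and_mark(path, mtime):
                todo.put(path)

    def save() -> None:
        while True:
            path = todo.get()
            if path is None:  # sentinel: all scanners are done
                return
            saved.append(path)  # placeholder for reading and sending data

    saver = threading.Thread(target=save)
    saver.start()
    threads = [threading.Thread(target=scan) for _ in range(scanners)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    todo.put(None)
    saver.join()
    return sorted(saved)
```

The single lock keeps the sketch correct but also serializes every lookup, which is exactly the kind of contention that would have to be measured before assuming that more threads make the accurate check faster.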

Issue History

Date Modified Username Field Change
2014-11-05 09:46 tigerfoot New Issue
2014-11-07 11:24 mvwieringen Note Added: 0001048
2014-11-07 12:15 tigerfoot Note Added: 0001050
2014-11-07 14:12 mvwieringen Note Added: 0001052
2014-11-07 14:17 mvwieringen Note Edited: 0001052
2015-11-06 18:05 maik Priority normal => low
2015-11-06 18:05 maik Status new => acknowledged