View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000359||bareos-core||file daemon||public||2014-11-05 09:46||2015-11-06 18:05|
|Summary||0000359: Make accurate multi-thread?|
|Description||On a big fileset with 4G files, using a high level of accurate checking takes as much time as running a full backup.
05-Nov 05:21 orville-sd JobId 62: Despooling elapsed time = 00:00:29, Transfer rate = 495.2 M Bytes/second
05-Nov 05:21 orville-sd JobId 62: Elapsed time=04:50:20, Transfer rate=823.3 K Bytes/second
05-Nov 05:21 orville-sd JobId 62: Sending spooled attrs to the Director. Despooling 12,563,395 bytes
|Steps To Reproduce||Have a very large number of files; in our case it is +4,000,000,000, for a total of 3TB of data.
Set Job Accurate = Yes
Set in fileset accurate = mcspug1
Run a full, then incremental.
|Additional Information||bareos-fd is only using one core (8 available).
Couldn't making the process use the available cores, or a number given in the conf,
improve the accurate processing?
|Tags||No tags attached.|
Maybe you can explain to me how multi-threading will make things go faster.
The accurate code fetches the values from the database and sends them over
the socket to the filed. Then they are stored in a memory hash table or in
an LMDB for bareos-14.2.
So what will multi-threading bring unless you open multiple sockets etc.?
That is, you have resource starvation on the socket. The only thing you could
think about is, instead of sending the items one by one, sending them in bigger
chunks or using some socket compression.
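To illustrate the "bigger chunks" idea, here is a minimal sketch in Python. It is not Bareos code and does not reflect the actual director/filed wire protocol; the function names, the length-prefixed framing and the chunk size of 1000 are all assumptions made for the example. It only shows how grouping records reduces the number of messages put on the socket.

```python
import struct

def frame_one_by_one(records):
    """One length-prefixed message per record (hypothetical framing)."""
    return [struct.pack("!I", len(r)) + r for r in records]

def frame_in_chunks(records, chunk_size=1000):
    """Pack up to chunk_size records into each message.

    Message layout (an assumption for this sketch):
    4-byte record count, then each record as 4-byte length + bytes.
    """
    messages = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        body = b"".join(struct.pack("!I", len(r)) + r for r in chunk)
        messages.append(struct.pack("!I", len(chunk)) + body)
    return messages

def unpack_chunk(message):
    """Recover the individual records from one chunked message."""
    (count,) = struct.unpack_from("!I", message, 0)
    offset = 4
    records = []
    for _ in range(count):
        (length,) = struct.unpack_from("!I", message, offset)
        offset += 4
        records.append(message[offset:offset + length])
        offset += length
    return records
```

With 2500 records and a chunk size of 1000, the chunked variant produces 3 messages instead of 2500. Each chunk body could also be passed through `zlib.compress` before sending, which would be one way to get the compression mentioned above.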
Sorry if I'm not clear enough, and especially if I'm using the wrong words; this is non-dev language translated to dev.
Don't shoot the fool :-)
So from your point of view there's no way to check whether files have changed in a multi-threaded or multi-process way? If a file changed -> put it on the tobedone-queue.
I guess that checksumming 4 to 8 files at the same time would result in a shorter overall backup time than doing the checksums one by one.
Or, to check where most of the time is spent, what would be the procedure to see how much time is needed to build the accurate information on the dir,
the time spent sending it to the -fd, the time to build the hash/lmdb,
and the time spent locally building what has to be backed up?
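The parallel checksum idea can be sketched as follows. This is a plain Python illustration, not how bareos-fd works; `sha1_of`, `find_changed` and the `known_checksums` mapping are hypothetical names, and SHA-1 is chosen only because the `1` in `mcspug1` selects it. Whether this is actually faster depends on whether the disk or the CPU is the bottleneck.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha1_of(path, block_size=1 << 20):
    """Compute the SHA-1 of a file, reading it in blocks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def find_changed(paths, known_checksums, workers=4):
    """Return the paths whose current SHA-1 differs from the recorded one.

    known_checksums maps path -> checksum from the previous backup
    (a stand-in for the accurate data; an assumption of this sketch).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as paths.
        current = list(pool.map(sha1_of, paths))
    return [p for p, c in zip(paths, current) if known_checksums.get(p) != c]
```

The list returned by `find_changed` plays the role of the tobedone-queue: only those files would be handed on to be backed up.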
If you want to see where time is spent you need to use profiling.
It might be worthwhile to split the scan process but that means decoupling
the scan process done in the findlib and the actual saving of the data.
It may also mean we need more read-only transactions for the LMDB as multiple
threads will be accessing the accurate data. Same is true for the in-memory
hash as multiple threads may be updating the data.
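A rough sketch of what such a decoupling could look like, in Python rather than the actual C++ of findlib and the filed. Everything here is hypothetical: `AccurateTable`, `check_and_mark`, `run_backup` and the use of mtime as the only compared attribute are assumptions for illustration. The point is that the scan side only feeds a queue, and that the shared in-memory table needs a lock once several threads update it.

```python
import queue
import threading

class AccurateTable:
    """In-memory table of previously backed-up files, guarded by a lock."""

    def __init__(self, entries):
        self._entries = dict(entries)  # path -> recorded mtime
        self._seen = set()
        self._lock = threading.Lock()

    def check_and_mark(self, path, mtime):
        """Mark path as seen; return True if it needs to be backed up."""
        with self._lock:
            self._seen.add(path)
            return self._entries.get(path) != mtime

    def deleted(self):
        """Paths recorded previously but not seen during this scan."""
        with self._lock:
            return sorted(set(self._entries) - self._seen)

def run_backup(scanned, table, workers=4):
    """scanned yields (path, mtime); returns the sorted paths to save."""
    todo = queue.Queue()
    to_save = []
    save_lock = threading.Lock()

    def worker():
        while True:
            item = todo.get()
            if item is None:  # sentinel: no more work
                break
            path, mtime = item
            if table.check_and_mark(path, mtime):
                with save_lock:
                    to_save.append(path)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in scanned:      # the "scan" side only fills the queue
        todo.put(item)
    for _ in threads:
        todo.put(None)
    for t in threads:
        t.join()
    return sorted(to_save)
```

A single lock around the whole table is the simplest choice; it also serializes every lookup, so the gain from extra threads would have to come from the work done outside the lock.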
Given our current workload I don't think we will be spending much time on this,
but if you want to try, fork the code and try it out; not sure how hard it will
|2014-11-05 09:46||tigerfoot||New Issue|
|2014-11-07 11:24||mvwieringen||Note Added: 0001048|
|2014-11-07 12:15||tigerfoot||Note Added: 0001050|
|2014-11-07 14:12||mvwieringen||Note Added: 0001052|
|2014-11-07 14:17||mvwieringen||Note Edited: 0001052|
|2015-11-06 18:05||maik||Priority||normal => low|
|2015-11-06 18:05||maik||Status||new => acknowledged|