View Issue Details

ID: 0000226
Project: bareos-core
Category: director
View Status: public
Last Update: 2015-03-25 19:19
Reporter: cvelasco
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Platform: Linux
OS: any
OS Version: 3
Product Version: 12.4.4
Summary: 0000226: Bwlimit not working in Windows 64bits client
Description:
Director is version 12.4.5 linux 64bits
Client is version 12.4.4 windows 2008 R2 64bits

When you set "Maximum Bandwidth" in the job, the speed of the job is really low, far below the configured target.

Client is Windows 2008 R2 virtualized in qemu64 (kvm) without HPET (don't know if this could be an issue).

With the job with bwlimit set to 50000 the speed is really really slow (2kbps):
===
client-fd Version: 12.4.4 (12 June 2013) VSS Linux Cross-compile Win64
Daemon started 18-Sep-13 12:08. Jobs: run=0 running=1.
Microsoft Windows Server 2008 R2 Enterprise Edition Service Pack 1 (build 7601), 64-bit
 Heap: heap=0 smbytes=333,774 max_bytes=333,878 bufs=151 max_bufs=151
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=1 bwlimit=0kB/s

Running Jobs:
Director connected at: 18-Sep-13 12:09
JobId 7144 Job xxxxxxxxx.2013-09-18_12.10.30_13 is running.
    VSS Full Backup Job started: 18-Sep-13 12:10
    Files=3 Bytes=340,144 Bytes/sec=2,834 Errors=0
    Bwlimit=50,000
    Files Examined=3
    Processing file: N:xxxxxxxxxxxxxxx
    SDReadSeqNo=5 fd=792
===

The job does NOT use compression:
===
Job: name=xxxxxxx JobType=66 protocol=0 level=Full Priority=30 Enabled=1
     MaxJobs=1 Resched=0 Times=5 Interval=1,800 Spool=0
     Accurate=0
     MaximumBandwidth=50000
  --> Client: name=client-fd protocol=0 authtype=0 address=xxxxxxx FDport=9102 MaxJobs=1
      JobRetention=2 months FileRetention=2 months AutoPrune=1 SoftQuota=0 SoftQuotaGrace=0 secs HardQuota=0 StrictQuotas=0
  --> Catalog: name=MyCatalog address=xxxxxx DBport=3306 db_name=bareos
      db_driver=mysql db_user=bareos MutliDBConn=0
  --> FileSet: name=xxxxxxx set
      O S
      N
      I N:xxxxxx
      N
  --> Schedule: name=xxxxx
  --> Run Level=Full
===

With the SAME job without bwlimit, the speed rises above 70k, the limit of the line.
===
client-fd Version: 12.4.4 (12 June 2013) VSS Linux Cross-compile Win64
Daemon started 18-Sep-13 12:16. Jobs: run=0 running=1.
Microsoft Windows Server 2008 R2 Enterprise Edition Service Pack 1 (build 7601), 64-bit
 Heap: heap=0 smbytes=339,994 max_bytes=340,660 bufs=159 max_bufs=159
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=1 bwlimit=0kB/s

Running Jobs:
Director connected at: 18-Sep-13 12:17
JobId 7145 Job xxxxxxx.2013-09-18_12.18.56_03 is running.
    VSS Full Backup Job started: 18-Sep-13 12:18
    Files=63 Bytes=9,207,565 Bytes/sec=72,500 Errors=0
    Bwlimit=0
    Files Examined=63
    Processing file: N:xxxxxxxx
    SDReadSeqNo=5 fd=780
===
Steps To Reproduce:
1. Set up a job with "Maximum Bandwidth".
2. Run the job.
3. The tray on the client or the network devices show a really slow speed.
Tags: No tags attached.

Activities

cvelasco

2013-09-18 14:54

reporter   ~0000663

Virtualization does NOT make a difference here.
I reproduced the same issue with another client adjacent to the original one (same LAN, router, line). This client is Windows 7 64-bit, REAL, not virtualized.

With the bandwidth limit, the speed is really low:
===
client2-fd Version: 12.4.4 (12 June 2013) VSS Linux Cross-compile Win64
Daemon started 18-Sep-13 14:33. Jobs: run=0 running=1.
Microsoft Windows 7 Professional Service Pack 1 (build 7601), 64-bit
 Heap: heap=0 smbytes=335,288 max_bytes=335,296 bufs=136 max_bufs=136
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=1 bwlimit=0kB/s

Running Jobs:
Director connected at: 18-Sep-13 14:34
JobId 7148 Job xxxxx.2013-09-18_14.36.37_08 is running.
    VSS Full Backup Job started: 18-Sep-13 14:36
    Files=3 Bytes=7,856 Bytes/sec=25 Errors=0
    Bwlimit=30,000
    Files Examined=3
    Processing file: xxxxxxxx
    SDReadSeqNo=5 fd=944
===

Without bandwidth limit it speeds up to the line rate:
===
client2-fd Version: 12.4.4 (12 June 2013) VSS Linux Cross-compile Win64
Daemon started 18-Sep-13 14:45. Jobs: run=0 running=1.
Microsoft Windows 7 Professional Service Pack 1 (build 7601), 64-bit
 Heap: heap=0 smbytes=335,292 max_bytes=335,300 bufs=136 max_bufs=136
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=1 bwlimit=0kB/s

Running Jobs:
Director connected at: 18-Sep-13 14:47
JobId 7149 Job xxxxx.2013-09-18_14.47.19_05 is running.
    VSS Full Backup Job started: 18-Sep-13 14:47
    Files=3 Bytes=11,542,192 Bytes/sec=60,748 Errors=0
    Bwlimit=0
    Files Examined=3
    Processing file: xxxxxxxx
    SDReadSeqNo=5 fd=912
===
mvwieringen

2013-09-18 16:00

developer   ~0000664

I think the timer resolution on Windows is just too bad to let things work
for such low bandwidth. I have seen that on FreeBSD, too, the bandwidth kind
of wanders off from what you really want. I think anything under 200 Kb/s
will not work too well even on systems like Linux, where 512 Kb/s and 1 Mb/s are
settings that seem to work. You could try enabling allowbandwidthbursting
to see whether, with bursting allowed, it stays somewhat closer to the actual
bandwidth setting, but don't expect much, as your bandwidth settings are
just too low to be controlled in a decent manner by the current code
we inherited from Bacula, which just sleeps a certain number of milliseconds to
get to the wanted bandwidth.
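
As a rough illustration of the sleep-based approach described in this note, the sketch below computes how long a sender would have to pause after a write to stay at the limit. It is not the code from src/lib/bsock.c; the function name `throttle_delay_usec` and its parameters are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the Bareos function: given the bytes sent in the
 * current time slice and the time that has elapsed, return how many
 * microseconds the sender would have to sleep to stay at or below
 * limit_bps (bytes per second). */
int64_t throttle_delay_usec(int64_t bytes_sent, int64_t elapsed_usec,
                            int64_t limit_bps)
{
    assert(limit_bps > 0 && bytes_sent >= 0 && elapsed_usec >= 0);

    /* Time the transfer should have taken at exactly the limit. */
    int64_t ideal_usec = bytes_sent * 1000000 / limit_bps;

    /* Sleep only for the part of that time that has not yet passed. */
    return ideal_usec > elapsed_usec ? ideal_usec - elapsed_usec : 0;
}
```

With a limit of 50,000 bytes/s, a single hypothetical 64 KiB write already calls for a pause of about 1.3 seconds, so a sleep that overshoots lowers the average rate unless a later interval compensates for it.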
cvelasco

2013-09-18 16:08

reporter   ~0000665

Sorry, how does allowbandwidthbursting work?
I tested the bandwidth control against a linux client, trying to shape it at 30 megabits per second, with a similar problem. Not as low as this, but at best it reached 20mbps or so.
mvwieringen

2013-09-18 16:19

developer   ~0000666

Just set Allow Bandwidth Bursting = true in your filed config under FileDaemon.

This will allow the filed to use the bytes from a previous timeslice that
it didn't use. The original code always kept the bandwidth under the
maximum, and as such it will never reach the actual bandwidth setting, i.e.
the overall bandwidth will be much lower than the configured bandwidth.
With bursting on, however, it can sometimes use more than
the configured bandwidth in order to reach the target speed on average.

It could be that the code can be made smarter, but for now this is what it is.
The limiting code is in src/lib/bsock.c so you can look there if it can be
improved.
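
The difference between the two modes can be sketched as a per-timeslice byte budget. This is only an illustration of the idea described above, not the logic in src/lib/bsock.c; the struct `bw_budget`, the function `next_slice_budget`, and the one-slice cap on the carry-over are all assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; the names are illustrative only. */
typedef struct {
    int64_t limit_per_slice; /* bytes allowed in one time slice        */
    int64_t carry_over;      /* unused bytes saved from earlier slices */
    int     allow_bursting;  /* non-zero: unused budget may be reused  */
} bw_budget;

/* Close one time slice in which bytes_sent bytes were transferred and
 * return the byte budget available for the next slice. */
int64_t next_slice_budget(bw_budget *b, int64_t bytes_sent)
{
    assert(b != NULL && b->limit_per_slice > 0 && bytes_sent >= 0);

    int64_t available = b->limit_per_slice + b->carry_over;
    int64_t unused = available > bytes_sent ? available - bytes_sent : 0;

    /* Without bursting the unused bytes are lost, so the long-run average
     * can only fall below the limit. With bursting they are kept, capped
     * here at one slice so that a long idle period cannot build up an
     * unbounded burst (the cap is a choice of this sketch). */
    if (b->allow_bursting) {
        b->carry_over = unused < b->limit_per_slice ? unused
                                                    : b->limit_per_slice;
    } else {
        b->carry_over = 0;
    }
    return b->limit_per_slice + b->carry_over;
}
```

For a slice limit of 1000 bytes of which only 600 were used, the next budget stays at 1000 without bursting and grows to 1400 with bursting.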
cvelasco

2013-09-18 17:29

reporter   ~0000667

Tried allowbandwidthbursting, better, but it doesn't make a difference really.
===
client-fd Version: 12.4.4 (12 June 2013) VSS Linux Cross-compile Win64
Daemon started 18-Sep-13 17:20. Jobs: run=0 running=1.
Microsoft Windows Server 2008 R2 Enterprise Edition Service Pack 1 (build 7601), 64-bit
 Heap: heap=0 smbytes=333,932 max_bytes=334,036 bufs=151 max_bufs=151
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=1 bwlimit=0kB/s

Running Jobs:
Director connected at: 18-Sep-13 17:21
JobId 7150 Job xxxxxxxx.2013-09-18_17.21.55_03 is running.
    VSS Full Backup Job started: 18-Sep-13 17:21
    Files=3 Bytes=340,144 Bytes/sec=1,137 Errors=0
    Bwlimit=50,000
    Files Examined=3
    Processing file: xxxxxxxx
    SDReadSeqNo=5 fd=792
===
cvelasco

2013-09-18 19:49

reporter   ~0000668

Last edited: 2013-09-18 19:50

I have looked into the code and there is something I don't understand here.

In src/lib/bnet.c the function control_bwlimit is called both on read and on write.

bsock->control_bwlimit(nread);
bsock->control_bwlimit(nwritten);

But control_bwlimit uses only one variable, m_last_tick (defined in src/lib/bsock.h).

I think reads and writes are both overwriting this variable and messing everything up.

mvwieringen

2013-09-18 20:09

developer   ~0000669

As I already stated in my first reply, I don't think bandwidth limiting is ever
going to work for any limit below 200 Kbps, and maybe even higher on some
platforms, due to the limited resolution of the timers and the sleep method used
(e.g. nanosleep is translated in mingw (the cross compiler used) into some
native Windows calls, and I seriously wonder whether they will give you a good
enough resolution).

Regarding your observation about the m_last_tick variable: that is nothing
more than the system time at which the bandwidth limit was last checked. You can
argue that send and receive should have separate counters, but the way it works
now is that the total bandwidth is the aggregate of input and output bytes.
Given that the backup speed is mainly dominated by write bandwidth anyway (the
responses of the SD to the backup stream are minimal, if it sends responses at
all while the file data is being blasted out; I don't know the protocol by heart
without investigating), I think it should be no problem.
So overwriting the m_last_tick variable is no problem, as it only gets updated
when a full check is run to see whether the data needs to be slowed down by
inserting a sleep interval.

We also see ranges between 950 and 1100 kbps in regression tests when we
limit to 1024 kbps. So it seems to work well enough on some platforms.
If you want something more accurate, you probably need to look into TC
for Linux or some other traffic shaper, which are much more accurate.
cvelasco

2013-09-19 01:56

reporter   ~0000670

Sorry, but I don't see this working at all.
From Linux to Linux with bwlimit set to 2500kbps, it falls well below the mark.
===
linux-fd Version: 12.4.5 (04 September 2013) x86_64-unknown-linux-gnu unknown unknown
Daemon started 17-Sep-13 23:50. Jobs: run=7 running=0.
 Heap: heap=151,552 smbytes=755,720 max_bytes=851,095 bufs=275 max_bufs=471
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 bwlimit=0kB/s

Running Jobs:
JobId 7152 Job xxxxx.2013-09-19_00.44.36_03 is running.
    Full Backup Job started: 19-Sep-13 00:44
    Files=13,794 Bytes=512,785,087 Bytes/sec=1,811,961 Errors=0
    Bwlimit=2,500,000
    Files Examined=13,794
    Processing file: xxxxxxxxx
    SDReadSeqNo=5 fd=5
===

About m_last_tick: the issue is not that read+written are added together; that could be fine, although it does not seem right to me. It is rather that you get into a "race condition". For flow control we need the time elapsed and the amount of traffic sent. Time elapsed is calculated from now and m_last_tick. But if m_last_tick has been modified by received traffic, then the time elapsed is calculated wrongly: the code is tricked into believing you have sent a lot of traffic in little time.

On the other hand, if we have a precision problem, then we need to call this function only when strictly necessary, to avoid losing precision on every call. I would test it without the read call to see how it goes, but cross-compiling a Windows bareos is above my level :(

Anyway, I have been looking into other code bases and googling to see how other projects solve this problem. I found that the consensus is to use the select function to do this.

From the man select:
===
Some code calls select() with all three sets empty, nfds zero, and a non-NULL timeout as a fairly portable way to sleep with subsecond precision.
===
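
A minimal sketch of that idiom in C, assuming a POSIX system; the helper name `select_sleep_usec` is made up for this example and does not come from Bareos or ProFTPD:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: sleep for usec microseconds by calling select()
 * with no file descriptors. Returns 0 on success and -1 on error (for
 * example EINTR when a signal arrives). POSIX only: Winsock's select()
 * rejects a call in which all three descriptor sets are empty. */
int select_sleep_usec(long usec)
{
    assert(usec >= 0);

    struct timeval tv;
    tv.tv_sec  = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;

    return select(0, NULL, NULL, NULL, &tv);
}
```

The timeout is only a lower bound on the sleep; how closely the actual delay follows it still depends on the timer resolution of the platform.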

I looked especially into the ProFTPD project, where I use bwlimit. In src/throttle.c there is some interesting code:
===
    /* Setup for the select. We use select() instead of usleep() because it
     * seems to be far more portable across platforms.
     *
     * ideal and elapsed are in milleconds, but tv_usec will be microseconds,
     * so be sure to convert properly.
     */
    tv.tv_usec = (ideal - elapsed) * 1000;
    tv.tv_sec = tv.tv_usec / 1000000L;
    tv.tv_usec = tv.tv_usec % 1000000L;

    pr_log_debug(DEBUG7, "transferring too fast, delaying %ld sec%s, %ld usecs",
      (long int) tv.tv_sec, tv.tv_sec == 1 ? "" : "s", (long int) tv.tv_usec);

    /* No interruptions, please... */
    xfer_rate_sigmask(TRUE);

    if (select(0, NULL, NULL, NULL, &tv) < 0) {
===

It seems portable to Windows too, although with some more help.
http://stackoverflow.com/questions/85122/sleep-less-than-one-millisecond
===
On Windows, however, the use of select forces you to include the Winsock library which has to be initialized like this in your application:

WORD wVersionRequested = MAKEWORD(1,0);
WSADATA wsaData;
WSAStartup(wVersionRequested, &wsaData);

And then the select won't allow you to be called without any socket so you have to do a little more to create a microsleep method:

int usleep(long usec)
{
    struct timeval tv;
    fd_set dummy;
    SOCKET s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    FD_ZERO(&dummy);
    FD_SET(s, &dummy);
    tv.tv_sec = usec/1000000L;
    tv.tv_usec = usec%1000000L;
    return select(0, 0, 0, &dummy, &tv);
}

All these created usleep methods return zero when successful and non-zero for errors.
===

Right now, bareos/bacula uses bmicrosleep, which uses nanosleep if present; if not, it uses something that looks really strange to me (the old way, which I doubt works well at all):
===
#ifdef HAVE_NANOSLEEP
   status = nanosleep(&timeout, NULL);
   if (!(status < 0 && errno == ENOSYS)) {
      return status;
   }
   /* If we reach here it is because nanosleep is not supported by the OS */
#endif

   /* Do it the old way */
   gettimeofday(&tv, &tz);
   timeout.tv_nsec += tv.tv_usec * 1000;
   timeout.tv_sec += tv.tv_sec;
   while (timeout.tv_nsec >= 1000000000) {
      timeout.tv_nsec -= 1000000000;
      timeout.tv_sec++;
   }
===

So, I think the select method should be the way to go.
mvwieringen

2013-09-19 09:53

developer   ~0000671

First of all, as we compile for Windows with MINGW, we use src/win32/compat/include/mingwconfig.h as the config file for which options should be used. If you look there you will find that HAVE_NANOSLEEP is enabled. As to using select, poll or the currently used pthreads alternative when nanosleep is not available, I think they are all comparable. They all use the timeout to avoid hanging in
the select, the poll or the wait on a pthread condition that never gets raised.

As to your linux test, did you run that with Allow Bandwidth Bursting?
If so, then it is indeed quite off; then again, the overall bandwidth calculation
is not too accurate either. If not, then yes, that is what I have seen too, and that is
why the new option was introduced. We might want to make that the default in
a newer version.

If you want to get a deeper insight in things you could try running the fd
with a high debug level (I think 450 or higher will trigger the debug messages
in the limiting code.)

As this is just seriously hard to debug (if not next to impossible), I see a very
low probability that it will be fixed or improved any time soon. That doesn't mean
that you cannot work on enhancements yourself, and if you can show that they
behave much better, then please send a patch and we will seriously consider
changing the code. But currently we are all too busy to work on something that
is going to take serious time with little gain or promise of any gain at all.
cvelasco

2013-09-19 13:23

reporter   ~0000672

Tested with burst on. Better, but still 10% below the mark.
Both machines are real, not virtual, and both have HPET.
===
linux-fd Version: 12.4.5 (04 September 2013) x86_64-unknown-linux-gnu unknown unknown
Daemon started 19-Sep-13 13:08. Jobs: run=0 running=0.
 Heap: heap=270,336 smbytes=749,634 max_bytes=755,502 bufs=227 max_bufs=234
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 bwlimit=0kB/s

Running Jobs:
JobId 7160 Job xxxxx.2013-09-19_13.09.06_16 is running.
    Full Backup Job started: 19-Sep-13 13:09
    Files=4,496 Bytes=182,409,435 Bytes/sec=2,280,117 Errors=0
    Bwlimit=2,500,000
    Files Examined=4,496
    Processing file: xxxxxx
    SDReadSeqNo=5 fd=5
===

AFAIK mingw does not actually have nanosleep.
I think it is defined in mingwconfig.h because in src/win32/compat/compat.c there is a nanosleep that makes a call to a simple Sleep.
Sleep((req->tv_sec * 1000) + (req->tv_nsec/1000000));

And this is not accurate at all.
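
To make the loss of precision concrete, the sketch below isolates the arithmetic of that Sleep() call. The function name `timespec_to_sleep_ms` is hypothetical; only the expression is taken from the line quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Same arithmetic as the Sleep() argument quoted above, isolated so the
 * result can be inspected: convert a timespec to whole milliseconds. */
long timespec_to_sleep_ms(const struct timespec *req)
{
    assert(req != NULL);

    /* Integer division drops everything below one millisecond. */
    return (long)(req->tv_sec * 1000) + (req->tv_nsec / 1000000);
}
```

A request of 500 microseconds therefore becomes Sleep(0), which only gives up the rest of the current time slice instead of waiting, and a request of 1.5 ms is shortened to 1 ms.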
mvwieringen

2013-09-19 15:10

developer   ~0000673

OK, nice that we have such a great replacement for nanosleep, while the
fallback code in bsys.c probably works better with pthread. Up until now
I think nanosleep was not really used on Windows for anything critical,
only for sleeping for 0.10 seconds, and that is no real problem for this ancient
implementation.

As to the 10% offset, that is better than I expected. I think you have to make
the code seriously more complex to get closer, as you have to keep
track of the unused bytes in previous timeslices over the full time of the backup
and use those for bursting too, to be able to get much closer to the set limit.

It will also mean people are going to start complaining that it sometimes uses
less and sometimes more than what they want. So I think it will be a serious
adventure to improve it much. But like I said before: write a patch, benchmark
it and show the better approach, and I will import it.

As to the Windows nanosleep problem: I don't have time to look into fixing
that right now (just before a small holiday and the OSBConf next week).
We have a bunch of other Windows fixes in the development pipeline. I will
see if I can put this fix in too and maybe either release it as part of
the whole set of patches for Windows or, if that takes too long, find another
path for bringing it to testing.
cvelasco

2013-09-19 19:34

reporter   ~0000674

Test from linux to linux with low bwlimit is really bad. Burst is on.
===
linux-fd Version: 12.4.5 (04 September 2013) x86_64-unknown-linux-gnu unknown unknown
Daemon started 19-Sep-13 13:26. Jobs: run=0 running=0.
 Heap: heap=270,336 smbytes=747,713 max_bytes=755,014 bufs=226 max_bufs=227
 Sizeof: boffset_t=8 size_t=8 debug=0 trace=0 bwlimit=0kB/s

Running Jobs:
JobId 7161 Job xxxx.2013-09-19_19.25.39_05 is running.
    Full Backup Job started: 19-Sep-13 19:25
    Files=1,992 Bytes=143,232,644 Bytes/sec=384,001 Errors=0
    Bwlimit=25,000
    Files Examined=1,992
    Processing file: xxxxx
    SDReadSeqNo=5 fd=5
===
mvwieringen

2013-09-27 17:17

developer   ~0000678

Fix committed to bareos master branch with changesetid 1150.
mvwieringen

2015-03-25 16:51

developer   ~0001488

Fix committed to bareos2015 bareos-14.2 branch with changesetid 5028.
joergs

2015-03-25 19:19

developer   ~0001633

Due to the reimport of the Github repository to bugs.bareos.org, the status of some tickets has been changed. These tickets will be closed again.
Sorry for the noise.

Related Changesets

bareos: master e5195308

2013-09-27 14:22

mvwieringen

Ported: N/A

Bwlimit not working in Windows 64bits client

The nanosleep implementation in compat.c for windows is of poor quality.
Try using the fallback to the pthread_cond_timedwait() in bsys.c by
no longer claiming in the mingwconfig.h that we have a working nanosleep()
and remove the poor implementation. There are a couple of ways of sleeping
in a somewhat portable way e.g. select(), poll() or pthread_cond_timedwait().
So as we use pthreads anyway everywhere we keep the pthread_cond_timedwait()
method.

Also check if bmicrosleep() returns early in the bandwidth limiting code
and if it does schedule a new bmicrosleep() of the sleep time remaining.

Fixes 0000226: Bwlimit not working in Windows 64bits client
Affected Issues
0000226
mod - src/lib/bsock.c
mod - src/lib/bsys.c
mod - src/win32/compat/compat.c
mod - src/win32/compat/include/mingwconfig.h

bareos2015: bareos-14.2 8dd095e3

2013-09-27 16:22

mvwieringen

Ported: N/A

Bwlimit not working in Windows 64bits client

The nanosleep implementation in compat.c for windows is of poor quality.
Try using the fallback to the pthread_cond_timedwait() in bsys.c by
no longer claiming in the mingwconfig.h that we have a working nanosleep()
and remove the poor implementation. There are a couple of ways of sleeping
in a somewhat portable way e.g. select(), poll() or pthread_cond_timedwait().
So as we use pthreads anyway everywhere we keep the pthread_cond_timedwait()
method.

Also check if bmicrosleep() returns early in the bandwidth limiting code
and if it does schedule a new bmicrosleep() of the sleep time remaining.

Fixes 0000226: Bwlimit not working in Windows 64bits client
Affected Issues
0000226
mod - src/lib/bsock.c
mod - src/lib/bsys.c
mod - src/win32/compat/compat.c
mod - src/win32/compat/include/mingwconfig.h

Issue History

Date Modified Username Field Change
2013-09-18 13:12 cvelasco New Issue
2013-09-18 14:54 cvelasco Note Added: 0000663
2013-09-18 16:00 mvwieringen Note Added: 0000664
2013-09-18 16:01 mvwieringen Assigned To => mvwieringen
2013-09-18 16:01 mvwieringen Status new => feedback
2013-09-18 16:08 cvelasco Note Added: 0000665
2013-09-18 16:08 cvelasco Status feedback => assigned
2013-09-18 16:19 mvwieringen Note Added: 0000666
2013-09-18 16:20 mvwieringen Status assigned => feedback
2013-09-18 17:29 cvelasco Note Added: 0000667
2013-09-18 17:29 cvelasco Status feedback => assigned
2013-09-18 19:49 cvelasco Note Added: 0000668
2013-09-18 19:50 cvelasco Note Edited: 0000668
2013-09-18 20:09 mvwieringen Note Added: 0000669
2013-09-18 20:09 mvwieringen Status assigned => feedback
2013-09-19 01:56 cvelasco Note Added: 0000670
2013-09-19 01:56 cvelasco Status feedback => assigned
2013-09-19 09:53 mvwieringen Note Added: 0000671
2013-09-19 09:59 mvwieringen Assigned To mvwieringen =>
2013-09-19 09:59 mvwieringen Status assigned => feedback
2013-09-19 13:23 cvelasco Note Added: 0000672
2013-09-19 13:23 cvelasco Status feedback => new
2013-09-19 15:10 mvwieringen Note Added: 0000673
2013-09-19 19:34 cvelasco Note Added: 0000674
2013-09-27 17:17 mvwieringen Changeset attached => bareos master e5195308
2013-09-27 17:17 mvwieringen Note Added: 0000678
2013-09-27 17:17 mvwieringen Assigned To => mvwieringen
2013-09-27 17:17 mvwieringen Status new => resolved
2013-09-27 17:17 mvwieringen Resolution open => fixed
2013-11-16 15:32 mvwieringen Status resolved => closed
2013-11-16 15:32 mvwieringen Assigned To mvwieringen =>
2015-03-25 16:51 mvwieringen Changeset attached => bareos2015 bareos-14.2 8dd095e3
2015-03-25 16:51 mvwieringen Note Added: 0001488
2015-03-25 16:51 mvwieringen Status closed => resolved
2015-03-25 19:19 joergs Note Added: 0001633
2015-03-25 19:19 joergs Status resolved => closed