View Issue Details

ID: 0000961
Project: bareos-core
Category: director
View Status: public
Last Update: 2023-04-06 15:21
Reporter: progserega
Assigned To: bruno-at-bareos
Priority: urgent
Severity: crash
Reproducibility: sometimes
Status: closed
Resolution: unable to reproduce
Platform: Linux
OS: Debian
OS Version: 9
Product Version: 17.2.5
Summary: 0000961: director crash after 4-10 hours of work
Description: Bareos runs in an LXC container on Proxmox 5.1. The container is allocated an 8 GB hard drive and 16 GB of RAM.

bareos-dir crashes after every 4-10 hours of work.

I started bareos-dir under gdb to get a stack trace:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/sbin/bareos-dir -f -v -d 10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bareos-dir (10): dird.c:245-0 Debug level = 10
bareos-dir (9): inc_conf.c:390-0 set wildfile 555555846518 size=1 [A-Z]:/pagefile.sys
bareos-dir (9): inc_conf.c:390-0 set wilddir 555555846518 size=1 [A-Z]:/RECYCLER
bareos-dir (9): inc_conf.c:390-0 set wilddir 555555846518 size=2 [A-Z]:/$RECYCLE.BIN
bareos-dir (9): inc_conf.c:390-0 set wilddir 555555846518 size=3 [A-Z]:/System Volume Information
bareos-dir (9): inc_conf.c:390-0 set wildbase 555555841088 size=1 *.avi
bareos-dir (9): inc_conf.c:390-0 set wildbase 555555841088 size=2 *.mkv
bareos-dir (9): inc_conf.c:390-0 set wild 55555583fa28 size=1 /var/www/localhost/htdocs
[New Thread 0x7ffff41bc700 (LWP 43570)]
[New Thread 0x7fffebfff700 (LWP 43572)]
[New Thread 0x7fffeb7fe700 (LWP 43573)]
[New Thread 0x7fffeaffd700 (LWP 43574)]
[New Thread 0x7fffea7fc700 (LWP 44718)]
[New Thread 0x7fffe9ffb700 (LWP 44726)]
[Thread 0x7fffe9ffb700 (LWP 44726) exited]
[Thread 0x7fffea7fc700 (LWP 44718) exited]

Thread 5 "bareos-dir" received signal SIGUSR2, User defined signal 2.
[Switching to Thread 0x7fffeaffd700 (LWP 43574)]
0x00007ffff56f57dd in nanosleep () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff56f57dd in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007ffff6ea51f4 in bmicrosleep(int, int) () from /usr/lib/bareos/libbareos-17.2.4.so
#2 0x00007ffff6ecf0c8 in register_watchdog(s_watchdog_t*) () from /usr/lib/bareos/libbareos-17.2.4.so
#3 0x00007ffff6ea8107 in start_thread_timer(JCR*, unsigned long, unsigned int) () from /usr/lib/bareos/libbareos-17.2.4.so
#4 0x00007ffff6ea37c0 in BSOCK_TCP::connect(JCR*, int, long, long, char const*, char*, char*, int, bool) () from /usr/lib/bareos/libbareos-17.2.4.so
#5 0x00005555555a74cd in ?? ()
#6 0x0000555555833108 in ?? ()
#7 0x0000000000000000 in ?? ()
(gdb)
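
Note that the backtrace above was captured when gdb stopped on SIGUSR2, not on the crash itself; the later notes in this issue report signal 7 (BUS error). A minimal sketch of a gdb session that lets SIGUSR2 pass through and waits for the real fault, assuming SIGUSR2 is only used internally by Bareos (for example as a thread timeout signal) and is safe to pass to the program without stopping:

```
# Assumption: SIGUSR2 is delivered internally and is not the fault itself.
$ gdb --args /usr/sbin/bareos-dir -f -v -d 10
(gdb) handle SIGUSR2 nostop noprint pass
(gdb) run
# ... wait until gdb stops on the actual fault (e.g. SIGBUS) ...
(gdb) thread apply all bt
```

Debug symbols for Bareos, if installed, would likely also resolve the `?? ()` frames in the resulting trace.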
Additional Information: Last lines from bareos.log on the director host:

    Check file /var/lib/bareos/bareos-dir.9101.pid
    08-Jun 13:00 bareos-dir JobId 35: shell command: run BeforeJob "/scripts/backup/bareos_zabbix_status.sh"
    08-Jun 13:00 bareos-dir JobId 35: Start Admin JobId 35, Job=AdminJobZabbixStatus.2018-06-08_13.00.00_01
    08-Jun 13:00 bareos-dir JobId 35: BAREOS 17.2.4 (21Sep17): 08-Jun-2018 13:00:02
      JobId: 35
      Job: AdminJobZabbixStatus.2018-06-08_13.00.00_01
      Scheduled time: 08-Jun-2018 13:00:00
      Start time: 08-Jun-2018 13:00:02
      End time: 08-Jun-2018 13:00:02
      Termination: Admin OK
    
    08-Jun 13:00 bareos-dir JobId 35: shell command: run AfterJob "touch /tmp/bareos_admin_test_after_job"
    08-Jun 14:00 bareos-dir JobId 36: shell command: run BeforeJob "/scripts/backup/bareos_zabbix_status.sh"
    08-Jun 14:00 bareos-dir JobId 36: Start Admin JobId 36, Job=AdminJobZabbixStatus.2018-06-08_14.00.00_20
    08-Jun 14:00 bareos-dir JobId 36: BAREOS 17.2.4 (21Sep17): 08-Jun-2018 14:00:02
      JobId: 36
      Job: AdminJobZabbixStatus.2018-06-08_14.00.00_20
      Scheduled time: 08-Jun-2018 14:00:00
      Start time: 08-Jun-2018 14:00:02
      End time: 08-Jun-2018 14:00:02
      Termination: Admin OK
    
    08-Jun 14:00 bareos-dir JobId 36: shell command: run AfterJob "touch /tmp/bareos_admin_test_after_job"
    08-Jun 15:00 bareos-dir JobId 37: shell command: run BeforeJob "/scripts/backup/bareos_zabbix_status.sh"
    08-Jun 15:00 bareos-dir JobId 37: Start Admin JobId 37, Job=AdminJobZabbixStatus.2018-06-08_15.00.00_03
    08-Jun 15:00 bareos-dir JobId 37: BAREOS 17.2.4 (21Sep17): 08-Jun-2018 15:00:02
      JobId: 37
      Job: AdminJobZabbixStatus.2018-06-08_15.00.00_03
      Scheduled time: 08-Jun-2018 15:00:00
      Start time: 08-Jun-2018 15:00:02
      End time: 08-Jun-2018 15:00:02
      Termination: Admin OK
    
    08-Jun 15:00 bareos-dir JobId 37: shell command: run AfterJob "touch /tmp/bareos_admin_test_after_job"
Tags: No tags attached.

Activities

progserega

2018-06-14 06:58

reporter   ~0003040

In system logs:

Jun 14 14:16:08 bareos bareos-dir[43341]: BAREOS interrupted by signal 7: BUS error

progserega

2018-06-14 08:43

reporter   ~0003041

I started bareos-dir in the console:

/usr/sbin/bareos-dir -d 99 -f -v

Output after starting a backup job:

bareos-dir (50): postgresql.c:248-0 db_user=bareos db_name=bareos db_password=DyVek9IXYe1QbLzL
bareos-dir (20): ua_output.c:567-0 list: llist jobid=209
bareos-dir (50): cram-md5.c:68-0 send: auth cram-md5 <544362531.1528958430@bareos-dir> ssl=0
bareos-dir (50): cram-md5.c:94-0 Authenticate OK J/o8sY02C7xkhnAjlEzO6g
bareos-dir (99): cram-md5.c:143-0 sending resp to challenge: e9+Ovk+wKR+UY4Ea9Q+PbA
bareos-dir (10): ua_audit.c:143-0 : Console [admin] from [127.0.0.1] cmdline list joblog jobid=209 limit=1000 offset=0
bareos-dir (50): postgresql.c:246-0 pg_real_connect done
bareos-dir (50): postgresql.c:248-0 db_user=bareos db_name=bareos db_password=DyVek9IXYe1QbLzL
bareos-dir (20): ua_output.c:567-0 list: list joblog jobid=209 limit=1000 offset=0
bareos-dir (10): ua_audit.c:143-0 : Console [admin] from [127.0.0.1] cmdline list joblog jobid=209 limit=1000 offset=1000
bareos-dir (20): ua_output.c:567-0 list: list joblog jobid=209 limit=1000 offset=1000
bareos-dir (50): cram-md5.c:68-0 send: auth cram-md5 <1717595899.1528958430@bareos-dir> ssl=0
bareos-dir (50): cram-md5.c:94-0 Authenticate OK t/WObVoLokZ2DPiGYtk5Iw
bareos-dir (99): cram-md5.c:143-0 sending resp to challenge: ByU+x8/A7T/kC6tIQ7JKdC
bareos-dir (10): ua_audit.c:143-0 : Console [admin] from [127.0.0.1] cmdline llist jobmedia jobid=209
bareos-dir (50): postgresql.c:246-0 pg_real_connect done
bareos-dir (50): postgresql.c:248-0 db_user=bareos db_name=bareos db_password=DyVek9IXYe1QbLzL
bareos-dir (20): ua_output.c:567-0 list: llist jobmedia jobid=209
BAREOS interrupted by signal 7: BUS error
Kaboom! bareos-dir, bareos-dir got signal 7 - BUS error. Attempting traceback.
Kaboom! exepath=/usr/sbin/
Calling: /usr/sbin/btraceback /usr/sbin/bareos-dir 44527 /var/lib/bareos

progserega

2018-06-14 08:46

reporter   ~0003042

Last entries in the director's bareos.log:

14-Jun 16:39 cloud-fd JobId 209: shell command: run ClientAfterJob "/scripts/backup/bacula_clear_backup_dir /tmp/bacula_mysql_backup"
14-Jun 16:39 bareos-sd JobId 209: Elapsed time=00:00:01, Transfer rate=69.34 M Bytes/second
14-Jun 16:39 cloud-fd JobId 209: ClientAfterJob: deleting files in temporary directory /tmp/bacula_mysql_backup:
14-Jun 16:39 cloud-fd JobId 209: ClientAfterJob: removed '//tmp/bacula_mysql_backup/nextcloud.sql.gz'
14-Jun 16:39 cloud-fd JobId 209: ClientAfterJob: file deletion completed without errors. exiting script /scripts/backup/bacula_clear_
14-Jun 16:39 cloud-fd JobId 209: ClientAfterJob: backup_dir
14-Jun 16:39 bareos-dir JobId 209: sql_create.c:872 Insert of attributes batch table done
14-Jun 16:39 bareos-dir JobId 209: Bareos bareos-dir 17.2.4 (21Sep17):
  Build OS: x86_64-pc-linux-gnu debian Debian GNU/Linux 9.3 (stretch)
  JobId: 209
  Job: backup-cloud.rs.int-MysqlYear.2018-06-14_16.39.11_07
  Backup Level: Full
  Client: "cloud.rs.int-fd" 17.2.4 (21Sep17) x86_64-pc-linux-gnu,debian,Debian GNU/Linux 8.0 (jessie),Debian_8.0,x86_64
  FileSet: "MysqlFileSet" 2018-06-14 14:11:57
  Pool: "cloud.rs.int-DbPoolYear" (From command line)
  Catalog: "MyCatalog" (From Client resource)
  Storage: "DbStorage" (From Job resource)
  Scheduled time: 14-Jun-2018 16:39:11
  Start time: 14-Jun-2018 16:39:50
  End time: 14-Jun-2018 16:39:52
  Elapsed time: 2 secs
  Priority: 10
  FD Files Written: 2
  SD Files Written: 2
  FD Bytes Written: 69,346,787 (69.34 MB)
  SD Bytes Written: 69,347,007 (69.34 MB)
  Rate: 34673.4 KB/s
  Software Compression: None
  VSS: no
  Encryption: no
  Accurate: no
  Volume name(s): cloud.rs.int-DbPoolYear-2018.06.14-0
  Volume Session Id: 67
  Volume Session Time: 1528268023
  Last Volume Bytes: 69,400,243 (69.40 MB)
  Non-fatal FD errors: 0
  SD Errors: 0
  FD termination status: OK
  SD termination status: OK
  Termination: Backup OK

bruno-at-bareos

2023-03-23 16:43

manager   ~0004950

Is this still reproducible with current code (Bareos > 21)?

bruno-at-bareos

2023-04-06 15:21

manager   ~0004965

This can't be reproduced with recent Bareos 22 code. We have a director running around the clock.

Issue History

Date Modified Username Field Change
2018-06-09 06:07 progserega New Issue
2018-06-14 06:58 progserega Note Added: 0003040
2018-06-14 08:43 progserega Note Added: 0003041
2018-06-14 08:46 progserega Note Added: 0003042
2023-03-23 16:43 bruno-at-bareos Assigned To => bruno-at-bareos
2023-03-23 16:43 bruno-at-bareos Status new => feedback
2023-03-23 16:43 bruno-at-bareos Note Added: 0004950
2023-04-06 15:21 bruno-at-bareos Status feedback => closed
2023-04-06 15:21 bruno-at-bareos Resolution open => unable to reproduce
2023-04-06 15:21 bruno-at-bareos Note Added: 0004965