View Issue Details

IDProjectCategoryView StatusLast Update
0000830bareos-core[All Projects] directorpublic2017-11-18 11:37
Reporterjoop.martensAssigned To 
PrioritynormalSeveritymajorReproducibilityalways
Status newResolutionopen 
PlatformLinuxOSCentOSOS Version7
Product Version16.2.5 
Fixed in Version 
Summary0000830: Client consolidation job lasts over 10 hours to complete
DescriptionI'm heading into an issue were the always incremental consolidation job for one of my backup clients lasts over 10 hours.

Because of the long runtime period it is becoming an issue for the consolidation to complete before the next backup schedule kicks off.

I have several clients that are being backuped using the always incremental mechanism.

The second largest client has approximately 6008K files stored in the File table and finishes consolidation within 30 minutes which is just fine.

The largest client has approximately 41000K files stored in the File table and lasts for 14 hours to complete.
  
I have used the following query to count the number of files by client:
select count(*) from File where JobId in (select JobId from Job where CLientID = <ClientID>);

I know 41000K files is a huge amount but can't imagine that it requires 14 hours to process and I'm hoping that the performance can be improved.

I'm running Bareos 16.2.6 and MariaDB 10.1.24 with a bufferpool that almost holds the full database of 12GB.

The consolidation job actually needs almost all it's time for handling the following query (Obviously with unique JobID's for every consolidation job):
SELECT Path.Path, Filename.Name, T1.FileIndex, T1.JobId, LStat, DeltaSeq FROM ( SELECT FileId, Job.JobId AS JobId, FileIndex, File.PathId AS PathId, File.FilenameId AS FilenameId, LStat , File.DeltaSeq AS DeltaSeq, Job.JobTDate AS JobTDate FROM Job, File, ( SELECT MAX(JobTDate) AS JobTDate, PathId, FilenameId, DeltaSeq FROM ( SELECT JobTDate, PathId, FilenameId, DeltaSeq FROM File JOIN Job USING (JobId) WHERE File.JobId IN (3214,3268,1932,1948) UNION ALL SELECT JobTDate, PathId, FilenameId, DeltaSeq FROM BaseFiles JOIN File USING (FileId) JOIN Job ON (BaseJobId = Job.JobId) WHERE BaseFiles.JobId IN (3214,3268,1932,1948) ) AS tmp GROUP BY PathId, FilenameId, DeltaSeq ) AS T1 WHERE (Job.JobId IN ( SELECT DISTINCT BaseJobId FROM BaseFiles WHERE JobId IN (3214,3268,1932,1948)) OR Job.JobId IN (3214,3268,1932,1948)) AND T1.JobTDate = Job.JobTDate AND Job.JobId = File.JobId AND T1.PathId = File.PathId AND T1.FilenameId = File.FilenameId ) AS T1 JOIN Filename ON (Filename.FilenameId = T1.FilenameId) JOIN Path ON (Path.PathId = T1.PathId) WHERE FileIndex > 0 ORDER BY T1.JobTDate, FileIndex ASC;

Other queries and the actual storage migration of the data is completed within minutes so if the process time of the above query can be improved my problem would be solved.

I have done some database analysis but I'm not really experienced in performance tuning of databases and getting stuck here.
I have also executed bareos-dbcheck and ran all checks with the modification flag enabled but that also does not seem to fix the issue.

While the query is running their is no disk I/O (Because of the large bufferpool). One CPU core is constantly 100% utilized by mysql.

An EXPLAIN of the above query reveals:
+------+--------------+------------+--------+----------------------+---------+---------+----------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+------------+--------+----------------------+---------+---------+----------------------------+------+---------------------------------+
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 679 | Using temporary; Using filesort |
| 1 | PRIMARY | Path | eq_ref | PRIMARY | PRIMARY | 4 | T1.PathId | 1 | |
| 1 | PRIMARY | Filename | eq_ref | PRIMARY | PRIMARY | 4 | T1.FilenameId | 1 | |
| 1 | PRIMARY | File | ref | JobId,JobId_2,PathId | PathId | 4 | T1.PathId | 16 | Using where |
| 1 | PRIMARY | Job | eq_ref | PRIMARY | PRIMARY | 4 | bareos.File.JobId | 1 | Using where |
| 6 | MATERIALIZED | BaseFiles | ALL | basefiles_jobid_idx | NULL | NULL | NULL | 1 | Using where |
| 3 | DERIVED | <derived4> | ALL | NULL | NULL | NULL | NULL | 679 | Using temporary; Using filesort |
| 4 | DERIVED | Job | range | PRIMARY | PRIMARY | 4 | NULL | 3 | Using where |
| 4 | DERIVED | File | ref | JobId,JobId_2 | JobId | 4 | bareos.Job.JobId | 226 | |
| 5 | UNION | BaseFiles | ALL | basefiles_jobid_idx | NULL | NULL | NULL | 1 | Using where |
| 5 | UNION | Job | eq_ref | PRIMARY | PRIMARY | 4 | bareos.BaseFiles.BaseJobId | 1 | |
| 5 | UNION | File | eq_ref | PRIMARY | PRIMARY | 8 | bareos.BaseFiles.FileId | 1 | |
+------+--------------+------------+--------+----------------------+---------+---------+----------------------------+------+---------------------------------+

An ANALYZE FORMAT=JSON of the above query reveals:

ANALYZE
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 5.44e7,
    "filesort": {
      "r_loops": 1,
      "r_total_time_ms": 926.18,
      "r_used_priority_queue": false,
      "r_output_rows": 900331,
      "r_sort_passes": 1,
      "r_buffer_size": "2047Kb",
      "temporary_table": {
        "table": {
          "table_name": "<derived3>",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 901,
          "r_rows": 1.04e6,
          "r_total_time_ms": 58.562,
          "filtered": 100,
          "r_filtered": 100,
          "materialized": {
            "query_block": {
              "select_id": 3,
              "r_loops": 1,
              "r_total_time_ms": 56202,
              "filesort": {
                "r_loops": 1,
                "r_total_time_ms": 404.72,
                "r_used_priority_queue": false,
                "r_output_rows": 1036317,
                "r_sort_passes": 4,
                "r_buffer_size": "2047Kb",
                "temporary_table": {
                  "table": {
                    "table_name": "<derived4>",
                    "access_type": "ALL",
                    "r_loops": 1,
                    "rows": 901,
                    "r_rows": 1.88e6,
                    "r_total_time_ms": 86.389,
                    "filtered": 100,
                    "r_filtered": 100,
                    "materialized": {
                      "query_block": {
                        "union_result": {
                          "table_name": "<union4,5>",
                          "access_type": "ALL",
                          "r_loops": 0,
                          "r_rows": null,
                          "query_specifications": [
                            {
                              "query_block": {
                                "select_id": 4,
                                "r_loops": 1,
                                "r_total_time_ms": 9697.5,
                                "table": {
                                  "table_name": "Job",
                                  "access_type": "range",
                                  "possible_keys": ["PRIMARY"],
                                  "key": "PRIMARY",
                                  "key_length": "4",
                                  "used_key_parts": ["JobId"],
                                  "r_loops": 1,
                                  "rows": 4,
                                  "r_rows": 4,
                                  "r_total_time_ms": 10.702,
                                  "filtered": 100,
                                  "r_filtered": 100,
                                  "attached_condition": "(Job.JobId in (3214,3268,1932,1948))"
                                },
                                "table": {
                                  "table_name": "File",
                                  "access_type": "ref",
                                  "possible_keys": ["JobId", "JobId_2"],
                                  "key": "JobId",
                                  "key_length": "4",
                                  "used_key_parts": ["JobId"],
                                  "ref": ["bareos.Job.JobId"],
                                  "r_loops": 4,
                                  "rows": 225,
                                  "r_rows": 471232,
                                  "r_total_time_ms": 5152.9,
                                  "filtered": 100,
                                  "r_filtered": 100
                                }
                              }
                            },
                            {
                              "query_block": {
                                "select_id": 5,
                                "r_loops": 1,
                                "r_total_time_ms": 0.0156,
                                "table": {
                                  "table_name": "BaseFiles",
                                  "access_type": "ALL",
                                  "possible_keys": ["basefiles_jobid_idx"],
                                  "r_loops": 1,
                                  "rows": 1,
                                  "r_rows": 0,
                                  "r_total_time_ms": 0.0053,
                                  "filtered": 100,
                                  "r_filtered": 100,
                                  "attached_condition": "(BaseFiles.JobId in (3214,3268,1932,1948))"
                                },
                                "table": {
                                  "table_name": "Job",
                                  "access_type": "eq_ref",
                                  "possible_keys": ["PRIMARY"],
                                  "key": "PRIMARY",
                                  "key_length": "4",
                                  "used_key_parts": ["JobId"],
                                  "ref": ["bareos.BaseFiles.BaseJobId"],
                                  "r_loops": 0,
                                  "rows": 1,
                                  "r_rows": null,
                                  "filtered": 100,
                                  "r_filtered": null
                                },
                                "table": {
                                  "table_name": "File",
                                  "access_type": "eq_ref",
                                  "possible_keys": ["PRIMARY"],
                                  "key": "PRIMARY",
                                  "key_length": "8",
                                  "used_key_parts": ["FileId"],
                                  "ref": ["bareos.BaseFiles.FileId"],
                                  "r_loops": 0,
                                  "rows": 1,
                                  "r_rows": null,
                                  "filtered": 100,
                                  "r_filtered": null
                                }
                              }
                            }
                          ]
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "table": {
          "table_name": "Path",
          "access_type": "eq_ref",
          "possible_keys": ["PRIMARY"],
          "key": "PRIMARY",
          "key_length": "4",
          "used_key_parts": ["PathId"],
          "ref": ["T1.PathId"],
          "r_loops": 1036317,
          "rows": 1,
          "r_rows": 1,
          "r_total_time_ms": 170.56,
          "filtered": 100,
          "r_filtered": 100
        },
        "table": {
          "table_name": "Filename",
          "access_type": "eq_ref",
          "possible_keys": ["PRIMARY"],
          "key": "PRIMARY",
          "key_length": "4",
          "used_key_parts": ["FilenameId"],
          "ref": ["T1.FilenameId"],
          "r_loops": 1036317,
          "rows": 1,
          "r_rows": 1,
          "r_total_time_ms": 1983.6,
          "filtered": 100,
          "r_filtered": 100
        },
        "table": {
          "table_name": "File",
          "access_type": "ref",
          "possible_keys": ["JobId", "JobId_2", "PathId"],
          "key": "PathId",
          "key_length": "4",
          "used_key_parts": ["PathId"],
          "ref": ["T1.PathId"],
          "r_loops": 1036317,
          "rows": 16,
          "r_rows": 29621,
          "r_total_time_ms": 5.11e7,
          "filtered": 100,
          "r_filtered": 0.0055,
          "attached_condition": "((`File`.FilenameId = T1.FilenameId) and (`File`.FileIndex > 0) and (<in_optimizer>(`File`.JobId,`File`.JobId in (subquery#6)) or (`File`.JobId in (3214,3268,1932,1948))))"
        },
        "table": {
          "table_name": "Job",
          "access_type": "eq_ref",
          "possible_keys": ["PRIMARY"],
          "key": "PRIMARY",
          "key_length": "4",
          "used_key_parts": ["JobId"],
          "ref": ["bareos.File.JobId"],
          "r_loops": 1703162,
          "rows": 1,
          "r_rows": 1,
          "r_total_time_ms": 2306.9,
          "filtered": 100,
          "r_filtered": 52.862,
          "attached_condition": "(Job.JobTDate = T1.JobTDate)"
        },
        "subqueries": [
          {
            "query_block": {
              "select_id": 6,
              "r_loops": 1,
              "r_total_time_ms": 0.0113,
              "table": {
                "table_name": "BaseFiles",
                "access_type": "ALL",
                "possible_keys": ["basefiles_jobid_idx"],
                "r_loops": 1,
                "rows": 1,
                "r_rows": 0,
                "r_total_time_ms": 0.004,
                "filtered": 100,
                "r_filtered": 100,
                "attached_condition": "(BaseFiles.JobId in (3214,3268,1932,1948))"
              }
            }
          }
        ]
      }
    }
  }
}

It shows that most of the time is being consumed on the File table ("r_total_time_ms": 5.11e7) on the condition "((`File`.FilenameId = T1.FilenameId) and (`File`.FileIndex > 0) and (<in_optimizer>(`File`.JobId,`File`.JobId in (subquery#6)) or (`File`.JobId in (3214,3268,1932,1948))))"

According the documentation I have all appropriate indexes available:
show index from File;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| File | 0 | PRIMARY | 1 | FileId | A | 48084648 | NULL | NULL | | BTREE | | |
| File | 1 | JobId | 1 | JobId | A | 185655 | NULL | NULL | | BTREE | | |
| File | 1 | JobId_2 | 1 | JobId | A | 133940 | NULL | NULL | | BTREE | | |
| File | 1 | JobId_2 | 2 | PathId | A | 9616929 | NULL | NULL | | BTREE | | |
| File | 1 | JobId_2 | 3 | FilenameId | A | 48084648 | NULL | NULL | | BTREE | | |
| File | 1 | PathId | 1 | PathId | A | 4371331 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The strange thing about this issue is that some days the consolidation of the same client also finishes within 30 minutes instead of 14 hours.
Maybe the runtime of the query becomes exponentially higher based on the results of the subqueries.

Thanks for you support.
Steps To ReproduceRun Consolidation Job
Wait for client consolidation job to finish after 14 hours
TagsNo tags attached.
bareos-master: impact
bareos-master: action
bareos-19.2: impact
bareos-19.2: action
bareos-18.2: impact
bareos-18.2: action
bareos-17.2: impact
bareos-17.2: action
bareos-16.2: impact
bareos-16.2: action
bareos-15.2: impact
bareos-15.2: action
bareos-14.2: impact
bareos-14.2: action
bareos-13.2: impact
bareos-13.2: action
bareos-12.4: impact
bareos-12.4: action

Activities

DocFraggle

DocFraggle

2017-11-17 19:55

reporter   ~0002818

Hello,

we have the same problem with these jobs. We're running MySQL 5.7.20, the statement described above is similar to the ones which run forever, one of them ran more than 57 hours before I had to kill it manually because it locked some tables...

Is there any progress concerning this issue?

Christian
joop.martens

joop.martens

2017-11-18 11:37

reporter   ~0002819

FYI
I have moved to PostgreSQL and haven't seen the problem since.
It looks like this issue only occurs in combination with large MySQL databases.

Issue History

Date Modified Username Field Change
2017-07-06 16:51 joop.martens New Issue
2017-11-17 19:55 DocFraggle Note Added: 0002818
2017-11-18 11:37 joop.martens Note Added: 0002819