View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000374 | bareos-core | General | public | 2014-12-16 17:04 | 2015-03-24 11:25 |
Reporter | gglynn | Assigned To | |||
Priority | normal | Severity | minor | Reproducibility | always |
Status | closed | Resolution | open | ||
Platform | Linux | OS | CentOS | OS Version | 6 |
Product Version | 14.2.2 | ||||
Fixed in Version | 14.2.3 | ||||
Summary | 0000374: NDMP SMTAPE restores fail with "SMTAPE: Error: Read Socket received EOF." | ||||
Description | I'm attempting to restore an NDMP SMTAPE backup I made of a NetApp FAS2520 flexible volume to a restricted volume of the same size on another identical NetApp (from a Bareos FileStorage pool) but the restore job fails immediately with the error "SMTAPE: Error: Read Socket received EOF" reported by the NetApp FAS unit. | ||||
Additional Information | I've confirmed that the correct nlist is provided to NetApp by the Bareos Director (bareos-dir) via NDMP, and the NetApp connects to the Bareos Storage service (bareos-sd) on its random NDMP session port, but TCPdump shows that bareos-sd closes the connection immediately after it opens. I can't tell from the Bareos logs and debug output or the NetApp's ndmpd debug log whether bareos-sd is closing the storage connection *because* the NetApp has already informed the bareos-dir that it encountered the SMTAPE EOF error, or if the disconnection is *causing* the SMTAPE EOF error. I'll note that I made the backup with Bareos 13.2.2, but I upgraded Bareos to 14.2.2 after I ran into Bareos Bug 0000345 (NDMP SMTAPE restore does not work because Bareos didn't supply an nlist), and couldn't find a binary 13.2.4 RPM with the patch included. Also note that I was able to successfully extract the SMTAPE backup to a file on the Bareos server using bextract, so the backup is there intact in the FileStorage Pool, I just can't get the Netapp to pull it in from the bareos-sd. | ||||
Tags | No tags attached. | ||||
Try adding the following to your bareos-dir.conf and bareos-sd.conf both in the server section NDMP Snooping = yes NDMP Loglevel = 7 e.g. for SD that is in the Storage section of the conf and for the DIR its the Director section. Smells a bit like the SD doesn't know how to connect to the NDMP data agent on the NetAPP box. |
|
ndmp-snooping-output-201412161549.txt (20,917 bytes)
16-Dec 12:44 sneakernet-dir JobId 54: Start Restore Job Restore_SRCNETAPP:vol1_smtape_to_DSTNETAPP:SRCNETAPP_vol1.2014-12-16_12.44.45_07 16-Dec 12:44 sneakernet-dir JobId 54: Using Device "FileStorage" to read. 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP0_NOTIFY_CONNECTED 1 16-Dec 12:44 sneakernet-dir JobId 54: reason=NDMP0_CONNECTED protocol_version=4 text_reason='Connection successful' 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP0_CONNECT_OPEN 1 16-Dec 12:44 sneakernet-dir JobId 54: version=4 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP0_CONNECT_OPEN 1 (2) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP0_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_CONFIG_GET_AUTH_ATTR 2 16-Dec 12:44 sneakernet-dir JobId 54: auth_type=NDMP4_AUTH_MD5 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_CONFIG_GET_AUTH_ATTR 2 (3) 16-Dec 12:44 sneakernet-dir JobId 54: <<unimplemented pp>> 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_CONNECT_CLIENT_AUTH 3 16-Dec 12:44 sneakernet-dir JobId 54: auth_type=NDMP4_AUTH_MD5 auth_id=ndmpbackup 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_CONNECT_CLIENT_AUTH 3 (4) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP0_NOTIFY_CONNECTED 1 16-Dec 12:44 sneakernet-dir JobId 54: reason=NDMP0_CONNECTED protocol_version=4 text_reason='Hello' 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP0_CONNECT_OPEN 1 16-Dec 12:44 sneakernet-dir JobId 54: version=4 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP0_CONNECT_OPEN 1 (2) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP0_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_CONFIG_GET_AUTH_ATTR 2 16-Dec 12:44 sneakernet-dir JobId 54: auth_type=NDMP4_AUTH_MD5 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_CONFIG_GET_AUTH_ATTR 2 (3) 16-Dec 12:44 sneakernet-dir JobId 54: <<unimplemented pp>> 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_CONNECT_CLIENT_AUTH 3 16-Dec 12:44 sneakernet-dir JobId 54: auth_type=NDMP4_AUTH_MD5 auth_id=ndmpbackup 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_CONNECT_CLIENT_AUTH 3 (4) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: ndmca_connect_tape_agent: 0 0x7f09e80f8a70 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_SET_RECORD_SIZE 4 16-Dec 12:44 sneakernet-dir JobId 54: len=64512 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_SET_RECORD_SIZE 4 (5) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: Opening tape drive NNLE-BCOP-OODO-CLOL-GBDA-OIMP-ELMG-IAAK@/vol/vol1 read-only 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_OPEN 5 16-Dec 12:44 sneakernet-dir JobId 54: device='NNLE-BCOP-OODO-CLOL-GBDA-OIMP-ELMG-IAAK@/vol/vol1' mode=NDMP4_TAPE_READ_MODE 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_OPEN 5 (6) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: Commanding tape drive to rewind 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_MTIO 6 16-Dec 12:44 sneakernet-dir JobId 54: op=NDMP4_MTIO_REW count=1 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_MTIO 6 (7) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR resid_count=0 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_SET_WINDOW 7 16-Dec 12:44 sneakernet-dir JobId 54: offset=0 length=-1 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_SET_WINDOW 7 (8) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_LISTEN 8 16-Dec 12:44 sneakernet-dir JobId 54: mode=NDMP4_MOVER_MODE_WRITE addr_type=NDMP4_ADDR_TCP 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_LISTEN 8 (9) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_CONNECT 4 16-Dec 12:44 sneakernet-dir JobId 54: addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-sd JobId 54: Ready to read from volume "FVol0043" on device "FileStorage" (/home/bareos). 16-Dec 12:44 sneakernet-sd JobId 54: Forward spacing Volume "FVol0043" to file:block 7:3140372906. 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_CONNECT 4 (5) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_START_RECOVER 5 16-Dec 12:44 sneakernet-dir JobId 54: butype_name='smtape' n_env=1 n_nlist=1 16-Dec 12:44 sneakernet-dir JobId 54: env[0] name='PREFIX' value='/vol/SRCNETAPP_vol1' 16-Dec 12:44 sneakernet-dir JobId 54: nl[0] original_path='' 16-Dec 12:44 sneakernet-dir JobId 54: ..... destination_path='/vol/SRCNETAPP_vol1' 16-Dec 12:44 sneakernet-dir JobId 54: ..... name='' other='' 16-Dec 12:44 sneakernet-dir JobId 54: ..... node=-1 fh_info=-1 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_START_RECOVER 5 (6) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: Waiting for operation to start 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_GET_STATE 6 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_GET_STATE 6 (7) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=3 error=NDMP4_NO_ERR op=NDMP4_DATA_OP_RECOVER 16-Dec 12:44 sneakernet-dir JobId 54: state=NDMP4_DATA_STATE_ACTIVE 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_DATA_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: bytes_processed=0 est_bytes_remain=0 16-Dec 12:44 sneakernet-dir JobId 54: est_time_remain=0 data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: read_offset=0 read_length=0 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_GET_STATE 9 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_LOG_MESSAGE 10 16-Dec 12:44 sneakernet-dir JobId 54: log_type=NDMP4_LOG_DEBUG id=1418762687 message='mover going active' 16-Dec 12:44 sneakernet-dir JobId 54: Async request NDMP4_LOG_MESSAGE 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_GET_STATE 9 (11) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR state=NDMP4_MOVER_STATE_ACTIVE 16-Dec 12:44 sneakernet-dir JobId 54: pause_reason=NDMP4_MOVER_PAUSE_NA 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_MOVER_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: record_size=64512 record_num=0 bytes_moved=0 16-Dec 12:44 sneakernet-dir JobId 54: seek=0 to_read=0 win_off=0 win_len=-1 16-Dec 12:44 sneakernet-dir JobId 54: data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_GET_STATE 10 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_GET_STATE 10 (12) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=30 error=NDMP4_NO_ERR flags=0x0 file_num=0 16-Dec 12:44 sneakernet-dir JobId 54: soft_errors=0 block_size=0 blockno=0 16-Dec 12:44 sneakernet-dir JobId 54: total_space=-1 space_remain=-1 16-Dec 12:44 sneakernet-dir JobId 54: Operation started 16-Dec 12:44 sneakernet-dir JobId 54: Monitoring recover 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() entered 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_NOTIFY_DATA_READ 8 16-Dec 12:44 sneakernet-dir JobId 54: offset=0 length=-1 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() happened, resid=30 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_GET_STATE 7 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_GET_STATE 7 (9) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=3 error=NDMP4_NO_ERR op=NDMP4_DATA_OP_RECOVER 16-Dec 12:44 sneakernet-dir JobId 54: state=NDMP4_DATA_STATE_ACTIVE 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_DATA_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: bytes_processed=0 est_bytes_remain=0 16-Dec 12:44 sneakernet-dir JobId 54: est_time_remain=0 data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: read_offset=0 read_length=-1 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_GET_STATE 11 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_GET_STATE 11 (13) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR state=NDMP4_MOVER_STATE_ACTIVE 16-Dec 12:44 sneakernet-dir JobId 54: pause_reason=NDMP4_MOVER_PAUSE_NA 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_MOVER_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: record_size=64512 record_num=0 bytes_moved=0 16-Dec 12:44 sneakernet-dir JobId 54: seek=0 to_read=0 win_off=0 win_len=-1 16-Dec 12:44 sneakernet-dir JobId 54: data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_GET_STATE 12 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_GET_STATE 12 (14) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=30 error=NDMP4_NO_ERR flags=0x0 file_num=0 16-Dec 12:44 sneakernet-dir JobId 54: soft_errors=0 block_size=0 blockno=0 16-Dec 12:44 sneakernet-dir JobId 54: total_space=-1 space_remain=-1 16-Dec 12:44 sneakernet-dir JobId 54: DATA: bytes 0KB MOVER: read 0KB record 0 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_READ 13 16-Dec 12:44 sneakernet-dir JobId 54: offset=0 length=-1 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_READ 13 (15) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() entered 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_NOTIFY_MOVER_PAUSED 16 16-Dec 12:44 sneakernet-dir JobId 54: reason=NDMP4_MOVER_PAUSE_EOF seek_position=0 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() happened, resid=10 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_GET_STATE 8 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_GET_STATE 8 (10) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=3 error=NDMP4_NO_ERR op=NDMP4_DATA_OP_RECOVER 16-Dec 12:44 sneakernet-dir JobId 54: state=NDMP4_DATA_STATE_ACTIVE 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_DATA_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: bytes_processed=0 est_bytes_remain=0 16-Dec 12:44 sneakernet-dir JobId 54: est_time_remain=0 data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: read_offset=0 read_length=-1 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_GET_STATE 14 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_GET_STATE 14 (17) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR state=NDMP4_MOVER_STATE_PAUSED 16-Dec 12:44 sneakernet-dir JobId 54: pause_reason=NDMP4_MOVER_PAUSE_EOF 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_MOVER_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: record_size=64512 record_num=0 bytes_moved=0 16-Dec 12:44 sneakernet-dir JobId 54: seek=0 to_read=-1 win_off=0 win_len=-1 16-Dec 12:44 sneakernet-dir JobId 54: data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_GET_STATE 15 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_GET_STATE 15 (18) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=30 error=NDMP4_NO_ERR flags=0x0 file_num=0 16-Dec 12:44 sneakernet-dir JobId 54: soft_errors=0 block_size=0 blockno=1 16-Dec 12:44 sneakernet-dir JobId 54: total_space=-1 space_remain=-1 16-Dec 12:44 sneakernet-dir JobId 54: DATA: bytes 0KB MOVER: read 0KB record 0 16-Dec 12:44 sneakernet-dir JobId 54: Mover paused, reason=NDMP9_MOVER_PAUSE_EOF 16-Dec 12:44 sneakernet-dir JobId 54: End of tapes 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_CLOSE 16 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_CLOSE 16 (19) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() entered 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_LOG_MESSAGE 11 16-Dec 12:44 sneakernet-dir JobId 54: log_type=NDMP4_LOG_ERROR id=1 message='SMTAPE: Error: Read Socket received EOF. ' 16-Dec 12:44 sneakernet-dir JobId 54: Error: LOG_MESSAGE: 'SMTAPE: Error: Read Socket received EOF.' 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_LOG_MESSAGE 12 16-Dec 12:44 sneakernet-dir JobId 54: log_type=NDMP4_LOG_NORMAL id=2 message='SMTAPE: Cannot Init Input, aborting ' 16-Dec 12:44 sneakernet-dir JobId 54: LOG_MESSAGE: 'SMTAPE: Cannot Init Input, aborting' 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_LOG_MESSAGE 13 16-Dec 12:44 sneakernet-dir JobId 54: log_type=NDMP4_LOG_ERROR id=3 message='smtape operation failed due to: image restore aborted due to snapmirror failure. ' 16-Dec 12:44 sneakernet-dir JobId 54: Error: LOG_MESSAGE: 'smtape operation failed due to: image restore aborted due to snapmirror failure.' 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_LOG_MESSAGE 14 16-Dec 12:44 sneakernet-dir JobId 54: log_type=NDMP4_LOG_ERROR id=4 message='DATA: Operation terminated: EVENT: INTERNAL ERROR (for SRCNETAPP_vol1) ' 16-Dec 12:44 sneakernet-dir JobId 54: Error: LOG_MESSAGE: 'DATA: Operation terminated: EVENT: INTERNAL ERROR (for SRCNETAPP_vol1)' 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_NOTIFY_DATA_HALTED 15 16-Dec 12:44 sneakernet-dir JobId 54: reason=NDMP4_DATA_HALT_INTERNAL_ERROR 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() happened, resid=10 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_GET_STATE 9 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_GET_STATE 9 (16) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=3 error=NDMP4_NO_ERR op=NDMP4_DATA_OP_RECOVER 16-Dec 12:44 sneakernet-dir JobId 54: state=NDMP4_DATA_STATE_HALTED 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_DATA_HALT_INTERNAL_ERROR 16-Dec 12:44 sneakernet-dir JobId 54: bytes_processed=0 est_bytes_remain=0 16-Dec 12:44 sneakernet-dir JobId 54: est_time_remain=0 data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: read_offset=0 read_length=-1 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_GET_STATE 17 16-Dec 12:44 sneakernet-dir JobId 54: >C NDMP4_NOTIFY_MOVER_HALTED 20 16-Dec 12:44 sneakernet-dir JobId 54: reason=NDMP4_MOVER_HALT_CONNECT_CLOSED 16-Dec 12:44 sneakernet-dir JobId 54: Async request NDMP4_NOTIFY_MOVER_HALTED 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_GET_STATE 17 (21) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR state=NDMP4_MOVER_STATE_HALTED 16-Dec 12:44 sneakernet-dir JobId 54: pause_reason=NDMP4_MOVER_PAUSE_NA 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_MOVER_HALT_CONNECT_CLOSED 16-Dec 12:44 sneakernet-dir JobId 54: record_size=64512 record_num=0 bytes_moved=0 16-Dec 12:44 sneakernet-dir JobId 54: seek=0 to_read=-1 win_off=0 win_len=-1 16-Dec 12:44 sneakernet-dir JobId 54: data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_GET_STATE 18 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_GET_STATE 18 (22) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=30 error=NDMP4_NO_ERR flags=0x0 file_num=0 16-Dec 12:44 sneakernet-dir JobId 54: soft_errors=0 block_size=0 blockno=1 16-Dec 12:44 sneakernet-dir JobId 54: total_space=-1 space_remain=-1 16-Dec 12:44 sneakernet-dir JobId 54: DATA: bytes 0KB MOVER: read 0KB record 0 16-Dec 12:44 sneakernet-dir JobId 54: Operation done, cleaning up 16-Dec 12:44 sneakernet-dir JobId 54: Waiting for operation to halt 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() entered 16-Dec 12:44 sneakernet-dir JobId 54: mon_wait_for_something() happened, resid=2 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_GET_STATE 10 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_GET_STATE 10 (17) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=3 error=NDMP4_NO_ERR op=NDMP4_DATA_OP_RECOVER 16-Dec 12:44 sneakernet-dir JobId 54: state=NDMP4_DATA_STATE_HALTED 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_DATA_HALT_INTERNAL_ERROR 16-Dec 12:44 sneakernet-dir JobId 54: bytes_processed=0 est_bytes_remain=0 16-Dec 12:44 sneakernet-dir JobId 54: est_time_remain=0 data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: read_offset=0 read_length=-1 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_GET_STATE 19 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_GET_STATE 19 (23) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR state=NDMP4_MOVER_STATE_HALTED 16-Dec 12:44 sneakernet-dir JobId 54: pause_reason=NDMP4_MOVER_PAUSE_NA 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_MOVER_HALT_CONNECT_CLOSED 16-Dec 12:44 sneakernet-dir JobId 54: record_size=64512 record_num=0 bytes_moved=0 16-Dec 12:44 sneakernet-dir JobId 54: seek=0 to_read=-1 win_off=0 win_len=-1 16-Dec 12:44 sneakernet-dir JobId 54: data_conn_addr=NDMP4_ADDR_TCP #0(c0a8039d,48651) 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_GET_STATE 20 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_GET_STATE 20 (24) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=30 error=NDMP4_NO_ERR flags=0x0 file_num=0 16-Dec 12:44 sneakernet-dir JobId 54: soft_errors=0 block_size=0 blockno=1 16-Dec 12:44 sneakernet-dir JobId 54: total_space=-1 space_remain=-1 16-Dec 12:44 sneakernet-dir JobId 54: Commanding tape drive to rewind 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_MTIO 21 16-Dec 12:44 sneakernet-dir JobId 54: op=NDMP4_MTIO_REW count=1 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_MTIO 21 (25) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR resid_count=0 16-Dec 12:44 sneakernet-dir JobId 54: Closing tape drive NNLE-BCOP-OODO-CLOL-GBDA-OIMP-ELMG-IAAK@/vol/vol1 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_CLOSE 22 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_CLOSE 22 (26) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: Operation halted, stopping 16-Dec 12:44 sneakernet-dir JobId 54: Operation ended questionably 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_STOP 11 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_STOP 11 (18) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_STOP 23 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_STOP 23 (27) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_DATA_GET_STATE 12 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_DATA_GET_STATE 12 (19) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=3 error=NDMP4_NO_ERR op=NDMP4_DATA_OP_NOACTION 16-Dec 12:44 sneakernet-dir JobId 54: state=NDMP4_DATA_STATE_IDLE 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_DATA_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: bytes_processed=0 est_bytes_remain=0 16-Dec 12:44 sneakernet-dir JobId 54: est_time_remain=0 data_conn_addr=NDMP4_ADDR_LOCAL 16-Dec 12:44 sneakernet-dir JobId 54: read_offset=0 read_length=0 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_MOVER_GET_STATE 24 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_MOVER_GET_STATE 24 (28) 16-Dec 12:44 sneakernet-dir JobId 54: error=NDMP4_NO_ERR state=NDMP4_MOVER_STATE_IDLE 16-Dec 12:44 sneakernet-dir JobId 54: pause_reason=NDMP4_MOVER_PAUSE_NA 16-Dec 12:44 sneakernet-dir JobId 54: halt_reason=NDMP4_MOVER_HALT_NA 16-Dec 12:44 sneakernet-dir JobId 54: record_size=10240 record_num=0 bytes_moved=0 16-Dec 12:44 sneakernet-dir JobId 54: seek=0 to_read=0 win_off=0 win_len=-1 16-Dec 12:44 sneakernet-dir JobId 54: data_conn_addr=NDMP4_ADDR_LOCAL 16-Dec 12:44 sneakernet-dir JobId 54: C> NDMP4_TAPE_GET_STATE 25 16-Dec 12:44 sneakernet-dir JobId 54: >R NDMP4_TAPE_GET_STATE 25 (29) 16-Dec 12:44 sneakernet-dir JobId 54: unsupp=3f error=NDMP4_DEV_NOT_OPEN_ERR flags=0x0 file_num=4294967295 16-Dec 12:44 sneakernet-dir JobId 54: soft_errors=4294967295 block_size=4294967295 blockno=4294967295 16-Dec 12:44 sneakernet-dir JobId 54: total_space=-1 space_remain=-1 16-Dec 12:44 sneakernet-dir JobId 54: ERR=reply-error 16-Dec 12:44 sneakernet-sd JobId 54: ERR=EOF 16-Dec 12:44 sneakernet-dir JobId 54: Error: Bareos sneakernet-dir 14.2.2 (12Dec14): Build OS: x86_64-unknown-linux-gnu redhat CentOS release 6.2 (Final) JobId: 54 Job: Restore_SRCNETAPP:vol1_smtape_to_DSTNETAPP:SRCNETAPP_vol1.2014-12-16_12.44.45_07 Restore Client: DSTNETAPP Start time: 16-Dec-2014 12:44:47 End time: 16-Dec-2014 12:44:47 Elapsed time: 0 secs Files Expected: 1 Files Restored: 0 Bytes Restored: 0 Rate: 0.0 KB/s SD termination status: Running Termination: *** Restore Error *** |
|
I've attached a text file with the messages output of another restore attempt with NDMP Snooping = yes and NDMP Loglevel = 7 as requested. In case it helps, the name of my Bareos server is "sneakernet" and it's IP address is 192.168.3.157 (c0a8039d in hex). The IP address of the DSTNETAPP system is 192.168.3.216, and SRCNETAPP is the NetApp from which I backed up the volume (/vol/vol1 on SRCNETAPP, and /vol/SRCNETAPP_vol1 on DSTNETAPP). Note that I have the same problem when I try to restore to the same volume name (i.e. /vol/vol1) on DSTNETAPP, so it doesn't appear to be related to redirecting the restore to a different volume name. | |
The NDMP trace shows indeed that the full handshake succeeds and it starts a restore session it only seems it doesn't read a block which must be because the restore selected no media to read. Next option would be to run the SD in debug mode starting with a level 100 or maybe even 200 but this gives really a lot of output. How does your restore session look like in bconsole ? It would also be interesting to see the BSR that gets created as that describes what the SD is going to read. |
|
Yes, that's what I thought I was seeing in the NDMP trace, too, which is weird. Like I said, I know the data is intact in the FileStorage pool volumes, because I extracted it using bextract, and in fact, I used a BSR generated by a restore job attempt to do it. Note that when I did the extract, it put the extracted files relative to a top-level directory named "@NDMP", so when I extracted an SMTAPE backup of /vol/vol1 to /home/tmp, it actually put the file in /home/tmp/@NDMP/vol/vol1. I also manually copied that extracted "vol1" file to DSTNETAPP and successfully used it to restore a volume using the "smtape restore" command right on the NetApp CLI, so it's definitely not munged. Anyway, here's what the restore job looks like in bconsole right before I run it: Run Restore job JobName: Restore SRCNETAPP:vol1 smtape to DSTNETAPP:SRCNETAPP_vol1 Bootstrap: /var/lib/bareos/sneakernet-dir.restore.3.bsr Where: /vol/SRCNETAPP_vol1 Replace: Always FileSet: SRCNETAPP_vol1-fileset Backup Client: SRCNETAPP Restore Client: DSTNETAPP Format: smtape Storage: File When: 2014-12-16 14:13:16 Catalog: MyCatalog Priority: 10 Plugin Options: *None* I also attached the BSR referenced above to this bug. Any idea why bextract is reading the FileStorage volumes properly, but SD seems to be choking on them and returning EOF right on the first read? |
|
As the same code is used there must be something that it doesn't like. Which can be also still on the NDMP level. Only a debug run of the SD will show what is going on. So run bareos-sd -f -d 200 > logfile and make sure you only have the NDMP restore as this will give quite some debug output and gets fully unreadable when more then one job is running. Keep in mind that the SD probably runs as user bareos so you need to run the debug also under that user (as the user has not login shell you need su - bareos -s /bin/bash or so) |
|
I've uploaded the bareos-sd log output (at debug level 200) for an attempt to restore vol2. I had to split it into two parts in order to upload it to this bug, but it's all from one run. |
|
Ok log looks normal and the SD processes everything only the NDMP tape never opens. The SD get a cancel: sneakernet-sd: dir_cmd.c:262-0 <dird: cancel Job=Restore_SRCNETAPP:vol2_smtape_to_DSTNETAPP:vol2.2014-12-17_11.16.47_10 So the dir is canceling the restore for whatever reason. So think we can remove the SD from the problem. |
|
Need a debug log from the director which is more or less the same cmdline. | |
Director debug log (level 200) for another restore job with the same parameters as for the bareos-sd debug is attached. | |
Just want to throw this out there: I seem to remember something about SMTAPE on the NetApp using a 240KB tape block size, but in the debug logs I just uploaded, I see a lot of references to a max block size of 64KB. Any relevance? | |
The logs at least confirm that everything is setup right and tape is opened by the NDMP DMA and the moment the data mover gets active the NetAPP seems to abort the restore. About the tape block size as NDMP is so device specific you could be right. The blocksize you see is the default blocksize but its configurable using the NdmpBlockSize parameter in the client definition it defaults to 64512. |
|
Does the Director communicate the blocksize to the NetApp prior to the abort? With the FileStorage volume type, does the block size on the restore have to match the block size on the backup? I'd like to try to change the NdmpBlockSize parameter to 245760 bytes, but I want to make sure this won't cause any major problems on the NetApp. |
|
The NDMP blocksize is only used as send size over the NDMP protocol as we don't support direct writing to NDMP Tape agents e.g. we emulate the Tape Agent in the bareos SD the actual data is written in the blocksize of the lower level device anyway. So I think you can just change the blocksize but what impact that has on the NetAPP I have no idea then again its not an enormous blocksize. |
|
I changed the NdmpBlockSize on the Client entry for DSTNETAPP to 24576 (240KB), and now it no longer fails immediately! However, the NDMP Mover function of the SD doesn't actually move any data. The TCP connection established from the NetApp to the SD stays connected now, but they just keep exchanging 0 byte payload packets, and the restore job just keeps logging "DATA: bytes 0KB MOVER: read 0KB record 0" every 30 seconds until I cancel the job. |
|
Uploaded new debug logs for DIR and SD on new restore attempt described in my last note. | |
Ok removed the debugging output from the report as it only shows obvious things. Looked at the whole blocksize thing with SMTAPE and it seems its hardcoded to use a blocksizes from 4 to 256 Kb in multiples of 4 Kb. Don't seem to remember that was like that before. But that does show a problem with the default blocksize we use which is 15.75 x 4 Kb so that is probably why its not happy. The problem with replying at night is that you sometimes don't see the whole picture. But the current code will always return the saved data in the same chunk size as the data was saved in so changing it to 240 Kb will still return only chunks of the original size. So I think your only option is to: - Change the NDMP blocksize to something that can be divided by 4 Kb - Create a Pool with a blocksize that matches the NDMP blocksize - Create a new backup with these new settings. |
|
I'm still unclear as to what's happening here. Can the SD not read the backed-up data in the File-type Volumes that hold the backups for some reason? Is it trying to use the newly-set 240KB NdmpBlockSize to read them and failing to do so because it doesn't match the blocksize used to back up the data? I'm still confused as to how the source Netapp was able to stream the SMTAPE back to the SD in the first place if the size of the 240KB blocks the Netapp was sending exceeded Bareos' default block size for File-type Volumes (is this 63KB)? Did the SD just ignore the blocking and take the data as a stream of bytes, reblocking it to whatever block size it uses for the File-type Volumes (is this also 63KB)? Is the block size of the File-type Volumes configurable, or is it the same for all such File-type Volumes? If it is configurable/variable, is there any way to tell what the block size of an existing File-type Volume is? No matter what I set the "Maximum Block Size" and "Minimum Block Size" for the "File" Device in bareos-sd.conf, bls seems to be able to read the existing Volumes without issue, so I suspect this parameter is fixed and ignore for this Volume/Device type, but that's just a guess, I'm sure you know for sure. |
|
Those are a lot of questions and they mostly make little sense. One things is for sure the NetAPP is indeed buggy as it allows you to use a blocksize that it doesn't support for backup but on restore it fails to restore such a blocksize. The 240 Kb blocksize is the default but is in no way a must see my previous note on how it really works. Could be it are two different levels of the NetAPP OS. When you search for blocksize and NetAPP it looks as if 8.2 is somewhat prominently mentioned. As to why 240 Kb reading doesn't work is because the SD reads the old NDMP blocks (which are completely different then what Bareos calls a block) back in their own size. The NDMP blocks are encoded as normal so called DATA streams (each piece of backup data has it own streamtype e.g. data, metadata, compressed data etc.) The current code doesn't try to assemble parts or multipe parts of data into one bigger block. That is also fine as its a known limitation that you should use the same NDMP blocksize for backup and restore. As to why it works with bextract is fairly simple, that just concatenates all data blocks into one long data stream. That is something you can of course read in any blocksize you want until it fully processed. Any setting of the minimum and maximum block size on the bareos tape/file device have zero impact on the whole thing as the NDMP blocksize is a separate configuration setting and the code will chunk things in the right way as it operates on NDMP data streams which are for the SD an opaque data stream. The SD just returns the exact same data as the bextract only in 63 Kb e.g. the POSIX default blocksize. The NetAPP for whatever reason (probably because its not a 4 Kb chunked piece of data) doesn't like the data coming in over the NDMP protocol. I would be possible to write re-assembly code that maps from one NDMP blocksize to an other on restore but that is not something that is really high on my priority list, also given that NDMP blocksize is a known fixed thing for NDMP so it only possible because we emulate the NDMP Tape Agent to do such translation. So I guess your only option is to perform the actions I mentioned in the note above. The blocksize on the disk pool is not strictly needed but for performance reasons it probably makes sense to have a blocksize that is a bit bigger even then NDMP blocksize so a single chunk of backup data fits and doesn't need to be cut into multiple pieces when saving it. |
|
"One things is for sure the NetAPP is indeed buggy as it allows you to use a blocksize that it doesn't support for backup but on restore it fails to restore such a blocksize. The 240 Kb blocksize is the default but is in no way a must see my previous note on how it really works. Could be it are two different levels of the NetAPP OS. When you search for blocksize and NetApp it looks as if 8.2 is somewhat prominently mentioned." I think SMTAPE is somewhat different than a non-SMTAPE NDMP backup. The NetApp generates a single SMTAPE file for the volume you're backing up and transfers it via NDMP to the Tape Agent as it's being generated. The SMTAPE volume backup file seems to be somewhat like a tarfile, and it is generated with a 240KB internal data block size by default. When an SMTAPE backup is conducted via NDMP, the NetApp doesn't seem to care if the NDMP Tape Agent session's block size is different than the SMTAPE block size. With an NdmpBlockSize of 63KB configured on a session, the NDMP service on the NetApp just breaks the stream of bytes from its SMTAPE generation process into 63KB blocks and transfers them to the Tape Agent. It's on the NDMP SMTAPE restore sessions that the NetApp seems to require an exactly 240KB NDMP block size, or else it fails immediately, with the error I originally reported, regardless of the DATA_BLOCK_SIZE meta option configured on the restore fileset in bareos-dir.conf (I tried setting that to 63KB at one point, too). This seems to me to be an arbitrary and harmful limitation of the NetApp, so I agree with your "buggy" assessment in this case. "The SD just returns the exact same data as the bextract only in 63 Kb e.g. the POSIX default blocksize. The NetAPP for whatever reason (probably because its not a 4 Kb chunked piece of data) doesn't like the data coming in over the NDMP protocol." I don't think that's what we're seeing here. I've run tcpdump during the restore attempts, and as far as I can tell, the SD isn't sending anything to the NetApp at all. The SD and the NetApp just exchange TCP packets with zero-byte payloads back and forth over the NDMP Tape Agent connection forever until I abort the job by stopping the Director. Is the SD not sending anything because the size of the blocks that it's reading from the backup Volumes is different from the NdmpBlockSize? Is there any indication of this in the SD debug logs? I can live with making another NDMP SMTAPE backup of the source volume using a 240KB NdmpBlockSize if I have to, I just want to make sure we understand why the SD isn't reading and sending blocks of data to the NetApp with the current configuration before I do so, because it's a big volume, and it will probably take a full 24 hours for me to back it up again. |
|
I have no idea but I would say create a small test set. As NDMP is such a weird protocol it seems everyone does it own way of implementing things. I think its a protocol violation if the NetAPP doesn't adhere to the correct blocksize given on restore then again NetAPP is one of the inventors of NDMP. But we have seen more strangeness from NetAPP, but unless I get some money to spend on it this is how far I can debug it remotely. After all this is open source you could try adding some more debugging to the NDMP tape callbacks in the SD to get a better idea what its doing. We based this on top of a reference implementation of NDMP which is also used by other open source vendors. |
|
b.t.w. as we are entering the holiday season I won't be doing much this year. | |
Ok, thanks for your help, I appreciate it. You've pointed me in the right direction, and I'll post back to this bug after I've got it figured out. Cheers! | |
Just a quick update. I tried setting the NdmpBlockSize on the source NetApp's Client definition and the Max Block Size on a new Pool and Storage Device all to 245760 bytes (240KB) , but tripped over an issue with setting the NdmpBlockSize greater than 102400 bytes (100KB). I think this is because the NDMP Channel Image Stream Buffer Size is currently set to 100KB in src/ndmp/ndmagents.h, here: #ifndef NDM_N_IMAGE_STREAM_BUF #define NDM_N_IMAGE_STREAM_BUF (100*1024) #endif As a result, when you specify the NdmpBlockSize bigger than NDM_N_IMAGE_STREAM_BUF on a backup, the image stream buffer fills up waiting for the Bareos SD to start pulling it out of the buffer and writing it to disk, but the SD is waiting for a full NDMP block to be received, which will never happen, because the NDMP channel is out of buffer space to store more data, so the SD starts advertising a zero-byte TCP window to the client (the NetApp, in this case). I'm going to increase the stream buffer size in the header file to 256KB and recompile to see if that allows me to take a backup with 240KB NDMP, tape, and SMTAPE image block sizes. |
|
Increasing that NDM_M_IMAGE_STREAM_BUF value to 256KB did indeed allow a backup to proceed with NdmpBlockSize and the volume max block size set to 245760 bytes (note that I left the volume label block size at the new default value of 63KB), but restores are still failing almost immediately with the same EOF error I originally reported. However, I am now almost certain that this error is being generated by the Bareos SD. Here's the relevant debug output from the bareos-sd process (run at debug level 999) during a restore attempt: *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [5] >C NDMP4_MOVER_READ 11 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [6] offset=0 length=-1 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [5] R> NDMP4_MOVER_READ 11 (13) *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [6] error=NDMP4_NO_ERR *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [9] writing 32 ... *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [9] 80 00 00 1c 00 00 00 0d 54 af 66 78 00 00 00 01 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [9] 00 00 0a 06 00 00 00 0b 00 00 00 00 00 00 00 00 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [8] write=32 len=32 In the snippet above, the NetApp appears to tell the MOVER state machine within Bareos SD's NDMP TAPE service to start reading the backup from the file volume. *** sneakernet-sd: record.c:669-0 Block=0 Ver=2 size=245760 *** sneakernet-sd: record.c:677-0 Enter read_record_block: remlen=245736 data_len=0 rem=0 blkver=2 *** sneakernet-sd: record.c:741-0 rd_rec_blk() got FI=SOS_LABEL SessId=1 Strm=128 len=223 remlen=245724 data_len=0 *** sneakernet-sd: record.c:805-0 Rtn full rd_rec_blk FI=SOS_LABEL SessId=1 Strm=128 len=223 *** sneakernet-sd: read_record.c:285-0 read-OK. state_bits= blk=0 rem=0 file:block=0:245973 *** sneakernet-sd: read_record.c:296-0 recno=1 state_bits= blk=0 SI=1 ST=1420780864 FI=-4 *** sneakernet-sd: read_record.c:74-0 Begin Session Record: VolSessionId=1 VolSessionTime=1420780864 JobId=128 DataLen=223 Then, the debug shows that the SD successfully read the SOS (Start of Session) record from the volume, and it correctly identifies the backup as JobId 128. *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [5] C> NDMP4_NOTIFY_MOVER_PAUSED 14 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [6] reason=NDMP4_MOVER_PAUSE_EOF seek_position=0 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [9] writing 40 ... *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [9] 80 00 00 24 00 00 00 0e 54 af 66 78 00 00 00 00 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [9] 00 00 05 04 00 00 00 00 00 00 00 00 00 00 00 02 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [9] 00 00 00 00 00 00 00 00 *** sneakernet-sd: ndmp_tape.c:245-0 NDMP: [#C] [8] write=40 len=40 Right here is the problem: immediately after the SD successfully reads the SOS record, the SD notifies the NDMP client (the NetApp) that it has paused its MOVER because it reached an unexpected EOF reading from the volume, which is incorrect. As soon as the NetApp sees this EOF notification, it tells the Director to cancel the restore job. So, any idea why the SD is pausing the MOVER and blaming it on a non-existent EOF read? |
|
I guess you only find out if you put thing under a debugger the interesting entry points are bndmp_read_data_from_block() and bndmp_tape_read(). bndmp_tape_read() is the callback called by the NDMP code for reading tape data. I also added two patches that change the checking of the SMTAPE blocksize to be in the range defined by NetAPP and one other patch allocating the image stream buffer dynamically according to the used NDMP Job. We spend quite some time on modernizing this reference implementation of the NDMP protocol but its from 2001 and that shows in some cases. Although this has my personal interest there are currently a lot of things preempting any time I can spend on it other then replying and giving some hints. |
|
I think I got it figured out Friday night. In the main loop of bndmp_read_data_from_block(), when read_next_record_from_block() succeeds, the maskedStream field (an int32_t) of the read record is checked and matched against three possible values (defined in streams.h): * STREAM_UNIX_ATTRIBUTES [1] * STREAM_FILE_DATA [2] * STREAM_NDMP_SEPERATOR [999] STREAM_UNIX_ATTRIBUTES records are silently discarded (i.e. the read loop just continues) and STREAM_FILE_DATA records are passed to the DATA agent. STREAM_NDMP_SEPERATOR and all other record types (including SOS_LABEL records) are also discarded, but not silently. Rather, the data_length variable is set to zero, and the function returns true back to bndmp_tape_read() from which it was called. The data_length value in bndmp_read_data_from_block() is passed back to the done_count variable in bndmp_tape_read(), and if done_count equals zero (which it does, in this case), then bndmp_tape_read() returns an NDMP9_EOF_ERR to its caller, which kills the restore job as soon as it begins. To fix this behavior, I modified bndmp_read_data_from_block() to silently discard the SOS_LABEL record (with maskedStream of 128) in the same manner as it does STREAM_UNIX_ATTRIBUTES records. An EOF is still generated when the STREAM_NDMP_SEPERATOR record that appears after the SMTAPE file data stream is fully read, which causes Bareos to report that the restore failed, but it actually succeeds on the NetApp (i.e. the restore target volume is fully restored from the SMTAPE backup). So, some further modification is necessary to get Bareos to close out the restore job properly, but it appears to do what I need it to do now. |
|
Thanks for the analyses it seems this is a side effect of some things changing over time in the upper levels which means that the reader now needs to handle those label records itself (e.g. ignore them.) Although your workaround probably works I think the fix is a bit more subtle e.g. we need to skip any label record e.g. check if FileIndex is negative and skip those. I added a patch that does that and also prints an error now when it gets a datastream that it doesn't like which means we sooner see these kind of problems in the debugging output may it ever occur again. I added that patch maybe you can test with these 3 patches and we can take it from there. Regarding the STREAM_NDMP_SEPERATOR that is a valid EOF marker so I would be interested as to why the restore Job is actually getting a failed state. As each NDMP implementation is different this could be some strange side effect. I do remeber that when writing the code I also got some strange error on the Oracle NDMP implementation when it asked for a certain return status after the restore but when looking better at the NDMP rfc it was obvious that was a false implementation in the NDMP server software. |
|
Could you elaborate a bit more on what error you see when restoring after the fixes you made ? Then I can either close this report or look what needs to be fixed. |
|
bareos: master 051522be 2014-12-18 11:36 Marco van Wieringen Ported: N/A Details Diff |
Add checks for SMTAPE blocksizes. |
Affected Issues 0000374 |
|
mod - src/dird/ndmp_dma.c | Diff File | ||
bareos: master 694f1364 2015-01-10 11:46 Marco van Wieringen Ported: N/A Details Diff |
Make NDMP ImageStream buffer dynamically allocated size. Instead of hardcoding the Image Stream buffer to 100 K size it to the actual recordsize in use by the NDMP Job. This way we don't have to check if the recordsize specified is outside the allowable size. |
Affected Issues 0000374 |
|
mod - src/ndmp/ndma_image_stream.c | Diff File | ||
mod - src/ndmp/ndmagents.h | Diff File | ||
bareos: master 739c923d 2015-01-13 11:36 Marco van Wieringen Ported: N/A Details Diff |
Ignore tape labels in NDMP code. When we encounter some sort of tape label in the backup stream (e.g. FileIndex < 0) then we should just skip that data and not use the default handling which leads to an EOF. When we now encounter a unknown data stream we also print an error including the number of the unwanted datastream. |
Affected Issues 0000374 |
|
mod - src/stored/ndmp_tape.c | Diff File |
Date Modified | Username | Field | Change |
---|---|---|---|
2014-12-16 17:04 | gglynn | New Issue | |
2014-12-16 20:48 | mvwieringen | Note Added: 0001111 | |
2014-12-16 20:48 | mvwieringen | Status | new => feedback |
2014-12-16 21:50 | gglynn | File Added: ndmp-snooping-output-201412161549.txt | |
2014-12-16 21:56 | gglynn | Note Added: 0001112 | |
2014-12-16 21:56 | gglynn | Status | feedback => new |
2014-12-16 22:27 | mvwieringen | Note Added: 0001113 | |
2014-12-16 22:27 | mvwieringen | Status | new => feedback |
2014-12-16 23:20 | gglynn | File Added: sneakernet-dir.restore.3.bsr | |
2014-12-16 23:28 | gglynn | Note Added: 0001114 | |
2014-12-16 23:28 | gglynn | Status | feedback => new |
2014-12-17 19:46 | mvwieringen | Note Added: 0001118 | |
2014-12-17 19:46 | mvwieringen | Assigned To | => mvwieringen |
2014-12-17 19:46 | mvwieringen | Status | new => feedback |
2014-12-17 20:45 | gglynn | File Added: bareos-sd-debuglog-part1.txt | |
2014-12-17 20:45 | gglynn | File Added: bareos-sd-debuglog-part2.txt | |
2014-12-17 20:47 | gglynn | Note Added: 0001119 | |
2014-12-17 20:47 | gglynn | Status | feedback => assigned |
2014-12-17 20:47 | gglynn | Note Edited: 0001119 | |
2014-12-17 20:54 | mvwieringen | Note Added: 0001120 | |
2014-12-17 20:54 | mvwieringen | Note Added: 0001121 | |
2014-12-17 20:54 | mvwieringen | Status | assigned => feedback |
2014-12-17 21:10 | gglynn | File Added: bareos-dir-debuglog.txt | |
2014-12-17 21:12 | gglynn | Note Added: 0001122 | |
2014-12-17 21:12 | gglynn | Status | feedback => assigned |
2014-12-17 21:17 | gglynn | Note Added: 0001123 | |
2014-12-17 22:00 | mvwieringen | Note Added: 0001124 | |
2014-12-17 22:00 | mvwieringen | Status | assigned => feedback |
2014-12-17 22:12 | gglynn | Note Added: 0001125 | |
2014-12-17 22:12 | gglynn | Status | feedback => assigned |
2014-12-17 22:13 | gglynn | Note Edited: 0001125 | |
2014-12-17 22:24 | mvwieringen | Note Added: 0001126 | |
2014-12-17 22:25 | mvwieringen | Status | assigned => feedback |
2014-12-18 03:37 | gglynn | Note Added: 0001128 | |
2014-12-18 03:37 | gglynn | Status | feedback => assigned |
2014-12-18 03:43 | gglynn | Note Edited: 0001128 | |
2014-12-18 04:17 | gglynn | File Added: bareos-dir-vol2restore-201412172200-debuglog.txt | |
2014-12-18 04:17 | gglynn | File Added: bareos-sd-vol2restore-201412172200-debuglog-part1.txt | |
2014-12-18 04:17 | gglynn | File Added: bareos-sd-vol2restore-201412172200-debuglog-part2.txt | |
2014-12-18 04:18 | gglynn | Note Added: 0001129 | |
2014-12-18 09:49 | mvwieringen | File Deleted: bareos-sd-vol2restore-201412172200-debuglog-part2.txt | |
2014-12-18 09:49 | mvwieringen | File Deleted: bareos-sd-vol2restore-201412172200-debuglog-part1.txt | |
2014-12-18 09:49 | mvwieringen | File Deleted: bareos-dir-vol2restore-201412172200-debuglog.txt | |
2014-12-18 09:49 | mvwieringen | File Deleted: bareos-dir-debuglog.txt | |
2014-12-18 09:49 | mvwieringen | File Deleted: bareos-sd-debuglog-part2.txt | |
2014-12-18 09:50 | mvwieringen | File Deleted: bareos-sd-debuglog-part1.txt | |
2014-12-18 09:50 | mvwieringen | File Deleted: sneakernet-dir.restore.3.bsr | |
2014-12-18 09:56 | mvwieringen | Note Added: 0001131 | |
2014-12-18 09:56 | mvwieringen | Status | assigned => feedback |
2014-12-18 14:32 | mvwieringen | Relationship added | related to 0000345 |
2014-12-18 23:08 | gglynn | Note Added: 0001134 | |
2014-12-18 23:08 | gglynn | Status | feedback => assigned |
2014-12-18 23:09 | gglynn | Note Edited: 0001134 | |
2014-12-19 10:15 | mvwieringen | Note Added: 0001135 | |
2014-12-19 10:15 | mvwieringen | Status | assigned => feedback |
2014-12-19 16:08 | gglynn | Note Added: 0001136 | |
2014-12-19 16:08 | gglynn | Status | feedback => assigned |
2014-12-19 16:36 | mvwieringen | Note Added: 0001137 | |
2014-12-19 16:37 | mvwieringen | Note Added: 0001138 | |
2014-12-19 16:37 | mvwieringen | Status | assigned => feedback |
2014-12-19 16:50 | gglynn | Note Added: 0001140 | |
2014-12-19 16:50 | gglynn | Status | feedback => assigned |
2015-01-09 05:15 | gglynn | Note Added: 0001175 | |
2015-01-09 08:21 | gglynn | Note Added: 0001176 | |
2015-01-10 11:01 | mvwieringen | File Added: 0001-Add-checks-for-SMTAPE-blocksizes.patch | |
2015-01-10 11:01 | mvwieringen | File Added: 0002-Make-NDMP-ImageStream-buffer-dynamically-allocated-s.patch | |
2015-01-10 11:07 | mvwieringen | Note Added: 0001177 | |
2015-01-10 11:07 | mvwieringen | Status | assigned => feedback |
2015-01-10 12:19 | mvwieringen | File Deleted: 0002-Make-NDMP-ImageStream-buffer-dynamically-allocated-s.patch | |
2015-01-10 12:20 | mvwieringen | File Added: 0002-Make-NDMP-ImageStream-buffer-dynamically-allocated-s.patch | |
2015-01-12 16:15 | gglynn | Note Added: 0001179 | |
2015-01-12 16:15 | gglynn | Status | feedback => assigned |
2015-01-12 18:07 | gglynn | Note Edited: 0001179 | |
2015-01-13 10:47 | mvwieringen | Note Added: 0001180 | |
2015-01-13 10:48 | mvwieringen | File Added: 0003-Ignore-tape-labels-in-NDMP-code.patch | |
2015-01-13 10:50 | mvwieringen | Status | assigned => feedback |
2015-01-14 20:34 | mvwieringen | File Deleted: 0003-Ignore-tape-labels-in-NDMP-code.patch | |
2015-01-14 20:35 | mvwieringen | File Added: 0003-Ignore-tape-labels-in-NDMP-code.patch | |
2015-01-28 12:44 | mvwieringen | Relationship added | child of 0000393 |
2015-02-02 13:22 | mvwieringen | Changeset attached | => bareos master 739c923d |
2015-02-02 13:23 | mvwieringen | Changeset attached | => bareos master 694f1364 |
2015-02-02 13:25 | mvwieringen | Changeset attached | => bareos master 051522be |
2015-02-02 13:25 | mvwieringen | File Deleted: 0003-Ignore-tape-labels-in-NDMP-code.patch | |
2015-02-02 13:26 | mvwieringen | File Deleted: 0002-Make-NDMP-ImageStream-buffer-dynamically-allocated-s.patch | |
2015-02-02 13:26 | mvwieringen | File Deleted: 0001-Add-checks-for-SMTAPE-blocksizes.patch | |
2015-02-02 13:27 | mvwieringen | Note Added: 0001250 | |
2015-03-24 11:25 | mvwieringen | Assigned To | mvwieringen => |
2015-03-24 11:25 | mvwieringen | Status | feedback => closed |
2015-03-24 11:25 | mvwieringen | Fixed in Version | => 14.2.3 |