Skip Broken Confluence Attachments During Migration

A single broken Confluence attachment can fail an entire page migration. This post shows how to spot broken attachments in the logs and how to skip them using the new Ignore Broken Attachments setting.

Sometimes Confluence attachments are broken and trigger connection resets or other errors during download. Every once in a while, a customer runs into such broken attachments. The root cause is often corrupted state in the Confluence database or a corrupted attachment file.

By default, a single broken attachment fails the entire page. A new setting lets you skip those and keep migrating.

How to Spot Broken Attachments?

When an attachment download fails, the WikiTraccs common log files show errors like these:

[ERR] [https://wiki.contoso.com] Exception while downloading
  'https://wiki.contoso.com/download/attachments/12345678/broken-file.zip':
  System.IO.IOException: Unable to read data from the transport connection:
  An existing connection was forcibly closed by the remote host.
   ---> System.Net.Sockets.SocketException (10054): An existing connection
        was forcibly closed by the remote host.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs...
   at System.Net.Security.SslStream.<FillBufferAsync>...
   at System.Net.Http.HttpConnection.SendAsync(...)

This then cascades into:

[ERR] Cannot store or apply content. Additional information: job result
  details: 'Storing the page snapshot returned false, something is wrong',
  status code: InternalServerError.

The page is then either missing in SharePoint or created empty with a Failed Transformations count of 9001 (a marker value).

You can also confirm that an attachment is broken by downloading it manually in the browser. The download will often fail, or succeed only after several retries, which clearly indicates a problem with the file.
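
If the log files are large, a small script can collect the failing URLs for you. The following is a minimal Python sketch, not part of WikiTraccs. It assumes the log entries look like the excerpt above, with the URL in single quotes after "Exception while downloading". The function name is made up for illustration.

```python
import re

# Matches "Exception while downloading '<url>'" as shown in the excerpt.
# \s* tolerates a line break between the message and the quoted URL.
DOWNLOAD_ERROR = re.compile(r"Exception while downloading\s*'([^']+)'")


def find_broken_attachment_urls(log_text: str) -> list[str]:
    """Return the distinct failing attachment URLs in first-seen order."""
    seen: dict[str, None] = {}  # dict keeps insertion order and removes duplicates
    for match in DOWNLOAD_ERROR.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

To use it, read a log file into a string (for example with `pathlib.Path(...).read_text()`) and pass that string to the function. Each returned URL is a candidate for the manual browser check described above.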

How to Skip Over Broken Attachments?

Note: This setting is available as of WikiTraccs v1.32.14

In WikiTraccs.GUI, in the menu bar, click Settings -> Configure Transformation. Then go to the Misc tab and check Ignore broken Confluence attachments.

With this setting active:

  • WikiTraccs catches attachment download errors and skips the broken attachment instead of failing the whole page
  • The page is still created in SharePoint - with all attachments that did download successfully
  • The skipped attachment is logged as an error in the log file:
    [ERR] Failed to download attachment 'broken-file.zip' for page 12345678;
      URL was 'https://wiki.contoso.com/download/attachments/...';
      skipping attachment because IgnoreBrokenAttachments is enabled
    
  • The Failed Transformations count on the SharePoint page reflects the skipped attachment, so you can filter the Site Pages library to find affected pages
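
To get an overview of what was skipped, you can group the skipped-attachment log entries by page. This is a minimal Python sketch under the assumption that the entries match the example in the list above ("Failed to download attachment '<name>' for page <id>;"). The function name is hypothetical, and file names that contain a single quote would not be matched correctly.

```python
import re
from collections import defaultdict

# Pattern taken from the example log entry; the real format may differ.
SKIPPED = re.compile(r"Failed to download attachment '([^']+)' for page (\d+);")


def skipped_attachments_by_page(log_text: str) -> dict[str, list[str]]:
    """Map each Confluence page ID to the attachment names skipped on it."""
    pages: dict[str, list[str]] = defaultdict(list)
    for name, page_id in SKIPPED.findall(log_text):
        if name not in pages[page_id]:  # ignore repeated entries for the same file
            pages[page_id].append(name)
    return dict(pages)
```

The result can serve as a checklist of files to repair or re-upload in Confluence after the migration.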

Workarounds

If you don’t want WikiTraccs to skip over broken attachments, there are ways to manually work around the issue.

You can delete the offending attachments in Confluence, which unblocks page migration. This is an option only if losing those files is of no concern.

An alternative approach is re-uploading the attachment. If you can download the file in the browser (and that might not be the case), download the attachment and upload it again. Sometimes this fixes the underlying issue in Confluence.

Wrap

This setting trades completeness for progress - pages with broken attachments get migrated nevertheless. Review the log file and the Failed Transformations column in Site Pages to identify which attachments were skipped.