Updating previously migrated pages

This article explains how to update older migrated pages where the original Confluence pages have since been changed.

Starting with release v1.7.0, WikiTraccs supports updating migrated pages.

Why update older migrated pages?

Updating older pages becomes relevant when Confluence pages are not protected from changes during or after the migration to SharePoint.

When users edit a Confluence page that has already been migrated, the SharePoint page becomes outdated: it no longer reflects the current state of the Confluence page.

That’s where the update mode, sometimes also called delta migration, comes in.

How to detect pages that have been changed since the last migration?

You simply start a migration to learn about the update state of already migrated pages.

Every time a migration starts, WikiTraccs checks whether the SharePoint target site already contains pages from a previous migration. If such pages exist, it compares each page’s modification date in SharePoint with the one in Confluence. The result is written to progress log files.
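To illustrate the idea of this comparison, here is a minimal Python sketch. It is not WikiTraccs’ actual implementation: the function name `update_state` is hypothetical, and the rule "the Confluence date is later than the SharePoint date" is an assumption about how a page ends up being outdated. The state names and the timestamp format are borrowed from the progress log files described in this article.

```python
from datetime import datetime


def update_state(confluence_modified: str, sharepoint_modified: str) -> str:
    """Return 'needsupdate' if the Confluence page is newer, else 'uptodate'.

    Assumption: a page counts as outdated when its Confluence modification
    date is later than the date recorded for the migrated SharePoint page.
    """
    confluence = datetime.fromisoformat(confluence_modified)
    sharepoint = datetime.fromisoformat(sharepoint_modified)
    return "needsupdate" if confluence > sharepoint else "uptodate"


print(update_state("2023-08-31T10:16:22", "2023-07-10T13:26:24"))
```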

Have a look at the documentation about progress log files: Monitoring Confluence to SharePoint Migration Progress > Using progress log files to get insights.

One type of progress log file contains information about the update state of already migrated pages. These are the files with 25-update-state-of-migrated-pages in their name.

Here’s a screenshot showing progress log files for 3 migrated spaces (each having 3 progress log files):

Those spaces had been migrated before. When another migration was started, WikiTraccs checked for outdated pages and found some in two of the spaces.

You can quickly see how many outdated pages there are because WikiTraccs writes this information to the file name of the progress log file, like marker-x-needupdate-y-other.

In the screenshot above, one space has 2 pages that need an update and another has 3. The third space is up-to-date, which you can tell from the marker part missing from its file name.
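If there are many spaces, a small script can list the relevant files for you. The following Python sketch is only an illustration: the function name `find_marker_logs` is hypothetical, and it assumes that the progress log files are located somewhere below a folder named logs. It relies only on the two file name parts mentioned above.

```python
from pathlib import Path


def find_marker_logs(logs_dir: str) -> list[Path]:
    """List update-state progress logs whose file name contains 'marker'."""
    return sorted(
        p
        for p in Path(logs_dir).rglob("*25-update-state-of-migrated-pages*")
        if p.is_file() and "marker" in p.name
    )


if __name__ == "__main__":
    # Assumption: the script is started from the folder that contains "logs".
    for log in find_marker_logs("logs"):
        print(log.name)
```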

How to tell WikiTraccs to update outdated pages?

You tell WikiTraccs which pages it should update by marking those pages in the progress log files, and copying those files to the WikiTraccs.GUI\input folder.

Here are the steps:

  1. locate a progress log file with 25-update-state-of-migrated-pages and marker in its file name
    • this progress log file contains information about outdated pages
  2. open the progress log file in a text editor - this file contains a header surrounded by +++ (ignore this), followed by multiple lines, each one representing a migrated page (you want to look at those)
  3. take note of the update state near the end of each line, like uptodate or needsupdate
    • a line might look like this: SPACEKEY Page 123456789 Page Title /display/SPACEKEY/Page+Title 2023-08-31T10:16:22 2023-07-10T13:26:24 needsupdate
    • the values in this line (like SPACEKEY, page ID 123456789, etc.) are separated by tab characters (as produced by the “tab key”)
    • note: you might know CSV files, where the values in each line are separated by commas or semicolons; here the tab character is used instead
  4. mark pages you want WikiTraccs to update by adding an x to the end of the page’s line
    • make sure to separate the update state value (like needsupdate) and the x you add with exactly one tab character; note: this tab character might already be there
    • the line then should look like this: SPACEKEY Page 123456789 Page Title /display/SPACEKEY/Page+Title 2023-08-31T10:16:22 2023-07-10T13:26:24 needsupdate x - note the x at the end that marks the page for update
  5. save the progress log file when you are finished marking pages for update
  6. now copy this modified progress log file to the input folder, which is located next to the logs folder; note: both of those folders are in the same location as WikiTraccs.GUI.exe, which you used to run WikiTraccs
  7. run WikiTraccs.GUI
  8. start a migration as usual, by selecting the Start transformation button in WikiTraccs.GUI
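Steps 2 to 6 above can also be scripted when many pages need marking. The following Python sketch is an illustration under stated assumptions, not an official WikiTraccs tool: the function names `mark_for_update` and `mark_and_copy` are hypothetical, the file is assumed to be UTF-8 encoded, and the update state is assumed to be the last tab-separated value of a page line. Header lines and lines with any other state are left untouched.

```python
from pathlib import Path


def mark_for_update(line: str) -> str:
    """Append a tab-separated 'x' to a page line whose state is 'needsupdate'.

    Header lines, already marked lines and lines with any other state
    are returned unchanged.
    """
    body = line.rstrip("\r\n")
    newline = line[len(body):]
    # A trailing tab might already be there; drop it to end up with exactly one.
    fields = body.rstrip("\t").split("\t")
    if fields[-1] != "needsupdate":
        return line
    return "\t".join(fields) + "\tx" + newline


def mark_and_copy(log_file: str, input_dir: str) -> int:
    """Write a marked copy of log_file to input_dir; return the number of marked pages.

    The original progress log file is not modified.
    """
    # newline="" keeps the original line endings intact.
    with open(log_file, encoding="utf-8", newline="") as f:
        lines = f.readlines()
    marked = [mark_for_update(line) for line in lines]
    target = Path(input_dir) / Path(log_file).name
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "w", encoding="utf-8", newline="") as f:
        f.writelines(marked)
    return sum(1 for before, after in zip(lines, marked) if before != after)
```

Calling `mark_and_copy` with the path of a progress log file and the path of the input folder marks every needsupdate page in one go. To update only some of the pages, edit the copy in the input folder afterwards and remove the x from the lines you want to skip.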

WikiTraccs looks at the input folder when a migration starts. It automatically processes all progress log files it finds in there. Pages marked for update will now be migrated again, overwriting corresponding existing pages that are present in SharePoint.

How does the delta migration differ from a normal migration?

There are certain differences when WikiTraccs finds and processes progress log files from the input folder.

If there are pages marked for update:

  • WikiTraccs will only migrate those pages marked for update and skip any spaces that have been marked for migration in the Space Inventory
  • WikiTraccs will not remove any files from the input folder
    • you have to remove those files manually, otherwise WikiTraccs will migrate the pages marked in them again and again, for each migration you start
  • WikiTraccs will not write progress log files for updated pages
    • to get updated progress log files, remove any files from the input folder (to end delta migration mode) and start a regular migration; new progress log files will then be created that should reflect that pages are now up-to-date
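Since the input folder has to be emptied manually, you might want to script this cleanup. Here is a minimal Python sketch: the function name `clear_input_folder` and the idea of moving the files to an archive folder (instead of deleting them) are assumptions for illustration, not WikiTraccs features.

```python
import shutil
from pathlib import Path


def clear_input_folder(input_dir: str, archive_dir: str) -> list[str]:
    """Move every file from input_dir to archive_dir; return the moved file names.

    Files of the same name that already exist in archive_dir are overwritten.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(Path(input_dir).iterdir()):
        if f.is_file():
            shutil.move(str(f), str(archive / f.name))
            moved.append(f.name)
    return moved
```

Moving the files instead of deleting them keeps a record of which pages were marked for update in each delta migration run.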

Notes about limitations of delta migrations

There are some points to be aware of when it comes to delta migrations.

Modification detection

WikiTraccs can detect changes to Confluence pages, but not to attachments. For example, when an attachment is added to a Confluence page, this does not change the modification date of the page, so WikiTraccs still sees this page as up-to-date.

This means that Confluence pages where only attachments have changed since the migration are reported as uptodate in the progress log file.

Note that you can still mark those uptodate pages for update, if you want to force updating pages that WikiTraccs didn’t detect as changed.
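Forcing an update for specific pages could be scripted as in the following Python sketch. It is an illustration only: the function name `force_mark` is hypothetical, and it assumes that the page ID is the third tab-separated value of a page line, as in the example line shown in the steps above.

```python
def force_mark(line: str, page_ids: set[str]) -> str:
    """Append a tab-separated 'x' to the line of a listed page, whatever its state.

    Assumption: the page ID is the third tab-separated value of a page line.
    Header lines, already marked lines and other pages are returned unchanged.
    """
    body = line.rstrip("\r\n")
    newline = line[len(body):]
    fields = body.rstrip("\t").split("\t")
    if len(fields) < 3 or fields[2] not in page_ids or fields[-1] == "x":
        return line
    return "\t".join(fields) + "\tx" + newline


sample = "SPACEKEY\tPage\t123456789\tPage Title\t/display/SPACEKEY/Page+Title\t2023-08-31T10:16:22\t2023-08-31T10:16:22\tuptodate\n"
print(repr(force_mark(sample, {"123456789"})))
```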

WikiTraccs derives the file names of SharePoint modern pages from Confluence page titles. That means that changing a page’s title in Confluence can cause the following:

  • duplicate pages being created in SharePoint, since the updated page gets a new file name; WikiTraccs does not rename or remove the existing page (yet)
  • page links can break in SharePoint, when other SharePoint pages link to a page by a name that now changed due to the page being updated

Note that this only applies when page titles change in Confluence between an initial migration and a subsequent delta migration, so such changes should be kept to a minimum.

Changes of SharePoint pages

Changed SharePoint pages will be overwritten when running a delta migration.

Keep in mind that marking a page for update will forcefully overwrite the target SharePoint page, even if this SharePoint page was modified since the initial migration.

Performance

A delta migration is less efficient than the initial migration. It might make more requests to Confluence, whereas the initial migration uses bulk operations.

Please get in touch if you encounter any issues, or have suggestions that would make your life easier.

Last modified March 6, 2024