How to migrate Confluence Pages using CQL Query Selectors

This article describes how to use CQL queries to select source pages and how to configure which CQL query selector is migrated to which SharePoint site.

WikiTraccs allows selecting source pages for migration using CQL queries.

Migrating spaces vs. migrating via CQL query

Before release 1.8.0 WikiTraccs only supported the configuration of entire Confluence spaces for migration and for link resolution. This meant that all pages of a given source space would be migrated to the configured target site.

Feedback from clients let to the introduction of an additional, more flexible way to select source pages: pages can now also be selected via CQL query. This allows to select specific pages for migration.

For WikiTraccs the technical difference is minimal, from a page selection perspective.

For you, the difference is also minimal, from a configuration perspective. The Space Inventory list is still used to configure which pages should be migrated to which target. Simply write your CQL query to the WT_In_SpaceKey field.

The following screenshot shows how different CQL queries are used to select pages by their label, migrating each label to a different target site:

Space Inventory list showing CQL query selectors for source page selection.

Note: the field is still named WT_In_SpaceKey since initially only a space key was supported there.

Everything else from the space selector article applies as well, so please refer to this article: How to map Confluence Spaces to SharePoint Sites. The target site can be configured. If there is no target site set, the default target site will be chosen.

There are some consequences though, when using CQL queries.

The consequences of using CQL queries source selectors

Here’s the list of topics that can get more complicated when dealing with CQL queries:

  • Restricted Pages: see note above
  • Link Resolution: Confluence pages linking to other spaces, pages, or attachments require more time to migrate, and put more load on Confluence
  • Query Result Size: using CQL queries you can select a large amount of pages (potentially all)
  • Duplicate Pages: one page can be selected by multiple CQL queries, leading to pages being migrated multiple times

For each CQL query selector you add to the Space Inventory WikiTraccs has to issue one additional request to Confluence for each link it needs to resolve.

Assume you are migrating 10,000 pages, with 2 links to other pages on each of those 10,000 pages. Further assume you configured 100 CQL query selectors in the Space Inventory list. That means that 2*100 callbacks to Confluence would need to be issued for each of those 10,000 pages, amounting to 2,000,000 calls overall during the migration of those 10,000 pages.

Why is that?

To transform links from Confluence to SharePoint, WikiTraccs needs to know which target site a page will be migrated to. But how would WikiTraccs know which CQL query contains which page? To learn this WikiTraccs asks Confluence for each CQL query if a given page is included.

Example:

  • page A links to page B with ID 2000
  • WikiTraccs needs to find out which SharePoint site page B will be migrated to, to create the proper SharePoint link
  • there are two CQL queries configured: label="one", mapped to target site https://contoso.sharepoint.com/sites/one, and label="two", mapped to target site https://contoso.sharepoint.com/sites/two
  • WikiTraccs creates a modified CQL query (label="one") AND (id=2000) to check if page B is covered by CQL query label="one"
    • there are no results - page B is not covered
  • WikiTraccs creates a second modified CQL query (label="two") AND (id=2000) to check if page B is covered by CQL query label="two"
    • there is one result - page B is covered!
  • WikiTraccs now knows that page B will be migrated to site https://contoso.sharepoint.com/sites/two
  • thus the correct link to page B is something like https://contoso.sharepoint.com/sites/two/SitePages/SPC-Page-B-2000.aspx

Note: WikiTraccs does not have to do this when only performing space-based migrations (without any CQL queries), since the target site can easily be looked up via each page’s space key in the Space Inventory list.

Duplicate pages can be created

You can write CQL queries might include overlapping results.

Take for example the CQL queries label="one" and label="two".

If Confluence pages only ever have one of the labels one or two, those selectors choose two disjuct sets of pages. Which is good.

But what about pages having both labels one and two? Those pages will be chosen by both CQL queries, migrating each of those pages two times.

The consequences are:

  • duplicate content in SharePoint as multiple copies of a page are present
  • fuzziness when it comes to linking to those pages as other SharePoint pages can only link to one of the duplicates; which one is not defined
Last modified March 5, 2024