How to migrate Confluence Content using CQL Query Selectors

This article describes how to use CQL queries to select source pages and how to configure which CQL query selector is migrated to which SharePoint site.

Note

CQL query selectors have been introduced in WikiTraccs release 1.8.0.

WikiTraccs allows selecting source pages for migration using CQL queries.

Migrating spaces vs. migrating via CQL query

Note

Refer to the Atlassian documentation about the CQL query syntax: Advanced Searching using CQL.

Before release 1.8.0 WikiTraccs only supported the configuration of entire Confluence spaces for migration and for link resolution. This meant that all pages of a given source space would be migrated to the configured target site.

Feedback from clients let to the introduction of an additional, more flexible way to select source pages: pages can now also be selected via CQL query. This allows to select specific pages for migration.

For WikiTraccs the technical difference is minimal, from a page selection perspective.

For you, the difference is also minimal, from a configuration perspective. The Space Inventory list is still used to configure which pages should be migrated to which target. Simply write your CQL query to the WT_In_SpaceKey field.

The following screenshot shows how different CQL queries are used to select pages by their label, migrating each label to a different target site:

Space Inventory list showing CQL query selectors for source page selection.

Note: the field is still named WT_In_SpaceKey since initially only a space key was supported there.

Everything else from the space selector article applies as well, so please refer to this article: How to map Confluence Spaces to SharePoint Sites. The target site can be configured. If there is no target site set, the default target site will be chosen.

There are some consequences though, when using CQL queries.

The consequences of using CQL queries source selectors

Note about Page Restrictions

CQL queries will only surface restricted pages when the logged-in user is explicitly listed in the page restrictions (either directly or as member of a group). Confluence admin accounts that are allowed to access all pages (even restricted ones) are often not part of page restriction configurations and thus restricted pages will be missing in the CQL query result. This is Confluence out-of-the-box behavior.

Here’s the list of topics that can get more complicated when dealing with CQL queries:

Restricted Pages: see note above
Link Resolution: Confluence pages linking to other spaces, pages, or attachments require more time to migrate, and put more load on Confluence
Query Result Size: using CQL queries you can select a large amount of pages (potentially all)
Duplicate Pages: one page can be selected by multiple CQL queries, leading to pages being migrated multiple times

Tip

If you want to test a CQL query you can do so in the browser as follows:

open a browser and log in to Confluence
in the same browser window, enter the following in the address bar: https://<confluencebaseurl>/rest/api/content/search?start=0&limit=200&cql=type="page" (note: make sure to replace <confluencebaseurl> with the base URL of your Confluence); press Return

The browser window should now show a wall of text. This is the result for the CQL query type="page" which selects “all pages”.

Link resolution is more difficult

For each CQL query selector you add to the Space Inventory WikiTraccs has to issue one additional request to Confluence for each link it needs to resolve.

Assume you are migrating 10,000 pages, with 2 links to other pages on each of those 10,000 pages. Further assume you configured 100 CQL query selectors in the Space Inventory list. That means that 2*100 callbacks to Confluence would need to be issued for each of those 10,000 pages, amounting to 2,000,000 calls overall during the migration of those 10,000 pages.

Why is that?

To transform links from Confluence to SharePoint, WikiTraccs needs to know which target site a page will be migrated to. But how would WikiTraccs know which CQL query contains which page? To learn this WikiTraccs asks Confluence for each CQL query if a given page is included.

Example:

page A links to page B with ID 2000
WikiTraccs needs to find out which SharePoint site page B will be migrated to, to create the proper SharePoint link
there are two CQL queries configured: label="one", mapped to target site https://contoso.sharepoint.com/sites/one, and label="two", mapped to target site https://contoso.sharepoint.com/sites/two
WikiTraccs creates a modified CQL query (label="one") AND (id=2000) to check if page B is covered by CQL query label="one"
- there are no results - page B is not covered
WikiTraccs creates a second modified CQL query (label="two") AND (id=2000) to check if page B is covered by CQL query label="two"
- there is one result - page B is covered!
WikiTraccs now knows that page B will be migrated to site https://contoso.sharepoint.com/sites/two
thus the correct link to page B is something like https://contoso.sharepoint.com/sites/two/SitePages/SPC-Page-B-2000.aspx

Note: WikiTraccs does not have to do this when only performing space-based migrations (without any CQL queries), since the target site can easily be looked up via each page’s space key in the Space Inventory list.

Duplicate pages can be created

You can write CQL queries might include overlapping results.

Take for example the CQL queries label="one" and label="two".

If Confluence pages only ever have one of the labels one or two, those selectors choose two disjuct sets of pages. Which is good.

But what about pages having both labels one and two? Those pages will be chosen by both CQL queries, migrating each of those pages two times.

The consequences are:

duplicate content in SharePoint as multiple copies of a page are present
fuzziness when it comes to linking to those pages as other SharePoint pages can only link to one of the duplicates; which one is not defined

Link Transformation

CQL query selectors are relevant for link transformation.

When WikiTraccs transforms a cross-space link from PAGE_SOURCE to PAGE_TARGET, all selectors in the Space Inventory will be checked if they contain PAGE_TARGET.

Having found a selector that contains PAGE_TARGET, WikiTraccs uses this selector’s WT_Setting_TargetSiteRootUrl value to create the link.

Be sure to fill those WT_Setting_TargetSiteRootUrl columns for all selectors, not only the ones to be migrated.

Last modified June 20, 2024