How to migrate Confluence Pages using CQL Query Selectors
NoteCQL query selectors have been introduced in WikiTraccs release 1.8.0.
WikiTraccs allows selecting source pages for migration using CQL queries.
Migrating spaces vs. migrating via CQL query
NoteRefer to the Atlassian documentation about the CQL query syntax: Advanced Searching using CQL.
Before release 1.8.0 WikiTraccs only supported the configuration of entire Confluence spaces for migration and for link resolution. This meant that all pages of a given source space would be migrated to the configured target site.
Feedback from clients let to the introduction of an additional, more flexible way to select source pages: pages can now also be selected via CQL query. This allows to select specific pages for migration.
For WikiTraccs the technical difference is minimal, from a page selection perspective.
For you, the difference is also minimal, from a configuration perspective. The Space Inventory list is still used to configure which pages should be migrated to which target. Simply write your CQL query to the WT_In_SpaceKey field.
The following screenshot shows how different CQL queries are used to select pages by their label, migrating each label to a different target site:
Space Inventory list showing CQL query selectors for source page selection.
Note: the field is still named WT_In_SpaceKey since initially only a space key was supported there.
Everything else from the space selector article applies as well, so please refer to this article: How to map Confluence Spaces to SharePoint Sites. The target site can be configured. If there is no target site set, the default target site will be chosen.
There are some consequences though, when using CQL queries.
The consequences of using CQL queries source selectors
Note about Page RestrictionsCQL queries will only surface restricted pages when the logged-in user is explicitly listed in the page restrictions (either directly or as member of a group). Confluence admin accounts that are allowed to access all pages (even restricted ones) are often not part of page restriction configurations and thus restricted pages will be missing in the CQL query result. This is Confluence out-of-the-box behavior.
Here’s the list of topics that can get more complicated when dealing with CQL queries:
- Restricted Pages: see note above
- Link Resolution: Confluence pages linking to other spaces, pages, or attachments require more time to migrate, and put more load on Confluence
- Query Result Size: using CQL queries you can select a large amount of pages (potentially all)
- Duplicate Pages: one page can be selected by multiple CQL queries, leading to pages being migrated multiple times
If you want to test a CQL query you can do so in the browser as follows:
- open a browser and log in to Confluence
- in the same browser window, enter the following in the address bar:
https://<confluencebaseurl>/rest/api/content/search?start=0&limit=200&cql=type="page"(note: make sure to replace
<confluencebaseurl>with the base URL of your Confluence); press Return
The browser window should now show a wall of text. This is the result for the CQL query
type="page" which selects “all pages”.
Link resolution is more difficult
For each CQL query selector you add to the Space Inventory WikiTraccs has to issue one additional request to Confluence for each link it needs to resolve.
Assume you are migrating 10,000 pages, with 2 links to other pages on each of those 10,000 pages. Further assume you configured 100 CQL query selectors in the Space Inventory list. That means that 2*100 callbacks to Confluence would need to be issued for each of those 10,000 pages, amounting to 2,000,000 calls overall during the migration of those 10,000 pages.
Why is that?
To transform links from Confluence to SharePoint, WikiTraccs needs to know which target site a page will be migrated to. But how would WikiTraccs know which CQL query contains which page? To learn this WikiTraccs asks Confluence for each CQL query if a given page is included.
- page A links to page B with ID 2000
- WikiTraccs needs to find out which SharePoint site page B will be migrated to, to create the proper SharePoint link
- there are two CQL queries configured:
label="one", mapped to target site
label="two", mapped to target site
- WikiTraccs creates a modified CQL query
(label="one") AND (id=2000)to check if page B is covered by CQL query
- there are no results - page B is not covered
- WikiTraccs creates a second modified CQL query
(label="two") AND (id=2000)to check if page B is covered by CQL query
- there is one result - page B is covered!
- WikiTraccs now knows that page B will be migrated to site
- thus the correct link to page B is something like
Note: WikiTraccs does not have to do this when only performing space-based migrations (without any CQL queries), since the target site can easily be looked up via each page’s space key in the Space Inventory list.
Duplicate pages can be created
You can write CQL queries might include overlapping results.
Take for example the CQL queries
If Confluence pages only ever have one of the labels
two, those selectors choose two disjuct sets of pages. Which is good.
But what about pages having both labels
two? Those pages will be chosen by both CQL queries, migrating each of those pages two times.
The consequences are:
- duplicate content in SharePoint as multiple copies of a page are present
- fuzziness when it comes to linking to those pages as other SharePoint pages can only link to one of the duplicates; which one is not defined