How to migrate Confluence Content using Content ID Selectors

This article describes how to use a list of Confluence content IDs to select source content and how to configure which content IDs are migrated to which SharePoint site.

WikiTraccs allows selecting Confluence source content for migration using Confluence content IDs.

Why use a content ID selector?

If you have complex requirements with regard to which Confluence content is migrated to which SharePoint site, content ID selectors allow for full flexibility.

You’ll prepare a content ID selector for WikiTraccs that looks like this:

123456789;#page,234567891;#page,345678912;#page,4567891234;#blogpost,5678912345;#blogpost

Above selector contains the content IDs of 3 pages and 2 blogs.

How you end up with the list of IDs to migrate is entirely up to you.

You can run a complex database query in an on-premises Confluence environment. Or you can use Excel-based ID filtering on space content reports for Confluence Cloud.

WikiTraccs’ Content ID Format

WikiTraccs needs each content ID to follow this pattern: ID;#TYPE

  • ID is the actual Confluence content ID, like 123456789
  • ;# is a delimiter to separate the ID from the type
  • TYPE is the Confluence content type like page and blogpost (note: more content types will be supported in the future)

Multiple content IDs are delimited by comma, like ID;#TYPE,ID;#TYPE,ID;#TYPE.

Where to configure content ID selectors?

For each new content ID selector you want to create, add a new item in the Space Inventory.

Use the WT_Setting_ContentSelectorValue field to specify the content ID selector value.

Set the WT_In_CfSiteId field to the Confluence base address (like https://contoso.atlassian.net/wiki or https://wiki.contoso.com). Look at the other entries in the Space Inventory and copy the value from those entries that WikiTraccs created for spaces.

Here’s what the Space Inventory list might look like after adding two content ID selectors:

WT_In_CfSiteIdWt_Setting_RequestTransformationWT_Setting_TargetSiteRootUrlWT_Setting_ContentSelectorValue
https://contoso.com/wikixhttps://contoso.sharepoint.com/sites/ProjectAlpha123456789;#page,234567891;#page,345678912;#page,4567891234;#blogpost,5678912345;#blogpost
https://contoso.com/wikixhttps://contoso.sharepoint.com/sites/ProjectBeta101;#page,201;#blogpost

Note: Although content ID selectors are added to the “Space” Inventory, a space key or space ID doesn’t have to be set.

Migrating via CQL query vs. migrating via content IDs

CQL queries allow filtering content by IDs, just like content ID selectors.

But CQL queries in Confluence have one big limitation with regard to restrictions. Restricted contents might be missing from the query result, even when using a migration account that has admin permissions. Have a look at the CQL selector article for details: How to migrate Confluence Pages using CQL Query Selectors.

A WikiTraccs content ID selector does not have this restriction.

Why is that?

When handling a content ID selector WikiTraccs will also try to get chunks of content via CQL queries, to speed up things. WikiTraccs then checks if expected IDs are missing from the CQL query result. If there are IDs missing, WikiTraccs assumes that those are restricted and that the current migration account should be allowed to access them (just not via CQL).

For each ID that is still missing from the query result WikiTraccs will get those one by one via their ID. This will succeed if the content exists and the migration account is allowed to view it.

The consequences of using content ID source selectors

Please refer to the respective section of this article: How to migrate Confluence Pages using CQL Query Selectors. The reasoning about duplicates and links also applies to content ID selectors.

Furthermore, retrieving contents via single IDs for a large selector will take more time than retrieving all contents of a space or CQL query. The more content there is where the migration account is not direct part of the restriction configuration, the longer it takes to retrieve the selector’s content.

SQL Snippets

Using SQL you can get content IDs for your selector from the Confluence database.

Here are sample SQL statements to get them in the right format for WikiTraccs (like 123456789;#page,234567891;#blogpost).

Get the content ID selector that includes ALL Confluence pages and blogposts:

SELECT count(*), string_agg(contentId || ';#' || contenttype, ',') AS contentIdSelectorValue
FROM (
    SELECT contentId, LOWER(contenttype) AS contenttype
    FROM content
    WHERE (contenttype='PAGE' OR contenttype='BLOGPOST')
      AND prevver IS NULL
      AND content_status='current'
    ORDER BY contenttype, contentId
) subquery;

Get the content ID selector that includes Confluence pages and blogposts from space MYSPACEKEY:

SELECT count(*), string_agg(contentId || ';#' || contenttype, ',') AS contentIdSelectorValue
FROM (

	SELECT contentid, LOWER(contenttype) AS contenttype
	FROM content LEFT JOIN spaces ON content.spaceid = spaces.spaceid
	WHERE (contenttype='PAGE' OR contenttype='BLOGPOST') 
		AND prevver IS NULL 
		AND content_status='current' 
		AND spacekey='MYSPACEKEY'
    ORDER BY contenttype, contentId
) subquery;

Replace MYSPACEKEY with the key of the space you want to get content IDs for.

Content ID selectors are relevant for link transformation.

When WikiTraccs transforms a cross-space link from PAGE_SOURCE to PAGE_TARGET, all selectors in the Space Inventory will be checked if they contain PAGE_TARGET.

Having found a selector that contains PAGE_TARGET, WikiTraccs uses this selector’s WT_Setting_TargetSiteRootUrl value to create the link.

Be sure to fill those WT_Setting_TargetSiteRootUrl columns for all selectors, not only the ones to be migrated.

Last modified June 22, 2024