
How to run parallel WikiTraccs migrations?

You might want to run multiple WikiTraccs migrations in parallel to speed things up. Let’s take a look at how to do this and what to watch out for.

You can run multiple WikiTraccs instances in parallel to speed up the overall migration. You’ll need a separate machine or VM per WikiTraccs instance as only one instance is allowed to run on the same machine.

Some definitions

The computer or virtual machine (VM) WikiTraccs is running on is the machine or migration machine.

We’ll call the running WikiTraccs program a WikiTraccs instance. When two WikiTraccs programs are running, that would be two instances.

The central SharePoint site that WikiTraccs needs, and where the Space Inventory list (and more) is located, is called the WikiTraccs site.

Only one WikiTraccs instance is allowed to run per machine.

How to run multiple WikiTraccs instances in parallel?

Looking at clients’ projects, I see different approaches.

Some create cloud-based virtual machines in Azure or AWS. Some use spare laptops.

Install WikiTraccs on each of those machines and you are good to go.

Configuring WikiTraccs when parallelizing the migration

There is local and shared configuration.

Configure locally on all machines

Make sure you configure the WikiTraccs settings on all machines. This affects things like comments migration, macro ignore list, migration mode etc. They can be different on different machines, but usually are the same.

Locally stored settings are:

  • everything you enter into the blue WikiTraccs.GUI window
  • settings available in the blue WikiTraccs.GUI window via the Settings menu
  • any settings configured via appsettings.json

Note that the Space Inventory that contains the source-to-target mapping is NOT a local setting, but stored in SharePoint, in the WikiTraccs site. But the link to the WikiTraccs site (that contains the Space Inventory) is part of the local settings.

Be aware of shared configuration via WikiTraccs site

One WikiTraccs site can be used by multiple WikiTraccs instances, or each instance can use its own WikiTraccs site.

With the same WikiTraccs site, multiple WikiTraccs instances share the same mappings.

Sharing one WikiTraccs site has one caveat: because all instances read the same Space Inventory, they will all want to migrate the same content.

We’ll look at two scenarios now: one with a shared WikiTraccs site, one without.

Parallelization Scenario - Shared WikiTraccs Site

Multiple WikiTraccs instances share the same WikiTraccs site.

Two WikiTraccs instances sharing a single WikiTraccs site.

To achieve that, simply configure the same WikiTraccs site address for all WikiTraccs instances.

Pro:

  • you need to configure the mapping in the Space Inventory only once
  • you need to configure the user and group mapping only once
  • easy, as there is only one WikiTraccs site

Contra:

  • you need to change the WT_Setting_RequestTransformation value in the Space Inventory before starting each WikiTraccs instance, as otherwise multiple instances migrate the same pages (!); one approach is to use PnP.PowerShell to manipulate the Space Inventory
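
Before touching the Space Inventory, it can help to decide on paper which instance gets which spaces. The following is a minimal planning sketch, not part of WikiTraccs: the function name `partition_spaces` and the space keys are hypothetical, and the script does not talk to SharePoint. It only produces a non-overlapping split that you could then apply to WT_Setting_RequestTransformation by hand or via PnP.PowerShell.

```python
from collections import defaultdict


def partition_spaces(space_keys, instance_count):
    """Assign every space key to exactly one instance, round-robin.

    Returns a dict {instance_index: [space_key, ...]}.
    Hypothetical helper for planning only; it changes nothing in SharePoint.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    plan = defaultdict(list)
    # Deduplicate and sort so the plan is stable between runs.
    for index, key in enumerate(sorted(set(space_keys))):
        plan[index % instance_count].append(key)
    return {i: plan.get(i, []) for i in range(instance_count)}


# Example space keys (made up for illustration).
plan = partition_spaces(["HR", "IT", "DOCS", "SALES", "LEGAL"], 2)
for instance, keys in plan.items():
    print(f"instance {instance}: {keys}")
```

Round-robin is only one possible split. If spaces differ a lot in size, grouping them by page count would probably balance the load better.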

Parallelization Scenario - Separate WikiTraccs Sites

Two WikiTraccs instances each using their own WikiTraccs site.

To achieve that, use a different WikiTraccs site address for each WikiTraccs instance.

Pro:

  • the WT_Setting_RequestTransformation value in the Space Inventory can be configured per WikiTraccs instance, as each instance has its own Space Inventory
  • the page snapshots that WikiTraccs stores in the WikiTraccs site (in the Confluence Content Snapshots library) are now spread across multiple WikiTraccs sites, which can be an advantage permission-wise (when the migration is done by different teams that are not allowed to see each other’s content)

Contra:

  • the mapping in the Space Inventory needs to be configured in all WikiTraccs sites, for all spaces and CQL selectors (!)
  • the user and group mapping needs to be configured in all WikiTraccs sites

General parallelization notes

Multiple instances can migrate to the same target SharePoint site, as long as they don’t migrate the same content at the same time to the same target.
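
To check this rule before starting a run, you could compare what each instance is going to migrate. The sketch below assumes you have written down, per machine, the (source space, target site) pairs it will handle. The machine names, space keys, URLs and the function `find_conflicts` are all hypothetical and not part of WikiTraccs. The check is deliberately strict: it flags any pair that appears on two instances, regardless of whether the runs would actually overlap in time.

```python
from itertools import combinations


def find_conflicts(assignments):
    """Find source/target pairs that more than one instance would migrate.

    assignments: dict {instance_name: [(source_space, target_site_url), ...]}
    Returns a list of (instance_a, instance_b, shared_pairs) tuples.
    """
    conflicts = []
    # Compare every pair of instances exactly once.
    for (name_a, pairs_a), (name_b, pairs_b) in combinations(
        sorted(assignments.items()), 2
    ):
        shared = set(pairs_a) & set(pairs_b)
        if shared:
            conflicts.append((name_a, name_b, sorted(shared)))
    return conflicts


# Made-up example: both machines would migrate space IT to the same site.
assignments = {
    "vm-1": [
        ("HR", "https://contoso.sharepoint.com/sites/hr"),
        ("IT", "https://contoso.sharepoint.com/sites/it"),
    ],
    "vm-2": [
        ("IT", "https://contoso.sharepoint.com/sites/it"),
        ("DOCS", "https://contoso.sharepoint.com/sites/docs"),
    ],
}
for name_a, name_b, shared in find_conflicts(assignments):
    print(f"{name_a} and {name_b} overlap on: {shared}")
```

Two instances writing different spaces to the same target site are not reported, which matches the rule above.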

Parallel migration is covered by the WikiTraccs license, without limits.

You shouldn’t go too crazy with the parallel instances. Microsoft sets limits on how fast an application can access the Microsoft 365 services, and it will throttle WikiTraccs if it goes too fast. This means WikiTraccs will have to wait some time (a couple of seconds, or even minutes) until it is allowed to send content again. Those limits are set by Microsoft, adjusted to overall cloud load, and dependent on the number of licenses in the tenant (among other factors).
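
To illustrate what this waiting looks like from a client’s point of view: throttled Microsoft 365 requests are typically answered with HTTP 429 (or 503) and a Retry-After header stating how many seconds to pause. The following is a generic sketch of that pattern, not WikiTraccs’ actual implementation. The function name `wait_seconds` and the default values are assumptions chosen for the example.

```python
import random


def wait_seconds(attempt, retry_after=None, base=2.0, cap=300.0, rng=random.random):
    """Return how long to pause before retrying a throttled request.

    attempt:     number of retries already made (0 for the first retry)
    retry_after: value of the Retry-After header, assumed to be in seconds
    base, cap:   illustrative defaults, not values used by WikiTraccs
    """
    if retry_after is not None:
        # The service said how long to wait, so honor that.
        return float(retry_after)
    # Otherwise back off exponentially, capped, with jitter so that
    # parallel instances do not all retry at the same moment.
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * (0.5 + 0.5 * rng())


print(wait_seconds(0, retry_after="30"))  # server-provided wait
print(wait_seconds(3))                    # somewhere between 8 and 16 seconds
```

Each additional instance adds to the request rate counted against the same limits, so more instances can simply mean more time spent in these pauses.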

Roadmap

Configuring for parallel migration is currently not as easy as it could be. Future versions of WikiTraccs will add a wave number to scenario 1 (shared WikiTraccs site), so each WikiTraccs instance can select a different wave to migrate.