How to run parallel WikiTraccs migrations?
You can run multiple WikiTraccs instances in parallel to speed up the overall migration. You’ll need a separate machine or VM for each WikiTraccs instance, as only one instance may run on a given machine.
Some definitions
The computer or virtual machine (VM) that WikiTraccs runs on is called the machine or migration machine.
We’ll call the running WikiTraccs program a WikiTraccs instance. When two WikiTraccs programs are running, that would be two instances.
The central SharePoint site that WikiTraccs needs, and where the Space Inventory list (and more) is located, is called the WikiTraccs site.
Only one WikiTraccs instance may run on a machine.
How to run multiple WikiTraccs instances in parallel?
Looking at clients’ projects, I see different approaches.
Some create cloud-based virtual machines in Azure or AWS. Some use spare laptops.
Install WikiTraccs on each of those machines and you are good to go.
Configuring WikiTraccs when parallelizing the migration
There is local and shared configuration.
Configure locally on all machines
Make sure you configure the WikiTraccs settings on all machines. This covers things like comment migration, the macro ignore list, the migration mode, etc. The settings can differ between machines, but usually they are the same.
Locally stored settings are:
- everything you enter into the blue WikiTraccs.GUI window
- settings available in the blue WikiTraccs.GUI window via the Settings menu
- any settings configured via appsettings.json
Note that the Space Inventory, which contains the source-to-target mapping, is NOT a local setting; it is stored in SharePoint, in the WikiTraccs site. The link to the WikiTraccs site, however, is part of the local settings.
Be aware of shared configuration via WikiTraccs site
One WikiTraccs site can be used by multiple WikiTraccs instances, or each instance can use its own WikiTraccs site.
When multiple WikiTraccs instances use the same WikiTraccs site, they share the same mappings.
There is one caveat to sharing a WikiTraccs site: all instances will also want to migrate the same content.
We’ll look at two scenarios now: one with a shared WikiTraccs site, one without.
Parallelization Scenario - Shared WikiTraccs Site
Multiple WikiTraccs instances share the same WikiTraccs site.
Two WikiTraccs instances sharing a single WikiTraccs site.
To achieve that, simply configure the same WikiTraccs site address for all WikiTraccs instances.
Pro:
- you need to configure the mapping in the Space Inventory only once
- you need to configure the user and group mapping only once
- easy, as there is only one WikiTraccs site
Contra:
- (only before WikiTraccs v1.18) you need to change the WT_Setting_RequestTransformation value in the Space Inventory before starting each WikiTraccs instance; otherwise multiple instances migrate the same pages (!). One approach is to use PnP.PowerShell to edit the Space Inventory.
Tip
Starting with WikiTraccs v1.18, migration waves are supported. They simplify distributing the migration across multiple WikiTraccs instances.
Assign wave numbers to selectors in the Space Inventory. Then select a different wave for migration in each WikiTraccs instance.
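If your spaces differ a lot in size, it can help to plan the wave numbers before entering them into the Space Inventory. The Python sketch below is one way to do that and is not a WikiTraccs feature: it assumes you know an approximate page count per selector (the selector names and counts are made up) and greedily gives each selector, largest first, to the wave with the fewest pages so far. With one wave per instance, each selector lands in exactly one wave, so no two instances pick up the same selector, assuming the selectors themselves don’t overlap.

```python
import heapq


def assign_waves(selector_sizes: dict[str, int], instance_count: int) -> dict[str, int]:
    """Assign each selector a wave number (1..instance_count), balancing page counts.

    Greedy heuristic: go from the largest to the smallest selector and put each
    one into the wave that currently has the fewest pages assigned.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # Min-heap of (pages assigned so far, wave number).
    heap = [(0, wave) for wave in range(1, instance_count + 1)]
    heapq.heapify(heap)
    waves: dict[str, int] = {}
    for selector, pages in sorted(selector_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, wave = heapq.heappop(heap)
        waves[selector] = wave
        heapq.heappush(heap, (load + pages, wave))
    return waves


def selectors_in_wave(waves: dict[str, int], wave: int) -> list[str]:
    """Selectors that the instance migrating the given wave will pick up."""
    return sorted(s for s, w in waves.items() if w == wave)


if __name__ == "__main__":
    # Hypothetical selectors with estimated page counts.
    sizes = {"HR": 1200, "DEV": 5000, "OPS": 2500, "SALES": 800, "WIKI": 3000}
    waves = assign_waves(sizes, instance_count=2)
    for wave in (1, 2):
        print(wave, selectors_in_wave(waves, wave))
```

For the sample data this puts DEV and HR into wave 1 and OPS, SALES, and WIKI into wave 2 (6,200 vs. 6,300 pages). Page count is only a rough proxy for migration time; attachments and macros may shift the balance.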
Important
In this scenario, you need to make sure that no two WikiTraccs instances migrate the same content at any given time.
Parallelization Scenario - Separate WikiTraccs Sites
Two WikiTraccs instances each using their own WikiTraccs site.
To achieve that, use a different WikiTraccs site address for each WikiTraccs instance.
Pro:
- the WT_Setting_RequestTransformation value in the Space Inventory can be configured per WikiTraccs instance, as each instance has its own Space Inventory
- the page snapshots that WikiTraccs stores in the WikiTraccs site (in the Confluence Content Snapshots library) are spread across multiple WikiTraccs sites, which can be an advantage for permissions (for example, when the migration is done by different teams that are not allowed to see each other’s content)
Contra:
- the mapping in the Space Inventory needs to be configured in all WikiTraccs sites, for all spaces and CQL selectors (!)
- the user and group mapping needs to be configured in all WikiTraccs sites
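Because every Space Inventory needs the complete mapping, a quick consistency check before starting the instances can catch gaps. The Python sketch below assumes you already have each site’s source-to-target mapping as a dictionary (however you obtain it, for example from an export of the Space Inventory list); the site names, selectors, and target URLs are hypothetical. It reports, per WikiTraccs site, the selectors that are missing or point to a different target than the reference mapping.

```python
# Hypothetical shape: selector (space key or CQL selector) -> target SharePoint site URL.
SelectorMap = dict[str, str]


def find_mapping_gaps(reference: SelectorMap, inventories: dict[str, SelectorMap]) -> dict[str, list[str]]:
    """Per WikiTraccs site, list selectors that are missing or mapped to a different target."""
    gaps: dict[str, list[str]] = {}
    for site, inventory in inventories.items():
        problems = sorted(
            selector
            for selector, target in reference.items()
            if inventory.get(selector) != target
        )
        if problems:
            gaps[site] = problems
    return gaps


if __name__ == "__main__":
    # Hypothetical complete mapping that every Space Inventory should contain.
    reference = {
        "DEV": "https://contoso.sharepoint.com/sites/dev",
        "HR": "https://contoso.sharepoint.com/sites/hr",
        "OPS": "https://contoso.sharepoint.com/sites/ops",
    }
    inventories = {
        "site-a": dict(reference),  # complete copy of the reference mapping
        "site-b": {  # OPS correct, DEV points elsewhere, HR missing
            "OPS": "https://contoso.sharepoint.com/sites/ops",
            "DEV": "https://contoso.sharepoint.com/sites/development",
        },
    }
    print(find_mapping_gaps(reference, inventories))
```

Here only site-b is reported, with DEV and HR; cross-space links pointing to those spaces could not be transformed correctly by the instance using site-b.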
Important
In this scenario, make sure to configure each instance properly and to configure complete mappings in each Space Inventory, even for spaces migrated by other instances. Complete mappings are needed to transform cross-space links.
General parallelization notes
Multiple instances can migrate to the same target SharePoint site, as long as they don’t migrate the same content at the same time to the same target.
Parallel migration is covered by the WikiTraccs license, without limits.
Don’t go too crazy with the number of parallel instances, though. Microsoft limits how fast an application can access the Microsoft 365 services and throttles WikiTraccs if it goes too fast. This means WikiTraccs has to wait some time (a couple of seconds, or even minutes) until it is allowed to send content again. Microsoft sets those limits, adjusts them to the overall cloud load, and bases them on the number of licenses in the tenant, among other factors.
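WikiTraccs handles this waiting itself. To illustrate what the waiting looks like, the Python sketch below shows the general pattern Microsoft documents for throttled SharePoint Online requests (HTTP 429 or 503 responses): honor the Retry-After header when it is present, otherwise back off exponentially. It is a generic illustration, not WikiTraccs’ actual implementation; the function name and the base and cap values are arbitrary assumptions.

```python
import random
from typing import Optional


def throttle_delay(attempt: int, retry_after: Optional[str], base: float = 2.0, cap: float = 300.0) -> float:
    """Seconds to wait before retrying a throttled request (attempt counts from 0)."""
    if retry_after is not None:
        try:
            # Retry-After given in seconds: wait as long as the server asked.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; this sketch falls back to backoff.
    # Exponential backoff with "full jitter": random delay in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


if __name__ == "__main__":
    print(throttle_delay(0, "120"))  # the server asked for 120 seconds
    for attempt in range(4):
        print(attempt, round(throttle_delay(attempt, None), 1))
```

The more instances share the same tenant limits, the more often each one ends up in this waiting state, which is why adding instances stops paying off at some point.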