Strategy Shift to Text Files for Knowledge Repositories

This topic is part of WikiTraccs for Markdown and is a work in progress.

Why Go Back to Text Files?

Corporations are faced with several challenges right now when it comes to internal documentation and knowledge management:

How to feed information into third-party AI services like Copilot and Atlassian Intelligence?
How to feed information into privacy-focused in-house AI solutions?
How to tackle rising costs of third-party tools and services?
How to prevent vendor lock-in in a changing environment?

One way to tackle all of the above issues is to take a step back, cut ties with any tools at all, and go back to… text files.

Several of my clients who used WikiTraccs to migrate content from Confluence to SharePoint are already implementing this, or at least considering it, as part of a broader strategy.

Their strategy entails:

Reducing the number of tools and services to reduce operational complexity and cut costs
Ultimately moving to a simpler, slightly more technical, yet proven approach: maintaining information like a source code repository

There are usually collateral benefits as well, such as easier collaboration with a translation agency on multilingual pages.

One common implementation of a text file-based documentation approach is the use of Markdown.

The benefits of Markdown are clear:

AI services and tools handle Markdown very well; so well that Microsoft created MarkItDown to convert many kinds of files into Markdown for feeding to AI
Text files can be stored anywhere; a common choice is GitHub, GitLab, or any other repository that supports collaboration and versioning
Markdown is supported by a wide range of (free) tools and services, so there is no vendor lock-in
Markdown is a format suited for data exchange across enterprises
Markdown can be published on demand to third-party services like Confluence and SharePoint without getting locked in with one specific vendor

It’s the last point, publishing to third parties, where WikiTraccs for Markdown comes into play, as it can publish Markdown files to SharePoint Online.

Preparing for the Change

Switching from an enterprise wiki like Confluence to text files containing Markdown is a big change.

We look at the feature set that tools like Confluence and SharePoint provide and then map those features to a Markdown-based repository.

Some features will have parity, some will be lost, and some might be added back by third-party tools and solutions.

Users will expect certain features in a knowledge repository, such as formatting text, attaching files, and restricting access.

In this article, we assume that third-party tools have features built in because there is a demand for them. We look at those features and, in a second step, at what a Markdown-based knowledge repository can do to meet users’ demands.

What’s a “Markdown-Based Knowledge Repository”?

This kind of knowledge repository is - at least in the context of WikiTraccs for Markdown - a collection of Markdown files, organized in folders.

The Markdown-based knowledge repository is

a single point of truth for enterprise documentation
written in a well-documented and well-established markup language (Markdown), with broad tool support
yet independent of any vendor or third-party service, as information is stored in text files
easily consumable both by humans and machines
a starting point for publishing content to third-party services like SharePoint Online or Atlassian Confluence

Before jumping right into creating such a repository, we take a step back to look at the typical content types and structures these repositories are usually built from.

Content Types in a Knowledge Repository

Which types of content are there in a knowledge repository?

This list is mainly based on the following third-party services, as they are often used for documentation and knowledge management:

Atlassian Confluence Enterprise Wiki
SharePoint Online

Each row in the table below names a content type, describes what that type means, and then maps this abstract content type to specific third-party services.

The table also includes the file-based Markdown Repository we aim to build.

Content Type	Meaning	Confluence	SharePoint Online	Markdown Repository
Space	A logical container for pages.	space	site	folder
Page	A wiki page, a page in a PDF document, something textual, might have columns or rows, contains tables, text, images, etc.	page, blog post, whiteboard	modern page	text file (Markdown)
List	A tabular representation of data, something like a sheet in an Excel file.	database (Confluence Cloud)	list, document library	text file (Markdown)
Attachment	A file that is associated with a page.	attachments of a page	partial support for page attachments; often just links to files stored in document libraries	files next to a page’s text file
Comment	A comment by a user.	footer comments, inline comments, attachment comments; all support rich content	page comments; very basic nesting and formatting support	?
User	A user resource that can be linked to	@-mentions in pages, used in metadata	used in metadata	?

All content types have metadata attached, to varying degrees. At least the author and a timestamp are nearly always available.

Structures of Knowledge Repositories

Let’s look at some approaches to organizing files and folders.

Atlassian Confluence

The structure used by Confluence:

Confluence
├── Space
│   ├── Page
│   ├── Page
│   │   ├── Child Page
│   │   └── Comments
│   └── Page
│       └── Attachment
│           └── Comments
└── Space
    └── ...

SharePoint Online

The structure used by SharePoint Online:

SharePoint Online
├── Site
│   ├── Document Library
│   │   ├── Page
│   │   └── Page
│   │       └── Comments
│   ├── Document Library
│   │   ├── Page Attachment Folder
│   │   │   └── Attachments
│   │   └── Page Attachment Folder
│   │       └── Attachments
│   └── Document Library
│       └── Files
└── Site
    └── ...

Note: there are no child pages in SharePoint.

Hugo

Hugo (a Markdown-based static website generator that powers this very website) uses this structure:

Local File System
├── Section Folder
│   ├── Page
│   ├── Page Folder
│   │   ├── Page
│   │   ├── Attachments
│   │   └── Page Folder
│   │       ├── Page
│   │       └── Attachments
│   └── Page
└── Section Folder
    └── ...
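
To illustrate how such a folder structure can be read by a machine, here is a minimal Python sketch that walks a repository folder and sorts files into pages and attachments. It rests on assumptions that are not part of this article: pages are files ending in `.md`, every other file counts as an attachment, and the function name `classify_repository` is made up for this example.

```python
from pathlib import Path

# Assumption: Markdown pages end in ".md"; every other file is an attachment.
PAGE_SUFFIX = ".md"


def classify_repository(root: Path) -> dict[str, list[str]]:
    """Return repository-relative paths of pages and attachments under root."""
    pages: list[str] = []
    attachments: list[str] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # folders only provide structure (spaces, page folders)
        relative = path.relative_to(root).as_posix()
        if path.suffix.lower() == PAGE_SUFFIX:
            pages.append(relative)
        else:
            attachments.append(relative)
    return {"pages": pages, "attachments": attachments}
```

Calling `classify_repository(Path("docs"))` on a hypothetical `docs` folder returns two lists of relative paths. Which attachment belongs to which page is not resolved here; in the structure above that would follow from the folder a file sits in.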

Glossary

In addition to the content types, the following words should be defined:

content = pages, attachments, comments, etc. - nearly everything except permissions and processes
files = files in general; a file can be an attachment of the current page, an attachment of another page, or an external file that is linked to

Access Control in Knowledge Repositories

We again look at the different third-party services.

Atlassian Confluence

Confluence uses the following approach to permission management:

Confluence (access is granted to users and groups)
└── Space  (access is granted to users and groups)
    └── Page  (access can be narrowed down to a subset of users and groups)
        └── Child Page (access can be further narrowed down)

Confluence also has the option to grant Anonymous access.

Access to a page includes access to that page’s attachments and comments.

SharePoint Online

SharePoint Online
└── Site (access is granted to users and groups)
    ├── Document Library (access can be changed to a completely different set of users and groups)
    │   └── Page (access can be changed to a completely different set of users and groups)
    └── Document Library (access can be changed to a completely different set of users and groups)
        └── Page Attachment Folder (access can be changed to a completely different set of users and groups)

SharePoint Online allows sharing pages and attachments with anonymous users via a sharing link, but it has no true anonymous access like Confluence does.
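
The two trees differ in one key respect: Confluence narrows access on the way down, while SharePoint Online can replace the permission set at each level. The following Python sketch contrasts the two models in a heavily simplified form. The `Node` class and both function names are hypothetical, users and groups are reduced to plain strings, and real-world details such as permission levels, view versus edit restrictions, or sharing links are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical container or page in a permission hierarchy."""
    name: str
    # None means "nothing set here, inherit from the parent".
    principals: frozenset[str] | None = None
    parent: Node | None = None


def confluence_style_access(node: Node) -> frozenset[str]:
    """Narrowing model: a principal must pass every restriction up the chain."""
    allowed: frozenset[str] | None = None
    current: Node | None = node
    while current is not None:
        if current.principals is not None:
            allowed = (
                current.principals
                if allowed is None
                else allowed & current.principals
            )
        current = current.parent
    return allowed if allowed is not None else frozenset()


def sharepoint_style_access(node: Node) -> frozenset[str]:
    """Override model: the nearest explicit permission set replaces inherited ones."""
    current: Node | None = node
    while current is not None:
        if current.principals is not None:
            return current.principals
        current = current.parent
    return frozenset()


# Example hierarchy with made-up users.
space = Node("Space", frozenset({"alice", "bob", "carol"}))
page = Node("Page", frozenset({"alice", "bob"}), parent=space)
child = Node("Child Page", frozenset({"bob", "dave"}), parent=page)

print(sorted(confluence_style_access(child)))  # ['bob']
print(sorted(sharepoint_style_access(child)))  # ['bob', 'dave']
```

In the narrowing model, `dave` cannot reach the child page because he is missing further up the chain. In the override model, the child's own set is all that counts.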

Actions

Users can perform actions on content and structure, based on their permissions.

Possible actions are:

modify content (create, read, update, delete)
- one user at a time / multiple users at the same time
- using a visual editor / without a visual editor
restructure content (e.g. move page to a new parent, create a new space)
duplicate content
set metadata on content (e.g. label)
link to existing content
link to non-existing content (“red links”)
use extensions to enrich content with non-text elements (first-party or third-party)
export content (e.g. to PDF or Word)
change permissions on content
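
Some of these actions translate directly to plain files. As one example, links to non-existing content ("red links") can be detected in a folder of Markdown files. This is a rough sketch under several assumptions: pages end in `.md`, links use the inline `[text](target)` form with relative targets, and the name `find_red_links` is invented for this example. The simplified pattern also matches image references, so missing attachments are reported as well.

```python
import re
from pathlib import Path

# Simplified pattern for inline Markdown links: [text](target)
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_red_links(root: Path) -> list[tuple[str, str]]:
    """Return (page, target) pairs whose relative link target does not exist."""
    red_links: list[tuple[str, str]] = []
    for page in sorted(root.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            if "://" in target or target.startswith(("#", "mailto:")):
                continue  # external links and in-page anchors are out of scope
            file_part = target.split("#", 1)[0]  # drop a trailing "#anchor"
            if not (page.parent / file_part).exists():
                red_links.append((page.relative_to(root).as_posix(), target))
    return red_links
```

Each returned pair names the page that contains the link and the target that could not be found relative to that page's folder.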

Processes

tbd

Next reading

Now that we know what’s in a repository, we can apply that knowledge and build our Markdown-based knowledge repository: Markdown Repository Specifics.

Last modified January 8, 2025