Connecting an AWS Knowledge Base to Confluence

Overview

This document explains how to configure an AWS Knowledge Base to collect and index content from a Confluence site. Setup involves creating authentication credentials in Confluence, configuring the AWS Knowledge Base, and then connecting it to SurePath AI as a data source. None of the screens or setup steps are specific to SurePath AI until the final section.

High-level steps

The integration process requires admins to complete several key tasks. First, admins need to create a Confluence API token for authentication. Next, they will configure the AWS Knowledge Base with the Confluence credentials. After the initial sync completes, admins can configure content filters to control which spaces and pages are indexed. Finally, admins will add data contexts in SurePath AI using the new connector.

Confluence setup

Prerequisites

Admins will need the following to complete the Confluence configuration:

- Access to a Confluence account with permissions to generate API tokens
- The email address associated with the Confluence account
- The base Confluence URL for the organization

Output checklist

Admins should collect the following information during this process:

- Confluence email address (username)
- Confluence API token (password)
- Base Confluence URL

Create a Confluence API token

Confluence uses API tokens for authentication with external services like AWS Knowledge Bases. Admins need to create an API token that will be stored in AWS Secrets Manager.

Navigate to Atlassian Account Settings:
- https://id.atlassian.com/manage-profile/security/api-tokens
Select Create API token
Provide a descriptive label for the token (e.g., "AWS Knowledge Base Integration")
Select Create
Copy the API token immediately, as it will only be displayed once
Store the token securely along with the associated email address

The API token does not require any specific scopes to be configured. The token inherits the permissions of the user account that created it, so admins should ensure the user account has appropriate read access to the Confluence spaces that need to be indexed.
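
Before moving on to AWS, it can help to confirm that the email address and token work together. The sketch below builds an HTTP Basic `Authorization` header from the two values and prepares, without sending, a request against the Confluence site. The email address, token, and site URL are placeholders, and the `/rest/api/space` path is an assumption about the Confluence Cloud REST API that should be checked against Atlassian's documentation.

```python
import base64
import urllib.request


def basic_auth_header(email: str, api_token: str) -> str:
    """Return the HTTP Basic Authorization header value for email:token."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_check_request(base_url: str, email: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists one space."""
    # Assumed endpoint path; verify it for the Confluence deployment in use.
    url = base_url.rstrip("/") + "/rest/api/space?limit=1"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": basic_auth_header(email, api_token),
            "Accept": "application/json",
        },
    )


# Placeholder values for illustration only.
request = build_check_request(
    "https://yourcompany.atlassian.net/wiki", "admin@example.com", "example-token"
)
# Sending it with urllib.request.urlopen(request) should fail with an HTTP
# error if the site rejects the credentials.
```

The same email and token pair is what gets stored as `username` and `password` in AWS Secrets Manager later in this guide.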

AWS setup

Prerequisites

Admins will need the following values to complete the AWS configuration:

- Confluence email address (username)
- Confluence API token (password)
- Base Confluence URL
- Admin rights to AWS with access to Secrets Manager and Bedrock

Output checklist

Admins should collect the following information during this process:

- AWS Secrets Manager ARN for the Confluence credentials
- AWS Knowledge Base ID

Create an AWS secret

Admins need to store the Confluence credentials in AWS Secrets Manager before configuring the Knowledge Base. The Knowledge Base will reference this secret for authentication.

Log in to AWS and navigate to Secrets Manager > Store a new secret
- https://us-east-1.console.aws.amazon.com/secretsmanager
Select Other type of secret
Enter the following two key-value pairs. The key names must match exactly, including case:
- username (value is the Confluence email address)
- password (value is the Confluence API token created earlier)
Select Next
Enter a Secret name (spaces are not allowed)
Continue selecting Next through the remaining screens, then select Store to save the secret
After creation, open the secret details and copy the full ARN (Amazon Resource Name) for use in the next section
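
For admins who prefer scripting over the console, the steps above can be sketched roughly as follows. The helpers build the secret value with the exact lowercase `username` and `password` keys and reject secret names that contain spaces. The `store_secret` function is an untested illustration that assumes boto3 is installed and AWS credentials are configured. The secret name, region, and credential values are placeholders.

```python
import json


def build_secret_string(email: str, api_token: str) -> str:
    """Serialize the credentials with the exact lowercase keys the steps require."""
    return json.dumps({"username": email, "password": api_token})


def validate_secret_name(name: str) -> str:
    """Reject empty names and names containing whitespace."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid secret name: {name!r}")
    return name


def store_secret(name: str, email: str, api_token: str, region: str = "us-east-1") -> str:
    """Create the secret with boto3 and return its ARN (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("secretsmanager", region_name=region)
    response = client.create_secret(
        Name=validate_secret_name(name),
        SecretString=build_secret_string(email, api_token),
    )
    return response["ARN"]


# Example with placeholder values; this only builds the payload locally.
payload = build_secret_string("admin@example.com", "example-token")
```

Calling `store_secret("confluence-kb-credentials", ...)` with real values would return the ARN needed in the next section, the same value the console shows on the secret details page.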

Set up a new AWS Knowledge Base

Admins can now create the Knowledge Base that will connect to Confluence and index the content.

Open Amazon Bedrock > Builder Tools > Knowledge Bases > Create
- https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/knowledge-bases
Select Knowledge Base with vector store
Enter a Knowledge Base name or accept the default
For IAM permissions, select Create and use a new service role
Update the Service role name or accept the default
Scroll down and select Confluence - Preview as the data source type
Select Next

Configure the data source

The data source configuration defines which Confluence content will be indexed and how authentication will work.

Enter a data source Name or accept the default
Enter the Confluence host URL
- This should be the site's base URL, with no space or page path appended
- Example: https://yourcompany.atlassian.net/wiki
Leave advanced options at their default settings
In the Authentication section, select Basic authentication
Enter the AWS Secrets Manager secret ARN that was created in the previous section
- Paste the full ARN that was copied from Secrets Manager
For Chunking and parsing, select Amazon Bedrock default parser
For Chunking strategy, select Default chunking
Configure Metadata and filtering as needed (see the Content filtering section below for detailed guidance)
Select Next
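
Two values in this step are easy to paste incorrectly: the host URL and the secret ARN. The sketch below is a pair of rough sanity checks, not an official validation. The ARN pattern assumes the usual `arn:aws:secretsmanager:<region>:<account-id>:secret:<name>` layout, and the URL check only confirms an `https` URL with a hostname and no query string or fragment. The account ID and secret name in the examples are made up.

```python
import re
from urllib.parse import urlparse

# Heuristic pattern for a Secrets Manager ARN; an assumption, not an AWS-provided regex.
SECRET_ARN_PATTERN = re.compile(
    r"^arn:aws[a-z-]*:secretsmanager:[a-z0-9-]+:\d{12}:secret:\S+$"
)


def looks_like_secret_arn(value: str) -> bool:
    """True if the value has the general shape of a full secret ARN."""
    return SECRET_ARN_PATTERN.match(value) is not None


def looks_like_host_url(value: str) -> bool:
    """True for an https URL with a hostname and no query string or fragment."""
    parsed = urlparse(value)
    return (
        parsed.scheme == "https"
        and bool(parsed.hostname)
        and not parsed.query
        and not parsed.fragment
    )


# Illustrative values only.
arn_ok = looks_like_secret_arn(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-kb-AbCdEf"
)
url_ok = looks_like_host_url("https://yourcompany.atlassian.net/wiki")
```

A value that fails either check, such as a bare secret name or a URL without `https://`, is worth correcting before selecting Next.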

Configure embeddings and vector store

The embeddings model converts the text content into vectors that enable semantic search capabilities.

For the Embeddings model, admins can choose between:
- Cohere Embed English V3 - $0.0001 per 1000 tokens (recommended as the industry standard)
- Amazon Titan Text Embeddings V2 - $0.00002 per 1000 tokens (lower cost option)
- See https://aws.amazon.com/bedrock/pricing/ for current pricing
For Vector Store, select Quick create new vector store
- Type: Amazon OpenSearch Serverless
- Important: Set the OpenSearch capacity limits (OCU) to a value higher than 10. SurePath AI recommends values between 20 and 50 for optimal performance. While customers are only charged for the capacity they actually use, setting this limit too high could lead to significant charges if the Knowledge Base experiences high query volumes.
Select Next
Review all settings and select Create Knowledge Base
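
To compare the two embeddings models on cost, the per-token prices listed above can be applied to an estimated corpus size. The sketch below uses the prices quoted in this article and a hypothetical corpus of 5 million tokens. It covers only a single embedding pass, not OpenSearch capacity charges or re-syncs.

```python
from decimal import Decimal

# Prices per 1,000 tokens as quoted in this article; check the AWS pricing
# page for current values.
PRICE_PER_1K_TOKENS = {
    "Cohere Embed English V3": Decimal("0.0001"),
    "Amazon Titan Text Embeddings V2": Decimal("0.00002"),
}


def embedding_cost(total_tokens: int, model: str) -> Decimal:
    """Cost in USD to embed total_tokens once with the given model."""
    return PRICE_PER_1K_TOKENS[model] * Decimal(total_tokens) / Decimal(1000)


# Hypothetical corpus size used only for illustration.
corpus_tokens = 5_000_000
for model in PRICE_PER_1K_TOKENS:
    print(f"{model}: ${embedding_cost(corpus_tokens, model):.2f}")
```

At these prices, a 5 million token corpus works out to about $0.50 with Cohere Embed English V3 and $0.10 with Amazon Titan Text Embeddings V2.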

The Knowledge Base will be created and the initial sync will begin automatically. Depending on the amount of content in Confluence, the sync process may take several minutes to several hours.

Important: The Knowledge Base will not sync automatically after the initial creation. Admins must either trigger manual syncs or configure an automated refresh schedule. See the automatic KB refreshes documentation for setup instructions:

https://help.surepath.ai/en/articles/10930743-howto-data-source-refresh-data-in-an-aws-knowledge-base#h_d6037a8bb6
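
As a rough illustration of what a manual sync looks like outside the console, the sketch below starts an ingestion job through the boto3 `bedrock-agent` client. It assumes boto3 is installed and AWS credentials with Bedrock permissions are available. The Knowledge Base ID and data source ID shown are placeholders, and the data source ID is an extra value that would need to be copied from the Knowledge Base details page.

```python
def start_sync(client, knowledge_base_id: str, data_source_id: str) -> str:
    """Start an ingestion (sync) job and return its job ID."""
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return response["ingestionJob"]["ingestionJobId"]


def main() -> None:
    import boto3  # needs AWS credentials with Bedrock permissions

    client = boto3.client("bedrock-agent", region_name="us-east-1")
    # Both IDs below are placeholders, not real resources.
    job_id = start_sync(client, "KBEXAMPLE01", "DSEXAMPLE01")
    print("Started ingestion job", job_id)
```

Calling `main()` from a scheduled job is one possible way to automate refreshes; the linked article describes the supported setup.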

Content filtering

AWS Knowledge Bases support inclusion and exclusion filters to control which Confluence content gets indexed. Filters can target specific spaces or individual pages, allowing admins to create precise indexing rules.

Understanding filter behavior

The crawler behavior depends on how inclusion and exclusion filters are configured. When no inclusion filters are defined, the crawler will index the entire Confluence site except for any content specified in exclusion filters. When inclusion filters are defined, the crawler will only index the content explicitly listed in those filters, and exclusion filters can further refine what gets indexed within the included content.

Exclusion filters always take precedence over inclusion filters. If a space is included but specific pages within that space are excluded, those pages will not be indexed. Similarly, if no space inclusion filters are defined, space exclusion filters can block specific spaces from being indexed.
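
The precedence rules above can be modeled in a few lines. The sketch below is a simplified mental model, not the crawler's actual implementation: it compares values as exact strings and assumes a page is indexed only when both its space and its title pass their respective filters.

```python
from typing import Iterable


def passes(value: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Apply one filter level: exclusion wins, an empty inclusion list allows all."""
    include, exclude = list(include), list(exclude)
    if value in exclude:
        return False
    return not include or value in include


def is_indexed(space_key, page_title, space_include=(), space_exclude=(),
               page_include=(), page_exclude=()) -> bool:
    """A page is indexed only if both its space and its title pass their filters."""
    return (passes(space_key, space_include, space_exclude)
            and passes(page_title, page_include, page_exclude))


# No filters at all: everything is indexed.
print(is_indexed("ENG", "Roadmap"))
# Space included, but the page is excluded: exclusion takes precedence.
print(is_indexed("ENG", "Roadmap", space_include=["ENG"], page_exclude=["Roadmap"]))
# Inclusion filters defined, and this space is not listed.
print(is_indexed("HR", "Policies", space_include=["ENG"]))
```

The three calls print `True`, `False`, and `False`, matching the behavior described in this section.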

Space filters

Space filters use the Confluence Space Key rather than the space name or URL path. The Space Key is a system identifier that may differ from what appears in the URL.

To find the Space Key for a space, admins should navigate to the space settings page at https://{team-name}.atlassian.net/wiki/spaces/{space-name}/settings/details and locate the System Key value displayed on that page. This System Key is what should be entered in the space filter.

For example, if the Space Key is "ENG", admins would enter "ENG" in the space filter even if the URL shows a different value like "/spaces/engineering/".
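
When many spaces are involved, looking up each key in the settings page can be slow. As an alternative sketch, the helper below maps space names to keys from the JSON returned by a Confluence space listing. It assumes the response contains a `results` array whose entries carry `key` and `name` fields, which should be confirmed against the Confluence REST API documentation. The sample response is hand-written, not real data.

```python
import json


def space_keys_by_name(response_body: str) -> dict:
    """Map each space name to its key from a space-listing JSON response."""
    payload = json.loads(response_body)
    return {space["name"]: space["key"] for space in payload.get("results", [])}


# Hand-written sample response, not real data.
sample = json.dumps({
    "results": [
        {"key": "ENG", "name": "Engineering"},
        {"key": "HR", "name": "People Operations"},
    ]
})
print(space_keys_by_name(sample))
```

The values of the resulting dictionary, such as "ENG", are what would go into the space filters.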

Page filters

Page filters use the page title exactly as it appears at the top of the Confluence page. Unlike space filters, page filters do not use any hidden identifiers or system keys. Admins should copy the page title directly from the page and enter it into the filter.

Page titles are case-sensitive and must match exactly, including any special characters or spacing. If a page title is "Q4 Planning Document", the filter must specify that exact string.

Common filtering scenarios

Admins can implement several common filtering patterns depending on their indexing requirements:

Index specific spaces only: Define space inclusion filters for each desired space. Only content within those spaces will be indexed.

Index everything except specific spaces: Leave space inclusion filters empty, and define space exclusion filters for the spaces that should not be indexed.

Index a space but exclude specific pages: Define a space inclusion filter for the space, then add page exclusion filters for the specific page titles that should be skipped.

Index specific pages across the entire site: Leave space inclusion filters empty (to allow all spaces), and define page inclusion filters for the specific page titles to index. Note that this approach requires the page titles to be unique across spaces.
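
The four scenarios above can also be written down as data, which makes them easier to review or keep in version control. In the sketch below, the field names `objectType`, `inclusionFilters`, and `exclusionFilters` are assumptions modeled on the Bedrock data source API and should be verified before use, as should whether the service treats each value as a literal string or a pattern. The space keys "OPS" and "HR" are hypothetical.

```python
def pattern_filters(space_include=(), space_exclude=(), page_include=(), page_exclude=()):
    """Build a list of per-object-type filter entries, skipping empty ones."""
    filters = []
    for object_type, include, exclude in (
        ("Space", space_include, space_exclude),
        ("Page", page_include, page_exclude),
    ):
        if include or exclude:
            entry = {"objectType": object_type}
            if include:
                entry["inclusionFilters"] = list(include)
            if exclude:
                entry["exclusionFilters"] = list(exclude)
            filters.append(entry)
    return filters


# One entry per scenario described above; all keys and titles are examples.
SCENARIOS = {
    "specific spaces only": pattern_filters(space_include=["ENG", "OPS"]),
    "everything except some spaces": pattern_filters(space_exclude=["HR"]),
    "one space minus some pages": pattern_filters(
        space_include=["ENG"], page_exclude=["Q4 Planning Document"]
    ),
    "specific pages site-wide": pattern_filters(page_include=["Q4 Planning Document"]),
}
```

Leaving a list empty mirrors leaving the corresponding filter field blank in the console.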

SurePath AI setup

Create a new connector

Before creating the data source in SurePath AI, admins need to ensure an AWS connector is configured. If an AWS connector already exists for private models or S3 buckets, that same connector can be used for Knowledge Bases.

For instructions on creating an AWS connector, see:

https://help.surepath.ai/en/articles/10023397-setup-private-genai-models#h_25411a9478

Create the data source

Once the AWS connector is in place, admins can create a data source that references the Knowledge Base. For detailed instructions on creating and configuring data sources in SurePath AI, including how to control user access, see:

https://help.surepath.ai/en/articles/10439776-setup-data-sources
