Overview
An Amazon Bedrock Knowledge Base does not automatically refresh information from its data sources. If you want the information refreshed without manually clicking the Sync button, you need to set up some form of automation.
High Level Steps
Have a Knowledge Base and Data Source already created and working as expected
Create an AWS Lambda Function
Create an AWS EventBridge schedule
Never think about it again.
Prerequisites
You will need the following to complete your configuration.
Admin rights to AWS
Knowledge Base ID (for the KB you want to refresh)
Data Source ID (for the data source of the KB you want to refresh)
AWS Lambda Function
Create the Lambda
This function reads its values from the incoming event, so a single function can handle multiple KBs and Data Sources. You pass in the appropriate values when you create the EventBridge schedule later.
Log in to AWS and go to the Lambda screen while in the proper region
Select Functions > Create function
Enter a Function name
Change the Runtime to Python 3.xx
Select Create function
Replace the existing code in lambda_function.py with the following, then select Deploy to save it
import json
import logging

import boto3

# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize AWS client for Amazon Bedrock Knowledge Bases
bedrock_agent = boto3.client('bedrock-agent')


def lambda_handler(event, context):
    logger.info(f"Received event: {json.dumps(event)}")
    try:
        # Extract KB ID and data source ID from the event
        # These are passed in the payload of the EventBridge schedule
        kb_id = event.get('kb_id')
        data_source_id = event.get('data_source_id')

        # Validate the required parameters
        if not kb_id or not data_source_id:
            error_msg = "Missing required parameters. Both 'kb_id' and 'data_source_id' are required."
            logger.error(error_msg)
            return {
                'statusCode': 400,
                'body': json.dumps({'error': error_msg})
            }

        # Start the data source refresh
        response = bedrock_agent.start_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id
        )
        logger.info(f"Started ingestion job: {response}")

        # The job details are nested under the 'ingestionJob' key of the response
        ingestion_job = response.get('ingestionJob', {})
        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'Data source refresh started successfully',
                'ingestionJobId': ingestion_job.get('ingestionJobId'),
                'knowledgeBaseId': kb_id,
                'dataSourceId': data_source_id
            })
        }
    except Exception as e:
        logger.error(f"Error refreshing data source: {str(e)}")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }
Update the Permissions
Select Configuration > Permissions
Open the Role name by clicking it
Select Add permissions > Create inline policy
Select JSON
Paste the following into the Policy editor (replacing everything else) to allow the Lambda to refresh the data source. Replace the Region and account ID in the ARN with your own.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:StartIngestionJob"],
            "Resource": ["arn:aws:bedrock:us-east-1:905418109363:knowledge-base/*"]
        }
    ]
}
Scroll down and click Next
Give it a name and click Create policy
The Role should be updated and shown on the screen
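If you would rather limit the policy to a single Knowledge Base instead of the wildcard, the policy document can be generated along these lines. This is a minimal sketch: the function name build_ingestion_policy and the Region, account ID, and KB ID shown are placeholders, not values from this guide.

```python
import json


def build_ingestion_policy(region, account_id, kb_id="*"):
    """Return an IAM policy document that allows StartIngestionJob.

    kb_id defaults to "*" (all Knowledge Bases in the account and Region),
    matching the policy above; pass a specific ID to narrow the scope.
    """
    resource = f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["bedrock:StartIngestionJob"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own Region, account ID, and KB ID.
    policy = build_ingestion_policy("us-east-1", "123456789012", "EXAMPLEKBID")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the same Policy editor. Keep the wildcard if one Lambda has to refresh several Knowledge Bases.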
Create an AWS EventBridge Schedule
An EventBridge schedule lets you both run the Lambda on a schedule and send it the values for the KB and Data Source. You can create multiple schedules to refresh all of your KBs' data sources.
Log in to AWS and go to the EventBridge screen while in the proper region
In Getting started select EventBridge Schedule > Create schedule
Create a name and description for the schedule
Select Recurring schedule
Enter the schedule into the cron fields. The example used here runs every morning at 1:10 AM Pacific time, every day of the year.
Please note the "?" in Day of the week. EventBridge requires a "?" in either Day of the month or Day of the week.
Please don't set up a frequency under 24 hours without considering how long it might take to ingest and refresh the information.
Scroll down
Use the Next 10 trigger dates to verify that the schedule matches the refresh times you wanted.
Adjust the cron times if needed
Select an option in the Flexible time window
I have 5 minutes selected
Click Next when ready
Select AWS Lambda Invoke
Select the Lambda function in the dropdown that you made in the previous steps
Scroll down to the Payload field and enter a JSON object with the two keys the Lambda reads from the event: kb_id and data_source_id.
Be sure to set the two values to your Knowledge Base ID and your Data Source ID.
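Based on the keys the Lambda handler reads (kb_id and data_source_id), the payload could look like the output of this sketch. The helper name build_payload and the two ID strings are placeholders to be replaced with your own values.

```python
import json


def build_payload(kb_id, data_source_id):
    """Return the JSON payload the Lambda handler expects in its event."""
    return json.dumps({"kb_id": kb_id, "data_source_id": data_source_id}, indent=2)


# Placeholder IDs; replace with your Knowledge Base ID and Data Source ID.
print(build_payload("YOUR_KB_ID", "YOUR_DATA_SOURCE_ID"))
```

Paste the printed JSON into the Payload field. A schedule for a different KB or Data Source only needs different values in the same two keys.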
Scroll down and click Next
On the next screen click Next
On the final screen validate the Schedule details and Target details
When ready click Create schedule
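For reference, the console steps above correspond roughly to a single create_schedule call on the boto3 scheduler client. This is only a sketch under several assumptions: the schedule name and both ARNs are hypothetical, the time zone is assumed to be set to America/Los_Angeles so that the cron hours are local Pacific time, and role_arn must be an execution role that EventBridge Scheduler is allowed to assume to invoke the Lambda (the console offers to create one on the settings screen).

```python
import json


def build_schedule_request(name, lambda_arn, role_arn, kb_id, data_source_id):
    """Build the keyword arguments for the scheduler client's create_schedule."""
    return {
        "Name": name,
        # Minute 10, hour 1, every day; "?" in the day-of-week position.
        "ScheduleExpression": "cron(10 1 * * ? *)",
        # Assumption: hours are interpreted in Pacific time.
        "ScheduleExpressionTimezone": "America/Los_Angeles",
        # Matches the 5 minute flexible time window chosen in the console.
        "FlexibleTimeWindow": {"Mode": "FLEXIBLE", "MaximumWindowInMinutes": 5},
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,
            # The same payload entered in the Payload field.
            "Input": json.dumps({"kb_id": kb_id, "data_source_id": data_source_id}),
        },
    }


def create_schedule(request):
    """Send the request to EventBridge Scheduler (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the request can be built without boto3 installed

    return boto3.client("scheduler").create_schedule(**request)


# Example with hypothetical names and ARNs:
# request = build_schedule_request(
#     "kb-refresh-daily",
#     "arn:aws:lambda:us-east-1:123456789012:function:kb-refresh",
#     "arn:aws:iam::123456789012:role/kb-refresh-scheduler-role",
#     "YOUR_KB_ID",
#     "YOUR_DATA_SOURCE_ID",
# )
# create_schedule(request)
```

Building the request separately from sending it makes it easy to create one schedule per KB and Data Source pair by changing only the name and the two IDs.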
Be sure to check the Knowledge Base and the Data Source after the next expected run to confirm that it completed successfully.
To avoid waiting until the next day, you can temporarily set the schedule to trigger a few minutes from when you are doing this work.
This lets you confirm right away that the timing and permissions are all as expected.
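The check after a run can also be done from a script instead of the console. The sketch below asks the bedrock-agent client for the most recently started ingestion job of a data source. The helper name latest_ingestion_job and the IDs in the usage comment are placeholders, and the script needs credentials that allow bedrock:ListIngestionJobs, which the inline policy above does not grant.

```python
def latest_ingestion_job(client, kb_id, data_source_id):
    """Return the summary of the most recently started ingestion job, or None."""
    response = client.list_ingestion_jobs(
        knowledgeBaseId=kb_id,
        dataSourceId=data_source_id,
        # Newest job first, and only one result is needed.
        sortBy={"attribute": "STARTED_AT", "order": "DESCENDING"},
        maxResults=1,
    )
    jobs = response.get("ingestionJobSummaries", [])
    return jobs[0] if jobs else None


# Example usage (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("bedrock-agent")
# job = latest_ingestion_job(client, "YOUR_KB_ID", "YOUR_DATA_SOURCE_ID")
# if job is None:
#     print("No ingestion jobs found")
# else:
#     print(job["ingestionJobId"], job["status"], job["startedAt"])
```

A status of COMPLETE with a start time close to the scheduled time suggests the schedule, payload, and permissions are working. A status of FAILED, or no recent job at all, is the cue to look at the Lambda logs.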