Automatic Chapter Detection Using AI and Python (2023)

Breaking down your video or audio content into logical chapters utilizing AI


Most good books open with a table of contents describing the different chapters and what the reader can expect from the content ahead. It gives the reader a quick overview of all the subjects covered and lets them jump straight to the section they find most engaging.

Viewers and listeners would love to have similar information when watching a video or listening to an audiobook. Automatic chapter detection is one of the best ways to obtain the names of the important intervals in video or audio data, along with a brief summary of each, giving the user a better understanding of the content.

Before diving into this article, I would recommend getting familiar with developing your own optimized speech-to-text application and with using Artificial Intelligence for real-time speech recognition in Python. You can check out the former from this link and the latter from the following link.

For this project, you will need the video or audio content on which automatic chapter detection is to be performed. This file can be downloaded from the internet or be something you recorded yourself. Place it in the current working directory so that we can perform the desired actions easily. Let us get started by importing the essential libraries and loading the necessary parameters.

Importing the essential libraries and loading the required parameters:

In the first step, let us import all the essential libraries and briefly discuss their use cases. The json library is imported to help us deal with the JSON data that we will utilize for the majority of this project. The pprint (pretty print) import gives us more visually appealing print statements so that the displayed content is easier to read. The requests library helps us connect to the various URLs we need, such as the AssemblyAI endpoints that require our API key. The config import refers to a Python file that we create to store the API key.

import json
from pprint import pprint
import requests
from config import API_Key

For performing a high-quality automatic chapter analysis, it is best to make use of the AssemblyAI platform. To continue with the rest of the coding section, you can easily create an account on the AssemblyAI platform and retrieve your free API key, which we will utilize for this project. The key is found on the right side of the screen once you log in. Place it in a new “config.py” file, as shown in the code snippet below.

API_Key = "Your Free API Key"

Once we are done with the imports, we can declare some of the essential variables that we will require for this project. The transcript endpoint and upload endpoint variables help us establish a connection with the AssemblyAI platform for uploading our data and receiving the detected chapters. We also specify the headers carrying the API key for authorization and the JSON content type. The chunk size allows us to upload the file in small chunks. The code block below contains all the necessary parameters.

transcript_endpoint = "https://api.assemblyai.com/v2/transcript"
upload_endpoint = 'https://api.assemblyai.com/v2/upload'
headers_auth_only = {'authorization': API_Key}
headers = {
    "authorization": API_Key,
    "content-type": "application/json"
}
CHUNK_SIZE = 5242880

In the next few sections, we will focus on creating three different functions that will enable us to upload, transcribe, and poll the audio or video content. Let us explore each of them accordingly.

Creating the upload function:

The first function we will create is the upload function, which helps us upload the saved audio or video content to the AssemblyAI platform so that the automatic chapter detection analysis can be performed on it. The upload function contains an inner function that reads the data in chunks of the size specified earlier.

Once we finish defining the function to read the data, we can upload the file to the upload endpoint defined previously. We will make use of the requests library: we post the file to the upload endpoint URL, specifying the authorization headers and the generator function that streams our data to the AssemblyAI platform in the specified chunk size. The code block to perform this action is shown below.

def upload(filename):
    def read_file(filename):
        with open(filename, 'rb') as _file:
            while True:
                data = _file.read(CHUNK_SIZE)
                if not data:
                    break
                yield data

    # upload audio file to AssemblyAI
    upload_response = requests.post(
        upload_endpoint,
        headers=headers_auth_only,
        data=read_file(filename)
    )
    pprint(upload_response.json())
    return upload_response.json()['upload_url']

You can run the program by calling the above function. You can either assign a variable to hold the returned upload URL or run the function once and hard-code the upload URL printed by pprint, which we will utilize for transcribing the data. Let us explore this further in the next section.
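For instance, either of the following approaches works; the file name is the sample one used later in the main method:

# Option 1: keep the returned upload URL in a variable
url = upload("TDS.mp4")

# Option 2: run upload() once, copy the 'upload_url' value printed by pprint,
# and hard-code it for subsequent runs
# url = "https://cdn.assemblyai.com/upload/..."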

Creating the transcribe function:

In the next step, we will define the function to transcribe the audio or video file that was previously uploaded. We send another request, this time to the transcript endpoint URL, containing a transcript request in which the upload URL is specified and the auto chapters option is set to True. We receive a JSON response that we print and return accordingly. The code snippet below shows how to perform the required actions.

def transcribe(audio_url, auto_chapters=False):
    # start the transcription of the audio file
    transcript_request = {
        'audio_url': audio_url,
        # pass the flag as a boolean so it serializes correctly in the JSON request
        'auto_chapters': auto_chapters
    }
    transcript_response = requests.post(transcript_endpoint, json=transcript_request, headers=headers)
    pprint(transcript_response.json())
    return transcript_response.json()['id']

When we run this function, the pretty print statement displays the full response received from AssemblyAI. The main parameter we are concerned with is the transcript ID, which will help us retrieve the essential information. You can either store this value or hard-code the printed ID, as we did for the previous upload function.
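As before, you can either keep the returned ID in a variable or run the function once and hard-code the printed value:

# Option 1: keep the returned transcript ID in a variable
transcript_id = transcribe(url, auto_chapters=True)

# Option 2: run transcribe() once, copy the 'id' value printed by pprint,
# and hard-code it for subsequent runs
# transcript_id = "<your transcript id>"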

Creating the polling function:

Finally, we need to retrieve the information, which is the response obtained from the transcription. In this function, we define the polling endpoint, which is a combination of the transcript endpoint and the transcript ID, and use a GET request to retrieve the information. Once the polling response status is completed, we create a text file and a JSON file: the text file stores the full transcript of the audio, while the JSON file stores all the chapters that the AI detected in the audio file. The function checks the status only once, so it may need to be run again after the transcription has finished; a small sketch of waiting in a loop instead follows after the function.

def poll(transcript_id):
    # retrieve the transcription results from AssemblyAI
    polling_endpoint = transcript_endpoint + "/" + transcript_id
    polling_response = requests.get(polling_endpoint, headers=headers)
    if polling_response.json()['status'] == "completed":
        # save the full transcript text
        filename = transcript_id + '.txt'
        with open(filename, 'w') as f:
            f.write(polling_response.json()['text'])
        # save the automatically detected chapters as JSON
        filename = transcript_id + '_chapter.json'
        with open(filename, 'w') as f:
            chapters = polling_response.json()['chapters']
            json.dump(chapters, f, indent=4)
        print("Transcript Saved")
    else:
        # transcription is still in progress; report the current status
        print(polling_response.json()['status'])
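Since the function above checks the status only once, you may have to re-run it until the transcription finishes processing. A minimal sketch of waiting in a loop instead, assuming the standard time module for the delay, could look like this (poll_until_complete is just an illustrative helper name):

import time

def poll_until_complete(transcript_id, delay=30):
    # repeatedly query the transcript status until AssemblyAI finishes processing
    polling_endpoint = transcript_endpoint + "/" + transcript_id
    while True:
        status = requests.get(polling_endpoint, headers=headers).json()['status']
        print("Current status:", status)
        if status in ("completed", "error"):
            break
        time.sleep(delay)
    # once processing is done, save the transcript and chapters with poll()
    poll(transcript_id)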

After the poll function runs successfully with a completed status, you should find the two files described above in your working directory. You can view the information for your respective audio file and notice that the AI does a pretty good job at figuring out the desired summaries and chapters.
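As a quick check, you can load the saved chapters file and print each detected chapter. The file name below is the one produced by the sample transcript ID used in the main method, and the field names (headline, summary, start, and end in milliseconds) are the ones AssemblyAI returned at the time of writing; adjust them if the response format differs.

# print the automatically detected chapters from the saved JSON file
with open('orw1x52bk1-b592-42e3-ba01-2fca9bb8b078_chapter.json') as f:
    chapters = json.load(f)

for chapter in chapters:
    start_min = chapter['start'] / 60000   # timestamps are in milliseconds
    end_min = chapter['end'] / 60000
    print(f"{start_min:.1f}-{end_min:.1f} min: {chapter['headline']}")
    print(chapter['summary'])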

Calling the main method:

We have already discussed most of the steps that are required in the main method. First, we declare the filename variable that holds the name of the file in our working directory. We upload this file and store the obtained link in the url variable.

This link can now be passed to the transcribe function, where the audio data is transcribed accordingly. The ID obtained can then be passed to the final poll function, which saves the text and JSON files containing the transcript and the automatic chapter detection information.

if __name__ == '__main__':
    filename = "TDS.mp4"

    # Upload the file once, then hard-code the URL from the received value
    # upload(filename)
    url = 'https://cdn.assemblyai.com/upload/dd4e13e5-4001-471e-b0f5-e3fcbd27d8c3'

    # Transcribe the uploaded file
    # transcript_id = transcribe(url, auto_chapters=True)
    transcript_id = 'orw1x52bk1-b592-42e3-ba01-2fca9bb8b078'

    poll(transcript_id)

Once we have finished going through all these steps, the automatic chapter detection project should be successfully complete. Let us have a final look at the complete working Python script for this project in the next section.

Complete Code:

Now that we have completed the construction of the entire project, we can have a look at the full working code below.
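The script below simply assembles the snippets from the previous sections into a single file; the workflow mirrors the main method shown earlier, so replace the sample file name with your own content and give AssemblyAI some time to finish processing before the results are saved.

import json
from pprint import pprint
import requests
from config import API_Key

transcript_endpoint = "https://api.assemblyai.com/v2/transcript"
upload_endpoint = 'https://api.assemblyai.com/v2/upload'
headers_auth_only = {'authorization': API_Key}
headers = {
    "authorization": API_Key,
    "content-type": "application/json"
}
CHUNK_SIZE = 5242880


def upload(filename):
    # read the local file in chunks so that large files can be streamed
    def read_file(filename):
        with open(filename, 'rb') as _file:
            while True:
                data = _file.read(CHUNK_SIZE)
                if not data:
                    break
                yield data

    # upload the audio/video file to AssemblyAI
    upload_response = requests.post(
        upload_endpoint,
        headers=headers_auth_only,
        data=read_file(filename)
    )
    pprint(upload_response.json())
    return upload_response.json()['upload_url']


def transcribe(audio_url, auto_chapters=False):
    # start the transcription with automatic chapter detection enabled
    transcript_request = {
        'audio_url': audio_url,
        'auto_chapters': auto_chapters
    }
    transcript_response = requests.post(transcript_endpoint, json=transcript_request, headers=headers)
    pprint(transcript_response.json())
    return transcript_response.json()['id']


def poll(transcript_id):
    # retrieve the transcription results and save them once completed
    polling_endpoint = transcript_endpoint + "/" + transcript_id
    polling_response = requests.get(polling_endpoint, headers=headers)
    if polling_response.json()['status'] == "completed":
        with open(transcript_id + '.txt', 'w') as f:
            f.write(polling_response.json()['text'])
        with open(transcript_id + '_chapter.json', 'w') as f:
            json.dump(polling_response.json()['chapters'], f, indent=4)
        print("Transcript Saved")
    else:
        print(polling_response.json()['status'])


if __name__ == '__main__':
    filename = "TDS.mp4"
    url = upload(filename)
    transcript_id = transcribe(url, auto_chapters=True)
    # transcription takes a while; re-run poll() (or use a waiting loop)
    # until the status is "completed"
    poll(transcript_id)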

If you are looking for a video guide for the automatic chapter detection project, I would highly recommend checking out the following link, where the project is covered in great detail, including most of the requirements to construct it successfully.


The emergence of modern technologies in artificial intelligence helps us to accomplish tasks that were once deemed next to impossible for machines to achieve. The progression of AI in the field of natural language processing helps us to perform tasks such as speech-to-text transcription, sentiment analysis, automatic chapter detection, and many other similar tasks with high precision and ease.

In this article, we understood how to construct the automatic chapter detection project. We looked over the necessary library imports and the required parameters, and described the three primary functions for this project: one to upload the audio or video file, another to transcribe the required information, and, finally, one to poll for and save the received results.

If you want to get notified about my articles as soon as they go up, check out the following link to subscribe for email recommendations. If you wish to support other authors and me, then subscribe to the below link.

Check out some of my other articles in relation to the topic covered in this piece that you might also enjoy reading!

Thank you all for sticking on till the end. I hope all of you enjoyed reading the article. Wish you all a wonderful day!
