Project Idea
Recently, I built a Chat-GPT Web Clone with the same features as the official Chat-GPT website. However, copying an existing website exactly did not feel like much of a challenge, so I decided to innovate and add something new to my Chat-GPT Web Clone project.
While using the official Chat-GPT website, I noticed a few missing features that would have been useful. For example:
- File upload, so we can attach documents (right now GPT-4 can do this, but GPT-3.5 still can't).
- Speech recognition, so we can talk to the bot instead of typing on the keyboard.
That is why I decided to add these two features to my Chat-GPT Web Clone project.
Adding New Features
For the file upload feature, I used the npm package pdf-parse. For the speech recognition feature, I used react-speech-recognition, @speechly/speech-recognition-polyfill, and regenerator-runtime.
npm install pdf-parse react-speech-recognition @speechly/speech-recognition-polyfill regenerator-runtime
File Upload
First, to build the file upload, I created a route handler at /app/api/pdf-to-text/route.js. In the Next.js App Router, a route handler has to live in a file named route.js, and its folder path becomes the endpoint. This file is the API endpoint for parsing .pdf files. Inside the file, I placed the code for the API:
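import { NextResponse } from "next/server"
// Importing the inner module skips the debug code in pdf-parse's entry point
import pdf from 'pdf-parse/lib/pdf-parse'

export async function POST(request){
  try {
    const formData = await request.formData()
    const pdfFile = formData.get('pdfFile')
    // formData.get returns null for a missing field and a string for a non-file field
    if(!pdfFile || typeof pdfFile === 'string') {
      return NextResponse.json({ error: 'No PDF file was uploaded' }, { status: 400 })
    }
    // pdf-parse expects a Node.js Buffer, not an ArrayBuffer
    const buffer = Buffer.from(await pdfFile.arrayBuffer())
    const parsedPdf = await pdf(buffer)
    return NextResponse.json(parsedPdf)
  } catch (error) {
    return NextResponse.json({ error: error.message }, { status: 400 })
  }
}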
When a POST request reaches the /api/pdf-to-text endpoint, this code takes the submitted PDF file and parses it with the pdf-parse library. The response is a JSON object with the following properties:
{
  numpages: integer,
  numrender: integer,
  info: [Object],
  metadata: [Object],
  version: string,
  text: string,
}
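The route does not check the uploaded file's type or size, and it sends back every property listed above even though the client only reads text. A minimal sketch of two helpers that could tighten this is shown here. The function names and the 10 MB limit are assumptions made for illustration; they are not part of the original project.

```javascript
// Hypothetical helpers for the route handler. The names and the size limit
// are illustrative assumptions, not code from the original project.
const MAX_PDF_BYTES = 10 * 1024 * 1024 // assumed 10 MB cap

// Returns an error message for an unusable upload, or null when the file looks fine.
function validatePdfUpload(file) {
  // formData.get gives null for a missing field and a string for a non-file field
  if (!file || typeof file.arrayBuffer !== "function") return "No PDF file was uploaded."
  if (file.type !== "application/pdf") return "Only PDF files are supported."
  if (file.size > MAX_PDF_BYTES) return "The PDF file is too large."
  return null
}

// Keeps only the fields the client uses, so info and metadata are not sent back.
function toClientPayload(parsedPdf) {
  return { text: parsedPdf.text ?? "", numpages: parsedPdf.numpages ?? 0 }
}
```

Inside the POST handler, the result of validatePdfUpload(pdfFile) could be returned as a 400 response when it is not null, and toClientPayload(parsedPdf) could be passed to NextResponse.json instead of the full object.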
Next, I created a helper function in /app/lib/api.js that calls the API I just created.
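export const getParsedPdf = async(formData) => {
  try {
    const response = await fetch("/api/pdf-to-text",{
      method: "POST",
      body: formData
    })
    // The route answers with status 400 when the PDF could not be parsed
    if(!response.ok) return ""
    return await response.json()
  } catch (err) {
    // Network failures end up here; return the same empty value as a failed parse
    console.error(err)
    return ""
  }
}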
After that, inside my InputField component, I created a button whose click handler opens the file picker.
InputField.js
"use client"
...
import { AiFillFileText as TxtIcon } from "react-icons/ai"
...
export default function InputField({ name, value, setValue, placeholder, autoFocus = false, disabled = false, disableInput = false }){
  const uploadTxt = () => {
    // Create a throwaway file input so the native file picker can be opened from a button
    const fileInput = document.createElement('input')
    fileInput.type = 'file'
    fileInput.accept = '.txt, .pdf'
    fileInput.multiple = true
    fileInput.onchange = _ => {
      const files = Array.from(fileInput.files)
      files.forEach(async(file, index)=> {
        if(file.type === "application/pdf") {
          const formData = new FormData()
          formData.append("pdfFile", file)
          // Fall back to an empty string when parsing fails and no text comes back
          const parsedPdf = await getParsedPdf(formData)
          const extractedText = parsedPdf?.text ?? ""
          setValue(prevValue => `${prevValue}${index > 0 ? "\n" : ""}${extractedText.trim()}`)
        }
        else{
          const reader = new FileReader()
          reader.onload = () => {
            setValue(prevValue => `${prevValue}${index > 0 ? "\n" : ""}${reader.result.trim()}`)
          }
          reader.readAsText(file)
        }
      })
    }
    fileInput.click()
  }
  return(
    ...
    <button type="button" className="p-2 hover:bg-gray-300 dark:hover:bg-gray-900 rounded-md text-gray-600 dark:text-gray-400 transition-colors duration-200" onClick={uploadTxt}>
      <TxtIcon size={16}/>
    </button>
    ...
  )
}
In the code above, the uploadTxt function runs when the button is clicked. It opens the file picker and waits for a file to be selected. I set the accept attribute to .txt and .pdf only, so that every uploaded file can be parsed into raw text.
After receiving a file, it checks the file type. If the file is a PDF, it parses the file with the getParsedPdf function and reads the text property from the response. Otherwise, it reads the file with the FileReader method readAsText.
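One thing to keep in mind with this approach: because each file is handled in its own async callback inside forEach, a small text file can finish before a large PDF, so the text may be appended in a different order than the files were selected. A minimal sketch of one way to keep the order is to start all reads together and join the results once they are all done. The helper name readAllInOrder is hypothetical and does not come from the original project.

```javascript
// Hypothetical helper: `readers` is an array of functions, one per selected file,
// each returning a promise that resolves to that file's text.
async function readAllInOrder(readers) {
  // Promise.all keeps the results in input order, no matter which promise settles first
  const texts = await Promise.all(readers.map((read) => read()))
  return texts
    .map((text) => text.trim())
    .filter((text) => text.length > 0) // skip files that produced no text
    .join("\n")
}
```

In uploadTxt, each file could be mapped to a reader such as () => file.text() for text files, or to a function that calls getParsedPdf and returns the text property for PDFs. The joined string could then be passed to setValue once instead of once per file.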
Speech Recognition
Now for the speech recognition. I created a component called Dictaphone.js and added this code:
Dictaphone.js
"use client"
import { useEffect } from 'react'
import 'regenerator-runtime/runtime'
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'
import { createSpeechlySpeechRecognition } from '@speechly/speech-recognition-polyfill'
import { BsFillMicFill } from "react-icons/bs"

// Apply the Speechly polyfill only when an app ID is configured
const speechlyAppId = process.env.NEXT_PUBLIC_SPEECHLY_APP_ID
if(speechlyAppId){
  const SpeechlySpeechRecognition = createSpeechlySpeechRecognition(speechlyAppId)
  SpeechRecognition.applyPolyfill(SpeechlySpeechRecognition)
}

export default function Dictaphone({ setPreviewSpeech, handleSpeechEnd }){
  const {
    transcript,
    resetTranscript,
    listening,
    browserSupportsSpeechRecognition,
    isMicrophoneAvailable
  } = useSpeechRecognition()

  useEffect(() => {
    // Show the words recognized so far while the user is speaking
    if(transcript.length > 0){
      setPreviewSpeech(transcript)
    }
    // Once listening stops, hand over the result and clear the transcript
    if(!listening) {
      handleSpeechEnd()
      resetTranscript()
    }
  }, [listening, transcript]);

  const startListening = () => SpeechRecognition.startListening({ continuous: true, language: 'en-US' });

  if (!browserSupportsSpeechRecognition) return null

  return (
    <button
      type='button'
      disabled={!isMicrophoneAvailable} id='mic-btn'
      className={`${(listening && isMicrophoneAvailable) ? "bg-green text-white" : "bg-white dark:bg-gray-700 text-gray-600 dark:text-gray-400 hover:bg-gray-300 dark:hover:bg-gray-900 border-black/10 dark:border-white/10 disabled:bg-gray-200 dark:disabled:bg-gray-800"} disabled:text-gray-600/40 dark:disabled:text-gray-400/40 border p-3 shadow-md rounded-full transition-colors duration-200`}
      onTouchStart={startListening}
      onMouseDown={startListening}
      onTouchEnd={SpeechRecognition.stopListening}
      onMouseUp={SpeechRecognition.stopListening}
    >
      <BsFillMicFill/>
    </button>
  )
}
To use the Speechly polyfill, I created an environment variable called NEXT_PUBLIC_SPEECHLY_APP_ID, which holds the app ID you get from Speechly. Then, I followed the documentation provided by react-speech-recognition to use their package.
I connected the resulting transcript to the text input of the InputField component from earlier. While we hold down the Mic button, the Dictaphone listens continuously (it won't stop until the button is released). When the Dictaphone stops listening, we call resetTranscript to set the transcript back to an empty string.
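The parent component that receives setPreviewSpeech and handleSpeechEnd is not shown in this article, and the effect in Dictaphone calls handleSpeechEnd whenever listening is false, which includes the first render. As a rough sketch of how that wiring could look, the two helpers below detect the moment listening actually stops and merge the dictated text into the input value. Both function names are hypothetical and are not taken from the project.

```javascript
// Hypothetical helpers; neither name comes from the original project.

// True only on the transition from listening to not listening,
// so a handler guarded by it does not fire on the first render.
function didStopListening(wasListening, isListening) {
  return wasListening && !isListening
}

// Appends the dictated text to whatever is already typed in the input.
function commitSpeech(currentValue, previewSpeech) {
  const spoken = previewSpeech.trim()
  if (spoken.length === 0) return currentValue
  // Add a separating space unless the input is empty or already ends in whitespace
  const needsSpace = currentValue.length > 0 && !/\s$/.test(currentValue)
  return `${currentValue}${needsSpace ? " " : ""}${spoken}`
}
```

With helpers like these, handleSpeechEnd could call setValue(prev => commitSpeech(prev, previewSpeech)), and the previous listening value could be tracked with a ref to feed didStopListening.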
Challenges
Some of the challenges I had while making these features are:
- Problems related to the pdf-parse library. Because pdf-parse is an old and unmaintained library, there were some issues I had to handle before finally getting a working PDF-to-text function.
- The Web Speech API that react-speech-recognition relies on is not supported by most browsers. I solved this issue by implementing the Speechly polyfill, which adds support for browsers other than Chrome.
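To make the browser support point concrete, here is a minimal sketch of the decision the Dictaphone component effectively makes: use the Speechly polyfill when an app ID is configured, otherwise rely on the browser's own speech recognition if it exists. The function name and the returned labels are assumptions for illustration only. The win parameter stands in for the browser's window object so the logic can run outside a browser.

```javascript
// Hypothetical helper; the name and the return values are illustrative.
function chooseSpeechEngine(win, speechlyAppId) {
  // Mirror the component: a configured Speechly app ID means the polyfill is applied
  if (speechlyAppId) return "speechly-polyfill"
  // Browsers with native support expose SpeechRecognition or the prefixed webkitSpeechRecognition
  const hasNative = Boolean(win && (win.SpeechRecognition || win.webkitSpeechRecognition))
  return hasNative ? "native" : "unsupported"
}
```

In the "unsupported" case, the component's browserSupportsSpeechRecognition check is what hides the Mic button.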
Conclusion
And that is how I added the file upload and speech recognition features to my Chat-GPT Web Clone project. You can check out the full code on this GitHub repo or view the demo website by clicking this link.
Thank you for taking the time to read this article, and feel free to leave a comment if you have any questions or feedback.