DEV Community

Discussion on: Understanding LangChain's RecursiveCharacterTextSplitter

James Stover

Something doesn't quite work right: after splitting, I see words throughout my text broken apart by a space, turning each one into two non-words. They are spaced quite a few characters apart, so it isn't frequent, but in a large body of text these add up. I am concerned about the detrimental impact on the vector embeddings and, in turn, on retrieval.
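If the broken words show up where one chunk ends and the next begins, they can be located by finding each chunk in the source text and flagging every boundary that lands between two word characters. The following is a minimal sketch in plain Python. It assumes each chunk is a verbatim, in-order substring of the original text (overlap allowed), and it only inspects chunk edges, not spaces inside a chunk. The function name `mid_word_cuts` is hypothetical and not part of LangChain. The naive fixed-width chunks in the usage lines simply stand in for a splitter's output.

```python
import re


def mid_word_cuts(text: str, chunks: list[str]) -> list[tuple[str, str]]:
    """Return (left, right) word fragments for each chunk edge that falls inside a word.

    Hypothetical helper. Assumes every chunk is a verbatim substring of `text`,
    appearing in order; chunks that cannot be found are skipped.
    """
    edges = set()  # a set, so an edge shared by two adjacent chunks is counted once
    search_from = 0
    for chunk in chunks:
        start = text.find(chunk, search_from)
        if start == -1:
            continue  # chunk was altered (e.g. whitespace changed), so skip it
        edges.update((start, start + len(chunk)))
        search_from = start + 1  # allows overlapping chunks
    cuts = []
    for edge in sorted(edges):
        # Mid-word: the characters on both sides of the edge are alphanumeric.
        if 0 < edge < len(text) and text[edge - 1].isalnum() and text[edge].isalnum():
            left = re.search(r"\w+$", text[:edge]).group()
            right = re.match(r"\w+", text[edge:]).group()
            cuts.append((left, right))
    return cuts


# Usage: 8-character fixed-width chunks, standing in for a splitter's output.
sample = "The quick brown fox jumps"
naive_chunks = [sample[i:i + 8] for i in range(0, len(sample), 8)]
print(mid_word_cuts(sample, naive_chunks))  # [('quic', 'k'), ('jump', 's')]
```

The list returned by a splitter's `split_text(text)` could be passed in place of `naive_chunks`, as long as the assumption above holds. An empty result would suggest that the split words come from somewhere other than the chunk boundaries.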

Youdiowei Eteimorde

Splitting is far from perfect. Hopefully more efficient techniques will be developed.