Something doesn’t quite work right: after splitting, I see words throughout my text that have been broken apart by a space, turning each one into two non-words. The breaks are many characters apart, so they aren’t frequent, but in a large body of text they add up. I am concerned about the detrimental impact on the vector embeddings and, in turn, on retrieval.
Splitting is far from perfect. Hopefully more efficient techniques will be developed.
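One possible workaround, sketched under the assumption that the broken words come from a splitter cutting at a fixed character count: build each chunk word by word so a boundary can only fall on whitespace. The function name `split_on_word_boundaries` and the `chunk_size` parameter are hypothetical and not taken from any particular library.

```python
def split_on_word_boundaries(text: str, chunk_size: int = 200) -> list[str]:
    """Split text into chunks of roughly chunk_size characters without cutting a word.

    Assumption: whitespace may be normalized (runs of spaces/newlines collapse
    to a single space). A single word longer than chunk_size is kept whole,
    so that chunk can exceed the limit.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Close the current chunk if adding this word (plus one space) would overflow it.
        if current and len(current) + 1 + len(word) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "vector embeddings depend on intact words"
    for chunk in split_on_word_boundaries(sample, chunk_size=20):
        print(repr(chunk))
```

This trades exact chunk lengths for intact words, so every token the embedding model sees is a real word from the source. It does not add overlap between chunks, which would need to be layered on separately.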