DEV Community

Discussion on: How I can extract words from strings using regular expressions?

Mihail Malo • Edited

One possible regex that produces these matches is:

const re = /(?:[()]|AND|OR|"[^"]*":\s*"[^"]*")/g;

try it out here

However, regex is often brittle for parsing: it produces uninformative errors or, worse, silently skips or corrupts data.
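As a rough illustration of both points, here is a minimal sketch of the pattern applied with `String.prototype.match`. The query strings are made up for the example (modeled on the output posted further down), and the misspelled `ADN` is there only to show input being skipped without any error.

```javascript
const re = /(?:[()]|AND|OR|"[^"]*":\s*"[^"]*")/g;

// Hypothetical query string, not taken from the original question.
const query = '("type":"Nanoheal" AND "specialFields.address":"Jaysingpur")';

const tokens = query.match(re);
console.log(tokens);
// [ '(', '"type":"Nanoheal"', 'AND', '"specialFields.address":"Jaysingpur"', ')' ]

// The brittle part: text the pattern does not recognize is dropped silently.
const typo = '("type":"Nanoheal" ADN "specialFields.address":"Jaysingpur")';

const typoTokens = typo.match(re);
console.log(typoTokens);
// [ '(', '"type":"Nanoheal"', '"specialFields.address":"Jaysingpur"', ')' ]
```

The second result has lost the operator entirely, so whatever consumes the tokens cannot tell that the input was malformed.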

Prathamesh More Author

Let me check. Thanks, man.

Prathamesh More Author

Thanks, man. It works. You saved my job.

Mihail Malo

What language are you using? JavaScript?
Are you on the Node.js runtime?
Can you use libraries from npm?

Prathamesh More Author • Edited

Yes, I am using JS and Node.js.

Yes, I can use libraries from npm. Can you suggest one?

I am a junior developer. I joined one month ago. You saved me.

Thanks, man.

Prathamesh More Author

Look at the output now:

Operands [
  '(',
  '"type":"Nanoheal"',
  'AND',
  '(',
  '"specialFields.email":"prathameshmore@gmail.com"',
  'OR',
  '"specialFields.address":"Jaysingpur"',
  ')'
]
Generate [
  '"specialFields.address":"Jaysingpur"',
  'OR',
  '"specialFields.email":"prathameshmore@gmail.com"'
]
Filter {"$or":[{"specialFields.address":"Jaysingpur"},{"specialFields.email":"prathameshmore@gmail.com"}]}
Prathamesh More Author

The Filter line is the final output.
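The code that turns the Generate list into the Filter object is not shown in the thread. One way it might look is sketched below, assuming a flat group of `"key":"value"` tokens joined by a single operator. `pairToObject` and `groupToFilter` are hypothetical names, and the `$and` mapping is an assumption made by analogy with the `$or` in the posted output.

```javascript
// Hypothetical helper: turns one '"key":"value"' token into { key: value }.
function pairToObject(token) {
  return JSON.parse(`{${token}}`);
}

// Hypothetical helper: converts one flat group (no parentheses) into a filter.
function groupToFilter(tokens) {
  const isOperator = (t) => t === 'AND' || t === 'OR';
  const operators = tokens.filter(isOperator);
  const operands = tokens.filter((t) => !isOperator(t));

  if (operands.length === 0) {
    throw new Error('Empty group');
  }
  if (operators.length === 0) {
    return pairToObject(operands[0]);
  }
  if (new Set(operators).size > 1) {
    throw new Error('Mixing AND and OR in one group needs parentheses');
  }

  // Assumed mapping: AND -> $and, OR -> $or.
  const key = operators[0] === 'AND' ? '$and' : '$or';
  return { [key]: operands.map(pairToObject) };
}

const filter = groupToFilter([
  '"specialFields.address":"Jaysingpur"',
  'OR',
  '"specialFields.email":"prathameshmore@gmail.com"',
]);
console.log(JSON.stringify(filter));
// {"$or":[{"specialFields.address":"Jaysingpur"},{"specialFields.email":"prathameshmore@gmail.com"}]}
```

Nested parentheses would need this applied from the innermost group outward, which the sketch does not attempt.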

Mihail Malo

Nah, you're good.

But to handle strings correctly, for example if escaped quotes such as "ty\"pe": "Pe\"rson" are allowed, you should look at using a proper tokenizer such as moo: github.com/no-context/moo#states (see how they match string escapes there!)
You can then consume the tokens from this tokenizer yourself, or feed them to a parser such as nearley: nearley.js.org/docs/tokenizers

I suggest you read the documentation for these two libraries later today when you have time. You will then be a head above most people at tasks where custom text formats need to be parsed.
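To give a feel for what a tokenizer buys you, here is a small dependency-free sketch. It is not moo's API, just a hand-rolled stand-in built on sticky regexes. The rule names and the `tokenize` function are made up for the example. It accepts backslash escapes inside strings and throws with an offset on input it does not recognize, instead of skipping it.

```javascript
// Token rules, tried in order at the current position (sticky "y" flag).
const rules = [
  ['ws', /\s+/y],
  ['lparen', /\(/y],
  ['rparen', /\)/y],
  ['colon', /:/y],
  ['keyword', /(?:AND|OR)\b/y],
  // A string may contain backslash escapes such as \" or \\.
  ['string', /"(?:\\.|[^"\\])*"/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of rules) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m) {
        // Whitespace is recognized but not emitted.
        if (type !== 'ws') tokens.push({ type, text: m[0], offset: pos });
        pos = re.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(
        `Unexpected character ${JSON.stringify(input[pos])} at offset ${pos}`
      );
    }
  }
  return tokens;
}

const escaped = tokenize('"ty\\"pe": "Pe\\"rson"');
console.log(escaped.map((t) => t.type));
// [ 'string', 'colon', 'string' ]
```

With this approach, a misspelled operator such as `ADN` raises an error that names the offending character and its position, which is the kind of feedback a global regex match cannot give.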

Prathamesh More Author

Thanks, Mihail, for your valuable time.

Prathamesh More Author • Edited

How can I modify the regex so that it supports >, <, <=, >=, = and !=?

E.g.

(("assetType": "Application" AND "assetType"> "Application" ) OR ("assetType": "AccessKey" OR "assetType": "Google"))
Prathamesh More Author

This is working:

(?:[()]|AND|OR|"[^"]*"\s*:*>*[<|>=|<=|!=|=]*\s*"[^"]*")
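One caveat with the pattern above: `[<|>=|<=|!=|=]` is a character class, so it matches any run of the characters `<`, `|`, `>`, `=` and `!` (including none at all) rather than the listed operators. A possibly tighter variant, offered as a sketch and not as the only way to do it, spells the operators out as alternatives and requires exactly one of them. The names `loose` and `strict` are just labels for this example.

```javascript
// The pattern as posted in the thread.
const loose = /(?:[()]|AND|OR|"[^"]*"\s*:*>*[<|>=|<=|!=|=]*\s*"[^"]*")/g;

// Sketch of a stricter variant: exactly one of <=, >=, !=, :, <, >, =.
// Two-character operators come first so that <= is not read as < followed by =.
const strict = /[()]|AND|OR|"[^"]*"\s*(?:<=|>=|!=|[:<>=])\s*"[^"]*"/g;

const query =
  '(("assetType": "Application" AND "assetType"> "Application" ) OR ' +
  '("assetType": "AccessKey" OR "assetType": "Google"))';

const tokens = query.match(strict);
console.log(tokens);

// Input that is not a valid comparison:
const bogus = '"a"|"b"';
const looseResult = bogus.match(loose);   // accepted as one token
const strictResult = bogus.match(strict); // null, nothing matches
console.log(looseResult, strictResult);
```

Both patterns still skip unrecognized text silently, so the earlier advice about using a tokenizer applies here as well.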