Asking for review on non-string regular expressions


I have an idea that I want to push forward: "non-string regular expressions". I explained it as best I could in the repo.

Xowap / nsre

Non-String Regular Expressions

Regular expressions are used to match strings of characters; however, the concept can be applied to anything else. This engine allows you to match any list of objects of any type, using the same kind of constructs that regular expressions offer.

The algorithm is (loosely) based on this article by Russ Cox, i.e. it uses the Thompson NFA algorithm (because it's apparently more efficient, but mostly because it's the first explanation of an RE engine that I understood).

The package doesn't support (yet?) the regular expression syntax that everybody is used to (because it allows you to do different things).
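To give a rough idea of the Thompson-style approach mentioned above, here is a minimal sketch of an NFA that consumes arbitrary objects instead of characters. It is not nsre's actual code or API: every name in it (State, Frag, sym, seq, maybe, star, plus, matches, key_is, build_pattern) is made up for illustration. The only idea it tries to show is that each transition is guarded by a predicate on an item, and that the engine tracks a set of live states rather than backtracking.

```python
from typing import Any, Callable, Iterable, List, Optional, Set

Predicate = Callable[[Any], bool]


class State:
    """NFA node. With a predicate it consumes one item; without, it is an epsilon node."""

    def __init__(self, pred: Optional[Predicate] = None) -> None:
        self.pred = pred
        self.out: List["State"] = []


class Frag:
    """NFA fragment with a single entry and a single exit state."""

    def __init__(self, start: State, end: State) -> None:
        self.start, self.end = start, end


# Note: these combinators wire states in place, so a fragment must not be reused.
def sym(pred: Predicate) -> Frag:
    start, end = State(pred), State()
    start.out.append(end)
    return Frag(start, end)


def seq(a: Frag, b: Frag) -> Frag:
    a.end.out.append(b.start)
    return Frag(a.start, b.end)


def maybe(a: Frag) -> Frag:  # zero or one
    start, end = State(), State()
    start.out += [a.start, end]
    a.end.out.append(end)
    return Frag(start, end)


def star(a: Frag) -> Frag:  # zero or more
    start, end = State(), State()
    start.out += [a.start, end]
    a.end.out += [a.start, end]
    return Frag(start, end)


def plus(a: Frag) -> Frag:  # one or more
    end = State()
    a.end.out += [a.start, end]
    return Frag(a.start, end)


def closure(states: Iterable[State]) -> Set[State]:
    """Follow epsilon transitions until no new state is reachable."""
    seen: Set[State] = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            if s.pred is None:
                stack.extend(s.out)
    return seen


def matches(frag: Frag, items: Iterable[Any]) -> bool:
    """Full match: advance the whole set of live states one item at a time."""
    current = closure([frag.start])
    for item in items:
        step = [t for s in current if s.pred is not None and s.pred(item) for t in s.out]
        current = closure(step)
        if not current:
            return False
    return frag.end in current


def key_is(key: str, value: Any) -> Predicate:
    return lambda d: d.get(key) == value


def build_pattern() -> Frag:
    # (image caption?)* text+
    image_block = seq(sym(key_is("type", "image")), maybe(sym(key_is("type", "caption"))))
    return seq(star(image_block), plus(sym(key_is("type", "text"))))


items = [
    {"type": "image", "url": "https://img1.jpg"},
    {"type": "image", "url": "https://img2.jpg"},
    {"type": "caption", "text": "Image 2"},
    {"type": "text", "text": "Hello"},
]
result = matches(build_pattern(), items)
```

Because the simulation keeps every reachable state alive at once, the work per item is bounded by the number of states in the pattern, which is the efficiency argument usually made for this family of algorithms.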

Note — The current implementation is a pile of crap because I have no idea what I'm doing

Installation

pip install nsre

Then, from your project, you can:

from nsre import *

Concept demo

For example, suppose that you have a list of dictionaries with a…

I'm looking for all kinds of feedback:

  • Do you understand what this is?
  • Do you see applications for this?
  • Does the API look nice?
  • What features would you want to see around that?
  • What would you want before using this in production?

Thanks!

DISCUSS (3)

Maybe I fail to understand, but how is this:

from nsre import *

re = AnyNumber(
    Symbol(KeyHasValue("type", "image")) + Maybe(KeyHasValue("type", "caption"))
) + Range(KeyHasValue("type", "text"), min=1)

assert re.match(
    [
        {"type": "image", "url": "https://img1.jpg"},
        {"type": "image", "url": "https://img2.jpg"},
        {"type": "image", "url": "https://img3.jpg"},
        {"type": "caption", "text": "Image 3"},
        {"type": "image", "url": "https://img4.jpg"},
        {"type": "caption", "text": "Image 4"},
        {"type": "image", "url": "https://img5.jpg"},
        {"type": "text", "text": "Hello"},
        {"type": "text", "text": "Foo"},
        {"type": "text", "text": "Bar"},
    ]
)

Better than this:

[
  {"type": "image", "url": "https://img1.jpg"},
  {"type": "image", "url": "https://img2.jpg"},
  {"type": "image", "url": "https://img3.jpg"},
  {"type": "caption", "text": "Image 3"},
  {"type": "image", "url": "https://img4.jpg"},
  {"type": "caption", "text": "Image 4"},
  {"type": "image", "url": "https://img5.jpg"},
  {"type": "text", "text": "Hello"},
  {"type": "text", "text": "Foo"},
  {"type": "text", "text": "Bar"},
]
  .filter(x => (x.type === "image") || (x.type === "caption"))
  .filter(x => x.text)
  .map(x => x.text.length)
 

I had a brief look at it, and whilst I'm predominantly a Java developer, I have done some Python scripting in the past so I can look at the code and feel comfortable with it.

I'll try and answer your questions:

Do I understand what this is?

I think so - I believe it's a way to search for a value in complicated objects, and those values and/or objects may or may not be strings.

It feels like it is making Regex more human readable.

Do I see applications for this?

Kind of - I saw right at the bottom that the performance was described as "terrible", so it's not something I'd happily use in a production environment. Still, I can imagine it being excellent for searching for a piece of data in a complex JSON or XML object, and that would be super dandy if I say so myself - but only if the performance is decent.

Does the API look nice?

This is where being predominantly a Java developer will probably fail me. It actually reminds me a lot of old-school Java, in that it's very verbose to express something simple. What I imagine would be nice is something like

re.on(datatype).match(expression), with a limited set of expressions available for each datatype. But I expect that would take longer to code, and maintaining a codebase like that would be hell.

But then again I'm not an expert in Python

What features would I want to see around that?

Mainly efficient, easy-to-understand regex formatting with various data types, like JSON, XML, CSS, perhaps even just a massive String which represents a text file.

What would I want before using this in production?

Mostly speed, to be honest, and possibly support for the above data types? But that might be a stretch. The API is a nice-to-have, but I'd rather not impose that on any Python developer, as someone who is interested in the code but isn't an expert in the language.


Honestly, I think what you made is cool, even if it is in a language that I don't use! 😂 Oh well. It's a good start and I think it definitely has promise!

Keep up the good work! 👍

 

The power of regular expressions, in my opinion, is being able to encapsulate complex logic in a few characters with a standardized syntax. Being able to do the same type of thing on an array of JavaScript objects might open up some options, but the stream operations (map, reduce, etc.) are already powerful and flexible, so this would need to provide something different, cleaner, simpler, or more concise. However, I do understand what it is doing, and sometimes it's good to build tools and then see what unforeseen things they can do once you have them to mess around with.


Rémy 🤖 profile image
