Hello everyone, peace be upon you, and the mercy of God and His blessings.
Abstract Syntax Trees (ASTs) are a fundamental concept in computer science, particularly in the ...
So you're simply using ChatGPT to create your blog post AND the comments, without even mentioning it, as you should per the dev.to guidelines?
At least he gave you information, bro. Take it or shut up.
Did he? I don't see any value.
You are my hero! I'm trying to build a markdown parser from scratch. OK, there are many implementations, but none of them fits my needs. I know that I should use a syntax tree, but I'm not sure about the implementation. By the way, I want to use the same markdown syntax most parsers use (e.g. this), but some elements are tricky to capture:
Ordered Lists
Ordered lists in Markdown can have any number followed by a dot (1., 2., etc.).
Tokenization Strategy:
Match the pattern \d+\. (one or more digits followed by a dot).
Example:
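As a minimal sketch of this tokenization step, assuming Python's re module and a hypothetical ORDERED_ITEM token shape (the function name and the dictionary keys are illustrative, not from the original reply), a single line could be tokenized like this:

```python
import re

# Assumed pattern: optional indentation, digits, a dot, whitespace, then the item text.
ORDERED_ITEM = re.compile(r"^\s*(\d+)\.\s+(.*)$")


def tokenize_ordered_item(line):
    """Return an ORDERED_ITEM token for a list line, or None for any other line."""
    match = ORDERED_ITEM.match(line)
    if match is None:
        return None
    return {
        "type": "ORDERED_ITEM",
        "number": int(match.group(1)),
        "text": match.group(2),
    }


print(tokenize_ordered_item("1. First item"))
print(tokenize_ordered_item("Not a list item"))  # prints None
```

Requiring whitespace after the dot keeps something like "1.5 apples" from being mistaken for a list item.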
Parsing Strategy:
Create a ListNode for the ordered list, with ListItemNode children for each list item.
Links
Links in Markdown have the format [text](url).
Tokenization Strategy:
Match the pattern \[.*?\]\(.*?\) (bracketed text followed by a parenthesized URL).
Example:
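A small sketch of that pattern in use, assuming Python's re module; the capture groups and the LINK token shape are additions for illustration and are not part of the pattern quoted above:

```python
import re

# The quoted pattern with capture groups added around the text and the URL.
LINK = re.compile(r"\[(.*?)\]\((.*?)\)")


def tokenize_links(line):
    """Return one LINK token per [text](url) occurrence found in the line."""
    return [
        {"type": "LINK", "text": m.group(1), "url": m.group(2)}
        for m in LINK.finditer(line)
    ]


print(tokenize_links("See [docs](https://example.com) and [home](/)"))
```

This is only suitable for simple links: the pattern has no notion of bracket depth, so nested brackets or parentheses inside a URL are not handled.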
Parsing Strategy:
Create a LinkNode with text and url attributes.
Example Implementation in Python
Here is a simplified example of how you might start implementing this in Python:
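What follows is a minimal sketch under stated assumptions, not a definitive implementation. The function names tokenize, parse, and render_html and the node names ListNode, ListItemNode, and LinkNode come from this thread; DocumentNode, ParagraphNode, TextNode, the parse_inline helper, and the dictionary layout are hypothetical choices made for the illustration.

```python
import html
import re

ORDERED_ITEM = re.compile(r"^\s*\d+\.\s+(.*)$")
LINK = re.compile(r"\[(.*?)\]\((.*?)\)")


def tokenize(markdown):
    """Turn each non-blank line into an ORDERED_ITEM or TEXT token."""
    tokens = []
    for line in markdown.splitlines():
        if not line.strip():
            continue  # blank lines carry no content in this sketch
        match = ORDERED_ITEM.match(line)
        if match:
            tokens.append(("ORDERED_ITEM", match.group(1)))
        else:
            tokens.append(("TEXT", line.strip()))
    return tokens


def parse_inline(text):
    """Split a run of text into TextNode and LinkNode children (hypothetical helper)."""
    nodes, pos = [], 0
    for match in LINK.finditer(text):
        if match.start() > pos:
            nodes.append({"type": "TextNode", "value": text[pos:match.start()]})
        nodes.append({"type": "LinkNode", "text": match.group(1), "url": match.group(2)})
        pos = match.end()
    if pos < len(text):
        nodes.append({"type": "TextNode", "value": text[pos:]})
    return nodes


def parse(tokens):
    """Build the AST; consecutive ORDERED_ITEM tokens share one ListNode."""
    root = {"type": "DocumentNode", "children": []}
    current_list = None
    for kind, text in tokens:
        if kind == "ORDERED_ITEM":
            if current_list is None:
                current_list = {"type": "ListNode", "children": []}
                root["children"].append(current_list)
            item = {"type": "ListItemNode", "children": parse_inline(text)}
            current_list["children"].append(item)
        else:
            current_list = None  # any other line ends the current list
            root["children"].append({"type": "ParagraphNode", "children": parse_inline(text)})
    return root


TAGS = {"DocumentNode": None, "ListNode": "ol", "ListItemNode": "li", "ParagraphNode": "p"}


def render_html(node):
    """Walk the AST depth-first and emit HTML, escaping text and URLs."""
    kind = node["type"]
    if kind == "TextNode":
        return html.escape(node["value"])
    if kind == "LinkNode":
        return '<a href="{}">{}</a>'.format(html.escape(node["url"]), html.escape(node["text"]))
    inner = "".join(render_html(child) for child in node["children"])
    tag = TAGS[kind]
    return inner if tag is None else "<{0}>{1}</{0}>".format(tag, inner)


ast = parse(tokenize("1. First\n2. See [docs](https://example.com)\n\nDone."))
print(render_html(ast))
# prints <ol><li>First</li><li>See <a href="https://example.com">docs</a></li></ol><p>Done.</p>
```

Keeping link detection in a separate inline pass means the same helper serves both list items and paragraphs.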
Explanation
The tokenize function processes each line of the Markdown input and generates tokens based on patterns.
The parse function converts tokens into an AST. It recognizes ordered list items and links, nesting them appropriately.
The render_html function traverses the AST to generate HTML output.
This is a basic framework. You can extend it by adding handling for other Markdown features such as unordered lists, blockquotes, and code blocks. Refining the tokenization and parsing logic will also help capture and render Markdown syntax more accurately.
Thank you so much for your answer!
Please correct me if I'm wrong, I'm not used to Python and not very good at reading RegEx, but if I understand it right, you might run into trouble with nested brackets like this (?)
[[Link-Text]](URL.com)
You're correct. The regular expression I provided for links doesn't handle nested brackets properly. The pattern \[.*?\]\(.*?\) has no notion of bracket depth: its lazy quantifiers simply stop at the first closing bracket or parenthesis that lets the match succeed, which can lead to incorrect tokenization.
To handle nested brackets correctly, we need a more sophisticated approach. One way is to use a stack to track the brackets during the tokenization phase. This approach can manage nested brackets by ensuring each opening bracket has a corresponding closing bracket.
Here is the code example in TypeScript:
The parseLink function uses a stack to handle nested brackets, ensuring that link text and URLs are correctly identified even when nested brackets are present.
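The same stack idea can also be sketched in Python, the language of the earlier example. This is an illustration under assumptions rather than a port of the TypeScript version: the names parse_link and find_closing, and the returned tuple, are hypothetical.

```python
def find_closing(text, start, open_ch, close_ch):
    """Return the index of the bracket that closes text[start], or None if unbalanced."""
    if start >= len(text) or text[start] != open_ch:
        return None
    stack = []  # positions of brackets that are still open
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch:
            stack.pop()
            if not stack:
                return i  # the bracket opened at `start` is now closed
    return None


def parse_link(text, start=0):
    """Parse a link beginning at text[start]; return (link_text, url, end) or None."""
    close_bracket = find_closing(text, start, "[", "]")
    if close_bracket is None:
        return None
    paren = close_bracket + 1
    close_paren = find_closing(text, paren, "(", ")")
    if close_paren is None:
        return None
    return text[start + 1:close_bracket], text[paren + 1:close_paren], close_paren + 1


print(parse_link("[[Link-Text]](URL.com)"))
# prints ('[Link-Text]', 'URL.com', 22)
```

Because depth is tracked separately for square brackets and parentheses, a URL that itself contains balanced parentheses is also kept whole.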
Was this post created with ChatGPT?
Yes, it was.
This code didn't work for me. I had to modify the traverse function to pass the source file into node.getText().
Thanks