DEV Community

Jeff Lindsay

Parsing bridgesupport schema files

Tuesday I started on the potentially arduous effort of generating a Go API for AppKit, the Apple framework for building apps. With the core bridge working, I want to have native bindings for all the Apple APIs. There are a lot of them!

Luckily, as I mentioned a while back, every framework ships with a gigantic XML file describing every part of its API. These files exist specifically for generating bindings and/or header files. So the first step is to parse them into Go structures.

Although it's not super well documented, Go has amazing support for parsing XML into structs. You just lay out the data types and map them to elements and attributes using struct tags. There is some documentation on the structure of these XML files, but I also referenced a few of the files themselves to figure out the data model.
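As a rough illustration of the struct tag approach, here is a minimal sketch using the standard encoding/xml package. The element and attribute names (signatures, constant, enum, class, method, arg, retval) are assumptions about a small subset of the file format, the sample document is made up for the example, and parseSignatures is a hypothetical helper, not the code from this project.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Signatures mirrors an assumed, trimmed-down subset of a bridgesupport file.
type Signatures struct {
	XMLName   xml.Name   `xml:"signatures"`
	Version   string     `xml:"version,attr"`
	Constants []Constant `xml:"constant"`
	Enums     []Enum     `xml:"enum"`
	Classes   []Class    `xml:"class"`
}

type Constant struct {
	Name string `xml:"name,attr"`
	Type string `xml:"type,attr"` // encoded type string, kept raw here
}

type Enum struct {
	Name  string `xml:"name,attr"`
	Value string `xml:"value,attr"`
}

type Class struct {
	Name    string   `xml:"name,attr"`
	Methods []Method `xml:"method"`
}

type Method struct {
	Selector string `xml:"selector,attr"`
	Args     []Arg  `xml:"arg"`
	RetVal   *Arg   `xml:"retval"` // nil when the element is absent
}

type Arg struct {
	Index int    `xml:"index,attr"`
	Type  string `xml:"type,attr"`
}

// sample is an invented document in the assumed shape, for illustration only.
const sample = `<signatures version="1.0">
  <constant name="NSAppKitVersionNumber" type="d"/>
  <enum name="NSAlertFirstButtonReturn" value="1000"/>
  <class name="NSWindow">
    <method selector="setTitle:">
      <arg index="0" type="@"/>
      <retval type="v"/>
    </method>
  </class>
</signatures>`

// parseSignatures (hypothetical helper) unmarshals one document.
func parseSignatures(data []byte) (*Signatures, error) {
	var s Signatures
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	s, err := parseSignatures([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, c := range s.Classes {
		for _, m := range c.Methods {
			fmt.Printf("%s %s takes %d argument(s)\n", c.Name, m.Selector, len(m.Args))
		}
	}
}
```

Elements and attributes that have no matching field are simply ignored by encoding/xml, which makes it easy to start with a partial model and grow it as more of the files are covered.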

There was one twist. When the schema needs to describe the data type of a value or variable, the type is encoded into a strange string. Luckily these encodings are... mostly documented. I had to poke around to find out why methods on informal protocols have some extra numbers, which it turns out I can ignore, and there are a few examples where I just don't know what they are.

In this encoding, most types are represented as single characters, but a few, like bitfields and pointers, carry extra information, and compound types like structs and arrays are of course more complicated structures. It took me a while to figure out how I should parse these. I ended up using bufio.Scanner, a sort of programmable tokenizer you can use to throw together a lexer relatively easily. After a bunch of experimentation I got a system that seems to work and that I can extend and customize as I run into specific scenarios. I use it to create a TypeInfo struct that has a friendlier representation of the type data.

The best part is, I can have the XML parser automatically unmarshal into the TypeInfo type for those encoded fields. There aren't many examples of this, but it can be done and it works great. At this point I can parse a couple of these XML files without error, and although there are plenty of holes, this is enough to start generating some Go bindings. Next week!
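One way to get this automatic unmarshaling, sketched under the assumption that the encoded types live in XML attributes, is to implement the xml.UnmarshalerAttr interface from encoding/xml on the custom type. The TypeInfo below is a hypothetical, heavily simplified stand-in (it only counts pointer markers and names a handful of base types), not the struct from this project, and parseMethod is a helper invented for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// TypeInfo is a simplified, hypothetical stand-in for the decoded type.
type TypeInfo struct {
	Raw      string // the encoded string as found in the XML
	Pointers int    // number of leading '^' markers
	Kind     string // friendly name of the base type
}

// kinds maps a few single-character encodings to names (not exhaustive).
var kinds = map[byte]string{
	'v': "void", 'i': "int", 'f': "float", 'd': "double",
	'@': "object", ':': "selector", '{': "struct",
}

// UnmarshalXMLAttr implements xml.UnmarshalerAttr, so encoding/xml calls it
// for every attribute that is mapped onto a TypeInfo field.
func (t *TypeInfo) UnmarshalXMLAttr(attr xml.Attr) error {
	base := strings.TrimLeft(attr.Value, "^")
	if base == "" {
		return fmt.Errorf("empty type encoding in attribute %q", attr.Name.Local)
	}
	kind, ok := kinds[base[0]]
	if !ok {
		kind = "unknown"
	}
	*t = TypeInfo{Raw: attr.Value, Pointers: len(attr.Value) - len(base), Kind: kind}
	return nil
}

type Arg struct {
	Index int      `xml:"index,attr"`
	Type  TypeInfo `xml:"type,attr"` // decoded through UnmarshalXMLAttr
}

type Method struct {
	Selector string `xml:"selector,attr"`
	Args     []Arg  `xml:"arg"`
}

// parseMethod (hypothetical helper) unmarshals a single method element.
func parseMethod(data string) (Method, error) {
	var m Method
	err := xml.Unmarshal([]byte(data), &m)
	return m, err
}

func main() {
	m, err := parseMethod(`<method selector="getValue:"><arg index="0" type="^d"/></method>`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, a := range m.Args {
		fmt.Printf("arg %d: %+v\n", a.Index, a.Type)
	}
}
```

With this in place, the rest of the struct model stays plain: any field declared as TypeInfo and tagged with attr is decoded during the normal xml.Unmarshal pass, and a decoding error surfaces as the Unmarshal error.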

Top comments (2)

Yoandy Rodriguez Martinez

[From your robot overlords 🤖]: We are very pleased with this post. We would also like to know if there's a GitHub repo for us to learn from and hopefully collaborate on your code.
