Michael Neu

Posted on May 14, 2019 • Edited on Nov 27, 2019

Excel'ing at PHP

#webdev #php #showdev #excel

It's been a while since I published my writeup on building a webserver using plain VBA macros in Microsoft Excel. As you might have guessed from the title picture, this time we're not building a REST backend, but rather adding support for PHP to webxcel, and with it basically any FastCGI-enabled language, like Perl or Python.

This post is divided into two sections: first, we'll look into how off-the-shelf webservers integrate PHP by inspecting the protocol they speak, and later we'll see how to build this ourselves. If you're just here for the code, feel free to check out the webxcel repo.

Do you even PHP?

When I started doing webdev in school, PHP was the way to go. Being on Windows back then, the easiest solution to get up and running was to simply install a preconfigured bundle locally. My bundle of choice was XAMPP, which came with an Apache webserver, PHP, MySQL and a mail server, all configured to work together out of the box. Once you had your site ready, you could deploy it to a web host, which ran PHP and a webserver for you.

It's 2019 now, and people use more sophisticated setups, e.g. with separate containers for each service. To cover the basics: PHP itself supports two main modes of operation, CLI and FastCGI. Using the CLI, you can launch PHP scripts directly from your shell, like this:



$ cat foo.php
<?php echo "hello world\n"; ?>
$ php foo.php
hello world

Adding PHP support to a server using the CLI would be convenient, since we could start a new process each time our server receives a request. That's essentially what plain CGI was: launching a process for each request. But this approach would also mean a relatively big overhead (individual memory allocations per process, a "cold VM" and process limits inside containers - to name a few), so PHP ships with FastCGI as well, to reuse processes for multiple requests.

The '90s called, they want their protocol back

FastCGI was originally shipped in the 1990s and set out to make the Common Gateway Interface (CGI), well, faster. To understand how it works internally, we should read the manual, but it's more fun to monitor NGINX making FastCGI calls to PHP. Usually, Wireshark would be my tool of choice, but by implementing a quick and dirty Python relay proxy instead, which essentially listens for incoming data and forwards it to our PHP container, we have full control over the raw data sent back and forth.
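The relay itself isn't shown here, but a minimal sketch of the idea might look like the following. The function names (`pump`, `relay_one`) and the `upstream_addr` parameter are made up for illustration and are not taken from the original proxy. It handles a single connection and dumps every chunk as hex.

```python
import socket
import threading


def pump(src, dst, label):
    """Copy bytes from src to dst until src closes, dumping each chunk as hex."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            print(label, data.hex(" "))
            dst.sendall(data)
    except OSError:
        pass  # the other side went away, nothing left to forward
    finally:
        try:
            # tell the receiver that no more data will follow
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def relay_one(listener, upstream_addr):
    """Accept one connection on listener and relay it to upstream_addr."""
    client, _ = listener.accept()
    upstream = socket.create_connection(upstream_addr)
    with client, upstream:
        # one thread per direction: responses in the background, requests here
        back = threading.Thread(target=pump, args=(upstream, client, "php   ->"))
        back.start()
        pump(client, upstream, "nginx ->")
        back.join()
```

Pointing NGINX's `fastcgi_pass` at the relay's listening port and passing the PHP container's address as `upstream_addr` should then print the raw records in both directions.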

To understand the messages, we'll still have to read the specification though. Every FastCGI message starts with an 8 byte header: a version and a message type field, each one byte wide, followed by 6 bytes for the rest of the header. The header specifies the content length of the body as a 2 byte integer, which ranges from 0 to 65,535. So each time our FastCGI server needs to send more than 64kB of data, it'll have to split it into multiple records, adding an 8 byte header each time. For a 1MB website that means roughly 16 chunks with just under 64kB of data and an 8 byte header each, so our 1MB file will cause about 128 bytes of overhead in total. If you're after that "1µs optimization", you should consider minifying your webpages before passing them to your webserver, but on the other hand, who really cares about that kind of overhead ¯\_(ツ)_/¯
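To make the header layout and the chunking concrete, here is a small sketch in Python. The helper names (`pack_record`, `split_into_records`) are hypothetical; the field order (version, type, request id, content length, padding length, reserved) follows the FastCGI specification.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_STDOUT = 6
# version, type, request id, content length, padding length, 1 reserved byte
HEADER = struct.Struct("!BBHHBx")
MAX_CONTENT = 0xFFFF  # largest value the 2 byte content length can hold


def pack_record(rec_type, request_id, content):
    """Serialize one record: 8 byte header followed by the content."""
    if len(content) > MAX_CONTENT:
        raise ValueError("content too long for a single record")
    header = HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), 0)
    return header + content


def split_into_records(rec_type, request_id, data):
    """Split a payload into as many records as the length field requires."""
    return [
        pack_record(rec_type, request_id, data[i:i + MAX_CONTENT])
        for i in range(0, len(data), MAX_CONTENT)
    ]
```

Since a record carries at most 65,535 bytes, a payload of exactly 1,048,576 bytes ends up in 17 records with this sketch, so the header overhead is 136 bytes plus the empty record that terminates the stream.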

Depending on the message, the body may be either a struct or plain text. Most messages are pretty easy to deserialize, except for the "most complicated" message, the FastCGI params. Each param is sent as a ${key.length}${value.length}${key}${value} string, which needs to be parsed, too. As documented in the specification, a key or value length of up to 127 is encoded as a single byte, while anything longer uses a 4 byte length with the high bit of the first byte set. That means a long param doesn't have to be split up, but our parser has to handle both length encodings. On a sidenote: the longest user agent string among the most common browsers is probably the one from Edge, which spans 140 bytes. That's > 127, so even an ordinary request needs the 4 byte form.
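As a sketch of how the params body can be built and parsed, assuming the length encoding from the FastCGI specification (one byte for lengths up to 127, four bytes with the high bit set for anything longer). The function names are hypothetical and not taken from webxcel.

```python
import struct


def encode_length(n):
    """Lengths below 128 take one byte, longer ones four bytes with the high bit set."""
    if n < 0x80:
        return bytes([n])
    return struct.pack("!I", n | 0x80000000)


def encode_pair(key, value):
    """Serialize one param as key length, value length, key, value."""
    return encode_length(len(key)) + encode_length(len(value)) + key + value


def decode_length(data, pos):
    """Read one length at pos and return (length, position after it)."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    (n,) = struct.unpack_from("!I", data, pos)
    return n & 0x7FFFFFFF, pos + 4


def decode_pairs(data):
    """Parse a params body into a dict of key bytes to value bytes."""
    pairs = {}
    pos = 0
    while pos < len(data):
        key_len, pos = decode_length(data, pos)
        value_len, pos = decode_length(data, pos)
        key = data[pos:pos + key_len]
        pos += key_len
        pairs[key] = data[pos:pos + value_len]
        pos += value_len
    return pairs
```

A short pair like `REQUEST_METHOD` / `GET` starts with the two length bytes 14 and 3, while a 140 byte user agent value gets the four byte length `80 00 00 8c`.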

A typical FastCGI request looks somewhat like this (sequence diagram of the records exchanged between the webserver and PHP):

To initiate a FastCGI request, the webserver has to send a "begin request", which describes PHP's role and contains some flags to modify the protocol. Usually, we'll only see the webserver sending FCGI_RESPONDER to PHP, since we want it to respond to us with a website. After sending our params, the webserver sends our stdin to PHP, which means it'll send our request body. If we're doing a GET request, we don't have a body, so we can send an empty stdin right away. In case of a POST or PUT request, we send the body first and then have to signal PHP that we're done sending our stdin. To do so, the webserver sends an empty stdin. The same message flow is used for the params, too, as you can see in the diagram above.

Now PHP can execute our script - it received the script name via the params and got our request body from stdin. Once it's done, it sends back the stdout right away. Note, there's also an stderr message type, but even when raising an error in PHP, we'll usually only receive stdout messages. Similar to stdin, PHP will send an empty message to signal that it's done sending our website in stdout. Finally, PHP will send an "end request" message, which contains a field for the script's exit code and a field telling whether the protocol ended successfully.
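The whole exchange can be sketched in a few lines of Python. The record type numbers and body layouts follow the FastCGI specification; the function names (`record`, `build_get_request`, `read_response`) are hypothetical, and the params body is assumed to be already encoded and small enough for a single record.

```python
import struct

FCGI_BEGIN_REQUEST, FCGI_END_REQUEST = 1, 3
FCGI_PARAMS, FCGI_STDIN, FCGI_STDOUT, FCGI_STDERR = 4, 5, 6, 7
FCGI_RESPONDER = 1


def record(rec_type, request_id, content=b""):
    """8 byte header (version 1, no padding) followed by the content."""
    return struct.pack("!BBHHBx", 1, rec_type, request_id, len(content), 0) + content


def build_get_request(request_id, params_blob):
    """Records the webserver sends for a request without a body."""
    begin_body = struct.pack("!HB5x", FCGI_RESPONDER, 0)  # role, flags, reserved
    return b"".join([
        record(FCGI_BEGIN_REQUEST, request_id, begin_body),
        record(FCGI_PARAMS, request_id, params_blob),
        record(FCGI_PARAMS, request_id),  # empty params: no more params
        record(FCGI_STDIN, request_id),   # empty stdin: no request body
    ])


def read_response(stream):
    """Collect stdout until the end request record shows up."""
    stdout = b""
    pos = 0
    while pos < len(stream):
        _version, rec_type, _request_id, length, padding = struct.unpack_from(
            "!BBHHBx", stream, pos)
        body = stream[pos + 8:pos + 8 + length]
        pos += 8 + length + padding
        if rec_type == FCGI_STDOUT:
            stdout += body
        elif rec_type == FCGI_END_REQUEST:
            app_status, protocol_status = struct.unpack("!IB3x", body)
            return stdout, app_status, protocol_status
    raise ValueError("stream ended before the end request record")
```

The sketch ignores stderr records and only reads from a buffer that was already received; a real client would read the records from the socket one header at a time.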

Integrating FastCGI into webxcel

How can we get this into webxcel now? First, we'll need to find a way to connect to arbitrary sockets, then we'll "just" have to de-/serialize FastCGI messages and we're done. Piece of cake.

VBA hell

If you've ever written VBA macros and had problems debugging them: imagine importing native functions based on fragmented documentation on how to use Winsock. Using sockets isn't hard per se, in Python one just has to import socket and get going, but in VBA we'll need to import the socket functions ourselves. Luckily, most of the stuff was already lying around from building webxcel's TCP server, but there appear to be multiple different ways to connect to a socket using Winsock, so finding the right approach with VBA's "this variable is 0" debugger can be considered difficult.

To spare you the frustrating process: the key in the end was to set the right values in sockaddr_in and implicitly cast it to the generic sockaddr type, which allows us to use connect(SOCKET, sockaddr*, sizeof(sockaddr)).

Write(UInt16 value)

VBA is from a different time. When it was created, memory was more sacred than it is now, and for an integrated scripting language, a 2 byte signed integer was just fine. But we want to communicate with a server that uses plain old 4 byte ints, so how can we do this?

Sending an int over the network is usually done in big endian, so e.g. the number 1337 or 0x0539 will be sent as 05 39. In VBA, Long is 4 bytes, Integer is 2 bytes and Byte obviously is 1 byte. Shifting a value right by 8 bits n times (i >> 8 * n) and keeping the lowest 8 bits yields the nth byte, which can then be used to marshal the value to a big endian byte array, which in turn can be sent over the network.

VBA doesn't offer a shifting operator though, but dividing by 0x100 and using the remainder and quotient to build the byte representation works just as well. Also, VBA doesn't have a raw byte array, but strings can be used to simulate them: VBA provides raw access to strings, just like a char array in C, which also consists of 1 byte chars. So to marshal a Long to its bytes, all there is to do is divide it often enough and assign its bytes to a properly sized String.
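The division trick translates directly to other languages. As a sketch in Python (the helper names `int_to_bytes` and `bytes_to_int` are hypothetical, not webxcel's Marshal module), integer division and remainder by 0x100 stand in for the shift operator:

```python
def int_to_bytes(value, size):
    """Marshal a non-negative integer to big endian without shift operators."""
    out = bytearray(size)
    # fill from the last (least significant) byte to the first
    for i in range(size - 1, -1, -1):
        out[i] = value % 0x100  # remainder: the lowest byte
        value //= 0x100         # quotient: everything above it
    return bytes(out)


def bytes_to_int(data):
    """Reverse direction: multiply instead of shifting left."""
    value = 0
    for byte in data:
        value = value * 0x100 + byte
    return value
```

With this, `int_to_bytes(1337, 2)` gives the bytes `05 39`, which matches what Python's built-in `(1337).to_bytes(2, "big")` returns.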

FastCGI has both 2 and 4 byte integers, which will be marshalled to 2 and 4 byte wide strings. Fixed size strings can be allocated by Dim foo As String * size, but this is restricted to a constant size and it would be nice to have a more dynamic solution. So to recreate a malloc(size) function, one could try to naively add a RepeatString(size, char) function, which basically repeats a string, e.g. \0, size times.

Turns out, this actually works. You can allocate memory from VBA using arbitrary strings (up to 2GB per string), which we have raw access to. And I thought we needed C to do this.



' the first malloc(n) with O(n) runtime complexity!
Public Function Malloc(ByVal size As Long) As String
  Malloc = ""

  ' append one zero char per requested byte
  Dim i As Long
  For i = 1 To size
    Malloc = Malloc & Chr(0)
  Next i
End Function

Fun fact: writing to memory outside of your allocated string likely results in a segmentation fault, causing Excel to crash.

Parsing and serializing messages

With marshalling in place, FastCGI messages can then be de-/serialized. By creating an abstract class IFastCGIRecord, all messages can implement a consistent interface for de-/serialization. Each message contains a header and a body, so there should be a FastCGIHeader field inside every message, and also fields for the message body, e.g. a role field for the FastCGIBeginRequest message. De-/serializing the records can now follow the classic composition pattern: first serialize the message header by calling the header's serialization method, then send the body, like this:



' implementation of IFastCGIRecord.WriteToTcpClient
Private Sub IFastCGIRecord_WriteToTcpClient(client As TcpClient)
    Dim header As IFastCGIRecord
    Set header = m_header
    ' serialize the header first
    header.WriteToTcpClient client

    Dim bytes As String
    bytes = ""

    bytes = bytes & Marshal.Int16ToBytes(Role)
    bytes = bytes & Marshal.Int8ToBytes(Flags)
    bytes = bytes & Reserved

    client.SendString bytes
End Sub

VBA's 2 byte Integer is signed and as soon as we try to deserialize real life messages, we'll quickly see the message size uses that one extra bit the sign stole from us. Let's just use a Long instead and call it a day, since there's probably not going to be a UInt16 in VBA anytime soon.
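To illustrate the problem outside of VBA, here is a small sketch in Python that reads the same two bytes once as a signed and once as an unsigned 16 bit integer. The function names are hypothetical; the signed variant mimics what a 2 byte VBA Integer would see.

```python
import struct


def read_signed_16(raw):
    """Interpret two big endian bytes like a signed 2 byte integer."""
    return struct.unpack("!h", raw)[0]


def read_unsigned_16(raw):
    """Interpret the same bytes as the unsigned length FastCGI means."""
    return struct.unpack("!H", raw)[0]


def widen(signed_value):
    """Recover the unsigned value by moving it into a wider type."""
    return signed_value & 0xFFFF
```

For a content length of 40,000 (bytes `9c 40`), the signed read yields -25,536, while the unsigned read and the widened value both give 40,000.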

In webxcel, more functionality can easily be added using IWebController classes. By writing a new FastCGIWebController, we can then connect to a FastCGI server and exchange FastCGI messages.

Wrapup

So, we got PHP to webxcel. Why exactly did we want this?

In all silliness, this is actually exciting. We wrote our own basic FastCGI implementation, which not only allows us to run PHP, but any arbitrary FastCGI-capable language, e.g. Perl. And let's be honest, who doesn't want to run Perl scripts from within Excel?

Top comments (4)

mæghith | Ramón FSM • May 15 '19

Sorry haven't read the article yet, but a PHP server running out of excel macros sounds like an aberration from the 7th layer of hell.

It also sounds like a lot of fun, I'm totally reading the article :D

gskur • Nov 28 '19

There is no such thing as "VBA hell" but simply a state of usual frustration trying to deal with a language one really does not know :) -- and, of course, the author doesn't (it's just a statement of fact, not a criticism).

It is not quite true that "VBA is from a different time." Rather, it was born in a different time but morphed/expanded/adjusted enough to remain quite viable even now. It was designed for MS Office apps and has been updated along the way so that even today one can do pretty much anything needed in that area, and very effectively too. And, btw, VBA versions for Excel, Access, Word, PowerPoint, etc. are different except for the core syntax and functionality. The only grudge I have is the VBE (the editor) which is certainly from 1980s, and not the best one even of those…

To illustrate the "not knowing the language" claim, here is a replacement for the “Public Function Malloc(ByVal size As Long) As String”. It is a simple “in-place” statement: Space(size). If you want zeros then this will do just fine: Replace(Space(size), " ", 0).

Or, regarding bit shifting: if you use Excel VBA you can utilize the "WorksheetFunction" route to directly use Excel's BITRSHIFT, BITLSHIFT, BITAND, BITOR and BITXOR functions.

Etc.

Otherwise, it is a nice (and brave) article. Thanks.