Rendering phpBB BBCode in Go: Part 3

In the last part, we learned that it would be fairly difficult to make an accurate BBCode renderer for the TextFormatter-based storage format. In order to accomplish my goal in less time, I've decided on making my own XSLT stylesheet for the conversion.

I pondered for a bit on how best to approach this, and I think the simplest option is to make TextFormatter do it. We won't be able to do this part from Go code, unless someone wants to write a PHP 7 implementation in pure Go for some reason.

Getting Set Up

We need to host a PHP environment capable of running TextFormatter, and a MySQL instance with our database. I went with Docker Compose because it's a pretty easy way to get started fast.

First, how Compose is set up. I have just two containers in my docker-compose.yml:

version: '3'

services:
  mariadb:
    image: mariadb:10.3
    volumes:
      - ./dump.sql:/docker-entrypoint-initdb.d/dump.sql
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: phpbb
    ports:
      - "3306:3306"
  php:
    build: ./xsltgen
    volumes:
      - ./xsltgen:/src

Inside the xsltgen folder, I have the following Dockerfile:

FROM php:7-cli-alpine

COPY --from=composer:1.5 /usr/bin/composer /usr/bin/composer

RUN apk add --no-cache git libxslt-dev
RUN docker-php-ext-install xsl mysqli

ADD . /src

ENTRYPOINT ["/bin/sh", "-c", "sleep 100000"]

This should give us a PHP 7 container with XSLT, MySQLi, and Composer already installed. Now we can declare our dependencies in a composer.json file:

{
    "name": "jchw/xsltgen",
    "type": "project",
    "require": {
        "s9e/text-formatter": "^1.3"
    }
}

This will give us access to TextFormatter.

Now we need to write some PHP. We could go a long way to get more authentic phpBB rendering, but right now I want a working prototype, and I wouldn't get much value out of implementing this more nicely in PHP when my end goal is to process a forum using only Go. So, I've written a few pieces by hand that could be done better.

Let's start by loading Composer and some TextFormatter classes:

<?php
require __DIR__ . '/vendor/autoload.php';

use s9e\TextFormatter\Configurator;
use s9e\TextFormatter\Configurator\Items\AttributeFilters\RegexpFilter;
use s9e\TextFormatter\Configurator\Items\UnsafeTemplate;

So far, pretty standard. You'll see in a moment why we need to load RegexpFilter and UnsafeTemplate. Next up, we need something a bit more interesting: phpBB's default BBCode. It's actually dependent on both your board's style as well as some hardcoded defaults. The process by which the BBCode is loaded is not so complicated - the style stores some templates in bbcode.html, which is a mishmash of some legacy templates and an XSLT template for quotes. However, it's not really worth reproducing it exactly, since we only have a few. I've hand-produced the following for Prosilver:

$default_bbcode = array(
    'attachment' => array(
        'usage' => '[ATTACHMENT index={NUMBER} filename={TEXT;useContent}]',
        'template' => '<div class="inline-attachment"><xsl:apply-templates/></div>'
    ),
    'b'     => array(
        'usage' => '[B]{TEXT}[/B]',
        'template' => '<span style="font-weight: bold"><xsl:apply-templates/></span>'
    ),
    'code'  => array(
        'usage' => '[CODE lang={IDENTIFIER;optional}]{TEXT}[/CODE]',
        'template' => '<div class="codebox"><p>Code: <a href="#" onclick="selectCode(this); return false;">Select All</a></p><pre><code><xsl:apply-templates/></code></pre></div>'
    ),
    'color' => array(
        'usage' => '[COLOR={COLOR}]{TEXT}[/COLOR]',
        'template' => '<span style="color: {COLOR}"><xsl:apply-templates/></span>'
    ),
    'email' => array(
        'usage' => '[EMAIL={EMAIL;useContent} subject={TEXT;optional;postFilter=rawurlencode} body={TEXT;optional;postFilter=rawurlencode}]{TEXT}[/EMAIL]',
        'template' => '<a>
            <xsl:attribute name="href">
                <xsl:text>mailto:</xsl:text>
                <xsl:value-of select="@email"/>
                <xsl:if test="@subject or @body">
                    <xsl:text>?</xsl:text>
                    <xsl:if test="@subject">subject=<xsl:value-of select="@subject"/></xsl:if>
                    <xsl:if test="@body"><xsl:if test="@subject">&amp;</xsl:if>body=<xsl:value-of select="@body"/></xsl:if>
                </xsl:if>
            </xsl:attribute>
            <xsl:apply-templates/>
        </a>'
    ),
    'flash' => array(
        'usage' => '[FLASH={NUMBER1},{NUMBER2} width={NUMBER1;postFilter=#flashwidth} height={NUMBER2;postFilter=#flashheight} url={URL;useContent} /]',
        'template' => '<object classid="clsid:D27CDB6E-AE6D-11CF-96B8-444553540000" codebase="http://active.macromedia.com/flash2/cabs/swflash.cab#version=5,0,0,0" width="{WIDTH}" height="{HEIGHT}"><param name="movie" value="{URL}" /><param name="play" value="false" /><param name="loop" value="false" /><param name="quality" value="high" /><param name="allowScriptAccess" value="never" /><param name="allowNetworking" value="internal" /><embed src="{URL}" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash" width="{WIDTH}" height="{HEIGHT}" play="false" loop="false" quality="high" allowscriptaccess="never" allownetworking="internal"></embed></object>'
    ),
    'i'     => array(
        'usage' => '[I]{TEXT}[/I]',
        'template' => '<span style="font-style: italic"><xsl:apply-templates/></span>'
    ),
    'img'   => array(
        'usage' => '[IMG src={IMAGEURL;useContent}]',
        'template' => '<img src="{IMAGEURL}" class="postimage" alt="Image"/>'
    ),
    'list'  => array(
        'usage' => '[LIST type={HASHMAP=1:decimal,a:lower-alpha,A:upper-alpha,i:lower-roman,I:upper-roman;optional;postFilter=#simpletext} #createChild=LI]{TEXT}[/LIST]',
        'template' => '<xsl:choose>
            <xsl:when test="not(@type)">
                <ul><xsl:apply-templates/></ul>
            </xsl:when>
            <xsl:when test="contains(\'upperlowerdecim\',substring(@type,1,5))">
                <ol style="list-style-type: {LIST_TYPE}"><xsl:apply-templates/></ol>
            </xsl:when>
            <xsl:otherwise>
                <ul style="list-style-type: {LIST_TYPE}"><xsl:apply-templates/></ul>
            </xsl:otherwise>
        </xsl:choose>'
    ),
    'li'    => array(
        'usage' => '[* $tagName=LI]{TEXT}[/*]',
        'template' => '<li><xsl:apply-templates/></li>'
    ),
    'quote' => array(
        'usage' => "[QUOTE
            author={TEXT1;optional}
            post_id={UINT;optional}
            post_url={URL;optional;postFilter=#false}
            profile_url={URL;optional;postFilter=#false}
            time={UINT;optional}
            url={URL;optional}
            user_id={UINT;optional}
            author={PARSE=/^\\[url=(?'url'.*?)](?'author'.*)\\[\\/url]$/i}
            author={PARSE=/^\\[url](?'author'(?'url'.*?))\\[\\/url]$/i}
            author={PARSE=/(?'url'https?:\\/\\/[^[\\]]+)/i}
        ]{TEXT2}[/QUOTE]",
        'template' => '<blockquote>
            <xsl:if test="not(@author)">
                <xsl:attribute name="class">uncited</xsl:attribute>
            </xsl:if>
            <div>
                <xsl:if test="@author">
                    <cite>
                        <xsl:choose>
                            <xsl:when test="@url">
                                <a href="{@url}" class="postlink"><xsl:value-of select="@author"/></a>
                            </xsl:when>
                            <xsl:when test="@profile_url">
                                <a href="{@profile_url}"><xsl:value-of select="@author"/></a>
                            </xsl:when>
                            <xsl:otherwise>
                                <xsl:value-of select="@author"/>
                            </xsl:otherwise>
                        </xsl:choose>
                        <xsl:text> </xsl:text>
                        <xsl:value-of select="$L_WROTE"/>
                        <xsl:value-of select="$L_COLON"/>
                        <xsl:if test="@post_url">
                            <xsl:text> </xsl:text>
                            <a href="{@post_url}" data-post-id="{@post_id}" onclick="if(document.getElementById(hash.substr(1)))href=hash">&#8593;</a>
                        </xsl:if>
                        <xsl:if test="@date">
                            <div class="responsive-hide"><xsl:value-of select="@date"/></div>
                        </xsl:if>
                    </cite>
                </xsl:if>
                <xsl:apply-templates/>
            </div>
        </blockquote>'
    ),
    'size'  => array(
        'usage' => '[SIZE={FONTSIZE}]{TEXT}[/SIZE]',
        'template' => '<span style="font-size: {FONTSIZE}%; line-height: normal"><xsl:apply-templates/></span>'
    ),
    'u'     => array(
        'usage' => '[U]{TEXT}[/U]',
        'template' => '<span style="text-decoration: underline"><xsl:apply-templates/></span>'
    ),
    'url'   => array(
        'usage' => '[URL={URL;useContent} $forceLookahead=true]{TEXT}[/URL]',
        'template' => '<a href="{URL}" class="postlink">{TEXT}</a>'
    ),
);
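One detail worth unpacking is the test in the list template: contains('upperlowerdecim', substring(@type,1,5)). It takes the first five characters of the list type and checks whether they appear in that packed string, which is a compact way of asking "does the type start with upper, lower, or decim?". Here's a rough Go sketch of the same check. The function names xpathPrefix and isOrderedListType are made up for illustration and aren't part of phpBB or TextFormatter.

```go
package main

import (
	"fmt"
	"strings"
)

// xpathPrefix mimics XPath's substring(s, 1, n): the first n characters
// of s, or all of s if it is shorter.
func xpathPrefix(s string, n int) string {
	r := []rune(s)
	if len(r) > n {
		r = r[:n]
	}
	return string(r)
}

// isOrderedListType mirrors the <xsl:when> test from the list template:
// true means the template would emit <ol>, false means <ul>.
func isOrderedListType(listType string) bool {
	return strings.Contains("upperlowerdecim", xpathPrefix(listType, 5))
}

func main() {
	for _, t := range []string{"decimal", "lower-alpha", "upper-roman", "disc", "square"} {
		fmt.Printf("%-12s ordered=%v\n", t, isOrderedListType(t))
	}
}
```

The numbered and lettered types (decimal, lower-alpha, upper-roman and friends) end up in the ordered branch, while disc, circle and square fall through to the unordered one.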

Phew, that's a lot. With that out of the way, we're ready to do some actual work. First thing we need to do is instantiate a Configurator:

$configurator = new Configurator;
$configurator->rendering->engine = 'XSLT';

We need to add some attribute filters for the default BBCode to load properly. We never actually parse anything here, so each filter can simply be a no-op.

function noop($v) { return $v; }
$configurator->attributeFilters->add('#flashheight', 'noop');
$configurator->attributeFilters->add('#flashwidth', 'noop');
$configurator->attributeFilters->add('#fontsize', 'noop')->markAsSafeInCSS();
$configurator->attributeFilters->add('#imageurl', 'noop')->markAsSafeAsURL();

Finally, after all that work, we can load the BBCode in.

foreach ($default_bbcode as $bbcode) {
    $configurator->BBCodes->addCustom($bbcode['usage'], $bbcode['template']);
}

It's pretty pointless to implement all of this without custom BBCode. But before we can load custom BBCode, we need a couple more filters. I won't go over what's going on here. You can find a bit more information by looking at phpBB's factory.php code if you really want.

$filter = new RegexpFilter("#^(?:[^\p{C}\p{Z}\p{S}\p{P}\p{Nl}\p{No}\p{Me}\x{1100}-\x{115F}\x{A960}-\x{A97C}\x{1160}-\x{11A7}\x{D7B0}-\x{D7C6}\x{20D0}-\x{20FF}\x{1D100}-\x{1D1FF}\x{1D200}-\x{1D24F}\x{0640}\x{07FA}\x{302E}\x{302F}\x{3031}-\x{3035}\x{303B}]*[\x{00B7}\x{0375}\x{05F3}\x{05F4}\x{30FB}\x{002D}\x{06FD}\x{06FE}\x{0F0B}\x{3007}\x{00DF}\x{03C2}\x{200C}\x{200D}\pL0-9\-._~!$&'()*+,;=:@|]+|%[\dA-F]{2})*(?:/(?:[^\p{C}\p{Z}\p{S}\p{P}\p{Nl}\p{No}\p{Me}\x{1100}-\x{115F}\x{A960}-\x{A97C}\x{1160}-\x{11A7}\x{D7B0}-\x{D7C6}\x{20D0}-\x{20FF}\x{1D100}-\x{1D1FF}\x{1D200}-\x{1D24F}\x{0640}\x{07FA}\x{302E}\x{302F}\x{3031}-\x{3035}\x{303B}]*[\x{00B7}\x{0375}\x{05F3}\x{05F4}\x{30FB}\x{002D}\x{06FD}\x{06FE}\x{0F0B}\x{3007}\x{00DF}\x{03C2}\x{200C}\x{200D}\pL0-9\-._~!$&'()*+,;=:@|]+|%[\dA-F]{2})*)*(?:\?(?:[^\p{C}\p{Z}\p{S}\p{P}\p{Nl}\p{No}\p{Me}\x{1100}-\x{115F}\x{A960}-\x{A97C}\x{1160}-\x{11A7}\x{D7B0}-\x{D7C6}\x{20D0}-\x{20FF}\x{1D100}-\x{1D1FF}\x{1D200}-\x{1D24F}\x{0640}\x{07FA}\x{302E}\x{302F}\x{3031}-\x{3035}\x{303B}]*[\x{00B7}\x{0375}\x{05F3}\x{05F4}\x{30FB}\x{002D}\x{06FD}\x{06FE}\x{0F0B}\x{3007}\x{00DF}\x{03C2}\x{200C}\x{200D}\pL0-9\-._~!$&'()*+,;=:@/?|]+|%[\dA-F]{2})*)?(?:\#(?:[^\p{C}\p{Z}\p{S}\p{P}\p{Nl}\p{No}\p{Me}\x{1100}-\x{115F}\x{A960}-\x{A97C}\x{1160}-\x{11A7}\x{D7B0}-\x{D7C6}\x{20D0}-\x{20FF}\x{1D100}-\x{1D1FF}\x{1D200}-\x{1D24F}\x{0640}\x{07FA}\x{302E}\x{302F}\x{3031}-\x{3035}\x{303B}]*[\x{00B7}\x{0375}\x{05F3}\x{05F4}\x{30FB}\x{002D}\x{06FD}\x{06FE}\x{0F0B}\x{3007}\x{00DF}\x{03C2}\x{200C}\x{200D}\pL0-9\-._~!$&'()*+,;=:@/?|]+|%[\dA-F]{2})*)?$#Du");
$configurator->attributeFilters->add('#local_url', $filter);
$configurator->attributeFilters->add('#relative_url', $filter);

Alright. Now we can go through the database and load custom BBCode:

$my = new mysqli('mariadb', 'root', 'root', 'phpbb');
$result = $my->query("SELECT * FROM phpbb_bbcodes");

while ($row = $result->fetch_assoc()) {
    $tpl = preg_replace_callback('#\\{LOCAL_URL\\d*\\}#', function ($m) { return '/forum/' . $m[0]; }, $row['bbcode_tpl']);

    $configurator->BBCodes->addCustom($row['bbcode_match'], new UnsafeTemplate($tpl));
}
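If the preg_replace_callback above looks opaque, this is all it does: every {LOCAL_URL} token (with an optional numeric suffix) gets /forum/ put in front of it, so the rendered links point at the board's path. Here's the same substitution sketched in Go, purely for illustration. The function name prefixLocalURLs and the sample template are my own inventions, and /forum/ is just where my board happens to live.

```go
package main

import (
	"fmt"
	"regexp"
)

// localURLToken matches {LOCAL_URL}, {LOCAL_URL1}, {LOCAL_URL2}, ...
// It is the same pattern the PHP snippet uses.
var localURLToken = regexp.MustCompile(`\{LOCAL_URL\d*\}`)

// prefixLocalURLs puts prefix in front of every LOCAL_URL token,
// leaving the token itself in place for TextFormatter to fill in.
func prefixLocalURLs(tpl, prefix string) string {
	return localURLToken.ReplaceAllStringFunc(tpl, func(m string) string {
		return prefix + m
	})
}

func main() {
	// A made-up custom BBCode template, not taken from a real board.
	tpl := `<a href="{LOCAL_URL}">{TEXT}</a>`
	fmt.Println(prefixLocalURLs(tpl, "/forum/"))
}
```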

Not so hard after all of that work, actually. Finally, we have one more thing to do:

file_put_contents('bbcode.xsl', $configurator->rendering->engine->getXSL($configurator->rendering));

Now exec into the container, run composer install and then php index.php. With any luck, you should get a bbcode.xsl file containing a complete XSLT stylesheet that can turn the internal storage format into HTML.

This process is pretty easy to tweak, and now that we have a stylesheet to work with, we can chuck off the PHP part of this and do our processing however we want.

Back to Go

So far, there's been more PHP code than Go code in this blog. A lot of it, too, has simply been mimicking phpBB. It's finally time to write something vaguely original.

PHP uses libxslt, a library of the libxml project, to do its XSLT transforms. Technically, phpBB does not, since it uses the PHP-based renderer. Still, libxslt seems like our best option, since Go XML libraries don't seem terribly mature.

Our luck only gets worse from there. Just about every single other component of libxml has Go bindings except for libxslt! Looks like we'll have to do it on our own.

Wrapping libxslt

First and foremost, you need to have libxml2 and libxslt installed. libxml2 also requires zlib. If you're on Windows, you will want to set up MSYS2 and install mingw-w64-x86_64-gcc, mingw-w64-x86_64-pkg-config, mingw-w64-x86_64-libxml2, and mingw-w64-x86_64-libxslt, and you will need to add your MinGW64 bin folder to %PATH%. Under Linux, just install libxslt and libxml2 (you might need to use the dev variants, and this might vary per distribution a bit.)

After you have the libraries set up, we can start working. I want to do this quickly, so I'm skipping error handling for now. We need the magic incantation for Cgo:

package xsltexec

/*
#cgo pkg-config: libxml-2.0 libxslt zlib
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <stdio.h>
#include <libxml/parser.h>
#include <libxslt/transform.h>
#include <libxslt/xsltutils.h>
*/
import "C"
import "unsafe"

So far, pretty simple. Now we need to define our abstraction. I chose to make a structure that represents an XSLT stylesheet, like so:

type Stylesheet struct {
	xsl C.xsltStylesheetPtr
}

We need to be able to construct this. So we do this:

func NewStylesheet(source string) *Stylesheet {
	C.xmlSubstituteEntitiesDefault(1)
	data := C.CString(source)
	defer C.free(unsafe.Pointer(data))
	return &Stylesheet{
		xsl: C.xsltParseStylesheetDoc(C.xmlParseMemory(data, C.int(len(source)))),
	}
}

A bit more magical but nothing too out of the ordinary. We're not checking any errors so this will blow up very violently if we pass in bad data! Keep that in mind.
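Until proper error handling lands, one cheap guard is to check that the input is at least well-formed before it ever reaches libxml2. Here's a rough sketch using encoding/xml. The function name wellFormed is my own, and Go's parser is not guaranteed to agree with libxml2 on every edge case, so treat this as a seatbelt rather than a substitute for checking the C return values.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// wellFormed reports an error if doc cannot be tokenized as XML
// or contains no elements at all.
func wellFormed(doc string) error {
	dec := xml.NewDecoder(strings.NewReader(doc))
	sawElement := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if _, ok := tok.(xml.StartElement); ok {
			sawElement = true
		}
	}
	if !sawElement {
		return errors.New("document has no root element")
	}
	return nil
}

func main() {
	for _, doc := range []string{"<B>Hello, world!</B>", "<B>Hello", ""} {
		fmt.Printf("%q: %v\n", doc, wellFormed(doc))
	}
}
```

Calling this on the stylesheet source before NewStylesheet, and on each post before applying the stylesheet, turns the most common kind of bad input into an ordinary Go error.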

Now we just need our Apply function:

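func (s *Stylesheet) Apply(xml string) string {
	xmldata := C.CString(xml)
	defer C.free(unsafe.Pointer(xmldata))

	// Parse the input and run the transform. Both documents are
	// allocated by libxml2, so free them or every call leaks.
	doc := C.xmlParseMemory(xmldata, C.int(len(xml)))
	defer C.xmlFreeDoc(doc)
	res := C.xsltApplyStylesheet(s.xsl, doc, nil)
	defer C.xmlFreeDoc(res)

	// Serialize the result using the stylesheet's output settings.
	var out *C.xmlChar
	var outLen C.int
	C.xsltSaveResultToString(&out, &outLen, res, s.xsl)
	defer C.free(unsafe.Pointer(out))

	return C.GoStringN((*C.char)(unsafe.Pointer(out)), outLen)
}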

Once again, nothing too surprising here. We're using a minimal amount of libxml2/libxslt API to achieve our goals, which is mostly just a matter of making life easier - the API is actually a lot more powerful than this.

Now, in Go, you can take data directly from your MySQL database, and render it as HTML.

package main

import (
    "io/ioutil"

    "xsltexec"
)

func main() {
    src, err := ioutil.ReadFile("bbcode.xsl")
    if err != nil {
        panic(err)
    }

    stylesheet := xsltexec.NewStylesheet(string(src))

    println(stylesheet.Apply("<B>Hello, world!</B>"))
}

With any luck, you should see <span style="font-weight:bold">Hello, world!</span> printed. You can now render your forum's posts to HTML.
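When wiring this into a larger program, it can help to hide the stylesheet behind a small interface, so the Cgo-backed type can be swapped for a stub wherever libxslt isn't available (unit tests, for instance). Here's a minimal sketch. Renderer, stubRenderer and renderAll are hypothetical names, and the stub only wraps its input in a div rather than doing any real transform.

```go
package main

import "fmt"

// Renderer is anything that can turn stored post XML into HTML.
// *xsltexec.Stylesheet satisfies it through its Apply method.
type Renderer interface {
	Apply(xml string) string
}

// stubRenderer stands in for the real stylesheet so this sketch runs
// without libxslt. It does not perform an XSLT transform.
type stubRenderer struct{}

func (stubRenderer) Apply(xml string) string {
	return "<div>" + xml + "</div>"
}

// renderAll renders every post and returns the HTML keyed by post ID.
func renderAll(r Renderer, posts map[int]string) map[int]string {
	out := make(map[int]string, len(posts))
	for id, text := range posts {
		out[id] = r.Apply(text)
	}
	return out
}

func main() {
	posts := map[int]string{
		1: "<B>Hello, world!</B>",
		2: "<I>Second post</I>",
	}
	for id, html := range renderAll(stubRenderer{}, posts) {
		fmt.Println(id, html)
	}
}
```

In the real program, the posts map would be filled from the database and the stub replaced with the value returned by xsltexec.NewStylesheet.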

Why?

It may be a bit easier to do this by hooking into phpBB directly. However, I wanted a more general solution that I had finer control of. Being able to control the stylesheet directly enables me to make adjustments as needed.

I chose Go because it executes fast. It can rip through gigabytes of this stuff, even when using Cgo to call into libxml2 and libxslt. No doubt the same code could be written in Python or PHP with little or no effort, but it would take more time to iterate. Here, I can just throw the entire DB at the code repeatedly, catching bugs quickly.

The end result of these decisions is a system that is very tweakable, with a short feedback loop.

What's Next?

I think TextFormatter is a cool library and having a more complete implementation in Go would be nice.

I've reached an important milestone; I can do what I set out to do, which is turn phpBB BBCode into HTML from Go. Some time later, I plan to return and do a more general version of this that does not require running PHP code, and if possible, I'd like to remove the Cgo dependency. For now, I am going to continue working on the project that prompted me to investigate rendering phpBB posts in Go.