This utility, Format4Blog, has been begging to be written almost since the day I started learning Scheme. It is starting to show signs of maturity, now that I've used it as the main formatting tool on one of my blog articles. Still, there are gaps between what it can do and what I would like it to do.
What it can do is format a Scheme source file for use on a blog, or on any web page for that matter. The source file needs to be written with block comments alternating between text and code, so that the code can be marked up as <pre>formatted text and the prose inside the comments can be styled with any CSS class in my blog's stylesheet, which is a subset of my website's stylesheet. For my style of documentation this adds very little extra work, as I am in the habit of documenting my source files extensively with XML inside block comments. When I create a source file that is destined to be a blog article, I merely switch to writing HTML for markup rather than XML for content.
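To make the workflow concrete, here is a small, hypothetical source file written in this style. The `#| ... |#` block-comment syntax is standard Scheme; I'm assuming it is the form the tool keys on, and the prose and code here are my own illustration, not taken from the tool itself:

```scheme
#| <p>The classic factorial, written with an accumulator so the
   recursion is tail-recursive.</p> |#
(define (fact n)
  (let loop ((n n) (acc 1))
    (if (zero? n)
        acc
        (loop (- n 1) (* n acc)))))

#| <p>A quick sanity check:</p> |#
(fact 5) ; => 120
```

The formatter would pass the HTML in the comments through to the page and wrap the code between them in a <pre> block.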
The source file is parsed by a rather thorough Scheme lexer, which emits different classes of tokens. There are classes for symbols, constants, strings (including here strings), comments (including block comments), builtin identifiers, and a growing list of what I call library identifiers. Library identifiers end up being roughly anything that isn't R5RS Scheme but wasn't defined by the programmer, so the class includes PLT extensions to the language as well as functions found in SRFIs. I say growing list because the lexer makes this distinction based on one list of R5RS Scheme identifiers and another list storing the names of all the other identifiers. As you can imagine, there are quite a number of library identifiers, and so far I've been adding them on an as-needed basis. One of my release criteria is to build a dictionary from the DrScheme source files and use matching heuristics against it to complete the list of library identifiers. A direct correlation is made between these lexeme classes and CSS classes, so the HTML output is finely grained, syntax-directed, styled text; color is the most notable attribute, but any text attribute can be unique within a class.
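The correlation between lexeme classes and CSS classes can be sketched roughly like this; all of the names below (the CSS class names, `lexeme-class->css`, `span-for`) are my own illustrations, not the tool's actual identifiers:

```scheme
;; One CSS class per lexeme class (illustrative names).
(define lexeme-class->css
  '((symbol   . "scm-symbol")
    (constant . "scm-constant")
    (string   . "scm-string")
    (comment  . "scm-comment")
    (builtin  . "scm-builtin")
    (library  . "scm-library")))

;; Wrap one lexeme's text in a span carrying its CSS class.
(define (span-for class lexeme)
  (string-append "<span class=\""
                 (cdr (assq class lexeme-class->css))
                 "\">" lexeme "</span>"))

;; (span-for 'builtin "define")
;; => "<span class=\"scm-builtin\">define</span>"
```

The stylesheet then decides what each class looks like, so changing the color scheme never touches the formatter.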
The lexer began life in the syntax-color library of DrScheme, but was modified in several areas. There were two big changes that had to be made, and several additions to its functionality. The first of the two was that the original lexer, like most good lexers, ignored white space; for my purposes, where code is marked up as <pre>formatted text, it was essential that the white space be preserved. The other big change was that, again in the tradition of normal lexing, it skipped completely over block comments, as if they were white space. This had to be fixed, since it is in those block comments that the text and HTML markup live, hardly something one can toss away as white space. The added functionality came in splitting the original symbol lexeme class into the symbol, builtin, and library classes discussed above. But there were other things I couldn't live without, namely glyphs, and the lambda glyph in particular. So there is now a pretty stable set of glyphs for lambda, arrows, and such, which make expository writing about Scheme so much nicer.
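The glyph substitution can be sketched as a lookup from lexeme to HTML entity; the table and names here are my guesses at the idea, not the lexer's actual implementation:

```scheme
;; Lexemes that render as glyphs (HTML entities) in the output.
;; The set of entries is illustrative.
(define glyph-table
  '(("lambda" . "&lambda;")
    ("->"     . "&rarr;")
    ("=>"     . "&rArr;")))

;; Return the glyph for a lexeme, or the lexeme itself if it
;; has no glyph.
(define (glyphize lexeme)
  (cond ((assoc lexeme glyph-table) => cdr)
        (else lexeme)))

;; (glyphize "lambda") => "&lambda;"
;; (glyphize "cons")   => "cons"
```

Since the substitution happens after lexing, only genuine lexemes are affected; the string "lambda" inside a string literal or a comment would pass through untouched.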
Once it's soup you'll find out about it here or on my web site, and the package will likely be made available on PlaneT as well as on my web site.