XP1LO2web simplifies HTML saved by LibreOffice

In order to save a functional HTML file, LibreOffice has to provide a header section, which includes a lot of information that has more to do with format than with content and which you probably don’t really want in your HTML. So XP1LO2web removes the header completely. You can then paste in your preferred header. I assume you’ll be editing the HTML to put in links, images etc. anyway.
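
To illustrate the idea, here is a minimal Python sketch of header removal. It is not XP1LO2web’s own code. It assumes that “header” means the <head>…</head> element and that the file contains only one of them. The real app may cut more or less than this.

```python
import re

# Matches the whole <head>...</head> element, case-insensitively and across lines.
# The \b stops it matching HTML5 <header> tags by mistake.
_HEAD_RE = re.compile(r"<head\b.*?</head\s*>", re.IGNORECASE | re.DOTALL)


def strip_head(html: str) -> str:
    """Remove the first <head> section so you can paste in your own."""
    return _HEAD_RE.sub("", html, count=1)
```

For example, `strip_head("<html><head><title>x</title></head><body>...")` leaves `<html><body>...`, ready for a header of your own.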

Likewise, the <p> and <hn> tags in the file saved by LibreOffice contain a lot of formatting information that you won’t want, and XP1LO2web strips all that out, leaving just plain <p> and <hn> tags. LibreOffice also seems to think it must keep repeating that the text is in GB English or US English, that the font style is normal, that the text is black, etc. etc. XP1LO2web removes those redundant tags and their end tags, but leaves in anything specifying other languages, styles or colours, together with the corresponding end tags.
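
The following Python sketch shows one way to do this kind of clean-up. It is an illustration only, not how XP1LO2web actually works. The sets of “default” languages, colours and styles are hypothetical examples. The sketch only looks at double-quoted attributes, and it keeps a stack of open tags per tag name, so each end tag is dropped exactly when its own start tag was.

```python
import re

# Hypothetical "nothing new here" values; adjust to whatever your files contain.
DEFAULT_LANGS = {"en-gb", "en-us"}
DEFAULT_COLOURS = {"#000000", "black"}
DEFAULT_STYLES = {"font-style:normal", "font-weight:normal"}

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>")
ATTR_RE = re.compile(r'([a-zA-Z-]+)\s*=\s*"([^"]*)"')  # double-quoted only


def is_redundant(attrs: str) -> bool:
    """True if every attribute on a <span>/<font> only restates a default."""
    found = {k.lower(): v.lower() for k, v in ATTR_RE.findall(attrs)}
    if not found:
        return False  # a bare tag is left alone in this sketch
    for key, value in found.items():
        if key == "lang" and value in DEFAULT_LANGS:
            continue
        if key == "color" and value in DEFAULT_COLOURS:
            continue
        if key == "style" and value.replace(" ", "").rstrip(";") in DEFAULT_STYLES:
            continue
        return False  # something genuinely different: keep the tag
    return True


def simplify_tags(html: str) -> str:
    out, pos = [], 0
    # One stack per tag name; each entry records whether that start tag was dropped.
    stacks = {"span": [], "font": []}
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        closing, name, attrs = m.group(1), m.group(2).lower(), m.group(3)
        if name == "p" or re.fullmatch(r"h[1-6]", name):
            out.append(f"<{closing}{name}>")  # keep the tag, lose its attributes
        elif name in stacks and not closing:
            drop = is_redundant(attrs)
            stacks[name].append(drop)
            if not drop:
                out.append(m.group(0))
        elif name in stacks:
            stack = stacks[name]
            if stack and stack.pop():
                continue  # its start tag was removed, so remove this end tag too
            out.append(m.group(0))
        else:
            out.append(m.group(0))  # any other tag passes through unchanged
    out.append(html[pos:])
    return "".join(out)
```

So `<span lang="en-GB">Hello</span>` becomes plain `Hello`, while `<span lang="fr">Bonjour</span>` is kept intact, end tag and all.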

Finally, if you delete something that was italic or bold or sub- or superscript etc., or you change styles, LibreOffice often leaves matching pairs of tags behind, enclosing nothing. XP1LO2web removes such empty pairs of tags – recursively, since it’s not unusual for them to end up nested several deep.
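
Removing empty pairs “recursively” can be done by repeating a simple search-and-delete until nothing more matches, since deleting an inner empty pair can leave its parent empty. Here is a minimal Python sketch, assuming a hypothetical list of inline tags and treating only truly empty pairs (not ones containing whitespace) as removable:

```python
import re

# Inline tags that may be left behind empty; an assumed list, not XP1LO2web's.
INLINE = r"(?:b|i|u|s|em|strong|sub|sup|span|font|strike|small)"
# A start tag (with any attributes) immediately followed by its own end tag.
# The \b stops <b> matching <br>, <i> matching <img>, and so on.
EMPTY_PAIR_RE = re.compile(rf"<({INLINE})\b[^>]*></\1\s*>", re.IGNORECASE)


def remove_empty_pairs(html: str) -> str:
    """Delete empty tag pairs, repeating until none are left."""
    while True:
        html, count = EMPTY_PAIR_RE.subn("", html)
        if count == 0:
            return html
```

Given `<i><b><sup></sup></b></i>`, the first pass removes the `<sup>` pair, the second the `<b>` pair and the third the `<i>` pair, leaving nothing.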

The end result is a much cleaner HTML file, which is far easier to read and to edit but still has all your headings, bold, italic, small caps, coloured text, underlines, strike-throughs etc. etc.

One thing to watch out for when you save as HTML from LibreOffice: turn off “track changes” before saving. If you don’t, blocks of old text that no longer show in LibreOffice will magically reappear, without any indication that they’re supposed to have gone.

(This applies to the version of LibreOffice I updated to yesterday – it may not apply to all versions.)

Another gotcha is that sometimes (not always, and so far I’ve not managed to suss out when) LibreOffice, when saving an HTML file with footnotes, leaves out the footnote reference characters: it includes all the tags for the link, but omits the character itself! If that’s happened, XP1LO2web puts in footnote numbers for you, which may not be the format you wanted, but it’s definitely better than nothing!
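
Here is a Python sketch of that kind of fix. It assumes LibreOffice’s footnote anchors look like `<a class="sdfootnoteanc" name="sdfootnote1anc" href="#sdfootnote1sym">`, with the footnote number embedded in the href. That shape is an assumption about typical LibreOffice output, not something XP1LO2web documents, and the real app may detect and number the footnotes differently.

```python
import re

# Assumed shape of an anchor whose reference character has gone missing, e.g.
# <a class="sdfootnoteanc" name="sdfootnote1anc" href="#sdfootnote1sym"><sup></sup></a>
# Only anchors that are empty (or contain just an empty <sup></sup>) match.
EMPTY_ANCHOR_RE = re.compile(
    r'(<a\b[^>]*href="#sdfootnote(\d+)sym"[^>]*>)(?:<sup></sup>)?</a>',
    re.IGNORECASE,
)


def fill_missing_footnote_numbers(html: str) -> str:
    """Put the footnote number into any empty footnote anchor."""

    def fill(m):
        start_tag, number = m.group(1), m.group(2)
        return f"{start_tag}<sup>{number}</sup></a>"

    return EMPTY_ANCHOR_RE.sub(fill, html)
```

Anchors that already contain a character are left untouched, so running this on a file without the problem changes nothing.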

The footnote itself, complete with its tags, is still there, but right at the end of the file, since HTML doesn’t have pages to put the footnotes at the foot of. This is an issue with footnotes in any pageless format, of course: they inevitably become endnotes.

By default, the resultant file is called simply ‘Z’ and placed in the same directory the original came from. You can edit the name, and drag the file wherever you like, in standard RISC OS fashion.

XP1LO2web will of course do a similar job on any HTML file from any source, but may not recognize every kind of extraneous crap generated by other apps. You could edit the runfile yourself, or I’d be happy to update the app to handle other apps’ crap if you send me a sample file or two.

Download XP1LO2web here

(It’s a self-extracting archive, so all you have to do is download the file into a suitable directory, set the file type to FFC, and double-click on it, which will create the whole application in the same directory. You can then delete the original downloaded file.)

I don’t expect any payment, but won’t complain if you’re feeling flush – I’ve got an adequate pension, but I’m far from rich!