Let me preface this by saying that my personal belief is that there are only three reasons for anyone to use PDF:
So, yeah, I don’t like PDF. At all. 90% of PDFs would be better off living life as web pages, and would be easier to search, merge, and work with … but have been saved to PDF simply because it is the de-facto option for users who want to embed images in their document file rather than storing them externally, or who don’t want to be bothered with the myriad of different web-browser rendering discrepancies.
As a format, PDF is not bad: it can hold structured content and supports bookmarks and more. However, despite the fact that the freely available viewer supports marking up PDFs and writing comments and notes, nobody actually ships PDFs with that feature enabled.
Now, Microsoft has developed an alternative, code-named “Metro” which supposedly allows sharing documents in Windows so that the user doesn’t require the source application. But in essence, Metro is just an update to the Windows print architecture. And, totally aside from being a part of the as-yet-beta Vista operating system, it appears to have no cross-platform ambitions.
As though to emphasize this, the Microsoft Office team has announced that Office 12 will support exporting documents as PDF files, something long a part of Open Office, and cited by many as a good reason to switch (as though you had any desire to create the pathetic PDFs that Open Office exports).
So this raises the question: why not just include a PDF printer in the operating system? That way, all of your applications would gain the apparently desirable feature … (as they will, if you just download and install PDF Creator from SourceForge).
The funny part is: Office won’t include a viewer, so you’ll still have to go download Adobe Acrobat’s bloated viewer.
So here’s my question: why don’t we just simplify the MHTML web archive format? My suggestion is that we simply rip off the office format concept:
The new format will be called Zipped HTML and will have the file extension zhtml (because xhtml is already taken). The format is as follows: you take all the relative-path links to images, style sheets and scripts, and you change them to remove all path information. Then you put all of these files into a simple .zip file together with the source html file, and rename it according to the name of the html file, with the zhtml extension.
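The packing steps described above could be sketched roughly like this in Python. This is only an illustration under some assumptions: the names `flatten_links` and `pack_zhtml` are made up for this example, links are found with a simple regular expression rather than a real HTML parser, and two resources that share a file name in different folders would collide once the paths are removed.

```python
import re
import zipfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlsplit

# Matches the src/href attribute of <img>, <link> and <script> tags only.
LINK_RE = re.compile(
    r'(<(?:img|link|script)\b[^>]*?\b(?:src|href)=)(["\'])(.*?)\2',
    re.IGNORECASE,
)


def is_relative(url):
    """True for relative-path links such as 'img/logo.png'."""
    parts = urlsplit(url)
    return (bool(parts.path) and not parts.scheme
            and not parts.netloc and not url.startswith("/"))


def flatten_links(html):
    """Remove path information from relative resource links.

    Returns the rewritten HTML and a dict that maps each original
    relative path to its flattened file name.
    """
    resources = {}

    def rewrite(match):
        prefix, quote, url = match.groups()
        if not is_relative(url):
            return match.group(0)  # leave absolute links untouched
        path = urlsplit(url).path
        flat = PurePosixPath(path).name
        resources[path] = flat
        return f"{prefix}{quote}{flat}{quote}"

    return LINK_RE.sub(rewrite, html), resources


def pack_zhtml(html_path):
    """Write <name>.zhtml next to the HTML file and return its path."""
    html_path = Path(html_path)
    html, resources = flatten_links(html_path.read_text(encoding="utf-8"))
    out_path = html_path.with_suffix(".zhtml")
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(html_path.name, html)
        for original, flat in resources.items():
            # Raises FileNotFoundError if a linked file is missing.
            archive.write(html_path.parent / original, arcname=flat)
    return out_path
```

Calling `pack_zhtml("report.html")` would then produce `report.zhtml` in the same folder, with the page and its linked files sitting side by side in the archive.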
Think we could handle that? I mean, it’s basically the same thing they’re doing with the XML-based formats from Microsoft Office, Star Office, and Open Office… it’s simple, portable, and could be made safer in the same way that the Microsoft Office ones are: documents with scripting must be named with a slightly different extension (e.g.: zhtml-js) to make the user aware of the fact, otherwise, scripts are not executed at all.
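The naming rule for scripted documents could look something like the following sketch. The helper names `extension_for` and `scripts_allowed` are hypothetical, and the script detection is deliberately naive (it may flag harmless text that merely looks like an event handler), so treat it as an illustration of the idea rather than a real safety check.

```python
import re
from pathlib import Path

# Naive hints that a page uses scripting: <script> tags,
# inline on* event handlers, and javascript: URLs.
SCRIPT_HINTS = re.compile(r"<script\b|\son\w+\s*=|javascript:", re.IGNORECASE)


def extension_for(html):
    """Pick the extension a packer should give the archive."""
    return ".zhtml-js" if SCRIPT_HINTS.search(html) else ".zhtml"


def scripts_allowed(file_name):
    """A viewer would run scripts only for the -js variant."""
    return Path(file_name).suffix.lower() == ".zhtml-js"
```

A viewer following this rule would simply ignore any script it finds inside a plain `.zhtml` file.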
[Edit 4-Oct-2005] I just realized that, of course, what I’ve described above is basically what MAF is, although MAF adds an RDF file. Basically, it’s a great idea, except it only works in Mozilla’s browsers, so the best part about it is that the developer also implemented saving files as MHTML files (which are an international standard; they use MIME encoding instead of zipping, so they’re bigger, but it works, and they’re viewable in IE).
But of course, it still suffers from several problems:
Anyone have a better idea? Maybe we should just petition Adobe to lighten up Acrobat, take out those plugins to speed up loading, and stop trying to peddle the Yahoo toolbar and the Adobe Album…
Maybe one could use Foxit Reader? (It’s free and light)
What a coincidence!
You know, for years I’ve had this pain in the *** with Acrobat Reader. Either I am stupid or it really has no bookmarking capability. So when I read a book in PDF form, I closed the laptop into standby mode in order to bookmark the document, so I could continue reading where I left off. This is so 19th-century industrial!
I am not talking about what /they/ call a “bookmark”; I mean what granny and I call “bookmarks”. /They/ call chapters bookmarks, it seems.
And yes, it is bloated. Especially that crap of a Yahoo Toolbar, which at least you can switch off in the stand-alone version but not in the browser plugin.
Other than that, I like PDF for its compactness (one file does it all) and the possibility to read it on any platform. In the end, it is a PostScript variant, no?
But, oh, just yesterday I thought about replacing PDF with a new format. I wrote a software module to control a home theater amplifier over RS-232. They gave me the protocol and command specs in two PDFs. There were maybe 150 commands or so, arranged in a table: byte sequence, command name, command description. I transformed this into Python code. What I wanted to do was to copy the table into my text editor. A simple copy & paste gave me one string per line, without any means to automate the task. The byte sequence had a fixed length, but the command name could be up to three words (natural language) long, so a simple recorded editor macro would not do.
The ASCII export of Acrobat Reader produced a line per column. At least something to work with. But that was when I looked around SourceForge for software that would allow me to do whatever I wanted with a PDF. No luck. The only such tool seems to be (hail to the proprietary world) Adobe’s own Distiller.
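For what it’s worth, the “line per column” export could be regrouped with a few lines of Python. This is a rough sketch that assumes each table cell ends up on its own line, in the order byte sequence, command name, description, and that the byte sequence is written as hex pairs. The function names and the idea of turning command names into dictionary keys are made up for this example.

```python
def rows_from_cells(lines, columns=3):
    """Regroup one-cell-per-line text into table rows."""
    cells = [line.strip() for line in lines if line.strip()]
    if len(cells) % columns:
        raise ValueError("cell count is not a multiple of the column count")
    return [tuple(cells[i:i + columns]) for i in range(0, len(cells), columns)]


def to_command_table(rows):
    """Map NAME_IN_CAPS -> raw bytes, assuming hex pairs like '02 41 03'."""
    table = {}
    for byte_seq, name, _description in rows:
        key = name.upper().replace(" ", "_")
        table[key] = bytes.fromhex(byte_seq)
    return table
```

Feeding it the exported lines, for instance `to_command_table(rows_from_cells(open("export.txt")))` with a hypothetical `export.txt`, would give a dictionary that can be pasted or imported into the control module.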
So I thought that we need something new. And I quickly came to realize that it may be time for OASIS (creators of the OpenDocument spec, which OOo uses) to bring an alternative to PDF.
The ODF files are a JAR dialect, meaning a ZIP file with a manifest but not a 100% Java JAR. In this archive there are sub-directories for all media data (bitmaps, etc.), you can include scripts, it conforms to standards (Dublin Core, etc.) and it is XML, which opens up XSLT possibilities (not so easy, since the format is rather complex; I am writing an XHTML WYSIWYG XSL transformer right now, which is pretty tough). It can include SVG, MathML and more, or at least it will do so fully in the next updates (so I expect). It will also be able to do more with tables, and so on.
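To illustrate the point that an ODF file is just a ZIP archive with a manifest, here is a small Python sketch that lists what a package declares about itself. It relies on the manifest living at `META-INF/manifest.xml` and using the OpenDocument manifest namespace. The function name `list_odf_parts` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by the OpenDocument package manifest.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def list_odf_parts(path):
    """Return {path inside the archive: media type} from the manifest."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    return {
        entry.get(f"{{{MANIFEST_NS}}}full-path"):
            entry.get(f"{{{MANIFEST_NS}}}media-type")
        for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry")
    }
```

Running it on any `.odt` file should show entries such as `content.xml` next to whatever pictures the document embeds.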
So, all files are accessible to the user. It’s editable. There is a stand-alone viewer in the works. There are already Firefox and IE plugins for it. It’s backed by a whole Office Suite. Add some encryption, XML security and it should be able to replace PDF.
But then… PDF is too widely accepted to ever go away for good.