Episode: 3805
Title: HPR3805: Document File Formats on Wikipedia
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3805/hpr3805.mp3
Transcribed: 2025-10-25 05:39:48
---

This is Hacker Public Radio Episode 3805 for Friday the 3rd of March 2023. Today's show is entitled Document File Formats on Wikipedia. It is hosted by Archer72 and is about 12 minutes long. It carries a clean flag. The summary is: Document File Format, a continuation of Content Format.

Hello, this is Archer72. Welcome to Hacker Public Radio. In this episode, I'm continuing to go through a Wikipedia article on Content Format. To recap, there have been countless content formats throughout history. The following are examples of some common content formats and content format categories, covering sensory experience, model, and language used for encoding information.

The first item in this list is Document File Format. A document file format is a text or binary file format for storing documents on storage media, especially for use by computers. There currently exists a multitude of incompatible document file formats. Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).

In 1993, the ITU-T tried to establish a standard for document file formats, known as Open Document Architecture, which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.

Page description languages such as PostScript and PDF have become de facto standards for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification for PDF itself, ISO 32000. HTML is the most used and open international standard, and is also used as a document file format.
It has also become an ISO/IEC standard, ISO 15445:2000. The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.

Modern document file formats are as follows. First, ASCII and UTF-8, which are plain text formats. ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. UTF-8 is a variable-length character encoding standard used for electronic communication, defined by the Unicode Standard. The name is derived from Unicode Transformation Format, 8-bit.

AmigaGuide is a hypertext document file format designed for the Amiga. Files are stored in ASCII, so it is possible to read and edit a file without the need for special software.

DOC is a file name extension used for word processing documents stored in Microsoft's proprietary Microsoft Word binary file format. Microsoft has used the extension since 1983, and specifications have been available since 2008 under the Open Specification Promise, a promise by Microsoft, published in September 2006, not to assert its patents, under certain conditions, against implementations of a certain list of specifications.

Next up is DjVu, pronounced like the French "déjà vu". It is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs.

Next is DocBook, which is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation.

Next up is HTML, with the extension .html or .htm. It has been an open ISO standard since 2000.
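Going back to UTF-8 for a moment: "variable-length" means that different characters take different numbers of bytes, from one byte for plain ASCII up to four. The short Python sketch below is an illustration added here, not something from the Wikipedia article. It prints the UTF-8 byte count for one character from each length class and checks that plain ASCII text produces identical bytes in both encodings. The helper name utf8_byte_lengths is hypothetical.

```python
def utf8_byte_lengths(text: str) -> list[tuple[str, int]]:
    """Pair each character in text with the number of bytes UTF-8 needs for it."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    # One character from each UTF-8 length class: 1, 2, 3, and 4 bytes.
    for ch, n in utf8_byte_lengths("Aé€😀"):
        print(f"U+{ord(ch):04X} {ch!r}: {n} byte(s)")

    # Plain ASCII text encodes to exactly the same bytes in ASCII and UTF-8.
    print("ASCII".encode("ascii") == "ASCII".encode("utf-8"))
```

That backward compatibility is why a pure ASCII file, such as an AmigaGuide document, is also a valid UTF-8 file.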
Next up is FictionBook, an open XML-based e-book format that originated and gained popularity in Russia. FictionBook files have the .fb2 file name extension. Some readers also support ZIP-compressed FictionBook files.

The next format is Markdown, which is a lightweight markup language for creating formatted text using a plain text editor. John Gruber and Aaron Swartz created Markdown in 2004 as a markup language that is appealing to human readers in its source code form.

Next up is Office Open XML, which is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations, and word processing documents.

Next is the OpenDocument Format for Office Applications, abbreviated ODF. It is also known as OpenDocument, and is an open file format for word processing documents, spreadsheets, presentations, and graphics, using ZIP-compressed XML files.

Next is OpenOffice.org XML, an open XML-based file format developed as an open community effort by Sun Microsystems from 2000 to 2002. The open-source office application suite OpenOffice.org 1.x and StarOffice 6 and 7 used the format as their native default file format.

Next is OXPS, the Open XML Paper Specification, which is an open specification for a page description language and a fixed-document format. Microsoft developed it as the XML Paper Specification. In June 2009, Ecma International adopted it as international standard ECMA-388.

Next up is PalmDoc and the Palm Database format, abbreviated PDB. PDB is a container format for record databases in Palm OS, Garnet OS, and Access Linux Platform. It is structured similarly to PRC resource databases. The PalmDoc e-book format is a special version of the PDB format.
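Both Office Open XML and OpenDocument are described as zipped XML, which means an ordinary ZIP library can open them. As a rough sketch, assuming the common package layouts (ODF keeps a plain-text mimetype entry, while OOXML packages carry a [Content_Types].xml part), this hypothetical Python helper, describe_office_package, guesses which kind of package it has been given. It only looks at entry names, so treat it as a heuristic, not a validator.

```python
import io
import zipfile
from pathlib import Path
from typing import BinaryIO, Union


def describe_office_package(source: Union[str, Path, BinaryIO]) -> str:
    """Guess whether a ZIP archive is an ODF or OOXML package from its entry names.

    Hypothetical helper: a heuristic only; it does not parse or validate any XML.
    """
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        if "mimetype" in names:
            # ODF packages store their media type as plain ASCII text in "mimetype".
            media_type = zf.read("mimetype").decode("ascii").strip()
            return f"ODF package: {media_type}"
        if "[Content_Types].xml" in names:
            # OOXML packages declare the content types of their parts in this file.
            return "OOXML package"
        return "ZIP archive of unknown type"


if __name__ == "__main__":
    # Build a toy ODF-like archive in memory. The XML body is a placeholder,
    # not a valid OpenDocument content.xml.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<placeholder/>")
    buffer.seek(0)
    print(describe_office_package(buffer))
```

With a real document you would pass a path instead, for example describe_office_package("report.odt"), where report.odt stands in for any file you have on hand.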
Next up is PDF, which you probably already know. Portable Document Format, standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.

Next up in the PDF category is PDF/E, ISO 24517-1:2008, an ISO standard published in 2008 for document management, Engineering document format using PDF, Part 1. There is also PDF/UA, which stands for Universal Accessibility, and PDF/VT, for variable data and transactional printing.

Next we have PostScript, which is a page description language in the electronic publishing and desktop publishing realm. It is a dynamically typed, concatenative programming language that was created at Adobe Systems by John Warnock, Charles Geschke, Doug Brotz, Ed Taft, and Bill Paxton from 1982 to 1984.

Next we have Rich Text Format, which is a proprietary document file format with a published specification, developed by Microsoft Corporation from 1987 until 2008 for cross-platform document interchange with Microsoft products.

Next is Symbolic Link, or SYLK, a Microsoft file format typically used to exchange data between applications, specifically spreadsheets. SYLK files conventionally have a .slk suffix. Because they are composed of only displayable ANSI characters, they can be easily created and processed by other applications, such as databases.

Next is Scalable Vector Graphics, SVG, an XML-based vector image format for defining two-dimensional graphics, with support for interactivity and animation. The SVG specification is an open standard developed by the World Wide Web Consortium since 1999.

Next we have TeX, spelled T-E-X and stylized within the system with a lowered E, which is a typesetting system designed and written by computer scientist and Stanford University professor Donald Knuth and first released in 1978.
Next we have the Text Encoding Initiative, TEI, a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s.

Next is troff, short for typesetter roff. It is a major component of a document processing system developed by Bell Labs for the Unix operating system. troff and the related nroff were developed from the original roff.

Then we have Uniform Office Format, sometimes known as Unified Office Format. It is an open standard for office applications developed in China. It includes word processing, presentation, and spreadsheet modules, and is made of GUI, API, and format specifications.

And last, there is WordPerfect, which is a word processing application now owned by Corel, with a long history on multiple personal computer platforms. At the height of its popularity in the 1980s and early 1990s, it was the dominant player in the word processor market, displacing the prior market leader, WordStar.

This has been Archer72 for Hacker Public Radio. Feel free to record a show of your own. Until next time.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.