Episode: 3805
Title: HPR3805: Document File Formats on Wikipedia
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3805/hpr3805.mp3
Transcribed: 2025-10-25 05:39:48
---
This is Hacker Public Radio Episode 3805 for Friday the 3rd of March 2023.
Today's show is entitled, Document File Formats on Wikipedia.
It is hosted by Archer72 and is about 12 minutes long.
It carries a clean flag.
The summary is Document File Format, a continuation of Content Format.
Hello, this is Archer72.
Welcome to Hacker Public Radio.
In this episode, I'm continuing to go through a Wikipedia article on Content Format.
To recap, there has been a countless number of content formats throughout history.
The following are examples of some common content formats and content format categories,
covering sensory experience, model, and language used for encoding information.
The first item in this list is Document File Format.
A Document File Format is a text or binary file format for storing documents on a storage
medium, especially for use by computers.
There currently exists a multitude of incompatible Document File Formats.
Examples of XML-based open standards are DocBook, XHTML, and more recently, the ISO/IEC
standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).
In 1993, the ITU-T tried to establish a standard for Document File Formats, known as the Open
Document Architecture, which was supposed to replace all competing Document File Formats.
It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO
8613.
It did not succeed.
Page description languages such as PostScript and PDF have become de facto standards for
documents that a typical user should only be able to create and read, not edit.
In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification
for PDF itself, ISO 32000.
HTML is the most used and open international standard, and is also used as a Document File
Format.
It has also become an ISO/IEC standard, ISO 15445:2000.
The default binary file format used by Microsoft Word, .doc, has become a widespread de facto standard
for office documents, but it is a proprietary format and is not always fully supported
by other word processors.
Modern Document File Formats are as follows: ASCII and UTF-8, which are plain text formats.
ASCII, abbreviated from American Standard Code for Information Interchange, is a character
encoding standard for electronic communication.
ASCII codes represent text in computers, telecommunications equipment, and other devices.
UTF-8 is a variable-length character encoding standard used for electronic communication,
defined by the Unicode Standard.
The name is derived from Unicode Transformation Format 8-bit.
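To make the "variable-length" point concrete, here is a minimal Python sketch. It is an illustration added alongside the transcript, not something read out in the episode, and the sample characters and the helper name utf8_length are chosen only for the example. It shows that UTF-8 spends between one and four bytes per character, and that plain ASCII text comes out byte-for-byte identical.

```python
# Illustrative sketch: how many bytes UTF-8 uses for different characters.
# The sample characters below are arbitrary choices for demonstration.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))


for ch in SAMPLES:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {encoded.hex(' ')}")

# ASCII text is unchanged under UTF-8: both encodings give identical bytes.
assert "Hello".encode("ascii") == "Hello".encode("utf-8")
```

The single-byte case is what makes UTF-8 backward compatible with ASCII: any file containing only ASCII characters is already valid UTF-8.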
Amiga Guide is a Hypertext Document File Format designed for the Amiga.
Files are stored in ASCII, so it is possible to read and edit a file without the need
for special software.
Next is DOC, a file name extension used for word processing documents stored
in Microsoft's proprietary Microsoft Word Binary File Format.
Microsoft has used the extension since 1983, and specifications have been available since
2008 under the Open Specification Promise, which is a promise by Microsoft, published
in September 2006, not to assert its patents, in certain conditions, against implementations
of a certain list of specifications.
Next up is DjVu, pronounced like the French word déjà vu. It is a computer file format
designed primarily to store scanned documents, especially those containing a combination
of text, line drawings, indexed color images, and photographs.
Next is DocBook, which is a semantic markup language for technical documentation.
It was originally intended for writing technical documents related to computer hardware
and software, but can be used for any other sort of documentation.
Next up is HTML, with the extension of .html or .htm.
It is an open standard, and has been an ISO standard since 2000.
Next up is FictionBook, which is an open XML-based eBook format, which originated and gained
popularity in Russia.
FictionBook files have the .fb2 file name extension.
Some readers also support zip-compressed FictionBook files.
The next format is Markdown, which is a lightweight markup language for creating formatted
text using a plain text editor.
John Gruber and Aaron Swartz created Markdown in 2004 as a markup language that is appealing
to human readers in its source code form.
Next up is Office Open XML, which is a zipped, XML-based file format developed by Microsoft
for representing spreadsheets, charts, presentations, and word processing documents.
Next is the OpenDocument Format for Office Applications, abbreviated ODF.
It is also known as OpenDocument, and is an open file format for word processing documents,
spreadsheets, presentations, and graphics using zip compressed XML files.
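As a rough illustration of what "zip-compressed XML files" means in practice, the Python sketch below builds a tiny OpenDocument-style package in memory and reads the text back out. It is an added example, not part of the episode. The helper names build_package and read_paragraphs are hypothetical, and the package is deliberately incomplete (a real ODF file also needs a META-INF/manifest.xml and usually styles and metadata), so treat it as a sketch of the container idea rather than a valid document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OpenDocument content files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# A minimal content.xml holding a single paragraph (sample text is made up).
CONTENT_XML = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Hello from Hacker Public Radio</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def build_package() -> bytes:
    """Build a simplified, incomplete ODF-style zip package in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr(
            "mimetype",
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        archive.writestr("content.xml", CONTENT_XML, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_paragraphs(package: bytes) -> list:
    """Open the zip, parse content.xml, and return the text of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return [p.text or "" for p in root.iter(f"{{{TEXT_NS}}}p")]


package = build_package()
print(zipfile.ZipFile(io.BytesIO(package)).namelist())
print(read_paragraphs(package))
```

The same approach of opening the file as a zip archive and parsing the XML inside also applies to Office Open XML files, although the entry names and XML vocabularies differ.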
Next is OpenOffice.org XML, and it is an open XML-based file format, developed as an open community
effort by Sun Microsystems in 2000-2002.
The open source software application suites OpenOffice.org 1.x and StarOffice 6 and 7
use the format as their native default file format.
Next is OXPS, which is the Open XML Paper Specification, and is an open specification
for a page description language and a fixed document format.
Microsoft developed it as the XML Paper Specification.
In June 2009, Ecma International adopted it as international standard ECMA-388.
Next up is PalmDoc, abbreviated PDB, which is a container format for record databases
in Palm OS, Garnet OS, and Access Linux Platform.
It is structured similarly to PRC resource databases.
The PalmDoc eBook format is a special version of the PDB format.
Next up is PDF, which you already know is the Portable Document Format, standardized as
ISO 32000. It is a file format developed by Adobe in 1992 to present documents, including
text formatting and images, in a manner independent of application software, hardware, and operating
systems.
The next category of PDF is PDF/E, which is ISO 24517-1:2008. It is an
ISO standard published in 2008 for document management, engineering document format using
PDF, Part 1.
There is also PDF/UA, which is for universal accessibility, and PDF/VT,
which is for variable data and transactional printing.
Next we have PostScript, which is a page description language in the electronic publishing and desktop
publishing realm.
It is a dynamically typed, concatenative programming language that was created at Adobe
Systems by John Warnock, Charles Geschke, Doug Brotz, Ed Taft, and Bill Paxton from 1982 to 1984.
Next we have Rich Text Format, which is a proprietary document file format with a published specification,
developed by Microsoft Corporation from 1987 until 2008 for cross-platform document
interchange with Microsoft products.
Next is Symbolic Link, SYLK. It is a Microsoft file format, typically used to exchange data
between applications, specifically spreadsheets.
SYLK files conventionally have a .slk suffix. Composed of only displayable
ANSI characters, the format can be easily created and processed by other applications
such as databases.
Next is Scalable Vector Graphics, SVG. It is an XML-based vector image format for defining
two-dimensional graphics, having support for interactivity and animation.
The SVG specification is an open standard developed by the World Wide Web Consortium since
1999.
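Because SVG is plain XML, a drawing can be produced and inspected with any XML library. The short Python sketch below is an added illustration, not part of the episode. It builds a one-circle image with the standard library and parses it back. The function name make_svg and the specific size, color, and coordinates are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

# The XML namespace that identifies SVG elements.
SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a prefix (as <svg>, not <ns0:svg>).
ET.register_namespace("", SVG_NS)


def make_svg() -> str:
    """Return a minimal SVG document containing a single filled circle."""
    root = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": "100", "height": "100", "viewBox": "0 0 100 100"},
    )
    ET.SubElement(
        root,
        f"{{{SVG_NS}}}circle",
        {"cx": "50", "cy": "50", "r": "40", "fill": "teal"},
    )
    return ET.tostring(root, encoding="unicode")


svg_text = make_svg()
print(svg_text)

# Parsing the text back shows the same structure: one <svg> with one <circle>.
parsed = ET.fromstring(svg_text)
print(parsed.tag, [child.get("r") for child in parsed])
```

Saving svg_text to a file with an .svg extension should give an image that a web browser can display, since the shapes are described as data rather than stored as pixels.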
Next we have TeX, stylized within the system as T, subscript E, X. It is a typesetting system
which was designed and written by computer scientist and Stanford University professor
Donald Knuth and first released in 1978.
Next we have the Text Encoding Initiative, TEI. It is a text-centric community of practice
in the academic field of digital humanities operating continuously since the 1980s.
Next is troff, short for typesetter roff. It is a major component of a document processing
system developed by Bell Labs for the UNIX operating system.
troff and the related nroff were developed from the original roff.
Then we have Uniform Office Format, sometimes known as Unified Office Format. It is an open
standard for office applications developed in China. It includes word processing, presentation,
and spreadsheet modules, and is made up of GUI, API, and format specifications.
And last there is WordPerfect, which is a word processing application now owned by Corel,
with a long history on multiple personal computer platforms.
At the height of its popularity in the 1980s and early 1990s, it was the dominant
player in the word processor market, displacing the prior market leader WordStar.
This has been Archer72 for Hacker Public Radio.
Feel free to record a show of your own, until next time.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out
how easy it really is.
Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive,
and rsync.net.
Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0
International License.