Episode: 3805
|
|
Title: HPR3805: Document File Formats on Wikipedia
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3805/hpr3805.mp3
|
|
Transcribed: 2025-10-25 05:39:48
|
|
|
|
---
|
|
|
|
This is Hacker Public Radio Episode 3805 for Friday the 3rd of March 2023.
|
|
Today's show is entitled, Document File Formats on Wikipedia.
|
|
It is hosted by Archer72 and is about 12 minutes long.
|
|
It carries a clean flag.
|
|
The summary is Document File Format, a continuation of Content Format.
|
|
Hello, this is Archer72.
|
|
Welcome to Hacker Public Radio.
|
|
In this episode, I'm continuing to go through a Wikipedia article on Content Format.
|
|
To recap, there have been countless content formats throughout history.
|
|
The following are examples of some common content formats and content format categories,
|
|
covering sensory experience, model, and language used for encoding information.
|
|
The first item in this list is Document File Format.
|
|
A Document File Format is a text or binary file format storing documents on a storage
|
|
medium, especially for use by computers.
|
|
There currently exists a multitude of incompatible Document File Formats.
|
|
Examples of XML-based open standards are DocBook, XHTML, and more recently, the ISO/IEC
|
|
standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).
|
|
In 1993, the ITU-T tried to establish a standard for Document File Formats, known as Open
|
|
Document Architecture, which was supposed to replace all competing Document File Formats.
|
|
It is described by the ITU-T documents T.411 through T.421, which are equivalent to ISO
|
|
8613.
|
|
It did not succeed.
|
|
Page description languages such as PostScript and PDF have become de facto standards for
|
|
documents that a typical user should only be able to create and read, not edit.
|
|
In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification
|
|
for PDF itself, ISO 32000.
|
|
HTML is the most used and open international standard, and is also used as a Document File
|
|
Format.
|
|
It has also become an ISO/IEC standard, ISO 15445:2000.
|
|
The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard
|
|
for office documents, but it is a proprietary format and is not always fully supported
|
|
by other word processors.
|
|
Modern Document File Formats are as follows: ASCII and UTF-8, which are plain text formats.
|
|
ASCII, abbreviated from American Standard Code for Information Interchange, is a character
|
|
encoding standard for electronic communication.
|
|
ASCII codes represent text in computers, telecommunications equipment, and other devices.
|
|
UTF-8 is a variable-length character encoding standard used for electronic communication,
|
|
defined by the Unicode Standard.
|
|
The name is derived from Unicode Transformation Format 8-bit.
|
|
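A minimal sketch of what "variable-length" means for UTF-8, assuming Python 3. The sample characters and the helper name `utf8_length` are chosen here for illustration and are not from the episode. It encodes a few characters and counts the bytes each one needs, and it also shows that plain ASCII text produces the same bytes in both encodings.

```python
# Sample characters picked for illustration: one ASCII letter, one accented
# Latin letter, one currency symbol, and one emoji.
samples = ["A", "é", "€", "😀"]


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Map each character to its encoded length in bytes.
lengths = {ch: utf8_length(ch) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} needs {n} byte(s) in UTF-8")

# Text that only uses ASCII characters is byte-for-byte identical in UTF-8.
ascii_text = "Hacker Public Radio"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII and UTF-8 bytes match:", same_bytes)
```

This backward compatibility with ASCII is one reason an ASCII-only file can also be read as a UTF-8 file.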
AmigaGuide is a hypertext Document File Format designed for the Amiga.
|
|
Files are stored in ASCII, so it is possible to read and edit a file without the need
|
|
for special software.
|
|
Next is .doc, which is a file name extension used for word processing documents stored
|
|
in Microsoft's proprietary Microsoft Word Binary File Format.
|
|
Microsoft has used the extension since 1983, and specifications have been available since
|
|
2008 under the Open Specification Promise, which is a promise by Microsoft, published
|
|
in September 2006, not to assert its patents under certain conditions against implementations
|
|
of a certain list of specifications.
|
|
Next up is DjVu, pronounced like the French "déjà vu", and it is a computer file format
|
|
designed primarily to store scanned documents, especially those containing a combination
|
|
of text, line drawings, indexed color images, and photographs.
|
|
Next is DocBook, which is a semantic markup language for technical documentation.
|
|
It was originally intended for writing technical documents related to
|
|
|
computer hardware and software, but can be used for any other
|
|
sort of documentation.
|
|
Next up is HTML, with the extension of .html or .htm.
|
|
It is an open ISO standard from 2000.
|
|
Next up is FictionBook, which is an open XML-based e-book format that originated and gained
|
|
popularity in Russia.
|
|
FictionBook files have the .fb2 file name extension.
|
|
Some readers also support ZIP-compressed FictionBook files.
|
|
The next format is Markdown, which is a lightweight markup language for creating formatted
|
|
text using a plain text editor.
|
|
John Gruber and Aaron Swartz created Markdown in 2004 as a markup language that is appealing
|
|
to human readers in its source code form.
|
|
Next up is Office Open XML, which is a zipped XML-based file format, developed by Microsoft
|
|
for representing spreadsheets, charts, presentations, and word processing documents.
|
|
Next is the OpenDocument Format for Office Applications, abbreviated ODF.
|
|
It is also known as OpenDocument, and is an open file format for word processing documents,
|
|
spreadsheets, presentations, and graphics using zip compressed XML files.
|
|
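A rough sketch of the "zip compressed XML files" idea, assuming Python 3.9 or later and only the standard library. The function names and the tiny `content.xml` body are placeholders invented for illustration, so the result is not a valid OpenDocument file. It only shows the container layout: a ZIP archive whose first member is an uncompressed `mimetype` entry, followed by XML members.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Media type string used by OpenDocument text documents.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"

# Placeholder XML for illustration; a real content.xml is far more involved.
PLACEHOLDER_XML = "<doc><p>Hello from a zipped XML file</p></doc>"


def build_package() -> bytes:
    """Build a small in-memory ZIP that mimics the ODF container layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored without compression.
        zf.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # The XML payload is compressed with deflate.
        zf.writestr("content.xml", PLACEHOLDER_XML, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def list_members(data: bytes) -> list[str]:
    """Return the member names of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def read_first_paragraph(data: bytes) -> str:
    """Unzip content.xml, parse it, and return the text of its first <p>."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return root.find("p").text


package = build_package()
print(list_members(package))
print(read_first_paragraph(package))
```

The same approach, opening the file with `zipfile.ZipFile` and listing its members, can be used to look inside other zipped XML formats mentioned in this episode.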
Next is OpenOffice.org XML, and it is an open XML-based file format, developed as an open community
|
|
effort by Sun Microsystems in 2000-2002.
|
|
The open source software application suite OpenOffice.org 1.x and StarOffice 6 and 7
|
|
use the format as their native default file format.
|
|
Next is OXPS, which is the Open XML Paper Specification, and is an open specification
|
|
for a page description language and a fixed document format.
|
|
Microsoft developed it as the XML Paper Specification.
|
|
In June 2009, Ecma International adopted it as international standard ECMA-388.
|
|
Next up is PalmDoc, abbreviated PDB, which is a container format for record databases
|
|
in Palm OS, Garnet OS, and Access Linux Platform.
|
|
Its structure is similar to PRC resource databases.
|
|
The PalmDoc eBook format is a special version of the PDB format.
|
|
Next up is PDF, which you already know is the Portable Document Format, standardized as
|
|
ISO 32000. It is a file format developed by Adobe in 1992 to present documents, including
|
|
text formatting and images, in a manner independent of application software, hardware, and operating
|
|
systems.
|
|
The next category of PDF is PDF/E, which is ISO 24517-1:2008. It is an
|
|
ISO standard published in 2008 for document management: engineering document format using
|
|
PDF, Part 1.
|
|
There is also PDF/UA, which stands for Accessibility, and PDF/VT,
|
|
|
which is variable data and transactional printing.
|
|
Next we have PostScript, which is a page description language in the electronic publishing and desktop
|
|
publishing realm.
|
|
It is a dynamically typed concatenative programming language that was created at Adobe
|
|
Systems
|
|
by John Warnock, Charles Geschke, Doug Brotz, Ed Taft, and Bill Paxton from 1982 to 1984.
|
|
Next we have Rich Text Format, which is a proprietary document file format with a published specification
|
|
developed by Microsoft Corporation from 1987 until 2008 for cross-platform document
|
|
interchange with Microsoft products.
|
|
Next is Symbolic Link (SYLK). It is a Microsoft file format, typically used to exchange data
|
|
between applications, specifically spreadsheets.
|
|
SYLK files conventionally have a .slk suffix. Composed of only displayable
|
|
ANSI characters,
|
|
it can be easily created and processed by other applications such as databases.
|
|
Next is Scalable Vector Graphics (SVG). It is an XML-based vector image format for defining
|
|
two-dimensional graphics, having support for interactivity and animation.
|
|
The SVG specification is an open standard developed by the World Wide Web Consortium since
|
|
1999.
|
|
Next we have TeX, stylized within the system as T, subscript E, X. It is a typesetting system
|
|
which was designed and written by computer scientist and Stanford University professor
|
|
Donald Knuth and first released in 1978.
|
|
Next we have the Text Encoding Initiative (TEI). It is a text-centric community of practice
|
|
in the academic field of digital humanities operating continuously since the 1980s.
|
|
Next is troff, short for "typesetter roff". It is a major component of a document processing
|
|
system developed by Bell Labs for the UNIX operating system.
|
|
troff and the related nroff were developed from the original roff.
|
|
Then we have Uniform Office Format, sometimes known as Unified Office Format. It is an open
|
|
standard for office applications developed in China. It includes word processing, presentation,
|
|
and spreadsheet modules, and is made of GUI, API, and format specifications.
|
|
And last there is WordPerfect, which is a word processing application now owned by Corel
|
|
with a long history on multiple personal computer platforms.
|
|
At the height of its popularity in the 1980s and early 1990s, it was the dominant
|
|
player in the word processor market, displacing the prior market leader, WordStar.
|
|
This has been Archer72 for Hacker Public Radio.
|
|
Feel free to record a show of your own, until next time.
|
|
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
|
|
Today's show was contributed by an HPR listener like yourself.
|
|
If you ever thought of recording a podcast, then click on our contribute link to find out
|
|
how easy it really is.
|
|
Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive
|
|
and rsync.net.
|
|
Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0
|
|
International License.
|