Episode: 3701
Title: HPR3701: ReiserFS - the file system of the future
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3701/hpr3701.mp3
Transcribed: 2025-10-25 04:18:44
---
This is Hacker Public Radio Episode 3,701 from Monday the 10th of October 2022.
Today's show is entitled, ReiserFS - the file system of the future.
It is the first show by new host Paul Jay and is about 18 minutes long.
It carries a clean flag.
The summary is:
The history and future of ReiserFS, its involvement with DARPA,
a sordid murder and kernel politics.
Hi everyone, my name is Paul.
I'm recording this show for Hacker Public Radio.
This is my first show, recorded in Bedfordshire.
For the more astute of you, you might recognise from my accent that I'm British,
so that's in the UK.
I'm a long-time listener and first-time contributor to HPR,
so I will give you a bit of introduction to myself before I talk about my subject matter.
I am a true full stack developer,
so I can design CPU architectures from the transistor level upwards,
including firmware, system software, backend and front end systems,
and I also invented my own programming language.
I work mostly in the automotive and manufacturing sectors,
usually in specialist problem areas of automation and normalisation
and data processing, especially when it comes to getting disparate file formats
and systems to talk to each other and play nicely.
I was born at the start of the microcomputer revolution in the UK in the 80s,
and the first computer I cut my teeth on was a BBC Micro,
which was produced under licence from a company in the next county over from me,
Cambridgeshire.
This company was called Acorn Computers.
I also used a ZX Spectrum and Acorn Electron,
a Commodore 64 and an Acorn Archimedes,
which even to this day I believe is still ahead of its time.
Most people outside the UK have never heard of Acorn Computers,
but have heard of their most successful product, which today we know as ARM microprocessors.
That stands for Advanced RISC Machines,
but it originally used to be called Acorn RISC Machines.
When Acorn decided to move away from the 6502 microprocessors that powered
the likes of the BBC Micro and the Acorn Electron,
they decided they needed something more powerful for their Archimedes range,
so that's when they set about creating their own CPU, which we now know as ARM.
When Acorn was bought out and stopped producing computers,
I moved on to the Commodore Amiga.
I had an Amiga 600 and an Amiga 1200, and I think those computers also were very far ahead of
their time back in the day. When I got to high school, I got my first exposure to Microsoft Windows,
which compared to the Amiga Workbench just didn't really seem up to snuff and didn't really
interest me very much, so I never pursued it.
Around this time, I also had a job working in my town's first internet cafe,
which was powered by Unix and Linux systems to manage the routing and the care management,
and I started getting very familiar with these, and I thought the design was better,
and it just clicked with me. Everything seemed much more well thought-through and more correct.
I got a copy of Red Hat Linux, and this isn't Red Hat Enterprise Linux,
that didn't exist back then. This was just plain Red Hat Linux,
and I installed it on my home desktop computer, and I was fascinated by it,
that I could get in there and tinker with everything. It blew my mind, and I wanted to learn
everything about it. I kept that distro on there for a good couple of weeks before I did a search
online, and to set the scene, this wasn't using Google. This was using a search engine you
may have heard of called AltaVista, and I remember searching for what is the hardest Linux to learn,
and the entire first page of results came back saying Slackware, Slackware, Slackware,
and I thought to myself, well, if I'm going to learn this system, I'm going to dive in at the
deep end, and I'm going to download Slackware. When I downloaded it, version 8.0 was available,
which only offered one filing system, ext2. Shortly thereafter, version 8.1 came out,
which offered two new filing systems, ext3 and ReiserFS, both of which were journaling filing systems,
but as I'd played with ext, I decided that I was going to give ReiserFS a go,
so I formatted my disk with that, and started learning about Reiser.
When version 9 came along, ReiserFS was made the default filing system, and around this point,
also, it was the default for SUSE, Linspire, and Xandros Linux.
So I've been a Slackware user for about two decades now in my personal and professional life,
but I'm not a Linux zealot. I do believe in the right tool for the right job. I use a lot of
virtual machines for cross-platform development, and I've developed solutions for Android,
and iOS, and Linux, and Windows, and Mac, and I use FreeRTOS on my PineTime smartwatch,
and Microsoft Windows on my Xbox games console. I've used Amiga Workbench, and BeOS,
and you name it. I'm very tech-agnostic, provided the tech plays nicely with each other,
which is what a lot of my working life involves. That's all that matters to me,
but Slackware and Linux will always have a special place in my heart, and always be one I consider home.
Anyway, that's enough about me, and time to move on to the subject for the show,
which is the Reiser filing system. So before I decided to record this episode, I looked through
the HPR history to see if anyone had covered it previously, and I was surprised to find it had
only been briefly covered once before, in episode 1,560 in July 2014. And that was in a series of
talks about different filing systems by JWP, who has recorded some great content, and I always
look forward to his shows. So I want to say thanks to him for including it, and I want to recap
what he said, and then expand on a few points in a bit more detail. So JWP's overview tells us
that the Reiser filing system was the first journaling filing system in Linux, created in 2001 by
Hans Reiser. It only supports writeback journaling mode, and is one of the fastest journaling
filing systems in Linux. You can resize an existing filing system online while it's still active
and mounted, and it supports tail packing, which allows you to store data from one file in the
empty blocks of another file, and some of these features were incorporated into ext4.
So, expanding on these points from my own perspective and experiences, ReiserFS is now
21 years old as at the date of this recording, and I use it almost exclusively. So why do I do this?
Well, the biggest reason is that I've never lost any data on ReiserFS, despite all sorts of
problems including physical failures and corruptions, and especially user error. I use it on my laptops,
which sometimes don't go to sleep properly or the power button gets knocked in the bag and
the laptop just dies, and it always recovers fine without losing anything. I've had mechanical
failures that I've been able to fully retrieve the filing system from, and also sometimes we will
make the mistake where we've gone to format a disk and have mistyped the device name. So in the past,
I've tried to zero out a USB key, and instead of using dd to write zeros to /dev/sdb,
I've written it to /dev/sda, and after a few moments thought, that's strange,
the USB key lights are not flashing to tell me it's writing data. What am I doing? And then
realized my terrible mistake and had to Ctrl-C my way out of there and been left with a situation
where I've got a filing system that's mounted, and I've just overwritten the first 30 megabytes of
it, and I'm thinking, well, it's not going to boot again, so I need to somehow recreate that. But
the Reiser filing system tools have always let me recover from the most bizarre situations like
this, and I've just never lost a single file. So it's very hard for me to want to give something
up that's always been so reliable for me. ReiserFS also supports writeback journaling mode,
which is quick, but slightly less crash proof than the more common ordered journaling mode.
But because I don't really have use cases that are that extreme, I think that I've been absolutely
fine with writeback journaling mode. I find that the filing system is extremely fast.
ReiserFS is based on binary trees. In fact, a specific type called a B+ tree,
and these are balancing binary trees. Files typically smaller than a block, which is usually
about four kilobytes, are stored in the tree itself rather than the tree pointing to the
blocks on the disk where the files are stored. And this is extremely well suited for Linux
general usage, where there's lots of small configuration files and cache files, and also for a lot
of the ETL and data processing work that I do that has a huge number of small files. Sometimes,
you know, hundreds of thousands to millions that aren't very big. It's a perfectly suited
filing system for that. And also the tail packing, which is technically called block suballocation,
allows the filing system to take two files that may be, say, your block size is four kilobytes,
and you have two files that are a kilobyte and two kilobytes, and it can actually store them both
in the one block to save having wasted space. Otherwise, you'd have one kilobyte file in one block,
and then three kilobytes wasted, and then you'd have another two kilobyte file in another block
and have two kilobytes wasted, whereas Reiser allows you to combine them both into the one block,
and that reduces fragmentation and wasted space. JWP mentioned in his show that a lot of these
features made it through to ext4, but tail packing isn't one of those features, and ext4 doesn't support
tail packing, whereas ReiserFS does. So I want to get into the history of where ReiserFS came from.
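Before the history, the tail packing arithmetic from a moment ago can be put into numbers. The following is only a minimal sketch of the idea, assuming a 4 KiB block size and a simple first-fit packing of file tails. The function names are made up for illustration, and this is not how ReiserFS itself lays out data on disk.

```python
import math

BLOCK_SIZE = 4096  # bytes; the 4 KiB block size used in the talk's example


def blocks_without_packing(file_sizes, block_size=BLOCK_SIZE):
    """Every file gets its own whole blocks, so each partial block wastes space."""
    return sum(math.ceil(size / block_size) for size in file_sizes)


def blocks_with_packing(file_sizes, block_size=BLOCK_SIZE):
    """Full blocks stay separate; leftover tails share blocks (first-fit).

    This packing strategy is an assumption made for the sketch only.
    """
    full_blocks = sum(size // block_size for size in file_sizes)
    tails = sorted(
        (size % block_size for size in file_sizes if size % block_size),
        reverse=True,
    )
    shared_free = []  # free bytes left in each shared tail block
    for tail in tails:
        for i, free in enumerate(shared_free):
            if tail <= free:
                shared_free[i] = free - tail
                break
        else:
            # No existing shared block has room, so start a new one.
            shared_free.append(block_size - tail)
    return full_blocks + len(shared_free)


def wasted_bytes(blocks, file_sizes, block_size=BLOCK_SIZE):
    """Allocated space minus the space the files actually need."""
    return blocks * block_size - sum(file_sizes)


sizes = [1024, 2048]  # the 1 KiB and 2 KiB files from the example
plain = blocks_without_packing(sizes)
packed = blocks_with_packing(sizes)
print("without packing:", plain, "blocks,", wasted_bytes(plain, sizes), "bytes wasted")
print("with packing:   ", packed, "blocks,", wasted_bytes(packed, sizes), "bytes wasted")
```

With the two example files this gives two blocks and 5 KiB wasted without packing, against one block and 1 KiB wasted with packing.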
It started at a company called Namesys, which was founded by Hans Reiser in 1993.
There the file system was developed with development being funded by a mix of proprietary licenses
and sponsorship from the likes of SUSE Linux, mp3.com, bigstorage.com, as well as some sponsors
who wanted to remain anonymous. It was the first journaling filing system released
in the standard Linux kernel, initially in 2.4.1. However, it was already possible to get journaling filing
systems from patches and unofficial kernels before this, but ReiserFS was the first to make it
into the mainline kernel, and that was quite a significant achievement for Namesys.
As time went on, new academic ideas and innovations regarding filing systems and data storage
from collaborations between Namesys and the Moscow State University and the program systems
institute of the Russian Academy of Sciences began to accumulate. But due to the limits in the
existing ReiserFS version 3 codebase and some inherent shortcomings with some previous design
choices, Namesys decided to complete development on the project and declare it feature complete
with it only receiving critical security and bug fixes going forward. The company decided to move
on from the existing codebase and to restart the ReiserFS project from scratch and call the project
Reiser4. It would combine the latest innovations with the experience of the previous incarnations
of the filing system. SUSE Linux made a commitment to switching from ReiserFS to Reiser4 as the default
filing system in its upcoming distributions. In 2002, the project caught the attention of the US
Pentagon's research and development section DARPA, which stands for the Defense Advanced Research
Projects Agency, which provided a US$600,000 grant for the project to build the filing system
of the future. It was also sponsored by Linspire, which had already made ReiserFS its default
filing system by this point. The Reiser4 project progressed and the work was slowly being completed
over the course of a few years, bringing our history forward to October 2006 where Hans Reiser was
arrested for the suspected murder of his estranged wife. I won't go into too much detail on this,
as there's plenty of other information online if this interests you, and I want to
focus more on the technical aspects. But Namesys employees continued to work on Reiser4 during these
events. In late December 2006, Hans Reiser attempted to sell Namesys to fund the legal costs for his
trial, but he hadn't found a buyer as of March 2008, and the company eventually ceased commercial
activity that year, just as Reiser4 was nearly completed. Namesys employees, including Edward
Shishkin, continued to work on the GPL'd code and released it via his personal website and then
eventually from kernel.org. The Reiser4 project was shortly thereafter completed according to the
roadmap, and patch sets for various kernel versions were released, and Reiser4 arrived,
feature packed and included many new and innovative approaches to filing systems, including a move
from B plus trees to dancing trees, which is a concept invented by Hans Reiser, whereby the tree
balancing occurs only on a flush to save unnecessary and expensive disc operations.
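The idea of balancing only on a flush can be illustrated with a toy comparison. This is a minimal sketch, not Reiser4's data structure: it uses sorted Python lists instead of a real tree, and the class names `EagerStore` and `FlushBalancedStore` are hypothetical. All it shows is that deferring the expensive reorganisation until flush time turns many reorganisations into one.

```python
import bisect


class EagerStore:
    """Keeps its keys ordered on every insert, like balancing on each change."""

    def __init__(self):
        self.keys = []
        self.reorganisations = 0

    def insert(self, key):
        bisect.insort(self.keys, key)  # reorder immediately
        self.reorganisations += 1


class FlushBalancedStore:
    """Buffers inserts in memory and only reorganises when flush() is called."""

    def __init__(self):
        self.keys = []      # ordered view, standing in for what is on disk
        self.pending = []   # unordered changes still held in memory
        self.reorganisations = 0

    def insert(self, key):
        self.pending.append(key)  # cheap: no reordering yet

    def flush(self):
        if self.pending:
            self.keys = sorted(self.keys + self.pending)
            self.pending.clear()
            self.reorganisations += 1  # one reorganisation for the whole batch


eager = EagerStore()
lazy = FlushBalancedStore()
for key in [50, 10, 40, 20, 30]:
    eager.insert(key)
    lazy.insert(key)
lazy.flush()

print("eager reorganisations:", eager.reorganisations)
print("flush reorganisations:", lazy.reorganisations)
print("same contents:", eager.keys == lazy.keys)
```

Both stores end up with the same ordered contents, but the second one did its reordering work once, at the point where data would be written out.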
Some of the Reiser4 features were quite ahead of their time and have been the inspiration for
other filing systems which have adopted some of these ideas and approaches. Whilst Namesys,
or at least the former employees thereof, may have achieved the DARPA request of building
the filing system of the future, as of the date of recording, and despite several years of requests
from the developers, the code has not been merged into the mainline kernel. Of course, there have
been claims that the reason for this is not based on technical merit, but on political stance
considering the history of Hans Reiser, but the ex-Namesys developers have not given up hope and
maintain compatible patch sets with kernels released even as recently as a month ago as of the
date of this recording. On the last day of 2019, the lead developer of Reiser4, Edward Shishkin,
announced the development of Reiser5 in a lengthy post to the Linux kernel mailing list,
detailing even more fundamental and innovative approaches to filing systems, including some
approaches now already used by ZFS. Despite the continued development and updating of
the ReiserFS and Reiser4 codebases, the Linux kernel has deprecated ReiserFS in version 5.18,
and plans to remove it completely in 2025. Although ReiserFS does suffer from some problems,
including the Y2K38 bug, whereby the file system time integer will wrap around in the year 2038,
such problems are not present in Reiser4 and shouldn't be obstacles to the technical
adoption of the filing system to replace the aging Reiser 3 version that we have in the kernel right now.
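The 2038 wrap-around mentioned above is easy to reproduce in a few lines. This sketch assumes the timestamp is a signed 32-bit count of seconds since the Unix epoch, which is the usual form of the Y2K38 problem. It does not read any real ReiserFS structure, and the helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1  # largest value a signed 32-bit integer can hold


def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


last_good = to_utc(MAX_INT32)              # 2038-01-19 03:14:07 UTC
wrapped = wrap_int32(MAX_INT32 + 1)        # one second later: -2147483648
after_wrap = to_utc(wrapped)               # 1901-12-13 20:45:52 UTC

print("last representable time:", last_good.isoformat())
print("one second later wraps to:", after_wrap.isoformat())
```

One second past the last representable moment, the counter goes negative and the date jumps back to 1901.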
As a long-term ReiserFS user, I now need to consider the alternative
filing system to switch to if Reiser4 is not going to be adopted. I think
Btrfs sounds the most appealing out of everything available, considering it's
got mainline kernel support and also has a lot of quite modern filing system features,
but I welcome any suggestions that the community might make for me to help me find a path forwards
on this. But whilst it's still available, I want to share some lessons learned and experience
gained from two decades of using ReiserFS in many places, and also the lessons learned on
places you shouldn't use it. First of all, do not use it on a USB key, because when you format
the drive, it has to put a lot more data structures on the drive, including the journal,
and that takes up a lot more space than ext3 or a VFAT partition would, and it's slower to write to.
Also, never use it in a VM. In fact, never put any Reiser filing system inside another.
You should hopefully never need to be in a position to recreate your filing system after a crash,
or if you have to attempt a recovery after some error or a deep disk scan. But if you ever do
find yourself in that position, if one Reiser filing system is embedded in another such as a
virtual machine disk image or a backup, the tools to recover the filing system will then read into the
filing system within a filing system and start recovering phantom files that were never part of
the outer master filing system. I had this issue once, when I had to recover from a disk crash,
that it also, in my lost and found directory, stored some files that were originally stored inside
a VM. Never do that. There's some fun quirks about Reiser. There's a virtual directory called
.reiserfs_priv in the root directory, and that's used to store the extended attributes
file. You can't ls this or cd into it or write to it. Your calls to such file operations
will just fail. But this virtual directory and file were linked to a security bug that has now
been fixed, where the incorrect file permissions were granted to anyone that wanted to do something
nefarious with it, such as an attacker, that could use it to gain extended privileges. But that,
unfortunately, has now been taken care of. There are several tools that you can use to manage your
Reiser filing system, including standard tools such as mkreiserfs, which is used to make the
filing system, and reiserfsck, which is used to check the filing system. There's also reiserfstune,
which you can use to modify the parameters of the filing system, such as the label and the
time between it being checked and hash functions, and you also have resize_reiserfs,
which is the tool mentioned by JWP that allows you to grow a Reiser filing system that's online,
provided the underlying block storage device has enough space for the filing system to grow.
But of course, to shrink it, you have to unmount it, but it can be shrunk, provided that the
block device that it's on is not made smaller than the file system itself.
And there's also a very interesting tool called debugreiserfs that allows you to
examine the journal and the block structure of the filing system. You can see what files are
stored where and what their attributes are and such things, so if you wanted to have a bit of a
low-level poke at your filing system, you can pass in the -d flag, and that lets you start
to explore on the console a lot of information about what's going on under the hood in the filing system.
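As a rough cheat sheet for the tools just described, the sketch below only builds and prints example command lines. It never runs them, since several of these are destructive. The device path `/dev/sdX1`, the label and the resize amount are placeholders invented for illustration, and the options shown should be checked against the man pages for your installed reiserfsprogs version.

```python
import shlex

DEVICE = "/dev/sdX1"  # hypothetical placeholder, not a real device


def reiser_commands(device=DEVICE, label="data", grow_by="+1G"):
    """Return example argument lists for the ReiserFS tools named in the talk."""
    return {
        # Create a new filing system (destroys existing data on the device).
        "make": ["mkreiserfs", "-l", label, device],
        # Check the filing system for consistency.
        "check": ["reiserfsck", "--check", device],
        # Change a parameter, here the volume label.
        "relabel": ["reiserfstune", "-l", label, device],
        # Grow the filing system by the given amount.
        "grow": ["resize_reiserfs", "-s", grow_by, device],
        # Print the formatted nodes of the internal tree.
        "inspect": ["debugreiserfs", "-d", device],
    }


# Print the commands for reading only; nothing is executed here.
for name, argv in reiser_commands().items():
    print(f"{name:8} {shlex.join(argv)}")
```

Keeping the commands as argument lists rather than strings means they could later be handed to `subprocess.run` without shell quoting problems, once the device name has been double-checked.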
I won't get too far down in the weeds describing the other options available, but it's worth
checking out if you have a Reiser filing system and want to learn a bit more about what your
system's actually doing. I want to say thank you to everyone for making it this far and listening
to me. I hope that you enjoyed my content. I hope that it was accurate and error-free, but if
there are any corrections, please do of course contact me. I'd love to see some comments on the show.
It's my first show, so hopefully everyone will be kind and I hope you enjoyed my content.
I'd love to make some new connections, and I'm working on a small series which I hope will be
ready in the next couple of weeks, which I look forward to sharing with you all. So again, thank you.
This is Paul signing off, and I hope you all have a great day.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, click on our contribute link to find out how easy it
is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive,
and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons
Attribution 4.0 International License.