Who Needs Git When You Have ZFS?

  • ZFS is excellent for database development! Create a snapshot before you try something that might mess up your data, and instantly restore it if it does! If your development database is a few gigabytes, this will save you a lot of time.
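
    For anyone who wants to script that snapshot-then-restore habit, here is a minimal Python sketch. The dataset name `tank/devdb`, the label, and the `snapshot_guard` helper are made up for illustration; only the `zfs snapshot`, `zfs rollback`, and `zfs destroy` subcommands are real. It assumes the database is stopped or otherwise quiesced around the rollback, which the sketch does not handle.

```python
import subprocess
from contextlib import contextmanager


def zfs(*args, run=subprocess.run):
    """Run a zfs subcommand; `run` is injectable so the sketch can be dry-run."""
    return run(["zfs", *args], check=True)


@contextmanager
def snapshot_guard(dataset, label, run=subprocess.run):
    """Snapshot `dataset` first; roll back if the body raises, then clean up.

    `dataset` and `label` are hypothetical names chosen by the caller.
    """
    snap = f"{dataset}@{label}"
    zfs("snapshot", snap, run=run)
    try:
        yield snap
    except Exception:
        # The experiment failed: return the dataset to the snapshot state.
        zfs("rollback", snap, run=run)
        raise
    finally:
        # Drop the snapshot so it does not keep old blocks alive indefinitely.
        zfs("destroy", snap, run=run)
```

    Used as `with snapshot_guard("tank/devdb", "before-migration"): run_risky_migration()`, where `run_risky_migration` is a placeholder for whatever you are about to try.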

  • > Who needs Git?

    > Notably missing is support for merging

    So, it's almost like a car except it's missing its wheels

    No, it's not like Git

  • I've been working with virtual machines in production and development for well over a decade now, and a lot of what works via ZFS in this article works very similarly to how it would work if we used AWS or VMware style snapshots. In fact, ZFS snapshots are only so helpful if, for example, you make complicated changes across multiple ZFS volumes that require more transactional-style rollbacks. In such scenarios, it may be easier to just perform an instance-level rollback.

    One problem that didn't really occur to me before I became more ops-side was that administrators would want to heavily restrict or remove users' (read: developers' and even other sysadmins') ability to create snapshots in the first place. Why would you remove self-service temporary backups that avoid a lot of backup restoration requests? I didn't realize that other engineers could be careless and keep dozens or even hundreds of snapshots over time that gobble up expensive storage resources (SANs are not cheap regardless of manufacturer) and slow down I/O over time. That was why so many of my customers deploying stuff like VMware Lab Manager and vCloud Director demanded the ability to remove access to snapshot features.

    As a result of user abuse of a very powerful feature throwing things for a loop, and given the typical organizational structure and siloization of these environments, SAN-side LUN snapshots are nowadays used more often than snapshots from the VM layer (the same administrators who manage VMware environments typically have rights to the SANs). Using ZFS like this looks like a developer-side reaction to me, but duplicating effort on a problem that already has viable technical solutions is exasperating.
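
    The snapshot sprawl described above (dozens or hundreds of forgotten snapshots) is the kind of thing a retention policy can keep in check without taking the feature away. A rough Python sketch, assuming you have already parsed something like `zfs list -t snapshot -o name,creation` into (name, creation time) pairs; the function name and the default thresholds are invented for illustration:

```python
from datetime import datetime, timedelta


def snapshots_to_prune(snapshots, keep_newest=5, max_age=timedelta(days=30), now=None):
    """Return the snapshot names a simple retention policy would destroy.

    `snapshots` is an iterable of (name, created) pairs. The newest
    `keep_newest` snapshots are always kept; the rest are pruned only once
    they are older than `max_age`.
    """
    now = now or datetime.now()
    # Newest first, so the slice below skips the snapshots we always keep.
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)
    candidates = ordered[keep_newest:]
    return [name for name, created in candidates if now - created > max_age]
```

    Each returned name could then be handed to `zfs destroy`, ideally after a human has looked at the list.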

  • > Notably missing is support for merging, which ZFS does not have direct support for as far as I'm aware.

    So, no replacement. Merge and Branch are the major features of git. The bigger the project the bigger the need for these. I know some projects where people work full time as merge conflict resolvers. Without git that would require a whole team instead of a person.

  • > Notably missing is support for merging, which ZFS does not have direct support for as far as I'm aware.

    That insight is critical, especially now that we are heading towards Dropbox Infinite, IPFS, Ceph and so many other non-centralized file systems.

    Syncing is not a solved problem at all yet. Advances can bring the same revolution as 3-way merging did to SCMs, previously dominated by file-locking mechanisms.
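
    For readers who haven't looked at what 3-way merging actually does, here is a minimal Python sketch at whole-file granularity (no line-level diffing). The `{path: content}` tree representation and the function name are assumptions made for the example, not how any particular SCM or filesystem stores things:

```python
def three_way_merge(base, ours, theirs):
    """Merge two divergent {path: content} trees against their common ancestor.

    Returns (merged, conflicts). A path missing from a tree means "deleted".
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree (including both deleting)
        elif o == b:
            result = t          # only their side changed this path
        elif t == b:
            result = o          # only our side changed this path
        else:
            conflicts.append(path)  # both changed it, in different ways
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts
```

    The common ancestor is what makes this work: with only the two current states, you cannot tell "they deleted it" from "we added it", which is roughly where plain snapshot-based syncing gets stuck.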

  • ZFS doesn't have history in that way; snapshots are far too coarse. HAMMER, for example, is closer to what you want if you're going to replace git with a FS.

    https://www.dragonflybsd.org/hammer/

  • If you look past the attention-grabbing headline, this shows some pretty cool stuff with ZFS, using source versioning as an analogy to explain what would otherwise be quite abstract when it comes to filesystems. The only thing I knew about ZFS was the name; this helped me see some of its impressive features.

    Of course, I'll take reliability & stability over fancy features any day, but I'm glad there's active development in this area, and look forward to seeing these features mature. They could contribute to some interesting simplifications.

  • "Oracle's (previously Sun's) next-generation file system"

    That's why you need to stick with Git.

  • "Using ZFS as a replacement of Git for is probably not a good idea..."

    Of course not.

  • To play devil's advocate, sometimes Git is abused to store things that aren't code, and ZFS (or similar) might actually be a better fit in those situations.

    Somebody further down mentions art assets. Also potentially relevant are document repositories.

    ZFS isn't very easy to use for a lay person, but that's really only a "Time Machine"-esque front-end away.

  • Ubuntu 16.04 comes with ZFS preinstalled and ready to use.

    Well, you need to install the user package 'zfsutils-linux' and you are ready to go.

    The kernel modules for ZFS are already in the default kernel and are loaded automatically when you create a volume.

  • I have a FreeBSD server in my home office that has a big ZFS raidz volume that I use for shared file storage, and it's really, really awesome. The snapshots are especially great because they greatly reduce the fear of screwing something up. I once wanted to run a de-dupe script on several hundred gigabytes of photos, but I was afraid it would go awry, so I snapshotted first, knowing I could roll back in a few keystrokes. (Granted, this wouldn't help much if the script had caused some problem that wasn't immediately apparent. And yes, I have backups of all of that stuff elsewhere, but doing a restore from backup is a lot more work than rolling back a snapshot.)

  • Strange there's no mention of FreeBSD's excellent ZFS implementation

  • How production-ready is ZFS on Ubuntu (Debian?)? Is there any risk of Oracle going after open source ZFS, as it did over Java with Google and Android?

  • Three-year-old article teaches slightly outdated techniques and basic ZFS usage on Linux by way of specious comparisons to Git.

    I guess it wasn’t a half bad resource when it came out, but these days there’s got to be better blog posts about this, right?

    EDIT: Not that I’m bitter; I love zfs. I mainly wonder how something so old ended up here.

  • I'm tempted to write an article: Who Needs ZFS When You Got Git, but the title IS the article.

    Don't get me wrong, I looove ZFS, and from a quick scan this article looks like a good intro to ZFS, particularly for someone familiar with git.

  • This blog post was one of the most inspirational pieces that led us to founding Pachyderm (pachyderm.io). We offer git-like semantics for data, including branches, and in a distributed system so it scales to petabytes.

  • ZFS and other advanced versioned/copy-on-write filesystems are really cool and useful but it's not a source code management tool.

    E.g. the examples show branching but not merging.

    On the other hand, this might be useful for storing large binary files that are not really diffable or mergeable, and thus a bad fit for Git. However, the typical use case for this is art assets in games, which means that it's the artists and designers who are the target audience. Git is often said to be too difficult for non-programmers, and ZFS or BTRFS is definitely not easier.

  • Weird title as Git and ZFS are used for completely different reasons but I love ZFS as well and the article highlighted some of the finer details which was great.

  • An important difference is that while both rely on Merkle DAGs, Git uses content addressing while ZFS doesn't (physical addresses impact the hash too).

    Relatedly, ZFS's memory management relies on the assumption that a forked filesystem and its sibling will monotonically alias less data as either is modified. This allows it to avoid tracing and ref-counting alike, but makes implementing merging or `cp --reflink` difficult.
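
    To make the content-addressing point concrete, a small Python sketch. `git_blob_id` follows Git's classic SHA-1 blob format (a `blob <size>\0` header followed by the content). `toy_pointer_block_checksum` is only a toy model I made up to show the effect of hashing physical addresses along with child checksums; it is not the actual ZFS on-disk format:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git gives a blob: SHA-1 over a small header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def toy_pointer_block_checksum(children) -> str:
    """Toy model only: a parent block stores, per child, a physical address
    and the child's checksum, and is then checksummed itself.

    `children` is a list of (address, data) pairs with integer addresses.
    """
    h = hashlib.sha256()
    for address, data in children:
        h.update(address.to_bytes(8, "big"))      # where the child lives
        h.update(hashlib.sha256(data).digest())   # what the child contains
    return h.hexdigest()
```

    In the first function, identical bytes always get the same ID no matter where they are stored. In the toy model, moving a child to a different address changes the parent's checksum even though the data is unchanged, so equal hashes can't be used to find equal subtrees the way Git does.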

  • Are there any beginner resources for learning about file systems? Would it be smart / harmful to reformat my whole MacBook SSD to ZFS?

  • This article gave me some confidence in finally trying to boot Arch Linux on ZFS. This will either make my life very easy or very hard.

  • Well... I get the funny side of the comparison, but they're two different things.

  • I've been reading that ZFS was "almost ready for use" in Linux for years, how stable is it for production now, anybody here using it, any good/bad comments or tips? (using CentOS 7 at the moment)

  • Anyone here think that a "ZFS in the cloud" that anyone could inexpensively export a ZFS snapshot (or volume) to would potentially be a business model (offsite backups, etc.)?

  • More like git-annex and git LFS.

    ZFS can be a good CMS choice for large binary files.

  • ZFS is very powerful indeed; I've been using ZFS to share a data partition between OS X and Ubuntu. The versioning capabilities the OP mentions have a lot of potential.

  • Branches and tags are just snapshots and clones? This actually looks more like Subversion than git, together with the challenges with managing merges that this model brings.

  • I like ZFS a lot, but it's no replacement for Git. Snapshots are really handy for a lot of things, but not for a group of developers collaborating over some source code.

  • Almost all of the benefits they're describing here aren't ZFS-specific and could be had (for example) with Volume Shadow Copies on Windows servers as well.

  • Strange he doesn't mention FreeBSD's ZFS implementation.

  • It looks like ZFS, which came from BSD systems, will be/is the next big thing in Linux, not Ubuntu on Windows, which gets more hype...

  • I personally feel that we should all hop off the ZFS (on Linux) hype train and consider a few points. Currently ZoL is being developed by two people, and it works by creating a translation layer into the Linux kernel API. The bug list is enormous (much bigger than btrfs), and there isn't enough experience with the codebase by Linux kernel developers. That's ignoring the potential legal issues. No other ZFS port has these problems.

    On the other hand, if you want a supported filesystem with many of the same features as ZFS, there's btrfs. It alleviates all of the problems with the ZoL port. And there's no fear of Oracle lawsuits.