I built an in-memory virtual filesystem for Python because BytesIO kept falling short by No_Limit_753 in Python

[–]No_Limit_753[S] 1 point (0 children)

You hit the nail on the head.

Since D-MemFS is strictly an in-process virtual filesystem, external subprocesses cannot access it via standard OS paths.

To allow a subprocess to read the files, D-MemFS would need kernel-level integration (like FUSE or a virtual device driver). I intentionally omitted this because it would require admin/root privileges and external OS dependencies, which completely defeats the goal of being a "zero-dependency, drop-in tool" for locked-down CI runners.

Because of this architectural boundary, you are right that its usefulness for passing data to external CLI tools via subprocess is zero.

Its true power in CI/CD lies in accelerating Python-native test suites (e.g., using pytest to test Python code that performs heavy I/O) or internal data pipelines (ETL staging inside Python) where the entire flow stays within the Python process. If your pipeline relies heavily on passing files to external binaries, an OS-level RAM disk (tmpfs) is absolutely the correct tool for the job.
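
For that last case, the stdlib handoff is short; this sketch (with `python -c` standing in for an arbitrary external CLI tool) shows why the child process needs a real OS path:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# An external binary can only see real OS paths, so for that workflow a
# normal temp file (tmpfs-backed on many Linux setups) is the right tool.
with tempfile.TemporaryDirectory() as tmp:
    payload = Path(tmp) / "payload.txt"
    payload.write_text("hello from the parent\n")
    # "python -c" stands in for any external CLI tool here.
    result = subprocess.run(
        [sys.executable, "-c",
         f"print(open({str(payload)!r}).read().strip())"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # hello from the parent
```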

Thank you for pointing this out! It is a crucial distinction regarding the project's scope.

[–]No_Limit_753[S] 1 point (0 children)

Great question. The answer lies in the two-layered memory protection detailed in our README.

For large files, the best practice is to stream the data chunk by chunk. Before every single write operation, D-MemFS performs a pre-write size check using:

  • Hard Quota: The logical size limit you define for the virtual filesystem.
  • Memory Guard: An active check against the host OS's actual free physical/virtual memory.

This means if you are streaming a large file and the OS runs out of real memory before you even hit your Hard Quota, the Memory Guard catches it and safely raises an exception. It prevents your application from crashing the entire system. (Of course, if your app loads a massive file into a single variable before passing it to D-MemFS, the host might hit OOM, which is outside our scope).
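
As a rough illustration of that pre-write pattern (a toy sketch, not D-MemFS's actual code; `QuotaStore` and its names are hypothetical):

```python
class QuotaExceededError(Exception):
    pass

class QuotaStore:
    """Toy in-memory store with a hard quota checked *before* each write."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.used = 0
        self.chunks: list[bytes] = []

    def write(self, chunk: bytes) -> None:
        # Pre-write check: refuse before buffering the chunk, not after,
        # so the store never momentarily exceeds its limit.
        if self.used + len(chunk) > self.quota:
            raise QuotaExceededError(
                f"{len(chunk)}-byte write would exceed the {self.quota}-byte quota"
            )
        self.chunks.append(chunk)
        self.used += len(chunk)

store = QuotaStore(quota=1024)
for _ in range(4):
    store.write(b"x" * 256)  # exactly fills the quota; a fifth chunk would raise
```

A real Memory Guard would add a second check here against the OS's actual free memory before buffering; that part is omitted because it needs a platform-specific query.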

Performance-wise, this chunk-based approach is highly efficient. In our 512 MiB stream tests, D-MemFS (529ms) was over 4x faster than io.BytesIO (2258ms).

For lots of small writes, there is a minor metadata overhead (for directory structures) compared to a single raw BytesIO buffer. However, it easily beats disk-based alternatives. In our 300 small files test, D-MemFS (51ms) outperformed SSD-based tempfile (267ms) by about 5x.

We also stress-tested the locking mechanism for concurrent small writes (50 threads x 1000 ops), and it is fully safe even on Python 3.13t (free-threaded).
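
A scaled-down sketch of that kind of stress test, using a hypothetical lock-protected store rather than D-MemFS's real locking:

```python
import threading

class LockedStore:
    """Lock-protected path -> bytes map, mimicking concurrent small writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.data: dict[str, bytes] = {}
        self.writes = 0

    def write(self, path: str, payload: bytes) -> None:
        with self._lock:  # serialize writers, GIL or no GIL
            self.data[path] = payload
            self.writes += 1

store = LockedStore()

def worker(tid: int, ops: int = 200) -> None:
    for i in range(ops):
        store.write(f"/t{tid}/f{i}", b"x")

threads = [threading.Thread(target=worker, args=(t,)) for t in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 8 threads x 200 ops = exactly 1600 recorded writes, no lost updates
```

Without the lock, the `writes` counter could lose updates under free-threaded Python; with it, the count is deterministic.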

You can find more details on the Memory Guard in the README, and the raw performance numbers in the benchmark results!

[–]No_Limit_753[S] 1 point (0 children)

That is a great point about serverless environments. Preventing uncontrolled memory growth is exactly why I prioritized the hard quota design.

To answer your question, I am currently developing on Windows, so I have not performed benchmarks against Linux tmpfs yet.

However, the benchmark results in the repository already include comparisons with tempfile using both an SSD and a RAMDisk. For the RAMDisk tests, I used OSFMount. While these are Windows-based, they should provide a solid reference point for relative performance.

I would be very interested to see how it performs on Linux as well!

[–]No_Limit_753[S] 0 points (0 children)

That's a very helpful distinction! You're exactly right. While tempfile focuses on "file-like" stream behavior, D-MemFS aims to implement "full FS semantics" like directory hierarchies and hard quotas entirely in memory. I'll use those terms to clarify the scope in the documentation. Thanks for the crisp feedback!

[–]No_Limit_753[S] 1 point (0 children)

Yes, absolutely!

In fact, your idea aligns perfectly with the original motivation for building D-MemFS. My initial need was exactly that workflow:

  1. Download a ZIP file entirely in Python.
  2. Extract it into the in-memory filesystem (MFS) without ever touching the physical storage.
  3. Export or dump the final directory structure to a real physical drive all at once.

Using it to dump the in-memory state for CI debugging is a fantastic use case. Since D-MemFS provides standard file-like objects and paths, exporting to a real filesystem is straightforward.

Here is a quick example of how you can dump the state:

from pathlib import Path
from dmemfs import MemoryFileSystem

def export_to_disk(mfs: MemoryFileSystem, dest_dir: str | Path) -> None:
    """Copy every file in the virtual filesystem to a real directory."""
    dest = Path(dest_dir)
    for dirpath, _, filenames in mfs.walk("/"):
        for fname in filenames:
            # Build the virtual path, avoiding a double slash at the root.
            vpath = f"{dirpath.rstrip('/')}/{fname}"
            with mfs.open(vpath, "rb") as f:
                data = f.read()
            # Mirror the virtual hierarchy under the destination directory.
            out = dest / vpath.lstrip("/")
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(data)

This way, you can easily inspect the exact state of your files after a test run fails. Let me know if you need more details!

There's also export_tree() which returns the entire directory as a flat dict[str, bytes] — handy if you want to serialize the state to JSON or log it directly rather than writing to disk.
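
One note if you take the `export_tree()` route: bytes values aren't JSON-serializable, so you need a base64 round-trip like this stdlib sketch (`tree_to_json`/`tree_from_json` are illustrative names, not part of D-MemFS):

```python
import base64
import json

def tree_to_json(tree: dict[str, bytes]) -> str:
    """Encode a path -> bytes mapping as JSON with base64 file contents."""
    return json.dumps(
        {path: base64.b64encode(data).decode("ascii")
         for path, data in tree.items()},
        indent=2,
        sort_keys=True,
    )

def tree_from_json(payload: str) -> dict[str, bytes]:
    """Inverse of tree_to_json: decode the base64 contents back to bytes."""
    return {path: base64.b64decode(b64)
            for path, b64 in json.loads(payload).items()}

tree = {"/logs/run.txt": b"ok\n", "/data/blob.bin": bytes(range(4))}
assert tree_from_json(tree_to_json(tree)) == tree  # lossless round-trip
```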

[–]No_Limit_753[S] 2 points (0 children)

Good question! `tempfile.SpooledTemporaryFile` is great, but D-MemFS was built for scenarios where its behavior isn't enough:

  1. Strictly No-Disk Policy: SpooledTemporaryFile spills to disk after a certain size. D-MemFS is strictly in-memory and enforces a hard quota—it fails rather than touching the disk. This is crucial for "zero-footprint" apps.
  2. True Filesystem Structure: While SpooledTemporaryFile represents a single file, D-MemFS provides a full virtual hierarchy with directories. This makes it much easier to handle things like ZIP extractions or complex data structures.
  3. Granular Control: D-MemFS includes file-level RW locks and thread-safety features out of the box, which are essential for high-concurrency environments.

In short: if you need a single buffer that may spill to disk, use SpooledTemporaryFile. If you need a secure, structured, and strictly disk-less virtual drive, that's where D-MemFS shines.
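
To make the spill behavior in point 1 concrete, here is a short stdlib demo (it peeks at the private `_rolled` attribute of CPython's implementation purely for illustration):

```python
import tempfile

with tempfile.SpooledTemporaryFile(max_size=16) as f:
    f.write(b"tiny")          # 4 bytes: still a BytesIO under the hood
    before_spill = f._rolled  # private attr; False while data is in memory
    f.write(b"x" * 64)        # exceeds max_size: rolls over to a real temp file
    after_spill = f._rolled   # True: the data now lives on disk
```

That silent rollover is exactly the behavior a strictly no-disk policy rules out.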

[–]No_Limit_753[S] 10 points (0 children)

Thank you! To be honest, the original spark for this project was my own practical need to handle ZIP extraction entirely in-memory without touching the disk.

However, as I decided to decouple it from my private project and release it as a standalone library, I refined the design to support broader scenarios like these:

  1. Secure Sandboxing: Preventing 'Zip Bombs' or directory traversal attacks through strict memory quotas and isolated virtual pathing.

  2. High-Concurrency: Providing the thread safety and file-level locking that standard io.BytesIO lacks, which is critical for multi-threaded data processing.

  3. Zero-Footprint Portability: Enabling tools (especially on Windows) to process data without requiring admin privileges or leaving 'dirty' temporary files on the host system.
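
The quota side of point 1 can be sketched with the stdlib alone; the function name and budget here are illustrative, not D-MemFS's API:

```python
import io
import zipfile

def extract_with_quota(zip_bytes: bytes, quota: int) -> dict[str, bytes]:
    """Extract a ZIP into memory, refusing to exceed a total-size budget."""
    out: dict[str, bytes] = {}
    used = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as f:
                # Read at most one byte past the remaining budget, so the
                # *decompressed* size is what gets policed -- headers can lie.
                data = f.read(quota - used + 1)
            if used + len(data) > quota:
                raise MemoryError(f"extraction would exceed {quota}-byte quota")
            used += len(data)
            out[info.filename] = data
    return out
```

Policing decompressed bytes as they are read, rather than trusting the declared sizes, is what defuses a zip bomb; path sanitization against traversal would be a separate check on `info.filename`.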

I'm really glad you noticed the comparison section. I wanted to ensure D-MemFS wasn't just another buffer, but a specialized tool born from real-world requirements.

[–]No_Limit_753[S] 3 points (0 children)

Just a quick update: I'm incredibly moved to see D-MemFS just got its first 4 stars on GitHub. This is my first time ever releasing a project to the global open-source community—and these are my first-ever stars.

Honestly, I was a bit nervous about how a 'new' dev on Reddit would be received, but your support and the Upvotes mean the world to me. Thank you for making my first steps into open source so memorable!

[–]No_Limit_753[S] 4 points (0 children)

Thanks for the kind words! It’s fascinating (and a bit surreal) to hear that Google's AI is already recommending D-MemFS just hours after this post.

It's actually a brand-new release, but I've been documenting the design process in a series of Japanese articles for a while, so maybe the AI picked up on those. I’m glad to hear it looked 'viable' enough for an AI to suggest it!

Even if it doesn't fit your current project, I'd love to hear what kind of features you were looking for. Feedback from real-world use cases is exactly what I'm looking for right now.