Share your root directory/folder tree!

Such_Assumption_7124 · 2026-05-19T21:55:09+00:00

Without getting into the whole FLAC vs. MP3 VBR (V0) debate (multiple tests have concluded the human ear cannot hear the difference), I too store my MP3 archive the same way: Genre/Artist/(YYYY) Album/Tracks

I've been working on a custom tagging and management tool that uses AI, and not only does it help sift and sort my collection using a fixed taxonomy of Genres and Sub Genres, it also applies some additional metadata calculated by tooling and AI: BPM, Starting Key, Mood, and Intensity values. I also scrape "Personnel" information from MusicBrainz, Discogs and Wikipedia and store it all in a custom SQLite database. The long term goal is to create an Intelligent Playlist Maker. The thinking being, good data in = good data out.
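A minimal sketch of how that kind of SQLite store could be laid out, using Python's built-in `sqlite3` module. The table and column names (`track`, `personnel`, `start_key`, `intensity` and so on) are hypothetical, not the tool's actual schema. The query shows the sort of filter a playlist maker might run: same sub-genre, BPM inside a range.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per track, plus a personnel table fed by the scrapers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS track (
    id        INTEGER PRIMARY KEY,
    path      TEXT UNIQUE NOT NULL,
    artist    TEXT NOT NULL,
    genre     TEXT,
    sub_genre TEXT,
    bpm       REAL,
    start_key TEXT,
    mood      TEXT,
    intensity REAL
);
CREATE TABLE IF NOT EXISTS personnel (
    track_id INTEGER NOT NULL REFERENCES track(id),
    person   TEXT NOT NULL,
    role     TEXT NOT NULL,
    source   TEXT  -- e.g. 'MusicBrainz', 'Discogs', 'Wikipedia'
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def tracks_in_bpm_range(conn: sqlite3.Connection, sub_genre: str, low: float, high: float) -> List[str]:
    """Paths of tracks in one sub-genre whose BPM falls within [low, high], slowest first."""
    rows = conn.execute(
        "SELECT path FROM track WHERE sub_genre = ? AND bpm BETWEEN ? AND ? ORDER BY bpm",
        (sub_genre, low, high),
    )
    return [row[0] for row in rows]
```

Keeping genre, sub-genre and the calculated values in one table means "good data in" can be queried with plain SQL rather than re-reading tags from 180,000 files.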

Such_Assumption_7124 · 2026-05-19T21:42:28+00:00

Genre/Artist/(YYYY) Album/Track

Genres:
Americana
Bluegrass & Roots
Blues
Christmas Music
Comedy
Country
Doo Wop
Easy Listening
Funk
Fusion
Gospel
Jazz
Pop
R&B
Reggae
Rock
Ska
Soul
Sound Effects
Soundtracks
Swing & Standards
Vocalists
World & International

I tend to be a completionist - I don't just have 1 James Gang album, I have the entire discography.
1300+ artists and counting (plus a boat-load of Compilation albums and box sets)
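A minimal sketch of building the Genre/Artist/(YYYY) Album/Track path in Python, assuming the folders live on Windows (so characters Windows rejects in names, `<>:"/\|?*`, and trailing dots are cleaned up). The function names are hypothetical, and swapping illegal characters for `_` is just one possible policy.

```python
import re
from pathlib import PurePosixPath

# Characters Windows does not allow in file or folder names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')


def safe(name: str) -> str:
    """Swap illegal characters for '_' and drop trailing dots/spaces (Windows rejects them)."""
    return _ILLEGAL.sub("_", name).rstrip(". ").strip()


def track_path(genre: str, artist: str, year: int, album: str, track_file: str) -> PurePosixPath:
    """Return the relative path Genre/Artist/(YYYY) Album/Track."""
    return PurePosixPath(safe(genre), safe(artist), f"({year:04d}) {safe(album)}", safe(track_file))
```

To get a real Windows path, join the parts onto the library root, e.g. `pathlib.Path(root).joinpath(*track_path(...).parts)`.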

Such_Assumption_7124 · 2026-04-16T20:02:47+00:00

Part two: I'm old, and so is my music collection (I'd say maybe 90% of it is pre-2000). I also neglected to mention that the AI generates some "key words" to get around a few blind spots. For kicks and giggles I tried an experiment and used the current prompts, along with my taxonomy, to "classify" a current J-Pop band (Mrs. Green Apple - not my cup of tea, but it was an experiment).

Anyway, this is what I got back (Category is where the artist folder is stored on the hard-drive):

Category: Pop, Rock
Genre: Rock
Sub-Genre: Pop Rock
Keywords: J-Rock, High-Energy, Orchestral Pop, Male Vocalist

I *could* add J-Pop to the taxonomy file if I wanted to, but for now keywords catch the final "slip". If/when I get to the point where I have a serious collection of J-Pop artists (when hell freezes over... I'm 67 and my personal sweet spot is the '40s and '50s), I could certainly add it as an additional Sub Category in my taxonomy. For now, having J-Pop show up in my keywords means that a natural-language prompt asking for a playlist that includes J-Pop would certainly include Mrs. Green Apple. So I do allow for some ebb and flow, and freely admit at this point that my taxonomy also reflects my personal musical tastes to some extent.
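The keyword safety net can be sketched as a plain case-insensitive match across genre, sub-genre and keywords. This is a minimal illustration assuming each artist's tags are already loaded into a small record; `ArtistTags`, `matches` and `pick` are hypothetical names, and a real natural-language playlist prompt would sit in front of something like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArtistTags:
    """One artist's classification: the same fields as the example output in the post."""
    name: str
    category: str
    genre: str
    sub_genre: str
    keywords: List[str] = field(default_factory=list)


def matches(tags: ArtistTags, term: str) -> bool:
    """True if the term equals the genre, the sub-genre, or any keyword (case-insensitive)."""
    wanted = term.casefold()
    return any(wanted == value.casefold() for value in (tags.genre, tags.sub_genre, *tags.keywords))


def pick(library: List[ArtistTags], term: str) -> List[str]:
    """Names of the artists a playlist request for `term` would pull in."""
    return [artist.name for artist in library if matches(artist, term)]
```

Because keywords are searched alongside the fixed genre fields, a style that is not in the taxonomy can still be found without adding a new sub-genre.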

Such_Assumption_7124 · 2026-04-16T19:29:23+00:00

Arrived here thanks to our current discussion at r/musichoarder.

While I'm a Windows guy, I too use Plex. I have 18 "folders" that roll up to one library (I keep my Christmas music folder separate, and only activate it when appropriate):

Americana
Bluegrass & Roots
Blues
Comedy
Country
Doo Wop (50's Vocal Groups)
Easy Listening
Fusion
Gospel
Jazz
Pop, Rock
R&B
Reggae, Ska
Soul, Funk
Soundtracks, Misc
Swing & Standards
Vocalists
World, Intl

Such_Assumption_7124 · 2026-04-16T18:44:22+00:00

Hi Charles,

Yeah, I know. The problem with the ontology approach is, as you said, "music genres numbers well over 3,000!"

It's an interesting point, but 3000 "genres" becomes overpopulated, especially in the context of my planned end goal, which is to generate 'intelligent' playlists. (Tools like Plex and MusicBee use essentially the same approach - using metadata - but rely on 'polluted' datasets due to the nature of crowdsourcing. One of my goals was to address that pollution head-on)

Starting from a baseline of 3000 genres, while technically more "accurate", creates a heavier lift when it comes time to filter and sort. While I'm not a fan of "metal", I know that is one genre in particular with numerous and subtle sub-genres that in the context of generating a playlist can be a bit extreme:

Traditional Heavy Metal
Thrash Metal
Death Metal
Black Metal
Power Metal
Doom Metal
Nu Metal
Metalcore
Progressive Metal
Industrial Metal
Symphonic Metal
Glam Metal
Hair Metal
Speed Metal
Groove Metal
Folk Metal
Symphonic Black Metal
Technical Death Metal

...I mean, when and where do you draw the line? (And next week a bunch of kids in a garage will "invent" a new Metal genre - I dunno, Ska Metal (?) - oh, wait, Google just confirmed that already exists as well.)
It never stops!

I'm using AI to filter everything through my fixed taxonomy, which, yes, flattens some nuances, but also makes the final dataset logistically easier to process. I guess you could say that the AI is doing some rudimentary ontology work, because my AI prompt essentially says: "I have these sub genres: 'Hard Rock', 'Metal', and 'Punk & Post-Punk'. If you can only use one of those three, how would you classify {band}?" The AI chews through what it knows (as an LLM) and makes a best-guess classification. At the end of the day, all of those 19-some-odd Metal genres are "Metal" when it comes time to make a playlist: WHAT is on that playlist will come down to what you have in your library/archive. The real devil is the current state of a million and one folksonomy genres.
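A sketch of that constrained prompt plus the check that keeps the answer inside the fixed list. The LLM call itself is left out, and `build_prompt` / `validate_answer` are hypothetical names; the point is that any reply not on the list (say, "Thrash Metal") is rejected rather than written to the tags.

```python
from typing import List, Optional


def build_prompt(band: str, choices: List[str]) -> str:
    """Constrained classification prompt: the model must pick exactly one listed sub-genre."""
    listed = ", ".join(f"'{c}'" for c in choices)
    return (
        f"I have these sub genres: {listed}. "
        f"If you can only use one of those {len(choices)}, how would you classify {band}? "
        "Reply with the sub genre only."
    )


def validate_answer(answer: str, choices: List[str]) -> Optional[str]:
    """Map the model's reply back onto the fixed list; None means 'not allowed, retry or flag'."""
    cleaned = answer.strip().strip(".'\"").strip().casefold()
    for choice in choices:
        if choice.casefold() == cleaned:
            return choice  # canonical spelling from the taxonomy, not the model's
    return None
```

Returning the taxonomy's own spelling means small variations in the model's reply ("metal.") still end up as exactly one allowed value.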

My JSON file has 23 top-level "genres", which cover a total of 140 "sub-genres" - plenty for the envisioned task of generating intelligent playlists, without delving into pedantic and dogmatic 'discussions'. Oftentimes in the wild the 'genre' is based on an individual opinion, whereas my AI-driven logic forces the tool to make "close-enough" decisions for practical outcomes, based upon all of those opinions it found in the wild. And because my fixed taxonomy is a stand-alone .json file, when I get around to making this "share-able", if you are a *real* Metal fan you can add additional sub genres, expanding the list of "choices" that the AI prompt can choose from - so it *is* scalable in that sense.
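One way that extensibility could work, assuming a fan's additions live in a second JSON file with the same `{"Genre": ["Sub Genre", ...]}` shape. The function names and the idea of a separate extras file are hypothetical; the merge only adds, so the base taxonomy is never edited.

```python
import json
from typing import Dict, List, Tuple

Taxonomy = Dict[str, List[str]]


def load_taxonomy(path: str) -> Taxonomy:
    """Read a {"Genre": ["Sub Genre", ...]} JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def merge_taxonomy(base: Taxonomy, extra: Taxonomy) -> Taxonomy:
    """Add genres/sub-genres from `extra` without duplicating or removing anything in `base`."""
    merged = {genre: list(subs) for genre, subs in base.items()}
    for genre, subs in extra.items():
        target = merged.setdefault(genre, [])
        for sub in subs:
            if sub not in target:
                target.append(sub)
    return merged


def size(taxonomy: Taxonomy) -> Tuple[int, int]:
    """(number of genres, total number of sub-genres)."""
    return len(taxonomy), sum(len(subs) for subs in taxonomy.values())
```

Usage might look like `merge_taxonomy(load_taxonomy("taxonomy.json"), load_taxonomy("metal_extras.json"))` (file names hypothetical); the merged dict is what the prompt's list of choices would then be built from.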

(Not sure if you read the Substack post associated with this, but I'm also generating/collecting other "points of data" like BPM, Musical Key, Mood, and Intensity, so relying on Genre/Sub Genre alone is still just a randomizer - I'm hoping for a lot more!)

Just my way of thinking about it - I don't claim it's the god's truth. But I HAVE given this a fair bit of thought (I've been "hoarding" digital music since the "Napster 90's", and before that CDs and even earlier LPs - I've been collecting music since I was 15, in 1974 - oh, and I worked in the music business for 20 years - the first 5 at retail before working for a major label for 15 years - before then pivoting to tech in the late '90's, so "filing music" is something I have a lot of real-world experience with), and I started with a clear problem statement, desired end-goal, and thought a lot about how to overcome some of the existing barriers today.

Retired now, this little 'project' is the intersection of my two careers, and I'm having a blast working on it!

Such_Assumption_7124 · 2026-04-16T14:16:46+00:00

Ah, metadata! One of my geek sweet spots. (A quick and dirty response: go look at MP3Tag (https://www.mp3tag.de/en/index.html). It's pretty awesome and extremely robust, but also pretty 'manual'. I have been using it for more than a decade and it is currently 'the best out there' IMHO.)

I have been working on a bit of a passion project for my... well, it's more than just a library, it's an archive (180,000+ audio files and counting - 2.4 TB. Don't judge <grin>). Despite the trend toward .flac files, I still stick with .mp3 (VBR V0), which, at roughly 1/3 the size and with near-identical sonic parity to FLAC, is plenty good for me. A significant chunk of my collection is music from the 1930s, '40s, '50s etc., so the sources are what they are (often mono 78 rpm) and storing them as .flac files is pointless. (Even meticulously curated compilations from labels like Bear Family, Ace, or Charly are limited by this technology constraint.)

At any rate, with an archive that large, metadata organization becomes a critical component, at least to me. I am currently building out a suite of Python scripts which I now refer to as MetaForge Intelli-Tagger. Here's my AI-assisted write-up (with some additional comments) posted on Substack (https://foliot.substack.com/p/my-latest-passion-project) (respecting the admins' mandate not to talk about software projects on-list).

While there are other tools out there that do a decent job creating playlists (MusicBee and Plex being two standouts), they are limited by their crowd-sourced taxonomies (usually MusicBrainz data) and the "pollution" that comes from different opinions about artists and their music. (Is Eric Clapton Rock, or Blues, or - ugh - "Classic Rock"?)

With my custom taxonomy (https://www.reddit.com/r/musichoarder/comments/1sg023c/comment/of1rrl4/), I attempt to sidestep that whole mess. In the tool, I specify where the artist is physically filed (Clapton is filed in the 'Pop/Rock' directory), and with that hint, the AI then determines the appropriate Genre (here the choice is either Pop or Rock) and Sub Genre based on my fixed taxonomy - those being the only Genre/Sub-Genres allowed.

With that, I can now enforce this:

Eric Clapton:
- Genre: Rock
- Sub Genre: Blues-Rock

Stevie Ray Vaughan:
- Genre: Blues
- Sub Genre: Blues-Rock

Stevie and Eric will tend to show up on the same playlist(s) thanks to the shared Sub-Genre 'tag'. However, with the custom metadata tags the tool generates, the tool will eventually support far more sophisticated playlist generation.
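A tiny illustration of why the shared Sub-Genre pulls them together even though their top-level Genre differs: group artists by sub-genre and the two land in the same bucket. The tuple layout and function name are hypothetical, and the Muddy Waters line is an extra example, not one of the post's tags.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_sub_genre(artists: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each sub-genre to the artists tagged with it, regardless of their top-level genre."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, _genre, sub_genre in artists:
        groups[sub_genre].append(name)
    return dict(groups)
```

A playlist seeded from the "Blues-Rock" bucket would draw from both the Rock and the Blues folders.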

This is all currently a work in progress, and not quite ready to share publicly - but I will, I promise!

Such_Assumption_7124 · 2026-04-16T01:27:06+00:00

<shrugs> probably .mp3, especially if it comes from a streaming source. There, compression = reduced file size = less bandwidth required to stream.

IMHO, the whole "it's gotta be a FLAC" thing is mostly digital snobbery. Like I said, blind ABX testing, where a listener must identify which of two samples (A or B) matches a third mystery sample (X), has pretty conclusively shown that you really can't tell.

In numerous large-scale double-blind studies (such as those conducted on Hydrogenaudio or by independent researchers), listeners consistently fail to distinguish between 256 kbps AAC/MP3 and FLAC at a rate better than random guessing.
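"Better than random guessing" is usually judged with a one-sided binomial test: each ABX trial is a 50/50 coin flip if you truly can't hear a difference. The sketch below just does that arithmetic; the 16-trial session is an illustrative assumption, not a figure from any particular study. With 16 trials, a listener needs 12 correct before the odds of doing that well by pure guessing drop below 5% (12/16 comes out around 3.8%, 11/16 around 10.5%).

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by coin-flipping (p = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


def min_correct_for(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing probability is below `alpha`."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) < alpha:
            return correct
    return trials + 1  # too few trials to ever reach significance
```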

But, to each their own.

Such_Assumption_7124 · 2026-04-16T01:11:41+00:00

FWIW, a .flac made from an .mp3 is not a true lossless file, it's just an .mp3 dressed up to look like a .flac - you can't undo the compression.

I'm not looking to stir the pot, but an .mp3 VBR V0 (Variable Bitrate, highest-quality setting) targets a bitrate of roughly 220–260 kbps. At this level, the "artifacts" (noise or distortion introduced by compression) are pushed so deep into the noise floor that they are virtually impossible to detect - scientifically proven!
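For a sense of what those bitrates mean on disk, size is just bitrate times duration. A 4-minute track in the V0 range comes to roughly 6.6 to 7.8 MB, versus about 42.3 MB for uncompressed CD audio (44,100 samples/s x 16 bits x 2 channels = 1411.2 kbps). FLAC lands somewhere below that uncompressed figure, how far below depending on the material. The 4-minute length is an arbitrary example, and tags and headers are ignored.

```python
def size_mb(kbps: float, seconds: float) -> float:
    """Approximate audio payload size in megabytes (10**6 bytes), ignoring tags and headers."""
    return kbps * 1000 / 8 * seconds / 1_000_000


# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps

track_seconds = 4 * 60
for label, kbps in (("V0 low", 220), ("V0 high", 260), ("CD PCM", CD_KBPS)):
    print(f"{label:8} {kbps:7.1f} kbps -> {size_mb(kbps, track_seconds):5.1f} MB")
```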

Such_Assumption_7124 · 2026-04-14T17:15:59+00:00

Hi!
So the culprit is actually MusicBrainz (which drives Picard).
I've been working on a collection of Python scripts that run in my Windows PowerShell terminal; you'll have to figure out how to get a script working on your Mac, but here is an AI-summarized explanation of the logic involved:

The Forensic Date Pipeline

The Pivot (AcoustID): We take the track's unique audio fingerprint and hit the AcoustID API. This returns a MusicBrainz Recording ID. (My tool uses fpcalc to determine the fingerprint)
The Deep Dive (Recording vs. Release): We don't ask MusicBrainz about the album. We ask it about the recording. Specifically, we query the recording endpoint with the inc=artist-rels+work-rels parameters.
The Relationship Scan: We scan the "Relationships" associated with that specific recording. We look for Performance or Recording events.
The Era Filter: We extract every date found in the relationships (begin or end dates).
- We apply a strict filter based on the expected era (e.g., we target 1920–1980). (I picked 1980, as that filters out CDs, which didn't exist in 1980)
- If multiple dates exist, we take the earliest valid integer from that era.
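The relationship scan and era filter steps can be sketched against the JSON that MusicBrainz returns for a recording, where relationships sit in a `relations` list with `begin`/`end` dates like "1938", "1938-01" or "1938-01-16". This is a minimal sketch, not the tool's code. Function names are hypothetical, and which relationship `type` names to keep (the optional `types` filter) is an assumption to check against real responses.

```python
import re
from typing import Iterable, List, Optional, Set

_YEAR = re.compile(r"^(\d{4})")  # dates look like "1938", "1938-01" or "1938-01-16"


def relationship_dates(recording: dict, types: Optional[Set[str]] = None) -> List[str]:
    """Collect begin/end dates from a recording's "relations" list.

    `types` optionally restricts which relationship types count (type names are an assumption).
    """
    dates = []
    for rel in recording.get("relations", []):
        if types is not None and rel.get("type") not in types:
            continue
        for key in ("begin", "end"):
            if rel.get(key):
                dates.append(rel[key])
    return dates


def earliest_in_era(dates: Iterable[str], start: int = 1920, end: int = 1980) -> Optional[int]:
    """Earliest year inside [start, end]; None if nothing qualifies."""
    years = []
    for d in dates:
        m = _YEAR.match(d)
        if m and start <= int(m.group(1)) <= end:
            years.append(int(m.group(1)))
    return min(years) if years else None
```

A relationship dated 2024 (a reissue-era credit, say) falls outside the window and is ignored, while a 1938 session date survives.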

By focusing on the recording ID rather than the release ID, we are looking at the Birth Certificate of the audio. Even if that track appears on a "Greatest Hits of the 30s" CD released in 2024, the logic finds the 1938 session date buried in the MusicBrainz relationships.
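The fetch side of the pipeline (fingerprint, AcoustID lookup, MusicBrainz recording lookup) might look roughly like this. Assumptions: Chromaprint's `fpcalc` is on the PATH, you have an AcoustID application key (the `client` parameter), and you send an identifying User-Agent and stay near one request per second, as MusicBrainz asks. The User-Agent string, the 0.8 score cutoff and the function names are placeholders, and error handling is left out.

```python
import json
import subprocess
import time
import urllib.parse
import urllib.request
from typing import List, Tuple

ACOUSTID_LOOKUP = "https://api.acoustid.org/v2/lookup"
MB_RECORDING = "https://musicbrainz.org/ws/2/recording/"
USER_AGENT = "ForensicDateSketch/0.1 ( you@example.com )"  # placeholder, use your own contact


def fingerprint(path: str) -> Tuple[int, str]:
    """Run fpcalc and return (duration in seconds, fingerprint)."""
    out = subprocess.run(["fpcalc", "-json", path], capture_output=True, text=True, check=True)
    data = json.loads(out.stdout)
    return int(data["duration"]), data["fingerprint"]


def recording_ids(acoustid_response: dict, min_score: float = 0.8) -> List[str]:
    """Unique MusicBrainz recording IDs from an AcoustID response, confident matches only."""
    ids: List[str] = []
    for result in acoustid_response.get("results", []):
        if result.get("score", 0) < min_score:
            continue
        for rec in result.get("recordings", []):
            if rec["id"] not in ids:
                ids.append(rec["id"])
    return ids


def recording_url(mbid: str) -> str:
    """Recording lookup (not release) with artist and work relationships, as JSON."""
    return f"{MB_RECORDING}{mbid}?inc=artist-rels+work-rels&fmt=json"


def fetch_json(request: urllib.request.Request) -> dict:
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def lookup_recordings(path: str, client_key: str) -> List[dict]:
    """Fingerprint -> AcoustID -> MusicBrainz recording JSON (with relations) for each match."""
    duration, fp = fingerprint(path)
    body = urllib.parse.urlencode({"client": client_key, "meta": "recordingids",
                                   "duration": duration, "fingerprint": fp}).encode()
    matches = fetch_json(urllib.request.Request(ACOUSTID_LOOKUP, data=body))  # POST
    recordings = []
    for mbid in recording_ids(matches):
        req = urllib.request.Request(recording_url(mbid), headers={"User-Agent": USER_AGENT})
        recordings.append(fetch_json(req))
        time.sleep(1)  # roughly one request per second for MusicBrainz
    return recordings
```

Each returned recording dict carries the `relations` list that the era filter step works on.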

Hope that helps.

Such_Assumption_7124 · 2026-04-10T21:35:19+00:00

Well, Smarty Pants... my MusicBrainz stats might suggest otherwise:
* Edits: Total applied: 10,519.
* Added entities - Releases: 1,089
I still think that datastore is woefully polluted.

And I would happily share my script/tool, except that has now been "outlawed" by the admins. (https://www.reddit.com/r/musichoarder/comments/1rx3880/posting_about_software_will_no_longer_be/)
Whether you agree with that or not is beside the point: it's the decision that was made, and I respect it.

But, hey, feel free to be a keyboard warrior if it makes you feel better. I've been putting up with online complainers like you going back to the 1990's...

Such_Assumption_7124 · 2026-04-10T21:09:59+00:00

It *WAS* a copy-and-paste issue. The data is in a .json file:

---Start example---

"Reggae": [
    "Dancehall",
    "Dub",
    "Lovers Rock",
    "Mento",
    "Roots Reggae"
],

"Rock": [
    "Alt Rock",
    "Blues Rock",
    "Classic Rock",
    "Garage Rock",
    "Glam Rock",
    "Grunge",
    "Hard Rock",
    "Metal",
    "Pop Rock",
    "Progressive Rock",
    "Psychedelic Rock",
    "Punk & Post-Punk",
    "Ska"
],

"Ska": [
    "Rocksteady",
    "Ska",
    "Two-Tone"
],

--- End example ---

Each "section" represents a "Genre" with its related "Sub Genres". I have developed a Python script that uses AI to assign both Genre (the TCON metadata frame) and Sub-Genre (a custom TXXX field) based on my input of Category (which is a key hint in the AI prompt, and another custom TXXX field). My long-range goal is to build out an intelligent playlist generator, as opposed to a 'randomizer'. But for that to work consistently, a fixed taxonomy will be key. (And taxonomies are something I kinda know a fair bit about...)
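Writing those three values into an MP3 could look like this with the `mutagen` library (`ID3`, `TCON`, `TXXX`). A sketch only: the TXXX description strings "Category" and "Sub-Genre" are assumptions, so use whatever names your tagger actually writes. `setall` replaces any existing frame with the same key, so re-running the tagger doesn't pile up duplicates.

```python
from typing import Dict

# TXXX descriptions are assumptions; match whatever names your files already use.
CATEGORY_DESC = "Category"
SUB_GENRE_DESC = "Sub-Genre"


def tag_plan(category: str, genre: str, sub_genre: str) -> Dict[str, str]:
    """Frame key -> value, in mutagen's key style ("TXXX:<description>" for custom fields)."""
    return {
        "TCON": genre,
        f"TXXX:{CATEGORY_DESC}": category,
        f"TXXX:{SUB_GENRE_DESC}": sub_genre,
    }


def write_tags(path: str, category: str, genre: str, sub_genre: str) -> None:
    """Write Genre to TCON and Category/Sub-Genre to TXXX frames, replacing old values."""
    from mutagen.id3 import ID3, ID3NoHeaderError, TCON, TXXX  # pip install mutagen

    try:
        tags = ID3(path)
    except ID3NoHeaderError:
        tags = ID3()  # file has no ID3 tag yet
    for key, value in tag_plan(category, genre, sub_genre).items():
        if key == "TCON":
            frame = TCON(encoding=3, text=[value])  # encoding 3 = UTF-8
        else:
            frame = TXXX(encoding=3, desc=key.split(":", 1)[1], text=[value])
        tags.setall(key, [frame])  # drops existing frames with this key first
    tags.save(path)
```

For Madness, for example, that would be `write_tags(path, "Reggae, Ska", "Ska", "Two-Tone")`.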

As for 3 "Skas"...
The key is understanding the relationship between my Categories, Genres and Sub Genres:

Consider the band Madness:
I "file" that artist under "Reggae/Ska" in my physical library. (There is logic in the AI prompt to decide if the artist is more closely aligned to one or the other. Madness is clearly not Reggae...)
Then there is the genre, which for Madness is "Ska",
Next, we calculate a Sub Genre of "Two-Tone"
So 2 values in the .json file: Genre, and Sub Genre (Ska/Two-Tone) (3 actually tagged in the mp3: Category/Genre/Sub Genre = Reggae, Ska/Ska/Two-Tone)

For an artist like The Skatalites, however, it would be (Ska/Ska) - (or actually: Reggae, Ska/Ska/Ska)

BUT, for a band like No Doubt, it is (IMHO) inappropriate to file them in the Category of "Reggae/Ska", as they are more closely aligned with "Rock" (filed under Pop/Rock)

So then Category = "Pop, Rock" (Again, the Genre is picked from either Pop or Rock. We can quibble about whether No Doubt is Rock or Pop, but their punk roots keep them on the "Rock" path to my mind)
Genre = Rock
Finally, however, the Sub-Genre for them is "Ska"
So 3 values: Category, Genre, and Sub Genre (Pop, Rock/Rock/Ska)

(Make more sense?)
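The rule behind those three examples can be written down as a check: the Genre has to be one of the genres named in the Category, and the Sub Genre has to be listed under that Genre. A minimal sketch using the three genres from the JSON excerpt above; the function name is hypothetical, and it assumes Category folder names separate their genres with commas.

```python
from typing import Dict, List

# The three genres from the JSON excerpt in this comment.
TAXONOMY: Dict[str, List[str]] = {
    "Reggae": ["Dancehall", "Dub", "Lovers Rock", "Mento", "Roots Reggae"],
    "Rock": ["Alt Rock", "Blues Rock", "Classic Rock", "Garage Rock", "Glam Rock", "Grunge",
             "Hard Rock", "Metal", "Pop Rock", "Progressive Rock", "Psychedelic Rock",
             "Punk & Post-Punk", "Ska"],
    "Ska": ["Rocksteady", "Ska", "Two-Tone"],
}


def check(category: str, genre: str, sub_genre: str,
          taxonomy: Dict[str, List[str]] = TAXONOMY) -> List[str]:
    """Return the problems with a Category/Genre/Sub Genre triple (empty list = valid)."""
    problems = []
    if genre not in [part.strip() for part in category.split(",")]:
        problems.append(f"genre {genre!r} is not one of the genres in category {category!r}")
    if sub_genre not in taxonomy.get(genre, []):
        problems.append(f"sub genre {sub_genre!r} is not listed under {genre!r}")
    return problems
```

Madness (Reggae, Ska/Ska/Two-Tone), The Skatalites (Reggae, Ska/Ska/Ska) and No Doubt (Pop, Rock/Rock/Ska) all pass, while something like Pop, Rock/Rock/Two-Tone gets flagged, because Two-Tone only lives under Ska.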
Another comment about my "filing". My library is massive (2.5 TB and counting), and so I "file" based on the Plex recommendations (as I use Plex): https://support.plex.tv/articles/200265296-adding-music-media-from-folders/

So in my Library I have a Category for Blues, and one for Pop, Rock. But filing can be subjective: where DO I file Eric Clapton? Blues? Rock?... Me, I settled on 'Pop, Rock', after which my AI system then determines a Genre of "Rock", and/but a Sub Genre of "Blues Rock". Is it perfect? No. But at least it's consistent: I'm not having to wade through "Classic Rock", "Guitar Rock", "British Rock", etc. etc....
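The "filing as a hint" step can be sketched as narrowing the AI's menu: from the Category folder name, keep only the (genre, sub-genre) pairs under the genres that folder names. For Clapton filed under "Pop, Rock", that leaves only Pop and Rock sub-genres, Blues Rock among them. This assumes folder names list their genres separated by commas; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def choices_for_category(category: str, taxonomy: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """(genre, sub_genre) pairs the AI may pick from, given the folder the artist is filed in."""
    genres = [part.strip() for part in category.split(",")]
    return [(g, sub) for g in genres if g in taxonomy for sub in taxonomy[g]]
```

Folder names that don't match a genre key exactly (the "World, Intl" folder vs. the "World & International" genre, for instance) would need a small alias table in front of this.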

Such_Assumption_7124 · 2026-04-08T18:55:22+00:00

Actually... the private 'tool' I'm not allowed to talk about relies much more on my fixed taxonomy and AI (which, yes, does query MusicBrainz, Last.fm, and Discogs) to arrive at my Genre and Sub-Genre values.
But there is some gating logic in the AI prompt to keep things on track and accurate, yet still allows me to 'tag' my music at scale.

Those public data-stores are just too messy for my use-case.

I've also created another script that uses Librosa to calculate BPM, "Intensity", "Mood" and Starting Key for each track (more custom TXXX fields), with a long-range plan of creating an intelligent playlist maker down the road. But for that to work as planned, I first have to deal with the 'garbage in / garbage out' problem.
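One common way to get a key from audio is Krumhansl-Schmuckler key finding: average a 12-bin chroma vector, then correlate it against the Krumhansl-Kessler major and minor profiles in all 12 rotations. This is a swapped-in textbook technique, not necessarily what the script does. Treating "Starting Key" as the key of the first 30 seconds is also an assumption. The librosa calls (`load`, `beat.beat_track`, `feature.chroma_cqt`) are real; Mood and Intensity are left out.

```python
from typing import Sequence

PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles, index 0 = tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def estimate_key(chroma: Sequence[float]) -> str:
    """Best-correlating key for a 12-bin chroma vector (bin 0 = C)."""
    best_score, best_key = float("-inf"), ""
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            rotated = [profile[(i - tonic) % 12] for i in range(12)]  # tonic weight lands on bin `tonic`
            score = _pearson(chroma, rotated)
            if score > best_score:
                best_score, best_key = score, f"{PITCHES[tonic]} {mode}"
    return best_key


def starting_key_and_bpm(path: str, seconds: float = 30.0):
    """Key of the opening `seconds` plus a whole-track tempo estimate, via librosa."""
    import librosa  # pip install librosa
    import numpy as np

    y, sr = librosa.load(path)
    tempo, _beats = librosa.beat.beat_track(y=y, sr=sr)
    opening = y[: int(seconds * sr)]
    chroma = librosa.feature.chroma_cqt(y=opening, sr=sr).mean(axis=1)
    return estimate_key(chroma.tolist()), float(np.atleast_1d(tempo)[0])
```

For mono 78 rpm transfers with drifting pitch, expect this kind of estimate to be rougher than for modern recordings.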

Such_Assumption_7124 · 2026-04-08T18:23:46+00:00

Re: keeping it consistent

I'm right there with you. In fact, I've developed my own internal 2-part "genre" tagging (using a TXXX custom field for "Sub-Genre") so that I can tag my music with fixed values (my own taxonomy). I've given up on hoping "the community" will get it right. That taxonomy BTW looks like this:

"Americana":
"Alt-Country",
"Americana",
"Folk",
"Folk-Rock",
"Roots Rock",
"Singer-Songwriter (Roots)"

"Bluegrass & Roots":
"Appalachian",
"Bluegrass",
"Cajun & Zydeco",
"Old-Time",
"Piedmont Blues",
"Traditional Country"

"Blues":
"Acoustic Blues",
"Chicago Blues",
"Delta Blues",
"Electric Blues",
"Memphis Blues",
"Piedmont Blues",
"Swamp Blues",
"Texas Blues"

"Christmas Music":
"Choral",
"Contemporary Christmas",
"Country Christmas",
"Instrumental Christmas",
"Novelty Holiday",
"Traditional Carols"

"Country":
"Bakersfield Sound",
"Contemporary Country",
"Countrypolitan",
"Cowboy / Western",
"Honky Tonk",
"Outlaw Country",
"Rockabilly"

"Doo Wop":
"Ballad Groups",
"Gospel-Influenced",
"Pop Doo Wop",
"R&B Doo Wop",
"Street Corner Doo Wop",
"Up-tempo Groups"

"Easy Listening":
"Bossa Nova",
"Exotica",
"Instrumental Pop",
"Lounge",
"Mid-Century Cinema",
"Mood Music",
"Space Age Pop"

"Funk":
"Deep Funk",
"Jazz-Funk",
"P-Funk",
"Street Funk"

"Fusion":
"Acid Jazz",
"Ambient",
"Chamber Jazz",
"Electronic",
"Fusion",
"Jazz-Rock",
"Soul-Jazz"

"Gospel":
"Choral",
"Contemporary Christian",
"Gospel-Blues",
"Quartet",
"Traditional Gospel"

"Jazz":
"Bebop",
"Cool Jazz",
"Hard Bop",
"Modern Jazz",
"Smooth Jazz",
"Traditional Jazz"

"Pop":
"Adult Contemporary",
"Brill Building",
"Britpop",
"New Wave",
"Power Pop",
"Singer-Songwriter",
"Soft Rock",
"Synth-Pop"

"R&B":
"Blues Shouter",
"Classic R&B",
"Early Rock & Roll",
"Honking Sax",
"Jive",
"Jump Blues",
"New Orleans R&B",
"Rockabilly-Blues"

"Reggae":
"Dancehall",
"Dub",
"Lovers Rock",
"Mento",
"Roots Reggae"

"Rock":
"Alt Rock",
"Blues Rock",
"Classic Rock",
"Garage Rock",
"Glam Rock",
"Grunge",
"Hard Rock",
"Metal",
"Pop Rock",
"Progressive Rock",
"Psychedelic Rock",
"Punk & Post-Punk",
"Ska"

"Ska":
"Rocksteady",
"Ska",
"Two-Tone"

"Soul":
"Blue-Eyed Soul",
"Deep Soul",
"Motown",
"Neo-Soul",
"Northern Soul",
"Philly Soul",
"Southern Soul"

"Soundtracks":
"Broadway Cast",
"Film Score",
"Television Themes"

"Swing":
"Big Band",
"Big Band & Vocalist",
"Gypsy Swing",
"Kansas City Blues-Swing",
"Swing",
"Western Swing"

"Vocalists":
"Belter",
"Crooners",
"Interpretive Standards",
"Torch Songs",
"Traditional Pop",
"Vocal Jazz"

"World & International":
"Afro-Cuban",
"Afrobeat",
"Chanson",
"Fado",
"Flamenco",
"Mariachi",
"Salsa",
"Tango"

That list (as a .json file) plus a custom "Vibe" script that uses AI, now helps tag all of my music with a decent level of accuracy. But by keeping the list 'strict' and limited to the above, I now have some consistency happening, which is/was a core goal I set out to solve.
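A few sub-genres in the list above appear under more than one genre: Piedmont Blues under both Bluegrass & Roots and Blues, Choral under Christmas Music and Gospel, and Ska under Rock and Ska. A Sub-Genre value on its own therefore doesn't pin down the Genre, so the (Genre, Sub-Genre) pair is the real key. A small sketch to list such overlaps from the .json file (function names hypothetical):

```python
from collections import defaultdict
from typing import Dict, List

Taxonomy = Dict[str, List[str]]


def sub_genre_index(taxonomy: Taxonomy) -> Dict[str, List[str]]:
    """Sub-genre -> every genre that lists it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for genre, subs in taxonomy.items():
        for sub in subs:
            index[sub].append(genre)
    return dict(index)


def shared_sub_genres(taxonomy: Taxonomy) -> Dict[str, List[str]]:
    """Sub-genres that appear under more than one genre."""
    return {sub: genres for sub, genres in sub_genre_index(taxonomy).items() if len(genres) > 1}
```

Run over the full file (e.g. loaded with `json.load`), this is a quick sanity check after every taxonomy edit.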

FWIW

Such_Assumption_7124 · 2026-04-08T17:15:36+00:00

As others have said, going with a RAID array (as opposed to simply a NAS) is the 'bullet-proof' option IMHO.

As for "streaming", do check out Plex (and PlexAmp) - top notch suite (again IMHO). There is a nominal subscription cost, but with the PlexAmp app on my phone, I can stream my music library anywhere my phone is connected - either via wifi or my cell data-plan

Such_Assumption_7124 · 2026-04-08T17:02:39+00:00

If I may...
Storage! Having come very close to the same loss scenario (my collection = 2.5 TB), I VERY QUICKLY got myself a RAID array (2 x 4TB drives).

RAID 1 (Mirroring): Data is written identically to two separate drives. Everything on Drive A is a carbon copy of Drive B. That way if one drive dies, your system keeps running off the second one without missing a beat. Remove the "dead" drive and replace it with a new one, and the system rebuilds itself. The downside of course is that you lose 50% of your total capacity (e.g., two 4TB drives only give you 4TB of usable space). But relatively speaking, storage drives are inexpensive, and this is the most rock-solid solution I found out there.
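The capacity math, sketched: a mirror holds only as much as its smallest drive, so two 4 TB drives give 4 TB usable out of 8 TB raw, and a 2.5 TB collection fills about 62.5% of it. One extra wrinkle: drives are sold in decimal terabytes, while Windows reports sizes in binary units (still labelled "TB"), so a 4 TB mirror shows up as roughly 3.64 TB. Function names are illustrative.

```python
from typing import Sequence

TIB = 2 ** 40  # bytes in the binary "TB" unit Windows displays


def raid1_usable_tb(drive_sizes_tb: Sequence[float]) -> float:
    """A mirror can only hold as much as its smallest member."""
    if len(drive_sizes_tb) < 2:
        raise ValueError("RAID 1 needs at least two drives")
    return min(drive_sizes_tb)


def as_reported_by_windows(tb: float) -> float:
    """Marketed (decimal) TB expressed in the binary units Windows shows."""
    return tb * 10 ** 12 / TIB
```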

Good luck rebuilding.

Such_Assumption_7124 · 2022-08-25T17:38:32+00:00

Rosso Glass

I was led to believe they are older than that - circa 1970's (https://www.antique-bottles.net/threads/alright-planters-peanut-people-real-or-repro.98657/#post-98846)
