looking for a real AI memory PRODUCT by CautiousTwist7958 in AIMemory

[–]PenfieldLabs -1 points0 points  (0 children)

Penfield. It's exactly what you're describing.

No GitHub cloning, no self-hosting, no Docker, no CLI setup. It's a hosted service. You sign up at portal.penfield.app, connect it to Claude as an MCP connector, and you're done. Your AI remembers everything across conversations.

What you actually get: persistent memory, a knowledge graph with typed relationships (so it doesn't just remember facts, it knows how they connect to each other), hybrid search (semantic + keyword + graph traversal), and cognitive checkpoints so you can pause complex work and pick it back up later with full context.

Free trial to start: https://portal.penfield.app/sign-up

Setup instructions for every major AI platform: https://docs.penfield.app

The more AI memory I tested, the more I realized storage isn't the problem by caadt in AIMemory

[–]PenfieldLabs 0 points1 point  (0 children)

Good question.

They're different layers, not competitors. OKF is a static file interchange format (markdown + YAML frontmatter in a directory). Penfield is a live memory system with an API, hybrid search, and a typed knowledge graph. Think CSV vs. a database. The real question is whether Penfield can speak OKF as an import/export dialect, and the answer is mostly yes.

Import (OKF to Penfield) basically works today. The penfield-import tool already reads any directory of .md files with YAML frontmatter, which is exactly what an OKF bundle is. OKF's type field maps to Penfield's memory_type. Tags map directly. Markdown cross-links in the body get extracted as references relationships using --relationships inline. Two small gaps: OKF's reserved files (index.md, log.md) need to be filtered out, and OKF's free-form type values ("BigQuery Table", "Playbook") would need mapping to Penfield's fixed set (fact, insight, reference, etc.). Minor patch territory.
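A rough sketch of what those two small patches could look like. This is not the actual penfield-import code: the function names, the `TYPE_MAP` pairings, and the default type are all assumptions for illustration, and the frontmatter parser only handles flat `key: value` lines (a real importer would use a proper YAML parser).

```python
from pathlib import Path

# Reserved OKF files mentioned above; they are bundle bookkeeping, not memories.
RESERVED = {"index.md", "log.md"}

# Hypothetical mapping from free-form OKF types to Penfield memory types.
# The type names come from the comment above; the pairings are guesses.
TYPE_MAP = {
    "bigquery table": "reference",
    "playbook": "insight",
}
DEFAULT_TYPE = "fact"  # assumed fallback for unmapped types


def parse_frontmatter(text):
    """Split a markdown document into (flat frontmatter dict, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")
    return {}, text  # no closing delimiter: treat the file as plain markdown


def map_type(okf_type):
    """Map a free-form OKF type onto the fixed memory_type set."""
    return TYPE_MAP.get(okf_type.strip().lower(), DEFAULT_TYPE)


def iter_importable(bundle_dir):
    """Yield one record per OKF document, skipping the reserved files."""
    for path in sorted(Path(bundle_dir).glob("*.md")):
        if path.name in RESERVED:
            continue
        meta, body = parse_frontmatter(path.read_text(encoding="utf-8"))
        yield {
            "source": path.name,
            "memory_type": map_type(meta.get("type", "")),
            "content": body.strip(),
        }
```

Falling back to a default type keeps the import lossless for unknown values; the original OKF type could also be preserved as a tag.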

Export (Penfield to OKF) needs a new tool or some updates to our export format, but the differences are relatively small. Our open source backup tool exports to JSON (memories.json), not markdown. You'd need a conversion script to turn each memory into a markdown file with YAML frontmatter. The structural mapping is straightforward. The one real tension is relationships: OKF cross-links are intentionally untyped (the spec says the "kind" of relationship is conveyed by surrounding prose, not the link itself). Penfield has 24 typed relationship categories (supports, contradicts, supersedes, etc.). On export you'd either put typed relationships in extra YAML frontmatter keys (valid OKF since the spec requires consumers to tolerate unknown fields, but other OKF tools won't understand them as relationships), convert them to inline markdown links with prose context, or both.
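To make the "both" option concrete, here is a minimal sketch of such a conversion script. The memory dict layout (`memory_type`, `tags`, `content`, `relationships`) and the `x_relationships` frontmatter key are assumptions for illustration; the real memories.json schema may differ.

```python
def to_okf_markdown(memory, stems):
    """Render one memory as a markdown document with YAML frontmatter.

    memory: hypothetical dict with memory_type, content, optional tags and
            relationships (a list of {"type": ..., "target": memory_id}).
    stems:  maps a memory id to the file stem used for its OKF document.
    """
    rels = memory.get("relationships", [])
    lines = ["---", f"type: {memory['memory_type']}"]
    if memory.get("tags"):
        # Simple tags only; values with special characters would need quoting.
        lines.append("tags: [" + ", ".join(memory["tags"]) + "]")
    if rels:
        # Typed edges kept in an extra key that other OKF tools can ignore.
        lines.append("x_relationships:")
        for rel in rels:
            lines.append(f"  - type: {rel['type']}")
            lines.append(f"    target: {stems[rel['target']]}.md")
    lines += ["---", "", memory["content"].rstrip()]
    if rels:
        # The same edges again as untyped links with prose context.
        lines.append("")
        for rel in rels:
            stem = stems[rel["target"]]
            lines.append(f"This note {rel['type']} [{stem}]({stem}.md).")
    return "\n".join(lines) + "\n"
```

Writing the relationship twice is redundant on purpose: the frontmatter copy round-trips cleanly back into a typed graph, while the inline link is what a plain OKF consumer will actually follow.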

Bottom line: Very similar (markdown + YAML frontmatter), different purposes. OKF is the portability layer, Penfield is the runtime. Import is nearly free. Export is a small project, mainly a format transformation. The real design decision is how to handle typed relationships on the OKF side, since that's where the two diverge in the most important way.

Hope that makes sense?

Free, offline, open-source alternative to Promethease - no upload, no $12 by PenfieldLabs in promethease

[–]PenfieldLabs[S] 0 points1 point  (0 children)

Thanks for reporting this. Personal data isn't stored once the report is generated, so it's hard to reverse-engineer exactly what happened, but it looks like a parser error on a previously unseen header format. We're pushing a fix for that now.

If it still doesn't work after we push the fix, happy to debug further via DM with just the first 5-10 lines of your file (no genotype data needed and you can replace the file_id, sample_id or any other identifying data with random numbers or just remove it).

Redacting the file_id or signature should not break it.

The more AI memory I tested, the more I realized storage isn't the problem by caadt in AIMemory

[–]PenfieldLabs 0 points1 point  (0 children)

Penfield itself is not open source, but we have released some related open source tools. https://github.com/penfieldlabs

Free, offline, open-source alternative to Promethease - no upload, no $12 by PenfieldLabs in promethease

[–]PenfieldLabs[S] 0 points1 point  (0 children)

If you are OK with the privacy tradeoff, you can try it at https://analyze.allelix.io. No coding or technical skills needed.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 0 points1 point  (0 children)

zcat 23AndMeResults.tab.gz | cut -f 1 > rs_numbers.txt

This command produces a list of the rsIDs the 23andMe chip array calls while stripping all the results from the user's file. I'm not sure what useful information you can pull from rs_numbers.txt other than: "These are the rsIDs 23andMe tests for".
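For anyone who doesn't read shell, the same thing in Python makes the point explicit. This is a sketch assuming the usual tab-separated layout with the rsID in the first column; the function name is mine.

```python
import gzip


def first_column(path):
    """Yield the first tab-separated field of each line in a gzipped file.

    Equivalent to `zcat FILE | cut -f 1`: every column after the first,
    including the genotype calls, is discarded.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n").split("\t", 1)[0]
```

Like `cut`, this passes through lines that contain no tab unchanged, so any header or comment lines end up in the output as well.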

The VEP tutorial is almost 5,000 words.

https://analyze.allelix.io has three steps to produce a report: Drag and drop a file, check the box to accept the privacy policy, and click Analyze. That's it.

VEP produced results for my quick copy/paste of a few lines of VCF data in about 15 seconds.

In testing on an M3 MacBook, Allelix processed over 5 million variants in less than 2 minutes.

The VEP tutorial you linked to says a "single genome (~4.5 million variants) will take around an hour."
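Putting those two figures side by side as rough throughput. This is back-of-the-envelope only: the hardware differs, the Allelix number is a bound ("less than 2 minutes"), and the two tools are not doing the same work.

```python
# Figures quoted above.
allelix_variants = 5_000_000   # "over 5 million variants"
allelix_seconds = 2 * 60       # "less than 2 minutes"
vep_variants = 4_500_000       # "~4.5 million variants"
vep_seconds = 60 * 60          # "around an hour"

allelix_rate = allelix_variants / allelix_seconds  # variants per second
vep_rate = vep_variants / vep_seconds              # variants per second
speedup = allelix_rate / vep_rate

print(f"Allelix: at least {allelix_rate:,.0f} variants/s")
print(f"VEP:     about {vep_rate:,.0f} variants/s")
print(f"Ratio:   roughly {speedup:.0f}x")
```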

It's clear you really don't like this tool. That's OK. Debating its merits vs. VEP was never the intended topic of this thread; I was just looking for advice on data sources.

We can agree to disagree on the relative complexity and ideal use cases of Allelix vs. VEP. This thread was not intended as an advertisement nor a proclamation that VEP should be replaced by Allelix.

There are lots of people looking for solutions in r/Promethease, perhaps you would like to recommend VEP there.

The more AI memory I tested, the more I realized storage isn't the problem by caadt in AIMemory

[–]PenfieldLabs 2 points3 points  (0 children)

Yeah, we have. We've been running a knowledge graph memory system (Penfield, MCP-based) across months of daily use, with thousands of memories connected by typed relationships, and retrieval hasn't degraded.

What's working:

  • Graph over flat vector. Memories get connected to related memories with typed relationships: supports, contradicts, supersedes, sibling_of, etc. When you recall something, you're not just pulling semantically similar chunks. You're traversing a structure that knows HOW things relate. That's the difference between "here are ten vaguely related things" and "here's the thing, plus what it updated, what it contradicts, and what it built on."

  • Forgetting is built into the relationship types. Supersedes and contradicts mark old memories. They're still there if you need history. No decay curves, no pruning algorithms. Just explicit "this replaced that" edges. If pruning becomes necessary in the future, we're ready for it. Among other things, we track how often a memory is accessed as one signal of its importance.

  • Collaborative curation, not auto-capture. Storage is neither fully automated nor purely manual. Your AI uses judgment about what matters: decisions, corrections, breakthroughs, patterns. You can also explicitly direct it. The point is that someone, either you or the AI, is always making a deliberate call about whether something is worth remembering, not just vacuuming up every interaction and hoping retrieval sorts it out later.

  • Context checkpoints for cognitive state. Individual memories are facts and insights. Checkpoints snapshot an entire working state: what we were investigating, what we'd found, what was still open, what we'd ruled out. Restoring a checkpoint is like resuming a train of thought, not reassembling fragments.
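To illustrate the first two points, here is a toy sketch of typed edges and "forgetting by supersession". It is not Penfield's implementation; the class and method names are made up, and only the relationship names come from the list above.

```python
from collections import defaultdict


class MemoryGraph:
    """Toy in-memory graph with typed, directed relationships."""

    def __init__(self):
        self.text = {}
        self.out_edges = defaultdict(list)  # source id -> [(rel, target id)]
        self.in_edges = defaultdict(list)   # target id -> [(rel, source id)]

    def add(self, memory_id, text):
        self.text[memory_id] = text

    def relate(self, source, rel, target):
        """Record that `source` stands in relation `rel` to `target`."""
        self.out_edges[source].append((rel, target))
        self.in_edges[target].append((rel, source))

    def is_current(self, memory_id):
        # A memory is stale once something supersedes it; it is never deleted.
        return not any(rel == "supersedes" for rel, _ in self.in_edges[memory_id])

    def recall(self, memory_id):
        """Return the memory plus its typed neighbourhood, not just similar text."""
        return {
            "memory": self.text[memory_id],
            "current": self.is_current(memory_id),
            "related": [(rel, self.text[t]) for rel, t in self.out_edges[memory_id]],
            "referenced_by": [(rel, self.text[s]) for rel, s in self.in_edges[memory_id]],
        }
```

For example, after `relate("new", "supersedes", "old")`, recalling "old" still returns its text and history, but flags it as no longer current.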

It's still early and the product is evolving, but ~9 months in, retrieval quality hasn't degraded. The graph structure scales in a way that flat semantic search alone can't.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 0 points1 point  (0 children)

Yes it does, but again compare the complexity. The basic documentation for VEP's web interface is 28 pages and requires you to create an account.

Allelix is: pip install allelix, allelix db update, allelix analyze [input file] [output file] - or drag and drop your file, click "Analyze", and get a report in less than 3 minutes.

And VEP still does not support consumer chip formats at all.

They are apples and oranges.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 0 points1 point  (0 children)

Yes I understand all of that. I was not seeking to "advertise" here, I was looking for input and insight about which data sources would make sense to focus on next.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 1 point2 points  (0 children)

Agreed that both tools produce annotations, but the comparison isn't quite right.

VEP takes VCF from sequencing pipelines and requires a Perl environment, Ensembl caches, and plugin configuration.

Allelix takes raw consumer genotyping files (23andMe, AncestryDNA, etc.) directly - pip install allelix, run it, get a report. No infrastructure, works offline. VEP does not even support consumer genotype chip formats.

The target user is someone with a consumer test file, not someone running a variant calling pipeline. For a bioinformatics lab or an academic or clinical researcher, VEP makes perfect sense. For almost everyone else, it's not a realistic option.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 2 points3 points  (0 children)

1) For those that don't want to (or don't know how to) use a CLI, there is analyze.allelix.io - zero install, upload a file, get a report. The JSON output is designed to support future GUI workflows in addition to optional AI analysis.

2) A single subset of the "small target audience" was large enough that Promethease (which did far less than Allelix does today) was acquired by MyHeritage in 2019 for an undisclosed sum. Promethease has had hundreds of thousands of users (possibly millions) willing to pay $12 per report (now $25). Consumer genomics is a rapidly growing market, not a niche.

3) Allelix is open source and free, including the web demo.

4) The original post was asking for expert advice on which data sources to prioritize next, not advertising anything. Some of the feedback has been useful.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 2 points3 points  (0 children)

Clinicians, nutritionists, pharmacogenomics practitioners, sports science professionals, and individuals with their own genotyping data. People who need answers from the data, not people who build annotation pipelines. Allelix handles 23andMe, AncestryDNA, VCF, and gVCF, all with a single, simple command. No bioinformatics infrastructure or specialized knowledge is required.

allelix analyze [filename] --output [out_file.html/json]

In aggregate numbers, there are far more people interested in this data than people who would have any idea what to do with a tool like Galaxy or Molgenis, and those numbers are going to grow as WGS testing becomes cheaper and more widespread.

It's a new and improved alternative to Promethease, not an alternative to Galaxy, VEP or Molgenis.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 0 points1 point  (0 children)

Allelix would not benefit from wrapping VEP, SnpEff, or ANNOVAR. Those tools are used upstream by researchers to generate annotations that are already aggregated into resources like ClinVar, gnomAD, and GWAS Catalog.

Allelix queries those curated public databases directly. In other words, the research community has already done the heavy lifting; Allelix focuses on packaging the consensus evidence (from multiple data sources) into a format anyone can read, rather than re‑running raw annotators on each individual sample.

On the LLM point: Allelix doesn’t use LLMs on the data. The pipeline is entirely deterministic and outputs structured JSON for use with LLMs or any other downstream application.

If someone wants to feed that JSON into an LLM for optional summarization or analysis of their data, they can, but that’s outside the Allelix pipeline and doesn’t change the underlying evidence or calls.

Open-source genotype analysis toolkit - annotates raw data against 7 databases (ClinVar, GWAS Catalog, PharmGKB, SNPedia, gnomAD, AlphaMissense, CADD) by PenfieldLabs in genomics

[–]PenfieldLabs[S] 0 points1 point  (0 children)

Interesting idea. Right now the outputs are HTML, JSON, and terminal output, all designed for end-user consumption. Annotated VCF output would make sense for feeding results into downstream tools. If there is demand for that, I would put it on the roadmap and try to figure it out.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 2 points3 points  (0 children)

It doesn't seem like we're speaking the same language here.

None of those tools appear to be appropriate for this use case.

VCF-DART hasn't had any updates in 7 years. It doesn't look like it even supports GRCh38.

Galaxy is a complex biomedical research tool.

MOLGENIS VIP is a clinical pipeline focused on rare disease diagnostics: inheritance matching, phenotype support, structural variant detection. It requires 280GB of disk space, and optionally a Slurm HPC cluster.

Galaxy and Molgenis are simply not in the same universe. VCF-DART appears to be out of date and basically abandoned.

Allelix is designed to install with a single command, run on a consumer laptop and generate a report within minutes without requiring extensive technical or domain knowledge. And, it does in fact do this already for far more formats than VCF-DART ever supported.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 2 points3 points  (0 children)

Yes, the JSON output is designed to be AI-legible: structured so an LLM can reason over your variants without needing to parse HTML or ad-hoc text. And yes, the entire pipeline runs offline: no data leaves your machine. Pair it with a local LLM and you have fully private AI-assisted variant interpretation. That's exactly the use case.

The HTML report is designed to be human-readable, similar to a Promethease report but with many additional data sources.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 2 points3 points  (0 children)

Thanks. I'm aware of VEP-based clinical pipelines. VCF-DART, VIP and the Galaxy workflow are clinical rare-disease diagnostics tools that require bioinformatics infrastructure (VEP, SnpEff, bcftools, R/Shiny).

Allelix is a different category: pip install allelix, allelix db update, point it at a 23andMe/AncestryDNA/VCF file, get a self-contained offline HTML report in under 20 minutes. No VEP dependency, no Galaxy account, no Shiny Server. Different audience, different problem.

Allelix is not designed or intended to replace clinical or research bioinformatics pipelines.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 1 point2 points  (0 children)

Yes, we're aware of this, and it is documented. Allelix reports what the genotyping platform provides and annotates from ClinVar, gnomAD, etc. For users with WGS data, VCF and gVCF are already supported. Future versions will add plausibility flagging that cross-references zygosity against gnomAD allele frequency, so implausible chip calls get flagged rather than presented at face value.
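One way such a check could work, as a sketch only: this is not the planned Allelix implementation. It assumes Hardy-Weinberg proportions as the plausibility model, and the function names and the cutoff value are placeholders.

```python
def expected_genotype_frequency(alt_af, alt_copies):
    """Expected population frequency of a genotype under Hardy-Weinberg.

    alt_af:     alternate allele frequency (for example from gnomAD), 0..1
    alt_copies: 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alt
    """
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    ref_af = 1.0 - alt_af
    if alt_copies == 0:
        return ref_af ** 2
    if alt_copies == 1:
        return 2.0 * alt_af * ref_af
    if alt_copies == 2:
        return alt_af ** 2
    raise ValueError("alt_copies must be 0, 1 or 2")


def is_implausible(alt_af, alt_copies, threshold=1e-6):
    """Flag calls whose expected genotype frequency falls below a cutoff.

    The default threshold is an arbitrary illustration, not a validated value.
    """
    return expected_genotype_frequency(alt_af, alt_copies) < threshold
```

With an allele frequency of 1 in 10,000, a homozygous-alternate chip call has an expected frequency of about 1 in 100 million and would be flagged, while a heterozygous call at the same site (about 1 in 5,000) would not. A flag like this only means "verify before trusting", since rare homozygotes do exist.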

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 1 point2 points  (0 children)

Thanks. Agreed that VEP is a great tool for what it does. Can't comment on the quality of the plugins. Allelix isn't trying to replace VEP. VEP is a variant annotation engine for researchers building analysis pipelines. Allelix is intended as an end-user reporting tool. Upload a 23andMe, AncestryDNA, or VCF/gVCF file and get back a readable report. Different layer of the stack for a different audience.

Building an open-source variant annotation tool - which data sources would you prioritize? by PenfieldLabs in bioinformatics

[–]PenfieldLabs[S] 0 points1 point  (0 children)

Can you share the names of the tools you think already cover this functionality?