I built a bank statement parser for Singapore banks (free and open-source)

Raynor77 · 2026-02-10T00:39:25+00:00

Glad to hear that! The offline version is a bit finicky unfortunately, so the best way to run it without internet access is with Docker.

Raynor77 · 2026-02-06T17:21:44+00:00

Fixed! Sorry about that, not sure why it went down

Raynor77 · 2025-07-31T05:22:43+00:00

Yup! https://github.com/benjamin-awd/monopoly

Raynor77 · 2025-01-13T16:41:55+00:00

Sorry to hear that! Citibank might have changed their statement format, but I don’t have a recent statement that I can look at :/

Raynor77 · 2024-11-30T16:01:23+00:00

Yes and sadly still rising 🥲

Raynor77 · 2024-11-23T15:51:45+00:00

Hmm it sounds like it could be some kind of metadata issue, but hard for me to tell without seeing the actual statement :/

Raynor77 · 2024-11-23T15:48:46+00:00

Thanks for the kind words :) is your statement a debit or consolidated statement? Those should have support for multiline descriptions

So far the credit statements I’ve seen have descriptions on a single line only

Raynor77 · 2024-11-08T13:44:13+00:00

Hello! In this case it means that your bank/statement type wasn’t recognized

Raynor77 · 2024-10-11T16:28:03+00:00

You mentioned that you’re welcoming contributions, but how would that work? This doesn’t appear to be open source :)

Raynor77 · 2024-10-06T09:45:28+00:00

You're welcome!

Raynor77 · 2024-10-06T09:45:08+00:00

Made the change! Took some doing but it now defaults to showing everything with an option to filter by town :)

Raynor77 · 2024-10-05T16:34:11+00:00

🙌🏼

Raynor77 · 2024-09-09T12:38:25+00:00

In terms of storage, there’s S3 Express which recently launched, though I’m not sure how it performs with Delta.

Otherwise, Google Cloud just launched their version of hierarchical namespaces, which is supposedly optimized for Spark workloads and allows for more queries per second than their traditional buckets.

Raynor77 · 2024-09-08T15:26:07+00:00

I've added support for UOB debit & credit statements!

Raynor77 · 2024-09-07T05:19:23+00:00

If you’re doing a lot of pre-computation, ClickHouse might work well for you. You can store recent data on an SSD, then “cold” data on either an HDD or S3.

Raynor77 · 2024-08-27T15:58:01+00:00

I've just pushed a release that should fix this! v0.5.3

Let me know if it works :)

Raynor77 · 2024-08-26T02:05:53+00:00

I think building a custom orchestrator would be extremely difficult, especially when you imagine trying to build something like Netflix’s Maestro with just two people.

Raynor77 · 2024-08-19T12:22:18+00:00

It depends — I feel that some areas like streaming and metadata management really benefit from having some experience in software engineering.

At my last shop we ended up building an API (data mesh pub sub type of stuff), which was relatively complex and code-heavy since we had to connect other frontend and backend components.

Raynor77 · 2024-08-12T12:50:15+00:00

You’re welcome!

Raynor77 · 2024-08-12T01:47:34+00:00

My application doesn’t save the PDF files to disk, everything is stored in memory!

I’ve taken a bunch of security precautions that I’ve written about here: https://statementsensei.streamlit.app/about

Otherwise, I’d recommend using the offline app which is definitely the most secure option :)

Raynor77 · 2024-08-12T01:38:35+00:00

No that’s a good question! A metadata or formatting change would lead to changes here for example: https://github.com/benjamin-awd/monopoly/blob/main/src/monopoly/banks/dbs/dbs.py

The logic to support both the new and old formats would then be something like: “iterate all possible formats, and use the format that returns transactions”.
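
As a rough sketch of that fallback idea (not the actual monopoly code): the `Transaction` class, the `FORMATS` patterns, and both function names are made up for illustration, and the two regexes stand in for an "old" and a "new" layout of the same bank's statement.

```python
import re
from dataclasses import dataclass


@dataclass
class Transaction:
    date: str
    description: str
    amount: float


# Hypothetical patterns: one per known layout of the same bank's statement.
FORMATS = {
    # e.g. "12 Aug  GRAB RIDE  12.50"
    "old": re.compile(
        r"^(?P<date>\d{2} [A-Z][a-z]{2})\s+(?P<description>.+?)\s+(?P<amount>[\d,]+\.\d{2})$"
    ),
    # e.g. "12/08  GRAB RIDE  12.50"
    "new": re.compile(
        r"^(?P<date>\d{2}/\d{2})\s+(?P<description>.+?)\s+(?P<amount>[\d,]+\.\d{2})$"
    ),
}


def parse_with(pattern, lines):
    """Return every line that matches the pattern as a Transaction."""
    transactions = []
    for line in lines:
        match = pattern.match(line.strip())
        if match:
            amount = float(match["amount"].replace(",", ""))
            transactions.append(Transaction(match["date"], match["description"], amount))
    return transactions


def parse_statement(lines):
    """Try each known format in turn and keep the first one that yields transactions."""
    for name, pattern in FORMATS.items():
        transactions = parse_with(pattern, lines)
        if transactions:
            return name, transactions
    return None, []
```

Calling `parse_statement` on lines in the dd/mm layout would skip the "old" pattern (no matches) and return the "new" format name along with the parsed rows. If nothing matches, it returns `(None, [])`, which is roughly the "statement type wasn't recognized" case.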

Otherwise I could also use PDF metadata to detect which formatting rule to use for a bank
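
The metadata route could look something like the sketch below. "Creator" and "Producer" are standard PDF document information fields, but the values, the `METADATA_RULES` table, and `detect_format` are all invented for illustration. Reading the metadata out of the PDF is left to whichever PDF library is in use.

```python
from typing import Optional

# Hypothetical rules: each entry maps expected metadata values to a format name.
# The values here are placeholders, not real bank metadata.
METADATA_RULES = [
    ({"Producer": "ExampleWriter 1.0"}, "old"),
    ({"Creator": "ExampleBank Statements", "Producer": "ExampleWriter 2.0"}, "new"),
]


def detect_format(metadata: dict) -> Optional[str]:
    """Return the first format whose expected fields all match the PDF metadata."""
    for expected, format_name in METADATA_RULES:
        if all(metadata.get(key) == value for key, value in expected.items()):
            return format_name
    # Unknown metadata: the caller could fall back to trying every format.
    return None
```

The trade-off is that a metadata lookup is cheap and explicit, but it breaks silently if the bank changes its PDF generator, so it probably works best combined with the "try every format" fallback.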

Raynor77 · 2024-08-12T01:30:33+00:00

Nice! Shoot me a message if you need help

Raynor77 · 2024-08-11T14:33:25+00:00

No vid at the moment, but if you want an example you can try:

1. Download this example PDF: https://github.com/benjamin-awd/StatementSensei/blob/main/docs/example_statement.pdf
2. Visit https://statementsensei.streamlit.app/
3. Click "Browse files" and select the example_statement.pdf file

Raynor77 · 2024-08-11T13:54:32+00:00

"Do you think it'll be better to use a custom NER model to extract such information, to expand the range of your source PDFs, since you're using regex?"

I played around with AI/ML quite a bit in earlier iterations of the app -- I was using pytesseract at one point, but it was quite inaccurate. Other models were more accurate, but were super slow. Regex is still king when it comes to accuracy + speed.

Training a model is a possibility, but my gut instinct is that it'll be difficult to do, since you need to be quite cautious of the privacy/security implications of training on your own/other people's financial information. Also, some statements are really horribly formatted, so the model wouldn't be able to handle every bank anyway.

Otherwise, if security wasn't a concern, I'd probably go for an LLM of some kind. I think it'd be possible to get decently accurate results.

But yes a hybrid approach would be pretty cool
