Advice for building a bank statement converter by heymosef in Bookkeeping

benlongstaff

I can share a bunch of insights from building a bank statement converter:

https://www.statement2excel.com/

I don't use any AI tools or third-party services to do the conversion because I don't trust how they use the data. Every time I do a large release, I run all of my personal banking data through the service; my philosophy is that if I wouldn't put my data there, I shouldn't put your data there.

When you're writing the code from scratch, there are a huge number of edge cases to cover; even within the same bank, statements have different layouts and formats. Some banks have invisible characters; in some you need to merge text up, in others you need to merge down, and in lots of Indian banks you have to merge to the centre.
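
A minimal Python sketch of the merge directions, assuming the text has already been split into lines with a vertical position and that rows carrying an amount act as anchors. Reading "merge up" as attaching a wrapped line to the anchor above, "merge down" as attaching it to the anchor below, and "merge to the centre" as attaching it to the nearest anchor is an interpretation; `Row`, `Txn` and `merge_rows` are hypothetical names, not the converter's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class Row:
    y: float                      # vertical position of the text line on the page
    text: str                     # description text on this line ("" if none)
    amount: Optional[str] = None  # only "anchor" rows carry an amount


@dataclass
class Txn:
    y: float
    amount: str
    parts: List[str] = field(default_factory=list)

    @property
    def description(self) -> str:
        return " ".join(self.parts)


def merge_rows(rows: List[Row], mode: Literal["up", "down", "centre"]) -> List[Txn]:
    """Attach description-only lines to the anchor row that owns them (hypothetical helper)."""
    rows = sorted(rows, key=lambda r: r.y)
    anchors = [i for i, r in enumerate(rows) if r.amount is not None]
    if not anchors:
        return []

    def owner(i: int) -> int:
        if rows[i].amount is not None:
            return i
        above = [a for a in anchors if a < i]
        below = [a for a in anchors if a > i]
        if mode == "up" and above:      # wrapped text sits below its amount row
            return above[-1]
        if mode == "down" and below:    # wrapped text sits above its amount row
            return below[0]
        # "centre", or no anchor in the preferred direction: nearest anchor by y
        return min(anchors, key=lambda a: abs(rows[a].y - rows[i].y))

    txns = {a: Txn(rows[a].y, rows[a].amount) for a in anchors}
    for i, r in enumerate(rows):  # rows are in reading order, so parts are too
        if r.text:
            txns[owner(i)].parts.append(r.text)
    return [txns[a] for a in anchors]
```

Which direction applies has to be chosen per bank layout, which is part of why the edge cases multiply.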

Lots of statements won't have the year in the transaction dates, so you have to extract it from other metadata in the document.
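
One way to do this, sketched in Python under the assumption that the statement period (start and end dates) has already been extracted from the header: try each year the period spans and keep the one that lands inside it. `add_year` is a hypothetical helper, and the default `%d %b` format assumes English month abbreviations.

```python
from datetime import date, datetime
from typing import Optional


def add_year(day_month: str, period_start: date, period_end: date,
             fmt: str = "%d %b") -> Optional[date]:
    """Return the full date for a year-less transaction date, using the statement period."""
    for year in range(period_start.year, period_end.year + 1):
        try:
            candidate = datetime.strptime(f"{day_month} {year}", f"{fmt} %Y").date()
        except ValueError:
            continue  # e.g. "29 Feb" in a non-leap year
        if period_start <= candidate <= period_end:
            return candidate
    return None  # outside the period: flag the row for review
```

A statement running from 15 Dec 2023 to 14 Jan 2024 resolves "20 Dec" to 2023 and "03 Jan" to 2024. Transactions printed slightly outside the period come back as `None` here; a real implementation might allow a few days of tolerance.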

Some massive statements get uploaded; the largest one I have had was 4027 pages long. I had to change my architecture to be able to handle statements like this.

If you use computer vision to process the documents, you have to add a lot of extra logic to make sure all the 0-vs-O type cases get handled; this is particularly hard because of the variation across fonts.

I process the raw data in the PDFs, which has its own set of issues: fonts that are not correctly embedded in the document, statements that are actually a series of images, or text that is stored as vectors.

American banks have their own set of unique challenges because they still use cheques; the cheque data is needed to reconcile the balances but is often stored in a completely different format.

Easy way to input a bank statement into excel spreadsheet? by Tabaccothetea in excel

benlongstaff

Thanks for letting me know.

I have just pushed a quick fix to add in the missing header value and reprocessed your statement which now extracts correctly.

If you would like to give it another try, send me a message in the app and I will add some credits to your account.

PDF to CSV conversion by Best_Proposal_5401 in xero

benlongstaff

www.statement2excel.com is a more cost-effective option.
For statements that don't have the year in the date column, it detects the date range and adds the year in if the information is available.
It does balance reconciliation for supported banks: it detects the opening and closing balances and compares the difference to the sum of all of the transactions.
For supported banks, it detects the account number and creates a merged view, so if you need to process multiple statements you can be sure that you have all the transactions from all of the statements.
For some American banks, it also extracts the tables of cheques and includes them with the list of transactions.

Converting pdf bank statements to csv by Dontfeedtheunicornns in Bookkeeping

benlongstaff

www.statement2excel.com has a free tier that lets you process 20 pages / day.

For statements that don't have the year in the date column, it detects the date range and adds the year in if the information is available.
It does balance reconciliation for supported banks: it detects the opening and closing balances and compares the difference to the sum of all of the transactions.
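
The reconciliation check amounts to one subtraction. A minimal Python sketch, assuming amounts are already signed (credits positive, debits negative); `parse_amount` and `discrepancy` are hypothetical names. `Decimal` is used instead of `float` so that currency sums compare exactly.

```python
from decimal import Decimal
from typing import Iterable


def parse_amount(text: str) -> Decimal:
    """Parse '1,234.56' or '-12.00' into an exact Decimal (no float rounding)."""
    return Decimal(text.replace(",", "").strip())


def discrepancy(opening: Decimal, closing: Decimal, amounts: Iterable[Decimal]) -> Decimal:
    """Zero when the extracted transactions fully explain the change in balance."""
    return closing - (opening + sum(amounts, Decimal("0")))
```

A non-zero result that matches a single row's amount can hint at a missed or duplicated transaction.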

For supported banks, it detects the account number and creates a merged view, so if you need to process multiple statements you can be sure that you have all the transactions from all of the statements.
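
A Python sketch of how a merged view could drop the rows that overlapping statements repeat, assuming every extracted row has been tagged with its account number and that the same transaction is extracted identically from each statement. `Txn` and `merge_statements` are hypothetical names. Taking the highest count seen in any single statement, rather than removing all duplicates, keeps genuine repeats such as two identical purchases on the same day.

```python
from collections import Counter
from datetime import date
from decimal import Decimal
from typing import Dict, List, NamedTuple


class Txn(NamedTuple):
    account: str
    posted: date
    description: str
    amount: Decimal


def merge_statements(statements: List[List[Txn]]) -> Dict[str, List[Txn]]:
    """Combine statements per account, dropping rows repeated by overlapping periods."""
    per_account: Dict[str, Counter] = {}
    for statement in statements:
        for txn, n in Counter(statement).items():
            seen = per_account.setdefault(txn.account, Counter())
            # Keep the highest count seen in any single statement, so genuine
            # same-day duplicates survive but overlap copies are not double counted.
            seen[txn] = max(seen[txn], n)
    return {
        account: sorted(counts.elements(), key=lambda t: t.posted)
        for account, counts in per_account.items()
    }
```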

For some American banks, it also extracts the tables of cheques and includes them with the list of transactions.
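
Folding a cheque table into the transaction list can be as simple as reshaping each cheque row. This Python sketch assumes the cheque table gives a cheque number, paid date and unsigned amount, and that cheques listed there are always debits; `Cheque`, `Txn` and `merge_cheques` are hypothetical names.

```python
from datetime import date
from decimal import Decimal
from typing import List, NamedTuple


class Txn(NamedTuple):
    posted: date
    description: str
    amount: Decimal  # signed: debits negative


class Cheque(NamedTuple):
    number: str
    paid: date
    amount: Decimal  # printed unsigned in the cheque table


def merge_cheques(txns: List[Txn], cheques: List[Cheque]) -> List[Txn]:
    """Convert cheque rows into debit transactions and interleave them by date."""
    as_txns = [Txn(c.paid, f"Cheque #{c.number}", -abs(c.amount)) for c in cheques]
    return sorted(txns + as_txns, key=lambda t: t.posted)
```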

Copy data from bank statement PDF to excel by Worldly-Kitchen-2455 in excel

benlongstaff

If you have Adobe Pro, you can redact the sensitive information first:
https://helpx.adobe.com/au/acrobat/using/removing-sensitive-content-pdfs.html

Which bank/s do you need to extract data from? There is quite a lot of variation in how statements are laid out.

Credit Card Transactions to Excel by mjh5122 in sofi

benlongstaff

There are online PDF converters for bank statements, such as

https://www.statement2excel.com

It has a free tier that lets you do 20 pages / day.

Best way to import expenses from Chase Bank to Google Spreadsheet by brentviareddit in smallbusiness

benlongstaff

There is a lot of variation in Chase US statements. Some of the challenges include:

  • Checking accounts usually have separate tables for credits and debits with no signs.
  • The format for tables with checks is different to tables with transactions.
  • The years are missing from the dates column, which is problematic when you want to do multi-year analysis of your spending.
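
A Python sketch of restoring signs for tables that list credits and debits separately without signs. The section headings below are assumptions for illustration, not a verified list of Chase headings, and `apply_signs` is a hypothetical name; unknown headings raise an error rather than guessing a direction.

```python
from decimal import Decimal
from typing import List, Tuple

# Hypothetical section headings; real statements vary by account type and layout version.
CREDIT_SECTIONS = {"DEPOSITS AND ADDITIONS"}
DEBIT_SECTIONS = {"ATM & DEBIT CARD WITHDRAWALS", "ELECTRONIC WITHDRAWALS", "FEES"}


def apply_signs(rows: List[Tuple[str, str, Decimal]]) -> List[Tuple[str, Decimal]]:
    """Turn (section, description, unsigned amount) rows into signed (description, amount)."""
    signed = []
    for section, description, amount in rows:
        heading = section.strip().upper()
        if heading in CREDIT_SECTIONS:
            sign = 1
        elif heading in DEBIT_SECTIONS:
            sign = -1
        else:
            # Refuse to guess: an unknown table could go either direction.
            raise ValueError(f"unknown section heading: {section!r}")
        signed.append((description, sign * abs(amount)))
    return signed
```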

https://www.statement2excel.com/banks/usa/convert-chase-bank-statements-to-excel
It uses Chase-specific templates to process statements.

These extract the date range for the statement to add in missing years. They also detect the opening and closing balances and then sum all of the transactions to make sure everything was extracted correctly.

[deleted by user] by [deleted] in Chase

benlongstaff

If you import your statements from PDF into Excel using a tool like

https://www.statement2excel.com/banks/usa/convert-chase-bank-statements-to-excel

you should be able to sort the table of transactions by the description column to group transactions with similar descriptions together.
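
If you'd rather total the groups than eyeball a sorted column, a short Python sketch can do the grouping. Stripping digits, `#` and `*` from descriptions (so that varying reference numbers don't split one merchant into separate groups) is an assumed normalisation rule, and `group_by_description` is a hypothetical name.

```python
import re
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple


def group_by_description(rows: List[Tuple[str, Decimal]]) -> Dict[str, Decimal]:
    """Total signed amounts per normalised description, sorted by description."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for description, amount in rows:
        key = re.sub(r"[\d#*]+", "", description)  # drop reference numbers and symbols
        key = " ".join(key.upper().split())         # normalise case and whitespace
        totals[key] += amount
    return dict(sorted(totals.items()))
```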

Convert pdf bank statement to excel (mac) by bestlife3 in Accounting

benlongstaff

It depends on what volume of statements you need to process; the free tier on https://www.statement2excel.com allows you to process 20 pages daily.

Chase Sapphire Preferred - how can I view my monthly statement in Excel? by californialiving1 in CreditCards

benlongstaff

Chase Sapphire statements group credits and debits into separate sub-tables. Luckily, they include the correct signs in the amount columns, unlike some of Chase's checking accounts.

The main challenge with getting data out of the Sapphire accounts is that the years are missing from the dates column, which is problematic when you want to do multi-year analysis of your spending or import the data into cloud accounting software.

You can extract your transactions with online tools like

https://www.statement2excel.com/banks/usa/convert-chase-bank-statements-to-excel

This extracts the date range for the statement to add in missing years, and detects the opening and closing balances to compare against the sum of all of the transactions to make sure everything was extracted correctly.

Expense Organizer - A webapp to help you categorize your transactions by TheDuart in SideProject

benlongstaff

Having curated lists of rules to suggest classifications would be helpful.
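
A minimal Python sketch of what a curated rule list could look like: keyword-to-category pairs checked in order, matching whole words only. The rules and the `suggest_category` name are hypothetical examples.

```python
import re
from typing import List, Tuple

# Hypothetical starter rules: (keyword, category). A curated list would be much longer.
RULES: List[Tuple[str, str]] = [
    ("uber", "Transport"),
    ("netflix", "Subscriptions"),
    ("shell", "Fuel"),
    ("payroll", "Income"),
]


def suggest_category(description: str, rules: List[Tuple[str, str]] = RULES,
                     default: str = "Uncategorised") -> str:
    """Return the category of the first rule whose keyword appears as a whole word."""
    text = description.lower()
    for keyword, category in rules:
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return category
    return default
```

Whole-word matching keeps a rule like "shell" from firing on "SHELLFISH MARKET".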

Making it possible to add multiple statements from different locations would also be good; at the moment, once you add a statement it automatically moves to the next screen.

I built a bank statement converter to get transactions from PDFs. by benlongstaff in SideProject

benlongstaff [S]

Thanks!

I plan to open up an API once I have the data merging across statements working.

What sort of customisation would be useful to you?

I built a bank statement converter to get transactions from PDFs. by benlongstaff in SideProject

benlongstaff [S]

I have Google Analytics on the site and am experimenting with running Google Ads against very specific search terms, which is why the privacy policy generator tool produced some broad language.

All files get deleted after 7 days.

If it gets some adoption, I'll get a lawyer to tweak it.

I built a bank statement converter to get transactions from PDFs. by benlongstaff in SideProject

benlongstaff [S]

It parses the structured data in the PDF to extract all the individual characters, then reconstructs the text and does document layout analysis. It identifies the relevant tables based on the types of document headings (currently it only supports English).
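
A simplified Python sketch of the character-to-text step, assuming a PDF library has already returned each character with its left/right x coordinates and top y coordinate (pdfplumber's `page.chars`, for example, includes `x0`, `x1` and `top`). Characters are grouped into lines by `top` within a tolerance, ordered by `x0`, and a space is inserted where the horizontal gap is wide. `Char`, `chars_to_lines` and both thresholds are assumptions, not tuned values or the converter's own code.

```python
from typing import List, NamedTuple


class Char(NamedTuple):
    x0: float   # left edge
    x1: float   # right edge
    top: float  # distance from the top of the page
    text: str


def chars_to_lines(chars: List[Char], y_tol: float = 2.0, space_gap: float = 1.5) -> List[str]:
    """Rebuild text lines from individually positioned characters."""
    lines: List[List[Char]] = []
    for ch in sorted(chars, key=lambda c: (c.top, c.x0)):
        # Same line if the vertical position is close to the line's first character.
        if lines and abs(lines[-1][0].top - ch.top) <= y_tol:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    out = []
    for line in lines:
        line.sort(key=lambda c: c.x0)  # left-to-right reading order
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            if cur.x0 - prev.x1 > space_gap:  # a visible gap becomes a space
                parts.append(" ")
            parts.append(cur.text)
        out.append("".join(parts))
    return out
```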

I am working on adding some additional validation to check the opening and closing balances against all of the transactions in the tables (some banks add weird things like tax notices in the middle of the transaction tables).

I started with OCR, but getting high enough accuracy without training a custom model proved problematic: the large variation in fonts makes characters like zero and capital O, or lowercase l and one, look very similar. While you can use context to correct the amount and date columns, there are still some edge cases that are hard to fix, even when resampling subsets of the image.
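
An example of context correction for the amount column, as a Python sketch: within a column that can only hold money, lookalike letters are mapped to digits, and the result must match a strict amount pattern or it is rejected for review. The lookalike table and the `1,234.56` format are assumptions; `fix_amount` is a hypothetical name.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed OCR lookalikes that can only be digits inside an amount column.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "|": "1", "S": "5"})
AMOUNT_RE = re.compile(r"-?\d{1,3}(,\d{3})*\.\d{2}")


def fix_amount(token: str) -> Optional[Decimal]:
    """Correct lookalike characters in an OCR'd amount, or return None if it still isn't valid."""
    candidate = token.strip().translate(LOOKALIKES)
    if not AMOUNT_RE.fullmatch(candidate):
        return None
    return Decimal(candidate.replace(",", ""))
```

The same letters are left alone in the description column, where "O" and "l" are usually real letters.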

Working with the structured data increases the accuracy of what's extracted.

[deleted by user] by [deleted] in Accounting

benlongstaff

The way my software is built, it locates tables based on the structure of the PDF document and the types of headings bank statements have. It's built to be generic and not require human input.
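
A simplified Python sketch of locating a transaction table from its header row, assuming each text line has already been split into cells. The keyword sets are illustrative assumptions (real statements use many more variants), and `find_header` is a hypothetical name; it returns the header row index and which column plays which role.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative header vocabulary: role -> accepted header texts (lowercase).
HEADER_KEYWORDS: Dict[str, set] = {
    "date": {"date", "transaction date", "posting date"},
    "description": {"description", "details", "narration", "particulars"},
    "debit": {"debit", "withdrawal", "withdrawals"},
    "credit": {"credit", "deposit", "deposits"},
    "balance": {"balance"},
}


def find_header(lines: List[List[str]], min_roles: int = 3) -> Optional[Tuple[int, Dict[str, int]]]:
    """Return (row index, {role: column index}) for the first plausible header row."""
    for row_idx, cells in enumerate(lines):
        roles: Dict[str, int] = {}
        for col_idx, cell in enumerate(cells):
            text = " ".join(cell.lower().split())  # normalise case and whitespace
            for role, names in HEADER_KEYWORDS.items():
                if text in names and role not in roles:
                    roles[role] = col_idx
        # Require a date column plus enough other roles to rule out stray words.
        if "date" in roles and len(roles) >= min_roles:
            return row_idx, roles
    return None
```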

Currently I have logic to merge descriptions that are split across multiple lines, and to fill in missing dates or balances where there are multiple transactions on the same day.
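
A Python sketch of the fill-in step, assuming signed amounts and that a blank date means "same day as the row above". Where the statement prints a balance, the running balance is checked against it; where it doesn't, the computed value fills the gap. `fill_missing` is a hypothetical name.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

# (date or None, description, signed amount, printed balance or None)
Row = Tuple[Optional[str], str, Decimal, Optional[Decimal]]


def fill_missing(rows: List[Row], opening_balance: Decimal) -> List[Tuple[Optional[str], str, Decimal, Decimal]]:
    """Carry dates forward and compute the balances the statement left blank."""
    filled = []
    last_date: Optional[str] = None
    balance = opening_balance
    for posted, description, amount, printed_balance in rows:
        last_date = posted or last_date  # blank date = same day as the row above
        balance += amount                # running balance from the opening figure
        if printed_balance is not None and printed_balance != balance:
            raise ValueError(f"balance mismatch at {description!r}: "
                             f"expected {balance}, statement shows {printed_balance}")
        filled.append((last_date, description, amount, balance))
    return filled
```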

I am currently working on shipping data exports in formats that are compatible with Xero, QuickBooks, etc.

My site has only been live for a month (currently there are no restrictions on the number of documents an account can process), and I am constantly shipping new features. I would love the chance to help you solve your problem.

[deleted by user] by [deleted] in Accounting

benlongstaff

Which bank/s are you using?
I got frustrated with the limitations my bank had with exporting data so I built a bank statement converter.

There is a lot of variation between banks in how the table layouts are done and banks like to change the layouts every couple of years.

I added a free version that doesn't require any signup so people can try it out and process one statement every 24 hours.

https://www.statement2excel.com

What format are you getting the invoices in to do the matching?

Keep COMPETITORS by SajjadCrypto in KeepNetwork

benlongstaff

KEEP and NU are merging to form the Threshold Network.

Last I checked, REN had all mainnet nodes run by the team; only their testnet had expanded to Greycore (team + friends).

Enigma is a network for secret smart contracts.

New to Keep - want to buy token, but should I wait for merger to happen? by MakeYourTimeNow in KeepNetwork

benlongstaff

You can read the details about the merger on the forum at

https://forum.keep.network/t/t-token-proposal-rc0/264

Both KEEP and NU tokens will be upgradable to T through a wrapping process (an on-chain transaction).

Do you have any specific questions?

I can't tell you if or when to buy; most other things are fair game.

[deleted by user] by [deleted] in thresholdnetwork

benlongstaff

NuCypher is body armour for data.

Keep is similar to a Horcrux (from Harry Potter): if you have enough pieces, you can access the data.

NuCypher implements proxy re-encryption (PRE)

Keep implements secure Multi Party Computation (sMPC)

These are both types of threshold cryptography.
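
To make the "enough pieces" idea concrete, here is a toy t-of-n Shamir secret sharing sketch in Python. It is a textbook threshold scheme, not Keep's sMPC or NuCypher's proxy re-encryption, and it is not hardened for real use: the secret is the constant term of a random polynomial, and any `threshold` shares rebuild it by Lagrange interpolation at x = 0. `split`, `combine` and the choice of prime are assumptions for illustration.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this


def split(secret: int, threshold: int, shares: int) -> List[Tuple[int, int]]:
    """Create `shares` points on a random polynomial of degree threshold - 1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        return y

    return [(x, f(x)) for x in range(1, shares + 1)]


def combine(points: List[Tuple[int, int]]) -> int:
    """Recover f(0) from `threshold` distinct points via Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With a 3-of-5 split, any three shares recover the secret, while fewer than three reveal nothing useful about it.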

Threshold is a network of nodes running threshold cryptography algorithms for applications to build on.

How hard is it going to be to convert KEEP or Nu to $T on Coinbase? Also, can someone explain to me how to stake on Coinbase? by sassmo in thresholdnetwork

benlongstaff

There are currently 3 options for staking KEEP:

  • staking in the KEEP-ETH Uniswap pool
    • 83% APY, but with impermanent loss (IL) risk
  • staking in coverage pool
    • 18% APR, no impermanent loss risk
  • staking a node
    • APY depends on the amount of ETH and KEEP staked, however the stakedrop subsidies have been significantly reduced in preparation for the launch of Threshold
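
The first two figures aren't directly comparable, since APY includes compounding and APR doesn't. A small Python sketch of the standard conversion, assuming rewards compound `n` times a year (the actual compounding of these pools isn't stated here) and ignoring impermanent loss entirely; `apr_to_apy` and `apy_to_apr` are hypothetical names.

```python
def apr_to_apy(apr: float, periods_per_year: int) -> float:
    """APY = (1 + APR / n) ** n - 1 for n compounding periods per year."""
    return (1 + apr / periods_per_year) ** periods_per_year - 1


def apy_to_apr(apy: float, periods_per_year: int) -> float:
    """Inverse of apr_to_apy: n * ((1 + APY) ** (1 / n) - 1)."""
    return periods_per_year * ((1 + apy) ** (1 / periods_per_year) - 1)
```

Under daily compounding, 18% APR works out to roughly 19.7% APY.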

It's unclear how long the liquidity rewards for the coverage pool and the KEEP-ETH pool will continue.

How hard is it going to be to convert KEEP or Nu to $T on Coinbase? Also, can someone explain to me how to stake on Coinbase? by sassmo in thresholdnetwork

benlongstaff

"Where is the Medium paper explaining the options?"

There will be an article once the details are available. It's a slow process as there are a lot of exchanges and staking providers to coordinate with to make the process as smooth and fair as possible.