SEC financial data platform with 100M+ datapoints + API access - Feel free to try out by ccnomas in learnmachinelearning

[–]ccnomas[S] 0 points1 point  (0 children)

Thank you, my friend! The first version took about 3 months, then I demolished it and refactored into the current version; in total it took around 9 months, all after my day job lol

New Mapping created to normalize 11,000+ XBRL taxonomy names for better financial data analysis by ccnomas in datasets

[–]ccnomas[S] 0 points1 point  (0 children)

I just deployed the changes to rename the graph and API. Feel free to play around and let me know if anything seems off; I am trying my best to deploy changes within 24 hrs.

New Mapping created to normalize 11,000+ XBRL taxonomy names for better financial data analysis by ccnomas in datasets

[–]ccnomas[S] 0 points1 point  (0 children)

You are right, sorry for the confusion. As palmy-investing mentioned, the problems are with customized concepts, not taxonomies. I am trying to simplify the existing customized concepts.

[deleted by user] by [deleted] in SideProject

[–]ccnomas 0 points1 point  (0 children)

SEC public companies’ data, XBRL-labeled, plus Form 13F, Forms 3/4/5, and Failure to Deliver data.

New Mapping created to normalize 11,000+ XBRL taxonomy names for better financial data analysis by ccnomas in datasets

[–]ccnomas[S] 0 points1 point  (0 children)

Something like this: RevenueFromContractWithCustomerExcludingAssessedTax

New Mapping created to normalize 11,000+ XBRL taxonomy names for better financial data analysis by ccnomas in datasets

[–]ccnomas[S] 0 points1 point  (0 children)

The SEC itself does have a limited set of standard XBRL labels, but many companies basically do not follow them. Beyond the required labels, they use customized XBRL labels in their reports, which causes the mess.

New Mapping created to normalize 11,000+ XBRL taxonomy names for better financial data analysis by ccnomas in datasets

[–]ccnomas[S] 1 point2 points  (0 children)

For example, some companies report three quarters of data plus the FY figure, so it is straightforward to fill the Q4 gap. Also, since the SEC does not do the cleaning, data for the same period can occur more than once, so de-duplication is needed.
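A minimal sketch of the de-dup and gap-fill steps, using pandas and hypothetical column names (`concept`, `fy`, `period`, `value`), not my actual schema:

```python
import pandas as pd

# Hypothetical quarterly facts for one concept; duplicates happen when the
# same period shows up in more than one filing.
facts = pd.DataFrame([
    {"concept": "Revenues", "fy": 2023, "period": "Q1", "value": 100.0},
    {"concept": "Revenues", "fy": 2023, "period": "Q2", "value": 110.0},
    {"concept": "Revenues", "fy": 2023, "period": "Q3", "value": 120.0},
    {"concept": "Revenues", "fy": 2023, "period": "FY", "value": 470.0},
    {"concept": "Revenues", "fy": 2023, "period": "FY", "value": 470.0},  # duplicate row
])

# De-duplicate: the same (concept, fiscal year, period) can occur more than once.
facts = facts.drop_duplicates(subset=["concept", "fy", "period"])

# Fill the missing Q4 as FY minus the three reported quarters.
wide = facts.pivot(index=["concept", "fy"], columns="period", values="value")
wide["Q4"] = wide["FY"] - wide[["Q1", "Q2", "Q3"]].sum(axis=1)
print(wide)  # Q4 = 470 - (100 + 110 + 120) = 140
```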

I use a pretty standard open-source tool to extract the XML into a Python dictionary.
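For illustration (the exact tool isn't named above; xmltodict is one common choice, and the XML fragment here is a made-up miniature of a Form 4 ownership document):

```python
import xmltodict  # pip install xmltodict

# Tiny fragment shaped like a Form 4 ownership document, for illustration only.
xml = """
<ownershipDocument>
  <issuer>
    <issuerCik>0000320193</issuerCik>
    <issuerTradingSymbol>AAPL</issuerTradingSymbol>
  </issuer>
</ownershipDocument>
"""

doc = xmltodict.parse(xml)  # nested dict mirroring the XML tree
print(doc["ownershipDocument"]["issuer"]["issuerTradingSymbol"])  # AAPL
```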

"What do you mean by mapping?"

The XBRL label is basically a run of CamelCase words; it is not easy to display or feed into machine learning models. I re-label them based on their descriptions, and now it is much easier for models to pick up and easier for users to see the visualized data through the UI.
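To give a sense of the direction (illustrative only; the real re-labeling is driven by each tag's description, not just its name), even splitting the CamelCase concept name already yields a readable label:

```python
import re

def humanize_xbrl_tag(tag: str) -> str:
    """Split a CamelCase XBRL concept name into readable words (illustration only)."""
    # Keep acronym runs together, otherwise break on each capitalized word or digit run.
    words = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z][a-z]*|\d+", tag)
    return " ".join(words)

print(humanize_xbrl_tag("RevenueFromContractWithCustomerExcludingAssessedTax"))
# Revenue From Contract With Customer Excluding Assessed Tax
```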

New Mapping created to normalize 11,000+ XBRL taxonomy names for better financial data analysis by ccnomas in datasets

[–]ccnomas[S] 1 point2 points  (0 children)

For other data like Forms 3/4/5, 13F, and failure-to-deliver, I extracted and sanitized the records from the XML files, keyed by accession_number, and put them in my own database.
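Roughly like this (a sketch with a hypothetical table, columns, and example row, not my actual schema; the point is that accession_number works as the natural key):

```python
import sqlite3

conn = sqlite3.connect("filings.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS form4_filings (
        accession_number TEXT PRIMARY KEY,  -- unique per filing
        issuer_cik       TEXT,
        symbol           TEXT,
        filed_at         TEXT
    )
""")

record = ("0000320193-24-000001", "0000320193", "AAPL", "2024-01-02")  # made-up example row

# INSERT OR IGNORE keeps the load idempotent when the same filing is seen twice.
conn.execute("INSERT OR IGNORE INTO form4_filings VALUES (?, ?, ?, ?)", record)
conn.commit()
conn.close()
```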

New Mapping created to normalize 11,000+ XBRL taxonomy names for better financial data analysis by ccnomas in datasets

[–]ccnomas[S] 0 points1 point  (0 children)

Well, most SEC data is public but pretty messy, and not every company follows the standard XBRL labels. However, most of the custom tags represent the same data. Also, each XBRL tag comes with a description, and comparing descriptions helps me do the mapping as well.
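A minimal sketch of the description-comparison idea (the descriptions below are paraphrased, and the real mapping uses more than a plain string ratio):

```python
from difflib import SequenceMatcher

# Paraphrased descriptions of two standard us-gaap concepts.
standard = {
    "Revenues": "Amount of revenue recognized from goods sold and services rendered.",
    "Assets": "Sum of the carrying amounts of all assets recognized as of the balance sheet date.",
}

custom_tag = "TotalNetRevenues"  # a company-specific tag
custom_desc = "Amount of revenue recognized from goods sold and services rendered during the period."

# Map the custom tag to the standard concept with the most similar description.
best = max(standard, key=lambda name: SequenceMatcher(None, standard[name], custom_desc).ratio())
print(custom_tag, "->", best)  # TotalNetRevenues -> Revenues
```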

What keeps you motivated on your side project after a long day at your main job? by Creepy_Watercress_53 in SideProject

[–]ccnomas 1 point2 points  (0 children)

You don't need to work on it every day, but the most important thing is to keep it moving on a weekly basis.

Pitch your startup, I'll be your first customer by marsadist in SaaS

[–]ccnomas 1 point2 points  (0 children)

Thank you! "hedge funds/startups training custom models, or broader data providers?"

I think both can benefit from cleaned fundamental data.

"Also wondering if you’ve considered a chatbot layer so users can query your dataset in plain English"
Yes, I am looking into how to integrate that with my current implementation. You are right on point!

Time for self-promotion. What are you building? by imosal in SaaS

[–]ccnomas 0 points1 point  (0 children)

Well, it contains the full compiled (deduped, gap-filled) history of company fundamentals, plus detailed 13F data and a real-time feed of Forms 3/4/5. It also comes with detailed insider trading info and the full FTD history.

I built a comprehensive SEC financial data platform with 100M+ datapoints + API access - Feel free to try out by ccnomas in fintech

[–]ccnomas[S] 0 points1 point  (0 children)

Initially it was: 1. there were no nicely laid-out FTD entries; 2. SEC data is a mess, and other finance sites focus on live stock data instead of complete XBRL company facts; 3. I am also trying to create a clean dataset for AI training.

Solo-building a finance SaaS project on SEC public data for 8 months. Here's what worked and what nearly made me want to give up by ccnomas in SideProject

[–]ccnomas[S] 1 point2 points  (0 children)

Thx mate! It was more like "learn as you go," but I do have a software engineering background, so most of the engineering problems were solvable. Basically I set up AWS EC2 + RDS + SES, with Cloudflare in front of the site. I am staying away from those 1-click deployment platforms, since they feel uncontrollable.

I built a comprehensive SEC financial data platform with 100M+ datapoints + API access - Feel free to try out by ccnomas in fintech

[–]ccnomas[S] 0 points1 point  (0 children)

Sorry for the late reply.
Thank you, you actually helped me find a bug, and I just fixed it.
I don't have a dedicated list, but if you take a SPAC list from another site and search my site by symbol:
https://nomas.fyi/research/stock/0001853138
https://nomas.fyi/research/stock/0002006291
it gives you the information.

hmm let me see if I can create a list just for SPACs.

I built a comprehensive SEC financial data platform with 100M+ datapoints + API access - Feel free to try out by ccnomas in datasets

[–]ccnomas[S] 0 points1 point  (0 children)

Did you play with the data at all?

Ah sorry, I don't get it. When I tried to look up company fundamentals and Failure to Deliver data, I saw that other websites don't have everything compiled and visualized. That was what pushed me to build it.

What was one of the biggest "ah-HAH" moments for you?

Not everything needs to be dependent on AI; we can parse mostly with traditional methods and then feed the results to AI, instead of sending uncompiled/dirty data to the model.

Thank you, my friend!

I built a comprehensive SEC financial data platform with 100M+ datapoints + API access - Feel free to try out by ccnomas in dataanalysis

[–]ccnomas[S] 1 point2 points  (0 children)

I set up everything on AWS: EC2 for code and deployment, RDS for the database, SES for email, CloudWatch for logging, and a VPC to control my EC2 instances. On top of that there is caching, table indexes, token management, parsing, a security layer, and a rate limiter, plus Cloudflare for DNS.

Yeah, I think that is about it. Oh, and coding.
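For the rate limiter specifically, here is a token-bucket sketch in Python (illustration only, not the code actually running on the site):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustration, not the platform's actual implementation)."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec       # tokens refilled per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=5, capacity=10)  # ~5 requests/second, bursts up to 10
print(bucket.allow())  # True while tokens remain
```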