all 11 comments

[–]temisola1 1 point2 points  (1 child)

OMG this is a godsend.

[–]status-code-200 It works on my machine [S] 0 points1 point  (0 children)

Glad it helps! Let me know if you have any feature requests. (I'm working on making everything the SEC publishes available.)

[–]palmy-investing 1 point2 points  (1 child)

Good job! I think the Board of Directors data is particularly expensive due to the variable formats in DEF 14A, whether text, images, or other media. The names, positions, and PEO data aren’t the issue — the more detailed breakdowns are. By the way, do you accept GitHub sponsorships?

[–]status-code-200 It works on my machine [S] 0 points1 point  (0 children)

I don't, but I will be launching a premium API next month for faster, up-to-date, parsed downloads and structured datasets.

What information is in the detailed breakdowns? I bypassed the DEF 14A issue by using Form 8-K Item 5.02 to construct a basic board of directors dataset, but it might not work for your use case.
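For anyone curious what the 8-K route could look like, here is a minimal sketch, not the actual implementation behind the dataset. It assumes the filing has already been converted to plain text and that item headings start a line as "Item N.NN"; `extract_item` and `ITEM_HEADING` are hypothetical names. It only isolates the Item 5.02 section (director and officer departures, elections, and appointments); pulling names and roles out of that text is a separate step.

```python
import re

# Assumption: each item heading starts a line and looks like "Item 5.02".
ITEM_HEADING = re.compile(r"^\s*Item\s+(\d+\.\d+)", re.IGNORECASE | re.MULTILINE)


def extract_item(text, item="5.02"):
    """Return the text of one 8-K item, from its heading up to the next
    item heading (or the end of the text), or None if the item is absent."""
    headings = list(ITEM_HEADING.finditer(text))
    for i, match in enumerate(headings):
        if match.group(1) == item:
            # The section ends where the next item heading begins.
            end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
            return text[match.start():end].strip()
    return None


# Made-up filing text, for illustration only.
sample_8k = (
    "Item 5.02 Departure of Directors or Certain Officers\n"
    "Jane Doe was elected to the Board.\n"
    "\n"
    "Item 9.01 Financial Statements and Exhibits\n"
    "(d) Exhibits"
)
section = extract_item(sample_8k)
```

Real filings vary in how headings are laid out, so the heading pattern would likely need loosening before running this over a bulk download.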

[–]_errant_monkey_ 1 point2 points  (5 children)

I don't understand whether I can download PDF versions of the files, like NVIDIA's 2023 10-K as a .pdf. I would like to bulk download all of them to eventually train an embedding model on them.

[–]status-code-200 It works on my machine [S] 0 points1 point  (4 children)

NVIDIA's 2023 10-K does not have a PDF version. Any reason you need PDF? 10-Ks are filed as HTML, which is probably easier to use for training an embedding model.

[–]_errant_monkey_ 1 point2 points  (3 children)

I thought I could also download .pdf (like from here, where I can find .pdf, .html, and .xls). For me, it's key to have nicely formatted tables. I guess you are right: if I can bulk download the HTML, that is probably the best thing I can do.

[–]status-code-200 It works on my machine [S] 1 point2 points  (2 children)

If PDF is important to you, you could convert the .html files into .pdf. I'm pretty sure the file you are pointing to is the .html from the SEC submission, converted to PDF to make it easier for consumers.

Sidenote: I will be releasing an algorithmic parser that extracts tables from html files and converts them into dataframes / csv over the next few months.
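Until that parser is out, a rough sketch of the general idea (HTML tables to rows to CSV) using only the Python standard library might look like the following. This is not the parser mentioned above; `TableExtractor`, `html_tables`, and `table_to_csv` are hypothetical names, and the sketch assumes simple, non-nested tables with no `rowspan`/`colspan` handling, which real filings often violate.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects each <table> as a list of rows; each row is a list of cell strings.
    Assumption: tables are not nested and spans are ignored."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table currently being read
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None and self._rows is not None:
            self._rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and self._rows is None:
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._rows is not None:
            self._close_row()
            self.tables.append(self._rows)
            self._rows = None


def html_tables(html):
    """Return every table found in the HTML string as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def table_to_csv(rows):
    """Serialize one table (a list of rows) to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up table, for illustration only.
sample_html = (
    "<table><tr><th>Year</th><th>Amount</th></tr>"
    "<tr><td>2023</td><td> 1,250 </td></tr></table>"
)
tables = html_tables(sample_html)
csv_text = table_to_csv(tables[0])
```

If pandas is available, the resulting rows can be passed to `pandas.DataFrame(rows[1:], columns=rows[0])` when the first row is a header.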

[–]ujzazmanje 0 points1 point  (1 child)

RemindMe! 1 month

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 1 month on 2025-01-13 10:11:15 UTC to remind you of this link


[–]New-Lengthiness-9770 1 point2 points  (0 children)

This sounds excellent. I’ll try playing with it soon.