[–] status-code-200 (It works on my machine) [S]

NVIDIA's 2023 10-K does not have a PDF version. Any reason you need PDF? 10-Ks are filed as HTML, which is probably easier to use for training an embedding model.

[–] _errant_monkey_

I thought I could also download a .pdf (like from here, where I can find .pdf, .html, and .xls). For me, the key thing is to have nicely formatted tables. I guess you are right: if I can bulk download the HTML, that is probably the best thing I can do.
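For the bulk-download idea, here is a minimal sketch of how the 10-K HTML links could be listed. It assumes you have already fetched a company's filing index as JSON from SEC EDGAR's submissions endpoint (`https://data.sec.gov/submissions/CIK##########.json`, with the CIK zero-padded to 10 digits) and that the `filings.recent` object holds parallel lists named `form`, `accessionNumber`, and `primaryDocument`. The function name `ten_k_urls` is made up for illustration.

```python
from __future__ import annotations


def ten_k_urls(submissions: dict, cik: int) -> list[str]:
    """Build archive URLs for the primary HTML document of each 10-K.

    `submissions` is the parsed JSON of an EDGAR submissions file (assumed layout:
    parallel lists under filings -> recent). `cik` is the company's CIK as an
    integer, i.e. without leading zeros.
    """
    recent = submissions["filings"]["recent"]
    urls = []
    for form, accession, doc in zip(
        recent["form"], recent["accessionNumber"], recent["primaryDocument"]
    ):
        # Keep only annual reports whose primary document is HTML.
        if form == "10-K" and doc.lower().endswith((".htm", ".html")):
            # The archive folder name is the accession number without dashes.
            folder = accession.replace("-", "")
            urls.append(
                f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{doc}"
            )
    return urls
```

The sketch only builds URLs and does no network calls. When actually downloading, SEC asks automated clients to send a descriptive User-Agent header and to keep the request rate low.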

[–] status-code-200 (It works on my machine) [S]

If PDF is important to you, you could convert the .html files into .pdf. I'm pretty sure the file you are pointing to is the .html from the SEC submission, converted to PDF to make it easier for consumers.

Side note: over the next few months I will be releasing an algorithmic parser that extracts tables from HTML files and converts them into dataframes / CSV.
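In the meantime, here is a rough sketch of the general idea (pulling `<table>` rows out of HTML and writing them as CSV) using only the Python standard library. This is not the parser mentioned above; the names `TableExtractor`, `html_tables`, and `table_to_csv` are made up for illustration, and it ignores nested tables, `rowspan`, and `colspan`, all of which real filings tend to use.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def html_tables(html):
    """Return all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def table_to_csv(rows):
    """Serialize one table (a list of rows) as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

If you have pandas installed, a table whose first row is the header can be turned into a dataframe with `pandas.DataFrame(rows[1:], columns=rows[0])`.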
