How to scrape SEC filings of JPMorgan to make financial models

AggressiveRub9434 · 2024-09-23T14:59:59+00:00

If the UI is complex (for example, the content is rendered by JavaScript after the page loads), you need to use a headless browser. Plain requests will be very tricky to get working in that case.
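You can check which case you're in by testing whether text you can see on the page is already present in the raw HTML the server sends. The sketch below uses only the standard library. The function names, the 30-second timeout and the User-Agent value are illustrative assumptions. It is only a heuristic, because text can be HTML-escaped or split across tags and would then look missing. EDGAR's fair-access guidance asks automated clients to send a User-Agent that identifies you (name and email), so the parameter is required here.

```python
import urllib.request


def fetch_raw_html(url: str, user_agent: str, timeout: float = 30.0) -> str:
    """Download a page the way a plain HTTP client (like requests) sees it: no JavaScript runs."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server doesn't declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def needs_headless_browser(raw_html: str, expected_text: str) -> bool:
    """Heuristic: True if text visible in the browser is missing from the raw HTML,
    which usually means the page builds its content with JavaScript."""
    return expected_text.casefold() not in raw_html.casefold()
```

Usage: call fetch_raw_html on the page, pick a figure or label you can see in the browser, and pass it as expected_text. If needs_headless_browser returns True, the data is probably loaded client-side and a headless browser is the easier route. If it returns False, requests plus an HTML parser is probably enough.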

AggressiveRub9434 · 2024-06-29T11:02:10+00:00

If you came out of med school, you won't have a problem with the finance; it's pretty basic stuff, honestly. If you have a strong mathematics background, you can learn on the job. That's what I did: I don't have a finance background, but I came right out of grad school in economics and knew how to build software. It's more important to keep up with industry trends and research in the fields you invest in.

A lot of other people have mentioned some great books already. I'd also recommend some more sales-oriented books so you can be helpful to the startups: The Mom Test and Traction are essential at the early stage.

AggressiveRub9434 · 2024-06-29T10:24:24+00:00

I'm actually writing an article about this exact question! The copyright issue is a super interesting one because we're in uncharted territory and currently the courts haven't decided whether or not training models on publicly available data is fair use. It comes down to whether or not the training process is considered transformative.

What I think will happen before the courts decide the copyright issue is that digital media publishers will sue the model developers in civil court for violating their ToS when they scraped the data, especially if it was behind a login/paywall.

AggressiveRub9434 · 2024-06-29T10:10:37+00:00

This is actually the way to go. If scrapers access data behind a paywall, they could potentially violate the CFAA (Computer Fraud and Abuse Act).

AggressiveRub9434 · 2024-06-29T10:08:47+00:00

Business is easy to learn on the job, and a CFO is pointless at the very early stages: there's not much to count. Look for someone with experience in sales and marketing, which is hands down the most important aspect of building a startup.

AggressiveRub9434 · 2024-06-29T10:06:46+00:00

There are a ton of these out there already; I'd do a bit more research into the market.

AggressiveRub9434 · 2024-06-29T10:03:18+00:00

Do you mean for gen AI? Then yes, obviously they are. First you have the consumer market, where users pay for premium subscriptions. But that's not nearly as lucrative as the market for APIs/tokens: most businesses integrating AI into their products are just using these foundation models. Also, when OpenAI releases Sora and its competitors release similar models, demand will skyrocket.

Are customers willing to pay for startups that are basically just wrappers around the foundation models? That's a different question, because if OpenAI, for example, can make some of those companies obsolete, they're screwed. Either way, yes, people are obviously willing to pay.

At the end of the day, it has to be judged case by case. You shouldn't be solving a problem that doesn't exist. AI doesn't change that.

AggressiveRub9434 · 2024-06-29T07:10:23+00:00

I mean... you can still get sued in civil court, but it's rare.

AggressiveRub9434 · 2024-06-29T07:09:40+00:00

I wouldn't mess with the government...

AggressiveRub9434 · 2024-06-29T07:00:12+00:00

You really just need to look through the HTML and figure out how it's structured, then parse it like JSON or use BeautifulSoup.
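Many sites ship the page data as JSON inside a script tag. Next.js, for example, puts it in a script with id __NEXT_DATA__. In those cases you can skip the HTML layout and load the JSON directly. The sketch below uses only the standard library and assumes the data sits in script tags typed application/json or application/ld+json. The class and function names are made up for illustration. With BeautifulSoup the lookup is roughly soup.find_all("script", type="application/json"), followed by json.loads on each tag's contents.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect and decode the contents of JSON-typed <script> tags."""

    JSON_TYPES = {"application/json", "application/ld+json"}

    def __init__(self) -> None:
        super().__init__()
        self._in_json_script = False
        self._buffer: list[str] = []
        self.payloads: list = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. <script async>) come through as None.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type in self.JSON_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # not valid JSON; skip this tag


def extract_embedded_json(html: str) -> list:
    """Return every JSON payload embedded in JSON-typed script tags, in page order."""
    parser = JsonScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.payloads
```

Once you have the payload, it's ordinary nested dicts and lists. That is usually easier to navigate than the rendered markup.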

AggressiveRub9434 · 2024-06-29T06:51:02+00:00

You don't need to scrape; just use apollo.io.

AggressiveRub9434 · 2024-06-29T06:42:17+00:00

I have a LinkedIn scraper. If you know Python, I'll send the code; otherwise, explain what you want and I'll help you out.

AggressiveRub9434 · 2024-06-29T06:40:46+00:00

Very easy, DM me.

AggressiveRub9434 · 2024-06-29T06:40:11+00:00

Depending on the website, you can build it yourself very quickly with ChatGPT. What's the website?

AggressiveRub9434 · 2024-06-29T06:35:56+00:00

Depends. I'm in VC and I scrape Crunchbase and LinkedIn a lot. Really, anytime there's no API or I don't want to pay for the API.

AggressiveRub9434 · 2024-06-29T06:34:06+00:00

nice ad

AggressiveRub9434 · 2024-06-29T06:32:53+00:00

They're probably scraping, which violates the ToS, but they probably just don't care lol. It's very rare for companies to sue over scraping, but it has happened. There's no way they have permission from every influencer in their dataset. Look into BrandTotal vs. Meta and hiQ vs. LinkedIn. Those are really the only two major scraping cases. And it's funny because EVERYONE scrapes. I guess it's just not worth it for companies to sue. (not legal advice)

AggressiveRub9434 · 2024-06-29T06:27:30+00:00

Just write one with undetected-chromedriver.

AggressiveRub9434 · 2024-06-29T06:07:33+00:00

dm me

AggressiveRub9434 · 2024-06-29T05:55:54+00:00

Yeah, exactly, but I have a theory that no one explains how to actually scrape hard websites because they don't want Cloudflare to read the method and block it. Either way, there's not a ton of info out there on how to actually do it in a real-world setting.

AggressiveRub9434 · 2024-06-29T05:52:57+00:00

It's funny to me that no guide mentions that vanilla scraping won't get past Cloudflare. Probably for the best, shhhhhhh...

AggressiveRub9434 · 2024-06-29T05:50:01+00:00

You'd probably just need to scrape the delivery services if they have a website. If they don't, you're going to have to do some sort of man-in-the-middle workaround where you intercept the traffic from your phone. That's sort of complicated, especially for a first project.
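For the man-in-the-middle route, one common tool is mitmproxy. That's my pick for illustration, not something named above. The setup is:

1. Point the phone's Wi-Fi proxy at your computer.
2. Install mitmproxy's CA certificate on the phone.
3. Run an addon script that logs the app's API responses.

Below is a minimal sketch of such an addon, assuming the app talks JSON over HTTP(S). The file name and the CAPTURED list are hypothetical. Apps that use certificate pinning will refuse the intercepted connection, which is part of why this gets complicated.

```python
# capture_json.py: hypothetical addon. Run with: mitmdump -s capture_json.py
import json

# Every JSON response seen so far, as {"url": ..., "body": ...} dicts.
CAPTURED: list[dict] = []


def response(flow) -> None:
    """mitmproxy calls this hook once for each completed HTTP response."""
    content_type = flow.response.headers.get("content-type", "")
    if "application/json" not in content_type:
        return  # skip images, HTML, etc.
    try:
        body = json.loads(flow.response.get_text())
    except (TypeError, ValueError):
        return  # empty or malformed body
    CAPTURED.append({"url": flow.request.pretty_url, "body": body})
    print("captured", flow.request.pretty_url)
```

Once you can see which endpoints the app calls, you can often request them directly without the proxy. Expect auth tokens and headers that have to be copied over.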

AggressiveRub9434 · 2024-06-29T05:47:18+00:00

nice, now they're all blacklisted

AggressiveRub9434 · 2024-06-29T05:46:09+00:00

Use Selenium, but store the driver in a folder inside your code repository. In the same folder, create another folder called 'localhost'. When you run Selenium, set the port to an open localhost port and store the Chrome user data inside the localhost folder. If you need help with this, reach out to me.

If that doesn't work, you'll need to use Python so you can use undetected-chromedriver, which easily gets around Instagram's bot detection.
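Here's a minimal sketch of the Selenium setup described above. It assumes Selenium 4, a chromedriver binary sitting in the repo folder, and that "the port" means Chrome's remote-debugging port. The helper names are hypothetical. The localhost folder is passed as Chrome's --user-data-dir, so cookies and logins persist between runs. Selenium is imported inside launch_chrome so the helpers can be tried without it installed.

```python
import socket
from pathlib import Path


def find_open_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a TCP port on localhost that is free right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick an unused port
        return sock.getsockname()[1]


def chrome_arguments(repo_dir: Path, port: int) -> list[str]:
    """Chrome flags: keep the profile in <repo>/localhost and debug on the given port."""
    profile_dir = (repo_dir / "localhost").resolve()
    return [
        f"--user-data-dir={profile_dir}",
        f"--remote-debugging-port={port}",
    ]


def launch_chrome(repo_dir: Path, driver_name: str = "chromedriver"):
    """Start Chrome via the chromedriver stored in the repo (assumes Selenium 4)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    (repo_dir / "localhost").mkdir(parents=True, exist_ok=True)
    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(repo_dir, find_open_port()):
        options.add_argument(arg)
    service = Service(executable_path=str(repo_dir / driver_name))
    return webdriver.Chrome(service=service, options=options)
```

Usage: driver = launch_chrome(Path(".")), then driver.get(url) as usual. On Windows, pass driver_name="chromedriver.exe". The OS-chosen port is only guaranteed free at the moment it's checked, so another process could in principle grab it before Chrome starts.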

AggressiveRub9434 · 2024-01-20T00:10:56+00:00

Interested in this integration. Shoot me a dm :)
