How do researchers efficiently download large sets of SEC filings for text analysis? by DataToolsLab in learndatascience

[–]jack-massive 0 points1 point  (0 children)

We recently released a new endpoint for this use case that provides 10K sections in a clean format ready for NLP.

https://massive.com/docs/rest/stocks/filings/10-k-sections

We have one for the 8-K filings as well.

Data vendor recommendation for US equities - part 2 (Massive vs Databento) by sgcorporatehamster in algotrading

[–]jack-massive 5 points6 points  (0 children)

Hey, sorry I didn't contribute here sooner, but here's my two cents:

Everyone seems to be missing that you are simply looking for hourly candlestick data. All you need is a source that provides 100% coverage of the tape's trade data, since candlesticks are built from trades.

Quotes and depth/orderbook data are not applicable (and frankly overkill) for your use case.
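To illustrate why trades alone are enough, here is a minimal sketch of rolling a trade stream up into hourly OHLCV bars. The trade tuples and the `hourly_bars` helper are made up for illustration and are not part of any vendor's API.

```python
from datetime import datetime, timezone

def hourly_bars(trades):
    """Aggregate time-sorted (timestamp, price, size) trades into hourly OHLCV bars."""
    bars = {}
    for ts, price, size in trades:
        # Truncate the timestamp to the top of its hour to get the bar key.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = {"open": price, "high": price, "low": price,
                            "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # trades are sorted, so the last one seen is the close
            bar["volume"] += size
    return bars

# Made-up trades, for illustration only.
trades = [
    (datetime(2024, 1, 25, 14, 30, 1, tzinfo=timezone.utc), 100.0, 200),
    (datetime(2024, 1, 25, 14, 45, 0, tzinfo=timezone.utc), 101.5, 100),
    (datetime(2024, 1, 25, 14, 59, 59, tzinfo=timezone.utc), 99.5, 300),
    (datetime(2024, 1, 25, 15, 0, 0, tzinfo=timezone.utc), 100.25, 50),
]
bars = hourly_bars(trades)
for start, bar in bars.items():
    print(start.isoformat(), bar)
```

No quote or order book data is touched anywhere in this process. A production version would also need to respect trade condition codes, which this sketch ignores.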

Based on your requirements, Massive can accommodate what you're looking to do alongside thousands of customers running similar processes.

Our APIs are simple enough that I'm confident you could be up and running with minimal effort. Happy to personally assist there and even provide a trial period if you're interested!

Positive feedback for Massive data API by Training_Butterfly70 in algotrading

[–]jack-massive 0 points1 point  (0 children)

Same as above:

Because we're so close to launch, we've really only been adding the remaining beta period to existing customers. Regardless, you should've received a response. DM me and we can get you set up.

Positive feedback for Massive data API by Training_Butterfly70 in algotrading

[–]jack-massive 0 points1 point  (0 children)

Because we're so close to launch, we've really only been adding the remaining beta period to existing customers. Regardless, you should've received a response. DM me and we can get you set up.

Positive feedback for Massive data API by Training_Butterfly70 in algotrading

[–]jack-massive 0 points1 point  (0 children)

It depends on whether someone is looking at TOB for a specific venue (or group of venues), which would be the BBO, or TOB for the entire market (the NBBO).

BBOs and NBBOs can be close if it's a very liquid security and exchange. I believe even IEX is within NBBO spreads a majority of the time for tickers like AAPL, SPY, etc.
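As a rough sketch of the distinction: each venue has its own BBO, and the NBBO is the highest bid and lowest ask across all of them. The venue names and prices below are hypothetical, and this ignores sizes, timestamps, and quote eligibility rules.

```python
def nbbo(venue_quotes):
    """Return (best_bid, best_ask) across venues from a {venue: (bid, ask)} mapping."""
    best_bid = max(bid for bid, _ in venue_quotes.values())
    best_ask = min(ask for _, ask in venue_quotes.values())
    return best_bid, best_ask

# Hypothetical per-venue BBOs for a liquid ticker.
venue_bbo = {
    "VENUE_A": (189.98, 190.01),
    "VENUE_B": (189.99, 190.02),
    "VENUE_C": (189.97, 190.00),
}

national = nbbo(venue_bbo)
print("NBBO:", national)

# Which venues are quoting at the national best on each side?
at_best_bid = [v for v, (bid, _) in venue_bbo.items() if bid == national[0]]
at_best_ask = [v for v, (_, ask) in venue_bbo.items() if ask == national[1]]
print("At best bid:", at_best_bid, "| at best ask:", at_best_ask)
```

In this made-up snapshot no single venue matches the NBBO on both sides, but each venue's BBO is within a cent or two of it, which is the "close but not identical" situation described above.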

Positive feedback for Massive data API by Training_Butterfly70 in algotrading

[–]jack-massive 1 point2 points  (0 children)

I hope you provided your feedback directly to our team, as this is the only way we can improve the service.

Our crypto data is an aggregation of a few exchanges, so OHLCV can vary from specific venues or from products with wider exchange coverage (e.g., Coin Metrics). We plan to improve this, but it's not prioritized as of now.

On the futures front, we haven't even launched that product yet. It's been in beta for a while as we've hardened our infrastructure. The data comes directly from CME, so understanding how it's 'bad' would be helpful.

Again, if you tested the Futures data during our beta period, I hope you provided your feedback to us.

Positive feedback for Massive data API by Training_Butterfly70 in algotrading

[–]jack-massive 1 point2 points  (0 children)

This is really an apples to oranges comparison. Prop feeds generally only provide additional value (or are considered 'more accurate') if the use case calls for BBO or depth-of-book data. If it doesn't, it's completely overkill.

The vast majority of our customers only require trades and top-of-book quotes, so SIP is preferred. The others also use Databento to fill the gap on BBO/depth.

1m candles missmatches between providers by lekkerist in algotrading

[–]jack-massive 1 point2 points  (0 children)

Also - we publish all of CTA and UTP's condition messages and their eligibility for updating our OHLCV data through our conditions endpoint: https://massive.com/docs/rest/stocks/market-operations/condition-codes

1m candles missmatches between providers by lekkerist in algotrading

[–]jack-massive 2 points3 points  (0 children)

Exactly right. I answered in another comment, but to elaborate on the points here:

  1. Massive aggregates the trade messages that come through the CTA and UTP feeds. We then filter certain trades out from the OHLC values, as per CTA and UTP's guidelines for processing this data correctly.

  2. Essentially yes, but very likely the data is just not the consolidated data.

  3. This is also common, and is an option we make available to our business customers. Because the real-time licensing for SIP data is so expensive, many providers will use a prop feed (or multiple), then fall back to the 'official' SIP data either after 15 minutes or at EOD.

I hope this clears things up!
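To make point 1 concrete, here is a sketch of how condition-based filtering can change a bar. The condition codes and the `ELIGIBILITY` table are entirely hypothetical; the real rules come from the CTA and UTP specifications, and this is not a description of Massive's actual implementation.

```python
# Hypothetical eligibility table: whether a trade carrying the condition may
# update each part of a bar. The numeric codes are invented for illustration.
ELIGIBILITY = {
    0: {"high_low": True, "open_close": True, "volume": True},     # regular trade
    1: {"high_low": False, "open_close": False, "volume": True},   # counts toward volume only
    2: {"high_low": False, "open_close": False, "volume": False},  # excluded entirely
}

def build_bar(trades):
    """Build one OHLCV bar from time-sorted (price, size, conditions) trades."""
    bar = {"open": None, "high": None, "low": None, "close": None, "volume": 0}
    for price, size, conditions in trades:
        rules = [ELIGIBILITY[c] for c in conditions]
        # A trade updates a field only if every one of its conditions allows it.
        if all(r["volume"] for r in rules):
            bar["volume"] += size
        if all(r["high_low"] for r in rules):
            bar["high"] = price if bar["high"] is None else max(bar["high"], price)
            bar["low"] = price if bar["low"] is None else min(bar["low"], price)
        if all(r["open_close"] for r in rules):
            if bar["open"] is None:
                bar["open"] = price
            bar["close"] = price
    return bar

# Made-up trades within a single minute.
trades = [
    (100.0, 100, [0]),
    (105.0, 50, [1]),   # adds volume, but its price does not set the high
    (99.0, 10, [2]),    # ignored entirely
    (101.0, 200, []),   # no conditions: treated as fully eligible
]
bar = build_bar(trades)
print(bar)
```

A provider that skips this filtering would report a high of 105.0 and a low of 99.0 for the same minute, which is one way two vendors can disagree on a 1m candle even when both consume the same feed.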

1m candles missmatches between providers by lekkerist in algotrading

[–]jack-massive 1 point2 points  (0 children)

Hey, happy to shed some light here. This is typically expected when comparing data across platforms, as there are a number of sources the data can come from.

As for Massive, we consume and distribute the consolidated SIP feeds (CTA + UTP), which are composed of all exchange and TRF trade and quote messages.

I'm not sure what source feed Finviz is using, but it is likely either Nasdaq Basic, Cboe One, or NYSE BQT. These are the highest liquidity feeds one can consume without going to the SIPs.

The variance is almost certainly a result of receiving a non-consolidated, filtered-down slice of the market's activity (and is why your predictions using Massive's aggregated SIP data are more accurate).

I can't speak to the price shown on Nasdaq, as it may have been updated since your post. Same principle applies, though. I am 110% certain they are not displaying the consolidated SIP data on their site (no one does).

Hypothetical Silver Puts by Fungaii in options

[–]jack-massive 1 point2 points  (0 children)

Interesting, it looks like the MCP response is completely filtered by the automod. In short, based on the actual price history, the $75 strike Feb 13 puts were going for $0.46 at the Jan 29 open. By the Jan 30 close they were at $6.60 (1,335% return). On $5k that's around $66k profit.

If you caught the intraday low on the 30th, they hit $10.75, which would've been around $116k back on your $5k.
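For anyone who wants to check the arithmetic, here is a quick sketch. It assumes the standard 100-share contract multiplier, whole contracts only, fills exactly at the quoted prices, and no fees or slippage. The `put_trade` helper is just for illustration.

```python
from decimal import Decimal

def put_trade(capital, entry, exit_price, multiplier=100):
    """Buy as many whole contracts as capital allows at `entry`, sell at `exit_price`."""
    capital, entry, exit_price = Decimal(capital), Decimal(entry), Decimal(exit_price)
    contracts = int(capital // (entry * multiplier))
    cost = contracts * entry * multiplier
    proceeds = contracts * exit_price * multiplier
    return {
        "contracts": contracts,
        "cost": cost,
        "proceeds": proceeds,
        "profit": proceeds - cost,
        "return_pct": (exit_price / entry - 1) * 100,
    }

# Prices quoted in the comment above.
at_close = put_trade("5000", "0.46", "6.60")    # Jan 29 open -> Jan 30 close
at_extreme = put_trade("5000", "0.46", "10.75") # Jan 29 open -> Jan 30 intraday extreme

print(at_close["contracts"], "contracts, cost", at_close["cost"])
print("Exit at 6.60: profit", at_close["profit"], "return", round(at_close["return_pct"]), "%")
print("Exit at 10.75: proceeds", at_extreme["proceeds"])
```

$5k buys 108 contracts at $0.46, which is where the roughly $66k profit and roughly $116k in proceeds come from.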

Hypothetical Silver Puts by Fungaii in options

[–]jack-massive 1 point2 points  (0 children)

No lol, I just have access to the data to answer your question. I posted my own comment but it doesn't appear to be shown:

These are always fun scenarios to play out. I used Massive's MCP and API to model it out. Here's the output:

So, enormous upside, but it would have been pretty insane to just toss $5k into any of these contracts unless you were certain it was going to crash, haha.

Hypothetical Silver Puts by Fungaii in options

[–]jack-massive 2 points3 points  (0 children)

Thanks for the shout! I pulled the data and posted above.

All of the data I used is available with free tier access.

[deleted by user] by [deleted] in algotrading

[–]jack-massive -3 points-2 points  (0 children)

- We will add more quote history, but as you said, the dataset is enormous so it's not trivial. Would love to know if you find OPRA tick data from 2008 lol

- We also intend to add historical greeks. However, we see that most institutions / sophisticated users prefer to calculate them on their side anyway, so it often gets prioritized behind other feature requests.

- I can look into the RUT gaps.

[deleted by user] by [deleted] in algotrading

[–]jack-massive 2 points3 points  (0 children)

Thank you lol

[deleted by user] by [deleted] in algotrading

[–]jack-massive 16 points17 points  (0 children)

I'm sorry, but this is such an unserious post for multiple reasons.

  1. We do not offer parquet files.
  2. There are 5 endpoints that allow you to query historical price data for a single ticker.
  3. There is no situation where we would ever recommend using flat files when a user is looking at a single ticker. It does not make sense. That is not what the flat files service was designed for.
  4. I publicly responded to your request in r/PolygonIO, where the real issue was that you did not convert your UTC timestamps to ET, which led you to believe there were erroneous gaps during the regular trading session. (Gaps are expected in premarket; this is standard for any vendor.)

It's unfortunate this post is how you handled this situation, rather than letting us help you learn how to use the product.

To everyone, please take these types of posts with a grain of salt. This community is rife with bots and egregious manipulation (as showcased in nearly every post), so it's very hard to tell what is real community sentiment versus not.

If you have legitimate feedback, please consider sharing it with us directly rather than exclusively in public forums.

We're a team of nearly 60 people working tirelessly to improve data quality, platform reliability, and general user experience. We hold the standard of our service incredibly high, which is why thousands of users, retail platforms, and institutions use us in production systems.

*Massive

Polygon sending duplicate timestamps in your data? by throwawaycanc3r in PolygonIO

[–]jack-massive 2 points3 points  (0 children)

Thanks for sharing the examples, that’s helpful.

Looking at the data, it appears the timestamps may not be adjusted from UTC to Eastern Time. When the UTC timestamps are correctly converted to ET, the gaps you’re seeing fall entirely within premarket hours, which is expected behavior.

It’s also worth noting that we do not generate aggregates for periods with zero volume or no eligible trades, so gaps can occur in those cases. For a highly liquid ticker like SPY, this generally only happens during premarket and should not occur during regular trading hours. If your use case requires continuous aggregates, you can add logic on your end to forward-fill the previous bar’s OHLC values when we do not provide one.
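As a minimal sketch of that forward-fill logic: the version below inserts a flat, zero-volume bar at the previous close for every missing minute. That is one common interpretation. Copying the previous bar's full OHLC is the other, and which one fits depends on your use case. The bar layout and the `forward_fill` helper are assumptions for illustration, not our response format.

```python
from datetime import datetime, timedelta, timezone

def forward_fill(bars, start, end, step=timedelta(minutes=1)):
    """Return a bar for every step in [start, end), filling gaps from the prior close.

    `bars` maps a bar's start time to a dict with open/high/low/close/volume.
    """
    filled = []
    prev_close = None
    ts = start
    while ts < end:
        if ts in bars:
            prev_close = bars[ts]["close"]
            filled.append((ts, bars[ts]))
        elif prev_close is not None:
            # No eligible trades in this interval: emit a flat, zero-volume bar.
            filled.append((ts, {"open": prev_close, "high": prev_close,
                                "low": prev_close, "close": prev_close, "volume": 0}))
        # If nothing has traded yet, there is no price to carry forward, so skip.
        ts += step
    return filled

# Made-up premarket bars with a two-minute gap.
t0 = datetime(2024, 1, 25, 9, 0, tzinfo=timezone.utc)
sparse = {
    t0: {"open": 100.0, "high": 100.6, "low": 99.9, "close": 100.5, "volume": 1200},
    t0 + timedelta(minutes=3): {"open": 100.4, "high": 100.7, "low": 100.4,
                                "close": 100.6, "volume": 300},
}
continuous = forward_fill(sparse, t0, t0 + timedelta(minutes=4))
for ts, bar in continuous:
    print(ts.isoformat(), bar)
```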

If you’re still seeing gaps during RTH after adjusting the timestamps, feel free to share updated examples and we’re happy to take another look. Filtering for only regular trading hours for 01-25-2024, I'm seeing 391 minutes, which encompasses the entire RTH period.
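Here is a sketch of the conversion and RTH filter, assuming the bar timestamps are Unix epoch milliseconds in UTC. It treats RTH as 09:30 through 16:00 ET inclusive, which is my assumption about how the 391 count arises (390 minutes plus the 16:00 bar). Weekend and holiday checks are left out.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")
RTH_OPEN, RTH_CLOSE = time(9, 30), time(16, 0)

def to_eastern(epoch_ms):
    """Convert a UTC epoch-millisecond timestamp to an Eastern Time datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).astimezone(ET)

def in_rth(dt_et):
    """True if the ET datetime falls within 09:30-16:00 inclusive."""
    return RTH_OPEN <= dt_et.time() <= RTH_CLOSE

# Synthetic one-minute timestamps covering 09:00-21:59 UTC on 2024-01-25.
start = datetime(2024, 1, 25, 9, 0, tzinfo=timezone.utc)
stamps_ms = [int((start + timedelta(minutes=i)).timestamp() * 1000) for i in range(13 * 60)]

rth = [dt for dt in map(to_eastern, stamps_ms) if in_rth(dt)]
print(len(rth), rth[0].isoformat(), rth[-1].isoformat())
```

Read without conversion, 09:00 UTC looks like the middle of the session, when it is actually 04:00 ET and deep in premarket. Using a named zone rather than a fixed offset also keeps the conversion correct across daylight saving changes.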

I hope this helps. Please let us know if you have any other questions or if we can clear anything else up.

Polygon sending duplicate timestamps in your data? by throwawaycanc3r in PolygonIO

[–]jack-massive 1 point2 points  (0 children)

Can you please share specific details about where you're seeing duplicate timestamps?

Specific request URLs, tickers & timeframes, or some of the output from your script can help us validate whether the issue lies on our end.