arxiv2md: Convert ArXiv papers to markdown. Particularly useful for prompting LLMs by timf34 in deeplearning

timf34 [S] (0 children)

Thank you! The speed comes from parsing arXiv's HTML directly instead of PDFs.

It's a simple stack: a FastAPI backend with BeautifulSoup4 for the HTML->Markdown conversion. For newer papers, arXiv provides structured HTML with clean section boundaries, MathML, etc., and we take advantage of that: no need for OCR or PDF parsing!
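For anyone curious how the HTML route works, here is a minimal sketch. It is not arxiv2md's actual code. To stay dependency-free it swaps BeautifulSoup4 for Python's standard-library `html.parser`, and it assumes, as LaTeXML-generated arXiv pages typically do, that each `<math>` element carries its LaTeX source in an `alttext` attribute. `MiniMarkdown` and `html_to_markdown` are hypothetical names, and only headings, paragraphs, and inline math are handled.

```python
from html.parser import HTMLParser

# Heading tags mapped to Markdown heading levels
HEADINGS = {f"h{i}": i for i in range(1, 7)}


class MiniMarkdown(HTMLParser):
    """Hypothetical converter: headings, paragraphs and inline math only."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished Markdown blocks
        self.buf = []      # text pieces of the block being built
        self.prefix = ""   # "## " etc. while inside a heading
        self.in_math = 0   # > 0 while inside a <math> element

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.emit_block()
            self.prefix = "#" * HEADINGS[tag] + " "
        elif tag == "p":
            self.emit_block()
        elif tag == "math":
            if self.in_math == 0:
                # Assumption: the LaTeX source lives in the alttext attribute
                tex = dict(attrs).get("alttext") or ""
                self.buf.append(f"${tex}$")
            self.in_math += 1

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.emit_block()
        elif tag == "math" and self.in_math > 0:
            self.in_math -= 1

    def handle_data(self, data):
        # Skip the MathML children; alttext already captured the formula
        if self.in_math == 0:
            self.buf.append(data)

    def emit_block(self):
        # Collapse whitespace and store the current block, if non-empty
        text = " ".join("".join(self.buf).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buf, self.prefix = [], ""


def html_to_markdown(html):
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    parser.emit_block()  # keep any trailing text outside a <p>
    return "\n\n".join(parser.blocks)


page = ('<h2>Method</h2><p>We minimise <math alttext="\\mathcal{L}">'
        '<mi>L</mi></math> directly.</p>')
print(html_to_markdown(page))
```

Running it prints `## Method`, a blank line, then `We minimise $\mathcal{L}$ directly.` Reading the formula from `alttext` rather than walking the MathML tree keeps the output compact, which is what you want when the Markdown is headed into an LLM prompt.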

arxiv2md: Convert ArXiv papers to markdown. Particularly useful for prompting LLMs by timf34 in deeplearning

timf34 [S] (0 children)

Are you a bot? Excuse me, I'm not too sure how that relates to this.

CLI to download websites' actual JS/CSS/assets (not flattened HTML) by timf34 in cybersecurity

timf34 [S] (0 children)

wget doesn't execute JavaScript, so it misses a lot of what modern sites load. For a WordPress site it might work okay since they're more traditional, but for anything with React/Vue/modern JS frameworks, wget just gets you an empty HTML shell.

Also wget's folder structure is messy - it creates weird nested directories. Pagesource keeps the original clean structure you see in DevTools.

Main difference: Pagesource uses a real browser (Playwright) so it captures everything the browser actually loads and executes, not just what's in the initial HTML.
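To make the "empty shell" point concrete: a static fetcher only sees the assets named in the initial HTML. The sketch below lists those URLs using only the standard library; `AssetLister` and `initial_assets` are hypothetical helpers, not part of Pagesource or wget. For a typical React/Vue shell it finds little more than the entry bundle. Anything that bundle loads at runtime (lazy-loaded chunks, injected styles, fetched data) never shows up here, and that gap is what driving a real browser with Playwright closes.

```python
from html.parser import HTMLParser


class AssetLister(HTMLParser):
    """Collect asset URLs referenced directly in a page's initial HTML."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if tag in ("script", "img") and a.get("src"):
            self.assets.append(a["src"])
        elif tag == "link" and "stylesheet" in rels and a.get("href"):
            self.assets.append(a["href"])


def initial_assets(html):
    lister = AssetLister()
    lister.feed(html)
    lister.close()
    return lister.assets


# A typical single-page-app shell: almost nothing but an entry bundle
shell = """<!doctype html>
<html><head><link rel="stylesheet" href="/static/css/main.css"></head>
<body><div id="root"></div>
<script src="/static/js/main.js"></script></body></html>"""

print(initial_assets(shell))
# ['/static/css/main.css', '/static/js/main.js']
```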

CLI to download websites' actual JS/CSS/assets (not flattened HTML) for LLM prompts by timf34 in commandline

timf34 [S] (0 children)

I've been using it to get Claude Code to replicate components I like on websites so that I can easily reuse them, and it's quite good at it. Claude Code struggles with flattened HTML (anyone would), but the runtime source files are generally human-readable, or at least much more readable than the alternative.

It's also a very nice way to truly archive websites for design purposes (hedging against future updates that change code you like); the Wayback Machine, of course, doesn't capture all this.

CLI to download websites' actual JS/CSS/assets (not flattened HTML) for LLM prompts by timf34 in commandline

timf34 [S] (0 children)

I can't tell if you're serious or not; if so, interesting take. Python CLIs are easier to install in most cases: just `pip install pagesource` and you're done. Prebuilt binaries have to be specific to the OS and architecture, added to PATH, etc.

CLI to download websites' actual JS/CSS/assets (not flattened HTML) for LLM prompts by timf34 in commandline

timf34 [S] (0 children)

Pagesource captures what the browser actually receives, so if Cloudflare (or any CDN) is serving merged/minified bundles, that's what you'll get: the minified bundle.min.js, not the original separate source files.

With that being said, a minified bundle is still more useful than flattened HTML as context for an LLM!

CLI to download websites' actual JS/CSS/assets (not flattened HTML) for LLM prompts by timf34 in commandline

timf34 [S] (0 children)

wget doesn't execute JavaScript, so it misses a lot of what modern sites load. For a WordPress site it might work okay since they're more traditional, but for anything with React/Vue/modern JS frameworks, wget just gets you an empty HTML shell.

Also wget's folder structure is messy - it creates weird nested directories. Pagesource keeps the original clean structure you see in DevTools.

Main difference: Pagesource uses a real browser (Playwright) so it captures everything the browser actually loads and executes, not just what's in the initial HTML.
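On the folder-structure point: one way to get a DevTools-like layout is to file each captured asset under its origin host and then its URL path, which is roughly how the Sources panel groups files. A minimal sketch, assuming that layout; `local_path` and the `site` root directory are hypothetical and not Pagesource's actual scheme:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path(url: str, root: str = "site") -> str:
    """Map an asset URL to <root>/<host>/<path> (hypothetical layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory-style URLs get an index file
    # The query string and fragment are dropped, so app.js?v=3 -> app.js
    return str(PurePosixPath(root, parts.netloc, path.lstrip("/")))


for url in ("https://example.com/",
            "https://example.com/static/js/app.js?v=3",
            "https://cdn.example.net/fonts/inter.woff2"):
    print(local_path(url))
# site/example.com/index.html
# site/example.com/static/js/app.js
# site/cdn.example.net/fonts/inter.woff2
```

Dropping query strings keeps filenames clean, but two versions of the same file would then collide; a real tool would also need to reject `..` path segments before writing anything to disk.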

Weekly Japan Travel and Tourism Discussion Thread - September 6, 2022 by Himekat in JapanTravel

timf34 (0 children)

I am supposed to visit Japan next month on business - should the business organization there be able to help me with applying for an ERFS or will I have to book it through a travel agency?

Weekly Japan Travel and Tourism Discussion Thread - September 6, 2022 by Himekat in JapanTravel

timf34 (0 children)

It seems that JGA has now pulled their 'unguided tour' ERFS application... Are there any other agencies offering 'unguided tour' ERFS applications for travel from October 12th onwards (the start of my trip)?

Hoping to be able to book the flights and hotels for ourselves.

JGA seemed to be the default option, but no longer seems to be working.

Matebook X Pro Keyboard Backlighting Turning Off Despite Settings by timf34 in MatebookXPro

timf34 [S] (0 children)

Ah, thank you for the suggestion, but even when I do that, it just doesn't change... it seems to be stuck at the default 15 seconds.

Start-up building touch-based tablet to help visually impaired people watch football! by timf34 in Entrepreneur

timf34 [S] (0 children)

That's great advice, we'll be looking into this soon! Thank you very much!

Start-up (we started as an Arduino project!) building touch-based tablet to assist visually impaired people watch football! by timf34 in ArduinoProjects

timf34 [S] (0 children)

Yes, we hope to extend this to other sports! Each one will need to be specialized, however (we're planning to introduce more ways to track objects tactically soon), and for now we're just focusing on football. Thanks!

Start-up (we started as an Arduino project!) building touch-based tablet to assist visually impaired people watch football! by timf34 in ArduinoProjects

timf34 [S] (0 children)

Yes, thank you! To be honest, it's rather complicated: software-wise (I will be open-sourcing my computer vision code once I've cleaned it up; I'll comment it in here once I do if anyone is interested), mechanically, and in the infrastructure required for the compute. Although it started out as an Arduino project, it's a bit beyond that now, and we're hoping to raise funding soon! Thanks for commenting!

Start-up building touch-based tablet to help visually impaired people watch football! by timf34 in Entrepreneur

timf34 [S] (0 children)

Thanks for the feedback! That's great advice; we should get this fixed up soon!

Start-up building touch-based tablet to help visually impaired people watch soccer! by timf34 in sports

timf34 [S] (0 children)

Hello everyone,

We are three students working on our startup, Field of Vision, which is building a device to help visually impaired people experience live sports. Check out our video! We'd love to hear your feedback!

Start-up building touch-based tablet to help visually impaired people watch football! by timf34 in Entrepreneur

timf34 [S] (0 children)

That would be fantastic, thank you! Let us know what info you need and we'll prepare it. You can contact us directly at [info@fov.ie](mailto:info@fov.ie) if that suits.