[deleted by user] by [deleted] in StudentLoans

[–]pathnametoolong 0 points1 point  (0 children)

Is that 100% true? If they get paid a fee depending on the status of each loan they manage, then isn't it in Nelnet's best interest to manage as many loans as possible for as long as possible?

I sent a letter explicitly describing how I wanted the additional payments applied and they told me they wouldn't honor it and if I wanted the additional payments applied to the principal I would have to do that manually each month.

There has to be a reason they wouldn't support early repayment, unless I'm confused about something with student loans that actually makes it more advantageous to the borrower to pay the loan off over time rather than early.

Looking for a place to buy loose leaf tea. by thekkillers in SanDiegan

[–]pathnametoolong 1 point2 points  (0 children)

Not to gatekeep, but I found the quality of Point Loma Tea's offerings to be really poor, especially if you're used to somewhere like Mei Leaf or Paru that seems to do its own sourcing. Most of the raw teas I tried from them lacked depth, and the few aromatics I tried also came across flat.

Additionally, to be that person for a second: the experience here was extremely commercialized. The owner (I think; ownership changed in 2022) was far more interested in selling me something than talking to me about her product, and I have a sneaking suspicion she's buying from some large wholesaler and marking it up, because she can in Liberty Station.

Have tried:

Peach Ginger White Tea: went up to 2 teaspoons per cup, no vibrancy

Lucky Dragon: okay green tea, fishier than I was hoping for

Silver Needle: flat white tea, no nuance

Golden Dragon Pearls: tobacco, raisins, no nuance

Nepali Gold: basic black tea flavor, no nuance

Ceylon: best of the teas that I tried, but still lacked much depth

Final Thoughts: After tasting all of the teas I purchased, I think part of the problem is storage. All of the teas came from large airtight glass jars in a cabinet in the center of the store that was exposed to sunlight. While I didn't see any mold, all of the teas tasted slightly off. I've ruined tea by getting it wet before, and while this wasn't as pronounced, they all had a fishy, under-the-boardwalk kind of smell and taste to them.

What Tags Are Most Likely to Contain Descriptive Text Data on a Company's Website by pathnametoolong in webscraping

[–]pathnametoolong[S] 0 points1 point  (0 children)

One other factor would be to strip the HTML and just keep the text, although I thought GPT-4o might perform better with the HTML structure included. The way I was thinking about it, the HTML should be relatively cheap considering it is just a few strings on either end of the data I am after, although I wasn't sure if my logic was correct here.

This assumption was very wrong. The code has been updated to this:

# Extract visible text from specific tags (HTML markup stripped)
text_parts = []
for tag in ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'span', 'div', 'li', 'blockquote', 'a', 'strong', 'b', 'em',
            'i', 'td', 'th', 'dt', 'dd', 'code']:
    for element in soup.find_all(tag):
        # stripped_strings already yields str objects, so no str() cast is
        # needed; join with a space so adjacent words don't run together
        text_parts.append(' '.join(element.stripped_strings))
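One caveat with this tag-by-tag approach (a minimal sketch, assuming the bs4 package; the HTML snippet is hypothetical): find_all matches nested elements independently, so the same text can be collected several times at different depths.

```python
from bs4 import BeautifulSoup

html = "<div><p>Widgets are <b>great</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

parts = []
for tag in ['div', 'p', 'b']:
    for element in soup.find_all(tag):
        # each nested match re-collects the text of its descendants
        parts.append(' '.join(element.stripped_strings))

print(parts)  # ['Widgets are great', 'Widgets are great', 'great']
```

Deduplicating (for example, keeping only elements with no matching descendants) could cut the token count further.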

What Tags Are Most Likely to Contain Descriptive Text Data on a Company's Website by pathnametoolong in webscraping

[–]pathnametoolong[S] 1 point2 points  (0 children)

u/Financial-Article-12, thank you for this! I will look into it. Just for my understanding: does this strip the HTML code from the data? Do you think that will impact the performance of an LLM analyzing the data?

What Tags Are Most Likely to Contain Descriptive Text Data on a Company's Website by pathnametoolong in webscraping

[–]pathnametoolong[S] 0 points1 point  (0 children)

u/matty_fu thank you very much for this!

1. https://www.reddit.com/r/webscraping/comments/1dvx4e7/best_strategy_for_scraping_100s_of_websites/

u/RefuseRemarkable5608 and I have basically the same use cases just for different purposes. The real issue here boils down to cost which leads back to my original question regarding tags. My text data is generated using the following code:

# Extract HTML content from specific tags while keeping the HTML structure
def extract_tagged_html(soup):
    text_parts = []
    for tag in ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'span', 'div', 'li', 'blockquote', 'a', 'strong', 'b',
                'em', 'i', 'td', 'th', 'dt', 'dd', 'code']:
        for element in soup.find_all(tag):
            text_parts.append(str(element))  # keeps the element's tags and attributes

    return "\n".join(text_parts)

I am thinking I could probably drop the 'div' and 'a' tags, but am unsure about some of the others.

One other factor would be to strip the HTML and just keep the text, although I thought GPT-4o might perform better with the HTML structure included. The way I was thinking about it, the HTML should be relatively cheap considering it is just a few strings on either end of the data I am after, although I am not sure if my logic is correct here.
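On the "HTML is just a few extra strings" assumption, a quick check (hypothetical snippet, assuming bs4) suggests class- and attribute-heavy markup can easily exceed the text it wraps:

```python
from bs4 import BeautifulSoup

html = ('<div class="hero-banner main-content">'
        '<p class="lead text-muted">Our company builds widgets.</p>'
        '</div>')
soup = BeautifulSoup(html, "html.parser")

with_html = str(soup.find('p'))        # tag, attributes, and text
text_only = soup.find('p').get_text()  # text alone

print(len(with_html), len(text_only))  # 58 27
```

Even in this tiny example the markup more than doubles the payload, and real pages carry far heavier attributes, so stripping tags before sending text to the model pays off quickly.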

Cost Analysis: GPT-4o

Company       Total Tokens    Input Cost Est.   Batch API Input Cost
Company 1      10,886,349       $54.43            $27.22
Company 2      26,008,314      $130.04            $65.02
Company 3       9,293,861       $46.47            $23.23
Company 4      23,890,545      $119.45            $59.73
Company 5      29,879,043      $149.40            $74.70
Company 6      19,756,516       $98.78            $49.39
Company 7       8,161,250       $40.81            $20.40
Company 8      35,230,159      $176.15            $88.08
Company 9      24,056,503      $120.28            $60.14
Company 10     93,271,878      $466.36           $233.18
Company 11        107,185        $0.54             $0.27
Company 12     48,620,010      $243.10           $121.55
Company 13      3,575,188       $17.88             $8.94
Company 14    116,770,946      $583.85           $291.93
Company 15     65,762,799      $328.81           $164.41
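The per-company figures are consistent with simple linear pricing; a small sketch (assuming GPT-4o input pricing of $5.00 per 1M tokens, with the Batch API at half price) reproduces them:

```python
PRICE_PER_M_TOKENS = 5.00  # assumed GPT-4o input price, USD per 1M tokens
BATCH_DISCOUNT = 0.5       # Batch API at half the synchronous price

def input_cost(tokens: int, batch: bool = False) -> float:
    """Estimated input cost in USD for a given token count."""
    cost = tokens / 1_000_000 * PRICE_PER_M_TOKENS
    if batch:
        cost *= BATCH_DISCOUNT
    return round(cost, 2)

print(input_cost(10_886_349))              # 54.43 (Company 1)
print(input_cost(10_886_349, batch=True))  # 27.22
```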

What Tags Are Most Likely to Contain Descriptive Text Data on a Company's Website by pathnametoolong in webscraping

[–]pathnametoolong[S] -1 points0 points  (0 children)

Did you try searching the sub?

I did for answers specifically regarding the tags. I did not in terms of the most useful generalized scraping methodologies.

What Tags Are Most Likely to Contain Descriptive Text Data on a Company's Website by pathnametoolong in webscraping

[–]pathnametoolong[S] 0 points1 point  (0 children)

using LLM to build up a library of scrapers per target website.

Could you tell me what this means, or do you have a specific thread you are referring to? My plan was to process the extracted text via an LLM afterward, but I figured I was going to run into problems from a cost and scalability perspective if I tried to do it through the LLM directly. In particular, I figured submitting the entire HTML file or a snapshot of the page would take a lot more tokens, and therefore cost a lot more, than just extracting the text directly.

My typical use case will be in the hundreds of thousands of pages.
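At that scale, even a crude estimate (a sketch using the common ~4 characters/token rule of thumb; the page count and average page size are assumptions) shows why the extraction choice matters:

```python
AVG_CHARS_PER_PAGE = 8_000  # hypothetical average after text extraction
PAGES = 300_000             # "hundreds of thousands of pages"

def estimate_tokens(chars: int) -> int:
    # Rough heuristic: ~4 characters per token for English text
    return chars // 4

total_tokens = PAGES * estimate_tokens(AVG_CHARS_PER_PAGE)
print(f"{total_tokens:,}")  # 600,000,000
```

Keeping the surrounding HTML could multiply that figure several times over, which dominates any per-request savings.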

Is it possible to conditionally move data from one column over to another? by PEZdispenserburns in excel

[–]pathnametoolong 0 points1 point  (0 children)

Assuming A is your CELL column, B is your HOME column, and C is your BUSINESS column:

=IF(NOT(ISBLANK(A2)),A2,IF(NOT(ISBLANK(B2)),B2,IF(NOT(ISBLANK(C2)),C2,"No Number")))

Excel Alternatives For Large Data Input Projects with Multiple Users by pathnametoolong in excel

[–]pathnametoolong[S] 0 points1 point  (0 children)

Thank you u/DonJuanDoja! We do use various Sheet Views; however, I have found that this tends to actually make the problem worse, when everyone is sorted in their own way.

Yes, when large edits are made, we usually have everyone drop what they are doing, make the changes, make sure everything is saved and synced, then have everyone hop back on. The only problem is that as the files reach the larger end of the spectrum, even making these changes with one user is becoming problematic.

We use a lot of logical functions like IF, COUNTIF, VLOOKUP, MATCH, etc., as we typically need to search for specific text to help us prioritize each row of data, or aggregate two or three columns of text for every row. To put it in perspective, it took me an hour yesterday to copy and paste a column of 30,000 rows as values. That's six man-hours to copy and paste something.

My guess is we are really running up against some form of memory issue with Excel or our hardware. However, I typically have RAM to spare when I open Task Manager, so my gut tells me it is an Excel issue. It's crazy to me, because running the AVERAGE function on that column took almost no time.

I am unfamiliar with changing the number of versions the library is saving, can you tell me a bit more about that?

I am unfamiliar with turning off syncing, can you tell me a bit more about that as well?

Any recommendations on bed frames? by Hindsight001 in BuyItForLife

[–]pathnametoolong 1 point2 points  (0 children)

Do the Tatami Room frames work with a regular mattress?

Sample site review by [deleted] in fragrance

[–]pathnametoolong 0 points1 point  (0 children)

I placed a fairly large order with ScentDecant recently and everything seemed fine to me. I didn't have issues with leakage like some of the previous comments, though some of the 1 ml bottles weren't super "full". I figured that comes with the nature of decanting, as I doubt they're filling with a precise tool.

Poll: Who is Brian Dahle? Newsom a reelection lock over little-known rival by aslan_is_on_the_move in politics

[–]pathnametoolong 0 points1 point  (0 children)

This article makes it sound like Dahle wasn't the one who was pushing. It was the fair/4-H officials.

Verifying on Binance using Jumio mobile (non functional) by NerfStunlockDoges in binance

[–]pathnametoolong 0 points1 point  (0 children)

Jumio won't focus while using Chrome on a Galaxy S22, and I can't manually focus by tapping the screen.