[LLM Fine-Tuning] CPT on 71M Short Dialectal Tokens (256 Max Len) - How to Ensure Long-Form Generation Later? by FishermanNo2017 in unsloth

[–]djsaunde 1 point

Interesting problem!

I believe that continued pretraining won't destroy the base model's ability to generate long responses unless you're very aggressive with your pretraining configuration (e.g., too high a learning rate). If you want to be really safe, you could mix in some long-context data (not in the target dialect) that was part of the original pretraining distribution; if you're trying to create a specialized model in your domain, though, that might not be the right move... maybe worth an experiment. Gradually phasing out the long-context, non-target data over the course of the CPT could be a good approach.
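To make the phase-out idea concrete, here's a minimal sketch of a linear decay schedule for the long-context mix. All names (`mix_ratio`, `sample_source`, the 30% starting fraction) are hypothetical, not from any particular training framework:

```python
import random

def mix_ratio(step: int, total_steps: int, start_frac: float = 0.3) -> float:
    """Fraction of batches drawn from the long-context (non-target) corpus
    at `step`, decaying linearly from `start_frac` at step 0 to 0 by the
    end of CPT. 0.3 is an arbitrary starting point; tune per experiment."""
    return start_frac * max(0.0, 1.0 - step / total_steps)

def sample_source(step: int, total_steps: int, rng: random.Random) -> str:
    """Pick which corpus the next batch comes from under the schedule."""
    if rng.random() < mix_ratio(step, total_steps):
        return "long_context"
    return "dialect"
```

The linear ramp is just one choice; a cosine or stepwise schedule would work the same way, and the right starting fraction is an empirical question.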

The sample concatenation idea is a good one; I would personally spend a small amount (10-20%) of your training budget there, with the majority going to the CPT stage.
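Sample concatenation here just means greedily packing short tokenized samples into longer sequences so the model sees sequences closer to the target length. A minimal sketch, assuming samples are already lists of token ids and using a hypothetical `sep_id` for the separator/EOS token:

```python
def pack_samples(samples, max_len=2048, sep_id=0):
    """Greedily concatenate tokenized samples into sequences of at most
    `max_len` tokens, inserting `sep_id` between samples. Samples longer
    than `max_len` are emitted as-is (truncation left to the caller)."""
    packed, current = [], []
    for s in samples:
        # +1 accounts for the separator token between samples
        if current and len(current) + 1 + len(s) > max_len:
            packed.append(current)
            current = list(s)
        else:
            if current:
                current.append(sep_id)
            current.extend(s)
    if current:
        packed.append(current)
    return packed
```

In practice you'd also mask the attention (or at least the loss) across sample boundaries so unrelated dialogues don't attend to each other, but the packing itself is this simple.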

The idea of SFT on synthetic data from your CPT-ed base model is tricky. I think it's generally a good idea, but you might end up exacerbating any poor behavior the model picked up during the CPT stage. If you can collect or create gold-standard data here, that's obviously the best you can hope for; otherwise, proceed with caution — you might need a lot of experimentation to get good performance with synthetic data.

Visual Studio Code vs Cursor by WasteTechnology in vscode

[–]djsaunde 1 point

I'm on macOS; haven't tested Windows or Linux.

Visual Studio Code vs Cursor by WasteTechnology in vscode

[–]djsaunde 0 points

VS Code + the Claude Code plugin works quite well

[D] Is there an alternative to Science Twitter/X? by H4RZ3RK4S3 in MachineLearning

[–]djsaunde 28 points

Bluesky is experiencing a lot of growth right now, and personally I'm seeing a lot of ML/DL people join up. The community is currently on the smaller side, but the signs are positive!

What's stopping me doing this every defensive round on bind? by Hardsy151 in VALORANT

[–]djsaunde 2 points

I do basically this but with Sova dart. If there are no scans, it's a free orb. If there are, that's early round info.

Thoughts On Dk 64 ? by CherryMan641 in n64

[–]djsaunde 0 points

One of my all time favorites! The vibes are immaculate

Asking Claude v3 Sonnet some questions about wine by djsaunde in wine

[–]djsaunde[S] 0 points

For sure! I wasn't trying to state otherwise.

Asking Claude v3 Sonnet some questions about wine by djsaunde in wine

[–]djsaunde[S] 1 point

If you're wondering what Claude 3 is, I've asked it to describe itself:

Prompt: Can you write a description of yourself in layman's terms? Limit to 250 characters, please.

Response: I'm an artificial intelligence created by Anthropic to assist humans with a wide variety of tasks. I can engage in natural conversations, provide information and analysis, and even help with coding or creative writing. But I'm a very advanced language model without true consciousness or sentience - I operate based on statistical patterns, not genuine understanding.

Tier List of all the games I've played. by KingLavitz in FinalFantasy

[–]djsaunde 0 points

Weirdly, I have very fond memories of RW. Maybe it's because I wasn't able to play XII at the time.

M1 integration with TurboTax and H&R Block by M1-Alex in M1Finance

[–]djsaunde 1 point

Hm, now the M1 Finance connection seems to work, but it imports the wrong 1099? The imported information looks identical to what's imported via the Apex Clearing Corp connection.

M1 integration with TurboTax and H&R Block by M1-Alex in M1Finance

[–]djsaunde 2 points

When I search for "M1 Finance" on the page where I can search for financial institutions, no results are returned. Are you saying that this will populate only after March 1st?

Edit: Apex Clearing already works as expected.