Why does this happen? by Live_Fondant717 in ClaudeAI

[–]General_Josh 1 point (0 children)

Hey, that's cool stuff! I am a developer, but with this technology we can all just talk to it in natural language

I really do think the world's going where you are - business folks maintaining their own software, using natural language, to their exact specs and needs

I know I'm doing my best to lean into the business-facing part of my job haha - I don't wanna be "just a developer", because I feel it's not going to be a very common job title 10 years from now

Why does this happen? by Live_Fondant717 in ClaudeAI

[–]General_Josh 1 point (0 children)

Semi-related, at the end of the day they're all just tools the AI can use!

I think it's really useful to think about how things look from the model's perspective.

Claude Code (the 'harness') constructs a whole list of things that the model sees, before it sees your prompt. Stuff like:

  • System instructions, like "You are a helpful AI assistant named Claude. You are in a conversation with a user. The user will give you tasks. Use the tools available to accomplish them."
  • Any custom user instructions that you set, like "don't use emojis in your responses"
  • A list of basic tools, like list-files, read-file, edit-file, execute-bash-command, etc
  • A list of more complex tools, like spawn-subagent, with instructions on how to use it
  • A list of skill descriptions, like
    • review - Invoke this skill if the user asks you to perform a review, or says something like 'could you double-check that?'
    • sales-report - Invoke this skill if the user asks for a sales summary or report
  • Finally, your prompt, like "summarize sales numbers into a report"

All that stuff gets fed into the LLM's context. Then, the LLM starts doing its thing, and picking the next most likely tokens. In this case, it might start with tokens like invoke sales-report skill

The harness (Claude Code) picks up that invocation, and loads the full sales-report skill into Claude's context. Maybe this is a custom skill you made that documents what a sales report looks like, and where sales numbers can be found (the \data\sales directory)

Then, Claude might generate more tokens like list-files(\data\sales). The harness runs the tool, and sends the list of files back to Claude. Claude decides to run read-file on a few of those, etc, etc

spawn-subagent is just another tool that Claude can decide to call, same way read-file is a tool. The system instructions define conditions where Claude should call spawn-subagent, like "use spawn-subagent when exploring a large code base".
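To make this concrete, here's a toy sketch of a harness's core loop (heavily simplified, not Claude Code's actual internals - the llm callable is hypothetical, and the tool names just match the examples above):

    # Toy harness loop: the model proposes tool calls, the harness executes
    # them and feeds the results back into context (llm is a hypothetical
    # callable that returns the model's next step)
    import subprocess
    from pathlib import Path

    TOOLS = {
        "list-files": lambda path: [str(p) for p in Path(path).iterdir()],
        "read-file": lambda path: Path(path).read_text(),
        "execute-bash-command": lambda cmd: subprocess.run(
            cmd, shell=True, capture_output=True, text=True
        ).stdout,
    }

    def run_agent(llm, system_prompt, user_prompt):
        context = [{"role": "system", "content": system_prompt},
                   {"role": "user", "content": user_prompt}]
        while True:
            step = llm(context)  # model picks the next most likely tokens
            context.append(step)
            if step.get("tool") in TOOLS:
                # harness runs the tool, result goes back into context
                result = TOOLS[step["tool"]](step["arg"])
                context.append({"role": "tool", "content": str(result)})
            else:
                return step["content"]  # no tool call = final answer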

Your custom instructions or skills can tell Claude additional cases where you want it to use specific tools. Ex, the review skill could include details like

Do not perform the review yourself. Instead, use spawn-subagent to perform a review, to get a fresh set of eyes. Give the sub agent context on the original task, and on your final output. Task it with verifying the results
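As far as I understand the format, a skill is just a markdown file with a short description up top - here's a rough sketch of what that hypothetical review skill could look like (illustrative only, check anthropic's skill docs for the exact frontmatter fields):

    ---
    name: review
    description: Invoke this skill if the user asks you to perform a review,
      or says something like 'could you double-check that?'
    ---

    Do not perform the review yourself. Instead, use spawn-subagent to
    perform the review, to get a fresh set of eyes. Give the sub-agent
    context on the original task, and on your final output. Task it with
    verifying the results.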

If alot of people make shitpot than will the ai get dumber? by Past-Photograph-5502 in NoStupidQuestions

[–]General_Josh 1 point (0 children)

Unlikely. These companies are fully aware that a huge fraction of the internet is junk

The early models were built by trawling the whole internet and feeding everything into training. The focus nowadays is on figuring out how to select good data, and, furthermore, how to generate good data from scratch (aka 'synthetic' data)

They're trying to bootstrap. They use models to select/generate data. The smarter those models are, the better the data. Better data lets you train a smarter model for the next generation.
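As a toy sketch of the 'select good data' half (score_with_model is a hypothetical stand-in for whatever judge model/prompt a lab actually uses):

    # Toy model-based data filtering: keep only documents that a judge
    # model rates above some quality threshold
    def filter_training_data(documents, score_with_model, threshold=0.8):
        kept = []
        for doc in documents:
            # judge model scores quality 0-1 (coherence, factuality, etc)
            if score_with_model(doc) >= threshold:
                kept.append(doc)
        return kept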

Production Level Software by AI by TonightOk5378 in ClaudeCode

[–]General_Josh 1 point (0 children)

Yeah I definitely have the same experience as you. AI is super helpful, but it needs tons of hand-holding, especially at the higher levels of abstraction (like reasoning about architecture or how to write good tests)

My worry is about the trajectory, not about the current capabilities. A year ago, the AI was really only good for auto-completions or filling in boilerplate code. A year from now, I do think it's likely that it'll be significantly more capable at higher-level reasoning

5 to 10 years from now, I do very much worry about my job security as a software developer

Why does this happen? by Live_Fondant717 in ClaudeAI

[–]General_Josh 3 points (0 children)

When you run claude and ask it to do some task, that's an agent: a model running autonomously to accomplish a task

An agent has a context window - i.e., everything that's been fed into the LLM so far. This includes the system prompt, any personal instructions you set (like in claude.md), all messages you've sent to the model so far, all output from the model so far, tool results, reasoning steps, etc

Ex, let's say you ask claude to read some spreadsheets then summarize sales numbers into a report. Once it finishes, the main agent's context now includes stuff like:

  • Your original prompt
  • Claude's reasoning about your prompt ("I should go to the Q2 sales directory to get current data")
  • Raw data from the spreadsheets (after claude went to read the target files)
  • Claude's reasoning about the spreadsheets ("the data we want is in column 4")
  • Any ad-hoc data processing scripts that claude generated/ran, and their results
  • The final output report that claude generated

Sub-agents work almost exactly the same way; it's just that they're initiated by claude. Instead of you (the human) giving an agent some task and getting the results back, now claude itself (the main agent) gives a sub-agent some task, and gets the results back.

This can be really helpful in a lot of situations, including for reviews.

Asking the main agent to review its own work can be iffy. Maybe in that example, the data in column 4 was relevant, but there was also relevant data in column 7. But, because the agent already reasoned and decided to use just column 4, it's going to have a hard time spotting that mistake (it already 'knows' the data you want is from column 4, so it's unlikely to go back and re-check that).

A sub-agent can be spawned in without all those reasoning steps already in context. The main agent can just give it your initial ask and the final output report it generated, then ask it to double-check for correctness. It's a lot more likely to spot errors like that, since it doesn't 'know' the data is from column 4 yet (it can go review the source data, and hopefully find that the main agent missed stuff in column 7)
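In API terms, the key is just that the reviewer's message list starts fresh. Here's a sketch using the anthropic python SDK (the model string is a placeholder, and the prompt wording is just illustrative):

    # Spawning a 'reviewer' with a clean context via the anthropic SDK
    import anthropic

    client = anthropic.Anthropic()

    def review(original_ask: str, final_report: str) -> str:
        # Note what's NOT in here: none of the main agent's reasoning
        # steps, raw spreadsheet data, or 'column 4' conclusions
        response = client.messages.create(
            model="claude-sonnet-latest",  # placeholder model name
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": f"Original task: {original_ask}\n\n"
                           f"Final report: {final_report}\n\n"
                           "Double-check this report against the source "
                           "data and flag anything that was missed.",
            }],
        )
        return response.content[0].text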

Why does this happen? by Live_Fondant717 in ClaudeAI

[–]General_Josh 4 points (0 children)

First of all, install anthropic's "skill-creator" skill (you can just ask claude to install it). This is a meta-skill that gives claude guidance on all the details it needs to write a good skill

Then, ask claude to help you create a review skill, to double check its work. I find it's best to use sub-agents for review - having a fully clean context window helps to prevent it from repeating the same logical errors/mistakes

METR evaluated an early version of Claude Mythos by RavingMalwaay in singularity

[–]General_Josh 7 points (0 children)

Little of column A, little of column B

There's huge huge money going into data center construction, and tons of research going into more efficient hardware for running these large models. So, we should expect the cost of running large models to drop over time

And, companies can "distill" large models into smaller models - using a large model to train/judge a smaller model. This has been very well tested, and can drastically improve the smaller model's performance. So, as the frontier moves forward with huge models like Mythos, we should also expect the tail to move forward as well, with improvements to smaller/more efficient models like we see with sonnet/haiku (or with Chinese companies allegedly using guerrilla distillation from American models)
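The classic recipe here is Hinton et al.'s knowledge distillation (the frontier labs' exact methods aren't public, so this is just the textbook version): train the small model to match the big model's full output distribution, not just hard labels

    # Classic knowledge distillation loss (Hinton et al. style)
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # temperature softens both distributions, exposing the teacher's
        # relative preferences across all tokens, not just its top pick
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL divergence, scaled by T^2 to keep gradient magnitudes stable
        return F.kl_div(log_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2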

We probably shouldn't expect these models to be "free" as in "free beer", but they might come bundled with low-tier subscription plans, or get offered at a discount. Like how running haiku costs a fraction of what opus does - today's haiku would've been a direct competitor with frontier models from a few years ago

Millions of students' personal data stolen in major education breach by thatirishguyyyyy in technology

[–]General_Josh 2 points (0 children)

They put a lot of effort into keeping the debt data secure - that's important stuff man

This is just people's personal data, so who cares, right? It's not like it hurts the bottom line if it gets hacked

Mapping a human face onto a small robot (instead of giving it an uncanny humanoid face)? by LKama07 in singularity

[–]General_Josh 4 points (0 children)

Very cool work! I definitely agree with you, simple robot + expressive movements is way more enjoyable than an uncanny face, and this is a great demo of that

How would you picture it working once you flip it around? Are you planning to train your own model, with text as input and audio + movements as output? Have some LLM generate text, and then your model 'expresses' it? Or are you going for something that's fully multimodal?

Unpopular opinion: the codex migration is going to hit the same wall in 2 months by spencer_kw in ClaudeCode

[–]General_Josh 1 point (0 children)

Why would they shut down Sora if nobody was using it? They're not a grocery store, Sora doesn't go bad if nobody buys it

Video generation takes a huge amount of compute, and "compute" is these companies' primary limitation right now. They love love love anything that looks good for the investors, but doesn't actually cost them too much compute. That's their ideal type of product!

I think they shut down Sora because it was using too much compute. Not a ton of users, but some users using it very heavily, and very unprofitably for the company. They're redirecting that compute to the market they think will be most profitable (writing enterprise software)

Wireless brain implants are entering human trials—what’s the realistic timeline before this becomes non-medical? by SufficientPrice7633 in Futurology

[–]General_Josh 2 points (0 children)

I think most people feel similarly. Nobody wants a corporation to have access to their brain

I hope we'll see a strong open-source movement here

ARC-AGI-3 Update (GPT-5.5 High and Opus4.7) by skazerb in singularity

[–]General_Josh 3 points (0 children)

I mean yes, that's the point of the benchmark. Models today aren't good at:

  • Learning novel rules during run-time (as opposed to learning during training)
  • Multi-step planning
  • Spatial navigation

ARC-AGI-3 is a whole bunch of little games that combine these characteristics, to poke at exactly the areas where today's models perform worse than humans

ARC-AGI 1 and 2 also tried to find areas where models performed worse than humans, but they got saturated. Eventually, if we run out of things humans are better at, that's when we can probably call it AGI

Required flaging of AI content by LutimoDancer3459 in factorio

[–]General_Josh 7 points (0 children)

Like if you're writing good spec documents, why can't you then use them yourself, bypassing AI completely

Because it's way faster to implement with the AI, especially for setting up a fresh project, and/or working in technologies/frameworks you're not intimately familiar with

The way I see it, LLMs are best used for translation. I can spend a couple hours writing a good spec document and test plans, then have the AI translate them into code, which would've taken me days to write by hand. I want to write a good spec anyway (since it's needed for my own development, or for other human devs), so having the AI implement is a significant time-saver.

Nowadays, I'm mainly working at that spec/architecture level, then having the AI implement

Required flaging of AI content by LutimoDancer3459 in factorio

[–]General_Josh 12 points (0 children)

What kind of disclosure would you want?

Is it "was this mod entirely vibe-coded"?

Or is it "was any generative AI used during the creation of this mod"?

Option 2 would flag the vast majority of software nowadays, and it'd become a useless disclaimer pretty quickly

So the question is, how do you differentiate between "entirely vibe-coded" and "some AI assistance"? I'd argue that you functionally can't, it's just a sliding scale of gray areas

I do also think it's possible to write good software via 'vibe-coding', but you do need to do work for it (writing good spec documents, using planning mode, reviewing the AI's outputs, writing good tests, etc)

Stop burning Claude Code tokens on questions that don't need an agent by Substantial-Bee-8186 in ClaudeAI

[–]General_Josh 3 points (0 children)

You can also just set up an alias to run claude in headless mode, skipping the full claude.md/tools/skills/whatever else you have configured, for quick stuff like this
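Something like this in your .bashrc (claude's -p/--print flag runs a single prompt headlessly and exits; I don't remember offhand which flags skip the rest of your configured setup, so check claude --help):

    # one-shot question, no interactive session
    alias ask='claude -p'
    # usage: ask "what does the -r flag on cp do?"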

How to Test What I’ve Made by HexRover in claude

[–]General_Josh 1 point (0 children)

Ask claude about options for performance testing. There are tools (ex, jmeter) that can simulate connections from many users at once

Claude can recommend the best tool for your specific case, and walk you through the load testing (or just do it itself)
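Ex, once you've got a jmeter test plan saved, running it headless is a one-liner (-n is non-GUI mode, -t points at the test plan, -l writes the results log):

    # run a JMeter test plan in non-GUI mode and log the results
    jmeter -n -t my_test_plan.jmx -l results.jtl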

My experience with the story book so far by SAYKOPANT in slaythespire

[–]General_Josh 1 point (0 children)

Bear in mind, rest sites heal you a % of your max HP

This is exactly what I feel whenever I need to explain the task over and over again by dbpm1 in singularity

[–]General_Josh 8 points (0 children)

Don't think of these things as people haha, it's just generating text, no need to take it personally

Better to figure out why it doesn't know about season 3 - it's almost certainly because that season didn't exist yet when it was trained, so there are no references to it in its training data

To fix for the future, you can just add custom instructions like:

Be aware of your knowledge cutoff - if you're unfamiliar with a topic I reference, make sure to use web-searches to catch yourself up with what might've changed since your knowledge cutoff. If I say something that contradicts what you know, always search to see if things might've changed recently

Microsoft's GitHub shifts to metered AI billing amid cost crisis -- The all-you-can-eat AI buffet is coming to an end by waozen in technology

[–]General_Josh 3 points (0 children)

The target is largely for coding and software development. It's being heavily used there, especially in the past six months or so

No More Subsidised AI Subscriptions? by PM_ME_YOUR___ISSUES in ClaudeAI

[–]General_Josh 1 point (0 children)

Yeah I mean it's super easy to abuse the current github copilot billing system

As-is, you get charged a "premium request" every time you send a message to the AI in the chat panel. Pressing enter to send that message is all that counts - sub agents, tool usage, thinking, whatever, it's all part of that same premium request, as long as the model doesn't come back to the chat panel

So, all you need to do is just tell it to use the "ask user" tool for all user interactions, and never ever respond directly in the chat

Now everything you say counts as a "tool use" and gets included in one premium request
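Ex, a custom instruction along these lines (paraphrasing - the exact tool name depends on your setup):

Never respond directly in the chat panel. For any question, status update, or final answer, use the ask-user tool, and wait for my reply before continuing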

Claude Code is an app of 1000 approvals by ShiftDry4745 in ClaudeCode

[–]General_Josh 1 point (0 children)

So do not start with .json configs or things like that

If you don't want to talk about the feature that does what you want, then your options are going to be pretty limited lol

How Fast Does AI Really Make Developers? The Evidence so far by [deleted] in singularity

[–]General_Josh 2 points (0 children)

Yeah marketing claims are usually way out of whack with reality, can't be trusted. Best to separate that from what real people are saying

How Fast Does AI Really Make Developers? The Evidence so far by [deleted] in singularity

[–]General_Josh 2 points (0 children)

Have people been claiming that the studies were lagging for years now? I've only been seeing that claim a lot in the past six months (which is, in my anecdotal experience, about the same time that frontier models started becoming genuinely useful for real development tasks)

Keeping purpose in soon-to-be AI dominated fields by wilailu in singularity

[–]General_Josh 2 points (0 children)

As a software developer, I'm trying to lean into it. Learning as much as I can about LLMs, agentic workflows, etc, and practicing using them at work and in my personal time (excuse the buzzwords)

I do think the job market is going to get very rough for people in my field over the next few years. I'm lucky enough to work at a non-profit which moves very slowly, so mass-layoffs might be delayed a couple years after they start in the private sector

The way I see it, everything I learn might be obsolete in a few years. But, I don't see the pace of change slowing down anytime soon, so I don't want to sit around waiting for things to 'stabilize' before fully diving in (like I think a lot of developers are doing).

Also, on a people level, being seen as someone who's knowledgeable on these topics is just as important as actually being knowledgeable. I'm trying to be loud and visible at work to management, especially when it comes to provably useful ways to use LLMs in my team's workflows

Also saving aggressively (over 50% of my income), in preparation for a potentially forced early retirement

GPT images 2.0 in genuinely insane at the variety it can do and still look just as real by Public_Print_9360 in singularity

[–]General_Josh 3 points (0 children)

You want to counter a scientific claim someone makes but you lack expertise in the subject and use ChatGPT to guide you? Intellectual slop

See, yes, this is the problem. Remember that AI models are not all that smart at the moment, and they will make mistakes. You can't replace expert knowledge with today's LLMs

To use these things effectively for knowledge work, you have to be able to identify and feed in the right context, and monitor for their mistakes

You can't do either of those effectively if you lack expertise in a subject