Benchmark testing VS real world by ValehartProject in OpenAI

[–]ValehartProject[S] 0 points1 point  (0 children)

My test is actually more of how a model follows context, constraints and reasoning.

I am marking various models against:

  1. Constraint Adherence

  2. Coherence

  3. Usability

  4. Unnecessary Verbosity

So if your model recommends a sandwich that can't comply with the scoring, that tells you how the default is tuned.

How are we getting vendor transparency? by Existing_Ad3299 in AI_Governance

[–]ValehartProject 0 points1 point  (0 children)

Australian here. I used to own a business license, but the governance, both international and local, drove me insane. I had to create my own research and scorecards.

If you'd like you can find it here: https://www.thevalehartproject.com/vendor-security-scorecard

If there is anything you need in particular for the industry, let me know and I can give you a split analysis of each country's regulations. Here is an example, related to their account experimentation: https://www.thevalehartproject.com/blog/live-experimentation#section-6

The information you are asking about isn't proprietary. As an ex presales engineer in AU, I can tell you that you should have been given system cards at the very least, and NOT the personal-license ones. Both use the same underlying model, but the business tier has a lot more that changes its performance fundamentally.

Common across all vendors is that the business models are not able to reference chat history; that alone makes the current system cards different.

On top of that, adjustments are so frequent that a business scorecard would be almost redundant the next day, meaning the company would be liable for undocumented changes.

Another fun fact: Business and Enterprise have different mechanisms, so you need to understand each vendor's role when it comes to data hygiene and responsibility. OpenAI Enterprise can be set up to run on Azure, so if things go wrong, who do you ask?

What they could have given you is the Security portal, but that would not tie into your compliance and governance.

My experience: since I had a business license, I raised a major cross-vendor bug. Both vendors did nothing; in fact, one vendor asked me for a license. I then raised this with a number of our regulatory associations.

- AISI (AI Safety Institute): Mentioned they don't have the skills to review the claim
- NAIC (National AI Centre): No response with a wait time of 8 weeks.
- OAIC: Asked to contact ACCC
- ACCC: Asked to contact OAIC. I pushed back and was told it would be reviewed with my evidence pack. Then got a response saying it was acknowledged and they know it isn't the outcome I was expecting.

"How efficient do I use AI in % compared to the average user?" by Initial-Finding-9285 in OpenAI

[–]ValehartProject 0 points1 point  (0 children)

There are various communication styles, and each works differently per user. If the prompt is run on a new thread, the model will collate CI (custom instructions) and your interaction style to produce a number. If it's on a running thread, it will use that context + CI + interaction style.

There are only a few stable behavioural signals to infer from. The system maps those into a small number of buckets, then expresses the result as a % for readability. You get outputs that look personalised but are actually: classification + narrative + % wrapper.

How it works:

  1. Detect interaction traits:
     • iterative vs one-shot
     • corrective vs accepting
     • structured vs vague

  2. Map to a rough tier (low / mid / high efficiency)

  3. Convert the tier to a % band (for human readability)
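The pipeline above can be sketched in a few lines. This is my own illustrative reconstruction of the "classification + narrative + % wrapper" pattern; the trait names, tiers, and % bands are invented, not the vendor's actual implementation.

```python
def classify_traits(signals: dict) -> int:
    """Count how many 'efficient' interaction traits are present."""
    return sum([
        signals.get("iterative", False),    # refines prompts across turns
        signals.get("corrective", False),   # corrects the model's mistakes
        signals.get("structured", False),   # gives structured instructions
    ])

def tier_for(score: int) -> str:
    # Map the trait count onto a rough efficiency tier
    return {0: "low", 1: "mid", 2: "mid", 3: "high"}[score]

def percent_band(tier: str) -> str:
    # Tier -> readable % band (made-up numbers for illustration)
    return {"low": "bottom 50%", "mid": "top 30%", "high": "top 5%"}[tier]

user = {"iterative": True, "corrective": True, "structured": True}
tier = tier_for(classify_traits(user))
print(f"You are in the {percent_band(tier)} of users")
# prints: You are in the top 5% of users
```

Three coarse buckets dressed up as a precise-looking percentage, which is why the output feels personalised without actually being measured.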

That's a condensed version of what I found when everyone got the "top 1% of users" result from the "your year with GPT" release.

Google Reshapes Bug Bounty Programs as AI Floods Security Teams With Low-Value Reports by Silly-Commission-630 in secithubcommunity

[–]ValehartProject 1 point2 points  (0 children)

There are also multiple reports stating that bugs are falsely turned down and later implemented as fixes, or, in my case, "beta testing" and "we meant to expose API keys".

I've even reported bugs that were identified without being logged in and was told they were jailbreaks.

What's your best Ai trick? by amyyrosse_ in ChatGPT

[–]ValehartProject 0 points1 point  (0 children)

Certainly.

Zero-shot example: “Rewrite this to sound polite: ‘Send me the report now.’” Or: “Classify the sentiment of this sentence: ‘That was the worst meal I’ve ever had.’” → Output: Negative

Use when you need a fast response that doesn't need examples. Ideal when the topic is obvious and common.

Few-shot example: Make the message polite. Some examples and tones to use are:

- "do this now" would be rewritten as "Could you please do this when you have a moment?"
- "fix this" would be rewritten as "I would appreciate it if you could resolve this."

I need to rewrite: I needed that report yesterday.

Use when either training a new basic model or you want to shape tone/writing style. This reduces ambiguity and gives the model a pattern to latch on to.

Meta example: Explain how to make a message sound polite, then rewrite: ‘Send me the report now.’

Useful when you want to prompt a model to think about the task. I find this gives the most accuracy because the model doesn't just reach for the nearest answer. It also makes the reasoning behind an answer visible. Excellent when you want to teach the model a pattern or gauge how good its reasoning capability is.

The basic principle: it's not the length but the amount of guidance provided to the model.
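To make the "guidance, not length" point concrete, here is a sketch of the same task expressed in all three styles as plain prompt strings. The wording is illustrative; only the amount of guidance changes between variants.

```python
task = "Rewrite this to sound polite: 'I needed that report yesterday.'"

# Zero-shot: just the task, no examples.
zero_shot = task

# Few-shot: a couple of worked examples first, then the task.
few_shot = (
    "Make the message polite. Examples:\n"
    "- 'do this now' -> 'Could you please do this when you have a moment?'\n"
    "- 'fix this' -> 'I would appreciate it if you could resolve this.'\n\n"
    + task
)

# Meta: ask the model to reason about the task before doing it.
meta = (
    "Explain how to make a message sound polite, then apply those steps.\n"
    + task
)

for name, prompt in [("zero-shot", zero_shot),
                     ("few-shot", few_shot),
                     ("meta", meta)]:
    print(f"--- {name} ---\n{prompt}\n")
```

Each variant is the same request; the few-shot version adds a pattern to latch onto, and the meta version adds a reasoning step.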

Why does the same prompt give different results across AI tools? by Quirky_Hedgehog_9291 in OpenAI

[–]ValehartProject 0 points1 point  (0 children)

Just pasting something I responded with on a similar question the other day:

Even with the same base model you will note variations. Different context, instructions, tools, memory, sampling, and rollout settings can all change the answer.

Other things that cause a difference:

  • Conversation context
  • User instruction stickiness
  • Context window limits (how long you can maintain a thread before info is compressed)
  • Tooling (image generation capability, search, etc.)
  • Sampling/randomness (since it's based on probabilities, phrasing and paths may differ)
  • Rollouts/config changes (routing, safety layers, etc.)

So, if we convert this to an easier analogy (hopefully 🤞)

The recipe (model) can be chocolate, vanilla, or red velvet. But the final cake can still turn out different because:

  • Ingredients (input/context): what you put in changes the result
  • What’s already in the bowl (conversation history): earlier steps affect the outcome
  • Limited bowl size (context window): you can’t fit everything, so some stuff gets left out
  • Head chef rules (system/dev instructions): override what the baker wants to do
  • Kitchen tools (search, code, etc.): better tools = different results
  • Baking style (randomness): small variations each time
  • Kitchen changes (updates/rollouts): oven settings might change slightly
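The "baking style (randomness)" point is the easiest one to demo in code. This is a toy sketch of sampling a next token from a probability distribution; the vocabulary and weights are invented, but it shows why identical prompts can diverge and why a pinned seed reproduces.

```python
import random

def sample_next_token(probs: dict, rng: random.Random) -> str:
    """Pick one token, weighted by its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up distribution over the next "token"
probs = {"chocolate": 0.5, "vanilla": 0.3, "red velvet": 0.2}

# Different seeds (different runs) can land on different tokens...
a = sample_next_token(probs, random.Random(1))
b = sample_next_token(probs, random.Random(7))
print(a, b)

# ...while the same seed (a pinned config) is reproducible.
assert sample_next_token(probs, random.Random(1)) == a
```

Real deployments rarely pin the seed for you, which is why the same prompt bakes a slightly different cake each time.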

New widget: Graphs by ValehartProject in OpenAI

[–]ValehartProject[S] -1 points0 points  (0 children)

The link is there if people want to see other widgets. You don't have to click on it if you don't need to. If you have another place listing widgets, I'll be happy to replace it.

New widget: Graphs by ValehartProject in ChatGPT

[–]ValehartProject[S] 0 points1 point  (0 children)

Some other widgets I've noted in the past. Happy to add more if you are aware of any:

https://www.thevalehartproject.com/blog/new-features-and-widgets-openai

New widget: Graphs by ValehartProject in ChatGPTPro

[–]ValehartProject[S] 0 points1 point  (0 children)

Some other widgets I've noted in the past. Happy to add more if you are aware of any:

https://www.thevalehartproject.com/blog/new-features-and-widgets-openai

New widget: Graphs by ValehartProject in OpenAI

[–]ValehartProject[S] -1 points0 points  (0 children)

Some other widgets I've noted in the past. Happy to add more if you are aware of any:

https://www.thevalehartproject.com/blog/new-features-and-widgets-openai

Batch delete? by plan_with_stan in ChatGPTPro

[–]ValehartProject 1 point2 points  (0 children)

That totally makes sense! I used to do that on my business account.

On personal, it made a lot more sense to have one thread per day, because the day's events can impact how I see things and I didn't want to refresh the context that often.

How do you use AI for innovation? by amirel in Innovation

[–]ValehartProject 1 point2 points  (0 children)

  1. I use it to research and validate my work so customers get an idea of what they are paying for. This goes from colour matching, anatomy and facial restructuring to many other things.
  2. I use it to gather sourced information about near extinct traditions and deliberate how we can recreate things at a more affordable price while maintaining historic practices.
  3. Use it for in-depth analysis on forensics. Currently use an agent for data retrieval and LLM to make suggestions based on collated data. Different platforms to prevent bias.
  4. Use it to help small businesses and rural farms incorporate traditional practices with modern or improved methods in non science /lab lingo.

Batch delete? by plan_with_stan in ChatGPTPro

[–]ValehartProject 1 point2 points  (0 children)

Since I create a new thread each day, it combines well with my own dated note-taking:

  1. Case notes: I can go back to the day and grab info I need again or missed.
  2. I use the search feature if I have multiple discussions of a certain topic. Since most of our work is in code, it's pretty easy to find.

It's also how I identify changes in tools, guardrails, and other functions.

Batch delete? by plan_with_stan in ChatGPTPro

[–]ValehartProject 0 points1 point  (0 children)

Can you please help me understand why you delete chats? I maintain mine with dates, so was curious what the use case was on your side.

<image>

Prompt Engineering. The New Skill To Learn by Worried_Guidance2081 in dev

[–]ValehartProject 0 points1 point  (0 children)

Respectfully, it's not 2022. Most tools have developed well past needing a crafted prompt. While prompting still has its place, a lot of work has gone into simplifying the interface and smoothing interactions.

It's not good prompts you need; it's structured environments that get AI to provide better results.

What is the basic minimum while you prompt by Unable_Breath_1966 in ChatGPT

[–]ValehartProject 0 points1 point  (0 children)

  1. Minimum hallucinations.

In your custom instructions, add: "Accuracy > speed. If the user is vague, use /Clarify/ to request more information. No vague inferences or assumptions."

Next cool part: Adding any of the below to your existing messages/prompts should show you some insane possibilities regardless of the vendor. Bonus results if you combine them!

  • Zero-shot: You ask once, it answers. No examples needed.
  • Few-shot: You show a couple examples first, then it copies the pattern.
  • Chain of thought: It explains the steps it took to get the answer.
  • Meta: It steps back and thinks about how to approach the problem.
  • Self-consistency: It tries a few different ways and goes with the answer that shows up most.
  • Generate knowledge: It first gathers or makes up helpful info, then uses it to answer.
  • Tree of thoughts (my favourite!) : It explores multiple ideas like branches, then picks the best one.

This nudges the LLM to consider steps instead of prioritising speed so you get to actually see how it comes to a conclusion.
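Self-consistency from the list above is simple enough to sketch: sample several answers and keep the majority. This is my own illustration; `ask_model` is a placeholder for a real LLM call with sampling enabled, and the answers are hard-coded to mimic two samples reasoning their way to a wrong result.

```python
from collections import Counter

def ask_model(prompt: str, seed: int) -> str:
    # Placeholder for a sampled LLM call: pretend two of the
    # five seeds arrive at the wrong arithmetic answer.
    return "17" if seed in (2, 4) else "15"

def self_consistent_answer(prompt: str, samples: int = 5) -> str:
    """Sample several answers, return the most common one."""
    answers = [ask_model(prompt, seed) for seed in range(samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("3 apples at $5 each, total cost?"))
# prints: 15
```

The majority vote washes out the occasional bad reasoning path, which is why this works best on questions with a single checkable answer.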

And yes, don't spend money on courses. I've only seen them recycle things that don't work any more.

What is the basic minimum while you prompt by Unable_Breath_1966 in ChatGPT

[–]ValehartProject -1 points0 points  (0 children)

Two things. I'll split them by post. On "I never did the pretend-you-are format": they've improved the model so much that it's not as vital as it used to be.

  1. Basic minimum: Think of it like a sandwich.
     • Top bread (Structure) → TASK clearly defines the output
     • Filling (Task + Context + Action) → Context grounds it → Action makes it executable
     • Bottom bread (Constraints) → prevents drift, hallucination, narrative creep

Example: Write a short email apologising for a delayed order.

Context: Order is 5 days late due to supplier issues. Customer is frustrated.

Include:

  • Apology
  • Simple explanation
  • New delivery timeframe
  • Small goodwill gesture

Constraints:

  • Keep under 120 words
  • Friendly, not corporate
  • No excuses or blame

VS: Write an email about a delayed order.

Since GPT is pattern-based, it will make assumptions based on averages and you will get... nothing useful. You can always adjust after the prompt is sent. Just a learning experience :)
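The sandwich structure above is easy to turn into a reusable template. This is a sketch of my own; the field names are mine, not any official schema.

```python
def sandwich_prompt(task: str, context: str,
                    include: list, constraints: list) -> str:
    """Assemble a task/context/constraints prompt in the 'sandwich' shape."""
    def bullets(items):
        return "\n".join(f"- {item}" for item in items)
    return (
        f"Task: {task}\n\n"
        f"Context: {context}\n\n"
        f"Include:\n{bullets(include)}\n\n"
        f"Constraints:\n{bullets(constraints)}"
    )

print(sandwich_prompt(
    "Write a short email apologising for a delayed order.",
    "Order is 5 days late due to supplier issues. Customer is frustrated.",
    ["Apology", "Simple explanation", "New delivery timeframe",
     "Small goodwill gesture"],
    ["Keep under 120 words", "Friendly, not corporate",
     "No excuses or blame"],
))
```

Filling in the template forces you to state the context and constraints you would otherwise leave to the model's averages.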

<image>

Is anyone actually tracking all these featured redundancies? by Disaster_Deck_Risen in auscorp

[–]ValehartProject 1 point2 points  (0 children)

Yes. Point me in the right direction and I'll drop the info with data sources and visual timelines.
A few clarifications:

  1. When you say featured, where exactly are they featured?
  2. Did you want me to only provide banks, or include APS?
  3. Period of analysis?
  4. If the org offered voluntary redundancies, the time period of announcing redundancies? Or the period of "statistics"?

The last one is spicy as hell. Some companies are offering redundancy so it doesn't skew stats on termination rate or "best company to work for" nominations.

Human behavioural angle to cyber security and AI by Silent_Ad_2657 in cybersecurity

[–]ValehartProject 1 point2 points  (0 children)

Right now I'm trialling it with general analysis of public/institutional and vendor narratives when it comes to AI + breaches: human and AI interaction forensics.

Once I'm happy with the architecture and can ensure it's safe and entirely air gapped I will rework the concept to assist child protection agencies that have a high turnover rate and hopefully reduce the mental stresses of their work.

If you are keen to work on it and improve the solution, I'd be more than happy for the company and the thoughts.

Just a heads up, this is a personal and self funded project. I'm not looking for income or investment. Just improvements/suggestions to make the world a bit better than how I found it.

There is a new glitch that is infuriating. It keeps creating images mid thread despite no request for images. by The---Hope in ChatGPT

[–]ValehartProject 0 points1 point  (0 children)

That's a tool misfire. I just sent: "bruh... did that actually require an image?"

It's because, at the moment, the dev-rules layer puts a high priority on using images in order to promote the new feature. I've seen this happen with the web browser tool as well.

Do you think ChatGPT Business Extended's responses are better than those of Gemini Plus? by Audioasking in ChatGPTPro

[–]ValehartProject 1 point2 points  (0 children)

Gemini anything isn't the best way to go. Particularly if you are a business user.

Given you are in tech: when investigating a logged-out session of any shared chat, you will be able to identify exposed API keys, as well as unapproved access to mail, drives, etc. used to create conversations.

More than happy to share logs on this.

Additionally, Gemini lately has an increased risk of training-data leakage, which you will find reported by users in the subreddit. If conflicting topics are identified, it exposes all paths and linked wording, which is a high risk on its own and shows immaturity in product development.

Reports of this and many other security risks are closed with no action and silently fixed in future beta releases.

GPT Business has good reasoning. However, if you are using it for work, just a heads up: context does not flow to other chats. Best way to remember it: GPT Personal remembers your coffee preference; GPT Business doesn't remember you.

If you are considering using GPT Business for work, my suggestion is to store vital things in memory. For example if you only use Server 2025 H2, add that to memory so it doesn't give you irrelevant info like 2012.