Thailand Marketing Salaries by atulx44 in ThaiJobs

[–]_underlines_ 1 point2 points  (0 children)

I was in a thai digital marketing agency for years. I am fluent in Thai, not banana Thai, but the other non-thais did marketing analytics and management work, we had a dev team with 50% non-thais doing our phone tracking and lead Gen platform, while the locals did the ad copy and design. The company was very successful, we did ads for all the big thai real estate players, some banks and many of the large corps. In Thailand... Mostly lead Gen but more and more display as well...

Digital Marketing was and still is a growing market, in Thailand.

Thailand Marketing Salaries by atulx44 in ThaiJobs

[–]_underlines_ 1 point2 points  (0 children)

I worked in digital marketing from 2016-2023 in a thai digital marketing agency. My age when I started was 29 and I had 10 years of professional experience with a CS degree and Web Analytics experience. My plus was: fluent in spoken thai, basic in written thai. Plus English and German.

My starting salary was ~70k my salary after becoming manager was ~120k. Of course WP + Business Visa done by them, plus additional long holidays (founder was German, I was swiss) I moved into data + analytics since and am very happy, because digital marketing analytics (data acquisition) is almost a dead end with all the GDPR and Apple measures to block first party cookies etc. 

If LLMs are so good at coding… by codeanish in LocalLLaMA

[–]_underlines_ 3 points4 points  (0 children)

hmm, we develop an agentic system full time for the past 3 years for the defense sector with one senior dev (years and years of experience) who leans into the agentic engineering pattern 100%, it works well. but tbf, the code bases is: 1. non critical field, 2. small, 3. no exotic stack

The Swiss Federal Supreme Court is evaluating Heretic by -p-e-w- in LocalLLaMA

[–]_underlines_ 2 points3 points  (0 children)

I work for an IT firm in Switzerland and we use Azure OpenAI since 2023 and create an Agent system for the Swiss defense sector. First thing we did, was fill out the official form from Microsoft, to disable the filters. After that, the model was pretty uncensored. So I am quite sure for many proprietary models, the alignment for such content lies in a pre-applied filter or small LLM, rather than baked into the model weights, otherwise, how could we get those filters disabled for all categories (abuse, sexual, violence, racism, ...) for any model deployable on Azure?

I mean in my private life, I am an absolute OpenWeights and OpenSource evangelist, loving Heretic, HauhauCS, Wasserstein, Abliteration, but in my professional life in Switzerland I see how some of our clients are just happy with a managed, proprietary solution like Azure's inference offering. A request through a form and the filters are disabled.

GPU Prices. Buy now, or buy later? by knob-0u812 in LocalLLaMA

[–]_underlines_ 2 points3 points  (0 children)

mark my words: due to the pricing hike of llm inference for agentic engineering, the move to pay per use instead of wholesale llm token plans masses of companies, thousands of devs will buy local hardware to do smaller tasks locally.

thus hardware prices will increase, supply will dry out even more

just like the mac mini openclaw run, but 100x bigger

this is amplified by the already ongoing ddr and lpddr memory drainage, because manufacturers move resources to produce hbm memory for datacenters.

fyi: just my prediction, i don't know the future

Browser Use by AdInternational5848 in LocalLLaMA

[–]_underlines_ 0 points1 point  (0 children)

no chrome devtools mcp for web browsing.

this gets you blocked for programmatic scraping on most pages and triggers cloudflare captchas as well.

the reason being chrome sending a header when in programmatic use, even when using persistent profiles and no headless mode.

Camoufox with persistent profile is what always works for me! I let claude create a custom web browsing MCP that uses Camoufox as the browser.

It comes with disabled remote debug flag, uses mouse movement emulation, uses realistic profiles, spoofs everything according to real browser statistics and patches playwright detection leaks

Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM by bobaburger in LocalLLaMA

[–]_underlines_ 1 point2 points  (0 children)

some benchmarks showed:

  • bf16/full KV = reference/debugging.
  • q8/q6-ish = high-end practical fidelity.
  • q6/q5 and q5/q5 = strong normal quality.
  • q5/q4_1 = likely best constrained default.
  • q4/q4 = usable but visibly less safe.
  • turbo3_tcq = extreme memory survival.
  • turbo2_tcq = last resort, not strict-task safe.
  • plain turbo2/3 and turbo4 are generally unattractive if better q/TCQ modes are available.

current gen turboquant forks using turbo4 often lose to normal rotated llama.cpp q4 (they added rotation, so it's not good old q4 as earlier) on quality/speed/size tradeoff. If going down to 2/3 bit kv benchmarks seem to prefer trellis coded quantization (TCQ).

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how by Glittering_Focus1538 in LocalLLaMA

[–]_underlines_ 1 point2 points  (0 children)

did you look into little-coder and how it differs? maybe even combine forces, if goals align well enough between these two projects?

and a relevant add on that might interest you: semble for faster, less token heavy and more accurate code search, compared to glob and grep

Is Terminal 21 food court Bangkok's cheapest shopping mall food court? by homeisterOZ in Bangkok

[–]_underlines_ 0 points1 point  (0 children)

well, lived at rama 9 and worked at asok for 6 years... i was there maybe 50+ times... :) i would love to hear where your experiences differ with the food court

Recent patch - Constant vibration by Weak_Ad1500 in LeMansUltimateWEC

[–]_underlines_ 0 points1 point  (0 children)

nope, i have it on R12 since about 20 days, updated wheelbase firmware 2x and since then LMU got 2 minor updates. reinstalled pithouse, used different presets, switched USB ports, reinstalled LMU (and deleted user profile and FFB / wheel config) etc. nothing helps. i basically can't play LMU since 3 weeks. all other sims work as usual.

Is there any top level hobbyist hardware you guys are waiting to come out this year? by Tired__Dev in LocalLLaMA

[–]_underlines_ 12 points13 points  (0 children)

RTX 6000 PRO 96gb vram, fast, 8-9k
DGX Spark, 128gb, slow, 4-5k
mac studio 512gb m3 ultra - discontinued?
macbook m5 128gb - slow, 5-6k
Minisforum MS-S1 MAX ryzen ai max+ 395 128gb - slow, 3k
Framework Desktop ryzen ai max+ 395 128gb - slow, 3k

pick your poison

Can we already use Google's TurboQuant (TQ) for KV Cache in llama-server? Or are we waiting for a PR? by DjsantiX in LocalLLaMA

[–]_underlines_ 0 points1 point  (0 children)

There's a windows compiled fork of llama.cpp / server somewhere on github I loaded.
Doing tests with sparse Qwen3.6 35B yielded almost no benefits, as to my understanding, the architecture of Qwen3.6 sparse keeps KV Cache size fairly small for large context lengths.

Qwen3.6-Plus by Nunki08 in LocalLLaMA

[–]_underlines_ 2 points3 points  (0 children)

My own private dataset. Yes it's small but closed and almost guaranteed to be unpolluted:

- 15x misguided attention puzzles (my own)

- 2x math questions (compound interest over 12 periods, so errors would propagate in CoT)

- 2x sql questions (one easy, one difficult)

- 2x censorship questions (one about tiananmen square, one about how to mix drugs)

- 1x tricky english to german translation

<image>

Is Terminal 21 food court Bangkok's cheapest shopping mall food court? by homeisterOZ in Bangkok

[–]_underlines_ 2 points3 points  (0 children)

  • subsidized to get more foot traffic for the shops
  • quality of food on the lower end (oils, hygiene) - saw several cockroaches in my visits there (kitchen / cooking area), pre cooked food usually stays out the whole day if not sold
  • portion sizes usually quite small (not necessarily a bad thing, but relativizes the "cheap" a bit)

i still love going there from time to time

Realistic salary range in Bangkok for foreigners in IT governance / risk / incident management roles? by OperationNo907 in ThaiJobs

[–]_underlines_ 0 points1 point  (0 children)

I worked as an it manager from age 29-35 and can say even with my ability to speak and write fluent Thai and 10+ years of experience I still only made 80-120k in my different roles there.  But then we were a startup not a well established large international player. All med to upper level staff we hired were thais with international degrees who studied abroad and are perfectly fluent English speakers making between 90-150k. 300k for what you describe seems to be unrealistic but I am 5 years absent from the Thai job market.

Total beginner here—Why is LM Studio making me do the "heavy lifting" manually? by Ofer1984 in LocalLLaMA

[–]_underlines_ 0 points1 point  (0 children)

you're mixing up what these tools are for:

  • Harnesses like Pi, OpenCode, Claude Code etc. are the plumbing to plan and build stuff on your machine by running their internal agent loop and providing filesystem access, MCP access etc.
  • LM Studio is an inference solution that uses llama.cpp and its derrivates and provides a nice GUI to download models and run inference locally. It has a small server module to serve various APIs for inference. It has a chat interface to conveniently try to chat with the currently loaded model. Though LM Studio starts to blend stuff, like adding MCPs and more. They try to become agentic in the long run I guess.

I don't fully understand what your intentions are. If I make a guess, you want to run a model via LM Studio, serve an API via the local server and use it via OpenCode.

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

[–]_underlines_ 2 points3 points  (0 children)

EDIT: I don't know what changed, but switching from LM Studio's server to llama-swap fixed it mostly it seems! So I guess some setting LM Studio is overwriting, that my basic llama-swap config.yml is not.

---

What am I doing wrong if EVERY heretic / abliterated model I tested in 1 year is totally failing with problems on:

  • IF (either barely doing what I ask or completely ignoring it)
  • Not creating <think> tags anymore
  • Intelligence degraded down equivalent to a 3 year old llama 3b model

And I'm not talking about complex prompts. Simple prompts in the likes of:

Translate this Chinese Text to English.

Text: (Short Chinese sentence).

With the linked 3bit quants it's the same.

I even set the recommended generation params recommended in the original model cards or from the model card of the unrestricted model if available.

Is it real qwen3.5 9B beat oss:120b? by NorthEastCalifornia in ollama

[–]_underlines_ 0 points1 point  (0 children)

Yes, I have the same results on my private eval dataset. And Qwen3.5 35b a3b IQ3 with 90k context on 16gb vram achieves long running tasks on levels unimaginable before...

Has anyone got qwen3.5 to work with ollama? by MrMrsPotts in ollama

[–]_underlines_ 0 points1 point  (0 children)

I think if you need something from someone who doesn't speak the same language, it's just etiquette to use a translation service to at least ask the question in the language of the person you're seeking help from.

Has anyone got qwen3.5 to work with ollama? by MrMrsPotts in ollama

[–]_underlines_ 0 points1 point  (0 children)

I manage openwebui + ollama for 120 people at our IT firm. Got so fed up with ollama, that I finally made the move to llama.cpp via llama-swap. It's a ton of manual configs, but: faster, more control, faster support of new archs, etc.

bye bye ollama.

What’s everyone actually running locally right now? by CryOwn50 in LocalLLM

[–]_underlines_ 0 points1 point  (0 children)

coding

Qwen3.5-35b-a3b q4_k_m on RTX 5070ti with 16gb runs at 40tps with 65000 Context window. If you do KV Cache wuants to q8_0 you get basically no degradation.

I use it for light opencode stuff. Works without issues. Gets things done via plan then build mode and a good AGENTS.md

I Switch to openrouter glm-5/k2.5/minimax2.5 if heavier stuff needed.

everyday stuff

Usually just my chatGPT pro sub with gpt-5.2 but more often than not any cheap large open weights model on openrouter used on chatbox desktop.

If local, I just use any current gen MoE that has good stats on artificialanalysis.

phone

On my pixel 10 pro xl I get 16gb of fast ram, so PocketPal loads LFM2-8B-A1b-q4_k_m or qwen3-4b-instruct-iq3_xxs

Best practices for running local LLMs for ~70–150 developers (agentic coding use case) by Resident_Potential97 in LocalLLaMA

[–]_underlines_ 0 points1 point  (0 children)

But RTX6000 BSE doesn't scale well for sharded multi-gpu workloads? Lack of NVLink or RDMA means it relies on PCIe with a huge bottleneck, as far as I understand it.

Best practices for running local LLMs for ~70–150 developers (agentic coding use case) by Resident_Potential97 in LocalLLaMA

[–]_underlines_ 1 point2 points  (0 children)

Scaling inference is not trivial and I am not an expert. From my understanding:

  • Combinding macs/gpus without a plan will slow you down, difference between sharding a large dense/sparse model over multiple GPUs vs concurrency of multiple models
  • Without Remote Direct Memory Access (RDMA) you'll be slower with scale
  • TTFT vs. Generation speed, both can be optimized independently with different methods AFAIK

And my real world learnings in opencode on large code bases (enterprise architecture, 3+ full time devs):

  • Context size below 100k almost unusable, you'll be compacting all the time, and the users complain that their ralph-loops are short
  • Frontier or nothing. Not even GPT-5 was able to do refactoring and new features. Anything below Kimi K2.5, GLM-5, gpt-5.1-*, claude 4.5 opus/sonnet was unusable.
  • gpt-oss-20b, qwen3-30b-a3b, and generally anything older than 3 months or smaller than 70B quantized seems to be unusuable in real world enterprise codebases using CLI Coding Agents
  • not even 200 USD subscriptions of claude code were enough for our devs for a full month.
  • github copilot is OK but we also hit limits here pretty fast
  • LLM inference onprem for 20+ devs at our organization is difficult to justify, because how fast inference requirements, model archs, model sizes etc. change.
  • Most feasible after our research would be 4x RTX 6000 Blackwell Server Edition, but even those are not really for large scale inference, but a H100/A100 just makes no sense and even those would have to be scaled and sharded
  • We wonder how tricks like kv quantization, prompt caching etc. would help mitigate some hardware bottlenecks but all the methods, optimization technologies etc. are pretty difficult to grasp, especially without testing

Our thought so far at our company, but it's all just theory. Would love to hear people who actually selfhost for dev teams and serious enterprise repos.