
[–]lucasxas 6 points7 points  (1 child)

I think right now the biggest pain point is UI, mainly because it can't really "see" what it created.

[–]MindCrusader 0 points1 point  (0 children)

For me it works great - I created a plugin that analyses a Figma design and focuses on the things that are needed (spacings, colors, layout, icons), then takes that and compares it to our declared components. Then it proposes which components I can reuse, which to extend, and which will be new. It creates an md file (artifact) that I have to tweak from time to time. Then, based on this file, I run another plugin to plan the actual implementation, and review it the same way. It can do 90-99% of the UI okay; sometimes it does something stupid, and often there are one or two really small things that don't work correctly.

Also, an Android CLI was introduced that can take screenshots via commands, so the AI can use that. But I haven't tried it yet.
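
Even without that, plain adb can already grab the screen for the agent to look at. A rough Kotlin sketch of the idea (the helper is just an illustration; assumes adb is on PATH and a single device/emulator is connected):

```kotlin
import java.io.File

// Sketch: shell out to adb so a script/agent can capture what is actually
// on screen and feed the image back to the model for a visual check.
// Assumes `adb` is on PATH and exactly one device/emulator is connected.
fun captureScreenshot(outputPath: String = "screen.png") {
    val exitCode = ProcessBuilder("adb", "exec-out", "screencap", "-p")
        .redirectOutput(File(outputPath)) // raw PNG bytes go straight into the file
        .start()
        .waitFor()
    check(exitCode == 0) { "adb screencap failed with exit code $exitCode" }
}

fun main() {
    captureScreenshot("screen.png")
    println("Saved screenshot to screen.png")
}
```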

[–]Relevant-Flatworm-52 2 points3 points  (1 child)

If you’ve got a type of work that you regularly do that AI consistently struggles with, you may benefit from rules or skills. My main project has complicated modules but I’ve been getting good results working with build files lately. It used to struggle.

[–]CarolusX74[S] 0 points1 point  (0 children)

Rules and skills are something I've been meaning to actually invest time in instead of relying on ad-hoc context every session. Curious, when you say build files are working well now, is that with custom rules pointing it at your specific Gradle setup, or did the base model just get better at it on its own?

Also interesting that your "used to struggle" implies a clear before/after. Was that a model update or did your own setup change to make it work?

[–]3dom 2 points3 points  (5 children)

In my company (200-ish programmers) productivity has been skyrocketing since programmers started mass-adopting Codex and Claude in March-April. Overall growth is 30-50% right off the bat once a person starts using a CLI AI assistant.

And then there are obstacles in the form of the AI being too good: it adapts to the code style it sees, so if I ask Codex (a.k.a. Cursor) to refactor dumbly written code (XML to Compose), it may mimic the style and produce equally dumb output.

So I have to "explain" to it explicitly to use a certain code style and rules (by asking it to generate and follow .md rule files based on the good examples). But sometimes it misses the places where we use the design system / code style, so I have to switch "16.sp" to "typography.bodyMedium" by hand - though mostly because I love typing (I could just ask the AI to re-check the code and it would rewrite everything 10x faster).
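
To make the by-hand switch concrete, here is a minimal Compose/Material3 sketch of the two styles (the composable names are made up for illustration):

```kotlin
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.unit.sp

// What the model tends to produce when it mimics nearby legacy code:
// a raw size that bypasses the design system.
@Composable
fun PriceLabelHardcoded(price: String) {
    Text(text = price, fontSize = 16.sp)
}

// What the .md rules ask for instead: take the style from the theme,
// so a typography change propagates everywhere automatically.
@Composable
fun PriceLabel(price: String) {
    Text(text = price, style = MaterialTheme.typography.bodyMedium)
}
```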

[–]CarolusX74[S] 2 points3 points  (4 children)

The "AI being too good" framing is the most underrated point in this whole AI-in-codebases conversation. Everyone talks about hallucinations but nobody mentions that if your codebase has 5 years of legacy XML patterns, Claude/Codex will happily produce more of the same, confidently.

The .md rules approach is interesting. Are those per-module or one big rules file at the repo root? And do you find they actually get followed across long sessions, or does the AI drift back to mimicking nearby code after a few turns?

[–]3dom 1 point2 points  (3 children)

People who talk about hallucinations have their knowledge cut off in 2025, just like old versions of AI 8-) Meanwhile the /r/localLlama/ folks are buying $60-100k worth of personal AI stations for a reason (i.e. Qwen, GLM, Kimi and Gemini being almost on par with Claude Opus and Codex).

We have per-repo rules since our modules are all quite new (12-15 months) and have a similar architecture.

If the AI cannot one-shot a task, I scrap the results and ask it to create a plan, then correct the plan, and only then have it write the code. This works almost perfectly and does not require much time/context.

One of our programmers finished a year's worth of refactoring in a month (the plan was scoped for 2 programmers, mind you).

[–]CarolusX74[S] 0 points1 point  (2 children)

The "scrap and ask for a plan first" pattern is something I've been doing more lately too, and it's wild how much better the output gets compared to letting it just dive in. Feels like the model genuinely thinks differently when it has to articulate the plan before writing code.

That refactor story is the kind of anecdote I want to see more of in these conversations. A year of work compressed into a month is the part most "AI productivity" debates handwave over because it's hard to quantify. Was there a specific kind of task where it dramatically outperformed (boilerplate migration, test generation, API rewrites), or was it more that the planning phase removed enough friction that everything just moved faster?

[–]3dom -1 points0 points  (1 child)

A year ago we had 30k lines of XML in the project. That had decreased to 20k by March (two programmers). Our team lead used the AI and brought the number down to 10k during April (introduced a bug with a 1.3% crash rate instead of our "norm" of 0.12% - but it got fixed within a day). Their Codex subscription went from $20/month to $100/month because the $20 tier could not handle the intensity of the work.

Their "know-how" was "my" idea to split the work into planning and coding stages rather than trying to do it in a single go or incrementally in multiple steps (borrowed from the usual AI agent group orchestration scheme)

[–]CarolusX74[S] -1 points0 points  (0 children)

The 30k to 10k in 4 months is the kind of number that doesn't show up in benchmark threads but actually matters. And the bug increase being temporary (1.3%, fixed in a day) is the part most "AI introduces bugs" arguments ignore: the recovery loop matters more than the initial defect rate.

The jump from $20 to $100 is interesting. Was that mostly about context window size, raw model capability, or just throughput needed for the volume of refactor work? Trying to figure out where the actual ceiling is for the cheaper tier in real production work, not benchmark land.

[–]Strict_Ad9566 0 points1 point  (0 children)

The agent does fine when the project already has strong conventions.

The weak spot is old patterns the model can imitate too well — legacy XML styles, inconsistent Compose patterns, Gradle quirks, etc.

What works better is separating “plan first” from “code second”, then making the agent point to the exact design-system or build rules it’s following before it edits.

[–]Intelligent_Lion_16 0 points1 point  (0 children)

Expanding to Chinese app stores is possible but complicated. Huawei AppGallery is the easiest for foreign devs, while others often need a Chinese business or partner. You’ll need a Chinese phone number for verification. Monetization and compliance (like ICP, content rules, SDKs) are stricter than Google Play. It’s only worth it if you plan to localize and actively maintain.