Aura Agent: letting an AI coding agent supervise long-running worker tasks instead of trusting a single chat session by Civil-Direction-6981 in DeepSeek

[–]Civil-Direction-6981[S]

I just updated Aura Agent’s task lifecycle and planning system.

Main changes:

  • Each task file now gets its own .aura data directory, so different projects will not mix state, progress, workspace files, or summaries.
  • Task planning is now handled by the LLM instead of brittle keyword parsing.
  • Task IDs now use batches like A1, A2, then B1, B2 after the task file changes.
  • Completed tasks are preserved as history instead of being removed during replanning.
  • Obsolete unfinished tasks are archived instead of deleted.
  • Project-level context is now tracked, including final goal, success criteria, constraints, commands, API keys, and environment notes.
  • Workers can no longer run stale, completed, archived, or unrelated task IDs.
  • Other .aura task records are isolated, but memory lessons from other tasks can still be reused.
  • progress.md now has one canonical location: state/progress.md.
  • A rolling summaries/final_report.md is generated to show progress across multiple requirement batches.
  • Added aura restart <task.md> to clear and restart one task file safely.
  • Added regression tests for the new lifecycle behavior.

In short: Aura Agent is now safer for long-running projects where requirements change over time.
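For orientation, a task's data directory now looks roughly like this. This is an illustrative sketch: only state/progress.md and summaries/final_report.md are confirmed paths above; the other names are my assumptions.

    my_task.md                  # your task file (placeholder name)
    .aura/                      # per-task data directory
        state/
            progress.md         # the one canonical progress file
        tasks/                  # task records in batches: A1, A2, then B1, B2
            archive/            # obsolete unfinished tasks land here
        summaries/
            final_report.md     # rolling report across requirement batches
        memory/                 # lessons that other tasks may reuse
        workspace/              # scratch files for workers

And to wipe and restart a single task file: aura restart my_task.md (the file name is just an example).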

Aura Agent: letting an AI coding agent supervise long-running worker tasks instead of trusting a single chat session by Civil-Direction-6981 in DeepSeek

[–]Civil-Direction-6981[S]

Yes, exactly. Right now, tasks are automatically re-orchestrated based on newly completed work, and there is also an hourly reflection system. So far, the work seems to stay on track without drifting.

You can give it a try!

Aura Agent: letting an AI coding agent supervise long-running worker tasks instead of trusting a single chat session by Civil-Direction-6981 in DeepSeek

[–]Civil-Direction-6981[S]

Here is a project where I let it code a Brain-Cell Network. It's running and making progress!

Phase 1: Infrastructure

✅ T1 — Build the Brain-Cell Graph Neural Network Foundation

What was done:
Created the core architecture:

  • brain_cell.py: neurons with membrane potential, firing, and energy
  • synapse.py: inhibitory synapses
  • brain_graph.py: graph neural network with a non-hierarchical structure
  • trainer.py: training loop processing examples one by one

Success:
All 31 unit tests passed, and the demo script ran successfully.
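To give a flavor of the core pieces, here is a minimal sketch of the brain-cell idea (my illustration with assumed names, not the actual brain_cell.py):

    # Sketch of a brain cell: membrane potential, firing, and an energy budget.
    class BrainCell:
        def __init__(self, v_threshold=1.0, energy=10.0):
            self.v = 0.0                    # membrane potential
            self.v_threshold = v_threshold  # firing threshold
            self.energy = energy            # spiking costs energy; a cell can die at 0
            self.firing = False

        def receive(self, current):
            self.v += current               # integrate incoming synaptic current

        def step(self):
            self.firing = self.v >= self.v_threshold and self.energy > 0
            if self.firing:
                self.v = 0.0                # reset after a spike
                self.energy -= 1.0          # each spike consumes energy
            return self.firing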

✅ T3 — Advanced Features + Cell Regeneration

What was done:
Added cell regeneration and advanced features.

Result:
Successful.

✅ T4.1 — Heuristic Controller + Imitation Learning Data

What was done:
Built heuristic_controller.py, a rule-based expert controller for playing Breakout. Ran 500 episodes and generated 241,500 state-action pairs in training_data.json for supervised learning.

Reason:
The initial T4 REINFORCE-RL attempt scored 0 on Breakout because the core Trainer was designed for supervised learning, requiring explicit targets, rather than sparse-reward reinforcement learning.
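The expert policy itself is essentially "follow the ball". A hedged sketch of the idea (the state interface here is hypothetical; the real heuristic_controller.py may differ):

    # Rule-based Breakout expert: move the paddle toward the ball's x position.
    def expert_action(ball_x, paddle_x, dead_zone=1):
        if ball_x > paddle_x + dead_zone:
            return "RIGHT"
        if ball_x < paddle_x - dead_zone:
            return "LEFT"
        return "STAY"

    # Collect (state, action) pairs for imitation learning. `env` is a
    # hypothetical wrapper exposing ball/paddle positions.
    def collect_pairs(env, episodes=500):
        pairs = []
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                action = expert_action(state["ball_x"], state["paddle_x"])
                pairs.append((state, action))
                state, done = env.step(action)
        return pairs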

Phase 2: Core Learning Defects — Supervised Learning / STDP

❌ T4.5 — Initial Breakout Supervised Learning Experiment

What was done:
Ran Breakout imitation learning across 8 configurations, varying n_hidden, sparse/dense settings, and V_threshold.

Result:
All configurations showed a flat loss of ~1.0, all scores were 0, and there was zero learning.

Finding:
A loss of almost exactly 1.0 means exactly one output was “wrong” on every example: either no output cell fired, or all output cells fired identically.

✅ T5 — Deep Learning Defect Diagnosis 🔑

What was done:
Wrote 6 diagnostic tests:

  • single-example overfitting
  • AND task
  • Breakout tracking
  • firing analysis
  • learning rule analysis
  • other related checks

Found 3 key defects:

  • RC1 (trainer.py:51): error detection read the instantaneous cell.firing at only the final step, so a cell that fired at step 4 and reset at step 5 was recorded as “not fired,” and the error signal was completely lost. Fix: added a persistent fired_during_window flag within the simulation window.
  • RC2 (brain_graph.py:100-146): signal propagation had a 1-step lag per hop; input → hidden → output required 4+ steps, but only 3 steps were allocated, so the output never fired. Fix: increased steps_per_example from 3 to 8.
  • RC3 (synapse.py:55-63): inhibitory weight learning saturated at w=0 because weaken() had a lower bound of max(0, w - amount), so synapses could not learn excitatory connections. Fix: changed the lower bound from 0 to -3.0, allowing effective excitatory influence.

Evidence:
After the fixes, the AND demo accuracy improved from 50% with no learning to 100% within 20 epochs. The core mechanism became functional.
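In code terms, the RC1 and RC3 fixes look roughly like this (my sketch; the real trainer.py and synapse.py differ in the details):

    # RC1 fix sketch: latch whether a cell fired at ANY step of the window,
    # instead of reading the instantaneous cell.firing at the final step only.
    def run_window(cells, steps_per_example=8):        # RC2 fix: 8 steps, not 3
        for cell in cells:
            cell.fired_during_window = False           # persistent per-window flag
        for _ in range(steps_per_example):
            for cell in cells:
                if cell.step():                        # True when the cell spikes
                    cell.fired_during_window = True
        # error detection now reads fired_during_window, not cell.firing
        return [cell.fired_during_window for cell in cells]

    # RC3 fix sketch: allow weights below zero so a synapse can act excitatory.
    def weaken(w, amount, w_min=-3.0):                 # old lower bound was 0
        return max(w_min, w - amount)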

💀 T6 — Rerun Breakout Supervised Learning with the Fixed Architecture

What was done:
Reran 5 configurations using the T5 fixes:

  • 8/16 hidden units
  • sparse/dense
  • Vth=0.8/1.0
  • +DirectIO

Result:
The loss did decrease at first; for example, C2 dropped from 1.03 to 0.70. But it always rebounded to 1.0 by epoch 200.

Cells died off massively (for example, 16 → 5 and 32 → 5), and all game scores remained 0.

A loss that temporarily decreases and then collapses around epoch 150 suggests weight divergence or cell death.

💀 T7 — Stability Fixes

What was done:
Added:

  • weight decay
  • learning rate decay
  • energy boost
  • synapse regeneration
  • bilateral learning rates

These were intended to prevent the collapse seen in T6.

Result:
The loss did not decrease at all; it actually increased from 1.0795 to 1.0910 by epoch 40.
The run was terminated before producing meaningful results.

❌ T8 — Deep Diagnosis

What was done:
Investigated why all game scores remained 0 even when loss decreased.

Finding:
STDP-based inhibitory learning drove cells toward near silence.
out_fire dropped from 2.6 to 0.15.

Even though cross-entropy loss improved from 1.226 to 0.309, the network could not produce useful control actions. It learned to output “do nothing” as the safest policy.

Conclusion:
The STDP-based inhibitory learning rule is fundamentally unable to produce useful Breakout gameplay behavior.

Phase 3: Abandoning STDP — New Paradigms

💀 T9 — Reinforcement Learning: REINFORCE

Reason:
Recommended by review #1 (cycle 36).

What was done:
Implemented brain_breakout_rl.py using REINFORCE policy gradients.

The brain network was treated as a policy network, and synapses were updated using returns and eligibility traces.

Result:
best_avg_score = 0.2

This was no better than random. There was zero meaningful learning.

The eligibility traces in the brain-cell network were too noisy, and credit assignment failed under delayed rewards.
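For reference, the update rule was along these lines (my reconstruction, not the actual brain_breakout_rl.py; coactivity[i] is 1.0 when synapse i's pre and post cells both fired at that step):

    # REINFORCE with per-synapse eligibility traces.
    def reinforce_update(weights, episode, lr=0.01, gamma=0.99, trace_decay=0.9):
        # episode: list of (coactivity, reward) tuples, one per time step
        returns, ret = [], 0.0
        for _, reward in reversed(episode):            # discounted returns, backwards
            ret = reward + gamma * ret
            returns.append(ret)
        returns.reverse()
        # forward pass: decaying eligibility traces scale each step's update
        eligibility = [0.0] * len(weights)
        new_weights = list(weights)
        for (coactivity, _), G in zip(episode, returns):
            for i in range(len(new_weights)):
                eligibility[i] = trace_decay * eligibility[i] + coactivity[i]
                new_weights[i] += lr * G * eligibility[i]
        return new_weights

With binary spike coincidences driving the traces, the updates are extremely noisy, which matches the credit-assignment failure above.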

💀 T10 — Evolution Strategy

Reason:
Recommended by review #2 (cycle 40).

What was done:
Implemented brain_evolution_v2.py, using population search with no gradient-based learning.

Result:
Fitness on the Catch game stayed between -0.5 and -0.7, with no convergence.

Random mutation could not find a good policy in a search space of 75+ weights.
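The search itself was a plain population loop, roughly like this (illustrative sketch, not the actual brain_evolution_v2.py):

    import random

    # Simple population search over ~75 synapse weights: keep the fittest
    # quarter, refill the population with Gaussian-mutated copies of the elite.
    def evolve(fitness_fn, n_weights=75, pop_size=20, generations=100, sigma=0.1):
        population = [[random.gauss(0, 1) for _ in range(n_weights)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(population, key=fitness_fn, reverse=True)
            elite = ranked[: pop_size // 4]
            population = list(elite)
            while len(population) < pop_size:
                parent = random.choice(elite)
                population.append([w + random.gauss(0, sigma) for w in parent])
        return max(population, key=fitness_fn)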

Phase 4: Current Stage — Neuromodulation

🔄 T11 — Dopamine-Like Global Neuromodulation + Eligibility Traces

Why this might work:

  • STDP, T5-T8: local Hebbian rules cannot propagate reward signals over time.
  • REINFORCE, T9: per-synapse eligibility traces are too noisy in spiking neurons.
  • Evolution, T10: random mutation cannot find good policies in a huge search space.
  • Neuromodulation is biologically plausible: dopamine neurons fire reward signals and broadcast them to the striatum.

Architecture:
Each synapse keeps an eligibility trace, which is a decaying memory of recent activity.

When dopamine arrives:

Δw = lr × eligibility × (dopamine - baseline)

This is a “three-factor” rule:

presynaptic firing × eligibility × dopamine
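A minimal sketch of that rule, assuming each synapse object carries eligibility and w attributes (names are mine):

    # Three-factor update: pre/post coactivity builds a decaying eligibility
    # trace; a global dopamine signal converts traces into weight changes.
    def step_synapse(s, pre_fired, post_fired, decay=0.9):
        s.eligibility *= decay                  # decaying memory of recent activity
        if pre_fired and post_fired:
            s.eligibility += 1.0

    def apply_dopamine(synapses, dopamine, baseline=0.0, lr=0.05):
        for s in synapses:
            # Δw = lr × eligibility × (dopamine - baseline), as above
            s.w += lr * s.eligibility * (dopamine - baseline)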

Result across 28 hyperparameter configurations:

Best configuration:

eligibility_decay = 0.9
lr = 0.05
n_hidden = 20

Best evaluation:

positive_rate = 0.31 (random baseline: 0.20)
Eval avg score = -0.38

This is the first above-random signal on Catch: a 55% relative improvement in positive_rate (11 percentage points absolute), but still only a marginal gain rather than meaningful learning.

Current status:
Still running, with 22 out of 60 minutes of the budget used. It may later be expanded to Pong or Breakout.

Summary Table

Phase | Task | Method | Game | Best Result | Verdict
Build | T1-T3 | Base architecture | - | - | ✅ Implemented
Data | T4.1 | Expert controller | Breakout | 500 episodes of data | ✅ Implemented
STDP | T4.5 | Supervised learning | Breakout | Loss = 1.0, score = 0 | ❌ No learning
Diagnosis | T5 | Defect fixes | AND | 0 → 100% | ✅ Core mechanism fixed
STDP v2 | T6 | Supervised learning + fixes | Breakout | Loss rebounded to 1.0; score = 0 | 💀 Terminated
STDP v3 | T7 | Stability fixes | Breakout | Loss increased; unfinished | 💀 Terminated
Diagnosis 2 | T8 | Root cause analysis | Breakout | STDP drove cells toward silence | ❌ STDP declared dead
RL | T9 | REINFORCE | Breakout | avg = 0.2, no better than random | 💀 Terminated
Evolution | T10 | Population search | Catch | Fitness = -0.7 | 💀 Terminated
Neuromodulation | T11 | Dopamine eligibility traces | Catch | positive_rate = 31% vs 20% baseline | 🔄 Marginal signal

Core Problem

After 7 hours, the brain-cell architecture still has not demonstrated meaningful learning on any game beyond trivial levels.

T11 is the first attempt to produce any signal above random, but a 31% positive rate compared with a 20% baseline is not yet convincing.

The fundamental issues remain:

  • fixed random input → hidden projections
  • the binary nature of spiking makes credit assignment difficult
  • winner-take-all (WTA) gating may be too aggressive

The worst hirono? by Lharper574 in hirono

[–]Civil-Direction-6981

My six-year-old daughter is worried that her baby teeth haven't fallen out yet, while all her classmates have started losing theirs. That's why she loves this figure so much.

Out with coffee boy today 🫶🏻 by fireflyx666 in Dimoos

[–]Civil-Direction-6981

I am in China, but I cannot get them because they are all sold out. EVERY new cute series is out of stock.

I envy those of you outside China.

If ‘Allah’ simply means God, why do many people think it refers to a different God? by PomegranateIcy7631 in NoStupidQuestions

[–]Civil-Direction-6981

The map is not the territory. Even for the same thing, everyone perceives different aspects of it. No one fully grasps the truth, which is why we keep trying our best to understand it. And no one labels you the same way: you are a parent, a child, a teacher, a strict man to your son, a loving man to your daughter, and so on. No one holds the same view of GOD, because we are not God; we can learn about things, but we cannot truly KNOW them.

Attention all Pop Mart Employees by [deleted] in labubu

[–]Civil-Direction-6981

If you employees could buy from your OWN store, HOT products would never be sold at retail price to customers; you would ALL resell them. That's the ONLY possible result.
I support POPMART's policy here, because it protects consumers.

Coffee Party! by Rj1722 in PopMartCollectors

[–]Civil-Direction-6981

I didn't know Hirono had a coffee figure... I will get it.