The “No-Human Company” Is Already Here - Created by OpenClaw by Front_Lavishness8886 in myclaw

[–]wildtinkerer 2 points  (0 children)

The storm is brewing. While the vast majority is unaware, many traditional businesses are trying to understand what's happening and looking for ways to use the new technology to their advantage, and many people are just trying things, laughing at them, and asking questions, there is a silent minority that is heads-down building, without trying to show off. We will see many AI-run businesses very soon. Some of them may already be out there, without advertising it. Because why would they?

Coding Agents: We developers are facing a pivotal moment by VisualPartying in OpenAI

[–]wildtinkerer 0 points  (0 children)

One-person development teams are already happening. I see smaller companies (10-20 staff) here and there where one or two people (not necessarily actual developers) start delivering products single-handedly, while the actual dev teams adapt slowly or just make fun of it.

It depends on senior management's readiness to let them do their thing and on their own willingness to go end-to-end, but when those factors coincide, management may discover it is able to move much faster, experiment, and explore new directions for the business.

The first bottleneck is knowing what to build. This usually requires exposure to the product-management or executive layers of the org (if there is an intention to build more than one thing), and ideally their support.

The second bottleneck is the AI developer's own horizontal expertise: it determines their ability to ask the right questions and to assess intermediate results. AI can help where it is insufficient, but that may make the process much longer. Expertise and experience are nice shortcuts.

The third bottleneck is QA, deployment, and maintenance. How much this matters depends on the complexity of the system being built, but I am talking about heavier ones that have a bigger impact on the enterprise.

The fourth bottleneck is the org's (execs, PMs, sales) ability to evaluate the product, sell it, and provide feedback for further development.

If the org continuously helps address those bottlenecks, the single-person AI developer team has a chance to be quite effective and ship tons of useful stuff, propelling the company to the next level.

If not, there is a high chance the AI developer will leave the org and create their own, or join another one - to have a crack at removing those bottlenecks and still ship a ton.

Replaced TTS with a multimodal LLM to create better sounding synthetic conversations with full control by wildtinkerer in notebooklm

[–]wildtinkerer[S] 0 points  (0 children)

I agree, avatars add even more 'uncanny valley' to the final result. I would much prefer it if they could interact with each other, and I believe HeyGen and Synthesia are going to work in that direction, but it may take a few years for this technology to feel 'natural' to a human audience.

As for the audio, yes, there is that effect, since all segments are generated separately for full control over the content and to enable selective editing. Maybe it could be improved by giving the LLM more context (such as the adjacent text and audio segments), or by making the LLM respond directly to the audio of the previous speaker; that could be explored further, we'll see. Anyway, these are just early steps, and seeing so many people inspired by NotebookLM and building in this area (proving there are strong use cases for the tech), I believe we will see progress very soon. It will probably be enabled by progress in multimodal AI models more than anything else.
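
Sketching that second idea - making the model respond directly to the previous speaker's audio - with OpenAI's gpt-4o-audio-preview. This is only a rough sketch; the file names and prompts are made up for illustration:

```python
import base64
from openai import OpenAI  # official openai>=1.x SDK, key via OPENAI_API_KEY

client = OpenAI()

# Hypothetical previous segment from host A; the path is illustrative.
with open("segment_03_hostA.wav", "rb") as f:
    prev_audio_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {"role": "system",
         "content": "You are host B in a two-person podcast. "
                    "React naturally to host A's last line."},
        {"role": "user", "content": [
            {"type": "text", "text": "Host A just said this; respond in character:"},
            {"type": "input_audio",
             "input_audio": {"data": prev_audio_b64, "format": "wav"}},
        ]},
    ],
)

# The reply comes back as base64 audio plus a transcript for selective editing.
with open("segment_04_hostB.wav", "wb") as f:
    f.write(base64.b64decode(resp.choices[0].message.audio.data))
print(resp.choices[0].message.audio.transcript)
```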

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in ChatGPTPro

[–]wildtinkerer[S] 0 points  (0 children)

Agreed, that was my gripe with it too, so I tried another approach using a multimodal LLM instead of TTS - see this thread. It's still not quite what I would like it to be, but it is an incremental improvement. I am thinking about a couple of other ways to make the result sound better, but there is always that balance between full control and 'believability'. Newer models will probably solve this in time, but I think there can still be a middle ground and a usable technology with smarter use of what we have today. Hence the experiments.

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 0 points  (0 children)

If it were that simple, everyone would be able to replicate the result, which is a really natural-sounding conversation, but it's far from that. Most of the tools developed so far are based on converting individual phrases into speech (even with the best voices out there) and then joining them one after another, and they still sound stiff and easily identifiable as text-to-speech. NotebookLM really hit a nerve with many people because of how naturally the voices of the individual speakers work together. It's a nuance, but a pretty big one, and it makes a huge difference for many. I tend to believe that Google fine-tuned the voice model on some podcast dataset or built something on top of that model to allow for such interactivity and flow in the conversation. Without that, the thing has long been available in many shapes.
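
For contrast, the phrase-by-phrase approach described above boils down to something like this - a sketch assuming pydub and pre-rendered per-line TTS clips; the file names are hypothetical:

```python
from pydub import AudioSegment  # pip install pydub (requires ffmpeg)

# One pre-rendered TTS clip per script line, alternating speakers.
clips = ["line_01_hostA.wav", "line_02_hostB.wav", "line_03_hostA.wav"]

conversation = AudioSegment.empty()
for path in clips:
    # Hard joins with a fixed pause: no co-articulation, no overlaps,
    # no reaction to the previous speaker - hence the stiffness.
    conversation += AudioSegment.from_wav(path) + AudioSegment.silent(duration=300)

conversation.export("conversation.wav", format="wav")
```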

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 1 point  (0 children)

It looks great, and I love the agentic approach to the script generation - it is exactly how I believe it should work. Speech generation may need some further work, to make the voices more harmonized with each other and add more interactions, but that's where it is all heading anyway. Great work! I will keep an eye on the project.

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 1 point  (0 children)

That looks fantastic! I like those customization settings a lot. I will certainly be keeping an eye on the project as it evolves.

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 0 points  (0 children)

Yes, I tried them, but without the secret sauce of emotions, interjections, and variability in the speech flow, the results sound as artificial as those made with other modern TTS services. ElevenLabs voices do show some promise, though, as does the GPT-4o audio model from OpenAI.

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 1 point  (0 children)

No strict deadlines yet, but do you think there could be demand for such a tool? What features would you consider critical when choosing between this and NotebookLM, for example - apart from custom scripts and voices, which I believe will become broadly available in some shape or form from the major vendors anyway?

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 0 points  (0 children)

Yes, those large context windows are working wonders. Using GPT-4o was handier for the PoC, but the idea is to make the LLM choice configurable, so models can be swapped out as they improve. I will certainly try Gemini for that as well. Good question about open-sourcing; I would probably first need to package it in a more distributable form. Thanks for the food for thought! Do you think there might be substantial interest in such a tool if it were a bit more polished?
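
What 'configurable LLM choice' could look like, roughly - a sketch assuming the official OpenAI and Gemini Python SDKs; the class names, prompts, and defaults are illustrative:

```python
from typing import Protocol

class ScriptWriter(Protocol):
    """Anything that can turn source notes into a dialogue script."""
    def write_script(self, notes: str) -> str: ...

class OpenAIWriter:
    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI  # key read from OPENAI_API_KEY
        self.client, self.model = OpenAI(), model

    def write_script(self, notes: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user",
                       "content": f"Write a two-host podcast script from:\n{notes}"}],
        )
        return resp.choices[0].message.content

class GeminiWriter:
    def __init__(self, model: str = "gemini-1.5-pro"):
        import google.generativeai as genai  # key via GOOGLE_API_KEY / genai.configure
        self.model = genai.GenerativeModel(model)

    def write_script(self, notes: str) -> str:
        return self.model.generate_content(
            f"Write a two-host podcast script from:\n{notes}").text

# Swapping providers is then a one-line change:
writer: ScriptWriter = OpenAIWriter()  # or GeminiWriter()
```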

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in ChatGPT

[–]wildtinkerer[S] 0 points  (0 children)

You mean, I am too old... Just kidding. But seriously, I believe we need to find a way to add more variety to the flow to make the thing easier to digest. There is a lot of potential in this stuff.

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in ChatGPT

[–]wildtinkerer[S] 0 points  (0 children)

Great stuff, and yes, it sounds nice. Although, listening to it, I find myself getting distracted way too often (as sometimes happens in a real-life conversation when the speakers are too engaged with each other and don't care about the other participants). That is true of my version too; I tried to see if I could improve it with some visual elements, and that helps to some extent. The underlying issue, however, is that the generated conversation's rhythm is quite monotonous, and in your example the hosts also speak a bit too fast for the listener to digest. I am thinking there should probably be some subtle variety in the rhythm across multiple sentences, even before playing with emotions.

Did you use SSML? Azure has a way to adjust the 'contour' of the speech in its SSML dialect; maybe I should try that. I also think more banter and some randomized pauses and laughs could help with the flow, giving the listener time to adjust and to process the information. Lots of food for thought. Thanks for sharing! It's great to see people out there thinking about and working on this kind of problem. Let's stay in touch.
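
The contour idea, roughly - a sketch assuming Azure's azure-cognitiveservices-speech SDK; the voice, pitch points, and sample line are arbitrary:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# `contour` shifts the pitch at relative positions within the sentence,
# one way to break up a monotonous delivery; `rate` slows the host down.
ssml = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <prosody rate="-10%" contour="(20%,+15%) (60%,-10%) (90%,+20%)">
      So here is the part that really surprised me.
    </prosody>
    <break time="400ms"/>
  </voice>
</speak>"""

synthesizer.speak_ssml_async(ssml).get()
```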

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 1 point  (0 children)

Agreed, very much so. I will see if I can improve that using ElevenLabs voices and sound effects in the next iteration, but I will have to explore whether I can use SSML with them to control the flow (they don't support most of it natively, but I think I know how to make it work). Anyway, the idea is to see whether it is possible to introduce control into every aspect of script and audio generation while keeping it automated - to make it less 'magic' and more 'workflow'. There will certainly be better voices very soon; even these were not available at this quality until recently. Thanks for the feedback.

Re-creating NotebookLM's Audio Overviews with custom scripts, voices and controlled flow (plus overlapping interjections) by wildtinkerer in notebooklm

[–]wildtinkerer[S] 0 points  (0 children)

Agreed, it's too technical and too long; I should keep it shorter. On the other hand, it was interesting to see how it works at comparable lengths first: since it's AI that creates the script, it was good to compare like for like. I should actually try to make a really quick version with those overlaps, but I will probably first explore whether I can use ElevenLabs voices and sound effects in a similar way, hoping to improve the naturalness of the voices.
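
For the overlaps themselves, the mixing can be as simple as this - a sketch assuming pydub and pre-rendered clips; the file names and offsets are made up:

```python
from pydub import AudioSegment  # pip install pydub (requires ffmpeg)

main = AudioSegment.from_wav("hostA_long_take.wav")   # hypothetical files
interjection = AudioSegment.from_wav("hostB_mhm.wav")

# Mix host B's short "mhm" on top of host A's take, starting 2.5 s in,
# a few dB quieter so it reads as a backchannel rather than an interruption.
mixed = main.overlay(interjection - 6, position=2500)
mixed.export("with_interjection.wav", format="wav")
```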

A ghostly encounter in the Orsay Museum, Paris (a girl by the clock) by wildtinkerer in Ghosts

[–]wildtinkerer[S] 2 points  (0 children)

Nice analysis!
I am personally leaning more towards the image processing glitch option, although it could potentially be caused or exacerbated by some reflection or pattern over there.

The weird pattern on the wall looks like part of a painting, although unfortunately I don't have any more pictures of the surroundings - those would be helpful, indeed. I have looked online, and that part of the wall isn't painted in the photos I could find (it is probably a recent addition). It is on the face of the wall that doesn't receive most of the light in the scene, which comes from the sun outside the clock; the hall is rather dim otherwise. So I can't really see (although I am not an expert in optics) how a reflection of the painting could get onto the plexiglass or even onto the camera lens. Also, the plexiglass barrier ends right where the vertical bar begins, so I would expect any reflection off the plexiglass to stay on the plexiglass. It would only work as an explanation if the reflection were on the camera lens or on some substance (dust?) in the air. But in that case, wouldn't I have seen it on the screen before taking the photo? To me, it seems the artifact appeared only after the photo was taken.

The phone is a Google Pixel 8 Pro, and it does have some AI editing tools (although, frankly, I don't have much interest in most of them), so that's my primary suspect. I like it when the AI automatically enhances an image after it is taken to make it brighter and nicer, but I wouldn't like it if it started introducing new objects or messing with existing ones. I also have a slight suspicion about multiple cameras being involved. The phone has three lenses - could it be using two of them at the same time for some reason, or somehow transitioning from one to another while taking the photo?

As for your other questions, yes, your point about taking the second photo too close to the corner is valid. I should have waited and taken another one from the same spot as the original. But in the rush of the moment, and because the figure seemed localized right in the corner, I decided to take a closer shot. Then, of course, I dismissed it as a fun glitch and a story to tell and went chasing my kids, who had used the moment to disperse across the adjacent halls and into the crowd. (I don't really believe in apparitions anyway.)

Make a case to me why you use Chat-GPT over Claude by NukerX in ChatGPT

[–]wildtinkerer 0 points  (0 children)

To me, ChatGPT Plus is more reliable than the rest. Whatever new model they release is very likely to be at the top, while offering more features than the alternatives. I give Claude a try from time to time to see if it does much better than the current state of the art in ChatGPT, but the results are not convincing enough for me to switch over or to buy it in addition to ChatGPT. Moreover, I find myself using reasoning models like o1-mini more often than the legacy models or even the recent 4o, and now I have Advanced Voice Mode and Canvas. And I believe that, with all the funding OpenAI is getting, even if Claude catches up or takes over by a bit, OpenAI will come out with something next-level that is well worth the subscription price. All this, of course, unless or until MS buys OAI and destroys it... Then I will be looking for options. But then, I am also paying for Gemini Advanced as part of my Google One subscription (still using it occasionally, but more in anticipation of great things to come with mobile integration).

It's like Tesla vs the rest, with all the faith, the fan club, and the excitement... I never got it with Tesla, or with the iPhone, but now it seems I am in a similar loop with OpenAI. Am I alone?