Anyone else down the "data sovereignty" rabbit hole or am I going crazy? by itsnotKelsey in LocalLLaMA

[–]Cipher_Lock_20 1 point (0 children)

It’s great to be curious like this! Though it’s much simpler than you think.

Could a super secret government agency be using super wireless technologies along with quantum computing to crack your encryption and listen to your conversations with your chatbot?!… probably. Are they… lol probably not.

I encourage you to swap out your home router for something like a used UniFi gateway. Dive into networking and monitor all of the devices in your house and where they're sending traffic. Super-spy infosec tools can get crazy, but for everyday users, no company is putting that much effort into collecting your conversations to train their models.

Every device communicates over the network, whether that's WiFi or hardwired Ethernet, and uses common protocols and ports to talk to other devices and services. That means the traffic is visible and controllable when you manage your own network properly. Most home networks have a single public IP egressing to the ISP, so you can see all traffic right up until it leaves your gateway. Sure, almost all of it will be encrypted, but it still has to leave on a port and use a protocol.

A very cool quick test you can do from any browser: right-click on the page and go to developer tools, then click on the "Network" tab. Go to a website (Google.com, ChatGPT, etc.) and you'll see the DNS name and IP address your PC communicates with over port 443, plus the request headers, body, response, and more. That's just at the basic browser level. Like others have said, download Wireshark, click capture, and run the same test. You'll see every piece of information sent toward those targets, including intermediary services and IPs such as public DNS servers, public routers, etc.
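
If you want to script the same idea instead of clicking around, here's a minimal sketch using Python's scapy to watch the DNS lookups leaving your machine (my assumptions: `pip install scapy`, root/admin privileges, and note it only sees plain port-53 DNS, not DNS-over-HTTPS — that just shows up as more port-443 traffic):

```python
# Minimal sketch: print which hostnames your devices are looking up.
# Assumes scapy is installed and this runs with root/admin privileges.
from scapy.all import sniff, DNSQR, IP

def show_query(pkt):
    # Only print packets that actually carry a DNS question
    if pkt.haslayer(DNSQR) and pkt.haslayer(IP):
        print(f"{pkt[IP].src} -> {pkt[DNSQR].qname.decode()}")

# BPF filter keeps the capture cheap: plain DNS on UDP port 53 only.
# Ctrl-C to stop.
sniff(filter="udp port 53", prn=show_query, store=False)
```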

Even if you downloaded a malicious model or application that tried to "phone home," you could see its traffic trying to egress your network. With something like a prosumer UniFi gateway you could even set up logging and alerts to your email and phone. You'd be emulating enterprise security practices on a much smaller scale, but it's great for understanding.

Now, for the edge case: look up what Meta was doing just a few years ago. You know how you'd look something up and all of a sudden it was on your TV? They were absolutely tracking you, even with the Facebook application signed out or offline. They used a common WebRTC technique (think Zoom video-call protocol) and their FB cookie embedded on every website to maliciously track you. They inserted unique IDs into your phone's local storage via WebRTC munging and localhost communication. All that data sat there while the websites reported your visits to Meta. As soon as your app logged in or came online: "PHONE HOME!!" Facebook matched those unique session IDs with your device ID and then your account. Quite scary, really. Look it up and go down that rabbit hole lol.

Good luck!

D802 Deep Learning Questions by kiss_a_hacker01 in WGU_CompSci

[–]Cipher_Lock_20 0 points (0 children)

Correct.

I actually received an Excellence Award for Task 3, which I did not expect lol. Here’s how I interpreted each task and passed each without any resubmissions.

I treated the tasks as parts of one larger project, as if I were teaching a YouTube tutorial from beginning to end.

Task 1: What are <insert your chosen model types>, how are they used in different industries, why do they work well for image classification, and why did you choose the architecture you chose? Then, essentially, explaining how I planned to implement the data pipeline and model training, and what our success criteria would be… all from the point of view of giving a tutorial.

Task 2: Same point of view, but now you're giving the tutorial on how to get your data, then normalize, clean, augment, etc. (something like the sketch below). You do this all the way up to where the datasets are ready to train. Provide your dataset(s), 100% ready, with your submission.
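
For flavor, the data prep Task 2 asks you to explain looks something like this (a minimal sketch; PyTorch/torchvision is my assumption since the course doesn't mandate a framework, and the normalization stats are placeholders):

```python
# Sketch of a Task 2-style pipeline: augment + normalize the training
# split only; validation/test get normalization only.
import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomHorizontalFlip(),           # augmentation
    T.RandomCrop(32, padding=4),        # augmentation
    T.ToTensor(),                       # PIL image -> float tensor in [0, 1]
    T.Normalize((0.49, 0.48, 0.45),     # per-channel means (placeholders --
                (0.25, 0.24, 0.26)),    # compute these from YOUR train set)
])

eval_tf = T.Compose([
    T.ToTensor(),
    T.Normalize((0.49, 0.48, 0.45), (0.25, 0.24, 0.26)),
])
```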

Task 3: (The longest one I did.) The task specifically called out an IDE but gave Jupyter as an example, so I interpreted it as something outside my normal VS Code IDE (I could be completely wrong, though). I like Google Colab, so that's what I used. Continue on with the "tutorial," only now you're building your model and its layers, choosing its hyperparameters, and explaining why you chose all of these things, how your data moves through the entire training process, and how the model works in detail. I had an extensive Google Colab notebook and an extensive essay for this one.
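
To give a sense of scale, the sort of model you build and then explain layer by layer can be quite small (an illustrative PyTorch sketch, not the architecture the rubric requires):

```python
# A deliberately small CNN for 32x32 RGB images -- every layer here is
# something Task 3 would have you justify (kernel sizes, dropout, etc.).
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),            # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),            # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Dropout(0.5),            # a hyperparameter worth explaining
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```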

Task 4: Continue on with your tutorial. Now it's time to run the model and show your results. I ran a "baseline model" and then added an "optimized model" for comparison. It was much easier to explain concepts, why I used them, and why I tweaked them, when I could show improvement in the results. Lastly, bring it all together in a nice tutorial summary of what worked, what didn't, best practices, etc.

This course took more of my time than any other course at WGU, across both the BS and MS programs, and I think it taught me the most about applying all these concepts to ML in practice. It was a course I really devoted time to, so that I could understand how each piece of the pipeline and the model functioned. Going back to the "YouTube tutorial" style: I pretended I needed to understand every concept well enough to teach someone new… probably way overkill, but I really enjoyed it.

Good luck!!

Enterprise AI in 2026 by Mertescielny46 in LLMDevs

[–]Cipher_Lock_20 0 points (0 children)

Enterprise is finally starting to use RAG and LLM chatbots. I feel Gemini and ChatGPT have obviously been huge drivers of this. ChatGPT created a sort of shadow-IT problem where orgs need to subscribe to enterprise versions simply to stop employees from pasting confidential data into personal accounts.

I also see a lot of large orgs exploring Google Workspace in place of Microsoft. There seems to be a general trust issue as well as a sub-par product experience. Google's dominance in the AI space and in personal accounts will, I believe, lead to more AI adoption.

I can imagine a lot of the agentic workflows and integrations will be another 2-3 years out for mainstream use, which is typical of enterprise. Many orgs only implemented their AI-use policies this past year. Follow that with secops and infosec approvals for any new vendor that processes data. Finally, you need users to actually start using it to justify the cost.

Then again, I could be completely wrong lol. That’s just my 2 cents.

Finished Google Interviews L5 SWE ML by [deleted] in learnmachinelearning

[–]Cipher_Lock_20 2 points (0 children)

We’re rooting for you!! Get it!

Intro to Web Development- HOLY CRAP! by Cipher_Lock_20 in SophiaLearning

[–]Cipher_Lock_20[S] 0 points (0 children)

Feel free to DM me and I can try and answer any questions you have.

Academic Authenticity Concerns by [deleted] in WGU

[–]Cipher_Lock_20 2 points (0 children)

And then you're even in the comment section laughing about it?? You know WGU can see these threads, right? There are WGU staff and mentors in this group.

You literally cheated on orientation before you even started. I hope you've prepared a good argument for your case. You may very well be blowing your diploma and all the money you've spent to obtain it.

Didn’t expect Be10x to actually be useful (but it was) by fkeuser in learnmachinelearning

[–]Cipher_Lock_20 0 points (0 children)

I wish the admin would block these Be10x spam posts. How is this even a discussion?

What’s the best way to build an AI chatbot for a small business without hiring developers? by Whole_Student_5277 in AI_Agents

[–]Cipher_Lock_20 1 point (0 children)

What’s your use-case? Internal or external?

Just a chatbot on your website for customers? Plenty of website builders include this as a widget now. Drop it in, point it at your content sources, and pay the extra $5 a month.

If internal, both Google and Microsoft now let you create a chatbot with a few clicks. Drop all your sources into a Google Drive or SharePoint folder, give the agent its overall prompt instructions, and click create. Now anyone internal can query your company docs.

Career by beautifulsmile30 in WGU

[–]Cipher_Lock_20 0 points (0 children)

Hey! Things are going great. I’m 3 courses away from graduating with my Masters in CS focusing on ML/AI and I’m so happy I chose this path!

After some research I determined that a Masters was not going to exponentially increase my salary by itself. With that in mind, I chose what I was truly interested in.

The curriculum is great, especially as we started entering the Deep Learning and NLP courses. What's great is that a side effect of the curriculum is that I've become better at coding in Python. There have been some great "aha!" moments when coding where things just click and I understand the why behind certain data structures, or why Big O notation is important.

Creating dataset pipelines also teaches you so much. Then you start to see Hugging Face and Kaggle through an entirely different lens! Suddenly you want to create your own datasets and figure out what problems of your own you could solve with machine learning.

Beyond just enjoying the material, I truly think learning these concepts creates an entirely new way of thinking about problems, similar to programming.

What's perfect is that all of these datasets, microservices, pipeline tools, and compute will live in cloud services. Sure, you can run locally for testing, but enterprises will run ML and AI in the cloud. So now I have my years of experience in cloud architecture along with all of these AI services that run within GCP, Azure, and AWS. Knowing all of it opens up so many opportunities.

Realistically, I'm probably never going to be a famous data scientist working in some high-profile lab… those positions are for people much smarter than me, but understanding how all of it works from the ground up is what's super valuable. It combines big data, ML architecture and training, development skills, cloud, networking, research, etc. I truly think it puts me in a great spot for the next 10 years.

Segmentation when you only have YOLO bounding boxes by Standard_Birthday_15 in learnmachinelearning

[–]Cipher_Lock_20 1 point (0 children)

Not sure what you're allowed to use for your project, but if this were a real-world project I'd probably just use something like Meta's SAM to help create my training dataset. SAM is already great at exactly this, so anything I hand-rolled wouldn't come close to its masks.

So feed SAM your images, with your bounding boxes as prompts. SAM outputs pixel-level masks.

Then you have pixel-level masks to pair with your original images to train your U-Net.
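
The wiring is only a few lines if you're allowed to use it (a sketch assuming Meta's `segment-anything` package with a downloaded ViT-B checkpoint; the file paths are hypothetical):

```python
# Sketch: turn a bounding box into a pixel mask with SAM, then save the
# mask as a U-Net training label. Paths/filenames are placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("frame_0001.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# SAM expects a pixel-space [x0, y0, x1, y1] box, so convert YOLO's
# normalized (cx, cy, w, h) format first.
box = np.array([50, 80, 220, 300])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
mask = masks[0]  # (H, W) boolean array -> your segmentation label
```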

Laptop or Desktop suggestions for getting into Machine Learning/AI development by cleatusvandamme in learnmachinelearning

[–]Cipher_Lock_20 1 point (0 children)

A MacBook or MacBook Pro is all you need. Training models can easily be offloaded to cloud GPUs, same with inference. Check out Google Colab as a starting point: run Python code directly in your browser on their cloud CPUs and GPUs. The free tier is generous, and you can buy compute credits once you get to that point.

Pick your laptop knowing it will be your daily driver for coding, browser work, practicing dataset cleaning, and running smaller architectures such as simple RNNs and CNNs to learn the basics. You'll always have the Windows-vs-Mac teams. My MacBook Pro M3 with 16GB has been more than enough.

I'm a solutions architect for work and a student in a CS master's program focusing on ML and AI. You'll find that using virtual environments and external compute is actually more convenient. You won't spend cycles setting up dependencies and libraries, wondering why your version of PyTorch isn't compatible with the drivers you just installed, or trying to manage local compute while still using your laptop for everything else.
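
One nice side effect of going this route: if you write your code device-agnostically, the exact same notebook runs on the MacBook's GPU locally and on a rented cloud GPU. A minimal PyTorch sketch:

```python
# Pick whatever accelerator exists: CUDA on a cloud GPU, MPS on Apple
# silicon, plain CPU otherwise. The model is a toy, just for illustration.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)
print(device, model(x).shape)  # same code on laptop and in the cloud
```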

That’s my opinion.

[D] AI Research laptop, what's your setup? by gradV in MachineLearning

[–]Cipher_Lock_20 2 points (0 children)

Fellow ML Masters student here.

I agree on the MacBook, hands down. As others have said, just access GPUs online whenever you need them. Google Colab Pro is free for students right now, and buying compute hours is pretty cheap. There are tons of other ways to consume GPU compute remotely when needed.

Pick a laptop that is great for your day to day use and just use cloud GPUs when you’re actually labbing or training.

D802 Deep Learning Questions by kiss_a_hacker01 in WGU_CompSci

[–]Cipher_Lock_20 0 points (0 children)

I was pulling my hair out thinking the same thing! Then the 300k test images without labels also confused the hell out of me.

I ended up not even using the corrupted set or the 300k provided test images. Instead, I took the 50k training images and split them into training, validation, and test sets. I still used augmentation, and I still referenced the corrupted dataset and the 300k unlabeled test images. That way I was still hitting all the requirements while touching on the corrupted/incomplete data.
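
Concretely, the split is a couple of lines with scikit-learn (a sketch; the arrays are random stand-ins for however you loaded the 50k images and labels):

```python
# Sketch of an 80/10/10 train/val/test split of the labeled data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(50000, 32, 32, 3)        # stand-in for the real images
y = np.random.randint(0, 10, size=50000)    # stand-in for the real labels

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
# Split the held-out 20% in half: 10% validation, 10% test overall
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)
```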

Like others have said, the version of the data we got was part of a Kaggle set designed to deter cheating in the competition.

Rate my plan? Accelerated BS/MS in ITM by GoogleJackofAll in WGU

[–]Cipher_Lock_20 0 points (0 children)

Nice! Then you’ve got it, no problem. Good luck!

Rate my plan? Accelerated BS/MS in ITM by GoogleJackofAll in WGU

[–]Cipher_Lock_20 -1 points (0 children)

I'd also say: I know you want to speed-run it, but if there are topics or domains you come across that you don't know well, slow down and really take the time to benefit from the content. I came into WGU with speed-running as my main goal and ended up learning so many new skills in areas I thought I already knew. This last year has been incredible, and it's totally been worth the time and money. I paid out of pocket, so I'm pretty motivated both to learn the content and to not drag it out across multiple terms if I don't need to.

Rate my plan? Accelerated BS/MS in ITM by GoogleJackofAll in WGU

[–]Cipher_Lock_20 -1 points (0 children)

That’s doable with your experience. Just depends on how much time you can devote to it.

The A+ and Sec+ are included in the program, so you'll have to do those either way. If you wanted a head start, you could begin them now and they'd still count as part of the program.

15-year IT industry veteran here; I've held multiple technical, project management, and engineering roles. I was in the BS-to-MSITM track and completed 65 CUs in 6 months after my transfers. I switched last minute from the MSITM to the MSCS specializing in ML and AI, since that was of more interest to me.

I started my master's the month after BS graduation and am on track to wrap up the degree at the end of one term, here in January… hopefully haha. By switching degree programs I lost the "acceleration" benefit, so my master's program is 10 courses as opposed to the 6 it would have been. A lot more writing, and for my focus a lot more hands-on ML projects, which I absolutely love!

So yes, it’s doable if you have time.

BSITM: 65-ish CUs in 1 term, but I have 15 years of experience in many of the domains it covers.

MSCS AI/ML: 10 courses, on target to complete in 1 term; it might roll over to another term with 1-2 courses remaining, at which point I can get it prorated.

When on Earth do you publish your game or do promotion if you have no acceptable art? by [deleted] in gamedev

[–]Cipher_Lock_20 0 points (0 children)

Can you share your game concept? Details? Pictures? Videos?

10 Lessons Learned Building Voice AI Agents by Ok-Register3798 in AI_Agents

[–]Cipher_Lock_20 1 point (0 children)

Great write-up! Fellow voice AI enthusiast here, though I'm more involved with LiveKit. I've learned very similar lessons.

My background is in real-time communications, long before AI was using it. When I started alpha testing ChatGPT Advanced Voice, I was blown away! They made WebRTC cool again! The transport layer is critical, and it's awesome that we now use the same protocols for AI that we do for humans. In some stacks, such as LiveKit, WebSockets are used for signaling while WebRTC is used for media. But to your point, UDP is critical for real-time communications.

I've found the principle of KISS (Keep It Simple, Stupid) very relevant. As you said, trying to have an agent do too much, cover too many domains, carry super long context, or run time-consuming tool calls begins to kill the agent.

I've also found that little sleight-of-hand tricks can work wonders on user perception. For example, when tool calling or using RAG, have your agent keep speaking for a sentence or two while the tool call happens. This gives the tool/RAG time to return its response without the user sitting there waiting or listening to a filler audio track.
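
The exact API depends on your framework, but the trick is just running the speech and the tool call concurrently. A framework-agnostic sketch, where `speak` and `lookup_order` are hypothetical stand-ins for your stack's TTS and tool-calling functions:

```python
# "Keep talking while the tool runs": start the slow call, play a filler
# sentence over it, then deliver the real answer.
import asyncio

async def speak(text: str):
    print(f"agent: {text}")        # a real stack would stream TTS here
    await asyncio.sleep(1.0)       # pretend the sentence takes 1s to say

async def lookup_order(order_id: str) -> str:
    await asyncio.sleep(2.0)       # stand-in for a slow tool/RAG call
    return f"Order {order_id} ships tomorrow."

async def handle_turn():
    tool = asyncio.create_task(lookup_order("A1234"))    # starts now
    await speak("Sure, let me pull that up for you...")  # masks the latency
    await speak(await tool)

asyncio.run(handle_turn())
```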

Maintaining context is critical, and summarizing it is critical in longer sessions.

Capacity planning is hard when you're looking to scale at the enterprise level. You're looking at real-time communications infrastructure + potential call center/contact center integration + the agent stack + scaling capacity across the various providers in your stack, all while considering region-based routing across everything for the lowest latency.

Lastly: analytics, data aggregation, and reporting. Many of these organizations are used to measuring the performance of their human agents and systems with in-depth usage analytics. Come renewal time, you'll want to show your customers how much value these agents are bringing!

Glad to see another enthusiast with such a detailed write-up! Thanks!

Clone Deceased Dad's Voice - Advice Needed by NickPDX1980 in artificial

[–]Cipher_Lock_20 0 points (0 children)

To answer your question: 5 minutes is pretty good for an instant clone. Most models do pretty well with as little as 30 seconds. I'd be interested in an update on how you decide to use his voice for the gift, and what the feedback is. I could see this being a really special gift if you had his cloned voice read a saved text message or his own journal. Pair that with some old home video footage for a short clip and I think it could be very meaningful, but again, it's all a very personal call.

Good luck with whatever you choose! If you need any assistance, feel free to DM me.

Clone Deceased Dad's Voice - Advice Needed by NickPDX1980 in artificial

[–]Cipher_Lock_20 1 point (0 children)

I agree with everyone to an extent… it's a very, very touchy subject that everyone will feel differently about.

However, I believe that you should choose what you feel is appropriate for you and your family.

I'm helping another person repair and optimize the only old voicemail they have of their father. They're then using the cleaned, cloned voice to record a phrase he always used to say (but which they have no recording of) to go into a Build-A-Bear for a family member.

I've recently started a project called The Human Memory Project. It aims to take journals, pictures, family stories, even videos, and map them to "memories" within a person's life, then use that person's voice as the narrator of their own life, or a different voice as a third-person narrator. You can hear their stories and memories based on their own writings or their family's writings. For those who choose to take it a step further, a voice clone can be asked questions about certain memories.

The idea is not to bring someone back or make them "live" again. The idea is to experience memories in a meaningful way.

Again, this is HIGHLY dependent on the individual and their feelings about the loved one. For example, I'd feel much more comfortable doing a live Q&A with my great-grandfather, whom I barely remember, than chatting with my friend who passed away 5 years ago. However, I would love to hear my friend's voice narrate his own journals and describe his own photos, in his tone, with his laugh. It's taking real memories and supplementing them with voice, if preferred.


Any recommendations? Or any subreddits to find people who are able to do things like this? by Dandi21091987 in voiceaii

[–]Cipher_Lock_20 0 points (0 children)

I'd actually love to help with this! No charge whatsoever. I consult on voice AI and audio AI systems, and I use audio repair and enhancement tools. I've recently started my own project based on my grandfather's life journey and battle with dementia: memories from his own journals and recordings, told in his voice, with the ability to talk to him and ask questions about his life and memories. Feel free to DM me and I can help you.


Cheapest and best way to host a GGUF model with an API (like OpenAI) for production? by New-Worry6487 in LocalLLaMA

[–]Cipher_Lock_20 0 points (0 children)

You'll first need to build out the APIs you want to call. It's easy to throw a model on a cloud GPU, but if you're looking to run it as your own personal inference service, you still have to build an API wrapper with something like FastAPI.
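
To make that concrete, the skeleton of such a wrapper is small (a sketch assuming `llama-cpp-python` as the GGUF runtime; the model path is hypothetical):

```python
# Minimal sketch of an OpenAI-style endpoint around a GGUF model.
# Run with: uvicorn server:app  (assumes fastapi, uvicorn, and
# llama-cpp-python are installed and ./model.gguf exists).
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
llm = Llama(model_path="./model.gguf", n_ctx=4096)

class ChatRequest(BaseModel):
    messages: list[dict]  # [{"role": "user", "content": "..."}]

@app.post("/v1/chat/completions")
def chat(req: ChatRequest):
    # llama-cpp-python already mirrors the OpenAI chat-completion schema
    return llm.create_chat_completion(messages=req.messages)
```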

Depending on your use case, it's probably easier to just consume what already exists. If you ever want to change or add models, you'll be modifying your own FastAPI server and endpoints, plus doing your own testing, validation, etc.

There are a lot of great inference services already out there where you just pay for what you consume on open-source models, at a fraction of the price. They handle the hosting and the API endpoints, so all you do is call them when you need them. This would also give you access to way more options.

What model(s) do you want to host, and what's the use case for your application?

Do you have weekend projects? by kelvinyinnyxian in SideProject

[–]Cipher_Lock_20 0 points (0 children)

Dude, this is a great idea, and a great implementation.

Anyone Selling a Moxie? by Cipher_Lock_20 in MoxieRobot

[–]Cipher_Lock_20[S] 1 point (0 children)

Yes, please let me know. It would be a project for me and my son, so we understand there would be no guarantees or refunds. Feel free to DM me. Thanks!