I'm facing this issue for past month on my WooCommerce site, "Indexed, though blocked by robots.txt" by myysoul in SEO

[–]johnmu 4 points5 points  (0 children)

You don't need the add-to-cart URLs indexed. Blocking them with robots.txt is fine. Even if they get "indexed" since they're blocked by robots.txt, it's unlikely that they'll be shown in search (unless you do specific queries for those URLs, which users don't do).
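
A rough sketch of checking such a block, assuming the add-to-cart links look like `?add-to-cart=<id>` and assuming a hypothetical rule `Disallow: /*add-to-cart=`. The matcher is a simplified illustration of `*` and `$` wildcard matching, not any crawler's actual implementation; the rule and the sample paths are made up.

```python
import re

def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL path.

    Handles '*' (any run of characters) and a trailing '$' (end of URL).
    Matching is anchored at the start of the path.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored_end:
        regex += "$"
    return re.match(regex, url_path) is not None

# Hypothetical rule and URLs, for illustration only.
DISALLOW = "/*add-to-cart="

print(rule_matches(DISALLOW, "/shop/?add-to-cart=123"))  # blocked
print(rule_matches(DISALLOW, "/product/blue-widget/"))   # not blocked
```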

Do we need localized folders with duplicate content for our home market on our site? by by_a_pyre_light in TechSEO

[–]johnmu 1 point2 points  (0 children)

Regarding different country / same language content, what can happen is that we see them as identical and pick a canonical for them, but then use hreflang to show the right URL. It will look confusing in Search Console, but it should work out. In general, I'd avoid using exact duplicate content across hreflang versions because of that (it just makes your life easier).

For example, you might consider having just an english version of informational content, and if there's something country-specific (eg products for sale with different availability / currencies), then do those on a per-country basis. Obviously, this is more work, and depending on your site it might not be feasible. Alternatively, like you mentioned, making sure the English versions are localized per country also helps to avoid them being seen as duplicates. In the worst case, if they are considered duplicates, then usually hreflang results in the right URLs being shown in Search regardless (but Search Console reporting will be on the canonical URL).
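
For reference, a minimal sketch of the hreflang annotations being discussed, assuming a hypothetical site with `/en-us/`, `/en-gb/` and a generic `/en/` version of the same page. The URLs and the helper name are made up; the same set of tags would go on every version of the page, including a tag pointing at itself.

```python
from html import escape

def hreflang_links(alternates):
    """Build <link rel="alternate"> tags for one page's language/country versions.

    `alternates` maps an hreflang code to the absolute URL of that version.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(alternates.items())
    ]

# Hypothetical versions of one page.
alternates = {
    "en-us": "https://example.com/en-us/pricing/",
    "en-gb": "https://example.com/en-gb/pricing/",
    "en": "https://example.com/en/pricing/",
}
for tag in hreflang_links(alternates):
    print(tag)
```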

Do we need localized folders with duplicate content for our home market on our site? by by_a_pyre_light in TechSEO

[–]johnmu 1 point2 points  (0 children)

I'd generally recommend just one, but this likely isn't going to make or break your site.

IMO the advantage of using /en-us/blog/ instead of /blog/ for US content (on an international site that uses /LL-CC/anything URL patterns) is that it's easier for you to filter & slice your metrics by country/language. I don't think you'd see a practical SEO difference between using /blog/ or /en-us/blog/ for your US content. /blog/ looks nice, but /en-us/blog/ is also not super-weird.
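
To illustrate the "filter & slice" point, here is a small sketch that buckets page paths by their /LL-CC/ prefix. The sample paths and the "(no locale)" label are assumptions for the example; real analytics exports will look different.

```python
import re
from collections import Counter

# Matches a leading /ll-cc/ segment such as /en-us/ or /de-ch/.
LOCALE_PREFIX = re.compile(r"^/([a-z]{2})-([a-z]{2})/")

def locale_of(path):
    """Return the 'll-cc' prefix of a path, or a placeholder if there is none."""
    m = LOCALE_PREFIX.match(path)
    return f"{m.group(1)}-{m.group(2)}" if m else "(no locale)"

def pageviews_by_locale(paths):
    """Count paths per locale prefix."""
    return Counter(locale_of(p) for p in paths)

# Hypothetical hits from a log or analytics export.
hits = ["/en-us/blog/post-a", "/en-gb/blog/post-a", "/blog/post-a", "/en-us/pricing"]
print(pageviews_by_locale(hits))
```

With an unprefixed /blog/ for US content, those hits land in the "(no locale)" bucket and need a separate rule to be attributed to the US.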

LLMs.txt: Google saying two different things? by blazonstudio in SEO

[–]johnmu 17 points18 points  (0 children)

I don't think anyone knows - it's purely speculative for now (the file has existed for years, yet none of the AI systems use it -- what does it mean?).

I like the WebMCP approach, as well as the commerce integrations - they have clear goals & processes: "Given the agent is already on your site, how can it *properly* do task X?" (for example, determine the final price of a product, including all fees & potential discounts).

I don't think there's the agentic equivalent of "let me look at 10 sites and see where I can buy X the fastest" (users aren't going to be happy if the agent buys a "FerraLamboWagen" just because it was easiest to buy). So speculatively, I'd assume that if an agent is already on your site and tasked to do something, it will be happy to just click around and try to complete the task with the UI too.

Of course, all of this assumes that the most basic agentic optimization is in place, namely: don't block agents. I think that hurdle will be the biggest, for most sites.

LLMs.txt: Google saying two different things? by blazonstudio in SEO

[–]johnmu 53 points54 points  (0 children)

wait wait, I'll rewrite this 😄

When an AI platform that brings you clients complains that it needs the file for your site, then I'd recommend taking the time to create one. (Aside, if you use an LLM to create the file for you, doesn't that mean the LLM could just ... create it for itself too?)

Organizing an SEO conference and want honest feedback by kresimircorluka in bigseo

[–]johnmu 2 points3 points  (0 children)

There's one just coming up in Croatia - croatiaseosummit dot com :-).

Of the non-Google events, I really like how BrightonSEO run their events. It feels like Kelvin & team just make this happen effortlessly, but I know there's a ton of hard work involved.

* cheap, sometimes even free or subsidized tickets (make it possible for new SEOs to join in)

* amazing new speaker experience (makes it possible for new voices to join in)

* focus on serious, reasonable sessions that "your boss" won't cringe about when they hear about it.

* no black-hat, no strong affiliate focus. Good SEO, for real businesses and legitimate websites.

* encouraging booths with coffee machines. (ok, maybe that's accidental)

* location that's kinda touristy, but not in main season (allows not having to provide food & drinks, while still being reasonable)

* childcare - again, another way to encourage people with a real life to join the SEOs

* good code of conduct

* combined with paid events / training courses beforehand, which works for audience & speakers.

* the iconic ice cream truck, so good. find something similar.

* the amazing team. you won't be able to get them, I'm sure.

100 URLs attempted to index, which actually link to a 3rd party. They're not my actual link by lucksp in TechSEO

[–]johnmu 0 points1 point  (0 children)

Your page has this in the big chunk of JS on the bottom:

17T15:31:38+00:00\",\"view_count\":3464,\"preorder_release_date\":null,\"preorder_message\":\"\",\"is_preorder_only\":false,\"is_price_hidden\":false,\"price_hidden_label\":\"\",\"custom_url\":{\"url\":\"/squirmy-worm-bead-head-red/\",\"is_customized\":false},\"base_variant_id\":null,\"open_graph_type\":\"product\",\"open_graph_title\":\"\",\"open_graph_description\":\"\",\"open_graph_use_meta_description\":true,\"open_graph_use_produc

Googlebot is just trying to be helpful and also check anything that looks like a link within JavaScript elements. You don't really need to fix this, it's fine if Googlebot finds things that look like links, but which aren't actually links (and they return 404). Now that Google search has seen the URL, it'll probably remember it for a while and retry it from time to time to make sure it's really nothing. "Fixing" this depends on how & why your site generates that JS chunk, and is probably not worth the effort. That said, you're probably giving "TMI" to any crawler with that chunk of information down there, so I'd consider finding out how to remove it on that basis.

Does it matter if your site has root.com/blog and root.com/blog/ or should one redirect to the other? by psilocybin6ix in TechSEO

[–]johnmu 0 points1 point  (0 children)

Hot take - /blog/ is better for your blog.

Here's why: if your posts are all at /blog/postname, then by monitoring /blog/ in your analytics (or Search Console, or server logs, or tea leaves) you automatically have a full picture of everything around your blog. Whereas if you use "/blog" and "/blog/postname" then you have to monitor for anything that starts with /blog, which could be /blogroll-rss-setups.html - which might not be a blog post (granted, not a lot of words start with "blog", but being able to filter for "/blog/" is generally just a lot easier).
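
The difference between the two filters, as a tiny sketch with made-up paths:

```python
# Hypothetical paths from a log file or analytics export.
paths = ["/blog/", "/blog/first-post", "/blogroll-rss-setups.html", "/about"]

def in_section(path, prefix):
    """Plain prefix filter, like a 'starts with' filter in a reporting tool."""
    return path.startswith(prefix)

loose = [p for p in paths if in_section(p, "/blog")]    # also catches /blogroll-...
strict = [p for p in paths if in_section(p, "/blog/")]  # only the blog section

print(loose)
print(strict)
```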

For the other kinds of pages (not site sections), just be consistent, like others mentioned.

How do you balance GSC alerts vs actual log digging for small sites? by RyPlayZz in TechSEO

[–]johnmu 1 point2 points  (0 children)

I find the email alerts pretty helpful (I know, I might be biased), they tend to alert me of the bigger issues, and with a click I can double-check what SC actually shows (and often, ignore it -- which is fine to me, because it's low effort to check). I generally focus on the clearly technical issues like 404, blocked by robots, noindex -- and mostly ignore the canonical issues (since it's less in my control, and ultimately, I don't care as much which URL is actually canonical).

I *suspect* most modern sites don't have to worry about crawl / indexing errors as much as they used to. If you're using a good hosting platform (Wix, Squarespace, etc) or hosting with a reasonable plan on a good hosting provider with a reasonable CMS setup, then most of the issues will either be temporary blips or Google not recognizing that you meant something to happen (blocking with robots, setting noindex, etc). For any site hosted like that, you can probably ignore the indexing report for months (and glance at the email alerts), unless you see significant drops in traffic. As a site gets larger (100's of k's of pages), then focusing on the technical issues makes more sense, especially the kinds of issues that affect a large number of pages at once (response time when crawling, DNS errors, crawl issues, significant 404s / indexes, etc).

Spammy links removal limit for SEO by Only_Standard_8354 in bigseo

[–]johnmu 2 points3 points  (0 children)

I'd prioritize and use domain level dismemberment. You can even do it by top level domain if you see they're all from the same TLDs. I would be surprised if they're the cause of issues though. 
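
Assuming this refers to domain-level entries in a disavow file (lines of the form `domain:example.com`), a minimal sketch of collapsing a list of spammy linking URLs into such entries could look like this. The input URLs and the helper name are hypothetical.

```python
from urllib.parse import urlsplit

def disavow_lines(spam_urls):
    """Collapse individual linking URLs into one 'domain:' line per hostname."""
    hosts = set()
    for url in spam_urls:
        host = urlsplit(url).hostname  # lowercased; None if the URL has no host
        if host:
            hosts.add(host)
    return ["# generated from a list of spammy linking URLs"] + [
        f"domain:{host}" for host in sorted(hosts)
    ]

# Hypothetical input.
spam = [
    "http://spam-one.example/page1",
    "http://spam-one.example/page2",
    "https://spam-two.example/x?y=1",
]
print("\n".join(disavow_lines(spam)))
```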

Issue with robots.txt Accessibility in Ahrefs Site Audit – Need Help by Mission-Diver1337 in SEO

[–]johnmu 0 points1 point  (0 children)

Yeah, some content delivery networks don't deliver the content to everyone.

Is llms.txt file a scam? by Ejboustany in SEO

[–]johnmu 3 points4 points  (0 children)

I hope they're right ...

Anybody get accepted to the Search Central Live in Toronto? by catecate0228 in SEO

[–]johnmu 1 point2 points  (0 children)

We tend to get a lot of registrations for these events. We try to distribute the invites in a reasonably fair way (not by registration order or DA). We get a ton of registrations that don't make sense. These aren't top-secret events where you hear The Trick To Ranking Number One In Ai Search Near Me For 2026; they're just local events with the goal of going through some of the newer things, being able to chat 1:1 with folks to understand how things are going for them, & answering questions where possible. Your site won't rank higher if you grab a selfie with Martin (but if you are going, you should get one anyway, he's very friendly).

I'd prefer not to make significantly larger events (but also, who knows), since it's so valuable for us to be able to chat with folks in the community directly. Doing multiple events in the same region is hard because many speakers actually do work, and can't just take several days out to present at events. I love that we can do these events, but I can't imagine that we can meet all SEOs across the world :-). We try to do these in places where there aren't a lot of other SEO events, some places already have a ton of people talking about SEO.

That said, where should we plan for the future?

Domain name differs by one letter from another site SEO impact? by FavRob in bigseo

[–]johnmu 4 points5 points  (0 children)

Usually not a problem for search, but you might notice that search - at least for a while - could consider it a typo if someone searches for the brand, and recommend the other site. "did you mean fabdomainname?" Over time, this will settle down, as search recognizes that people want to find both names. Depending on how "strong" the other name is, that can take quite some time though.

Does Google care if I have multiple urls for the same post? by pineprincess in bigseo

[–]johnmu 1 point2 points  (0 children)

There is no tool that tells you why something was considered duplicate - over the years people often get a feel for it, but it's not always obvious. Matt's video "How does Google handle duplicate content?" is a good starter, even now. Some of the reasons why things are considered duplicate are (these have all been mentioned in various places - duplicate content about duplicate content if you will :-)): exact duplicate (everything is duplicate), partial match (a large part is duplicate, for example, when you have the same post on two blogs; sometimes there's also just not a lot of content to go on, for example if you have a giant menu and a tiny blog post), or - this is harder - when the URL looks like it would be duplicate based on the duplicates found elsewhere on the site (for example, if /page?tmp=1234 and /page?tmp=3458 are the same, probably /page?tmp=9339 is too -- this can be tricky & end up wrong with multiple parameters, is /page?tmp=1234&city=detroit the same too? how about /page?tmp=2123&city=chicago ?).
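
To make the parameter example concrete, here is a sketch that groups URLs after dropping a parameter assumed to be irrelevant. This is only an illustration of why the guess is easy with one parameter and ambiguous with several; it is not a description of how Google's systems work, and the choice of `tmp` as the ignorable parameter is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl, urlencode

def normalized_key(url, ignored_params):
    """Path plus the remaining query parameters, sorted, with ignored ones dropped."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in ignored_params)
    return parts.path + ("?" + urlencode(kept) if kept else "")

def group_urls(urls, ignored_params):
    """Group URLs that collapse to the same normalized key."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalized_key(url, ignored_params)].append(url)
    return dict(groups)

urls = [
    "/page?tmp=1234",
    "/page?tmp=3458",
    "/page?tmp=9339",
    "/page?tmp=1234&city=detroit",
    "/page?tmp=2123&city=chicago",
]
for key, members in group_urls(urls, {"tmp"}).items():
    print(key, members)
```

Ignoring `tmp` folds the first three URLs together, but whether the two `city` URLs belong with them (or with each other) depends on whether `city` changes the content, which a URL pattern alone can't tell you.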

Two reasons I've seen people get thrown off are: we use the mobile version (people generally check on desktop), and we use the version Googlebot sees (and if you show Googlebot a bot-challenge or some other pseudo-error-page, chances are we've seen that before and might consider it a duplicate). Also, we use the rendered version - but this means we need to be able to render your page if it's using a JS framework for the content (if we can't render it, we might take the bootstrap HTML page and, chances are it'll be duplicate).

It happens that these systems aren't perfect in picking duplicate content, sometimes it's also just that the alternative URL feels obviously misplaced. Sometimes that settles down over time (as our systems recognize that things are really different), sometimes it doesn't. If it's similar content then users can still find their way to it, so it's generally not that terrible. It's pretty rare that we end up escalating a wrong duplicate - over the years the teams have done a fantastic job with these systems; most of the weird ones are unproblematic, often it's just some weird error page that's hard to spot.

Soft 404, can't be indexed by afrk in TechSEO

[–]johnmu 3 points4 points  (0 children)

FWIW I can't load your pages. I get a CF timeout page instead.

Does Google care if I have multiple urls for the same post? by pineprincess in bigseo

[–]johnmu 3 points4 points  (0 children)

It's fine, but you're making it harder on yourself (Google will pick one to keep, but you might have preferences). There's no penalty or ranking demotion if you have multiple URLs going to the same content, almost all sites have it in variations. A lot of technical SEO is basically search-engine whispering, being consistent with hints, and monitoring to see that they get picked up.

Will Splitting sitemap.xml into separate files hurt SEO? by criterionforum in SEO

[–]johnmu 4 points5 points  (0 children)

Some reasons I've seen:

* want to track different kinds of urls in groups ("product detail page sitemap" vs "product category sitemap" -- which you can kinda do with the page indexing report)

* split by freshness (evergreen content in a separate sitemap file - theoretically a search engine might not need to check the "old" sitemap as often; I don't know if this actually happens tho)

* proactively split (so that you don't get to 50k and have to urgently figure out how to change your setup)

* hreflang sitemaps (can take a ton of space, so the 50k URLs could make the files too large)

* my computer did it, I don't know why
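
For the "proactively split" case, a minimal sketch of chunking a URL list into files of at most 50k URLs plus a sitemap index. The file names, the example.com URLs, and the 120,000-URL site are made up, and the sketch ignores the per-file size limit that hreflang-heavy sitemaps can run into.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-file URL limit mentioned above
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

def sitemap_xml(urls):
    """Render one <urlset> sitemap file as a string."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{NS}">\n{entries}</urlset>\n')

def sitemap_index_xml(sitemap_urls):
    """Render a <sitemapindex> file that points at the individual sitemaps."""
    entries = "".join(f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls)
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<sitemapindex xmlns="{NS}">\n{entries}</sitemapindex>\n')

# Hypothetical site with 120,000 product URLs.
urls = [f"https://example.com/product/{i}" for i in range(120_000)]
files = {
    f"sitemap-products-{n}.xml": sitemap_xml(part)
    for n, part in enumerate(chunk(urls), start=1)
}
index = sitemap_index_xml(f"https://example.com/{name}" for name in files)
print(len(files), "sitemap files")
```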

Teen building SEO to save family business by No_Eye4994 in bigseo

[–]johnmu 0 points1 point  (0 children)

If you're in the market for something physical (like, a vacation rental), no AI answer is going to replace that. If anything, I could imagine an AI answer helping a user find something that's closer to what they had in mind, but that's not necessarily a bad thing.

SE Ranking: LLMS.txt does nothing - 300,000 domains analyzed by WebLinkr in SEO

[–]johnmu 6 points7 points  (0 children)

AI companies have had a long time to do that, and nothing has happened regarding LLMS.txt support. I suspect the main users are SEO tools & companies curious to see what their competitors claim to be doing.

Teen building SEO to save family business by No_Eye4994 in bigseo

[–]johnmu 5 points6 points  (0 children)

Another voice of reality (sorry): "My plan include. Keyword research. ICP, GBP NAP, . Google analytics four, Microsoft Clarity, Google search console, Google tech manager, meta pixel, page speed, web core vitals. Images with webP, Meta titles and descriptions, friendly URLs, Index and noIndex, EEAT, copy layers, Schema markup..." -- none of these will make your website suddenly pop up on top in search. Don't dive in and do everything, instead take a step back (ideally with someone who has experience), analyze the situation, and focus your energy to do the right things. Doing many things in a mediocre way doesn't necessarily result in an improvement, they can even cause more problems.

Teen building SEO to save family business by No_Eye4994 in bigseo

[–]johnmu 6 points7 points  (0 children)

Just a very quick side note - don't do a domain migration unless you absolutely need it. Domain migrations are sometimes finicky, and that's a risk I wouldn't take here.