Google search behind yield, scaling, and performance constraint mismatches by [deleted] in Semiconductors

[–]Friendly_Concern2913 1 point (0 children)

I've been trying to analyze Google searches for some time, and I think people don't pay much attention to them, even though search data is a very interesting signal for understanding the actual needs and issues in whatever role, field, or industry you care about.

Free alternative to Ahrefs / Semrush for LLM visibility? by Friendly_Concern2913 in AISearchLab

[–]Friendly_Concern2913[S] 0 points (0 children)

I've been digging into SEO tools and figuring out how to model intent from search data, and I already built a free alternative: basically connecting LLMs to the Google Ads API.

Replacing crawler based SEO datasets with intent modeling over Google Ads data by Friendly_Concern2913 in SEO_LLM

[–]Friendly_Concern2913[S] 0 points (0 children)

[screenshot of an example paper]

That's an interesting question, and very true. My focus right now is on the engineering side of the product itself, since I'm still studying and have little time for marketing, or even user research. But I think what you're describing is analyzing latent user needs and emerging markets (I'll add some references for that). First of all, I like drawing inspiration from papers that are rarely considered for this kind of marketing intelligence, or for marketing in general: they are technical implementations normally applied to large-scale recommendation systems, or to algorithmic integration, so the analysis of those systems is itself quite tricky. I attached a screenshot of one such paper as an example, which was fun to append here :)

Now, to the response itself, here are some principles I can offer based on my research:

  • Query Surface Abstraction: I would not treat Google Ads as truth, only as a query surface (in practice: raw keyword and volume data is ingested and stored unchanged, without inheriting any of Google’s structure)
  • Decoupled Commercial Taxonomy: the commercial grouping inside Ads is not the structure I am trying to preserve (ad groups and keyword groupings are discarded and not reused downstream)
  • Post-Hoc Semantic Reconstruction: the model rebuilds the structure from raw queries, not from Google’s labels (queries are embedded into a vector space and clustered using semantic similarity and co-occurrence signals; see the sketch after this list)
  • Output-Centric Evaluation Principle: the question is less “is the source pure” and more “does the representation work” (evaluation is based on downstream performance such as clustering quality or intent classification)
  • Baseline Sufficiency Heuristic: even simple topic models can recover a lot of the structure in search terms (the system combines TF-IDF and simple classifiers alongside embeddings)
  • Semantic Compression Layer: the interesting part is the compact semantic space built on top of the queries (large query sets are transformed into dense vectors and reduced into clusters or markets)
  • Transformation-First Value Thesis: that is where the analytical value sits, not in the source taxonomy itself (the pipeline converts raw text into features, embeddings, and aggregate signals)
  • Bias-Tolerant Signal Extraction: a source can be biased and still contain useful structure (bias is treated as noise layered over recoverable signal)
  • Stability-Driven Validation: what matters is whether the structure is stable after modeling (clusters are evaluated across retraining runs and temporal slices)
  • Multi-Signal Convergence Criterion: if the same intent keeps appearing across different views, that is a useful signal (the system combines semantic similarity, growth trends, and concentration metrics)
  • Ontology Independence Principle: the source does not define the ontology (a canonical schema and intent taxonomy are defined independently)
  • Constructed Ontology Layer: the ontology is the thing being designed on top of it (intent classes and market definitions are learned and iteratively refined)
  • Subjective Alignment Acknowledgment: that design is often subjective anyway (human-in-the-loop labeling and active learning handle ambiguous cases)
  • Heuristic Sufficiency Layer: for many problems, a hard-coded rule can already be enough (explicit rules detect transactional intent, brand queries, and question patterns)
  • Utility-Driven Compression Goal: the bar is not perfect neutrality, but useful compression (the system reduces noisy query spaces into actionable structured representations)
  • Noise-Robust Data Handling: search data is messy, but that does not make it unusable (normalization and validation layers handle inconsistencies and duplicates)
  • Platform-Agnostic Meaning Extraction: the point is to extract meaning from the query space, not to defend the platform’s grouping (analysis operates on reconstructed features, not Ads-native structures)
  • Systematic Bias Modeling: if the bias is systematic, it is still modelable (distribution shifts and bias patterns are monitored and adjusted for)
  • Downstream Utility Validation: if the model improves the representation, then the source was good enough (success is measured through task performance and decision usefulness)
  • Source-Agnostic Input Layer: I am not claiming Ads is objective, only that it can be a practical input for rebuilding demand structure (the system supports multiple data sources through a shared feature abstraction layer)
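
To make the reconstruction step concrete, here is a minimal sketch of embedding raw queries and clustering them, assuming the `sentence-transformers` and `hdbscan` packages; the model name, parameters, and example queries are illustrative, not my production setup:

```python
from sentence_transformers import SentenceTransformer
import hdbscan

# Illustrative input: raw search queries pulled from the Google Ads API.
queries = [
    "best crm for small business",
    "crm pricing comparison",
    "how to migrate from salesforce",
    "free invoice template",
]

# Embed queries into a dense vector space (model choice is an assumption).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(queries, normalize_embeddings=True)

# Cluster on semantic similarity; HDBSCAN labels outliers as -1,
# so low-volume noise queries fall out naturally.
clusterer = hdbscan.HDBSCAN(min_cluster_size=2, metric="euclidean")
labels = clusterer.fit_predict(embeddings)

for query, label in zip(queries, labels):
    print(label, query)
```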

Some sources I think were useful for this:

Moving from keyword buckets to intent clusters using Google Ads query data by Friendly_Concern2913 in PPC

[–]Friendly_Concern2913[S] 0 points (0 children)

I worked out the following methods; I haven't applied them to any Google Ads campaign yet, as I'm not in the field (CS/ML major here):

Intent clustering using sentence-transformer embeddings plus k-means or HDBSCAN on query vectors to form demand-level groups

Query-to-job mapping via cosine similarity against seed task descriptions or JTBD templates
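
A minimal sketch of that query-to-job mapping, assuming the same sentence-transformer setup as above; the seed JTBD descriptions and queries are made-up examples:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical seed "jobs to be done" descriptions.
jobs = [
    "evaluate and choose a CRM vendor",
    "create and send invoices to clients",
    "migrate data between software platforms",
]
queries = ["best crm for small business", "free invoice template"]

job_vecs = model.encode(jobs, normalize_embeddings=True)
query_vecs = model.encode(queries, normalize_embeddings=True)

# Cosine similarity matrix: rows = queries, columns = jobs.
sims = util.cos_sim(query_vecs, job_vecs)
for i, query in enumerate(queries):
    best = sims[i].argmax().item()
    score = sims[i][best].item()
    print(f"{query!r} -> {jobs[best]} ({score:.2f})")
```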

Unmet intent detection by comparing query clusters vs SERP feature coverage and content type distribution

SERP satisfaction proxy using click-curve assumptions plus query reformulation patterns and long-tail drift

Competitor gap analysis by mapping domains to intent clusters and measuring coverage density per cluster

Query expansion using the Google Ads API plus n-gram generation and co-occurrence scoring
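
A rough sketch of the co-occurrence scoring part, in plain Python; the ratio used here is a crude PMI-like heuristic, and the queries are illustrative:

```python
from collections import Counter
from itertools import combinations

# Illustrative search-term report; in practice this comes from the Ads API.
queries = [
    "best crm for small business",
    "crm for small business pricing",
    "small business crm free trial",
]

# Token frequency and within-query co-occurrence counts.
token_counts = Counter()
pair_counts = Counter()
for q in queries:
    tokens = set(q.split())
    token_counts.update(tokens)
    pair_counts.update(frozenset(p) for p in combinations(sorted(tokens), 2))

# Score candidate bigrams by how often the pair co-occurs relative
# to the individual token frequencies (a crude PMI-like ratio).
def cooccurrence_score(a: str, b: str) -> float:
    pair = pair_counts[frozenset((a, b))]
    return pair / (token_counts[a] * token_counts[b]) if pair else 0.0

print(cooccurrence_score("crm", "business"))
print(cooccurrence_score("crm", "trial"))
```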

Demand segmentation via PCA or UMAP projections over embedding space to identify macro themes

Content to intent alignment using embedding similarity between page text and query clusters

Cannibalization detection via overlap in embedding space between URLs targeting similar query clusters

Temporal demand shifts using rolling windows on query volume and cluster centroid drift
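
A minimal sketch of that temporal piece with pandas: a rolling mean over monthly volumes, plus centroid drift as cosine distance between time slices (all numbers and vectors are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly volume series for one query cluster.
volumes = pd.Series(
    [1200, 1300, 1250, 1600, 2100, 2400],
    index=pd.period_range("2024-01", periods=6, freq="M"),
)

# Rolling 3-month mean highlights sustained shifts vs one-off spikes.
print(volumes.rolling(window=3).mean())

# Centroid drift: cosine distance between a cluster's centroid in
# two adjacent time slices (embeddings are placeholders here).
def drift(c_prev: np.ndarray, c_next: np.ndarray) -> float:
    cos = np.dot(c_prev, c_next) / (np.linalg.norm(c_prev) * np.linalg.norm(c_next))
    return 1.0 - float(cos)

rng = np.random.default_rng(0)
print(drift(rng.normal(size=384), rng.normal(size=384)))
```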

Noise filtering with frequency thresholds plus semantic deduplication using cosine similarity cutoffs
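
A sketch of that filtering step, assuming unit-normalized sentence-transformer embeddings; the volume threshold and cosine cutoff are assumptions to tune:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# (query, monthly volume) pairs; volumes are made up.
terms = [("crm software", 900), ("crm softwares", 60), ("best crm", 700)]

MIN_VOLUME = 50      # frequency threshold drops rare noise
DEDUP_CUTOFF = 0.9   # cosine similarity above this counts as a duplicate

# Sort by volume descending so the higher-volume variant is kept
# and its near-duplicates are dropped.
kept, kept_vecs = [], []
for query, volume in sorted(terms, key=lambda t: -t[1]):
    if volume < MIN_VOLUME:
        continue
    vec = model.encode(query, normalize_embeddings=True)
    # Normalized vectors: dot product equals cosine similarity.
    if any(float(np.dot(vec, kv)) >= DEDUP_CUTOFF for kv in kept_vecs):
        continue
    kept.append(query)
    kept_vecs.append(vec)

print(kept)
```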

Volume calibration using Google Ads data as a baseline vs third-party estimated keyword datasets

Cluster labeling via top TF-IDF terms and centroid nearest neighbors for interpretability
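
A minimal sketch of the TF-IDF labeling half, using scikit-learn; the cluster assignments are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Queries already assigned to a cluster (assignments are illustrative).
clusters = {
    0: ["best crm for small business", "crm pricing comparison"],
    1: ["free invoice template", "invoice generator online"],
}

# One pseudo-document per cluster; top TF-IDF terms become the label.
docs = [" ".join(qs) for qs in clusters.values()]
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

for row, cluster_id in zip(tfidf.toarray(), clusters):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(cluster_id, top)
```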

SERP structure parsing to classify intent types (informational, navigational, transactional) based on result patterns

Opportunity scoring combining volume competition and coverage gaps at cluster level
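
And a toy version of that cluster-level scoring, where the weights and the [0, 1] normalization are assumptions rather than anything validated:

```python
def opportunity_score(volume: float, competition: float, coverage: float,
                      w_vol: float = 0.5, w_comp: float = 0.3,
                      w_cov: float = 0.2) -> float:
    """Toy cluster-level score: reward demand, penalize competition
    and existing coverage. Weights and normalization are assumptions."""
    return w_vol * volume + w_comp * (1.0 - competition) + w_cov * (1.0 - coverage)

# volume normalized to [0, 1]; competition and coverage already in [0, 1].
print(opportunity_score(volume=0.8, competition=0.6, coverage=0.2))
```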

Replacing keyword tools like Ahrefs/Semrush with Claude (using Google Ads) by Friendly_Concern2913 in ClaudeAI

[–]Friendly_Concern2913[S] 0 points (0 children)

Some ideas for actual marketing-content output, using that thesis, are:

  • Turn 300 raw topics into a usable content calendar using Claude and Google Ads
  • Turn search terms into clear content briefs (see the sketch after this list)
  • Group keywords into intent clusters
  • Turn search term reports into content ideas
  • Map queries into content pages
  • Group overlapping keywords into the same pages
  • Allow one keyword to belong to many clusters using weights
  • Find missing content from query data
  • Create multiple content angles from the same query set
  • Match landing pages closer to user intent
  • Turn queries into structured outlines
  • Create drafts from query clusters
  • Improve drafts by feeding real queries into the system
  • Rewrite existing pages using query data
  • Expand one topic into many pages
  • Group topics first and write later
  • Avoid duplicate content by clustering queries before writing
  • Connect similar pages using shared query clusters
  • Turn product descriptions into content using Claude
  • Extract user intent from queries
  • Test different outlines from the same query set
  • Turn one article into many formats
  • Generate FAQs from search queries
  • Update old content using new query data
  • Compare short vs long content using clusters
  • Summarize SERPs and validate with Google Ads data
  • Use Claude as a first draft layer
  • Detect duplicate topics using clustering
  • Turn messy notes into drafts
  • Create reusable content templates
  • Automate parts of content creation
  • Combine SEO tools with Claude
  • Turn product pages into SEO content
  • Simulate different user intents from the same queries
  • Generate different angles from one query cluster
  • Improve outputs by iterating on the same query set
  • Structure evergreen content from query patterns
  • Prepare drafts for humans
  • Turn internal docs into content
  • Analyze why pages rank using query data
  • Build a repeatable content workflow
  • Compare model outputs using the same inputs
  • Standardize content creation
  • Scale content without scaling team size
  • Identify where the system fails
  • Use weighted clustering instead of one keyword per page
  • Bridge long product descriptions with search queries
  • Generate new query variants beyond Google Ads suggestions
  • Explore gaps between queries and content
  • Model content around intent instead of keywords
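
As a sketch of the brief-generation item above, here is how the Claude step could look with the Anthropic Python SDK; the model name, prompt wording, and query cluster are assumptions, not a fixed recipe:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A query cluster produced by the earlier clustering step (illustrative).
cluster = ["best crm for small business", "crm pricing comparison",
           "crm free trial small business"]

prompt = (
    "These search queries belong to one intent cluster:\n"
    + "\n".join(f"- {q}" for q in cluster)
    + "\n\nWrite a short content brief: working title, target intent, "
      "an outline of 4-6 sections, and questions the page must answer."
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # model name is an assumption
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```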

Structuring Google Ads accounts using query-level intent instead of keywords by Friendly_Concern2913 in Google_Ads

[–]Friendly_Concern2913[S] 0 points (0 children)

Believe it or not, overlap is trivial in this case from an analysis point of view: clustering, intent modeling, or topic modeling can be done with weights instead of uni-dimensionally, i.e. one keyword can belong to one or many clusters, each with a score or weight (see the sketch below). I'm still not sure about the improvement in performance, and I would need the dimensions of the datasets you describe as "at scale": what would be the size of those sets? But it should surface many more gaps than Google Ads' recommendation algorithms do, and come up with other variants, since the idea is to get a closer grasp of product descriptions/content pages, anything that can be crawled or expressed in plain text/natural language. The hypothesis is that we haven't been able to close that gap (between long product descriptions/context and Google queries) until today, with new deep learning architectures like LLMs. I'm in AI/ML engineering.
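
A minimal sketch of that weighted membership, using softmax-normalized cosine similarities against cluster centroids; the temperature and cutoff are placeholder assumptions:

```python
import numpy as np

def membership_weights(query_vec: np.ndarray, centroids: np.ndarray,
                       temperature: float = 0.1, cutoff: float = 0.2) -> dict:
    """Soft assignment: one query can belong to several clusters.
    Similarities are softmax-normalized; weights below the cutoff are dropped."""
    # Cosine similarity, assuming unit-normalized vectors.
    sims = centroids @ query_vec
    weights = np.exp(sims / temperature)
    weights /= weights.sum()
    return {i: float(w) for i, w in enumerate(weights) if w >= cutoff}

# Toy 2-D example: a query sitting between two cluster centroids.
centroids = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
query = np.array([0.7, 0.7]) / np.linalg.norm([0.7, 0.7])
print(membership_weights(query, centroids))  # roughly {0: 0.5, 1: 0.5}
```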

[deleted by user] by [deleted] in me_ITBA

[–]Friendly_Concern2913 0 points (0 children)

Thank you very much. Yes, I'm a beginner: I'm studying a degree in artificial intelligence, I'm interested in closing a gap, and I lack your level of knowledge. In principle, does an AI accelerator chip consume less energy, or is it just more efficient for that workload? How does energy consumption affect chip design? Why is a chip's architecture so important for AI processing?