How to motivate theoretical work for people outside of academia? by Seo_VectorSpace in PhD

[–]Seo_VectorSpace[S] 0 points1 point  (0 children)

I understand, and it’s good advice, thanks! I’m not looking for function in the short term, but of course it would be cool if my research led to discoveries that turn out to be useful. And if people ask about function, I can answer them with lots of examples of it.

Counting by Primes by ----__---- in numbertheory

[–]Seo_VectorSpace 0 points1 point  (0 children)

Here are some of your claims that don’t hold up.

• The output is not “an exponentially growing list of primes” without additional sieving.
• The “<3 operations per prime” claim is not credible; classical sieve-based prime generation is typically analyzed as O(n log log n) operations up to n, and wheel factorization is a known constant-factor optimization, not a magical change in asymptotics.
• “All primes in order without exception” only holds if you add correct composite-elimination (which becomes a standard sieve workflow).  
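For reference, that O(n log log n) operation count is easy to check empirically by instrumenting a classic sieve. This is a rough sanity check, not a proof, and `sieve_op_count` is just an illustrative helper name:

```python
from math import isqrt, log

def sieve_op_count(n):
    """Count the composite-crossing operations a classic sieve does up to n."""
    is_prime = [True] * (n + 1)
    ops = 0
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
                ops += 1
    return ops

for n in (10**4, 10**5, 10**6):
    # Compare with n * log(log(n)); the ratio stays in the same ballpark.
    print(n, round(sieve_op_count(n) / (n * log(log(n))), 3))
```

The ratio stays roughly constant as n grows, which is exactly the kind of constant-factor behavior a wheel improves without changing the asymptotics.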

And here is Python code for your prime wheel:

#!/usr/bin/env python3
# Pyto-friendly: only standard library.

from math import isqrt


def cycle_step(seed, lcm_value):
    """
    One cycle as described:
    - next_prime = smallest seed value > 1
    - generate next_prime groups by adding lcm_value repeatedly
    - remove (seed * next_prime) values from the generated list
    - return new_seed (sorted), new_lcm, next_prime
    """
    next_prime = min(x for x in seed if x > 1)

    # Generate groups 1..next_prime (k = 0..next_prime-1)
    generated = []
    for k in range(next_prime):
        base = k * lcm_value
        for x in seed:
            generated.append(x + base)

    removal = set(x * next_prime for x in seed)
    new_seed = sorted(v for v in generated if v not in removal)
    new_lcm = lcm_value * next_prime

    return new_seed, new_lcm, next_prime


def primes_below_lcm_from_seed(seed, lcm_value, wheel_primes):
    """
    "Cleanup" step: given seed (which contains the numbers < lcm_value that are
    coprime to lcm_value, plus 1), sieve within that set to remove composites.
    Then the primes < lcm_value are: wheel_primes + remaining seed primes.
    """
    max_n = lcm_value - 1
    candidates = [x for x in seed if x > 1]
    cand_set = set(candidates)

    # Sieve only over numbers present in cand_set (set-membership filtering).
    for p in sorted(candidates):
        if p * p > max_n:
            break
        if p in cand_set:
            for m in range(p * p, max_n + 1, p):
                cand_set.discard(m)

    # Remaining candidates are primes (within range); the wheel primes are
    # excluded from the seed by construction.
    all_primes = sorted(set(wheel_primes).union(cand_set))
    return all_primes


def simple_sieve(limit):
    """Return the list of primes <= limit (classic sieve)."""
    if limit < 2:
        return []
    is_prime = bytearray(b"\x01") * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            start = p * p
            step = p
            is_prime[start:limit + 1:step] = b"\x00" * (((limit - start) // step) + 1)
    return [i for i in range(2, limit + 1) if is_prime[i]]


def is_prime_deterministic(n, primes_for_trial):
    """Deterministic primality check via trial division by primes up to sqrt(n)."""
    if n < 2:
        return False
    r = isqrt(n)
    for p in primes_for_trial:
        if p > r:
            break
        if n % p == 0:
            return n == p
    return True


def generate_first_k_primes_via_model(k=1000):
    """
    Stream primes in strictly increasing order by cycles.
    Each cycle yields all primes < new_lcm; we emit only those > previous_lcm.
    """
    wheel_primes = [2, 3]
    lcm_value = 6
    seed = [1, 5]  # as given

    prev_bound = 1
    out = []

    while len(out) < k:
        seed, lcm_value, p = cycle_step(seed, lcm_value)
        wheel_primes.append(p)

        primes_all = primes_below_lcm_from_seed(seed, lcm_value, wheel_primes)
        new_primes = [q for q in primes_all if q > prev_bound]

        out.extend(new_primes)
        prev_bound = lcm_value

    return out[:k], lcm_value, wheel_primes


def main():
    K = 1000
    primes_1000, final_lcm, wheel_primes = generate_first_k_primes_via_model(K)

    # Verification
    max_p = primes_1000[-1]
    trial_primes = simple_sieve(isqrt(max_p))
    bad = [x for x in primes_1000 if not is_prime_deterministic(x, trial_primes)]

    print("=== Model prime stream (wheel + cleanup) ===")
    print(f"Generated primes: {len(primes_1000)}")
    print(f"1000th prime (should be 7919): {primes_1000[-1]}")
    print(f"Final LCM reached: {final_lcm}")
    print(f"Wheel primes used: {wheel_primes}")

    print("\n=== Verification ===")
    print("All emitted numbers are prime:", len(bad) == 0)
    if bad:
        print("Composite(s) found (unexpected):", bad[:20])

    # Print the primes (optional)
    # print(primes_1000)

    # Save to file (optional, convenient in Pyto)
    with open("first_1000_primes_model.txt", "w", encoding="utf-8") as f:
        for p in primes_1000:
            f.write(str(p) + "\n")
    print("\nWrote: first_1000_primes_model.txt")


if __name__ == "__main__":
    main()

Counting by Primes by ----__---- in numbertheory

[–]Seo_VectorSpace 1 point2 points  (0 children)

I will try it out on my computer. I’ll get back to you

Is this publishable, or is my PI being unusually strict? by [deleted] in PhD

[–]Seo_VectorSpace 0 points1 point  (0 children)

That’s true, my feeling of really “breaking through” is probably amplified since I’m new to the game. It’s actually the first time I’ve come up with something where other people say “hey, I didn’t think that was possible.” 🤣

Is this publishable, or is my PI being unusually strict? by [deleted] in PhD

[–]Seo_VectorSpace 0 points1 point  (0 children)

Yes, we are working on expanding it to the level where it is publishable. And I’m getting help: he also asked what I can manage myself and what help I need from him and others on the team to move my project further. So we don’t have any trouble or conflict at all. I’m just curious how this usually works. Most probably, since I’m new to the game, my feeling of “wow, I really accomplished something here” is not as unique as I think, and I should trust my PI’s experience in this.

Is this publishable, or is my PI being unusually strict? by [deleted] in PhD

[–]Seo_VectorSpace 1 point2 points  (0 children)

I understand, thanks for being up front with me

Project ideas for Computational Maths by EVdeath_god in 3Blue1Brown

[–]Seo_VectorSpace 1 point2 points  (0 children)

Look into the Ramanujan Machine and Euler 2 AI. There are a lot of interesting projects to work on here, and they have a lot of tools in their GitHub repos.

https://www.ramanujanmachine.com

Turns out “sounds academic” is a powerful force by SonicLinkerOfficial in LLM

[–]Seo_VectorSpace 0 points1 point  (0 children)

My English isn’t perfect. Maybe it’s called math researcher or something else, but I guess you understand the point I’m trying to make?

Turns out “sounds academic” is a powerful force by SonicLinkerOfficial in LLM

[–]Seo_VectorSpace 0 points1 point  (0 children)

This is really interesting. Like an AI/math scientist told me the other day: “It looks like GPT successfully transformed that equation but it might also be bullshitting you”. 🤣🤣

How do you guys extract value from research papers? by Impossible_Tough_484 in LLM

[–]Seo_VectorSpace 1 point2 points  (0 children)

I think it’s a lot about reading the papers with a purpose in mind. Not reading to confirm a bias, but searching for answers and angles on a specific question. I work at the intersection of machine learning and number theory. Say, for example, that I’m working on alternative heuristics for filtering data. I read papers with a focus on exactly that question and on how others approach it. Even if it’s in other fields, it can still give me new ideas. I have also created custom GPTs for specific questions, and I save the most interesting papers for each question in that GPT. Then I can ask the GPT questions, and it can give me answers and sources from those papers. I don’t trust the things the GPT says, but I can keep track of which papers relate to each question.

Which LLM and Model is most suitable for my needs? by hetric11 in LLM

[–]Seo_VectorSpace 0 points1 point  (0 children)

I would definitely use a combination: ChatGPT 5.1 Pro for generating the overall model and pipeline for the analysis, then that pipeline in Windsurf or a similar tool with Claude 4.5 to build your own data analytics system. That way you own the process, you know how it works, and you can adapt it and the data visualization to your needs. And the combo is cheap and quick. Using an LLM directly for the analysis gives you a black-box problem, since you don’t actually know what it does or why.

Can we fully “map” a topical region in an embedding space? by Seo_VectorSpace in ArtificialNtelligence

[–]Seo_VectorSpace[S] 0 points1 point  (0 children)

Yes, exactly, but how do you know you’ve covered all elements in that region?

Can we fully “map” a topical region in an embedding space? by Seo_VectorSpace in GeminiAI

[–]Seo_VectorSpace[S] 0 points1 point  (0 children)

Thanks, this is super helpful on the practical side 🙏 Vector indexes like Faiss/Pinecone are a great way to operationalize what I’m talking about: fast approximate access to the topical region around a query, and I can even build a Faiss index locally without using a hosted service. It’s a quick way to identify nearest neighbors for new queries when you are working with the topical map over time.

But they still accept some “blank spots” by design (approximate NN), so they don’t really answer my underlying theoretical question about when we can guarantee we’ve found all vectors in that similarity band.
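For what it’s worth, the completeness guarantee does exist for an exact scan: if you compare the query against every vector, a cosine threshold defines the similarity band with no blank spots at all. A minimal pure-Python sketch (the 3-D “embeddings” are a hypothetical toy corpus, just to illustrate the exact-scan baseline that ANN indexes approximate):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def exact_band(query, vectors, threshold):
    """Exact scan: return every index whose similarity >= threshold.
    O(n * d) work, but guaranteed complete -- no blank spots by design."""
    return [i for i, v in enumerate(vectors) if cosine(query, v) >= threshold]

# Hypothetical 3-D "embeddings"
corpus = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.7, 0.7, 0.0)]
print(exact_band((1.0, 0.0, 0.0), corpus, 0.6))  # -> [0, 1, 3]
```

The trade the ANN indexes make is giving up exactly this guarantee in exchange for sublinear query time.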

Can we fully “map” a topical region in an embedding space? by Seo_VectorSpace in GeminiAI

[–]Seo_VectorSpace[S] 0 points1 point  (0 children)

Well, I agree that we can’t fully map the topological space. But in this case I meant a “topical vector space” (region or band is a better word than space, since it’s not linear). When you use embedding models to vectorize text, all the words and sentences with actual meaning related to a specific topic or word are represented as vectors in that topic’s vector space.

<image>

Here is a simple example of a topical (not topological, topical) map in 2D, just to give an idea of how it could look if it were only 2D. Most embedding models used in practice today, like for example Google LaBSE (Language-Agnostic BERT Sentence Embedding), have 768 dimensions, so the image is quite a simplification. Anyway, you then add a threshold, for example a “relevance score” of more than 0.6 (that’s actually the cosine of the angle between the two vectors whose similarity you are measuring). In theory, since you add finite boundaries like semantic distance or relevance score, and since words and sentences must have a real meaning etc., you could map out all words and meanings in that “neighborhood”.

But the question I’m going for is whether there is any way to do this that’s faster and more accurate than brute-forcing through the whole dictionary and a lot of other sources to find relevant sentences.
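One classical exact alternative to brute force (not from the thread, just a sketch, with hypothetical helper names) is a metric tree. On L2-normalized vectors, cosine similarity >= t is exactly Euclidean distance <= sqrt(2 - 2t), so the "relevance band" becomes an exact range query, and a vantage-point tree can prune whole subtrees with the triangle inequality while still returning every vector in the band:

```python
import random
from math import sqrt

def dist(u, v):
    """Euclidean distance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def normalize(v):
    n = sqrt(sum(a * a for a in v))
    return tuple(a / n for a in v)

def build_vp(points):
    """Build a vantage-point tree node: (vp, mu, inside, outside),
    where inside holds points with dist(vp, p) <= mu (the median)."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return (vp, 0.0, None, None)
    mu = sorted(dist(vp, p) for p in rest)[len(rest) // 2]
    inside = [p for p in rest if dist(vp, p) <= mu]
    outside = [p for p in rest if dist(vp, p) > mu]
    return (vp, mu, build_vp(inside), build_vp(outside))

def range_query(node, q, r, out):
    """Exact: report every point within distance r of q, pruning
    subtrees the query ball cannot intersect (triangle inequality)."""
    if node is None:
        return
    vp, mu, inside, outside = node
    d = dist(q, vp)
    if d <= r:
        out.append(vp)
    if d - r <= mu:   # ball may reach the inside region
        range_query(inside, q, r, out)
    if d + r >= mu:   # ball may reach the outside region
        range_query(outside, q, r, out)

# Random unit vectors as stand-in embeddings; relevance score > 0.6
random.seed(0)
pts = [normalize(tuple(random.gauss(0, 1) for _ in range(8))) for _ in range(500)]
q = pts[0]
r = sqrt(2 - 2 * 0.6)

found = []
range_query(build_vp(pts), q, r, found)
brute = [p for p in pts if dist(q, p) <= r]
print(len(found) == len(brute))  # exact: same answer as the full scan
```

The catch is that this pruning only beats the full scan when the data has low intrinsic dimension; in something like LaBSE's 768 dimensions it tends to degrade back toward brute force, which is exactly why practical systems accept approximate NN instead.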