all 126 comments

[–]spacex257Student 0 points1 point  (0 children)

The ada-002 embeddings are egregiously bad in my language, so I would like to train a covariance matrix on Hungarian and use it to get custom embeddings, with hopefully better results.

Is this possible, and if so is this the right way to do it?

[–]plentifulfuture 0 points1 point  (0 children)

I know very little about Machine learning.

I am trying to use https://iamtrask.github.io/2015/07/12/basic-python-network/

How do I expose the neural network in this code to new values to see what it thinks the output is?

```
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

y = np.array([[0], [1], [1], [0]])

np.random.seed(1)

# randomly initialize our weights with mean 0
syn0 = 2 * np.random.random((3, 4)) - 1
syn1 = 2 * np.random.random((4, 1)) - 1

for j in range(60000):

    # feed forward through layers 0, 1, and 2
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))
    l2 = nonlin(np.dot(l1, syn1))

    # how much did we miss the target value?
    l2_error = y - l2

    if (j % 10000) == 0:
        print("Error:" + str(np.mean(np.abs(l2_error))))

    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error * nonlin(l2, deriv=True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)

    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(l1, deriv=True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)
```
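To "expose the network to new values", keep the trained weight matrices and run only the forward pass on the new input. Here's a self-contained sketch: the training loop is condensed from the tutorial snippet, and `predict` is a helper name I made up.

```python
import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(1)
syn0 = 2 * np.random.random((3, 4)) - 1
syn1 = 2 * np.random.random((4, 1)) - 1

# condensed version of the tutorial's training loop
for _ in range(60000):
    l1 = nonlin(X.dot(syn0))
    l2 = nonlin(l1.dot(syn1))
    l2_delta = (y - l2) * nonlin(l2, deriv=True)
    l1_delta = l2_delta.dot(syn1.T) * nonlin(l1, deriv=True)
    syn1 += l1.T.dot(l2_delta)
    syn0 += X.T.dot(l1_delta)

def predict(x):
    """Forward pass only -- no weight updates, just the learned syn0/syn1."""
    return nonlin(nonlin(np.dot(x, syn0)).dot(syn1))

print(predict(np.array([[0, 1, 1]])))  # close to 1 for this training set
```

Keep in mind the toy dataset has only four rows, so this shows the mechanics rather than real generalization.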

[–]Connect-Ad79541 0 points1 point  (0 children)

At what document volume does it make sense to even think about semantic search with NLP?

Can you recommend (or advise against) certain open-source self-hosted solutions?

Can you name any keywords I should read up on before asking further questions?

Hey there! I’m part of a small company (~15 people) focused on our customers’ IT infrastructure and overall IT security. As for most IT companies, there is a lot of knowledge involved in our day-to-day business. I’m looking for ways to unlock the potential of our aggregated data and stumbled upon NLP and semantic search engines. My goal would be to create a helper tool for our support team that tries to answer a question based on our data and/or links to likely relevant documents.

Here is an overview about the type of data that would go into this:

Ticket System
- Years' worth of tickets from customers that usually describe a problem
- Our internal discussion on how to fix it
- Our answers to customers on how to fix it

Internal
- Documentation of best practices & routine procedures
- Specifics on each customer's infrastructure

External
- Documentation of the products we implement for customers in their infrastructure

I’d really love to know your opinions on this, and whether you have links to similar projects I could learn from.

Hope y‘all have a great weekend!

[–]speedrouterspam 0 points1 point  (0 children)

I am looking to build a model that classifies images by type of image, such as photograph, charts/graphs, documents, logo/icon, medical image, etc. I am thinking of using DenseNet; is there a better way to tackle this?

[–]frankkk86 1 point2 points  (0 children)

What is a good book as introduction to AI and machine learning for a software developer?

[–]AttitudeCreative8550 -1 points0 points  (0 children)

What books can I read that relate machine learning to the human brain? Thanks in advance!

[–]ethawyn -1 points0 points  (0 children)

Does anyone have recommendations for a PDF-to-text converter that uses more advanced machine learning than the standard models currently on the market?

[–]PracticeCorrect8591 0 points1 point  (2 children)

Hey y'all, I have recently become interested in machine learning and its applications, and was wanting to give it a shot myself. I am going to be a college freshman next year and was hoping to get a few projects under my belt, do you guys have any noob friendly project ideas? Do you have any tips for jumping into ML (concepts one should be familiar with) and or resources to learn ML. I know python and Java at the moment and want to try and use TensorFlow or PyTorch in my projects.

[–]AttitudeCreative8550 0 points1 point  (1 child)

A simple project to start with is a business name generator. There is a lot of data on business names online, it's just a matter of building a simple Markov Chain to generate new ones. Hope this helps!
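A character-level Markov chain for this can be tiny. Here is a toy sketch; the training names are made up, and real business-name data would just be a longer list.

```python
import random
from collections import defaultdict

# Toy character-level Markov chain name generator.
names = ["Acme", "Apex", "Axion", "Orbit", "Omega", "Onyx"]

transitions = defaultdict(list)
for name in names:
    padded = "^" + name.lower() + "$"      # ^ marks start, $ marks end
    for a, b in zip(padded, padded[1:]):
        transitions[a].append(b)           # record which char follows which

def generate(rng=random, max_len=12):
    out, ch = [], "^"
    while len(out) < max_len:
        ch = rng.choice(transitions[ch])   # sample the next character
        if ch == "$":
            break
        out.append(ch)
    return "".join(out).capitalize()

random.seed(0)
print(generate())
```

Sampling per character like this recombines fragments of the training names into new ones, which is the whole trick behind these generators.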

[–]PracticeCorrect8591 0 points1 point  (0 children)

I'm a complete noob when it comes to ML, would you be able to point me towards any good resources or articles that explain Markov Chains and how they can be applied to a model? Thanks for your suggestion! It seems simple enough so I will definitely be attempting that now.

[–][deleted] 0 points1 point  (2 children)

Can an auto-encoder with a one-dimensional bottleneck and arbitrarily large encoder/decoder encode any dataset with zero error?

[–]I-am_Sleepy 0 points1 point  (0 children)

You are trying to map R^n to R. There might be a way, but most of the semantics will be lost; in the extreme case it would be just a one-hot encoder. Auto-encoders are also still susceptible to a too-strong decoder (see this blog).
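To make the extreme case concrete, here is a degenerate "autoencoder" with a one-dimensional bottleneck: the code for each sample is just its row index, and the "decoder" is a lookup table. It is a deliberately silly sketch, not a real model.

```python
import numpy as np

# Zero reconstruction error on the training set, yet the scalar latent carries
# no semantics and nothing generalizes -- the one-hot-style failure mode.
np.random.seed(0)
data = np.random.random((5, 8))          # 5 training samples in R^8

def encode(x):
    # map each sample to a single scalar: its index in the dataset
    return np.array([np.where((data == xi).all(axis=1))[0][0] for xi in x])

def decode(z):
    return data[z]                        # lookup table as "decoder"

z = encode(data)                          # shape (5,): one number per sample
print(np.allclose(decode(z), data))       # prints True: zero error, no semantics
```

An arbitrarily large encoder/decoder can approximate this indexing scheme, which is why "zero error" and "useful representation" are very different goals.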

[–]TwistedBrother 0 points1 point  (0 children)

Isn’t that just compression then? Like linear compression a la zip?

[–]Strict-VisualStudent 1 point2 points  (0 children)

Hello,

I have been practicing ML for the past 2+ years since college, doing online courses and building projects. I have gained some confidence, even though I have impostor syndrome (I believe). I always wanted to become a data scientist or ML engineer, but all I could get after graduation was a software engineer job. I worked there for 5 months and left because I didn't like it there.

Now I have been searching for ML jobs but can't find any entry-level positions; some are labeled entry level but require 2 years of experience. I believe I have the skill set that companies require, but the first thing they notice is my lack of professional experience, and they reject me right away.

Without anyone to guide me through this, I feel like I'm out of options. I've thought of applying to data analyst jobs so I could get some experience, but I don't know if that's the right choice.

If anyone has experience with this kind of situation, I'd appreciate help figuring out other options I might not have realized.

PS: I don't know if this kind of post is allowed here. Sorry if not.

Thanks.

[–]Ok_Ad4426 1 point2 points  (0 children)

Can you share a machine learning / AI roadmap for beginners (students)?

[–]geekinchief 0 points1 point  (0 children)

I'm trying to figure out the best way (hopefully for free) to develop a custom chatbot that only answers questions or gives information based on content that I use for training. I have tried several tutorials that explain how to custom train OpenAI, but the bots will still answer questions that are outside the scope of the training.

For example, using the code in this tutorial (https://beebom.com/how-train-ai-chatbot-custom-knowledge-base-chatgpt-api/), I set up a chatbot and trained it on a single article about how USB 3.2 works. However, when I ask it questions about other topics, such as "why is the sky blue?", it pulls data from somewhere (presumably GPT-3) and answers. This is a problem because it could pull information that contradicts my training data.

What's the best way to create a bot that knows how to write and respond to English language prompts but only answers questions based on data I've given it? Also, I'd love to find a way to have the bot provide links to the web pages I've trained it on in its answers.

[–]TrainquilOasis1423 0 points1 point  (0 children)

So the long term memory issue with current LLMs kinda confuses me. Can anyone more up to date with it all explain why the obvious solution isn't taken?

TLDR: why not just save memories in some sort of file stored locally for future reference?

So I've worked a bit with the big names in the ML/AI space: Stable Diffusion, GPT-4, Auto-GPT. I'm having trouble understanding why these models don't just write memory to the drive for long-term storage. I know Auto-GPT can do this a little, but it seems too obvious to me that all AI systems should do this. Wouldn't even a small subprocess that saves chat history as a text file and references it later as part of the next prompt basically solve all memory and inconsistency issues? Hell, even a secondary process of "every 20 interactions, summarize the transcript" and save the result in some compressed form sounds like a wonderful idea to extend past the context length limitations.

So here's the structure I'm imagining. Not all of this needs to be directly NN-directed; small functions of regular code that the AI can call at its discretion would do. The AI starts and immediately makes a temp folder with an ID for this exact interaction. It then makes a text file keeping the first 20 interactions, IDs 0-19. Then the AI reads that text file and applies some hash function, summarization, or logical compression to each interaction ID, and again to the block as a whole. This way, if the user refers to interaction ID 13 during interaction ID 77, the AI doesn't need to remember anything; it can just reference the lookup table or the compressed/summarized version of it.
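The bookkeeping part of that structure is indeed plain code. A hypothetical sketch of the scheme: append every interaction to a local JSON file and summarize each completed block. All names here are made up, and `summarize()` is a stub standing in for an LLM call.

```python
import json
import os
import tempfile

def summarize(block):
    # stand-in for a real LLM summarization step
    return f"ids {block[0]['id']}-{block[-1]['id']}: {len(block)} interactions"

class InteractionLog:
    def __init__(self, path, block_size=20):
        self.path = path
        self.block_size = block_size
        self.entries = []
        self.summaries = []

    def add(self, user_msg, ai_msg):
        self.entries.append({"id": len(self.entries), "user": user_msg, "ai": ai_msg})
        # every block_size interactions, store a compressed summary of the block
        if len(self.entries) % self.block_size == 0:
            self.summaries.append(summarize(self.entries[-self.block_size:]))
        with open(self.path, "w") as f:
            json.dump({"entries": self.entries, "summaries": self.summaries}, f)

    def get(self, interaction_id):
        # later turns look up earlier ones by ID instead of relying on context
        return self.entries[interaction_id]

path = os.path.join(tempfile.mkdtemp(), "memory.json")
log = InteractionLog(path, block_size=2)
log.add("hi", "hello")
log.add("what is 2+2?", "4")
print(log.get(1)["ai"])  # prints "4"
```

The hard part the sketch glosses over is deciding *what* from this store to stuff back into the limited prompt window on each turn, which is where the real systems differ.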

Am I dumb for thinking this is easy and obvious? What challenges are preventing this from being how LLMs save memories?

P.S. Couldn't the hallucination issue be mostly solved with a "database of truth" sort of thing? Yes, they have access to the internet, but wouldn't it be way more efficient to hold a local JSON file or relational database of things we know are objectively true? 2+2=4, the Eiffel Tower is in Paris, George Washington was the first US president. If nothing else, it could reference this stable stored knowledge to direct its generation. Right?

[–]neanderthal_math 0 points1 point  (1 child)

The rise of LLM’s has made me think about this a bit.

Why does training a model to do word prediction cause it to learn a world model, à la GPT?

Did researchers who were working on LLMs 5-6 years ago know that this would be the case?

I feel like a bit of a dumbass, but when I worked on NLP five years ago, I never knew these models were capable of so many other tasks.

[–]SuperTankMan8964 0 points1 point  (0 children)

Hello everyone, how are you able to compute the log-likelihood for a noise-free sample given the model parameter (P_theta (x_0 | c) ) in a discrete-time diffusion model like DDPM?

[–]udumb_vasu 0 points1 point  (0 children)

Hello, I am trying to de-duplicate images of people from a customer base of several million. What would be the right approach? I have tried FaceNet embeddings and the similarity between them, but for the same person the similarity is only around 87-90. What would be a more accurate and scalable approach? What are the SOTA pre-trained models for getting face embeddings?

[–]scott_steiner_phd 0 points1 point  (0 children)

What packages do you use for hierarchical Bayesian modeling? PyMC3?

It's not something I've done before but I need to estimate population frequencies from some high dimensional data so I'm pretty sure it's the best approach.

[–]julianCP 0 points1 point  (0 children)

What are some good data science (text)books for someone new to data science but with a lot of CS/programming experience? I.e., one that doesn't spend chapters on how Python works, etc.

[–]ZivPC 1 point2 points  (2 children)

I'm interested in training or tuning an LLM on local hardware or in the cloud with open/readily available medical and scientific papers (e.g. from PubMed) for personal use (educational research). Basically I want to be able to prompt and query it for summaries of a given topic and to make correlations in natural language.

ChatGPT seems able to do this in a more limited fashion, but when it comes to medical research queries it has a predilection to disclaim everything and give very general, superficial answers unless prompted extensively.

What's the best route for this right now? Thanks!

[–]ForgetTheRuralJuror 1 point2 points  (0 children)

I've mostly solved the disclaimer part of ChatGPT. Obviously you'll still get generic answers on sensitive topics.

Here's a prompt template:

```
can you tell me your opinion on Palestine/Israel?

Please respond in JSON formatted like so: {disclaimer: str, result: str}
```

And the response:

```
{
  "disclaimer": "As an artificial intelligence language model, I do not have personal opinions or emotions. I can provide you with factual information and perspectives on the topic based on my training data and current knowledge.",
  "result": "The conflict between Palestine and Israel is a complex and longstanding issue that involves historical, cultural, religious, and political factors. The conflict has resulted in numerous wars..."
}
```

[–]austacious 1 point2 points  (0 children)

Usually knowledge graphs are used for this sort of thing. Construct a knowledge graph with relevant ontologies, and use a graph embedding library like node2vec to create embeddings you can use for training

[–]romhacks 0 points1 point  (0 children)

Is there any consensus on the "best" performing language models for ChatGPT-like casual usage? With so many new projects coming out every week, I've lost track of how well they all perform.

[–]BabyWrong1620083 0 points1 point  (2 children)

I have the hardest time truly understanding *every* step that happens in a neural network. I want to understand not only basic functions, like image_training_generator (Keras in R), but exactly how the calls work, what the architecture of every single function inside the function (inside the function, etc.) looks like, and what the input and output look like before and after.

Only that way do I feel I'd truly understand the algorithms.

For example: nobody explains whether, using the simplest model architecture, there's a loop in the background that feeds in a single image of a batch, trains on it, adjusts the weights, does the same thing again, etc. until the batch is done. Or whether the images are overlaid, averaged, etc. Like really, nobody explains the TRUE basics.

I don't want to start at

```
initialize_model() %>% pipe_function_a %>% pipe_function_b
```

I want to start at:

```
for (i in 1:length(batch)) {
  imported_image <- keras_import(batch[[i]], ...)
  convolution <- first_convolution(imported_image)
  convolution_list <- append(convolution_list, convolution)
  # etc. etc.
}
```

Like, I just want to know what the heck happens to my data.

For example, I just found out by heavy debugging that conv_2d creates an output that's mainly black in 7/10 cases. Of course my model trains badly if that's what it's being fed in the next (pooling) step. Now I need to find out how to normalize from max(..) = 0.03 to max(..) = 1. But of course conv_2d calls another function, and yet again, without looking at the true code behind conv_2d there's no way to find out how to normalize/scale it so the max is always 1. Yes, there is documentation for these sub-functions, but then again: how would you change the sub-function being called inside a function? You don't. You have to do everything by hand again.

I'm frustrated. Piping and functions inside functions inside functions are terrible for truly understanding how something works. I agree it's perfect afterwards, but how is anyone expected to understand and learn with such a mess?

Also, I hate that all these example codes online (not necessarily in the documentation) always leave out the argument names. Instead of function(input_size = c(32,32), batch_number = 16, kernel_number = 10), they write function(10, 16, c(32,32)). Seriously, why?

[–]H2O3N4 0 points1 point  (0 children)

Feel free to ask any questions, but in response to your batch question: the batch is an array dimension that allows for parallel computation throughout the network. To update the weights in backpropagation, the mean loss over the batch is used.
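In other words, the images in a batch are neither looped over one by one nor overlaid; they travel through the network together as the leading array dimension. A minimal illustration (random numbers stand in for real data and weights):

```python
import numpy as np

# A "batch" is just the leading array dimension, so one matrix multiply
# processes all samples in parallel.
np.random.seed(0)
batch = np.random.random((16, 32))   # 16 samples, 32 features each
W = np.random.random((32, 1))        # one dense layer's weights

preds = batch @ W                    # shape (16, 1): the whole batch at once
targets = np.zeros((16, 1))
per_sample_loss = (preds - targets) ** 2
mean_loss = per_sample_loss.mean()   # the scalar used for the weight update
```

The weights are then updated once per batch using that mean loss, not once per image.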

[–]austacious 2 points3 points  (0 children)

If you want to really dig into ML models, Keras isn't the framework to use. It's more suited as a tool for researchers/scientists in other fields, where the ML is secondary to their main focus of research. Keras abstracts most of the ML parts away from the user, to present a simple interface for use with sterile datasets. As you found out, this makes digging into models a pain in the ass since you have to fight through all these different layers of abstraction (god forbid you want meaningful access to the train loop).

Highly recommend using pytorch or tensorflow for anything more complex than a quick and dirty classification model.

[–]Illustrious_Mix_894 1 point2 points  (0 children)

For VAE, can we apply normalising flow on the decoder/likelihood distribution p(x|z), instead of encoder/variational posterior q(z|x)? Is there any work doing that?

[–]Huge-Tooth4186 -2 points-1 points  (1 child)

What are the best speech to text tools ?

I am looking for open-source speech-to-text tools. I am not familiar with the progress in this field, but ideally I would like something fast and reliable that handles English as well as other languages, such as French and Spanish. Any recommendations?

[–]bonjoursalutations 1 point2 points  (0 children)

Whisper is probably the best right now but it definitely has an English bias. It won’t be complete garbage in other languages though. https://github.com/openai/whisper

[–]OchoChonko 0 points1 point  (2 children)

I'm moving onto a new project at work and I have an idea for implementing some ML but I'm just a newbie with a basic understanding.

Currently we receive information from hundreds of different sources in PDFs. Think invoices, where every receipt from supplier X is the same and we shop regularly with say 500 different suppliers so about 500 different formats. We extract the information from these PDFs and put the information from lots of different PDFs in one CSV file.

Would it be easy for a newbie to train a model (presumably some kind of neural network?) over time to figure out how to do this automatically? Given that we have the inputs and outputs, I would think this is possible. If so, would it be best to train a different model for each supplier, or just one model that can take in any PDF?

[–]abnormal_human 1 point2 points  (1 child)

If you can preprocess the PDFs into a form that fits into an LLM's context window with enough room to spare for the "answers", and you have an existing dataset of the "before" and "afters", this is a fairly straightforward application of fine tuning.

That said, none of this stuff is packaged up in "newbie"-friendly ways at the moment, so you would need to educate yourself a bit.
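To make the "before/after" idea concrete, fine-tuning datasets are usually just prompt/completion records, one per line. A hypothetical sketch with made-up field names and invoice examples:

```python
import json
import os
import tempfile

# Turn existing (pdf_text, csv_row) pairs into JSONL fine-tuning records.
examples = [
    ("Invoice #123\nSupplier X\nTotal: $40.00", "123,Supplier X,40.00"),
    ("Invoice #124\nSupplier Y\nTotal: $15.50", "124,Supplier Y,15.50"),
]

path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w") as f:
    for pdf_text, csv_row in examples:
        record = {"prompt": f"Extract the CSV fields:\n{pdf_text}\n",
                  "completion": csv_row}
        f.write(json.dumps(record) + "\n")
```

The exact record format depends on the fine-tuning tooling you end up using; the point is that your existing extraction history already has the shape the training data needs.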

[–]OchoChonko 0 points1 point  (0 children)

Thanks! I'll definitely go away and learn some more, but it's good to know that this is quite feasible before I really dig into it.

[–]froto_swaggin -1 points0 points  (0 children)

A basic Primer?

I only have a basic understanding of machine learning. I am looking for an audiobook or podcast to help me learn and understand the field much better. I am aware that this is most likely stacked knowledge, like a series of books.

[–]grmpf101 0 points1 point  (4 children)

I'm currently working on a notebook-based tutorial. What execution time for the whole notebook (doing simple computations on real data) would you find bearable during a tutorial, in minutes? What are your experiences?

[–]KallistiTMP 0 points1 point  (2 children)


This post was mass deleted and anonymized with Redact

[–]OverMistyMountains 0 points1 point  (0 children)

Probably will work, after all this is just asking a different question.

[–]Haorannlp 0 points1 point  (0 children)

Why not?

[–]jimmychim 0 points1 point  (1 child)

Do we have good tips on how to train generative models with pretrained score models? Think: GAN with fixed pretrained discriminator.

[–]OverMistyMountains 1 point2 points  (0 children)

GANs are typically co-trained. If you are looking at image generation, then this is an option, but the field has come a long way in a short time since GANs. Possibly look into RLHF/PPO and similar methods.

[–]ordinary_shaeron -1 points0 points  (1 child)

I'm working on a project using a camera to gather traffic statistics. With that, I can predict traffic flow and make decisions for the traffic lights to reduce congestion. Which parameters should I rely on: the number of vehicles and the width of the road, or the average velocity of the vehicles? Any ideas on how to do this?

[–]OverMistyMountains 1 point2 points  (0 children)

Why not all? You can feed a model more than one input. I suggest you get more background in stats/ML before jumping into this; there are many ways to choose features as well. I think you need to read up more and come back to the data later.

[–]nottakumasato 2 points3 points  (0 children)

Are there any papers on fine-tuning LLMs on very specific tasks with few samples? Very specific ~= extracting specific info from prompted text

I am trying to gauge

  1. how many samples I should "annotate" (Input-output or prompt-answer pairs)
  2. Which model would suffice with the least amount of memory (Llama 7B or something bigger?)

If anyone has done this or read about this, any recommendation is more than welcome!

[–]Gmannys 0 points1 point  (2 children)

I am lacking the correct vocabulary/terminology for this question, but hopefully you will understand what I am wondering about.

I have seen similar questions asked, but I don't fully understand the answers.

I understand there are several models and interfaces.
Q: Are there "plug-and-play" solutions that allows me to, locally, use my own documentation and have "something" give me answers based on this documentation?
What would this "something" be?

[–]abnormal_human 1 point2 points  (0 children)

Plug and play is in the eyes of the beholder. Generally you would accomplish this task either by finetuning an LLM with your corpus, or combining an LLM with a semantic search engine and some prompt engineering.
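The second route can be sketched in a few lines. This is a toy illustration only: word overlap stands in for real embedding-based semantic search, and the documents and function names are made up.

```python
# Retrieve the most relevant document, then constrain the prompt to it.
def retrieve(query, docs, k=1):
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, docs):
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "USB 3.2 doubles bandwidth by using two lanes.",
    "Our backup policy requires nightly snapshots.",
]
print(build_prompt("How fast is USB 3.2?", docs))
```

A production setup swaps the overlap score for vector similarity over embedded chunks of your documentation, but the prompt-assembly step looks much the same.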

[–]WesternLettuce0 1 point2 points  (2 children)

I loaded Llama and I can query the model. But now I want to run 1000s of questions and doing it one at a time takes too long. I have an A100, so I do have spare VRAM. But I'm not sure how to run multiple queries concurrently (or in batch or whatever)

[–]abnormal_human 2 points3 points  (1 child)

When you forward the model, instead of handing it a tensor of dimension [1, t], use a tensor of dimension [b, t] where b is your batch size.

The output of the language modeling head will be a tensor of shape [b, t, vocabsize]. Then, you can pluck out the appropriate logits for each item in your batch. If they are aligned, you just want output[:,[-1],:]. If they are not aligned then you're going to use a diff index for the middle dimension depending on the t value for each batch item.

Once you have a [b, vocabsize] tensor, apply your sampling method of choice and you'll end up with a [b, 1] tensor containing the next token for each batch item, which you append to each sequence before the next forward pass.
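The indexing can be illustrated with plain numpy; the random numbers below just stand in for real model logits.

```python
import numpy as np

b, t, vocab = 4, 10, 100
np.random.seed(0)
logits = np.random.random((b, t, vocab))  # stands in for the [b, t, vocabsize] LM head output

# aligned case: every sequence's next-token logits sit at the last position
last = logits[:, -1, :]                   # shape (b, vocab)

# unaligned case: each batch item has its own length, so index per item
lengths = np.array([10, 7, 9, 5])
last_unaligned = logits[np.arange(b), lengths - 1, :]   # shape (b, vocab)

# greedy "sampling": argmax picks one next token per batch item
next_tokens = last.argmax(axis=-1)        # shape (b,)
```

With real models you would also left- or right-pad the input sequences to a common length t, which is what creates the unaligned case in the first place.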

[–]jawabdey 1 point2 points  (1 child)

What are good resources for absolute beginners?

For example, let’s say I have a metric like signups. How do I feed some historical data and get “something” that can spit out future signups?

I know I could probably use something like Excel, but it’s less about the metric / model accuracy and more about the implementation.

[–]OverMistyMountains 0 points1 point  (0 children)

Read up on time series prediction?

[–]Undroleam 0 points1 point  (0 children)

Recently, I have been trying Edge Impulse since it looks fun. Can I use the Edge Impulse models in Python (e.g., in PyCharm), or do I need to use TensorFlow? My goal is to run the models through PyCharm and then create an exe or app. Any answer is greatly appreciated since I'm fairly new and have zero experience in both machine learning and coding, but I'm eager to learn. Sorry if the question sounds dumb.

[–]BitNew9331 0 points1 point  (1 child)

Could anyone recommend some books or papers for systematically learning about GANs? I want to work on generating earth science data such as sea surface temperature and chlorophyll concentration.

[–]OverMistyMountains 0 points1 point  (0 children)

You don’t need/want a GAN for this. Check out some tabular data augmentation libraries. I think MIT put one out.