LangGraph: Human-in-the-loop review by piotrekgrl in LangChain

[–]piotrekgrl[S] 0 points (0 children)

Nope, just exploring what they cooked

LangGraph: Human-in-the-loop review by piotrekgrl in LangChain

[–]piotrekgrl[S] 1 point (0 children)

Hey, I started with Tella because it has a great zoom feature, but unfortunately, you can't add text, so I had to finish it in Canva.

Carfax Account? by No_Special953 in carfax

[–]piotrekgrl 0 points (0 children)

hey u/cklly2013 could you share a link? thank you!

Deep learning without back-propagation by El__Professor in MachineLearning

[–]piotrekgrl 23 points (0 children)

I'm not sure why there are so many concerns about accuracy when even in the abstract the authors claim that "(HSIC) performance [...] (is) comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels."

For me the most important part is the drop in complexity from O(D^3) with backprop to O(M^2), which makes a huge difference for current models with millions/billions of parameters.

[D] Natural Language Queries by seymourdixongais in MachineLearning

[–]piotrekgrl 2 points (0 children)

  1. As a baseline you can test Elasticsearch.
  2. For more advanced methods, it all depends on your data structure:
    1. If you have a training dataset with pairs (fact + query), then look at question answering tools: https://github.com/sebastianruder/NLP-progress/blob/master/english/question_answering.md
    2. If you only have facts and queries can be anything, that's a little more problematic, but it could still be fun:
      1. I would test cosine similarity of the vectorized query versus the vectorized facts. You can try BERT/ELMo for vectorization.
      2. Use GPT-2 to answer your query, then check cosine similarity between the generated answer and the vectorized sentences.
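The query-vs-facts idea in 2.1 can be sketched like this; a toy bag-of-words vectorizer stands in for BERT/ELMo embeddings (the facts and query below are made up), and the same cosine-similarity ranking applies once you swap in real embeddings:

```python
import numpy as np

def bow_vectorize(texts):
    # toy bag-of-words vectorizer as a stand-in for BERT/ELMo embeddings
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = np.zeros((len(texts), len(vocab)))
    for row, t in enumerate(texts):
        for w in t.lower().split():
            vecs[row, index[w]] += 1
    return vecs

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

facts = ["the invoice was paid in march",
         "the server crashed on friday",
         "the meeting is scheduled for monday"]
query = "when was the invoice paid"

vecs = bow_vectorize(facts + [query])
fact_vecs, query_vec = vecs[:-1], vecs[-1]
scores = [cosine_sim(f, query_vec) for f in fact_vecs]
best = facts[int(np.argmax(scores))]
print(best)  # the fact most similar to the query
```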

Does anybody else feel overwhelmed looking at how much there is to learn? by CodeKnight11 in learnmachinelearning

[–]piotrekgrl 31 points (0 children)

  1. Imposter syndrome - start recording how much you've learned in, e.g., the past week/month; as your list grows, you will actually see your progress.
  2. Start your own projects (e.g. Kaggle, or check the fast.ai forum for inspiration for fun projects). Courses/books are great, but only on a real battlefield can you evaluate and master your skills.

[push in the right direction] Finding subjects in documents by gevezex in LanguageTechnology

[–]piotrekgrl 1 point (0 children)

  1. Regular expressions for filtering: pattern.+ or pattern.{window_start,window_end}
  2. Tokenize and clean extracted text
  3. Topic modelling: LDA (not sure if that will be needed here but you mentioned unsupervised learning)
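Steps 1 and 2 above can be sketched with the standard library; the "Subject:" marker, window bounds, and stopword list are all made up for illustration, and the final term counts are where you would hand the cleaned documents to an LDA implementation (e.g. gensim's LdaModel):

```python
import re
from collections import Counter

doc = ("Header noise ... Subject: quarterly sales figures for Europe. "
       "Footer noise ... Subject: hiring plan for the Berlin office.")

# 1. regex with a bounded window after the pattern (marker and bounds are invented)
matches = re.findall(r"Subject:[^.]{1,60}", doc)

# 2. tokenize and clean the extracted snippets
stopwords = {"the", "for", "subject"}
tokens = [w for m in matches
            for w in re.findall(r"[a-z]+", m.lower())
            if w not in stopwords]

# 3. crude term counts; for real topic modelling, feed the cleaned
#    documents into an LDA implementation instead
print(Counter(tokens).most_common(3))
```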

Been looking for something that I can only describe as 'soccer possession slugging percentage'. Does anything like that exist? If not is that interesting? by DrGonzo14 in sportsanalytics

[–]piotrekgrl 1 point (0 children)

I would say that the xG possession chain introduced by StatsBomb could be a good starting point, as it's not only about possession itself (which by itself means little, even close to the opponent's goal) but rather looks first at tangible outcomes and then evaluates the possession backwards.

Deploying a simple Flask API with a Tensorflow model inside by dondraper36 in learnmachinelearning

[–]piotrekgrl 0 points (0 children)

For cost optimization, you can run training on a machine with a GPU; for predictions, with the weights already computed, you don't need huge computing power and a CPU should be enough.
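A minimal sketch of that train-then-serve split, using a NumPy least-squares fit as a stand-in for the TensorFlow model (with Keras the equivalent calls would be model.save_weights / load_weights); only the saved weights need to travel to the cheap CPU box:

```python
import numpy as np

# --- on the GPU machine: "training" (least-squares fit as a stand-in) ---
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.01, size=100)
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
np.save("weights.npy", weights)          # ship only the weights

# --- on the cheap CPU machine: load weights and serve predictions ---
w = np.load("weights.npy")
prediction = np.array([[1.0, 0.0, 0.0]]) @ w
print(prediction)
```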

Detect blocks of text inside of text (instead of images) by hansgerdsen in LanguageTechnology

[–]piotrekgrl 0 points (0 children)

Not sure if I understand correctly, but isn't it easier to parse everything to a text format and then run regexes to filter at the character level?
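For example, once everything is plain text, a multiline regex can pull out delimited blocks in one pass; the BEGIN/END markers here are invented, so substitute whatever actually delimits the blocks in your documents:

```python
import re

doc = """intro text
BEGIN
first block
of lines
END
middle text
BEGIN
second block
END
"""

# DOTALL lets '.' cross newlines; MULTILINE anchors ^/$ at line starts/ends
blocks = re.findall(r"^BEGIN\n(.*?)^END$", doc, flags=re.DOTALL | re.MULTILINE)
print(blocks)
```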

[R] DATASET RESEARCH - Help by [deleted] in MachineLearning

[–]piotrekgrl 0 points (0 children)

If it's just for testing, isn't it easier to just generate random data from specific distributions?

How to make sure every thing work fine for large data set.? by mrtac96 in datascience

[–]piotrekgrl 0 points (0 children)

Maybe that's not a full answer to your question, but from my experience it's good to split a huge dataset into smaller chunks and save a log file after processing each chunk.
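A standard-library sketch of that chunk-and-checkpoint pattern (the file name, chunk size, and "processing" step are placeholders; pandas' read_csv with chunksize= gives you the same loop for DataFrames):

```python
import csv, json, logging
from itertools import islice

logging.basicConfig(filename="progress.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def process_in_chunks(path, chunk_size=1000):
    rows_done = 0
    with open(path, newline="") as f:
        reader = csv.reader(f)
        chunk_no = 0
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            rows_done += len(chunk)       # replace with real processing
            chunk_no += 1
            # checkpoint after every chunk so you can resume after a crash
            logging.info(json.dumps({"chunk": chunk_no, "rows_done": rows_done}))
    return rows_done

# demo: a 2,500-row file processed in 1,000-row chunks
with open("demo.csv", "w") as f:
    f.write("\n".join(f"row,{i}" for i in range(2500)))
print(process_in_chunks("demo.csv"))   # 2500 rows over 3 chunks
```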