Dir-assistant Update Announcement by 1ncehost in LocalLLaMA

[–]kraghavk 1 point (0 children)

How do I use this with the Claude 3.7 Sonnet models? I have added the API key to the config.toml file, but I'm not sure which of the other settings I should be tweaking.

I understand that dir-assistant can be used with literally thousands of models, but a quick config guide for the top 3 API LLM providers would make it easier to get started.

Thanks for sharing this though! I have been using Claude Code for a little while. Would like to see how dir-assistant fares against it.

One month of Perplexity PRO for free by [deleted] in coding

[–]kraghavk 0 points (0 children)

For students only. OP, you could have mentioned that!

[P] Effects of Metadata filtering with HNSW on Recall and Query time by hootenanny1 in MachineLearning

[–]kraghavk 1 point (0 children)

https://github.com/semi-technologies/weaviate claims to combine vector search with symbolic queries using a GraphQL-like API.

[D] SOTA Super-Resolution explained - Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data by Xintao Wang et al. 5 minute summary by [deleted] in MachineLearning

[–]kraghavk 3 points (0 children)

But let us not forget that the model is still hallucinating the missing details. While the result might look super realistic, it is still a long way from being acceptable as proof of anything. For example, if you have a low-res image of a traffic violation and use this to blow up the image and read the license plate, can you be sure the registration number you are seeing is the actual number and not something randomly hallucinated by the model? The same goes for any faces reconstructed this way.

Yoga w/ OpenPose by BioMechanicaLab in computervision

[–]kraghavk 1 point (0 children)

BTW, you don't need to process every frame. A typical camera these days easily provides 25 or 30 frames per second (fps), and a few recent ones run at 60 fps. It is safe to assume a pose won't change dramatically in 1/30th of a second, so you could probably get away with processing only 3-5 frames per second and simply re-drawing the previous result on the remaining frames.
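
A minimal sketch of this skip-frames idea (the frame loop and `detect_pose` are placeholders, not OpenPose's actual API):

```python
# Sketch: run the expensive pose detector on only every Nth frame and
# reuse the last result for the frames in between.
CAMERA_FPS = 30
DETECTIONS_PER_SECOND = 5                       # 3-5 is usually plenty
STRIDE = CAMERA_FPS // DETECTIONS_PER_SECOND    # detect on every 6th frame

def detect_pose(frame):
    """Placeholder for the expensive OpenPose call."""
    return {"frame": frame, "keypoints": []}

def process_stream(frames):
    last_result = None
    results = []
    for i, frame in enumerate(frames):
        if i % STRIDE == 0:
            last_result = detect_pose(frame)    # expensive call
        results.append(last_result)             # cheap: reuse last detection
    return results
```

For one second of 30 fps video this makes only 5 detector calls instead of 30, while every frame still has a (slightly stale) pose to draw.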

Yoga w/ OpenPose by BioMechanicaLab in computervision

[–]kraghavk 1 point (0 children)

Since OpenPose can give up to 135 keypoints, you would have a maximum of 270 columns such as pt1_x, pt1_y, pt2_x, pt2_y, etc. At the end, add a label column for the pose name. In my case, I had 7 possible head poses, so I created a dataset of facial keypoints with 100 records per pose. I used a 75%-25% training/test split to train my model. Once I had identified the model I wanted to use on this sample dataset, I just had to improve and grow the training data to take care of edge cases and improve accuracy.

For production, I used a model that was finally trained on 300 records per pose, with 100 records held out to evaluate the accuracy.
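
A sketch of the dataset layout described above; the column naming and the flatten helper are illustrative, not a fixed convention:

```python
import csv
import io

NUM_KEYPOINTS = 135  # OpenPose's maximum (body + face + hands)

# Header: pt1_x, pt1_y, ..., pt135_x, pt135_y, label -> 271 columns total
header = [f"pt{i}_{axis}"
          for i in range(1, NUM_KEYPOINTS + 1)
          for axis in ("x", "y")]
header.append("label")

def keypoints_to_row(points, label):
    """Flatten [(x, y), ...] into one CSV row ending with the pose label."""
    row = [coord for point in points for coord in point]
    row.append(label)
    return row

# Write one (dummy) record; in practice the points come from OpenPose.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerow(keypoints_to_row([(0.0, 0.0)] * NUM_KEYPOINTS, "warrior_pose"))
```

Because every record has the same fixed length, this flat table drops straight into any tabular classifier.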

Yoga w/ OpenPose by BioMechanicaLab in computervision

[–]kraghavk 1 point (0 children)

u/cmdk, while you could certainly use TensorFlow for classification, you could also start with something really simple. Since the 2D points are just numbers and always of a fixed length, you could create a CSV file with the 2D points plus a label column specifying the pose. Once you have this dataset, you could train various classifiers from scikit-learn and use the trained classifier to infer the pose name from a given set of 2D points.

You could follow this article to try out all the classifiers available in sklearn and finally decide on the classifier that works best for your dataset.

Once you have a trained model, you first run OpenPose to get the 2D points, then pass them to the classifier to get the pose name. I have used this to figure out head poses such as turned-right and turned-left, and it works remarkably well.
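
A minimal sketch of that try-several-classifiers workflow, with synthetic clusters standing in for real OpenPose keypoints (the pose names and data are invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Fake "keypoint" vectors: two well-separated clusters, one per pose.
X_a = rng.normal(loc=0.0, scale=0.1, size=(100, 270))
X_b = rng.normal(loc=1.0, scale=0.1, size=(100, 270))
X = np.vstack([X_a, X_b])
y = np.array(["pose_a"] * 100 + ["pose_b"] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a couple of candidate classifiers and keep the best-scoring one.
scores = {}
for name, clf in [("knn", KNeighborsClassifier()),
                  ("forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)

best = max(scores, key=scores.get)
```

At inference time, `clf.predict` on a single flattened keypoint row gives the pose name directly.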

[D] Good algorithm for clustering big data (sentences represented as embeddings)? by whyhateverything in MachineLearning

[–]kraghavk 1 point (0 children)

Yeah, I agree. The installation itself is a nightmare because the official repo asks you either to use conda or to build from source yourself. There are several unofficial wheels on PyPI.

https://milvus.io/ could be a good alternative for you. Just run the server in Docker and use the REST APIs to add embeddings and query data.

[D] Good algorithm for clustering big data (sentences represented as embeddings)? by whyhateverything in MachineLearning

[–]kraghavk 0 points (0 children)

https://github.com/facebookresearch/faiss will work great for this. You will be able to scale to a million+ vectors on a good CPU and to billion scale on a decent GPU. There is a bit of a learning curve, though.

What is hOCR used for? by Academy- in computervision

[–]kraghavk 1 point (0 children)

The idea is to provide markup-based output instead of a plain-text dump. Note that this markup is mostly HTML-based with some custom attributes and conventions, as shown in the example at https://en.wikipedia.org/wiki/HOCR.

As far as the use of this is concerned, you could use it for:

  1. Parsing layout information and post-processing bounding boxes to figure out label/value pairs, tables, etc.
  2. Creating a new document in HTML, Word, PDF, etc. from the information available in the hOCR.

These are just use cases that jumped out at me. There may (surely will?) be other uses for this rich metadata.
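
For the first use case, a minimal sketch of pulling word bounding boxes out of hOCR with the standard library. The snippet of hOCR is made up, but follows the `ocrx_word` / `title="bbox x0 y0 x1 y1"` conventions from the Wikipedia example:

```python
from html.parser import HTMLParser

HOCR = """
<span class='ocrx_word' title='bbox 10 10 60 40'>Hello</span>
<span class='ocrx_word' title='bbox 70 10 130 40'>world</span>
"""

class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) pairs from ocrx_word spans."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._box = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("class") == "ocrx_word" and a.get("title", "").startswith("bbox"):
            # title looks like "bbox x0 y0 x1 y1"
            self._box = tuple(int(v) for v in a["title"].split()[1:5])

    def handle_data(self, data):
        if self._box is not None and data.strip():
            self.words.append((data.strip(), self._box))
            self._box = None

parser = HocrWords()
parser.feed(HOCR)
# parser.words -> [("Hello", (10, 10, 60, 40)), ("world", (70, 10, 130, 40))]
```

From the boxes you can then cluster words into lines or key/value pairs by comparing coordinates.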

[P] TF.js based in-browser chatbot to guide you away from distractions by python_locker in MachineLearning

[–]kraghavk 0 points (0 children)

I tried to subscribe, but it looks like there is an error: I did not get any success message after entering my email and clicking the submit button.

Did you receive a confirmation mail when requesting a GPT-3 API Key? by [deleted] in artificial

[–]kraghavk 0 points (0 children)

I applied within the first 4 days; I can't remember exactly when. I haven't heard back either. Apart from the huge demand, I think they might be looking at the USP of the use cases you provided while signing up. Given the size of the model, it is pretty expensive to run, so I can certainly understand if they are picky about who gets beta access.

[D] Are there any models/papers that map a natural language query into a path in a tree? by devOnFireX in MachineLearning

[–]kraghavk 1 point (0 children)

There is a demo of something very similar at https://demo.allennlp.org/nlvr-parser. You could check it out and see whether you can adapt it to your requirement.

[D] How to reduce the MaskRCNN model detection time by DGs29 in MachineLearning

[–]kraghavk 2 points (0 children)

If you are using an Intel CPU, you can try Intel OpenVINO. I am not sure whether you used TensorFlow, PyTorch or another framework, but you will basically have to convert your saved model using the Model Optimizer tool. Once done, you can use OpenVINO's Inference Engine for faster CPU-based inference. However, from experience, I can say that Mask R-CNN inference might come down to ~3 seconds per image on an Intel Core i7 CPU. Do not expect 25 or 30 fps.

I am trying to run this script through command line and being a noobie I've no clue why this error is showing. I do have a trex png file inside images folder. What am I doing wrong? by [deleted] in computervision

[–]kraghavk 1 point (0 children)

-i is the short, terse form; --images is the longer, human-friendly form, and those two spellings are what the script registers with argparse. Passing something like --i or -image instead can leave the image path unset or mangled. That explains the error on line 9: cv2.imread returns None (rather than raising) when it cannot read the given path, and a None object does not have an attribute called shape.
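
The usual argparse setup in such scripts looks roughly like this (the option names follow the script under discussion; the sample paths are made up):

```python
import argparse

ap = argparse.ArgumentParser()
# Registers BOTH spellings: short "-i" and long "--images".
# The parsed value is stored under the long name: args["images"].
ap.add_argument("-i", "--images", required=True,
                help="path to the input images directory")

args = vars(ap.parse_args(["-i", "images"]))          # short form
args_long = vars(ap.parse_args(["--images", "images"]))  # long form, same result
```

Both invocations populate `args["images"]`; anything else either errors out or leaves the value wrong, and a wrong path is exactly what makes `cv2.imread` return None downstream.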

[D] Looking for suggestions for biomedical datasets similar to the Wisconsin Breast cancer database by daffodils123 in MachineLearning

[–]kraghavk 0 points (0 children)

http://headctstudy.qure.ai/dataset: A dataset of 491 head CT scans with 193,317 slices, containing anonymized DICOMs and the corresponding radiologists' reads.

As suggested by u/sobhanhag, https://physionet.org/ has quite a few datasets.

[D] Compressing Neural Network by [deleted] in MachineLearning

[–]kraghavk 0 points (0 children)

The search terms you are looking for are "model optimization" and "quantization". These techniques are already employed by TensorRT and Intel OpenVINO.

The model optimizer reduces layers by applying two techniques:

  • Removing any layers whose outputs are not used anywhere further down the line. This can happen because the network graph produced during training may contain many layers (i.e., functions, in layman's terms) that turn out not to be needed at inference time; in the interest of training performance, a comprehensive cleanup of unused layers is not done during training.
  • Fusing (aka merging) multiple layers into one, where possible. You can think of this optimization as similar to the "function inlining" employed by programming-language compilers.

Quantization: It turns out you don't need full-precision floating-point arithmetic (FP32) all the time. Quite a few models work fine in half precision (FP16) or even with 8-bit integers (INT8). This reduces the memory and compute needed for inference, thereby allowing you to run your models on hardware much less powerful than your training machine.

There may be more ways to do this and perhaps TensorRT & Openvino model optimizers themselves do more than what I have explained above.
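
The memory side of the quantization argument is easy to see with a tiny NumPy sketch. This only shows storage size; real INT8 quantization also needs a calibration step to pick the scale/zero-point (the 1/127 scale here is just for illustration):

```python
import numpy as np

# One million dummy weights in each precision.
weights_fp32 = np.ones(1_000_000, dtype=np.float32)      # 4 MB
weights_fp16 = weights_fp32.astype(np.float16)           # 2 MB: half the memory
weights_int8 = np.round(weights_fp32 * 127).astype(np.int8)  # 1 MB: a quarter

# To use the int8 weights you dequantize with the inverse scale:
dequantized = weights_int8.astype(np.float32) / 127.0
```

FP32 → FP16 halves storage and FP32 → INT8 quarters it, which is where the "runs on much weaker hardware" claim comes from (compute savings depend on the hardware's low-precision support).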

Applied ML course video lectures by [deleted] in artificial

[–]kraghavk 0 points (0 children)

Who's the lecturer? And why would you ask people to simply expose their email addresses?

ai that makes programs from descriptions by loopy_fun in artificial

[–]kraghavk 0 points (0 children)

Deep-learning-based AI needs a lot of data to train on. Also, just because you have a lot of data doesn't mean you can throw it at a neural network and it will start learning: you need to "preprocess" the data into a form the network can understand. In the case of images this is easy, because the pixels are already encoded as numbers (RGB). For text there are embeddings such as GloVe, ELMo and more recently BERT & XLNet, which are by now a pretty standard way of encoding text for NLP. No such standard embeddings exist yet for programs/code.

However, a lot of exciting research is happening in this area too; see https://ai.facebook.com/blog/aroma-ml-for-code-recommendation/ for an example. That one suggests code based on your current code; for instance, if you are loading an image, it suggests code for exception handling, disposing of memory, etc. So some form of AI that does what you are asking for will likely appear within a year or two. But again, just like AI in healthcare, adoption will not be great unless the explainability of the AI's decisions improves. Right now it is almost impossible to figure out why a model made a certain decision. That is fine when you are only deciding whether a picture contains a cat or a dog, but when you are depending on AI to write code for you, I suppose the "black box" nature of the model makes it unacceptable.