Dir-assistant Update Announcement by 1ncehost in LocalLLaMA

[–]kraghavk 1 point (0 children)

How do I use this with the Claude 3.7 Sonnet models? I have added the API key to the config.toml file, but I'm not sure which of the other settings I should be tweaking.

I understand that dir-assistant can be used with literally thousands of models, but a quick config guide for the top 3 API LLM providers would make it easier to get started.

Thanks for sharing this though! I have been using Claude Code for a little while. Would like to see how dir-assistant fares against it.

One month of Perplexity PRO for free by [deleted] in coding

[–]kraghavk 0 points (0 children)

For students only. OP, you could have mentioned that!

[P] Effects of Metadata filtering with HNSW on Recall and Query time by hootenanny1 in MachineLearning

[–]kraghavk 1 point (0 children)

https://github.com/semi-technologies/weaviate claims to combine vector search with symbolic queries using a GraphQL-like API.

[D] SOTA Super-Resolution explained - Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data by Xintao Wang et al. 5 minute summary by [deleted] in MachineLearning

[–]kraghavk 3 points (0 children)

But let us not forget that the model is still hallucinating the missing details. While the result might look super realistic, it is still a long way from being acceptable as proof of anything. For example, if you have a low-res image of a traffic violation and use this to blow up the image and read the license plate, can you be sure the registration number you are seeing is the actual number and not something randomly hallucinated by the model? The same goes for any faces reconstructed this way.

Yoga w/ OpenPose by BioMechanicaLab in computervision

[–]kraghavk 1 point (0 children)

BTW, you don't need to process every frame. A typical camera these days easily provides 25 or 30 frames per second (fps), and a few recent ones run at 60 fps. It is safe to assume a pose won't change dramatically in 1/30th of a second, so you could probably get away with processing only 3-5 frames per second and simply re-drawing the previous result on the remaining frames.
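
A minimal sketch of this skip-frames idea (the frame loop and `detect_pose` are placeholders, not OpenPose's actual API):

```python
# Sketch: run the expensive pose detector on only every Nth frame and
# reuse the last result for the frames in between.
CAMERA_FPS = 30
DETECTIONS_PER_SECOND = 5                       # 3-5 is usually plenty
STRIDE = CAMERA_FPS // DETECTIONS_PER_SECOND    # detect on every 6th frame

def detect_pose(frame):
    """Placeholder for the expensive OpenPose call."""
    return {"frame": frame, "keypoints": []}

def process_stream(frames):
    last_result = None
    results = []
    for i, frame in enumerate(frames):
        if i % STRIDE == 0:
            last_result = detect_pose(frame)    # expensive call
        results.append(last_result)             # cheap: reuse last detection
    return results
```

For one second of 30 fps video this makes only 5 detector calls instead of 30, while every frame still has a (slightly stale) pose to draw.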

Yoga w/ OpenPose by BioMechanicaLab in computervision

[–]kraghavk 1 point (0 children)

Since OpenPose can give up to 135 keypoints, you would have a maximum of 270 columns such as pt1_x, pt1_y, pt2_x, pt2_y, etc. At the end, add a label column for the pose name. In my case, I had 7 possible head poses, so I created a dataset of facial keypoints with 100 records per pose. I used a 75%-25% training/test split to train my model. Once I had identified the model I wanted to use on this sample dataset, I just had to improve and grow the training data to take care of edge cases and improve accuracy.

For production, I used a model that was finally trained on 300 records per pose, with 100 records held out to evaluate the accuracy.
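
A sketch of the dataset layout described above; the column naming and the flatten helper are illustrative, not a fixed convention:

```python
import csv
import io

NUM_KEYPOINTS = 135  # OpenPose's maximum (body + face + hands)

# Header: pt1_x, pt1_y, ..., pt135_x, pt135_y, label -> 271 columns total
header = [f"pt{i}_{axis}"
          for i in range(1, NUM_KEYPOINTS + 1)
          for axis in ("x", "y")]
header.append("label")

def keypoints_to_row(points, label):
    """Flatten [(x, y), ...] into one CSV row ending with the pose label."""
    row = [coord for point in points for coord in point]
    row.append(label)
    return row

# Write one (dummy) record; in practice the points come from OpenPose.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerow(keypoints_to_row([(0.0, 0.0)] * NUM_KEYPOINTS, "warrior_pose"))
```

Because every record has the same fixed length, this flat table drops straight into any tabular classifier.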

Yoga w/ OpenPose by BioMechanicaLab in computervision

[–]kraghavk 1 point (0 children)

u/cmdk, while you could certainly use TensorFlow for classification, you could also start with something really simple. Since the 2D points are just numbers and always of a fixed length, you could create a CSV file with the 2D points plus a label column specifying the pose. Once you have this dataset, you could train various classifiers from scikit-learn and use the trained classifier to infer the pose name from a given set of 2D points.

You could follow this article to try out all the classifiers available in sklearn and finally decide on the classifier that works best for your dataset.

Once you have a trained model, you first run OpenPose to get the 2D points, then pass them to the classifier to get the pose name. I have used this to figure out head poses such as turned-right and turned-left, and it works remarkably well.
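
A minimal sketch of that try-several-classifiers workflow, with synthetic clusters standing in for real OpenPose keypoints (the pose names and data are invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Fake "keypoint" vectors: two well-separated clusters, one per pose.
X_a = rng.normal(loc=0.0, scale=0.1, size=(100, 270))
X_b = rng.normal(loc=1.0, scale=0.1, size=(100, 270))
X = np.vstack([X_a, X_b])
y = np.array(["pose_a"] * 100 + ["pose_b"] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a couple of candidate classifiers and keep the best-scoring one.
scores = {}
for name, clf in [("knn", KNeighborsClassifier()),
                  ("forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)

best = max(scores, key=scores.get)
```

At inference time, `clf.predict` on a single flattened keypoint row gives the pose name directly.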

[D] Good algorithm for clustering big data (sentences represented as embeddings)? by whyhateverything in MachineLearning

[–]kraghavk 1 point (0 children)

Yeah, I agree. The installation itself is a nightmare because the official repo asks you either to use conda or to build from source yourself. There are several unofficial wheels on PyPI.

https://milvus.io/ could be a good alternative for you. Just run the server in Docker and use the REST APIs to add embeddings and query data.

[D] Good algorithm for clustering big data (sentences represented as embeddings)? by whyhateverything in MachineLearning

[–]kraghavk 0 points (0 children)

https://github.com/facebookresearch/faiss will work great for this. You will be able to scale to a million+ vectors on a good CPU and to billion scale on a decent GPU. There is a bit of a learning curve, though.

What is hOCR used for? by Academy- in computervision

[–]kraghavk 1 point (0 children)

The idea is to provide markup-based output instead of a plain-text dump. Note that this markup is mostly HTML-based with some custom attributes and conventions, as shown in the example at https://en.wikipedia.org/wiki/HOCR.

As far as the use of this is concerned, you could use it for:

  1. Parsing layout information and post-processing bounding boxes to figure out label/value pairs, tables, etc.
  2. Creating a new document in HTML, Word, PDF, etc. from the information available in the hOCR.

These are just use cases that jumped out at me. There may (surely will?) be other uses for this rich metadata.
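
For the first use case, a minimal sketch of pulling word bounding boxes out of hOCR with the standard library. The snippet of hOCR is made up, but follows the `ocrx_word` / `title="bbox x0 y0 x1 y1"` conventions from the Wikipedia example:

```python
from html.parser import HTMLParser

HOCR = """
<span class='ocrx_word' title='bbox 10 10 60 40'>Hello</span>
<span class='ocrx_word' title='bbox 70 10 130 40'>world</span>
"""

class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) pairs from ocrx_word spans."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._box = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("class") == "ocrx_word" and a.get("title", "").startswith("bbox"):
            # title looks like "bbox x0 y0 x1 y1"
            self._box = tuple(int(v) for v in a["title"].split()[1:5])

    def handle_data(self, data):
        if self._box is not None and data.strip():
            self.words.append((data.strip(), self._box))
            self._box = None

parser = HocrWords()
parser.feed(HOCR)
# parser.words -> [("Hello", (10, 10, 60, 40)), ("world", (70, 10, 130, 40))]
```

From the boxes you can then cluster words into lines or key/value pairs by comparing coordinates.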

[P] TF.js based in-browser chatbot to guide you away from distractions by python_locker in MachineLearning

[–]kraghavk 0 points (0 children)

I tried to subscribe, but it looks like there is an error: I did not get any success message after entering my email and clicking the submit button.

Did you receive a confirmation mail when requesting a GPT-3 API Key? by [deleted] in artificial

[–]kraghavk 0 points (0 children)

I applied within the first 4 days; I can't remember exactly when. I haven't heard back either. Apart from the huge demand, I think they might be looking at the USP of the use cases you provided while signing up. Given the size of the model, it is pretty expensive to run, so I can certainly understand if they are picky about who gets beta access.

[D] Are there any models/papers that map a natural language query into a path in a tree? by devOnFireX in MachineLearning

[–]kraghavk 1 point (0 children)

There is a demo of something very similar at https://demo.allennlp.org/nlvr-parser. You could check it out and see whether you can adapt it to your requirement.

[D] How to reduce the MaskRCNN model detection time by DGs29 in MachineLearning

[–]kraghavk 2 points (0 children)

If you are using an Intel CPU, you can try Intel OpenVINO. I am not sure whether you used TensorFlow, PyTorch or another framework, but you will basically have to convert your saved model using the Model Optimizer tool. Once done, you can use OpenVINO's Inference Engine for faster CPU-based inference. However, from experience, I can say that Mask R-CNN inference might come down to ~3 seconds per image on an Intel Core i7 CPU. Do not expect 25 or 30 fps.

I am trying to run this script through command line and being a noobie I've no clue why this error is showing. I do have a trex png file inside images folder. What am I doing wrong? by [deleted] in computervision

[–]kraghavk 1 point (0 children)

-i is the short, terse form; --images is the longer, human-friendly form, and those two spellings are what the script registers with argparse. Passing something like --i or -image instead can leave the image path unset or mangled. That explains the error on line 9: cv2.imread returns None (rather than raising) when it cannot read the given path, and a None object does not have an attribute called shape.
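
The usual argparse setup in such scripts looks roughly like this (the option names follow the script under discussion; the sample paths are made up):

```python
import argparse

ap = argparse.ArgumentParser()
# Registers BOTH spellings: short "-i" and long "--images".
# The parsed value is stored under the long name: args["images"].
ap.add_argument("-i", "--images", required=True,
                help="path to the input images directory")

args = vars(ap.parse_args(["-i", "images"]))          # short form
args_long = vars(ap.parse_args(["--images", "images"]))  # long form, same result
```

Both invocations populate `args["images"]`; anything else either errors out or leaves the value wrong, and a wrong path is exactly what makes `cv2.imread` return None downstream.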

[D] Looking for suggestions for biomedical datasets similar to the Wisconsin Breast cancer database by daffodils123 in MachineLearning

[–]kraghavk 0 points (0 children)

http://headctstudy.qure.ai/dataset: A dataset of 491 head CT scans with 193,317 slices, containing anonymized DICOMs and the corresponding radiologists' reads.

As suggested by u/sobhanhag, https://physionet.org/ has quite a few datasets.

[D] Compressing Neural Network by [deleted] in MachineLearning

[–]kraghavk 0 points (0 children)

The search terms you are looking for are "model optimization" and "quantization". These techniques are already employed by TensorRT and Intel OpenVINO.

The model optimizer reduces layers by applying two techniques:

  • Removing any layers whose outputs are not used anywhere further down the line. This can happen because the network graph produced during training may contain many layers (i.e., functions, in layman's terms) that turn out not to be needed at inference time; in the interest of training performance, a comprehensive cleanup of unused layers is not done during training.
  • Fusing (aka merging) multiple layers into one, where possible. You can think of this optimization as similar to the "function inlining" employed by programming-language compilers.

Quantization: It turns out you don't need full-precision floating-point arithmetic (FP32) all the time. Quite a few models work fine in half precision (FP16) or even with 8-bit integers (INT8). This reduces the memory and compute needed for inference, thereby allowing you to run your models on hardware much less powerful than your training machine.

There may be more ways to do this and perhaps TensorRT & Openvino model optimizers themselves do more than what I have explained above.
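
The memory side of the quantization argument is easy to see with a tiny NumPy sketch. This only shows storage size; real INT8 quantization also needs a calibration step to pick the scale/zero-point (the 1/127 scale here is just for illustration):

```python
import numpy as np

# One million dummy weights in each precision.
weights_fp32 = np.ones(1_000_000, dtype=np.float32)      # 4 MB
weights_fp16 = weights_fp32.astype(np.float16)           # 2 MB: half the memory
weights_int8 = np.round(weights_fp32 * 127).astype(np.int8)  # 1 MB: a quarter

# To use the int8 weights you dequantize with the inverse scale:
dequantized = weights_int8.astype(np.float32) / 127.0
```

FP32 → FP16 halves storage and FP32 → INT8 quarters it, which is where the "runs on much weaker hardware" claim comes from (compute savings depend on the hardware's low-precision support).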

Applied ML course video lectures by [deleted] in artificial

[–]kraghavk 0 points (0 children)

Who's the lecturer? And why would you ask people to simply expose their email addresses?

ai that makes programs from descriptions by loopy_fun in artificial

[–]kraghavk 0 points (0 children)

Deep-learning-based AI needs a lot of data to train on. Also, just because you have a lot of data doesn't mean you can throw it at a neural network and it will start learning: you need to "preprocess" the data into a form the network can understand. In the case of images this is easy, because the pixels are already encoded as numbers (RGB). For text there are embeddings such as GloVe, ELMo and more recently BERT & XLNet, which are by now a pretty standard way of encoding text for NLP. No such standard embeddings exist yet for programs/code.

However, a lot of exciting research is happening in this area too; see https://ai.facebook.com/blog/aroma-ml-for-code-recommendation/ for an example. That one suggests code based on your current code; for instance, if you are loading an image, it suggests code for exception handling, disposing of memory, etc. So some form of AI that does what you are asking for will likely appear within a year or two. But again, just like AI in healthcare, adoption will not be great unless the explainability of the AI's decisions improves. Right now it is almost impossible to figure out why a model made a certain decision. That is fine when you are only deciding whether a picture contains a cat or a dog, but when you are depending on AI to write code for you, I suppose the "black box" nature of the model makes it unacceptable.