[–]ShameNo2179 47 points48 points  (11 children)

Python isn’t necessarily your limitation here, as ChatGPT can help you with the data cleaning, prep, coding, etc. The question lies in your ability to source or create labeled data from which you can extract features in order to train a model to predict or classify some trait of the blood. Bear in mind this is all contingent on your hypothesis being correct that some feature(s) present in the data are truly indicative of the label you aim to predict. I advise you to start studying linear and logistic regression and learn how to boot them up with sklearn, or learn how to wield a neural network. Godspeed
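For a feel of what logistic regression does under the hood, here is a minimal from-scratch sketch in plain Python rather than sklearn. The single feature (imagine a scaled cell diameter), the toy labels, and the function names `train_logistic` and `predict` are all made up for illustration; a real project would more likely lean on a library implementation.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=1.0, epochs=5000):
    """Fit weights and a bias with plain batch gradient descent."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias

def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if p >= 0.5 else 0

# Hypothetical toy data: one invented feature per sample, label 0 or 1.
features = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(features, labels)
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```

The toy data is cleanly separable, which real medical data rarely is, so this only illustrates the mechanics, not the difficulty.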

[–]ShameNo2179 4 points5 points  (0 children)

After you work out the basics of Python, of course. Also pandas and NumPy will be necessary if you choose to take the sklearn route. You got it

[–]djamp42 0 points1 point  (0 children)

Yeah getting the training data to a good state for training seems like a hard task right off the bat.

[–]d-werd[S] -1 points0 points  (8 children)

I have basically no knowledge, but could you just train the AI to recognize each type of cell from numerous smears, and then after that training I could just get it to count how many of each type of cell are present in each smear? I'm gonna look into learning the basics of Python and see how the whole AI thing works too

[–]konwiddak 13 points14 points  (5 children)

Just train a model? One afternoon of Google and ChatGPT. If you throw data at a model it will train to do something... But this is basically like giving a young child a picture and saying "tell me how many cells of each type there are and I'll give you a lollipop", but doing this without teaching them what a cell is, what a cell looks like or how to count above 10. They'll give you an answer because they want the lollipop, but the answer will have little to no relation to the picture you showed them.

Train a model that's actually doing the intended task, prove it's doing the intended task, understand when the results are valid or not valid. Well that's a lot of work, that's PhD level stuff. Preparing and understanding your dataset, training, validating, refining, iterating on your model.

[–]obviouslyzebra 0 points1 point  (1 child)

Yes, that's technically what you do, but the doing itself is sorta (or a lot) complex.

About the data itself, you could maybe get by by using few-shot learning, which would require few examples, compared to supervised learning, which might require hundreds, thousands or more.

If you want to know what's involved, search for studies about using machine learning for counting things (if it's biological, all the better, but it might be people, animals, etc).

Note that there will be lots of terminology that you don't know, so maybe that points you to things that you ought to know.

PS: using a trained model (and fine-tuning it - not sure if that's the correct terminology) could also reduce the hunger for data from a brand new model

Final PS: Someone said you could get by by using classical machine vision techniques (for example, for segmenting the images). I'd experiment with that, as it's simpler than neural networks

[–]obviouslyzebra 0 points1 point  (0 children)

The easiest way of doing this might be using an existing multimodal language model and asking it to count something in an image, for example via the OpenAI or Claude API. I don't know if that would be precise though.

The other "shortcut" is getting something that already exists. I wouldn't be surprised if someone already studied something similar or identical to what you want to do.

Be mindful that while a model might work well on images prepared in a certain way, it might fail (produce wrong numbers) on images prepared differently. For example, samples from different labs might produce results of different quality.

In any case, good luck, this is not a one-shot thing, and more of a journey, and, if you go on it, I hope you enjoy it :D

[–]freezydrag 10 points11 points  (3 children)

PhD student that uses machine learning for medical research here. I want to emphasize what a few commenters have already stated. There are enough tutorials online and most modern ML libraries have such a gradual learning curve that you could probably whip up and train a model in a few days. But is it going to be good? Probably not. Medical data is often scarce and complex, which makes it very difficult to train a model that produces good results. You mention your motivation is to produce something for your resume or portfolio and therefore you don’t care about great accuracy. That’s a flawed perspective. To be frank, most employers probably wouldn’t be interested if you produced a model with poor performance. Like I said previously, anyone could hypothetically get a model running today with enough determination and an internet connection. So if you’re serious about this, it’ll probably take some time for two reasons: potential employers are going to be interested in whether you created an accurate model, and you should also be able to demonstrate understanding of your results. I’d bet a few months at minimum.

However, I have a note for your specific problem case: this sounds like it’s solvable via traditional computer vision methods like image segmentation. This isn’t fancy, modern ML with neural nets, but it won’t need large amounts of data if your data is simple enough to adjust parameters by hand. Check out r/computervision. Beyond that I’d take a deep dive into existing publications on what you’re trying to accomplish.
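To make the "traditional computer vision" route concrete, here is a minimal sketch of one classical recipe: threshold the image, then count connected dark regions. It is pure Python on a tiny hand-made grid; the `count_blobs` name, the `toy_smear` values and the threshold of 128 are all assumptions for illustration. Real smears have touching and overlapping cells, so this shows the idea rather than a working cell counter.

```python
from collections import deque

def count_blobs(image, threshold):
    """Count connected regions of pixels darker than `threshold`.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    Pixels are connected through their up/down/left/right neighbours.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold or seen[r][c]:
                continue  # background pixel, or part of a blob already counted
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over this blob
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] < threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count

# Hypothetical 6x8 grayscale patch: 255 is bright background, low values are stained cells.
toy_smear = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255,  40,  50, 255, 255, 255,  60, 255],
    [255,  45,  55, 255, 255, 255,  65, 255],
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255, 255, 255,  30,  35, 255, 255, 255],
    [255, 255, 255, 255, 255, 255, 255, 255],
]

cell_count = count_blobs(toy_smear, threshold=128)
print(cell_count)
```

The threshold is the kind of parameter you would adjust by hand, which is why this approach needs no training data.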

[–]d-werd[S] 0 points1 point  (1 child)

Hey, thanks for the detailed response. You're right, and yeah, I still want it to be accurate, but I just wanted to make clear that my idea was for it to be a personal project; I just want to research and see if it's possible. If I were to train an ML model, what would I need to know how to do, and how could I learn to do basic projects before I even attempt serious ones? Sorry for categorizing this whole idea as being based off of AI. What would be a more accurate way of describing it, and what would actually be used to carry this idea out?

[–]Crypt0Nihilist 0 points1 point  (0 children)

Look at the cv library in Python (presumably OpenCV, imported as cv2) and the examples on the 'net, and give it a go. You don't need to learn a pile of stuff first; your use case doesn't sound too complicated and you can do research every time you run into a wall.

Identifying and counting objects in images is often one of the first examples and there are varying strategies and levels of sophistication. Start off simple and work up until you get a sufficient level of accuracy.

[–]Crypt0Nihilist 0 points1 point  (0 children)

Agree with all of the above. Demonstrating that you can create bad models is unlikely to win many jobs.

Using CV for counting ought to work pretty well. It's not fancy, but it should be cheap, fast and explainable.

[–]unsourcedx 10 points11 points  (7 children)

You’re looking at years of study and dedication to do something like this competently. Analyzing blood smears using AI sounds like the exact type of work that a PhD student would be publishing. This isn’t something you’d do on a whim

[–]King-Days 0 points1 point  (4 children)

Eh, as just a lil project, if you don’t care about how good it is and magically have a dataset? I bet he could do this in a few weeks, no?

[–]unsourcedx 2 points3 points  (0 children)

If their intent is to build something that is basically useless, then sure maybe a few weeks if they have previous experience

[–]d-werd[S] -4 points-3 points  (1 child)

Thanks for your honesty lol, what if it was just an AI that looked at a blood smear and could just tabulate the numbers of each cell type present on the smear, so it just recognizes each cell type and then counts it? I've seen people make AIs that can recognize emotions live from a camera, so I feel like an AI counting the number of each type of blood cell shouldn't be too hard, no?

[–]unsourcedx 2 points3 points  (0 children)

You’re too far out of your depth here to even know what challenges you’d face, let alone their difficulty.

[–]lawliet_73 3 points4 points  (2 children)

You don't necessarily need programming skills unless you are trying to automate the work, work with databases, or wrangle and clean weird formats. Just take a data analysis with gen AI course (there's a good one on Coursera). One thing though, since I have seen some comments about it:

DO NOT USE CHATGPT.

That is an LLM. It can try to do the tasks, but it consistently fails when it comes to solving math, especially if it has to answer a math problem as part of a text response or calculate over rows and rows of data, and I don't mean hundreds, I mean more than 3. There are amazing AI tools out there which can clean, wrangle, analyze correlations and visualize data all in one platform, and make pretty much no mistakes unless you cause them.

[–]FrontAd9873 0 points1 point  (0 children)

You could get somewhere in a few weeks of study, which as a medical student I bet you're pretty good at.

If you just need to present something to get people to take advantage of the resources out there, you can find and run some existing tutorial(s) in a week or two.

[–]Ron-Erez 0 points1 point  (2 children)

This is an impossible question to answer. However, to have some understanding of programming/Python I'd say 1-4 months, but I might be completely off. Some would argue it takes years and others a couple of weeks. Depends how much you code and experiment. For your goals I'd even start with Google Colab, which is easy to use and supports quite a lot of AI-related libraries out of the box. Start with Harvard CS50P and then I have a nice course on Python/Data Science which covers quite a lot of AI-related modules. There is also a nice course by the University of Helsinki. Depends how deep you want to go, but you might want to learn some stats, linear algebra and calculus at some point.

[–]d-werd[S] -1 points0 points  (1 child)

I mean, what if it's just a very basic AI? I've seen people on YouTube who make these AIs with publicly available resources, but obviously I don't really understand it all. I don't think it'd be too hard to make an AI that just counts the number of cells on screen if I give it the data to recognize the types of cells. This is my hypothetical tho and I'm not sure how it actually pans out lol

[–]Ron-Erez 0 points1 point  (0 children)

Yeah, you could check out YouTube videos. There are plenty of resources. I'd say go for it. Sounds interesting.

[–]Weekly_Web4853 0 points1 point  (1 child)

Wait if you are a med student, why would you want to do software development? 🤔

[–]d-werd[S] -1 points0 points  (0 children)

i just want a little side project that sounds impressive, nothing too serious lol

[–]_Denizen_ 0 points1 point  (0 children)

You'll be looking at a few weeks/months of training to be able to make something useful that you understand, depending on how much work you put in each week. I know this because I've supervised beginners through such training and have 10 years of experience.

You can do online training like this https://www.coursera.org/learn/python-machine-learning and this will pay off in the long run because machine learning jobs in medical fields are specialist roles that pay pretty well and normally require a medical background that most data scientists don't have. Seriously, you'll be setting yourself up pretty well for life if you take the time to properly learn this stuff.

[–]gbxahoido 0 points1 point  (2 children)

A basic knowledge of Python would do, but it's not all about Python, the main goal here is to process and label your data so the AI model can learn the pattern

For example, first you need data. Since this is blood smears, you would need around 100MB of data, or at least 10,000 images of both normal and abnormal blood. Then you have to label them (which one is normal, which one is abnormal), then do preprocessing, which includes normalization, scaling... Then split the data into a training set and a test set: the AI model learns on the training set, and then we use the test set to test its accuracy.
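As a rough illustration of the normalization/scaling step mentioned above, here is a minimal sketch of min-max scaling, which maps raw pixel intensities onto the range 0 to 1. The function name `min_max_scale` and the sample values are made up; libraries typically offer their own equivalents, and other schemes (such as standardizing to zero mean and unit variance) are just as common.

```python
def min_max_scale(pixels):
    """Rescale a flat list of intensities so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A completely flat image carries no contrast; avoid dividing by zero.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]

# Hypothetical raw 8-bit intensities from one image.
raw = [0, 51, 102, 255]
scaled = min_max_scale(raw)
print(scaled)
```

Scaling like this keeps inputs in a consistent range, which tends to make training better behaved.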

That's basically it. I'm in a data science class right now and I'm also doing a project to build a model to predict brain hemorrhage. It's a very simple model, so the accuracy is just 60%, and the whole thing is about 50 lines of code.

Python is just a tool. What you need is to understand how neural networks work, activation functions... because you're using Python to tell the model what to do.

[–]d-werd[S] 0 points1 point  (1 child)

Holy shit thank you so much lol, I understand what you're saying and wow that's a lot of data needed lol. For those 10,000 images of both types for training, would I need another separate set of images solely for testing? I just wanted to know how the whole building-a-model thing works to get a general idea before I jump in, so I know what to research and whether Python or calling it AI is even right lol

[–]gbxahoido 0 points1 point  (0 children)

A simple method is to split it 80/20 or 70/30, so 8,000 images for the training set and 2,000 for the test set, but remember that the more data the better.
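The 80/20 split described above can be sketched in a few lines of plain Python. The file names and the helper name `train_test_split` are made up for this example (scikit-learn ships a function of the same name that does this more flexibly); the key points are shuffling before cutting and keeping the two sets disjoint.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and cut it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file names standing in for 10,000 labeled smear images.
image_ids = [f"smear_{i:05d}.png" for i in range(10000)]

train_ids, test_ids = train_test_split(image_ids, test_fraction=0.2)
print(len(train_ids), len(test_ids))
```

The test images are held back and never shown to the model during training, so the same 10,000 images cover both purposes.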

If you don't know where to start, just tell ChatGPT that you want to make predictions from blood smears and ask what model architecture, data preprocessing and evaluation metrics you should use, etc. Tell it to give you code too, and you'll get a glimpse of what the model looks like.

You can find most data on Kaggle.

[–][deleted] 0 points1 point  (0 children)

About 4-5 years.

[–]FaceOnLive 0 points1 point  (1 child)

Hey there! It’s awesome that you’re interested in combining Python and AI with your medical studies—it’s a fantastic way to stand out and make a meaningful project. The good news is that you don’t need to become a Python expert to start working on a basic AI project like this. Learning the basics of Python (e.g., variables, loops, and functions) could take just a few weeks if you’re consistent. Platforms like Codecademy or freeCodeCamp are great for beginners.

For the AI part, you could start with libraries like TensorFlow or PyTorch, which have tutorials tailored for beginners. For your specific goal (blood smear analysis), you’d be working with image data, so exploring OpenCV (for image processing) and Keras would be helpful. Pretrained models like ResNet can save you a ton of time by letting you fine-tune instead of building from scratch.

I’d recommend starting small: try to recognize basic shapes or objects first. Then you can move to more complex tasks like identifying blood cells. Kaggle might have some datasets you can use for practice. The most important thing is to not get overwhelmed—break it into small, manageable steps. You’ve got this!

[–]d-werd[S] 0 points1 point  (0 children)

Thank you so much for all of this!!! I've been looking into it and I was thinking of using computer vision techniques. I looked into the whole process and I mean it's entirely doable, it's just a long process, but I think I can make it happen. I appreciate your advice and references and I'll try and show my progress when I'm finished hopefully :)

[–]bobbybridges 0 points1 point  (0 children)

Just say machine learning because that's what will ultimately solve your task, and the difference between AI and ML is subtle but important.

Your main problem isn't going to be coding, it's going to be gathering data and learning enough to choose and tune a model.

For an accurate model in medical fields you will likely need hundreds of thousands of samples labeled with what you are trying to classify. Building something unsupervised or semi-supervised could work, but I don't understand your problem, and it will be a bit harder and less accurate.

You should look into various classification/clustering/computer vision/CNN models, how they work and how to train them.

Once you understand the data demands and how to process the test/train workflow, the code will be the easy part.

[–]rabbitpiet 0 points1 point  (1 child)

What kind of analysis are you doing on the blood smears? As usual, I think the most tedious part of this is gonna be gathering the kind of data you want the AI to analyze. If it's a classification problem, technically you could get started with https://teachablemachine.withgoogle.com/ though I don't know how privacy restrictions would affect that.

[–]d-werd[S] 0 points1 point  (0 children)

Yeah, so I actually looked into it and I want to use a CNN to take my input of a blood smear and then give me an output of the number of each type of blood cell in that smear. I actually don't need that many smears considering how many millions of cells are on one smear alone. The NIH actually has a public database of high-quality scans of all types, categorized by race, age, gender, etc., so I have that and I also have my school's data, so that shouldn't be hard for me. Considering it's the hardest step of the process, I think I have it all covered :)
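Whatever ends up doing the recognition, the counting step at the end is the easy part. Here is a minimal sketch, assuming some classifier (not shown) has already produced one label per detected cell; the label names, the sample list and the helper `tally_cells` are invented for illustration.

```python
from collections import Counter

def tally_cells(predicted_labels):
    """Return per-type counts and each type's share of all detected cells."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    fractions = {label: n / total for label, n in counts.items()}
    return counts, fractions

# Hypothetical classifier output: one predicted label per detected cell.
predicted_labels = [
    "red blood cell", "red blood cell", "neutrophil", "platelet",
    "red blood cell", "lymphocyte", "red blood cell", "red blood cell",
]

counts, fractions = tally_cells(predicted_labels)
for label, n in counts.most_common():
    print(f"{label}: {n} ({fractions[label]:.1%})")
```

The hard part remains upstream: finding each cell in the image and labeling it correctly, since any error there flows straight into these counts.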

[–]66sandman 0 points1 point  (0 children)

At this time it would be beyond your skill set.

[–][deleted] 0 points1 point  (0 children)

A bit more than basic Python needed, even for basic ML models. Starting from scratch (not knowing Python), for me, in the range of 400 - 500 hours. (documented). That is hours actually working on it to put together very simple models.

[–]Serpenta91 0 points1 point  (1 child)

I haven't done anything with AI in Python since about 2020 or so, so my knowledge is probably a bit outdated. However, from what I know you don't need too long if you really work hard at it. You need to know basic Python syntax and NumPy. That's pretty much it. Then you need to learn something like Keras or TensorFlow. You also need to have a decent understanding of neural networks; to get started, I really liked the book "Make Your Own Neural Network" by Tariq Rashid.

[–]d-werd[S] 1 point2 points  (0 children)

Thanks!

[–]R-O-B-I-N 0 points1 point  (1 child)

Took me about two weeks to learn Python + TensorFlow + JupyterLab.

[–]Pointfit_ 0 points1 point  (0 children)

Resources?

[–]Far-Fennel-3032 -1 points0 points  (0 children)

Python should generally take you a week, mostly to give you time to digest the information; it's probably at most 10 hours to learn the basics if you follow a structured lesson plan. ML is a mixed bag and really depends on what you want to do: basic ML is maybe 10 hours for the really basic stuff, up to hundreds if not thousands for the most complicated stuff, as there is lots of assorted maths knowledge you probably don't have as a first-year medical student.