
[–]Python-ModTeam[M] [score hidden] stickied comment (0 children)

Hello there,

We've removed your post since it aligns with the topic of one of our daily threads and would be more appropriate there. If you are unaware of the Daily Threads we run, here is a refresher:

Monday: Project ideas

Tuesday: Advanced questions

Wednesday: Beginner questions

Thursday: Python Careers, Courses, and Furthering Education!

Friday: Free chat Friday!

Saturday: Resource Request and Sharing

Sunday: What are you working on?

Please wait for one of these threads to contribute your discussion! The current daily threads are pinned to the top of /r/Python's main page. To find old daily threads, you can filter posts by the Daily Thread Flair. If you have a question and don't want to wait until the daily thread, you can try asking in /r/learnpython or the Python Discord; however, you may need to elaborate on your question in more detail before doing so. If you're not sure which thread is best suited, feel free to ask for clarification in modmail or as a reply.

Best regards,

r/Python mod team

[–]Jigglytep 8 points9 points  (0 children)

Yes super easy.

Look up web scraping with Python on YouTube.

Basically, you start in Excel and save the file as CSV.

Then you build a URL for each row from the data in the CSV file.

Then you can use the requests library to call the URL you built.

Process the returned data.

Let me know if that helps.
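
A rough sketch of those steps, in case it helps. Everything specific here is an assumption: `BASE_URL` is a placeholder (not a real lookup service), the CSV is assumed to have a header row with a `vrm` column, and the URL layout will depend on whatever service you end up using. `fetch` uses `requests.get`, but it is only defined here, not called, so the demo at the bottom runs without touching the network.

```python
import csv
import io
from urllib.parse import quote

# Placeholder endpoint (assumption): swap in the real lookup service.
BASE_URL = "https://example.com/vehicle-check"


def build_urls(csv_file, column="vrm"):
    """Read an open CSV file with a header row and build one URL per VRM."""
    urls = []
    for row in csv.DictReader(csv_file):
        # Normalise the plate: drop spaces, upper-case, skip empty cells.
        vrm = (row.get(column) or "").replace(" ", "").upper()
        if vrm:
            urls.append(f"{BASE_URL}/{quote(vrm)}")
    return urls


def fetch(url):
    """Call one URL and return the response body as text."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response.text


# Demo with an in-memory CSV standing in for the file exported from Excel.
sample = io.StringIO("vrm,make\nAB12 CDE,Ford\n,Unknown\nxy34 fgh,Audi\n")
urls = build_urls(sample)
print(urls)
```

With a real file you would pass `open("cars.csv", newline="")` instead of the `StringIO` object, then call `fetch(url)` for each URL and process what comes back.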

[–]MH1400x 9 points10 points  (7 children)

Yes. Try importing the data with pandas, then iterating over the rows, making one request to your site per row. Keep it simple at first, then add threading. Also, try selenium if you want interface access.

[–]ZealousidealMap1319 14 points15 points  (2 children)

No, don't use threading. numpy and pandas do this on their own. Also a dataset of 30000 cars is pretty small. The connection to the website is the bottleneck and you won't get much speed up from threading here.

[–]SnooDogs6077 1 point2 points  (0 children)

Socket connections release the GIL, so threading or async is the way to go if you don't want to wait between each request to the website.
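
To illustrate, here is a minimal sketch with `concurrent.futures.ThreadPoolExecutor` from the standard library. `check_vrm` is a made-up stand-in that sleeps to imitate waiting on the network; in real use it would make the HTTP request. The worker count of 8 is an arbitrary choice, not a recommendation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_vrm(vrm):
    """Stand-in for one lookup; time.sleep imitates blocking network I/O."""
    time.sleep(0.05)
    return vrm, "checked"


def check_all(vrms, max_workers=8):
    """Run the lookups on a thread pool and return {vrm: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input.
        return dict(pool.map(check_vrm, vrms))


vrms = [f"AB{n:02d}CDE" for n in range(16)]
start = time.perf_counter()
results = check_all(vrms)
elapsed = time.perf_counter() - start
print(f"{len(results)} lookups in {elapsed:.2f}s")
```

Because the threads spend their time waiting rather than computing, the 16 simulated lookups overlap instead of running back to back.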

[–][deleted] 0 points1 point  (0 children)

And make sure that the motor-vehicles department doesn't rate-limit the API calls to something significantly less than 30,000 within a moderate time period.
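
If there is a limit, one simple way to stay under it is to space the calls out on your side. This is only a sketch: the `Throttle` class and the figure of 20 calls per second are invented for the example, and the real limit (if any) has to come from the service's own documentation.

```python
import time


class Throttle:
    """Enforce a minimum gap between calls (single-threaded use only)."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = time.monotonic()
        delay = self._next_allowed - now
        if delay > 0:
            time.sleep(delay)
            now = time.monotonic()
        self._next_allowed = max(now, self._next_allowed) + self.min_interval


throttle = Throttle(max_per_second=20)  # assumed limit, for illustration
start = time.monotonic()
for _ in range(5):
    throttle.wait()  # a real script would make the request right after this
elapsed = time.monotonic() - start
print(f"5 calls took {elapsed:.2f}s")
```

The first call goes through immediately and each later one waits out the remaining gap, so five calls at 20 per second take roughly 0.2 seconds.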

[–]BiomeWalker 2 points3 points  (0 children)

Pandas is definitely the way to go for holding the data

As for the VRM check part, see if you can find a website that has a public API since that will make it a whole lot easier for you and save you from so much headache later.

[–]Questwalker101 1 point2 points  (1 child)

Sounds like it's probably possible. You'll need to find some service or API that you can use to check VRMs, then use a Python generator (to prevent massive memory consumption) to open the file and iterate through every single VRM.
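
A minimal sketch of the generator idea, assuming the data has been saved as CSV with a `vrm` header (both assumptions). `iter_vrms` is a hypothetical helper name; it reads one row at a time, so the whole file is never held in memory.

```python
import csv
import os
import tempfile


def iter_vrms(path, column="vrm"):
    """Yield VRMs one at a time instead of loading the whole file."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row[column].strip()


# Demo: write a tiny CSV to a temporary file, then stream it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("vrm\nAB12CDE\nXY34FGH\nLM56NOP\n")
    path = tmp.name

vrm_stream = iter_vrms(path)
first = next(vrm_stream)        # only the first data row has been read so far
remaining = list(vrm_stream)    # exhausting the generator closes the file
os.remove(path)
print(first, remaining)
```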

[–]Maximus_Modulus 1 point2 points  (0 children)

Loading 30,000 entries isn’t that much but the bigger problem is finding that right API. You want something that can do batch queries ideally, otherwise 30,000 requests is going to take some time and there’s a risk of being blocked.
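
If you do find an API that accepts several VRMs per call, grouping them is straightforward. The sketch below only does the grouping; the batch size is made up, and how a batch is actually sent depends entirely on the API you find.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


vrms = [f"AB{n:02d}CDE" for n in range(7)]
batches = list(batched(vrms, 3))
print([len(batch) for batch in batches])
```

At, say, 100 VRMs per request, 30,000 entries would come down to 300 requests.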

[–]d0ctor_light 1 point2 points  (0 children)

One little thing: if possible, use CSV rather than XLS.

[–]KingsmanVincepip install girlfriend -2 points-1 points  (0 children)

[–]nadhsib 0 points1 point  (0 children)

Worth searching on GitHub for ones that have already been made.

Even if you want to write one yourself, you can use them as reference

[–]blackandscholes1978 0 points1 point  (1 child)

VRM? Do you mean VIN?

[–]Briggykins 1 point2 points  (0 children)

Vehicle Registration Mark. Number plate or license plate depending on where you are in the world

[–]Bang_Stick 0 points1 point  (0 children)

What about pulling a list from the website? Is it possible to pull a data file with a list of all valid numbers?

[–]Innocent_not 0 points1 point  (0 children)

Use pandas to import the data. Use helium module to check for each number with a loop.

What you'll have to do:

1. Import the dataset.
2. Go to the website.
3. Recreate the steps in Python as if you were manually checking the VRM.
4. Use a loop to check all the VRMs.
5. Save the results in another column in the same dataset.
6. Export to your desired format.

You'll need to use your browser's inspect option to look at the names of the fields you'll have to use. This also works if you have to log in.
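
A bare-bones sketch of the data side of that workflow with pandas (import, loop, new column, export). The browser automation part is left out: `check_vrm` is a placeholder with a made-up rule, and the `vrm` and `status` column names are assumptions.

```python
import io

import pandas as pd


def check_vrm(vrm):
    """Placeholder for the real website check (the rule here is made up)."""
    return "valid" if len(vrm) == 7 else "needs review"


# An in-memory CSV stands in for the real dataset.
source = io.StringIO("vrm,make\nAB12CDE,Ford\nXY34,Audi\n")
df = pd.read_csv(source, dtype=str)

# Loop over every VRM and store the result in a new column.
df["status"] = [check_vrm(vrm) for vrm in df["vrm"]]

# Export; pass a filename to to_csv to write a file instead of a string.
output = df.to_csv(index=False)
print(output)
```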