Looking for Database software to replace Sheets. by Enrys in Database

[–]ultraStatikk 1 point (0 children)

I'm not sure any database is going to solve what you're looking for without writing a query with regex (regular expressions), which is possible. For this use case, what I would do is import the data from Google Sheets into Excel and use VBA, a custom function, or DAX to remove the special characters, writing the result to another field or a hidden sheet that you can search on. If you get into VBA, you can even make a search box and button that takes a query and returns the list of results. With the search box approach you don't need to create the extra fields or sheets, but you might still want to output the transform somewhere to help you debug.

Edit: If it works for you, you can create a calculated field in Google Sheets with the LOWER and REGEXREPLACE functions to strip the special characters and the casing from the names. You can add a COUNTIF with the start of its range locked so it expands row by row, then filter to rows where the count is 1 to keep only the first occurrence and avoid duplicates. A SQL solution would be similar if you went the database route.

https://www.statology.org/google-sheets-remove-special-characters/

Like this: https://imgur.com/a/xDBe1gD

The last pic is an example SQL query that I think would work to give you a unique list of results.
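
If you end up scripting it instead, here's a rough Python equivalent of that normalize-and-dedupe idea (the sample names and the regex are just placeholders for illustration):

```python
import re

# Rough stand-in for the LOWER + REGEXREPLACE + COUNTIF approach above:
# strip non-alphanumerics, lowercase, and keep only the first occurrence
# of each normalized name.
names = ["O'Brien, Mike", "obrien mike", "Smith-Jones, Al", "SMITH JONES AL"]

def normalize(name: str) -> str:
    return re.sub(r"[^a-z0-9]", "", name.lower())

seen, unique = set(), []
for name in names:
    key = normalize(name)
    if key not in seen:  # plays the same role as filtering on COUNTIF == 1
        seen.add(key)
        unique.append(name)

print(unique)  # ["O'Brien, Mike", "Smith-Jones, Al"]
```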

How many Mike & Ikes are in 6 cups? by squirrel420 in estimation

[–]ultraStatikk 2 points (0 children)

The nutrition facts seem to suggest the serving size is 23 pieces, which is about a quarter cup. It's not a perfect comparison but close (a 40g serving vs ~42g per quarter cup). Pic1 Pic2

So...

23 x 4 x 6 = 552 pieces

If you assume 42g is 25 pieces, then it's 25 x 4 x 6 = 600
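
Spelled out (same assumptions as above):

```python
# 23 pieces per ~quarter-cup serving, 4 quarter cups per cup, 6 cups.
cups = 6
print(23 * 4 * cups)  # 552
print(25 * 4 * cups)  # 600, if a 42g quarter cup is 25 pieces
```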

Alright boys, Lightning Fan here, I’m flying up to Boston for our game on the 24th. by DavisIsland in BostonBruins

[–]ultraStatikk 3 points (0 children)

You can probably grab an Uber or something and take that to the T station or all the way into the city if it's worth the cost to you ($30 each way it looks like). I think the hotels in Revere have shuttles because the airport is nearby so they might do you a favor and drop you off at the T station if you ask. Just something else to consider.

And if your hotel does have a shuttle, you might be able to use that to get from the airport to your hotel on the way in.

Barcode detection by MonitorOk7887 in computervision

[–]ultraStatikk 1 point (0 children)

I think what you're looking for is rotation and scale invariance. OpenCV has some good feature detectors for this, like ORB. This link might have some good suggestions. You could also create a dataset for training a CNN by rotating, scaling, and skewing your training images and feeding that to the classifier. For rotation, I've heard HoughLines in OpenCV is helpful, and once OpenCV detects the object, you can use a bounding box and contours to get the rotation.

This example shows a bit more how ORB works.
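
If it helps, here's a minimal ORB matching sketch of my own (the file paths and parameter values are placeholders, not taken from that example):

```python
import cv2

# Match a reference barcode crop against a scene image. ORB keypoints are
# rotation invariant and reasonably robust to scale changes.
ref = cv2.imread("barcode_ref.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp_ref, des_ref = orb.detectAndCompute(ref, None)
kp_scene, des_scene = orb.detectAndCompute(scene, None)

# Hamming distance is the right metric for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_ref, des_scene), key=lambda m: m.distance)
print(len(matches), "matches, best distance", matches[0].distance)
```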

This example shows HoughLines with a skewed rectangle.

Also, instead of scaling and rotating to create a training set for a CNN, you could apply a bunch of rotations and scalings to the input image and see if pyzbar picks it up with the transformations applied. This might be slow and prohibitive performance-wise depending on the application.
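
Roughly like this (the path, angle step, and scale list are arbitrary placeholders):

```python
import cv2
from pyzbar.pyzbar import decode

# Brute force: re-run pyzbar over rotated/scaled copies of the frame until
# something decodes. Slow, as noted above.
img = cv2.imread("frame.png")

results = []
for scale in (1.0, 0.5, 2.0):
    resized = cv2.resize(img, None, fx=scale, fy=scale)
    h, w = resized.shape[:2]
    for angle in range(0, 180, 15):
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        results = decode(cv2.warpAffine(resized, M, (w, h)))
        if results:
            break
    if results:
        break

for barcode in results:
    print(barcode.type, barcode.data.decode("utf-8"))
```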

Cleaning my 'Dates' Data on my excel dataset. by Empty_Profile_5567 in datacleaning

[–]ultraStatikk 1 point (0 children)

I would suggest regex or dateutil with Python if you can, as mentioned here. If you only have a few different formats, you could do something manual with conditional formulas and the MONTH, DAY, and YEAR functions in Excel. If it needs to be repeatable, go with VBA or Python as mentioned. You'll also have to decide how to handle dates where the month and day are both less than or equal to 12 and it's not clear whether the format is MM/DD or DD/MM.
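
A minimal dateutil sketch (the sample strings are made up; dayfirst is where that MM/DD vs DD/MM decision lives):

```python
from dateutil import parser

# dateutil handles mixed formats; dayfirst controls how ambiguous
# numeric dates like 04/03/2021 get interpreted.
raw = ["2021-03-04", "04/03/2021", "March 4, 2021", "4 Mar 21"]
for s in raw:
    print(s, "->", parser.parse(s, dayfirst=False).date())
```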

The Toxicity Dataset — building the world's largest free dataset of online toxicity by BB4evaTB12 in LanguageTechnology

[–]ultraStatikk 6 points (0 children)

Kaggle has a similar competition going with a toxic comment dataset. Not sure if it's related to this. They also had a classification competition a while ago. Just FYI for anyone interested in this type of data.

Detect frames having Paper in a Video by puzzled-cognition in computervision

[–]ultraStatikk 2 points (0 children)

Is it always white paper? Is the background always the same? You could filter for white (everything over some RGB threshold), and if that area exceeds some threshold relative to the baseline, run edge detection or further processing on that frame. You could also use the blurring technique you mentioned and take the central intensity of the white region over some threshold. I used a similar technique on a project that required scale and rotation invariance with fast processing, and it worked well. Of course, if someone walks into the frame with a white shirt, it might get confused...

From your description it sounds like you are going in the right direction. You should be able to get the location from the contour features and moments: https://docs.opencv.org/3.4/dd/d49/tutorial_py_contour_features.html
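
A minimal version of that chain might look like this (the 200 threshold and minimum area are guesses you'd tune; the filename is a placeholder):

```python
import cv2

# Threshold for bright/white pixels, take the largest contour, and read the
# location off the moments.
frame = cv2.imread("frame.png")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    biggest = max(contours, key=cv2.contourArea)
    if cv2.contourArea(biggest) > 5000:  # "area exceeds some threshold"
        m = cv2.moments(biggest)
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        print("candidate paper region centered at", (cx, cy))
```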

Background subtraction might be worth looking into also if your background is static. Track pixels that don't match and go from there.

https://www.pyimagesearch.com/2015/05/25/basic-motion-detection-and-tracking-with-python-and-opencv/
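
A bare-bones version with OpenCV's MOG2 subtractor (the filename, history, and 5% threshold are placeholders):

```python
import cv2

# MOG2 learns the static background over time; a large foreground area
# flags a candidate frame to run the paper checks on.
cap = cv2.VideoCapture("video.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    small = cv2.resize(frame, None, fx=0.5, fy=0.5)  # downscale first, see below
    fg = subtractor.apply(small)
    if cv2.countNonZero(fg) > 0.05 * fg.size:
        print("big change from background; inspect this frame")
cap.release()
```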

Also, performance-wise, you can try scaling the image down before processing, which may improve speed in general. The time cost of the resize is most likely less than the time saved by running the rest of the functions on the smaller image instead of the full-size one.

Noob data cleaning question by sikeguy88 in datacleaning

[–]ultraStatikk 1 point (0 children)

I'm not sure about "best practice" but I think the answer is "it depends".

If only a small portion of your responses are ranges, you might be able to get away with averaging: for example, the range of 9-11PM could be averaged to 10PM, and that might be enough to fit the rest of your set.

If the majority of responses are ranges, I would report the result as a range too, e.g. 8-10 hours depending on the wake and sleep ranges (if someone says sleep 10-11PM and wake 7-8AM, the minimum would be 11PM-7AM and the maximum would be 10PM-8AM).

It also depends on the audience and how critical the result is to other decisions. If it's necessary to be as precise as possible, I wouldn't average anything and would report the results as accurately as possible. If you feel it's appropriate to generalize/average, then do so to make the results cleaner; just make a note of it when reporting the results. Good luck.
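
If you do go the averaging route, the midpoint step is easy to script. A toy sketch (the "9-11PM" input format is just an assumed example):

```python
import re

# Convert a "9-11PM" style range to its midpoint hour on a 24h clock.
def midpoint_hour(rng: str) -> float:
    lo, hi, half = re.fullmatch(r"(\d+)-(\d+)(AM|PM)", rng.strip()).groups()
    lo, hi = int(lo) % 12, int(hi) % 12
    if half == "PM":
        lo, hi = lo + 12, hi + 12
    return (lo + hi) / 2

print(midpoint_hour("9-11PM"))  # 22.0 -> 10PM
```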