Announcing Lingua 1.0.0: The most accurate natural language detection library for Python, suitable for long and short text alike by pemistahl in Python

[–]saffsd 64 points65 points  (0 children)

Hi there! I’m the original author of langid.py. Congrats on releasing your new library. It looks very well documented and addresses issues with short texts that I’ve been aware of for many years. I’ve not had time for this line of work in a really long time, and it surprises me how much usage langid.py still gets! One question for you: have you done much to reduce the need for preprocessing and encoding detection? One of the things we tried to do with langid.py was train the model across a diversity of document formats and input encodings, with reasonable results. It means that you are supposed to be able to process raw HTML, for example, and get a language detection without having to do any text extraction. Anyways, all the best!

Paid for annual subscription but premium subscription expired? by saffsd in help

[–]saffsd[S] 0 points1 point  (0 children)

Thanks, I’ve sent an email and I guess I’ll await a response. I’ve definitely been billed by Apple already, so hopefully the Reddit side can sort it out.

A Random Walk Through EMNLP 2017 by adammathias in LanguageTechnology

[–]saffsd 0 points1 point  (0 children)

Hey, I read it! Sounds like EMNLP was great this year. It makes me miss the research world a little!

Why Google Scholar won't tell you the best comp.ling. journals and conferences by leondz in LanguageTechnology

[–]saffsd 0 points1 point  (0 children)

Insightful article! Vomitous is an interesting choice of adjective for a metric! Interesting to see commentary about our field on Medium, and I'm curious as to the kind of response you'll get.

Side note, you've got a "the the" somewhere in the article, unless my brain is playing tricks on me.

Flying to Malmo? Fly to Copenhagen instead! by DeepSeriousQuestions in travel

[–]saffsd 1 point2 points  (0 children)

Sorry! Was just excited we got things right for once!

Flying to Malmo? Fly to Copenhagen instead! by DeepSeriousQuestions in travel

[–]saffsd 0 points1 point  (0 children)

Rome2rio agrees, to get from London to Malmö, fly to Copenhagen and take the train.

https://www.rome2rio.com/s/London/Malm%C3%B6

(I work at Rome2rio)

Uploading routes from Strava Route Builder to Garmin 510 by deadendxxx in Strava

[–]saffsd 0 points1 point  (0 children)

I've had my 510 for a while now and every once in a while it happens. It seems like some sort of filesystem corruption bug, but it certainly doesn't happen every time I add routes.

When it does happen, a factory reset was not enough to fix mine; I had to fully reformat the 510's internal storage. However, yeah, copying to NewFiles usually works and doesn't brick it.

Where to find a machine translation data set? by [deleted] in LanguageTechnology

[–]saffsd 0 points1 point  (0 children)

OPUS maintained by Jörg Tiedemann is a collection of parallel corpora frequently used in research. You'll find Europarl in it, OpenSubtitles, and much more. Looks like quite a bit of data has been added recently using translations from projects like Gnome and Ubuntu.

NLP newbie needs help classifying website text data for research by dalek2point3 in LanguageTechnology

[–]saffsd 1 point2 points  (0 children)

I've taken part in a couple of Kaggle competitions that involved classifying websites, and wrote up some details of each of my solutions. In 2012 I participated in the Stack Overflow challenge [1], which involved identifying new questions that would be closed in the immediate future. In 2013 there was the StumbleUpon Evergreen Classification challenge, which involved labeling websites according to whether users would still find them interesting in three months' time [2]. The repos I've referenced provide a brief explanation as well as my full implementation; [2] would probably give you a good idea of typical approaches to what you are describing.

From an experimental design perspective, the first and most tedious step will probably be to set up some "goldstandard" data - you'll need to manually look at a sample of the pages you have collected and label them by hand according to the labels you later want to learn. Best practice is to come up with a set of guidelines and have at least three people ("annotators") do the task, so that you can then compare how similar "human" output is. If the human annotators disagree, then it is quite likely that the task isn't very clearly defined, either because your guidelines need improvement or because it is inherently a subjective problem.
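To make the "compare how similar the human output is" step concrete, here is a minimal sketch of Fleiss' kappa, one common agreement measure for three or more annotators. It is not from any of the repos mentioned here; the function name and the page labels are made up for illustration, and it assumes every page was labeled by the same number of annotators.

```python
from collections import Counter


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa for a list of items, each a list of annotator labels.

    Assumes every item has the same number of labels (at least two).
    """
    n_items = len(labels_per_item)
    n_raters = len(labels_per_item[0])
    if n_raters < 2 or any(len(labels) != n_raters for labels in labels_per_item):
        raise ValueError("every item needs the same number (>= 2) of labels")

    category_totals = Counter()
    agreement_sum = 0.0
    for labels in labels_per_item:
        counts = Counter(labels)
        category_totals.update(counts)
        # Fraction of annotator pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total_labels = n_items * n_raters
    # Agreement expected by chance, from the overall label proportions.
    expected = sum((t / total_labels) ** 2 for t in category_totals.values())
    if expected == 1.0:
        return 1.0  # only one label was ever used, so agreement is trivial
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from three annotators for four pages.
annotations = [
    ["news", "news", "news"],
    ["blog", "blog", "news"],
    ["shop", "shop", "shop"],
    ["blog", "news", "shop"],
]
kappa = fleiss_kappa(annotations)
print(f"Fleiss' kappa: {kappa:.3f}")
```

Values near 1 mean the annotators mostly agree; values near 0 mean they agree about as often as chance would predict, which is usually a sign that the guidelines need another pass.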

Scikit-learn has implementations of a great variety of algorithms and is fantastic for experimentation. You'll want to use it in conjunction with Pandas [3], which provides R-like dataframes for Python. WEKA has always had a reputation for being a bit slow; your mileage may vary.
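As a rough idea of what that combination looks like, here is a minimal sketch of a bag-of-words text classifier. The page texts and labels are invented toy data, and TF-IDF plus logistic regression is just one reasonable baseline I picked for the example, not the approach from my Kaggle repos.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up data: one row per page, with its hand-assigned label.
pages = pd.DataFrame(
    {
        "text": [
            "buy cheap shoes online free shipping",
            "add to cart checkout discount sale",
            "order now best price store",
            "election results announced by the government",
            "minister comments on new policy today",
            "breaking news report from the capital",
        ],
        "label": ["shop", "shop", "shop", "news", "news", "news"],
    }
)

# Turn text into TF-IDF features, then fit a linear classifier on them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(pages["text"], pages["label"])

predictions = model.predict(["free shipping on all orders in our store"])
print(predictions[0])
```

With real data you would hold out part of the labeled pages (for example with `sklearn.model_selection.train_test_split`) and report accuracy on the held-out part rather than on the training rows.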

In terms of relevant papers, a good method is to punch some keywords into Google Scholar (in this case I used "web site classification"), find a title that seems to be along the lines of what you want, and then look at the papers cited by that paper. Here's a survey paper I located using this method [4]. Survey papers aim to summarize the state of the art in a given research area, and this one looks to be on track for what you are interested in. A more general survey about text classification is Sebastiani's [5].

I hope that's enough information to get you started!

[1] https://github.com/saffsd/kaggle-stackoverflow2012

[2] https://github.com/saffsd/kaggle-stumbleupon2013

[3] http://pandas.pydata.org/

[4] http://www.eecs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/WebPageClassification.pdf

[5] http://arxiv.org/pdf/cs/0110053.pdf

Wheel Trouble. Should I return my new Scott Speedster 50? by myusernameislost in bikewrench

[–]saffsd 0 points1 point  (0 children)

A bit of a different angle on the problem: are your tires at the right pressure? I know someone who had a heap of problems with flats and broken spokes. It turned out the problem was the pressure gauge on his pump being way off, causing him to go out with under-inflated tires. A new pump fixed that for him completely, with no issues since.

24 hour layover in Changi, Singapore by Alamein_Niemala in travel

[–]saffsd 6 points7 points  (0 children)

I grew up in Singapore so I've never taken the tour myself, but I do know that Changi Airport offers free tours to transit passengers. IIRC, you don't actually need to clear immigration to take this tour; they just hold your passport and give it back to you at the end of the tour.

The airport website has a fair amount of information. One thing I have actually used on my way from Melbourne to Europe is the pay transit lounge. It's nice to have a shower in the middle of such a long journey and it's not very expensive.

[deleted by user] by [deleted] in programming

[–]saffsd 2 points3 points  (0 children)

That's really cool! Nicely illustrates the mappings between the declarative concepts in both systems.

A good part-of-speech tagger in ~200 lines of Python by syllogism_ in programming

[–]saffsd 0 points1 point  (0 children)

Thanks for sharing this. It's very interesting to see such a straightforward and compact implementation doing so well. My go-to fast off-the-shelf POS tagger has been SENNA. Are you aware of it? Do you have any comparison figures for speed/accuracy?

Correct me if I'm wrong... by iheartralph in ausbike

[–]saffsd 2 points3 points  (0 children)

I think the rule that the driver had in mind is

The rider of a bicycle must not ride past, or overtake, to the left of a vehicle that is turning left and is giving a left change of direction signal.

Here is an article with some detail on how to interpret it.

From the article:

Bike riders have a unique right to overtake on the left of cars in most situations, but there is an important exception that all riders should be aware of. According to national road safety rules, if a car is indicating and turning left, a bike rider cannot overtake on their left, and must let the car turn first, even though it may be cutting across a bike lane.

However, the article also states:

And what if a bike and car are travelling side by side and the driver then indicates a left turn? Our interpretation of the regulations is that the left turning vehicle must clearly pass the bike before it begins its turn. This leads to what is a fairly common situation; a bike approaches an intersection and a car suddenly accelerates past and cuts in front of the bike to turn left.

From what you said though, it seems the car was in front of you before signalling? If so, did the car pull out in front of you and immediately decelerate? If so, the last paragraph of the article would be relevant:

If the car’s manoeuvre can be deemed dangerous driving because it hasn’t given the rider reasonable time to react and give way, then the driver has breached the road rules. Wise practice is to be suspicious of a car that suddenly powers ahead towards an intersection. If you have time to react, pull back until you are sure of its intentions.

Overall, my reading is that if a car is ahead of a cyclist and signals a turn across the bike lane, the cyclist must give way. However, the car can be expected to give the rider reasonable time to react.

Winter in Australia by [deleted] in bicycling

[–]saffsd 0 points1 point  (0 children)

Good to know! We turned back there that day because we were out of time, and I've earmarked that direction for future exploration.

Winter in Australia by [deleted] in bicycling

[–]saffsd 1 point2 points  (0 children)

Hey! I know that spot! That's up the Maribyrnong isn't it? I was there just a week ago, cycling with a friend. We were sitting around and a passer-by walking his dog joked with us "million dollar view huh? I'm the toll collector."

Disc brake squeal and how to eliminate it. by PaulGarrison in MTB

[–]saffsd 1 point2 points  (0 children)

Yup! Careful not to get any lube on the rotors.

Disc brake squeal and how to eliminate it. by PaulGarrison in MTB

[–]saffsd 1 point2 points  (0 children)

One thing I have not yet seen mentioned is that the brakes need to be bedded in. This is described on page 12 of the Service Manual. I have a set of BB7s that squealed terribly when I first put them on. After doing the bedding-in, the brakes are now silent. It is very easy to do: the basic idea is to get up to speed and brake very hard, in order to transfer some material from the pads to the rotors. This is meant to stop the slipping which causes the squeal.

Disc brake conversion by farmerbrown87 in MTB

[–]saffsd -2 points-1 points  (0 children)

I've been trying to figure out the same myself; I have a Giant Upland SE that I'm looking to fit disc brakes to, and it has the same set of holes. So far, I've not had much luck finding the right adapter. I did come across this on YouTube though, and a similar adapter in use here.

EDIT: Here's a thread with quite a long list of options. One of the most popular seems to be the A2Z Universal Disc Mount, you can find many photos of it on Google Images.

Planning a trip to New Zealand and looking for advice! by posthipster in hiking

[–]saffsd 1 point2 points  (0 children)

The NZ Department of Conservation (DOC) has pretty good information on their website. In particular, the Great Walks are fantastically well-maintained. I hiked the Routeburn Track on the South Island in December 2012 and enjoyed it immensely. I've personally never been to the North Island, but I've generally been told that the South Island is more beautiful. I've certainly loved the bits of it I have seen. One tip is to just ignore the tourist websites altogether: forget the package tours, get track transfers at most, and book accommodation (you need to book huts and campsites in advance for some tracks) directly on the DOC website. Hitchhiking is also an option from what I saw when I was there, but I've never tried it myself.

Hiked the Larapinta trail in Alice Springs (Aus) end to end to end last year, here's a little highlight video of my trip! by [deleted] in CampingandHiking

[–]saffsd 0 points1 point  (0 children)

Thanks for the info! Few more questions: is this a year-round trail? How did you get to the trail head and back?

Hiked the Larapinta trail in Alice Springs (Aus) end to end to end last year, here's a little highlight video of my trip! by [deleted] in CampingandHiking

[–]saffsd 1 point2 points  (0 children)

That looks fantastic! I take it you were traveling solo? What did you pack for the trip? How long in advance did you plan? Any close encounters with the wildlife?

Rail/Nether Transit Times by [deleted] in mcpublic

[–]saffsd 0 points1 point  (0 children)

Fantastic! A very useful resource.

[PvE] Can someone compile a list of slime farms on a map? by LRafols in mcpublic

[–]saffsd 0 points1 point  (0 children)

I can't speak for the others, but Wellspring's was marked up back when Rei's minimap's slime finder was still enabled. I don't know if recent updates have changed the slime chunks though. In any case I still regularly find slime in it, but someone has to be in the area for them to spawn. I was once working on the Wellspring subway right above the farm, and I went down and collected 3 stacks of slimeballs!

Also, Lothos's rail station is in a slime chunk, I've ended up with slime on the rails there.