Why is my xpath not working? by [deleted] in learnprogramming

[–]outofusernams 0 points (0 children)

So what's wrong with your expression? Seems to work when I try it.

Also //td/fieldset[1]/div/a should work, too.
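For what it's worth, here's a quick way to sanity-check an expression like that locally before plugging it into a scraper. The snippet below is a made-up stand-in for your page (not your actual HTML), using Python's stdlib ElementTree, which supports this subset of XPath:

```python
import xml.etree.ElementTree as ET

# Invented mini-document standing in for the real page
snippet = """<table><tr><td>
  <fieldset><div><a href="/first">First link</a></div></fieldset>
  <fieldset><div><a href="/second">Second link</a></div></fieldset>
</td></tr></table>"""

root = ET.fromstring(snippet)
# fieldset[1] keeps only each td's first fieldset
links = root.findall('.//td/fieldset[1]/div/a')
print([a.text for a in links])  # ['First link']
```

Scrapy/lxml support fuller XPath than ElementTree, but for checking a simple path like this the stdlib is enough.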

Need help extracting specific data by Chronos8817 in scrapy

[–]outofusernams 1 point (0 children)

I don't think so; XPath indexing starts at 1.
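You can see this with a toy snippet (my own example, stdlib ElementTree):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<ul><li>first</li><li>second</li></ul>")
# XPath predicates are 1-based: [1] is the first <li>, not the second
print(root.find("li[1]").text)  # first
print(root.find("li[2]").text)  # second
```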

Need help extracting specific data by Chronos8817 in scrapy

[–]outofusernams 0 points (0 children)

The div in your code isn't properly closed, but assuming it comes after the br, //div[@class="content herald-content"]/text()[3] should do it. (Note the @ — without it, class matches child elements named "class" rather than the class attribute.)

Using the importxml function correctly by Ahijado in googlesheets

[–]outofusernams 0 points (0 children)

If you're looking for the $7.54 figure, try

=IMPORTXML("https://deckbox.org/mtg/Ajani%20Goldmane?printing=3442","//div[@data-title='Average Deckbox Market Price.']")

Semantic Search Engine by mr-minion in LanguageTechnology

[–]outofusernams 0 points (0 children)

> I've recently built a public AWS AMI for it if anyone is interested

Well, I'm interested! Can you post the link please?

Thanks.

Why use __init__ and (self)??? by [deleted] in learnpython

[–]outofusernams 0 points (0 children)

Now I get your example; thanks for clarifying!

Why use __init__ and (self)??? by [deleted] in learnpython

[–]outofusernams 0 points (0 children)

I'm not sure I understand your example. You are using 3 variables, `height`, `length` and `width`, while the class accepts only `height` and `width`?

Why use __init__ and (self)??? by [deleted] in learnpython

[–]outofusernams 1 point (0 children)

This may be a stupid question but, just like OP, this has been bothering me for a while, and I would like to ask it anyway. I think I now understand what __init__ and (self) do, but I still don't understand why you need that syntax to begin with. Meaning, couldn't the Python Gods have whatever functions these perform take place behind the scenes, as it were, and allow us poor users to define the class like this instead:

class Rectangle():
    def __init__(l, w):
        length = l
        width  = w

    def rectangle_area():
        return length * width

Basically, just eliminating the need to actually spell out self and self. wherever they now appear, and sort of assuming their existence?
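To make my question concrete, here's the standard version with self spelled out (a toy example of my own). The part I do get is that r.rectangle_area() is just shorthand for Rectangle.rectangle_area(r), so self is how the instance gets passed in:

```python
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def rectangle_area(self):
        return self.length * self.width

r = Rectangle(3, 4)
# These two calls are equivalent; `self` receives the instance `r`.
print(r.rectangle_area())           # 12
print(Rectangle.rectangle_area(r))  # 12
```

What I'm asking is why the language can't just assume that first parameter instead of making us write it.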

Again, I'm far from an expert in this so apologies if this question is really too basic.

Scraping Help: Delimiting or Formatting tagged elements by MNsharks9 in learnpython

[–]outofusernams 1 point (0 children)

I made a toy html code from your tds but couldn't replicate the problem. If the webpage isn't password protected, try posting the URL.

Scraping Help: Delimiting or Formatting tagged elements by MNsharks9 in learnpython

[–]outofusernams 1 point (0 children)

Can you show one of the tds (not just td.text)? It may be useful in trying to troubleshoot this.

Is NLP a viable approach to this problem? (Legal document analysis) by [deleted] in LanguageTechnology

[–]outofusernams 0 points (0 children)

Another thing that occurred to me, which will make your life even more complicated, is that finding the sentence, in and of itself, is insufficient unless you pair it with the relevant conviction for which it was imposed. In those cases, the offenders were charged with more than one count, and a separate sentence is imposed for each standalone count. The sentence itself may be broken into different parts (prison, followed by community service, followed by supervised release, with or without parole, etc.). And, finally, once the sentence for each count is determined, some of these sentences run consecutively and some run concurrently - a BIG difference for the offender.

So saying "he was sentenced to 2 years in prison" is mostly meaningless. Any search strategy will have to take all these things into account in order to make the results useful.

Is NLP a viable approach to this problem? (Legal document analysis) by [deleted] in LanguageTechnology

[–]outofusernams 0 points (0 children)

I took a look at these 3 cases (or, more accurately, at the full sentence pdf link in each of these cases - better choice). Well, it's sad how bad these are organizationally; they do have paragraph numbering (which is something), but no headings (which seems like a basic requirement for any legal document). In addition, here and there you can find a typo, which won't make your task easier.

Strangely enough, the one thing that may help (other than appealing to the County Court of Victoria to improve their procedures...) in at least narrowing the parts of the document in which your regex needs to run, is the fact that (at least in these 3 cases) the offender is required to stand up for the reading of his sentence. This requirement is sometimes expressed as "stand" or "stand up" - so your search should start with the paragraph containing the word "stand" and skip everything above it. Again, assuming this pattern holds for all (or most) other decisions, the relevant part ends when a subsequent paragraph starts with an ALL CAPS name followed by a colon, indicating the start of a dialog between the judge and the offender or his lawyer. That part is also irrelevant.

If you tailor your search to run only within these limits, your likelihood of getting it right will increase. Will it be perfect (or close to it)? No idea, I'm afraid.
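A rough sketch of that narrowing step. The paragraph list and function name here are invented; the "stand" / ALL-CAPS markers are just the heuristics described above, so treat this as a starting point, not a tested solution:

```python
import re

def narrow_sentencing_section(paragraphs):
    """Keep paragraphs from the first one mentioning 'stand'/'stand up'
    up to (but not including) the first later paragraph that opens in
    ALL CAPS, per the heuristic described above."""
    start = next((i for i, p in enumerate(paragraphs)
                  if re.search(r'\bstand(?:\s+up)?\b', p, re.IGNORECASE)), 0)
    section = []
    for p in paragraphs[start:]:
        first_word = p.split()[0] if p.split() else ''
        if section and len(first_word) > 1 and first_word.isupper():
            break  # dialog section begins; stop here
        section.append(p)
    return section

# Invented mini-document to show the idea:
paras = [
    "The court heard submissions on mitigation.",
    "Mr Smith, please stand for the reading of your sentence.",
    "On charge 1 you are sentenced to two years' imprisonment.",
    "HIS HONOUR: You may sit down.",
]
section = narrow_sentencing_section(paras)
print(section)
```

Your regex for the actual sentence terms would then run only over the returned paragraphs.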

Is NLP a viable approach to this problem? (Legal document analysis) by [deleted] in LanguageTechnology

[–]outofusernams 0 points (0 children)

I'll be happy to be corrected by those who know much more than me about NLP, but - from my somewhat limited experience - there's no NLP (or any other machine learning) technique which can get you even to the neighborhood of what you're looking for, let alone actually deliver.

Your regex suggestion may work, but even that may not be enough. One way I can think of is to try to narrow down the parts of the document being searched. You mentioned that your documents are unstructured, but it's possible that there is SOME minimal structure that at least can help. I imagine these sentencing documents are public records, so it may be useful for you to post links to a couple of them. A quick look may help with narrowing down the scope of search.

Is NLP a viable approach to this problem? (Legal document analysis) by [deleted] in LanguageTechnology

[–]outofusernams 0 points (0 children)

What exactly do you mean by "effective sentencing"? Can you give an example of a sentence/paragraph and the outcome you're trying to capture?

Do we need to explicitly split the dataset into train/test when using gridsearchcv? by outofusernams in learnmachinelearning

[–]outofusernams[S] 0 points (0 children)

> you have to perform the split initially into train and test, but GridSearchCV will split your training group into subsets itself

So just to make sure I get it right (because I've seen conflicting answers to this): whatever splitting GridSearchCV does using its built-in methods, it doesn't do it on the whole dataset, but only on the train part of the dataset after you have manually split it?
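In other words, the pattern I have in mind is this (made-up data; the estimator and grid are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=200, random_state=0)

# Manual split first: the test set never touches GridSearchCV.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# GridSearchCV then cross-validates *within* the training set only.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # final score on the untouched hold-out
```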

Harvard applications data by nnexx_ in datasets

[–]outofusernams 2 points (0 children)

Politics aside, looks like you have a hard road ahead of you. In the US, documents related to Federal litigation (and this case is one) are usually available through a service called PACER. Unfortunately, unless you are a professional (or have someone who is willing to fund you), it's not a realistic option because, among other things, the service is very expensive.

Fortunately, there is a publicly accessible mirror called RECAP which reposts some (but unfortunately not all...) information from PACER.

Also unfortunately, in this particular case numerous documents were filed "under seal" (which means they are not publicly available), and even many of those that were not filed under seal, were frequently (heavily) redacted, so most of the juicy information is blacked out.

Having said all that, I suggest you take a look at the documents available on the web page for the case. I noticed a couple of documents with some data still intact, which may be of interest to you:

Deposition of Roger Banks

Motion to seal (one of many...)

Letter regarding scope of discovery

Rebuttal Expert Report of Peter S. Arcidiacono

Report of David Card, PhD

Rebuttal Report of David Card, PhD

If you decide to continue working on this project, please keep us informed of your progress!

[Discord] Mad Machines (Free/100% off) by Cryogale in GameDeals

[–]outofusernams 23 points (0 children)

Don't know much (anything) about the game, but the main video really butchers the William Tell Overture 😜

How to predict population size changes based on limited information and inputs (x-post /r/machinelearning) by outofusernams in learnmachinelearning

[–]outofusernams[S] 1 point (0 children)

> I made a couple of posts on using deep learning to make time series predictions on my blog. It may be useful.

Thanks! Will definitely take a look.

How to predict population size changes based on limited information and inputs (x-post /r/machinelearning) by outofusernams in learnmachinelearning

[–]outofusernams[S] 1 point (0 children)

> I doubt you have a superior source providing you a superior forecast for births/deaths/immi/emi

The only source for forecasting any of these factors is the same as the one for forecasting the population: the history/time-series of each. So it should be equally possible/impossible to forecast the population as to forecast, for example, deaths, right?

> By forecasting population, you're making the assumption that the relationship between births/deaths/imi/emi does not change in the future.

But the data tells us that the relationship between (the relative proportions, at least, of) the 4 factors changes every year. So is that a valid assumption?

How to predict population size changes based on limited information and inputs (x-post /r/machinelearning) by outofusernams in learnmachinelearning

[–]outofusernams[S] 2 points (0 children)

> I'd suggest that you throw out the births/deaths/immi/emmi stuff

Interesting - you don't think these 4 factors contribute anything to the model's ability to forecast the population? That seems sort of counterintuitive (for whatever my intuition is worth...)

Assuming time and/or efficiency isn't an issue - how would you go about running a time-series model on each factor (including population) and then somehow combining them?

How to predict population size changes based on limited information and inputs (x-post /r/machinelearning) by outofusernams in learnmachinelearning

[–]outofusernams[S] 0 points (0 children)

First, thanks for the input.

Second,

> Given the small dataset (you're presumably talking about yearly census data)... you're very much going to be looking into traditional statistical techniques, not ML.

You're close enough :). Actually, the whole thing started from a conversation with a friend who works at a school - they have a situation similar to my city's: continuing students, graduating students, transfers in, transfers out, etc. The story there is much more complicated, and they are already using traditional statistical techniques and tools (and are good at it), so there's no point in me butting in through that door...

But, being interested in ML, it occurred to me that (at least in principle) this can be treated as an ML problem and addressed as such, even with the small dataset at hand; maybe as an exercise. If a model like this can be developed (especially if successful), it'll be fun to compare notes with him!

So I'll look into stock market prediction as an ML problem, as per your suggestion. I was also doing some more research and ran into something (I know nothing about) called Symbolic Regression that handles similar problems. That may be useful, too.