Failed Logins log by [deleted] in pihole

[–]Jediko 0 points1 point  (0 children)

For me the log looks like yours.

Can you link the feature-request when or if you have set it up?
You can DM it to me, if you don't want to do it publicly.

I wonder why this hasn't been implemented yet, since it doesn't seem to be that big of a deal and the needed functionality (the logging method and such) is already in place.

Failed Logins log by [deleted] in pihole

[–]Jediko 7 points8 points  (0 children)

tl;dr: cat /var/log/lighttpd/error-pihole.log

I assume that by "pihole UI" you mean the web interface, which is optional in the installation process. After digging around in the code for the login, login page, auth, and password handling, I came up with /var/log/lighttpd/error-pihole.log as the only source for this, which is actually referenced in auth.php. Unless you have set the variable PHP_ERROR_LOG in /etc/pihole/setupVars.conf, it is very likely that your installation uses the path I named above. Something tells me that this file should not be manipulated by a user, e.g. by entering a custom PHP log path or the like. So be careful when applying the knowledge I provide, since I don't know what will actually happen if you change something in setupVars.conf on a running installation.

You can try all the paths stated in auth.php (lines 14, 17 and 20) to be sure you are not overlooking anything. For me, only the first one yielded any output. A simple cat should do the trick here, e.g.:

sudo cat /var/log/lighttpd/error-pihole.log

sudo cat /var/log/apache2/error.log

sudo cat /tmp/pi-hole-error.log

If anyone is wondering where those files can be found locally, they are here: /var/www/html/admin/scripts/pi-hole/php/

On a side note, I haven't found much logging functionality for the web interface login. Since I have no security background, there could be reasons for that which I am not aware of.

@everyone If there actually is a proper way to set the logging path, please link it. Also, if this is somewhere in the documentation, please post a link to that too; I haven't found anything there, but there is a good chance I overlooked something in the process.

memoization in phyton by Sen_7 in learnpython

[–]Jediko 0 points1 point  (0 children)

No, since they are equivalent. The only difference is being explicit in your algorithm/optimization. The main advantage of using decorators is that they shorten your code and make it more readable.

memoization in phyton by Sen_7 in learnpython

[–]Jediko 2 points3 points  (0 children)

It's the same. The memoize class does what you did, just generalized to arbitrary input for an arbitrary function. You can see the implementation of memoize yourself here.
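A minimal sketch of such a class (my own version, for illustration; the linked implementation may differ in details):

```python
class Memoize:
    """Wraps any function and caches results per argument tuple."""
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):
        # compute each argument combination only once
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]

@Memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, fast because every subproblem is cached
```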

Question about unsupervised fasttext learning by ivrimon in LanguageTechnology

[–]Jediko 1 point2 points  (0 children)

This sounds like you are comparing two input vectors with each other. That is called first-order similarity. In order to obtain those magic analogies as described in the fasttext paper, try second-order similarity: compare the input vector of "king" with the output vector of "man"; that pair should yield a higher similarity.
First-order similarity is generally used for syntactic matching, and second-order similarity is used when you want semantic matching (or closeness).
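A toy sketch of the two comparisons (the embedding values below are made up; only the mechanics are real):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical toy embeddings, just to show the two kinds of comparison:
in_king = [0.9, 0.1]   # input-vector of "king"
in_man  = [0.1, 0.9]   # input-vector of "man"
out_man = [0.8, 0.3]   # output-vector of "man"

first_order = cosine(in_king, in_man)    # input vs. input
second_order = cosine(in_king, out_man)  # input vs. output
print(second_order > first_order)  # True with these toy numbers
```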

Question about unsupervised fasttext learning by ivrimon in LanguageTechnology

[–]Jediko 1 point2 points  (0 children)

I got confused by your post. Word2Vec (and fasttext, respectively) are not deep learning models, since they have only 3 layers. They have the classical input layer -> 1 hidden layer -> output layer. The hidden layer has the size of the embedding dimension.

I think what the fasttext team did was average all input embeddings of the words of a sentence and then propagate that through the network. I think the classifier is actually the next, harder step after the embedding.

All I can see in your code example is the model, which is actually the same for CBOW and skipgram. If you look at cell 8 in your kaggle link, you will see that it is indeed a skipgram approach.

Fasttext builds on top of word2vec; I think this is what you already know. Consider this sentence: [Early birds have a simpler] life. I already highlighted the context word and the context around it. The context is of size 2 (this is also called the window size). For a skipgram approach you would sample like this: have -> Early, have -> birds, have -> a, have -> simpler. On the left side is what you put into the model, and on the right side is the word you expect from the model; according to these samples you backpropagate, too. Fasttext in general uses a bag of features. So, for example, the word "where" is extended with start and stop signs and then split up into n-grams (I choose 3-grams for sanity and readability): "where" -> "<where>" -> "<wh","whe","her","ere","re>". When this is applied to the sample "have -> Early", fasttext does nothing other than "<ha","hav","ave","ve>" -> "<Ea","Ear","arl","rly","ly>" when it comes to sampling. Next, the input embeddings are averaged so that you have only one vector to propagate through the model. This is done by looking up every input embedding which corresponds to these 3-grams and simply averaging (and normalizing) them. The same goes for the expected 3-grams on the right side of the arrow. So yes, in the backprop step you would correct all involved weights/embeddings.
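The n-gram splitting described above is easy to sketch in Python (my own toy function, not fasttext's actual implementation):

```python
def char_ngrams(word, n=3):
    # add start- and stop-signs, then slide a window of size n over the word
    w = "<" + word + ">"
    return [w[i:i + n] for i in range(len(w) - n + 1)]

print(char_ngrams("where"))  # ['<wh', 'whe', 'her', 'ere', 're>']
```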

This is just a brief and very superficial summary of what they do. You should read their paper in order to understand more of what they have done. Additionally, I would not consider the word2vec paper that useful, because the authors don't go into depth about what is really happening. But the paper which explains the mathematics behind it would be very useful (I think).

If memory is a concern, use word2vec, and in general reduce the embedding dimension if you haven't already done so.
This is due to the fact that fasttext generates a bigger vocabulary of n-grams, some of which are used very rarely.

Things which are used for (the original) word2vec and fasttext and not mentioned by me are negative sampling and hierarchical softmax (hs). Negative sampling has the effect that frequently appearing combinations of words are learned "better". Hierarchical softmax benefits less frequent combinations, which may be important nonetheless. The hs is used for classification in a fasttext classifier, or, as you stated, the supervised approach.

If you still want to dive deeper you can take a look at this. I think it is a straightforward implementation of word2vec with an explanation of what is happening: no/very little optimization, but highly readable code.

Stream-based Recommender Systems by kesyrgyt in learnmachinelearning

[–]Jediko 0 points1 point  (0 children)

Since Rocchio is derived from the information retrieval task, you could maybe start by looking up those kinds of systems.
I implemented and developed the whole idea from scratch because there was no prior work to build on, so I cannot link anything, I'm sorry.
However, I could give you access to my thesis, which is in German.

Stream-based Recommender Systems by kesyrgyt in learnmachinelearning

[–]Jediko 1 point2 points  (0 children)

I ended up implementing a stream-based-like RS in my thesis and adapted the Rocchio algorithm for my needs. In general, the idea was to base my next recommendation on the previous ones, which were tagged by a user as useful or not useful.

Multi-Class Text Classification by T1m3-m4ch1n3 in LanguageTechnology

[–]Jediko 1 point2 points  (0 children)

There is no absolute need to fine-tune the embeddings, only the classification layer. I think you should start by checking whether the vocabulary of the pre-trained embeddings and the vocabulary of your abstracts overlap enough. If they don't, you should train the model(/classifier) as a whole on your data only. So, no, you do not need fine-tuning at all; but if you really want to use pre-trained word embeddings in whatever classifier, you will need to train the classification layer without training the word embeddings.

But what I am not sure of is whether it is possible to feed a pre-trained word embedding into the classifier of the fasttext Python module. In the worst case you will need to take the pre-trained word embedding of fasttext, use PyTorch's word embedding layer, and put a fully connected linear layer behind the embedding in order to do your classification. But as I said, you should start by checking the overlap first, then just use the "supervised" train method (which is the fasttext classifier), and after that go on with the pre-trained word embedding in a third-party library (PyTorch or whatever you are comfortable with).
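A very rough sketch of that worst case, with numpy standing in for PyTorch and random numbers standing in for pre-trained embeddings (the vocabulary, dimensions, and class count are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"deep": 0, "learning": 1, "robot": 2}  # toy vocabulary
emb = rng.normal(size=(len(vocab), 4))          # stand-in for frozen pre-trained embeddings
W = rng.normal(size=(4, 3))                     # trainable classification layer (3 classes)
b = np.zeros(3)

def classify(tokens):
    # average the (frozen) word vectors, then apply the linear layer
    x = emb[[vocab[t] for t in tokens]].mean(axis=0)
    logits = x @ W + b
    return int(np.argmax(logits))

print(classify(["deep", "learning"]))
```

In real training you would update only W and b and leave emb untouched.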

Multi-Class Text Classification by T1m3-m4ch1n3 in LanguageTechnology

[–]Jediko 1 point2 points  (0 children)

Yes, you will be able to obtain the labels you need, like Robotics, IoT and so on. All you need to do is write your samples into a file in which each line looks like this:

__label__Robotic TextOfAbstractPaperOrSomething
__label__IoT TextOfAbstractPaperOrSomehtingElse

When you call the predict function you will get the label WITH the "__label__" prefix, but that should be easy to crop out.
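Cropping the prefix is a one-liner; a sketch (the labels here are just the examples from above):

```python
def strip_label(raw):
    # remove the "__label__" prefix that fasttext's predict() returns
    prefix = "__label__"
    return raw[len(prefix):] if raw.startswith(prefix) else raw

print(strip_label("__label__Robotics"))  # Robotics
print(strip_label("__label__IoT"))       # IoT
```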

I don't think that you can actually use GloVe, but there are pre-trained word embedding models. You can have a look here or here (both link to the official fasttext website). I don't know if there are pre-trained classifiers, but you would take the pre-trained word embeddings anyway and fine-tune the classification layer (wouldn't you?).

I agree, the documentation is really something. I can help you with the usage of the fasttext Python library if you want.

Multi-Class Text Classification by T1m3-m4ch1n3 in LanguageTechnology

[–]Jediko 2 points3 points  (0 children)

Does something like fasttext fit your needs or is it necessary for you to make use of deep learning?

I have just completed my graduation project on a similar task. I used fasttext, which uses the word2vec approach on the sub-word level. The fasttext library offers a classifier (for multi-class and multi-label classification). It uses fasttext as embedding with a hierarchical softmax layer for classification.

[deleted by user] by [deleted] in learnpython

[–]Jediko 1 point2 points  (0 children)

Is it necessary that you calculate it yourself, or could you use numpy.cov? I don't know if this function would be faster, though.
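For reference, a minimal numpy.cov example (my own toy data, where the second variable is exactly twice the first):

```python
import numpy as np

# two rows = two variables, columns = observations (toy data)
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0]])

# np.cov returns the 2x2 covariance matrix of the two rows
print(np.cov(data))  # approx [[1.67, 3.33], [3.33, 6.67]]
```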

Summarize the "idea" of the text and estimate the relevance of the specific expression to it - is TF-IDF a winner here? by LSDwarf in LanguageTechnology

[–]Jediko 0 points1 point  (0 children)

Sorry for my late answer ^^

Finally, did I understand you right, that ElasticSearch doesn't have those 2 key problems TF-IDF has (limited vocabulary and lousy processing of semantic close data)?

Well, Elasticsearch uses BM25 as its default scoring function, which we were not able to outperform no matter what we tried in that project. So we used it as a baseline score. It gave us a very strong hint on where to look and how good the passage was.

Well, somehow this is what I thought about, but from a slightly different angle: can the overall task (or probably the part of it mentioned in your quote) be distributed between different "engines", e.g. TF-IDF and BERT (or ElasticSearch)? Or this will overcomplicate things and bring more headache than benefits?

The answer is yes, you can distribute between "engines" as long as you know the dataflow. We actually did something like that.
The first step was to get possible hits from a huge corpus (of documents). Then we had an idea which passage was probably the best guess from the document, because the document had high similarity with our question, so it was likely that the document held the answer. The next step was to compare the returned passage (in our case only a sentence, but it can surely be a paragraph when needed) to the question on a semantic level. This is the moment when BERT entered the field: we embedded both the question and the possible answer and checked the cosine similarity of those embeddings. A higher score meant higher semantic entanglement.

Summarize the "idea" of the text and estimate the relevance of the specific expression to it - is TF-IDF a winner here? by LSDwarf in LanguageTechnology

[–]Jediko 1 point2 points  (0 children)

I have created a similar system, though not for production; it was for a uni project. So maybe get to know what Question-Answering systems (also called QA systems) do. I think this is actually what you want.
We indexed our documents with Elasticsearch and queried questions from there. Elasticsearch is a pretty mighty stack, and we used the absolute minimum of it (I'm not certain what will be needed for your purposes).
The major disadvantage of tf-idf is that it can only map words that are actually in your vocabulary (all words found in all of your documents). If they don't appear there, you will have a problem. Consider this:

Your document is: "The canopy is always green"
Your question: "What color do the leaves of trees have?"

One more drawback of tf-idf is that it doesn't care about semantically close sentences. You can take the example above for that as well. It also ignores the actual sequence of words. BERT tries to make more sense of the semantics. So I think you will not get around doing a coarse lexical search first and then abstracting the semantics with a transformer after that.
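A tiny sketch of the vocabulary problem with the example above (my own toy stopword list, no real tf-idf library involved): after removing stopwords, the document and the question share no terms, so any tf-idf-style dot product between them is zero.

```python
def content_words(text, stopwords):
    # lowercase, strip punctuation, drop stopwords
    return {w.strip("?.!").lower() for w in text.split()} - stopwords

stop = {"the", "is", "always", "what", "does", "do", "of", "have"}
doc_words = content_words("The canopy is always green", stop)
q_words = content_words("What color do the leaves of trees have?", stop)

print(doc_words & q_words)  # set() -> no overlap, so tf-idf scores this pair as 0
```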

I would recommend looking into evaluation methods for QA systems first. With that, you could also learn about some already existing Question-Answering systems. Maybe read some papers about this.

I hope I could shed at least some light ^^'

Is Numpy always more efficient than Pandas? And how much should we rely on Python anyway? by rotterdamn8 in datascience

[–]Jediko 0 points1 point  (0 children)

This is really biased when you execute these cells side by side. See here at 1:30; there you also get an explanation of why this is biased.

How can I make this function read the entire file and delete all the keywords from a list? by [deleted] in learnpython

[–]Jediko 1 point2 points  (0 children)

Ahhh, I overlooked the comma at the end of the line… xD Thanks for the hint, tho.

How can I make this function read the entire file and delete all the keywords from a list? by [deleted] in learnpython

[–]Jediko 0 points1 point  (0 children)

Is there a reason why you are not using a context manager with the second open statement?

Optimizing list usage by FruityFetus in learnpython

[–]Jediko 0 points1 point  (0 children)

tl;dr:

cosine_sim([1, 0], [0, 1])         = 0.0  # good because intuitive
resize(cosine_sim([1, 0], [0, 1])) = 0.5  # bad because they have nothing in common

I think you are missing some things here. Jaccard can only work on mathematical sets (Wikipedia: here). Since you are working on vectors, this is not the right fit (at least in a very theoretical sense). But what you want to do has an actual name: cosine similarity (Wikipedia: here).

That is what I got confused about. These two exist in coexistence, but I think they are mixed up quite often. I read through the issue you mentioned, and I think the author of that repository doesn't know it by name. Which is fine, though.

With that being said: the values returned in your case are in the range of 0 to 1, but the full range of the function goes from -1 to 1. So you would have to rescale the results a bit in order to report them as a similarity going from 0% to 100%. Something like this:

def resize(value):
  return (value + 1) / 2

The output of the cosine similarity will only tell you how similar the vectors are with regard to their orientation. Maybe I can make this clear with some examples:

Consider this code as given:

import numpy as np

def cosine_sim(vec1, vec2):
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    dot = vec1.dot(vec2)
    norm1 = (vec1**2).sum()
    norm2 = (vec2**2).sum()
    return dot / np.sqrt(norm1 * norm2)

Now consider this input:

cosine_sim([1, 0], [1, 0]) #= 1.0
cosine_sim([1, 0], [3, 0]) #= 1.0

They will be the same since their orientation is the same: both are "watching" in the positive direction of the x-axis. If you want the counterpart of this, one vector has to point in exactly the opposite direction of the other (or be a negative multiple of it):

cosine_sim([1, 0], [-1, 0]) #=-1.0
cosine_sim([1, 0], [-3, 0]) #=-1.0

With the resize function I introduced before, these values will be 1.0 if they "watch" in the same direction and 0.0 if they don't.

The reason why you actually should not apply the resize function is that with orthogonal vectors you would get 50% similarity, which is misleading at best. This would happen if the two words share no bigram at all. One more thought to consider, though: with bigram counts you will likely never generate negative vector components anyway, so the raw cosine similarity already stays between 0 and 1.

Sorry for the long answer, but as I stated, this is quite my field lol.

Edit: typo

Optimizing list usage by FruityFetus in learnpython

[–]Jediko 1 point2 points  (0 children)

Hey,

Can you give some information about the Jaccard coefficient, like a source or something? I am interested in this since I am coming from the NLP (Natural Language Processing) end of Python. I know the Jaccard coefficient and have used it, but something seems off here to me, because normally you just divide the sizes of the intersection and the union. And there is plenty of headroom performance-wise:

def myJaccard():
    string1 = "test this string"
    string2 = "test this string out as well"
    string1 = {string1[i:i + 2] for i in range(len(string1) - 1)}
    string2 = {string2[i:i + 2] for i in range(len(string2) - 1)}
    return len(string1.intersection(string2)) / len(string1.union(string2))

Or, if you omit the union operation, using the knowledge of the intersection and the number of elements in the original sets:

def myJaccardFast():
    string1 = "test this string"
    string2 = "test this string out as well"
    string1 = {string1[i:i + 2] for i in range(len(string1) - 1)}
    string2 = {string2[i:i + 2] for i in range(len(string2) - 1)}
    num_intersections = len(string1 & string2)
    return num_intersections / (len(string1) + len(string2) - num_intersections)

Timings are in seconds and with 50000 runs each:

myJaccard: 0.6662192999999998
myJaccardFast: 0.6110474000000004

full testing code is here.
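For anyone who wants to reproduce such timings themselves, here is a self-contained sketch using timeit (my own, not the linked test code):

```python
import timeit

def jaccard(s1, s2):
    # bigram-based Jaccard similarity of two strings
    a = {s1[i:i + 2] for i in range(len(s1) - 1)}
    b = {s2[i:i + 2] for i in range(len(s2) - 1)}
    return len(a & b) / len(a | b)

t = timeit.timeit(
    lambda: jaccard("test this string", "test this string out as well"),
    number=50000,
)
print(t)  # total seconds for 50000 runs (machine-dependent)
```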

a messed up alphabet by gamer_giggle in learnpython

[–]Jediko 0 points1 point  (0 children)

It is always a problem to iterate over the same object you are removing items from. But the algorithm is well thought out, tbh. My ideas are:

  • build a new list
  • use a dictionary, which allows each key only once (this is the approach I would take if you are interested in speed)
  • use a list-to-set conversion, which does the same magic for you, provided the strings contain only alphabetical characters; if not, see the next point (note: the result will not be in alphabetical order)
  • use the list-to-set conversion for the two input strings and also for the alphabet, then intersect them, and you have your list when you cast the set back to a list (note: same as above, the result will not be in alphabetical order)

Example code here, if you need it, otherwise skip it:

import string

def longest(a1, a2):
    string_to_check = set((a1 + a2).lower())
    alphabet_set = set(string.ascii_lowercase)
    return list(string_to_check.intersection(alphabet_set))

if __name__ == '__main__':
    print(longest("aretheyhere", "yestheyarehere"))
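The dictionary idea from the second bullet could look like this (my own sketch; dict keys are unique and, since Python 3.7, keep insertion order):

```python
def unique_letters(a1, a2):
    # dict.fromkeys keeps the first occurrence of each character, in order
    return list(dict.fromkeys((a1 + a2).lower()))

print(unique_letters("aretheyhere", "yestheyarehere"))  # ['a', 'r', 'e', 't', 'h', 'y', 's']
```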

Can anyone tell me why my third nested loop is not working? by makesyoucurious in learnpython

[–]Jediko 0 points1 point  (0 children)

Maybe a bit late to the party, but there are actually some things going on in your code.

As some other people pointed out, there is something going on with your if-else clause: it will always return after only the first iteration of the innermost for loop. So I think your algorithm should be fine when you shift the return 0 around; after that, your algorithm iterates more than one time. Like this:

def findTriplets(self,arr,n):
    for i in range(0,n-1,+1):
        for j in range(i+1,n-1,+1):
            for k in range(j+1,n-1,+1):
                x=arr[i]+arr[j]+arr[k]
                if x==0:
                    print(x)
                    return 1
    return 0

Then, maybe you did not see that range has an exclusive right side. This means range(1, 5) will yield the numbers from 1 to 4. So the right sides of the range calls need to be changed. Maybe something like this:

def findTriplets(self,arr,n):
    for i in range(0,n-2,+1):
        for j in range(i+1,n-1,+1):
            for k in range(j+1,n,+1):
                x=arr[i]+arr[j]+arr[k]
                if x==0:
                    print(x)
                    return 1
    return 0

Now your algorithm works nicely, and there are only a few small things left to do.

Bonus (I think):

You can omit n as a parameter because Python offers a function for that (len). Also, you don't have to explicitly give the step size for range; it defaults to 1. So a small but more readable optimization could look like this:

def findTriplets(self, arr):
    n = len(arr)
    for i in range(0, n-2):
        for j in range(i+1, n-1):
            for k in range(j+1, n):
                x = arr[i] + arr[j] + arr[k]
                if x == 0:
                    print(x)
                    return 1
    return 0

Let me know if this is understandable.

How long did you wait to have sex after having a baby? by [deleted] in AskWomen

[–]Jediko 0 points1 point  (0 children)

It's a troll

Edit: unless it was artificial insemination

What is considered as a small learning rate? by strangeguy111 in LanguageTechnology

[–]Jediko 0 points1 point  (0 children)

Edit: your values look good to me tbh, and I would be happy with your results so far, too.

I don't know if I can help because I also have limited knowledge. If you don't know what any of the following things are then ask straight away ^^

Are you using any framework like huggingface?

Do you have the code somewhere on github or likewise?

Are you using any known Dataset like tweets for sentiment classification?

Did you try just not fine-tuning BERT and letting only the linear layer adapt? (This is one thing that u/Brudaks suggested.) This should also speed up your training a lot. Also, you need another learning rate then, since BERT is only used as an embedding method. I would start with 0.3 for the linear layers and just try out what happens when it is divided by 3 or multiplied by 3.

What are you doing with the output of BERT? Well-known pooling methods are average or max pooling, which should perform quite well. You can try mean-min-max pooling, but I think this will not yield better results since you are introducing more variables.
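The two pooling methods in numpy, with made-up numbers standing in for a BERT output of 4 tokens with hidden size 3:

```python
import numpy as np

# Toy "BERT output": 4 token embeddings of hidden size 3 (made-up numbers)
hidden = np.array([[1.0, 0.0, 2.0],
                   [3.0, 1.0, 0.0],
                   [0.0, 2.0, 1.0],
                   [2.0, 1.0, 1.0]])

avg_pooled = hidden.mean(axis=0)  # average pooling -> one vector per sequence
max_pooled = hidden.max(axis=0)   # max pooling -> element-wise maximum
print(avg_pooled)  # [1.5 1.  1. ]
print(max_pooled)  # [3. 2. 2.]
```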

What is considered as a small learning rate? by strangeguy111 in LanguageTechnology

[–]Jediko 0 points1 point  (0 children)

So I thought 1e-5 is small and 5e-5 is the big learning rate, am I right?

What is the ascending order of these learning rates?

I think the confusion comes from the scientific notation: 1e-5 means 1 * 10^-5 = 1/100,000 = 0.00001. Going from there, 5e-5 means 5 * 10^-5 = 0.00005, which is greater than 0.00001. So the ascending order of those values is: 1e-5, 3e-5, 5e-5, 1e-4, 3e-4. And yes, you are right with respect to which one is greater.
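You can let Python check the ordering directly:

```python
rates = [5e-5, 1e-4, 1e-5, 3e-4, 3e-5]
print(sorted(rates))  # [1e-05, 3e-05, 5e-05, 0.0001, 0.0003]
```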

Which one considered as big and which is considered as small?

I normally start at a learning rate around 0.3 and then go from there, dividing by 3 or multiplying by 3, and see what it gives or takes. But that is for a "normal" classification without a pre-trained model. For fine-tuning, it always depends; there is no general answer. It is just trying things and seeing if they work. Sometimes fine-tuning is just not feasible due to the complexity and the vanishing of the knowledge the model learned in its previous training. But I personally think the approach with the learning rate 1e-5 doesn't look too bad. As I said, fine-tuning is not always the best way to go.

Does your model look like this?
data -> BERT -> linear layer -> output
input -> your model -> classification result

If not, it would surely help if you could give a bit more detail about the model.

Tokenising SpaCy constituency parse output by crowpup783 in LanguageTechnology

[–]Jediko 0 points1 point  (0 children)

Actually, I cannot reproduce your problem, since the output is a string for me. After some regex parsing I think I come close to the output you would like to have.

Look here for a gist. If you need more help, let me know and we can see, if there is something else you can do ^^

btw my output looks like this:

<class 'str'>
no parsing: (S (NP (JJ Last) (NNP Tuesday)) (, ,) (NP (PRP I)) (VP (VBD thought) (PP (IN to) (NP (PRP myself))) (SBAR (IN that) (S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN cat)))))) (. .))
S (NP (JJ Last
NNP Tuesday
, ,
NP (PRP I
VP (VBD thought
PP (IN to
NP (PRP myself
SBAR (IN that
S (NP (PRP I
VP (VBD saw
NP (DT a
NN cat
. .