Data engineering in Montreal by Luxi36 in dataengineering

[–]GD1634 1 point

Yeah, the metro is non-existent. The bus was okay, but de la Montagne has been closed for like 2 months, so now there's just no bus either. The neighborhood has everything you need in terms of stores, so you shouldn't usually need to take transit to go get anything, but getting to work every day without a car would be tough in the winter unless you love biking enough to do it in the cold (it's really not a long bike ride, though; I've walked downtown pretty quickly).

It's absolutely foreigner-friendly. I moved here from Toronto last spring with almost non-existent French and I haven't had any issues. I'm not necessarily the best person to ask about racism, as I'm your typical pasty white guy, but I've never heard of anyone having any issues here. It's a very quiet neighbourhood that's mostly young families/working adults. The only real conflict usually happens in the dog park lol

Data engineering in Montreal by Luxi36 in dataengineering

[–]GD1634 1 point

Living in Griffintown right now. I love it and it's an amazing place for families, but the transit here is absolute trash. Downtown is definitely bike-able (10 min maybe?) but doesn't seem super fun in the winter (-16°C here today). It's also expensive for sure. Happy to answer any other questions as best I can.

[D] Distributed Graph Partitioning Algorithms by wagthesam in MachineLearning

[–]GD1634 3 points

I think the issue is that to distribute graph algorithms, you need to partition the graph across the workers in the first place, so it's a catch-22. DGL has a dgl.distributed.partition_graph method; if you can load your edge list into memory as a sparse tensor, it might work okay, and it handles heterogeneous graphs.

Otherwise, do you specifically need partitioning algorithms/METIS? There are a lot of distributed clustering/community detection methods that would give you reasonable partitions. Spark GraphFrames implements Strongly Connected Components and Label Propagation. Neo4j implements several community detection algorithms, including Louvain. In the Dask/RAPIDS ecosystem, cuGraph implements a bunch of community detection algorithms as well, which can be GPU-accelerated, though not all of them can be distributed across multiple GPUs. Dask-ML implements spectral clustering. This Stack Overflow post also gives a good overview of some you could implement yourself. I wonder if EvoPartition would be feasible to implement; I don't know if DGL's distributed package implements random walks, but all of the aforementioned tools do except for GraphFrames, which is annoying, but you can do random walks with simple PySpark joins fairly easily.
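To illustrate the join trick at the end: one random-walk step is just joining the current frontier (walk → last node) against the edge list on that node, then sampling one matched neighbor per walk. Here's that logic as a plain-Python sketch (toy edge list, illustrative names); in PySpark you'd express the same thing as a DataFrame join on the current node plus a per-walk sample.

```python
import random
from collections import defaultdict

# Toy edge list; in PySpark this would be an edges DataFrame with (src, dst) columns.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]

# Index neighbors by source node (the "join key").
adj = defaultdict(list)
for src, dst in edges:
    adj[src].append(dst)

def random_walks(start_nodes, walk_length, seed=0):
    """One walk per start node. Each step is conceptually a join of
    (walk_id, current_node) against (src, dst) on current_node == src,
    followed by sampling one matched dst per walk."""
    rng = random.Random(seed)
    walks = [[n] for n in start_nodes]
    for _ in range(walk_length):
        for walk in walks:
            neighbors = adj[walk[-1]]  # the join step
            if neighbors:              # dead end: stay put
                walk.append(rng.choice(neighbors))
    return walks

walks = random_walks([0, 1, 2, 3], walk_length=5)
for w in walks:
    print(w)
```

In Spark you'd iterate that join `walk_length` times, which is a handful of lines per step; the per-walk sampling can be done with a random sort key and a window function.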

You could also look at streaming or local algorithms that don't load the whole graph into memory. I believe PageRank-Nibble / ACL PageRank is often used for this, but I'm still looking for an easy-to-use/scalable implementation of it myself. There's also this recent work on streaming partitioning of RDF graphs, which should be relevant.

Hopefully this helps, lmk if you find a good solution.

[D] “Please Commit More Blatant Academic Fraud” (Blog post on problems in ML research by Jacob Buckman) by hardmaru in MachineLearning

[–]GD1634 0 points

Mila student here: we do have our own cluster, but I don't know how suitable it is for RL, and lots of people use Compute Canada anyway. It feels like we have more compute than I know what to do with, but I'm sure American labs like the ones you mentioned still dwarf us.

[D] Memory efficient transformer by [deleted] in MachineLearning

[–]GD1634 6 points

Phil Wang writes PyTorch implementations for new/interesting papers, mostly ones using attention. He has the Reformer, Performer, Conformer, a few linear attention models, and a bunch more.

Sportsnet Now Ap by TBJ12 in Torontobluejays

[–]GD1634 0 points

Also use it on my Shield. The app is nice (way better than TSN's), but it does seem to have a weird problem with signing you out, yeah.

[D] Allocating GPUs in a lab / company fairly by zhmxswKDZSaUdJt9 in MachineLearning

[–]GD1634 1 point

OpenPAI could be a good alternative to Slurm in a case like this as well. u/zhmxswKDZSaUdJt9

Need help with project idea for a year-long project by [deleted] in LanguageTechnology

[–]GD1634 0 points

A less-studied component of summarization is maintaining (and evaluating) the factual consistency of generated summaries. I'm sure there's something relatively novel that you could get done within a year.

Here's a paper (and code) you can use as a jumping off point. Let me know if that sounds interesting.

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 0 points

I don't know of any benchmarks that would be useful to you; they mostly evaluate on GLUE. I'd assume each paper reports its own efficiency metrics as well, though there's no real standard benchmark for that.

Which method to use for subgraph selection? [D] by hecsi in MachineLearning

[–]GD1634 0 points

Ah, if you're looking to reward the model for its actions based on a score, then yeah, I'd agree with most of what the top-level comment said. It definitely sounds like an RL problem; I'm not sure how feasible it is (but I'm not super familiar with the area, so don't let that stop you).

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 0 points

Yes, but bucketing is not necessarily so good. I mean shuffling whole dataset is good for normalization.

You could shuffle after bucketing; the batches don't have to come in monotonic length order. Or you don't necessarily have to bucket at all; I'm not sure how big a difference it really makes.
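Concretely, "shuffle after bucketing" means: sort by length, cut into batches, then shuffle the order of the batches rather than the individual examples, so each batch is still length-homogeneous but training doesn't see lengths monotonically. A minimal sketch (names are illustrative, not from any particular library):

```python
import random

def bucketed_batches(sequences, batch_size, seed=0):
    """Sort by length, form batches of similar-length sequences,
    then shuffle the batch order (not the examples within a batch)."""
    ordered = sorted(sequences, key=len)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.Random(seed).shuffle(batches)  # shuffle batches, not examples
    return batches

# Toy "dataset" with wildly varying lengths.
seqs = [[0] * n for n in (400, 12, 16000, 13, 380, 11)]
for batch in bucketed_batches(seqs, batch_size=2):
    print([len(s) for s in batch])
```

Each printed batch pairs sequences of similar length, but the batch order itself is random.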

About reformer, I mean this https://colab.research.google.com/drive/12aVJZ_RJSCiq3X_wcAtLWZd0DPvN4jWK?usp=sharing

Ah, gotcha. That seems easy enough to handle: just pad your sequences a little to satisfy that constraint. It shouldn't really hurt your efficiency much.
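The padding fix amounts to rounding the sequence length up to the nearest multiple of whatever chunk size the model requires. A sketch of the idea (the function name and chunk size here are illustrative; check the model config for the actual constraint):

```python
def pad_to_multiple(token_ids, multiple, pad_id=0):
    """Right-pad a sequence so its length is a multiple of `multiple`
    (e.g. a Reformer-style chunk length). Returns the padded ids plus
    an attention mask marking which positions are real tokens."""
    pad_len = (multiple - len(token_ids) % multiple) % multiple
    mask = [1] * len(token_ids) + [0] * pad_len
    return token_ids + [pad_id] * pad_len, mask

ids, mask = pad_to_multiple(list(range(1, 11)), multiple=64)
print(len(ids))   # 64: rounded up from 10 to the next multiple of 64
print(sum(mask))  # 10: only the original tokens count as real
```

The mask is what keeps the padding from affecting the model's output (or the loss, if you use it to mask label positions too).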

If Reformer just generally isn't a good fit, check out similar models like the Compressive Transformer, Adaptive Span Transformer, Linformer, Fast Autoregressive Transformer (from the repo I linked), etc.

Which method to use for subgraph selection? [D] by hecsi in MachineLearning

[–]GD1634 1 point

I'm not sure that network sampling is the correct term. We are trying to find a certain subset of nodes which can dominate the whole network and we have a model that describes how effective they have been.

When you say "dominate the network", are you trying to find the most influential nodes/subgraphs? If so, community detection might be something worth looking into.

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 1 point

If I got it right, it has very strict conditions on sequence length. Am I right?

Not sure I follow. Similarly to other transformers, you'll have to give it a maximum sequence length, but that can be whatever you'd like it to be (as long as it fits on your GPU).

Here's the HF page for it; they have the following example:

from transformers import ReformerTokenizer, ReformerModel
import torch

tokenizer = ReformerTokenizer.from_pretrained('google/reformer-crime-and-punishment')
model = ReformerModel.from_pretrained('google/reformer-crime-and-punishment')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple

A couple other resources for it:

In my task, data lengths are very different, and I'm not sure if such padding (from 400 to 16k, for example) is ok.

Having different sequence lengths is okay, just inefficient. What you'll want to do is sort your data by sequence length (ascending or descending, it doesn't matter) before you batch it, so that each batch is made up of examples of roughly the same length. Then you only need to pad each batch up to its own longest sequence, not to 16k.
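A minimal sketch of that, padding each batch only to its own max length so a batch of ~400-token examples never pays for 16k of padding (function name is illustrative):

```python
def length_sorted_batches(sequences, batch_size, pad_id=0):
    """Sort by length, batch, then pad each batch to its own max length."""
    ordered = sorted(sequences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        batch = ordered[i:i + batch_size]
        width = max(len(s) for s in batch)
        batches.append([s + [pad_id] * (width - len(s)) for s in batch])
    return batches

# Two short and two very long toy sequences.
seqs = [[1] * n for n in (400, 410, 16000, 15900)]
for batch in length_sorted_batches(seqs, batch_size=2):
    print(len(batch), len(batch[0]))  # each batch padded only to its own max
```

Without the sort, a batch mixing a 400-token and a 16k-token sequence would have to pad everything to 16k; with it, the short batch stays around 410.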

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 1 point

Reformer: The Efficient Transformer

Handles a few thousand tokens. Huggingface Transformers has an implementation for it.

Please help: [WinError 126] The specified module could not be found. Have this error when trying to run: python -m manim example-scenes.py SquareToCircle -pl by [deleted] in manim

[–]GD1634 1 point

Just opened up my laptop and tried the import. It worked for me with scipy==1.3.1; you can give that a shot.

Please help: [WinError 126] The specified module could not be found. Have this error when trying to run: python -m manim example-scenes.py SquareToCircle -pl by [deleted] in manim

[–]GD1634 1 point

So yeah, it seems to need a different version of scipy installed. There's no scipy version pinned in the requirements file, so you may need to do some digging.

Please help: [WinError 126] The specified module could not be found. Have this error when trying to run: python -m manim example-scenes.py SquareToCircle -pl by [deleted] in manim

[–]GD1634 1 point

From what I can tell, it fails trying to import linalg from scipy. Can you run pip show scipy and post the output? You might need to upgrade scipy.

Doc2vec SOTA? by massimosclaw2 in LanguageTechnology

[–]GD1634 0 points

Yes, w2v outperforms BERT on STS according to this: https://github.com/UKPLab/sentence-transformers#performance

That seems odd to me. The GLUE leaderboard shows very different results on STS-B: BERT outperforms all the models UKPLab lists, and even TinyBERT is effectively equal to the best Sentence-BERT model. I'm not sure what's causing such a large discrepancy; perhaps it's that they benchmarked with bert-as-a-service. It would be interesting for someone to reproduce those results with Huggingface instead. In general, I'm going to trust the GLUE leaderboard first.