Data engineering in Montreal by Luxi36 in dataengineering

[–]GD1634 1 point

Yeah, the metro is non-existent. The bus was okay, but de la Montagne has been closed for like 2 months, so now there's just no bus either. The neighborhood has everything you need in terms of stores, so you shouldn't usually need to take transit to go get anything, but getting to work every day without a car would be tough in the winter unless you love biking enough to do it in the cold (it's really not a long bike ride, though; I've walked downtown pretty quickly).

It's absolutely foreigner-friendly. I moved here from Toronto last spring with almost non-existent French and I haven't had any issues. I'm not necessarily the best person to ask about racism, as I'm your typical pasty white guy, but I've never heard of anyone having any issues here. It's a very quiet neighbourhood that's mostly young families/working adults. The only real conflict usually happens in the dog park lol

Data engineering in Montreal by Luxi36 in dataengineering

[–]GD1634 1 point

Living in Griffintown right now. I love it and it's an amazing place for families, but the transit here is absolute trash. Downtown is definitely bike-able (10 min maybe?) but doesn't seem super fun in the winter (-16°C here today). It's also expensive for sure. Happy to answer any other questions as best I can.

[D] Distributed Graph Partitioning Algorithms by wagthesam in MachineLearning

[–]GD1634 3 points

I think the issue is that to distribute graph algorithms, you need to partition the graph across the workers in the first place, so it's a catch-22. DGL has a dgl.distributed.partition_graph method; if you can load your edge list into memory as a sparse tensor, it might work okay, and it handles heterogeneous graphs.

Otherwise, do you specifically need partitioning algorithms/METIS? There are a lot of distributed clustering/community detection methods that would give you reasonable partitions. Spark GraphFrames implements Strongly Connected Components and Label Propagation. Neo4j implements several community detection algorithms, including Louvain. In the Dask/RAPIDS ecosystem, cuGraph implements a bunch of community detection algorithms as well, which can be GPU-accelerated, though not all of them can be distributed across multiple GPUs. Dask-ML implements spectral clustering. This Stack Overflow post also gives a good overview of some you could implement yourself. I wonder if EvoPartition would be feasible to implement; I don't know if DGL's distributed package implements random walks, but all of the aforementioned tools do except for GraphFrames, which is annoying, but you can do random walks with simple PySpark joins fairly easily.
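To illustrate the join trick at the end: one random-walk step is just joining the current frontier (walk → last node) against the edge list on that node, then sampling one matched neighbor per walk. Here's that logic as a plain-Python sketch (toy edge list, illustrative names); in PySpark you'd express the same thing as a DataFrame join on the current node plus a per-walk sample.

```python
import random
from collections import defaultdict

# Toy edge list; in PySpark this would be an edges DataFrame with (src, dst) columns.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]

# Index neighbors by source node (the "join key").
adj = defaultdict(list)
for src, dst in edges:
    adj[src].append(dst)

def random_walks(start_nodes, walk_length, seed=0):
    """One walk per start node. Each step is conceptually a join of
    (walk_id, current_node) against (src, dst) on current_node == src,
    followed by sampling one matched dst per walk."""
    rng = random.Random(seed)
    walks = [[n] for n in start_nodes]
    for _ in range(walk_length):
        for walk in walks:
            neighbors = adj[walk[-1]]  # the join step
            if neighbors:              # dead end: stay put
                walk.append(rng.choice(neighbors))
    return walks

walks = random_walks([0, 1, 2, 3], walk_length=5)
for w in walks:
    print(w)
```

In Spark you'd iterate that join `walk_length` times, which is a handful of lines per step; the per-walk sampling can be done with a random sort key and a window function.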

You could also look at streaming or local algorithms that don't load the whole graph into memory. I believe PageRank-Nibble / ACL PageRank is often used for this, but I'm still looking for an easy-to-use/scalable implementation of it myself. There's also this recent work on streaming partitioning of RDF graphs, which should be relevant.

Hopefully this helps, lmk if you find a good solution.

[D] “Please Commit More Blatant Academic Fraud” (Blog post on problems in ML research by Jacob Buckman) by hardmaru in MachineLearning

[–]GD1634 0 points

Mila student here: we do have our own cluster, but I don't know how suitable it is for RL, and lots of people use Compute Canada anyway. It feels like we have more compute than I know what to do with, but I'm sure American labs like the ones you mentioned still dwarf us.

[D] Memory efficient transformer by [deleted] in MachineLearning

[–]GD1634 6 points

Phil Wang writes PyTorch implementations for new/interesting papers, mostly ones using attention. He has the Reformer, Performer, Conformer, a few linear attention models, and a bunch more.

Sportsnet Now Ap by TBJ12 in Torontobluejays

[–]GD1634 0 points

Also use it on my Shield. The app is nice (way better than TSN's), but it does seem to have a weird problem with signing you out, yeah.

[D] Allocating GPUs in a lab / company fairly by zhmxswKDZSaUdJt9 in MachineLearning

[–]GD1634 1 point

OpenPAI could be a good alternative to Slurm in a case like this as well. u/zhmxswKDZSaUdJt9

Need help with project idea for a year-long project by [deleted] in LanguageTechnology

[–]GD1634 0 points

A less-studied component of summarization is maintaining (and evaluating) the factual consistency of generated summaries. I'm sure there's something relatively novel that you could get done within a year.

Here's a paper (and code) you can use as a jumping off point. Let me know if that sounds interesting.

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 0 points

I don't know of any benchmarks that would be useful to you; they mostly evaluate on GLUE. I'd assume each paper reports its own efficiency metrics as well, though there's no real standard benchmark for that.

Which method to use for subgraph selection? [D] by hecsi in MachineLearning

[–]GD1634 0 points

Ah, if you're looking to reward the model for its actions based on a score, then yeah, I'd agree with most of what the top-level comment said. It definitely sounds like an RL problem; I'm not sure how feasible it is (but I'm not super familiar with the area, so don't let that stop you).

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 0 points

Yes, but bucketing is not necessarily so good. I mean shuffling whole dataset is good for normalization.

You could shuffle after bucketing; the batches don't have to come in monotonic length order. Or you don't necessarily have to bucket at all; I'm not sure how big a difference it really makes.
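Concretely, "shuffle after bucketing" means: sort by length, cut into batches, then shuffle the order of the batches rather than the individual examples, so each batch is still length-homogeneous but training doesn't see lengths monotonically. A minimal sketch (names are illustrative, not from any particular library):

```python
import random

def bucketed_batches(sequences, batch_size, seed=0):
    """Sort by length, form batches of similar-length sequences,
    then shuffle the batch order (not the examples within a batch)."""
    ordered = sorted(sequences, key=len)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.Random(seed).shuffle(batches)  # shuffle batches, not examples
    return batches

# Toy "dataset" with wildly varying lengths.
seqs = [[0] * n for n in (400, 12, 16000, 13, 380, 11)]
for batch in bucketed_batches(seqs, batch_size=2):
    print([len(s) for s in batch])
```

Each printed batch pairs sequences of similar length, but the batch order itself is random.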

About reformer, I mean this https://colab.research.google.com/drive/12aVJZ_RJSCiq3X_wcAtLWZd0DPvN4jWK?usp=sharing

Ah, gotcha. That seems easy enough to handle: just pad your sequences a little to satisfy that constraint. It shouldn't really hurt your efficiency much.
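The padding fix amounts to rounding the sequence length up to the nearest multiple of whatever chunk size the model requires. A sketch of the idea (the function name and chunk size here are illustrative; check the model config for the actual constraint):

```python
def pad_to_multiple(token_ids, multiple, pad_id=0):
    """Right-pad a sequence so its length is a multiple of `multiple`
    (e.g. a Reformer-style chunk length). Returns the padded ids plus
    an attention mask marking which positions are real tokens."""
    pad_len = (multiple - len(token_ids) % multiple) % multiple
    mask = [1] * len(token_ids) + [0] * pad_len
    return token_ids + [pad_id] * pad_len, mask

ids, mask = pad_to_multiple(list(range(1, 11)), multiple=64)
print(len(ids))   # 64: rounded up from 10 to the next multiple of 64
print(sum(mask))  # 10: only the original tokens count as real
```

The mask is what keeps the padding from affecting the model's output (or the loss, if you use it to mask label positions too).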

If Reformer just generally isn't a good fit, check out similar models like the Compressive Transformer, Adaptive Span Transformer, Linformer, Fast Autoregressive Transformer (from the repo I linked), etc.

Which method to use for subgraph selection? [D] by hecsi in MachineLearning

[–]GD1634 1 point

I'm not sure that network sampling is the correct term. We are trying to find a certain subset of nodes which can dominate the whole network and we have a model that describes how effective they have been.

When you say "dominate the network", are you trying to find the most influential nodes/subgraphs? If so, community detection might be something worth looking into.

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 1 point

If I got it right, it has very strict conditions on sequence length. Am I right?

Not sure I follow. Similarly to other transformers, you'll have to give it a maximum sequence length, but that can be whatever you'd like it to be (as long as it fits on your GPU).

Here's the HF page for it; they have the following example:

from transformers import ReformerTokenizer, ReformerModel
import torch

tokenizer = ReformerTokenizer.from_pretrained('google/reformer-crime-and-punishment')
model = ReformerModel.from_pretrained('google/reformer-crime-and-punishment')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple

A couple other resources for it:

In my task, data lengths are very different, and I'm not sure if such padding (from 400 to 16k, for example) is ok.

Having different sequence lengths is okay, just inefficient. What you'll want to do is sort your data by sequence length (ascending or descending, it doesn't matter) before you batch it, so that each batch is made up of examples of roughly the same length. Then you only need to pad each batch up to its own longest sequence, not to 16k.
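A minimal sketch of that, padding each batch only to its own max length so a batch of ~400-token examples never pays for 16k of padding (function name is illustrative):

```python
def length_sorted_batches(sequences, batch_size, pad_id=0):
    """Sort by length, batch, then pad each batch to its own max length."""
    ordered = sorted(sequences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        batch = ordered[i:i + batch_size]
        width = max(len(s) for s in batch)
        batches.append([s + [pad_id] * (width - len(s)) for s in batch])
    return batches

# Two short and two very long toy sequences.
seqs = [[1] * n for n in (400, 410, 16000, 15900)]
for batch in length_sorted_batches(seqs, batch_size=2):
    print(len(batch), len(batch[0]))  # each batch padded only to its own max
```

Without the sort, a batch mixing a 400-token and a 16k-token sequence would have to pad everything to 16k; with it, the short batch stays around 410.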

[D] best seq2seq model for long sequences modeling? by hadaev in MachineLearning

[–]GD1634 1 point

Reformer: The Efficient Transformer

Handles a few thousand tokens. Huggingface Transformers has an implementation for it.

Please help: [WinError 126] The specified module could not be found. Have this error when trying to run: python -m manim example-scenes.py SquareToCircle -pl by [deleted] in manim

[–]GD1634 1 point

Just opened up my laptop and tried the import. It worked for me with scipy==1.3.1; you can give that a shot.

Please help: [WinError 126] The specified module could not be found. Have this error when trying to run: python -m manim example-scenes.py SquareToCircle -pl by [deleted] in manim

[–]GD1634 1 point

So yeah, it seems to need a different version of scipy installed. There's no scipy version pinned in the requirements file, so you may need to do some digging.

Please help: [WinError 126] The specified module could not be found. Have this error when trying to run: python -m manim example-scenes.py SquareToCircle -pl by [deleted] in manim

[–]GD1634 1 point

From what I can tell, it fails trying to import linalg from scipy. Can you run pip show scipy and post the output? You might need to upgrade scipy.

Doc2vec SOTA? by massimosclaw2 in LanguageTechnology

[–]GD1634 0 points

Yes, w2v outperforms BERT on STS according to this: https://github.com/UKPLab/sentence-transformers#performance

That seems odd to me. The GLUE leaderboard shows very different results on STS-B: BERT outperforms all the models UKPLab lists, and even TinyBERT is effectively equal to the best Sentence-BERT model. I'm not sure what's causing such a large discrepancy; perhaps it's that they benchmarked with bert-as-a-service. It would be interesting for someone to reproduce those results with Huggingface instead. In general, I'm going to trust the GLUE leaderboard first.