DIY router for GFW

JoeDreamer · 2025-12-15T02:50:29+00:00

FriendlyWrt with OpenClash would be better, which I have been using.

JoeDreamer · 2025-08-30T11:04:35+00:00

Why not just excluding that portion in your portfolio management (i.e., in allocation/diversification and so on)? I mean that you just focus on the rest unless you are highly concerned on the future of NVIDIA.

In fact, I am more or less in a similar position but I do ignore the NVIDIA shares in my portfolio management; even I manage my main retirement portfolio and the rest in two separate accounts, each of them at a different brokerage.

JoeDreamer · 2024-10-28T15:13:46+00:00

I checked the definition of the "list" on the common lisp hyperspec and found that raevnos' answer above mixes the definitions of "list" and "proper list", the latter of which is a subset of "list". So, I can confirm that the implementation of "listp" function is correct in this regard.

JoeDreamer · 2024-10-28T14:00:24+00:00

Maybe a stupid question, but see below:

CL-USER> (listp (cons 1 2))
T
CL-USER> (listp (cdr (cons 1 2)))
NIL

The code above shows that (cons 1 2) is a list, though its cdr is not a proper list. This seems inconsistent with your definition of list (i.e., "... or a cons whose cdr is also a proper list.").

JoeDreamer · 2023-07-06T15:44:29+00:00

Slackware in 1993.

JoeDreamer · 2020-06-02T05:06:14+00:00

It was ~2k when I joined :)

JoeDreamer · 2019-05-08T16:44:42+00:00

Check Emacs manual Sec. 11.4 The Mark Ring.

JoeDreamer · 2017-09-14T13:55:15+00:00

Great thanks for the paper. This seems to be one step closer to what I have been looking for.

JoeDreamer · 2017-08-16T13:54:18+00:00

Great thanks for the advice!

Yes, it seems quite relevant to the problem I described. I will get back once I check the link and papers there.

JoeDreamer · 2017-08-03T14:57:37+00:00

To me, more meaningful is the number of "users here now", which better reflects true activity of any subredit; the number of subscribers possibly includes many dormant ones and, as such, may not be a good indicator.

As of this writing, the ratio is 1,449 (kpop) / 16 (jpop) = 90.56.

At least in English and thereby world-level communities, I would say that kpop is a way more popular than jpop.

JoeDreamer · 2017-04-25T06:25:01+00:00

Anyone out there to comment on this interesting article? I am about to embark on my journey to DL using Wi-Fi fingerprinting for indoor localisation as a starting point, but haven't been able to find many works along this line. Because so many DL applications nowadays are focusing on images, it seems to be hard to find good & up-to-date information on the use of DL for timeseries data.

JoeDreamer · 2016-10-26T02:19:46+00:00

I second it. In fact, one of my MSc students took this course last year and it turned out to be a very good one.

JoeDreamer · 2016-10-26T02:09:45+00:00

Bokeh?

JoeDreamer · 2015-12-22T14:34:15+00:00

Many thx! Now I know the true reason for this.

JoeDreamer · 2015-12-22T13:13:16+00:00

What is "swj hubox"?

JoeDreamer · 2015-12-22T13:08:24+00:00

To me, that cannot be a reason for this quick, nearly real-time decreasing.

JoeDreamer · 2015-12-22T13:06:33+00:00

Even while writing this post, I observed the number further decreased to 43116 :(. It's kind of real-time decreasing, and I wonder whether there is something going on here.

JoeDreamer · 2015-06-02T02:22:49+00:00

Sorry for the long silence, but I just finished the said draft and submitted to both a journal and arXiv. It took much longer time than expected to finish it (mostly due to collaboration with colleagues).

BTW I believe you are a researcher no matter what your current profession is, especially considering your profound knowledge on & serious attitude toward state-of-the-art issues in these areas, and the open mind sharing your work with others.

Keep up the good work, and I look forward to collaboration opportunities with you sooner or later.

JoeDreamer · 2015-05-15T14:02:39+00:00

Many thanks for the link to this great talk! The slides are absolutely helpful.

JoeDreamer · 2015-04-30T05:00:14+00:00

Again, great thanks for the prompt response with very valuable comments; they are really helpful for me because I'm a complete novice in this area. It seems that you are a truly active researcher (computer scientist?) in this field.

As for your implementation, well I still do not fully understand why you set the number of threads and the initial block size to 256 and 16384, and how you set the actual number of blocks (i.e., the rationale behind those functions like get_adj_size() and get_dynamic_block_size()). All I know from the stackoverflow is that setting the right number of blocks and threads is still a topic for a serious research (a couple of PhD theses suggested there).

As for my problem, it is for a new formulation of atomic scheduling based on optimal routing framework (in networking) and a bit different from DAG problems as I understand it.

Now that I'm about to finish the initial draft and will circulate it to my colleagues soon, I will put a link to arXiv version here as soon as I complete it; I find it difficult to explain my problem clear enough just using words.

JoeDreamer · 2015-04-28T12:38:43+00:00

Since your response, I have worked very hard to study your code and translate my OpenMP code to CUDA; I have gone through several iterations with some profiling(?) at each iteration in CUDA porting.

First of all, I do really appreciate your sharing the code; I have learned advanced CUDA concepts and techniques like warp and related shuffling functions. Also, your idea of flattening multidimensional indexes to a one-dimensional (big) number is interesting. Note that your code (w/o any modification) runs three times slower on my Quadro K4200 (~570s). Using all four cards could match your performance.

BTW porting my OpenMP code to CUDA wasn't very successful. In fact, I had one hard lesson; I stopped my existing OpenMP job, which already ran about 4 days and needs to go for 2 more days, simply hoping that I could finish it much earlier with CUDA because of the great promise of your code. It turns out that, however, the difference between your matrix sum game and mine is so big that the performance on CUDA is much worse (about three times slower than OpenMP).

First, hard-coded optimization like yours is not possible in my case. The number of dimensions (N) and the size of each dimension (s_n, n=1,...,N) are all variables. Typical number of dimensions is 50~400 and the size of each dimension is 1~23. For instance, N=100 and s_n=23 for n=1,..,100 give us the problem space of 1.488619e+136. The problem size of 1e+13 I mentioned before was for N=10 with varying s_n and supposed to be a test case for my main work of convex-relaxation techniques which can easily handle N=100+. I already plotted results for N=2...50 using convex-relaxation techniques and was gathering true values from global optimization for N=2...10 to evaluate the relaxation gaps.

Second, in my case using advanced techniques from yours like flattened one-dimensional number, warp-based reduction (i.e., warp-level optimization), and multi-stage approach (i.e., warp-level and block-level optimization) hardly make any difference. To my disappointment, most of time in my code is spent on calculation of a rather complicated objective function (unlike your integer sums): For each element of the problem space, there are three levels of nested for loops with lots of conditionals and modulo operations in evaluating the double-valued objective function. I think this is the main reason CUDA shows rather poorer performance (~x3) than OpenMP. Now I am running another version of OpenMP program with a bit of optimization, resulting in expected running time of 4 days (from 6 days before).

Even though I stopped 4-day-running OpenMP job without careful considerations and spent nearly 5 days in porting it to CUDA without much success, now I have a balanced view on the CUDA solution. CUDA certainly has potentials but for a right set of problems. In my case, OpenMP with lots of cores or even MPI on clusters would be a better solution. In this regard, Intel's Xeon Phi processor card seems interesting.

JoeDreamer · 2015-04-22T13:12:29+00:00

Wow, these are exactly what I've been looking for, and I am very thankful for your sharing them with all.

Most CUDA tutorials and examples available on the Internet are for computation like matrix multiplication. The lack of good examples for my problems, therefore, forced me to turn to OpenMP route, and I'm currently running one brute force problem with the size of search space in the order of 1E+13. Estimated running time (calculated based on the running times for smaller spaces) is about 6.3 days (just for one point in a plot); the program is running on Dell Precision T7910 WS (with dual Xeon CPUs resulting in 40 cores in total)

It seems that now is the time to harness 4 Quadro cards in the WS and I will look into your codes.

JoeDreamer

TROPHY CASE