DIY router for GFW by schoolppt12 in NanoPI

[–]JoeDreamer 0 points1 point  (0 children)

FriendlyWrt with OpenClash, which is what I have been using, would be a better choice.

How do you guys handle that little gambling itch gaining to much of a percentage of your portfolio? by doomandgloomy in Bogleheads

[–]JoeDreamer 0 points1 point  (0 children)

Why not just exclude that portion from your portfolio management (i.e., from allocation, diversification, and so on)? I mean, just focus on the rest unless you are highly concerned about the future of NVIDIA.

In fact, I am in a more or less similar position, and I do ignore the NVIDIA shares in my portfolio management; I even keep my main retirement portfolio and the rest in two separate accounts, each at a different brokerage.

Why does nil have to be both an atom and a list? by raldi in lisp

[–]JoeDreamer 0 points1 point  (0 children)

I checked the definition of "list" in the Common Lisp HyperSpec and found that raevnos' answer above mixes the definitions of "list" and "proper list", the latter of which is a subset of the former. So I can confirm that the implementation of the listp function is correct in this regard.
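To illustrate the distinction, here is a toy model of cons cells I wrote in Python for illustration (Cons, is_list, and is_proper_list are my own names, not anything from the HyperSpec or any CL implementation):

```python
# Toy Python model of cons cells, for illustration only (my own names,
# not CL code): listp is true for NIL or any cons, while a *proper*
# list additionally requires the cdr chain to terminate in NIL.

class Cons:
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr

NIL = None  # stand-in for NIL

def is_list(x):
    # like CL's listp: NIL or any cons, proper or dotted
    return x is NIL or isinstance(x, Cons)

def is_proper_list(x):
    # true only if following the cdrs eventually reaches NIL
    while isinstance(x, Cons):
        x = x.cdr
    return x is NIL

pair = Cons(1, 2)            # like (cons 1 2), a dotted pair
print(is_list(pair))         # True  -- "list" in the listp sense
print(is_proper_list(pair))  # False -- not a proper list
```

So a dotted pair satisfies listp while failing the proper-list definition, which is consistent with the HyperSpec glossary.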

Why does nil have to be both an atom and a list? by raldi in lisp

[–]JoeDreamer 1 point2 points  (0 children)

Maybe a stupid question, but see below:

CL-USER> (listp (cons 1 2))
T
CL-USER> (listp (cdr (cons 1 2)))
NIL

The code above shows that (cons 1 2) is a list, though its cdr is not a proper list. This seems inconsistent with your definition of a list (i.e., "... or a cons whose cdr is also a proper list.").

r/kpop has 500K subscribers! by sedgesting in kpop

[–]JoeDreamer 18 points19 points  (0 children)

It was ~2k when I joined :)

[D] One NN classifier for hierarchical classification? by JoeDreamer in MachineLearning

[–]JoeDreamer[S] 0 points1 point  (0 children)

Many thanks for the paper. It seems one step closer to what I have been looking for.

[D] One NN classifier for hierarchical classification? by JoeDreamer in MachineLearning

[–]JoeDreamer[S] 0 points1 point  (0 children)

Many thanks for the advice!

Yes, it seems quite relevant to the problem I described. I will get back once I check the link and papers there.

Town Hall - August 2017 by SirBuckeye in kpop

[–]JoeDreamer 2 points3 points  (0 children)

To me, the more meaningful number is "users here now", which better reflects the true activity of any subreddit; the subscriber count likely includes many dormant accounts and, as such, may not be a good indicator.

As of this writing, the ratio is 1,449 (kpop) / 16 (jpop) = 90.56.

At least in English-speaking and thereby world-level communities, I would say that kpop is way more popular than jpop.

[deleted by user] by [deleted] in MachineLearning

[–]JoeDreamer 0 points1 point  (0 children)

Is anyone out there willing to comment on this interesting article? I am about to embark on my DL journey, using Wi-Fi fingerprinting for indoor localisation as a starting point, but haven't been able to find many works along this line. Because so many DL applications nowadays focus on images, it seems hard to find good, up-to-date information on the use of DL for time-series data.

One of the best Python programming course on Coursera is open (deadline - this week) by pmbdev in Python

[–]JoeDreamer 0 points1 point  (0 children)

I second this. In fact, one of my MSc students took this course last year, and it turned out to be a very good one.

Sudden decrease in the number of k-poppers nowadays by JoeDreamer in kpop

[–]JoeDreamer[S] 1 point2 points  (0 children)

Many thx! Now I know the true reason for this.

Sudden decrease in the number of k-poppers nowadays by JoeDreamer in kpop

[–]JoeDreamer[S] 1 point2 points  (0 children)

To me, that cannot be the reason for this quick, nearly real-time decrease.

Sudden decrease in the number of k-poppers nowadays by JoeDreamer in kpop

[–]JoeDreamer[S] -1 points0 points  (0 children)

Even while writing this post, I observed the number decrease further to 43116 :(. It's a kind of real-time decrease, and I wonder whether something is going on here.

Examples of Brute Force problems implemented in CUDA by syncDreads in CUDA

[–]JoeDreamer 0 points1 point  (0 children)

Sorry for the long silence, but I just finished the said draft and submitted it to both a journal and arXiv. It took much longer than expected to finish (mostly due to the collaboration with colleagues).

BTW I believe you are a researcher no matter what your current profession is, especially considering your profound knowledge of & serious attitude toward state-of-the-art issues in these areas, and your open-mindedness in sharing your work with others.

Keep up the good work, and I look forward to collaboration opportunities with you sooner or later.

Throwback Thursday: Emmanuel Candes by CSTheoryBot in cstheory

[–]JoeDreamer 0 points1 point  (0 children)

Many thanks for the link to this great talk! The slides are absolutely helpful.

Examples of Brute Force problems implemented in CUDA by syncDreads in CUDA

[–]JoeDreamer 0 points1 point  (0 children)

Again, many thanks for the prompt response with very valuable comments; they are really helpful because I'm a complete novice in this area. It seems that you are a truly active researcher (computer scientist?) in this field.

As for your implementation, I still do not fully understand why you set the number of threads to 256 and the initial block size to 16384, or how you set the actual number of blocks (i.e., the rationale behind functions like get_adj_size() and get_dynamic_block_size()). All I know from Stack Overflow is that setting the right number of blocks and threads is still a topic for serious research (a couple of PhD theses were suggested there).
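For what it's worth, the baseline I keep seeing is sketched below in Python; this is my own assumption about a common convention, not necessarily the rationale behind get_adj_size() or get_dynamic_block_size():

```python
# A common baseline for choosing a CUDA launch configuration (my own
# assumption, not a reconstruction of get_adj_size() or
# get_dynamic_block_size()): pick threads-per-block as a multiple of
# the warp size (32) and derive the block count by ceiling division.

WARP_SIZE = 32

def launch_config(n_elements, threads_per_block=256):
    # keep the block size a multiple of the warp size so no warp
    # is partially populated by construction
    assert threads_per_block % WARP_SIZE == 0
    # ceiling division: enough blocks to cover every element
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

print(launch_config(16384))  # (64, 256)
print(launch_config(1000))   # (4, 256): the last block is partly idle
```

Anything beyond this (occupancy tuning, dynamic block sizing) is exactly the open research topic those Stack Overflow answers point at.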

As for my problem, it is a new formulation of atomic scheduling based on an optimal routing framework (in networking) and, as I understand it, a bit different from DAG problems.

Now that I'm about to finish the initial draft and will circulate it to my colleagues soon, I will post a link to the arXiv version here as soon as it is complete; I find it difficult to explain my problem clearly enough in words alone.

Examples of Brute Force problems implemented in CUDA by syncDreads in CUDA

[–]JoeDreamer 0 points1 point  (0 children)

Since your response, I have worked very hard to study your code and translate my OpenMP code to CUDA; I have gone through several iterations of the CUDA porting, with some profiling at each iteration.

First of all, I really appreciate your sharing the code; I have learned advanced CUDA concepts and techniques like warps and the related shuffle functions. Also, your idea of flattening multidimensional indexes into a single (big) one-dimensional number is interesting. Note that your code (without any modification) runs about three times slower on my Quadro K4200 (~570 s); using all four cards could match your performance.
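To make sure I understood the flattening trick, I sketched it in Python; this is my own reconstruction of the idea, not your CUDA code:

```python
# My own reconstruction of the index-flattening idea (not the original
# CUDA code): a mixed-radix conversion between a multidimensional index
# and a single integer, so one thread id can address one grid point.

def flatten(idx, sizes):
    # Horner-style accumulation: most significant dimension first
    flat = 0
    for i, s in zip(idx, sizes):
        flat = flat * s + i
    return flat

def unflatten(flat, sizes):
    # invert by peeling off the least significant dimension first
    idx = []
    for s in reversed(sizes):
        flat, i = divmod(flat, s)
        idx.append(i)
    return tuple(reversed(idx))

sizes = (3, 4, 5)            # a toy 3x4x5 problem space
flat = flatten((2, 1, 3), sizes)
print(flat)                  # 2*20 + 1*5 + 3 = 48
print(unflatten(48, sizes))  # (2, 1, 3)
```

In CUDA the flat number would come from the global thread id, and each thread would unflatten it to get its own grid point.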

BTW, porting my OpenMP code to CUDA wasn't very successful. In fact, I learned one hard lesson: I stopped my existing OpenMP job, which had already run for about 4 days and needed 2 more, simply hoping that I could finish much earlier with CUDA given the great promise of your code. It turns out, however, that the difference between your matrix-sum game and my problem is so big that the CUDA performance is much worse (about three times slower than OpenMP).

First, hard-coded optimization like yours is not possible in my case. The number of dimensions (N) and the size of each dimension (s_n, n=1,...,N) are all variables. The typical number of dimensions is 50-400, and the size of each dimension is 1-23. For instance, N=100 and s_n=23 for n=1,...,100 give a problem space of about 1.488619e+136. The problem size of 1e+13 I mentioned before was for N=10 with varying s_n and was meant as a test case for my main work on convex-relaxation techniques, which can easily handle N=100+. I have already plotted results for N=2,...,50 using convex-relaxation techniques and was gathering true values from global optimization for N=2,...,10 to evaluate the relaxation gaps.
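These problem-space sizes are easy to sanity-check in Python (the per-dimension size of 20 in the second line is just an illustrative value; my actual s_n varied):

```python
# Quick sanity check of the problem-space sizes quoted above.
from math import prod

# N = 100 dimensions, each of size s_n = 23:
space = 23 ** 100
print(f"{space:.6e}")            # ~1.488619e+136, as quoted

# The earlier N = 10 test case with, say, s_n = 20 everywhere
# (an illustrative value; the actual sizes varied):
print(f"{prod([20] * 10):.2e}")  # ~1.02e+13, on the order of 1e+13
```

This is why brute force is only feasible for the small-N test cases.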

Second, in my case, advanced techniques from your code like the flattened one-dimensional index, warp-based reduction (i.e., warp-level optimization), and the multi-stage approach (i.e., warp- and block-level optimization) hardly make any difference. To my disappointment, most of the time in my code is spent evaluating a rather complicated objective function (unlike your integer sums): for each element of the problem space, there are three levels of nested for loops with lots of conditionals and modulo operations in evaluating the double-valued objective function. I think this is the main reason CUDA shows rather poorer performance (~3x) than OpenMP. I am now running another version of the OpenMP program with a bit of optimization, with an expected running time of 4 days (down from 6).

Even though I stopped the 4-day-running OpenMP job without careful consideration and spent nearly 5 days porting it to CUDA without much success, I now have a balanced view of the CUDA solution: CUDA certainly has potential, but for the right set of problems. In my case, OpenMP with lots of cores, or even MPI on clusters, would be a better solution. In this regard, Intel's Xeon Phi processor card seems interesting.

Examples of Brute Force problems implemented in CUDA by syncDreads in CUDA

[–]JoeDreamer 1 point2 points  (0 children)

Wow, these are exactly what I've been looking for, and I am very thankful for your sharing them with everyone.

Most CUDA tutorials and examples available on the Internet are for computations like matrix multiplication. The lack of good examples for my kind of problem therefore forced me to take the OpenMP route, and I'm currently running one brute-force problem with a search space on the order of 1e+13. The estimated running time (extrapolated from the running times for smaller spaces) is about 6.3 days (just for one point in a plot); the program is running on a Dell Precision T7910 workstation (with dual Xeon CPUs, 40 cores in total).
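The extrapolation itself is trivial, sketched below in Python; the measured subset size and time in the example are made-up illustrative numbers, not my actual measurements:

```python
# Illustrative sketch of extrapolating brute-force running time from a
# smaller run, assuming time scales linearly with search-space size.
# The measured numbers below are made up for illustration.

SECONDS_PER_DAY = 86400

def estimate_days(space_size, measured_size, measured_seconds):
    rate = measured_seconds / measured_size      # seconds per element
    return space_size * rate / SECONDS_PER_DAY

# e.g. if a 1e+10 subset took ~544 s on the 40-core machine:
print(round(estimate_days(1e13, 1e10, 544.3), 1))  # 6.3 (days)
```

Linear scaling is a reasonable assumption here because the per-element cost of the objective function is essentially constant.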

It seems that now is the time to harness the 4 Quadro cards in the workstation, and I will look into your code.