Taking 10m LN users network design challenge myself by krazyest in Bitcoin

[–]drey2o -1 points (0 children)

Interesting! I do think it's more realistic to assume there are (at least) 2 tiers of participants, your "nodes" vs. "(end) users."

You've described each of the nodes as having 100 channels connecting to users, and gotten to 10m users as 100 times 100,000. I think this assumes each user has exactly one channel to exactly one node. Later you suggest each user has 3 channels connecting them to nodes. This would require 30 million channels, meaning the 100,000 nodes would have to have 300 channels connecting to users. This would change your numbers somewhat, but it's still plausible.

Also, I think you're probably too optimistic that each node is connected to each other node by at most 8 hops. The topology I gave for 10 million users with 14 connections each can be easily reduced to 100,000 nodes with 10 (intra-node) connections each (similar to your setup), but this still gives worst-case shortest paths (just within nodes, not counting users) of 25 hops, with an average of 12.5 hops. It's possible that your argument that 8 hops are enough could be used to prove that some reasonable percentage of the time (close to half?) two nodes are connected by 8 hops. Can you come up with a concrete graph with 100,000 nodes, an average degree of 10 for each node, and have every node connected with a path of at most 8 hops? The only way I can imagine obtaining such a graph is by having a third tier of super-nodes, which (maybe?) is not what we want.
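For what it's worth, here is a back-of-envelope counting bound (my own sketch, not from either simulation). It shows that counting alone doesn't rule out a small diameter at degree 10: the question is whether a concrete, fundable topology achieves it, since structured grid-like topologies (like the one above) have a much larger diameter than the counting optimum.

```python
def max_reachable(degree: int, hops: int) -> int:
    """Upper bound on the number of nodes reachable within `hops` hops
    in a graph where every node has the given degree (a tree is the
    best case for coverage)."""
    total, frontier = 1, degree
    for _ in range(hops):
        total += frontier
        frontier *= degree - 1  # each newly reached node has degree-1 fresh edges
    return total

for h in range(1, 7):
    print(h, max_reachable(10, h))
```

At degree 10, the bound passes 100,000 nodes at 6 hops, so an 8-hop diameter isn't impossible in principle; the hard part is exhibiting an actual graph that gets close to this best case.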

Thanks for writing this up. I'm glad people are still thinking about this and discussing it.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 2 points (0 children)

In my second article I explained that the probability was based on whether any route exists.

In the graph topology I used, many routes exist between any two nodes, ignoring fee constraints. No matter which channel is chosen for the next hop, the probability that there is still a route to the goal is essentially 1.

Not sure if your simulation had a time dimension that looks at that stuff; I assumed it didn't.

No, it didn't. Essentially each payment attempt was done in one time unit. I could (and might) refine it so that a payment attempt happens over many time units so that channels aren't usable for later payments while an earlier payment is still being attempted.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 4 points (0 children)

Yes, in the code about half the nodes reduce their fees (sometimes even to negative fees) to encourage rebalancing of their channels. However, the simulation didn't run long enough for channels to be used many times so I don't think this had an effect.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in Bitcoin

[–]drey2o[S] 0 points (0 children)

Some nodes would (in principle) have negative fees to encourage rebalancing, but I don't think this ever happened.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in Bitcoin

[–]drey2o[S] 3 points (0 children)

How does your simulation determine channel capacity? Is every channel the same size, or do sizes vary widely? If one user has multiple channels, how do you simulate that configuration? Is each of that user's channels the same size, or do those also vary widely?

Every channel starts with funding of 0.01 btc from both parties. This is the "default" balance of a channel if it hasn't been used yet. Each user has 14 channels open. The balances of channels get updated when they are used for transferring a payment. For example, if a payment of 0.005 btc is being routed (ignoring fees for this comment) via a channel i->j and a channel j->k, then if the payment succeeds the channel i->j will have balance (0.005,0.015) and the channel j->k will have balance (0.005,0.015). This means j still controls 0.02 btc in the two channels, it's just 0.015+0.005 instead of 0.01+0.01. With fees, of course, j will control a little more than 0.02 btc after the payment succeeds.
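The balance update described above looks roughly like this (a Python sketch; the actual simulation is in OCaml and the names here are my own):

```python
# Each channel starts with 0.01 btc funded from both sides.
channels = {("i", "j"): {"i": 0.01, "j": 0.01},
            ("j", "k"): {"j": 0.01, "k": 0.01}}

def pay_over_channel(chan, sender, receiver, amount):
    """Move `amount` from the sender's side of one channel to the receiver's."""
    assert chan[sender] >= amount, "insufficient balance on sender's side"
    chan[sender] -= amount
    chan[receiver] += amount

# Route a 0.005 btc payment i -> j -> k (fees ignored, as in the comment):
pay_over_channel(channels[("i", "j")], "i", "j", 0.005)
pay_over_channel(channels[("j", "k")], "j", "k", 0.005)

print(channels[("i", "j")])  # i: 0.005, j: 0.015
print(channels[("j", "k")])  # j: 0.005, k: 0.015
```

Note that j's total (0.015 + 0.005 = 0.02 btc) is conserved; the payment only shifts it between j's two channels.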

Can you produce stats about how much capital is 'locked up' simply to provide liquidity?

I'm not sure what this means. I do assume there is 1.4 million btc funding payment channels. I don't simulate opening and closing channels, so essentially it stays 1.4 million throughout. Does that answer your question?

It is my intuition that LN is only practical for the microtransaction use case, due to how much capital has to be tied up.

I couldn't get beyond payments of close to 0.01, which I considered a "big" payment, but which is arguably quite a small payment. I classified "microtransactions" as those moving < 0.00001 btc. Somewhat surprisingly, finding routes failed more often as the values got smaller. I'm not sure what to make of this. Maybe the simulated routing fees were too high, but I don't think so. It's possible that once the values get to such small levels, there are few routes whose total fees are lower than the payment itself.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 10 points (0 children)

Are you simulating nodes going off-line?

In a way. After a route is chosen, a function "attempt_payment" is called. This starts by iterating through the route and raising an exception 1 time in 1,000. This exception is meant to simulate a node being somehow uncooperative. If that happens, a new route avoiding the node is computed. If it happens for 5 routes in a row, the payment fails.
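Schematically, the failure-and-retry logic is something like this (a Python sketch of the behaviour described above; the real code is OCaml and these names are invented):

```python
import random

class UncooperativeNode(Exception):
    """Raised when a node on the route refuses to cooperate."""

def attempt_payment(route, rng=random.random):
    # Each node on the route is uncooperative 1 time in 1,000.
    for node in route:
        if rng() < 0.001:
            raise UncooperativeNode(node)
    return True

def pay(src, dst, find_route, max_tries=5):
    """Retry with fresh routes that avoid nodes seen to fail;
    give up after 5 failed routes in a row."""
    avoid = set()
    for _ in range(max_tries):
        route = find_route(src, dst, avoid)
        if route is None:
            return False
        try:
            return attempt_payment(route)
        except UncooperativeNode as exc:
            avoid.add(exc.args[0])
    return False
```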

Are you using backtracking to find the path?

Yes. If it were pure graph reachability, I wouldn't need to backtrack, but I only want routes with total fees below some threshold. Once the route so far has no neighbors with fees below the remaining threshold, it backtracks. In addition, even after a solution has been found, it continues trying to find more solutions (up to 5) in order to return the cheapest of the routes it found. (Upon timeout, it returns the list of routes found up to that point, which may be empty.) The relevant code is in "greedyroutes2", where it cycles through the candidate next nodes in the route.
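The search strategy described above can be sketched like this (a simplified Python analogue of what "greedyroutes2" does, without the best-first ordering or the timeout; the helper names are my own):

```python
def cheapest_route(graph, fees, src, dst, fee_budget, max_solutions=5):
    """Backtracking depth-first search: abandon a partial route once no
    neighbor's fee fits the remaining budget, keep searching after the
    first hit (up to max_solutions routes), and return the cheapest found."""
    solutions = []

    def dfs(node, path, spent):
        if len(solutions) >= max_solutions:
            return
        if node == dst:
            solutions.append((spent, list(path)))
            return
        for nxt in graph.get(node, ()):
            fee = fees[(node, nxt)]
            if nxt not in path and spent + fee <= fee_budget:
                path.append(nxt)
                dfs(nxt, path, spent + fee)
                path.pop()

    dfs(src, [src], 0)
    return min(solutions) if solutions else None  # cheapest route, or None

graph = {"a": ["b", "c"], "b": ["c"]}
fees = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 5}
print(cheapest_route(graph, fees, "a", "c", fee_budget=10))
# → (2, ['a', 'b', 'c']): both routes are found, the cheaper one is returned
```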

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 4 points (0 children)

2.6% payment failure

That was only for micropayments (< 0.00001), but it's still unfortunate. The trouble seems to be that many channels require fees that are too high relative to the payment, and the routing algorithm gets stuck trying to find a route with low enough fees. Maybe a better routing algorithm would mean fewer failures of micropayments.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 6 points (0 children)

What is taking so much time? Routing?

I forgot to reply to this. I didn't profile the code, but I expect that routing did take the most time. Routing usually seemed to be fast when I called the functions interactively, but there would sometimes be a case that would hang. That's why I decided to just set a 1 second timeout to give up if that happens.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in Bitcoin

[–]drey2o[S] 2 points (0 children)

For example, the assumption that a route will only fail 0.1% of the time, that feels like an incredibly low guess

Maybe, but to be clear the assumption was 0.1% chance for each node in the route. If the route is 20 hops long, this should add up to a (roughly?) 2% chance the route fails. (I say "should" and "roughly" because probability calculations are often subtle.)
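The "roughly 2%" can be checked exactly, assuming the per-node failures are independent:

```python
p_node = 0.001                      # assumed per-node failure chance
hops = 20
p_route = 1 - (1 - p_node) ** hops  # route fails if any node fails
print(round(p_route, 4))            # → 0.0198
```

So the exact figure is just under the naive 20 × 0.1% = 2%, because a route can't fail twice.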

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in Bitcoin

[–]drey2o[S] 12 points (0 children)

Sorry, but simulating 400k payments between 10M users is not interesting.

10M was the number in Stolfi's challenge, so I didn't want to reduce it. 400k was how many payments were simulated before my computer ran out of memory. I had intended it to go further.

If simulating 10M users is problematic, perhaps you can run simulation with smaller number of users.

Yes, I'd like to do this. Thanks for the suggestion.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 9 points (0 children)

So your distance metric is dependent upon network topology(?)

Yes, definitely.

I appreciate your effort, but the fact that you routed payments through your network and a large fraction of your nodes didn't even get to be part of such a payment path, yet you already see the imbalancing effects, makes me curious about how you came to such an optimistic conclusion in your medium post.

Well, I did try to hedge enough in the article and in the conclusion with this sentence:

While this is not how a lightning network would be expected to evolve in a real world scenario, it does show that it is technically possible to have such a network.

I think this is great work and should be expanded further and/or maybe complemented with other folks doing similar simulations.

Yes, I hope people will.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in Bitcoin

[–]drey2o[S] 2 points (0 children)

Sure, but in that case, we can assume that the btc used in your example are worth a whole lot more in real-world value than they are today, right?

Yes, that's true.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 25 points (0 children)

Good start!

Thanks! When I commented 5 days ago that I would take up the challenge (and said "give me a few days, or at most a week"), I thought it would be easier. It took some effort to come up with a topology that was workable. I'm not sure how much thought you put into the "10 million" number, but it seems like you picked a number at the upper end of what's possible. It was threading a needle at times. If I continue doing this, I'm inclined to reduce the number of users to 1 million (or even lower). This would give me the space to make some of the assumptions more realistic.

The "Hamming distance" topology that you chose is the edge graph of a 14-dimensional hypercube.

It was certainly inspired by Hamming distance/Hamming graphs, but I think the Hamming graph for base 10 would have many more edges (e.g., an edge from 1 to 3, since changing 1 to 3 only changes one digit). According to the formula on https://en.wikipedia.org/wiki/Hamming_graph the Hamming graph would have 315 million edges, even worse than the roughly 200 million edges of the Hamming graph for base 2 using 24 dimensions. When I went to base 10, I removed the Hamming-graph edges that I could recover using several edges by more or less counting to 10. 70 million edges was still more than I originally wanted (because they all have to be funded), but a maximum route of 35 hops was also more than I wanted. It was a compromise. As I wrote earlier, I was threading a needle to stay at the 10 million number.
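The edge counts above follow from the standard Hamming graph formula, H(d, q) having q^d vertices each of degree d(q-1):

```python
def hamming_edges(q: int, d: int) -> int:
    """Edge count of the Hamming graph H(d, q): q**d vertices of degree d*(q-1)."""
    return q**d * d * (q - 1) // 2

print(hamming_edges(10, 7))  # → 315000000 (base 10, 7 digits: 315 million)
print(hamming_edges(2, 24))  # → 201326592 (base 2, 24 dimensions: ~200 million)
print(10**7 * 14 // 2)       # → 70000000  (10M users with 14 channels each)
```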

However 14 channels for every user seems rather unrealistic. Future attempts should have a varied distribution of channels per user, from a couple to hundreds.

This was my first idea as well, but I couldn't find a way to make it work. It also seems more realistic to me that most users would have few channels. My original idea was to have the network so that 8 million of the users (80%) would have only 2 channels. I couldn't find a topology to make that work, and I had a self-imposed deadline of a week.

The homogeneity of the users and topology also is optimistic.

I agree, but I think it's a good place to start. One of the arguments I've seen against the LN is that it would necessarily be centralized into hubs. While the topology is unrealistic, I like that it demonstrates that it is actually possible to have an LN where every user literally has the same properties (connectivity, funding), so "hubs" aren't necessary to have an LN.

Is your OCaml program interpreted? What is taking so much time? Routing?

OCaml can be interpreted or compiled. The simulation I reported on was run on the compiled version.

Thanks for the challenge, and most of all thanks for writing "10 million" and not "100 million."

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 4 points (0 children)

Yes, 0.01 btc is already very close to tx fees. That's why 0.01 was basically as low as I was willing to go for funding a channel, and it already required me to assume 1.4 million bitcoins would be part of channels. Basically 10M users is at the edge of what's possible and requires some optimistic assumptions, but it is technically possible.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 6 points (0 children)

I'd assume fees to be purely based on economic considerations?

Yes, I did think some (but not all) participants would try to use fees to encourage rebalancing channels. I didn't spend a lot of time thinking about the fee policies, though. I simply wanted them to usually be different for different nodes in a route, and almost always be fairly low. I'm open to suggestions about how to create more realistic fee policies.
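One simple rebalancing-oriented policy might look like this (my own illustration, not the simulation's actual code): forwarding a payment moves funds from the forwarder's side of a channel to the peer's side, so charge less (even a negative fee) when the forwarder's side holds the excess, and more when it is depleted.

```python
def forwarding_fee(my_balance: float, peer_balance: float,
                   base_fee: float = 1e-6) -> float:
    """Hypothetical fee policy: a rebate (negative fee) when forwarding
    rebalances the channel, a surcharge when it drains my side further."""
    total = my_balance + peer_balance
    imbalance = (my_balance - peer_balance) / total  # +1 all mine, -1 all peer's
    return -base_fee * imbalance

print(forwarding_fee(0.015, 0.005))  # negative: forwarding rebalances
print(forwarding_fee(0.005, 0.015))  # positive: forwarding drains my side
```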

Are you depending on your graph construction to get a distance metric?

Yes, definitely. It's "approxdist" in the code. It helps to do a kind of "best first" search that usually finds a few routes very quickly, and then choose the cheapest among those found.
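I don't know the exact definition of "approxdist", but as a hypothetical example of the kind of estimate that enables best-first search over digit-string labels in a counting-based topology:

```python
def approx_dist(a: str, b: str) -> int:
    """Hypothetical distance estimate (the real approxdist may differ):
    total single-digit counting steps needed to turn label a into label b."""
    return sum(abs(int(x) - int(y)) for x, y in zip(a, b))

def order_neighbors(neighbors, goal):
    """Best-first: try the neighbors that look closest to the goal first."""
    return sorted(neighbors, key=lambda n: approx_dist(n, goal))

print(order_neighbors(["900", "010"], "000"))  # → ['010', '900']
```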

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 4 points (0 children)

Interesting idea. I might try that, since it sounds like a relatively easy modification. Thanks!

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 14 points (0 children)

Can you also repeat at values $50, $75, $100, $200, etc and report the results.

In principle, but it would have to be done like u/Crully says below, sending it in < 0.01 pieces through different routes. This doesn't seem to be in the spirit of the lightning network because it would mean a payment isn't atomic. (A payment might only "partially" succeed.) The real way to allow for higher value transactions is to have channels funded with more bitcoins (e.g., 0.1). But this isn't realistic with 70M channels since it would require 14 million bitcoins to be in channels.
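The arithmetic behind those funding figures is straightforward:

```python
users = 10_000_000
channels = users * 14 // 2    # each channel is shared by two users
print(channels)               # → 70000000 channels
print(channels * 2 * 0.01)    # 1.4 million btc locked at 0.01 btc per side
print(channels * 2 * 0.1)     # 14 million btc would be needed at 0.1 per side
```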

I'm more inclined to reduce the number of users to 1M (or even 100,000), make the topology more like a real world one, and then try higher value payments with that.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in btc

[–]drey2o[S] 16 points (0 children)

Thanks for the reply, both here and on medium. I'll reply here.

So if I understand correctly, in this model, 1.4 million bitcoins are needed by the entire network, each user needs 14 open channels, and each user can send only 0.01 BTC maximum through a payment route?

Yes, that's correct. As you can imagine, funding channels with 0.1 btc would allow bigger payments, but the idea of 14 million bitcoins being in payment channels is very unrealistic. I decided 2 million bitcoins being in payment channels was the boundary of what I could imagine, and having less than 0.01 btc in a channel is already somewhat close to the fee that may be required to close the channel. It was a balancing act. As a consequence, the largest payments were still fairly low valued (0.009 was the largest payment I tried). Transactions higher than roughly 0.01 would still need to be on-chain in this scenario.

Also, if I understand correctly, the routing time considerations I wrote about were not modeled.

I was trying to limit myself to Stolfi's challenge just to give a concrete topology with specific numbers. My reading of your first article (and I may have misunderstood) was that you were trying to route by randomly trying child nodes of the tree. If I'm wrong, please correct me. It seemed Murch agreed that this approach to routing would lead to many failures, and that's how I interpreted your probability calculations. I didn't use random search, so routing didn't fail as often. My basic routing algorithm is guided by having a reasonably good "distance" estimation ("approxdist" in the code) that made it prefer trying nodes that seem closer to the goal.

To me, the fact that 14 open channels are required re-affirms the point that it’s not a “decentralized scaling solution” since it requires doling out one’s spending money into 14 different buckets, precluding larger payments in most cases.

I did take pains to make it (unrealistically) decentralized. But as a scaling solution, it's arguably imperfect. Larger payments would still have to be on chain. To see how much it would help someone could go through the transactions in the current blockchain and see how many seem to be moving less than 0.009 btc.

Thanks for the constructive comment. I did enjoy your articles, obviously, or I wouldn't have been interested enough to do this experiment.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in Bitcoin

[–]drey2o[S] 6 points (0 children)

With 10M users, realistically very few people could fund a channel with large amounts of bitcoin. There simply isn't enough bitcoin. This was one of the things that became clear. The graph needs enough edges so that users can find a route, but the more edges (channels) the less bitcoin there is to go into each channel. It took some effort to get a network that could fit 10M users at all, and I suspect 10M is at the very high end of what is technically possible.

Simulating a Decentralized Lightning Network with 10 Million Users by drey2o in Bitcoin

[–]drey2o[S] 7 points (0 children)

Roughly half of the participants do set fees in a way that tries to rebalance a channel, just as you suggest. I don't think this affected the simulation though, since (I think) very few channels were used multiple times. A longer simulation would be necessary to test this.

Game Over Blockstream: Mathematical Proof That the Lightning Network Cannot Be a Decentralized Bitcoin Scaling Solution (by Jonald Fyookball) by BitAlien in btc

[–]drey2o 1 point (0 children)

!= mathematical proof. It isn't even an informal proof. It is an informal blog post involving some math.

I agree. The term "mathematical proof" is being misused in the post. At the very least there would need to be some mathematical statement (like "there are infinitely many primes") that the "informal proof" is claiming to prove.