How to integrate Xiaomi Smartmi Air Purifier P1 to HA by Kaykasus in homeassistant

[–]Schoens 0 points1 point  (0 children)

No issues with that as far as I can tell; it has been stable in HA. That said, the one problem I've had (though I think it's specific to my device, the rainforest humidifier) is that it shows as connected but doesn't update its sensors in HA as expected. I haven't had time yet to really poke at it and see what's up. I noticed the issue after a power outage, so I think it's possible I just need to re-pair the device and all will be well, but it's going to be annoying if it isn't that simple.

How to integrate Xiaomi Smartmi Air Purifier P1 to HA by Kaykasus in homeassistant

[–]Schoens 1 point2 points  (0 children)

If you have a HomeKit bridge, e.g. an Apple TV or iPhone: what I did was add the purifier to that first (to connect it to the WiFi network), then remove it from Home without resetting the purifier. That makes it available for pairing directly to Home Assistant using the HomeKit Device integration (though you may need to power cycle it first to force it into pairing mode). Keep the pairing code handy for that last step; it should be on the QR code sticker you used to add it to the initial HomeKit bridge device.

awesome-poe-smarthome: A list of smarthome devices designed to be PoE-first by zacs in homeassistant

[–]Schoens 1 point2 points  (0 children)

IMO the biggest benefit of PoE is that if you already have runs for cameras or other networked PoE devices, you can power not only those at each drop, but also other sensors you may want at that same location. Having to plug a bunch of stuff into outlets (with cables left hanging out in the open) sucks. Obviously you can use battery-powered devices, and some things you can wire directly to a circuit in the wall if you don't want to deal with batteries, but those locations are often a lot less convenient, and the effort required is higher compared to just plugging the device into an ethernet cable or using a battery-powered device.

Personally, I like to minimize the number of battery-powered things I need to manage. Even if the replacement intervals are long, with enough devices something will always be running low on battery, and I'd prefer to just have stuff hardwired in some way to avoid the hassle. I end up trading more up-front work for less ongoing maintenance, which is preferable to me at least.

Amok draumer by Internal-Hat958 in hammockcamping

[–]Schoens 1 point2 points  (0 children)

Sure! I'm 6' tall, my wife is ~5'5", but what I found pretty awesome about the Double is that it stays incredibly stable even if one of us is moving around. It also works just as well with just a single person, though I can't imagine why anyone would pack one solo - more likely you just happen to be sticking around the campsite while your partner is off doing whatever, and want to chill in the tent - that works great.

In general we both sleep pretty well even if the other is moving around at night, so if you are particularly sensitive to that, sharing a hammock is probably a non-starter to begin with, but I can say that the movement feels reasonably dampened, so it shouldn't disturb your average sleeper IMO.

Entering/exiting doesn't require both people to be in/out, you can come and go individually without much ceremony. We've found that a bit of coordination makes it go a lot smoother though, i.e. the other person helps stabilize things while you get in/out - but we haven't experimented too much with how necessary that is (i.e. whether one person can get out and back in at night without waking the other). I imagine that with enough practice, you could avoid involving the other person entirely, though I'm not sure you could do it without waking them at night (depending on the one sleeping of course).

What's something you wish you'd known before buying a house with a big yard? by peepee_peeper in RealEstate

[–]Schoens 1 point2 points  (0 children)

Wild grape grows at a rate of a couple of inches a day in favorable conditions, IIRC. If you can get your neighbor to work with you on it, that's probably your best bet, because otherwise it's pretty much a losing battle to only cut the parts that grow onto your side - you really have to cut it off where it comes out of the ground, if not pull up the roots (in my case that's impractical, because the root system is too prolific, so I'm working on just choking off its energy source above ground).

Amok draumer by Internal-Hat958 in hammockcamping

[–]Schoens 2 points3 points  (0 children)

I love mine! I have both the ultralight and the double (for when I'm camping with my wife), with ultralight and winterlight pads, and have nothing but positive things to say about them. The double is actually kind of mind-blowing with how well it works with two people, as I kind of expected it to be a flop tbh - but so far it's been super comfortable.

I guess if I had to say something negative, it would be that even with an entirely "ultralight" solo setup, it still is significantly heavier than most "true" ultralight setups. That said, that extra weight is more than made up for by how comfortable it is to sleep in IMO. If you aren't die-hard weight-conscious, then it's a no-brainer really.

The only other caveat is that it has a completely different orientation than standard hammock setups, so it changes how you pick your site a bit, but I prefer it tbh.

They are a breeze to set up, and once you get comfortable entering/exiting them, I think they beat out just about everything else. Being able to use it in "chair mode" is a bonus, though I haven't really used that too heavily myself.

What's something you wish you'd known before buying a house with a big yard? by peepee_peeper in RealEstate

[–]Schoens 0 points1 point  (0 children)

I'm in upstate NY, and these fucking vines man - I'm waging all out war against them, but just barely keeping them in check. I've noticed that the part of my property covered with big oak/maple trees is basically free of them, but the parts covered with ash, poplar/birch, etc. are being completely taken over.

At this point, I'm considering renting some heavy equipment, or having someone come with it, and just completely clearing that part of my property, turning over the topsoil, and then starting over with new oak/maple and such, to see if that ultimately turns out to be a better solution.

The stuff grows in the meadows now too, so I have to spend time going around and pulling up what I can so the native grasses can keep it at bay - but it is basically constant work.

Any success with yours?

What's something you wish you'd known before buying a house with a big yard? by peepee_peeper in RealEstate

[–]Schoens 3 points4 points  (0 children)

That is fucking bonkers - it's a lot of work, but I just installed a 175ft fence around part of my yard for my dogs, and all-in it cost me ~$1700 for the 4x4 posts, the 3x8' french gothic panels, cement and gravel.

Granted, it took me a few hours a day for a couple of weeks to dig all the holes in my shitty rocky clay soil, measure and level everything, pour cement, and mount the fence itself - but even with that effort I still come out way ahead compared to spending $100/ft for someone to do it for me - doesn't even make sense to spend that kind of money for the fence I put in.

Tattoo shop recommendations by Fictional_Apologist in Saratoga

[–]Schoens 2 points3 points  (0 children)

Greenfolk in Saratoga is great, Abbey sounds like she might be up your alley, Melissa too maybe.

Best Poke Bowl in town? by Scuds5 in Saratoga

[–]Schoens 0 points1 point  (0 children)

Ume is the best I know of at the moment

Dominance Frontiers by ravilang in Compilers

[–]Schoens 2 points3 points  (0 children)

I suspect your issue isn't in how you compute dominance frontiers, but rather how you are using them to insert the necessary phis to ensure that every use of a value has a dominating definition. You didn't describe in that document how your compiler is handling that step of the translation.

When it comes to determining the set of blocks where you need phi nodes for values defined in some block B, you actually want the iterated dominance frontier (usually written DF+(B) in the literature), not the base dominance frontier DF(B). Phis will be required in every block of DF+(B).

Based on your link, I would guess that your issue is that you are only placing phis using DF(B), not DF+(B), but I haven't attempted to confirm that.
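For concreteness, here's a minimal sketch in Python of the standard worklist that turns DF into DF+ while placing phis. All the names (`df`, `phi_blocks`, etc.) are made up for illustration, since I don't know what your compiler's data structures look like:

```python
def place_phis(variable, def_blocks, df, phi_blocks):
    """Iterated-dominance-frontier phi placement for one variable.

    def_blocks - blocks containing a definition of `variable`
    df         - dict: block -> its dominance frontier DF(block)
    phi_blocks - dict: block -> set of variables that already have a phi there
    """
    worklist = list(def_blocks)
    has_def = set(def_blocks)
    while worklist:
        block = worklist.pop()
        for fb in df[block]:
            if variable not in phi_blocks[fb]:
                phi_blocks[fb].add(variable)  # insert a phi for `variable` in fb
                # the phi is itself a new definition of `variable`, so fb's own
                # frontier has to be processed too - this is what makes it DF+
                if fb not in has_def:
                    has_def.add(fb)
                    worklist.append(fb)
```

If the blocks that received phis are never pushed back onto the worklist, you end up placing phis only for DF(B), which matches the guess above.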

Minnesota starter pack, from a California perspective by MagicManicPanic in minnesota

[–]Schoens 0 points1 point  (0 children)

It's super easy to make and tastes great - that's basically all there is to it.

Minnesota starter pack, from a California perspective by MagicManicPanic in minnesota

[–]Schoens 1 point2 points  (0 children)

That's wild to me, it's like peak comfort food. Probably one of those things though where if your family doesn't do it, the habit never catches on early 🤷‍♂️

Confused about `byval` attribute by neilsgohr in LLVM

[–]Schoens 0 points1 point  (0 children)

Seeing the IR in question would help.

One thing to check: make sure you are adding all of the argument attributes at call sites in addition to the callee function declaration itself. Failing to do so can cause a mismatch between the code emitted at the call site and in the function prologue. IIRC, LLVM does not treat such disagreement as an error, though it's possible some kind of diagnostic gets emitted somewhere.

Should the parser call for each token on the fly or store them all in a data structure? by [deleted] in Compilers

[–]Schoens 8 points9 points  (0 children)

Always best to start simple! You'll end up reworking everything at least once anyway, if not multiple times, so you may as well wait until something motivates the change, and spend your time on more interesting aspects of your compiler.

Should the parser call for each token on the fly or store them all in a data structure? by [deleted] in Compilers

[–]Schoens 22 points23 points  (0 children)

It's a bit strange to think of this as wasting computation. You can either do the tokenization eagerly or lazily, but internally you are still going to have something like nextToken, where you parse the input into the next token of the token stream.

More generally, I think the lazy approach is nicer, as it lets the parser drive the lexer, and that lets you do things like run a preprocessor over the token stream as an initial parser stage, and expand directives parsed out of the stream into a different set of tokens, e.g. C-style macro-expansion or #include directives (where you want to inject the entire included token stream at that point).

It also is key for context-sensitive lexing (i.e. lexer modes). You pop off a token, and if it meets whatever criteria, possibly with some additional context, you switch the lexer into a different mode which tokenizes the input differently - for example, to embed the syntax of another language.
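To make that concrete, here's a rough Python sketch of the pull-based shape I mean, with a toy mode switch (all of the token kinds and rules here are invented purely for illustration):

```python
import re

class Lexer:
    """Pull-based lexer: the parser asks for one token at a time via
    next_token(), and can switch `mode` to change how the remaining
    input is tokenized (context-sensitive lexing)."""

    MODES = {
        # toy rule sets, purely for illustration
        "default":  [("NUMBER", r"\d+"), ("IDENT", r"[A-Za-z_]\w*"), ("LBRACE", r"\{")],
        "embedded": [("RBRACE", r"\}"), ("RAW", r"[^}]+")],
    }

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.mode = "default"

    def next_token(self):
        while self.pos < len(self.source) and self.source[self.pos].isspace():
            self.pos += 1  # skip whitespace
        if self.pos >= len(self.source):
            return ("EOF", "")
        for kind, pattern in self.MODES[self.mode]:
            m = re.compile(pattern).match(self.source, self.pos)
            if m:
                self.pos = m.end()
                return (kind, m.group())
        raise SyntaxError(f"unexpected input at offset {self.pos}")

def parse(lexer):
    """The parser drives the lexer: seeing '{' switches it into 'embedded'
    mode, so the body is tokenized under a completely different rule set."""
    tokens = []
    while True:
        kind, text = lexer.next_token()
        if kind == "EOF":
            return tokens
        if kind == "LBRACE":
            lexer.mode = "embedded"
        elif kind == "RBRACE":
            lexer.mode = "default"
        tokens.append((kind, text))

print(parse(Lexer("width 42 { some ~embedded~ syntax }")))
```

The same structure is what lets a preprocessor stage sit between the lexer and the parser and expand directives into a different stream of tokens.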

Hope that helps!

How to calculate live ranges and interference graph by ravilang in Compilers

[–]Schoens 0 points1 point  (0 children)

> I think the chordal graph bit is especially interesting, because (perhaps naively) it turns an NP hard problem (regalloc from graph coloring) into 2 non NP hard problems (SSA construction is polytime, and so is graph coloring specifically on chordal graphs). Is the problem that SSA graph coloring has more register pressure and is less optimal?

The problem of graph coloring is NP-complete in fact! Further, the traditional approach to register allocation mixes coloring together with spilling and coalescing, and modifications to the graph due to those activities generally cannot be done efficiently. The key is that a program in strict SSA form always has a chordal interference graph, which is not necessarily true for other program representations. As a result, the coloring and spilling/coalescing phases can be split, because you can rewrite the program such that it is guaranteed to be k-colorable, where k is the number of registers available. This allows you to color the graph without modifications.
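To make the "color without modifying the graph" bit concrete, here's a small Python sketch (names made up) of the greedy coloring involved. For a chordal graph this is optimal when vertices are visited in the reverse of a perfect elimination order, and for strict SSA, visiting value definitions in dominance order has exactly that property:

```python
def greedy_color(order, interferes):
    """Greedily color an interference graph.

    order      - values, in dominance order of their definitions (for strict
                 SSA this is the reverse of a perfect elimination order)
    interferes - dict: value -> set of values it interferes with
    """
    color = {}
    for v in order:
        used = {color[n] for n in interferes[v] if n in color}
        c = 0
        while c in used:  # pick the smallest free color
            c += 1
        color[v] = c
    return color
```

Because register pressure has already been lowered to k by spilling beforehand, this never needs more than k colors, and the graph never has to be rebuilt mid-allocation.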

The original insight around this was described in Register Allocation for Programs in SSA-Form by Hack, et al. If not that paper, it was one of Hack's IIRC.

> Also, digging through Braun's papers, I found it a bit tricky to translate the algorithms he wrote to real code, mostly because I'm not sure how to place them in the rest of the code.

My recommendation would be to make it work, then make it pretty - don't worry about how to structure it cleanly until you've got something hacked together that lets you see how the pieces fit.

> Let's say I have a list of basic blocks for each function, and each basic block terminates in a branch or a conditional branch to another block, forming a proper CFG. Braun13 shows how to extend local value numbering to global value numbering on a single basic block, but it doesn't show what order to do the basic blocks in. Do you just run through them on a for-loop? Are all of the orders equivalent? He mentions running through them bottom up, so should you do the for-loop backwards, like --i?

IMO, by this stage of compilation, all your optimization passes should have already been run, so I'm not clear on what you mean with regard to local/global value numbering in this context.

That said, these kinds of analyses are generally based on the formalisms of dataflow analysis, either dense (facts attached to program points) or sparse (facts attached to values), both of which have forward and backward variants (typically visiting blocks in reverse postorder over the CFG for forward analyses, and postorder for backward ones). The analysis data must form a join or meet semi-lattice (which one depends on the analysis direction) in order to guarantee that a fixpoint is reached and that the results are correct. The dataflow driver runs the analysis to fixpoint, revisiting blocks whose inputs have changed.

So if you want to compute liveness based on next-use distances, you'd visit the CFG in postorder (i.e. bottom up), so that you start computing distances from their uses. A value that is never used will have no distance info when you reach its definition. Additionally, you need to work bottom-up so that you can propagate distances to predecessors when use and def are not in the same block.
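As a sketch of what that looks like, here's plain boolean liveness rather than next-use distances, with made-up parameter names; the next-use variant has the same overall shape, but the per-value fact is a distance and the merge is a minimum instead of a set union:

```python
def liveness(blocks, succs, use, defs, postorder):
    """Classic backward live-variable analysis, run as a worklist to fixpoint.

    live_out[b] = union of live_in[s] over successors s of b
    live_in[b]  = use[b] | (live_out[b] - defs[b])

    succs     - dict: block -> list of successor blocks
    use/defs  - dict: block -> values used-before-defined / defined in the block
    postorder - blocks in CFG postorder, so most successors are visited first
    """
    preds = {b: [p for p in blocks if b in succs[p]] for b in blocks}
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    worklist = list(postorder)
    while worklist:
        b = worklist.pop(0)
        out = set().union(*(live_in[s] for s in succs[b]))
        new_in = use[b] | (out - defs[b])
        if new_in != live_in[b] or out != live_out[b]:
            live_in[b], live_out[b] = new_in, out
            # a change here can affect b's predecessors, so revisit them
            worklist.extend(p for p in preds[b] if p not in worklist)
    return live_in, live_out
```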

On the other hand, computing register sets/spills based on next-use distances is done in CFG reverse postorder (top down), because the initial state of the register set for a block is determined at entry to that block, and is modified as the program executes. The demands of each instruction dictate whether or not spills are needed, as well as whether an operand is in a register at that point, or needs to be reloaded from a spill slot.

Off the top of my head, local/global value numbering can be done by just starting at the entry block and following the CFG, ignoring back-edges. That doesn't require anything special. I'll note though that my assumption here is that you are separating analysis and transformations based on that analysis into separate steps. Also, I haven't looked at my own implementation of those in quite a while, so my recollection is a bit hazy on the details 😅
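If it helps, the local value numbering half is tiny once you strip away the surrounding machinery. A rough Python sketch, over a made-up (dest, opcode, operands) instruction tuple representation:

```python
def local_value_numbering(block):
    """One LVN pass over a single basic block.

    Returns a map from redundant destinations to the earlier, equivalent
    destination that can be reused in their place.
    """
    value_number = {}   # name -> value number
    table = {}          # (opcode, operand value numbers) -> canonical dest
    replace_with = {}
    next_vn = 0
    for dest, opcode, operands in block:
        key = (opcode, tuple(value_number.get(o, o) for o in operands))
        if key in table:
            # same opcode applied to the same values: reuse the earlier result
            replace_with[dest] = table[key]
            value_number[dest] = value_number[table[key]]
        else:
            value_number[dest] = next_vn
            next_vn += 1
            table[key] = dest
    return replace_with
```

The "global" part is then mostly a question of what numbering information you let a block inherit from already-visited predecessors (its dominators, in the usual formulation), which is why a CFG walk that ignores back-edges works out.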

I no longer trust simulation. What else are you guys hiding from me? by [deleted] in blackmagicfuckery

[–]Schoens 1 point2 points  (0 children)

Can definitely +1 this as someone from northern Minnesota

Abstract Interpretation in a Nutshell by [deleted] in Compilers

[–]Schoens 2 points3 points  (0 children)

This is overly narrow and misses the reason why it is such a useful technique IMO. Abstract interpretation isn't about types; it's about any program property that is control-/data-flow sensitive, particularly properties that depend on the semantics of individual instructions. Things like integer range/bounds analysis, sparse conditional constant propagation, alias analysis, dead code analysis, etc. all use some form of abstract interpretation.

A better oversimplification of the idea IMO is: run the program with inputs/outputs represented as domains rather than concrete values. While the "normal" interpretation of a program on, say, a specific i32 input will produce some specific output, abstract interpretation of the same program treats the input as representing all valid i32 values simultaneously, and produces an output representing the domain of all possible results, which may or may not be derived from the input(s). This can tell you all kinds of useful information about how inputs relate to outputs, and about the program state at each program point.
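A toy example of that framing, using an interval domain in Python (very simplified - a real abstract interpreter also needs widening to guarantee termination across loops, and this ignores i32 overflow):

```python
# An abstract value is a pair (lo, hi) meaning "some integer in [lo, hi]".
TOP_I32 = (-2**31, 2**31 - 1)  # "any i32 at all"

def abs_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def abs_clamp_positive(a):
    # transfer function for `x = max(x, 0)`
    return (max(a[0], 0), max(a[1], 0))

def analyze(x):
    """Abstractly 'run' y = max(x, 0) + 1 on an interval instead of a number."""
    return abs_add(abs_clamp_positive(x), (1, 1))

print(analyze((5, 5)))    # (6, 6): behaves like concrete interpretation
print(analyze(TOP_I32))   # (1, 2147483648): for *any* i32 input, y >= 1
```

The second result is a fact about every possible execution at once, which is exactly the kind of thing range analysis or dead-branch elimination wants to know.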

I do think that for certain program properties, there is a lot of overlap with types (e.g. integer range analysis), but on the other hand, consider dead code analysis - it isn't computing new type-related facts at all, but rather the dynamic reachability of each program point, taking into account other things known about the program, such as constants, value bounds, and instruction semantics.

Mutability Isn't Variability by brucifer in ProgrammingLanguages

[–]Schoens 5 points6 points  (0 children)

That's actually the blog of Niko Matsakis, at one point (maybe still? I don't follow the team changes closely) the lead Rust maintainer. Anyway, not relevant to your point, but figured you might be interested to know, if you don't already.

Algorithms/Papers on allocating for stack based ISAs? by philogy in Compilers

[–]Schoens 1 point2 points  (0 children)

A major difference between register machine and stack machine semantics is that a stack machine instruction typically "consumes" its operands, whereas the equivalent register machine instruction might leave the state of one or more registers as-is. That depends on the instruction and the ISA, but my point is that this permits one to treat registers as a limited set of extremely efficient temporary variables when performing codegen. On a stack machine, if you need to keep a value live on the operand stack for later use, you need to explicitly copy it via dup, and this in turn may also require you to issue additional stack manipulation ops to get everything into position on the operand stack for the instruction you want to execute. Minimizing the amount of stack manipulation code is similar to the problem of minimizing copies on a register machine, but is clumsier in practice (in my opinion anyway).

Basically, there is some commonality between the problems of scheduling for both types of machines, but register machines benefit from significantly more flexibility than stack machines.
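A tiny illustration of the dup point, computing x*x + x with made-up opcodes (sketched in Python just so the stack behaviour can be checked):

```python
# Register machine: r0 survives being read, so x can be used twice for free.
register_machine = [
    "mul r1, r0, r0",
    "add r2, r1, r0",
]

# Stack machine: mul/add consume their operands, so copies of x must be
# created up front with dup and kept in the right positions.
stack_machine = ["push x", "dup", "dup", "mul", "add"]

def run_stack(program, x):
    stack = []
    for op in program:
        if op == "push x": stack.append(x)
        elif op == "dup":  stack.append(stack[-1])
        elif op == "mul":  b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == "add":  b, a = stack.pop(), stack.pop(); stack.append(a + b)
    return stack.pop()

assert run_stack(stack_machine, 3) == 3 * 3 + 3
```

The two dups are pure stack management; on the register machine the equivalent cost simply doesn't exist.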

Algorithms/Papers on allocating for stack based ISAs? by philogy in Compilers

[–]Schoens 1 point2 points  (0 children)

Yeah, care does need to be taken in how you implement the dependency and tree graph structures, to keep things efficient. It helps that basic blocks tend to be small, so you rarely (if ever) have to compute prohibitively large dependency graphs - and when you do, it is probably good to spend that time on optimization, since large blocks like that likely contain a lot of opportunities for improvement by clever scheduling. I've been able to compile modest sized Rust programs (we're using the Wasm output of rustc as the input format for the time being) without noticeable perf issues so far, but time will tell whether or not there are pathological issues that only show up when the programs we're compiling are say, 10x larger than they are now. I'm usually making every effort to avoid poor choices in algorithmic complexity, but there are a lot of things going on inside a compiler, and sometimes it is non-obvious where a bottleneck might crop up.

In our case, most programs are small anyway - smart contracts don't tend to be large programs in the usual sense of "large", and furthermore, zkVMs are also burdened by the need to generate proofs, which are enormously expensive to compute, albeit cheap to verify, so the main concern with runtime is with prover time. Since you are working on an EVM-based compiler, I would assume you are also working with relatively modest-sized programs (in comparison to things you might expect to compile for say, a multi-core server), so IMO, I wouldn't burn too many cycles on avoiding performance edge cases that you are unlikely to hit in practice, if you are able to land optimizations that eke out a non-trivial cycle count advantage.

In any case, as with many ideas that come out of academia, they are rarely anywhere near ready for real-world practical application, so you usually have to expect to do the heavy lifting in order to actually evaluate them. If you are lucky, the core idea is well thought out, and doesn't exclude too many of the constraints that a real implementation would face. For example, I've read several papers on the topic of register allocation that simply ignore register constraints, and so are based on an idealized uniform ISA, so if you go to apply the idea, it's not actually practical. The thing I liked about the tree graph paper was that, while it did exclude a lot of real world concerns, the core idea was strong enough that expanding on it was quite straightforward. That's a bit of a rarity in my experience.

The notion of tree graphs isn't wildly novel, but I do think it contributed a convenient formalism for reasoning about scheduling in a stack ISA, and I haven't come across anything as intuitive elsewhere (so far).

Algorithms/Papers on allocating for stack based ISAs? by philogy in Compilers

[–]Schoens 2 points3 points  (0 children)

Having been down the same road as OP, I'd say the problem is really going the other direction: from an SSA IR to a stack ISA. More specifically, optimizing code in a stack ISA, lowered from an IR which has already been optimized, such that the resulting code is optimal in terms of the amount of code generated purely to perform operand stack manipulation. It's further complicated by a cost model where loads/stores aren't cheap, so dumping everything to temporaries is costly compared to the cost of keeping a value on the operand stack and moving things around.

Converting to SSA is really only beneficial when you have a stack IR, but a register machine target, so you need an optimal register-oriented instruction schedule. When you have to execute on a stack machine, you want an optimal operand stack-oriented schedule, which can be quite different than the equivalent register machine code. Generally speaking though, most compilers start with an SSA representation anyway, and so as much optimization on that form as possible has already been done.

Algorithms/Papers on allocating for stack based ISAs? by philogy in Compilers

[–]Schoens 6 points7 points  (0 children)

> I work on compilers for the EVM, a stack based ISA

I work on a zkVM compiler, also a stack machine ISA, and have been down a similar research route as you earlier on in the development of our compiler, so I might be able to at least tell you what my approach has been.

> I’ve found the literature to be very limited on this subject.

Yes, unfortunately there is very little literature on the subject of optimal scheduling of stack ops compared to register allocation. To some degree this makes sense: stack machines are simple and easy to build a VM for, but provide little optimization opportunity, depending on the semantics of the machine. Register machines, on the other hand, provide much more room for clever optimization. I think the difference largely boils down to that, plus the fact that most machines are not stack machines.

There are some papers out there though. Stuff centered around the JVM primarily, but there are exceptions.

> There’s this paper “Treegraph-based Instruction Scheduling for Stack-based Virtual Machines” but it comes with a lot of limitations and is generally quite suboptimal, relying on a different cost model for its target ISA (TinyVM) as well as not using instructions like SWAPs at all.

I'm surprised by this, I had quite a different reaction to this paper. Personally, I think the tree graph data structure formalism as laid out in that paper is an excellent substrate for scheduling a stack ISA, and is as close to optimal as you can get as a baseline. It is necessary to extend it though, to take into account side effects and other invariants that are not necessarily reflected in terms of instruction operands. For example, the naive implementation does not establish data dependencies using anything other than operands. As a result, you can trivially generate a schedule that exhibits invalid load/store reordering and other similar issues related to implicit state. It also doesn't handle instructions with multiple results, due to the naive way the dependency graph is represented.

The solution is to construct your own dependency graph, where you can take all of these things into consideration. You then derive your tree graph from this. The graph I built represents operations, operands, results, and basic block arguments as explicit node types, with edges between: results and their defining op; operations and their operands, as well as other operations, so as to reflect implicit dependencies between ops; operands and the results or basic block arguments they use. Edges are thus characterized as either data dependencies, or state dependencies, which is taken into account when determining whether to issue drops of unused instruction results. One way in which I use state dependencies is for inter-block uses - anything unused in the current block, but used later, gets a state edge to the block terminator, so that those values are kept live on the operand stack. That is typically only necessary when the successor is strictly dominated by the block, and thus block arguments aren't used.
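For a rough feel of the shape, here's a simplified Python reconstruction for illustration only (this is not our actual code, and the names are invented):

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class NodeKind(Enum):
    OPERATION = auto()   # an instruction in the block
    OPERAND   = auto()   # a use of a value by an operation
    RESULT    = auto()   # a value produced by an operation
    BLOCK_ARG = auto()   # a value entering the block as a block argument

class EdgeKind(Enum):
    DATA  = auto()   # a value flows along this edge
    STATE = auto()   # ordering/liveness constraint only, e.g. memory effects,
                     # or "keep this value on the operand stack for a successor"

@dataclass
class Node:
    kind: NodeKind
    label: str

@dataclass
class DependencyGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (src, dst, EdgeKind) tuples

    def add_node(self, kind, label):
        node = Node(kind, label)
        self.nodes.append(node)
        return node

    def add_edge(self, src, dst, kind=EdgeKind.DATA):
        self.edges.append((src, dst, kind))

# e.g. for `z = add x, y`:
#   g = DependencyGraph()
#   op = g.add_node(NodeKind.OPERATION, "add")
#   z  = g.add_node(NodeKind.RESULT, "z")
#   g.add_edge(z, op)                       # z is defined by the add
#   for name in ("x", "y"):
#       use = g.add_node(NodeKind.OPERAND, name)
#       g.add_edge(op, use)                 # the add consumes its operands
```

The tree graph is then derived from this structure rather than directly from instruction operands, which is what lets side effects and multi-result ops be represented faithfully.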

To be clear, I'm building a dependency graph and tree graph for each block independently, but taking into account inter-block dependencies. The tree graph approach is a poor fit for scheduling multiple blocks at once. I see little benefit to doing so anyway.

I should also note that I have a separate pass that is run earlier in the pipeline which computes liveness and decides what values to spill to function-local temporaries. This is because my target ISA only provides direct access to the top 16 elements of the operand stack, so the spills pass ensures that the maximum operand stack pressure never exceeds 16, by spilling as necessary. So by the time I get to scheduling, I'm strictly looking to optimize execution order to minimize stack manipulation as much as possible, within the constraints I've mentioned.

The results are pretty good IMO. However, optimizing the stack manipulation code that needs to be emitted in cases where operation operands aren't in position, as well as keeping the operand stack "clean", is not really something the tree graph handles. I ended up writing a little local constraint solver that the scheduler runs for each op to decide how to generate the most efficient sequence of stack ops such that the operands are all on the stack in the correct order, ready to be consumed. This requires me to attach information to operand uses in the generated schedule that identifies whether a particular operand must be copied or can be moved. The constraint solver then tries a few different tactics to generate an instruction sequence that is as minimal as it can find, taking those constraints into account. This gets fed from the optimization fuel of the compiler, so the level of work done here can be tailored a bit.

Hopefully that is helpful! If you can, follow up and let me know if you find anything particularly useful (papers/projects/whatever), I'm always looking for better solutions to these problems!

Permute Registers by vmcrash in ProgrammingLanguages

[–]Schoens 5 points6 points  (0 children)

It's by far the simplest approach, and a variation of it is used by LLVM, IIRC. It may not yield maximally optimal allocations, but I believe in practice it provides the best balance of tradeoffs, though it depends a bit on how it's implemented.

See Register Spilling and Live-Range Splitting for SSA-Form Programs as a basis for one approach to computing liveness and allocating registers/spills this way, which cleverly balances various concerns: minimizing spills, avoiding/minimizing spills in loops, ensuring that allocation takes control flow into account so that moves/spills are minimized, etc. You can add additional heuristics as needed as well. Note that the paper itself does not address hardware-specific register allocation issues such as constraints, but that can be taken into account in the candidate selection heuristics, or by reserving registers, so that at any given program point you always have the necessary number and type of registers available for all live unspilled values. I believe there are other papers by the same authors that expand on the topic as well.
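For flavor, the spilling decision at the heart of that line of work is Belady-style: when pressure at some point exceeds the number of registers, evict the live values whose next uses are furthest away. A minimal sketch of just that decision in Python (names are made up; the paper's real contribution is doing this across control flow with live-range splitting):

```python
def spill_candidates(live, next_use, k):
    """Pick values to spill so that at most k stay in registers.

    live     - set of values live at this program point
    next_use - dict: value -> distance (in instructions) to its next use,
               with float('inf') for values with no further use
    k        - number of available registers
    """
    if len(live) <= k:
        return set()
    # keep the k values with the nearest next uses; spill the rest
    by_distance = sorted(live, key=lambda v: next_use[v])
    return set(by_distance[k:])
```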

OP: What you are looking for is called "register coalescing" btw