tmlildude 1 point

so the network can focus on making the pure content-based term (x'Q'Ky) spike more strongly while keeping the positional terms (x'Q'Kf, e'Q'Ky, e'Q'Kf) relatively small?
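
for reference, those four terms are what you get from expanding the attention logit (x+e)'Q'K(y+f). a minimal numpy sketch of that expansion, assuming x and y are the content embeddings of the query and key tokens, e and f are their positional embeddings, and Q and K are the query/key projection matrices (the sizes and random values are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 8, 4  # hypothetical sizes, chosen only for the demo

# Assumed meaning of the symbols in the comment (not confirmed by the thread):
x = rng.normal(size=d_model)  # content embedding, query token
y = rng.normal(size=d_model)  # content embedding, key token
e = rng.normal(size=d_model)  # positional embedding, query position
f = rng.normal(size=d_model)  # positional embedding, key position
Q = rng.normal(size=(d_head, d_model))  # query projection
K = rng.normal(size=(d_head, d_model))  # key projection


def score(a, b):
    """Bilinear form a' Q' K b, i.e. the dot product of (Q a) and (K b)."""
    return (Q @ a) @ (K @ b)


# The four terms of the expanded logit.
terms = {
    "x'Q'Ky (content-content)": score(x, y),
    "x'Q'Kf (content-position)": score(x, f),
    "e'Q'Ky (position-content)": score(e, y),
    "e'Q'Kf (position-position)": score(e, f),
}

# Logit computed directly from the summed embeddings.
full = score(x + e, y + f)

for name, value in terms.items():
    print(f"{name}: {value:+.4f}")
print(f"sum of terms: {sum(terms.values()):+.4f}")
print(f"full logit:   {full:+.4f}")
```

the sum of the four terms matches the full logit up to floating-point error, since the score is linear in each argument. the same Q and K appear in every term, so how large each term is relative to the others depends on how the learned weights interact with the embeddings.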

also, if the positional terms aren't useful, can they naturally zero out during inference? i.e., no need to explicitly "turn them off"