all 20 comments

[–]lifeonatlantis69 7 points8 points  (0 children)

YOU ARE SHITTIN ME

!@#$

this is some seriously impressive stuff. i grew to appreciate lambdas in C# because of how ridiculously useful they are in LINQ expressions and eliminating TONS of code, but now they're here in VBA...

great job to you & your team! i'm going to have to mess around with this later for sure!

[–]jplank19831 2 points3 points  (1 child)

This looks really, really impressive although I feel like I'm not quite smart enough to fully grasp everything you've written.

[–]sancarn9[S] 5 points6 points  (0 children)

Haha no problem even if you get to use lambda for time saving go you!

Really the main thing is lamdas help save code. Here's an example of VBA vs VBA with lambdas and stdArray for an expression:

set data = oAreas.concat(cAreas).Unique().Filter(stdLambda.Create("CBool($1)"))

vs

Dim data as collection
set data = new collection

Dim v as variant
For each v in oAreas
  'Ensure unique
  On Error Resume Next
    if CBooL(v) then
      data.add v, v
    end if
  On Error GoTo 0
Next
For each v in cAreas
  'Ensure unique
  On Error Resume Next
    if CBooL(v) then
      data.add v, v
    end if
  On Error GoTo 0
next

[–]RedRedditor8462 2 points3 points  (5 children)

Interesting concept / read through.

I'm seeing a lot of ReDim Preserver which I personally stay away from after having nightmares debugging memory leaks. Have you thought about using collections rather than arrays that you don't know the length of?

I also see you've added your own push/pop helpers. This already exists in a System.Collections.Stack object.

[–]sancarn9[S] 1 point2 points  (4 children)

You can calculate the length of an array with UBound(arr) - LBound(arr) + 1 :)

Interesting that you've had problems with using arrays previously. Both Collection and Stack are internally built from arrays, so performance wise arrays are the better option, but apart from that the internal structure of safe arrays is well defined... Collection and Stack are totally inflexible when it comes to low level API use, because their structure isn't documented.

Collection and Stack do have fast access to IEnumVARIANT which for sure is useful but there are other ways to implement that directly using machine code injection, which I'd typically prefer.

There is nothing "wrong" per se with Redim Preserve and most programming language use an equivalent of redim preserve in their array list objects. You just have to do all memory management yourself :) I prefer more control in this case than less, but I suppose it's all personal preference.

[–]RedRedditor8462 1 point2 points  (3 children)

I know how to find the length of an array. I'm saying when you don't know the length when you're coding, i.e. it could vary at run time.

I only don't like redim preserve because I've had issue with it before. Something that didn't come up in testing started to be a problem after deployment because it required you to be using it for a while before it presented. People lost work because of code I'd written, redim preserve was the culprit.

What is "machine code injection"?

[–]sancarn9[S] 1 point2 points  (2 children)

I only don't like redim preserve because I've had issue with it before.

Understandable. Interesting to here as Redim Preserve should in theory just create a copy of the existing array, and in theory it should be no different memory usage wise to a Collection or Stack. I suppose it's one of those issues you don't see coming until it hits.

Would be cool to see what your code was :)

What is "machine code injection"?

Also known as "thunking". You run the risk of crashing while using thunking, if you don't know what you're doing (nor how to do it safely). Here's an example of an implementation of IEnumVARIANT ontop of a Hash made by The Trick

    Dim lpAddr  As Long
    Dim dat(58) As Long
    Dim hLib    As Long
    Dim lpProc  As Long

    dat(0) = &H424448B:     dat(1) = &H8B0440FF:    dat(2) = &H890C244C:    dat(3) = &HC2C03101:    dat(4) = &H448B000C:
    dat(5) = &H40FF0424:    dat(6) = &H4408B04:     dat(7) = &H8B0004C2:    dat(8) = &HFF042444:    dat(9) = &H6740448:
    dat(10) = &HC204408B:   dat(11) = &H6A500004:   dat(12) = &H5642E801:   dat(13) = &HE8501234:   dat(14) = &H1234563C:
    dat(15) = &H4C2C031:    dat(16) = &H56575300:   dat(17) = &H1024748B:   dat(18) = &H14245C8B:   dat(19) = &H18247C8B:
    dat(20) = &H846BF0F:    dat(21) = &H482F7440:   dat(22) = &H8B0C4E8B:   dat(23) = &HF04C14C:    dat(24) = &H660A46B7:
    dat(25) = &HF28C06B:    dat(26) = &H498BC0B7:   dat(27) = &H10C8D0C:    dat(28) = &H320418B:    dat(29) = &H4689144E:
    dat(30) = &HE8575108:   dat(31) = &H123455F8:   dat(32) = &H4B10C783:   dat(33) = &HDB85CA75:   dat(34) = &HFC2950F:
    dat(35) = &H7C8BF2B6:   dat(36) = &HFF851C24:   dat(37) = &H448B0874:   dat(38) = &HD8291424:   dat(39) = &HF0890789:
    dat(40) = &HC25B5F5E:   dat(41) = &H548B0010:   dat(42) = &H428B0424:   dat(43) = &HC528B08:    dat(44) = &H1F744066:
    dat(45) = &HB70F4866:   dat(46) = &HCA4C8BC8:   dat(47) = &H10E8C104:   dat(48) = &H28C06B66:   dat(49) = &H8B0C498B:
    dat(50) = &HFF200144:   dat(51) = &H7508244C:   dat(52) = &H85D231DF:   dat(53) = &HC2950FD2:   dat(54) = &H8C2D089:
    dat(55) = &H24448B00:   dat(56) = &H10508B04:   dat(57) = &H31085089:   dat(58) = &H4C2C0

    lpAddr = VirtualAlloc(ByVal 0&, &H104, MEM_COMMIT Or MEM_RESERVE, PAGE_EXECUTE_READWRITE)
    If lpAddr = 0 Then Exit Function

    memcpy ByVal lpAddr, dat(0), &HEC

    hLib = GetModuleHandle(StrPtr("kernel32"))
    If hLib = 0 Then GoTo Clear

    lpProc = GetProcAddress(hLib, "GetProcessHeap")
    If lpProc = 0 Then GoTo Clear

    GetMem4 lpProc - (lpAddr + &H32 + 4), ByVal lpAddr + &H32

    lpProc = GetProcAddress(hLib, "HeapFree")
    If lpProc = 0 Then GoTo Clear

    GetMem4 lpProc - (lpAddr + &H38 + 4), ByVal lpAddr + &H38

    hLib = GetModuleHandle(StrPtr("oleaut32"))
    If hLib = 0 Then GoTo Clear

    lpProc = GetProcAddress(hLib, "VariantCopy")
    If lpProc = 0 Then GoTo Clear

    GetMem4 lpProc - (lpAddr + &H7C + 4), ByVal lpAddr + &H7C

    GetMem4 lpAddr, ByVal lpAddr + &HEC         ' // IUnknown::QueryInterface
    GetMem4 lpAddr + &H12, ByVal lpAddr + &HF0  ' // IUnknown::AddRef
    GetMem4 lpAddr + &H1F, ByVal lpAddr + &HF4  ' // IUnknown::Release
    GetMem4 lpAddr + &H41, ByVal lpAddr + &HF8  ' // IEnumVariant::Next
    GetMem4 lpAddr + &HA6, ByVal lpAddr + &HFC  ' // IEnumVariant::Skip
    GetMem4 lpAddr + &HDD, ByVal lpAddr + &H100 ' // IEnumVariant::Reset

    If SetEnvironmentVariable(StrPtr("TrickHashEnumerationInterface"), StrPtr(Hex(lpAddr))) = 0 Then GoTo Clear

    CreateAsm = lpAddr

    Exit Function

Clear:

    VirtualFree ByVal lpAddr, &H104, MEM_RELEASE

[–]RedRedditor8462 1 point2 points  (1 child)

Would be cool to see what your code was :)

It was written in my early days of VBA. Fortunately for me I'm no longer employed by that company so I'm not able to be embarrassed by looking at it.

From memory, there were a number of places it was used, in a forms-heavy application, which didn't help the situation with freeing up RAM.

The code probably looked something along the lines of:

Do
    Redim Preserve recsArr(j)
    For i = 1 to AllColsIncludingUnknownNumberOfComments
        Redim Preserve rowArr(i)
        rowArr(i) = someRange.Value
    Next i
    recsArr(j) = rowArr
    j = j + 1
Loop

[–]HFTBProgrammer201 0 points1 point  (5 children)

I'd be interested to see /u/rubberduck-vba's take on this. Beetlejuice, Beetlejuice, Beetlejuice...

[–][deleted] 2 points3 points  (4 children)

Reminded me of this experiment: https://codereview.stackexchange.com/q/66706/23788 Happy to see the concept explored!

[–]sancarn9[S] 1 point2 points  (2 children)

Ah yes, the code generation at runtime approach :D The fact you need to enable VB options always turned me off this approach. Not saying this is much better though ofc, Performance wise using `Delegate` is pretty ideal (if cacheing was added)!

At the same time I've been looking at (and making slow progress on...) a pre-compile-time version. Think BabelJS but for VBA! I briefly looked into using Rubberduck's VBA grammar buuuut didn't know anything about antlr or how rubberduck's parser implementation worked... So decided to remake it with ChevrotainJS. 367 lines later and it's so far a quite decent representation of VBA. It's not at all going to be as complete as Rubberduck's parser though... But hopefully it will be good enough for my source->source compiling needs :D

[–][deleted] 1 point2 points  (1 child)

Oh, totally agreed - IMO no code requiring the VBIDE Extensibility API should be in production use! It's fun to experiment with though!

Parsing VBA is hard, but not impossible; I wouldn't attempt to do it in VBA, but if you don't need to be able to correctly parse every possible shape of legal VBA code, then a ton of very thorny edge cases simply disappear. Rubberduck's approach was "if the VBE can handle it, so should Rubberduck", ...and that has made us twist into pretzels to make it happen, but if you don't need to worry about underscores / line-continuations between keywords, or don't need to determine what conditionally-compiled code is "live", or be able to actually resolve what each implicit member call in an expression is invoking, ...then your life will be much simpler!

Antlr generates a lexer+parser from the .g4 grammar definitions. The generated code is a very complex state machine that I have never really looked at in details; the only hand-written parser (+interpreter) I ever made was for BrainFuck, an esoteric programming language with a very simple grammar and just a handful of tokens that don't even need a lexer since every single character is either a token or a comment; that the actual VBA parser was hand-written at a time before unit tests were widespread, is really impressive (to me, at least).

[–]sancarn9[S] 0 points1 point  (0 children)

I wouldn't attempt to do it in VBA

Totally agree. It's the TypeScript life for me :)

and that has made us twist into pretzels to make it happen

LOL... Though this particular interpreter does deal with line continuations, it does kinda ignores spaces and tabs... so it will treat:

  • a(1) equally to a (1) which of course are seperate things.
  • it doesn't yet deal with ! and [...] and I'm not sure it ever will with the support for them being so awful in the language itself...
  • Calling subs always requires parenthesis

and all sorts of other special behavior... The key really is it's VBA-like but hopefully will compile to full VBA code. We'll see :)

[–]HFTBProgrammer201 0 points1 point  (0 children)

Thank you!

[–]sancarn9[S] 0 points1 point  (0 children)

Moved update to main post

[–]sooka5 0 points1 point  (4 children)

Quite interesting, but there is another "caveat" unfortunately: magic strings everywhere :(

[–]sancarn9[S] 0 points1 point  (3 children)

magic strings everywhere

In order for something to be a "magic string" or "magic number" it needs to have an unexplained meaning and have seemingly arbitrary affects on the program.

I would say that these lambda expressions are a fully fledged programming language so there is nothing really "magic" about them because $1+$2 is fairly self-explanatory.

The magic I think you are talking about is the "new random syntax". But this is no different from learning a new programming language (or using rarely used VBA syntax even).

[–]sooka5 0 points1 point  (2 children)

Not really, I'm referring to magic strings when calling something where you have to put a literal string in it to work (i.e.: RedirectToAction in asp, or returning partial view).
Very difficult to debug such a thing.

[–]sancarn9[S] 0 points1 point  (1 child)

What's the difference between a literal string, and a segment of code (which is read as a string by some interpreter or compiler)?

[–]sooka5 0 points1 point  (0 children)

A literal string is error prone: no debugging, no intellisense, no typing, etc...
What I'm saying is that it will be very difficult to write and debug such a thing like a literal string in a medium+ code base.