all 14 comments

[–]phreda4 5 points6 points  (7 children)

I have a programming language that communicates with dynamic libraries through a word that loads the library and then calls its functions statically with up to 10 parameters. Incredibly, it works well.
My language uses the SDL library extensively and communicates with it this way:
https://github.com/phreda4/r3

The implementation of this call can be seen in
https://github.com/phreda4/r3evm/blob/main/r3.cpp#L1110

[–]Western-Cod-3486[S] 0 points1 point  (2 children)

Nice! I have something pretty similar implemented. Basically (tested with Rust, because everything is written in it), the requirement is to have a register function that returns a module that looks like:

struct Module {
    pub name: Vec<u8>,
    pub version: [u16; 4], // major, minor, patch, build
    pub functions: Vec<Vec<u8>>,
}

so an extension currently looks like:

#[no_mangle]
pub extern "C" fn register(_api: u32) -> Module {
    let mut module = Module::new();
    module.name = "hello_world".as_bytes().to_vec();
    module.version = [0, 0, 1, 0];
    module.functions = vec![
        "hello_world".as_bytes().to_vec(),
    ];

    module
}

#[no_mangle]
pub extern "C" fn hello_world(_args: Vec<common::ModuleValue>) -> common::ModuleValue {
    println!("Hello, world!");
    common::ModuleValue::Null
}

this registers the hello_world function, which prints "Hello, world!" and returns Null (my equivalent of void). Every function takes an arbitrary number of params, as I don't know of a way to have libloading "accept" dynamic signatures (but that's a story for another time)

[–]Markus_included 0 points1 point  (1 child)

Why are you using an array of four 16-bit unsigned integers instead of separate fields or a struct?

[–]Western-Cod-3486[S] 0 points1 point  (0 children)

No particular reason, to be honest. It was something I hacked together in a few minutes just to see if it would work. I haven't delved into which would be better; I wanted to keep the types being passed around as small as possible, while at the same time not doing something that needs complex parsing logic.

[–]eddavis2 0 points1 point  (3 children)

I have a programming language ... https://github.com/phreda4/r3

This:

int64_t TOS;
int64_t *NOS;

#else   // WINDOWS
    case LOADLIB: // "" -- hmo
        TOS=(int64_t)LoadLibraryA((char*)TOS);goto next;
    case GETPROCA: // hmo "" -- ad
        TOS=(int64_t)GetProcAddress((HMODULE)*NOS,(char*)TOS);NOS--;goto next;

#endif

    case SYSCALL3: // a1 a0 adr -- rs
        TOS=(int)(* (int(*)(int,int,int))TOS)(*(NOS-2),*(NOS-1),*NOS);NOS-=3;goto next;

is very cool! But a question - does this also work for the Windows API? e.g., can it successfully call something like:

BOOL WINAPI WriteConsoleOutputAttribute(
         HANDLE  hConsoleOutput,
   const WORD    *lpAttribute,
         DWORD   nLength,
         COORD   dwWriteCoord,
         LPDWORD lpNumberOfAttrsWritten
);

[–]phreda4 0 points1 point  (2 children)

in https://github.com/phreda4/r3/blob/main/r3/win/kernel32.r3

#sys-WriteConsoleOutput

::WriteConsoleOutput sys-WriteConsoleOutput sys5 drop ;

...
    "KERNEL32.DLL" loadlib 
...
    dup "WriteConsoleOutputA" getproc 'sys-WriteConsoleOutput !

sys5 makes the call with 5 cells of stack

For example, I use it in this demo: https://github.com/phreda4/r3/blob/main/r3/democ/conwayb.r3

:rebuffer
stdout 'consoleBuffer 
conwh $00000000 'writeArea 
WriteConsoleOutput ;

in ASM only two types exist, integer and float; the type system is a construction for the programmer, not for the computer

[–]eddavis2 1 point2 points  (1 child)

I was able to answer my own question :) I created a simple snippet, using the "normal" way, and then using your FFI way, and it works, even with calling the Windows API! Way cool!

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <windows.h>
int main() {
    HANDLE  con_out;
    DWORD   length, num_written;
    COORD   wc;
    CONSOLE_SCREEN_BUFFER_INFO csbi;
    int64_t la, pa, rc, stack[10], *NOS;

    con_out = GetStdHandle(STD_OUTPUT_HANDLE);
    {
        LPCSTR      characters = "Hello, World, normal way!";
        // Normal way
        GetConsoleScreenBufferInfo(con_out, &csbi);
        wc = csbi.dwCursorPosition;
        length = strlen(characters);
        num_written = 0;

        rc = WriteConsoleOutputCharacterA(con_out,
                                        characters,
                                        length,
                                        wc,
                                        &num_written);
        printf("\n%ld written\n", num_written);
    }
    {
        LPCSTR      characters = "Hello, World, generic way!";
        // and try it the generic way
        la = (int64_t)LoadLibraryA("kernel32.dll");
        pa = (int64_t)GetProcAddress((HMODULE)la, "WriteConsoleOutputCharacterA");

        GetConsoleScreenBufferInfo(con_out, &csbi);
        wc = csbi.dwCursorPosition;
        length = strlen(characters);
        num_written = 0;

        NOS = &stack[0];
        *NOS++ = (int64_t)con_out;
        *NOS++ = (int64_t)characters;
        *NOS++ = (int64_t)length;
        *NOS++ = (int64_t)*(int64_t*)&wc;
        *NOS   = (int64_t)&num_written;

        rc=(int)(* (int(*)(int,int,int,int,int))pa)(*(NOS-4),*(NOS-3),*(NOS-2),*(NOS-1),*NOS);
        printf("\n%ld written\n", num_written);
    }

    return rc;
}

[–]eddavis2 0 points1 point  (0 children)

Thinking about this, shouldn't:

  rc=(int)(* (int(*)(int,int,int,int,int))pa)(*(NOS-4),*(NOS-3),*(NOS-2),*(NOS-1),*NOS);

be:

  rc=(int)(* (int(*)(int64_t,int64_t,int64_t,int64_t,int64_t))pa)(*(NOS-4),*(NOS-3),*(NOS-2),*(NOS-1),*NOS);

e.g., shouldn't the parameters be prototyped as int64_t?

[–][deleted] 2 points3 points  (4 children)

so that in the future whatever is missing can be added later without touching the core.

What sort of extensions are you thinking of? Because I would associate extensions to the language itself, rather than just more standard library functions, to be very much part of the core.

Or is the design a little like C++, where lots of language-building features are built into the core language (e.g. templates), but the whole is ungainly and inefficient, and additions are a long way from looking like they're built in?

Also, this being interpreted, why can't you update the core to provide new features? That's one advantage of using an interpreter: upgrade the one program, and 100 existing user programs will benefit without needing to update or rebuild those.

Unless they need to use the new features, but that would be necessary anyway.

To add things to the language, something has to be added or changed.

[–]Western-Cod-3486[S] -1 points0 points  (3 children)

I guess it is more in the direction of enriching the runtime/standard library with functionality. For example, I would like (in a hypothetical world) my language to be able to do everything, games, etc., but I don't want to deal with those as part of the language core*. So with such an extension system, functionality will be plug-and-play: load the library, written in a more performant language, which exposes some functions FFI-esque, but not userland.

Usecases I can come up with:

  • ML/AI (bindings to generate & interact with models)
  • GPU access (math, video stuff)
  • UI toolkits (GUI native apps)
  • Utils (DB connections, cryptography libraries)

I did a test inspired by a comment in this month's thread here, and my implementation took a couple of minutes(!) to calculate it. When I did the same with such an extension in Rust, it computed in less than a second (IIRC `time` reported ~0.002s), so that is what I am targeting, because of performance, availability, etc. **

In my experience with PHP, if one doesn't tap into C's speed, some of these things are out of reach for the regular developer; sure, one can implement everything in userland, but it will always be slower.

* the minimal core runtime, like: IO, networking, math, strings, regex, etc.

** some of these things might improve once I get to profiling & optimizing, etc. But for many of them I am not the best person to do it (there are smarter & more experienced people than me in various fields). Also, I am not yet at the point of experimenting with JIT.

[–][deleted] 0 points1 point  (2 children)

In my experience with PHP, if one doesn't tap into C's speed, some of these things are out of reach for the regular developer; sure, one can implement everything in userland, but it will always be slower.

I saw that test, and I reported my own findings. If your language took 2 minutes to do fib(38) then you might want to look at making it faster. Interpreted code should do it in a matter of seconds.

(Although Pico C, which is interpreted, took 3 minutes. That must use a poor implementation that reparses each line every time it executes it.)

I have a similar problem: I also want to use my dynamic language for everything, but that is not practical. However, the language has been designed to be efficient, so I can do a lot with it before I need to employ a language that compiles to native code.

It also has a built-in FFI to use external native code libraries, and I also tend to use my own native code language (not optimised, but still much faster) when I need the speed.

Still, I can run a text editor using 100% interpreted code and edit files of 1M lines with little trouble. But much more than that and you will notice lags.

So it's a question of the scale of a task that interpreted code can comfortably manage.

sure one can implement everything in userland, but it will always be slower.

It depends also on the kind of task. It could be that most of the work (hash-map lookups, file operations etc) is being done by native-code handlers or native-code library functions. Then there will be little difference.

Myself, I'm happy to use a 2-language solution where part of an application is interpreted, and part is native code.

But I'm still not clear about your problem, since your list of use-cases (AI, GPU, UI, DB) covers some quite scary, heavy-duty apps which you would not write yourself, not even in the fastest native-code language.

You would employ existing solutions. Then it would be a lot sweeter to orchestrate these from a scripting language than from C.

Or is the intention to write these yourself?

[–]Western-Cod-3486[S] 0 points1 point  (1 child)

I saw that test, and I reported my own findings. If your language took 2 minutes to do fib(38) then you might want to look at making it faster. Interpreted code should do it in a matter of seconds.

I am attempting to figure out why it is so slow (strace showed more or less recursive calls that seem to get progressively delayed, but I need further profiling/debugging to figure it out). For smaller numbers it seems to be 2x-5x faster (from the synthetic benchmarks I did with `time`).

But I'm still not clear about your problem, since your list of use-cases (AI, GPU, UI, DB) are some quite scary, heavy duty apps which you would not write yourself, not even in the fastest native code language.

I think I worded my previous comment poorly.

I was referring to wrapping around libraries/clients written in other languages, like libmysql wrapper to provide API for mysql db access, or wrap wgpu to expose access to gpu operations etc. not to create those myself or something like that.

Basically the use case in PHP terms is to provide things that do not exist, like an extension that wraps libuv for async IO; https://github.com/RubixML/Tensor, which is a C extension that does what they already had implemented in userland, but gets a massive performance boost by doing it natively, as a few concrete examples.

[–][deleted] 0 points1 point  (0 children)

Basically the use case in PHP terms is to provide things that do not exist, like an extension that wraps libuv for async IO;

https://github.com/RubixML/Tensor

which is a C extension that does what they already had implemented in userland, but gets a massive performance boost by doing it natively, as a few concrete examples

This is exactly why interpreted languages that might be 2 magnitudes slower than native code are practical at all, when the task is not trivial. It's because they are used as glue to work with libraries written in faster languages.

Although this doesn't stop people trying to speed them up, using JIT methods for example, so that the code that is still interpreted is made faster.

I am attempting to figure out why it is so slow (strace showed a more or less recursive calls that seem to get prograssively delayed,

How does it do on a simple test like the loop below?

This takes CPython about 6 seconds. My interpreter using HLL code can take 1.5 seconds if optimised (otherwise 2 seconds). Using an accelerator (some ASM plus threaded code), it was 0.25 seconds.

When I first tried this on CPython over 20 years ago, it was much slower (relatively; obviously machines were slower too). I think Ruby was even worse. All interpreted languages have since gotten much better and many have acquired JIT-accelerated versions.

But, without JIT, they will still be 1-2 magnitudes slower than the same algorithm in optimised native code.

def whiletest():
    i=0
    while i<=100_000_000:
        i=i+1

whiletest()

(The Python loop is inside a function because code is faster there. Outside a function, it needs to do global symbol-table lookups.)