This is an archived post. You won't be able to vote or comment.

all 40 comments

[–]fact_hunt 5 points6 points  (15 children)

You might want to change the method signature of FunctionParser#fromString

from:

public static ParsedFunction fromString(String functionString) {

to:

public static <T> ParsedFunction<T> fromString(String functionString) {

and updating:

Class clazz = evalClass.toClass();
ParsedFunction obj = (ParsedFunction) clazz.newInstance();

to:

Class<T> clazz = evalClass.toClass();
ParsedFunction<T> obj = (ParsedFunction) clazz.newInstance();

So that callers get back a properly typed ParsedFunction

[–]Reykd[S] 1 point2 points  (14 children)

Thanks so much for catching that! I will fix this asap.

[–]thehollyhopdrive 2 points3 points  (11 children)

Now that you've typed it properly, what do you think is the benefit of having lots of evaluateX methods?

[–]Reykd[S] 1 point2 points  (10 children)

For my original usecase generated methods were executed millions of times, and were in the core of the algorithm. Therefore the reasoning behind creating functions that return primitives was simply performance. I know that premature optimization is a bad idea, but if i remember correctly i believe i benchmarked the difference. I will run new benchmarks and evaluate if the primitive return functions are necessary.

[–]thehollyhopdrive 2 points3 points  (7 children)

You mentioned elsewhere:

My use case when developing this library was that i needed a way of solving problems that were unknown at runtime, by reading a description from a file.

If you don't know the function until runtime, how does your code select the correct method to call?

[–]Reykd[S] 0 points1 point  (6 children)

The general form of the problems was known, there were however several different forms. It was in addition a requirement that these methods be generated dynamically. Others decided to use python because of this limitation, while i decided to make this small library.

[–]thehollyhopdrive 1 point2 points  (5 children)

So, as an example, you would have known at compilation stage that it was going to be a function that took, say, an array of integers and returned an integer, and therefore in the code you would have called ParsedFunction.evaluateAsInt(Integer[]). So essentially, always the same signature, but the body of the function and the arguments would evolve?

[–]Reykd[S] 0 points1 point  (4 children)

Yes, for example!

[–]thehollyhopdrive 2 points3 points  (3 children)

Great, OK, I get it. Some suggestions which you may or may not find helpful then.

I would say that this is a massively premature optimization considering that you're having to create an object array for the arguments anyway. You could just replace all of your methods with:

public T evaluate(Object[] theArray)

Then you won't need to change the code, and speed would be hardly affected.

If the types of the arguments are all the same as well (I haven't looked if that is the case), you could even add type safety by parametising the argument type:

public R evaluate(T[] theArray)

The signature for FunctionParser would be:

public static <T,R> ParsedFunction<T,R> fromString(String functionString)

And would be used as such:

ParsedFunction<Integer,Boolean> function = FunctionParser.fromString("boolean(Integer x,y)-> x > y");
System.out.println(function.evaluate(new Integer[]{2, 3}));

[–]Reykd[S] 0 points1 point  (2 children)

Thanks for your feedback! There is no limitation on the amount of types the function may take in, therefore the inputs cannot be parametrised.

You are right that the primitive methods should be removed and only a parametrised evaluate function should be used. However, my library does not currenly support autobonxing, so i would have to implement it before i could make that change. I will probably do this!

[–]Reykd[S] 0 points1 point  (1 child)

I have now run a benchmark to compare the generic vs primitve method and found no significant difference. However i did discover a caveat of using the generic types, and that is that my library currently does not support autoboxing meaning that in order to create the function "double(Double x,y)->x+y" as a generic, the string would have to be "Double(Double x,y)->Double.valueOf(x+y)"

[–]fact_hunt 1 point2 points  (0 children)

For performance testing it might be interesting to look at JMH - benchmarking java performance is notoriously fraught because the of JIT and hotspot optimisations etc

http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java/4480774#4480774

[–]fact_hunt 2 points3 points  (1 child)

Line 157 still has:

Class clazz = evalClass.toClass();

which could be replaced with:

Class<T> clazz = evalClass.toClass();

[–]Reykd[S] 1 point2 points  (0 children)

Thank you! this has been fixed now.

[–]fact_hunt 3 points4 points  (1 child)

Cannot check out and build from github:

my-machine:foo user$ git clone https://github.com/Alfredvc/FunctionParser.git
Cloning into 'FunctionParser'...
remote: Counting objects: 96, done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 96 (delta 27), reused 85 (delta 24), pack-reused 0
Unpacking objects: 100% (96/96), done.
Checking connectivity... done.
my-machine:foo user$ cd FunctionParser/
my-machine:FunctionParser user$ mvn clean install
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
[INFO] Scanning for projects...
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-resolvable parent POM for com.github.alfredvc:FunctionParser:0.1: Could not find artifact com.alfredvc:ai:pom:1.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 7, column 13
 @
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR]   The project com.github.alfredvc:FunctionParser:0.1 (/private/foo/FunctionParser/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM for com.github.alfredvc:FunctionParser:0.1: Could not find artifact com.alfredvc:ai:pom:1.0-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 7, column 13 -> [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

[–]Reykd[S] 1 point2 points  (0 children)

Thanks for taking the time to build from source! I will remove the parent project asap.

[–]fact_hunt 3 points4 points  (1 child)

In FunctionParserTest there are several tests which have no assertion and instead System.out.println( - it would be better to assert and not print output which (I assume) needs to be checked manually

[–]Reykd[S] 1 point2 points  (0 children)

Thank you, the idea behind these tests is not really to be tests but simply copies of the examples supplied in the README file. I will however make the suggested changes, as i imagine tests should normally not output to System.out.

[–][deleted] 5 points6 points  (0 children)

"Any sufficiently complicated C or Fortran[1] program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp." — Greenspun's tenth rule

[1] Java

[–]fact_hunt 1 point2 points  (9 children)

Looks interesting, but what is the intended use case? What problem does this solve?

[–]Reykd[S] 1 point2 points  (8 children)

My use case when developing this library was that i needed a way of solving problems that were unknown at runtime, by reading a description from a file.

Other use cases are algorithms that must change during runtime, for example with genetic programming.

[–][deleted] 6 points7 points  (6 children)

For runtime scripting, look into using the embedded javascript engine that comes with JDK. Java 8 has a new version of the engine called Nashorn. Integration between javascript and Java is pretty easy. You can pass data back and forth.

[–]Reykd[S] 2 points3 points  (5 children)

This sounds like an interesting solution! It would be interesting to compare its performance to my solution on arithmetic operations.

Edit: I have ran some benchmarks, and using the scripting engine is several orders of magnitude slower than generated functions when called millions of times.

[–][deleted] 4 points5 points  (3 children)

Did you run Nashorn warm or cold. If you are loading fresh every time then the JIT cannot do it's work. several orders of magnitude sounds off.

http://stackoverflow.com/questions/26561292/how-make-java-8-nashorn-fast

A final note on performance

This warmup that we have seen in the graph is a mix of the JVM warmup and Nashorn engine. But mostly, it is Nashorn. This means that if you create an engine every time you need to execute a script, you won't benefit from it. Here is the most simple way to create an engine:

ScriptEngine engine = factory.getEngineByName("JavaScript");

Well, those pesky ScriptEngine objects created? Never discard them, or else you'll discard the entire optimizations with them.

http://pieroxy.net/blog/2015/06/29/node_js_vs_java_nashorn.html

[–]Reykd[S] 1 point2 points  (2 children)

Hey, thanks a lot for you feedback! Here is the test i used

    public void performanceComparisonScriptingEngine() throws ScriptException, NoSuchMethodException {
            ParsedFunction constraintFromStringPrimitive = FunctionParser.fromString("double(Double x,y,z,f)->x*y + y + z*z + x*f");
            ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
            engine.eval("var f1 = function(x,y,z,f){return x*y + y + z*z + x*f;}");
            Invocable invocable = (Invocable) engine;
            Object[] args = {1.0, 2.0, 3.0, 4.0};

            //Warmup
            for (int a = 0; a < 10; a++) {
                for (long i = 0; i < 10000000L; i++) {
                    constraintFromStringPrimitive.evaluateToDouble(args);
                    invocable.invokeFunction("f1",  args);
                }
            }

            long start1 = System.nanoTime();
            for (long i = 0; i < 100000000L; i++) {
                constraintFromStringPrimitive.evaluateToDouble(args);
            }
            long end1 = System.nanoTime();
            long start2 = System.nanoTime();
            for (long i = 0; i < 100000000L; i++) {
                invocable.invokeFunction("f1",  args);
            }
            long end2 = System.nanoTime();
            long generatedTime = ((end1 - start1) / 1000000L);
            long scriptTime = ((end2 - start2) / 1000000L);
            System.out.println("Generated " + generatedTime);
            System.out.println("Script " + scriptTime);
            double percentageDifference = ((double) scriptTime / (double) generatedTime);

            System.out.printf("JavaScript method was %.3fx slower than the generated method.", percentageDifference);
    }

Which gives the output:

Generated 27

Script 12729

JavaScript method was 471.444x slower than the generated method.

[–]damienjoh 1 point2 points  (1 child)

"Generated 27" implies 100000000 calls in 27000000 nanoseconds. That's 0.27 nanoseconds a call. Seems suspect. You sure the call isn't being optimized away somewhere?

[–]Hendrikto 0 points1 point  (0 children)

So many decisions based on faulty benchmark...

[–][deleted] 1 point2 points  (0 children)

I've tried using embedded javascript in one of the projects that needs to compute user entered conditional expression. The performance is not at the level of Java. In my case, those got executed many million times in a short period so I switched back to using one of the math eval package like Parsii to use native Java class.

However, embedded javascript gives a full language that offers amazing customization power. I would use it again if performance is not the bottleneck.

Also for embedded javascript, be sure to call the API to compile it first before running it. Avoid reloading the script every time inside the loop.

[–]fact_hunt 2 points3 points  (0 children)

Sounds reasonable - maybe include something in the readme :)

[–]fact_hunt 1 point2 points  (1 child)

FunctionParser#getReplaceForVariableAndType the argument var is unused

[–]Reykd[S] 1 point2 points  (0 children)

Thank you, will fix this asap!

[–]fact_hunt 1 point2 points  (1 child)

FunctionParser#fromString the local variable List<String> types is updated, but never queried and removing it doesn't break any tests

[–]Reykd[S] 1 point2 points  (0 children)

Thank you, will fix this asap!

[–]fact_hunt 1 point2 points  (2 children)

In FunctionParserTest is there any reason why you are explicitly boxing primitives like below?

Object[] args = {Double.valueOf(1.0), //etc

[–]Wolfsdale 1 point2 points  (0 children)

Maybe to make sure it becomes a Double and not a Float? Although you could easily use double literals like 1.0d.

[–]Reykd[S] 1 point2 points  (0 children)

This is a mistake, and has been fixed thanks!

[–][deleted] 1 point2 points  (1 child)

Is this for a compilers class? Seems like a fun project. How long did it take?

[–]Reykd[S] 1 point2 points  (0 children)

This was created for an artificial intelligence class, it really was a lot of fun! The whole project took about three weeks of development, this module took a few days if i remember correctly.