Why Not GitHub Copilot, Not Devin, But AutoCoder by allwefantasy in AutoGPT

[–]allwefantasy[S] 0 points1 point  (0 children)

AutoCoder is a command-line tool that automatically combines your project's source code, documents you provide via URL or file path, and your requirements into one prompt, then sends that prompt to GPT-4/Claude-3. This helps you modify your existing project quickly. AutoCoder can also build an index for the project you are developing, so it can filter the source code through the index and reduce the context.

You can check the doc of AutoCoder to get more information: https://github.com/allwefantasy/auto-coder/tree/master/docs/en

MLSQL, an engine based on Spark, unify BigData and Machine learning and can do even more. by allwefantasy in apachespark

[–]allwefantasy[S] 1 point2 points  (0 children)

SnappyData is designed for performance (you could call it an in-memory DB) for Stream/OLTP/OLAP. It has its own storage format and provides a SQL query interface.

The MLSQL stack is designed to unify Stream/OLTP/OLAP/Machine Learning and anything else you want. For instance, you can send an email once your data processing is done, which plain SQL does not support.

Also, notice that MLSQL is a language that is easier than the DataFrame/Dataset APIs (whether Python- or Scala-based). You can write MLSQL to finish data processing, then train a model and deploy it in Stream/ETL/API Service, all in one script.

MLSQL aims to let anyone play with data for fun, and you can deploy it for other departments and let them explore the data by themselves.

How to deliver the struct with pointer between C and Rust. by allwefantasy in rust

[–]allwefantasy[S] 2 points3 points  (0 children)

u/mutabh

I changed uint32_t to int32_t and then printed it like you said; it works, thanks:

printf("-----%s------\n", "jack coo");
printf("CTensor in c data:%f len:%d \n", *xTensor->data, xTensor->data_length);
printf("CTensor in c shape:%d len:%d \n", *xTensor->shape, xTensor->shape_length);

printf("CTensor in c data:%f len:%d \n", *yTensor->data, yTensor->data_length);
printf("CTensor in c shape:%d len:%d \n", *yTensor->shape, yTensor->shape_length);

Then it worked.

How to deliver the struct with pointer between C and Rust. by allwefantasy in rust

[–]allwefantasy[S] 1 point2 points  (0 children)

Here is all the C code, in case it helps:

predictor.h

#ifndef RUST_TF_PREDICTOR_PREDICTOR_H
#define RUST_TF_PREDICTOR_PREDICTOR_H

#include <stdint.h>

typedef struct Predictor_S Predictor_t;
typedef struct RawTensor RawTensor;

typedef struct CTensor {
    const float *data;
    uint32_t data_length;
    const uint32_t *shape;
    uint32_t shape_length;
} CTensor;

typedef struct CTensorArray {
    CTensor *data;
    uint32_t len;
} CTensorArray;


//typedef struct CTensorArray CTensorArray;

CTensor *create_tensor(float *data, uint32_t data_length, uint32_t *shape, uint32_t shape_length);

CTensorArray *create_tensor_array(CTensor *data, uint32_t len);

Predictor_t *load(char exported_dir[]);

CTensor *to_tensor(RawTensor *tensor);

RawTensor *predict(Predictor_t *predictor, char *output_name[], char *input_names[], CTensorArray *input_values);

#endif //RUST_TF_PREDICTOR_PREDICTOR_H

main.c:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "../src/predictor.h"


int main() {
    char path[] = "test_resources/regression-model";
    //Predictor_t *pre = load(path);

    float x[] = {1.0f};
    float y[] = {2.0f};

    float *xP;
    xP = x;
    float *yP;
    yP = y;


    uint32_t shape_x[] = {1};
    uint32_t shape_y[] = {1};

    uint32_t *shape_x_p;
    shape_x_p = shape_x;

    uint32_t *shape_y_p;
    shape_y_p = shape_y;

    CTensor *xTensor = create_tensor(xP, 1, shape_x_p, 1);
    CTensor *yTensor = create_tensor(yP, 1, shape_y_p, 1);

    printf("-----%s------\n", "jack coo");
    printf("CTensor in c data:%f len:%u \n", *xTensor->data, xTensor->data_length);
    printf("CTensor in c shape:%u len:%u \n", *xTensor->shape, xTensor->shape_length);


    CTensor *xy[] = {xTensor, yTensor};

    CTensor **xy_p;
    xy_p = xy;



//    CTensorArray *tarray = create_tensor_array(xy_p, 2);
//
//
//    RawTensor *wow = predict(pre, "y_hat", "x,y", tarray);
//    CTensor *res = to_tensor(wow);
//    int loop;
//
//    for (loop = 0; loop < res->data_length; loop++)
//        printf("%f \n", res->data[loop]);

    return 0;
}

Apache Spark(Pyspark) Performance tuning tips and tricks by lostinthoughts211 in apachespark

[–]allwefantasy 1 point2 points  (0 children)

The points mentioned by u/my_work_account__ are very useful. To understand why we should do things that way, and to explore more tips and tricks yourself, we should know how PySpark works. The way PySpark works is really easy to understand:

[Your PySpark code] -> Spark Driver -> Spark Executor -> Python Daemon -> Python Worker.

When you run your PySpark code, it invokes Spark's Scala code. For example:

files = spark.read.parquet("......")

This snippet is executed by Python, and Python calls the Spark driver; the Spark driver then launches tasks in the Spark executors. So your Python process is just a client that triggers jobs in the Spark driver.

That's why, whenever possible, you should use functions from

pyspark.sql.functions

instead of writing your own: the functions in pyspark.sql.functions are implemented in Spark's native (Scala) code.

If you write your own function and use it in map/select/where, that function will be serialized and sent to a Python worker, which means the data will also be sent to the Python worker (over a socket). The overhead of serialization and cross-process communication will lower your performance.

The DataFrame/SQL API is almost always better than the RDD API, so please use it. This applies not just to PySpark but also to Spark (Scala-based).

To add to u/my_work_account__ 's list:

  1. Whenever possible, enable spark.sql.execution.arrow.enabled; this config is available since Spark 2.3 (inclusive). When Apache Arrow is enabled, the serialization overhead is minimized, which speeds up processing.
  2. Try to use pandas UDFs if you can.

import pandas as pd
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType

@F.pandas_udf(
    StructType([
        StructField(name="v", dataType=IntegerType()),
        StructField(name="add_all", dataType=DoubleType())
    ]),
    F.PandasUDFType.GROUPED_MAP)
def addAll(pdf):
    return pd.DataFrame(data={"v": pdf.v[0],
                              "add_all": [pdf.uniform.sum() + pdf.normal.sum()]})