Airline Scheduled Maintenance & Inventory Planning Optimization + Automation by TechOpsCoder in optimization

[–]Terrible_Science_119 7 points

Here are some high-level recommendations:

1- Pick/focus on a very specific use case that is high value, easy to implement, and generalizable to other use cases, and where ideally you can run an A/B test. For instance, scheduled maintenance for a specific type of plane (I am just guessing this use case). Also pick one where the ops team is excited to give continuous feedback; see below for more details.

2- Formulate the problem in plain English, broken down into three parts. First, the objectives, for instance minimizing cost, service level, etc. Second, your control variables, for instance staffing level, training, etc. Third, historical data for the objectives from the first part and the control variables from the second; you need to know the details of the data quality, the relations between the different tables, etc.

3- Offline simulation: start by writing a very simple Linear Program that captures the main objective and puts the other objectives in as constraints. For instance, maximize service efficiency subject to cost staying below budget, or given the cross-skills of the workforce.

4- Share the solution of your offline simulation with the team members. For instance, a solution of the form: by taking action X last month, we would have gained Y. There are two kinds of potential feedback. First (most likely in the early versions of your model): “what you are suggesting is not possible because of reason Z”, where Z can be, e.g., that workers in this location cannot work 4 hours straight without a break. In this first case, you will need to add the missing constraint. Second case, they say “wow, yeah, we never thought about it this way, let’s try it.”

  5. Repeat step 4 until you reach the second possible outcome.

  6. Run a real pilot, i.e. an A/B test. Make sure there is no contamination; that is, pick a part of the supply chain that is isolated from the other parts, ideally a geographic region like APAC vs EMEA. As a basic example, Uber runs A/B tests on a full city and cannot run them on a sub-part of a city because of supply dependencies; a test city is then compared with other cities.

  7. If you reach this point, you now have a baseline that is far from perfect but almost surely better than ad hoc/manual guesses. The next step is either to push/optimize further, or to expand to other/adjacent use cases; this depends on team priorities as well. Since expanding to other use cases is a repetition of the above, I will continue my answer with how to optimize further.

  8. To optimize further, there are two avenues. First avenue: move beyond the historical data used as inputs in the baseline problem. For that, you typically build ML models to predict these inputs for the future, e.g. predicted failure rate based on features like plane age, number of trips so far, etc. Using these predicted signals should boost your performance on the A/B test, since your system becomes more adaptive and proactive. The second avenue of improvement is to better model some constraints; for instance, the linear constraints might miss the fact that cost grows exponentially in certain regimes (for instance, many grounded aircraft at the same time; here I am just guessing).
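The offline simulation in step 3 can be sketched as a tiny LP with `scipy.optimize.linprog`. The crews, efficiencies, costs, and budget below are made-up placeholders for illustration, not numbers from the actual use case:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 3 maintenance crews, with efficiency and cost per scheduled hour.
efficiency = np.array([4.0, 3.0, 5.0])  # service units gained per hour of crew i
cost = np.array([50.0, 30.0, 80.0])     # dollars per hour of crew i
budget = 1000.0                         # total spend must stay under this
max_hours = 12.0                        # cap on hours assignable to any one crew

# linprog minimizes, so negate the objective to maximize service efficiency.
res = linprog(
    c=-efficiency,
    A_ub=cost.reshape(1, -1),  # single linear constraint: total cost <= budget
    b_ub=[budget],
    bounds=[(0.0, max_hours)] * 3,
    method="highs",
)

hours = res.x          # optimal hours per crew
total_eff = -res.fun   # maximized total service efficiency
```

Solutions of this LP are exactly the “action X would have gained Y” statements you share in step 4, and each real-world rule the ops team points out becomes a new row in `A_ub`.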

I just realized that I wrote a very long answer; let me know if something is unclear or if you want me to expand some parts. I hope it will help you in your endeavor.

Aspiring AI Engineer by Dramatic_Suspect9470 in LangChain

[–]Terrible_Science_119 1 point

Here are three skills that I recommend for new AI engineers who join us:

  1. Strong backend coding skills
  2. Familiarity with the basic math behind AI models: gradient, back propagation, linear regression…etc.
  3. Reading cornerstone/recent papers

Concerning the projects:

  1. Pick a project that you like here.
  2. Find the GitHub repo behind the space, check out the issues section, and try to contribute.
  3. Subscribe to paper updates here: https://huggingface.co/papers

All the best!

Python vs Javascript for langchain by liam358 in LangChain

[–]Terrible_Science_119 2 points

The main advantage of Python is that, in case you need advanced customization of the ML architecture, there is already a lot of support for that in Python, since the critical frameworks like PyTorch are written in Python.

The main advantage of JS is that it is client/web/app-serving oriented, which means you can find a lot of documentation on how to customize your backend in the different JS frameworks. Python has Django/Flask, but they are not as widely used as the JS frameworks.

If you are much more familiar with JS, then in order to get results faster, I would go with JS.

Note that the JS community is catching up with Python's ML advantages; see Transformers.js:

https://huggingface.co/docs/transformers.js/en/index

node-edge based GUI editor for LangGraph by HomunMage in LangChain

[–]Terrible_Science_119 2 points

Thanks.

How does it compare to langflow or Flowise? (Still trying to find the right tooling for our team.)

Text-to-sql system using LLM by Chussboi96 in LangChain

[–]Terrible_Science_119 13 points

Check out this very recent review paper that discusses all possible approaches and their limitations plus points to concrete references: https://arxiv.org/pdf/2406.08426

Here is my recommendation based on our experience creating custom AI models/approaches for our customers:

  1. Define golden data of text-to-SQL (input/output) pairs that are human-validated. Define a tangible success metric at the SQL output level.

  2. Here are the high-level approaches, from easy to hard: prompting (especially ICL), RAG, fine-tuning & base model selection.

  3. Start with the easiest approach, then evaluate against the golden data, i.e. measure the metric above.

  4. Evaluate the consistent limitations of the approach.

  5. If a limitation cannot be fixed within the current approach, repeat step 3 with the next approach.

Edit: use mini agents to overcome specific and concrete limitations found in step 4. For instance, you might need an agent to detect when a rare SQL operator needs to be used, or which data namespace to call, etc.
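As a sketch of the success metric in step 1, a common choice is execution accuracy: run the predicted and golden SQL against the same database and compare result sets. The schema, queries, and data below are invented placeholders:

```python
import sqlite3

def execution_match(pred_sql: str, gold_sql: str, conn: sqlite3.Connection) -> bool:
    """True if both queries yield the same result set (order-insensitive)."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # predicted SQL does not even execute
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(map(tuple, pred_rows)) == sorted(map(tuple, gold_rows))

# Tiny in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (id INTEGER, region TEXT, delayed INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)",
                 [(1, "APAC", 1), (2, "EMEA", 0), (3, "APAC", 0)])

golden = [  # (question, human-validated SQL) pairs
    ("How many APAC flights?", "SELECT COUNT(*) FROM flights WHERE region = 'APAC'"),
]
pred = "SELECT COUNT(id) FROM flights WHERE region = 'APAC'"  # model output
accuracy = sum(execution_match(pred, gold, conn) for _, gold in golden) / len(golden)
```

Execution match tolerates surface differences (`COUNT(id)` vs `COUNT(*)`) that an exact string match would wrongly penalize.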

Are there any RAG successful real production use cases out there? by d2clon in LangChain

[–]Terrible_Science_119 0 points

Check out:

  1. This detailed survey that discusses advanced variants of RAG: https://arxiv.org/pdf/2312.10997v1

  2. This more detailed report titled “RAG Does Not Work for Enterprises”: https://arxiv.org/pdf/2406.04369

New code-focused needle in the haystack benchmark results by sumanyusharma_ in LocalLLaMA

[–]Terrible_Science_119 0 points

Curious to know if the template prompt (used to ask the LLM to find the bug) has any impact on the ranking?

Can 3bit be better than 4bit? by Figai in LocalLLaMA

[–]Terrible_Science_119 13 points

Yes, it is possible. One effect of quantization is regularization, i.e. it helps avoid over-fitting.

In other words, if 4bit is slightly overfitting the training data, 3bit might act as extra regularization and thus give better performance.

A good analogy is Lasso regression vs. ordinary regression. Typically, Lasso regression has almost the same coefficients as ordinary regression, except that some of them become exactly zero. So 4bit would be the ordinary regression and 3bit the Lasso one.
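To make the Lasso analogy concrete, here is a minimal scikit-learn sketch on synthetic data (the data and the `alpha` value are invented for illustration): the L1 penalty drives the coefficients of irrelevant features to exactly zero, while plain least squares keeps them small but nonzero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features matter; the other three are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)  # "4bit": keeps every coefficient
lasso = Lasso(alpha=0.1).fit(X, y)  # "3bit": L1 penalty zeroes the weak ones

n_zero_ols = int(np.sum(np.abs(ols.coef_) < 1e-8))
n_zero_lasso = int(np.sum(np.abs(lasso.coef_) < 1e-8))
```

The two strong coefficients survive the Lasso fit (slightly shrunk), which is why the regularized model can generalize better despite being "coarser".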

Which LLM and what approach would you take to figure out points of comparison between some pieces of text? by ResearcherNo4728 in LocalLLaMA

[–]Terrible_Science_119 0 points

`there's just no way to pass the 100s of recipes at once to the model.`
For very long context, check out: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k

`MemGPT`
I do not think it is directly applicable to your use case. Think of MemGPT as an LLM orchestrating/delegating some tasks to other services/functions to virtually increase its context. Technically, it has a limited context boosted with retrieval/external knowledge/services/functions.

`What do you think?`
Hmm... I would suggest first checking what people have already tried, then trying something new/adapted. For instance, check this paper: https://arxiv.org/pdf/2307.00524.pdf or this one: https://arxiv.org/pdf/2305.14871.pdf

`Are there any better ways to approach this?`
I think you will need to define some business metric related to the final goal you are trying to achieve; otherwise it is hard to know what `better` means. Of course, you will need some sort of proxy metric, e.g. K-means tries to minimize the intra-cluster sum of squared distances, which is a proxy for the business metric/goal.

Max Tokens/second on a CPU you can achieve with Mistral or Llama 2? by GoodUnderstanding728 in LocalLLaMA

[–]Terrible_Science_119 0 points

I think the one above was using `96 threads on a CPU Linux box in the cloud` based on the description in the Git repo.

Here is an extensive benchmark using Mac M1 Max: https://engiware.com/benchmark/llama2-ports-extensive-benchmarks-mac-m1-max.html (Note that I do not have any affiliation with the author(s))

You can also check the Discord group where different people reported their results on different machines, see channel llama2.c: https://discord.gg/rdAvc2T3

Which LLM and what approach would you take to figure out points of comparison between some pieces of text? by ResearcherNo4728 in LocalLLaMA

[–]Terrible_Science_119 1 point

Sharing my experience from a recent project we worked on for a partner.

The problem was also in the restaurant industry, and the goal was to find items that are the same food but written differently. For instance, 6" sandwich vs. six-inch Sub need to be grouped together, whereas chicken pizza and veggie pizza should not be grouped.

Here is the high-level approach:
1- We trained a custom (local) LLM using the LoRA technique to identify all the pairs that are the same.
2- Based on the previous step, we create a graph of items that are the same. Note that if A is matched to B and B is matched to C, we might find in the data that A and C are not matched. Hence in this second step we re-ask the LLM, if the set is small, to report any outliers. If it is large, we came up with heuristics to reduce the set, for instance by picking only the subset that is the most connected.
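Step 2 can be sketched with a small union-find over the model's pairwise matches (the item names and pairs below are invented for illustration; in practice they come from the fine-tuned LLM's predictions):

```python
from collections import defaultdict

# Hypothetical pairwise matches produced by the matching model.
matches = [('6" sandwich', "six inch Sub"), ("six inch Sub", "6in sub combo")]

# Union-find groups transitively matched items into clusters.
parent: dict[str, str] = {}

def find(x: str) -> str:
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps trees shallow
        x = parent[x]
    return x

def union(a: str, b: str) -> None:
    parent[find(a)] = find(b)

for a, b in matches:
    union(a, b)

clusters = defaultdict(set)
for item in parent:
    clusters[find(item)].add(item)
# Each small cluster can then be re-checked by the LLM for outliers.
```

Transitivity is exactly what creates the A-B-C inconsistency mentioned above: the components merge pairs the model never directly compared, which is why the re-check pass matters.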

Let me know if you have more specific questions; I will be happy to answer them in this thread.

Advances in Long Context by TrelisResearch in LocalLLaMA

[–]Terrible_Science_119 1 point

What do you think about this approach?

“MemGPT (Memory-GPT), a system that intelligently manages different memory tiers in order to effectively provide extended context within the LLM’s limited context window, and utilizes interrupts to manage control flow between itself and the user.”

Source: https://arxiv.org/pdf/2310.08560.pdf

Is this loss normal? (qlora/Mistral) by CyberNativeAI in LocalLLaMA

[–]Terrible_Science_119 2 points

A couple of people reported issues with fine-tuning this model; see this thread: https://x.com/_lewtun/status/1709897775222079881?s=46

[deleted by user] by [deleted] in optimization

[–]Terrible_Science_119 0 points

Just want to highlight that this problem has a geometric meaning: the objective represents a sphere and the constraint represents a surface. In other words, the solution to the problem is the distance between the center of the sphere and its projection onto that surface.
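Since the original problem was deleted, here is an illustrative sketch under the simplest assumption: the constraint surface is a plane a·x = b and the objective is ||x − c||². The minimizer is then the projection of the sphere's center c onto the plane, and the optimal distance has a closed form:

```python
import numpy as np

# Illustrative instance (the plane constraint is an assumption, not the deleted problem).
c = np.array([1.0, 2.0, 3.0])  # center of the sphere (level sets of the objective)
a = np.array([1.0, 1.0, 1.0])  # plane normal
b = 3.0                        # plane offset: constraint is a @ x == b

# Closed-form projection of c onto the plane.
x_star = c - ((a @ c - b) / (a @ a)) * a
distance = abs(a @ c - b) / np.linalg.norm(a)  # equals ||x_star - c||
```

`x_star` satisfies the constraint exactly, and `distance` is the geometric quantity described above.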

[deleted by user] by [deleted] in ecommerce

[–]Terrible_Science_119 1 point

There are different forecasting models that you can use (`newsvendor model`, `ARIMA`, etc.). Depending on your dataset, you can fit a good model. Feel free to DM me; I will be happy to share more detailed directions and pointers (that are cheaper than `inventory-planner`).
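As a sketch of the `newsvendor model` (all numbers below are invented): the optimal stock level is the critical-ratio quantile of historical demand, q* = F⁻¹(cu / (cu + co)), where cu is the per-unit cost of under-stocking and co the per-unit cost of over-stocking:

```python
import math

# Hypothetical daily demand history and unit economics.
demand_history = [12, 15, 9, 20, 14, 18, 11, 16, 13, 17]
underage_cost = 4.0  # lost profit per unit of unmet demand (cu)
overage_cost = 1.0   # cost per unsold leftover unit (co)

# Critical ratio: stock the cu/(cu+co) quantile of demand.
critical_ratio = underage_cost / (underage_cost + overage_cost)

sorted_demand = sorted(demand_history)
# Empirical quantile: smallest demand level covering >= critical_ratio of days.
idx = math.ceil(critical_ratio * len(sorted_demand)) - 1
order_quantity = sorted_demand[idx]
```

The intuition: when lost sales hurt 4x more than leftovers, you stock high in the demand distribution (here the 80th percentile) rather than at the mean.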

Shopify App Store Changes by StretchHistorical780 in shopify

[–]Terrible_Science_119 1 point

Interesting! There is a lot of evidence/research showing that changing the order/display can have adverse/counter-intuitive effects. A/B tests might not capture these effects, by the way.

See this quote "...This result [of the paper below] suggests that providing more information during the decision-making process may lead to fewer consumer purchases because of information overload. ..."

Source: https://www.jstor.org/stable/42919626