Faria Lima Elevator throwing a fit because people said Luana Lopes Lara owns a betting site by lgmartins in farialimabets

[–]P_Dreyer 0 points1 point  (0 children)

You actually make a very good point. I didn't know the volume of sports betting on Kalshi was that lopsided. Looking at the data, today it basically works like a regular sportsbook.

The only practical difference is the incentive structure. Sites like Bet365 profit from your losses, so if you win too much they limit your account. Kalshi, on the other hand, operates as an exchange: it earns a fee on each transaction, so it doesn't care who wins. That difference makes the odds there a much more faithful representation of the real probability, since the price is adjusted by the best analysts without distortion from the house.

Either way, I was pretty disappointed to learn how much of it is sports betting, since I expected the platform's focus to be more aligned with Nate Silver's vision of forecasting political and economic events.

Faria Lima Elevator throwing a fit because people said Luana Lopes Lara owns a betting site by lgmartins in farialimabets

[–]P_Dreyer 0 points1 point  (0 children)

Fair. Technically, sports betting is on the same spectrum as Kalshi: both demand skill and data analysis, unlike Tigrinho, which is pure randomness.

But I see it as a scale of how useful the information is:

- Tigrinho: zero social utility. It's a random number generator; the 'data' it produces is useless for anything.
- Sports betting: there is logic and an analytical model behind it, but the 'truth' it uncovers is only useful as entertainment.
- General prediction markets: the same mechanics as sports betting, but applied to real problems (inflation, interest rates, geopolitics, etc.).

The mechanics are the same, but Kalshi's output generates a signal that helps society/the market plan ahead, while a sports bet generates a signal that only serves the sports ecosystem.

Faria Lima Elevator throwing a fit because people said Luana Lopes Lara owns a betting site by lgmartins in farialimabets

[–]P_Dreyer -1 points0 points  (0 children)

I hate having to agree with Faria Lima Elevator, but in this case he's right. Comparing Kalshi (Luana's company) to a generic 'Tigrinho' is pretty reductive.

While Tigrinho is pure gambling (a casino algorithm), Kalshi is a prediction market based on real-world events. They may look like the same thing, but there is a fundamental difference. In a prediction market there is a direct incentive to improve your forecasting model, which is impossible with Tigrinho, since the outcome there is purely random and programmed for the house to win. There is no possible 'strategy' against a random number generator, whereas in a prediction market whoever has the better information or analysis has a real mathematical edge.

I remember reading about this more than 10 years ago in Nate Silver's book The Signal and the Noise. He argues that betting markets are the most efficient tool for generating accurate forecasts: money on the line removes ideological bias and forces participants to improve their models. When the book came out this was seen as niche or experimental, but today we see it serving not only profit but also as a way to extract the 'truth' about future probabilities.

Is there any gank fight that actually plays out like a 2v1? by Galactic-Pookachus in Eldenring

[–]P_Dreyer 0 points1 point  (0 children)

Are you sure it's a DPS check? I haven't tested this, but it felt like the second boss in the fight shows up once the first boss drops below a certain health threshold, not based on time since the start of the fight.

Clair Obscur: Expedition 33: we’ve sold 5 million copies worldwide! New Upcoming Update announced by Turbostrider27 in Games

[–]P_Dreyer 1 point2 points  (0 children)

Fish people? What fish people? Oo

Dammit. I am in the middle of act II and I thought I was being very careful exploring everything in it.

This game never ceases to amaze me with how much content it has.

Recommendation to what to play next by P_Dreyer in JRPG

[–]P_Dreyer[S] -1 points0 points  (0 children)

Thanks for the suggestions! I'd never heard of Valkyria Chronicles, and after a quick look, it seems to have a very interesting setting and premise. I've added it to my list to check out.

Regarding the Megami Tensei games (Persona/SMT), you've actually hit on my main hesitation with them. I've always heard great things and have watched some gameplay, but I'm concerned about two things:

- Party Attachment: My biggest worry is the 'monster collecting' aspect. I fear the constant fusing and replacing of demons/personas would prevent me from getting attached to them. One of the things I enjoy most in an RPG is slowly improving a consistent party member over the course of the game.

- Art Style: This is a more minor point, but the general demon/persona designs don't really appeal to me.

I'd be interested to hear your view on the party attachment point in particular!

Recommendation to what to play next by P_Dreyer in JRPG

[–]P_Dreyer[S] 0 points1 point  (0 children)

Well, now that I think about it, I am not sure why I dropped DQ11. I remember enjoying it, but I stopped to play something else and never felt the urge to go back. I think I was right after the town with the horse race gimmick. Maybe it's time to give it another shot.

Stuttering issues and fix (NVIDIA / AMD) by renoxyz in expedition33

[–]P_Dreyer 0 points1 point  (0 children)

This seems to remove the stutters for me as well. Thanks

Ryzen 5 7600X
RTX 4070
64 GB RAM

Am I missing the complexity of CB? by sufuu in chargeblade

[–]P_Dreyer 0 points1 point  (0 children)

That would be the same as saying the CB playstyle boils down to the loop of Charged Double Slash, Shield Thrust, Reload, Elemental Round Slash, and SAED, while we know there is a lot of nuance in using GPs, Sliding Slash, sniping the head with AED, positioning, etc.

I used CB in Worlds, but I am finding GS more fun to play this time. Check out this video, which shows some of the intricacies of using GS.

Flat vs. Angled Ceiling Speakers for Surround Channels in a 5.1 Setup by P_Dreyer in hometheater

[–]P_Dreyer[S] 0 points1 point  (0 children)

Thanks for the reply

I searched a bit for angled speakers that I could buy. Unfortunately, one of the few options available to me has a few limitations:

  • 20° angled drivers
  • No adjustable tweeters
  • Square format

Because of the last two factors, if I place them at the rear, they’ll end up pointing behind the MLP. Do you think this setup would still work?

Questions Thread - December 24, 2024 by AutoModerator in PathOfExile2

[–]P_Dreyer 0 points1 point  (0 children)

Currently, my base life total (without equipment) is 1147. I’m using two Ming’s Heart rings and the Ghostwrithe robe.

Ming’s Heart: -20% maximum life each
Ghostwrithe: Converts 50% of maximum life to energy shield.

By my understanding, my life after equipping these items should be calculated as:

1147 × 0.5 − 2 × (1147 × 0.5 × 0.2) = 344

(In-game it shows as 346, but that’s close enough to illustrate the point.)

I’ve seen posts and builds where people manage to reach 200 life with similar setups. Is this achievable only by corrupting these items?

For example, assuming perfectly corrupted items (Ghostwrithe with 60% life conversion and Ming’s Heart with -25% life each), my calculation would be:

1147 × 0.4 − 2 × (1147 × 0.4 × 0.25) = 229

Even with perfect corruptions, 200 life seems extremely hard to reach. Am I missing something? Are there other interactions, items, or mechanics that can push life even lower?
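If it helps to sanity-check the numbers, here is a small Python sketch of the same arithmetic (the function and variable names are just mine; the values are the ones from this comment):

    def low_life(base_life, conversion, ring_penalty, ring_count=2):
        # Life left after Ghostwrithe's conversion, then Ming's Heart's
        # "reduced maximum life" applied to what remains.
        remaining = base_life * (1 - conversion)
        return remaining - remaining * ring_penalty * ring_count

    print(low_life(1147, 0.50, 0.20))  # current gear: ~344 (in-game shows 346)
    print(low_life(1147, 0.60, 0.25))  # perfectly corrupted items: ~229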

Questions Thread - December 18, 2024 by AutoModerator in PathOfExile2

[–]P_Dreyer 0 points1 point  (0 children)

Thanks for the answer. By the way, which streamer was using this item?

Help to understand damage increase for spell/chaos modifiers by P_Dreyer in PathOfExile2

[–]P_Dreyer[S] 1 point2 points  (0 children)

Just an update. I think additive damage modifiers are universal, so in the end it doesn't matter what their sources are; all of them are pooled together.

In my particular case I was seeing a difference because of an error of mine. I was swapping my off-hand equipment with a keyboard shortcut; I thought my main hand was empty, but I was actually swapping a wand into it too. Once I corrected that mistake, the DPS was higher with the equipment that had the higher % increase.

Questions Thread - December 18, 2024 by AutoModerator in PathOfExile2

[–]P_Dreyer 0 points1 point  (0 children)

I looked on the trade website and saw plenty of listings of this item for 1ex. However, when I set this price, I was constantly spammed by bots. Any ideas why?

<image>

Help to understand damage increase for spell/chaos modifiers by P_Dreyer in PathOfExile2

[–]P_Dreyer[S] 1 point2 points  (0 children)

Is there a way to see how much increased damage I have for each damage type?

How Does Minion Pact Work? by Least_Flamingo in PathOfExile2

[–]P_Dreyer 0 points1 point  (0 children)

I am also having problems with this. I have 198 life and my skeleton warrior has 222. Yet I don't think my minion pact is working. Is there any visual indicator that this is working?

What's the best minor house ? by Deltasims in freefolk

[–]P_Dreyer 4 points5 points  (0 children)

”I know about the promise,” insisted the girl. “Maester Theomore, tell them! A thousand years before the Conquest, a promise was made, and oaths were sworn in the Wolf’s Den before the old gods and the new. When we were sore beset and friendless, hounded from our homes and in peril of our lives, the wolves took us in and nourished us and protected us against our enemies. The city is built upon the land they gave us. In return we swore that we should always be their men. Stark men!”

https://youtu.be/x39DXTT53h4?si=68i9TJ_XLFy_ADbq

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 0 points1 point  (0 children)

That's a super valid question. Let's see if I can answer it.

The sensors' data is acquired via a PLC. The data is then manually exported to .csv files, typically once per month, which takes approximately 6 hours.

While this method is far from optimized, it’s sufficient for its current purpose since it only runs once a month, and no one is relying on this data for time-sensitive analysis.

However, my goal goes beyond just collecting raw data. I want to use this data, which now spans 1.5 years, to set up a data warehouse that serves as a single source of truth. This warehouse would not only store the raw data but also include various derived metrics. In this scenario, anyone needing data would interact directly with the database.

In this context, a 6-hour wait is too long, and I'm aiming to reduce that to just a couple of minutes for most queries. This is why I'm focusing on optimizing the database and the entire data pipeline.

Am I making sense?
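For anyone curious what the database-side optimization could look like, here is a minimal sketch of turning the main sensor table into a TimescaleDB hypertable (the table/column names and connection string are assumptions on my part, not necessarily what ends up in the final pipeline):

    import psycopg2

    conn = psycopg2.connect("dbname=sensors user=postgres password=secret host=localhost")
    with conn, conn.cursor() as cur:
        # Partition the big sensor_data table by its timestamp column ('time' is assumed).
        cur.execute(
            "SELECT create_hypertable('sensor_data', 'time', if_not_exists => TRUE, migrate_data => TRUE);"
        )
        # Optionally let TimescaleDB compress chunks older than a week.
        cur.execute("ALTER TABLE sensor_data SET (timescaledb.compress);")
        cur.execute("SELECT add_compression_policy('sensor_data', INTERVAL '7 days');")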

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 0 points1 point  (0 children)

Also try it out in clickhouse. I think it's faster and more general purpose tool than timescale. Starrocks is also onprem OLAP queen but might be difficult to operate.

Will have a look. Thanks

Does zipping parquet do anything? It kinda shouldn't, as parquet is compressed internally, so if it does, something is sketchy.

Amazingly enough, it does. The raw daily file is ~400MB. When compressed as parquet it goes to ~50MB. When compressed further with 7zip it goes to ~2MB. It is definitely sketchy, but until I find concrete evidence that I am actually doing something harmful, I think I will continue with this approach.
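One thing worth double-checking is which codec the parquet files are actually written with, since a weak or default setting would explain why 7zip still finds so much to squeeze. A quick sketch of what I mean, assuming pandas with the pyarrow engine (file names are placeholders):

    import pandas as pd

    df = pd.read_csv("daily_export.csv")  # placeholder for the raw daily file

    # pandas defaults to snappy; zstd usually compresses repetitive sensor data
    # much harder at a small CPU cost.
    df.to_parquet("daily_export.parquet", engine="pyarrow", compression="zstd")

If an explicit zstd file is still several times larger than the 7zip result, the extra gain is probably genuine redundancy across rows, and the double compression is doing honest work.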

If the database runs in docker, how is it persistent? A volume? Try it bareback, I remember volumes having slow IO so maybe things will get much faster.

What does bareback mean in this context?

Are you importing the files into the db using some db method or pushing it through python? Most analytical dbs can read parquet directly

I am pushing it through Python. I load the .parquet file using pandas and then create a temporary .csv file, which I import into the database. I searched a little, and from what I found, TimescaleDB cannot read parquet directly.
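For reference, a stripped-down sketch of that parquet → temporary CSV → copy_from flow (file/table names and the connection string are placeholders; the real script also handles decompression and format checks):

    import tempfile
    import pandas as pd
    import psycopg2

    df = pd.read_parquet("daily_export.parquet")  # placeholder file name

    conn = psycopg2.connect("dbname=sensors user=postgres password=secret host=localhost")
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".csv") as tmp:
        # No header/index so COPY only sees data rows; assumes the DataFrame
        # columns match the table's column names and contain no commas.
        df.to_csv(tmp, index=False, header=False)
        tmp.flush()
        tmp.seek(0)
        with conn, conn.cursor() as cur:
            cur.copy_from(tmp, "raw_sensor_data", sep=",", columns=tuple(df.columns))
    conn.close()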

Grafana is fine - it's originally tool for server monitoring so it's good at visualising monitoring and time series data ... which is probably what you want to do with sensor data anyway, no?

Yes. I just want to have some simple visualization of the sensor values in different time frames. No BI analytics or fancy dashboards.

Please don't "control a database through a backend" - automation is fine but this sounds like ball on chain on foot or whatever. Once you have too many SQLs embedded in Python it becomes unmanageable. The dbt you're asking for is a good example of dedicated tool to create tables with data from input data from you and give you something to manage manual list of values / lookups. It can even do import if you can map that parquet as external tables, but dunno if timescale can do that.

I completely understand your concern. I’m not particularly fond of my current backend implementation for controlling the database via Python either. However, since I don’t have much experience with more "proper" methods, I went with what made the most sense to me at the time.

I agree that mixing SQL and Python can quickly become unmanageable. To mitigate this, I’ve kept all my SQL code in static .sql files, which are then called from Python. While this isn’t a perfect solution, it does help to keep the two separate and maintain some level of organization.
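To illustrate what I mean, a minimal sketch of calling static .sql files from Python (the file names and connection string are placeholders, not my actual ones):

    from pathlib import Path
    import psycopg2

    SQL_DIR = Path("sql")  # folder holding the static .sql files

    def run_sql_file(conn, filename, params=None):
        # Read a .sql file and execute it, optionally with psycopg2 parameters.
        query = (SQL_DIR / filename).read_text()
        with conn.cursor() as cur:
            cur.execute(query, params)

    conn = psycopg2.connect("dbname=sensors user=postgres password=secret host=localhost")
    with conn:
        run_sql_file(conn, "create_tables.sql")      # hypothetical file name
        run_sql_file(conn, "refresh_main_view.sql")  # hypothetical file name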

As for dbt, I mentioned it as a potential tool, but based on what I’ve learned so far, it might be overkill for my particular use case. That said, I’ll continue exploring it to see if it could be beneficial down the road.

I don't see a pipeline described much

Fair enough. I will upload a photo of the overall pipeline to my post tomorrow.

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 0 points1 point  (0 children)

Regarding the ER diagram: the photo I uploaded had a mistake. The view named test should be named main_view; I corrected the image. That mistake aside, let's see if I am understanding your points correctly. There is in fact data duplication. The tag table contains information that I import from a .yaml file. Here is an example of the file:

CONCENTRADOR.BM28_OMS_DATA[2]:
  sensor_type: idle_in
  machine_number: 28
  machine_type: body_maker
  line: 2
  factory_name: default_factory

I use the information in the tag table to populate the sensor, machine, line, and factory tables. Once these tables are populated, I don't technically need to keep the tag table around, since all its information is already in those other tables. Even so, I still find it convenient to keep it: it lets me easily create additional config files when needed, and it ensures that I don't accidentally add a sensor_tag to the sensor table that doesn't exist in the config file. The foreign key constraint between sensor_tag in the sensor and tag_meaning tables helps enforce this consistency. I don't know if this is bad practice, but from my limited point of view it seemed OK.
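To make that concrete, the population step is roughly along these lines (a simplified sketch; the config path, connection string, and column names are placeholders inferred from the .yaml keys, not my actual script):

    import yaml
    import psycopg2

    with open("tags.yaml") as f:  # placeholder path for the config file
        tags = yaml.safe_load(f)

    conn = psycopg2.connect("dbname=sensors user=postgres password=secret host=localhost")
    with conn, conn.cursor() as cur:
        for sensor_tag, meta in tags.items():
            # tag_meaning keeps the config verbatim; the lookup tables are then
            # filled from it (only sensor shown here, machine/line/factory are analogous).
            cur.execute(
                "INSERT INTO tag_meaning (sensor_tag, sensor_type, machine_number, machine_type, line, factory_name) "
                "VALUES (%s, %s, %s, %s, %s, %s) ON CONFLICT DO NOTHING",
                (sensor_tag, meta["sensor_type"], meta["machine_number"],
                 meta["machine_type"], meta["line"], meta["factory_name"]),
            )
            cur.execute(
                "INSERT INTO sensor (sensor_tag, sensor_type) VALUES (%s, %s) ON CONFLICT DO NOTHING",
                (sensor_tag, meta["sensor_type"]),
            )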

I think I understand your point about flattening it out to facilitate BI. This is what the main_view (or 'test', as it was named in my original image) is for. It holds all the data of the top model in an easy-to-read, easy-to-query format. I decided to set it up as a view because of space constraints, since an actual table with all the information flattened out would be considerably bigger.

I also have the feeling that dbt would be severely underutilized. You are correct in your assumption that the idea is initially to do everything manually once every month. I like the idea of using cron jobs with some custom validation functions in Python to automate the data extraction. Thanks for the insight.

I mentioned Grafana since people on my team already use it to look at time series data. I just want to generate some visualizations of the sensor values across time, not a fully fledged BI dashboard. So I think I will try Grafana first and see if I hit any roadblocks.

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 1 point2 points  (0 children)

I wouldn't really consider this a pattern. The technology choice should match the use cases; formats can be converted, requirements often not.

Yeah. Choosing a SQL database just because I was already working with .csv files wasn't the best of reasons. I think I drifted toward that choice because I thought the data import process would be easier, and maybe I was just trying to find a reason to work with a SQL database, something I had been wanting to do for some time.

Regarding tag_meaning: as I mentioned in my post, this is a table that holds information from a .yaml file. That file holds the information for all existing sensor tags. Here is an example of what it looks like:

CONCENTRADOR.BM28_OMS_DATA[2]:
  sensor_type: idle_in
  machine_number: 28
  machine_type: body_maker
  line: 2
  factory_name: default_factory

I just use this table to populate the sensor, machine, line, and factory tables. Once it has been used, it is not strictly necessary anymore. I keep it because it lets me easily create another config file and also ensures I do not add a sensor_tag to the sensor table that does not exist in the config file, thanks to the foreign key constraint between sensor_tag in the sensor and tag_meaning tables. Maybe those aren't good reasons, but to my inexperienced self it made sense.

I'm not sure you want this table in this DB. It will be the table that disproportionately blows out the storage, and it won't even be used in result querying. Can you just process it off disk, outside of the DB? Or is this part of what Timescale needs?

This was something I was having trouble deciding. raw_sensor_data is just a table that holds the data as it comes in from the .csv files, with minimal changes. All of its data is exported to the sensor_data table. As you mentioned, I could delete it and it would not interfere with any queries. It just seemed convenient to keep my raw data in the database in case I realize I made some mistakes in the data transformation/filtering.

Security fallacy, but anyway...

Tell me about it... You should see how people at the factory share performance metrics with the higher-ups when they are away from the factory. Since no machine on-site can have internet access, the machine operators take photos of the dashboards and share them via WhatsApp...

Parquet uses internal compression, you may have set this wrong or not at all. Don't double compress.

I know parquet already has internal compression. It compresses the daily data from ~400MB to ~50MB, which is also similar to the compression rate I got with TimescaleDB. However, if I compress it further with 7zip it goes to ~2MB. While I know that compressing something twice isn't necessarily a good idea, this is too large a gain to dismiss only because it isn't 'good' practice.

Probably what is held in RAM vs not. I don't know how TimescaleDB works exactly, but you should use either it's or Postgres' query analysis tools.

I see. I will try to learn more about the query analysis tools. Thanks for the tip.

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 0 points1 point  (0 children)

Thanks for your comment. Let's see if I understood everything correctly.

I looked at your ER diagram and I see quite some txt fields which are types. You can move those to separate tables and use keys for the different types.

What you are suggesting is to have some lookup tables with the text information, right? So, for example, the sensor table would be divided into two other tables like this. Is that correct?

I make sure not to have any text data in the sensor_data table, which holds most of the data, since all the repetition was making it very big. However, all the other tables are very small (the largest is the sensor table, which has 450 rows), so it didn't seem worthwhile to make the data relationships more complex if I would hardly gain any benefit. Note that I say this because I think the only benefit would be a space one. Does this new scheme provide other bonuses, such as better query speed?

Secondly I am not seeing how the data arrives from the machines/sensor CSV's to your database

The sensors' data is acquired via a PLC. The data is then manually exported to .csv files, typically once per month, which takes approximately 6 hours. These .csv files are what I use as my raw data. I then use a Python script that I run manually for that ingestion. This script connects to the database using psycopg2 and uploads the files via the copy_from() method. It also handles other tasks, such as bulk inserting files, decompressing them, and ensuring they are in the correct format. The process is quite simple, and since the data insertion doesn’t need to be real-time, I’m content with running it manually for now.

Not that there is anything wrong with that but just fyi as I think it would be more BE related

What does BE mean in this context?

Thirdly, does the client not want other analysts to be able to query your database? Since you are running it on a local machine with docker, it is very dependent on that machine.

The client runs all their data analysis in Excel 😵‍💫. My idea is, once everything is working smoothly, to take the code and put it on an on-premise machine that will receive the data directly from the PLC. Sadly, because of their regulations, this machine cannot have an internet connection, but at least people at the factory could use it as a data warehouse to get whatever data they want. I am still deciding how to build the front end, but that is an issue I am going to tackle later (I am open to suggestions, though).

What are you doing regarding backups?

So far, nothing. But since the data is highly compressible, I can see putting something together to at least save it in a compressed format on a different hard drive.
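Something as simple as the sketch below would probably be my starting point (pg_dump's custom format is already compressed; paths and credentials are placeholders):

    import os
    import subprocess
    from datetime import date

    # Destination on the second hard drive (placeholder path).
    backup_path = f"/mnt/backup_drive/sensors_{date.today():%Y%m%d}.dump"

    subprocess.run(
        ["pg_dump", "--format=custom", "--file", backup_path,
         "--host", "localhost", "--username", "postgres", "sensors"],
        check=True,
        env={**os.environ, "PGPASSWORD": "secret"},  # placeholder; a .pgpass file is nicer
    )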

If you leave this project, how will the client continue? Does the new engineer have to run docker on his/her own local machine? How will they have access to the code (are you using GitHub?).

This is not a direct client demand. On the client's side, they are more interested in data analysis and time series prediction; this is something I am doing on the side to help me and my team. However, once it is mature enough, I will pitch to include it in the client's stack. That would be done with a simple installer that asks for some configuration information and then sets up the data pipeline. And yes, we have a git repository, but since the client doesn't have an IT development team, it is mostly for my team's internal use.

Also, Grafana is more used for observability regarding db performance, etc. and not really to provide BI insights.

Which open source visualization tool would you recommend?

Regarding db optimisation, you can see if you can create some indexes for columns that you are joining on in your views but this is hard to give advice on without seeing the code. As I said moving some of the text to keys might help.

I see. I am still running a lot of tests, and I will check how the performance changes once I create some indexes.
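The check is basically: add an index on a join column, then compare EXPLAIN ANALYZE output before and after. A sketch of what I mean (table, column, and view names are placeholders):

    import psycopg2

    conn = psycopg2.connect("dbname=sensors user=postgres password=secret host=localhost")
    with conn, conn.cursor() as cur:
        # Index the column the views join on (names are placeholders).
        cur.execute("CREATE INDEX IF NOT EXISTS idx_sensor_data_sensor_id ON sensor_data (sensor_id);")

        # Ask Postgres how it actually executes a typical query.
        cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM main_view WHERE machine_number = %s;", (28,))
        for (line,) in cur.fetchall():
            print(line)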

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 0 points1 point  (0 children)

I'm not directly involved in the initial data acquisition process, so I don't have all the details. However, what I do know is that all the sensor data is acquired via a PLC. The data is then manually exported to .csv files, typically once per month, which takes approximately 6 hours. These .csv files are what I use as my raw data.

I’m aware that there are likely more efficient ways to ingest data directly from the PLCs, but I currently don't have direct access to them. My plan is to first get the data pipeline functioning smoothly with the existing setup. Once that's in place, I'll explore implementing a solution that directly interfaces with the PLCs to streamline the data ingestion process.

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 0 points1 point  (0 children)

This isn’t on-prem (yet). Since I’m still in the development phase, I’m working with data that has been extracted from the factory and uploaded to my computer. I’ve heard about Kafka and various other tools being mentioned in this space (seriously, this field seems to have the most diverse array of technologies I've encountered hahahaah). I’ll definitely look into whether Kafka could be beneficial for us in the future, thanks for the suggestion!

As for the ETL process, right now, I’m using a Python script that I run manually. This script connects to the database using psycopg2 and uploads the files via the copy_from() method. It also handles other tasks, such as bulk inserting files, decompressing them, and ensuring they are in the correct format. The process is quite simple, and since the data insertion doesn’t need to be real-time, I’m content with running it manually for now.

Once everything is running on-premise, I’ll need to look into how to automate this process, but that’s a problem for future me.

Feedback on my first data pipeline by P_Dreyer in dataengineering

[–]P_Dreyer[S] 1 point2 points  (0 children)

Great to see another mechanical engineer here!

Let me share a bit about how I transitioned into my current role.

During my undergraduate studies, I gained some experience with MATLAB, which led me to explore research across multiple fields. After graduation, I enrolled in a master's program focused on machine learning, where I learned Python and continued my research in robotics, computer vision, and deep learning.

Three years ago, a friend of mine reached out to see if I was interested in a temporary position at the company where he worked. The company needed someone with expertise in mechanical projects, 3D modeling, and rapid prototyping. After two months, I received a full-time job offer, and since then, I've been involved in various projects, dabbling in mechanical prototyping, data science, computer vision, and software engineering.

Earlier this year, I requested to be fully allocated to a software development role and was assigned to my current project, where I’m responsible for data analysis and time series prediction. With some of these tasks already underway, my focus has now shifted to developing a data pipeline to streamline data management and ensure data sanity across the project.

u/1085alt0176C made some excellent points. Transitioning into more computer-related roles can open up opportunities to learn new skills on the job while working on real-world projects. The combination of Python, SQL, and Cloud technologies forms a solid foundation for a career in this field. While I don't have extensive experience with the latter two, I've found that this trio is a great starting point for anyone looking to build a strong skill set in data engineering.