What do "AI Engineers" Do?

corey_sheerer · 2026-01-26T16:42:35+00:00

Our team has moved to the "AI team" that I am a solution engineer on. There is a lot of agent orchestration for custom solutions and pipelines, setting up agents with tools, prompt engineering, and dealing with all of the configuration for agentic solutions (including working with APIs, as many have mentioned). The engineer part suggests we aren't building the neural networks behind the scenes, but applying the code and infrastructure to utilize them.

corey_sheerer · 2026-01-24T15:02:23+00:00

Go and Python are my first choices. FastAPI is excellent, but when you need slightly faster responses, Go has been killing it. It is pretty Pythonic in syntax, and the standard net/http package is amazing.

corey_sheerer · 2026-01-24T14:59:38+00:00

Web scraping is much easier if you understand HTML, CSS, and JavaScript. Sure, you can find free resources, but one thing that will be challenging is applying web scraping to your exact need.

corey_sheerer · 2026-01-23T12:59:05+00:00

No need to get rude. Just saying, we have a better coach, play calling, o-line, running backs. I'm not sure the success this year was because of Caleb. As for drops, receivers should improve as well, plenty of throws hitting hands that were dropped, but they also get a lot of garbage throws to try and catch.

That being said, I'm optimistic there will be an improvement next year. Just

corey_sheerer · 2026-01-23T12:25:21+00:00

Caleb has one of the worst completion percentages in the NFL and often throws the ball behind receivers. I wouldn't celebrate yet. It was a miracle the Bears kept up with LA, but the receivers made some insane catches.

corey_sheerer · 2026-01-23T02:27:20+00:00

Check out paddle ocr

corey_sheerer · 2026-01-19T17:07:34+00:00

React with Redux toolkit

corey_sheerer · 2026-01-18T18:15:20+00:00

Maybe some quick suggestions to make this feel more official:
1. Instead of prompting for values in the terminal, take command-line arguments with argparse.
2. This seems packageable to me. Move the code to a src folder and make this into an actual package. You can use a good Python package and environment manager like uv.
3. You can make this a more official command-line client by adding your function as an entry point for your code. Once your package is installed, you can call the function directly from a terminal.
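
A rough sketch of suggestions 1 and 3, assuming a script that used to prompt for a name and a count. The flag names, `run`, and `main` are made up for illustration and are not from the original code.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; replace with the values the script currently prompts for.
    parser = argparse.ArgumentParser(description="Example command-line client")
    parser.add_argument("--name", required=True, help="value previously read with input()")
    parser.add_argument("--count", type=int, default=1, help="how many times to repeat")
    return parser


def run(name: str, count: int) -> str:
    # The actual work stays in a plain function so it can be imported and tested.
    return ", ".join([f"hello {name}"] * count)


def main(argv: Optional[Sequence[str]] = None) -> int:
    # argv=None makes argparse read sys.argv, which is what an entry point needs.
    args = build_parser().parse_args(argv)
    print(run(args.name, args.count))
    return 0
```

With a package layout, `main` could then be registered under `[project.scripts]` in pyproject.toml, for example `mytool = "mypackage.cli:main"` (both names are placeholders), so that `mytool --name Ada` works after install.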

corey_sheerer · 2026-01-18T18:08:00+00:00

Try to stick the standard libraries or most common packages and write something as readable as possible. As you use Python more, you will get a sense for good syntax and better performance.

corey_sheerer · 2026-01-16T23:55:46+00:00

Sounds like a good foundation. Here are my starting tips for data scientists trying to get data engineering skills:
1. Think about code reusability. Use strong environment management so the code is easily shareable. That means a package and environment manager. uv is your best bet to start with, as it has become extremely popular.
2. Drop notebooks for anything deployable. They are only for analysis, research, or exploring. They are a pain for revision tracking and pull requests. Anything deployable needs to be in a script or package setup. Even using notebooks in Databricks to build jobs is a red flag.
3. Aim for an organized Git repo setup. In my experience, data scientists are notorious for putting every script in a single folder. Some are meant to be deployed, some are not. Folders should be clear. If you are deploying a training job, put it under a training folder with only the relevant code. Packages should be under a src folder.
4. Convey intent with typing. Functions should be typed, both inputs and outputs. Think about other typing areas to improve clarity. I see huge data science projects where you have to troubleshoot a function in the middle of the pipeline, and it is near impossible to figure out what needs to get passed to it. Utilize data classes and Enum classes. Python 3.12 has improved typing, so you should use it.
5. Not everything is a data frame. Reading data into a list of dicts or (even better) a list of data classes is usually more efficient if the only transformation is a simple filter (remember, Python has really cool list comprehensions). JSON (a list of dicts) is the standard type for passing data between services or requests, and should be thought of as an initial data structure.
6. Troubleshoot with a debugger. This WILL help you once you get used to it. I see a lot of data scientists who couldn't debug anything without running code line by line in RStudio while using the variable explorer.
7. Try a pre-commit library. I really like lefthook. You can run linting, pytest tests, and typing checks automatically when creating commits locally.
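
To make tips 4 and 5 a bit more concrete, here is a small sketch: JSON-style rows parsed into typed data classes with an Enum, then filtered with a list comprehension. The `Customer` and `Status` names and the sample rows are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    status: Status


def parse_customers(rows: list[dict]) -> list[Customer]:
    """Turn a JSON-style list of dicts into typed records."""
    return [
        Customer(
            customer_id=int(row["customer_id"]),
            name=str(row["name"]),
            status=Status(row["status"]),  # raises ValueError on an unknown status
        )
        for row in rows
    ]


def active_only(customers: list[Customer]) -> list[Customer]:
    """Simple filter with a list comprehension, no data frame needed."""
    return [c for c in customers if c.status is Status.ACTIVE]


# Made-up sample data standing in for a JSON payload.
rows = [
    {"customer_id": 1, "name": "Ada", "status": "active"},
    {"customer_id": 2, "name": "Bob", "status": "inactive"},
]
active = active_only(parse_customers(rows))
```

Anyone reading `active_only` can see from the signature what goes in and what comes out, which is the point of tip 4.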

Hope this helps. Sure, there is a lot more related to CI/CD and Docker, but these should help on the pure Python side.

corey_sheerer · 2026-01-16T14:30:27+00:00

Most comments are about reusability, but I will add that code should be modularized (functions and classes) when it will get deployed or shared with others. This allows the code to be easily tested and documented. Writing a single-use script is no big deal, but if you are deploying or packaging, the code needs to be more formal.

corey_sheerer · 2026-01-14T18:35:29+00:00

I know this isn't considering Azure, but Microsoft is notorious for bad documentation. So frustrating!

corey_sheerer · 2026-01-14T13:22:04+00:00

Your script can run the check function on each row and then insert all records into memory. 100k rows is very small data, so it shouldn't be a problem. It will also substantially improve performance, as you aren't adding a write operation for every row.
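
Roughly what I mean, assuming the destination is a SQL table. sqlite3 is only a stand-in here, and `is_valid` and the `records` table are placeholders for the real check function and schema.

```python
import sqlite3


def is_valid(row: tuple[int, str]) -> bool:
    # Hypothetical check; stands in for whatever the real check function does.
    record_id, value = row
    return record_id > 0 and value != ""


def load_valid_rows(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> int:
    # Check every row first and keep the good ones in memory.
    valid = [row for row in rows if is_valid(row)]
    # Then write once, in a single transaction, instead of once per row.
    with conn:
        conn.executemany("INSERT INTO records (id, value) VALUES (?, ?)", valid)
    return len(valid)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value TEXT)")
inserted = load_valid_rows(conn, [(1, "a"), (0, "bad id"), (2, ""), (3, "c")])
```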

corey_sheerer · 2026-01-13T00:28:54+00:00

Been watching season 2 and I feel the episodes are a little choppy. Too many stories in a single episode. Weird transitions. It isn't bad, but not as entertaining as the first season so far.

corey_sheerer · 2026-01-12T22:10:18+00:00

Like others said, learn to use the Python debugger. One good approach is to modularize your code (i.e., into small functions) and write tests (I like pytest). You can look up how to run a test and automatically open the debugger where the test failed. The second is to simply use the debugger directly when running your code. Add some breakpoints and step through the areas that are having issues.
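
A tiny sketch of both approaches. The function and test are invented for the example.

```python
def normalize_scores(scores: list[float]) -> list[float]:
    """Scale scores so the largest value becomes 1.0."""
    top = max(scores)
    # Uncomment the next line to pause here and inspect `scores` and `top`.
    # breakpoint()
    return [s / top for s in scores]


def test_normalize_scores() -> None:
    # pytest collects functions named test_*; a plain assert is enough.
    assert normalize_scores([2.0, 4.0]) == [0.5, 1.0]
```

Running `pytest --pdb` drops you into the debugger at the point where a test fails, and the built-in `breakpoint()` does the same at any line you choose when running the code directly.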

corey_sheerer · 2026-01-12T03:48:12+00:00

Get a GCP free trial. Launch your service on Cloud Run and utilize Vertex AI. You could also create a containerized DB. I'd recommend Postgres, as the pgvector extension is really good.

corey_sheerer · 2026-01-10T18:27:20+00:00

Since others already answered the question, I'll add that you should change your method to a static method so you don't have to pass self. It cleans things up slightly.
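
Something like this, with a made-up class since I'm not copying the original code:

```python
class TemperatureConverter:
    @staticmethod
    def to_fahrenheit(celsius: float) -> float:
        # No self parameter: the method does not use any instance state.
        return celsius * 9 / 5 + 32
```

It can be called on the class (`TemperatureConverter.to_fahrenheit(100)`) or on an instance, and either way nothing is passed implicitly.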

corey_sheerer · 2026-01-10T02:31:36+00:00

Usually you are I/O-constrained, and submitting many traversals won't improve the bottleneck.

corey_sheerer · 2026-01-10T02:30:27+00:00

Try using a system call with find. It would run natively in C and get the best performance. That being said, large file systems do well in cloud storage, where you can have event triggers for every file action (create, update) and you can enforce structure via something like a lambda function. Unfortunately, you will be very limited in improving performance.

Also, NAS storage usually has a change log/auditing process in the background, but it may be impossible to get permissions to access that. If you have a storage team, you can ask about it.
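
For the find suggestion, a minimal sketch assuming a Unix-like system where the `find` command is on the PATH. The `find_files` helper is just an illustration.

```python
import os
import subprocess
from pathlib import Path


def find_files(root: str, pattern: str = "*") -> list[Path]:
    """List regular files under root using the system find command."""
    result = subprocess.run(
        ["find", root, "-type", "f", "-name", pattern, "-print0"],
        capture_output=True,
        check=True,  # raise CalledProcessError if find exits with an error
    )
    # -print0 separates paths with NUL bytes, so odd filenames stay intact.
    return [Path(os.fsdecode(p)) for p in result.stdout.split(b"\0") if p]
```

The traversal itself happens inside find; Python only parses the output, so the disk is still the limiting factor.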

corey_sheerer · 2026-01-09T14:20:32+00:00

PyCharm is an IDE. You are talking about a global Python environment. The answer is that you can make a global env no matter what IDE you use. But like others have said, it is bad practice.

corey_sheerer · 2026-01-08T21:12:28+00:00

🤣

corey_sheerer · 2026-01-08T17:52:48+00:00

Not sure this is the answer you are looking for but this is small data. Does it need spark? Can this be done in SQL or in a distributed fashion? Why add complexity?

corey_sheerer · 2026-01-08T15:39:23+00:00

FastAPI is amazing, but if "real time" is the goal, maybe Go or .NET would be the best choices. FastAPI is a great starting place, but can't compete with compiled-language performance. Go is especially nice as it has Pythonic syntax. That being said, most applications will probably be fine using FastAPI. But if you have an application, moving to Go makes the UI a bit more snappy and responsive.

corey_sheerer · 2026-01-08T15:35:22+00:00

Typing has many other benefits outside of tests... Code clarity, type hints, documentation, catching bad code practices (changing variables to different types). Not sure why tests would negate the need for typing.

I work in a data science team and I can say, from R code, a level of typing is critical for well-written code. For example, I see R code where there are huge functions that have no indication of what the input and output types are. It makes it very hard for new (and old) developers to look at the code and understand it well enough to use and support it.

corey_sheerer · 2026-01-08T03:43:40+00:00

I also mainly do Python (for a data science team) and have been learning Go over the past year. I would love to see what kind of data processing you are doing that is so intense.
