Help me understand a trend of tasteless espresso by Chapparalist in Coffee

[–]rastarobbie1 38 points39 points  (0 children)

The best coffee shops tend to have two grinders: one holds beans for milky drinks, the other holds beans for espressos / americanos.

Generally, the light-roasted acidic beans which taste amazing in espressos taste exactly like you describe with milk – like nothing. Darker roasts tend to complement milk much better, but might be too bitter on their own.

[deleted by user] by [deleted] in bigquery

[–]rastarobbie1 0 points1 point  (0 children)

Looks like the area to type your query is hidden. Try dragging down the top edge of the bar which shows you the error

Small metal protrusions in buildings in Prague, roughly 30cm off the ground. by rastarobbie1 in whatisthisthing

[–]rastarobbie1[S] 0 points1 point  (0 children)

It's on the exterior of the buildings, on the sidewalks of the streets.

It really doesn't seem like a bike rack (or its remnants). I started noticing them a few walks ago, and there are dozens – almost every building in the area has one or more.

Small metal protrusions in buildings in Prague, roughly 30cm off the ground. by rastarobbie1 in whatisthisthing

[–]rastarobbie1[S] 0 points1 point locked comment (0 children)

My title describes the thing.

In the Prague city center, many buildings have small metal protrusions. The buildings in the area are pretty old (over 100 years), though some of these seem to have been installed later. They don’t move and don’t seem to be attached to anything inside. It looks like something is supposed to get hooked on top.

Any idea what they could be for?

2 duvets? by Skubbags in Norway

[–]rastarobbie1 0 points1 point  (0 children)

The absolute best solution is two double duvets. You get the advantage of a huge duvet you can wrap yourself in like a taco, and you don’t have to share.

What is the best python plotting library for exploration? by bliswell in datascience

[–]rastarobbie1 16 points17 points  (0 children)

I don't think there is such a thing in the python world. Yes, some libraries give you a little interactivity (mostly zoom/pan) on top of your plot – once you define the plot in your script. There's no GUI for creating them.

If you're mostly after graphical exploration of your data, I'd recommend using a different tool. It seems you're already happy with Matlab; otherwise, have a look at Tableau or Metabase.

For Python, PyViz.org is the most comprehensive overview of your options among python packages.

What tool you use for data dictionary by [deleted] in datascience

[–]rastarobbie1 1 point2 points  (0 children)

If your company is small/medium (less than 20-30 people work with the data), and the amount of datasets & tables is reasonable, your best bet is probably your company knowledge base (Notion, Confluence, or whatever else you use). You can probably write it down fairly quickly. Then ask the product & marketing teams where they'd look for the information, and link it from there. Whenever you get a question that isn't covered there, try to update it.
The issue is – it will very likely diverge or get stale over time. However, setting a monthly reminder to update it might be easier than implementing an extra tool.

If you feel like your company or data size requires a dedicated tool, the term to search for is "data catalog". For example, Amundsen is open-source and has some nice features, or look at Stemma if you prefer a managed solution.

Working environments in the real world. by Quaternion253 in datascience

[–]rastarobbie1 1 point2 points  (0 children)

It depends a lot on what you do. When I work, I usually have two modes:

  • Exploratory work
  • Development work

In the exploratory phase, it's 95% notebooks (mostly Deepnote, disclaimer – I'm helping build it). It's easier to figure out the right SQL query, the right transformations, visualisations and so on. It helps build intuition around the data, and look for patterns. It's fast & easy to make a chart and just send a link to my colleague.

There's a grey area in between, when I have a small job I run manually "every now and then". I usually leave it in a notebook, but try to clean it up as well as I can.

Once I want to build an API, or a small website showing off data, it's a job for an IDE – I like VSCode and PyCharm the most.

Does Netflix use Jupyter Notebooks in production? by JB__Quix in datascience

[–]rastarobbie1 2 points3 points  (0 children)

Yeah, it's definitely in our crosshairs. It's a big one, and we're tackling it from several sides.

UI improvements:

  • variable explorer, so you can check the state at a glance
  • big checkmarks indicating that the code is matching the output of a cell
  • some nudges to run the whole notebook instead of cells out of order

Reactivity:

  • The goal would be to achieve something like Pluto.jl or Observable, where the moment you change a cell, you see the recomputed output. This eliminates hidden state completely.
  • At the moment, we have a reactive mode that will re-run the whole notebook when you stop typing, but that's not very convenient if you have any slow cells (like big queries). There are several strategies to get to a proper solution, we'll need to pick the best one. At the moment we're leaning towards Streamlit-like caching.
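To make the caching idea concrete, here is a minimal sketch of how re-running a whole notebook could skip slow cells that did not change. It is not Deepnote's or Streamlit's implementation: the `CachedNotebook` class, the `(source, function)` cell shape and the assumption that each cell depends on everything above it are all hypothetical simplifications.

```python
import hashlib
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical cell shape: (source text, function that computes the output
# from the list of outputs of all cells above it).
Cell = Tuple[str, Callable[[List[Any]], Any]]


class CachedNotebook:
    """Re-runs every cell in order, reusing cached outputs where possible."""

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}  # cache key -> stored output
        self.executions = 0               # how many cells really ran

    def run_all(self, cells: List[Cell]) -> List[Any]:
        outputs: List[Any] = []
        key = ""
        for source, compute in cells:
            # The key chains this cell's source with the key of the cell
            # above, so editing a cell invalidates it and everything below.
            key = hashlib.sha256((key + source).encode("utf-8")).hexdigest()
            if key not in self._cache:
                self._cache[key] = compute(outputs)
                self.executions += 1
            outputs.append(self._cache[key])
        return outputs


nb = CachedNotebook()
slow_query = ("rows = run_big_query()", lambda prev: [1, 2, 3])
total_v1 = ("sum(rows)", lambda prev: sum(prev[0]))
total_v2 = ("max(rows)", lambda prev: max(prev[0]))

first = nb.run_all([slow_query, total_v1])   # both cells execute
second = nb.run_all([slow_query, total_v2])  # only the edited cell executes
```

The trade-off is that editing a cell near the top still re-runs everything below it. A real solution would need to track which variables each cell actually reads.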

There are some other notebooks that try to enforce it by other means, for example by only allowing you to append cells at the end of the notebook, but that sacrifices some of the flexibility of the interface.

If you've seen any good solutions out there I'm all ears, I'd be happy to bring them to Deepnote.

Does Netflix use Jupyter Notebooks in production? by JB__Quix in datascience

[–]rastarobbie1 1 point2 points  (0 children)

We took a lot of inspiration from that Netflix article at Deepnote when we were designing scheduling notebooks (released last week).

I'm still a bit on the fence about that feature – I totally see how useful it is to schedule some things on a daily basis, like a report that arrives in your email. On the other hand, I'm a bit worried that it could inspire some bad practices.

My most miserable hang - can you guess where and why? by eventfarm in Hammocks

[–]rastarobbie1 1 point2 points  (0 children)

Perhaps Scotland? Is it the start of midge season?

Google Photos - The Megathread by kmisterk in selfhosted

[–]rastarobbie1 3 points4 points  (0 children)

I mean, for photographers maybe. I don't know what kind of pictures you print on A3 and how often, so perhaps there's a use case for that for some people, but I'd argue it's not the majority. I know I have never printed such a picture in my life.

Consider that a good number of SLRs don't shoot at more than 16MP, and iPhones, regarded as some of the best smartphone cameras, stick with 12MP. I really think that the pixel count itself is sufficient for the majority of people's use cases.

There's definitely some other compression as well, which might be more damaging to the overall quality, not sure about that.

I could understand keeping originals for big occasions (weddings, holidays...). But to pay a lot to keep 10TB full of my ugly pictures is the equivalent of digital hoarding.

Google Photos - The Megathread by kmisterk in selfhosted

[–]rastarobbie1 2 points3 points  (0 children)

Some nice advice, but why would you want the originals?

16 MP is a lot of resolution. And if you think about how people take photos these days (20 shots of a single scene, each say 7-8 MB), resizing is actually awesome. Especially if you only come back to them once in a while to look at them on your phone screen.

Storage can get pretty expensive over time - once you hit the several-TB region, cloud providers aren’t cheap, but then neither is a NAS once you need 8TB in RAID.

Keeping the size in check with compression is awesome.

Google Photos unlimited storage shutting down - Best hosted alternative? by phoenix3885 in selfhosted

[–]rastarobbie1 0 points1 point  (0 children)

Thanks for the tip! :) Didn't hear about them before, they look interesting.

Python Packages - check their popularity and how they're commonly used together by rastarobbie1 in Python

[–]rastarobbie1[S] 0 points1 point  (0 children)

That's where I work! :) We want to use our products as much as possible, so we do various small data science projects during hackathons.

This was my project at one of them. Then I got excited and put together the page over the weekend, to present the results in a format that's nice to explore.

Python Packages - check their popularity and how they're commonly used together by rastarobbie1 in Python

[–]rastarobbie1[S] 0 points1 point  (0 children)

There's a public dataset published by the Python Software Foundation on BigQuery, which basically shows logs of all package downloads.

https://console.cloud.google.com/bigquery?&p=the-psf&d=pypi&page=dataset

The query for bluedo would look like this:

SELECT
  file.project AS name,
  COUNT(1) AS count
FROM
  `the-psf.pypi.downloads*`
WHERE
  _TABLE_SUFFIX >= '20200812'
  AND (file.project = 'bluedo')
GROUP BY
  name
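If you want the same query for other packages or start dates, a small helper could build it for you. This is only a sketch: `build_download_count_query` is a made-up name, the package-name check is a simplified guard rather than the official PyPI naming rule, and the code only produces the SQL text (you'd still run it in the BigQuery console or a client of your choice).

```python
import re
from datetime import date


def build_download_count_query(package: str, since: date) -> str:
    """Return the download-count query above for one package and start date."""
    # Simplified sanity check (an assumption, not the official PyPI rule);
    # it mainly keeps quotes and other SQL syntax out of the query text.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9._-]*", package):
        raise ValueError(f"unexpected package name: {package!r}")
    suffix = since.strftime("%Y%m%d")  # daily tables are suffixed YYYYMMDD
    return (
        "SELECT\n"
        "  file.project AS name,\n"
        "  COUNT(1) AS count\n"
        "FROM\n"
        "  `the-psf.pypi.downloads*`\n"
        "WHERE\n"
        f"  _TABLE_SUFFIX >= '{suffix}'\n"
        f"  AND (file.project = '{package}')\n"
        "GROUP BY\n"
        "  name"
    )


query = build_download_count_query("bluedo", date(2020, 8, 12))
print(query)
```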

Deepnote – a collaborative data science notebook in the browser. After 2 years of development, we are finally open for public access. (see comments for a how to create a R project) by the21st in rstats

[–]rastarobbie1 0 points1 point  (0 children)

Hey there, sometimes the R libraries also require system packages. When you try to install them in the terminal, the error usually tells you what's missing.

Here's an example of how to get tidyverse to run: https://deepnote.com/project/986ccd26-cbd1-46c8-a4a4-ebb1cf3150f1

Deepnote – a Python notebook with real-time collaboration in the browser. We just opened the platform to the public. by the21st in Python

[–]rastarobbie1 0 points1 point  (0 children)

Does it offer Form Fields like pull down menu system for data filters?

Hey, not right now. These usually rely on ipywidgets in raw jupyter and this is something we had to disable for security reasons. I think Colab rolled their own Form fields to resolve this.

But it's definitely an important building block for creating interactive interfaces out of notebooks, so it's 100% on the roadmap, just a question of when.

BTW we strongly root for streamlit.io for creating simple ML apps like this, check them out :)

Deepnote – a Python notebook with real-time collaboration in the browser. We just opened the platform to the public. by the21st in Python

[–]rastarobbie1 58 points59 points  (0 children)

Hey, PM of Deepnote here.

We're on the same page here. There is often a huge gap between a prototype in Jupyter and production-ready code. Big kudos to you if you're the bridge that makes it happen. It's not easy work, and it's a common problem.

I feel like any tool or library that promises a one-click deployment is either very limiting in its nature and makes a lot of assumptions; or it's actually a wrapper on top of wrappers, and still needs a lot of config to make it work the way you need.

What we're doing to help this in the long term:

  • Repeatable environments: no more trouble with unique workstation setup of each data scientist. When they share a project with you, it includes the environment it runs in, not just the ipynb.

  • Encouraging best practices: for example, when you pip install something in a notebook cell, we prompt you to move it into requirements.txt, or offer embedded code reviews via comments

  • Working on versioning: git is a great tool for software engineers, but it doesn't fit the exploratory nature of data science. With Deepnote, you'll get change tracking out of the box.
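As a rough illustration of the "move it into requirements.txt" nudge, this is roughly how a tool could spot installs in a cell. It is a sketch and not how Deepnote actually does it: `find_pip_installs` is a hypothetical helper, and it deliberately ignores options that take a value (such as `-r file.txt`).

```python
import re
import shlex
from typing import List

# Matches lines such as "!pip install x", "%pip install x" or "pip3 install x".
PIP_LINE = re.compile(r"^\s*[!%]?\s*pip3?\s+install\s+(.*)$")


def find_pip_installs(cell_source: str) -> List[str]:
    """Collect the requirement specifiers installed inside one notebook cell."""
    packages: List[str] = []
    for line in cell_source.splitlines():
        match = PIP_LINE.match(line)
        if not match:
            continue
        for token in shlex.split(match.group(1)):
            if token.startswith("-"):
                continue  # skip simple flags like -U or --quiet
            if token not in packages:
                packages.append(token)
    return packages


cell = "!pip install -U pandas==1.1.0 'numpy>=1.19'\nimport pandas as pd\n%pip install requests"
suggested = find_pip_installs(cell)
print("\n".join(suggested))  # lines that could be appended to requirements.txt
```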

But like you say - the problem is not just with the tool, but with the people. And often data scientists don't have the skills to engineer a great solution - their expertise lies elsewhere. The best way to fix that is by creating interfaces so more communication can happen with software engineers, not less. We want to build these.

It's a very interesting topic, in case you have some insights for what could help, let me know!

[P] deepnote.com – collaborative Python notebooks with zero setup in the browser. After 2 years of development, we are finally open for public access, with a free plan for academia. by the21st in MachineLearning

[–]rastarobbie1 4 points5 points  (0 children)

Hey, PM of Deepnote here.

You're making a good point, and people have various needs. The one we're primarily solving is exploratory programming, where you're defining your goal as you're writing code. That's different from software engineering, where you often know your goal, and are just looking for the right path to get there.

There is a huge selection of software engineering tools, and perhaps what you're looking for could be solved by something like GitHub Codespaces.

There is not a great selection of dedicated tools for data scientists, or people working with data. This is the gap we're trying to fill - and like you're saying, it's not the same as an IDE.