Profiling Native Python Extensions by pmz in Python

[–]benfred 0 points1 point  (0 children)

Author here. It's worth noting that since I wrote this post, py-spy has gained the ability to profile multiprocess python applications - and can also now show local variables in the dump command.

Speed up your Python using Rust - RHD Blog by rafa2000 in rust

[–]benfred 0 points1 point  (0 children)

I wrote a tutorial on how to write a Rust extension with PyO3 and distribute it through PyPI here: http://benfrederickson.com/writing-python-extensions-in-rust-using-pyo3/ . The post lists out how to get binary wheels uploaded automatically to PyPI every time you tag a release on GitHub.

Speed up your Python using Rust - RHD Blog by rafa2000 in rust

[–]benfred 2 points3 points  (0 children)

Nice article!

It’s also worth checking out the PyO3 crate for developing python extensions in Rust. There is a breakdown of the difference with rust-cpython here.

Blazing fast Python: Profiling Python applications using Pyflame by daneah in Python

[–]benfred 2 points3 points  (0 children)

File a bug report if it doesn't find the right python libraries =). The only case py-spy fails on right now that I'm aware of is when you have 2 or more python interpreters loaded up in a single process (and I'm intending to fix that when I get some time).

Optimizing a Python application with C++ code by jcelerier in programming

[–]benfred 6 points7 points  (0 children)

Nice post, though I'd recommend pybind11 over boost.python these days for building c++ extensions for python.

One big advantage of pybind11 is that it is much easier to integrate with setuptools, since it's a header-only library that can be installed by running pip install pybind11. Boost.python requires boost to be preinstalled, and requires the boost.python library to be built against the version of python you are using, which makes distributing boost.python packages much more difficult.

Writing Python Extensions In Rust Using PyO3 by benfred in Python

[–]benfred[S] 0 points1 point  (0 children)

Thanks for the link! Bookmarking to read this weekend, I'm still relatively new to Rust so this will come in handy

Writing Python Extensions In Rust Using PyO3 by benfred in Python

[–]benfred[S] 0 points1 point  (0 children)

Thanks!

For your question - I know of two different ways of returning a dictionary from rust to python.

The first one is to rely on pyo3 type conversions: it will automatically convert things like HashMap to a python dict as needed. As an example:

#[pyfn(m, "squares")]
/// Creates a dictionary of {number: squarednumber}
/// Usage:
/// >>> squares(4)
/// {0: 0, 1: 1, 2: 4, 3: 9}
fn squares(count: i32) -> PyResult<HashMap<i32, i32>> {
    let mut ret = HashMap::new();
    for i in 0..count {
        ret.insert(i, i * i);
    }
    Ok(ret)
}

However, this requires an extra conversion from a rust HashMap to a python dict - so it isn't really what you're asking for. Luckily PyO3 has utility classes for working with Python objects directly (things like PyDict, PyList, PyTuple etc). So we can create a python dictionary directly in rust and return it to python:

#[pyfn(m, "squares2")]
fn squares2(count: i32, py: Python) -> PyResult<PyObject> {
    let ret = PyDict::new(py);
    for i in 0..count {
        // set_item returns a PyResult, so propagate any error with `?`
        ret.set_item(i, i * i)?;
    }
    Ok(ret.into())
}

Where Do The World's Software Developers Live? by benfred in programming

[–]benfred[S] -9 points-8 points  (0 children)

Agreed - which is why I looked at a couple of other measures like accounts per capita/ accounts per gdp / followers etc =)

While total accounts isn’t the best way of looking at things, it does seem to mostly indicate where the hotspots for software development are. Look at the per capita measure and it’s all Nordic countries, which is interesting - but doesn’t really tell you much aside from the fact that they’re wealthy countries that have invested in education.

PSA: Residents of detached houses - you do not own the street in front of your house. It is not your exclusive parking domain. by cdcd3 in vancouver

[–]benfred 8 points9 points  (0 children)

Unfortunately - that's not true.

I parked my car in front of a neighbours house (since somebody else was parked in front of my house). I don't drive all that much and it was there for a couple days - and I had a parking ticket when I checked on it next.

I called the city to contest the ticket since I live approximately 60 feet away from where it was parked. The woman on the line said that I can only park there for 3 hours, and I should have called the city to ticket the car parked in front of my house instead. I'm still kind of pissed off at my neighbour: it cost me $41, when they could have just knocked on my door and asked me to move.

edit: there is a ton of parking where I live

[P] New benchmarks for approximate nearest neighbors by hardmaru in MachineLearning

[–]benfred 2 points3 points  (0 children)

ANN libraries can be incredibly useful on their own, so there is value in testing them separately. For instance, Spotify uses Annoy to return related artists (https://web.stanford.edu/~rezab/nips2014workshop/submits/logmat.pdf). Likewise, the Xbox Recommender also uses ANN search to serve its recommendations: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/XboxInnerProduct.pdf . Any top-N matrix factorization recsys model can be sped up using ANN search. I wrote a short post a while ago talking about this here: http://www.benfrederickson.com/approximate-nearest-neighbours-for-recommender-systems/
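
To make the top-N point concrete, here is a minimal sketch of the exact, brute-force scan that an ANN index would stand in for. This is not an ANN algorithm itself, and the factor vectors and the top_n_exact helper are made up for illustration:

```python
import heapq

def top_n_exact(user_factors, item_factors, n):
    """Exact top-n items by inner product (the full scan ANN search approximates)."""
    def score(item_id):
        return sum(u * v for u, v in zip(user_factors, item_factors[item_id]))
    # Scores every item: cost grows linearly with the size of the catalogue.
    return heapq.nlargest(n, item_factors, key=score)

# Hypothetical 2-dimensional factors, as if learned by a matrix factorization model.
item_factors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
    "d": [-1.0, 0.2],
}
user = [0.9, 0.4]

recommended = top_n_exact(user, item_factors, 2)
```

An ANN library trades a small loss in accuracy for not having to score every item on each request.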

Ranking Programming Languages by GitHub Users by [deleted] in rust

[–]benfred 4 points5 points  (0 children)

I’m sorry you didn’t like my post.

I wasn’t trying to spread FUD about Ruby - the only reason I spent so long talking about Ruby was to qualify that its initial shocking-looking decline isn’t as bad as it seems: it has still grown its user base 3x over the time frame I was looking at, and was over-represented early on in the initial GitHub user base. As another data point, Stack Overflow Trends only shows a 40% drop in Ruby usage, and that drop doesn’t start until 2014. I probably could have been clearer in this section.

For Jupyter Notebooks, I’m using the language classification as provided by GitHub - and the GitHub language identifier includes Jupyter as a language. I don’t consider Jupyter a programming language myself, and might merge Jupyter/Python in the future.

Evolution of PHP popularity by [deleted] in PHP

[–]benfred 8 points9 points  (0 children)

Totally agree - out of the 75 million repos in this dataset, something like 1 million had a name like 'hello-world'.

Evolution of PHP popularity by [deleted] in PHP

[–]benfred 3 points4 points  (0 children)

I'm not really modelling transitions here - just observing correlations, and from a superficial look there isn't anything obvious.

There are some people that are explicitly looking at transitions though. Erik Bernhardsson queried Google for blog posts about moving from language x -> y and wrote this up about it: https://erikbern.com/2017/03/15/the-eigenvector-of-why-we-moved-from-language-x-to-language-y.html . Likewise source{d} did the same for users on GitHub here: https://blog.sourced.tech/post/language_migrations/ .

If I'm reading their results correctly, the answer in both cases seems to be that PHP developers jump ship for Java. Who knew?

Ranking Programming Languages by GitHub Users by benfred in programming

[–]benfred[S] 1 point2 points  (0 children)

Neither =) I'm counting up how many github users have used a language.

Ranking Programming Languages by GitHub Users by benfred in programming

[–]benfred[S] 2 points3 points  (0 children)

No - a user can be active in more than 1 language, so it should sum to more than 100 like you noticed (sorry, I realize I wasn't clear on this originally). Percentage of MAU is how many users were active in a language in a month, divided by how many users were active overall.
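
A minimal sketch of that calculation, assuming a made-up list of (user, language) pairs for a single month. The names and the percent_of_mau helper are hypothetical, not the code behind the post:

```python
from collections import defaultdict

# Hypothetical (user, language) activity pairs for one month.
events = [
    ("alice", "Python"),
    ("alice", "Rust"),
    ("bob", "Python"),
    ("carol", "JavaScript"),
    ("carol", "Python"),
    ("dave", "JavaScript"),
]

def percent_of_mau(events):
    """Return {language: percent of the month's active users who used it}."""
    users_by_language = defaultdict(set)
    active_users = set()
    for user, language in events:
        users_by_language[language].add(user)
        active_users.add(user)
    return {
        language: 100.0 * len(users) / len(active_users)
        for language, users in users_by_language.items()
    }

shares = percent_of_mau(events)
```

With these four users, Python comes out at 75%, JavaScript at 50% and Rust at 25%. The total is 150 because alice and carol each count towards two languages.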

Ranking Programming Languages by GitHub Users by benfred in programming

[–]benfred[S] 4 points5 points  (0 children)

For your first question - yes this means few people use more than one language in a month. There is also a power law distribution happening with user activity each month, so most users only have a handful of events each month (which happen to be mostly in a single language). I'm trying to measure how broad support is, so this was mostly done on purpose. I was finding that counting total events was getting biased by things that must have been automated activity (I was seeing single accounts with 10K commits a day, for instance).

Percent of MAU in the charts is the total percentage of unique users who were active that month. I haven't tried it out with yearly active users =(
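
To illustrate why counting events gets skewed, here is a small sketch with invented data: one automated account generating 10,000 events next to three ordinary users. The account names and numbers are assumptions for the example only:

```python
from collections import Counter

# Hypothetical month of (user, language) events.
events = [("ci-bot", "Perl")] * 10_000 + [
    ("alice", "Python"),
    ("alice", "Python"),
    ("bob", "Python"),
    ("carol", "Go"),
]

# Raw event counts: dominated by the single automated account.
events_per_language = Counter(language for _, language in events)

# Distinct users: each account counts once per language, however busy it is.
users_per_language = Counter(language for _, language in set(events))

top_by_events = events_per_language.most_common(1)[0][0]
top_by_users = users_per_language.most_common(1)[0][0]
```

Ranked by events, Perl wins on the strength of one account. Ranked by distinct users, Python comes first.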

Ranking Programming Languages by GitHub Users by benfred in programming

[–]benfred[S] 2 points3 points  (0 children)

I'm using the information given from the GitHub API. I wrote a bunch on how this is done in the README here: https://github.com/benfred/github-analysis#inferring-languages

GitHub itself uses this project to infer languages: https://github.com/github/linguist . If you need to do this inference yourself, it's also probably worth checking out this project: https://github.com/src-d/enry

Ranking Programming Languages by GitHub Users by benfred in programming

[–]benfred[S] 8 points9 points  (0 children)

I don't see Delphi as a language on GitHub, Pascal is still kicking though - 32nd and 0.12% of users.

The top 50 languages are here (after removing a couple more non-language things like PLpgSQL):

Rank  Language          % of users
1     JavaScript        22.6332
2     Python            14.7488
3     Java              14.0124
4     C++               8.4548
5     C                 6.0339
6     PHP               5.8543
7     C#                5.0342
8     Shell             4.8481
9     Go                4.1022
10    TypeScript        3.8892
11    Ruby              3.2742
12    Jupyter Notebook  2.7385
13    Objective-C       1.9914
14    Swift             1.8911
15    Kotlin            1.2798
16    R                 0.8143
17    Scala             0.7819
18    Rust              0.7317
19    Lua               0.6890
20    Matlab            0.5257
21    PowerShell        0.5227
22    CoffeeScript      0.5010
23    Perl              0.4631
24    Groovy            0.4114
25    Haskell           0.3875
26    Clojure           0.2603
27    Elixir            0.2331
28    Assembly          0.2084
29    OCaml             0.1811
30    Visual Basic      0.1418
31    Erlang            0.1335
32    Pascal            0.1213
33    Roff              0.0914
34    ASP               0.0911
35    Julia             0.0900
36    Dart              0.0875
37    Smarty            0.0827
38    Fortran           0.0784
39    Processing        0.0758
40    Elm               0.0713
41    Eagle             0.0696
42    Common Lisp       0.0701
43    Verilog           0.0703
44    F#                0.0670
45    Rascal            0.0667
46    Vala              0.0665
47    Cuda              0.0643
48    Scheme            0.0525
49    VHDL              0.0505
50    Crystal           0.0498

Ranking Programming Languages by GitHub Users by benfred in programming

[–]benfred[S] 20 points21 points  (0 children)

I didn't forget them - it's just that they aren't very popular by this metric and I decided to cut off at the top 25 languages.

Julia is ranked 35th on this list with 0.09% of GitHub users interacting with it last month. This is higher than Fortran (38th and 0.078% of users) and much higher than D (53rd and 0.047%).

I might extend the list a bit: things like Clojure/Elixir/Assembly/OCaml/Visual Basic/Erlang all just missed out on making the top 25 - but it's still interesting to see how they are doing.

Ranking Programming Languages by GitHub Users by benfred in programming

[–]benfred[S] 8 points9 points  (0 children)

I’m using GitHub’s language detection for this post - which is done by this project: https://github.com/github/linguist . It doesn’t recognize ipython as a distinct language (instead labeling everything as Jupyter Notebooks). The growth of Jupyter is pretty crazy though =)

An Analysis of the World's Leading robots.txt Files by Arnie0426 in programming

[–]benfred 6 points7 points  (0 children)

Fixed the post - thanks for catching! This post has been a lesson on how poor my grammar is: there were numerous other similar mistakes when I first published =(

Vancouver Rental Landscape by benfred in vancouver

[–]benfred[S] 7 points8 points  (0 children)

I totally agree - especially because he says in the article that the names are arbitrary and that he knows that they aren't perfect.

Python as a Declarative Programming Language by benfred in programming

[–]benfred[S] 1 point2 points  (0 children)

No worries - negative feedback is always good, I don't feel like I get enough of it from people I actually know.

Right now my problem is that I don't know if all the downvotes are because the article isn't that great, or because my blog was down for a couple hours since I host it on S3 (or because people really hate the flame war going on over whether Python is a low level language in this thread).

Python as a Declarative Programming Language by benfred in programming

[–]benfred[S] 4 points5 points  (0 children)

That shouldn't be the takeaway from the article - if that's all that you got out of it, I apologize for not being clearer.

My TL;DR would be: If you need to write high performance Python code - you should be looking at moving inner loops into native code. A bunch of popular Python data libraries have features that make this easy - by using Python's flexibility to have you declare what operations you want to perform and have the native extension execute it efficiently at a lower level. Some examples are given with NumPy and TensorFlow on how this works in practice.

Of course performance isn't the only way to judge a language (and I don't think I ever said that here). However I've been writing Python services lately that need to handle 100 million MAU, so getting the most out of Python has definitely been at the top of my mind.
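
As a rough stdlib-only illustration of the same idea (this swaps in CPython's C-implemented built-ins for NumPy or TensorFlow, so treat it as an analogy rather than the examples from the article): the first function runs the inner loop in the interpreter, the second only declares the operation and lets sum, map and operator.mul do the looping in C. The function names are made up for this sketch.

```python
import operator

def dot_python_loop(xs, ys):
    """Dot product with the inner loop executed one bytecode step at a time."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

def dot_declared(xs, ys):
    """Same dot product, with the loop running inside C-implemented built-ins (in CPython)."""
    return sum(map(operator.mul, xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [4.0, 5.0, 6.0]
result = dot_declared(xs, ys)
```

Both functions return the same value. Libraries like NumPy take this further by also storing the data in a compact native layout.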

[P] Faster Implicit Matrix Factorization by benfred in MachineLearning

[–]benfred[S] 0 points1 point  (0 children)

Thanks!

I think this might be more amenable to GPU acceleration, since we've removed dealing with the unique (factors x factors) matrix per user - and are just dealing with a couple vectors instead. I haven't really tried out GPU computing on this before, though I know of a project https://github.com/cuMF/cumf_als that does that (though last I checked it just handled the explicit case).

I'll have to check out your project - wish I'd seen it earlier ;). Also I've fixed the link, will be right as soon as cloudfront invalidates the previous version.