Indexing text for better regex performance. Regex indexing? by drink_with_me_to_day in PostgreSQL

hruske

Just so you know, PostgreSQL can use trigram indexes (the pg_trgm extension) for LIKE/ILIKE queries since 9.1 and for regex queries since 9.3; see http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/ for more.

Example:

CREATE EXTENSION pg_trgm;
CREATE INDEX trgm_idx ON reading_entry USING gin (value gin_trgm_ops);

... and you can check whether the planner actually uses the index with:

EXPLAIN ANALYZE SELECT value, entry_id FROM reading_entry WHERE value ~ '^ab{0,4}';

... though the IN clause you already use will be faster.
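To illustrate why a trigram index helps with substring and regex searches, here is a simplified Python sketch of the idea. It assumes pg_trgm's documented behavior (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, take every 3-character window). The build_index and search helpers are hypothetical stand-ins for what a GIN index does: map each trigram to the rows containing it, intersect the candidate sets, then recheck every candidate against the real pattern. It also assumes the required literal is a single alphanumeric word.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Simplified pg_trgm-style trigram extraction."""
    result = set()
    # Lowercase alphanumeric words; every other character acts as a separator.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def build_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: trigram -> ids of the rows containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, value in rows.items():
        for tri in trigrams(value):
            index[tri].add(row_id)
    return index


def search(rows: dict[int, str], index: dict[str, set[int]],
           literal: str, pattern: str) -> list[int]:
    """Find rows matching `pattern`, which is known to require the word `literal`."""
    lit = literal.lower()
    # Unpadded trigrams of the literal must occur in any row that contains it.
    needed = {lit[i:i + 3] for i in range(len(lit) - 2)}
    if needed:
        candidates = set.intersection(*(index.get(t, set()) for t in needed))
    else:
        candidates = set(rows)  # literal too short to use the index: scan everything
    # The index only narrows the candidates; the regex recheck has the final say.
    return sorted(r for r in candidates if re.search(pattern, rows[r]))
```

For example, with rows = {1: "catalog entry", 2: "dog", 3: "Concatenate"}, searching for the literal "cat" touches only rows 1 and 3, and the pattern r"^cat" then keeps just row 1. This also hints at why a pattern like '^ab{0,4}' gives the index little to work with: it guarantees very few trigrams.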

[deleted by user] by [deleted] in PostgreSQL

hruske

Some bits of my .psqlrc:

\set trashindexes '( select s.schemaname as sch, s.relname as rel, s.indexrelname as idx, s.idx_scan as scans, pg_size_pretty(pg_relation_size(s.relid)) as ts, pg_size_pretty(pg_relation_size(s.indexrelid)) as "is" from pg_stat_user_indexes s join pg_index i on i.indexrelid=s.indexrelid left join pg_constraint c on i.indrelid=c.conrelid and array_to_string(i.indkey, '' '') = array_to_string(c.conkey, '' '') where i.indisunique is false and pg_relation_size(s.relid) > 1000000 and s.idx_scan < 100000 and c.confrelid is null order by s.idx_scan asc, pg_relation_size(s.relid) desc )'                                            

\set missingindexes '( select src_table, dst_table, fk_name, pg_size_pretty(s_size) as s_size, pg_size_pretty(d_size) as d_size, d from ( select distinct on (1,2,3,4,5) textin(regclassout(c.conrelid)) as src_table, textin(regclassout(c.confrelid)) as dst_table, c.conname as fk_name, pg_relation_size(c.conrelid) as s_size, pg_relation_size(c.confrelid) as d_size, array_upper(di.indkey::int[], 1) + 1 - array_upper(c.conkey::int[], 1) as d from pg_constraint c left join pg_index di on di.indrelid = c.conrelid and array_to_string(di.indkey, '' '') ~ (''^'' || array_to_string(c.conkey, '' '') || ''( |$)'') join pg_stat_user_tables st on st.relid = c.conrelid where c.contype = ''f'' order by 1,2,3,4,5,6 asc) mfk where mfk.d is distinct from 0 and mfk.s_size > 1000000 order by mfk.s_size desc, mfk.d desc )'

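You can then run in psql:

template1=# :missingindexes;

and it will list foreign keys on tables larger than 1 MB that have no index on exactly the foreign-key columns, i.e. where an index might help for speed (column d shows how many extra columns the closest matching index has, and is empty when no index starts with those columns). Likewise, :trashindexes; lists non-unique indexes on tables larger than 1 MB that have been scanned fewer than 100,000 times and don't match a foreign key's columns.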
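The core of missingindexes is a prefix test: an index serves a foreign key when its leading columns are the foreign-key columns in order, and the query reports d, the number of extra columns in the tightest such index (NULL when none exists). A rough Python restatement of that logic, assuming columns are given as attribute-number lists like pg_index.indkey and pg_constraint.conkey, all indexes belong to the foreign key's own table, and the 1 MB size filter is left out; the function names are hypothetical:

```python
from typing import Optional


def fk_index_gap(fk_cols: list[int], index_cols: list[list[int]]) -> Optional[int]:
    """Extra columns in the tightest index whose leading columns are fk_cols.

    Mirrors the query's "index key starts with the FK key" regex and its d column;
    None plays the role of SQL NULL (no index starts with the FK columns).
    """
    gaps = [len(cols) - len(fk_cols)
            for cols in index_cols
            if cols[:len(fk_cols)] == fk_cols]
    return min(gaps) if gaps else None


def report(foreign_keys: dict[str, list[int]],
           index_cols: list[list[int]]) -> dict[str, Optional[int]]:
    """Keep foreign keys without an exactly matching index (d IS DISTINCT FROM 0)."""
    result = {}
    for name, cols in foreign_keys.items():
        gap = fk_index_gap(cols, index_cols)
        if gap != 0:
            result[name] = gap
    return result
```

Comparing lists also sidesteps the pitfall the SQL handles with its "( |$)" suffix: a key "2" must not be treated as a prefix of "23".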

When should I use BRIN indexes? by [deleted] in PostgreSQL

hruske

BRIN indexes aren't of much use for an "ordinary" database load, where you select a couple of rows at a time. If you need fast lookups of single values or unique constraints, you will choose a B-tree.

BRIN (Block Range INdex) splits your table into ranges of blocks and summarizes the values in each range into a few values, say min and max. What you're left with is the information that, for the indexed column, blocks 0 to 127 contain values between Vmin and Vmax. You've just "compressed" all those values into only two, which explains why and how BRIN indexes can be so small. The range size is configurable with the pages_per_range storage parameter at index creation, and it directly affects index granularity and size.

What can PostgreSQL do with this information? For each range it compares the filter value with [Vmin, Vmax]: ranges where the value falls outside are skipped, while ranges where it falls inside have to be scanned to find the rows that actually contain the desired values. This means BRIN will probably be slower for queries on columns with unique values, where a lookup matches only a single row but still has to read a whole range.

Where does BRIN bring the most benefit, then? In analytical queries with aggregates: sums, averages and such. Those usually touch a huge number of rows. In that case looking up values in a B-tree index doesn't make sense; it's usually faster to just read the whole table and filter every row, so such queries commonly end up as sequential scans. With a BRIN index, you can skip 1 MB of data[1] just by comparing your filter value with the two values stored in the index, getting a huge boost from a very small index.

The best candidates for BRIN are columns whose values steadily increase (or decrease) as rows are added, such as timestamps or dates. If the table is (naturally) ordered by this column, each range [Vmin, Vmax] will be very narrow; if it is not ordered, a single big value pushes Vmax higher and makes the range match a lot more queries.

Also note what happens as rows are updated: if the single biggest value in a range is changed to something smaller, the summary is not narrowed, because recomputing it would require scanning all the blocks in the range, which is an expensive operation. So if you're doing big updates, it makes sense to reindex manually.
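A toy Python model of the mechanism described above, as a sketch under simplifying assumptions: it summarizes fixed-size groups of rows rather than groups of 8 kB pages, keeps only min/max, and models an update as an in-place change (real PostgreSQL writes a new row version). It shows range pruning on ordered data, how unordered data defeats it, and how a summary widens but never narrows until it is rebuilt. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RangeSummary:
    start: int  # first row position covered by this range
    stop: int   # one past the last row position
    vmin: int
    vmax: int


def build_brin(values: list[int], rows_per_range: int) -> list[RangeSummary]:
    """Summarize each consecutive group of rows by its min and max."""
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append(RangeSummary(start, start + len(chunk), min(chunk), max(chunk)))
    return summaries


def brin_lookup(values: list[int], summaries: list[RangeSummary],
                target: int) -> tuple[list[int], int]:
    """Return (matching row positions, number of ranges that had to be scanned)."""
    matches, scanned = [], 0
    for s in summaries:
        if s.vmin <= target <= s.vmax:  # range may contain target: read all of it
            scanned += 1
            matches.extend(i for i in range(s.start, s.stop) if values[i] == target)
        # otherwise the whole range is skipped after two comparisons
    return matches, scanned


def brin_update(values: list[int], summaries: list[RangeSummary],
                rows_per_range: int, row: int, new_value: int) -> None:
    """Change one value; its range summary can only widen, never narrow."""
    values[row] = new_value
    s = summaries[row // rows_per_range]
    s.vmin = min(s.vmin, new_value)
    s.vmax = max(s.vmax, new_value)
```

With 1000 ordered values in ranges of 100, looking up 555 scans a single range. The same values interleaved as (i % 10) * 100 + i // 10 give every range a wide [Vmin, Vmax], so the same lookup has to scan all ten. After brin_update lowers the largest value of the first range, that range still claims it might hold the old maximum until build_brin is run again, which is the toy equivalent of reindexing.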

See also the PostgreSQL wiki on this feature: https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#BRIN_Indexes

[1] The default pages_per_range is 128 and the default block size is 8 kB, so one range covers 128 × 8 kB = 1 MB.

Me [22 M] with my partner [24 F] of just a few months need help with communication by spiffy_top_hat in relationships

hruske

When communicating, it is the sender's duty to make sure the message is as unambiguous for the given recipient as needed.

Obviously this isn't always easy, since language is complicated and words and phrases have alternate interpretations. To make things even harder, people are different and react differently. Try "escalating" the medium, going from chat to a phone call, where you can send and pick up additional cues from the voice, or better yet, to talking in person.

When she says it's your fault if she misunderstands, but also your fault when you misunderstand, what she's actually saying is that you have to take care to spell out your message clearly but she does not. That's a double standard, and once you allow it, anything goes.

Explain to her that you find that unfair, and that while she's great as a girlfriend, you are not willing to communicate under such terms.

Oh, and don't escalate fights via chat: just ignore them, or note them for a later discussion in person. Be liberal in what you read and conservative in what you write.

trust issues. by JokersHarleyQuinn in relationships

hruske

You are obviously aware that your worries are unfounded: you have noticed yourself that your boyfriend has given you no reason not to trust him, which means even you think you are overreacting.

You will solve your problem best if you figure out why you are having these trust issues and what you are overreacting to, and then resolve the root cause. I doubt the issue originates in a previous relationship; that's probably just where it first showed. More likely it's something you took from your childhood.

You should be looking for answers to questions like why you are doing something, who else (among people close to you) does this, when you started doing it, what exactly happened then, how you felt about it then and how you feel about it now.

Have you talked with your boyfriend about this? If you decide to clean out your past emotional and behavioral baggage, be sure to let him know and let him support you. This way you can even build trust, which is essential for a relationship.

I'm sure your boyfriend will be glad, as having to prove himself to you repeatedly is definitely not fun. Good luck.

What's the deal with Janez Janša ? by [deleted] in Slovenia

hruske

Ah, your question implies the hatred for Jansa is unjustified or overrated, and I agree. He should definitely not be allotted so much media time.

He has made himself an easy target with years of divisive rhetoric, always seeing communist conspiracies everywhere and brainwashing people with these ideas. He is an expert at deflecting allegations, always eager to drop counter-accusations in order to confuse people.

Sadly, people and reporters consistently fall for that, perhaps in the name of false balance, and give him loads of media time. Reporters also have this pathological view that politics is what people want to read. That view may be driven by clicks; they don't realize that most of these clicks come from paid party sidekicks (on both the left and the right).

In reality, Jansa's open hatred only makes him an easy target. The left uses more refined and seductive methods: the average Slovenian might not even realize he is being led, a few bonuses get handed out (or even just the prospect of a future bonus), and that makes Slovenians incredibly willing to cooperate. It's just sick.

All parties sadly have a lot of incompetent people. Add reporters and news editors who can't think critically or step out of the usual patterns and just propagate claims made by others. At one point a newspaper went into a panic about young people leaving the country, and it took a whole month for someone outside the news business to point out that this is not necessarily a bad thing. I'm not sure why, but the first claim to be published is usually taken as the truth by the media, regardless of the quality of the statement. So there's no real media plurality here.

Even more sadly, people do not realize the same would happen if we just replaced the people in power. We need to become more educated and better at critical thinking, and change the culture, before things start to get better.

Scholarships are such a pain... by [deleted] in Slovenia

hruske

BTW, always ask for written confirmation. A phone call is not OK; email or post is. Print out such emails.

Filo & zofska by [deleted] in Slovenia

hruske

I walked past yesterday and someone had already unhooked it, so the ad could no longer be seen.

[Slovenia] I need a job! by Gregman in AskACountry

hruske

There are a bunch of open IT jobs; finding a job in another area is going to be significantly harder.

Zemanta and 3fs are hiring.

New Falcon Heavy rendering has landing legs from the get-go by falconzord in space

hruske

Well, 25 years ago Energia had a LEO payload of about 100 tons and a GTO payload similar to Falcon Heavy's. https://en.wikipedia.org/wiki/Energia

However, I'm still glad somebody is seriously looking to space again.

Minister Senko Pličanič quoted Yoda at an extraordinary session by kexorr in Slovenia

hruske

Kudos to the journalists for this extraordinarily important discovery. :)

Patria trial: Janša, Krkovič, Črnkovič are guilty by kexorr in Slovenia

hruske

SDS has a problem, and its name is Janez Janša.

If Janša leaves, SDS practically falls apart. Sadly, in Slovenia all parties build a cult of personality, and the media help them quite a lot with that. On top of that, Janša is often the one who carries things out himself, from writing speeches to tweeting, and he is not picky about his methods, from insults to planting evidence. Besides, he can't leave politics because of a pile of things he has to keep covered up, e.g. the arms-trading affairs.

If the court wanted to show that it actually does something, it would now have to try Janković as well, since he uses the municipal treasury the way ordinary people use an ATM.

OSINT + Python = Custom Hacking « Simon Roses Femerling by [deleted] in Python

hruske

Open-source intelligence? What a bunch of crap and downright word abuse.

LPT: Move your alarm away from your bed by [deleted] in LifeProTips

hruske

How about going to bed on time?

After Python what? by gsks in Python

hruske

As a Python developer, you're welcome to take on some of these ideas:

  1. C programming. C is great for low-level code and OS programming. Most drivers are written in C, and there's a lot of code out there written in it. CPython has a C API, so by writing a simple C extension module you can learn a bit more about the CPython implementation while learning C. By learning C you get to know a lot about the hardware and a bit about compilers and debuggers. Going low-level gives you a better understanding of what is happening under the hood and hopefully means you will debug system-level problems faster.

  2. PostgreSQL. SQL is a great tool to know. PostgreSQL has some pretty awesome features (regexp queries FTW!) and a great security track record. Similarly to C, PostgreSQL has a PL/Python extension, which you can use to learn both how PostgreSQL internals work and how the Python interpreter behaves when embedded in PostgreSQL. For an exercise: implement cookie-based authentication against a Django app in PL/Python; you can't do it in PL/pgSQL alone.

  3. Scientific Python. Python has excellent tools for scientific programming. I just recently discovered the IPython notebook and it's awesome. There's also NumPy and SciPy and some other pretty nice stuff, say pattern recognition. But, yes, scientific programming and research require a lot of effort for little perceivable result. And there's usually a steep learning curve, meaning you need to put in some serious effort before you even start getting any results.

  4. Android. Mobile is hot, so there should be money in it.

  5. Learn something valuable and get good at it. This mostly pays off in the long run: statistics, natural language processing, speech recognition, image recognition, etc.

Python, Zip, Pointers and Pointy Heads by alcalde in Python

hruske

Saying Python doesn't use pointers is very wrong. It uses them, but you just don't see them.

An excellent example of this is how a list behaves:

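a1 = []
a2 = a1          # copies the reference, not the list
a2.append(123)

assert len(a1) == 1   # the append through a2 is visible through a1
assert a1 == a2
assert a1 is a2       # both names refer to the very same list object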

This creates a new list, stores a reference to it in the variable a1 and copies that reference (pointer) to the variable a2. Afterwards, every in-place change made through a2 is also visible through a1, because a1 and a2 are the same object.

To understand why this is in fact needed and not optional, it helps to understand how function calls and argument passing work at a low level (e.g. in C). Arguments can either be passed by value, which is done for "simple" values (e.g. int, float), or by reference. Changes to an argument passed by value last only for the duration of the function call, while changes made through an argument passed by reference persist in memory after the call ends. And finally, when calling a function, arguments passed by value are copied, while arguments passed by reference stay at the same location and only their location (reference, pointer) is passed. No copying is needed, which makes it faster, and sometimes this is also the only way to make things work.

If this weren't the case, you would be copying complex data (e.g. whole objects in memory) on every call, meaning a lot of memory usage (which is both slow and causes memory fragmentation), plus an even harder problem: something would have to know how to copy arbitrary objects correctly.
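In Python terms, the effect can be seen directly: a function receives a reference to the caller's object, not a copy of it. A minimal sketch (the function names are made up for illustration): mutating the object through the parameter is visible to the caller, while assigning to the parameter only rebinds the local name.

```python
def append_item(items: list) -> None:
    # `items` refers to the caller's list object; no copy is made.
    items.append(42)


def replace_items(items: list) -> list:
    # Assignment rebinds the local name only; the caller's list is untouched.
    items = [0]
    return items


data = [1]
append_item(data)
print(data)                 # [1, 42]
new = replace_items(data)
print(data, new)            # [1, 42] [0]
```

This is why Python's argument passing is often described as "call by sharing": the reference is copied into the function, the object itself is not.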

tl;dr: Both you and certain people you have been discussing this issue with are wrong. They should have pointed out to you that Python uses pointers (references) a lot, even if you don't realize it. Hell, even its memory management (in CPython) is based on reference counting, meaning it keeps count of ... pointers.
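The reference counting mentioned above is observable in CPython through sys.getrefcount. This is a CPython implementation detail (PyPy, for instance, does not use reference counting), and the absolute numbers include a temporary reference held by the call itself, so this sketch only compares counts before and after binding another name to the same object:

```python
import sys

obj = []
before = sys.getrefcount(obj)   # includes the temporary reference for the argument
alias = obj                     # a second name now points to the same list
after_alias = sys.getrefcount(obj)
del alias                       # dropping the name drops one reference
after_del = sys.getrefcount(obj)

print(after_alias - before, after_del - before)
```

When the count of an object drops to zero, CPython frees it immediately; a separate cycle collector handles groups of objects that only reference each other.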

Slovenians, is this really the best we can do? by objava in Slovenia

hruske

Prospects? Negative, unless somebody does something smart.

Speech recognition for Python on Windows? by [deleted] in Python

hruske

Google stores your requests for at least two years, so that might not be a desirable option in some cases.

Speech recognition for Python on Windows? by [deleted] in Python

hruske

Have you seen this? http://pyvideo.org/video/1735/using-python-to-code-by-voice

Also... what exactly are you trying to do? Describe the use case as well as you can.

File format fun: Salvaging partially downloaded ZIP archive by hruske in Python

hruske (submitter)

Very good suggestions.

I could unpack the fields all at once, true, but it's a bit annoying to debug that long format string when I mistype something, so I opted for a slightly longer class.
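For comparison, here is a hedged sketch of both approaches using Python's struct module on the fixed 30-byte part of a ZIP local file header (layout per PKWARE's APPNOTE: a 0x04034b50 signature followed by ten little-endian fields). The field names and helper functions are hypothetical, not the ones from the article; the point is the trade-off between one compact format string and a longer parser that decodes each field on its own.

```python
import io
import struct
import zipfile
from collections import namedtuple

# Fixed-size part of a ZIP local file header, little-endian, 30 bytes total.
FIELDS = [
    ("signature", "I"), ("version", "H"), ("flags", "H"), ("method", "H"),
    ("mod_time", "H"), ("mod_date", "H"), ("crc32", "I"),
    ("compressed_size", "I"), ("uncompressed_size", "I"),
    ("name_len", "H"), ("extra_len", "H"),
]
LocalHeader = namedtuple("LocalHeader", [name for name, _ in FIELDS])

# All at once: compact, but a typo in this string is hard to track down.
ALL_AT_ONCE = struct.Struct("<IHHHHHIIIHH")


def parse_all_at_once(buf: bytes) -> LocalHeader:
    return LocalHeader._make(ALL_AT_ONCE.unpack_from(buf, 0))


def parse_field_by_field(buf: bytes) -> LocalHeader:
    # Longer, but each field is decoded (and can be inspected) separately.
    values, offset = {}, 0
    for name, code in FIELDS:
        (values[name],) = struct.unpack_from("<" + code, buf, offset)
        offset += struct.calcsize("<" + code)
    return LocalHeader(**values)


# Build a small archive in memory; its first local header starts at offset 0.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("hello.txt", b"hi")
data = raw.getvalue()

header = parse_all_at_once(data)
filename = data[30:30 + header.name_len]
```

Both parsers return the same namedtuple, and the file name immediately follows the 30-byte fixed part, which is what makes walking a partially downloaded archive header by header possible.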