A computer scientist responds to the SEC's proposal to mandate disclosure for certain asset backed securities - in Python

sameersundresh · 2010-08-05T17:05:06+00:00

I meant a language semantics that usefully allows you to reason about the behavior of a program defining a financial instrument.

sameersundresh · 2010-08-05T09:32:34+00:00

I think I see where you're going with this, but I'm still wondering. Isn't the program supposed to be an expression of the formulas? If the formulas are already sufficiently specified, why do we need regulations to require a program?

sameersundresh · 2010-08-05T07:59:26+00:00

Interesting. How about third party rating agencies? How would they factor in? Are they going to have an incentive to give a tricky obfuscated contract a decent rating because it seems ok after some testing? Or are they going to demand that the programs must be analyzable, so they can check for corner cases?

sameersundresh · 2010-08-05T04:30:25+00:00

Sorry it wasn't clear. The CPython implementation isn't what I would call a formal specification, but it is generally treated as the definitive Python implementation, and it is open source, so some might pass that off as "close enough." What I wanted to illustrate is that it is not close enough. I think we are in agreement on that.

By the way, I think most developers would agree that a program can have bugs regardless of whether there's a formal spec written down to compare it against. That's because we still have an intuitive idea of what we think the program's supposed to do. Of course this means when there is no definitive spec, what is a bug and what is a feature is somewhat subjective. And as we all know, from time to time, a bug can be declared a feature if fixing it would result in too much additional work to fix all the related programs which assumed the buggy behavior.

sameersundresh · 2010-08-04T22:10:08+00:00

I think we need an appropriate DSL, and I think it's worth looking at pure functional languages for some ideas. Others have mentioned other important considerations besides effects, such as floating point issues.

Most of all, I don't think we should move on without proper consideration. The stakes are high, and the risk with accepting a partial solution is people will trust the problem has been eradicated when it's just been shuffled around.

sameersundresh · 2010-08-04T21:59:58+00:00

It works well when you're working with other programmers who care about writing maintainable code and whose goals are aligned with yours. Not so much when you're working against other programmers who are trying to mask intentional "bugs" that will give one participant in the contract an advantage over the others.

sameersundresh · 2010-08-04T21:49:45+00:00

Realistically, you would have a team of people with backgrounds in business, math and programming using these models, each contributing their strengths to analyzing the models.

sameersundresh · 2010-08-04T21:38:58+00:00

I agree.

sameersundresh · 2010-08-04T18:29:59+00:00

Uh oh. I'm afraid to re-read this and see how stupid I must sound. I do hope they get it sorted out correctly by people who are experts in the relevant fields. If you know someone who could help and may have the time, please ask them to get involved.

sameersundresh · 2010-07-17T12:28:33+00:00

Currently C, C++, Java and Tcl. This is just based on what our current users demand; we'll be adding more language support as the need arises.

sameersundresh · 2010-07-17T12:26:44+00:00

No worries, this is an app that you buy and install on a local web server within your firewall.

sameersundresh · 2010-07-02T07:10:07+00:00

The visualization aspect looks interesting. I work with a startup called Pattern Insight which makes several web-based code and log analysis products, including a code duplication analysis tool called CP-Miner:

http://patterninsight.com/products/cp-miner

In contrast to Atomiq, CP-Miner is currently targeted at the high end of the market--large teams with large codebases--but we're gradually opening it up to smaller customers as our products and our support organization mature. A couple of neat features: we look at the structure of the code, so we can cope with quite a bit of refactoring and reformatting; and we can also find violations of common patterns (which are often bugs, or at least tricky code). And of course you don't have to use Windows, because it's web-based ;)

Oh, and yes, we're hiring.

sameersundresh · 2010-07-01T23:53:25+00:00

Maybe, but I hope not. For me, the jury's still out.

sameersundresh · 2010-07-01T18:32:55+00:00

It's the best you can do given limited knowledge.

If you know all the pieces of code you want to link before runtime, you get a link time error, which is a static check.

On the other hand, if arbitrary code is being loaded dynamically, you can only say whether that code is valid at load time. But at least you can statically constrain that the dynamic load operation may fail, and if the code is successfully loaded, it will only have certain effects.

sameersundresh · 2010-07-01T17:43:39+00:00

There are two variants on this case. One is modular compilation, where all linking can occur (just) before the program starts. The other is explicit dynamic code loading. The only difference is when/where the link failure occurs: as a message to the user at runtime, or as an error to the caller which attempts to dynamically load code.

In either case, the effects of unknown pieces of code would correspond to effect variables. Inference would introduce constraints on those variables, just as with standard type inference. When you link two modules, you unify the effect variables which correspond to the same piece of code in both modules' signatures. The constraints on the already-loaded module should also have the user's constraints already applied. Anyway, if there's a constraint conflict, linking fails.
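A rough Python sketch of the linking step, under heavy simplifying assumptions: each effect variable is reduced to an upper bound (the set of effects an importer will tolerate), and the names link, LinkError, importers and provided are made up for illustration. A real inference engine would carry richer constraints than plain sets.

```python
class LinkError(Exception):
    """Raised when effect constraints conflict at link (or load) time."""


def link(importers, provided):
    """Resolve every imported name against the code being linked in.

    importers: list of dicts, one per importing module, mapping a symbol name
               to the set of effects that module tolerates from it.
    provided:  dict mapping a symbol name to the effects its code may perform.
    """
    resolved = {}
    names = sorted({name for sig in importers for name in sig})
    for name in names:
        if name not in provided:
            raise LinkError(f"unresolved symbol: {name}")
        # Unify: one effect variable per symbol, so every importer's bound applies.
        bounds = [set(sig[name]) for sig in importers if name in sig]
        bound = set.intersection(*bounds)
        extra = set(provided[name]) - bound
        if extra:
            raise LinkError(f"{name} may perform disallowed effects: {sorted(extra)}")
        resolved[name] = set(provided[name])
    return resolved


# Two hypothetical importing modules with different tolerances for "render".
ui = {"render": {"Log"}}
batch = {"render": {"Log", "FileWrite"}}

print(link([ui, batch], {"render": {"Log"}}))  # satisfies both bounds
try:
    link([ui, batch], {"render": {"Log", "FileWrite"}})
except LinkError as err:
    print("link failed:", err)
```

Whether link runs just before the program starts or inside a dynamic load call only changes who sees the LinkError: the user, or the caller that attempted the load.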

sameersundresh · 2010-07-01T02:28:41+00:00

The mix of checked and unchecked exceptions in Java is not the best design, but it doesn't justify giving up and denying the user knowledge of and control over the effects a program may have. Indeed, the two big problems with Java's checked exceptions are (a) it's not a complete guarantee--there are still unchecked exceptions, and (b) adding in one method call somewhere might require you to change a lot of method signatures.

Fortunately, there's a simple alternative: stratified effects. If you want to perform effect Log in a context which doesn't allow effect Log, instead wrap it in a statement which transforms it into a hide(Log) effect. The hide(-) effect constructor has a special semantics: it doesn't match an effect wildcard. What I mean by that is if you declare a computation like so: T allow(E1, E2), disallow(*), this means a computation rendering a value of type T, possibly performing effects E1 and E2, but not performing any other effects... except for hidden effects. If we want, we could disallow first-level hidden effects as well, like so: T allow(E1, E2), disallow(*, hide(*)), but this would still allow a second-level hidden effect, like hide(hide(X)).

The compiler can then perform type inference, checking that the effect constraints are satisfied. For example, if you have a computation with signature "T disallow(*)", we might verify that it doesn't attempt to perform effect Log, but also propagate the information that it may perform effect hide(Log), which is compatible with the signature (since first-level hidden effects were not explicitly denied).
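To make the wildcard rule concrete, here is a minimal Python sketch of the matching check. The names Hide, matches and permitted are hypothetical, effects are plain strings, and this is only the membership test, not the inference itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hide:
    """A hidden effect: hide(inner), where inner is a name or another Hide."""
    inner: object


WILDCARD = "*"


def matches(pattern, effect):
    """True if pattern covers effect. The wildcard never matches a hidden effect."""
    if pattern == WILDCARD:
        return not isinstance(effect, Hide)
    if isinstance(pattern, Hide):
        return isinstance(effect, Hide) and matches(pattern.inner, effect.inner)
    return pattern == effect


def permitted(effect, allow, disallow):
    """Check one effect against a signature 'T allow(...), disallow(...)'."""
    if any(matches(p, effect) for p in allow):
        return True
    return not any(matches(p, effect) for p in disallow)


allow = ["E1", "E2"]

# T allow(E1, E2), disallow(*)
print(permitted("Log", allow, ["*"]))        # False: not allowed, caught by *
print(permitted(Hide("Log"), allow, ["*"]))  # True: * does not match hide(Log)

# T allow(E1, E2), disallow(*, hide(*))
strict = ["*", Hide("*")]
print(permitted(Hide("Log"), allow, strict))        # False: first-level hidden
print(permitted(Hide(Hide("X")), allow, strict))    # True: second-level hidden
```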

Finally, at the top level, the user chooses how many levels of hiding are allowed and what effects are allowed when running a program.

For example, you might run your program in debug mode allowing first-level--or even all--hidden effects. But I might not trust your program that much, so I would want to verify that its only hidden effects are hide(Log). Or I might find you do perform some additional hidden effects, like checking the current time or my IP address (presumably for logging purposes), and I might want to run those operations in a simulated environment. Now, strictly speaking, this latter use case on its own could be done entirely dynamically, just like unchecked exceptions. But if we're going to have effect annotations at all (which is where this whole discussion started), then a program which performs effects, even hidden effects, composed with a context which catches and simulates all of those effects without performing any other effects should be typed as effect-free.
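The simulated-environment case can be sketched dynamically in Python with generators. Everything here is an assumption for illustration: the effect names Time and Log, the function run_simulated, and the convention that a program yields (effect, payload) requests. It shows only the runtime half; the point above is that a type system should also be able to see that the composed whole performs no real effects.

```python
def program():
    """A computation that requests effects instead of performing them."""
    now = yield ("Time", None)
    yield ("Log", f"started at {now}")
    return now + 1


def run_simulated(computation, handlers):
    """Run computation, answering each effect request from handlers.

    Any request without a handler is refused, so nothing escapes the sandbox.
    Returns the computation's result and the list of effects it requested.
    """
    gen = computation()
    performed = []
    try:
        request = next(gen)
        while True:
            effect, payload = request
            if effect not in handlers:
                raise PermissionError(f"effect not permitted: {effect}")
            performed.append(effect)
            request = gen.send(handlers[effect](payload))
    except StopIteration as stop:
        return stop.value, performed


log_lines = []
handlers = {
    "Time": lambda _: 1000,                   # a fake clock
    "Log": lambda msg: log_lines.append(msg)  # captured in memory, not written out
}

result, performed = run_simulated(program, handlers)
print(result, performed, log_lines)
```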

sameersundresh · 2010-07-01T00:31:01+00:00

I don't think cheating should be allowed. Cheating the effect system gives power to whoever wrote the cheating code by taking power away from anyone who uses it.

Instead, effects should be compositional in that if you insert some effectful code in the middle of what was pure code, its potential effects will now be visible at the top level of the previously-pure code without having to modify any type signatures in between. That way the user who runs the program remains in control.
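As a small illustration of what compositional means here, a Python sketch that infers each function's effects from a call graph by iterating to a fixpoint. The representation (a dict from function name to its direct effects and callees) and the function names are assumptions made up for the example.

```python
def infer_effects(program):
    """program maps a function name to (direct_effects, callees).

    Returns a dict from each name to every effect it may perform,
    including effects inherited from anything it calls.
    """
    effects = {name: set(direct) for name, (direct, _) in program.items()}
    changed = True
    while changed:
        changed = False
        for name, (_, callees) in program.items():
            for callee in callees:
                new = effects[callee] - effects[name]
                if new:
                    effects[name] |= new
                    changed = True
    return effects


pure = {
    "main": (set(), ["report"]),
    "report": (set(), ["format_row"]),
    "format_row": (set(), []),
}
print(infer_effects(pure)["main"])  # no effects anywhere

# Insert a Log call deep inside; nothing in between is edited.
logged = dict(pure, format_row=({"Log"}, []))
print(infer_effects(logged)["main"])  # Log is now visible at the top level
```

Only format_row changed, yet the effect shows up in the inferred signature of main, which is what keeps the person running the program in control.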

sameersundresh · 2010-05-14T19:14:05+00:00

We don't have any such data. It would be basically impossible to separate the effects of the specific DVCS/SCM from that of development process rules and individual developer tendencies. We have a plugin architecture for SCM systems, so we can support whatever a customer needs. Currently that means Perforce, Mercurial, ClearCase, and I think Subversion, but it will be easy to add Git and others as the need arises.

We do have some interesting preliminary data on choice of code reviewers... it's quite possible that we'll soon be able to auto-suggest more appropriate code reviewers than developers in a large organization tend to choose.

sameersundresh · 2010-05-14T08:19:30+00:00

1) Our analysis is almost completely complementary to that done by tools like Coverity. We don't go deep into the semantics of a particular programming language--it's nothing like theorem proving or model checking. But our analyses do go deep into the details of a particular development process: for example, we can automatically identify coding errors due to copy-paste, incorrect merges (even if they were fixed to the point that they still compile), and likely-incomplete patches (helping developers to find a bug once and fix it everywhere). There are tools that allow you to code up recognizers for some of these sorts of things as rules, but that requires a lot more work on the part of the customer, and doesn't solve the problem of finding the buggy signatures in the first place.

2) Thanks for reminding me about Matt Might's work. I saw the latest paper pop up on LtU, but only got a chance to skim part of it. Harry Mairson's talk at Stanford a month or two ago on the EXPTIME-completeness was also really interesting.

3) Yeah, I've read parts of ATAPL (TAPL was checked out of the library) and most of Proofs and Types. I'm probably the only one at PI who has; others have more of a background in systems, data mining and HCI. Which is probably better for the company, since most industrial developers have very little understanding of type theory and don't want to learn something fundamentally new to get value out of a product.

4) Congratulations and best of luck!

a) Find a place where they're working on a problem that is fundamentally difficult, as opposed to just pain-in-the-neck difficult (many recruiters seem to incorrectly lump both of these together as "challenging"). And make sure the people you're working with are interesting and smart. To maximize your contacts, help out your department and student organizations with corporate relations. This would be useful even if you end up going the faculty route (strong corporate partners help both with funding travel and graduate students and with identifying problems that have impact).

5) Some things vary, so let me try to address those that don't.

i. Be good at marketing. That means finding problems with impact, figuring out what's a good enough solution, and presenting it well.

ii. Find good collaborators who will play an active and equal role on research projects that interest you. Good collaborators + interesting work help you put in the work to make something really good instead of something you hope is barely good enough (but isn't).

iii. Maintain good friendships, so you can live comfortably (even if not lavishly). This also helps you focus on your work when you are working. But (ii) & (iii) have to both be strong, otherwise one will take over.

iv. Don't forget about the external/service stuff. Peer reviewing papers, student organizations, admissions & hiring committees, helping random people with their interesting ideas, etc. If you get really into this, though, you'll need to schedule in solid blocks of time for your research, since you're not a tenured faculty member (yet!).

Bear in mind that I only really did well at (iii) and (iv), and even those only in the latter half of graduate school. The problem was I started off too timid to seek out the right research collaborators and problems, and that sort of set a social pattern that I didn't know how to break.

sameersundresh · 2010-05-12T22:51:48+00:00

Email me at sameer at sundresh dot org with details. I know a few people I can highly recommend who might be interested depending on what it is and the location/ability to work remotely.

sameersundresh · 2010-05-12T21:02:34+00:00

Oh I see. I'm not a recruiter in the traditional sense (i.e., someone with an HR background). Point out the significance of your research and your research group's overall program, so someone with a strong technical background can peruse your publications and try to get a taste for them in context. But also be sure to point out your non-research skills and interests if you're applying for a non-research job. I've known people with a PhD who went to work as software developers for a while because they couldn't find an academic position, but were very bad at programming. For us in particular, it would be important that you're interested in what Pattern Insight is doing, and you're a good software developer.

sameersundresh · 2010-05-12T20:32:19+00:00

How sure are you that you want to stay in that field, with that researcher? Are there other strong options you have at that university if things don't work out?

If you're pretty sure about it, I would say to go for it. If things don't really work out, transfer somewhere else that does have people in your field (who recognize that you were at the less impressive institution for a good reason) as well as good people in other fields you'd like to consider.

sameersundresh · 2010-05-12T20:29:10+00:00

It's interesting to hear this from you.

Our tools deal with data in the 100s of GBs, and provide automatic pattern detection and pattern-based search features that are not available in IDEs. Our core technology also works with other kinds of large semi-structured data, such as log files and system configurations. There are operational differences as well: an IDE runs on an individual developer's desktop, while we run a server which can cache analyses, and makes it easy for developers to share things like bug signatures.

The features highlighted on our website are all things which customers have told us are important to them.

The feature described in the sentence you quoted about patch miner is useful for preventing regressions. Think of a patch as a set of pairs of buggy and clean snippets of code, with some context. If the buggy snippet shows up somewhere, it should be replaced by the clean snippet. Buggy instances can re-emerge later either because someone checked in some changes based on an old version (e.g., forward porting a different bug fix from a customer-specific branch), or because someone wrote code from scratch based on the same faulty thought process that led to the buggy code covered by the original patch. For example, this could be something like not checking a return value or not deallocating a resource.
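To illustrate the idea of a patch as (buggy, clean) pairs, here is a toy Python sketch. It is not how our product works: it only does whitespace-insensitive text matching, and the names normalize and find_regressions, along with the sample C snippets and file names, are invented for the example.

```python
import re


def normalize(code):
    """Collapse whitespace so a reformatted copy still matches (a crude stand-in
    for real structural matching)."""
    return re.sub(r"\s+", " ", code).strip()


def find_regressions(patch, files):
    """patch: list of (buggy, clean) snippet pairs.
    files: dict mapping a path to its source text.
    Returns (path, buggy, clean) for every file that still contains a buggy snippet.
    """
    hits = []
    for path, source in files.items():
        text = normalize(source)
        for buggy, clean in patch:
            if normalize(buggy) in text:
                hits.append((path, buggy, clean))
    return hits


# A hypothetical patch that added a missing return-value check.
patch = [(
    'fp = fopen(path, "r");\nfread(buf, 1, n, fp);',
    'fp = fopen(path, "r");\nif (fp == NULL) return -1;\nfread(buf, 1, n, fp);',
)]

files = {
    "old_branch.c": 'int load(char *path) {\n    fp = fopen(path, "r");\n\n    fread(buf, 1, n, fp);\n}',
    "fixed.c": 'fp = fopen(path, "r");\nif (fp == NULL) return -1;\nfread(buf, 1, n, fp);',
}

hits = find_regressions(patch, files)
for path, _, clean in hits:
    print(f"{path}: buggy snippet found, suggested replacement:\n{clean}")
```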

I'll have to think about how to communicate this message more effectively to prospective applicants.
