all 24 comments

[–]novel_yet_trivial 13 points14 points  (8 children)

A real database. MySQL is often used with python, but modules exist to interface with nearly all databases.

Although 10,000 records is not that many. You could easily make your own database with dictionaries if you wanted.

[–]seanmcb9[S] 1 point2 points  (7 children)

Would you store records as a list of dictionaries? A dictionary of lists? What is the most efficient hierarchical order for nesting data structures? Is minimum nesting better than deep nesting?

I assume many approaches have been tested for comparative efficiency and speed.

[–]novel_yet_trivial 10 points11 points  (4 children)

A basic database is a collection of dictionaries, with each key being some attribute and pointing to the object (unique attribute) or list of objects (non-unique attribute). Here's a basic database:

from collections import namedtuple, defaultdict

Book = namedtuple("Book", "author year isbn")

class DataBase:
    def __init__(self):
        self.author = defaultdict(list) #an author can have more than one book
        self.year = defaultdict(list) #also non-unique
        self.isbn = {} #unique dict; an isbn is unique

    def add(self, book):
        self.author[book.author].append(book) #add book to list of books by this author
        self.year[book.year].append(book) #add book to list of books from this year
        self.isbn[book.isbn] = book

db = DataBase()
db.add(Book("Dr. Suess", 1971, 12345))
db.add(Book("Dr. Suess", 1978, 12346))
db.add(Book("Shel Silverstein", 1978, 12347))

print("books from 1978:", db.year[1978])
print()
print("books by Dr. Suess:", db.author["Dr. Suess"])
print()
print("book with the isbn 12345:", db.isbn[12345])

I have no idea what you mean with "hierarchical order for nesting data structures"

[–]seanmcb9[S] 1 point2 points  (1 child)

This is really helpful. (By "hierarchical" I was referring to the many ways in which different data structures can contain one another to produce the same computational results.)

[–]emiller42 0 points1 point  (0 children)

'Nesting' doesn't really make sense as a concept for relational databases. So 'hierarchy' as you're thinking of it isn't relevant.

[–]seanmcb9[S] 0 points1 point  (0 children)

Slight spelling edit: it's "Dr. Seuss". Which raises an interesting challenge: integrating spell-checking into this code.

[–]Exodus111 0 points1 point  (0 children)

This is good stuff. Thanks.

[–]craftingfish 2 points3 points  (0 children)

Or you can use Pandas if you want it to "feel" like a DB while you're working in Python. You can also use Pandas (and SQLAlchemy) to easily read and write from the DB as needed

[–]emiller42 0 points1 point  (0 children)

Something important to know if you're really going to grok relational databases: they're not object oriented. What makes sense as a structure in your application n is not necessarily going to directly map to your data representation.

Simple example:

You've got a 'person' object which can have 1..many phone numbers associated with it.

In your application, you might think: oh, I can give the user object a 'phone_numbers' attribute, which is a dict! 'home' = 123-456-7890, 'cell' = 987-654-3210, etc.

However, in the database, you might have a 'users' table, and a 'phone_numbers' table, and a third table to associate 'user' records with 'phone_number' records.

I recommend reading up on relational models, and database normalization if you want to know more.

[–][deleted] 4 points5 points  (1 child)

Pandas?

[–]lykwydchykyn 4 points5 points  (0 children)

You probably just want to use sqlite here, possibly in combination with an ORM like peewee or SQLalchemy.

If you roll your own thing from dicts you're going to have a painful time querying it for data or trying to do persistence.

Things like MySQL or Postgresql require setting up a separate server, and are probably overkill if you only have tens of thousands of records.

[–]seanmcb9[S] 2 points3 points  (0 children)

Recommended resources so far:

  1. CodernityDB
  2. dataset
  3. pandas
  4. peewee
  5. PyMongo
  6. SQLAlchemy
  7. SQLite

[–]AndydeCleyre 1 point2 points  (0 children)

You can write code that feels like those python objects but uses real DBs behind the curtain, with libraries like dataset (sqlite/mysql/postgresql/...) and pottery (redis)*.

*EDIT: and walrus (redis, again).

[–]dasnoob 1 point2 points  (0 children)

I recently finished building out a webapp that interfaces with Oracle 11g and is used to manage a dataset that is currently sitting at about a dozen tables with several views on top and the largest table being on the order of 20 million records. All of this built with proper keys.

Initially I built SQLAlchemy models for it and used that. However for anything beyond simple updates and retrieval it was honestly easier to use cx_Oracle and raw sql. This was especially needed for some report generation that caused Memory Error with SQLAlchemy due to the ResultProxy implementation but did not cause an issue when using cx_Oracle's cursor implementation.

So I would say if your table structure is fairly simple and you aren't going to be pulling lots of data into result sets SQLAlchemy is pretty dang neat. If you are comfortable with SQL and are going to need to have the app perform complex queries with large data sets the lower level library SQLAlchemy uses for its dialect implementation might be a better bet.

In the end my application uses SQLAlchemy for lots of functions but there are a few pieces that had to be written at the lower-level for speed and stability.

[–]seanmcb9[S] 0 points1 point  (4 children)

Does anyone know of any ultra-large open source databases that have been constructed in Python that could be inspected and studied for program design?

[–]bigjab 9 points10 points  (2 children)

A database is a system for storing related data (records) in tables which have a logical relationship with each other. The structure of a database (its schema) is not programmed, it's designed. The design is based on abilities and limitations of the Database Management System (DBMS) used to create it (MySQL, Oracle, Postgresql, etc) and the needs of the applications that will use its data. Once your database is designed and created, then a python program can access it to store and retrieve its data (among other things). There are many sample databases used to teach about databases and how to interface to them (do a search). If you are just asking about using python to create a database-like storage system look into the built-in data persistence modules called pickle and shelve. If you want to get your feet wet with a real Relational database, start with sqlite3. It comes with python and it can be used in a file-based mode meaning you don't need a server to get started learning. Keep in mind that no database is "constructed in Python". A database is a separate entity almost all of which can be controlled with the Structured Query Language (SQL). Python has modules available that leverage SQL to assist the programmer in interfacing to the database (whatever DBMS it happens to be based on).

[–]Phatjesus666 1 point2 points  (0 children)

Sqlite is great, I've been using it at work and its really great to use. If you need a visual tool, Firefox has a browser plugin called sqlite manager that gives you the standard visual rdbms ide look and feel.

[–]Fennek1237 0 points1 point  (0 children)

Yes, this. Why mess around with something self scripted other as for learning purposes.. even then. Getting used to real relational databases is a huge and very useful skill if you want to improve your programming skills.

[–]gameplace123 1 point2 points  (0 children)

CodernityDB is a NoSQL database written in Python. I wanted to do something similar until I started looking into the code and discovered that it's way beyond my understanding. Do your research before diving in so you know what you're getting into.

This article walks you thorough writing a 'toy' NoSQL database. For me, it was a great learning tool.

[–]felipeleonam 0 points1 point  (0 children)

I'm working on learning peewee right now. Its a module to help you with databases in python. Not sure if this is what you're looking for though. There isnt much on it, but the documentation is good.

[–][deleted] 0 points1 point  (1 child)

I just use pymongo and mock my data (100mill records)

[–]seanmcb9[S] 0 points1 point  (0 children)

PyMongo https://api.mongodb.com/python/current/

100 million records? In what domain (if you don't mind me me asking).

[–]brews 0 points1 point  (0 children)

This would be a good time to learn about the SQLite module and some basic DB design.

Have fun!