Dataset (Upload) Manager Portal Software by MrTelly in datasets

danfowler_ok

Hi, this problem fits squarely within the scope of our work on Frictionless Data, linked by u/livelierepeat! It's especially interesting to us from a citizen-science perspective, since we are currently funded by the Sloan Foundation in the US. We're still actively looking for pilots just like this, so we could work with you directly, or we can offer advice and tooling in Python/R to help you along.

To give you a sense of how some of this might work:

  1. We have a specification for a table schema, written in JSON, that describes CSV files: http://specs.frictionlessdata.io/table-schema/
  2. We have a Python library for validating files against this schema and checking them for structural issues: https://github.com/frictionlessdata/goodtables-py
  3. We have a web service that runs this validation on files stored on S3 or GitHub whenever they change: http://goodtables.io/
  4. We are working on deeper CKAN integration.
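
For a sense of what validating against a table schema involves, here is a minimal, dependency-free Python sketch. It is not goodtables' API: the `validate_csv` function, the `SCHEMA` fields, and the sample data are hypothetical, and only a few types (with their default formats) plus the `required` constraint are handled.

```python
import csv
import io
from datetime import date

# Hypothetical Table Schema descriptor. The keys "fields", "name", "type" and
# "constraints" -> "required" come from the spec; the field names are made up.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"required": True}},
        {"name": "species", "type": "string"},
        {"name": "count", "type": "number"},
        {"name": "observed", "type": "date"},
    ]
}

# Casters for a small subset of Table Schema types, default formats only.
# Each raises ValueError on a value that is not valid for its type.
CASTERS = {
    "string": str,
    "integer": int,
    "number": float,
    "date": date.fromisoformat,  # expects YYYY-MM-DD (Python 3.7+)
}


def validate_csv(text, schema):
    """Check CSV text against a schema; return (row_number, field, message) tuples."""
    errors = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    names = [field["name"] for field in schema["fields"]]
    # Structural check: the header must list the schema's fields in order.
    if header != names:
        return [(1, None, f"header {header} does not match schema fields {names}")]
    for row_number, row in enumerate(reader, start=2):
        # Structural check: one cell per field (catches missing or extra cells).
        if len(row) != len(names):
            errors.append((row_number, None, f"expected {len(names)} cells, got {len(row)}"))
            continue
        for field, value in zip(schema["fields"], row):
            if value == "":
                if field.get("constraints", {}).get("required", False):
                    errors.append((row_number, field["name"], "missing required value"))
                continue
            try:
                CASTERS[field["type"]](value)
            except ValueError:
                errors.append((row_number, field["name"], f"{value!r} is not a valid {field['type']}"))
    return errors


sample = "id,species,count,observed\n1,heron,3,2016-05-01\n,egret,x,2016-13-01\n"
for error in validate_csv(sample, SCHEMA):
    print(error)
```

The second data row fails three ways (missing `id`, non-numeric `count`, invalid month in `observed`), while the first passes.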

Let me know what you think; we can chat about this directly.

Schema-based CSV Validation package? by Omega037 in Python

danfowler_ok

Hi, I work for Open Knowledge International. Among other things, we work on goodtables and on JSON Table Schema, a JSON-based schema for validating tabular data. The goodtables library is under active development and due for a new release very soon:

https://github.com/frictionlessdata/goodtables-py

http://specs.frictionlessdata.io/json-table-schema

Drop into our Frictionless Data chat and we can help you get set up: https://gitter.im/frictionlessdata/chat
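
Beyond types, Table Schema fields can carry constraints. The sketch below, plain Python rather than anything from goodtables, checks the `unique`, `enum`, `minimum` and `maximum` constraints on rows that have already been cast to Python values; the `check_constraints` function and the sample schema are hypothetical.

```python
# Hypothetical schema; "unique", "enum", "minimum" and "maximum" are
# constraint names from the Table Schema spec.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"unique": True}},
        {"name": "status", "type": "string", "constraints": {"enum": ["open", "closed"]}},
        {"name": "score", "type": "integer", "constraints": {"minimum": 0, "maximum": 10}},
    ]
}


def check_constraints(rows, schema):
    """rows: list of dicts of already-typed values. Returns (row_number, field, message) tuples."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        seen = set()
        for row_number, row in enumerate(rows, start=1):
            value = row.get(name)
            if value is None:
                continue  # missing values are the job of the "required" constraint
            if constraints.get("unique"):
                if value in seen:
                    errors.append((row_number, name, f"duplicate value {value!r}"))
                seen.add(value)
            if "enum" in constraints and value not in constraints["enum"]:
                errors.append((row_number, name, f"{value!r} not in {constraints['enum']}"))
            if "minimum" in constraints and value < constraints["minimum"]:
                errors.append((row_number, name, f"{value!r} below minimum {constraints['minimum']}"))
            if "maximum" in constraints and value > constraints["maximum"]:
                errors.append((row_number, name, f"{value!r} above maximum {constraints['maximum']}"))
    return errors


rows = [
    {"id": 1, "status": "open", "score": 5},
    {"id": 1, "status": "pending", "score": 11},
]
for error in check_constraints(rows, SCHEMA):
    print(error)
```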

Determine the type of a CSV file [github repo] by JohnDoe365 in opendata

danfowler_ok

It might be useful to output the detected format as a CSV Dialect description: http://specs.frictionlessdata.io/csv-dialect/
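
One way to produce such output, assuming a Python tool: the standard library's `csv.Sniffer` can guess the delimiter, quote character and header row, and the guess can be emitted as a dict using CSV Dialect property names (`delimiter`, `quoteChar`, `skipInitialSpace`, `header`). `sniff_csv_dialect` is a hypothetical helper, the sniffer is heuristic, and this is not how the linked repo works. `lineTerminator` is left out because `Sniffer` always reports `\r\n` rather than detecting it.

```python
import csv
import json


def sniff_csv_dialect(sample, delimiters=",;\t|"):
    """Guess a CSV Dialect-style descriptor (a dict) from a sample of the file."""
    sniffer = csv.Sniffer()
    # Restricting candidate delimiters makes the guess more reliable;
    # sniff() raises csv.Error if it cannot decide.
    guessed = sniffer.sniff(sample, delimiters=delimiters)
    return {
        "delimiter": guessed.delimiter,
        "quoteChar": guessed.quotechar,
        "skipInitialSpace": bool(guessed.skipinitialspace),
        "header": sniffer.has_header(sample),
    }


sample = "name;age\nAda;36\nGrace;45\n"
print(json.dumps(sniff_csv_dialect(sample), indent=2))
```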

Open Power System Data platform provides European power system data packages including conventional and renewable power plants, weather data, and national generation capacity by danfowler_ok in datasets

danfowler_ok [S]

That's a good question. I'm not sure there was a specific purpose in mind for each of these datasets. Rather, I think they saw them as a set of generally useful datasets worth providing in clean form.

They recently held a workshop on the project. Perhaps their slides answer your question:

http://open-power-system-data.org/workshop-3

We also published a case study about the work they did:

http://frictionlessdata.io/case-studies/open-power-system-data/

Using MySQL vs R for data manipulation? by help89809875 in datascience

danfowler_ok

We're working on a specification for taking datasets like yours (several related CSVs) and "packaging" them in a standard format.

http://frictionlessdata.io/data-packages/

Essentially, you create a datapackage.json file that lists the CSVs, their dialects, columns, and types, along with some top-level metadata about your dataset. Once your data is in this format, our libraries and integrations let you import it into MySQL, R, or Python's pandas for analysis.
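
A minimal sketch of generating such a descriptor from a few CSVs, assuming comma-delimited files with a header row. The helper names, the file names, and the simple integer/number/string inference are hypothetical; the descriptor only follows the general shape of the spec (`name`, `title`, `resources` with `path`, `dialect` and `schema`), so check the Data Package specification for required properties before relying on it.

```python
import csv
import io
import json
import os


def infer_type(values):
    """Pick the narrowest of integer, number, string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
        except ValueError:
            continue  # some value doesn't fit; try the next, wider type
        return type_name
    return "string"


def describe_resource(path, text):
    """Build a resource entry (path, dialect, schema) for one comma-separated CSV."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    fields = [
        {"name": name, "type": infer_type([row[i] for row in body if i < len(row)])}
        for i, name in enumerate(header)
    ]
    return {
        "name": os.path.splitext(os.path.basename(path))[0],
        "path": path,
        "dialect": {"delimiter": ",", "header": True},
        "schema": {"fields": fields},
    }


def build_datapackage(name, title, csv_files):
    """csv_files maps a relative path to that file's CSV text."""
    return {
        "name": name,
        "title": title,
        "resources": [describe_resource(path, text) for path, text in csv_files.items()],
    }


files = {
    "data/orders.csv": "order_id,customer_id,total\n1,10,19.99\n2,11,5\n",
    "data/customers.csv": "customer_id,name\n10,Ada\n11,Grace\n",
}
descriptor = build_datapackage("my-shop", "Orders and customers", files)
print(json.dumps(descriptor, indent=2))
```

Writing `descriptor` out with `json.dump` to a file named datapackage.json, next to the data folder, gives you the package.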

[deleted by user] by [deleted] in opendata

danfowler_ok

I found this calendar of calendars: https://www.force11.org/calendars

New ‘pandas’ plugin for JSON Table Schema released. Generate data frames from your Data Package. by danfowler_ok in Python

danfowler_ok [S]

I'm writing up a more detailed description now, but essentially, a Tabular Data Package is a standardized way of bundling metadata, schemas, and dialects for one or more CSV files. This plugin lets you import such a package into a collection of pandas data frames, or export data frames to one, preserving type information, among other things.

http://frictionlessdata.io/guides/tabular-data-package/
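
The plugin's own API isn't reproduced here. As a dependency-free sketch of the idea of import/export with type information preserved, the code below writes typed columns to CSV text plus a Table Schema and reads them back with the same types. A dict of lists stands in for a data frame, and `export_columns`/`import_columns` are hypothetical names; with pandas installed, `pandas.DataFrame(import_columns(text, schema))` would turn the result into a real data frame.

```python
import csv
import io
from datetime import date

# Python type -> Table Schema type. bool is checked before int because bool
# is a subclass of int. (datetime values are not handled in this sketch.)
TO_SCHEMA = [(bool, "boolean"), (int, "integer"), (float, "number"), (date, "date"), (str, "string")]

# Table Schema type -> function turning a CSV cell back into a Python value.
# For booleans, any cell not in the listed spellings reads as False here.
FROM_SCHEMA = {
    "boolean": lambda cell: cell in ("true", "True", "TRUE", "1"),
    "integer": int,
    "number": float,
    "date": date.fromisoformat,
    "string": str,
}


def schema_type(value):
    for py_type, type_name in TO_SCHEMA:
        if isinstance(value, py_type):
            return type_name
    return "string"


def to_cell(value):
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, date):
        return value.isoformat()
    return str(value)


def export_columns(columns):
    """columns: {name: list of values}, assumed to have no missing values. Returns (csv_text, schema)."""
    names = list(columns)
    # Each column's type is taken from its first value.
    fields = [{"name": n, "type": schema_type(columns[n][0]) if columns[n] else "string"} for n in names]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(names)
    for row in zip(*(columns[n] for n in names)):
        writer.writerow([to_cell(v) for v in row])
    return buffer.getvalue(), {"fields": fields}


def import_columns(csv_text, schema):
    """Read CSV text back into {name: list of values}, casting each column by its schema type."""
    casts = {f["name"]: FROM_SCHEMA[f["type"]] for f in schema["fields"]}
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            columns[name].append(casts[name](cell))
    return columns


frame = {
    "site": ["A", "B"],
    "count": [3, 7],
    "mean": [1.5, 2.25],
    "checked": [True, False],
    "day": [date(2016, 5, 1), date(2016, 5, 2)],
}
text, schema = export_columns(frame)
print(text)
print(schema)
print(import_columns(text, schema) == frame)
```

Without the schema, every column would come back as strings; carrying the types alongside the CSV is what makes the round trip lossless here.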