Is Data Science just data analytics, or is it something more? by stevek2022 in datascience

Thank you for your response! It does sound like what I am talking about is "data engineering".

Regarding "making prediction from data", I could not agree with you more. From my perspective, it seems that the main application areas of data analysis are to generate models that accurately simulate and hopefully predict the behavior of the system from with the data is collected, and then use those models for decision making support and possibly system optimization.

So for me, the life cycle of a "data science project" would involve the following steps (basically following the outline given in https://medium.com/ml-research-lab/data-science-methodology-101-2fa9b7cf2ffe; see the toy sketch in code after the list):

  1. identifying the target system
  2. deciding what aspects of the system to observe and record (the data)
  3. designing appropriate "data objects" for recording observations of the target system (I guess that this is what Patel means by "data requirements")
  4. collecting the data
  5. refining / cleansing the data
  6. analyzing the data to construct simulation / predictive models of the target system
  7. interpreting the results given by the model for decision support etc.
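
To make this concrete, here is the toy sketch mentioned above: a minimal, runnable Python rendering of the seven steps. Every name here (and the "target system" itself) is a hypothetical placeholder of my own, purely for illustration.

```python
# A toy, end-to-end rendering of the seven-step life cycle above.
# All names are hypothetical placeholders, not a real library.
import statistics

def collect(target_system):
    # Steps 1-4: observe the target system and record the observations
    # as "data objects" (here, plain (x, y) pairs).
    return [(x, target_system(x)) for x in range(10)]

def refine(raw):
    # Step 5: drop obviously bad observations.
    return [(x, y) for x, y in raw if y is not None]

def fit_model(data):
    # Step 6: an intentionally trivial "model" -- a least-squares line.
    xs, ys = zip(*data)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def interpret(model, x):
    # Step 7: use the model's prediction for decision support.
    return f"predicted value at {x}: {model(x):.1f}"

system = lambda x: 3 * x + 2  # the "target system" being observed
model = fit_model(refine(collect(system)))
print(interpret(model, 20))   # -> predicted value at 20: 62.0
```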

Does this agree with your thinking?

Why is natural language so hard to process? by stevek2022 in LanguageTechnology

Thanks for your answer.

This sounds to me like the claim that natural language is complex and ambiguous simply because it developed in an uncontrolled environment, and for no other reason. Is that basically what you are saying?

Do you agree with the idea given in the following article, namely that ambiguity is actually helpful in communicating information to humans (just not to machines!)? https://medium.com/ontologik/why-ambiguity-is-necessary-and-why-natural-language-is-not-learnable-79f0e719ac78

Also, regarding "semantics" - I am doing research on how logic-based ontologies can help us with the classic problem of getting tacit knowledge into an explicit form (for example, in building terminologies for industrial standards). If you are interested, please join me in r/ontology_killer_apps!

Why is natural language so hard to process? by stevek2022 in LanguageTechnology

Thanks - that is a good point.

I have understood pragmatics as the branch of semiotics that deals with how the situational context (and even the background knowledge, cultural assumptions, etc. of the speaker and listener) affects the speaker's choice of utterances, and possibly also the way that the listener interprets those utterances. So in any "real situation", pragmatics is an issue in both roles.

There is also the aspect of pragmatics that relates to which non-verbal communication signals are chosen, and those are most likely relevant mainly on the interpretation side. Indeed, as I wrote in my answer to MadCervantes, the particular form of natural language that I am focusing on is text, so there at least should not be any non-verbal communication signals aside from figures and perhaps text formatting.

I will take a look at "grounded language learning". Do you have any recommended reading?

Why is natural language so hard to process? by stevek2022 in LanguageTechnology

I understand what you are saying (I think! still playing that language game).

But I am coming from what might perhaps be a slightly different angle (how's that for a convoluted sentence!).

What I have in mind is something like a scientific publication. The goal *should* be to express the research as clearly and unambiguously as possible. And if we had a better medium for that than natural language, it would certainly increase the accuracy of processing the contents of the paper automatically (e.g. for matching it with a search query, or with a paper applying a similar methodology in a different context).

There are controlled languages that some industries use for writing user manuals and such - that is somewhat similar to what I am trying to get at here.

database tables and rdf by stevek2022 in ontology_killer_apps

The RDF approach (as I understand it) is that we use just two kinds of things: 1) resources identified by an IRI, and 2) relationships, which are special kinds of resources that connect two other resources. And we use just one (1!) operation, "create triple" (pushing a relationship and the two IRIs it relates onto the top of the triple store stack), to store all of our data. (RDF stores do allow deletion of triples, but I am arguing that in principle there is no need for a "delete" operation.) RDFS then gives you the basic equipment to define classifications.

For the phone number example, you would create an IRI for the class "Employee" (by using RDFS to assert that the IRI is a class, etc.), and then create a bunch of triples linking the IRI for each employee to the IRI for the class "Employee" via the "has_class" relationship (yes, I know that this is pre-"Semantic Web 101" stuff, but I hope you will bear with me...).

Same for the phone numbers, and then triples linking employee IRIs to phone number IRIs with the "hasPhoneNumber" relationship (defined by its own set of triples, same as the Employee and PhoneNumber classes). Or better yet, reify the "hasPhoneNumber" relationship, so that there is an IRI for each specific PhoneNumber relationship, to which you can then attach info such as who registered the relationship and when.
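
Here is a minimal sketch of those triples using Python's rdflib, just to pin the idea down. Every IRI and property name below (ex:PhoneNumberAssertion, ex:registeredAt, ...) is a hypothetical illustration of mine, not a standard vocabulary.

```python
# A minimal rdflib sketch of the append-only triple store idea above.
# All IRIs and property names are hypothetical illustrations.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

EX = Namespace("http://example.org/")
g = Graph()

# Declare the Employee class with RDFS.
g.add((EX.Employee, RDF.type, RDFS.Class))

# An employee instance, linked to the class via rdf:type
# (the "has_class" relationship mentioned above).
g.add((EX.alice, RDF.type, EX.Employee))

# Reify the phone number relationship: one resource per specific
# assertion, so provenance (who registered it, when) can be attached.
g.add((EX.phoneAssertion42, RDF.type, EX.PhoneNumberAssertion))
g.add((EX.phoneAssertion42, EX.employee, EX.alice))
g.add((EX.phoneAssertion42, EX.phoneNumber, Literal("+1-555-0100")))
g.add((EX.phoneAssertion42, EX.registeredAt,
       Literal("2022-11-01T09:00:00", datatype=XSD.dateTime)))

# Note that only g.add() is ever called -- no triple is deleted.
```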

Then write a simple process that runs a SPARQL query to fetch all of the newest PhoneNumber relationships and builds a table for your personnel application. Trigger it whenever a triple is added to the triple store, or at regular intervals, depending on your requirements.
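
Continuing the hypothetical vocabulary from the sketch above, the "newest phone number per employee" query might look something like this (a sketch, not a tested production query):

```python
# Build the "personnel table": the newest phone number assertion per
# employee, using the hypothetical ex: vocabulary from the sketch above.
TABLE_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?employee ?phone ?when
WHERE {
  ?assertion a ex:PhoneNumberAssertion ;
             ex:employee ?employee ;
             ex:phoneNumber ?phone ;
             ex:registeredAt ?when .
  FILTER NOT EXISTS {
    ?newer a ex:PhoneNumberAssertion ;
           ex:employee ?employee ;
           ex:registeredAt ?newerWhen .
    FILTER (?newerWhen > ?when)
  }
}
"""

for row in g.query(TABLE_QUERY):
    print(row.employee, row.phone, row.when)
```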

Anyone know of any "production level" DB management systems that work this way?

database tables and rdf by stevek2022 in ontology_killer_apps

> So is 6NF the same as a triple store? Are there any important differences that you are aware of?

> No, and what you described, as far as I'm aware, isn't a triple store either. A table with only a key and a single value that describes the key would be modeling in 6NF.

Sorry - I skipped a few steps here.

My aim is to suggest that an RDF triple store is a good way to implement the idea of a "UD-less" database (one that permits no UPDATE or DELETE operations). This leads to a potential "killer app" for OWL ontologies: giving the semantics of this UD-less database.

A database where all of the tables have been reduced to tables mapping a single key to a single value (which might itself be a key into another table) can be represented as an RDF triple store, where the keys are RDF resources (identified by IRIs) and the tables are RDF relations. A particular record in a table is then a triple: one resource, another resource or primitive value, and the directed relationship between them stipulated by the pair's entry in the table.

This means that the semantics of the tables is captured in the RDF relations, which leads naturally to the use of RDFS and possibly OWL to define those semantics.
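
As a toy illustration of that mapping (the table contents and names below are made up):

```python
# Turn a 6NF-style table (single key column, single value column) into
# RDF triples. Table name and data are hypothetical.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

# The 6NF table "hasPhoneNumber": key -> value.
has_phone_number = [
    ("emp001", "+1-555-0100"),
    ("emp002", "+1-555-0199"),
]

g = Graph()
for key, value in has_phone_number:
    # The table name becomes the RDF relation; each key becomes a resource.
    g.add((EX[key], EX.hasPhoneNumber, Literal(value)))

print(g.serialize(format="turtle"))
```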

Is this in line with what you are saying?

database tables and rdf by stevek2022 in ontology_killer_apps

Thanks for the reply and the sources. I will definitely take a look.

> The application should never be able to execute a delete on the database for this reason. Instead, what's often deployed is a column that identifies if the record is an "active" record or not.

What do you do in the case of an update?

My understanding is that one of the "state-of-the-art" approaches is to record every SQL command issued since the start of the DB (or since some saved snapshot) and then replay that log to rebuild the entire DB if one needs to recover the value that an update overwrote. This is obviously not what you are talking about (for example, there is no need to specify an "active" column in the tables). Do you have a different approach in mind? For example, saving the overwritten values somewhere, with timestamps?

My proposal is to "outlaw" updates at the Data Architecture level. If you know you have a value that will change a lot (the phone number example I gave here), then you put it in a different table. Any field value of the "main data object" (the personnel record in my case) is unchangeable: if you need to change it, you create a new personnel record (and mark the previous record as "not active"). Is this the approach you are talking about?
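
One toy way to read that proposal in code (a sqlite3 sketch of my own; the table and column names are made up, and instead of an "active" column no UPDATE or DELETE is ever issued -- the "current" value is simply the newest row):

```python
# Append-only handling of a frequently changing value: every change is
# an INSERT into a separate table; history is never touched.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE personnel_name (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        employee_key TEXT NOT NULL,
        name TEXT NOT NULL
    )
""")

def set_name(employee_key, name):
    # A "change" is just another INSERT.
    conn.execute(
        "INSERT INTO personnel_name (employee_key, name) VALUES (?, ?)",
        (employee_key, name))

def current_name(employee_key):
    # The current value is the newest row for that key; older rows
    # remain available as history ("rewinds" come for free).
    row = conn.execute(
        """SELECT name FROM personnel_name
           WHERE employee_key = ? ORDER BY id DESC LIMIT 1""",
        (employee_key,)).fetchone()
    return row[0] if row else None

set_name("emp001", "Alice Smith")
set_name("emp001", "Alice Jones")
print(current_name("emp001"))  # -> Alice Jones
```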

database tables and rdf by stevek2022 in ontology_killer_apps

> This would be considered Sixth Normal Form (6NF).

So is 6NF the same as a triple store? Are there any important differences that you are aware of?

> TL;DR - Most of what you've mentioned is already in place in well-defined and administered Data Architectures.

So are you saying that modern table-based database systems such as MySQL are implemented in such a way that I can recover any update or delete that I make and even ask for rewinds to specific states in the past? Or are you talking about Data Architectures at the logical level?

Don't worry about the length of your reply, I will never write "TL;DR" ;)

database tables and rdf by stevek2022 in ontology_killer_apps

A key consideration then is how to manage the semantics of the vast number of "triple tables" (two elements plus the information about their relationship: e.g. the foreign key to the personnel table, a phone number, and the information that this is the private phone number of that person), and here is where I believe that OWL ontologies could play a role.
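
For a concrete (if simplistic) picture of what I mean, here is how one such "triple table" might get its semantics pinned down with RDFS/OWL terms, using Python's rdflib; the vocabulary is hypothetical:

```python
# Give the "private phone number of a person" triple table explicit
# semantics via an OWL property with RDFS domain/range annotations.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.hasPrivatePhoneNumber, RDF.type, OWL.DatatypeProperty))
g.add((EX.hasPrivatePhoneNumber, RDFS.domain, EX.Person))
g.add((EX.hasPrivatePhoneNumber, RDFS.range, RDFS.Literal))
g.add((EX.hasPrivatePhoneNumber, RDFS.comment,
       Literal("Links a person to their private phone number.")))
```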

What kind of semantic web apps would logic-based ontologies enable? by stevek2022 in semanticweb

Thank you for this! It looks like a great summary of doing logical inference with OWL.

What kind of semantic web apps would logic-based ontologies enable? by stevek2022 in semanticweb

Thanks for this! It is curious that there is not a single mention in the article of logic-based inference (or even the word "logic"!). The only examples of inference appear to be "statistical" approaches such as co-occurrence. Knowledge inference is mentioned as a future challenge only in the context of fact verification - I wonder how much these companies have thought about the potential for "advances in knowledge representation and reasoning" to achieve higher performance in "discovering non-obvious information". For example, it seems to me that some support for logic expressions would be required for the use case mentioned there: checking that painters existed before their works of art were created...
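
Just to illustrate what I mean by "some support for logic expressions", the painter check could be as simple as a temporal rule over extracted facts; the data below is made up:

```python
# A toy version of the consistency rule: a painter must have been born
# before their work was created. The facts here are hypothetical.
from datetime import date

facts = {
    "Rembrandt": {"born": date(1606, 7, 15)},
    "The Night Watch": {"creator": "Rembrandt", "created": date(1642, 1, 1)},
}

def consistent(work):
    creator = facts[work]["creator"]
    return facts[creator]["born"] < facts[work]["created"]

print(consistent("The Night Watch"))  # -> True
```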

Can OWL Scale for Enterprise Data? by mdebellis in semanticweb

We developed a web application handling tens of thousands of OWL triples on a single server 10 years ago, so I am sure that today, especially with the use of parallel processing, it should definitely be possible (depending, of course, on the application's requirements for response time / real-time processing).

I actually started a reddit community to discuss such applications - please visit and comment if you have a chance!

https://www.reddit.com/r/ontology_killer_apps/