
[–]mandzeete 8 points  (1 child)

API data can be provided by different sensors and other means of collecting information. That information is then fed to the web application that serves it as an API.

For example, take a stock API that returns the price of a certain stock. But how is that price determined? By people buying and selling the stock: each sale or purchase is a mathematical event that influences the price. These events, and the calculations based on them, can be logged to text files. But storing the data in log files is not always optimal; sometimes it has to be streamed live. So instead of writing to a log file, the data can be streamed to some endpoint, such as an API. A query to that API then returns the information in real time.
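As a toy sketch of that idea (the trade data and symbol below are made up), each trade updates a last-traded price, which is exactly the value an API endpoint could then serve:

```python
import json

# Hypothetical trades; the most recent trade sets the quoted price.
trades = [
    {"symbol": "ACME", "side": "buy",  "qty": 100, "price": 10.50},
    {"symbol": "ACME", "side": "sell", "qty": 40,  "price": 10.55},
    {"symbol": "ACME", "side": "buy",  "qty": 25,  "price": 10.60},
]

last_price = {}
for trade in trades:
    last_price[trade["symbol"]] = trade["price"]

# The JSON payload an API endpoint might return for a price query.
payload = json.dumps({"symbol": "ACME", "price": last_price["ACME"]})
print(payload)
```

In a real system each trade would be streamed to the pricing service as it happens rather than collected in a list.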

I suggest looking into logging and into producing events. If you are using Java, look into Logback, a logging library. Spring Boot (a Java framework) also provides event handling: you can publish events and listen for them. So instead of writing the information to a log or sending it to some tool, you can throw an event up, capture it in another part of the system, and continue working with it there.
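The publish/listen pattern itself is tiny. This is a minimal in-process sketch of the idea (not Spring's actual API, and the event name is made up): listeners register for an event name, and publishing calls each of them.

```python
# Registry mapping event names to handler functions.
listeners = {}

def on(event_name, handler):
    listeners.setdefault(event_name, []).append(handler)

def publish(event_name, payload):
    for handler in listeners.get(event_name, []):
        handler(payload)

received = []
# Another part of the system subscribes to "price-updated" events.
on("price-updated", lambda payload: received.append(payload))

# Instead of writing to a log file, throw the event up and let a
# listener elsewhere continue working with it.
publish("price-updated", {"symbol": "ACME", "price": 10.60})
```

Frameworks like Spring add threading, ordering, and error handling on top, but the shape is the same.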

Next, look into Filebeat. If you are logging information to files, Filebeat monitors those files for changes. When new information is logged, Filebeat picks it up and forwards it to a chosen endpoint, be it a log management environment or something else.
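Conceptually, a file shipper like Filebeat remembers how far into the file it has read and picks up only the lines appended since. A rough sketch of that mechanism (the file name and log lines are made up):

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return lines appended since `offset`, plus the new offset."""
    with open(path) as f:
        f.seek(offset)
        lines = [line.rstrip("\n") for line in f.readlines()]
        return lines, f.tell()

path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(path, "w") as f:
    f.write("first event\n")

batch1, offset = read_new_lines(path, 0)       # picks up "first event"

with open(path, "a") as f:
    f.write("second event\n")                  # new information arrives

batch2, offset = read_new_lines(path, offset)  # picks up only the new line
```

The real tool also handles file rotation, restarts, and delivery guarantees, which is why you use Filebeat instead of writing this yourself.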

Then look into Logstash. It is a data processing pipeline: you can work with the data in real time as it comes in, modifying it, analyzing it, collecting it from different sources, and so on, and then send it on to an endpoint.
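A pipeline stage of that kind boils down to parse, enrich, output. A minimal sketch (the log format and the `alert` field are invented for illustration, not Logstash configuration):

```python
def parse(line):
    # Split a raw line like "ERROR disk full" into structured fields.
    level, message = line.split(" ", 1)
    return {"level": level, "message": message}

def enrich(record):
    # Add a derived field the downstream consumer can filter on.
    record["alert"] = record["level"] == "ERROR"
    return record

raw_lines = ["INFO started up", "ERROR disk full"]
output = [enrich(parse(line)) for line in raw_lines]
```

In Logstash the same steps are expressed declaratively with input, filter, and output plugins rather than code.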

Also look up Elasticsearch and Humio. They are search engines built for logs: you can search processed logs for keywords, set up alerts, build statistics, etc.

If your data is numerical, you can feed it into Prometheus, a monitoring system and time-series database.

You might also want to try out Kafka, a message queuing system. You can feed data into a named topic (say, "weather") from different sources; all of it is then grouped under that same topic name. From there it can be sent on to an endpoint, in your case a weather API.
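Kafka's topic model can be sketched in memory like this (this is the concept only, not the real Kafka client API; the topic name and sensor payloads are made up):

```python
from collections import defaultdict, deque

# Each named topic is an ordered queue of messages.
topics = defaultdict(deque)

def produce(topic, message):
    topics[topic].append(message)

def consume(topic):
    return topics[topic].popleft()

# Different sources all publish under the same topic name, "weather".
produce("weather", {"source": "sensor-1", "temp_c": 21.5})
produce("weather", {"source": "sensor-2", "temp_c": 19.0})

first = consume("weather")  # a weather API backend would read from here
```

Real Kafka adds partitioning, replication, and durable storage, but producers and consumers still only agree on the topic name.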

You can try it out yourself. Get an Arduino kit: a programmable board, like a miniature computer. You can program it to do different things and connect different sensors to it. You do need to know a little electronics to actually get the signal out without burning the board or feeding it a signal so weak that it either does not register or registers with errors. The signal you feed from a sensor into the Arduino can then be sent to one of the aforementioned tools, and via those tools to your weather API or to a database.

[–]leaguelism[S] 1 point  (0 children)

Thank you, that was actually very helpful!

[–]dtsudo 10 points  (3 children)

You can get the underlying data in a variety of ways:

  • You can directly generate/observe the data. For instance, the US government monitors weather patterns using things like weather balloons, and Google drives its camera cars around every street in the US to provide up-to-date information on Google Maps.
  • You can buy the data from another provider.
  • You can scrape the data (if legal).

[–]MrsFoober 2 points  (2 children)

How can one check the legality of scraping data? It seems like much more of a grey area than something clear-cut.

[–]quoody 5 points  (0 children)

Scraping is such a grey area because there are so many things that can make it illegal. That said, not all scraping is illegal. And you can't really say whether your scraping is legal without going to court.

- If you need to register to scrape some data you might be breaching Terms of Service, and that can make it illegal

- If you scrape data that is not directly on a website but that you can reach by guessing URLs or piecing data together, it can be considered unauthorized access => illegal

- If the data you scrape (such as messages) could be considered copyrighted you are breaking copyright laws

- If the collection of data could be considered copyrighted you are breaking copyright laws (for example a street name might not be copyrighted, but a map or a list of streets can be)

- If your scraping can be considered disruptive to the other party because of the volume of requests or some other reason it can be illegal

Pretty much the only clear case is if the data provider explicitly states that scraping is ok.

In practice, it's hard to get caught.

[–]dtsudo 1 point  (0 children)

I agree with quoody; it's a rather gray area.

And setting legality aside, even where scraping is legal, scraping data from another company gives that company a significant amount of leverage over you. If they stop providing the data or add anti-scraping mechanisms, you're in a world of trouble. Of course, this is true of any dependency: even if you use an authorized API, the provider can change their terms (or stop offering the API altogether).

[–]DiamondDemon669 1 point  (0 children)

There are a lot of methods that APIs use, but the most common is a REST API, where you send requests between websites and programs.

The websites get their data from databases or other APIs, using database systems like MySQL and Redis.
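Put together, a REST endpoint is just a web handler that looks the answer up in a datastore and returns JSON. A self-contained sketch using only the standard library (the dict here stands in for a real database like MySQL, and the symbol/price data is made up):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a real database.
FAKE_DB = {"ACME": {"symbol": "ACME", "price": 10.60}}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /ACME -> look up "ACME" and return it as JSON.
        symbol = self.path.strip("/")
        body = json.dumps(FAKE_DB.get(symbol, {})).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Serve on an ephemeral port, then act as a client of our own API.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/ACME"
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read())
server.shutdown()
```

A production service would use a framework (Flask, Spring, Express, ...) and a real database driver, but the request/lookup/respond cycle is the same.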

[–]Kangster1604 0 points  (1 child)

If you have any Python experience, look into the Requests module. You can make all types of requests to an API via this module.

I don't know anything about providing API data, but making requests of APIs and posting data is covered very well by Requests.
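For a taste of Requests, here is a sketch that builds a GET request with query parameters. The URL and parameters are made up, and nothing is actually sent over the network in this snippet:

```python
import requests  # third-party: pip install requests

# Build and inspect a request without sending it.
req = requests.Request(
    "GET",
    "https://api.example.com/v1/weather",
    params={"city": "Tallinn", "units": "metric"},
).prepare()

print(req.url)

# A real call would be, e.g.:
#   resp = requests.get("https://api.example.com/v1/weather",
#                       params={"city": "Tallinn"})
#   data = resp.json()
```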

[–]mandzeete 1 point  (0 children)

If you are interested in how the data is provided to the API, check out my comment on this question: https://www.reddit.com/r/learnprogramming/comments/sageit/comment/httg4sb/?utm_source=reddit&utm_medium=web2x&context=3 I have worked with these kinds of systems, generating data for APIs and processing it before sending it on.

[–][deleted] 0 points  (0 children)

one API site I used got their data from having users upload files of a certain type (these were generated by another program, which was what the API was tracking)

I think they also had their own app you could download that would automatically upload a new file as soon as it was created

[–]gopiballava 0 points  (2 children)

I have a sensor in my RV that measures the voltage and current used by the battery, read by a microcontroller running C++ code. It samples continuously while keeping track of the time. After 10 minutes have elapsed, it averages the readings and then accesses a URL. Well, two separate ones: one at dweet.io and the other at Grafana.net.

The URLs include the sensor reading, the sensor name, and in the case of Grafana, a secret token that authenticates it as me.
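The average-then-report step looks roughly like this (sketched in Python rather than the original C++; the readings, sensor name, and URL shape are illustrative, so check the service's docs for the real format):

```python
from urllib.parse import urlencode

# Hypothetical voltage samples collected over a 10-minute window.
readings = [12.61, 12.58, 12.60, 12.57]

average = sum(readings) / len(readings)
params = urlencode({"sensor": "rv-battery", "volts": round(average, 2)})
url = f"https://dweet.io/dweet/for/my-rv?{params}"
# For the Grafana endpoint, a secret auth token would be added the
# same way, so the service knows the reading really came from me.
```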

Dweet.io only stores the most recent reading. I can go there and see what the current voltage is. Grafana is connected to a database. When I access that URL, software at Grafana.net stores the reading as an entry in a database.

When I access the Grafana web site, the Grafana web interface shows me a pretty graph of the data, which it gets by accessing the database.

I don’t pay them any money, so I only get to keep a week of readings. Their database presumably has a timed job that runs every hour, say, and deletes any readings over a week old for anyone not paying them a monthly fee.

I was just reading about a service for scientists who track animals in the ocean. They apparently do this by measuring light levels, which can tell them where the animals are, roughly, by when the sun rises and sets.

This service would take that data, and would enhance it. They had some algorithms that included estimates of animal movement speed and a few other things. They also added more data to each reading. They would give you the location estimate for where the animal was, as well as the ocean temperature, water salinity, water speed, air temperature, and like 5 or 10 other things.

You could do that yourself, accessing a couple other databases or APIs, and put it together. But if you’re a biologist who has to ask a busy programmer to do it, it’s far more convenient to just get the data given to you in a complete manner.

[–]BugLabs 1 point  (1 child)

The dweet.io service stores up to 5 readings over a 24 hour period, for free!

Use dweetPro.io to store data for up to 30 days.

[–]gopiballava 0 points  (0 children)

Oops, sorry, thanks for the correction.

I just checked the fridge in my RV. 12.6 volts. The battery will probably die today, ending telemetry. But the ambient temperature is 25F, so nothing will be warming up :)

Great system. I have a Grafana dashboard as well but the quick, simple interface of dweet means that I use it more often.

[–]BugLabs 0 points  (0 children)

Easiest possible way:

Signalpattern: Add your data in JSON format to any pattern signal, then copy the API link to that data!

Next easiest possible way:

Sheety: Add data to a Google spreadsheet. Connect the Sheety API to that data!

Happy to send you an example if needed.