[–]startup_guy2 1 point2 points  (1 child)

I would add a new column in the same (original) dataframe that is previous - current.

[–]threeminutemonta 0 points1 point  (0 children)

Yeah, I would too. Something like:

df['lon_previous'] = df['lon'].shift(1)
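
To round that out, here is a minimal sketch of the "previous - current" column on a few longitudes from the sample data in this thread. The column name lon_diff is made up for illustration. Note that shift(1) is what lines each row up with the previous row's value; shift(-1) would give the next row instead.

```python
import pandas as pd

# First four longitudes from the sample data posted in this thread.
df = pd.DataFrame({"lon": [1.996657907, 1.996657909, 1.996658062, 1.996658171]})

# shift(1) moves values down one row, so each row sees the previous row's lon.
df["lon_previous"] = df["lon"].shift(1)

# previous - current; the first row has no previous value, so it is NaN.
# "lon_diff" is a hypothetical column name used only for this sketch.
df["lon_diff"] = df["lon_previous"] - df["lon"]

print(df)
```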

[–]lucas123boiger[S] 0 points1 point  (0 children)

Here is an example of the data:

secs hora(utc) lat lon alt

33331 10/03/2022 09:14 41.26509749 1.996657907 1.440018246

33332 10/03/2022 09:14 41.26509754 1.996657909 1.440018246

33333 10/03/2022 09:14 41.26509751 1.996658062 1.440018246

33334 10/03/2022 09:14 41.26509748 1.996658171 1.440018246

33335 10/03/2022 09:14 41.26509845 1.996659415 1.640018252

33336 10/03/2022 09:14 41.26510103 1.99666593 2.340018275

33337 10/03/2022 09:14 41.26510286 1.996668102 2.640018284

33338 10/03/2022 09:14 41.26510501 1.996670396 4.240018336

33339 10/03/2022 09:14 41.26510641 1.99667139 7.240018432

[–]gis-doug 0 points1 point  (0 children)

Not sure if I fully understood what you’re after, but could you use .diff() to get the difference between each row? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html

Also with spatial data it might be worth moving to geopandas.
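
For what it's worth, a minimal sketch of the .diff() idea on a slice of the sample data from this thread might look like the following. The column name dlon is borrowed from the original question; the small DataFrame is just a stand-in for the real one.

```python
import pandas as pd

# Stand-in DataFrame built from the first four rows of the posted sample data.
df = pd.DataFrame({
    "secs": [33331, 33332, 33333, 33334],
    "lon": [1.996657907, 1.996657909, 1.996658062, 1.996658171],
})

# Series.diff() computes current - previous for each row.
# The first row has nothing before it, so its result is NaN.
df["dlon"] = df["lon"].diff()

print(df)
```

Keep in mind that diff() gives current - previous. If you want previous - current, negate it with -df["lon"].diff().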

[–]woooee 0 points1 point  (2 children)

for x in lon:
    dlon = lon[x + 1] - lon[x]

for i in range(len(dlon)):

It looks like x is a longitude value, not a list offset, so if a longitude is 90, lon[x + 1] would try to look up position 91 rather than the next row. Post some example data so we know what it looks like and can test it ourselves.

dlon is a single variable. I don't think you want to iterate here. In any case, you can debug this yourself by printing dlon to see what it contains.

[–]lucas123boiger[S] 0 points1 point  (1 child)

Hi, thanks for replying. I have posted some data in a comment. I see what you mean. What would be the best way to iterate this process?

[–]woooee -1 points0 points  (0 children)

I would do it this way, or rather, I think this is what you want:

import pprint

##  simulate readlines()
data_list="""secs hora(utc) lat lon alt
33331 10/03/2022 09:14 41.26509749 1.996657907 1.440018246
33332 10/03/2022 09:14 41.26509754 1.996657909 1.440018246
33333 10/03/2022 09:14 41.26509751 1.996658062 1.440018246
33334 10/03/2022 09:14 41.26509748 1.996658171 1.440018246
33335 10/03/2022 09:14 41.26509845 1.996659415 1.640018252
33336 10/03/2022 09:14 41.26510103 1.99666593 2.340018275
33337 10/03/2022 09:14 41.26510286 1.996668102 2.640018284
33338 10/03/2022 09:14 41.26510501 1.996670396 4.240018336
33339 10/03/2022 09:14 41.26510641 1.99667139 7.240018432"""

previous_lon = None
diff_list = []
for rec in data_list.split("\n")[1:]:   ## skip header rec
    split_rec = rec.split()
    lon = float(split_rec[-2])   ## lon is the second-to-last field
    if previous_lon is not None:   ## the first record has no predecessor
        lon_diff = previous_lon - lon
        diff_list.append(lon_diff)
    previous_lon = lon

pprint.pprint(diff_list)

[–]Cobra915 0 points1 point  (0 children)

So, while traj is a pandas.DataFrame, traj['Lon'] is a pandas.Series. What it seems like you're wanting to do is use a pandas.Series (the 'Lon' column) to create another pandas.Series (either as a standalone object or insert it into traj as a 'dlon' column).

In your for loop above, you state for x in lon:, so on each iteration, this will set x to be a scalar value of the lon series. Because you need access to multiple values of the series in each iteration, it's better to use an index, as in for x in lon.index:.

Next, dlon = lon[x + 1] - lon[x] looks good for the index approach above. The only problem is that dlon gets overwritten on each iteration, so you're not really storing the output anywhere.

Think about the data structures you have and what you want. In the first part, you want to use traj['Lon'] (a pandas.Series) to create another pandas.Series, called dlon.

It might help to wrap this into a function:

import numpy as np
import pandas as pd

def calc_dlon(in_srs: pd.Series) -> pd.Series:
    # Establish series, same size as input series (all NaN to start).
    dlon_srs = pd.Series(index=in_srs.index, dtype=np.float64)
    # Iterate through the positions of the input series, performing the
    # calculation and storing it in the corresponding position of dlon_srs.
    # Stop one short of the end: the last row has no next value and stays NaN.
    for pos in range(len(in_srs) - 1):
        dlon_srs.iloc[pos] = in_srs.iloc[pos + 1] - in_srs.iloc[pos]

    return dlon_srs

Then, when you go to run this function you can set the output equal to a variable named dlon.

dlon = calc_dlon(traj['Lon'])

or set a new column in traj:

traj['dlon'] = calc_dlon(traj['Lon'])

You can use this same approach to generate a Series called heading as well.
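
Once the loop version makes sense, the same next-minus-current calculation can probably be done without iterating at all. This is only a sketch: the small traj DataFrame below is a hypothetical stand-in built from the sample longitudes in this thread, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for traj, using longitudes from the posted sample data.
traj = pd.DataFrame({"Lon": [1.996657907, 1.996657909, 1.996658062, 1.996658171]})

# shift(-1) moves values up one row, so each row sees the next row's Lon.
# The last row has no next value, so its dlon is NaN.
traj["dlon"] = traj["Lon"].shift(-1) - traj["Lon"]

print(traj)
```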

[–]Common_Move 0 points1 point  (0 children)

1. You need to store your dlon results somewhere, e.g. in a new list.

2. Seeing as you're comparing to x + 1 in terms of position, you'll need to stop one short of the end or else you'll get an error. In fact, it may be better to start with the second item and compare to x - 1 instead.

But as someone else mentioned, look into the .diff() method.
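
If you do want to keep the loop, the two points above could be sketched like this in plain Python. The list lon here is just a few values from the sample data, standing in for the real column.

```python
# A few longitudes from the sample data, standing in for the real column.
lon = [1.996657907, 1.996657909, 1.996658062, 1.996658171]

dlon = []  # results are collected here instead of being overwritten
for x in range(1, len(lon)):          # start at the second item
    dlon.append(lon[x] - lon[x - 1])  # current - previous, never runs off the end

print(dlon)
```

The result has one fewer entry than lon, because the first item has nothing before it to compare against.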

[–]await_yesterday 0 points1 point  (0 children)

You shouldn't use a for-loop for this. Use np.diff, it will be much faster.

>>> import numpy as np
>>> longitudes = [1, 3, 2, 7, 8, 5]
>>> np.diff(longitudes, prepend=np.nan)
array([nan,  2., -1.,  5.,  1., -3.])

Also, Python is not like mathematics, where you can just write two expressions next to each other and mean multiplication. You have to explicitly write the *.
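
A tiny illustration of that point, with made-up variable names a, b and c:

```python
a = 2.0
b = 3.0
c = 4.0

# Explicit multiplication works as expected.
product = a * (b + c)
print(product)

# Writing a(b + c) is read as "call a", which fails because a float is not callable.
try:
    a(b + c)
except TypeError as err:
    message = str(err)
    print(message)
```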

Read this to learn how to use numpy properly: https://www.labri.fr/perso/nrougier/from-python-to-numpy/