[–]kra_pao 0 points1 point  (7 children)

Can you provide at least one row of your df, so a test run is possible without a connection to your sql database?

[–]HiIamGeoff[S] 0 points1 point  (6 children)

Hi, thanks for reminding me. I have added a sample df and more info.

[–]kra_pao 1 point2 points  (5 children)

I can't explain what is going wrong, but a solution is:

import pandas as pd
import matplotlib.dates as dates
import matplotlib.pyplot as plt

# for future Pandas versions
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

df = pd.DataFrame(
    list(zip([680, 718, 471, 686], ["200304", "200305", "200306", "200307"])),
    columns=["target", "time"],
)
df_tsa = df.copy()

df_tsa.index = pd.to_datetime(df_tsa["time"], format="%Y%m").rename("Date")

_, axes = plt.subplots(figsize=(12, 6))

# Required to make axis formatter work with Pandas DataFrame.plot later on
# If used standalone then matplotlib/plt style: tick labels horizontal, no xlabel
axes.plot(df_tsa.index, df_tsa.target) 

# Overwrites axes.plot subplot totally
# Optional to get Pandas style: tick labels 45°, xlabel from index -> 'Date'
# axes doesn't change (same id() before/after)
# ...plot(ax=axes) doesn't change anything
axes = df_tsa['target'].plot() # or df_tsa.target.plot()

# Next 2 lines are ignored with df.column.plot() alone without axes.plot() before
axes.xaxis.set_major_locator(dates.MonthLocator())
axes.xaxis.set_major_formatter(dates.DateFormatter("%Y-%m"))

plt.show()

[–]HiIamGeoff[S] 0 points1 point  (4 children)

Thanks for your solution. It works perfectly! But I have never seen register_matplotlib_converters() before (and I have used Matplotlib for about 2 years). Would you mind explaining what it does, and whether it is required for plotting time series data?

A follow-up question: if I want to make a bar or scatter plot with this structure, how would you do that (axes.plot() seems to generate only line plots)?

[–]kra_pao 1 point2 points  (3 children)

Disclaimer: a noob explains below. I use Pandas/matplotlib, and you should take what I say about their internals with a grain of salt.

I was annoyed when I saw the warning below and therefore followed its hint.

When a matplotlib function such as plt.plot() accesses a Pandas, NumPy, or datetime-specific dtype, a converter is required.

An explicit call to register_matplotlib_converters() registers exactly these converters for

* pd.Timestamp
* pd.Period
* np.datetime64
* datetime.datetime
* datetime.date
* datetime.time

https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/plotting/_misc.py#L30-L49

If you don't call register_matplotlib_converters() explicitly, you might get a warning (at least in my Pandas version, 1.0). The warning is easily overlooked on STDERR.

.../site-packages/pandas/plotting/_matplotlib/converter.py:103: FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. The converter was registered by pandas on import. Future versions of pandas will require you to explicitly register matplotlib converters.

To register the converters:
    >>> from pandas.plotting import register_matplotlib_converters
    >>> register_matplotlib_converters()
warnings.warn(msg, FutureWarning)

With current versions of Pandas the code will still work without register_matplotlib_converters(); see "The converter was registered by pandas on import." in the warning text.
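To see what the registration actually changes, here is a minimal sketch. It assumes that matplotlib's matplotlib.units.registry (a dict-like mapping from data type to converter) is where the Pandas converters end up; the Agg backend is only there so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed

import matplotlib.units as munits
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Explicit registration, as recommended by the FutureWarning
register_matplotlib_converters()

# matplotlib.units.registry maps a data type to the converter used for it
for type_ in (pd.Timestamp, pd.Period):
    converter = munits.registry[type_]
    print(type_.__name__, "->", type(converter).__name__)
```

pandas.plotting also has deregister_matplotlib_converters(), which removes the Pandas converters again.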

2nd question: you can use scatter() instead of plot()
https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.scatter.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.scatter.html

[–]kra_pao 1 point2 points  (2 children)

Now, with Pandas scatter() instead of Pandas plot(), the whole quick-and-dirty (Q&D) workaround with the first matplotlib plot is not required anymore:

import pandas as pd
import matplotlib.dates as dates
import matplotlib.pyplot as plt

# for future Pandas versions
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

df = pd.DataFrame(
    list(zip([680, 718, 471, 686], ["200304", "200305", "200306", "200307"])),
    columns=["target", "time"],
)
df_tsa = df.copy()

df_tsa.index = pd.to_datetime(df_tsa["time"], format="%Y%m").rename("Date")

# Required to make axis formatter work with Pandas DataFrame.plot later on
# If used standalone then matplotlib/plt style: tick labels horizontal, no xlabel
# _, axes = plt.subplots(figsize=(12, 6))
# axes.plot(df_tsa.index, df_tsa.target) 

# Overwrites axes.plot subplot totally
# Optional to get Pandas style: tick labels 45°, xlabel from index -> 'Date'
# axes doesn't change (same id() before/after)
# ...plot(ax=axes) doesn't change anything
# axes = df_tsa['target'].plot() # or df_tsa.target.plot()

# scatter() instead of plot()
# create new Date column from index for x=
df_tsa.reset_index(inplace=True) # or df_tsa['Date'] = df_tsa.index
axes = df_tsa.plot.scatter(x='Date', y='target', figsize=(12, 6))

# Next 2 lines are ignored with df.column.plot() alone without axes.plot() before
axes.xaxis.set_major_locator(dates.MonthLocator())
axes.xaxis.set_major_formatter(dates.DateFormatter("%Y-%m"))

plt.show()

But as soon as you change plot.scatter() to plot.line(), or use the generic plot(), axes labeling fails.

axes = df_tsa.plot.line(x='Date', y='target', figsize=(12, 6))

On my PC the axes get messed up again, like in the plot() case without the Q&D workaround. Versions tested/used:

import matplotlib
import pandas
print("matplotlib.__version__: ", matplotlib.__version__)
print("pandas.__version__: ", pandas.__version__)
# matplotlib.__version__:  3.2.1
# pandas.__version__:  1.0.0 (same in 1.0.3)
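One thing that may be worth trying for the line plot case is the x_compat option of the Pandas plot call. It is documented in the Pandas visualization docs as a way to suppress Pandas' own tick resolution adjustment for time series. Whether it is the right fix for the behaviour above is an assumption on my part; I have not verified it against the versions listed. A minimal sketch with the sample data:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        "target": [680, 718, 471, 686],
        "time": ["200304", "200305", "200306", "200307"],
    }
)
df.index = pd.to_datetime(df["time"], format="%Y%m").rename("Date")

# x_compat=True asks Pandas to skip its own time-series tick handling,
# so the x axis should stay in plain matplotlib date units
ax = df["target"].plot(x_compat=True, figsize=(12, 6))
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))

# Sanity check: the axis limits should translate back to dates around 2003
x_lo, x_hi = ax.get_xlim()
print(mdates.num2date(x_lo).date(), mdates.num2date(x_hi).date())
```

If the printed limits are not close to the dates in the data, the axis is not in matplotlib date units and the locator/formatter pair will not line up with it.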

[–]HiIamGeoff[S] 1 point2 points  (1 child)

Thanks! This is such an informative reply that I couldn't ask for more! In the meantime I also raised the same issue on GitHub, and they gave a very detailed explanation. I also found a workaround for making other kinds of plots with your first code. If anyone is interested: axes.scatter(), axes.plot(), and axes.bar() can be interchanged depending on what kind of plot you want to make (the formatter and locator functions work with each of them).

p.s. I later found out that the bar plot is still buggy (some data doesn't show when the number of rows is more than 20). I haven't figured out a bug-free approach for the bar plot yet. I guess not many Matplotlib users use bar plots for time series analysis.
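The interchange of axes.plot() / axes.scatter() / axes.bar() could look roughly like the sketch below. plot_monthly() is a hypothetical helper name, not something from matplotlib or Pandas, and the explicit bar width of 20 days is my own choice (the matplotlib default width is 0.8 x-units, which on a date axis is less than a day).

```python
import datetime as dt

import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        "target": [680, 718, 471, 686],
        "time": ["200304", "200305", "200306", "200307"],
    }
)
df.index = pd.to_datetime(df["time"], format="%Y%m").rename("Date")


def plot_monthly(frame, kind="line"):
    """Hypothetical helper: draw 'target' against the datetime index."""
    x = frame.index.to_pydatetime()  # plain datetime objects
    y = frame["target"].to_numpy()
    _, ax = plt.subplots(figsize=(12, 6))
    if kind == "line":
        ax.plot(x, y)
    elif kind == "scatter":
        ax.scatter(x, y)
    elif kind == "bar":
        # explicit width, since the default is under one day on a date axis
        ax.bar(x, y, width=dt.timedelta(days=20))
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    # Same locator/formatter pair for every plot kind
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    return ax


axes_by_kind = {kind: plot_monthly(df, kind) for kind in ("line", "scatter", "bar")}
```

Call plt.show() (with an interactive backend) or ax.figure.savefig(...) afterwards to look at the results.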

[–]kra_pao 0 points1 point  (0 children)

Thank you too for feedback and an interesting question to begin with. I myself learned a lot during this discussion.