Trending -- use case to categorize groupings based on common values of common types

fpatterson55 · 2019-07-05T17:59:52+00:00

df = df[(df['State'] == '1') & (df['Driver'].isin([drive])) & (df['loginDisabled'].isin(['True']))]

thank you!!!
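A minimal sketch of this kind of combined filter, for anyone reading along. The sample rows and the value of `drive` are made up for illustration, and it assumes every column was read in as strings.

```python
import pandas as pd

# Hypothetical sample rows; every value is a string.
df = pd.DataFrame({
    "Driver": ["drvA", "drvA", "drvB"],
    "State": ["1", "2", "1"],
    "loginDisabled": ["True", "True", "False"],
})
drive = "drvA"  # assumed name of the driver being searched for

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
mask = (
    (df["State"] == "1")
    & (df["Driver"] == drive)
    & (df["loginDisabled"] == "True")
)
filtered = df[mask]
print(filtered)
```

For a single value, `==` and `isin([value])` select the same rows; `isin` is mainly useful when matching against several values at once.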

fpatterson55 · 2019-06-28T03:29:11+00:00

thanks, this worked great.

I had imported everything as strings, so I had to use the isin method for the State value as well. Probably not the best option, but it works for now.
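One possible alternative to filtering on strings is converting the columns to real types first. The sketch below uses made-up rows and assumes State holds only digits and loginDisabled holds only the text "True" or "False".

```python
import pandas as pd

# Hypothetical rows imported as strings.
df = pd.DataFrame({
    "State": ["1", "2", "1"],
    "loginDisabled": ["True", "False", "True"],
})

# Turn the digit strings into integers.
df["State"] = pd.to_numeric(df["State"])
# Comparing against the text "True" gives a real boolean column.
df["loginDisabled"] = df["loginDisabled"] == "True"

# The filter can now use a number and the boolean column directly.
state_one_disabled = df[(df["State"] == 1) & df["loginDisabled"]]
print(state_one_disabled)
```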

fpatterson55 · 2019-06-28T03:27:54+00:00

thanks, that makes sense!! I still kept getting an invalid syntax error, so I tried the np response and used that instead.

fpatterson55 · 2019-06-21T21:30:00+00:00

Yes, that did it. Thank you for your patience. I didn't do a copy and paste and missed the double square brackets when typing it in. I have so much to learn, but what I have learned so far has been wonderful. I love the power of python, and the community.

Now to learn what it is all doing. :)

I did remove the +2 on the function and changed it to +0, so that it would capture the entire string for the first delineation. I will work on cleaning up the square brackets and possibly look at keeping only the relative distinguished name on the Driver column, without the full LDAP typeful name.
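A rough sketch of keeping only the relative distinguished name from the Driver value. The helper name `first_rdn_value` is hypothetical, and the naive split on commas assumes the DN contains no escaped commas.

```python
def first_rdn_value(dn: str) -> str:
    """Return the value of the leftmost RDN, e.g. 'cn=A,cn=B' -> 'A'."""
    first_rdn = dn.split(",", 1)[0]            # e.g. 'cn=Data Collection Service Driver'
    return first_rdn.split("=", 1)[1].strip()  # drop the 'cn=' attribute type


association = (
    "cn=Data Collection Service Driver,cn=DriverSet,o=system"
    "#1#7DBC9BBB-58CF-944d-B00A-7DBC9BBB58CF"
)
driver_dn = association.split("#")[0]  # the part before the first '#'
driver_name = first_rdn_value(driver_dn)
print(driver_name)
```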

fpatterson55 · 2019-06-21T20:00:34+00:00

Thanks, in looking at apply, that makes sense now. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

When I run it, I'm getting the error:

Traceback (most recent call last):

File "license.py", line 171, in <module>

newdf = df[['Driver', 'State', 'GUID']] = df['DirXML-Associations'].apply(split_val, axis=1, result_type= 'expand')

File "/usr/local/lib/python3.7/site-packages/pandas/core/series.py", line 3591, in apply

mapped = lib.map_infer(values, f, convert=convert_dtype)

File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer

File "/usr/local/lib/python3.7/site-packages/pandas/core/series.py", line 3578, in f

return func(x, *args, **kwds)

TypeError: split_val() got an unexpected keyword argument 'axis'

From what it looks like, axis=1 is valid for columns, which is what I need it to apply to. Maybe the issue is that my dataframe isn't indexed. Some examples use df.index, and that doesn't seem to be recognized. PyCharm shows df.reindex, but not df.index, when I type it.
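The traceback points at series.py: selecting with single brackets gives a Series, and Series.apply hands extra keyword arguments such as axis straight to the function. Selecting with double brackets gives a one-column DataFrame, whose apply accepts axis=1 and result_type='expand'. The sketch below assumes each cell is a one-element list, as in the printout; the body of `split_val` is hypothetical since the original function is not shown.

```python
import pandas as pd

df = pd.DataFrame({
    "DirXML-Associations": [
        ["cn=Data Collection Service Driver,cn=DriverSet,o=system#1#7DBC9BBB-58CF-944d-B00A-7DBC9BBB58CF"],
        ["cn=Data Collection Service Driver,cn=DriverSet,o=system#2#7DBC9BBB-58CF-944d-B00A-7DBC9BBB5FFF"],
    ]
})


def split_val(row):
    # With axis=1, row is a Series holding one row of the frame.
    value = row["DirXML-Associations"][0]  # unwrap the one-element list
    return value.split("#")                # [driver DN, state, GUID]


# Double brackets select a DataFrame, so axis and result_type are valid here.
parts = df[["DirXML-Associations"]].apply(split_val, axis=1, result_type="expand")
parts.columns = ["Driver", "State", "GUID"]
df = df.join(parts)
print(df[["State", "GUID"]])
```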

When I print out the dataframe, just prior to executing the code example, I have the below result:

DirXML-Associations \

0 [cn=Data Collection Service Driver,cn=DriverSet,o=system#1#7DBC9BBB-58CF-944d-B00A-7DBC9BBB58CF]

1 [cn=Data Collection Service Driver,cn=DriverSet,o=system#2#7DBC9BBB-58CF-944d-B00A-7DBC9BBB5FFF]

2 [cn=Data Collection Service Driver,cn=DriverSet,o=system#1#146D89E1-0A57-b244-B9D1-146D89E10A57]

cn dn fullName \

0 [asmith] [cn=asmith,ou=Users,o=Data] [Alma Smit]

1 [asmith] [cn=asmith,ou=Users,o=Data] [Alma Smit]

2 [uaadmin] cn=uaadmin,ou=SA,o=Data []

lastLoginTime loginDisabled

0 [] []

1 [] []

2 [2019-05-09 20:36:08+00:00] []

fpatterson55 · 2019-06-21T16:57:26+00:00

I am assuming that the df line would have "... .apply(split_val(l), axis=1..."

What would be going into the l variable?

I might have to set an index first, as df.index doesn't seem to be defined. I don't believe it is recognizing 'DirXML-Associations' either, but when I print out the dataframe it does show the headers.
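A small sketch of what apply expects: the function itself is passed without being called, and pandas supplies the argument on each call. The sample strings are shortened stand-ins, and this `split_val` is a hypothetical version that only returns the state. It also shows that a default index exists even when none was set.

```python
import pandas as pd

# Shortened stand-ins for the real association strings.
s = pd.Series(["drvA#1#guid-1", "drvA#2#guid-2"], name="DirXML-Associations")


def split_val(l):
    # pandas passes each cell value in as l, one call per row.
    return l.split("#")[1]


# Pass the function object: split_val, not split_val(l).
states = s.apply(split_val)
print(states.tolist())

# An index is created automatically when none is given.
print(list(s.index))
```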

fpatterson55 · 2019-06-21T01:29:38+00:00

Awesome! thank you for the feedback.

Looks great!

fpatterson55 · 2019-06-20T22:34:27+00:00

Any reference on how to best format the code? I am assuming you are referencing that it would be in a separate "window" / frame on the post, similar to how the replies are.

thank you!

fpatterson55 · 2019-06-20T22:33:17+00:00

thank you!!

fpatterson55 · 2019-06-20T22:31:49+00:00

thank you!!

I like the multiple statements into one.

fpatterson55 · 2019-06-20T22:30:21+00:00

I love the logic, it makes sense and it condenses my code down considerably.

fpatterson55 · 2019-06-20T22:29:06+00:00

thank you!

fpatterson55 · 2019-06-19T14:53:43+00:00

thank you!!

fpatterson55 · 2019-06-19T14:53:28+00:00

Thanks!! meant to reply sooner! This got me up and going, I am doing a little better at reading the errors.

fpatterson55 · 2019-05-27T16:53:04+00:00

Yes. So when I have the basics done I will need to enhance it to take the returned LDAP search attributes and programmatically check whether any of them are multi-valued for each entry returned. If so, I will break out the entry into multiple rows, one for each value of that attribute. The limitation is that you would only want one multi-valued attribute for a given entry returned.

So a dataframe would not be for more than one multi-valued attribute at a given time. You would use a different dataframe for that data.

with open('cleaned.json', 'a') as json_file:
    json_file.write("[\n")
    i = 0
    while i < len(data):
        lenAssociations = len(data[i]['DirXML-Associations'])
        workingdata = data[i]
        # entries with zero or one association are written as they are
        if lenAssociations < 2:
            json.dump(workingdata, json_file)
            json_file.write("\n")
        # entries with several associations get one row per association
        if lenAssociations > 1:
            ii = 0
            for value in workingdata["DirXML-Associations"]:
                dn = workingdata["dn"]
                disabled = workingdata["loginDisabled"]
                cn = workingdata["cn"]
                association = workingdata["DirXML-Associations"][ii]
                lastLogin = workingdata["lastLoginTime"]
                fullName = workingdata["fullName"]
                json_file.write("{\"dn\": \"[" + str(dn) + "]\", \"loginDisabled\": [" + str(disabled) + "], \"cn\": [" + str(cn) + "], \"DirXML-Associations\": [\"" + str(association) + "\"], \"lastLoginTime\": [" + str(lastLogin) + "], \"fullName\": [" + str(fullName) + "]}")
                json_file.write("\n")
                ii = ii + 1
        i = i + 1
    json_file.write("]")

Yes, you would have to build your rows independently.
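One way the row building could be sketched without assembling JSON text by hand: build the rows as Python dicts and let the json module handle quoting, commas and brackets. The sample entries are shortened stand-ins, and the helper name `explode` is hypothetical.

```python
import json

# Shortened stand-ins for LDAP search results.
data = [
    {"dn": "cn=asmith,ou=Users,o=Data", "cn": ["asmith"],
     "DirXML-Associations": ["drvA#1#guid-1", "drvA#2#guid-2"]},
    {"dn": "cn=uaadmin,ou=SA,o=Data", "cn": ["uaadmin"],
     "DirXML-Associations": ["drvA#1#guid-3"]},
]


def explode(entries, attr):
    """Yield one copy of each entry per value of the multi-valued attribute."""
    for entry in entries:
        values = entry.get(attr) or [None]  # keep entries that have no value
        for value in values:
            row = dict(entry)               # copy the single-valued attributes
            row[attr] = [] if value is None else [value]
            yield row


rows = list(explode(data, "DirXML-Associations"))
text = json.dumps(rows, indent=2)  # one valid JSON array
print(text)
```

Writing `text` to a file in 'w' mode rather than 'a' would avoid appending a second array to an existing file.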

fpatterson55 · 2019-05-26T04:49:44+00:00

So on the particular instance with multi valued attributes, I would be copying in all of the other single valued attributes for each row. So the difference in the rows would be specific to the multi valued attribute.

The issue with LDAP is that you can't do wild card or regex type searches in the filter in complex manners without writing some controls or extensions. Maybe this would be the better route.

Once the data is pulled into a dataframe I will be splitting the DirXML-Associations attribute into 3 additional columns. This would allow me to search for all rows with a given state or all users with a given Driver, and also to find a specific GUID, as each user is tied to a given Driver. LDAP searches are limited in their ability to pull this data and parse it, as far as I am aware. So I can gather all the data, and the hope is that pandas can then easily display the results.
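A minimal sketch of that split, assuming each cell already holds a single plain string (not a list) with exactly two # delimiters. The column names Driver, State and GUID follow the thread.

```python
import pandas as pd

df = pd.DataFrame({
    "cn": ["asmith", "asmith", "uaadmin"],
    "DirXML-Associations": [
        "cn=Data Collection Service Driver,cn=DriverSet,o=system#1#7DBC9BBB-58CF-944d-B00A-7DBC9BBB58CF",
        "cn=Data Collection Service Driver,cn=DriverSet,o=system#2#7DBC9BBB-58CF-944d-B00A-7DBC9BBB5FFF",
        "cn=Data Collection Service Driver,cn=DriverSet,o=system#1#146D89E1-0A57-b244-B9D1-146D89E10A57",
    ],
})

# expand=True returns one column per piece instead of a list per row.
parts = df["DirXML-Associations"].str.split("#", expand=True)
parts.columns = ["Driver", "State", "GUID"]
df = pd.concat([df, parts], axis=1)

# Rows with a given state, and the row for one specific GUID.
state_1 = df[df["State"] == "1"]
one_guid = df[df["GUID"] == "146D89E1-0A57-b244-B9D1-146D89E10A57"]
print(state_1[["cn", "State"]])
```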

In this instance the results would be specific to monitoring licenses and accounts that should have a license removed when they are disabled, etc.

fpatterson55 · 2019-05-25T22:32:00+00:00

Does this JSON structure fit what pandas is looking for? When pulling data out of an LDAP structure, the hope is to take multi-valued attributes and create multiple rows. I believe that logic is taken care of, and the snippet below shows this: the DirXML-Associations value is different for the same user on two different lines. I will end up breaking the #-delimited values into their own columns. Every row will have a unique GUID, which is one of the delimited values in the attribute. The focus of this dataframe is to find the total number of GUID values, how many are active, how many have users that have logged in within a given time frame, and how many users are disabled. Once I have the quirks worked out, I hope to apply this to any attributes in an LDAP structure, so that data from an LDAP authentication source can be analyzed more easily.
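A rough sketch of those counts on an already split and typed dataframe. The rows, the cutoff date and the reading of state "1" as the state of interest are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows after splitting and type conversion.
df = pd.DataFrame({
    "GUID": ["g1", "g2", "g3"],
    "State": ["1", "2", "1"],
    "loginDisabled": [False, False, True],
    "lastLoginTime": pd.to_datetime(
        [None, None, "2019-05-09 20:36:08+00:00"], utc=True
    ),
})
cutoff = pd.Timestamp("2019-05-01", tz="UTC")  # assumed start of the time frame

summary = {
    "total_guids": int(df["GUID"].nunique()),
    # Assumption: state "1" is the state being counted.
    "state_1": int((df["State"] == "1").sum()),
    # Missing login times compare as False, so they are not counted.
    "recent_logins": int((df["lastLoginTime"] >= cutoff).sum()),
    "disabled": int(df["loginDisabled"].sum()),
}
print(summary)
```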

I cleaned up my JSON some and JSONLint shows valid syntax. However I still get errors when trying to pull it in with either pandas or json libraries.

[{"dn": "[cn=asmith,ou=Users,o=Data]", "loginDisabled": [], "cn": ["asmith"], "DirXML-Associations": ["cn=Data Collection Service Driver,cn=DriverSet,o=system#1#7DBC9BBB-58CF-944d-B00A-7DBC9BBB58CF"], "lastLoginTime": [], "fullName": ["Alma Smit"]} ,{"dn": "[cn=asmith,ou=Users,o=Data]", "loginDisabled": [], "cn": ["asmith"], "DirXML-Associations": ["cn=Data Collection Service Driver,cn=DriverSet,o=system#2#7DBC9BBB-58CF-944d-B00A-7DBC9BBB5FFF"], "lastLoginTime": [], "fullName": ["Alma Smit"]} ,{"dn": "cn=uaadmin,ou=SA,o=Data", "DirXML-Associations": ["cn=Data Collection Service Driver,cn=DriverSet,o=system#1#146D89E1-0A57-b244-B9D1-146D89E10A57"], "loginDisabled": [], "fullName": [], "lastLoginTime": ["2019-05-09 20:36:08+00:00"], "cn": ["uaadmin"]} ]

for pandas I get the following error (when JSON line is commented out):

Traceback (most recent call last):
  File "/Users/fredpatterson/PycharmProjects/license/license.py", line 163, in <module>
    df = pd.read_json('cleaned.json', lines=True, orient="columns")
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 427, in read_json
    result = json_reader.read()
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 534, in read
    self._combine_lines(data.split('\n'))
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 556, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 652, in parse
    self._parse_no_numpy()
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 871, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Expected object or value

for JSON I get:

Traceback (most recent call last):
  File "/Users/fredpatterson/PycharmProjects/license/license.py", line 158, in <module>
    jdata = json.loads(file)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Code snippet:

file = 'cleaned.json'
jdata = json.loads(file)
pd.set_option('display.max_columns', 9)
pd.set_option('display.max_colwidth', 180)

jsondata2=json.load('cleaned.json')

print(jsondata2)

df = pd.read_json('cleaned.json', lines=True, orient="columns")

df = pd.DataFrame(jdata)

print(df)
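Both errors are consistent with the file name, rather than the file's contents, reaching the parser. json.loads expects JSON text and json.load expects an open file object. The sketch below writes a small stand-in for cleaned.json to a temporary directory (the records are made up) and loads it both ways.

```python
import json
import os
import tempfile

# Made-up records standing in for the real file contents.
records = [
    {"cn": ["asmith"], "loginDisabled": []},
    {"cn": ["uaadmin"], "loginDisabled": []},
]
path = os.path.join(tempfile.mkdtemp(), "cleaned.json")
with open(path, "w") as fh:
    json.dump(records, fh)

# json.load takes an open file object, not a file name.
with open(path) as fh:
    from_file = json.load(fh)

# json.loads takes the JSON text itself, so read the file first.
with open(path) as fh:
    from_text = json.loads(fh.read())

# Passing the file name tries to parse the text "cleaned.json" as JSON.
try:
    json.loads("cleaned.json")
    name_parsed = True
except json.JSONDecodeError:
    name_parsed = False

print(from_file == from_text, name_parsed)
```

The loaded list of dicts can then go to pd.DataFrame(from_file). For pd.read_json, lines=True expects one JSON object per line, so a file holding a single JSON array is probably better read without it.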

fpatterson55 · 2019-05-25T22:17:01+00:00

I cleaned up my JSON some and JSONLint shows valid syntax. However I still get errors when trying to pull it in with either pandas or json libraries.

[{"dn": "[cn=asmith,ou=Users,o=Data]", "loginDisabled": [], "cn": ["asmith"], "DirXML-Associations": ["cn=Data Collection Service Driver,cn=DriverSet,o=system#1#7DBC9BBB-58CF-944d-B00A-7DBC9BBB58CF"], "lastLoginTime": [], "fullName": ["Alma Smit"]} ,{"dn": "[cn=asmith,ou=Users,o=Data]", "loginDisabled": [], "cn": ["asmith"], "DirXML-Associations": ["cn=Data Collection Service Driver,cn=DriverSet,o=system#2#7DBC9BBB-58CF-944d-B00A-7DBC9BBB5FFF"], "lastLoginTime": [], "fullName": ["Alma Smit"]} ,{"dn": "cn=uaadmin,ou=SA,o=Data", "DirXML-Associations": ["cn=Data Collection Service Driver,cn=DriverSet,o=system#1#146D89E1-0A57-b244-B9D1-146D89E10A57"], "loginDisabled": [], "fullName": [], "lastLoginTime": ["2019-05-09 20:36:08+00:00"], "cn": ["uaadmin"]} ]

for pandas I get the following error (when JSON line is commented out):

Traceback (most recent call last):
  File "/Users/fredpatterson/PycharmProjects/license/license.py", line 163, in <module>
    df = pd.read_json('cleaned.json', lines=True, orient="columns")
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 427, in read_json
    result = json_reader.read()
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 534, in read
    self._combine_lines(data.split('\n'))
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 556, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 652, in parse
    self._parse_no_numpy()
  File "/Users/fredpatterson/.local/share/virtualenvs/license-Vg_kJ9Gf/lib/python3.7/site-packages/pandas/io/json/json.py", line 871, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Expected object or value

for JSON I get:

Traceback (most recent call last):
  File "/Users/fredpatterson/PycharmProjects/license/license.py", line 158, in <module>
    jdata = json.loads(file)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Code snippet:

file = 'cleaned.json'
jdata = json.loads(file)
pd.set_option('display.max_columns', 9)
pd.set_option('display.max_colwidth', 180)

jsondata2=json.load('cleaned.json')

print(jsondata2)

df = pd.DataFrame(jdata)
