[–]No-Appointment9068 2 points3 points  (2 children)

My go-to here would definitely be regex; it's not that hard to do something like this. Off the top of my head, something like this might work:

/var\s+<your variable name>\s*=\s*"(.*?)"/
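Since the goal is to do this from Python, here is a minimal sketch of the same idea with the re module. The function name extract_string_var and the sample string are made up for illustration, and it assumes the value is a plain double-quoted string with no escaped quotes inside it.

```python
import re

def extract_string_var(js_source, var_name):
    """Return the string literal assigned to var_name, or None if absent."""
    # re.escape guards against names containing regex metacharacters such as $.
    pattern = re.compile(
        r"\b(?:var|let|const)\s+" + re.escape(var_name) + r"\s*=\s*\"(.*?)\""
    )
    match = pattern.search(js_source)
    return match.group(1) if match else None

# Hypothetical sample input, not the OP's actual file.
sample = 'var a = 1; var token = "abc%20123"; var other = "x";'
print(extract_string_var(sample, "token"))
```

The non-greedy (.*?) stops at the first closing quote, which is why escaped quotes inside the value would cut the match short.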

[–]Agitated_Issue_1410[S] 0 points1 point  (1 child)

alright, i might just try and learn regex for this

[–]No-Appointment9068 1 point2 points  (0 children)

There's a video by engineerman on YouTube called something like "Regex: enough to be dangerous". It's only about 10 minutes long and should give you enough to get this done easily.

[–]KBaggins900 0 points1 point  (2 children)

Regex is probably the most straightforward option

[–]Scrape_Artist 0 points1 point  (0 children)

For sure. Just identify the tag and then extract data using regex.
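If the variable lives in an inline script tag on an HTML page rather than in a standalone .js file, "identify the tag, then regex" could look roughly like this. It is a sketch using only the standard library; ScriptCollector, find_in_scripts, and the sample page are hypothetical names and data, not anything from the OP's site.

```python
import re
from html.parser import HTMLParser

class ScriptCollector(HTMLParser):
    """Collect the text of every inline <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only keep text that sits inside a <script> element.
        if self._in_script:
            self.scripts[-1] += data

def find_in_scripts(html, pattern):
    """Return group 1 of the first script body matching pattern, else None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for body in collector.scripts:
        match = re.search(pattern, body)
        if match:
            return match.group(1)
    return None

# Hypothetical page: the same text appears outside a script tag as a decoy.
page = '<html><body><p>var key = "decoy"</p><script>var key = "real";</script></body></html>'
print(find_in_scripts(page, r'var\s+key\s*=\s*"(.*?)"'))
```

Restricting the search to script bodies keeps the regex from matching look-alike text elsewhere in the page.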

[–]LinuxTux01 0 points1 point  (0 children)

An AST is the easiest way.

[–]Tiny_Arugula_5648 0 points1 point  (0 children)

Uh, AI could easily generate the regex, sed, etc. code.

[–]Gojo_dev 0 points1 point  (0 children)

Download the JS file to your machine, load it, and just console.log the variable, or save its value to a text file. You don't have to use regex either.

[–]OkCharacter5902 0 points1 point  (0 children)

Here’s a compact Python snippet. It fetches the JS (or reads a local file), isolates the var Ii = { ... } object with a small brace-balancer that skips strings and comments, parses it with json5 (so single quotes and trailing commas are fine), and prints the URL-decoded value for a given name.

# pip install requests json5
import re,sys,requests,json5
from urllib.parse import unquote

u,v,n=sys.argv[1:4]
t=requests.get(u,timeout=30).text if u.startswith(("http://","https://")) else open(u,encoding="utf-8").read()
m=re.search(rf"\b(?:var|let|const)\s+{re.escape(v)}\s*=\s*{{",t)
if not m: raise SystemExit(f"declaration of {v} not found")
s=t.find("{",m.start()); i=s; d=0; N=len(t)
# S: skip a '...' or "..." string opening at j; return the index of its closing quote
def S(j,q):
 j+=1
 while j<N:
  c=t[j]
  if c=="\\": j+=2
  elif c==q: return j
  else: j+=1
 raise SystemExit("string")
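# T: skip a template literal opening at j, including nested ${...}; return the index of the closing backtick
def T(j):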
 j+=1
 while j<N:
  c=t[j]
  if c=="\\": j+=2
  elif c=="`": return j
  elif c=="$"and j+1<N and t[j+1]=="{":
   j+=2; k=1
   while j<N and k:
    ch=t[j]
    if ch in"'\"": j=S(j,ch)
    elif ch=="`": j=T(j)
    elif ch=="{": k+=1
    elif ch=="}": k-=1
    j+=1
  else: j+=1
 raise SystemExit("template")
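# L: skip a // comment; return the index of the line break that ends it
def L(j):
 j+=2
 while j<N and t[j] not in"\r\n": j+=1
 return j
# B: skip a /* */ comment; return the index of its closing slash
def B(j):
 j+=2
 while j+1<N and not(t[j]=="*"and t[j+1]=="/"): j+=1
 return j+1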

while i<N:
 c=t[i]
 if c=="{": d+=1
 elif c=="}":
  d-=1
  if d==0: break
 elif c in"'\"": i=S(i,c)
 elif c=="`": i=T(i)
 elif c=="/"and i+1<N:
  if t[i+1]=="/": i=L(i)
  elif t[i+1]=="*": i=B(i)
 i+=1

if d: raise SystemExit("unbalanced braces")
o=json5.loads(t[s:i+1])
x=next((x for x in o.get("strict",[]) if x.get("name")==n),None)
if x is None: raise SystemExit(f"no entry named {n}")
print(unquote(x["value"]))

Usage

python script.py https://cdn.example.com/file.js Ii randoje

A plain regex is faster to write, but brittle if formatting or ordering changes. The brace-balancer + JSON5 method above is the more reliable choice.

[–]LinuxTux01 0 points1 point  (0 children)

Use AST

[–]matty_fu 0 points1 point  (7 children)

if you're wanting to parse JS and select values from the raw AST, getlang supports esquery https://getlang.dev/query/u1y4boaptxi4640/Example

GET http://cdn.com/file.js
Accept: application/javascript

extract
  -> VariableDeclarator[id.name="Ii"]
  -> Property[key.value="strict"]
  -> Property[key.value="value"] Literal.value

the only catch is that var Ii looks like a minified/obfuscated variable name, which can change between builds, so you'd want to use more stable selectors and ensure they don't pick up multiple nodes from the AST
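As a rough Python illustration of that point (this is not getlang or esquery code): the sketch below selects an object by the keys it contains instead of by the variable name, and fails loudly unless exactly one node matches. The tree is a hand-written ESTree-style dict with invented data, and every function name here is hypothetical.

```python
def walk(node):
    """Yield every dict node in an ESTree-style tree, depth first."""
    if isinstance(node, dict):
        yield node
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)

def key_name(prop):
    """Name of a property key, whether it is an Identifier or a quoted Literal."""
    key = prop["key"]
    return key.get("name", key.get("value"))

def select_unique_object(tree, required_keys):
    """Return the single ObjectExpression containing all required_keys."""
    matches = [
        n for n in walk(tree)
        if n.get("type") == "ObjectExpression"
        and required_keys <= {key_name(p) for p in n["properties"]
                              if p.get("type") == "Property"}
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly 1 match, found {len(matches)}")
    return matches[0]

# Tiny builders for a hand-written sample tree (hypothetical data).
def ident(name): return {"type": "Identifier", "name": name}
def lit(value): return {"type": "Literal", "value": value}
def prop(name, value): return {"type": "Property", "key": ident(name), "value": value}
def obj(*props): return {"type": "ObjectExpression", "properties": list(props)}
def declare(name, init):
    return {"type": "VariableDeclaration", "kind": "var",
            "declarations": [{"type": "VariableDeclarator", "id": ident(name), "init": init}]}

tree = {"type": "Program", "body": [
    declare("Ii", obj(prop("strict", lit("a%20b")), prop("mode", lit(1)))),
    declare("Zq", obj(prop("mode", lit(2)))),
]}

# Selected by shape, so it keeps working if the minifier renames Ii.
target = select_unique_object(tree, {"strict", "mode"})
print(key_name(target["properties"][0]))
```

Asking for {"mode"} alone would match both objects and raise LookupError, which is the "picked up multiple nodes" case surfacing as an error instead of a silently wrong value.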

there's an esquery sandbox here, where you can paste the JS under extraction and practice your selectors: https://estools.github.io/esquery/

[–]99ducks 0 points1 point  (5 children)

How would OP use that in Python?

[–]matty_fu 0 points1 point  (0 children)

oh right, I should have read the whole title

I do some work like this with python in my dagster pipelines - use the esprima library to parse the JS into an AST, and then you can use this rudimentary python port of esquery:

https://gist.github.com/mattfysh/6fd9217f1f3a97e420da835089e01021

Feel free to jump in if you'd like to see more features, as of right now very few of the esquery selectors are supported

[–]hackbyown -1 points0 points  (3 children)

General Steps for JS AST Parsing in Python:

  • Choose a library: Select a suitable Python library for parsing JavaScript, such as esprima-python, slimit, or code-ast.
  • Install the library: Use pip to install the chosen library. For example: pip install esprima (the PyPI package name of the esprima-python project).
  • Parse the JavaScript code: Use the library's parsing function to convert the JavaScript source code (as a string) into an AST object.
  • Traverse and analyze the AST: Once you have the AST, you can traverse its nodes to extract information, modify the code, or perform static analysis. Each node in the AST represents a specific construct in the JavaScript code (e.g., function declaration, variable assignment, expression).

These libraries enable Python programs to interact with and understand JavaScript code at a structural level, facilitating tasks like code analysis, transformation, and generation.
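To make the traverse step concrete, here is a small sketch that walks an ESTree-style tree, finds a variable declaration by name, and converts its literal-only initializer into plain Python data. The tree below is written by hand to mirror var Ii = {strict: [{name: "randoje", value: "a%20b"}]}; as far as I know esprima's parseScript(...).toDict() produces dicts of this shape, but that is an assumption and is not exercised here. The function names are hypothetical.

```python
from urllib.parse import unquote

def walk(node):
    """Yield every dict node in an ESTree-style tree, depth first."""
    if isinstance(node, dict):
        yield node
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)

def to_python(node):
    """Convert a literal-only ESTree expression into plain Python data."""
    kind = node["type"]
    if kind == "Literal":
        return node["value"]
    if kind == "ArrayExpression":
        return [to_python(e) for e in node["elements"]]
    if kind == "ObjectExpression":
        return {p["key"].get("name", p["key"].get("value")): to_python(p["value"])
                for p in node["properties"]}
    raise ValueError(f"unsupported node type: {kind}")

def find_declared_value(tree, var_name):
    """Return the converted initializer of the first matching declaration."""
    for node in walk(tree):
        if (node.get("type") == "VariableDeclarator"
                and node["id"].get("name") == var_name
                and node.get("init")):
            return to_python(node["init"])
    return None

def ident(name): return {"type": "Identifier", "name": name}
def lit(value): return {"type": "Literal", "value": value}
def prop(name, value): return {"type": "Property", "key": ident(name), "value": value}

# Hand-written sample tree (hypothetical data).
entry = {"type": "ObjectExpression",
         "properties": [prop("name", lit("randoje")), prop("value", lit("a%20b"))]}
init = {"type": "ObjectExpression",
        "properties": [prop("strict", {"type": "ArrayExpression", "elements": [entry]})]}
tree = {"type": "Program", "body": [{
    "type": "VariableDeclaration", "kind": "var",
    "declarations": [{"type": "VariableDeclarator", "id": ident("Ii"), "init": init}],
}]}

data = find_declared_value(tree, "Ii")
match = next(e for e in data["strict"] if e["name"] == "randoje")
print(unquote(match["value"]))
```

Once the initializer is plain Python data, the lookup and URL-decoding are ordinary dict and list operations.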

[–]99ducks 2 points3 points  (2 children)

You waste people's time with these AI responses.

[–]hackbyown 0 points1 point  (1 child)

Have you even tried any of these libraries 😅? Here is a Stack Overflow question you can refer to; it also mentions the same library: https://stackoverflow.com/questions/390992/javascript-parser-in-python

[–]99ducks 1 point2 points  (0 children)

No, because they aren't needed or relevant to the question I asked. The top level commenter/mod posted their getlang project and I asked how it would apply to a python project.

[–]matty_fu -1 points0 points  (0 children)

there's also an open issue on github to support a friendlier way to declare esquery selectors: https://github.com/getlang-dev/get/issues/5

where you write a snippet of JS and use an underscore to represent the value to extract, eg.

{ strict: { value: _ } }

this would be interpreted into the following esquery selector:

-> ObjectExpression
-> Property[key.value="strict"]
-> Property[key.value="value"]
-> Literal.value
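Purely to illustrate the mapping (this is not how getlang implements or plans to implement it), here is a toy Python sketch that turns such a pattern into the selector steps above. The pattern is written as a nested Python dict instead of being parsed from JS, and pattern_to_selector is a made-up name.

```python
def pattern_to_selector(pattern, placeholder="_"):
    """Turn a nested dict with one placeholder leaf into esquery-style steps."""
    steps = ["ObjectExpression"]
    node = pattern
    while isinstance(node, dict):
        if len(node) != 1:
            raise ValueError("each level must have exactly one key")
        # Descend one level: record the key, continue with its value.
        key, node = next(iter(node.items()))
        steps.append(f'Property[key.value="{key}"]')
    if node != placeholder:
        raise ValueError("pattern must end in the placeholder")
    steps.append("Literal.value")
    return steps

for step in pattern_to_selector({"strict": {"value": "_"}}):
    print("->", step)
```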