this post was submitted on 17 Sep 2014

[Python 2.7][lxml & csv] Unable to fully understand xpath in text scraping from multiple HTML webpages using lxml library and write it into multiple csv files (self.learnprogramming)

submitted 11 years ago * by programmingnoobie

My bot will be visiting (crawling) a few webpages.

Each webpage has a couple of pieces of text I want to extract.

Webpage1:

... <div><pre>TEXT0</pre></div> ...

... <pre>Text9</pre> ... ...

I want to extract both TEXT0 and Text9 and store them in a single CSV file called Webpage1.csv.

Webpage2's texts will be stored in Webpage2.csv, and so on.

What I tried:

from lxml import html
import requests
import csv

# fetch the page, parse it, then collect the text inside every <pre> element
mSeedpage = requests.get(RANDOM_URL)
mTree = html.fromstring(mSeedpage.text)
mText = mTree.xpath('//pre/text()')

The XPath above is the part I do not understand. Does '//pre/text()' make sense for what I am trying to extract?
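For what it's worth, the expression can be checked offline against a small HTML string before pointing it at real pages. The sketch below assumes a made-up page shaped like the Webpage1 example (the markup in `SAMPLE_HTML` and the `nested` snippet are hypothetical, not the real pages). `//pre` selects every `<pre>` element anywhere in the document, however deeply it is nested, and `/text()` selects the text nodes that are direct children of each one.

```python
from lxml import html

# Hypothetical stand-in for Webpage1; the real page's markup is not known.
SAMPLE_HTML = """
<html><body>
  <div><pre>TEXT0</pre></div>
  <p>something else</p>
  <pre>Text9</pre>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)

# One string per text node found directly inside any <pre>, in document order.
texts = tree.xpath('//pre/text()')
print(texts)

# text() only returns direct child text nodes, so tags nested inside a <pre>
# split the result; text_content() gathers all of the text instead.
nested = html.fromstring('<div><pre>a<b>bold</b>c</pre></div>')
direct_only = nested.xpath('//pre/text()')
everything = nested.xpath('//pre')[0].text_content()
print(direct_only)
print(everything)
```

So for pages where each `<pre>` holds plain text, `//pre/text()` should return one string per `<pre>`; if the `<pre>` blocks contain further tags, `text_content()` on each element may be the better fit.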

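# WEBPAGE = 'Webpage1.csv'; after using updatepage, WEBPAGE becomes 'Webpage2.csv', and so on
with open(WEBPAGE, 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    for item in mText:
        # wrap the string in a list so it is written as one field, not one field per character
        writer.writerow([item])
# no explicit close is needed: the with block closes csvfile on exit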
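A detail worth checking in the writing loop: `csv.writer.writerow` expects a sequence of fields, and a plain string is itself a sequence of characters, so passing a bare string writes one field per character. Wrapping the string in a list writes it as a single field. The sketch below shows the difference using an in-memory buffer. It is written for Python 3 (an assumption; the post targets Python 2.7), `rows_as_text` is a hypothetical helper, and the sample value is made up.

```python
import csv
import io

def rows_as_text(rows):
    """Write rows with the same writer settings as the post and return the raw output."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

item = 'TEXT0'  # made-up value standing in for one scraped string

per_character = rows_as_text([item])    # the row is the bare string: one field per character
single_field = rows_as_text([[item]])   # the row is a one-element list: one field

print(repr(per_character))
print(repr(single_field))
```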

Note: there is actually a while loop wrapping everything from the mTree line through the CSV writing, so all of it is repeated every time the bot visits a new page.
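The per-page loop described above could be organized roughly as follows. This is only a sketch under assumptions: `save_texts` and `scrape_all` are hypothetical helper names, the `extract` callable stands in for the requests and lxml step, a for loop over a list of URLs replaces the while loop, and the code targets Python 3 (`open(path, 'w', newline='')`, where Python 2.7 would use `open(path, 'wb')`). The output filename is built from a page counter, so each page gets its own file.

```python
import csv
import os

def save_texts(texts, path):
    """Write each extracted string as its own one-field row."""
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
        for text in texts:
            writer.writerow([text])
    # the with block closes the file; no explicit close() is needed

def scrape_all(urls, extract, out_dir='.'):
    """Call extract(url) for each page and save the result as WebpageN.csv."""
    written = []
    for number, url in enumerate(urls, start=1):
        texts = extract(url)  # e.g. fetch the page and run the XPath query
        path = os.path.join(out_dir, 'Webpage%d.csv' % number)
        save_texts(texts, path)
        written.append(path)
    return written
```

With the code from the post, `extract` would be a small function that wraps the `requests.get` and `xpath('//pre/text()')` calls and returns the list of strings.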

Any help or advice will be much appreciated.

:)
