
XML parsing and Python: the lxml way

Today I was tinkering with an idea that involved parsing an HTML file by treating it as an XML structure and building a simple XQuery engine in Python.

I know... most of my "world-changing ideas" end up in my unfinished/fragile GitHub projects category.
But knowing how much it would help me in the long run, I decided to give it a shot anyway.

And this is when I discovered lxml and immediately fell in love with it.
But rather than going into the intricacies, I'm gonna share how simple it was for me to parse an XML file with it.

First, we'll make sure we have everything in place.
Things we need:
  • lxml
Yeah... that's kinda the only thing you'll need :) (and of course Python!).
If you are using Ubuntu or Xubuntu (like me), just go to the Synaptic package manager and install it. Or get it from the website if you are gonna tinker on Windows.

Now, let's assume that we want to parse an XML file named 'file.xml' with the following content:
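
To have something concrete to play with, here is a small sketch that writes a stand-in 'file.xml'. Only the tag names `element` and `element_id` are taken from the code in this post; the root tag `elements`, the id values and the helper name `write_sample` are my own assumptions for illustration.

```python
# Hypothetical sample data: only the tag names `element` and `element_id`
# come from the post; the root tag and the id values are made up.
SAMPLE_XML = """<elements>
  <element>
    <element_id>101</element_id>
  </element>
  <element>
    <element_id>102</element_id>
  </element>
</elements>
"""


def write_sample(path="file.xml"):
    """Write the hypothetical sample document to `path` and return the path."""
    with open(path, "w") as handle:
        handle.write(SAMPLE_XML)
    return path


if __name__ == "__main__":
    write_sample()
```

Running it once drops a 'file.xml' next to the script, which is all the parsing code needs.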



Now I want to get all the element_id values from the file. The following Python code does the job:


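from lxml import etree

# Parse the file into an ElementTree
doc = etree.parse('file.xml')
# <element> nodes that are direct children of the root
element_list = doc.findall('element')

for element in element_list:
    # Text of the <element_id> child (None if it is missing)
    element_id = element.findtext('element_id')
    print(element_id)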
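
If you would rather keep the ids than print them, the same two calls fit in a list comprehension. This is just a sketch: `collect_ids` and the inline sample string are hypothetical, and the `try/except` import falls back to the standard library's ElementTree (which has the same `findall`/`findtext` calls) in case lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this post
except ImportError:
    # Assumption: fall back to the stdlib module, which shares this API
    import xml.etree.ElementTree as etree

# Hypothetical inline sample; the second <element> has no <element_id>
SAMPLE = (
    "<elements>"
    "<element><element_id>101</element_id></element>"
    "<element/>"
    "</elements>"
)


def collect_ids(root):
    """Return the element_id text of every direct <element> child of root.

    A missing <element_id> yields '' instead of None thanks to `default`.
    """
    return [el.findtext('element_id', default='') for el in root.findall('element')]


root = etree.fromstring(SAMPLE)
ids = collect_ids(root)
print(ids)
```

Passing `default=''` is a design choice: without it, `findtext` returns None for elements that lack an `element_id` child.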


Another way would be  


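from lxml import etree

doc = etree.parse('file.xml')

# iter() walks the whole tree, so this finds <element> at any depth
# (getiterator() is the older, deprecated name for it)
for element in doc.iter('element'):
    element_id = element.findtext('element_id')
    print(element_id)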
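
One thing worth knowing: the two versions above are not quite equivalent. `findall('element')` only looks at direct children of the root, while iterating over the tree visits every level. A small sketch with a made-up nested document (the `group` wrapper and the ids are assumptions of mine) shows the difference; the import falls back to the standard library's ElementTree if lxml is missing.

```python
try:
    from lxml import etree  # the library used in this post
except ImportError:
    # Assumption: fall back to the stdlib module, which shares this API
    import xml.etree.ElementTree as etree

# Hypothetical document: one <element> at the top level, one nested deeper
NESTED = """
<elements>
  <element><element_id>1</element_id></element>
  <group>
    <element><element_id>2</element_id></element>
  </group>
</elements>
""".strip()

root = etree.fromstring(NESTED)

# Only <element> nodes directly under the root
direct = [e.findtext('element_id') for e in root.findall('element')]

# Every <element> in the tree, in document order
anywhere = [e.findtext('element_id') for e in root.iter('element')]

print(direct)    # ['1']
print(anywhere)  # ['1', '2']
```

So for a flat file either approach works, but for nested data the iterating version is the one that catches everything.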
