
XML parsing and Python: the lxml way

Today I was tinkering with an idea that involved parsing an HTML file as an XML structure and building a simple XQuery engine in Python.

I know... most of my "world-changing ideas" finally end up in the unfinished/fragile category of my GitHub projects.
But knowing how much it would help me in the long run, I decided to give it a shot anyway.

And this is when I discovered lxml and immediately fell in love with it.
But rather than going into the intricacies, I'm gonna share how simple it was for me to parse an XML file with it.

First, we'll make sure we have everything in place.
Things we need
  • lxml
Yeah... that's kinda the only thing you'll need :) (and of course Python!!!).
If you are using Ubuntu or Xubuntu (like me), just go to the Synaptic Package Manager and install it. Or get it from the website if you are gonna tinker on Windows.

Now, let's assume that we want to parse an XML file named 'file.xml' with the following content:



Now I want to get every element_id value from the file. The following Python code serves the purpose:


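from lxml import etree

# Parse the file into an element tree
doc = etree.parse('file.xml')
# Find every <element> that sits directly under the root
element_list = doc.findall('element')

for element in element_list:
    # Read the text of the <element_id> child
    element_id = element.findtext('element_id')
    print(element_id)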
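To try the same idea without a file on disk, here is a minimal sketch that parses an XML string instead. The sample XML, the `<root>` and `<name>` tags, and the `collect_ids` helper are all made up for illustration (the real contents of 'file.xml' are not shown here), and the fallback to the standard library's ElementTree is only there so the snippet still runs if lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this post
except ImportError:
    # Assumption: the standard library offers the same calls used below
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for file.xml; the real file's layout may differ
SAMPLE_XML = """<root>
  <element><element_id>101</element_id><name>first</name></element>
  <element><element_id>102</element_id><name>second</name></element>
</root>"""


def collect_ids(xml_text):
    """Return the element_id text of every <element> directly under the root."""
    root = etree.fromstring(xml_text)
    return [el.findtext('element_id') for el in root.findall('element')]


element_ids = collect_ids(SAMPLE_XML)
print(element_ids)
```

Collecting the ids into a list, rather than printing them inside the loop, makes them easier to reuse later.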


Another way would be:


from lxml import etree

doc = etree.parse('file.xml')

# iter() replaces the deprecated getiterator() and walks every <element> in the tree
for element in doc.iter('element'):
    element_id = element.findtext('element_id')
    print(element_id)
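The two approaches are not always interchangeable: findall('element') only looks at direct children, while iterating over the tree reaches matching elements at any depth. Here is a small sketch of the difference, using a made-up nested document (the `<root>` and `<group>` tags are assumptions for illustration) and the same optional fallback to the standard library in case lxml is missing.

```python
try:
    from lxml import etree  # the library used in this post
except ImportError:
    # Assumption: the standard library offers the same calls used below
    import xml.etree.ElementTree as etree

# Hypothetical document with one <element> nested a level deeper
NESTED_XML = """<root>
  <element><element_id>1</element_id></element>
  <group>
    <element><element_id>2</element_id></element>
  </group>
</root>"""

root = etree.fromstring(NESTED_XML)

# findall() with a plain tag name matches direct children only
direct_ids = [el.findtext('element_id') for el in root.findall('element')]

# iter() visits matching elements anywhere below the root, in document order
all_ids = [el.findtext('element_id') for el in root.iter('element')]

print(direct_ids)  # ['1']
print(all_ids)     # ['1', '2']
```

So if the elements you care about can be nested, the iterating version is the safer pick.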
