Notes from Pycon India 2014

Python India Conference Image credits: @Mukesh

I attended Pycon India during last couple of days. As always, it felt great to meet so many smart programmers creating amazing things. These are my notes from various workshops, talks and interactions [^Note].

In an interaction with @Gaurav, he mentioned about Behave library - “it is for behaviour-driven development. It is so intuitive that even a non techy person can write specs for your code.” The concept sounds interesting for writing browser tests with Selenium.

@Shashank gave a workshop on “scraping even the hardest websites.” Though I have used BeautifulSoup [^Soup] before, using Selenium with PhantomJS was a new learning. Another interesting note from the talk was using DeathByCaptcha API for solving captchas on websites.

So now my workflow for scraping websites is something like this:

Traditional GET requests

<dt>Forms / POST requests</dt>
<dd><a href="">RoboBrowser</a>
(to go around CSRF and ASP websites).</dd>

<dt>Modern single-page-websites</dt>
<dd>It probably has REST APIs - use them.</dd>

<dt>For other Javascript heavy websites</dt>
<dd>Selenium + PhantomJS.</dd>

In another workshop, @Anand not only explained us how decorators work, but also made us do lots of exercises. One can go through the notebook here. By the end of the workshop we learned to write our own trace, memorize / cache and routing decorators.

Apart from decorators, I also learned about various magic commands in IPython Notebook:

 # Python Code
 # below it will save the code
 # in the given filename

!python # using ! we can run bash-commands

%magic # To see all the available magic commands in IPython # (found in an interaction with Eswar Vandanapu)

@Kushal gave a keynote on the importance of contributing upstream. By going through the source codes of our favorite programs, we can not only become a better programmer but also get an opportunity to contribute to them. I pledge to have a closer look at the source codes of two of my favorite Python packages - Rhythmbox and Django.

Guys from IIT Bhu talked about using SimpleCV to play flappy-birds using gestures. The library looks quite powerful to do some interesting stuff with cameras and images. I have a few toy-ideas in my mind now ^SimpleCV Ideas - will see how they go.

@Kiran shared the problems he faced while creating LastUser service. I learned about database-backed sessions and about how they differ from cookie based sessions. Kiran also emphasized on using HTTPS by showing how easy it was to hijack sessions using FireSheep on an unsecured network.

Django uses database-backed sessions by default. While digging into the related documentation, I found the solution to some of my problems [^sessions].

Second keynote speaker, @Michael Foord, emphasized on moving the projects from platform-as-a-service (such as Heroku other such hostings) to infrastructure-as-a-service (such as Amazon AWS). A hosting service should be considered as livestock or sheep instead of as pet. It should be easily migratable and redeployable whenever needed. Docker and JuJu are steps towards that.

@S Anand presented a talk on “faster data processing in python.” He began with basic pieces of codes and then optimized them incrementally - explaining the measurement techniques and conceptual approach towards optimization. The complete talk can be viewed online, while the notebook is available here.

Few important libraries mentioned in the talk were:

Shows the number of hits and time spent on each line of code.

<dt>Pandas and Numpy</dt>
<dd>For performing vector operations.</dd>

<dt><a href="">Numba</a> @jit decorator</dt>
<dd>Converts a Python function to efficient machine instruction.</dd>

During the lightning talks, @Vineet showcased his library Pipdeptree, which displays all the pip dependencies in a tree format. It will surely come quite handy for generating sane requirements.txt.

@Ankur showcased his awesome new project - a customized Python weekly newsletter. Apart from that, the website lists all the Python books (free as well as paid) categorized by reader’s levels. In future he plans to bring everything else, including videos, presentations etc. related to Python at one place.

Together, this is a long list of new libraries and concepts to try-out and practice. It should keep me busy well until the next PyCon.

[^Note]: Note: This is not a post about everything interesting that happened at the event. There were many other interesting talks and workshops. These are only the things I noted down during the event to implement them in my projects at sometime.

[^Soup]: I prefer to use lxml with PyQuery for parsing and selecting the elements.

  1. Write a script to take images each time I open the lid of laptop.
  2. Write a script to automatically filter the best, find duplicates, and fix images from a large album.

[^sessions]: In a database-backed session, a session row is created in the database on each user login. The encrypted id of this row is stored in the user’s cookie. On each is_authenticated() call, this session row is fetched. Thus this can be used to store session data of any size without incurring any additional overhead - example use cases: storing user’s votes, alerts, carts, trackers etc to show “already voted or item present in cart” flags.

Pratyush Mittal @pratyush