Pratyush - A <GeeK>!!!

I attended PyCon India over the last couple of days. As always, it felt great to meet so many smart programmers creating amazing things. These are my notes from various workshops, talks and interactions 1.

While we were chatting, @Gaurav mentioned the Behave library - “it is for behaviour-driven development. It is so intuitive that even a non techy person can write specs for your code.” The concept sounds interesting for writing browser tests with Selenium.

@Shashank gave a workshop on “scraping even the hardest websites.” Though I have used BeautifulSoup 2 before, using Selenium with PhantomJS was new to me. Another interesting takeaway from the workshop was using the DeathByCaptcha API to solve captchas on websites.

So now my workflow for scraping websites is something like this:

Traditional GET requests: Requests.
Forms / POST requests: RoboBrowser (to go around CSRF and ASP websites).
Modern single-page websites: they probably have REST APIs - use them.
Other JavaScript-heavy websites: Selenium + PhantomJS.

In another workshop, @Anand not only explained how decorators work, but also made us do lots of exercises. One can go through the notebook here. By the end of the workshop we had learned to write our own trace, memoize / cache and routing decorators.
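A minimal sketch of what such decorators can look like. This is not the workshop's code; the names trace, memoize, route, ROUTES, fib and hello are illustrative assumptions.

```python
import functools

ROUTES = {}  # hypothetical registry: URL path -> handler function


def trace(func):
    """Print every call to func together with its return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"-> {func.__name__}(args={args}, kwargs={kwargs})")
        result = func(*args, **kwargs)
        print(f"<- {func.__name__} returned {result!r}")
        return result
    return wrapper


def memoize(func):
    """Cache results, keyed by the (hashable) positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed only so the cache can be inspected
    return wrapper


def route(path):
    """Register the decorated function as the handler for path."""
    def register(func):
        ROUTES[path] = func
        return func
    return register


@memoize
def fib(n):
    # Without memoize this recursion takes exponential time.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@route("/hello")
def hello():
    return "hello"
```

Calling fib(30) fills the cache with one entry per argument, and ROUTES["/hello"]() dispatches to hello. The standard library's functools.lru_cache covers the memoize case in real projects.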

Apart from decorators, I also learned about various magic commands in IPython Notebook:

%%file filename.py
# Python Code
# below it will save the code
# in the given filename

!python filename.py
# using ! we can run bash-commands

%magic
# To see all the available magic commands in IPython
# (found in an interaction with Eswar Vandanapu)

Guys from IIT BHU talked about using SimpleCV to play Flappy Bird using gestures. The library looks powerful enough to do some interesting things with cameras and images. I have a few toy ideas in mind now 3 - will see how they go.

@Kiran shared the problems he faced while creating the LastUser service. I learned about database-backed sessions and how they differ from cookie-based sessions. Kiran also emphasized using HTTPS by showing how easy it is to hijack sessions with FireSheep on an unsecured network.

Django uses database-backed sessions by default. While digging into the related documentation, I found the solution to some of my problems 4.

The second keynote speaker, @Michael Foord, emphasized moving projects from platform-as-a-service (such as Heroku and similar hosts) to infrastructure-as-a-service (such as Amazon AWS). Servers should be treated as livestock (sheep) rather than as pets: they should be easy to migrate and redeploy whenever needed. Docker and Juju are steps in that direction.

@S Anand presented a talk on “faster data processing in python.” He began with basic pieces of code and then optimized them incrementally, explaining the measurement techniques and the conceptual approach to optimization. The complete talk can be viewed online, while the notebook is available here.

A few important libraries mentioned in the talk were:

line_profiler: shows the number of hits and the time spent on each line of code.
Pandas and Numpy: for performing vector operations.
Numba @jit decorator: compiles a Python function to efficient machine code.
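The "measure first, then optimize step by step" approach can be sketched with the standard library alone. This example swaps in timeit for line_profiler and leaves out Numpy and Numba to stay self-contained; the function names and the sum-of-squares task are assumptions for illustration, not taken from the talk.

```python
import timeit


def sum_squares_loop(n):
    """Baseline: explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n):
    """Step 1: let the built-in sum() drive the iteration."""
    return sum(i * i for i in range(n))


def sum_squares_formula(n):
    """Step 2: change the algorithm, 0^2 + ... + (n-1)^2 in closed form."""
    return (n - 1) * n * (2 * n - 1) // 6


def time_it(func, n=10_000, repeat=20):
    """Total seconds taken by `repeat` calls of func(n)."""
    return timeit.timeit(lambda: func(n), number=repeat)


def compare(n=10_000):
    """Print one timing per variant so each step can be judged by numbers."""
    for func in (sum_squares_loop, sum_squares_builtin, sum_squares_formula):
        print(f"{func.__name__:22s} {time_it(func, n):.6f}s")
```

Running compare() prints the timings side by side. The actual numbers depend on the machine, so none are shown here.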

During the lightning talks, @Vineet showcased his library Pipdeptree, which displays all the pip dependencies in a tree format. It will surely come in quite handy for generating a sane requirements.txt.

@Ankur showcased his awesome new project ImportPython.com - a customized Python weekly newsletter. Apart from that, the website lists all the Python books (free as well as paid) categorized by reader’s level. In future he plans to bring everything else related to Python, including videos and presentations, into one place.

Together, this is a long list of new libraries and concepts to try out and practice. It should keep me busy well until the next PyCon.


  1. Note: This is not a post about everything interesting that happened at the event. There were many other interesting talks and workshops. These are only the things I noted down during the event to implement in my projects at some point. 

  2. I prefer to use lxml with PyQuery for parsing and selecting the elements. 

  3. Toy-ideas: 1. Write a script to take images each time I open the lid of laptop. 2. Write a script to automatically filter the best, find duplicates, and fix images from a large album. 

  4. In a database-backed session, a session row is created in the database when the user logs in. The id of this row (a random session key, not the data itself) is stored in the user’s cookie. Whenever the session is accessed, for example on an is_authenticated() check, this row is fetched. Thus it can store session data of any size without growing the cookie - example use cases: storing a user’s votes, alerts, carts, trackers etc. to show “already voted” or “item present in cart” flags. 
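A rough sketch of the database-backed session idea using sqlite3 from the standard library. This is not Django's implementation; the SessionStore class, the table layout and the JSON encoding are assumptions made for illustration.

```python
import json
import secrets
import sqlite3


class SessionStore:
    """Keeps session data in a table; the cookie would hold only the key."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(session_key TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def create(self, data):
        """Insert a new session row and return the key to put in the cookie."""
        key = secrets.token_urlsafe(32)  # random and unguessable
        self.conn.execute(
            "INSERT INTO sessions (session_key, data) VALUES (?, ?)",
            (key, json.dumps(data)),
        )
        self.conn.commit()
        return key

    def load(self, key):
        """Fetch the session data for a cookie value, or None if unknown."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE session_key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def save(self, key, data):
        """Overwrite the stored data; the cookie itself never changes."""
        self.conn.execute(
            "UPDATE sessions SET data = ? WHERE session_key = ?",
            (json.dumps(data), key),
        )
        self.conn.commit()

    def delete(self, key):
        """Drop the row, e.g. on logout."""
        self.conn.execute("DELETE FROM sessions WHERE session_key = ?", (key,))
        self.conn.commit()
```

With store = SessionStore(sqlite3.connect(":memory:")), store.create({"user_id": 1}) returns the key that would go into the cookie, and store.load(key) brings the data back on later requests. A cookie-based session would instead put the (signed) data itself in the cookie.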

These are a few of the bash commands I often find useful:


z: Z allows jumping (cd) into frecently (i.e. frequently and recently) used directories. Instead of doing cd /home/pratyush/websites/project_name each time, I can now simply do z project_name. Z supports tab completion too. Link to Z library.


ps -u <username> -o pid,rss,command | awk '{print $0}{sum+=$2} END {print "Total", sum/1024, "MB"}': When accessing remote servers over a shell, I often need to check which scripts are running and how much memory they consume. This command does exactly that: it lists each process with its resident memory (RSS, in kilobytes) and prints the total in MB. It is a sort of minimal task-manager. I found this command here.
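For use inside a script, the same summation can be sketched in Python. The function names summarize_ps and ps_output_for are hypothetical, and the parsing assumes the pid,rss,command column order used above.

```python
import subprocess


def ps_output_for(username):
    """Run the same ps command as above and return its text output."""
    result = subprocess.run(
        ["ps", "-u", username, "-o", "pid,rss,command"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_ps(ps_output):
    """Return (rows, total_mb), where rows are (pid, rss_kb, command)."""
    rows = []
    total_kb = 0
    for line in ps_output.splitlines()[1:]:  # first line is the header
        parts = line.split(None, 2)  # the command may itself contain spaces
        if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skip anything that is not a process row
        command = parts[2] if len(parts) > 2 else ""
        rows.append((int(parts[0]), int(parts[1]), command))
        total_kb += int(parts[1])
    return rows, total_kb / 1024
```

summarize_ps(ps_output_for("pratyush")) would then give the process list and the total in MB for that user.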


php -S localhost:8888 -t .: Though I am not a PHP fan (anymore), this command serves the current directory using PHP's built-in web server. This comes in handy for trying out a local WordPress installation by just extracting the package.

python -m SimpleHTTPServer 8080 is a Python alternative for serving the files in the current directory. (On Python 3 the equivalent is python3 -m http.server 8080.)


du file/path -chs: du stands for disk usage. This command shows the total size of any directory. -c prints a grand total, -h shows human-readable sizes and -s shows only a summary for each argument. I often use this command on web servers to find the size of file-system caches.
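When the size is needed inside a Python script, a rough equivalent can be sketched as below. The function names directory_size and human_readable are hypothetical. Note that this sums apparent file sizes, while du counts disk blocks used, so the two numbers can differ.

```python
import os


def directory_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):  # do not follow symlinks
                total += os.path.getsize(full_path)
    return total


def human_readable(num_bytes):
    """Format a byte count in the style of du -h, e.g. 1536 -> '1.5K'."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024
```

For example, human_readable(directory_size("file/path")) gives a single summary figure, similar to what -s and -h produce together.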


ssh-copy-id <username>@<remote-host>: This appends the public key to the remote-host for password-less ssh logins.


mysqldump -u username -p --all-databases > alldbs.sql: For creating a backup of all the mysql databases.

mysql -u username -p < alldbs.sql: For restoring all the databases from the dump.

Both of these mysql commands perform well. They come in handy for creating backup snapshots of databases.

I found these two commands here.


ssh -D 31500 <username>@<remote-host>: This turns the SSH client into a local SOCKS proxy server that tunnels traffic through the remote host. It gives me something like a VPN on the fly. So if a website refuses to open, or is restricted to a particular country, I run this command and then update the proxy settings in Firefox as below:

1. Enable proxy in Firefox.
2. Enter "127.0.0.1" for "SOCKS Host".
3. Enter "31500" (or whatever port we chose) for Port.

Full documentation for this trick is available here.


howdoi: HowDoI provides answers to programming questions from the command line. Thus instead of opening a browser and getting distracted on the web, I can now simply type something like howdoi convert csv to namedtuple to get the leads.

I usually pass the -ac arguments: -a provides the full text of the answer, -c enables colorized output.


ab -n 100 -c 10 http://www.example.com/: ab (ApacheBench) is a poor man’s website performance benchmarking tool. -c specifies the number of concurrent requests and -n the total number of requests to send to a page. Note that ab expects the URL to include a path, hence the trailing slash. This comes in handy while migrating websites to new servers or when making significant frontend changes.
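What -n and -c mean can be sketched in a few lines of Python. This is only an illustration of the idea, not a replacement for ab; run_benchmark and its parameters are hypothetical names, and request_fn stands for any callable that performs one request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(request_fn, total, concurrency):
    """Call request_fn `total` times (like -n), at most `concurrency` at once (like -c)."""
    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total)))
    elapsed = time.perf_counter() - started

    return {
        "requests": len(latencies),
        "elapsed_s": elapsed,
        "mean_latency_s": sum(latencies) / len(latencies),
        "requests_per_s": len(latencies) / elapsed if elapsed else float("inf"),
    }
```

A request_fn such as lambda: urllib.request.urlopen("http://www.example.com/").read() (with urllib.request imported) would approximate the ab call above with total=100 and concurrency=10.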


wget "url" -c: wget downloads files from the command line. I often find download speeds significantly different between the browser’s built-in download manager and wget. -c resumes a partially downloaded file.


There are various other must-know bash commands such as awk, uniq, head which are super-useful in daily work. Akshay has covered them in a brief tutorial here.

These are the websites I use the most based on my browser’s frecency:

This post is inspired by S Anand’s blogpost.

  • A is for amazon.in. Though I don’t shop too much, I usually use Amazon to browse through books and read their initial pages.
  • B is bseindia.com - mostly for checking current stock prices and for keeping a track of latest corporate announcements.
  • C shows one of my favorite tech blogs - codinghorror.com. Recently, I have been visiting it more frequently to check the blog’s integration with Discourse. It is followed by cricket.yahoo.com, which I use to keep track of the latest scores.
  • D is docs.google.com. I love to use the live sharing and commenting feature in Google Docs to collaborate on articles, letters and reports. For other writings, I prefer to use gVim.
  • E is english.stackexchange.com. I recently read Strunk and White’s The Elements of Style and have been using this StackExchange to follow up on a few of the rules mentioned in the book.
  • F is (obviously) facebook.com. The runner-up is feedly.com - an RSS reader to keep track of my favorite blogs. In future I would like to see Feedly climb above :).
  • G is github.com - I love open-source projects.
  • H is hootsuite.com - used to manage multiple Twitter accounts including @faltoo and @screener_in.
  • I is irctc.co.in - for booking all the train tickets. It is followed by imdb.com for checking all the movie recommendations.
  • J shows one of my favorite financial blogs janav.wordpress.com. It is followed by jpegmini.com, which I use to compress images in blog posts.
  • K is kickass.to - for movies and music.
  • L is localhost - a playground for all my development projects.
  • M is (again obviously) mail.google.com.
  • N is news.ycombinator.com - I use it to stay updated with the latest programming developments. Lately, I have been using hckrnews.com more for its better interface.
  • O is openshift.redhat.com - for deploying new test projects for free.
  • P is play.fully-faltoo.com - a custom domain used for testing different internal projects at different times. It is followed by pixlr.com/express/, used for quickly fixing the photographs.
  • Q is Quora. I am not a huge fan of Quora but it is the only one with Q.
  • R is Readiot.com - a side-project I have been working on. It is like Evernote but saves everything in your Dropbox.
  • S is for screener.in - the project that takes most of my time :). It is followed by stackoverflow.com - my favorite resource to get and provide help.
  • T is twitter.com - I like to keep track of latest updates through Twitter rather than watching news.
  • U is ubuntu.com, an old and the only item in U. Now I like to use Arch.
  • V is valuepickr.com, my favorite forum for in-depth discussions on stocks.
  • W is webfaction.com - my webhost for all the production deployments.
  • X is xiaomi.eu - an old one-time item from when I used the MIUI ROM on my Nexus S.
  • Y is youtube.com - I love to listen to music and watch my favorite channels online.
  • Z has no items.

I celebrated the new year on a vacation with a group of friends. One of the best pastimes was the group game of Werewolf.

Be it a bus journey, waiting at a stop or a late night hangout on a beach, we played Werewolf everywhere. It required only some chits of paper (though we used a playing deck as a better substitute).

It requires a minimum of 7 players and is best experienced with 10-12 players.

I would highly recommend it to anyone who has a large group, scope to shout, and time to go through the rulebook.

Dhoni's shot in world cup final 2011

Of all the sportsmen, I admire MS Dhoni and Brian Lara the most. Whether India wins or not, I try to listen to Dhoni's words and learn from his temperament. I found this wonderful interview of Dhoni where he talks about the role of the mind and gives advice on how to be a better player. The advice applies as much to other things as to cricket.

What advice would you give to a young player who wants to improve his game?

First and foremost, I would tell him that he must love and enjoy his sport. If he does not enjoy it he would not learn to play the game as quickly or as well as he should. Second, I would tell him to keep things simple. The more he complicates the process the harder it will be for him to improve his game. For example, when he tells himself to watch the ball and play it on its merits, he might have other thoughts like scoring runs or not getting out in his mind.

Those thoughts can break his concentration and prevent him from watching the ball. If he knows that the bowler can bowl an out-swinger, an in-swinger and a good bouncer as well, he has three other things to think about. But the more he thinks about what the bowler might do the more complex and difficult batting becomes.

Third, I would tell him to capitalise on his strengths, improve his weaknesses and recognise his limitations.

A lot of people talk about the problems players face when they have to play in conditions that are foreign to them. For instance, when Indian batsmen who are brought up on slow flat wickets have to play on the fast and bouncy wickets in Australia and South Africa.

When I go to Australia or South Africa I try to be positive and see the visit as a challenge and an opportunity to explore, learn and improve my game. I try not to be negative or worry about the pace and bounce of the wickets or the things that could possibly go wrong.

Learning and improvement take time. When you leave nursery school you don’t expect to go straight into a graduate school. In the following years you slowly improve as a student and when you reach a certain standard you graduate and afterwards go on to higher levels. The same thing happens in sport.

The player should therefore be patient and persistent and he should keep things simple and enjoy his sport. Not only should he enjoy his own performance on the field but he should also get pleasure from sharing his experiences with other players and from creating an atmosphere that helps the guy sitting next to him in the dressing room to perform better.

This is one area where the Indian team is very blessed. The senior players in our team have helped the younger players to learn, develop and perform better. Your individual performance is important but how much better you help your teammate to play better is equally important.

This was just one of the passages from the complete interview. I will try to go through the interview again at regular intervals.

I recently moved from a Dell Studio to Lenovo’s ThinkPad Edge E430 laptop. It is an all-Intel machine and driver support is quite good. Initially the touchpad felt oversensitive, as the cursor jumped a few pixels whenever I lifted my finger. Luckily, it is a common touchpad issue on Lenovo machines and easily fixable.

Create the following config file in /usr/share/X11/xorg.conf.d/50-thinkpad-touchpad.conf (create xorg.conf.d directory if it doesn’t exist):

Section "InputClass"
        Identifier "touchpad"
        MatchProduct "SynPS/2 Synaptics TouchPad"
        Driver "synaptics"
        # fix touchpad resolution
        Option "VertResolution" "100"
        Option "HorizResolution" "65"
        # increment noise cancellation factor
        Option "HorizHysteresis" "50"
        Option "VertHysteresis" "50"
EndSection

Update for ArchLinux: file path /etc/X11/xorg.conf.d/50-thinkpad-touchpad.conf

Section "InputClass"
        Identifier "touchpad"
        MatchProduct "SynPS/2 Synaptics TouchPad"
        Driver "synaptics"
        # fix touchpad resolution
        Option "VertResolution" "100"
        Option "HorizResolution" "65"
        # disable synaptics driver pointer acceleration
        Option "MinSpeed" "1"
        Option "MaxSpeed" "1"
        # tweak the X-server pointer acceleration
        Option "AccelerationProfile" "2"
        Option "AdaptiveDeceleration" "16"
        Option "ConstantDeceleration" "16"
        Option "VelocityScale" "128"
EndSection

Different Lenovo laptops may require different tweaks. The above works best for the Edge series. Other configs are available here.

This method keeps a local copy of pip packages and uses it for faster installation when starting new projects.

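First, download the archives that fulfill your requirements (pip 10 and later dropped the --download flag; there the equivalent is pip download -d <DIR> -r requirements.txt):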

$ pip install --download <DIR> -r requirements.txt

Then, install using --find-links and --no-index:

$ pip install --no-index --find-links=<DIR> -r requirements.txt
