Linux distros

July 22, 2008 – 10:35 pm

I found a cool schema that I hadn’t seen before on reddit today: the linux distro timeline.

We notice three main roots: Debian, the community distro; Slackware, the one-man distro; and Red Hat, the company-developed distro. I guess everyone knows them. Nothing to see here, move along.

What’s interesting ? The other roots that are still alive.

  • Smoothwall: a firewall
  • Engarde: a company-developed distro oriented toward internet services
  • Yoper: supposedly the “fastest out-of-the-box distribution”
  • Pardus: now an interesting one, a distro which rewrote many tools in Python, including a package manager and an init system. It is easy to use and KDE based. Also, it’s developed in Turkey. Great distro.
  • Puppy: a livecd that is so small that the entire operating system and all its applications can be loaded in 256 MB of RAM
  • DeLi: a desktop distribution for old PCs
  • Sorcerer: a source based distribution like Gentoo
  • Gentoo: THE source based distribution, used for its great source based package manager (Portage)
  • CRUX: a lightweight, i686-optimized distro targeted at experienced users, gave birth to Arch
  • Rock Linux: a flexible Linux distribution Build Kit
  • Linux from scratch: not really a distro, it’s a book about how to build your own linux distro
  • GoboLinux: let’s reorganize the filesystem, place all programs in one folder and keep things simple and logical. Nice ideas, but it lacks packages
  • dine:bolix: no idea and I’m getting tired of this
  • Ark: easy to use desktop distro

What about me? Well, I’m still on the Ubuntu that was installed on my Dell laptop. I’m planning to install Arch Linux (in fact I already did it in a virtual machine), which has a really great package manager that makes it really easy to make your own packages. I also like its minimalistic approach and the idea of not patching upstream more than necessary, whereas Debian, and therefore Ubuntu, love patches. But Dell puts everything in one partition, so I can’t install a new distro without backing up all my data and formatting my hard disk. So for the time being I’m staying on my nice and working Ubuntu.

What are your chances of hitting a fly with a tennis racquet?

July 18, 2008 – 1:09 am

Just one of the problems that I finished solving a few hours ago for Google Code Jam. It’s the first time I’ve tried a programming competition, and it’s a pretty interesting one. Problems are hard but solvable and you always get a few sample inputs and outputs, which is really, really helpful. Programming competitions often focus on speed of execution, and Python isn’t a good competitor there (except maybe as a scripting language around C), but I think Code Jam is a lot more focused on the algorithm: you’ve got a bad algorithm? You’ll be too slow anyway. You have a good one? Your program will be fast enough in Python or (probably even) Ruby.

I see problems in three categories:

  1. problems complicated to understand or to translate to a program but without algorithmic complexity (if your implementation works it’s usually fast enough) and without too much math, think string manipulation for example, or parsing a grammar
  2. the same but with the complexity based on math, often geometry problems requiring a good knowledge of trigonometry, yeah the hit a fly with a tennis racquet problem is one of them
  3. problems that you can easily make into an inefficient brute force program, but that get hard when trying to solve with an efficient algorithm

I don’t like the first category: you make a program that works with the few examples you have and then if you are lucky you’re finished pretty quickly. But because there are so many cases it will probably break in some situations, and you can’t know which ones, and you can’t really test your program cause you don’t know what the output should really be. To test your program you would need to reimplement it in another way and since you have already done the easier way … well it’s hard.

The second type is … harder when you don’t have (or have forgotten) the math skill … easier if you’re a math nerd. After having solved the math problem implementing the program is usually easy enough.

The third type is fun. You can start by quickly implementing a brute force method that will obviously be too slow for large inputs, and then use it to check the results of your optimized method on small inputs whenever you suspect the optimized method doesn’t always give the right result. I like this category of problems; I like exploring the space of solutions in a smart way. Here is a small class I made to write exploring code quickly, but which is often not efficient enough …

class Explorer(object):
    """The explorer class to explore a finite tree of possibilities.
    The basic usage is
    e = Explorer()
    while e.next():
        person = e.choose(['Linus', 'Theo'])
        if person == 'Linus':
            object = e.choose(['the linux guru.', 'a stupid dickhead.'])
        elif person == 'Theo':
            object = e.choose(['the openbsd guru.', 'a masturbating monkey.'])
        print(person + ' is ' + object)
    Which should display:
    Linus is the linux guru.
    Linus is a stupid dickhead.
    Theo is the openbsd guru.
    Theo is a masturbating monkey.
    """
    def __init__(self):
        """Init isn't enough you need to call next after initialising.
        """
        self.current_branch = None
    def next(self):
        """Start a new branch. Return False if it is the end. True if it is not.
        """
        if self.current_branch is not None:
            branch = self._next_branch(self.current_branch)
            if branch is None:
                return False
        else:
            branch = []
        self.infinite_branch = self._infinite_branch(branch)
        self.current_branch = []
        return True
    def choose(self, options):
        """Choose an element in a list.
        """
        # next() works on generators in both Python 2.6+ and Python 3
        choice = next(self.infinite_branch)
        self.current_branch.append((choice, len(options) - 1))
        return options[choice]
    def choose_or_not(self, options):
        """Choose an element in a list, or return None.
        """
        choice = next(self.infinite_branch)
        # choice 0 means "nothing"; choices 1..len(options) map to the elements
        self.current_branch.append((choice, len(options)))
        if choice == 0:
            return None
        return options[choice - 1]
    def _next_increment(self, branch):
        for i, (choice, maximum) in enumerate(reversed(branch)):
            if choice != maximum:
                return len(branch) - i - 1
        return None
    def _next_branch(self, branch):
        position = self._next_increment(branch)
        if position is None:
            return None
        result = branch[:]
        result[position] = result[position][0] + 1, result[position][1]
        for i in range(position+1, len(result)):
            result[i] = (0, result[i][1])
        return result
    def _infinite_branch(self, branch):
        for choice, maximum in branch:
            yield choice
        while True:
            yield 0
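The brute-force-then-optimize workflow described for the third category can be sketched on a toy problem. This is only an illustration, not one of the Code Jam problems: the task (largest sum of a contiguous slice), the function names and the random test sizes are all assumptions chosen for the example.

```python
import random

def max_subarray_brute(values):
    """Try every non-empty contiguous slice: O(n^2), slow but easy to trust."""
    best = values[0]
    for start in range(len(values)):
        total = 0
        for end in range(start, len(values)):
            total += values[end]
            best = max(best, total)
    return best

def max_subarray_fast(values):
    """Kadane's algorithm: one pass, O(n)."""
    best = current = values[0]
    for value in values[1:]:
        # either extend the running slice or start a new one at this value
        current = max(value, current + value)
        best = max(best, current)
    return best

def cross_check(trials=200, seed=0):
    """Compare both versions on small random inputs.

    Returns the first input on which they disagree, or None if they always agree.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        if max_subarray_brute(values) != max_subarray_fast(values):
            return values  # a failing input to debug with
    return None
```

If cross_check() returns a list, that list is a small failing case you can step through by hand, which is much nicer than staring at a wrong answer on the large input.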

Oh yeah, I don’t actually answer the question in the title … I don’t really feel like explaining the whole problem. I’ll just say that I solved it thanks to Sage. It’s a great replacement for a TI-89, or MATLAB, or Maple, or Mathematica. It’s all of that and more, and in Python. The language is actually a very slightly modified Python. Here is an example that was helpful in solving the tennis racquet problem:

sage: var('x')
x
sage: integral(cos(asin(x)))
arcsin(x)/2 + x*sqrt(1 - x^2)/2
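If you don’t have Sage at hand, the result above can be sanity-checked in plain Python. This is just a rough numerical sketch: the helper names, the interval [0, 0.5] and the number of steps are my own choices, and it relies on the fact that cos(asin(x)) is the same thing as sqrt(1 - x^2).

```python
import math

def integrand(x):
    """The function given to Sage; equal to sqrt(1 - x^2) on [-1, 1]."""
    return math.cos(math.asin(x))

def antiderivative(x):
    """The result Sage printed: arcsin(x)/2 + x*sqrt(1 - x^2)/2."""
    return math.asin(x) / 2 + x * math.sqrt(1 - x * x) / 2

def midpoint_integral(f, a, b, steps=10000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Compare the closed form against the numerical estimate on [0, 0.5].
exact = antiderivative(0.5) - antiderivative(0.0)
approx = midpoint_integral(integrand, 0.0, 0.5)
difference = abs(exact - approx)
```

The two values should agree to many decimal places. Evaluating the antiderivative between 0 and 1 gives pi/4, the area of a quarter of the unit disc, which is a good hint of why this integral shows up in a geometry problem.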

Good news everyone ! Futurama: the beast with a billion backs is out

July 13, 2008 – 1:19 am

Futurama: The Beast with a Billion Backs is out on DVD in the US and Canada, and illegally on the internet for everyone else, may it lead to the end of civilisation. This is a must-watch. Futurama is as witty and fun as ever. You will have the honor of meeting Fry’s new girlfriends (well, one isn’t really a girl, but I digress), the head of Stephen Hawking with the real synthesised voice of Stephen Hawking, and a great way to settle scientific disputes: Deathball.

Bad news everyone. Five more months to wait before the next futurama movie: Bender’s Game.

Smart indentation for python in gedit

July 10, 2008 – 9:18 pm

A few days ago I developed a plugin for gedit that provides smart indentation for python code.

The code is indented when the previous line ends with ‘:’ and un-indented if the previous line starts with ‘return’, ‘pass’, ‘continue’ or ‘break’. This plugin will use your tab configuration for indentation. To respect PEP8 you should set tab width to 4 and choose to insert spaces instead of tabs.
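The rule described above fits in a few lines of Python. This is not the plugin’s actual code, just a minimal sketch of the same idea: the function name next_indent, the unit parameter and the regular expression are assumptions made for the example.

```python
import re

# Keywords after which the next line goes back one level.
DEDENT_RE = re.compile(r'(return|pass|continue|break)\b')

def next_indent(previous_line, unit='    '):
    """Return the indentation to use on the line after previous_line."""
    line = previous_line.rstrip('\n')
    stripped = line.strip()
    # leading whitespace of the previous line
    current = line[:len(line) - len(line.lstrip())]
    if stripped.endswith(':'):
        return current + unit
    if DEDENT_RE.match(stripped):
        # remove one level, if there is one to remove
        if unit and current.endswith(unit):
            return current[:-len(unit)]
        return current
    return current
```

The \b in the regular expression is there so that a line like “password = 1” is not mistaken for a “pass” statement. The unit argument plays the role of your tab configuration: four spaces if you follow PEP8.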

You can download it on the gedit plugins page or with this direct link.

To install it, go to the folder ~/.gnome2/gedit/plugins/ (create it if it doesn’t exist) and uncompress the tgz there. Then in gedit:

Edit > Preferences > Plugins > Python Indentation

That’s it.

It seems like gedit is starting to get some cool plugins for code editing. For example, snippets, which allow quick insertion of complicated code templates as well as moving between the various “fields” of the template with tab.

And someone even provided a plugin for python code completion ! Even if there is still some work to do on this plugin this is a great start.

Anyway, gedit is starting to get good for editing Python. Maybe one day I won’t have to use Eclipse at work anymore. I wish. Eclipse really has lots of functionality, but it’s too big and heavy, and if it breaks (your project files getting corrupted, for example) you’re dead, because you had everything in it. I prefer several small programs to one big one. And I’m not much into vi or emacs either … that’s why gedit is my favorite for code editing. If I used KDE it would probably be kwrite, which seems nice enough too.

Hope for the best, prepare for slashdot

July 8, 2008 – 12:31 am

You have a brand new wordpress blog. How fast is it ? It’s supposed to be fast, isn’t it ? Lighttpd is dead fast, php is fast, mysql is fast.

So let’s try it with ApacheBench:

ab -n 1000 http://libreamoi.com/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking libreamoi.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests
Server Software:        lighttpd/1.4.19
Server Hostname:        libreamoi.com
Server Port:            80
Document Path:          /
Document Length:        20504 bytes
Concurrency Level:      1
Time taken for tests:   295.536900 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      20725000 bytes
HTML transferred:       20504000 bytes
Requests per second:    3.38 [#/sec] (mean)
Time per request:       295.537 [ms] (mean)
Time per request:       295.537 [ms] (mean, across all concurrent requests)
Transfer rate:          68.48 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   290  294  19.9    293     741
Waiting:      222  225  15.0    224     536
Total:        290  294  19.9    293     741
Percentage of the requests served within a certain time (ms)
  50%    293
  66%    294
  75%    294
  80%    294
  90%    295
  95%    296
  98%    298
  99%    323
 100%    741 (longest request)

Ouch, only 3.38 requests per second. I hoped for more! That’s definitely not slashdot ready!

By the way, my server is a Celeron at 2.66 GHz with 1 GB of RAM, so it’s nothing great but not that bad either.

So what if I get slashdotted and get a thousand hits per second? Well, the first 100 visitors will see my article all right. And then for about a day my site will be unusable, while I could have won millions of dollars with AdSense. That’s not what you want, is it?

That’s why you need Varnish. Varnish is a cool reverse proxy that will make your blog real fast. How fast? Well, we’ll see after installing it. Remember I have an Ubuntu server, which means apt-get for installing stuff.

$ apt-get install varnish

Run that as root and it will install varnish and start it. Is it finished yet? No. Lighttpd is on port 80, so varnish can’t have taken it. In fact it didn’t even try, because by default varnish runs on port 6081.

You can change that by editing /etc/default/varnish and putting

VARNISH_LISTEN_PORT=80

Now you need to edit the lighttpd configuration so that it runs on another port. I chose port 81, so uncomment the following line in /etc/lighttpd/lighttpd.conf

server.port               = 81

We now need to edit the configuration file of varnish. Varnish uses its own configuration syntax called VCL (Varnish Configuration Language), which is compiled to C. So open /etc/varnish/vcl.conf

# This is a basic vcl.conf file for varnish.
# Modifying this file should be where you store your modifications to
# varnish. Settings here will override defaults.
backend default {
    set backend.host = "127.0.0.1";
    set backend.port = "80";
}
sub vcl_recv {
    if (req.request == "POST") {
        pipe;
    }
    # force lookup even when cookies are present
    if (req.request == "GET" && req.http.cookie) {
        lookup;
    }
}
sub vcl_fetch {
    # force minimum ttl of 180 seconds
    if (obj.ttl < 180s) {
        set obj.ttl = 180s;
    }
}

This is the default Debian configuration. First change the backend port to 81 to match the lighttpd port. Then everything is ready: it will cache all HTTP “GET” requests for at least 3 minutes. But it’s not perfect. If you keep the configuration like that, all pages will be cached by varnish, even when you are logged in as the admin. So it will cache some pages with “Log out” instead of “Log in”, and as the admin you will often see “Log in” when you are already logged in.

We could disable the cache when there is a cookie present, but wordpress puts on some crap cookies even when you are not logged in. So nothing would be cached.

The good solution is to add a little bit of vcl magic

if (req.http.cookie ~ "(comment_)|(wordpress_\w{32}=)") {
        pipe;
}

This small bit of magic needs to be added before the “# force lookup even when cookies are present” comment. It matches the Cookie header of the request against the regular expression “(comment_)|(wordpress_\w{32}=)” and does not cache the request if it matches. It matches if the user has posted a comment or is logged in.
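To get a feel for what this regular expression lets through, you can try it in Python. This is only a sketch: the function name and the cookie values below are made up for the example (a real Cookie header will look different), and it assumes Python’s re module treats this particular pattern the same way varnish’s regex engine does.

```python
import re

# Same pattern as in the VCL snippet above.
BYPASS_RE = re.compile(r"(comment_)|(wordpress_\w{32}=)")

def should_bypass_cache(cookie_header):
    """True if the request looks like it comes from a commenter or a logged-in user."""
    if not cookie_header:
        return False
    return BYPASS_RE.search(cookie_header) is not None

# Hypothetical Cookie headers, for illustration only.
logged_in = "wordpress_" + "a" * 32 + "=admin"
commenter = "comment_author_" + "b" * 32 + "=Bob"
anonymous = "wordpress_test_cookie=WP+Cookie+check"
```

Here should_bypass_cache returns True for logged_in and commenter, and False for anonymous: “test_cookie” is not 32 word characters long, so that kind of cookie doesn’t prevent caching.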

By the way I used firecookie to see what cookies are present on web pages.

Now restart lighttpd and varnish

/etc/init.d/lighttpd restart && /etc/init.d/varnish restart

Your blog is now cached by varnish and should be much faster. You don’t believe me ? Let’s test it.

ab -n 1000 http://libreamoi.com/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking libreamoi.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests
Server Software:        lighttpd/1.4.19
Server Hostname:        libreamoi.com
Server Port:            80
Document Path:          /
Document Length:        20504 bytes
Concurrency Level:      1
Time taken for tests:   0.531340 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      20806000 bytes
HTML transferred:       20504000 bytes
Requests per second:    1882.03 [#/sec] (mean)
Time per request:       0.531 [ms] (mean)
Time per request:       0.531 [ms] (mean, across all concurrent requests)
Transfer rate:          38239.17 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.0      0       0
Waiting:        0    0   0.0      0       0
Total:          0    0   0.0      0       0
Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      0 (longest request)

That’s a nice performance gain: 1882 requests per second ! That’s more than 500 times faster than before !
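For the skeptics, the numbers can be recomputed from the “Time taken for tests” lines of the two ApacheBench runs. A tiny sketch (the function and variable names are mine):

```python
def requests_per_second(requests, seconds):
    """Mean throughput, the way ApacheBench reports it."""
    return requests / seconds

# 1000 requests in both runs; durations copied from the two reports above.
before = requests_per_second(1000, 295.536900)  # lighttpd alone
after = requests_per_second(1000, 0.531340)     # varnish in front of lighttpd
speedup = after / before
```

Both throughput figures match the reports (3.38 and 1882.03 requests per second), and the speedup comes out at a bit over 550, so “more than 500 times faster” holds.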

That’s probably slashdot proof.

[Edit: thank you Youenn for noticing that the ApacheBench runs were actually hitting a 302 page ... because I ran them against localhost, which was redirected to libreamoi.com. So I ran them again.]

Starting ipython from pdb

July 7, 2008 – 1:52 pm

Well this is not so useful now that ipdb exists. Still ipdb sometimes fights with ipython, the ? of ipython doesn’t work, so it might still be useful from time to time.

For those who are used to ipython, the python debugger is often frustrating because of its limitations: no completion, no function auto-call, no import completion … That’s why, before ipdb existed, I often ended up launching ipython from the python debugger.

Of course in ipython you won’t have access to the special commands of the debugger but you’ll have all the nice features of ipython. And you’ll always be able to quit ipython with the Quit command to return to pdb.

So the way to start an embedded ipython from pdb is :

from IPython.Shell import IPShellEmbed
IPShellEmbed([])()
 

But this is kinda long. So if you are a good (lazy) programmer you’ll want to create an ipy.py file in your site-packages directory (or any other directory in your python path). This file should contain :

from IPython.Shell import IPShellEmbed
shell = IPShellEmbed([])

That way you can start ipython with :

from ipy import shell; shell()

Installing wordpress and lighttpd on ubuntu

July 7, 2008 – 1:28 am

My blog is finally up ! But it’s a wordpress blog, not one that I made myself in python. Wordpress is great as a tool. Simple enough, functional enough, I wish it was written in python for me to extend. Anyway, let’s forget python for a second and go into how I installed wordpress.

It’s a brand new server that I ordered this evening at OVH for 20€/month and that I could connect to in less than an hour. It has Ubuntu 8.04 server on it. I would have chosen a Debian server if it were more up to date … unfortunately it’s not, so Ubuntu it was. I could have been tempted by Arch Linux, which I’m probably going to move to on my desktop, but it was not offered.

Apache is too big and complicated so I chose lighttpd. So now I need to install lighttpd php and mysql to run wordpress. In all the following commands I assume you are the root user.

$ apt-get install lighttpd php5-cgi php5-mysql mysql-server

Let’s add a database for wordpress.

$ mysql -u root -p
mysql> CREATE DATABASE wordpress;
mysql> GRANT ALL PRIVILEGES ON wordpress.* TO "wordpress"@"hostname"
       IDENTIFIED BY "your password";
mysql> FLUSH PRIVILEGES;
mysql> EXIT;

That’s it, the wordpress database is created. Oh yeah, and put your real password in place of “your password”!

Now let’s download wordpress and put it in /var/www

$ cd /var/www
$ wget http://wordpress.org/latest.tar.gz
$ tar -xzf latest.tar.gz
$ mv wordpress/* .
$ rm -rf wordpress
$ mv wp-config-sample.php wp-config.php

Now we need to configure wordpress so that it connects to our mysql database. For that we need to edit the wp-config.php file. You can use vim or nano at your convenience. You can also edit it with gedit if you want: just type ssh://root@yourserverip/var/www in nautilus and you’ll be in your remote folder, where you can edit wp-config.php with gedit.

So put the following information in the file:

define('DB_NAME', 'wordpress');    // The name of the database
define('DB_USER', 'wordpress');     // Your MySQL username
define('DB_PASSWORD', 'your password'); // ...and password

Save it.

Load mod_fastcgi for lighttpd.

$ lighty-enable-mod fastcgi

Restart it.

$ /etc/init.d/lighttpd restart

That’s it wordpress is installed. Well everything on the server at least. You now need to connect to your server to finish the installation and configure wordpress.

So go to http://your_domain_name_or_ip/ in your web browser.

One last thing: wordpress uses the domain name or IP address that you use the first time you connect to it as its default address. To change that, go to Settings in your blog admin.