Using grep to search in files

November 5, 2008 – 11:14 pm

I’m moving away from Eclipse and more and more into the wonderful (and cold) world of the command line.

So I need to search for files that contain something in the current directory and its subdirectories all the time:

grep -R "something I'm searching for" .

Grep is the way. And with some colors it’s nicer still:

grep -R --color "something I'm searching for" .

Now what if I only want files that end with .py? That’s a little more subtle:

grep -R --include="*.py" "something" .
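If you’d rather do the filtering from a script, here is a rough Python sketch of the same idea: walk the tree, keep only the files with the right suffix, and report the matching lines. The function name search_files and the demo file names are made up for the example, and it does a plain substring match rather than a real regular expression like grep.

```python
import os
import tempfile


def search_files(root, needle, suffix=".py"):
    """Return (path, line_number, line) for every line containing needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the walk going on files that are not valid UTF-8
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        matches.append((path, number, line.rstrip("\n")))
    return matches


# Small demonstration on a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "sub dir"))
    with open(os.path.join(demo_root, "sub dir", "a.py"), "w", encoding="utf-8") as f:
        f.write("x = 1\nprint('something')\n")
    with open(os.path.join(demo_root, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("something\n")
    demo_matches = search_files(demo_root, "something")
    for path, number, line in demo_matches:
        print("%s:%d:%s" % (os.path.relpath(path, demo_root), number, line))
```

Paths with spaces are no problem here since nothing goes through the shell.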

And if you want something better still, you can go for ack, which is a grep-like tool, only better:

ack "something" .

Yeah, you don’t have to type the -R, it’s the default. Also, the executable is called ack-grep under Debian/Ubuntu.

Don’t feel like typing the . that says you’re searching in the current directory? Add this to your .bashrc:

# g PATTERN searches the current directory; extra arguments go straight to ack
g() {
  if [ $# -eq 1 ]
  then ack -i "$1" .
  else ack -i "$@"
  fi
}

Now you can search in the current directory with a simple

g something

Don’t forget to replace ack with ack-grep under Debian/Ubuntu.

Finally installed archlinux

November 5, 2008 – 10:35 pm

That’s it, the new Ubuntu was coming and I was thinking, I’ll get GIMP 2.6 and tabs in Nautilus … But I should wait a few days before updating, cause updates are always buggy the first few days …

And fuck that, I wanted to install Arch Linux anyway (it’s just that I needed to back up my whole computer and was feeling very lazy about it), so I overcame my laziness and finally got into the wonderful world of Arch, pacman and yaourt.

Pacman is the package manager used by Arch and it is great: making a new package is really simple, which means that you can make your own package when installing software from source. Also Arch has a good number of compiled packages, not as many as Debian but quite a lot. And it has a huge number of source packages that can be installed really easily with yaourt. Yaourt makes installing anything as easy as typing

yaourt anything

Oh, and the Arch Linux wiki is great, which makes the install really easy.

Gimp 2.6 is out !

October 1, 2008 – 8:49 pm

I’m using GIMP 2.4 at my job to make a website. I had heard a lot of people bad-mouthing it, so I expected to have a bad time with it, but it’s really usable. And I had no experience with it before: my previous photo editing goes back 7 years, to high school and Paint Shop Pro.

At work I have Photoshop on Windows inside VirtualBox on my Ubuntu, and … it’s awfully slow. I mean GIMP isn’t fast to start up, but Photoshop is something else: it takes about 20 seconds to start.

So having the choice between the two I chose to use Gimp and I’m really happy with it. Can’t wait to try 2.6 and all its new nice features.

We are just like C

September 26, 2008 – 12:06 am

Dear blog, it’s been a long time since I last talked to you, but I felt like doing it tonight. This is the unfaithful transcription and translation of a chat I had tonight.

me: Math sucks.

Mister M.: That’s not what you told me 10 minutes ago when talking about the number theory described in GEB.

me: You don’t know when something is beautiful and cool, or when it is bloated and ugly and you only learn it that way cause it was discovered that way.

me: You’re not good enough to know.

me: The two cases exist.

Mister M.: Hehe, I think there is a limit to ugliness.

Mister M.: When you reach it, people snap and agree, “ok let’s change that”.

Mister M.: But the limit must be pretty far away.

me: Math is good when you’re able to say: your concepts suck, they are equivalent to this beautiful concept which is very powerful, so let’s not talk about your shit anymore.

me: And I agree with you, there is a practical limit: if you simplify old concepts, your students won’t understand anything about those official old concepts. So for a new concept to work, it must be simple and powerful enough that your students gain enough time from using it to translate back into the old concepts within that gained time.

me: Actually I read a post about this recently. Well, not exactly that, it was about programming languages: for a new language to work, it must solve a problem that would cost you more time than learning the new language does.

Mister M.: It was saying bad stuff about C I’m sure.

me: No. Not at all. Hum, I don’t remember, but I don’t think so.

Mister M.: I’m just on the defensive.

(Mister M likes the C language a lot, and what I just said could be a good argument to say that C is still here cause it was here first.)

me: Hehe, I understand.

(And I do like to criticize C)

me: Survival of the fittest meme, or maybe not fittest but already in place.

Mister M.: Hehe.

me: It’s the same for animals: they start from a niche and only from there they can spread.

Mister M.: Haha how we blew up all the other species !

me: And now we’re in place they have no chance. We’re just like C.

Mister M.: Clearly, C has no business in many places where it is used.

Mister M.: Like us in Antarctica.

Mister M.: Or in the Arizona desert.

me: Well this deserves a blog entry. And I’m gonna twist all your words …

END

PS

me: How do you translate “Je suis juste sur la défensive” in English? I’m just on the defence?

Mister M.: It sounds strange.

Mister M.: WordReference says on the defensive http://www.wordreference.com/fren/defensive

me: Arg. It sounds French, come on, tell me your sentence in English.

Mister M.: I’m on the defensive.

me: lol, ok fine.

Mister M.: English is French.

Linux distros

July 22, 2008 – 10:35 pm

I found a cool diagram that I hadn’t seen before on reddit today: the Linux distro timeline.

We notice three main roots: Debian, the community distro; Slackware, the one-man distro; and Red Hat, the company-developed distro. I guess everyone knows them. Nothing to see, move along.

What’s interesting ? The other roots that are still alive.

  • Smoothwall: a firewall
  • Engarde: company developed internet services oriented
  • Yoper: supposedly the “fastest out-of-the-box distribution”
  • Pardus: now an interesting one, a distro which rewrote many tools in Python, including a package manager and an init system. It is easy to use and KDE based. Also it’s developed in Turkey. Great distro.
  • Puppy: a livecd that is so small that the entire operating system and all its applications can be loaded in 256 MB of RAM
  • DeLi: a desktop distribution for old PCs
  • Sorcerer: a source based distribution like Gentoo
  • Gentoo: THE source based distribution, used for its great source based package manager (Portage)
  • CRUX: a lightweight, i686-optimized distro targeted at experienced users, gave birth to Arch
  • Rock Linux: a flexible Linux distribution Build Kit
  • Linux from scratch: not really a distro, it’s a book about how to build your own linux distro
  • GoboLinux: let’s reorganize the filesystem and place all programs in one folder and keep things simple and logical; nice ideas, lacks packages
  • dine:bolix: no idea and I’m getting tired of this
  • Ark: easy to use desktop distro

What about me? Well, I’m still on the Ubuntu that was installed on my Dell laptop. I’m planning to install Arch Linux (in fact I already did it in a virtual machine), which has a really great package manager that makes it really easy to make your own packages. I also like its minimalistic approach and the idea of not patching upstream more than necessary, whereas Debian and therefore Ubuntu love patches. But Dell puts everything in one partition, so I can’t install a new distro without backing up all my data and formatting my hard disk. So for the time being I’m staying on my nice and working Ubuntu.

What are your chances of hitting a fly with a tennis racquet?

July 18, 2008 – 1:09 am

Just one of the problems that I finished solving a few hours ago for Google Code Jam. It’s the first time I’ve tried a programming competition, and it’s a pretty interesting one. Problems are hard but solvable and you always get a few sample inputs and outputs, which is really, really helpful. Programming competitions often focus on speed of execution, and Python isn’t a good competitor then (except as a scripting language around C maybe), but I think Code Jam is a lot more focused on the algorithm: you’ve got a bad algorithm? You’ll be too slow anyway. You have a good one? Your program will be fast enough in Python or (probably even) Ruby.

I see problems in three categories:

  1. problems complicated to understand or to translate to a program but without algorithmic complexity (if your implementation works it’s usually fast enough) and without too much math, think string manipulation for example, or parsing a grammar
  2. the same but with the complexity based on math, often geometry problems requiring a good knowledge of trigonometry, yeah the hit a fly with a tennis racquet problem is one of them
  3. problems that you can easily make into an inefficient brute force program, but that get hard when trying to solve with an efficient algorithm

I don’t like the first category: you make a program that works with the few examples you have and then if you are lucky you’re finished pretty quickly. But because there are so many cases it will probably break in some situations, and you can’t know which ones, and you can’t really test your program cause you don’t know what the output should really be. To test your program you would need to reimplement it in another way and since you have already done the easier way … well it’s hard.

The second type is … harder when you don’t have (or have forgotten) the math skill … easier if you’re a math nerd. After having solved the math problem implementing the program is usually easy enough.

The third type is fun. You can start by quickly implementing a brute force method that will obviously be too slow for large inputs, and then use it to check the results of your optimized method on small inputs whenever you suspect the optimized method is wrong. I like this category of problems; I like exploring the space of solutions in a smart way. Here is a small class I made to write exploring code quickly, but which is often not efficient enough …

class Explorer(object):
    """The explorer class to explore a finite tree of possibilities.
    The basic usage is
    e = Explorer()
    while e.next():
        person = e.choose(['Linus', 'Theo'])
        if person == 'Linus':
            object = e.choose(['the linux guru.', 'a stupid dickhead.'])
        elif person == 'Theo':
            object = e.choose(['the openbsd guru.', 'a masturbating monkey.'])
        print('%s is %s' % (person, object))
    Which should display:
    Linus is the linux guru.
    Linus is a stupid dickhead.
    Theo is the openbsd guru.
    Theo is a masturbating monkey.
    """
    def __init__(self):
        """Init isn't enough you need to call next after initialising.
        """
        self.current_branch = None
    def next(self):
        """Start a new branch. Return False if it is the end. True if it is not.
        """
        if self.current_branch is not None:
            branch = self._next_branch(self.current_branch)
            if branch is None:
                return False
        else:
            branch = []
        self.infinite_branch = self._infinite_branch(branch)
        self.current_branch = []
        return True
    def choose(self, list):
        """Choose an element in a list.
        """
        choice = next(self.infinite_branch)
        self.current_branch.append((choice, len(list) - 1))
        return list[choice]
    def choose_or_not(self, list):
        """Choose an element in a list, or return None.
        """
        choice = next(self.infinite_branch)
        self.current_branch.append((choice, len(list)))
        if choice == 0:
            return None
        return list[choice - 1]
    def _next_increment(self, branch):
        for i, (choice, maximum) in enumerate(reversed(branch)):
            if choice != maximum:
                return len(branch) - i - 1
        return None
    def _next_branch(self, branch):
        position = self._next_increment(branch)
        if position is None:
            return None
        result = branch[:]
        result[position] = result[position][0] + 1, result[position][1]
        for i in range(position+1, len(result)):
            result[i] = (0, result[i][1])
        return result
    def _infinite_branch(self, branch):
        for choice, maximum in branch:
            yield choice
        while True:
            yield 0
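The brute-force-then-compare trick from the third category can be sketched like this. The problem used here (maximum sum of a contiguous slice) is just a stand-in I picked for illustration, not a Code Jam problem, and all the function names are made up: one version is slow but obviously right, the other is fast, and a loop throws small random inputs at both until they disagree.

```python
import random


def max_subarray_brute(values):
    """Try every non-empty slice: obviously correct, O(n^2)."""
    best = values[0]
    for start in range(len(values)):
        total = 0
        for end in range(start, len(values)):
            total += values[end]
            best = max(best, total)
    return best


def max_subarray_fast(values):
    """Kadane's algorithm: one pass, O(n)."""
    best = current = values[0]
    for value in values[1:]:
        # Either extend the running slice or start a new one at this value.
        current = max(value, current + value)
        best = max(best, current)
    return best


def cross_check(trials=200, seed=0):
    """Return the first input on which the two versions disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
        if max_subarray_brute(values) != max_subarray_fast(values):
            return values
    return None


print(cross_check())
```

Keeping the random inputs tiny is the point: when the two versions disagree you get a counterexample small enough to work out by hand.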

Oh yeah, I don’t actually answer the question in the title … I don’t really feel like explaining the whole problem, actually. I’ll just say that I solved it thanks to Sage. It’s a great replacement for a TI-89, or MATLAB, or Maple, or Mathematica. It’s all of that and more, and in Python. The language is actually a very slightly modified Python. Here is an example that was helpful in solving the tennis racquet problem:

sage: var('x')
x
sage: integral(cos(asin(x)))
arcsin(x)/2 + x*sqrt(1 - x^2)/2
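If you don’t have Sage at hand, the result can at least be sanity-checked numerically in plain Python. This is only a sketch: the midpoint rule, the interval [0, 0.5] and the function names are my own choices for the check, not something taken from the Code Jam solution.

```python
import math


def integrand(x):
    # cos(asin(x)) is just sqrt(1 - x^2) written the way it was typed into Sage
    return math.cos(math.asin(x))


def antiderivative(x):
    # The closed form printed by Sage above
    return math.asin(x) / 2 + x * math.sqrt(1 - x * x) / 2


def midpoint_integral(f, a, b, steps=10000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


numeric = midpoint_integral(integrand, 0.0, 0.5)
exact = antiderivative(0.5) - antiderivative(0.0)
print(abs(numeric - exact) < 1e-8)
```

As a second check, antiderivative(1.0) is pi/4, the area of a quarter of the unit disc, which is what you expect from integrating sqrt(1 - x^2) from 0 to 1.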

Good news everyone ! Futurama: the beast with a billion backs is out

July 13, 2008 – 1:19 am

Futurama: the beast with a billion backs is out on DVD in the US and Canada, and illegally on the internet for others, may it lead to the end of civilisation. This is a must watch. Futurama is as witty and fun as ever. You will have the honor of meeting Fry’s new girlfriends (well, one isn’t really a girl, but I digress), the head of Stephen Hawking with the real synthesised voice of Stephen Hawking, and a great way to settle scientific disputes: Deathball.

Bad news everyone. Five more months to wait before the next futurama movie: Bender’s Game.

Smart indentation for python in gedit

July 10, 2008 – 9:18 pm

A few days ago I developed a plugin for gedit that provides smart indentation for python code.

The code is indented when the previous line ends with ‘:’ and un-indented if the previous line starts with ‘return’, ‘pass’, ‘continue’ or ‘break’. This plugin will use your tab configuration for indentation. To respect PEP8 you should set tab width to 4 and choose to insert spaces instead of tabs.
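The rule itself fits in a few lines. What follows is not the plugin’s code, just a minimal sketch of the rule as described above; the name next_indent and the four-space default are assumptions for the example, and it compares the first word of the line against the keywords so that something like “returns = 3” doesn’t trigger an un-indent.

```python
DEDENT_KEYWORDS = ("return", "pass", "continue", "break")


def next_indent(previous_line, indent_unit="    "):
    """Guess the indentation of the line that follows previous_line."""
    line = previous_line.rstrip("\r\n")
    # Leading whitespace of the previous line
    current = line[: len(line) - len(line.lstrip())]
    stripped = line.strip()
    if stripped.endswith(":"):
        # A block opener: indent one level more
        return current + indent_unit
    first_word = stripped.split(None, 1)[0] if stripped else ""
    if first_word in DEDENT_KEYWORDS and current.endswith(indent_unit):
        # The block is over: go back one level
        return current[: -len(indent_unit)]
    return current


print(repr(next_indent("def f(x):")))
print(repr(next_indent("    return x")))
print(repr(next_indent("    x = 1")))
```

A real editor plugin also has to deal with tabs, comments after the colon and multi-line statements, which this sketch happily ignores.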

You can download it on the gedit plugins page or with this direct link.

To install it, go into the folder ~/.gnome2/gedit/plugins/ (create it if it doesn’t exist) and uncompress the tgz there. Then in gedit:

Edit > Preferences > Plugins > Python Indentation

That’s it.

It seems like gedit is starting to get some cool plugins for code editing. For example, snippets, which allow the quick insertion of complicated code templates as well as moving between the various “fields” of the template with tab.

And someone even provided a plugin for python code completion ! Even if there is still some work to do on this plugin this is a great start.

Anyway, gedit is starting to get good for editing Python. Maybe one day I won’t have to use Eclipse at work anymore. I wish. Eclipse really has lots of functionality, but it’s too big and heavy, and if it breaks (your project files getting corrupted, for example) you’re dead, since you had everything in it. I prefer several programs to one big one. And I’m not much into vi or emacs either … that’s why gedit is my favorite for code editing. If I used KDE it would probably be kwrite, which seems nice enough too.

Hope for the best, prepare for slashdot

July 8, 2008 – 12:31 am

You have a brand new wordpress blog. How fast is it ? It’s supposed to be fast, isn’t it ? Lighttpd is dead fast, php is fast, mysql is fast.

So let’s try it with ApacheBench:

ab -n 1000 http://libreamoi.com/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking libreamoi.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests
Server Software:        lighttpd/1.4.19
Server Hostname:        libreamoi.com
Server Port:            80
Document Path:          /
Document Length:        20504 bytes
Concurrency Level:      1
Time taken for tests:   295.536900 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      20725000 bytes
HTML transferred:       20504000 bytes
Requests per second:    3.38 [#/sec] (mean)
Time per request:       295.537 [ms] (mean)
Time per request:       295.537 [ms] (mean, across all concurrent requests)
Transfer rate:          68.48 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   290  294  19.9    293     741
Waiting:      222  225  15.0    224     536
Total:        290  294  19.9    293     741
Percentage of the requests served within a certain time (ms)
  50%    293
  66%    294
  75%    294
  80%    294
  90%    295
  95%    296
  98%    298
  99%    323
 100%    741 (longest request)

Ouch, only 3.38 requests per second. I hoped for more! That’s definitely not slashdot ready!

By the way, my server is a Celeron at 2.66 GHz with 1 GB of RAM, so it’s nothing great but not that bad either.

So what if I get slashdotted and get a thousand hits per second? Well, the first 100 visitors will see my article all right. And then for about 1 day my site will be unusable, while I could have won millions of dollars with AdSense. That’s not what you want, is it?

That’s why you need varnish. Varnish is a cool reverse proxy that will make your blog real fast. How fast? Well, we’ll see after installing it. So remember, I have an Ubuntu server, which means apt-get for installing stuff.

$ apt-get install varnish

Run that as root and it will install varnish and start it. Is it finished yet? No. Lighttpd is already on port 80, so varnish can’t have taken it; by default varnish runs on port 6081.

You can change that by editing /etc/default/varnish and putting

VARNISH_LISTEN_PORT=80

Now you need to edit the configuration of lighttpd to run on another port. I chose port 81, so uncomment the following line in /etc/lighttpd/lighttpd.conf

server.port               = 81

We now need to edit the configuration file of varnish. Varnish uses its own configuration syntax called VCL, for Varnish Configuration Language, which is compiled to C. So open /etc/varnish/vcl.conf

# This is a basic vcl.conf file for varnish.
# Modifying this file should be where you store your modifications to
# varnish. Settings here will override defaults.
backend default {
    set backend.host = "127.0.0.1";
    set backend.port = "80";
}
sub vcl_recv {
    if (req.request == "POST") {
        pipe;
    }
    # force lookup even when cookies are present
    if (req.request == "GET" && req.http.cookie) {
        lookup;
    }
}
sub vcl_fetch {
    # force minimum ttl of 180 seconds
    if (obj.ttl < 180s) {
        set obj.ttl = 180s;
    }
}

This is the default Debian configuration. First change the backend port to 81 to match the lighttpd port. Then all is ready: it will cache all HTTP “GET” requests for at least 3 minutes. But it’s not perfect. If you keep the configuration like that, all pages will be cached by varnish, even when you are logged in as the admin. So it will cache some pages with “Log out” instead of “Log in”, and as the admin you will often see “Log in” when you are already logged in.

We could disable the cache when there is a cookie present, but WordPress sets some crap cookies even when you are not logged in, so nothing would be cached.

The good solution is to add a little bit of vcl magic

if (req.http.cookie ~ "(comment_)|(wordpress_\w{32}=)") {
        pipe;
}

This small bit of magic needs to be added before the “# force lookup even when cookies are present” comment. It matches the Cookie header of the request against the regular expression “(comment_)|(wordpress_\w{32}=)” and does not cache the request if it matches, which happens when the user has posted a comment or is logged in.
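To get a feel for what that expression lets through, you can try it on a few Cookie headers with Python’s re module. Careful: the header values below are invented for the example (only their shapes follow what the rule expects), and Python’s regex engine is not the one varnish uses, so take it as an illustration of the pattern, not as a test of varnish itself.

```python
import re

# Same expression as in the VCL snippet; like ~ in VCL, search() is unanchored
BYPASS_COOKIES = re.compile(r"(comment_)|(wordpress_\w{32}=)")

# Hypothetical Cookie headers, one per kind of visitor
sample_headers = {
    "anonymous visitor": "wordpress_test_cookie=WP+Cookie+check",
    "commenter": "comment_author_" + "a" * 32 + "=Bob",
    "logged-in admin": "wordpress_" + "0123456789abcdef" * 2 + "=admin%7C1215",
}

decisions = {}
for label, header in sample_headers.items():
    bypass = BYPASS_COOKIES.search(header) is not None
    decisions[label] = "pipe" if bypass else "cache"
    print("%-18s -> %s" % (label, decisions[label]))
```

The anonymous visitor carries a cookie too, but its name is followed by fewer than 32 word characters before the “=”, so the request still goes to the cache.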

By the way I used firecookie to see what cookies are present on web pages.

Now restart lighttpd and varnish

/etc/init.d/lighttpd restart && /etc/init.d/varnish restart

Your blog is now cached by varnish and should be much faster. You don’t believe me ? Let’s test it.

ab -n 1000 http://libreamoi.com/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking libreamoi.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Finished 1000 requests
Server Software:        lighttpd/1.4.19
Server Hostname:        libreamoi.com
Server Port:            80
Document Path:          /
Document Length:        20504 bytes
Concurrency Level:      1
Time taken for tests:   0.531340 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      20806000 bytes
HTML transferred:       20504000 bytes
Requests per second:    1882.03 [#/sec] (mean)
Time per request:       0.531 [ms] (mean)
Time per request:       0.531 [ms] (mean, across all concurrent requests)
Transfer rate:          38239.17 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.0      0       0
Waiting:        0    0   0.0      0       0
Total:          0    0   0.0      0       0
Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      0 (longest request)

That’s a nice performance gain: 1882 requests per second! That’s more than 500 times faster than before!

That’s probably slashdot proof.
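For the record, the “more than 500 times” figure comes straight from the “Time taken for tests” lines of the two ab reports. Here is the arithmetic as a tiny sketch (the variable names are mine):

```python
requests = 1000
seconds_before = 295.536900  # "Time taken for tests" without varnish
seconds_after = 0.531340     # the same line with varnish in front

rps_before = requests / seconds_before
rps_after = requests / seconds_after
speedup = rps_after / rps_before

print("before: %.2f req/s" % rps_before)
print("after:  %.2f req/s" % rps_after)
print("speedup: about %.0fx" % speedup)
```

Keep in mind that both runs use a concurrency level of 1, so this compares latency on a single connection, not what happens with a thousand simultaneous visitors.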

[Edit: thank you Youenn for noticing that the ApacheBench runs were actually hitting a 302 page ... because I ran them against localhost, which redirected to libreamoi.com. So I ran them again.]

Starting ipython from pdb

July 7, 2008 – 1:52 pm

Well, this is not so useful now that ipdb exists. Still, ipdb sometimes fights with ipython (the ? of ipython doesn’t work), so this might still be useful from time to time.

For those who are used to ipython, the python debugger is often frustrating because of its limitations: no completion, no function auto-call, no import completion … That’s why, before ipdb existed, I often ended up launching ipython from the python debugger.

Of course in ipython you won’t have access to the special commands of the debugger but you’ll have all the nice features of ipython. And you’ll always be able to quit ipython with the Quit command to return to pdb.

So the way to start an embedded ipython from pdb is :

from IPython.Shell import IPShellEmbed
IPShellEmbed([])()
 

But this is kinda long. So if you are a good (lazy) programmer you’ll want to create an ipy.py file in your site-packages directory (or any other directory in your python path). This file should contain :

from IPython.Shell import IPShellEmbed
shell = IPShellEmbed([])

That way you can start ipython with :

from ipy import shell; shell()