2007


27
set 07

Hotspot Simulator

One of the systems I work on is VexBox, a multi-architecture Linux distribution for hotspots, it comes with all the softwares needed to serve a user, dhcp server, legitimate ARP Spoof, web server, radius client, etc.

Hotspot Simulator 01

For simple testing purposes it’s ok to open up a notebook, find the SSID, connect and do all your tests. But some times you need more then this, you need more users, more time connected and you might not have all the resources you need.

Hotspot Simulator 02

That’s when I came up with the idea of creating a hotspot simulator. For each client we use an Access Point (AP) running a modified version of VexBox.

Each one of them can be seen as a real user with a Wi-Fi enabled device, they will do exactly the same thing users does, it will navigate, download, idle, roam between APs, disconnect, reconnect and everything you can imagine.

Hotspot Simulator 03

We have projected and implemented a model to handle twenty simultaneous users, they are stored in two piles connected to two switches and one external power source with enough watts with an on/off button.

Hotspot Simulator 04

The main software controls the clients telling them what to do, find the SSID, try to get ip, authenticate, do some traffic, disconnect and so on.

It is written in Python and has a ncurses GUI so you can watch clients activities and control each one of them individually or globally. This way it can run in background or screen for months and can be accessed from any external network via a simple shell.


Hospot Simulator GUI


2
set 07

Why blog comments doesn’t work

I think blog comments aren’t productive, that’s why I don’t allow them. A lot has been said, I’ll just quote what I agree with:

“…to the extent that comments interfere with the natural expression of the unedited voice of an individual, comments may act to make something not a blog…. The cool thing about blogs is that while they may be quiet, and it may be hard to find what you’re looking for, at least you can say what you think without being shouted down. This makes it possible for unpopular ideas to be expressed. And if you know history, the most important ideas often are the unpopular ones…. That’s what’s important about blogs, not that people can comment on your ideas. As long as they can start their own blog, there will be no shortage of places to comment.” Dave Winer ideas about comments on blogs.

Imagem de Amostra do You Tube

“The important thing to notice here is that Dave does not see blog comments as productive to the free exchange of ideas. They are a part of the problem, not the solution. You don’t have a right to post your thoughts at the bottom of someone else’s thoughts. That’s not freedom of expression, that’s an infringement on their freedom of expression. Get your own space, write compelling things, and if your ideas are smart, they’ll be linked to, and Google will notice, and you’ll move up in PageRank, and you’ll have influence and your ideas will have power.”

“When a blog allows comments right below the writer’s post, what you get is a bunch of interesting ideas, carefully constructed, followed by a long spew of noise, filth, and anonymous rubbish that nobody … nobody … would say out loud if they had to take ownership of their words. Look at this innocent post on a real estate blog. By comment #6 you’re already seeing complete noise. By #13 you have someone cursing and saying “go kill yourself.” On a real estate blog. #18 and #23 have launched into a middle eastern nuclear conflageration which continues for 100 posts.” Joel Spolsky


1
set 07

Network Manager with WISPr support

Network Manager

I’ve sent this message to the NetworkManager List, if the idea is well accepted I’ll implement it:

“I would like to add WISPr support to NetworkManager, I would like to hear your comments if it’s an acceptable feature, before I start working on it.

First of all, what is WISPr?

From wikipedia: WISPr or Wireless Internet Service Provider roaming – Pronounced “whisper,” WISPr is A protocol from the Wi-Fi Alliance that allows users to roam between wireless internet service providers in similar fashion to cellphones. A RADIUS server is used to authenticate the subscriber’s credentials.
Clearly speaking, when you have a hotspot with a captive portal where you have to type your username and password, normally you have an alternative form of authentication called wispr, which it’s commonly used by smart clients.

What’s the idea of implementation?

The only way to identify if a network is wispr compliant is after connecting to this network, perform a http request to an non-walled garden URL and it will return a standard XML tag.

This XML contains the url where you need to send your credentials and get automatically connected.

I would like to use the same structure used for security, but let me list all possible forms of networks:

  • WISPr only
  • WISPr + Encryption
  • Encryption Only
  • None

As you can see it’s an optional attribute combined or not with security. I would like to see it as a trigger after successful connection and I would to implement it flexible enough so other implementations could be added in the future, any kind of trigger after connection.

I am thinking about having one more select list called Post-Authentication, values [WISPr, others...], right bellow “Wireless Security”.

Network Manager Post-Authentication

And when selecting WISPr it will draw a username and password text field, everytime connected to a wispr compliant network it would automatically authenticate myself.

If selected a network from the list where I never registered my credentials, I could be prompted any time to type it, just like encryption passwords.

Comments?”


10
ago 07

How to Write a Spelling Corrector

Peter Norvig, Google’s Director of Research, wrote an article explaining how to write a spelling corrector. He wrote using python in 21 lines. After this, many people implemented in other languages, I wrote in C to compare the amount of lines and speed.

I quote some of Norvig’s paragraphs below:

“In the past week, two friends (Dean and Bill) independently told me they were amazed at how Google does spelling correction so well and quickly. Type in a search like [speling] and Google comes back in 0.1 seconds or so with Did you mean: spelling. (Yahoo and Microsoft are similar.) What surprised me is that I thought Dean and Bill, being highly accomplished engineers and mathematicians, would have good intuitions about statistical language processing problems such as spelling correction. But they didn’t, and come to think of it, there’s no reason they should: it was my expectations that were faulty, not their knowledge.

I figured they and many others could benefit from an explanation. The full details of an industrial-strength spell corrector like Google’s would be more confusing than enlightening, but I figured that on the plane flight home, in less than a page of code, I could write a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.

So here, in 21 lines of Python 2.5 code, is the complete spelling corrector:”

import re, collections
 
def words(text): return re.findall('[a-z]+', text.lower()) 
 
def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model
 
NWORDS = train(words(file('big.txt').read()))
 
alphabet = 'abcdefghijklmnopqrstuvwxyz'
 
def edits1(word):
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] +                     # deletion
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + # transposition
               [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] + # alteration
               [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])  # insertion
 
def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)
 
def known(words): return set(w for w in words if w in NWORDS)
 
def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])

And here is my version in C:

/* 
 * spell.c --- spell corrector
 * 
 * Copyright  (C)  2007  Marcelo Toledo <marcelo@marcelotoledo.org>
 * 
 * Version: 1.0
 * Keywords: spell corrector
 * Author: Marcelo Toledo <marcelo@marcelotoledo.org>
 * Maintainer: Marcelo Toledo <marcelo@marcelotoledo.org>
 * URL: http://www.marcelotoledo.org
 * 
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 * 
 * Commentary: 
 * 
 * See http://www.marcelotoledo.org.
 * 
 * Code:
 */
 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <search.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
 
#define DICTIONARY "./big.txt"
#define DICT_SZ    3000000
 
const char delim[]    = ".,:;`/\"+-_(){}[]<>*&^%$#@!?~/|\\=1234567890 \t\n";
const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
 
static char *strtolower(char *word)
{
        char *s;
 
        for (s = word; *s; s++)
                *s = tolower(*s);
 
        return word;
}
 
static ENTRY *find(char *word)
{
        ENTRY e;
 
        e.key = word;
        return hsearch(e, FIND);
}
 
static int update(char *word)
{
        ENTRY *e = find(word);
 
        if (!e)
                return 0;
 
        e->data++;
 
        return 1;
}
 
static int read_file(ENTRY dict)
{
        char *file, *word, *w;
        FILE *fp = fopen(DICTIONARY, "r");
        struct stat sb;
 
        if (!fp)
                return 0;
 
        if (stat(DICTIONARY, &sb))
                return 0;
 
        file = malloc(sb.st_size);
        if (!file) {
                fclose(fp);
                return 0;
        }
 
        fread(file, sizeof(char), sb.st_size, fp);
 
        word = strtok(file, delim);
        while(word != NULL) {
                w = strtolower(strdup(word));
 
                if (!update(w)) {
                        dict.key  = w;
                        dict.data = 0;
                        hsearch(dict, ENTER);
                }
 
                word = strtok(NULL, delim);
        }
 
        free(file);
        fclose(fp);
 
        return 1;
}
 
static char *substr(char *str, int offset, int limit)
{
        char *new_str;
        int str_size = strlen(str);
 
        if ((limit > str_size) || ((offset + limit) > str_size) || 
            (str_size < 1) || (limit == 0))
                return NULL;
 
        new_str = malloc(limit+1 * sizeof(char));
        if (!new_str)
                return NULL;
 
        strncpy(new_str, str+offset, limit);
        *(new_str + limit) = '\0';
 
        return new_str;
}
 
static char *concat(char *str1, char *str2)
{
        if (!str1) { 
                str1 = malloc(sizeof(char));
                *str1 = '\0';
        }
 
        if (!str2) { 
                str2 = malloc(sizeof(char));
                *str2 = '\0';
        }
 
        str1 = realloc(str1, strlen(str1) + strlen(str2) + 1);
        return strcat(str1, str2);
}
 
static int deletion(char *word, char **array, int start_idx)
{
        int i, word_len = strlen(word);
 
        for (i = 0; i < word_len; i++)
                array[i + start_idx] = concat(substr(word, 0, i), substr(word, i+1, word_len-(i+1)));
 
        return i;
}
 
static int transposition(char *word, char **array, int start_idx)
{
        int i, word_len = strlen(word);
 
        for (i = 0; i < word_len-1; i++)
                array[i + start_idx] = concat(concat(substr(word, 0, i), 
                                                     substr(word, i+1, 1)), 
                                              concat(substr(word, i, 1), 
                                                     substr(word, i+2, word_len-(i+2))));
 
        return i;
}
 
static int alteration(char *word, char **array, int start_idx)
{
        int i, j, k, word_len = strlen(word);
        char c[2] = { 0, 0 };
 
        for (i = 0, k = 0; i < word_len; i++)
                for (j = 0; j < sizeof(alphabet); j++, k++) {
                        c[0] = alphabet[j];
                        array[start_idx + k] = concat(concat(substr(word, 0, i), (char *) &c), 
                                                      substr(word, i+1, word_len - (i+1)));
                }
 
        return k;
}
 
static int insertion(char *word, char **array, int start_idx)
{
        int i, j, k, word_len = strlen(word);
        char c[2] = { 0, 0 };
 
        for (i = 0, k = 0; i <= word_len; i++)
                for (j = 0; j < sizeof(alphabet); j++, k++) {
                        c[0] = alphabet[j];
                        array[start_idx + k] = concat(concat(substr(word, 0, i), (char *) &c), 
                                                      substr(word, i, word_len - i));
                }
 
        return k;
}
 
static int edits1_rows(char *word)
{
        register int size = strlen(word);
 
        return (size)                          + // deletion
               (size - 1)                      + // transposition
               (size * sizeof(alphabet))       + // alteration
               (size + 1) * sizeof(alphabet);    // insertion
}
 
static char **edits1(char *word)
{
        int next_idx;
        char **array = malloc(edits1_rows(word) * sizeof(char *));
 
        if (!array)
                return NULL;
 
        next_idx  = deletion(word, array, 0);
        next_idx += transposition(word, array, next_idx);
        next_idx += alteration(word, array, next_idx);
        insertion(word, array, next_idx);
 
        return array;
}
 
static int array_exist(char **array, int rows, char *word)
{
        int i;
 
        for (i = 0; i < rows; i++)
                if (!strcmp(array[i], word))
                        return 1;
 
        return 0;
}
 
static char **known_edits2(char **array, int rows, int *e2_rows)
{
        int i, j, res_size, e1_rows;
        char **res = NULL, **e1;
 
        for (i = 0, res_size = 0; i < rows; i++) {
                e1      = edits1(array[i]);
                e1_rows = edits1_rows(array[i]);
 
                for (j = 0; j < e1_rows; j++)
                        if (find(e1[j]) && !array_exist(res, res_size, e1[j])) {
                                res             = realloc(res, sizeof(char *) * (res_size + 1));
                                res[res_size++] = e1[j];
                        }
        }
 
        *e2_rows = res_size;
 
        return res;
}
 
static char *max(char **array, int rows)
{
        char *max_word = NULL;
        int i, max_size = 0;
        ENTRY *e;
 
        for (i = 0; i < rows; i++) {
                e = find(array[i]);
                if (e && ((int) e->data > max_size)) {
                        max_size = (int) e->data;
                        max_word = e->key;
                }
        }
 
        return max_word;
}
 
static void array_cleanup(char **array, int rows)
{
        int i;
 
        for (i = 0; i < rows; i++)
                free(array[i]);
}
 
static char *correct(char *word)
{
        char **e1, **e2, *e1_word, *e2_word, *res_word = word;
        int e1_rows, e2_rows;
 
        if (find(word))
                return word;
 
        e1_rows = edits1_rows(word);
        if (e1_rows) {
                e1      = edits1(word);
                e1_word = max(e1, e1_rows);
 
                if (e1_word) {
                        array_cleanup(e1, e1_rows);
                        free(e1);
                        return e1_word;
                }
        }
 
        e2 = known_edits2(e1, e1_rows, &e2_rows);
        if (e2_rows) {
                e2_word = max(e2, e2_rows);
                if (e2_word)
                        res_word = e2_word;
        }
 
        array_cleanup(e1, e1_rows);
        array_cleanup(e2, e2_rows);
 
        free(e1);
        free(e2);
 
        return res_word;
}
 
int main(int argc, char **argv)
{
        char *corrected_word;
        ENTRY dict;
 
        hcreate(DICT_SZ);
 
        if (!read_file(dict))
                return -1;
 
        corrected_word = correct(argv[1]);
        if (strcmp(corrected_word, argv[1])) {
                printf("Did you mean \"%s\"?\n", corrected_word);
        } else {
                printf("\"%s\" is correct!\n", argv[1]);
        }
 
        return 0;
}

The code was pasted, but you can download it here. You might be asking, where is the 184 lines of code? I used the same metric as Norvig, no blank lines, no main function and reduced as much as possible the extras, but keeping the readability and keeping the same code, see the result here.

“The code defines the function correct, which takes a word as input and returns a likely correction of that word. For example:”

In python:

>>> correct('speling')
'spelling'
>>> correct('korrecter')
'corrector'

In C:

$ ./spell boxng
Did you mean "boxing"?
 
$ ./spell speling
Did you mean "spelling"?

I knew how fast was Norvig’s code, when I first finished mine, I was very impressed with Python simplicity in 21 lines of code and it’s speed, very similar to C, in the beginning. I used the same 6.3MB dictionary for the initial tests:

$ du -sh big.txt
6,3M    big.txt

Python:

$ time python spell.py
spelling
 
real    0m1.911s
user    0m1.340s
sys     0m0.048s

C:

$ time ./spell speling
Did you mean "spelling"?
 
real    0m0.892s
user    0m0.812s
sys     0m0.076s

Result:

C was 1.01 seconds or 114.2% faster.

I really wanted to see how bad it was going to get if I grew up the dictionary. So I did tests with 50MB, 100MB, 168MB and 149MB.

The results using 50MB dictionary:

$ du -sh big.txt
50M     big.txt

Python:

$ time python spell.py
spelling
 
real    0m17.892s
user    0m11.353s
sys     0m0.684s

C:

$ time ./spell speling
Did you mean "spelling"?
 
real    0m6.896s
user    0m6.636s
sys     0m0.244s

Result:

C was 10.99 seconds or 159.4% faster.

The results using 100MB dictionary:

$ du -sh big.txt
100M    big.txt

Python:

$ time python spell.py
spelling
 
real    1m25.579s
user    0m24.262s
sys     0m1.704s

C:

$ time ./spell speling
Did you mean "spelling"?
 
real    0m14.474s
user    0m13.425s
sys     0m0.496s

Result:

C was 1 minute and 11.10 seconds or 491.2% faster.

The results using 168MB dictionary:

$ du -sh huge.txt
168M    huge.txt

Python:

$ time python spell.py
 
Killed

C:

$ time ./spell speling
Did you mean "speling"?
 
real    0m44.627s
user    0m21.689s
sys     0m1.324s

Result:

Couldn't compare, Python process took to much time and was killed by kernel.

Seeing this I tried with a smaller dictionary 149MB:

$ du -sh big.txt
149M    big.txt

Python:

$ time python spell.py
Killed

C:

$ time ./spell speling
Did you mean "spelling"?
 
real    0m24.974s
user    0m19.149s
sys     0m0.852s

Result:

Couldn't compare, Python process took to much time and was killed by kernel.

“Other computer languages:

After I posted this article, various people wrote versions in different programming languages. While the purpose of this article was to show the algorithms, not to highlight Python, the other examples may be interesting for those who like comparing languages:”

Language Lines of Code Author (and link to implementation)
Python 21 Peter Norvig
Haskell 24 Grzegorz
F# 34 Sebastian G
Ruby 38 Brian Adkins
Scheme 45 Shiro
Perl 63 Federico Feroldi
Scheme 89 Jens Axel
Rebol 133 Cyphre
C 184 Marcelo Toledo <-- here we are
Java 372 Dominik Schulz



3
ago 07

How an attack technique can save your life

You work for a company that gives laptop to all employees, if you are not from the technology department, you’ll not have access to the administrator area, this normally means, among many other things, that you’ll have a fixed ip and you can’t change it.

Let’s say your fixed ip is: 192.168.5.10

Now imagine that you need to travel on business and you’ll need to connect in many networks during the travel. The airport hotspot, the client you’ll visit and the hotel.

Probably all networks you’ll visit will have a DHCP server, which would give you an available ip in the their network if your computer had DHCP set, but you have a fixed ip address and you can only use the network 192.168.5.x.

Most part of the client software that does network connection, isn’t flexible enough to switch the configuration if you are in a different network. In this case, in the hotspot, hotel or client, it would do nothing, since your gateway isn’t the same as configured in your machine, of course, unless the gateway of this new network supported ARP Spoof technique.

If those gateways had this feature, it would be totally different, the client would just join the network transparently, without having to modify a line in the laptop’s configuration.

Here is how the magic works:

It’s a man in the middle attack, the gateway will listen to all ARP Requests, whenever a client look for a different gateway, it automatically become this gateway you are looking for if possible, answer with an ARP Reply, that says I am your gateway and here is my MAC Address, from that moment on, the user is able to communicate in the network.

It’s the perfect world, you would never have to change the configuration of your computer and would join all networks without problem.

In the other hand, your work travel could be horrible, most of the networks wouldn’t be accessible and you would come back totally outside of what’s going on, not to think about the amount of unread emails.