A few days ago I started thinking about the scalability limits of the TIP Fast-Flux Tracking module and realized its design was really awful. It was based on the idea of assigning a monitoring thread to each fluxy domain. That approach works well when the number of threads is small, but not at the scale I was starting to face. First of all, as the number of threads grows, performance degrades because of the Python Global Interpreter Lock, which prevents multiple threads in a single interpreter process from executing Python code concurrently (so running the process on a multiprocessor system brings no improvement). Moreover, it’s really hard to give each thread enough stack space to run without raising segmentation faults. For these reasons I decided to rewrite the module from scratch, and I’m currently testing it. The new design is really simple, effective and scalable, and I have to thank Jose Nazario, Marcello Barnaba and Orlando Bassotto for the really interesting talks we had about this matter. Just one process and no monitoring threads. The code is written to avoid blocking calls, which makes the module truly asynchronous. However, when a domain starts being monitored, the module needs to access the backend database, and that requires blocking calls. When this happens, the blocking calls are delegated to the Twisted thread pool together with a cloned copy of the collected data, so that code scalability isn’t compromised by unnecessary locks. Moreover, the module is now becoming a Twisted Application of its own, and the first tests using the Twisted Epoll Reactor are absolutely encouraging. Stay tuned!
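The "delegate the blocking call with a cloned copy" idea can be sketched in a few lines. In Twisted the natural tool would be `twisted.internet.threads.deferToThread`; to keep the sketch dependency-free it swaps in the standard library's asyncio and `loop.run_in_executor`, which plays the same role. The names `store_domain` and `start_monitoring` are hypothetical, and `store_domain` only simulates a database write.

```python
import asyncio
import copy
import time


def store_domain(snapshot):
    """Hypothetical blocking backend call standing in for a database write."""
    time.sleep(0.01)  # simulate blocking I/O
    return snapshot["domain"], len(snapshot["ips"])


async def start_monitoring(collected):
    loop = asyncio.get_running_loop()
    # Clone the collected data before handing it to the worker thread, so the
    # event loop can keep mutating `collected` without taking any lock.
    snapshot = copy.deepcopy(collected)
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, store_domain, snapshot)


async def main():
    collected = {"domain": "example.test", "ips": {"192.0.2.1", "192.0.2.2"}}
    task = asyncio.create_task(start_monitoring(collected))
    await asyncio.sleep(0)  # let the task take its snapshot first
    collected["ips"].add("192.0.2.3")  # the loop keeps updating its own data
    return await task, collected


result, collected = asyncio.run(main())
print(result)  # ('example.test', 2)
print(len(collected["ips"]))  # 3
```

The worker thread only ever sees its private snapshot, which is why the new address added by the event loop does not show up in the stored result and no lock is needed around `collected`.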
Category: Projects
Over the last few days, the inner workings of TIP have changed a lot. In fact, as soon as I plugged in the new Spamtrap module, I realized that the core engine was far from perfect. In particular, it was designed when I had no precise idea of the workload it would have to face, and this forced me to rethink it from scratch. First of all, the new implementation is based on the Twisted Application Framework. Using this infrastructure freed me from writing a large amount of boilerplate code, since it hooks the application into existing tools that manage daemonization, logging, reactor selection and more. Moreover, TIP is moving towards a component-based architecture, using the interfaces and adapters created by the Zope3 team to develop the submodules. The current implementation scales much better than the previous one because every time a module is scheduled, it runs inside its own subprocess controlled by the twistd master process. This design keeps memory leaks from accumulating, since the operating system reclaims a subprocess’s memory when it exits, and memory leakage is exactly the reason why I moved towards a new scheduler design. Each subprocess is independent of the others, and the main job of the master process is to synchronize the subprocesses and free resources when they complete their tasks. Another important change worth mentioning concerns the Fast Flux tracking module, which is now handled as a two-pass subprocess so that it frees resources as soon as it completes the domain fluxiness classification. Right now the first tests are running. Stay tuned!
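A minimal sketch of the "one subprocess per scheduled module, supervised by a master" pattern follows. Under twistd this would likely go through `reactor.spawnProcess` and a `ProcessProtocol`; here asyncio's `create_subprocess_exec` stands in so the example runs with the standard library alone. The `MODULES` table and its one-line bodies are hypothetical placeholders for real TIP modules.

```python
import asyncio
import sys

# Hypothetical module bodies; real TIP modules would do actual tracking work.
MODULES = {
    "spamtrap": "print('spamtrap done')",
    "fastflux": "print('fastflux done')",
}


async def run_module(name, code):
    # Each scheduled module runs in its own child process, so any memory it
    # leaks goes back to the operating system when the child exits.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code, stdout=asyncio.subprocess.PIPE
    )
    out, _ = await proc.communicate()
    return name, proc.returncode, out.decode().strip()


async def master():
    # The master only launches the children, waits for them and collects
    # their exit status; it never runs module code itself.
    results = await asyncio.gather(*(run_module(n, c) for n, c in MODULES.items()))
    return {name: (rc, out) for name, rc, out in results}


report = asyncio.run(master())
print(report)
```

Keeping the master free of module code is the point of the design: its own memory footprint stays flat no matter how badly an individual module behaves.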
I spent the last few days working on a subtle bug in TIP that prevented the engine from rescheduling correctly and, as a result, kept the information sources from being updated. The bug is gone now, but I’m realizing how hard it is to work constantly close to the limits of the operating system and the database management system. Still, it’s a nice challenge to face every day, so I don’t think I’ll stop having fun for a while! While going crazy trying to find the bug, I introduced an interesting new feature that lets you discover the virtual domains associated with an IP address through a SOAP request to Windows Live Search. I think this feature could be quite useful at the company I work for, to make handling security incidents easier. Moreover, I spent a good amount of time creating a comfortable Web 2.0 interface for daily work. I’m not much of an Ajax expert, but I feel quite satisfied with the result. Keep an eye on it!
Today I came back from my Christmas holidays with the precise idea of rewriting the Fast Flux Tracking module from scratch. In fact, over the last few days I had observed strange behavior whenever the number of monitored domains exceeded a few thousand. A deep investigation of the code revealed the sad truth: while using the monitoring threads, I forgot to clean up an object related to asynchronous DNS requests when each thread exited. This led to a large number of unused socket descriptors piling up, which caused the process to quickly hit the operating system’s limit on open descriptors. Three lines of code were added and everything works fine now, with about 24000 domains currently monitored. Moreover, I think a few improvements to the module are on the way. Stay tuned!
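The post doesn’t name the leaked object, so the sketch below assumes a plain UDP socket as a stand-in for it. It shows the kind of cleanup that was missing: tying the descriptor’s lifetime to the thread with a `with` block, so it is closed however the thread exits. The function `monitor_domain` is hypothetical.

```python
import socket
import threading


def monitor_domain(domain, opened):
    # Hypothetical stand-in for the per-thread async DNS object: a UDP socket.
    # The `with` block closes it when the thread finishes, even on error.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        opened.append(sock)
        # ... send DNS queries for `domain` and read the replies here ...


opened = []
threads = [
    threading.Thread(target=monitor_domain, args=(f"d{i}.test", opened))
    for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A closed socket reports fileno() == -1, so this counts leaked descriptors.
leaked = sum(1 for s in opened if s.fileno() != -1)
print(leaked)  # 0
```

Without the `with` block, each finished thread would leave one open descriptor behind, which is exactly how a few thousand monitored domains can exhaust the per-process limit.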
Eppur si muove!
TIP (Tracking Intelligence Project) is taking its first steps. In my wildest dreams, TIP should become an information gathering framework whose purpose is to autonomously collect Internet threat trends. Currently, TIP closely monitors information derived from a few publicly available blacklists, identifying malicious domains and networks. To reach its goal, the TIP core engine was designed to be totally asynchronous, in order to handle the common situation where a few thousand monitoring threads need to run at once. It’s a nice challenge, but something is moving. Have a look at this fast-flux network that TIP is tracking right now (some information has been omitted for obvious reasons).
Stay tuned!
Current Datetime: 2008-12-19 12:01:14.890779
Domain: XXXXXX.XX
set([('24.99.40.14', '7922', 'US'), ('24.170.188.201', '13343', 'US'), ('65.78.225.126', '15227', 'US'), ('70.249.156.136', '7132', 'US'), ('12.74.195.185', '7018', 'US'), ('68.80.105.44', '33287', 'US'), ('69.212.242.67', '7132', 'US'), ('75.57.204.104', '7132', 'US'), ('24.196.173.208', '20115', 'US'), ('65.102.56.213', '209', 'US'), ('71.84.127.132', '20115', 'US'), ('76.188.63.80', '11060', 'US'), ('70.230.233.165', '7132', 'US'), ('75.134.56.185', '20115', 'US'), ('68.125.30.251', '7132', 'US'), ('70.235.23.96', '7132', 'US'), ('69.183.233.1', '7132', 'US'), ('24.99.40.14', '7725', 'US'), ('65.65.115.103', '7132', 'US'), ('75.75.104.133', '21508', 'US'), ('68.80.105.44', '7922', 'US'), ('76.243.206.63', '7132', 'US'), ('76.31.181.115', '33662', 'US'), ('68.112.81.129', '19115', 'US'), ('76.100.63.146', '7922', 'US'), ('98.200.194.173', '7922', 'US'), ('65.68.29.83', '7132', 'US'), ('69.214.1.18', '7132', 'US'), ('99.4.106.71', '7132', 'US'), ('76.100.166.114', '7922', 'US'), ('70.242.120.139', '7132', 'US'), ('99.147.192.180', '7132', 'US'), ('67.38.1.229', '7132', 'US'), ('24.216.181.139', '20115', 'US'), ('65.78.225.66', '15227', 'US'), ('70.154.82.100', '6389', 'US'), ('99.14.234.37', '7132', 'US'), ('99.185.120.153', '7132', 'US'), ('208.104.118.101', '14615', 'US'), ('74.138.219.230', '36727', 'US'), ('96.28.227.194', '36727', 'US'), ('76.73.237.59', '12083', 'US'), ('70.252.189.177', '7132', 'US'), ('98.209.249.15', '33668', 'US'), ('165.166.236.74', '21766', 'US'), ('75.14.2.240', '7132', 'US'), ('70.255.31.131', '7132', 'US'), ('98.196.113.58', '33662', 'US'), ('67.190.147.1', '33652', 'US'), ('69.66.237.74', '30160', 'US'), ('75.140.65.220', '20115', 'US'), ('70.245.236.32', '7132', 'US'), ('68.92.101.61', '7132', 'US'), ('68.202.88.12', '13343', 'US'), ('64.205.9.114', '4565', 'US'), ('68.249.101.241', '7132', 'US'), ('12.74.196.251', '7018', 'US'), ('76.31.181.115', '7922', 'US'), ('76.100.166.114', '33657', 'US'), ('75.75.104.133', '7922', 'US'),
('98.196.113.58', '7922', 'US'), ('66.168.247.70', '20115', 'US'), ('76.31.18.86', '33662', 'US'), ('173.17.180.79', '6478', 'US'), ('68.88.237.35', '7132', 'US'), ('24.165.123.218', '12262', 'US'), ('66.40.18.206', '11388', 'US'), ('75.57.76.156', '7132', 'US'), ('68.46.94.202', '33287', 'US'), ('67.10.192.229', '11427', 'US'), ('72.81.245.3', '19262', 'US'), ('97.102.118.61', '10994', 'US'), ('66.61.12.107', '11060', 'US'), ('72.29.41.120', '7018', 'US'), ('70.238.63.194', '7132', 'US'), ('99.140.238.111', '7132', 'US'), ('12.174.145.169', '7018', 'US'), ('173.16.99.131', '6478', 'US'), ('68.58.0.197', '33491', 'US'), ('68.120.80.194', '7132', 'US'), ('98.140.114.227', '16810', 'US'), ('72.48.182.104', '7459', 'US'), ('70.143.32.104', '7132', 'US'), ('76.124.170.244', '7922', 'US'), ('24.10.74.199', '33651', 'US'), ('76.123.76.113', '20214', 'US'), ('76.217.109.205', '7132', 'US'), ('76.114.200.211', '33657', 'US'), ('68.114.165.229', '20115', 'US'), ('151.118.181.151', '3909', 'US'), ('98.200.194.173', '33662', 'US'), ('98.21.234.37', '7029', 'US'), ('24.151.161.136', '20115', 'US'), ('64.179.154.169', '20412', 'US'), ('99.149.194.36', '7132', 'US'), ('76.243.199.248', '7132', 'US'), ('76.27.140.172', '7725', 'US'), ('99.150.11.135', '7132', 'US'), ('64.91.14.27', '5668', 'US'), ('165.166.236.74', '2711', 'US'), ('69.14.27.151', '29737', 'US'), ('68.251.37.64', '7132', 'US'), ('68.121.22.131', '7132', 'US'), ('68.122.57.79', '7132', 'US'), ('70.242.25.29', '7132', 'US'), ('76.124.170.244', '33287', 'US'), ('69.176.46.57', '3801', 'US'), ('205.209.232.253', '13693', 'US'), ('99.139.206.54', '7132', 'US'), ('68.117.155.101', '20115', 'US'), ('98.209.249.15', '7922', 'US'), ('76.252.105.50', '7132', 'US'), ('67.197.98.249', '14615', 'US'), ('76.31.18.86', '7922', 'US'), ('76.100.63.146', '33657', 'US')])
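Each tuple in the dump above appears to be (IP address, AS number, country code); that reading is my assumption, not something TIP’s output states. A common first indicator of fast flux is how many distinct addresses and autonomous systems a single domain resolves into. The snippet below is a hypothetical summary over a handful of those tuples, not TIP’s actual fluxiness classifier.

```python
from collections import Counter

# A handful of the (IP, ASN, country) tuples from the dump above.
observed = {
    ("24.99.40.14", "7922", "US"),
    ("24.99.40.14", "7725", "US"),
    ("70.249.156.136", "7132", "US"),
    ("69.212.242.67", "7132", "US"),
    ("12.74.195.185", "7018", "US"),
}

ips = {ip for ip, _, _ in observed}             # distinct A-record addresses
asns = Counter(asn for _, asn, _ in observed)   # how often each ASN appears

print(len(observed), len(ips), len(asns))  # 5 4 4
print(asns.most_common(1))  # [('7132', 2)]
```

Note that the same address can show up under two ASNs (as 24.99.40.14 does here), so counting distinct IPs and distinct ASNs separately gives a more honest picture than the raw size of the set.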