Customer: one of the leading system integrators in IT and communications in Russia.
Task: Create a software development kit (SDK) for full-text search. One of the Customer’s key requirements was to develop the SDK from scratch, without using any third-party tools or libraries.
Solution: The solution has the following features:
- Supports various character sets for the input text (UTF8, UTF16, Win1251, Dos866, and character sets for Vietnamese and Chinese),
- Supports various search types (exact, fuzzy, morphological, semantic, search by template),
- Supports multiple languages (Russian, English, German, French, Spanish, Hungarian),
- Processes text at 1.5 Mb/second,
- Supports popular text document formats (MSOffice, OpenOffice, PDF, RTF etc.).
Technologies: C++, Python, AWK, MVS, SVN
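To illustrate what the "fuzzy" search type in the feature list means in practice, here is a minimal sketch based on edit (Levenshtein) distance. It is only an illustration under our own assumptions, not the SDK's actual algorithm: the function names `edit_distance` and `fuzzy_find` and the `max_distance` parameter are hypothetical. In keeping with the from-scratch requirement, it uses no third-party libraries.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, text: str, max_distance: int = 1) -> list[str]:
    """Return the words of text that are within max_distance edits of query.
    Hypothetical helper: splitting on whitespace is a simplification."""
    query = query.lower()
    return [word for word in text.lower().split()
            if edit_distance(query, word) <= max_distance]


matches = fuzzy_find("serch", "full text search engine")
```

With `max_distance=1`, the misspelled query "serch" still matches the word "search", while an exact search would find nothing.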
Customer: world’s leading anti-spam service provider.
Task: The Customer asked our team to enhance and support its product.
Solution: The RELEX team designed and developed the low-level core (kernel) of the incoming e-mail analysis system. A new email scanner was then built on top of this kernel. To improve the filtering quality, the following functions were implemented in the system:
- Message language auto-detection (without impairing the performance),
- Natural language parsing,
- Collecting statistics based on the parser results.
These functional modules became the basis for an artificial intelligence system and for containers that store and provide access to the accumulated statistics. Our team successfully solved the problem of filtering messages in Eastern languages (Chinese, Korean, Japanese etc.). Besides high email filtering quality and additional services, the final application can process over 500 messages per second with full content analysis in the integrated target system.
It also uses traditional methods of limiting spam traffic, such as filtering incoming TCP/IP traffic by source IP address, which helps narrow the channel for unknown or suspicious traffic sources.
Along with spam filtering, the system can identify messages of specific categories, such as phishing, mail/safe-list messages, and auto-generated bounce/backscatter messages.
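The source-address filtering mentioned above can be sketched as follows. This is a simplified illustration, not the production system: the `BLOCKED` and `TRUSTED` tables, the `classify_source` function and the three verdicts are assumptions made for the example, and the networks are reserved documentation ranges.

```python
import ipaddress

# Hypothetical policy tables (reserved documentation address ranges).
BLOCKED = [ipaddress.ip_network("203.0.113.0/24")]  # known spam sources
TRUSTED = [ipaddress.ip_network("192.0.2.0/24")]    # known good senders


def classify_source(ip: str) -> str:
    """Decide how to treat a connection based on its source IP address."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED):
        return "reject"    # drop before any content analysis
    if any(addr in net for net in TRUSTED):
        return "accept"    # pass on at full rate
    return "throttle"      # unknown source: narrow the channel


verdict = classify_source("198.51.100.1")
```

Under these assumptions, only sources that are neither blocked nor trusted are rate-limited, so the more expensive content analysis is spent on traffic that has not already been decided at the network level.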