BlueBox
What is BlueBox?
BlueBox was started after finding that running an independent search engine on a website was not always easy, nor reliably supported over the long term. BlueBox is designed to be easy to integrate, administer and, if necessary, replace.
A slideshow detailing every aspect of the project has been produced, both to explain it and to help achieve its goals.
What is BlueBox designed for?
BlueBox is intended for small, medium and large sites (conclusive tests have already been carried out with more than 1,500,000 pages scanned) whose owners want to keep control of their data. Data scanned by BlueBox is stored on a MySQL server chosen by the user, unlike third-party solutions, which usually search the whole Internet and simply filter the results down to your website. BlueBox can scan any web page, including intranet pages and password-protected pages.
What is BlueBox not designed for?
BlueBox does not, and never will, replace an advanced search engine such as Google, AltaVista, Yahoo, Live or Exalead. It cannot index a very large number of pages quickly, nor run very complex queries with fast response times.
BlueBox's benefits
BlueBox has the following advantages:
- It is open source, freely editable and freely distributable
- It is very easy to install
- It requires no special account: only an FTP account and a MySQL database are needed
- It uses technologies that dominate the Web (PHP 5 and MySQL 5)
- It is very fast and optimized
- It can exclude portions of pages from indexing using XHTML comments
- It is fully configurable
- It is written in object-oriented PHP, which keeps integration concise and the code easy to understand
- It can index several sites as separate, independent sites
- It determines the nature of a file from its MIME type, not from its file extension
- It can search by special criteria (URL, GET variables, POST variables, ...)
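The point about MIME types can be illustrated with a short sketch. It is written in Python rather than BlueBox's own PHP and is not BlueBox's code: the whitelist `INDEXABLE_TYPES` and the functions `media_type`, `should_index` and `fetch_content_type` are hypothetical names chosen for illustration. It shows the general idea of deciding whether to index a resource from the `Content-Type` header the server sends, while ignoring whatever extension appears in the URL.

```python
from typing import Optional
from urllib.request import Request, urlopen

# Hypothetical whitelist: MIME types this sketch treats as indexable.
INDEXABLE_TYPES = {"text/html", "application/xhtml+xml", "text/plain"}


def media_type(content_type: Optional[str]) -> Optional[str]:
    """Return the bare 'type/subtype' from a Content-Type header value.

    Parameters such as '; charset=UTF-8' are dropped and the result is
    lower-cased. Returns None if the header is missing or malformed.
    """
    if not content_type:
        return None
    mtype = content_type.split(";", 1)[0].strip().lower()
    maintype, sep, subtype = mtype.partition("/")
    if not sep or not maintype or not subtype or "/" in subtype:
        return None
    return mtype


def should_index(content_type: Optional[str]) -> bool:
    """Decide from the server-reported MIME type only.

    The URL's extension is never consulted: '/index.php' or '/download'
    may serve HTML, while '/report.html' may actually serve a PDF.
    """
    return media_type(content_type) in INDEXABLE_TYPES


def fetch_content_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the server for a resource's Content-Type with an HTTP HEAD request.

    Some servers mishandle HEAD; a real crawler would fall back to GET.
    """
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

For example, `should_index("text/html; charset=UTF-8")` returns `True` and `should_index("application/pdf")` returns `False`, whatever the URL looks like. In this sketch a crawler would call `fetch_content_type(url)`, or simply read the header from the GET response it already has, and pass the result to `should_index`.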
BlueBox's disadvantages
BlueBox has the following disadvantages:
- Because of its philosophy, BlueBox currently has no indexing engine other than an interpreted PHP script. As a result, indexing, though effective, could still be optimized for users who have more server rights than the strict minimum needed to install BlueBox.
Planned improvements
- Ability to read and index the metadata of the following files:
  - Adobe PDF
  - JPEG
  - GIF
  - PNG
  - Microsoft Office documents
  - iWork documents
  - OpenOffice documents
  - ...
- Ability to read and index the content of the following files:
  - Adobe PDF
  - Microsoft Office documents
  - iWork documents
  - OpenOffice documents
  - RTF
  - TXT
  - ...
- Ability to index the code of interpreted files (PHP, ASP, JSP, ...)
- Ability to index files through various protocols (FTP, SSH, FILE, ...)
- Improved logging for better error monitoring
- Development of a PHP CLI script for indexing
- Development of a script or compiled application for indexing
- ...
Where to find BlueBox?
The project URL is http://sourceforge.net/projects/bluebx/
Using BlueBox
A trial version of BlueBox is running on this site. However, lapinbleu.ch does not contain enough data to push BlueBox to its limits or to give a good idea of its performance, so I encourage you to try it on your own website.
How can you contribute to BlueBox?
- If you do not have technical skills: do not hesitate to contact me if you want improvements made to BlueBox; I will adapt the features to your needs.
- If you have a technical background and want to take part in the project: contact me, and I will be glad to add you to the development team.