Metagenomics enables the discovery and study of the collective genomic content of diverse environments through computational methods and data analysis. The rapid growth of public sequence repositories contributes to the success of metagenomics applications, but these repositories are growing faster than the resources needed to use them, which challenges current methods. Reference selection and acquisition, taxonomy definition, statistical analysis and visualization pose further challenges to exploring environmental data fully and properly. This project provides interconnected methods to mitigate these issues, with a focus on developing high-performance algorithms and implementing them in metagenomics tools. The central goal is to enable comparative metagenomics analysis in a short time using the whole of the quickly growing number of openly available assembled sequences. High-performance algorithms coupled with efficient data structures improve sequence indexing and classification, and also allow constant updates as new sequences arrive. These algorithms are built on top of state-of-the-art methods and extend their capacity to index and analyze very large data sets while maintaining or increasing the precision and sensitivity of the final results. They form the core of a proposed workflow for high-performance metagenomics analysis. The workflow further enables the selection, acquisition and filtering of reference sequences. This is crucial for taking full advantage of data repositories that are currently under-explored because of missing metadata, contamination and over-representation of some species. The workflow will also integrate a molecular-based taxonomy, which aims to resolve anomalies in current taxonomic definitions and thereby improve the sensitivity of results. Finally, reports, visualizations and extended statistical analyses are developed to translate raw results into intelligible output for single- and multi-sample metagenomics studies.