IM@T Online November 2003

Convera launches RetrievalWare® 8 Knowledge Discovery Platform

Categorisation and dynamic classification software delivers new end-user capabilities for search, navigation and discovery

In October, Convera announced the commercial availability of RetrievalWare 8, a knowledge discovery platform designed to help organisations automate their knowledge management and discovery processes.

RetrievalWare 8 integrates Convera’s enterprise search capabilities with a new dynamic classification methodology, providing a single integrated platform that can categorise, organise and deliver enterprise content, regardless of format, language or storage location.

The new classification software lets end users dynamically personalise their view of the available data, giving immediate support to innovation and problem-solving. According to Graham Charlesworth, VP & GM of European Operations at Convera, “Legacy categorisation systems based, for example, on naïve Bayesian technology have failed to support innovators as they tend to impose the same monolithic taxonomy view onto all users. We don’t know of any legacy Bayesian-based installations that have successfully maintained workable categorisation accuracy beyond a few hundred taxonomic nodes. Yet our enterprise scale customers are demanding a level of granularity in automated categorisation, which requires thousands of taxonomic nodes. We are going to deliver against that with RetrievalWare 8, where previous generations of technology have failed.”

As an aid to fast start-up, Convera will supply a range of pre-seeded, highly granular taxonomies off the shelf, covering more than a dozen industry domains. Customers and system integrators will be able to edit these taxonomies to meet precise customer needs.

According to Mark Walter of The Seybold Report: “RetrievalWare raises the bar in categorisation. Convera has made impressive strides in its categorisation and classification techniques. There are new tools for importing, creating and editing taxonomies, a new categorisation engine and impressive display options for end users. This is the first system we've seen that lets users mix and match taxonomies in their search results, even to the point of showing search results classified in a table along two user-definable axes at once, like showing me the search hits by topic and geography.”
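The two-axis display Walter describes amounts to cross-tabulating search hits against two independent taxonomies. The sketch below is not Convera's implementation or API; it assumes, hypothetically, that each hit already carries lists of category labels from a “topic” taxonomy and a “geography” taxonomy, and simply counts hits per cell.

```python
from collections import Counter
from itertools import product

# Hypothetical search hits. Each hit lists the nodes it was assigned to in two
# independent taxonomies; a document may sit in more than one node of each.
HITS = [
    {"doc": "d1", "topic": ["Energy"], "geography": ["Europe"]},
    {"doc": "d2", "topic": ["Energy", "Finance"], "geography": ["Asia"]},
    {"doc": "d3", "topic": ["Finance"], "geography": ["Europe"]},
    {"doc": "d4", "topic": ["Energy"], "geography": ["Europe", "Asia"]},
]


def cross_tabulate(hits, row_axis, col_axis):
    """Count hits for every (row category, column category) pair."""
    counts = Counter()
    for hit in hits:
        # Every pairing of the hit's row and column categories gets one count.
        for pair in product(hit[row_axis], hit[col_axis]):
            counts[pair] += 1
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    return rows, cols, counts


def render(rows, cols, counts):
    """Format the counts as a plain-text grid; empty cells show 0."""
    width = max(len(label) for label in rows + cols) + 2
    lines = [" " * width + "".join(col.ljust(width) for col in cols)]
    for row in rows:
        cells = "".join(str(counts[(row, col)]).ljust(width) for col in cols)
        lines.append(row.ljust(width) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    rows, cols, counts = cross_tabulate(HITS, "topic", "geography")
    print(render(rows, cols, counts))
```

Because either axis can be any taxonomy, swapping “geography” for another label set gives a different view of the same hits without re-running the search.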

Convera’s Domain Cartridges
Finding the right information quickly and efficiently is critical to problem solving and innovation in any business. This is particularly true for knowledge workers and information analysts working with domain- or subject-specific content such as scientific journals, engineering reports, financial analysis, industry-specific news, industry websites and intelligence reports. For these professionals, failing to find and extract ‘critical’ knowledge from the vast sea of available information can cost millions of dollars in lost productivity and lost business opportunities. In some military and intelligence applications, missing that ‘critical’ piece of information could even be catastrophic.

A significant challenge in making the most of an information retrieval system is that each industry or field uses its own terminology and concepts to communicate and express ideas. This terminology appears not only in the content being searched but also in the queries professionals write when searching or organising their information. Understanding and leveraging the terminology, concepts and relationships between concepts for a particular industry or field is therefore vital to mission-critical information retrieval.

Convera’s Domain Cartridges provide out-of-the-box, domain-specific semantic networks that improve the precision and relevancy of search, retrieval and categorisation in RetrievalWare. These semantic networks are stored in a flexible cartridge format and contain thousands of domain-specific concepts and terms, linked according to their relationships to one another. When an end user performs a search, RetrievalWare uses the Domain Cartridge to expand the query, retrieving relevant documents based not only on the query terms but also on related domain-specific concepts found in the cartridge.
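Query expansion of this kind can be sketched in a few lines. The cartridge format itself is not described here, so the network below is a hypothetical dictionary mapping a concept to related terms with invented weights; the scoring (summing the weights of expanded terms found in a document) is a deliberately simple stand-in, not RetrievalWare's ranking.

```python
# Hypothetical fragment of a domain semantic network. The headword comes from the
# MeSH example in this article, but the links and weights are invented.
SEMANTIC_NETWORK = {
    "brain ischemia": {"ischemic encephalopathy": 1.0, "stroke": 0.6, "hypoxia": 0.3},
    "stroke": {"cerebrovascular accident": 1.0, "brain ischemia": 0.6},
}


def expand_query(terms, network, min_weight=0.5):
    """Return {term: weight}: query terms at 1.0 plus related terms >= min_weight."""
    expanded = {term: 1.0 for term in terms}
    for term in terms:
        for related, weight in network.get(term, {}).items():
            if weight >= min_weight:
                # Keep the strongest weight if a term is reached more than once.
                expanded[related] = max(expanded.get(related, 0.0), weight)
    return expanded


def score(document, expanded):
    """Sum the weights of expanded terms found in the document (naive substring test)."""
    text = document.lower()
    return sum(weight for term, weight in expanded.items() if term in text)


def search(query_terms, documents, network):
    """Return documents with a positive score, best first."""
    expanded = expand_query(query_terms, network)
    scored = [(score(doc, expanded), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for value, doc in scored if value > 0]
```

A query for “brain ischemia” here also retrieves a document that mentions only “ischemic encephalopathy” or “stroke”. Substring matching is the weakest part of the sketch (it would match “stroke” inside “heatstroke”); a production engine would match on tokenised terms.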

Convera’s pre-packaged Domain Cartridges plug seamlessly into RetrievalWare® and extend Convera’s existing set of semantic network cartridges, which support a wide range of languages. Each Domain Cartridge can be used in combination with other industry cartridges or with customised cartridges that include your business-specific vocabulary and concepts.

Used in this way, the Domain Cartridges give businesses an out-of-the-box means of applying domain-specific semantic networks, improving both the precision and the recall of search and categorisation.

The primary sources used to populate the cartridges cover a number of principal domains, subdivided into several thousand specialised domains. The content of these sources has been gathered over many years by terminologists working with external language specialists and subject specialists. It has been reviewed and filtered to ensure quality coverage of a technically oriented nature in domains of interest to professionals, technical staff and researchers. Collected together, the accumulated data would fill roughly 3,000 reference books.

Example of domains and sub-domains
The following list details the domains and sub-domains of Convera's MeSH (Medical Subject Headings) Domain Cartridge. There are 37,600 concepts represented by approximately 142,700 terms and expressions distributed across the following domains:

Anatomy
|_ Body Regions
|_ Musculoskeletal System
|_ Digestive System
|_ Respiratory System
|_ Urogenital System
|_ Endocrine System
|_ Cardiovascular System
|_ Nervous System
|_ Sense Organs
|_ Tissues
|_ Cells
|_ Fluids and Secretions
|_ Animal Structures
|_ Stomatognathic System
|_ Hemic and Immune Systems
|_ Embryonic Structures
|_ Integumentary System

Organisms
|_ Invertebrates
|_ Vertebrates
|_ Bacteria
|_ Viruses
|_ Algae and Fungi
|_ Plants
|_ Archaea

Diseases
|_ Bacterial Infections and Mycoses
|_ Virus Diseases
|_ Parasitic Diseases
|_ Neoplasms
|_ Musculoskeletal Diseases
|_ Digestive System Diseases
|_ Stomatognathic Diseases
|_ Respiratory Tract Diseases
|_ Otorhinolaryngologic Diseases
|_ Nervous System Diseases
|_ Eye Diseases
|_ Urologic and Male Genital Diseases
|_ Female Genital Diseases and Pregnancy Complications
|_ Cardiovascular Diseases
|_ Hemic and Lymphatic Diseases
|_ Neonatal Diseases and Abnormalities
|_ Skin and Connective Tissue Diseases
|_ Nutritional and Metabolic Diseases
|_ Endocrine Diseases
|_ Immunologic Diseases
|_ Disorders of Environmental Origin
|_ Animal Diseases
|_ Pathological Conditions, Signs and Symptoms

Chemicals and Drugs
|_ Inorganic Chemicals
|_ Organic Chemicals
|_ Heterocyclic Compounds
|_ Polycyclic Hydrocarbons
|_ Environmental Pollutants, Noxae, and Pesticides
|_ Hormones, Hormone Substitutes, and Hormone Antagonists
|_ Reproductive Control Agents
|_ Enzymes, Coenzymes, and Enzyme Inhibitors
|_ Carbohydrates and Hypoglycemic Agents
|_ Lipids and Antilipemic Agents
|_ Growth Substances, Pigments, and Vitamins
|_ Amino Acids, Peptides, and Proteins
|_ Nucleic Acids, Nucleotides, and Nucleosides
|_ Neurotransmitters and Neurotransmitter Agents
|_ Central Nervous System Agents
|_ Peripheral Nervous System Agents
|_ Anti-Inflammatory Agents, Antirheumatic Agents, and Inflammation Mediators
|_ Cardiovascular Agents
|_ Hematologic, Gastrointestinal, and Renal Agents
|_ Anti-Infective Agents
|_ Anti-Allergic and Respiratory System Agents
|_ Antineoplastic and Immunosuppressive Agents
|_ Dermatologic Agents
|_ Immunologic and Biological Factors
|_ Biomedical and Dental Materials
|_ Specialty Chemicals and Products
|_ Chemical Actions and Uses

Analytical, Diagnostic and Therapeutic Techniques and Equipment
|_ Diagnosis
|_ Therapeutics
|_ Anesthesia and Analgesia
|_ Surgical Procedures, Operative
|_ Investigative Techniques
|_ Dentistry
|_ Equipment and Supplies

Psychiatry and Psychology
|_ Behavior and Behavior Mechanisms
|_ Psychological Phenomena and Processes
|_ Mental Disorders
|_ Behavioral Disciplines and Activities

Biological Sciences
|_ Biological Sciences
|_ Health Occupations
|_ Environment and Public Health
|_ Biological Phenomena, Cell Phenomena, and Immunity
|_ Genetic Processes
|_ Biochemical Phenomena, Metabolism, and Nutrition
|_ Physiological Processes
|_ Reproductive and Urinary Physiology
|_ Circulatory and Respiratory Physiology
|_ Digestive, Oral, and Skin Physiology
|_ Musculoskeletal, Neural, and Ocular Physiology
|_ Chemical and Pharmacologic Phenomena
|_ Genetic Phenomena
|_ Genetic Structures

Physical Sciences
|_ Physical Sciences

Anthropology, Education, Sociology and Social Phenomena
|_ Social Sciences
|_ Education
|_ Human Activities

Technology and Food and Beverages
|_ Technology, Industry, and Agriculture
|_ Food and Beverages

Humanities
|_ Humanities

Information Science
|_ Information Science

Persons
|_ Persons

Health Care
|_ Population Characteristics
|_ Health Care Facilities, Manpower, and Services
|_ Health Care Economics and Organizations
|_ Health Services Administration
|_ Health Care Quality, Access, and Evaluation

Geographic Locations
|_ Geographic Locations

Sample Expansions
Although it is not possible to give an exhaustive sampling of the terminology included in the MeSH Domain Cartridge, the following samples illustrate the diversity and potential richness of the cartridge.

For example, each entry below gives a domain, a term found in that domain, and the English synonyms that are automatically searched for that term:

“Nervous System Diseases” > brain ischemia

English synonyms automatically searched
Ischemic Encephalopathy
Ischemic Encephalopathies
Ischemia, Brain
Encephalopathy, Ischemic
Brain Ischemias

“Heterocyclic Compounds” > Methotrimeprazine
English synonyms automatically searched
Levopromazine
Levomepromazine
Levomeprazin

“Enzymes, Coenzymes, and Enzyme Inhibitors” > micrococcal nuclease
English synonyms automatically searched
Nuclease, Thermostable
TNase
Thermonuclease
Thermostable Nuclease
Staphylococcal Nuclease
Nuclease, Staphylococcal
Micrococcal Nuclease
Nuclease, Micrococcal

“Organic Chemicals” > mustard gas
English synonyms automatically searched
Yperite
Yellow Cross Liquid
Mustard, Sulfur
Sulfur Mustard
Mustard gas
Sulfide, Dichlorodiethyl
Dichlorodiethyl Sulfide
Sulfide, Di-2-chloroethyl
Di-2-chloroethyl Sulfide
Bis(beta-chloroethyl) Sulfide
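One way to use synonym sets like these is to map every variant to a single preferred concept, so that a query on any variant also matches documents containing the others. The sketch below reuses entries from the samples above; using each sample's headword as the preferred label, and matching by plain case-insensitive substring search, are assumptions made for illustration rather than a description of how the cartridge works internally.

```python
# Synonym sets copied from the MeSH samples above; the dictionary key (the
# sample's headword) is assumed to be the preferred label.
SYNONYM_SETS = {
    "brain ischemia": [
        "Ischemic Encephalopathy", "Ischemic Encephalopathies", "Ischemia, Brain",
        "Encephalopathy, Ischemic", "Brain Ischemias",
    ],
    "methotrimeprazine": ["Levopromazine", "Levomepromazine", "Levomeprazin"],
    "mustard gas": [
        "Yperite", "Yellow Cross Liquid", "Mustard, Sulfur", "Sulfur Mustard",
        "Sulfide, Dichlorodiethyl", "Dichlorodiethyl Sulfide",
        "Sulfide, Di-2-chloroethyl", "Di-2-chloroethyl Sulfide",
        "Bis(beta-chloroethyl) Sulfide",
    ],
}


def build_index(synonym_sets):
    """Map every variant (lower-cased) to its preferred concept label."""
    index = {}
    for preferred, variants in synonym_sets.items():
        for term in [preferred, *variants]:
            index[term.lower()] = preferred
    return index


def variants_for(term, synonym_sets, index):
    """All forms to search for: the whole synonym set, or just the term if unknown."""
    preferred = index.get(term.lower())
    if preferred is None:
        return [term]
    return [preferred, *synonym_sets[preferred]]


def matches(term, document, synonym_sets, index):
    """True if the document contains the term or any synonym (case-insensitive)."""
    text = document.lower()
    return any(v.lower() in text for v in variants_for(term, synonym_sets, index))
```

With this index, a query on “Yperite” matches a document that only says “sulfur mustard”, and an inverted entry form such as “Ischemia, Brain” resolves to the same concept as “brain ischemia”.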

Convera claims to offer the most extensive, scalable and intelligent search and categorisation system available today. The MeSH Domain Cartridge is just one example of many industry-specific cartridges available from Convera.

Other industry-specific cartridges include:
Biology, Chemistry
Computers
Electronics
Finance
Food Science
Geography
Geology
Health Sciences
Information Science
Law
Mathematics
MeSH (Medical Subject Headings)
Military
Petroleum, Natural Gas & Petrochemicals
Pharmacology
Physics
Plastics
Rubber
Telecommunications

Users
Convera has certified a number of partners through its new Taxonomy Developer Certification Program, including Access Innovations, IBM and Veridian. Early adopters of RetrievalWare 8 include the UK Ministry of Defence (MOD), offices within the US Department of Defense, the US Department of Energy and the US Navy. Initial interest is also strong within the pharmaceutical and financial communities.

The US Federal Bureau of Investigation (FBI) is also using Convera RetrievalWare. See the case study below.

 

FBI selects Convera for new FBI investigative data warehouse

RetrievalWare® to increase information sharing among law enforcement, intelligence and homeland security communities

Convera announced in October that the US Federal Bureau of Investigation has selected Convera's RetrievalWare as a search and categorisation platform within the Agency's new Investigative Data Warehouse. The initial deployment of Convera's software is valued at approximately $1.5 million.

After the events of September 11th, the FBI created a sophisticated Secure Collaborative Operational Prototype Environment (SCOPE) with a counter-terrorism and intelligence data repository. RetrievalWare was selected by the FBI for the repository to improve the sharing of intelligence information and collaboration across multiple government agencies, enhancing the government's ability to prevent terrorist attacks. RetrievalWare will work with other tools to help FBI analysts identify critical pieces of intelligence within the massive information repository that they use to drive investigative and intelligence activities. Specific RetrievalWare capabilities required by the FBI for the project include extensive security options, real-time message profiling, breadth of language support, multimedia search, scalability and powerful new dynamic classification capabilities.

Information sharing among intelligence agencies, which is essential to national security, will be bolstered by RetrievalWare's ability to cut through enormous amounts of data to find the minute details agents need to respond to possible homeland security threats. RetrievalWare will also let FBI agents search authorised information in other agencies' databases, in addition to the FBI's own data repository.

The new Investigative Data Warehouse system will provide a Web-based, collaborative environment for hundreds of agents who will eventually analyse more than one billion text, video, audio and image files. Using RetrievalWare, agents can compare and contrast relevant information and find missing links by securely accessing the Agency's Investigative Data Warehouse.

Search and classification requirements
RetrievalWare met the FBI's stringent search and classification requirements that included:

Dynamic Classification
RetrievalWare will be tailored for the FBI's document classification system to meet the agency's specific user requirements. RetrievalWare's dynamic classification improves search and discovery quality by presenting search results in personalised views enabled by visual discovery techniques that reduce the time required to find and share knowledge. This gives the FBI freedom and flexibility to dynamically organise and view essential information assets, providing more efficient information exploration and problem solving capabilities. With dynamic classification, agents will define personalised criteria for information of interest and be notified when relevant information enters the data warehouse.
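The personalised criteria and notifications described above behave like standing queries evaluated against each incoming document. A minimal sketch follows, assuming a hypothetical `Profile` made of required terms plus optional alternative terms; the FBI's actual profile format and RetrievalWare's profiling engine are not described in the article.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical standing query: notify `owner` when a new document contains
    every term in `required` and, if `any_of` is non-empty, at least one of those."""
    owner: str
    required: set[str]
    any_of: set[str] = field(default_factory=set)

    def matches(self, tokens: set[str]) -> bool:
        if not self.required <= tokens:
            return False
        return not self.any_of or bool(self.any_of & tokens)


def tokenize(text: str) -> set[str]:
    """Lower-case word tokens; a real system would also stem and handle phrases."""
    return set(re.findall(r"\w+", text.lower()))


def route(document: str, profiles: list[Profile]) -> list[str]:
    """Return the owners whose profiles match an incoming document."""
    tokens = tokenize(document)
    return [profile.owner for profile in profiles if profile.matches(tokens)]
```

Each new document is tokenised once and checked against every profile, so an agent is notified as soon as matching material arrives rather than having to re-run the search.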

Language Breadth
The FBI will also use RetrievalWare to search in nearly 50 languages, including European, Asian and Middle Eastern languages. Convera's advanced concept search technology will be used to search many of these languages. RetrievalWare's cross-lingual cartridges will let agents ask a question in one language and receive the answer in another, a unique feature that is particularly useful to the FBI. For example, an agent could construct a search query in English and receive results in French or German.
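At its simplest, cross-lingual search translates query concepts into each target language before matching. The three-entry concept table below is hypothetical; its translations are ordinary dictionary equivalents, not data from a Convera cartridge, and a document matches only if every translated term appears among its words.

```python
import re

# Hypothetical cross-lingual concept table keyed by English term.
CONCEPTS = {
    "weapon": {"fr": "arme", "de": "waffe"},
    "bank": {"fr": "banque", "de": "bank"},
    "passport": {"fr": "passeport", "de": "reisepass"},
}


def translate_query(terms, lang):
    """Map each English query term to its equivalent in `lang`; unknown terms pass through."""
    return [CONCEPTS.get(term, {}).get(lang, term) for term in terms]


def cross_lingual_search(terms, docs_by_lang):
    """Return (lang, doc) pairs whose words include every translated query term."""
    results = []
    for lang, docs in docs_by_lang.items():
        wanted = set(translate_query(terms, lang))
        for doc in docs:
            # \w matches accented letters in Python 3 strings, so "volé" stays one word.
            words = set(re.findall(r"\w+", doc.lower()))
            if wanted <= words:
                results.append((lang, doc))
    return results
```

An English query for “passport” would then return both a French document mentioning “passeport” and a German one mentioning “Reisepass”.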

Security
Essential for the FBI's sensitive and worldwide operations, RetrievalWare offers complete document level security, as well as cross-repository and cross-platform security.
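Document-level security means an access check is applied to each individual result, whatever repository it came from, before the user sees it. The sketch below uses assumed rules, not RetrievalWare's security model: each document carries a classification level and a set of permitted groups, and a user sees it only with sufficient clearance and at least one shared group. The level names follow US classification markings; the data model is hypothetical.

```python
from dataclasses import dataclass

# US classification markings in ascending order (illustrative assumption).
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}


@dataclass(frozen=True)
class Document:
    doc_id: str
    repository: str          # hits may come from several repositories
    classification: str
    allowed_groups: frozenset[str]


@dataclass(frozen=True)
class User:
    clearance: str
    groups: frozenset[str]


def can_read(user: User, doc: Document) -> bool:
    """Clearance must be at least the document's level and one group must be shared."""
    cleared = LEVELS[user.clearance] >= LEVELS[doc.classification]
    return cleared and bool(user.groups & doc.allowed_groups)


def visible_results(user: User, results: list[Document]) -> list[Document]:
    """Apply the same per-document check to every hit, regardless of repository."""
    return [doc for doc in results if can_read(user, doc)]
```

Filtering per document, rather than per repository, is what lets one search span several agencies' collections while each user still sees only what they are authorised to read.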

Multimedia Search (text, image, video, paper and other)
RetrievalWare will allow agents to search a variety of multimedia assets, for example surveillance videotapes; forensic reports such as blood, fingerprint and DNA analyses; automated case files; credit card transactions; terrorist watch lists; wiretaps; bank records; and even local law enforcement arrest reports.

Industry-Specific Taxonomies
A pivotal agency protecting national security, the FBI has its own 'language' for operations around the world. Convera's industry-specific taxonomy and semantic network cartridges help ensure thousands of FBI mission-specific concepts and terms are used to optimise search, discovery accuracy, relevancy and personalisation for the Agency.

Convera. Tel: + 44 1344 781800; fax: + 44 1344 781801; www.convera.com; e-mail: info@convera.co.uk


