  • Visualization and comparison of semantic trees reflecting the component structure of a patented device

    This paper describes approaches to visualizing and comparing semantic trees that reflect the component structure of a patented device and the connections between its components, using graph databases. Graph DBMSs use graph structures to store, process, and represent data. The main elements of a graph database are nodes and edges; in this task, nodes model entities of three types (SYSTEM, COMPONENT, ATTRIBUTE) and edges model five types of connections (PART-OF, LOCATED-AT, CONNECTED-WITH, ATTRIBUTE-FOR, IN-MANNER-OF). The study found that Neo4j offers the best graph visualization capabilities; ArangoDB produces incomplete visualizations even for correctly entered queries; and AllegroGraph proved difficult to work with in code and difficult to configure for graph tree visualization. Three algorithms for comparing graph representations were tested: Graph Edit Distance, Topological Comparison, and Subgraph Isomorphism. The algorithms are implemented in Python; the implementation compares two graph trees and displays a visualization and analysis of their common structures and differences.

    Keywords: semantic tree, component structure, patent, graph databases, Neo4j, AllegroGraph, ArangoDB
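
    As a rough illustration of comparing two such trees, the sketch below stores each tree as a set of typed edges and reports the shared and differing edges. It is a simplified edge-set comparison, not the paper's implementation and not a full Graph Edit Distance: it assumes that node labels uniquely identify nodes, and the device, component names, and function names are hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """A typed connection between two labelled nodes of a semantic tree."""
    source: str
    relation: str
    target: str


# The five connection types named in the abstract.
RELATIONS = {"PART-OF", "LOCATED-AT", "CONNECTED-WITH", "ATTRIBUTE-FOR", "IN-MANNER-OF"}


def compare_trees(first: set, second: set) -> dict:
    """Split the edges of two trees into common and tree-specific sets."""
    for edge in first | second:
        if edge.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {edge.relation}")
    return {
        "common": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


def edge_edit_distance(first: set, second: set) -> int:
    """Count edge insertions and deletions needed to turn one tree into the other.

    Assumption: nodes are matched by label, so no node substitutions are considered.
    """
    return len(first ^ second)


# Hypothetical example: two variants of a pump.
tree_a = {
    Edge("rotor", "PART-OF", "pump"),
    Edge("housing", "PART-OF", "pump"),
    Edge("rotor", "LOCATED-AT", "housing"),
}
tree_b = {
    Edge("rotor", "PART-OF", "pump"),
    Edge("housing", "PART-OF", "pump"),
    Edge("seal", "PART-OF", "pump"),
}

result = compare_trees(tree_a, tree_b)
distance = edge_edit_distance(tree_a, tree_b)
```

    Matching nodes by label keeps the comparison cheap; a general Graph Edit Distance also has to search over node correspondences, which is far more expensive.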

  • Automation of recognition of radio listeners' requests

    The article describes automating the recognition of audio recordings in order to identify the song requested by a radio station listener. The Golos Russian speech recognition model from SberDevices was used. An algorithm based on the Levenshtein distance was developed to correct the text that the Golos model produces from an audio recording. For recognized listener requests, the system interacts with the DIGISPOT II database, forming and executing queries to search for artists and their songs.

    Keywords: speech recognition, Golos, Digispot II
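
    A minimal sketch of Levenshtein-based correction follows. It assumes the recognized text is matched against a list of known artist names; the catalog contents, the `correct` helper, and the `max_ratio` cutoff are illustrative assumptions, not details from the article.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def correct(recognized: str, catalog: Sequence[str], max_ratio: float = 0.4) -> Optional[str]:
    """Return the closest catalog entry, or None if nothing is close enough.

    max_ratio is a hypothetical cutoff: edit distance divided by the longer length.
    """
    if not catalog:
        return None
    query = recognized.lower()
    best = min(catalog, key=lambda name: levenshtein(query, name.lower()))
    distance = levenshtein(query, best.lower())
    longest = max(len(query), len(best), 1)
    return best if distance / longest <= max_ratio else None


# Hypothetical catalog of artist names.
artists = ["Queen", "Kino", "ABBA"]
fixed = correct("qeen", artists)
rejected = correct("zzzzzz", artists)
```

    Normalizing the distance by the longer string length keeps the cutoff comparable between short and long names.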

  • Analysis of images of mathematical and chemical formulas from patent documents

    Patent documents contain graphic images such as device drawings, graphs, and chemical and mathematical formulas; the formulas often need to be recognized and brought to a unified standard. This work analyzes graphic images extracted from patent descriptions published by FIPS (Rospatent). It provides thematic filtering of the mathematical and chemical formulas contained in patent documents and their recognition. The theoretical value lies in the developed algorithms for: parsing patents in the Yandex.Patents system; recognizing chemical and mathematical formulas among graphic patent images; translating graphic images of chemical formulas into SMILES format; and converting graphic images of mathematical formulas into LaTeX format. The practical significance lies in the developed software module for analyzing graphic images from patent documents. The system is intended for studying patents and reducing graphic images to a unified standard in order to solve patent search problems.

    Keywords: patent, image, mathematical formula, chemical formula, LaTeX, SMILES

  • A technique for analyzing video files to detect people and landmarks using recognition on non-repeating key frames

    This paper considers a technique for automatic analysis of video files to detect people and landmarks. Recognition is performed on non-repeating key frames, which are obtained with frame extraction algorithms. Recognizing landmarks and faces only on key frames significantly reduces computational cost and avoids accumulating repetitive information. The effectiveness of the proposed technique is evaluated in terms of accuracy and speed on a set of test videos.

    Keywords: keyframe, recognition, computer vision, algorithm, video
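
    The abstract does not say which extraction algorithm is used. As one possible illustration, the sketch below selects non-repeating key frames by simple frame differencing: a frame is kept only if it differs enough from every key frame already kept. The frame representation (flat lists of grayscale values), the `select_keyframes` name, and the threshold are all assumptions.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed: flat grayscale pixel values in 0..255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: Sequence[Frame], threshold: float = 20.0) -> List[int]:
    """Return indices of frames that differ from all previously kept key frames.

    Comparing against every kept frame, not just the last one, also skips
    a scene that reappears later in the video.
    """
    kept: List[int] = []
    for index, frame in enumerate(frames):
        if all(mean_abs_diff(frame, frames[k]) > threshold for k in kept):
            kept.append(index)
    return kept


# Hypothetical 4-pixel frames: a dark scene, a bright scene, then the dark scene again.
video = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 200, 200, 200],
    [198, 201, 200, 199],
    [10, 10, 10, 10],
]
keyframes = select_keyframes(video)
```

    Only the selected indices would then be passed to the face and landmark recognizers, which is where the reduction in computation comes from.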

  • Formation of a visualized representation of the patent landscape

    Methods and technologies for visualizing a patent landscape based on cluster analysis of a patent array are considered and applied. Algorithms were developed for downloading patent archives, parsing patent documents, clustering patents, and visualizing the patent landscape. Software was implemented that clusters patent documents with the Latent Dirichlet Allocation model and visualizes the patent landscape from the clustering data, using the gensim, PySpark, and sklearn libraries. The software was tested on patents issued by the US Patent and Trademark Office. The accuracy of classifying patents by category reached 84%.

    Keywords: patents, information extraction, clustering, patent landscape, innovation potential

  • Development of a software module for searching for patent analogues

    As industry and science develop, patent databases grow, as does the number of patent applications received by the agencies that issue patents. Each application must be checked for the uniqueness of the technology it claims; to do this, patent office experts search the patent database for analogous patents. If none are found, the technology can be considered unique and accepted for patenting. Because the patent databases of various offices can contain tens of millions of patents, such a search and uniqueness evaluation can take a very long time. Existing systems do not meet all the requirements and lack some of the necessary functionality. This article describes the development of an automated system for searching for analogous patents in a patent array.

    Keywords: patent, database, search, patent-analog, Hadoop, Solr, Django, Python, Haystack, HDFS