Vector database
Vector databases have been important for some time, and they matter even more now that Artificial Intelligence is mainstream. Graph databases such as Neo4j have also added vector search. A vector database stores data as mathematical representations called vectors: lists of numbers, often generated by a machine learning model as "embeddings". Storing data this way makes it possible for computer programs to draw comparisons, identify relationships, and understand context. Vector databases enable '''Semantic Search''', which is search based on meaning rather than...
https://mariadb.org/amazon-mariadb-vector/</ref> and [[Postgres]] (through the pgvector extension)<ref>https://github.com/pgvector/pgvector</ref> offer vector capability now.)
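Whichever database stores them, vectors are compared using a similarity or distance measure, and cosine similarity is a common choice (pgvector, for instance, provides a cosine distance operator). The plain-Python sketch below computes cosine similarity for three made-up three-dimensional "embeddings". The words and values are hypothetical; real embeddings come from a model and usually have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; the numbers are invented for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.0, 0.95],
}

query = embeddings["cat"]
for word, vec in embeddings.items():
    print(f"{word}: {cosine_similarity(query, vec):.3f}")
```

With these values, "kitten" scores close to 1.0 against "cat" while "car" scores far lower, which is the kind of comparison semantic search relies on.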
== Commercial ==
'''Pinecone''' is a commercial vector database product. Its article [https://www.pinecone.io/learn/vector-database/ What is a Vector Database & How Does it Work? Use Cases + Examples] explains in some depth what a vector database is, how it differs from simple vector indexes and traditional scalar databases, and how it works. The article even covers the underlying algorithms, such as [[wp:random_projection|random projection]] (the Wikipedia article needs more work, but the topic is the tip of an iceberg of statistics, data science and artificial intelligence).
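The simplest way to find the stored vectors most similar to a query is an exhaustive ("flat") scan that scores every vector. The sketch below shows that baseline in plain Python; the document IDs, vectors and the `top_k` helper are hypothetical. Its cost grows linearly with the size of the collection, which is why vector databases such as Pinecone rely on approximate nearest neighbor algorithms rather than comparing the query against every stored vector.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, collection, k=2):
    """Score every stored vector (a flat scan) and return the k best (id, score) pairs."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in collection.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical document vectors; a real system would store model-generated embeddings.
collection = {
    "doc-a": [0.2, 0.8, 0.1],
    "doc-b": [0.9, 0.1, 0.3],
    "doc-c": [0.25, 0.75, 0.0],
}

print(top_k([0.2, 0.9, 0.05], collection, k=2))
```

Here the query points in nearly the same direction as doc-a and doc-c, so those two IDs are returned, in that order, while doc-b is left out.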
Random projection is a topic unto itself. Professor '''Michael Pyrcz''' at the University of Texas at Austin posts his coursework and code online for anyone to learn from. His lecture [https://www.youtube.com/watch?v=bfS7JAjiOMI PGE 383 - Feature Projection] starts with random projection. (He also puts all of his materials on his GitHub account; e.g. https://github.com/GeostatsGuy/PythonNumericalDemos/blob/master/SubsurfaceDataAnalytics_Multidimensional_Scaling.ipynb) Random projection is one of many [[wp:Dimensionality_reduction|dimensionality reduction]] techniques, which have applications in neuroscience, artificial intelligence and [[wp:Recommender_system|recommender systems]].
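As a rough illustration, here is a minimal plain-Python sketch of Gaussian random projection, one common variant (others use sparse matrices of +1/0/-1 entries), and not necessarily the exact formulation used in the lecture or the Pinecone article. Each d-dimensional vector is multiplied by a random k x d matrix whose entries are drawn from a normal distribution with mean 0 and variance 1/k, so squared lengths are preserved on average. The Johnson-Lindenstrauss lemma is the result guaranteeing that pairwise distances are approximately preserved with high probability when k is large enough. The dimensions, seeds and helper names are arbitrary choices for illustration.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with independent N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)  # standard deviation 1/sqrt(k), i.e. variance 1/k
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, vec):
    """Multiply the k x d matrix by a d-dimensional vector, giving k dimensions."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


d, k = 1000, 200  # original and reduced dimensions (illustrative values)
rng = random.Random(42)
u = [rng.gauss(0.0, 1.0) for _ in range(d)]
v = [rng.gauss(0.0, 1.0) for _ in range(d)]

R = random_projection_matrix(d, k)
pu, pv = project(R, u), project(R, v)

print(f"original distance:  {euclidean(u, v):.2f}")
print(f"projected distance: {euclidean(pu, pv):.2f}")
```

The two printed distances should be close even though the projected vectors have one fifth as many dimensions. Shrinking k further generally lets the projected distance drift more, which is the trade-off between compression and accuracy.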
== Open Source ==
One interesting open source option is '''Memgraph''', a graph database that also supports vector search. Memgraph is like Neo4j without the cost, and it uses the same Cypher query language as Neo4j. Unlike Neo4j, which is written in Java and whose extensions are built in Java, Memgraph is written in C++ and integrates more easily with Python; for example, its custom query modules can be written in Python. An interesting case study is how NASA is building a People Knowledge Graph with LLMs and Memgraph<ref>https://www.theregister.com/2025/05/07/nasa_people_memgraph/</ref>. In the NASA case study, they use [[Ollama]], a tool for running AI models locally, which can be thought of as something like [[Docker Desktop]], but for running models instead of [[Docker]] images.