1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256
#![doc(hidden)]
//! A primer on prepared statement caching in Diesel
//! ------------------------------------------------
//!
//! Diesel uses prepared statements for virtually all queries. This is most
//! visible in our lack of any sort of "quoting" API. Values must always be
//! transmitted as bind parameters, we do not support direct interpolation. The
//! only method in the public API that doesn't require the use of prepared
//! statements is `Connection#batch_execute`.
//!
//! In order to avoid the cost of re-parsing and planning subsequent queries,
//! Diesel caches the prepared statement whenever possible. Queries will fall
//! into one of three buckets:
//!
//! - Unsafe to cache
//! - Cached by SQL
//! - Cached by type
//!
//! A query is considered unsafe to cache if it represents a potentially
//! unbounded number of queries. This is communicated to the connection through
//! `QueryFragment#is_safe_to_cache_prepared`. While this is done as a full AST
//! pass, after monomorphisation and inlining this will usually be optimized to
//! a constant. Only boxed queries will need to do actual work to answer this
//! question.
//!
//! The majority of AST nodes are safe to cache if their components are safe to
//! cache. There are at least 4 cases where a query is unsafe to cache:
//!
//! - queries containing `IN` with bind parameters
//! - This requires 1 bind parameter per value, and is therefore unbounded
//! - `IN` with subselects are cached (assuming the subselect is safe to
//! cache)
//! - `INSERT` statements
//! - The SQL varies based on the number of rows being inserted. We could in
//! theory cache the single row case, but the cost of parsing in a write
//! query is pretty insignificant compared to reads
//! - `UPDATE` statements
//! - Technically it's bounded on "number of optional values being passed to
//! `SET` factorial" but that's still quite high, and not worth caching
//! for the same reason as single row inserts
//! - `SqlLiteral` nodes
//! - We have no way of knowing whether the SQL was generated dynamically or
//! not, so we must assume that it's unbounded
//!
//! For queries which are unsafe to cache, the statement cache will never insert
//! them. They will be prepared and immediately released after use (or in the
//! case of PG they will use the unnamed prepared statement).
//!
//! For statements which are able to be cached, we then have to determine what
//! to use as the cache key. The standard method that virtually all ORMs or
//! database access layers use in the wild is to store the statements in a
//! hash map, using the SQL as the key.
//!
//! However, the majority of queries using Diesel that are safe to cache as
//! prepared statements will be uniquely identified by their type. For these
//! queries, we can bypass the query builder entirely. Since our AST is
//! generally optimized away by the compiler, for these queries the cost of
//! fetching a prepared statement from the cache is the cost of `HashMap<u32,
//! _>::get`, where the key we're fetching by is a compile time constant. For
//! these types, the AST pass to gather the bind parameters will also be
//! optimized to accessing each parameter individually.
//!
//! Determining if a query can be cached by type is the responsibility of the
//! `QueryId` trait. This trait is quite similar to `Any`, but with a few
//! differences:
//!
//! - No `'static` bound
//! - Something being a reference never changes the SQL that is generated,
//! so `&T` has the same query id as `T`.
//! - `Option<TypeId>` instead of `TypeId`
//! - We need to be able to constrain on this trait being implemented, but
//! not all types will actually have a static query id. Hopefully once
//! specialization is stable we can remove the `QueryId` bound and
//! specialize on it instead (or provide a blanket impl for all `T`)
//! - Implementors give a more broad type than `Self`
//! - This really only affects bind parameters. There are 6 different Rust
//! types which can be used for a parameter of type `timestamp`. The same
//! statement can be used regardless of the Rust type, so `Bound<ST, T>`
//! defines its `QueryId` as `Bound<ST, ()>`.
//!
//! A type returning `Some(id)` or `None` for its query ID is based on whether
//! the SQL it generates can change without the type changing. At the moment,
//! the only type which is safe to cache as a prepared statement but does not
//! have a static query ID is something which has been boxed.
//!
//! One potential optimization that we don't perform is storing the queries
//! which are cached by type ID in a separate map. Since a type ID is a u64,
//! this would allow us to use a specialized map which knows that there will
//! never be hashing collisions (also known as a perfect hashing function),
//! which would mean lookups are always constant time. However, this would save
//! nanoseconds on an operation that will take microseconds or even
//! milliseconds.
use std::any::TypeId;
use std::borrow::Cow;
use std::cell::{RefCell, RefMut};
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::{Deref, DerefMut};
use backend::Backend;
use query_builder::*;
use result::QueryResult;
#[doc(hidden)]
#[allow(missing_debug_implementations)]
pub struct StatementCache<DB: Backend, Statement> {
pub cache: RefCell<HashMap<StatementCacheKey<DB>, Statement>>,
}
#[allow(clippy::len_without_is_empty, clippy::new_without_default_derive)]
impl<DB, Statement> StatementCache<DB, Statement>
where
DB: Backend,
DB::TypeMetadata: Clone,
DB::QueryBuilder: Default,
StatementCacheKey<DB>: Hash + Eq,
{
pub fn new() -> Self {
StatementCache {
cache: RefCell::new(HashMap::new()),
}
}
pub fn len(&self) -> usize {
self.cache.borrow().len()
}
pub fn cached_statement<T, F>(
&self,
source: &T,
bind_types: &[DB::TypeMetadata],
prepare_fn: F,
) -> QueryResult<MaybeCached<Statement>>
where
T: QueryFragment<DB> + QueryId,
F: FnOnce(&str) -> QueryResult<Statement>,
{
use std::collections::hash_map::Entry::{Occupied, Vacant};
let cache_key = StatementCacheKey::for_source(source, bind_types)?;
if !source.is_safe_to_cache_prepared()? {
let sql = cache_key.sql(source)?;
return prepare_fn(&sql).map(MaybeCached::CannotCache);
}
refmut_map_result(self.cache.borrow_mut(), |cache| {
match cache.entry(cache_key) {
Occupied(entry) => Ok(entry.into_mut()),
Vacant(entry) => {
let statement = {
let sql = entry.key().sql(source)?;
prepare_fn(&sql)
};
Ok(entry.insert(statement?))
}
}
})
.map(MaybeCached::Cached)
}
}
#[doc(hidden)]
#[allow(missing_debug_implementations)]
pub enum MaybeCached<'a, T: 'a> {
CannotCache(T),
Cached(RefMut<'a, T>),
}
impl<'a, T> Deref for MaybeCached<'a, T> {
type Target = T;
fn deref(&self) -> &Self::Target {
match *self {
MaybeCached::CannotCache(ref x) => x,
MaybeCached::Cached(ref x) => &**x,
}
}
}
impl<'a, T> DerefMut for MaybeCached<'a, T> {
fn deref_mut(&mut self) -> &mut Self::Target {
match *self {
MaybeCached::CannotCache(ref mut x) => x,
MaybeCached::Cached(ref mut x) => &mut **x,
}
}
}
#[doc(hidden)]
#[allow(missing_debug_implementations)]
#[derive(Hash, PartialEq, Eq)]
pub enum StatementCacheKey<DB: Backend> {
Type(TypeId),
Sql {
sql: String,
bind_types: Vec<DB::TypeMetadata>,
},
}
impl<DB> StatementCacheKey<DB>
where
DB: Backend,
DB::QueryBuilder: Default,
DB::TypeMetadata: Clone,
{
pub fn for_source<T>(source: &T, bind_types: &[DB::TypeMetadata]) -> QueryResult<Self>
where
T: QueryFragment<DB> + QueryId,
{
match T::query_id() {
Some(id) => Ok(StatementCacheKey::Type(id)),
None => {
let sql = Self::construct_sql(source)?;
Ok(StatementCacheKey::Sql {
sql: sql,
bind_types: bind_types.into(),
})
}
}
}
pub fn sql<T: QueryFragment<DB>>(&self, source: &T) -> QueryResult<Cow<str>> {
match *self {
StatementCacheKey::Type(_) => Self::construct_sql(source).map(Cow::Owned),
StatementCacheKey::Sql { ref sql, .. } => Ok(Cow::Borrowed(sql)),
}
}
fn construct_sql<T: QueryFragment<DB>>(source: &T) -> QueryResult<String> {
let mut query_builder = DB::QueryBuilder::default();
source.to_sql(&mut query_builder)?;
Ok(query_builder.finish())
}
}
/// Similar to `RefMut::map`, but for functions which return `Result`
///
/// If we were in Haskell (and if `RefMut` were a functor), this would just be
/// `sequenceA`.
fn refmut_map_result<T, U, F>(mut refmut: RefMut<T>, mapper: F) -> QueryResult<RefMut<U>>
where
F: FnOnce(&mut T) -> QueryResult<&mut U>,
{
// We can't just use `RefMut::map` here, since to lift the error out of that
// closure we'd need to return *something*.
//
// Instead we will very briefly convert to a raw pointer to eliminate
// lifetimes from the equation. Ultimately the cast is safe since the input
// and output lifetimes are identical. However, without the raw pointer
// we would have two live mutable references at the same time.
let ptr = mapper(&mut *refmut).map(|x| x as *mut _)?;
Ok(RefMut::map(refmut, |_| unsafe { &mut *ptr }))
}