This document describes how to add built-in functions to TiDB.
Background
How is the SQL statement executed in TiDB?
The SQL statement is parsed to an abstract syntax tree (AST) by the parser first and then uses Query Optimizer to generate an execution plan. The plan can then be executed to get the result. This process involves how to access the data in the table, and how to filter, calculate, sort, aggregate, and distinct the data, etc. For a built-in function, the most important part is to parse and to evaluate.
For parsing, it is redundant work because you should know how to write YACC commands and how to modify TiDB syntax parser. But we have finished this work for you and syntax parsing of most built-in functions is done.
As for evaluation, it should be finished in the TiDB expression evaluation framework. Each built-in function is considered as an expression indicated by ScalarFunction
and obtains the corresponding function type and function signature through the function name and parameters to evaluate.
The procedure discussed above is complicated for users who are not familiar with TiDB. We have finished syntax parsing and function signature confirmation of most unimplemented functions. But implementation is left empty. In other words, locating and completing the empty implementation makes a Pull Request (PR).
The procedure to add a built-in function
The following procedure describes how to add a built-in function.
- Locate the unimplemented function.
- Search for
errFunctionNotExists
in theexpression
directory of TiDB source code. You can find all the unimplemented functions. - Choose a function you are interested in. Take the SHA2 function as an example:
func (b *builtinSHA2Sig) eval(row []types.Datum) (d types.Datum, err error) { return d, errFunctionNotExists.GenByArgs("SHA2") }
- Search for
- Implement the function signature.This step is to implement
eval
. For the function features, see MySQL documentation. For the specific implementation method, see the method of implemented functions. - Add the type inference information to the
typeinferer
file.Add the type of the returned result of the function tohandleFuncCallExpr()
in the theplan/typeinferer.go
file and make sure the result is consistent with the result in MySQL. See MySQL Const for the complete list of the type definition.Note: For most functions, you need to input the type of the returned result and obtain the length of the returned result.
- Add a unit test case.Add a unit test case for the function to the
expression
directory. Add a unit test case oftypeinferer
to theplan/typeinferer_test.go
file. - Run the
make dev
command and make sure all the test cases can pass.
Example
Take the Pull Request to add the SHA1()
function as an example:
- Open the
expression/builtin_encryption.go
file and complete the evaluation ofSHA1()
.func (b *builtinSHA1Sig) eval(row []types.Datum) (d types.Datum, err error) { // Evaluate the arguments. In most cases, you do not need to make any modification. args, err := b.evalArgs(row) if err != nil { return types.Datum{}, errors.Trace(err) } // See MySQL documentation for the meaning of each argument. // SHA/SHA1 function only accept 1 parameter arg := args[0] if arg.IsNull() { return d, nil } // The type of the argument value is changed. See "util/types/datum.go" for the function implementation. bin, err := arg.ToBytes() if err != nil { return d, errors.Trace(err) } hasher := sha1.New() hasher.Write(bin) data := fmt.Sprintf("%x", hasher.Sum(nil)) // Set the return value. d.SetString(data) return d, nil }
- Add a unit test case for the function implementation. See
expression/builtin_encryption_test.go
:var shaCases = []struct { origin interface{} crypt string }{ {"test", "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3"}, {"c4pt0r", "034923dcabf099fc4c8917c0ab91ffcd4c2578a6"}, {"pingcap", "73bf9ef43a44f42e2ea2894d62f0917af149a006"}, {"foobar", "8843d7f92416211de9ebb963ff4ce28125932878"}, {1024, "128351137a9c47206c4507dcf2e6fbeeca3a9079"}, {123.45, "22f8b438ad7e89300b51d88684f3f0b9fa1d7a32"}, } func (s *testEvaluatorSuite) TestShaEncrypt(c *C) { defer testleak.AfterTest(c)() // The tool for monitoring goroutine leak. You can just copy it. fc := funcs[ast.SHA] for _, test := range shaCases { in := types.NewDatum(test.origin) f, _ := fc.getFunction(datumsToConstants([]types.Datum{in}), s.ctx) crypt, err := f.eval(nil) c.Assert(err, IsNil) res, err := crypt.ToString() c.Assert(err, IsNil) c.Assert(res, Equals, test.crypt) } // test NULL input for sha var argNull types.Datum f, _ := fc.getFunction(datumsToConstants([]types.Datum{argNull}), s.ctx) crypt, err := f.eval(nil) c.Assert(err, IsNil) c.Assert(crypt.IsNull(), IsTrue) }
Note: Besides conventional cases, you had better add some exceptional cases in which, for example, the input value is “nil” or the arguments of various types.
- Add the type inference information and the test case. See
plan/typeinferer.go
andplan/typeinferer_test.go
:case ast.SHA, ast.SHA1: tp = types.NewFieldType(mysql.TypeVarString) chs = v.defaultCharset tp.Flen = 40
{`sha1(123)`, mysql.TypeVarString, "utf8"}, {`sha(123)`, mysql.TypeVarString, "utf8"},
Experience modern data infrastructure firsthand.
TiDB Cloud Dedicated
A fully-managed cloud DBaaS for predictable workloads
TiDB Cloud Serverless
A fully-managed cloud DBaaS for auto-scaling workloads