Embulk v0.11 is coming soon
- Author: @dmikurube
- Created at: 2023-04-13
(A Japanese version of this article is at: https://zenn.dev/dmikurube/articles/embulk-v0-11-is-coming-soon-ja)
It has been a long time since we started Embulk v0.10 as a “development series”. 1
We will soon release Embulk v0.11, the next “stable” versions. This article introduces some notes on migration to the v0.11 series. 2
Embulk v0.10.50
First, as of today, Embulk v0.10.50 has been released.
This v0.10.50 is intended to be the final version of the v0.10 series, and is effectively the Release Candidate for v0.11.0. If no problems are found with it, v0.11.0 will be released the same as v0.10.50.
As of publishing this article, we intended to have v0.10.48 as the final version. But, we had a couple of updates after publishing.
- v0.10.49: Fix for newer Java
- v0.10.50: Fix for explicit null in CSV quote and escape
So, please try v0.10.50 before v0.11.0 is released. And, if you find any problem with v0.10.50, please report it to our User forum. We plan to release v0.11.0 in June 2023 if no significant problems are found.
The Embulk v0.11 series contains a number of changes that are incompatible with the previous v0.9 series. Many older plugins are no longer considered to work. The major plugins maintained by the Embulk project (https://github.com/embulk) are ready for the v0.11 series, but there would be many plugins that are not yet ready.
Almost all the existing plugins can be ready for v0.11 if the developers of them follow the instructions in this article.
There may be cases where the Embulk core needs to be changed for some plugins to support v0.11. If it turns out before the stable version v0.11.0 is released, we would likely be able to help such cases by changing the Embulk core.
After v0.11.0 is officially released, it could happen that “the part of the Embulk core can no longer be changed”, and there would be no way. Please try v0.10.50 before it happens.
Plugin’s v0.11 support
As mentioned above, many plugins are expected not to work with v0.10.50 (v0.11.0). Most of them should work again if the plugins are updated to support v0.11. Please contact the developer of each plugin.
You can report it in our User forum, but it is up to the developer of each plugin whether to release its v0.11-compatible version or not.
Download
The selfupdate
command, used to update Embulk, is now deprecated.
- The
selfupdate
command alone to update to the latest version is already disabled. - The
selfupdate X.Y.Z
command to update to a specified version is still effective, but will eventually become obsolete.
Choose an appropriate site for you to download from the following options.
- Download from GitHub Releases with confirming the version.
- You can confirm changes in the version.
- This will be appropriate if you download it manually.
- Download from a version-specific URL like:
https://dl.embulk.org/embulk-X.Y.Z.jar
- This may be better in some cases, especially if you are downloading automatically.
- This is a redirect to GitHub Releases above.
- Ex: https://dl.embulk.org/embulk-0.10.50.jar
- Download the latest version from
https://dl.embulk.org/embulk-latest.jar
.- This is a redirect to
https://dl.embulk.org/embulk-X.Y.Z.jar
above. - This URL still points v0.9.24 as v0.10 is a development series.
- We will update this URL in a while after v0.11.0 is released.
- This is a redirect to
Jfyi, the file of Embulk v0.10.50 (embulk-0.10.50.jar
) is 6.39 MB, while Embulk v0.9.24 (embulk-0.9.24.jar
) was 42.4 MB. It is much smaller.
Changes in the command line
You will see the following message once you rename the downloaded file embulk-X.Y.Z.jar
to embulk
, and run it, in the same way as before.
This is a warning message that you will not be able to run Embulk with a command like: $ embulk run ...
================================== [ NOTICE ] ==================================
Embulk will not be executable as a single command, such as 'embulk run'.
It will happen at some point in v0.11.*.
Get ready for the removal by running Embulk with your own 'java' command line.
Running Embulk with your own 'java' command line has already been available.
For instance in Java 1.8 :
java -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC -jar embulk-X.Y.Z.jar run ...
java -XX:+AggressiveOpts -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xverify:none -jar embulk-X.Y.Z.jar guess ...
See https://github.com/embulk/embulk/issues/1496 for the details.
================================================================================
You can still run Embulk like $ embulk run ...
as of v0.10.50 and v0.11.0. You don’t need to change the command line immediately at the same time as trying v0.10.50 or migrating to v0.11. However, a command line like $ embulk run ...
will become unavailable during the v0.11 series shortly. Please now start preparing your startup script, or command line starting with java
.
Commands like
$ embulk run ...
will be unsupported so that Embulk will support newer Java, as explained in the next section. Theembulk
command has been implemented by an embedded shell script. However, it turned out to be hard to support various command lines (options) of varieties of Java versions by varieties of distributors after Java 9 in the shell script. We concluded that it would be impossible to keep the embedded shell script.Please start preparing your own startup script, or command line, appropriate to your Java version in use.
Support for newer Java
Unfortunately, Embulk supports only Java 8 officially, as of v0.10.50 and v0.11.0. However, we believe that Embulk should work basically with Java 11 and Java 17 at this point. If you are interested, please give it a try.
We plan to expand testing with newer Java versions, along with moving forward with Embulk v0.11, so that we can officially announce that Embulk supports Java 11 and Java 17 at some point in the future.
There are some cases in that plugins are not compatible with newer Java. 3 Such a plugin would need some work.
Embulk System Properties
Embulk System Properties has been added, a mechanism to configure global Embulk settings. They can be configured by a Java properties
file embulk.properties
, and overridden with a command line option -Xkey=value
.
There are several possible locations for embulk.properties
(see below). But as a starting point when migrating to v0.10.50 or v0.11, an easy option is to start from ~/.embulk/embulk.properties
under ~/.embulk/
. ~/.embulk/
is the directory where plugins have been installed even until v0.9.
JRuby
Embulk v0.10.50 and v0.11.0 do not include JRuby. If you wanted to use an Embulk plugin in the Ruby gem format (in almost every case at this time), you need to download JRuby separately, and configure your Embulk System Properties to use it. (See below in particular.)
It may seem annoying to you at a glance. But, please consider it a chance for you to start using an arbitrary JRuby version (at your own risk) as you need, whereas you had not been able to upgrade JRuby embedded in Embulk so far. Some newer JRuby versions may not work well as we do not test with many JRuby versions. But, now, you should have freedom much more improved.
In the long term, however, we plan to gradually shrink our support on (J)Ruby. 4 We will make the Maven-format plugins, which do not require JRuby, easier to use so that they will be the main plugin format in v0.11 and later.
Example: set up with JRuby
Let’s set up with JRuby 9.1.15.0, which is quite old, but the same with Embulk v0.9.
First, download jruby-complete-9.1.15.0.jar
from JRuby Files/downloads/9.1.15.0, and place it in an appropriate directory.
Then, specify the location of the jruby-complete-9.1.15.0.jar
as a file:
URL in ~/.embulk/embulk.properties
as follows. (Replace /home/user/
appropriately.)
jruby=file:///home/user/jruby-complete-9.1.15.0.jar
This is the first setup to use Ruby gem-format plugins in Embulk v0.10.50 and v0.11.
embulk.gem
To use Ruby gem-format plugins with JRuby, you also need to install embulk.gem
. It must be the same version as the Embulk core.
# If Embulk is v0.10.50, embulk.gem must be 0.10.50, too.
$ java -jar embulk-0.10.50.jar gem install embulk -v 0.10.50
This
embulk.gem
is completely different fromembulk.gem
until v0.8. We are considering an idea to yank the oldembulk.gem
until v0.8 from rubygems.org, as we see some confusion due to this. If you have any thoughts, please leave a comment in GitHub Discussions at:
The embulk.gem
depends on msgpack.gem
. You may need to install it explicitly in some cases.
Bundler and Liquid
You’ll need to install Bundler or Liquid explicitly if you are going to use them.
$ java -jar embulk-0.10.50.jar gem install bundler -v 1.16.0
$ java -jar embulk-0.10.50.jar gem install liquid -v 4.0.0
However, we have not done much testing with them either. The same versions of Bundler 1.16.0 and Liquid 4.0.0 are working now, and new versions would probably work as well. But please try their new versions at your own risk. (Feedback is welcome.)
Embulk home
Embulk v0.10.50 and v0.11.0 have a directory setting called “Embulk home”. This is mostly equivalent to ~/.embulk/
up to v0.9, and still defaults to ~/.embulk/
. In the example of Embulk System Properties above, we started from putting embulk.properties
in ~/.embulk/
.
You can change this “Embulk home” to another directory from ~/.embulk/
, and load embulk.properties
from there. Embulk home is chosen in the following order of precedence.
- Embulk home can be set from an Embulk System Property
embulk_home
in a command line option-X
.- It would be an absolute path, or a relative path from the current working directory.
java -jar embulk-0.10.50.jar -Xembulk_home=/var/tmp/foo run ...
(absolute path)java -jar embulk-0.10.50.jar -Xembulk_home=.embulk2/bar run ...
(repative path)
- Embulk home can be set from an environment variable
EMBULK_HOME
.- It must be an absolute path.
env EMBULK_HOME=/var/tmp/baz java -jar embulk-0.10.50.jar run ...
(absolute path)
- If neither 1 nor 2, Embulk home is set by searching according to the following conditions.
- Search from the current directory, moving one by one toward the parent.
- In each directory, look for a directory “whose name is
.embulk/
”, and “which contains a regular file namedembulk.properties
”. - If such a directory is found, it is chosen as Embulk home.
- If the current working directory is under the user’s home directory, the search stops at the home directory.
- Otherwise, the search continues to the root directory.
- If none of 1 to 3 matches,
~/.embulk/
is chosen unconditionally as Embulk home.
embulk.properties
directly under Embulk home is loaded as Embulk System Properties, then.
Finally, an Embulk System Property embulk_home
is overridden with the absolute path of Embulk home.
Directories to install plugins
Along with setting Embulk home, the directories to install Embulk plugins are determined more explicitly. They are configured in the following order of precedence.
- The directories to install Ruby gem-format plugins, and the directory to install Maven-format plugins can be set from Embulk System Properties
gem_home
,gem_path
, andm2_repo
in command line options-X
.- They would be absolute paths, or relative paths from the current working directory.
java -jar embulk-0.10.50.jar -Xgem_home=/var/tmp/bar gem install ...
(absolute path)java -jar embulk-0.10.50.jar -Xm2_repo=.m2/repository run ...
(relative path)
- The directories to install Ruby gem-format plugins, and the directory to install Maven-format plugins can be set from Embulk System Properties
gem_home
,gem_path
, andm2_repo
in theembulk.properties
at Embulk home.- They would be absolute paths, or relative paths from Embulk home.
- The directories to install Ruby gem-format plugins, and the directory to install Maven-format plugins can be set from environment variables
GEM_HOME
,GEM_PATH
, andM2_REPO
.- They must be absolute paths.
env GEM_HOME=/var/tmp/baz java -jar embulk-0.10.50.jar gem ...
(absolute path)
- If none of 1 to 3 matches, Ruby gem-format plugins are installed in
lib/gems
under Embulk home, and Maven-format plugins are installed inlib/m2/repository
under Embulk home.
Finally, Embulk System Properties gem_home
, gem_path
, and m2_repo
are overridden with the absolute paths of the chosen directories. gem_path
would be empty if not explicitly set.
On the other hand, environment variables GEM_HOME
, GEM_PATH
, and M2_REPO
are not overridden. They are simply ignored if 1 or 2 above matches.
Summary
You should be able to run Embulk v0.10.50 with the configurations above. Please try it before v0.11.0 is released in June 2023.
If you see any problem, please report it in our User forum.
Plans for v0.11.0 and later
We initially thought to move forward to v1.0 without adding many new features sooner after v0.11.0 is released. However, we now discuss implementing some requests before v1.0.
We have not decided anything, but we may go into v0.12 or v0.13 before v1.0 for such new features. But anyway, we do not intend to make significant incompatible changes like this time, such as v0.9 → v0.10 → v0.11.
Thank you for your continued support of Embulk.
Minimum things we wanted to do before v1.0
- Bug fixes based on feedback
- Support for plugin developers for v0.11
- Improving support for Maven-format plugins (especially installation)
- New testing framework for plugins
- Official support for newer Java
- Refactoring and reorganization of the code
Maybe some more to do before v1.0
(Note: no any decisions made)
- Enhancement in reporting errors and else from plugins
- Support for some Data Lineage-like functionality
-
For the background reason why it took so long, see “Embulk in TD, and in the future” in Treasure Data Developer Blog, if you are interested. ↩
-
Some of the information is duplicated with “For Embulk users: What will change in v0.11 and v1.0?” that was published about two years ago. But, this article in 2023 contains some further changes and notes. ↩
-
A typical case is using classes in
javax.xml.*
, which are removed from the standard API. It is not always easy to determine at a glance since some transitive dependencies may calljavax.xml.*
. ↩ -
Join us in maintaining Embulk if you think, “We don’t want that! Continue to support JRuby!” :) ↩