Patchworkβ json_quote_str should handle non-ASCII characters

login
register
about
Submitter Gregor Hoffleit
Date 2010-03-04 10:40:03
Message ID <1267697893-sup-4538@sam.mediasupervision.de>
Download mbox | patch
Permalink /patch/411/
State New
Headers show

Comments

Gregor Hoffleit - 2010-03-04 10:40:03
The current code in json_quote_str() only accepts strict printable ASCII
code points (i.e. 32-127), all other code points are dropped from the
JSON output.

This patch accepts code points 32-255.

json_quote_str() should handle non-ASCII characters.
---
 json.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

--
1.7.0
Sebastian Spaeth - 2010-03-04 13:57:27
On 2010-03-04, Sebastian Spaeth wrote:
> The current code in json_quote_str() only accepts strict printable ASCII
> code points (i.e. 32-127), all other code points are dropped from the
> JSON output.

That would explain why my web interface does not display any umlaut
symbols.

Nice finding,
Sebastian
Carl Worth - 2010-04-13 15:36:06
On Thu, 04 Mar 2010 11:40:03 +0100, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> The current code in json_quote_str() only accepts strict printable ASCII
> code points (i.e. 32-127), all other code points are dropped from the
> JSON output.
> 
> This patch accepts code points 32-255.

Thanks, Gregor!

I've pushed this patch out now.

And I noted in our TODO file that we really should add some tests to the
test suite to exercise our --format=json code paths. It would be nice to
ensure we don't have any regressions in this area later.

-Carl
Carl Worth - 2010-04-15 00:35:44
On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> The test suite doesn't yet cover --format=json output nor UTF-8 in
> subject or body.
> 
> This patch starts with test cases for 'search --format=json' and
> 'show --format=json'.

Thanks for the tests, Gregor!

I was about to push this, but first noticed that I hadn't run the test
suite in the last day and that it had recently broken (oops!). I fixed
that, but then also noticed that I got failures with your tests.


> +execute_expecting "show --format=json 'json-show-message'" '[[[{"id":
> "'${gen_msg_id}'", "match": true, "filename": "'${gen_msg_filename}'",
> "date_unix": 946728000, "date_relative": "2000-01-01", "tags":
...
> +printf " Search message: json...\t"
> +add_message '[subject]="json-search-subject"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[body]="json-search-message"'
> +execute_expecting "search --format=json 'json-search-message'" '[{"thread": "XXX",
> +"timestamp": 946724400,

I'm getting a timestamp value here of 946756800 which is clearly an
interpretation of the above date as if it were my local time zone. That
is:

	$ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 -0800"
	946756800

And the value you have appears to have been generated in your timezone:

	$ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 +0100"
	946724400

Meanwhile, the value that should be printed here[*] is the value from
interpreting the original date in the timezone explicitly specified in
that date:

	$ date -u +%s -d "Sat, 01 Jan 2000 12:00:00 -0000"
	946728000

Note that the "notmuch show --format=json" test above does have the
correct timestamp.

So, a double thanks for this test, it seems to have uncovered another
bug.

-Carl

[*] I say "should" because I don't believe we have any actual
specification of the data coming out of the JSON output yet. One other
thing that seems odd is the name of "date_unix" in the show output and
"timestamp" in the search output for what is effectively the same
field.
Michal Sojka - 2010-04-15 08:33:46
On Thu, 15 Apr 2010, Carl Worth wrote:
> On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> > The test suite doesn't yet cover --format=json output nor UTF-8 in
> > subject or body.
> > 
> > This patch starts with test cases for 'search --format=json' and
> > 'show --format=json'.
> 
> Thanks for the tests, Gregor!

Carl,

are you still interrested in modular test suite from git? If so, could
you please look at id:87mxxg7bxo.fsf@steelpick.2x.cz and tell me your
opinion. I'm still updating the modularized tests to match the state in
master but every change in master takes me quite long time to convert.

-Michal
Carl Worth - 2010-04-15 19:58:58
On Thu, 15 Apr 2010 10:33:46 +0200, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> are you still interrested in modular test suite from git? If so, could
> you please look at id:87mxxg7bxo.fsf@steelpick.2x.cz and tell me your
> opinion. I'm still updating the modularized tests to match the state in
> master but every change in master takes me quite long time to convert.

Hi Michal,

I would still like to have a modular test suite, yes.

Thanks for pointing out that other message to me, which I had missed in
the general notmuch-mailing-list backlog I'm still dealing with. I've
now replied to it over there.

I am sorry that you keep having to re-do a bunch of work to keep your
patch up-to-date. I'm just about to push another change which might
further cause problems.

But you might actually like that change since it's one you requested in
your first version of the modular test suite. I'm dropping the annoying
execute_expecting macro that both runs notmuch and tests the
output. There's now a much cleaner separation such as:

	output=$($NOTMUCH search for-something)
	pass_if_equal "$output" "something was found"

I still think it wouldn't be hard to just gradually implement any
particular features we want in the test suite. But if the git thing ever
does become available, then that will be fine too.

-Carl
Michal Sojka - 2010-04-16 08:17:15
> But you might actually like that change since it's one you requested in
> your first version of the modular test suite. I'm dropping the annoying
> execute_expecting macro that both runs notmuch and tests the
> output. There's now a much cleaner separation such as:
> 
> 	output=$($NOTMUCH search for-something)
> 	pass_if_equal "$output" "something was found"

It's definitely better than before. The current implementation of
pass_if_equal has IMHO one drawback - if it compares multiline text and
there is a difference, it is quite hard to see where.

In my tests for maildir synchronization I use this approach:

  notmuch search tag:inbox | filter_output > actual &&
  diff -u - actual <<EOF
  thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox)
  EOF

Thanks to the usee of diff, I immediately see only the differences.

-Michal
David Edmondson - 2010-04-16 11:49:19
On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth <cworth@cworth.org> wrote:
> [*] I say "should" because I don't believe we have any actual
> specification of the data coming out of the JSON output yet. One other
> thing that seems odd is the name of "date_unix" in the show output and
> "timestamp" in the search output for what is effectively the same
> field.

The show output is updated to use `timestamp' in
id:1271418469-19031-1-git-send-email-dme@dme.org (just sent).

dme.
Carl Worth - 2010-04-22 21:14:31
On Fri, 16 Apr 2010 10:17:15 +0200, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> It's definitely better than before. The current implementation of
> pass_if_equal has IMHO one drawback - if it compares multiline text and
> there is a difference, it is quite hard to see where.
> 
> In my tests for maildir synchronization I use this approach:
> 
>   notmuch search tag:inbox | filter_output > actual &&
>   diff -u - actual <<EOF
>   thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox)
>   EOF
> 
> Thanks to the usee of diff, I immediately see only the differences.

I just added a very long multi-line test to the test suite.

So I took the opportunity to fix it to output the failures with diff. It
also saves both the complete expected and actual output (for any failed
tests) to files of the form test-XXX.expected and test-XXX.output.

Enjoy!

-Carl
Carl Worth - 2010-04-23 00:15:16
On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth <cworth@cworth.org> wrote:
> On Tue, 13 Apr 2010 18:37:57 +0200, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> > The test suite doesn't yet cover --format=json output nor UTF-8 in
> > subject or body.
> > 
> > This patch starts with test cases for 'search --format=json' and
> > 'show --format=json'.
...
> So, a double thanks for this test, it seems to have uncovered another
> bug.

I've now fixed that bug, (hurrah for deleting code!), and pushed out
these tests, (which did have to be updated to apply to the current test
suite, but that was straightforward).

Thanks again for the tests.

-Carl

Patch

diff --git a/json.c b/json.c
index 9614143..6dc0345 100644
--- a/json.c
+++ b/json.c
@@ -59,7 +59,7 @@  json_quote_str(const void *ctx, const char *str)
 	return NULL;
 
     for (ptr = str; *ptr; len++, ptr++) {
-	if (*ptr < 32 || *ptr == '\"' || *ptr == '\\')
+	if ((unsigned char)(*ptr) < 32 || *ptr == '\"' || *ptr == '\\')
 	    len++;
     }
 
@@ -70,7 +70,7 @@  json_quote_str(const void *ctx, const char *str)
 
     *ptr2++ = '\"';
     while (*ptr) {
-	    if (*ptr > 31 && *ptr != '\"' && *ptr != '\\') {
+	    if ((unsigned char)(*ptr) > 31 && *ptr != '\"' && *ptr != '\\') {
 		*ptr2++ = *ptr++;
 	    } else {
 		*ptr2++ = '\\';